XSS filtering
When filling in a form we often use a textarea or a rich text editor to collect content from the user. If someone mischievous writes JavaScript into it, or edits the page source before submitting, the script ends up in the stored content and the page no longer displays as intended when it is rendered. We should therefore filter the content before saving it to the database, removing or escaping code such as JavaScript.
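As a minimal sketch of the "escape" option mentioned above, the standard library's html.escape can turn markup into harmless text. The variable names and the sample input are illustrative assumptions, not part of the original code.

```python
import html

# Hypothetical user input containing a script tag.
user_input = '<script>alert(111)</script>'

# html.escape replaces &, < and > (and, by default, quotes) with HTML
# entities, so a browser shows the text instead of running it.
escaped = html.escape(user_input)
print(escaped)  # &lt;script&gt;alert(111)&lt;/script&gt;
```

Escaping keeps everything the user typed but disables all markup, so it suits plain textareas better than rich text, where some tags need to survive.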
Python has a module that can do this filtering: beautifulsoup4
Usage:
    content = '''
    <h1>Xss filtering</h1>
    <p><span style="width:100px;" name="haha">In the process of filling out the form, we use textarea, rich text edit box, </span></p>
    <p>
        <strong>There is a module in python:</strong>
        <script>alert(111)</script>
    </p>
    '''

    from bs4 import BeautifulSoup

    soup = BeautifulSoup(content, "html.parser")

    tag = soup.find("script")  # find the first <script> tag
    tag.hidden = True          # do not render the tag itself
    tag.clear()                # remove the contents of the tag

    span = soup.find("span")
    del span.attrs["style"]    # delete the style attribute from the span

    content = soup.decode()    # convert the object back to a string
    print(content)
Step by step:
- First import the class: from bs4 import BeautifulSoup
- Create an object, passing in the content to be filtered and the parser; Python ships with a built-in parser, html.parser
- Find the tag you are looking for with soup.find
- To keep the tag itself out of the output: tag.hidden = True (its contents are still rendered unless you clear them)
- To remove the contents of the tag: tag.clear()
- To read the attributes of a tag: span.attrs, which gives a dictionary such as {'style': 'width:100px;', 'name': 'haha'}
- To delete an attribute of a tag: del span.attrs['attribute name']
- To convert the object back to a string: soup.decode()
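The sketch below compares the hidden + clear() combination from the steps above with Tag.decompose(), a Beautiful Soup method that removes a tag and its contents in one call. The helper names strip_with_hidden and strip_with_decompose and the sample markup are assumptions made for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical sample input.
sample = '<p>hello<script>alert(111)</script></p>'


def strip_with_hidden(markup):
    """Remove <script> the way the steps above do: hide the tag, then empty it."""
    soup = BeautifulSoup(markup, "html.parser")
    for tag in soup.find_all("script"):
        tag.hidden = True  # the tag's own markup is not rendered
        tag.clear()        # the tag's contents are removed
    return soup.decode()


def strip_with_decompose(markup):
    """Remove <script> with decompose(), which drops the tag and its contents."""
    soup = BeautifulSoup(markup, "html.parser")
    for tag in soup.find_all("script"):
        tag.decompose()
    return soup.decode()


a = strip_with_hidden(sample)
b = strip_with_decompose(sample)
print(a)
print(b)
```

Both helpers should produce the same string here. The difference is that decompose() also detaches the tag from the tree, whereas a hidden tag stays in the tree and is merely skipped when rendering.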
The approach above finds one specific tag and removes it. Can we instead set up a whitelist, so that only the tags in the list are allowed through and all others are filtered out?
    content = '''
    <h1>Xss filtering</h1>
    <p><span style="width:100px;" name="haha">In the process of filling out the form, we use textarea, rich text edit box, </span></p>
    <p>
        <strong>There is a module in python:</strong>
        <script>alert(111)</script>
    </p>
    '''

    tags = ['p', 'strong']  # whitelist of allowed tag names

    from bs4 import BeautifulSoup

    soup = BeautifulSoup(content, "html.parser")

    for tag in soup.find_all():
        # tag.name is just the name of the tag, e.g. 'p'
        if tag.name in tags:
            continue
        else:
            tag.hidden = True  # do not render the tag itself
            tag.clear()        # remove the contents of the tag

    content = soup.decode()
    print(content)
With this approach, only the p and strong tags remain in the result; every other tag is removed together with its contents.
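Hiding and clearing a tag that is not on the whitelist also discards any text inside it, such as the h1 heading and the span text in the example above. If that text should survive, one possible alternative (not the approach used in this article) is to decompose() only the dangerous tags and unwrap() the rest, which replaces a tag with its children. The function name whitelist_keep_text and the choice of "dangerous" tags are assumptions for this sketch.

```python
from bs4 import BeautifulSoup


def whitelist_keep_text(markup, allowed=("p", "strong")):
    """Keep allowed tags, drop script/style entirely, unwrap everything else."""
    soup = BeautifulSoup(markup, "html.parser")

    # First pass: remove tags whose contents must never be shown.
    for tag in soup.find_all(["script", "style"]):
        tag.decompose()

    # Second pass: strip the markup of other disallowed tags but keep their text.
    for tag in soup.find_all():
        if tag.name not in allowed:
            tag.unwrap()

    return soup.decode()


sample = ('<h1>Title</h1><p><span style="x">text</span> '
          '<strong>bold</strong><script>alert(111)</script></p>')
result = whitelist_keep_text(sample)
print(result)  # Title<p>text <strong>bold</strong></p>
```

This sketch leaves the attributes of allowed tags untouched, so it would still need attribute filtering on top.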
If you also want to filter attributes, for example keep only the class attribute on p tags and only the id attribute on strong tags:
    content = '''
    <h1>Xss filter</h1>
    <p class="c1" id="i1"><span style="width:100px;" name="haha">In the process of filling out the form, we use textarea, rich text edit box, </span></p>
    <p>
        <strong class="c2" id="i2">There is a module in python:</strong>
        <script>alert(111)</script>
    </p>
    '''

    # tags = ['p', 'strong']
    # Whitelist: allowed tag name -> set of attributes that tag may keep
    tags = {
        'p': {'class'},
        'strong': {'id'},
    }

    from bs4 import BeautifulSoup

    soup = BeautifulSoup(content, "html.parser")

    for tag in soup.find_all():
        # tag.name is just the name of the tag
        if tag.name in tags:
            input_attrs = tag.attrs       # all attributes submitted by the user, e.g. {'class': ['c1'], 'id': 'i1'}
            valid_attrs = tags[tag.name]  # attributes allowed for this tag, e.g. {'class'}
            # Copy the keys into a list first: deleting from a dictionary
            # while iterating over it directly raises an error.
            for k in list(input_attrs.keys()):
                if k in valid_attrs:
                    continue
                else:
                    del tag.attrs[k]
        else:
            tag.hidden = True
            tag.clear()

    content = soup.decode()
    print(content)
This removes the id attribute from the p tag and the class attribute from the strong tag, so filtering now happens at both the tag level and the attribute level.
To reuse this later, we can encapsulate the logic in a class and call that.
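One way such a class might look is sketched below. The class name XSSFilter, the attribute valid_tags, and the method name process are illustrative choices, not names from the original code; the filtering logic itself follows the tag and attribute whitelist shown earlier.

```python
from bs4 import BeautifulSoup


class XSSFilter(object):
    """Hypothetical wrapper around the tag and attribute whitelist."""

    # Allowed tag name -> set of attributes that tag may keep (example values).
    valid_tags = {
        "p": {"class"},
        "strong": {"id"},
    }

    def process(self, content):
        """Return content with disallowed tags and attributes removed."""
        soup = BeautifulSoup(content, "html.parser")
        for tag in soup.find_all():
            allowed_attrs = self.valid_tags.get(tag.name)
            if allowed_attrs is None:
                # Tag is not whitelisted: hide it and drop its contents.
                tag.hidden = True
                tag.clear()
                continue
            # Copy the keys so the dictionary can be modified while looping.
            for key in list(tag.attrs.keys()):
                if key not in allowed_attrs:
                    del tag.attrs[key]
        return soup.decode()


cleaned = XSSFilter().process(
    '<p class="c1" id="i1">hi<script>alert(111)</script></p>'
)
print(cleaned)  # <p class="c1">hi</p>
```

Since the filter holds no per-request state, a single shared instance is enough.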
Singleton pattern
A singleton always uses the same object instance:
Let's first look at a common singleton pattern:
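    class Foo(object):
        instance = None  # holds the single shared instance

        def __init__(self):
            pass

        @classmethod
        def get_instance(cls):
            # Create the instance on first use, then keep returning it
            if cls.instance is None:
                cls.instance = cls()
            return cls.instance

        def process(self):
            return '123'

    # Calling the class directly still creates a new object each time
    obj1 = Foo()
    obj2 = Foo()
    print(id(obj1), id(obj2))  # two different ids

    # Going through get_instance always returns the same object
    obj1 = Foo.get_instance()
    obj2 = Foo.get_instance()
    print(id(obj1), id(obj2))  # the same id twice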
The second method, implemented with __new__():
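    class Foo(object):
        instance = None  # holds the single shared instance

        def __init__(self):
            pass

        def __new__(cls, *args, **kwargs):
            if cls.instance is None:
                # object.__new__ takes only the class; passing extra
                # arguments raises TypeError in Python 3
                cls.instance = object.__new__(cls)
            return cls.instance

    obj1 = Foo()
    obj2 = Foo()
    print(obj1, obj2)  # both names refer to the same object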
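One detail of the __new__ approach is worth knowing: Python still calls __init__ every time the class is called, even though __new__ hands back the existing object. The sketch below shows this with a hypothetical Config class and an init_calls counter, both invented for this example.

```python
class Config(object):
    _instance = None  # holds the single shared instance

    def __new__(cls, *args, **kwargs):
        if cls._instance is None:
            cls._instance = object.__new__(cls)
        return cls._instance

    def __init__(self):
        # Runs on every Config() call, even though the same object is returned.
        self.init_calls = getattr(self, "init_calls", 0) + 1


a = Config()
b = Config()
print(a is b)        # True
print(a.init_calls)  # 2
```

If __init__ sets up state that should be created only once, it needs a guard of its own, or the get_instance style can be used instead.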