re.S usage
The role of re.S:
When re.S is not used, "." does not match a newline, so a pattern such as .*? can only match within a single line; if no match is found on one line, matching starts over on the next line. With the re.S flag, the regular expression treats the string as a whole and can match across line breaks. This is used frequently in crawler projects.
Example:
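import re
# 'a' and 'item' sit on different lines, so the match has to cross line breaks
a = """This is
a
webspider
item!
maoyanmovierank"""
b = re.findall('a(.*?)item', a)        # "." stops at the newline
c = re.findall('a(.*?)item', a, re.S)  # "." also matches the newline
print(b)
print(c)
Output result:
b: []
c: ['\nwebspider\n']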
Here re.S makes the "." in the regular expression 'a(.*?)item' match any character, including the newline. Without re.S, "." matches every character except the newline (\n).
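Since crawler projects are the typical use case, here is a minimal sketch of pulling a block that spans several lines out of HTML. The html fragment, its tag names, and the movie title are made up for illustration.

```python
import re

# Hypothetical page fragment; the markup is invented for this example.
html = """<div class="movie">
  <p class="name">Example Movie</p>
</div>"""

pattern = r'<div class="movie">(.*?)</div>'

# Without re.S, "." stops at the first newline, so nothing matches.
without_flag = re.findall(pattern, html)

# With re.S, "." also matches "\n", so the whole block is captured.
with_flag = re.findall(pattern, html, re.S)

print(without_flag)
print(with_flag)
```

re.DOTALL is the long name of the same flag, so re.findall(pattern, html, re.DOTALL) behaves identically.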
The following are some modifiers of the re module:
Regular expressions can take optional flag modifiers that control how matching is done. Multiple flags can be combined with bitwise OR (|). For example, re.I | re.M sets both the I and M flags.
- re.I makes matching case-insensitive
- re.L makes the special character sets \w, \W, \b, \B, \s, \S depend on the current locale
- re.M is multi-line mode: ^ and $ match at the start and end of every line
- re.S makes . match any character, including the newline (by default . does not match the newline)
- re.U makes the special character sets \w, \W, \b, \B, \d, \D, \s, \S depend on the Unicode character properties database
- re.X ignores whitespace and comments after # in the pattern, to improve readability
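A short sketch of how some of these flags behave, alone and combined with |. The sample text and the patterns are made up for illustration.

```python
import re

text = """first line
SECOND line"""

# re.I alone: case is ignored, but ^ still matches only at the start of the string.
only_i = re.findall(r'^second', text, re.I)

# re.I | re.M: case is ignored and ^ matches at the start of every line.
i_and_m = re.findall(r'^second', text, re.I | re.M)

# re.X: whitespace and "#" comments inside the pattern are ignored.
verbose = re.compile(r"""
    (\d{4})  # year
    -
    (\d{2})  # month
""", re.X)
year_month = verbose.findall("released 2019-07")

print(only_i)      # []
print(i_and_m)     # ['SECOND']
print(year_month)  # [('2019', '07')]
```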