About Regular Expressions in Python

1. About the usage of r

import re
...
re.findall(r'pattern to search for', text)

Here r stands for raw: prefixing a string literal with r tells Python not to process backslash escapes in it, so the string is kept exactly as typed.
Why add it? Compare the output of print("\bhi") with that of print(r"\bhi").

>>> print("\bhi")
hi
>>> print(r"\bhi")
\bhi

As you can see, without r the \b does not appear in the output. When Python meets "\" in a normal string, it treats it as the start of an escape sequence, so "\b" becomes a single backspace character. To put a literal "\" in a normal string, you must type "\\".

>>> print("\\bhi")
\bhi
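
The same difference matters once the string is used as a pattern. Here is a small sketch (the sample text and variable names are made up for illustration): with the raw string, \b reaches the re module as a word boundary, while the normal string hands it a backspace character.

```python
import re

sample = "hi, this is high"  # hypothetical sample text

# Normal string: "\b" is one backspace character, so the literal has 3 characters.
plain = "\bhi"
# Raw string: the backslash is kept, so the literal has 4 characters.
raw = r"\bhi"
print(len(plain), len(raw))  # 3 4

# With the raw string, \b is a word boundary: "hi" inside "this" is not matched.
print(re.findall(raw, sample))    # ['hi', 'hi']
# With the normal string, the pattern looks for backspace + "hi" and finds nothing.
print(re.findall(plain, sample))  # []
```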

2. Matching any character

"." means any character except a newline in a regular expression

import re

text = 'Hi, I am lily, I am his wife.'
# "i." matches an "i" followed by any one character except a newline
m = re.findall(r'i.', text)
if m:
    print(m)
else:
    print('No match!')

Output:

['i,', 'il', 'is', 'if']
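
To see the "except a newline" part in action, here is a minimal sketch using a made-up two-line string. It also uses the standard re.DOTALL flag, which the text above does not cover, to show how the default can be changed.

```python
import re

two_lines = "hi\nhim"  # hypothetical sample containing a newline

# By default "." does not match "\n", so the "i" at the end of the first line is skipped.
default_matches = re.findall(r'i.', two_lines)
print(default_matches)  # ['im']

# With the re.DOTALL flag, "." matches a newline as well.
dotall_matches = re.findall(r'i.', two_lines, re.DOTALL)
print(dotall_matches)  # ['i\n', 'im']
```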

A similar symbol is "\S", which represents any character other than whitespace. Note the capital S. In many search tools, "?" stands for any single character and "*" for any number of consecutive characters; these are called wildcards. In regular expressions, however, any character is represented by ".", and "*" is not a character but a quantifier: it means the preceding character may be repeated any number of times (including 0), and any text that meets this condition is matched by the expression.
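
A short sketch of both points, using made-up sample strings: "\S" refuses to match a space where "." accepts it, and "*" repeats the character before it zero or more times.

```python
import re

# "." accepts the space after the first "a"; "\S" does not.
dot_matches = re.findall(r'a.', 'a b ab')
non_space_matches = re.findall(r'a\S', 'a b ab')
print(dot_matches)        # ['a ', 'ab']
print(non_space_matches)  # ['ab']

# "a*" means zero or more "a" characters, so "ct" matches too.
star_matches = re.findall(r'ca*t', 'ct cat caat')
print(star_matches)       # ['ct', 'cat', 'caat']
```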

By default, "*" matches as much text as possible. If you want it to stop at the shortest possible match, use ".*?" instead. For example, "I.*?e" stops at the first "e" that follows the "I". This way of matching is called lazy matching, and the default as-long-as-possible behavior is called greedy matching.
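
The difference is easiest to see side by side. The sentence below is a made-up sample, not one taken from the examples above:

```python
import re

sentence = 'I like apple, I like orange'  # hypothetical sample text

# Greedy: ".*" runs from the first "I" to the last "e" in the string.
greedy = re.findall(r'I.*e', sentence)
print(greedy)  # ['I like apple, I like orange']

# Lazy: ".*?" stops at the first "e" after each "I".
lazy = re.findall(r'I.*?e', sentence)
print(lazy)    # ['I like', 'I like']
```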
