Notes on using "*", ".", and "?" in Python regular expressions

1. The r prefix

import re
...
re.findall(r'characters to search for go here', text)

Here r stands for raw: prefixing a string literal with "r" tells Python not to treat backslashes as escape characters, so the string is kept exactly as typed.
Why is this needed? Compare print('\bhi') with print(r'\bhi'):

>>> print("\bhi")
hi
>>> print(r"\bhi")
\bhi

As you can see, without r the "\b" disappears. When a Python string literal contains "\", the character after it is treated as part of an escape sequence, so "\b" becomes an invisible backspace character. To put a literal "\" in a normal string, you must type "\\":

>>> print("\\bhi")
\bhi
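The same contrast can be checked in code. This is a small sketch assuming Python 3's standard re module; the sample text is made up. It also shows why raw strings matter for regex patterns: inside a pattern, "\b" means a word boundary, but only if the backslash survives to reach the regex engine.

```python
import re

plain = "\bhi"   # "\b" is turned into a single backspace character (\x08)
raw = r"\bhi"    # the backslash and "b" are kept as two separate characters

print(len(plain), len(raw))    # 3 4
print(raw == "\\bhi")          # True: r"\b" is the same as "\\b"

# In a regex pattern, \b means "word boundary", so the raw form is what you want
text = "hi there, this is chi"
print(re.findall(r"\bhi", text))   # ['hi'] (only the standalone "hi")
print(re.findall("\bhi", text))    # [] (the pattern holds a literal backspace)
```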

2. Wildcard characters

In a regular expression, "." matches any single character except a newline.

import re

text = 'Hi, I am lily, I am his wife.'
# "i." means a lowercase "i" followed by any one character (except a newline)
m = re.findall(r'i.', text)
if m:
    print(m)
else:
    print('Not match!')

Output:

['i,', 'il', 'is', 'if']
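A minimal sketch of the newline exception, using a made-up sample string: by default "." refuses to match "\n", while passing the standard re.DOTALL flag lets it match newlines too.

```python
import re

text = "a1\na2"

# "." matches the digit after each "a"
print(re.findall(r"a.", text))             # ['a1', 'a2']
# By default "." does not match the newline, so "1." fails on "1\n"
print(re.findall(r"1.", text))             # []
# With re.DOTALL, "." matches a newline too
print(re.findall(r"1.", text, re.DOTALL))  # ['1\n']
```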

Similarly, "\S" matches any character that is not whitespace. Note the capital S: lowercase "\s" matches a single whitespace character instead.
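To see how "\S" differs from ".", here is a small sketch that reuses the "i." idea on a made-up sentence: "." accepts the space after "hi", while "\S" does not.

```python
import re

text = 'hi there, I am lily'

# "." accepts any character, including the space after "hi"
print(re.findall(r'i.', text))    # ['i ', 'il']
# "\S" rejects whitespace, so only the "il" in "lily" matches
print(re.findall(r'i\S', text))   # ['il']
```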
In many search tools, "?" stands for any single character and "*" stands for any number of consecutive characters; these are called wildcards. Regular expressions work differently: any single character is written ".", and "*" is not a character at all but a quantifier. It means the preceding character may repeat any number of times (including zero), and any text that fits this pattern is matched by the expression.
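A short sketch of "*" as a quantifier, with illustrative sample strings: "b*" lets the "b" appear zero or more times, and ".*" together behaves like the "*" wildcard of search tools.

```python
import re

text = 'a ab abbb b'

# "b*" repeats the preceding "b" any number of times, including zero,
# so "ab*" matches "a" alone as well as "ab" and "abbb"
print(re.findall(r'ab*', text))      # ['a', 'ab', 'abbb']

# ".*" means any character, repeated any number of times
print(re.findall(r'a.*b', 'a-x-b'))  # ['a-x-b']
```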

Because "*" matches as much text as possible, "I.*e" runs on to the last "e" it can reach. To make it stop at the shortest possible match, use ".*?" instead: "I.*?e" stops at the first "e" after "I". This is called lazy (non-greedy) matching, and the default as-much-as-possible behavior is called greedy matching.
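Greedy and lazy matching side by side, as a minimal sketch on a made-up sentence:

```python
import re

text = 'I like the tree'

# Greedy: ".*" grabs as much as it can, then backtracks to the LAST "e"
print(re.findall(r'I.*e', text))    # ['I like the tree']

# Lazy: ".*?" takes as little as it can and stops at the FIRST "e"
print(re.findall(r'I.*?e', text))   # ['I like']
```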


Origin http://43.154.161.224:23101/article/api/json?id=325810821&siteId=291194637