Basic knowledge: matching special characters in Python

Meaning of \: it marks the next character as a special character, a literal character, a backreference, or an octal escape. For example, 'n' matches the character "n", while '\n' matches a newline. The sequence '\\' matches "\", and '\(' matches "(".

\s matches any whitespace character, including spaces, tabs, form feeds, and so on. With the ASCII flag it is equivalent to [ \f\n\r\t\v]; for Unicode (str) patterns it also matches other Unicode whitespace.

Escape symbols

Regular expressions also support most of the escapes that Python string literals support: \a, \b, \f, \n, \r, \t, \u, \U, \v, \x, \\

Note 1: \b normally matches a word boundary; it means "backspace" only inside a character class.
Note 2: \u and \U are only recognized in Unicode (str) patterns.
Note 3: Octal escapes (\digits) are supported in a limited form. If the first digit is 0, or if there are three octal digits, the sequence is treated as an octal escape; otherwise it is treated as a group reference. As with string literals, an octal escape is at most three digits long.

Question 1: Why is the match shown as '\x07' for re.match("\a","\a")?

"\a" is a single character, the ASCII bell (BEL), whose code is 7. It has no printable form, so the match object, which displays the matched text in its escaped (repr) form, shows it as the hex escape '\x07'. The text encoding (UTF-8 or otherwise) is not involved: '\a' and '\x07' are two spellings of the same character.

Here is the session from an ipython3 terminal:

    In [6]: re.match("a","a")
    Out[6]: <_sre.SRE_Match object; span=(0, 1), match='a'>
    In [7]: re.match("\a","\a")  # one might expect the match to be shown as "\a"; why is it '\x07'?
    Out[7]: <_sre.SRE_Match object; span=(0, 1), match='\x07'>
    In [8]: re.match(r"\a","\a")
    Out[8]: <_sre.SRE_Match object; span=(0, 1), match='\x07'>
    In [183]: len("abc")  # three ordinary characters, so the length is 3
    Out[183]: 3
    In [9]: rs = r"\a"
    In [10]: len("\x07")  # the length is 1: the escape denotes a single character, not several ordinary ones
    Out[10]: 1

The following shows that the pattern \x41 matches "A", since 0x41 is the ASCII code of "A":
    In [103]: re.match(r"\x41","A")
    Out[103]: <_sre.SRE_Match object; span=(0, 1), match='A'>
    In [104]: re.match("\x41","A")
    Out[104]: <_sre.SRE_Match object; span=(0, 1), match='A'>

The following shows that \x07 is \a:

    In [106]: re.match("\x07","\a")
    Out[106]: <_sre.SRE_Match object; span=(0, 1), match='\x07'>
    In [107]: re.match(r"\x07","\a")
    Out[107]: <_sre.SRE_Match object; span=(0, 1), match='\x07'>

Question 2: Why does re.match(r"\\w","\w") match, and why is the match shown as '\\w'?

Reason: \w is not an escape sequence that Python string literals recognize, so the literal "\w" keeps its backslash and is the two-character string backslash + w, the same string as "\\w". In a regular expression, however, \w is a special sequence that matches one word character, so matching a literal backslash followed by w requires the pattern \\w.

The following three fail to match:

    In [134]: re.match("\w","\w")  # fails: the pattern \w needs a word character, but the string starts with a backslash
    In [135]: re.match(r"\w","\w")  # fails for the same reason; "\w" and r"\w" are the same pattern
    In [142]: re.match("\\w","\w")  # fails: "\\w" is also the pattern \w

This one matches:

    In [136]: re.match(r"\\w","\w")
    Out[136]: <_sre.SRE_Match object; span=(0, 2), match='\\w'>

Why is there an extra '\' in the displayed result '\\w'? The reason is that different tools display the value differently. Running the same thing as a script in PyCharm:

    import re
    result = re.match(r"\\w","\w").group()
    print(result)  # output: \w

print shows \w. The terminal shows '\\w' because the match object displays the matched text in its escaped (repr) form, in which a single backslash is written as \\, while print shows the string itself. The matched text is the same two characters in both cases.

To confirm:

    s = "\\w"
    print(s)  # prints \w both in the terminal and in PyCharm

Next, consider this result again:

    In [136]: re.match(r"\\w","\w")
    Out[136]: <_sre.SRE_Match object; span=(0, 2), match='\\w'>

r"\\w" is equivalent to "\\\\w". So the question is: how can "\\\\w" match "\w"? My guess is that the string literal "\w" is really the same string as "\\w". Is this guess correct?
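The guess can also be checked without re at all, by comparing the string literals directly. The sketch below is only an illustration. It writes the backslash as "\\" in the non-raw literal, because recent Python versions emit a warning for unrecognized escapes such as "\w"; the variable names are made up for this example.

```python
# A quick check of the guess, using only string comparison.
plain = "\\w"   # non-raw literal with the backslash escaped
raw = r"\w"     # raw literal: the backslash is kept as written

print(plain == raw)        # True: both are backslash + w
print(len(raw))            # 2
print(repr(raw))           # '\\w'  (escaped display, as in the match object)
print(raw)                 # \w     (the string itself)
print(r"\\w" == "\\\\w")   # True: the raw and non-raw spellings of the pattern
```

The repr line is the same kind of display the match object uses, which is why the terminal showed '\\w' while print showed \w.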
Checking with both backslashes written out explicitly:

    In [143]: re.match("\\\\w","\\w")
    Out[143]: <_sre.SRE_Match object; span=(0, 2), match='\\w'>

The result is the same, so the guess holds: in a string literal, \w is equivalent to \\w, and the match is made against \\w. The same reasoning explains the following cases, again from an ipython3 terminal:

    In [147]: re.match(r"\\d","\d")
    Out[147]: <_sre.SRE_Match object; span=(0, 2), match='\\d'>
    In [152]: re.match(r"\\s","\s")
    Out[152]: <_sre.SRE_Match object; span=(0, 2), match='\\s'>
    In [153]: re.match(r"\\W","\W")
    Out[153]: <_sre.SRE_Match object; span=(0, 2), match='\\W'>
    In [154]: re.match(r"\\S","\S")
    Out[154]: <_sre.SRE_Match object; span=(0, 2), match='\\S'>
    In [155]: re.match(r"\\D","\D")
    Out[155]: <_sre.SRE_Match object; span=(0, 2), match='\\D'>
    In [177]: re.match(r"\\B","\B")
    Out[177]: <_sre.SRE_Match object; span=(0, 2), match='\\B'>
    In [36]: re.match(r"\\j","\j")
    Out[36]: <_sre.SRE_Match object; span=(0, 2), match='\\j'>

Question 3: Why does re.match(r"\\b","\b") fail to match, when \b in a regular expression means a word boundary?

    In [166]: re.match(r"\\w","\w")  # matches, for the reason confirmed above
    Out[166]: <_sre.SRE_Match object; span=(0, 2), match='\\w'>
    In [167]: re.match(r"\\b","\b")  # why doesn't this match?
    In [169]: re.match(r"\b","\b")  # also fails

Unlike "\w", the literal "\b" is an escape sequence that Python strings do recognize: it is the single backspace character (\x08), so "\b" is not equivalent to "\\b". The pattern r"\\b" matches a backslash followed by the letter b, while the string to be matched is a single backspace, so they do not match. The pattern r"\b" is the regex word boundary, which matches only at a position next to a word character; a lone backspace has no such position, so that fails as well. To match the backspace character itself, the pattern only needs to be the non-raw "\b", which hands the backspace to re as an ordinary character.
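The difference between "\w" and "\b" as string literals becomes visible when the characters each literal contains are listed. This is a minimal sketch; the helper name codepoints is made up for this illustration.

```python
def codepoints(s):
    """Return the character codes of s as hex strings."""
    return [hex(ord(ch)) for ch in s]

print(codepoints(r"\w"))   # ['0x5c', '0x77']  backslash, w
print(codepoints("\b"))    # ['0x8']           one backspace character
print(codepoints(r"\b"))   # ['0x5c', '0x62']  backslash, b
print("\b" == "\x08")      # True
```

So the non-raw "\b" holds a single character, while the raw r"\b" holds two.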
To check this idea with re:

    In [170]: re.match("\b","\b")
    Out[170]: <_sre.SRE_Match object; span=(0, 1), match='\x08'>
    In [172]: re.match("\x08","\b")
    Out[172]: <_sre.SRE_Match object; span=(0, 1), match='\x08'>
    In [174]: re.match(r"\b","\b")  # no match

In summary, the string literal "\b" stands for the backspace character itself, not for a backslash followed by b, and not for the regex word boundary. That is why re.match(r"\\b","\b") does not match.

The following cases also fail, for a related reason: after the escaped backslash \\, the characters $ and ^ are still regex metacharacters (the end and start anchors) rather than literals, so they would need an escape of their own, for example r"\\\$".

    In [179]: re.match(r"\\$","\$")  # no match
    In [179]: re.match(r"\\^","\^")  # no match

The same reasoning explains these results:

    In [55]: re.match("\n","\n")
    Out[55]: <_sre.SRE_Match object; span=(0, 1), match='\n'>
    In [56]: re.match("\na","\na")
    Out[56]: <_sre.SRE_Match object; span=(0, 2), match='\na'>
    In [59]: re.match("\\\\nabc","\\nabc")
    Out[59]: <_sre.SRE_Match object; span=(0, 5), match='\\nabc'>
    In [60]: re.match(r"\\nabc","\\nabc")
    Out[60]: <_sre.SRE_Match object; span=(0, 5), match='\\nabc'>

r"\\nabc" is equivalent to '\\\\nabc'.

References:
https://www.zhihu.com/question/23374078
http://blog.csdn.net/l347129991/article/details/70257704
http://www.cnblogs.com/jingleguo/archive/2008/06/02/1211820.html
http://www.360doc.com/content/13/0125/13/3046928_262317374.shtml
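As a recap, the cases worked through in this post can be collected into one short script. It is only a sketch: the variable names are made up, and the use of re.escape (the standard-library function that escapes regex metacharacters in a string) is an addition that does not appear in the sessions above.

```python
import re

backslash_w = "\\w"   # the two characters backslash + w
backspace = "\b"      # the single character \x08
dollar = "\\$"        # the two characters backslash + $

# Pattern \w (word character) vs. pattern \\w (literal backslash + w)
print(re.match(r"\w", backslash_w))             # None
print(re.match(r"\\w", backslash_w).group())    # \w

# Pattern \b (word boundary) vs. the backspace character itself
print(re.match(r"\b", backspace))               # None
print(re.match("\b", backspace) is not None)    # True

# $ is still an anchor after \\, so it needs its own escape
print(re.match(r"\\$", dollar))                 # None
print(re.match(r"\\\$", dollar).group())        # \$

# re.escape builds a pattern that matches the string literally
print(re.match(re.escape(dollar), dollar).group())  # \$
```

When the text to be matched literally comes from a variable, letting re.escape build the pattern avoids counting backslashes by hand.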