[python road 38] Python regular expression matching backslash "\"

1. Introduction

 

After learning about Python's special characters and raw strings, I assumed that a pattern matching a backslash should be written like this:

1) Normal string: '\\'
2) Raw string: r'\'
But when I actually tried to extract the number before the backslash in a string such as "3\8", I hit a wall many times and never got a result. In the end I found that I had misunderstood: raw strings have nothing to do with "regular escape". I will go through it in detail below.
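
To see why both guesses fail, here is a minimal sketch. It uses Python 3 syntax, which is an assumption, since the snippets in this article were written for Python 2. After string escaping, '\\' reaches the re module as a single backslash, which is an unfinished regular expression escape. The literal r'\' cannot even be written, because a raw string may not end in an odd number of backslashes, so the sketch compiles it from source text instead.

```python
import re

text = '3\\8'   # the three characters 3, backslash, 8

# Guess 1: after string escaping, '\\' is ONE backslash, which the regex
# engine rejects as an incomplete escape.
try:
    re.search('\\', text)
    guess_one_failed = False
except re.error as exc:
    guess_one_failed = True
    print('re.error:', exc)

# Guess 2: the source text r'\' is not a valid literal at all.
try:
    compile("r'\\'", '<guess 2>', 'eval')
    guess_two_failed = False
except SyntaxError:
    guess_two_failed = True

print(guess_one_failed, guess_two_failed)   # True True
```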

 

 

2. String escaping

 

The backslash is special in Python string literals: it is used to form special characters, such as "\n" for a newline and "\t" for a tab. Here is a line of code that uses "\n":

    print('Hello\World\nPython')

The result is:
"Hello\World
Python"

It can be seen that "\n" has been escaped to a newline character, while "\W" has not been escaped. The reason is that "\W" does not correspond to any special character in "string escape", so the backslash is kept as it is.
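
One way to check what "string escape" actually did is to look at the length and content of the resulting string. This is a small sketch in Python 3 syntax. It writes the backslash before "World" as "\\" so that no unrecognized escape is involved, and the variable name is only illustrative.

```python
# Same value as the string printed above: one backslash, one newline.
text = 'Hello\\World\nPython'

print(len('\n'))           # 1: "\n" became a single newline character
print(len('\\'))           # 1: "\\" became a single backslash
print(text.count('\\'))    # 1: only the backslash before "World" remains
print(text.splitlines())   # ['Hello\\World', 'Python']
```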

 

Now suppose the requirement changes: instead of escaping "\n" to a newline, the string must be output exactly as "Hello\World\nPython". How can that be done?

 

1) Write "Hello\World\\nPython", so that "string escape" turns "\\" into "\" and the "n" after it stays an ordinary letter;

2) Use a raw string (i.e. r'...'): every character in a raw string is taken literally, and no escape sequence is interpreted.

Here is the code using raw strings:

    print(r'Hello\World\nPython')

The result is:
"Hello\World\nPython"
It can be clearly seen that with a raw string, "\n" is not escaped to a newline but is output as it is.
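
That the two methods produce the same string can be checked directly. This is a small sketch in Python 3 syntax. Unlike method 1 above, it doubles every backslash, including the one before "World", and the variable names are only illustrative.

```python
escaped = 'Hello\\World\\nPython'   # method 1: double the backslashes
raw = r'Hello\World\nPython'        # method 2: raw string

print(escaped == raw)   # True
print(raw)              # Hello\World\nPython
print(len(r'\n'))       # 2: a backslash followed by the letter n
```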

 

 

3. Regular escaping

 

Everything above is just "string escape". Regular expressions have escapes of their own. Let's call this "regular escape"; it is completely separate from "string escape". For example, "\d" stands for a digit and "\s" for a whitespace character. Let's write a first example and then analyze it.

Extract the number before the backslash in "3\8":

    #!/usr/bin/env python
    # coding=utf-8

    import re

    string = '3\8'  # "\8" is not a string escape, so this is 3, backslash, 8

    # Normal string: '\\\\' becomes two backslashes after "string escape"
    m = re.search('(\d+)\\\\', string)

    if m is not None:
        print(m.group(1))   # The result is: 3

    # Raw string: the pattern reaches re exactly as written
    n = re.search(r'(\d+)\\', string)

    if n is not None:
        print(n.group(1))   # The result is: 3


The backslashes in a regular expression string are escaped twice: once by "string escape" and once by "regular escape". "String escape" always comes first, because Python processes the string literal before the re module ever sees the pattern.

1) The process of '\\\\':
First "string escape" is performed: the first two backslashes and the last two backslashes are each escaped to a single backslash, that is, "\\|\\" becomes "\|\" (the "|" is added only for clarity and is not part of the string). Then "regular escape" is performed on the result: "\\" is escaped to "\", meaning that the regular expression matches one backslash.

2) The process of r'\\':
Since every character in a raw string is taken literally, no "string escape" is performed and the pattern goes straight to the second step, "regular escape". There "\\" is escaped to "\", meaning that the regular expression matches one backslash.
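
The two stages can be made visible by inspecting the pattern before it is handed to re. This is a minimal sketch in Python 3 syntax. The names normal_pattern and raw_pattern are only illustrative, and the normal string doubles the backslash in front of "d" as well, so that no unrecognized string escape is involved.

```python
import re

normal_pattern = '(\\d+)\\\\'   # every backslash doubled for "string escape"
raw_pattern = r'(\d+)\\'        # raw string: no "string escape" at all

# Stage 1 ("string escape") is over: both literals hold the same 7 characters.
print(normal_pattern)                  # (\d+)\\
print(normal_pattern == raw_pattern)   # True
print(len(raw_pattern))                # 7

# Stage 2 ("regular escape") happens inside re:
# "\d" means a digit and "\\" means one literal backslash.
match = re.search(raw_pattern, '3\\8')
print(match.group(1))                  # 3
print(match.group(0))                  # 3\
```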

 

 

4. Conclusion

 

In other words, a raw string (i.e. r'...') has nothing to do with "regular escape". A raw string only affects "string escape": it spares the string that one round of escaping.


Some readers may ask why the "\d+" in '(\d+)\\\\' causes no problem even without a raw string. That is because "\d" does not correspond to any special character in "string escape", so it is passed on unchanged to "regular escape", where it stands for a digit. Recent Python 3 versions emit a warning for such unrecognized escape sequences, so writing the pattern as a raw string is the safer habit.
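
The same reasoning explains a pitfall in the opposite direction: when a sequence is special in both layers, "string escape" wins unless a raw string is used. The sketch below (Python 3 syntax) uses "\b", which does not appear in the example above and is chosen here only for illustration. "String escape" turns it into a backspace character, while "regular escape" would read it as a word boundary.

```python
import re

print(len('\b'))     # 1: "string escape" produced a backspace character
print(len(r'\b'))    # 2: backslash + "b" reach the regex engine intact

text = 'match the word cat here'
print(re.search('\bcat\b', text))             # None: searches for backspace characters
print(re.search(r'\bcat\b', text).group(0))   # cat
```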

 

 

The reference is the second edition of "Core Python Programming". If anything here is wrong, please bear with me and point it out. Thank you.
