Common Python regular expression patterns and how to combine them

Regular expression syntax: metacharacters are arranged and combined to match strings. To try an expression, you can use the online regex tester at https://tool.oschina.net/regex

1. Importing the re module

No installation is needed: re is part of Python's standard library, so pip install re does not apply. Just import it in your script or interpreter:

import re

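2. Regular expression objects

        2.1 re.Pattern (called RegexObject in the Python 2 documentation)

                re.compile() returns a compiled pattern object.

        2.2 re.Match (called MatchObject in the Python 2 documentation)

                A successful search() or match() returns a match object (they return None when there is no match). A match object provides:

  • group() returns the string matched by the RE

  • start() returns the position where the match starts

  • end() returns the position where the match ends

  • span() returns a tuple containing the (start, end) positions of the match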
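A minimal sketch of the pattern and match objects described above. The sample text is made up for illustration; because search() returns None when nothing matches, the code checks for that before calling group().

```python
import re

# re.compile() returns a compiled pattern object (re.Pattern)
pattern = re.compile(r"\d+")

# search() scans for the first match and returns a match object (re.Match), or None
m = pattern.search("room 101, floor 3")

if m is not None:
    print(m.group())  # → 101     (the matched text)
    print(m.start())  # → 5       (index where the match starts)
    print(m.end())    # → 8       (index just past the last matched character)
    print(m.span())   # → (5, 8)  (start and end as a tuple)
```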

3. Regular expression modifiers - optional flags

A regular expression can take optional flags that modify how the pattern is matched. Multiple flags are combined with bitwise OR (|); for example, re.I | re.M sets both the I and M flags.

Modifier Description
re.I Make matching case-insensitive
re.L Do locale-aware matching (in Python 3, only allowed with bytes patterns)
re.M Multiline matching: ^ and $ also match at the start and end of each line
re.S Make . match any character, including newlines
re.U Unicode matching; affects \w, \W, \b, \B, \d, \D, \s and \S (the default for str patterns in Python 3, so the flag is redundant there)
re.X Verbose mode: whitespace and # comments inside the pattern are ignored, so complex patterns can be laid out more readably
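A short sketch of each flag in action. The sample text and the "area code" pattern are made up for illustration:

```python
import re

text = "Hello World\nhello python"

# re.I: case-insensitive, so both "Hello" and "hello" match
print(re.findall(r"hello", text, re.I))          # → ['Hello', 'hello']

# re.M: ^ matches at the start of every line, not only at the start of the string
print(re.findall(r"^\w+", text, re.M))           # → ['Hello', 'hello']

# Flags are combined with bitwise OR
print(re.findall(r"^hello", text, re.I | re.M))  # → ['Hello', 'hello']

# Without re.S, . does not match the newline; with re.S it does
print(re.search(r"World.hello", text))           # → None
print(repr(re.search(r"World.hello", text, re.S).group()))  # → 'World\nhello'

# re.X: whitespace and # comments inside the pattern are ignored
phone = re.compile(r"""
    ^\d{3}   # area code
    -\d{4}$  # local number
""", re.X)
print(bool(phone.match("010-1234")))             # → True
```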

4. Regular Expression Metacharacters

Pattern Description
^ Matches the start of the string (what it begins with)
$ Matches the end of the string (what it ends with)
. Matches any character except a newline
[...] A set of characters, listed individually: [amk] matches 'a', 'm' or 'k'
[^...] Any character not in the set: [^abc] matches any character other than a, b, c
* Matches 0 or more repetitions of the preceding expression
+ Matches 1 or more repetitions of the preceding expression
? Matches 0 or 1 repetition of the preceding expression (greedy); placed after another quantifier, as in *? or +?, it makes that quantifier non-greedy
{n} Matches exactly n occurrences of the preceding expression. For example, "o{2}" does not match the "o" in "Bob", but matches both o's in "food".
{n,} Matches at least n occurrences of the preceding expression. For example, "o{2,}" does not match the "o" in "Bob", but matches all the o's in "foooood". "o{1,}" is equivalent to "o+", and "o{0,}" is equivalent to "o*".
{n,m} Matches n to m occurrences of the preceding expression, greedily
a|b Matches a or b
(...) Matches the expression inside the parentheses and also captures it as a group
(?>...) Atomic group: matches independently and never backtracks into the group (supported by Python's re from 3.11)
\w Matches a letter, digit or underscore
\W Matches any character that is not a letter, digit or underscore
\s Matches any whitespace character, equivalent to [ \t\n\r\f\v] for ASCII text
\S Matches any non-whitespace character
\d Matches any digit, equivalent to [0-9] for ASCII text
\D Matches any non-digit
\A Matches only at the start of the string
\Z Matches only at the end of the string (in Perl, \Z may also match before a final newline; in Python it does not)
\z In Perl/PCRE, matches only at the very end of the string; Python's re accepts it only from 3.14, as a synonym for \Z
\G Matches where the previous match ended (Perl/PCRE; not supported by Python's re)
\b Matches a word boundary, i.e. the position between a word character and a non-word character (or the edge of the string). For example, 'er\b' matches the 'er' in "never" but not the 'er' in "verb".
\B Matches a non-word boundary. 'er\B' matches the 'er' in "verb" but not the 'er' in "never".
\n, \t, etc. Match a newline, a tab, etc.
\1...\9 Match the content captured by the nth group.
\10 Matches the content of group 10. Python accepts back-references up to \99; an escape that starts with 0, or consists of three octal digits, is read as an octal character code instead.
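Several rows of the table can be checked directly. This sketch reuses the table's own example strings, except the doubled-word sentence, which is made up:

```python
import re

# \b word boundary vs \B non-word-boundary
print(re.findall(r"er\b", "never verb"))         # → ['er']  (the one in "never")
print(re.search(r"er\B", "never verb").start())  # → 7       (the "er" inside "verb")

# {n} and {n,}
print(re.search(r"o{2}", "Bob"))                 # → None
print(re.search(r"o{2}", "food").group())        # → oo
print(re.search(r"o{2,}", "foooood").group())    # → ooooo

# \1 matches the same text that group 1 captured (here: doubled words)
print(re.findall(r"\b(\w+) \1\b", "it is is a test test"))  # → ['is', 'test']

# In Python, \Z matches only at the very end; $ also matches before a final newline
print(bool(re.search(r"abc\Z", "abc\n")))        # → False
print(bool(re.search(r"abc$", "abc\n")))         # → True
```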

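Regular expressions: common metacharacters

.    //matches any character except a newline
\w   //matches a letter, digit or underscore
\s   //matches any whitespace character
\d   //matches a digit
\n   //matches a newline
\t   //matches a tab

//used for validation
^    //matches the start of the string
$    //matches the end of the string

\W   //matches any character that is not a letter, digit or underscore
\D   //matches a non-digit
\S   //matches a non-whitespace character
a|b  //matches character a or character b
()   //matches the expression inside the parentheses and also forms a group
[...]//matches any character in the character set
[^...]//matches any character not in the character set
[a-zA-Z0-9] //matches any digit or letter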
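A quick check of the classes listed above. The test string is an assumption, chosen so that each class has something to match:

```python
import re

s = "ab_1 2\tC!"

print(re.findall(r"\w", s))   # → ['a', 'b', '_', '1', '2', 'C']
print(re.findall(r"\W", s))   # → [' ', '\t', '!']
print(re.findall(r"\d", s))   # → ['1', '2']
print(re.findall(r"\s", s))   # → [' ', '\t']
print(re.findall(r"\S", s))   # → ['a', 'b', '_', '1', '2', 'C', '!']
print(re.findall(r"a|C", s))  # → ['a', 'C']
```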

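Quantifiers: control how many times the preceding element may occur

*    //repeat 0 or more times
+    //repeat 1 or more times
?    //repeat 0 or 1 time
{n}  //repeat exactly n times
{n,} //repeat n or more times
{n,m}//repeat n to m times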
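Each quantifier applied to a made-up sample string, as a minimal sketch:

```python
import re

print(re.findall(r"colou?r", "color colour"))  # ?     : 0 or 1 'u'   → ['color', 'colour']
print(re.findall(r"ab*", "a ab abbb"))         # *     : 0 or more 'b' → ['a', 'ab', 'abbb']
print(re.findall(r"ab+", "a ab abbb"))         # +     : 1 or more 'b' → ['ab', 'abbb']
print(re.findall(r"\d{3}", "12 345 6789"))     # {3}   : exactly 3     → ['345', '678']
print(re.findall(r"\d{2,}", "1 22 333"))       # {2,}  : 2 or more     → ['22', '333']
print(re.findall(r"\d{2,3}", "1 22 4444"))     # {2,3} : 2 to 3        → ['22', '444']
```

Note that "6789" yields only '678' for \d{3}: the leftover '9' is too short for another match.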

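 *Greedy matching and lazy matching

.*    //greedy match (by default .* takes as many characters as it can)
.*?   //lazy match (the ? makes * match as few characters as possible)

[Understanding greedy and lazy matching]

Lazy matching matches as few characters as possible.

Greedy matching matches as many characters as possible.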
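A minimal illustration using a made-up HTML-like string: the greedy .* runs on to the last </b> it can reach, while the lazy .*? stops at the first one.

```python
import re

html = "<b>one</b><b>two</b>"

print(re.findall(r"<b>.*</b>", html))   # greedy → ['<b>one</b><b>two</b>']
print(re.findall(r"<b>.*?</b>", html))  # lazy   → ['<b>one</b>', '<b>two</b>']
```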


 

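Simple case 1: using '.'

Each dot matches exactly one character, so the number of dots equals the number of characters matched.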
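The original screenshot for this case is not available; here is a minimal stand-in with made-up strings:

```python
import re

print(re.findall(r"..", "abcde"))         # two dots, two characters   → ['ab', 'cd']
print(re.findall(r"...", "abcde"))        # three dots, three characters → ['abc']
print(re.findall(r"c.t", "ct cat c\nt"))  # . needs exactly one non-newline character → ['cat']
```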

 


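Simple case 2: extracting all the numbers

If you use \w, the output contains letters as well as digits.

If you use \d, the output is 10 single digits, which is not the result we want.

So use \d+ instead: it matches one or more consecutive digits and returns each whole number.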
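The original sample text was in a screenshot, so the string below is a hypothetical stand-in containing two numbers (10 digits in total):

```python
import re

s = "I called 10010 and 10086"  # hypothetical sample text

print(re.findall(r"\w+", s))  # letters too   → ['I', 'called', '10010', 'and', '10086']
print(re.findall(r"\d", s))   # single digits → ['1', '0', '0', '1', '0', '1', '0', '0', '8', '6']
print(re.findall(r"\d+", s))  # whole numbers → ['10010', '10086']
```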

  

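Simple case 3: validation when the input must be an 11-digit phone number

If you check for an 11-digit phone number with eleven \d's, the check also passes when there are letters before or after the digits.

So you need the ^ metacharacter to anchor the start and, to reject trailing letters as well, the $ metacharacter to anchor the end.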
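A sketch of the unanchored and anchored checks side by side. The numbers are made up, and the rule only checks the digit count, not whether a number is a real phone number:

```python
import re

loose = re.compile(r"\d{11}")     # 11 digits anywhere in the string
strict = re.compile(r"^\d{11}$")  # the whole string must be the 11 digits

for s in ["13812345678", "abc13812345678", "13812345678xyz"]:
    print(s, bool(loose.search(s)), bool(strict.search(s)))
# → 13812345678 True True
# → abc13812345678 True False
# → 13812345678xyz True False

# $ also matches just before a trailing newline; re.fullmatch() does not allow it
print(bool(strict.search("13812345678\n")))            # → True
print(bool(re.fullmatch(r"\d{11}", "13812345678\n")))  # → False
```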

  

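Simple case 4: getting familiar with [...]

As the output shows, only the characters listed inside the brackets are matched.

To match every digit and letter in a string, use [a-zA-Z0-9]. Inside the brackets, '-' no longer means minus; it denotes a range: a to z, A to Z, and 0 to 9. [a-zA-Z0-9_] is equivalent to \w for ASCII text (in Python 3, \w on str patterns also matches non-ASCII letters unless re.ASCII is set).

All the digits and letters were matched successfully.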
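A minimal stand-in for the missing screenshot, using a made-up string that mixes letters, digits, '-', '_' and '!':

```python
import re

s = "a1-B2_c3!"

print(re.findall(r"[abc]", s))        # only the listed characters  → ['a', 'c']
print(re.findall(r"[a-zA-Z0-9]", s))  # ranges a-z, A-Z, 0-9        → ['a', '1', 'B', '2', 'c', '3']

# With re.ASCII, \w is exactly [a-zA-Z0-9_]
print(re.findall(r"[a-zA-Z0-9_]", s) == re.findall(r"\w", s, re.ASCII))  # → True
```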

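5. Combined patterns

.*? Matches as few characters as possible while still satisfying the pattern (lazy)
.* Any character, 0 or more times, matching as many as possible (greedy)
[^0-9] Negated set: matches any character except a digit
[0-9] Matches any digit; same as [0123456789]
[a-z] Matches any lowercase letter
[A-Z] Matches any uppercase letter
[ab]cde Matches acde or bcde
abc[de] Matches abcd or abce
[abcdef] Matches any one character inside the brackets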
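A few of these combinations applied to made-up strings:

```python
import re

print(re.findall(r"[^0-9]+", "ab12cd3"))              # runs of non-digits → ['ab', 'cd']
print(re.findall(r"[ab]cde", "acde bcde ccde"))       # → ['acde', 'bcde']
print(re.findall(r"abc[de]", "abcd abce abcf"))       # → ['abcd', 'abce']
print(re.findall(r"[A-Z][a-z]+", "Hello World foo"))  # capitalized words → ['Hello', 'World']
```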

 

Combination case 1: (\d*): * means repeat 0 or more times, so at a character that is not a digit (such as the "I" in the sample text) the pattern matches an empty string and moves on; once it reaches digits, it returns the whole run of digits.

(\d+): + means repeat one or more times, so \d must match at least once.

In the example, "I" is not a digit, so it is skipped and matching jumps to 10010. Because + repeats one or more times, after matching the first 1 the engine keeps checking whether the next character is a digit; it continues while it is, and when it is not, it returns the digit string matched so far.
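A minimal sketch of the difference, using a hypothetical short sample that starts with "I" and contains 10010 (the original screenshot is not available):

```python
import re

s = "I 10010"  # hypothetical sample text

# \d* can match zero digits, so at "I" it succeeds with an empty match
print(repr(re.search(r"\d*", s).group()))  # → ''
# \d+ needs at least one digit, so it skips ahead to 10010
print(re.search(r"\d+", s).group())        # → 10010

# findall lists every match, including the empty ones \d* produces (Python 3.7+)
print(re.findall(r"\d*", s))  # → ['', '', '10010', '']
print(re.findall(r"\d+", s))  # → ['10010']
```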

 


Source: blog.csdn.net/m0_48936146/article/details/124451777