Regular expressions and the re module

 

 

A. Regular expressions

A regular expression is used to filter specific content out of a string

Relationship between regular expressions and the re module:

  1. Regular expressions are an independent technology and can be used in any language

  2. In Python, regular expressions are used by calling the re module
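
As a minimal sketch of the second point, this is roughly how the re module is called from Python. The sample string and variable names are made up for illustration; re.findall and re.search are standard functions of the module.

```python
import re

# Hypothetical sample string, used only for illustration.
text = "order 42 shipped on day 7"

# re.findall returns every non-overlapping match as a list of strings.
numbers = re.findall(r"\d+", text)
print(numbers)  # ['42', '7']

# re.search returns a match object for the first match (or None if there is none).
first = re.search(r"\d+", text)
print(first.group())  # 42
```

The pattern itself (\d+) is not Python-specific; only the calls around it are.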

Application scenarios for regular expressions:

  1. Web crawlers

  2. Data analysis

Regular expression metacharacters:

Metacharacter   Matched content
.               Any character except a newline
\n              Newline
\w              A letter, digit, or underscore
\s              Any whitespace character
\d              A digit
\W              Anything other than a letter, digit, or underscore
\S              Any non-whitespace character
\D              Any non-digit
\t              Tab
^               Beginning of the string
$               End of the string
\b              Word boundary (the start or end of a word)
a|b             Character a or b
()              Matches the expression inside the parentheses; also denotes a group
[...]           Any one character in the character set
[^...]          Any one character not in the character set
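
A few of these metacharacters can be tried out with re.findall. This is only a sketch: the sample strings are hypothetical and chosen so that each class has something to match.

```python
import re

# Hypothetical sample string containing letters, an underscore, a digit,
# a space, a tab, and a newline.
sample = "ab_1 c\td\n"

word_chars = re.findall(r"\w", sample)   # ['a', 'b', '_', '1', 'c', 'd']
whitespace = re.findall(r"\s", sample)   # [' ', '\t', '\n']
digits = re.findall(r"\d", sample)       # ['1']
any_char = re.findall(r".", sample)      # every character except the final '\n'

# \b is a zero-width word boundary, so only the standalone "cat" words match.
whole_words = re.findall(r"\bcat\b", "cat concat cat")   # ['cat', 'cat']

# Parentheses create groups; with groups, findall returns the group contents.
grouped = re.findall(r"(\d+)-(\d+)", "10-20")            # [('10', '20')]
```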

Quantifier      Explanation
*               Repeat zero or more times
+               Repeat one or more times
?               Repeat zero or one time
{n}             Repeat exactly n times
{n,}            Repeat n or more times
{n,m}           Repeat n to m times
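
To see how the quantifiers differ, the sketch below applies each one to the letter b in the same hypothetical string. Only the quantifier changes between lines.

```python
import re

# Hypothetical sample: "a" followed by zero, one, two, and three "b" characters.
s = "a ab abb abbb"

star = re.findall(r"ab*", s)          # ['a', 'ab', 'abb', 'abbb']
plus = re.findall(r"ab+", s)          # ['ab', 'abb', 'abbb']
optional = re.findall(r"ab?", s)      # ['a', 'ab', 'ab', 'ab']
exactly_2 = re.findall(r"ab{2}", s)   # ['abb', 'abb']
at_least_2 = re.findall(r"ab{2,}", s) # ['abb', 'abbb']
one_to_2 = re.findall(r"ab{1,2}", s)  # ['ab', 'abb', 'abb']
```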

The concept of a character set

The different characters that may appear at the same position form a character set, written with [] in a regular expression. A character set matches exactly one character, and the characters inside [] are in an "or" relationship.

Example:

  • Match the digits 0-9: [0123456789] or [0-9]
  • Match the letters A-Z: [A-Z]
  • Match the letters a-z: [a-z]

  PS: a range inside a character set must run from low to high in ASCII order
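
A short sketch of character sets and ranges, using a made-up string. The last part shows what Python does when a range is written from high to low: the pattern fails to compile.

```python
import re

s = "a1B2c3"  # hypothetical sample string

found_digits = re.findall(r"[0-9]", s)   # ['1', '2', '3']
found_upper = re.findall(r"[A-Z]", s)    # ['B']
found_lower = re.findall(r"[a-z]", s)    # ['a', 'c']

# A reversed range such as [9-0] is rejected when the pattern is compiled.
try:
    re.compile(r"[9-0]")
    reversed_range_ok = True
except re.error:
    reversed_range_ok = False
```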

  • ^: begins with

    •   ^[1-9]: matches if the string begins with one of the digits 1-9
  • $: ends with
    •   [a-z]$: matches if the string ends with one of the letters a-z
  • ^...$: matches the whole target string exactly, with a fixed length
    •   ^waller$: matches exactly waller
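
The anchors can be checked with re.search, which returns a match object or None. This is a sketch with hypothetical input strings; bool() is used only to turn the result into True or False.

```python
import re

starts_with_digit = bool(re.search(r"^[1-9]", "5 apples"))   # True
starts_with_zero = bool(re.search(r"^[1-9]", "05"))          # False, 0 is not in 1-9

ends_lowercase = bool(re.search(r"[a-z]$", "Python"))        # True, ends with "n"
ends_uppercase = bool(re.search(r"[a-z]$", "PYTHON"))        # False

exact = bool(re.search(r"^waller$", "waller"))               # True
too_long = bool(re.search(r"^waller$", "waller!"))           # False
```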

  •  |: or

    •   ab|abc: the alternative before | is tried first; once it matches, the alternative after | is not tried (so write the longer alternative in front, as in abc|ab)
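
The ordering rule for | is easy to confirm with a two-line sketch on the string "abc":

```python
import re

# The left alternative wins as soon as it matches, so "abc" is never tried.
short_first = re.findall(r"ab|abc", "abc")   # ['ab']

# With the longer alternative in front, the full string is matched.
long_first = re.findall(r"abc|ab", "abc")    # ['abc']
```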

      

  •  [^...] : matches anything except the characters inside the brackets
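
A small sketch of a negated character set, using hypothetical strings:

```python
import re

# [^0-9] matches any single character that is not a digit.
non_digits = re.findall(r"[^0-9]", "a1b2")             # ['a', 'b']

# [^\s]+ matches runs of non-whitespace characters, that is, the words.
words = re.findall(r"[^\s]+", "hello  world")          # ['hello', 'world']
```

Note that ^ only means "not" when it is the first character inside []; outside the brackets it is the start-of-string anchor.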

       

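A quantifier must follow the item it repeats (a character, metacharacter, character set, or group), and by default it matches greedily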

 

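  • + : matches one or more times

    •   To match 13555555555: \d matches only a single digit at a time; to match all the digits in one go, use \d+
    • \d+ finds 1 result
  •  * : matches zero or more times
    •   \d* finds 2 results (the run of digits, then an empty match at the end of the string)

  •  {n} : specifies the exact number of repetitions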
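
The result counts for the phone number can be reproduced with re.findall. The {3} line is an extra illustration that is not in the original notes.

```python
import re

phone = "13555555555"

single = re.findall(r"\d", phone)     # 11 results, one per digit
greedy = re.findall(r"\d+", phone)    # ['13555555555'], 1 result
star = re.findall(r"\d*", phone)      # ['13555555555', ''], 2 results
# The second result of \d* is an empty match at the end of the string,
# because * is allowed to match zero digits.

# Extra illustration: {3} takes exactly three digits at a time,
# so the last two digits are left over.
triples = re.findall(r"\d{3}", phone) # ['135', '555', '555']
```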

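 Exercises:

  • Target string: 轨道 通道 地道 魔道 人道
  • Match each trailing 道 one by one
  • Extract each word:
    • [^\s]{2} : skips the spaces and takes two characters at a time
    • .道 : takes any one character followed by 道
    • [^\s]. : [^\s] occupies one position and . the next, so two positions are taken in total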
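
The exercise can be checked as follows. The three word-extraction patterns come from the notes above; the plain pattern 道 for the first task is my own assumption, since the original answer is missing.

```python
import re

text = "轨道 通道 地道 魔道 人道"
expected_words = ["轨道", "通道", "地道", "魔道", "人道"]

# Assumption: the literal character 道 is enough to match each trailing 道.
each_dao = re.findall(r"道", text)            # ['道', '道', '道', '道', '道']

# The three patterns from the notes all yield the same five words.
by_non_space = re.findall(r"[^\s]{2}", text)
by_dot_dao = re.findall(r".道", text)
by_non_space_dot = re.findall(r"[^\s].", text)

print(by_non_space == by_dot_dao == by_non_space_dot == expected_words)  # True
```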
       

 

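  •  Matching names in: 海燕海娇海东

      海.    takes 海 plus any one character; matches 海燕, 海娇, 海东 (every occurrence of "海.")
      ^海.   takes 海 plus one character only at the start; matches 海燕
      海.$   takes 海 plus one character only at the end; matches 海东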
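
The three name patterns can be verified with a short sketch:

```python
import re

names = "海燕海娇海东"

everywhere = re.findall(r"海.", names)    # ['海燕', '海娇', '海东']
at_start = re.findall(r"^海.", names)     # ['海燕']
at_end = re.findall(r"海.$", names)       # ['海东']
```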

 

 

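Regex: 李.?
String to match: 李杰和李莲英和李二棍子
Matches: 李杰, 李莲, 李二
Explanation: ? repeats zero or one time, so at most one arbitrary character is matched after 李

Regex: 李.*
String to match: 李杰和李莲英和李二棍子
Matches: 李杰和李莲英和李二棍子
Explanation: * repeats zero or more times, so zero or more arbitrary characters are matched after 李; because matching is greedy, the single match runs to the end of the string

Regex: 李.+
String to match: 李杰和李莲英和李二棍子
Matches: 李杰和李莲英和李二棍子
Explanation: + repeats one or more times, so one or more arbitrary characters are matched after 李

Regex: 李.{1,2}
String to match: 李杰和李莲英和李二棍子
Matches: 李杰和, 李莲英, 李二棍
Explanation: {1,2} matches one to two arbitrary characters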
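
The four results above can be reproduced with re.findall. This sketch uses the same target string; the variable names are my own.

```python
import re

s = "李杰和李莲英和李二棍子"

zero_or_one = re.findall(r"李.?", s)       # ['李杰', '李莲', '李二']
zero_or_more = re.findall(r"李.*", s)      # ['李杰和李莲英和李二棍子']
one_or_more = re.findall(r"李.+", s)       # ['李杰和李莲英和李二棍子']
one_to_two = re.findall(r"李.{1,2}", s)    # ['李杰和', '李莲英', '李二棍']
```

Greedy matching is why .* and .+ each return a single result: after the first 李, the dot keeps consuming characters, including the later 李, until the string ends.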

Origin www.cnblogs.com/waller/p/11203007.html