A. Regular expressions
Regular expressions are used to filter specific content out of a string.
Relationship between regular expressions and the re module:
1. Regular expressions are an independent technology and can be used in any language
2. In Python, regular expressions are used by calling the re module
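A minimal sketch of calling a pattern through the re module. The sample string and variable names are made up for illustration.

```python
import re

text = "order 66 shipped on day 7"  # hypothetical sample string

# re.findall returns every non-overlapping match as a list of strings
numbers = re.findall(r"\d+", text)
print(numbers)  # ['66', '7']

# re.search returns a match object for the first match, or None if nothing matches
first = re.search(r"\d+", text)
print(first.group())  # 66
```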
Typical applications of regular expressions:
1. Web crawlers
2. Data analysis
Regular expression metacharacters:
Metacharacter | Matched content |
. | Any character except a newline |
\n | A newline |
\w | A letter, digit, or underscore |
\s | Any whitespace character |
\d | A digit |
\W | Anything other than a letter, digit, or underscore |
\S | A non-whitespace character |
\D | A non-digit |
\t | A tab |
^ | The beginning of the string |
$ | The end of the string |
\b | A word boundary (the start or end of a word) |
a|b | Either a or b |
() | The expression inside the parentheses, which also forms a group |
[...] | Any one character in the set |
[^...] | Any one character not in the set |
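A few of the metacharacters above, tried out with Python's re module. This is only a sketch: the sample strings are invented, and in Python 3 the classes \w, \d and \s are Unicode-aware by default, so \w also matches Chinese characters.

```python
import re

# Hypothetical sample: letters, underscore, digit, space, tab, newline
sample = "ab_1 c\tD\n"

word_chars = re.findall(r"\w", sample)   # letters, digits, underscore
digits = re.findall(r"\d", sample)       # digits only
spaces = re.findall(r"\s", sample)       # space, tab, newline
print(word_chars)  # ['a', 'b', '_', '1', 'c', 'D']
print(digits)      # ['1']
print(spaces)      # [' ', '\t', '\n']

# . matches any character except a newline
not_newline = re.findall(r".", "a\nb")
print(not_newline)  # ['a', 'b']

# \b is a word boundary, so only the standalone word is matched
whole_word = re.findall(r"\bcat\b", "cat concat cats")
print(whole_word)  # ['cat']
```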
Quantifier | Explanation |
* | Repeat zero or more times |
+ | Repeat one or more times |
? | Repeat zero or one time |
{n} | Repeat exactly n times |
{n,} | Repeat n or more times |
{n,m} | Repeat n to m times |
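The quantifiers can be compared side by side in Python. The strings below are invented test data, and \b is added only so that each run of letters is treated as a whole word.

```python
import re

text = "a aa aaa aaaa"  # hypothetical sample: runs of 1 to 4 letters

one_or_more = re.findall(r"a+", text)
exactly_two = re.findall(r"\ba{2}\b", text)
three_plus = re.findall(r"\ba{3,}\b", text)
two_to_three = re.findall(r"\ba{2,3}\b", text)
print(one_or_more)   # ['a', 'aa', 'aaa', 'aaaa']
print(exactly_two)   # ['aa']
print(three_plus)    # ['aaa', 'aaaa']
print(two_to_three)  # ['aa', 'aaa']

# ? makes the preceding character optional; * allows it zero or more times
optional_u = re.findall(r"colou?r", "color colour")
zero_or_more = re.findall(r"ab*", "a ab abbb")
print(optional_u)    # ['color', 'colour']
print(zero_or_more)  # ['a', 'ab', 'abbb']
```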
The concept of a character set
The different characters that may appear at the same position form a character set, written with [] in a regular expression. A character set matches exactly one character, and the characters inside [] are in an "or" relationship.
Example:
- Match the digits 0-9: [0123456789] or [0-9]
- Match the letters A-Z: [A-Z]
- Match the letters a-z: [a-z]
PS: a range inside a character set must run from low to high according to the ASCII table
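A short sketch of character sets in Python, including what happens when a range is written from high to low. The sample string is made up.

```python
import re

text = "Room 7B, floor 0"  # hypothetical sample string

digits_range = re.findall(r"[0-9]", text)
digits_listed = re.findall(r"[0123456789]", text)  # same set, written out in full
upper = re.findall(r"[A-Z]", text)
lower = re.findall(r"[a-z]", text)
print(digits_range)   # ['7', '0']
print(digits_listed)  # ['7', '0']
print(upper)          # ['R', 'B']
print(lower)          # ['o', 'o', 'm', 'f', 'l', 'o', 'o', 'r']

# A range written from high to low is rejected when the pattern is compiled
try:
    re.compile(r"[9-0]")
    reversed_range_ok = True
except re.error:
    reversed_range_ok = False
print(reversed_range_ok)  # False
```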
- ^: what the string begins with
- ^[1-9]: matches if the string begins with a digit from 1 to 9
- $: what the string ends with
- [a-z]$: matches if the string ends with a lowercase letter a-z
- ^...$: exact match of a fixed-length target string
- ^waller$: matches exactly waller
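The anchors can be checked with re.search, which returns None when nothing matches. The test strings here are assumptions chosen for illustration.

```python
import re

# ^[1-9]: the string must begin with a digit from 1 to 9
starts_with_digit = bool(re.search(r"^[1-9]", "42 apples"))
starts_with_zero = bool(re.search(r"^[1-9]", "042"))
print(starts_with_digit, starts_with_zero)  # True False

# [a-z]$: the string must end with a lowercase letter
ends_with_letter = bool(re.search(r"[a-z]$", "Hello"))
ends_with_mark = bool(re.search(r"[a-z]$", "Hello!"))
print(ends_with_letter, ends_with_mark)  # True False

# ^waller$: the whole string must be exactly "waller"
exact = bool(re.search(r"^waller$", "waller"))
too_long = bool(re.search(r"^waller$", "waller2"))
print(exact, too_long)  # True False
```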
- |: or
- ab|abc: the alternative before | is tried first, and once it matches, the alternative after | is not tried (so write the longer alternative before the |, as in abc|ab)
- [^...]: matches any character except those inside the brackets
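The ordering rule for | and the behaviour of a negated set can be seen in a small Python sketch. The input strings are invented.

```python
import re

# The left alternative wins as soon as it matches, so "abc" is never tried
short_first = re.findall(r"ab|abc", "abc")
# Putting the longer alternative first lets it match in full
long_first = re.findall(r"abc|ab", "abc")
print(short_first)  # ['ab']
print(long_first)   # ['abc']

# [^0-9] matches every character that is not a digit
non_digits = re.findall(r"[^0-9]", "a1b2")
print(non_digits)  # ['a', 'b']
```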
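- Quantifiers must follow the element they repeat (here, a metacharacter) and match greedily by default
- +: matches one or more times
- To match 13555555555: \d matches only one digit at a time; to match all the digits in one go, use \d+
- \d+ yields 1 result
- *: matches zero or more times
- \d* yields 2 results (the whole number, then an empty match at the end of the string)
- {n}: specifies the exact number of repetitions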
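The result counts for the phone number example can be reproduced with re.findall. This sketch assumes Python 3.7 or later; {11} is used only because the sample number has 11 digits.

```python
import re

phone = "13555555555"

single = re.findall(r"\d", phone)     # one match per digit
plus = re.findall(r"\d+", phone)      # greedy: all digits in one match
star = re.findall(r"\d*", phone)      # all digits, then an empty match at the end
exact = re.findall(r"\d{11}", phone)  # exactly 11 digits

print(len(single))  # 11
print(plus)         # ['13555555555']
print(star)         # ['13555555555', '']
print(exact)        # ['13555555555']
```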
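Exercises:
- Target string: 轨道 通道 地道 魔道 人道
- Match each trailing 道 one by one:
- Extract the words:
- [^\s]{2}: exclude whitespace and take two characters at a time
- .道: take each word made of any one character followed by 道
- [^\s].: [^\s] occupies one position and . another, so two positions are taken in total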
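The exercise can be run in Python as a check. The three word patterns come from the notes; the plain pattern 道 for the first task is only one possible answer and is an assumption, since the notes do not give their own.

```python
import re

text = "轨道 通道 地道 魔道 人道"

# Assumed answer for the first task: the literal character matches once per word
each_dao = re.findall(r"道", text)
print(len(each_dao))  # 5

# The three patterns from the notes all extract the same five words
by_non_space_pair = re.findall(r"[^\s]{2}", text)
by_any_then_dao = re.findall(r".道", text)
by_non_space_then_any = re.findall(r"[^\s].", text)
print(by_non_space_pair)  # ['轨道', '通道', '地道', '魔道', '人道']
print(by_any_then_dao == by_non_space_pair)        # True
print(by_non_space_then_any == by_non_space_pair)  # True
```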
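- Match the names in: 海燕海娇海东
Pattern | Meaning | Match result | Explanation |
海. | Take each word made of 海 plus any one character | 海燕, 海娇, 海东 | Matches every occurrence of "海." |
^海. | Take only 海 plus one character at the beginning | 海燕 | Matches "海." only at the start of the string |
海.$ | Take only 海 plus one character at the end | 海东 | Matches "海." only at the end of the string |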
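The three name patterns can be verified with re.findall. This is a small sketch and the variable names are arbitrary.

```python
import re

names = "海燕海娇海东"

everywhere = re.findall(r"海.", names)  # every 海 followed by one character
at_start = re.findall(r"^海.", names)   # only at the beginning of the string
at_end = re.findall(r"海.$", names)     # only at the end of the string

print(everywhere)  # ['海燕', '海娇', '海东']
print(at_start)    # ['海燕']
print(at_end)      # ['海东']
```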
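Pattern | String to match | Match result | Explanation |
李.? | 李杰和李莲英和李二棍子 | 李杰, 李莲, 李二 | ? repeats zero or one time, so at most one arbitrary character after 李 is matched |
李.* | 李杰和李莲英和李二棍子 | 李杰和李莲英和李二棍子 | * repeats zero or more times, so zero or more arbitrary characters after 李 are matched |
李.+ | 李杰和李莲英和李二棍子 | 李杰和李莲英和李二棍子 | + repeats one or more times, so one or more arbitrary characters after 李 are matched |
李.{1,2} | 李杰和李莲英和李二棍子 | 李杰和, 李莲英, 李二棍 | {1,2} matches one to two arbitrary characters after 李 |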
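The greedy behaviour of each quantifier on this string can be checked with re.findall, which lists every non-overlapping match. This is a sketch and the variable names are arbitrary.

```python
import re

text = "李杰和李莲英和李二棍子"

optional = re.findall(r"李.?", text)        # at most one character after each 李
zero_or_more = re.findall(r"李.*", text)    # greedy: runs to the end of the string
one_or_more = re.findall(r"李.+", text)     # greedy: runs to the end of the string
one_to_two = re.findall(r"李.{1,2}", text)  # greedy: takes two characters when it can

print(optional)      # ['李杰', '李莲', '李二']
print(zero_or_more)  # ['李杰和李莲英和李二棍子']
print(one_or_more)   # ['李杰和李莲英和李二棍子']
print(one_to_two)    # ['李杰和', '李莲英', '李二棍']
```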