Python basics: the regular expressions interviewers ask about most, in three minutes


A regular expression is a string that describes a pattern of text. Python ships with a regular expression module that can search for, extract, and replace text that follows such a pattern.
In program development, whenever you want the computer to find specific text inside a large body of text, regular expressions are the tool for the job. Using them takes three steps:
1. Find the pattern in the text.
2. Express that pattern in regular expression notation.
3. Extract the information.

First, the basic symbols of regular expressions

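1. The dot "."

The dot stands for any single character except the newline (\n).
One dot stands for exactly one character.
The dot is simply a placeholder, much like %s in string formatting.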


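2. The asterisk "*"

The asterisk is more interesting: it matches the preceding subexpression zero or more times, with no upper limit.
For example:
xixihahahhahhaha -> xixi.*  # .* stands for the 12 characters after xixi
xixiaaaaaaaaaaaa -> xixi.*  # .* stands for the run of a characters
The dot decides which character is matched and the asterisk decides how many times. Together, ".*" matches any number of characters other than the newline.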
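
The two symbols can be tried out with a short sketch like the one below. The sample strings are the ones used above; the variable names are only illustrative.

```python
import re

# One dot matches exactly one character, so four dots consume "haha".
dot_match = re.findall("xixi....", "xixihaha")
print(dot_match)        # ['xixihaha']

# ".*" matches zero or more characters, here everything after "xixi".
star_match = re.search("xixi.*", "xixihahahhahhaha").group()
print(star_match)       # xixihahahhahhaha

# The dot does not match a newline, so the match stops before "\n".
multiline_match = re.search("xixi.*", "xixiaaa\nbbb").group()
print(multiline_match)  # xixiaaa
```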

3. The question mark "?"

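The question mark matches the preceding subexpression zero times or once.
Note: this must be the ASCII question mark (?), not the full-width Chinese one (?).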
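
As a small illustration (the sample text is made up for this sketch), an optional letter can be expressed with the question mark:

```python
import re

# "u?" means the letter u may appear zero times or once.
# "colouur" has two u's in a row, so it does not match.
spellings = re.findall("colou?r", "color colour colouur")
print(spellings)  # ['color', 'colour']
```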

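4. The backslash "\"

A backslash cannot be used on its own in a regular expression, or indeed anywhere in Python. It has to be combined with another character, either to turn a special symbol into an ordinary one or to give an ordinary character a special meaning. The backslash and the character after it form a single unit, so "\n" should be read as one character, not two.

Common escape sequences:

Escape sequence    Meaning
\n                 Newline
\t                 Tab
\\                 Ordinary backslash
\'                 Single quote (apostrophe)
\"                 Double quote
\d                 Digit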
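
A minimal sketch of escaping, assuming a made-up sample string. The patterns are written as raw strings (the r prefix) so that Python passes the backslash through to the re module unchanged.

```python
import re

text = "price: 3.50 or 3x50"

# An unescaped dot matches any character, so both candidates match.
any_char = re.findall(r"3.50", text)
print(any_char)       # ['3.50', '3x50']

# "\." turns the dot into an ordinary full stop.
literal_dot = re.findall(r"3\.50", text)
print(literal_dot)    # ['3.50']

# "\n" counts as one character, not two.
newline_length = len("\n")
print(newline_length)  # 1
```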

5. The digit "\d"

A regular expression uses "\d" to represent a single digit. Although "\d" is written as a backslash followed by the letter d, it should be treated as one regular expression symbol.
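
Extracting digits might look like the sketch below. The sample sentence is invented for illustration.

```python
import re

text = "Order 66 shipped 3 boxes in 2020"

# Each "\d" matches exactly one digit.
single_digits = re.findall(r"\d", text)
print(single_digits)  # ['6', '6', '3', '2', '0', '2', '0']

# Four "\d" in a row match only a run of four digits.
years = re.findall(r"\d\d\d\d", text)
print(years)          # ['2020']
```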

6. Parentheses "()"

To extract only part of the matched text, wrap that part of the pattern in parentheses.
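
The effect of parentheses can be seen by running the same pattern with and without them. The sample text below is an assumption made for this sketch.

```python
import re

text = "my password is: 12345abc, your password is: 67890xyz"

# Without parentheses, the whole matched text is returned.
whole = re.findall(r"password is: \d\d\d\d\d", text)
print(whole)   # ['password is: 12345', 'password is: 67890']

# With parentheses, only the part inside them is returned.
inside = re.findall(r"password is: (\d\d\d\d\d)", text)
print(inside)  # ['12345', '67890']
```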

Second, using regular expressions in Python

Python already ships with a powerful regular expression module, which makes it easy to extract patterned information from a piece of text.
The module is called re. Import it first, then use it:
import re

1. findall

re.findall(pattern, string, flags=0)
Returns all non-overlapping matches of pattern in string as a list. The string is scanned from left to right and matches are returned in the order found. If the pattern contains one group, the result is a list of the text matched by that group; if it contains more than one group, the result is a list of tuples. Empty matches are included in the result.
Changed in version 3.7: a non-empty match can now start just after a previous empty match.

re.finditer(pattern, string, flags=0)
Returns an iterator that yields a match object for every non-overlapping match of pattern in string. The string is scanned from left to right and matches are returned in the order found. Empty matches are included in the result.
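
The different return shapes can be compared in a short sketch. The sample text is invented, and the patterns use "\w" (a word character) and "+" (one or more), which are standard re syntax but were not introduced above.

```python
import re

text = "alice:30, bob:25"

# No group: a list of the whole matches.
no_group = re.findall(r"\w+:\d+", text)
print(no_group)    # ['alice:30', 'bob:25']

# One group: a list of what the group matched.
one_group = re.findall(r"(\w+):\d+", text)
print(one_group)   # ['alice', 'bob']

# Several groups: a list of tuples.
two_groups = re.findall(r"(\w+):(\d+)", text)
print(two_groups)  # [('alice', '30'), ('bob', '25')]

# finditer yields match objects, which also carry the match position.
positions = [(m.group(1), m.start()) for m in re.finditer(r"(\w+):\d+", text)]
print(positions)   # [('alice', 0), ('bob', 10)]
```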

2. search

re.search(pattern, string, flags=0)
Scans the whole string for the first position where the pattern matches and returns the corresponding match object. If no position matches, it returns None. Note that this is different from finding a zero-length match somewhere in the string.
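
A minimal sketch of the three possible outcomes, using a made-up sample string: a normal match, no match (None), and a zero-length match.

```python
import re

text = "tel: 010-1234, fax: 010-5678"

# search stops at the first match and returns a match object.
match = re.search(r"\d\d\d-\d\d\d\d", text)
first = match.group() if match else None
print(first)          # 010-1234

# No match anywhere in the string: the result is None.
missing = re.search(r"email", text)
print(missing)        # None

# "x*" can match zero characters, so this is a match object whose
# matched text is empty, which is not the same thing as None.
empty = re.search(r"x*", text)
print(empty is not None, repr(empty.group()))  # True ''
```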

3. The difference between ".*" and ".*?"

".*": greedy mode, matches the longest string that satisfies the pattern.
".*?": non-greedy mode, matches the shortest string that satisfies the pattern.
Notes:
*?, +?, ??
The '*', '+', and '?' quantifiers are all greedy; they match as much text as possible. Sometimes this behaviour is not wanted. If the regular expression <.*> is matched against '<a> b <c>', it matches the entire string, not just '<a>'. Adding ? after the quantifier makes it match in non-greedy, or minimal, fashion, so that as few characters as possible are matched. Using the regular expression <.*?> matches only '<a>'.
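
The difference is easy to see on a short tagged string. This sketch uses the sample string '<a> b <c>' from the Python documentation's description of these quantifiers.

```python
import re

text = "<a> b <c>"

# Greedy: ".*" runs to the last ">" it can reach.
greedy = re.findall(r"<.*>", text)
print(greedy)  # ['<a> b <c>']

# Non-greedy: ".*?" stops at the first ">" that completes a match.
lazy = re.findall(r"<.*?>", text)
print(lazy)    # ['<a>', '<c>']
```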

Third, regular expression extraction techniques

1. No need for compile

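About re.compile:
re.compile(pattern, flags=0)
Compiles a regular expression pattern into a regular expression object (a pattern object), which can then be used for matching through its match(), search(), and other methods.

The behaviour of the expression can be changed by passing a flags value. The values are the flag constants of the re module, and several of them can be combined with bitwise OR (the | operator).

The sequence

prog = re.compile(pattern)
result = prog.match(string)

is equivalent to

result = re.match(pattern, string)

If the same regular expression is used many times, calling re.compile() and keeping the resulting pattern object for reuse makes the program more efficient.
Note: patterns compiled by re.compile() and by the module-level functions are cached, so a program that uses only a few regular expressions does not need to worry about compiling them.

When re.compile() is called, it internally calls the _compile() function. When re.findall() is called, the module also calls _compile() first and then calls findall() on the result. In other words, re.findall() already does what re.compile() does, so there is no need to call re.compile() yourself.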
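
The equivalence can be checked with a sketch like the following. The sample text and pattern are assumptions for illustration; re.I and re.S are two of the module's flag constants (ignore case, and let the dot match a newline).

```python
import re

text = "a1 b22 c333"
pattern = r"\d+"

# Module-level call: compilation happens behind the scenes.
direct = re.findall(pattern, text)

# Explicit compilation gives the same result.
compiled = re.compile(pattern)
via_object = compiled.findall(text)

print(direct)      # ['1', '22', '333']
print(via_object)  # ['1', '22', '333']

# Flags are combined with | and work with the module-level call too.
with_flags = re.findall(r"hello.world", "HELLO\nWORLD", re.I | re.S)
print(with_flags)  # ['HELLO\nWORLD']
```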

2. Grab the large first, then the small

When valid and invalid content are mixed together, use the "large first, then small" technique: first match the whole block that contains the useful information, then match the individual items inside that block. This idea of going from large to small runs through the whole process of developing a web crawler.
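
A minimal sketch of the idea, assuming a made-up page in which the valid entries sit between hypothetical <valid> and </valid> markers and the surrounding lines are noise:

```python
import re

# Hypothetical page: only the entries inside <valid>...</valid> are wanted.
page = """
ad: user_id: 000 (sponsored)
<valid>
user_id: 101
user_id: 102
</valid>
ad: user_id: 999 (sponsored)
"""

# Matching directly on the whole page also picks up the noise.
all_ids = re.findall(r"user_id: (\d+)", page)
print(all_ids)   # ['000', '101', '102', '999']

# Step 1 (large): grab the block that holds the valid entries.
# re.S lets the dot match newlines so the block can span lines.
block = re.findall(r"<valid>(.*?)</valid>", page, re.S)[0]

# Step 2 (small): extract the items from inside that block only.
user_ids = re.findall(r"user_id: (\d+)", block)
print(user_ids)  # ['101', '102']
```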

3. Inside and outside the parentheses

If ordinary characters are placed inside the parentheses, those characters appear in the result. As an analogy, "between the left hand and the right hand" usually refers to the torso, while "between the left hand and the right hand, including both hands" refers to the whole person. Putting ordinary characters inside the parentheses means the result should include them.
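
The sketch below moves the same ordinary characters from outside the parentheses to inside them. The sample text is invented for illustration.

```python
import re

text = "name=Tom; name=Ann;"

# "name=" is outside the parentheses, so it is matched but not returned.
outside = re.findall(r"name=(.*?);", text)
print(outside)  # ['Tom', 'Ann']

# "name=" is inside the parentheses, so it is part of the result.
inside = re.findall(r"(name=.*?);", text)
print(inside)   # ['name=Tom', 'name=Ann']
```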


Origin blog.csdn.net/weixin_42767604/article/details/105237853