Python web scraping: regular expressions and the re module

What is a regular expression:

In plain terms: you define a rule and use it to pick out the data you want from a string. That rule is a regular expression.
Formal definition (Baidu Baike): https://baike.baidu.com/item/正则表达式/1700215?fr=aladdin

A joke:

The world is divided into two kinds of people: those who understand regular expressions and those who don't.

Common matching rules (every example below assumes import re has already been run):

Matching a literal string:

text = 'hello'
ret = re.match('he',text)
print(ret.group())
>> he

re.match matches "he" at the start of "hello".

Dot (.): matches any single character:

text = "ab"
ret = re.match('.',text)
print(ret.group())
>> a

However, the dot (.) does not match a newline character. Sample code:

text = "\n"
ret = re.match('.',text)
print(ret.group())
>> AttributeError: 'NoneType' object has no attribute 'group'
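The AttributeError above does not come from re.match itself: a failed match simply returns None, and the error appears only when .group() is called on it. A minimal sketch of checking the result first; the helper name first_match is hypothetical and not part of the re module:

```python
import re

def first_match(pattern, text):
    """Return the text matched at the start of `text`, or None if nothing matches."""
    m = re.match(pattern, text)
    # re.match returns None on failure, so test it before calling .group()
    return m.group() if m else None

print(first_match(r'.', "ab"))   # a
print(first_match(r'.', "\n"))   # None: '.' does not match a newline
```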

\d: matches any digit:

text = "123"
ret = re.match(r'\d',text)
print(ret.group())
>> 1

\D: matches any non-digit character:

text = "a"
ret = re.match(r'\D',text)
print(ret.group())
>> a

If the text is a digit, the match fails. Sample code:

text = "1"
ret = re.match(r'\D',text)
print(ret.group())
>> AttributeError: 'NoneType' object has no attribute 'group'

\s: matches a whitespace character (including \n, \t, \r and the space):

text = "\t"
ret = re.match(r'\s',text)
print(ret.group())
>> (a tab character, which prints as blank space)
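To see several of the characters that \s accepts at once, re.findall (described later in this article) can collect every match from a sample string; the sample text here is made up for illustration:

```python
import re

text = "a b\tc\nd\re"
# findall returns every non-overlapping match as a list
spaces = re.findall(r'\s', text)
print(spaces)  # [' ', '\t', '\n', '\r']
```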

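\w: matches a-z, A-Z, digits and the underscore (for str patterns in Python 3 it also matches other Unicode word characters, such as Chinese characters, unless the re.ASCII flag is used):

text = "_"
ret = re.match(r'\w',text)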
print(ret.group())
>> _

Any other character does not match. Sample code:

text = "+"
ret = re.match(r'\w',text)
print(ret.group())
>> AttributeError: 'NoneType' object has no attribute 'group'

\W: the opposite of \w:

text = "+"
ret = re.match(r'\W',text)
print(ret.group())
>> +

If the text is a letter, a digit or an underscore, the match fails. Sample code:

text = "_"
ret = re.match(r'\W',text)
print(ret.group())
>> AttributeError: 'NoneType' object has no attribute 'group'

[] (character set): the match succeeds as long as the character matches any one item inside the square brackets:

text = "0731-88888888"
ret = re.match(r'[\d\-]+',text)
print(ret.group())
>> 0731-88888888

Several of the rules above can in fact be rewritten with square brackets (exactly so for ASCII text):

  • \d: [0-9]
  • \D: [^0-9]
  • \w: [0-9a-zA-Z_]
  • \W: [^0-9a-zA-Z_]
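The bracket equivalences can be checked mechanically. A sketch, assuming the comparison is done over Python's string.printable characters with the re.ASCII flag (without that flag, \d and \w also accept non-ASCII Unicode digits and letters, as the last two lines show):

```python
import re
import string

# (shorthand, bracket form) pairs from the list above
pairs = [(r'\d', r'[0-9]'), (r'\D', r'[^0-9]'),
         (r'\w', r'[0-9a-zA-Z_]'), (r'\W', r'[^0-9a-zA-Z_]')]

results = {}
for short, bracket in pairs:
    # re.ASCII limits \d and \w to ASCII, which makes the equivalence exact
    results[short] = all(
        bool(re.fullmatch(short, c, re.ASCII)) == bool(re.fullmatch(bracket, c))
        for c in string.printable
    )
print(results)  # every value is True

# Without re.ASCII, \d also accepts other Unicode digits (ARABIC-INDIC DIGIT THREE)
print(bool(re.fullmatch(r'\d', '\u0663')))     # True
print(bool(re.fullmatch(r'[0-9]', '\u0663')))  # False
```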

Matching multiple characters:

  1. *: matches the preceding character zero or more times. Sample code:

     text = "0731"
     ret = re.match(r'\d*',text)
     print(ret.group())
     >> 0731
    

    The pattern is \d followed by an asterisk, so it matches any number of digits, and all four characters of 0731 are matched.

  2. +: matches the preceding character one or more times (at least once). Sample code:

     text = "abc"
     ret = re.match(r'\w+',text)
     print(ret.group())
     >> abc
    

    The pattern is \w followed by a plus sign, so there must be at least one character satisfying \w. If the text is empty or its first character does not satisfy \w, there is no match and calling group() raises an error. Sample code:

     text = ""
     ret = re.match(r'\w+',text)
     print(ret.group())
     >> AttributeError: 'NoneType' object has no attribute 'group'
    
  3. ?: the preceding character appears once or not at all (0 or 1 times). Sample code:

     text = "123"
     ret = re.match(r'\d?',text)
     print(ret.group())
     >> 1
    
  4. {m}: matches exactly m characters. Sample code:

     text = "123"
     ret = re.match(r'\d{2}',text)
     print(ret.group())
     >> 12
    
  5. {m,n}: matches between m and n characters (inclusive), taking as many as possible. Sample code:

     text = "123"
     ret = re.match(r'\d{1,2}',text)
     print(ret.group())
     >> 12
    

    If the text has only one character, it still matches. Sample code:

     text = "1"
     ret = re.match(r'\d{1,2}',text)
     print(ret.group())
     >> 1
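The five quantifiers can be compared side by side on the same inputs. A small sketch; note the difference between None (no match) and '' (a successful match of zero characters, which * and ? allow):

```python
import re

patterns = [r'\d*', r'\d+', r'\d?', r'\d{2}', r'\d{1,2}']
results = {}
for text in ["123", "abc"]:
    for p in patterns:
        m = re.match(p, text)
        # None means "no match"; '' means a successful match of zero characters
        results[(text, p)] = m.group() if m else None

for (text, p), value in results.items():
    print(text, p, repr(value))
```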
    

Small examples:

  1. Validate a mobile phone number: the number starts with 1, the second digit is one of 3, 4, 5, 8 or 7, and the remaining 9 digits can be any digits. Sample code:

     text = "18570631587"
     ret = re.match(r'1[34587]\d{9}',text)
     print(ret.group())
     >> 18570631587
    

    If the text does not satisfy these conditions (here it has only 10 digits), it does not match. Sample code:

     text = "1857063158"
     ret = re.match(r'1[34587]\d{9}',text)
     print(ret.group())
     >> AttributeError: 'NoneType' object has no attribute 'group'
    
  2. Validate an e-mail address: the mailbox name consists of letters, digits and underscores, followed by the @ sign and then the domain name. Sample code:

     text = "xxx@163.com"
     ret = re.match(r'\w+@\w+\.[a-zA-Z\.]+',text)
     print(ret.group())
     >> xxx@163.com
    
  3. Validate a URL: a URL starts with http, https or ftp, followed by a colon and two slashes, and then any number of non-whitespace characters. Sample code:

     text = "http://www.baidu.com/"
     ret = re.match(r'(http|https|ftp)://[^\s]+',text)
     print(ret.group())
     >> http://www.baidu.com/
    
  4. Validate an ID card number: it has 18 characters in total; the first 17 are digits, and the last one can be a digit, a lowercase x or an uppercase X. Sample code:

     text = "31131118908123231X"
     ret = re.match(r'\d{17}[\dxX]',text)
     print(ret.group())
     >> 31131118908123231X
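re.match only anchors the pattern at the start of the string, so a longer string whose first 11 characters look like a phone number still passes. A sketch that reuses the patterns above with re.fullmatch (available since Python 3.4), which requires the whole string to match; the helper names is_phone and is_id_card are hypothetical:

```python
import re

# Patterns taken from the examples above
PHONE = re.compile(r'1[34587]\d{9}')
ID_CARD = re.compile(r'\d{17}[\dxX]')

def is_phone(s):
    # fullmatch succeeds only if the WHOLE string matches the pattern
    return PHONE.fullmatch(s) is not None

def is_id_card(s):
    return ID_CARD.fullmatch(s) is not None

print(bool(PHONE.match("185706315871234")))  # True: match() only checks a prefix
print(is_phone("185706315871234"))           # False
print(is_phone("18570631587"))               # True
print(is_id_card("31131118908123231X"))      # True
print(is_id_card("3113111890812323X"))       # False: only 16 digits before X
```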
    

^ (caret): anchors the match to the start of the string:

text = "hello"
ret = re.match('^h',text)
print(ret.group())
>> h

Inside square brackets (as the first character), ^ means negation instead, e.g. [^0-9] matches any non-digit.

$: anchors the match to the end of the string:

# match an e-mail address at 163.com
text = "xxx@163.com"
ret = re.search(r'\w+@163\.com$',text)
print(ret.group())
>> xxx@163.com
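The effect of $ shows up when the text continues after the part you want. A short sketch comparing the same pattern with and without $ on two made-up addresses:

```python
import re

pattern = r'\w+@163\.com'
results = {}
for text in ["xxx@163.com", "xxx@163.com.cn"]:
    with_end = re.search(pattern + '$', text)
    without_end = re.search(pattern, text)
    results[text] = (bool(with_end), bool(without_end))
print(results)
# {'xxx@163.com': (True, True), 'xxx@163.com.cn': (False, True)}
```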

|: matches any one of several expressions or strings:

text = "hello world"
ret = re.search('hello|world',text)
print(ret.group())
>> hello
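Two details of | are worth seeing in code: findall returns every alternative it finds, and the alternatives are tried from left to right, so the first one that fits wins even if a later one would match more. A sketch:

```python
import re

both_words = re.findall(r'hello|world', "hello world")
print(both_words)  # ['hello', 'world']

# Alternatives are tried left to right; the first one that fits wins
first = re.match(r'http|https', "https://").group()
second = re.match(r'https|http', "https://").group()
print(first)   # http
print(second)  # https
```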

Greedy and non-greedy modes:

Greedy mode: the regular expression matches as many characters as possible. This is the default.
Non-greedy mode: the regular expression matches as few characters as possible. It is enabled by adding ? after a quantifier.
Sample code is as follows:

text = "0123456"
ret = re.match(r'\d+',text)
print(ret.group())
# greedy mode is the default, so this prints 0123456
>> 0123456

Switching to non-greedy mode by appending ? makes it match only 0. Sample code:

text = "0123456"
ret = re.match(r'\d+?',text)
print(ret.group())
>> 0
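Greediness matters most in scraping, where a pattern like <.+> is used to find tags. A sketch on a made-up HTML snippet: the greedy version runs to the last > in the string, while the non-greedy version stops at the first one:

```python
import re

html = "<p>hello</p><p>world</p>"
greedy = re.search(r'<.+>', html).group()
lazy = re.search(r'<.+?>', html).group()
print(greedy)  # <p>hello</p><p>world</p>
print(lazy)    # <p>
```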

Example: matching a number between 0 and 100:

text = '99'
ret = re.match(r'[1-9]?\d$|100$',text)
print(ret.group())
>> 99

If text = '101', there is no match, so calling group() raises an exception. Sample code:

text = '101'
ret = re.match(r'[1-9]?\d$|100$',text)
print(ret.group())
>> AttributeError: 'NoneType' object has no attribute 'group'
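The $ on each alternative is what rejects '101': without it, re.match accepts the prefix '10'. A sketch that instead uses re.fullmatch, which requires the whole string to match and so makes the $ anchors unnecessary, tested on a few sample values:

```python
import re

# Without a $ on each alternative, re.match accepts a matching prefix
prefix = re.match(r'[1-9]?\d|100', '101').group()
print(prefix)  # 10

# fullmatch requires the whole string, so no $ anchors are needed
pattern = re.compile(r'[1-9]?\d|100')
samples = ['0', '9', '10', '99', '100', '101', '09']
results = {s: pattern.fullmatch(s) is not None for s in samples}
print(results)
# {'0': True, '9': True, '10': True, '99': True, '100': True, '101': False, '09': False}
```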

Escape characters and raw strings:

In a regular expression, some characters have a special meaning. To match such a character literally, escape it with a backslash. For example, $ means "end of string", so to match a literal $ you must write \$. Sample code:

text = "apple price is $99,orange price is $88"
ret = re.search(r'\$(\d+)',text)
print(ret.group())
>> $99

Raw strings:
In a regular expression, \ is used for escaping, and in Python string literals \ is also used for escaping. So to match a single literal \ in the text with an ordinary string literal, you have to write four backslashes. Sample code:

text = "apple \\c"
ret = re.search('\\\\c',text)
print(ret.group())
>> \c
Using a raw string solves this problem:

text = "apple \\c"
ret = re.search(r'\\c',text)
print(ret.group())
>> \c
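Both spellings hand the regex engine the same three characters (backslash, backslash, c). A sketch that confirms this, and also shows re.escape, a standard re function that adds the needed backslashes when you want to match arbitrary literal text:

```python
import re

# Both literals produce the same 3-character pattern: backslash, backslash, c
same = '\\\\c' == r'\\c'
print(same)          # True
print(len(r'\\c'))   # 3

# re.escape adds the backslashes for you when matching literal text
price = re.escape('$99')
print(price)                              # \$99
found = re.search(price, "apple $99").group()
print(found)                              # $99
```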

Commonly used functions in the re module:

match:

Matches from the start of the string. If the beginning of the string does not match, the match fails immediately. Sample code:

text = 'hello'
ret = re.match('h',text)
print(ret.group())
>> h

If the first character is not h, the match fails. Sample code:

text = 'ahello'
ret = re.match('h',text)
print(ret.group())
>> AttributeError: 'NoneType' object has no attribute 'group'

To match across line breaks, pass the flag re.DOTALL so that . also matches a newline. Sample code:

text = "abc\nabc"
ret = re.match('abc.*abc',text,re.DOTALL)
print(ret.group())
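The same pattern with and without the flag, for comparison. re.S is the documented short alias of re.DOTALL:

```python
import re

text = "abc\nabc"
without_flag = re.match(r'abc.*abc', text)
with_flag = re.match(r'abc.*abc', text, re.DOTALL)
print(without_flag)               # None: '.' stops at the newline
print(with_flag.group() == text)  # True
# re.S is the short alias of re.DOTALL
print(re.match(r'abc.*abc', text, re.S) is not None)  # True
```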

search:

Scans the string for characters that satisfy the pattern and returns the first match found; in other words, it only finds the first match.

text = 'apple price $99 orange price $88'
ret = re.search(r'\d+',text)
print(ret.group())
>> 99
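The difference between match and search is easiest to see on the same input. A sketch that also includes re.fullmatch (Python 3.4+), which requires the whole string to match:

```python
import re

text = 'ahello'
results = {}
for func in (re.match, re.search, re.fullmatch):
    m = func(r'h\w*', text)
    results[func.__name__] = m.group() if m else None
print(results)  # {'match': None, 'search': 'hello', 'fullmatch': None}
```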

Grouping:

In a regular expression, parts of the pattern can be grouped with parentheses so that the substrings they match can be extracted separately.

  1. group(): equivalent to group(0); returns the whole matched string.
  2. groups(): returns a tuple of all subgroups. Subgroups are numbered starting from 1.
  3. group(1): returns the first subgroup; several group numbers can also be passed at once.
    Sample code:

text = "apple price is $99,orange price is $10"
ret = re.search(r".*(\$\d+).*(\$\d+)",text)
print(ret.group())
print(ret.group(0))
print(ret.group(1))
print(ret.group(2))
print(ret.groups())
>> apple price is $99,orange price is $10
>> apple price is $99,orange price is $10
>> $99
>> $10
>> ('$99', '$10')
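Passing several group numbers to group() returns a tuple, and span() reports where a group sits in the string. A sketch on the same text:

```python
import re

text = "apple price is $99,orange price is $10"
ret = re.search(r".*(\$\d+).*(\$\d+)", text)
pair = ret.group(1, 2)
print(pair)          # ('$99', '$10')
# span(n) gives the (start, end) indices of group n
print(ret.span(1))   # (15, 18)
```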

findall:

Finds all substrings that match the pattern and returns them as a list.

text = 'apple price $99 orange price $88'
ret = re.findall(r'\d+',text)
print(ret)
>> ['99', '88']
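When the pattern contains groups, findall returns the group contents instead of the whole match (a tuple per match when there are several groups). re.finditer yields match objects instead, which is handy when positions are needed. A sketch on the same text:

```python
import re

text = 'apple price $99 orange price $88'
# One group: findall returns the group's text instead of the whole match
one_group = re.findall(r'\$(\d+)', text)
print(one_group)    # ['99', '88']
# Two groups: findall returns a list of tuples
two_groups = re.findall(r'(\w+) price \$(\d+)', text)
print(two_groups)   # [('apple', '99'), ('orange', '88')]
# finditer yields match objects, so start() gives each position
positions = [m.start() for m in re.finditer(r'\$\d+', text)]
print(positions)    # [12, 29]
```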

sub:

Replaces every matched substring with another string.

text = 'apple price $99 orange price $88'
ret = re.sub(r'\d+','0',text)
print(ret)
>> apple price $0 orange price $0 
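The replacement can also be a function that receives each match object, and the count argument limits how many replacements are made. A sketch; the helper double is hypothetical and only illustrates the idea:

```python
import re

text = 'apple price $99 orange price $88'

def double(m):
    # m is the match object for each number found
    return str(int(m.group()) * 2)

doubled = re.sub(r'\d+', double, text)
print(doubled)  # apple price $198 orange price $176
# count limits the number of replacements
once = re.sub(r'\d+', '0', text, count=1)
print(once)     # apple price $0 orange price $88
```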

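Example of sub: stripping the HTML tags from job-posting data scraped from Lagou:

html = """
<div>
<p>Basic requirements:</p>
<p>1. Proficient in web front-end technologies such as HTML5, CSS3 and JavaScript; thorough understanding of HTML5 page adaptation, familiar with the differences between browsers, able to write code compatible with all major browsers;</p>
<p>2. Familiar with common JS frameworks such as JQuery, vue and angular, able to implement all kinds of interactive effects quickly and efficiently;</p>
<p>3. Familiar with writing HTML5 interfaces that adapt automatically, so that pages fit phones of every size;</p>
<p>4. Use HTML5-related technologies to develop front-end pages for mobile platforms and PC terminals, and implement HTML5 templating;</p>
<p>5. Familiar with the differences between mobile and PC web implementations, experienced in mobile web front-end development, understands mobile internet products and the industry; experience developing with HTML5+CSS+JavaScript (or mobile JS frameworks) on Android, iOS and other platforms is a plus;6. Good communication skills and team spirit, strong interest in the mobile internet industry, strong research and learning abilities;</p>
<p>7. Able to take on front-end training within the company, and to support and guide the front-end (HTML5\\CSS3) work of every business line.</p>
<p><br></p>
<p>Job responsibilities:</p>
<p>1. Use HTML5 and related technologies to develop front-end pages for mobile platforms, WeChat, apps, etc., and implement all kinds of interactions;</p>
<p>2. Continuously optimize the front-end experience and page response speed while ensuring compatibility and execution efficiency;</p>
<p>3. Analyze product requirements and propose the best front-end page structure;</p>
<p>4. Assist back-end and client developers with feature development and debugging;</p>
<p>5. Adaptation to mainstream mobile browsers and responsive mobile UI development.</p>
</div>
"""

ret = re.sub(r'</?[a-zA-Z0-9]+>',"",html)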
print(ret)
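This pattern only removes bare tags such as <p>, </p> and <br>; a tag that carries attributes is left untouched. A sketch on made-up snippets, with a looser pattern <[^>]+> that also strips attributes (still a simplification: regular expressions are not a full HTML parser):

```python
import re

TAG = r'</?[a-zA-Z0-9]+>'
plain = re.sub(TAG, '', '<div><p>Hi</p><p><br></p></div>')
print(plain)       # Hi
# Tags with attributes are NOT matched by this pattern
with_attr = re.sub(TAG, '', '<p class="x">Hi</p>')
print(with_attr)   # <p class="x">Hi
# A looser (still simplistic) pattern that also strips attributes
loose = re.sub(r'<[^>]+>', '', '<p class="x">Hi</p>')
print(loose)       # Hi
```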

split:

Splits a string using a regular expression.

text = "hello world ni hao"
ret = re.split(r'\W',text)
print(ret)
>> ['hello', 'world', 'ni', 'hao']
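With \W, each separator character splits on its own, so two spaces in a row produce an empty string; \W+ treats a run of separators as one. A capturing group in the pattern keeps the separators in the result. A sketch on made-up text:

```python
import re

text = "hello  world,ni hao"
single = re.split(r'\W', text)
print(single)   # ['hello', '', 'world', 'ni', 'hao']
runs = re.split(r'\W+', text)
print(runs)     # ['hello', 'world', 'ni', 'hao']
# A capturing group keeps the separators in the result
kept = re.split(r'(,)', "a,b")
print(kept)     # ['a', ',', 'b']
```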

compile:

For regular expressions that are used repeatedly, you can compile them once with compile and reuse the compiled object afterwards, which can be more efficient. compile also accepts the flag re.VERBOSE, which lets you write comments inside the regular expression. Sample code:

text = "the number is 20.50"
r = re.compile(r"""
                \d+ # digits before the decimal point
                \.? # decimal point
                \d* # digits after the decimal point
                """,re.VERBOSE)
ret = r.search(text)
print(ret.group())
>> 20.50
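re.VERBOSE only changes how the pattern is written: whitespace and # comments inside it are ignored. A sketch checking that the commented pattern behaves exactly like the compact one on a few made-up inputs:

```python
import re

# Same pattern as above, written compactly
compact = re.compile(r'\d+\.?\d*')
verbose = re.compile(r"""
    \d+   # digits before the decimal point
    \.?   # optional decimal point
    \d*   # digits after the decimal point
""", re.VERBOSE)

samples = ["the number is 20.50", "price 7", "no digits"]
results = []
for s in samples:
    a, b = compact.search(s), verbose.search(s)
    results.append((a.group() if a else None, b.group() if b else None))
print(results)  # [('20.50', '20.50'), ('7', '7'), (None, None)]
```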


Source: www.cnblogs.com/csnd/p/11469351.html