Learning Python regular expressions is that simple

I. Introduction

This article introduces regular expressions. Their rules are not specific to Python: most mainstream programming languages support them, and they are used widely in everyday work, so they are worth learning. After reading this article, readers should understand what a regular expression is, the rules of regular expressions, some common example patterns, and how to use the regular expression functions in Python.

II. The concept of regular expressions

A regular expression is a special pattern that is matched against a string to find substrings; the matched substrings can then be extracted, replaced, and so on.

For example, take the string zszxz666. The knowledge seeker wants to get the substring zszxz, which requires a pattern that matches it. A regular expression pattern can take many forms; here the knowledge seeker uses the simple pattern [a-z]*, and a Python regular expression matching function then returns zszxz. This approach is more concise than ordinary string functions and covers a wider range of cases.
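The example above can be sketched as follows, assuming Python's standard re module (the variable names are illustrative only):

```python
import re

# [a-z]* matches zero or more lowercase letters; match() starts at the beginning of the string
m = re.match(r'[a-z]*', 'zszxz666')
prefix = m.group()
print(prefix)  # zszxz
```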

III. Common regular expression patterns

The common regular expression patterns are listed below. If any of them is unclear, refer to a regular expression manual; such manuals also contain examples for everyday use, such as patterns for user names, passwords, email addresses, and URLs.

Pattern   Meaning
^         Matches the beginning of the string
$         Matches the end of the string
.         Matches any character except newline
+         Matches the preceding subexpression one or more times
?         Matches the preceding subexpression zero or one time; placed after another quantifier, makes it non-greedy
*         Matches the preceding subexpression zero or more times
\         Escapes a special character
\d        Matches any digit; for ASCII text, equivalent to [0-9]
\D        Matches any non-digit
\s        Matches any whitespace character (space, tab, line feed, carriage return, form feed, vertical tab); for ASCII text, equivalent to [ \f\n\r\t\v]
\S        Matches any non-whitespace character; for ASCII text, equivalent to [^ \f\n\r\t\v]
\w        Matches letters, digits, and the underscore
\W        Matches any character that is not a letter, digit, or underscore
[…]       Represents a set of characters; [amk] matches 'a', 'm' or 'k'
[^…]      Matches any character not in the set; [^amk] matches any character except 'a', 'm' or 'k'
{n}       Matches the preceding subexpression exactly n times
{n,}      Matches the preceding subexpression at least n times
{n,m}     Matches the preceding subexpression at least n times and at most m times
|         Alternation; a|b matches a or b
\b        Matches a word boundary, that is, the position between a word character and a non-word character
\B        Matches a position that is not a word boundary
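A few of the patterns in the table can be tried out as follows. This is a small sketch using the standard re module; the sample strings are made up for illustration.

```python
import re

# ^ and $ anchor the pattern to the start and end of the string
is_digits = re.search(r'^\d+$', '12345') is not None

# + is greedy by default; adding ? after it makes it non-greedy
greedy = re.search(r'<.+>', '<a><b>').group()
lazy = re.search(r'<.+?>', '<a><b>').group()

# {n,m} bounds the repetition count; here 2 to 3 digits
bounded = re.findall(r'\d{2,3}', '1 22 333 4444')

# \b matches at word boundaries, so the 'cat' inside 'concat' is skipped
words = re.findall(r'\bcat\b', 'cat concat cat.')

# [^...] negates a character set
not_amk = re.findall(r'[^amk]', 'makes')

print(is_digits)  # True
print(greedy)     # <a><b>
print(lazy)       # <a>
print(bounded)    # ['22', '333', '444']
print(words)      # ['cat', 'cat']
print(not_amk)    # ['e', 's']
```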

IV. Common Python regular expression modifiers (flags)

re.I   Makes matching case-insensitive
re.L   Makes matching locale-aware
re.M   Multi-line matching; affects ^ and $
re.S   Makes . match every character, including newline
re.U   Interprets characters according to the Unicode character set; affects \w, \W, \b, \B
re.X   Allows whitespace and comments inside the pattern so that the regular expression is easier to read
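The effect of the most frequently used flags can be sketched like this. The sample text and the phone-number-style pattern are illustrative assumptions, not taken from the article.

```python
import re

text = 'Hello\nworld'

# re.I: case-insensitive matching
ci = re.findall(r'hello', text, re.I)

# re.M: ^ and $ also match at the start and end of each line
line_starts = re.findall(r'^\w+', text, re.M)

# re.S: . also matches the newline character
no_dotall = re.search(r'Hello.world', text)
dotall = re.search(r'Hello.world', text, re.S)

# re.X: whitespace and comments inside the pattern are ignored
verbose = re.compile(r'''
    \d{3}   # three digits
    -       # a literal hyphen
    \d{4}   # four digits
''', re.X)
phone = verbose.search('call 555-1234').group()

print(ci)                  # ['Hello']
print(line_starts)         # ['Hello', 'world']
print(no_dotall)           # None
print(dotall is not None)  # True
print(phone)               # 555-1234
```

Several flags can be combined with the | operator, for example re.I | re.M.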

V. Common Python regular expression functions

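  • pattern: the regular expression pattern
  • string: the string to be matched
  • flags: optional flags, using the modifiers from section IV
  • count: the maximum number of replacements; 0 means replace all matches
  • repl: the replacement string; it can also be a function
  • pos: the start position of the search (accepted by the methods of compiled pattern objects)
  • endpos: the end position of the search (accepted by the methods of compiled pattern objects)
  • maxsplit: the maximum number of splits; 0 means no limit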
Function                                          Meaning
re.findall(pattern, string, flags=0)              Returns all matching substrings as a list, or an empty list if nothing matches; the compiled form pattern.findall(string, pos, endpos) also accepts a start and end position
re.match(pattern, string, flags=0)                Matches the pattern at the start of the string; returns None if the match fails
re.search(pattern, string, flags=0)               Scans the whole string and returns the first successful match; returns None if nothing matches
re.compile(pattern, flags=0)                      Compiles a regular expression into a regular expression (Pattern) object
re.sub(pattern, repl, string, count=0, flags=0)   Finds and replaces
re.finditer(pattern, string, flags=0)             Like findall, but returns an iterator of match objects
re.split(pattern, string, maxsplit=0, flags=0)    Splits the string at each match and returns a list
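The pos and endpos parameters are available on the methods of a compiled pattern object rather than on the module-level functions. A minimal sketch of the difference, reusing the article's sample string:

```python
import re

pattern = re.compile(r'\d+')
text = '451zszxz666'

# Module-level call: the pattern is passed in; there is no pos/endpos argument
all_numbers = re.findall(r'\d+', text)

# Compiled-pattern call: pos=3 starts the search at index 3
after_pos = pattern.findall(text, 3)

# endpos=5 restricts the search to the first five characters
before_endpos = pattern.findall(text, 0, 5)

# pattern.match with pos tries to match exactly at that index
tail = pattern.match(text, 8).group()

print(all_numbers)    # ['451', '666']
print(after_pos)      # ['666']
print(before_endpos)  # ['451']
print(tail)           # 666
```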

VI. Examples of commonly used functions

6.1 match function

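group(num=0) extracts the matched text; passing a group number returns the text captured by that group. The knowledge seeker wants to get the first run of digits in the string, which here sits at the very start, where match looks.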

import re
# Pattern: match one or more digits
pattern = re.compile(r'\d+')
# match() only matches at the start of the input string
mat = pattern.match("451zszxz666")
# Get the matched text
g = mat.group()
# 451
print(g)
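Because match only looks at the start of the string, it returns None when the digits appear later, and calling group() on None raises an error. A short sketch of guarding against that, with search shown for comparison:

```python
import re

pattern = re.compile(r'\d+')

# The string starts with letters, so match() fails and returns None
mat = pattern.match('zszxz666')
result = mat.group() if mat else None
print(result)  # None

# search() scans forward, so it finds the digits later in the string
found = pattern.search('zszxz666')
print(found.group())  # 666
```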

6.2 search function

The knowledge seeker wants to find a specified string; the first match is enough.

import re
# Match nhzszxz, nh666, or nhnh
pattern = re.compile(r'nh(zszxz|666|nh)')
ser = pattern.search('nhzszxzkkk nh666 llll nhnh')
# group() returns the entire match
g_0 = ser.group()
# nhzszxz
print(g_0)
# group(1) returns the text captured by the first group
g_1 = ser.group(1)
# zszxz
print(g_1)
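Besides group(), a match object exposes the captured groups and the position of the match. A brief sketch with the same pattern and input, including a None check in case nothing matches:

```python
import re

pattern = re.compile(r'nh(zszxz|666|nh)')
ser = pattern.search('nhzszxzkkk nh666 llll nhnh')

if ser is not None:
    whole = ser.group(0)   # entire match, same as ser.group()
    first = ser.group(1)   # text captured by the first pair of parentheses
    groups = ser.groups()  # tuple of all captured groups
    span = ser.span()      # (start, end) indices of the entire match
    print(whole, first, groups, span)
```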

6.3 findall function

The knowledge seeker wants to get all the numbers in the string.

import re
pattern = re.compile(r'\d+')
# Input string
mat = pattern.findall("451zszxz666")
# ['451', '666']
print(mat)
# 666
print(mat[1])
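One behavior worth knowing: when the pattern contains a capturing group, findall returns the group contents rather than the whole match. A sketch using the pattern from section 6.2; the non-capturing form (?:...) is an addition not covered in the tables above.

```python
import re

text = 'nhzszxzkkk nh666 llll nhnh'

# With one capturing group, findall returns only what the group captured
captured = re.findall(r'nh(zszxz|666|nh)', text)
print(captured)  # ['zszxz', '666', 'nh']

# A non-capturing group (?:...) makes findall return the whole match
whole = re.findall(r'nh(?:zszxz|666|nh)', text)
print(whole)  # ['nhzszxz', 'nh666', 'nhnh']
```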

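6.6 sub function

The knowledge seeker wants to keep only the non-digit characters, so every digit is replaced with an empty string.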

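import re
# Named text rather than str, to avoid shadowing the built-in str type
text = '8556gfggs5555dfg'
# Replace every digit with an empty string
result = re.sub(r'\d', '', text)
# gfggsdfg
print(result)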
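The count parameter and a function-valued repl can be sketched as follows. The helper name double is hypothetical and only serves the illustration.

```python
import re

text = '8556gfggs5555dfg'

# count=1 replaces only the first match
first_only = re.sub(r'\d+', '#', text, count=1)
print(first_only)  # #gfggs5555dfg

# repl can be a function: it receives the match object and returns the replacement string
def double(match):
    return str(int(match.group()) * 2)

doubled = re.sub(r'\d+', double, text)
print(doubled)  # 17112gfggs11110dfg
```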

6.7 split function

The knowledge seeker wants to split a string on commas.

import re
# Named text rather than str, to avoid shadowing the built-in str type
text = '123,456,zszxz,666'
result = re.split(',', text)
# ['123', '456', 'zszxz', '666']
print(result)
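Where re.split goes beyond the plain string split is in splitting on a pattern, and maxsplit limits how many splits are made. A small sketch with made-up input:

```python
import re

# A character set splits on any one of several separators
parts = re.split(r'[,; ]', '123,456;zszxz 666')
print(parts)  # ['123', '456', 'zszxz', '666']

# maxsplit=1 performs at most one split; the remainder stays in the last element
head_tail = re.split(r',', '123,456,zszxz,666', maxsplit=1)
print(head_tail)  # ['123', '456,zszxz,666']
```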

6.8 finditer function

The knowledge seeker wants to get the numbers 451 and 666.

import re
pattern = re.compile(r'\d+')
# Input string
mat = pattern.finditer("451zszxz666")
# Prints 451, then 666
for it in mat:
    print(it.group())
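Since finditer yields match objects rather than plain strings, the position of each match is available too. A minimal sketch:

```python
import re

pattern = re.compile(r'\d+')

# Collect the matched text together with its start and end index
found = [(it.group(), it.start(), it.end()) for it in pattern.finditer('451zszxz666')]
print(found)  # [('451', 0, 3), ('666', 8, 11)]
```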

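VII. The right way for beginners to use regular expressions

When beginners use regular expressions, the match results will often differ from what they expect. An online tool can be used to get the pattern working first, before writing the code. Commonly used online regex testers are as follows.

  1. Online tool
  2. Webmaster Tools (站长工具)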
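When no online tool is at hand, a pattern can also be checked locally before it goes into real code. The helper below, show_matches, is a hypothetical name used for this sketch and is not part of the re module.

```python
import re

def show_matches(pattern, text, flags=0):
    """Return (matched text, start, end) for every match of pattern in text."""
    return [(m.group(), m.start(), m.end()) for m in re.finditer(pattern, text, flags)]

# Try a pattern against sample input and inspect what it actually matches
for item in show_matches(r'[a-z]+', 'zszxz666abc'):
    print(item)
# ('zszxz', 0, 5)
# ('abc', 8, 11)
```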


Origin blog.csdn.net/youku1327/article/details/103935686