Five, regular expressions in Python

This article introduces regular expressions in Python. Topics such as graph databases and regular expressions are the groundwork for building knowledge graphs, so they are worth understanding. These posts lay the foundation for later articles. Follow the column "Knowledge Graph Series" to learn more about knowledge graphs~


Table of Contents

1. Introduction

2. re.match function

2.1 Function introduction

2.2 Example

3. re.search function

3.1 Function introduction

3.2 Example

4. Search and replace

4.1 Function introduction

4.2 Example: repl as a string

4.3 Example: repl as a function

5. re.compile function

5.1 Function introduction

5.2 Example

6. findall function

6.1 Function introduction

6.2 Example

7. re.finditer function

7.1 Function introduction

7.2 Example

8. re.split function

8.1 Function introduction

9. Other

9.1 Regular expression modifiers: optional flags

9.2 Regular expression patterns


 

1. Introduction

Python has included the re module since version 1.5; it provides Perl-style regular expression patterns. A regular expression is a special sequence of characters that lets you check whether a string matches a certain pattern. The re module gives Python full regular expression support. The compile function builds a regular expression object from a pattern string and optional flags, and this object has a set of methods for matching and replacement. The re module also provides module-level functions that do the same job as these methods; they take the pattern string as their first parameter. Let's look at these functions in detail.
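To make the relationship between the module-level functions and the compiled-object methods concrete, here is a minimal sketch. The sample text and variable names are made up for illustration.

```python
import re

text = "order 42 shipped"

# Module-level function: the pattern string is the first argument.
m1 = re.search(r"\d+", text)

# Compiled pattern object: the same search, written as a method call.
number = re.compile(r"\d+")
m2 = number.search(text)

# Both calls find the same substring at the same position.
print(m1.group(), m1.span())
print(m2.group(), m2.span())
```

Compiling is mostly worthwhile when the same pattern is reused many times; for a one-off search the module-level function reads just as well.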

2. re.match function

2.1 Function introduction

The re.match function tries to match a pattern at the beginning of the string. If the pattern does not match at the beginning, match() returns None; if it does, match() returns a match object. The syntax is as follows:

re.match(pattern, string, flags=0)

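The parameters are as follows:

(1) pattern: the regular expression to match.
(2) string: the string to be matched.
(3) flags: optional flags that control how matching is done, such as case-insensitive or multi-line matching.

We can call group(num) or groups() on the returned match object to retrieve the matched text.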

2.2 Example

import re

if __name__ == '__main__':

    test_data = "This is a test data: My name is xzw."
    match_object = re.match(r'(.*) is (.*?) .*', test_data, re.M | re.I)
    if match_object:
        print("match_object.groups() : ", match_object.groups())
        print("match_object.group() : ", match_object.group())
        print("match_object.group(1) : ", match_object.group(1))
        print("match_object.group(2) : ", match_object.group(2))
    else:
        print("No match!!")

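Running the code prints:

match_object.groups() :  ('This', 'a')
match_object.group() :  This is a test data: My name is xzw.
match_object.group(1) :  This
match_object.group(2) :  a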
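The pattern in the example mixes a greedy group (.*) with a lazy group (.*?). The sketch below uses a second, made-up sentence to show how the two differ, and why the greedy group in the example still ends up at the first " is " (nothing after the last " is " contains a space).

```python
import re

test_data = "This is a test data: My name is xzw."
other = "A is B and C is D end"  # hypothetical second input

# Greedy first group: prefers the LAST " is " that still lets the rest match.
greedy = re.match(r'(.*) is (.*?) .*', other)
# Lazy first group: stops at the FIRST " is ".
lazy = re.match(r'(.*?) is (.*?) .*', other)

print(greedy.groups())
print(lazy.groups())

# In test_data, "xzw." is not followed by a space, so the greedy group
# has to backtrack all the way to the first " is ".
original = re.match(r'(.*) is (.*?) .*', test_data)
print(original.groups())
```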

3. re.search function

3.1 Function introduction

The re.search function scans through the whole string and returns the first location where the pattern matches. The syntax is:

re.search(pattern, string, flags=0)

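The parameters are the same as for re.match:

(1) pattern: the regular expression to match.
(2) string: the string to be searched.
(3) flags: optional flags that control how matching is done.

Like re.match, re.search returns a match object if the match succeeds and None otherwise.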

3.2 Example

import re

if __name__ == '__main__':

    test_data = "This is a test data: My name is xzw."
    match_object = re.search(r'(.*) name (.*?) .*', test_data, re.M | re.I)
    if match_object:
        print("match_object.groups() : ", match_object.groups())
        print("match_object.group() : ", match_object.group())
        print("match_object.group(1) : ", match_object.group(1))
        print("match_object.group(2) : ", match_object.group(2))
    else:
        print("No match!!")

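Running the code prints:

match_object.groups() :  ('This is a test data: My', 'is')
match_object.group() :  This is a test data: My name is xzw.
match_object.group(1) :  This is a test data: My
match_object.group(2) :  is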

Note: re.match only matches at the beginning of the string. If the string does not match the regular expression there, the match fails and the function returns None. re.search, by contrast, scans the entire string until it finds a match.

The following example shows the difference between the two.

import re

if __name__ == '__main__':

    test_data = "This is a test data: My name is xzw."
    search_object = re.search(r'name', test_data, re.M | re.I)
    if search_object:
        print("search_object.group() : ", search_object.group())
    else:
        print("No search!!")

    match_object = re.match(r'name', test_data, re.M | re.I)
    if match_object:
        print("match_object.group() : ", match_object.group())
    else:
        print("No match!!")

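The output is as follows. re.search finds "name" in the middle of the string, while re.match fails because the string does not start with "name":

search_object.group() :  name
No match!!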
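The examples above pass re.M, so it is worth noting that the flag does not change where re.match is anchored. The sketch below, using a made-up two-line string, suggests that re.match still only tries index 0, while re.search with ^ can match at the start of any line.

```python
import re

text = "first line\nname: xzw"  # hypothetical two-line input

# re.match is anchored at the very start of the string, even with re.M.
at_start = re.match(r"name", text, re.M)

# re.search with ^ and re.M can match at the beginning of any line.
line_start = re.search(r"^name", text, re.M)

print(at_start)
print(line_start.start())
```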

4. Search and replace

4.1 Function introduction

The re.sub function is used to replace matches in a string. The syntax is as follows:

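re.sub(pattern, repl, string, count=0, flags=0)

The parameters are as follows:

(1) pattern: the regular expression to search for.
(2) repl: the replacement; it can be a string or a function.
(3) string: the original string to be searched and replaced.
(4) count: the maximum number of replacements; the default 0 replaces all matches.
(5) flags: optional flags that control how matching is done.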
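As a small illustration of the repl and count parameters (the limit on replacements is named count in the current re module), here is a sketch with a made-up date string. When repl is a string, backreferences such as \1 refer to the captured groups.

```python
import re

dates = "2020-11-11 and 2021-01-02"  # hypothetical input

# Backreferences \1, \2, \3 in repl refer to the captured groups.
swapped = re.sub(r"(\d{4})-(\d{2})-(\d{2})", r"\3/\2/\1", dates)

# count=1 replaces only the first occurrence.
first_only = re.sub(r"\d{4}", "YYYY", dates, count=1)

print(swapped)
print(first_only)
```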

4.2 Example: repl as a string

import re

if __name__ == '__main__':
    phone = "187-1111-1111  # This is a China Mobile phone number"

    # Remove the comment from the string
    number = re.sub(r'#.*$', "", phone)
    print("phone: ", number)

    # Remove every non-digit character (including -)
    number = re.sub(r'\D', "", phone)
    print("phone: ", number)

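The first call strips the comment and prints 187-1111-1111 (followed by the two spaces that preceded the #); the second call removes every non-digit character and prints 18711111111.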

4.3 Example: repl as a function

import re


def multiply(match_num):
    '''
    Multiply the matched number by 2.
    :param match_num: the match object for the matched number
    :return: the doubled value as a string
    '''
    value = int(match_num.group('value'))
    return str(value * 2)


if __name__ == '__main__':
    s = '12354HFD567'
    print(re.sub(r'(?P<value>\d+)', multiply, s))

Each run of digits is doubled, so the code prints 24708HFD1134.

5. re.compile function

5.1 Function introduction

The compile function compiles a regular expression into a regular expression (Pattern) object, whose methods such as match() and search() can then be used. The syntax is:

re.compile(pattern[, flags])

The parameters are explained as follows:

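(1) pattern: a regular expression in string form.
(2) flags: optional; sets the matching mode, such as ignoring case or multi-line mode. The available values are:
	re.I ignores case
	re.L makes the special character sets \w, \W, \b, \B, \s, \S depend on the current locale (in Python 3 it can only be used with bytes patterns)
	re.M multi-line mode
	re.S makes . match any character including a newline (by default . does not match a newline)
	re.U makes the special character sets \w, \W, \b, \B, \d, \D, \s, \S depend on the Unicode character properties database (the default for str patterns in Python 3)
	re.X ignores whitespace and comments after # in the pattern, for readability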

5.2 Example

import re

if __name__ == '__main__':
    pattern = re.compile(r'\d+')  # Matches one or more digits

    m1 = pattern.match('aaa123bbb456ccc789')  # Matches from the start of the string: no match
    print(m1)

    m2 = pattern.match('aaa123bbb456ccc789', 2, 10)  # Starts at index 2 ('a'): no match
    print(m2)

    m3 = pattern.match('aaa123bbb456ccc789', 3, 10)  # Starts at index 3 ('1'): matches
    print(m3)

    print(m3.group(), m3.start(), m3.end(), m3.span())

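The code prints None for m1 and m2, then the Match object for m3 (spanning indices 3 to 6 and matching '123'), and finally the line 123 3 6 (3, 6).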

In the example above, a Match object is returned when the match succeeds. Its main methods are:

(1) group([group1, ...]) returns one or more matched groups; to get the entire matched substring, call group() or group(0).
(2) start([group]) returns the position in the whole string where the matched substring starts (the index of its first character); the parameter defaults to 0.
(3) end([group]) returns the position in the whole string where the matched substring ends (the index of its last character + 1); the parameter defaults to 0.
(4) span([group]) returns (start(group), end(group)).
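The example in 5.2 only uses group 0. As a sketch of how the same methods behave with numbered groups, consider the made-up string below, which contains two captured numbers.

```python
import re

# Hypothetical input with two captured numbers.
m = re.search(r"(\d+)-(\d+)", "range: 10-250 units")

whole = m.group(0)        # entire match
both = m.group(1, 2)      # several groups at once, returned as a tuple
span_all = m.span()       # (start, end) of the entire match
span_low = m.span(1)      # (start, end) of group 1
start_high = m.start(2)   # index of the first character of group 2
end_high = m.end(2)       # index just past the last character of group 2

print(whole, both, span_all, span_low, start_high, end_high)
```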

6. findall function

6.1 Function introduction

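The findall function finds all substrings in the string that match the regular expression and returns them as a list. If no match is found, it returns an empty list. As a method of a compiled pattern object, the syntax is:

pattern.findall(string[, pos[, endpos]])

The module-level form is re.findall(pattern, string, flags=0).

The parameters are as follows:

(1) string: the string to be matched.
(2) pos: optional; the position in the string at which to start, 0 by default.
(3) endpos: optional; the position in the string at which to stop, the length of the string by default.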

6.2 Example

import re

if __name__ == '__main__':
    pattern = re.compile(r'\d+')  # Find runs of digits

    result1 = pattern.findall('This is a test data: My name is xzw.')
    result2 = pattern.findall('dhfsa3bn45tdfs', 0, 10)

    print(result1)
    print(result2)

The first string contains no digits, and the second is searched only up to index 10, so the results are:

[]
['3', '45']
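One behavior worth knowing is that the shape of the returned list depends on the capturing groups in the pattern. The sketch below, with a made-up input string, shows the three cases: no group, one group, and several groups.

```python
import re

text = "x=1, y=22, z=333"  # hypothetical input

# No capturing group: a list of the full matches.
full = re.findall(r"\w=\d+", text)

# One capturing group: a list of that group's text only.
values = re.findall(r"\w=(\d+)", text)

# Several capturing groups: a list of tuples, one tuple per match.
pairs = re.findall(r"(\w)=(\d+)", text)

print(full)
print(values)
print(pairs)
```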

7. re.finditer function

7.1 Function introduction

Like findall, re.finditer finds all substrings in the string that match the regular expression, but it returns an iterator of match objects instead of a list of strings. The syntax is:

re.finditer(pattern, string, flags=0)

7.2 Example

import re

if __name__ == '__main__':
    matches = re.finditer(r"\d+", "vfdkl4teree87693n2342ln")  # avoid shadowing the built-in iter
    for match in matches:
        print(match.group())

Each run of digits is printed on its own line:

4
87693
2342
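Because finditer yields match objects rather than plain strings, the position of each match is available too. A minimal sketch on the same input string as the example above:

```python
import re

text = "vfdkl4teree87693n2342ln"

# Collect the matched text together with its (start, end) position.
found = [(m.group(), m.span()) for m in re.finditer(r"\d+", text)]

for digits, (start, end) in found:
    # Slicing the original string with the span gives back the match.
    print(digits, start, end, text[start:end] == digits)
```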

8. re.split function

8.1 Function introduction

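The split function splits the string at every substring matched by the pattern and returns the pieces as a list. Its syntax is as follows:

re.split(pattern, string, maxsplit=0, flags=0)

The parameters are as follows:

(1) pattern: the regular expression that marks the split points.
(2) string: the string to be split.
(3) maxsplit: the maximum number of splits; the default 0 means no limit.
(4) flags: optional flags that control how matching is done.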
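This section has no example of its own, so here is a minimal sketch with made-up input. It shows a plain split, the effect of maxsplit, and what happens when the pattern contains a capturing group (the separators are then kept in the result).

```python
import re

text = "a, b;c  d"  # hypothetical input with mixed separators

# Split on any run of commas, semicolons, or whitespace.
parts = re.split(r"[,;\s]+", text)

# maxsplit=1 performs only the first split; the rest stays in one piece.
limited = re.split(r"[,;\s]+", text, maxsplit=1)

# With a capturing group, the matched separators are included in the list.
with_seps = re.split(r"([,;])", "a,b;c")

print(parts)
print(limited)
print(with_seps)
```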

9. Other

9.1 Regular expression modifiers-optional flags

Regular expression functions accept optional flags that control how matching is done. Multiple flags can be combined with the bitwise OR operator (|), for example re.I | re.M.
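The effect of individual flags, and of combining them with |, can be sketched as follows. The two-line input and the variable names are made up for illustration.

```python
import re

text = "Name: xzw\nname: abc"  # hypothetical two-line input

no_flags = re.findall(r"^name", text)              # case-sensitive, ^ only at string start
ignore_case = re.findall(r"^name", text, re.I)     # ignore case
both = re.findall(r"^name", text, re.I | re.M)     # ^ also matches at each line start

# re.S lets . match the newline between the two lines.
without_s = re.search(r"xzw.name", text)
with_s = re.search(r"xzw.name", text, re.S)

# re.X allows whitespace and # comments inside the pattern.
verbose = re.compile(r"""
    (\w+)   # key
    :\s*    # separator
    (\w+)   # value
""", re.X)
pairs = verbose.findall(text)

print(no_flags, ignore_case, both)
print(without_s, with_s is not None)
print(pairs)
```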

9.2 Regular expression patterns

A pattern string uses a special syntax to describe a regular expression. Letters and digits match themselves. Many letters and digits take on a different meaning when preceded by a backslash (for example, \d matches a digit). Punctuation characters that have a special meaning in a pattern (such as . * + ? ( ) [ ]) match themselves only when escaped with a backslash. The backslash itself must also be escaped with a backslash. Since regular expressions usually contain backslashes, it is best to write them as raw strings: for example, r'\t' is equivalent to '\\t', and as a pattern it matches a tab character. The following figure lists the special elements in the regular expression pattern syntax. If you supply optional flags together with a pattern, the meaning of some pattern elements changes.
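The points about raw strings and escaping can be checked with a short sketch. The sample strings are made up, and re.escape is brought in here as one convenient way to escape a literal; it is not mentioned in the original text.

```python
import re

# A raw string keeps the backslash: r'\t' is the two characters \ and t.
same = (r"\t" == "\\t")
tab_match = re.search(r"\t", "a\tb")  # as a pattern, \t matches a tab

# An unescaped dot matches any character; an escaped dot matches only ".".
loose = re.findall(r"3.14", "3.14 3x14")
strict = re.findall(r"3\.14", "3.14 3x14")

# re.escape escapes special characters so the text is matched literally.
literal = "1+1=2"
unescaped = re.fullmatch(literal, literal)            # "+" is read as a quantifier
escaped = re.fullmatch(re.escape(literal), literal)   # "+" is read as a plain character

print(same, tab_match is not None)
print(loose, strict)
print(unescaped, escaped is not None)
```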

 

 

That concludes this article, which covered regular expressions in Python with reference to the W3CSchool tutorial. If you ran into any problems along the way, feel free to leave a comment and let me know what you encountered~


Origin blog.csdn.net/gdkyxy2013/article/details/109618809