Python Series: A Detailed Explanation of the Standard Library Module re

    Thank you for your likes and attention. A little progress every day. Keep going!

Table of contents

1. Python regular expressions

1.1 Common operations of the re module

1.2 re.match

1.3 re.search

1.4 re.findall

1.5 re.compile function

1.6 re.sub: search and replace

1.7 re.split: splitting strings

1.8 Practical case: match files by name and move them


Articles in this Python learning series:

Installing and configuring Python on Windows

Variables and operators

Conditionals and loops

Strings and lists

File operations and functions

Standard library module os, explained in detail

Standard library module re, explained in detail

Standard library module json, explained in detail

Standard library module shutil, explained in detail


1. Python regular expressions


A regular expression is a special sequence of characters that helps you easily check whether a string matches a certain pattern.

Since version 1.5, Python has included the re module, which provides Perl-style regular expression patterns.

  • The re module brings full regular expression functionality to the Python language.
  • The compile function generates a regular expression object from a pattern string and an optional flags argument. This object has a set of methods for regular expression matching and replacement.
  • The re module also provides functions that do exactly what these methods do, taking a pattern string as their first argument.
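A minimal sketch of the equivalence described in the last two points: the module-level function and the method of a compiled pattern return the same result. The sample text is made up for illustration.

```python
import re

text = "order 66 shipped on day 7"

# Module-level function: the pattern string is passed as the first argument
via_function = re.findall(r"\d+", text)

# Compiled pattern object: compile once, then call the method of the same name
digits = re.compile(r"\d+")
via_method = digits.findall(text)

print(via_function)  # ['66', '7']
print(via_method)    # ['66', '7']
```

The compiled form is convenient when the same pattern is applied many times, for example once per line of a file.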

1.1 Common operations of the re module


| Function | Description |
|---|---|
| re.match() | Matches only at the beginning of the string, similar to starting the pattern with ^ |
| re.search() | Scans the entire string but returns only the first match |
| re.findall() | Finds all matches and returns the matched strings as a list |
| re.split() | Splits the string using each match as a delimiter and returns the pieces as a list |
| re.sub() | Finds matches and replaces them |

1.2 re.match


re.match attempts to match the pattern at the beginning of the string; if the pattern does not match at the start, match() returns None.

Function syntax:

re.match(pattern, string, flags=0)

Function parameters:

| Parameter | Description |
|---|---|
| pattern | The regular expression to match |
| string | The string to match against |
| flags | Flags that control how the regular expression matches, for example case-insensitive or multi-line matching. See "Regex Modifiers - Optional Flags" in the Novice Tutorial listed in the references. |
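A short sketch of the flags parameter, using the standard re.IGNORECASE and re.MULTILINE flags; the sample strings are made up for illustration.

```python
import re

# Without flags, matching is case-sensitive, so this match fails
plain = re.match("hello", "Hello world")
print(plain)  # None

# re.IGNORECASE (short form re.I) makes the match case-insensitive
ignore_case = re.match("hello", "Hello world", flags=re.IGNORECASE)
print(ignore_case.group())  # Hello

# Flags combine with |; re.MULTILINE (re.M) lets ^ match at the start of every line
text = "First line\nsecond line"
multi = re.findall(r"^[a-z]+", text, flags=re.M)
both = re.findall(r"^[a-z]+", text, flags=re.I | re.M)
print(multi)  # ['second']
print(both)   # ['First', 'second']
```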

The re.match method returns a matching object if the match is successful, otherwise it returns None.

Use the match object's group(num) or groups() method to retrieve the matched text:

| Match object method | Description |
|---|---|
| group(num=0) | Returns the string matched by the entire expression (num=0) or by group num. group() can take several group numbers at once, in which case it returns a tuple of the corresponding values. |
| groups() | Returns a tuple of all group strings, from group 1 up to the last group in the pattern. |
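A small sketch of group() and groups() on a pattern with two capturing groups; the phone-style string is made up for illustration.

```python
import re

# Two capturing groups: a three-digit prefix and a four-digit suffix
m = re.match(r"(\d{3})-(\d{4})", "555-1234 ext. 9")

print(m.group())      # 555-1234  (entire match, same as group(0))
print(m.group(1))     # 555       (first group)
print(m.group(1, 2))  # ('555', '1234')  several group numbers give a tuple
print(m.groups())     # ('555', '1234')  all groups, from 1 to the last
```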

Example:

import re

print(re.match("aaa", "sdfaaasd"))    # None: "aaa" is not at the start, so the match fails
print(re.match("aaa", "aaasd"))       # prints a Match object: the match succeeded

abc = re.match(r"aaa\d+", "aaa234324bbbbccc")
print(abc.group())                    # aaa234324: the part of the string that matched

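Output: the first call prints None, the second prints a Match object with span=(0, 3) and match='aaa', and the last line prints aaa234324.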

1.3 re.search


re.search scans the entire string and returns the first successful match.

Function syntax:

re.search(pattern, string, flags=0)

Function parameters:

| Parameter | Description |
|---|---|
| pattern | The regular expression to match |
| string | The string to match against |
| flags | Flags that control how the regular expression matches, for example case-insensitive or multi-line matching |

The re.search method returns a matching object if the match is successful, otherwise it returns None.

As with re.match, use group(num) or groups() on the returned match object to retrieve the matched text; both methods behave as described in 1.2.
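Because re.search (like re.match) returns None when nothing matches, calling .group() directly on the result raises AttributeError. A minimal sketch of checking the result first; first_number is a hypothetical helper, not part of the re module.

```python
import re

def first_number(text):
    """Return the first run of digits in text, or None (hypothetical helper)."""
    m = re.search(r"\d+", text)
    # Check for None before calling .group(), which None does not have
    if m:
        return m.group()
    return None

print(first_number("hadoop333444"))    # 333444
print(first_number("no digits here"))  # None

# match only looks at the start; search scans the whole string
print(re.match("hadoop", "sdfhadoop"))           # None
print(re.search("hadoop", "sdfhadoop").span())   # (3, 9)
```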

Example:

import re

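# Prints a Match object, so the match succeeded. re.search scans the whole string instead of
# only the beginning, but returns only the first match; to match only at the start, use ^hadoop
print(re.search("hadoop", "sdfhadoopsdhadoopwwsdf"))

# Confirms that only the first match is returned, and prints it with group()
print(re.search(r"hadoop\d+", "hadoop111222bbbbccchadoop333444").group())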

Output: the first line prints a Match object with span=(3, 9) and match='hadoop'; the second prints hadoop111222.

1.4 re.findall


re.findall finds all substrings matched by the regular expression and returns them as a list. If the pattern contains exactly one capturing group, the list holds that group's text; if it contains several groups, the list holds tuples; if nothing matches, the list is empty.

Note: match and search return only one match, while findall returns all matches.

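The syntax of the module-level function is:

re.findall(pattern, string, flags=0)

A compiled pattern (see 1.5) also has a findall method that accepts an optional search range:

pattern.findall(string[, pos[, endpos]])

Parameters of pattern.findall:

| Parameter | Description |
|---|---|
| string | The string to search |
| pos | Optional; the index in the string where the search starts. The default is 0. |
| endpos | Optional; the index where the search ends. The default is the length of the string. |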
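A sketch of how the return value of findall depends on the number of groups, and of pos/endpos on a compiled pattern. The log string is made up for illustration.

```python
import re

log = "user=alice id=7; user=bob id=42"

# No groups: a list of whole matches
whole = re.findall(r"id=\d+", log)
print(whole)         # ['id=7', 'id=42']

# One group: a list of that group's text
one_group = re.findall(r"id=(\d+)", log)
print(one_group)     # ['7', '42']

# Several groups: a list of tuples
many_groups = re.findall(r"user=(\w+) id=(\d+)", log)
print(many_groups)   # [('alice', '7'), ('bob', '42')]

# pos/endpos exist on the compiled pattern: search only log[0:15]
ids = re.compile(r"id=(\d+)")
limited = ids.findall(log, 0, 15)
print(limited)       # ['7']
```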

Example:

import re

print(re.findall("hadoop", "sdfhadoopsdhadoopwwsdf")) 
print(re.findall(r"hadoop\d+", "hadoop111222bbbbccchadoop333444"))

Output: ['hadoop', 'hadoop'] and ['hadoop111222', 'hadoop333444'].

Summary: re.match(), re.search() and re.findall()

  • re.match only matches at the beginning of the string; if the beginning does not match the regular expression, the match fails and the function returns None
  • re.search() and re.findall() do not require the match to start at the beginning of the string
  • re.search() returns only the first match in the string, while re.findall() returns all matches
  • re.search() returns a match object whose result can be printed with group(); re.findall() returns a plain list, which has no group() method, so all matches are displayed directly as a list

1.5 re.compile function


The compile function compiles a regular expression into a regular expression (Pattern) object. The object's methods, such as match(), search(), findall(), split() and sub(), can then be called repeatedly without recompiling the pattern.

The syntax format is:

re.compile(pattern, flags=0)

Test data (the contents of /root/data.txt, which the code below reads):

t1.4301.jyg.qcs.ipva.cn,0,1,1,PsARegion,4#37#6#85#272#70#268#17
t1.4301.jyg.qcs.ipva.cn,0,1,1,PsCRegion,275#94#9#105#13#147#285#140
t1.py.3895.qcs.ipva.cn,0,1,1,PsDRegion,84#86#228#88#216#138#98#139
t1.py.3895.qcs.ipva.cn,1,1,1,PsARegion,77#85#239#87#218#133#99#132
t1.8381.kf.qcs.ipva.cn,0,1,1,PsBRegion,125#145#320#146#330#207#67#210
t1.8381.kf.qcs.ipva.cn,0,1,1,PsCRegion,126#143#322#146#329#208#68#210

Requirement: output the lines that contain "PsARegion"

import re

pattern = re.compile("PsARegion")  # compile once, reuse for every line
text = ""
with open("/root/data.txt") as f:  # returns a file object, closed automatically
    for line in f:                 # read the file line by line
        if pattern.search(line):
            text += line           # concatenate the matching lines
print(text, end='')

Result: the two lines containing "PsARegion" are printed.
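A compiled pattern can also carry capturing groups that pull fields out of each line. A minimal sketch, assuming the comma-separated layout of the test data above; the original does not say what the columns mean, so the variable names below are hypothetical.

```python
import re

# Two sample lines copied from the test data above
lines = [
    "t1.4301.jyg.qcs.ipva.cn,0,1,1,PsARegion,4#37#6#85#272#70#268#17",
    "t1.py.3895.qcs.ipva.cn,0,1,1,PsDRegion,84#86#228#88#216#138#98#139",
]

# Hypothetical layout: host, second column, two more numeric columns, region tag
record = re.compile(r"^([^,]+),(\d+),\d+,\d+,(Ps[A-Z]Region),")

results = []
for line in lines:
    m = record.match(line)
    if m:
        host, second_col, region = m.groups()
        results.append((host, second_col, region))
        print(host, second_col, region)
```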

1.6 re.sub: search and replace


Python's re module provides re.sub for replacing occurrences in strings.

Syntax:

re.sub(pattern, repl, string, count=0, flags=0)

Function parameters:

| Parameter | Description |
|---|---|
| pattern | The regular expression pattern string |
| repl | The replacement string; it can also be a function |
| string | The original string to search and replace in |
| count | Maximum number of replacements; the default 0 replaces all matches |
| flags | Flags that control how the regular expression matches, as for re.match |
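A short sketch of three uses of the repl and count parameters: group backreferences, a limited number of replacements, and a function as repl. The sample strings are made up for illustration.

```python
import re

# Backreferences: \1, \2, \3 in repl refer to the captured groups
swapped = re.sub(r"(\d{4})-(\d{2})-(\d{2})", r"\3/\2/\1", "2023-05-22")
print(swapped)      # 22/05/2023

# count=1 replaces only the first match
first_only = re.sub(r"\d", "#", "a1b2c3", count=1)
print(first_only)   # a#b2c3

# repl can be a function: it receives each match object and returns the replacement
def double(m):
    return str(int(m.group()) * 2)

doubled = re.sub(r"\d+", double, "A23G4HFD567")
print(doubled)      # A46G8HFD1134
```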

Example:

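import re

phone = "2004-959-559 # This is a foreign phone number"

# Remove the Python-style comment from the string
num = re.sub(r'#.*$', "", phone)
print("Phone number: ", num)

# Remove every non-digit character (including the dashes)
num = re.sub(r'\D', "", phone)
print("Phone number: ", num)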

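Output: the first print shows 2004-959-559 (the comment is removed, leaving a trailing space), and the second shows 2004959559.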

1.7 re.split: splitting strings


re.split splits the string at every substring the pattern matches and returns the pieces as a list. Its usage is as follows:

re.split(pattern, string, maxsplit=0, flags=0)

Parameters:

| Parameter | Description |
|---|---|
| pattern | The regular expression to match |
| string | The string to split |
| maxsplit | Maximum number of splits; maxsplit=1 splits once, and the default 0 means no limit |
| flags | Flags that control how the regular expression matches, for example case-insensitive or multi-line matching. See "Regex Modifiers - Optional Flags" in the Novice Tutorial listed in the references. |
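A short sketch of maxsplit and of two pattern features that plain str.split does not offer. The first line is taken from the test data below; the short strings are made up for illustration.

```python
import re

line = "t1.py.3895.qcs.ipva.cn,1,1,1,PsARegion,77#85#239#87#218#133#99#132"

# Split on every comma
all_parts = re.split(",", line)
print(all_parts)

# maxsplit=2 stops after two splits; keeping the first two items gives the first two columns
first_two = re.split(",", line, maxsplit=2)[:2]
print(first_two)     # ['t1.py.3895.qcs.ipva.cn', '1']

# A character class splits on several delimiters at once
multi_delim = re.split(r"[,#]", "a,b#c")
print(multi_delim)   # ['a', 'b', 'c']

# With a capturing group, the delimiters are kept in the result
kept = re.split(r"([,#])", "a,b#c")
print(kept)          # ['a', ',', 'b', '#', 'c']
```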

Test Data:

t1.4301.jyg.qcs.ipva.cn,0,1,1,PsARegion,4#37#6#85#272#70#268#17
t1.4301.jyg.qcs.ipva.cn,0,1,1,PsCRegion,275#94#9#105#13#147#285#140
t1.py.3895.qcs.ipva.cn,0,1,1,PsDRegion,84#86#228#88#216#138#98#139
t1.py.3895.qcs.ipva.cn,1,1,1,PsARegion,77#85#239#87#218#133#99#132
t1.8381.kf.qcs.ipva.cn,0,1,1,PsBRegion,125#145#320#146#330#207#67#210
t1.8381.kf.qcs.ipva.cn,0,1,1,PsCRegion,126#143#322#146#329#208#68#21

Requirement: use a regular expression to find the lines containing "PsARegion" and extract their first two columns

import re

pattern = re.compile("PsARegion")
text = ""
with open("/root/data.txt") as f:  # returns a file object, closed automatically
    for line in f:
        if pattern.search(line):
            data_list = re.split(",", line)                   # split the line on commas
            text += data_list[0] + "," + data_list[1] + "\n"  # keep the first two columns
print(text, end='')

Output:

t1.4301.jyg.qcs.ipva.cn,0
t1.py.3895.qcs.ipva.cn,1

1.8 Practical case: match files by name and move them


move_file.py

# -*- coding:UTF-8 -*-

import logging
import os
import re
import shutil
import sys
from logging.handlers import RotatingFileHandler

# Initialize logging
logger = logging.getLogger('mylogger')
logger.setLevel(level=logging.INFO)
fmt = '%(asctime)s - %(pathname)s[line:%(lineno)d] - %(levelname)s: %(message)s'
format_str = logging.Formatter(fmt)
# Rotate the log at 10 MB and keep two backups
fh = RotatingFileHandler("move_file.log", maxBytes=10*1024*1024, backupCount=2, encoding="utf-8")
# Name rotated files backup.1, backup.2 instead of move_file.log.1, move_file.log.2
fh.namer = lambda x: "backup." + x.split(".")[-1]

fh.setFormatter(fmt=format_str)
logger.addHandler(fh)


def move_file(res_dir, tar_dir, pattern):
    """ Move files
    :param res_dir: source directory
    :param tar_dir: target directory
    :param pattern: regular expression matched against each file name
    :return:
    """
    try:
        logger.info("Start moving files!")
        for filename in os.listdir(res_dir):

            # Get the full path of the file
            file_path = os.path.join(res_dir, filename)
            print(filename, pattern)
            # Match the file name against the regular expression
            if re.match(pattern, filename):
                # shutil complements the os module with copy, move, delete, archive and extract operations
                shutil.move(file_path, tar_dir)  # move the file
                print("Moved file [%s]" % filename)
        logger.info("Finished moving files!")
    except Exception as why:
        logger.exception("Failed to move files")
        print(why)


if __name__ == "__main__":
    print(sys.argv)
    if len(sys.argv) == 4:
        move_file(sys.argv[1], sys.argv[2], sys.argv[3])
    else:
        print("Usage: python move_file.py <source_dir> <target_dir> <pattern>")

Invocation on Windows:

python.exe D:\IPVA\file_move\move_file.py D:\IPVA\Data_Traffic\DataServerCloud01_AlarmEvent\ D:\IPVA\Data_Traffic_Bak\DataServerCloud01_AlarmEvent\ Data.*.COMPLETED

The files were moved successfully; afterwards I renamed them with the ren command.
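The script uses re.match, which only anchors at the start of the file name, and the unescaped dots in Data.*.COMPLETED match any character, so a name such as Data_002.COMPLETED.tmp would also be moved. A minimal sketch of one possible stricter filter using re.fullmatch and escaped dots; select_files and the file names are hypothetical, and the filter works on a list of names so it can be tried without touching the file system.

```python
import re

def select_files(filenames, pattern):
    """Return the names that match the whole pattern (hypothetical helper)."""
    regex = re.compile(pattern)
    # fullmatch requires the entire name to match, unlike match,
    # which only anchors at the start of the name
    return [name for name in filenames if regex.fullmatch(name)]

names = ["Data_001.COMPLETED", "Data_002.COMPLETED.tmp", "Data_003.TMP"]

# re.match-style selection, as in move_file.py, also accepts the .tmp file
loose = [n for n in names if re.match(r"Data.*.COMPLETED", n)]
print(loose)   # ['Data_001.COMPLETED', 'Data_002.COMPLETED.tmp']

# fullmatch with escaped dots accepts only the completed file
strict = select_files(names, r"Data.*\.COMPLETED")
print(strict)  # ['Data_001.COMPLETED']
```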


References:

Python Regular Expression | Novice Tutorial

Python: four ways to read file content line by line


Origin blog.csdn.net/qq_35995514/article/details/130822861