1. Python regular expressions
A regular expression is a special sequence of characters that helps you easily check whether a string matches a certain pattern.
The re module has been part of Python since version 1.5 and provides Perl-style regular expression patterns.
- The re module brings full regular expression functionality to the Python language.
- The compile function generates a regular expression object from a pattern string and an optional flags argument. This object has a set of methods for regular expression matching and replacement.
- The re module also provides functions that do exactly what these methods do, taking a pattern string as their first argument.
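The relationship between the compiled object's methods and the module-level functions can be sketched as follows. The sample string and the variable names are made up for illustration.

```python
import re

# Compile once, then reuse the pattern object's methods.
digits = re.compile(r"\d+")
via_object = digits.findall("a1b22c333")

# The module-level function takes the pattern string as its first argument.
via_module = re.findall(r"\d+", "a1b22c333")

print(via_object)  # ['1', '22', '333']
print(via_module)  # ['1', '22', '333']
```

Both calls give the same result; compiling is mainly worthwhile when the same pattern is reused many times.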
1.1 Common operations of re module
| Function | Description |
| --- | --- |
| re.match() | Matches only at the beginning of the string, similar to anchoring a pattern with ^ |
| re.search() | Scans the whole string but returns only the first match |
| re.findall() | Finds all matches and returns the matched strings as a list |
| re.split() | Splits the string wherever the pattern matches and returns the pieces as a list |
| re.sub() | Finds matches and replaces them |
1.2 re.match
re.match tries to match a pattern at the beginning of the string. If the pattern does not match at the beginning, match() returns None.
Function syntax :
re.match(pattern, string, flags=0)
Description of function parameters:
| Parameter | Description |
| --- | --- |
| pattern | The regular expression to match. |
| string | The string to match against. |
| flags | Flags that control how the regular expression is matched, such as case-insensitive or multi-line matching. |
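As a small illustration of the flags parameter, the sketch below uses two standard flags, re.IGNORECASE and re.MULTILINE. The sample text is an assumption chosen for the demo.

```python
import re

text = "Hadoop\nhadoop cluster"

# Without flags the match is case-sensitive, so "hadoop" does not match "Hadoop".
no_flag = re.match(r"hadoop", text)

# re.IGNORECASE makes the match case-insensitive.
ignore_case = re.match(r"hadoop", text, flags=re.IGNORECASE)

# re.MULTILINE lets ^ match at the start of every line, not only at the string start.
line_starts = re.findall(r"^hadoop", text, flags=re.IGNORECASE | re.MULTILINE)

print(no_flag)              # None
print(ignore_case.group())  # Hadoop
print(line_starts)          # ['Hadoop', 'hadoop']
```

Several flags are combined with the | operator.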
The re.match method returns a matching object if the match is successful, otherwise it returns None.
The group(num) and groups() methods of the match object return the matched text.
| Match object method | Description |
| --- | --- |
| group(num=0) | Returns the string matched by the whole expression (group 0) or by the given group. When several group numbers are passed at once, it returns a tuple with the values of those groups. |
| groups() | Returns a tuple containing the strings of all subgroups, from group 1 up to the last group in the pattern. |
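The difference between group() and groups() is easier to see with a pattern that has capturing groups. This is a minimal sketch; the pattern and the input string are invented for the demo.

```python
import re

# Two capturing groups: a word and a number separated by a dash.
m = re.match(r"(\w+)-(\d+)", "hadoop-111 rest")

print(m.group())      # hadoop-111
print(m.group(1))     # hadoop
print(m.group(1, 2))  # ('hadoop', '111')
print(m.groups())     # ('hadoop', '111')
```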
Example:
import re

print(re.match("aaa", "sdfaaasd"))  # prints None: there is no match at the start of the string
print(re.match("aaa", "aaasd"))  # prints a match object: the match succeeded
abc = re.match(r"aaa\d+", "aaa234324bbbbccc")
print(abc.group())  # prints aaa234324, the part of the string that matched
1.3 re.search
re.search scans the entire string and returns the first successful match.
Function syntax:
re.search(pattern, string, flags=0)
Description of function parameters:
| Parameter | Description |
| --- | --- |
| pattern | The regular expression to match. |
| string | The string to match against. |
| flags | Flags that control how the regular expression is matched, such as case-insensitive or multi-line matching. |
The re.search method returns a matching object if the match is successful, otherwise it returns None.
The group(num) and groups() methods of the match object return the matched text.
| Match object method | Description |
| --- | --- |
| group(num=0) | Returns the string matched by the whole expression (group 0) or by the given group. When several group numbers are passed at once, it returns a tuple with the values of those groups. |
| groups() | Returns a tuple containing the strings of all subgroups, from group 1 up to the last group in the pattern. |
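Because re.search returns None when nothing matches, calling group() directly on the result raises an AttributeError in that case. One way to guard against this is sketched below; first_match is a hypothetical helper, not part of the re module.

```python
import re

def first_match(pattern, text):
    """Return the first matched substring, or None when nothing matches."""
    m = re.search(pattern, text)
    return m.group() if m else None

print(first_match(r"hadoop\d+", "xxhadoop111yy"))      # hadoop111
print(first_match(r"hadoop\d+", "no digits: hadoop"))  # None
```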
Example:
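import re

# Prints a match object: re.search scans the whole string rather than only the start,
# but returns just the first match. To match only at the start, prefix the pattern with ^.
print(re.search("hadoop", "sdfhadoopsdhadoopwwsdf"))
# Confirm that only the first match is returned; group() gives the matched text.
print(re.search(r"hadoop\d+", "hadoop111222bbbbccchadoop333444").group())  # hadoop111222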
1.4 re.findall
findall finds all substrings matched by the regular expression and returns them as a list. If the pattern contains more than one capturing group, the result is a list of tuples; if nothing matches, the result is an empty list.
Note: match and search match once, findall matches all.
The syntax is:
re.findall(pattern, string, flags=0)
The findall method of a compiled pattern object also accepts optional start and end positions:
pattern.findall(string[, pos[, endpos]])
Description of function parameters:
| Parameter | Description |
| --- | --- |
| string | The string to match against. |
| pos | Optional (compiled pattern method only). The position in the string where the search starts; the default is 0. |
| endpos | Optional (compiled pattern method only). The position in the string where the search ends; the default is the length of the string. |
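The two less obvious behaviours of findall, tuples for multiple groups and the pos/endpos arguments, can be sketched as follows. The sample text is made up for illustration.

```python
import re

text = "hadoop111 spark222 hadoop333"

# Two capturing groups: each result is a tuple of the groups.
pairs = re.findall(r"([a-z]+)(\d+)", text)
print(pairs)  # [('hadoop', '111'), ('spark', '222'), ('hadoop', '333')]

# pos and endpos are only available on a compiled pattern's findall().
pattern = re.compile(r"hadoop\d+")
print(pattern.findall(text, 5))      # ['hadoop333']
print(pattern.findall(text, 0, 12))  # ['hadoop111']
```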
Example:
import re

print(re.findall("hadoop", "sdfhadoopsdhadoopwwsdf"))  # ['hadoop', 'hadoop']
print(re.findall(r"hadoop\d+", "hadoop111222bbbbccchadoop333444"))  # ['hadoop111222', 'hadoop333444']
Summary: re.match(), re.search() and re.findall()
- re.match() only matches at the beginning of the string; if the beginning of the string does not match the regular expression, the match fails and the function returns None
- re.search() and re.findall() are not anchored to the beginning of the string
- re.search() returns only the first match in the string, while re.findall() returns all matches
- re.search() returns a match object whose text is read with group(); re.findall() returns a plain list of the matched strings, so there is no group() method to call
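The three functions can be compared side by side on one input. This is a minimal sketch with an invented sample string.

```python
import re

text = "sdfhadoop1sdhadoop2"

at_start = re.match(r"hadoop\d", text)
first = re.search(r"hadoop\d", text)
every = re.findall(r"hadoop\d", text)

print(at_start)       # None
print(first.group())  # hadoop1
print(every)          # ['hadoop1', 'hadoop2']
```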
1.5 re.compile function
The compile function compiles a regular expression into a pattern object (Pattern), whose methods such as match() and search() can then be called.
The syntax format is:
re.compile(pattern[, flags])
Test Data
t1.4301.jyg.qcs.ipva.cn,0,1,1,PsARegion,4#37#6#85#272#70#268#17
t1.4301.jyg.qcs.ipva.cn,0,1,1,PsCRegion,275#94#9#105#13#147#285#140
t1.py.3895.qcs.ipva.cn,0,1,1,PsDRegion,84#86#228#88#216#138#98#139
t1.py.3895.qcs.ipva.cn,1,1,1,PsARegion,77#85#239#87#218#133#99#132
t1.8381.kf.qcs.ipva.cn,0,1,1,PsBRegion,125#145#320#146#330#207#67#210
t1.8381.kf.qcs.ipva.cn,0,1,1,PsCRegion,126#143#322#146#329#208#68#210
Requirement: print the lines that contain "PsARegion"
import re

f = open("/root/data.txt")  # returns a file object
line = f.readline()  # read the first line with the file's readline() method
text = ""
pattern = re.compile("PsARegion")
while line:
    if pattern.search(line):
        # append the matching line
        text += line
    line = f.readline()
print(text, end='')
f.close()
Execution result: the two lines that contain "PsARegion" are printed.
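The same filtering can also be written with a for loop over the file object. The sketch below is an alternative, not the approach used above; it keeps three of the test rows in memory with io.StringIO instead of reading /root/data.txt, so that it runs anywhere.

```python
import io
import re

# A few of the test rows, held in memory instead of /root/data.txt.
sample = io.StringIO(
    "t1.4301.jyg.qcs.ipva.cn,0,1,1,PsARegion,4#37#6#85#272#70#268#17\n"
    "t1.4301.jyg.qcs.ipva.cn,0,1,1,PsCRegion,275#94#9#105#13#147#285#140\n"
    "t1.py.3895.qcs.ipva.cn,1,1,1,PsARegion,77#85#239#87#218#133#99#132\n"
)

pattern = re.compile("PsARegion")

# Iterating over a file-like object yields one line at a time, like readline().
matched = [line for line in sample if pattern.search(line)]

print("".join(matched), end="")
```

With a real file, a with open(...) block would additionally close the file automatically.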
1.6 re.sub search and replace
Python's re module provides re.sub for replacing occurrences in strings.
Syntax:
re.sub(pattern, repl, string, count=0, flags=0)
Description of function parameters:
| Parameter | Description |
| --- | --- |
| pattern | The regular expression pattern string. |
| repl | The replacement string, which can also be a function. |
| string | The original string to be searched and replaced. |
| count | The maximum number of replacements to make; the default 0 replaces all matches. |
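The two less common options, a function as repl and a non-zero count, can be sketched like this. The helper double and the sample strings are hypothetical and exist only for the demo.

```python
import re

def double(match):
    """Hypothetical helper: replace a number with twice its value."""
    return str(int(match.group()) * 2)

# repl as a function: it is called once per match with the match object.
doubled = re.sub(r"\d+", double, "a1 b22 c333")
print(doubled)  # a2 b44 c666

# count=1 replaces only the first match.
first_only = re.sub(r"-", "", "2004-959-559", count=1)
print(first_only)  # 2004959-559
```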
Example:
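import re

phone = "2004-959-559 # This is a foreign phone number"
# Remove the Python-style comment from the string
num = re.sub(r'#.*$', "", phone)
print("Phone number: ", num)  # num is "2004-959-559 " (the space before the # remains)
# Remove every non-digit character, including the dashes
num = re.sub(r'\D', "", phone)
print("Phone number: ", num)  # num is "2004959559"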
1.7 re.split split
The split method splits the string according to the substrings that can be matched and returns a list. Its usage is as follows:
re.split(pattern, string[, maxsplit=0, flags=0])
Parameters:
| Parameter | Description |
| --- | --- |
| pattern | The regular expression to match. |
| string | The string to split. |
| maxsplit | The maximum number of splits; maxsplit=1 splits once. The default 0 means no limit. |
| flags | Flags that control how the regular expression is matched, such as case-insensitive or multi-line matching. |
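The effect of maxsplit, and of a pattern that matches more than one delimiter, can be sketched as follows. The rows are shortened variants of the test data format, used here as assumptions.

```python
import re

row = "t1.4301.jyg.qcs.ipva.cn,0,1,1,PsARegion,4#37#6#85"

# maxsplit=2 splits at the first two commas only; the rest stays in one piece.
limited = re.split(r",", row, maxsplit=2)
print(limited)
# ['t1.4301.jyg.qcs.ipva.cn', '0', '1,1,PsARegion,4#37#6#85']

# A character class splits on either delimiter.
fields = re.split(r"[,#]", "0,1,PsARegion,4#37#6")
print(fields)
# ['0', '1', 'PsARegion', '4', '37', '6']
```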
Test Data:
t1.4301.jyg.qcs.ipva.cn,0,1,1,PsARegion,4#37#6#85#272#70#268#17
t1.4301.jyg.qcs.ipva.cn,0,1,1,PsCRegion,275#94#9#105#13#147#285#140
t1.py.3895.qcs.ipva.cn,0,1,1,PsDRegion,84#86#228#88#216#138#98#139
t1.py.3895.qcs.ipva.cn,1,1,1,PsARegion,77#85#239#87#218#133#99#132
t1.8381.kf.qcs.ipva.cn,0,1,1,PsBRegion,125#145#320#146#330#207#67#210
t1.8381.kf.qcs.ipva.cn,0,1,1,PsCRegion,126#143#322#146#329#208#68#21
Requirement: use a regular expression to find the rows that contain "PsARegion" and extract their first two columns
import re

f = open("/root/data.txt")  # returns a file object
line = f.readline()  # read the first line with the file's readline() method
text = ""
pattern = re.compile("PsARegion")
while line:
    if pattern.search(line):
        # split the row on commas and keep the first two columns
        dataList = re.split(",", line)
        line = str(dataList[0]) + "," + str(dataList[1]) + "\n"
        text += line
    line = f.readline()
print(text, end='')
f.close()
Result (with the test data above):
t1.4301.jyg.qcs.ipva.cn,0
t1.py.3895.qcs.ipva.cn,1
1.8 Practical case: match files according to file name and move
move_file.py
# -*- coding:UTF-8 -*-
import logging
import os
import re
import shutil
import sys
from logging.handlers import RotatingFileHandler

# Initialize logging
logger = logging.getLogger('mylogger')
logger.setLevel(level=logging.INFO)
fmt = '%(asctime)s - %(pathname)s[line:%(lineno)d] - %(levelname)s: %(message)s'
format_str = logging.Formatter(fmt)
fh = RotatingFileHandler("move_file.log", maxBytes=10*1024*1024, backupCount=2, encoding="utf-8")
fh.namer = lambda x: "backup." + x.split(".")[-1]
fh.setFormatter(fmt=format_str)
logger.addHandler(fh)


def move_file(res_dir, tar_dir, pattern):
    """Move files.

    :param res_dir: source directory
    :param tar_dir: target directory
    :param pattern: regular expression to match file names against
    :return:
    """
    try:
        logger.info("Start moving files!")
        for filename in os.listdir(res_dir):
            # Build the full path of the file
            file_path = os.path.join(res_dir, filename)
            print(filename, pattern)
            # Match the file name against the regular expression
            if re.match(pattern, filename):
                # shutil complements the os module with copy, move, delete, compress and extract helpers
                shutil.move(file_path, tar_dir)
                print("Moved file [%s]" % filename)
        logger.info("Finished moving files!")
    except Exception as why:
        print(why)


if __name__ == "__main__":
    print(sys.argv)
    if len(sys.argv) == 4:
        move_file(sys.argv[1], sys.argv[2], sys.argv[3])
Invocation on Windows:
python.exe D:\\IPVA\\file_move\\move_file.py D:\\IPVA\Data_Traffic\\DataServerCloud01_AlarmEvent\\ D:\\IPVA\Data_Traffic_Bak\\DataServerCloud01_AlarmEvent\\ Data.*.COMPLETED
The files were moved successfully; afterwards I renamed them with the ren command.
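To see which names a pattern such as Data.*.COMPLETED selects before running the script on real data, the matching and moving logic can be tried in throwaway directories. This is a sketch under assumptions: the file names are hypothetical, and temporary directories stand in for the real source and target paths.

```python
import os
import re
import shutil
import tempfile

pattern = r"Data.*.COMPLETED"

# Hypothetical file names, created in temporary directories for the demo.
names = ["Data_20220101.COMPLETED", "Data_20220102.tmp", "OtherData.COMPLETED"]

with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as dst:
    for name in names:
        open(os.path.join(src, name), "w").close()

    for name in os.listdir(src):
        # re.match anchors at the start of the name only.
        if re.match(pattern, name):
            shutil.move(os.path.join(src, name), dst)

    moved = sorted(os.listdir(dst))
    left = sorted(os.listdir(src))

print(moved)  # ['Data_20220101.COMPLETED']
print(left)   # ['Data_20220102.tmp', 'OtherData.COMPLETED']
```

Note that the unescaped dot before COMPLETED matches any character, and re.match does not require the name to end there. A stricter pattern such as Data.*\.COMPLETED$ would be one option if that matters.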