Python Advanced Series Tutorial: Python Advanced Syntax and Regular Expressions

Learning objectives

1. Master the use of the with statement

2. Know the two ways to create a generator

3. Know the difference between deep copy and shallow copy

4. Master writing regular expressions in Python

1. Python Advanced Syntax

1. with statement and context manager

☆ with statement

Python provides the with statement, which is both simple and safe. When a file is opened in a with statement, it is closed automatically when the block ends, even if an exception occurs inside the block.

Here is a file operation written with the with statement:

# 1. Open the file for writing
with open('1.txt', 'w') as f:
    # 2. Write content to the file
    f.write('hello world')
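To make the "closed automatically, even on an exception" claim concrete, the following minimal sketch compares a manual try/finally version with the with version. The temporary directory and the simulated ValueError are assumptions made only for this illustration.

```python
import os
import tempfile

# Hypothetical scratch location so the sketch does not touch real files.
path = os.path.join(tempfile.mkdtemp(), "1.txt")

# Without with: the close() call has to be protected by try/finally.
f = open(path, "w")
try:
    f.write("hello world")
finally:
    f.close()

# With with: the file is closed when the block exits, even on an exception.
try:
    with open(path, "w") as g:
        g.write("hello world")
        raise ValueError("simulated failure inside the block")
except ValueError:
    pass

print(f.closed, g.closed)
```

Both file objects report closed afterwards, although the second block was left through an exception.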

2. Creating generators

A Python generator is a special kind of iterator that yields values on demand rather than all at once. Generators save memory because they produce data lazily, only when it is needed.

How to create a generator

① Generator expression

② yield keyword

☆ Generator expression

Generator expressions are a list comprehension-like syntax that can be used to create generator objects. Unlike list comprehensions, generator expressions use parentheses instead of square brackets, and return a generator object instead of a list.

The following is an example of a generator expression:

# Create a generator
my_generator = (i * 2 for i in range(5))
print(my_generator)

# Use next() to get the next value of the generator
# value = next(my_generator)
# print(value)

# Traverse the generator
for value in my_generator:
    print(value)

When next() is used to get the next value of the generator:

# Create a generator
my_generator = (i * 2 for i in range(5))
print(my_generator)

# next get the next value of the generator
value = next(my_generator)
print(value)

This code uses a generator expression to create a generator object, my_generator, which takes each integer from 0 to 4 and multiplies it by 2.

Next, the next function gets the next value from the generator object and assigns it to the variable value. Because generators produce values on demand, the first call to next makes the generator produce and return its first value, the second call produces the next value, and so on. Once the generator has no more values to produce, calling next again raises a StopIteration exception.

In this example, the first call to next produces the first value, 0, so the variable value is 0, and that is what gets printed.

The same generator can also be traversed with a for loop:

# Create a generator
my_generator = (i * 2 for i in range(5))
print(my_generator)

# Traverse the generator
for value in my_generator:
    print(value)

This code uses a generator expression to create a generator object, my_generator, which takes each integer from 0 to 4 and multiplies it by 2.

Next, a for loop iterates over the generator object, taking each value in turn and printing it. Because the generator produces values on demand, each iteration produces only the next value; the values are never all held in memory at once. In this example the generator produces 0, 2, 4, 6, and 8 in sequence; each one is assigned to the variable value and printed. The loop ends when the generator has no more values to produce.

Generator related functions:

next function: gets the next value in the generator
for loop: iterates over each value in the generator

☆ yield generator

Feature of a yield generator: the yield keyword appears inside a def function

def generator(n):
    for i in range(n):
        print('start generating...')
        yield i
        print('finish once...')

g = generator(5)
print(next(g))
print(next(g))
print(next(g))
print(next(g))
print(next(g))  # -----> normal
print(next(g))  # -----> error: raises StopIteration

Traceback (most recent call last):
  File "/Users/cndws/PycharmProjects/pythonProject/demo.py", line 14, in <module>
    print(next(g))
StopIteration

This code defines a generator function named generator, whose parameter n is the number of values to generate. Inside the function, a for loop produces n numbers, and the yield statement hands each number back as the next value of the generator. The messages 'start generating...' and 'finish once...' are printed before and after each yield, so you can see at which stage each piece of output is produced. The yield statement suspends the function and returns the yielded value; the next time next() is called, execution continues from just after the yield statement.

In the program, generator(5) creates a generator g, and next(g) is then called five times in a row. Each call makes the generator produce and return its next value, so five values are obtained and printed normally. On the sixth call the generator has no more values to produce, so a StopIteration exception is raised. This is the generator's default behavior; the exception can be caught in the program and handled as needed.

def generator(n):
    for i in range(n):
        print('start generating...')
        yield i
        print('finish once...')

g = generator(5)
for i in g:
    print(i)

This code is similar to the previous example and defines the same generator function to produce n numbers. The difference is that a for loop traverses the generator object g instead of repeated calls to next(). On each iteration the for loop automatically calls next() to get the next value of the generator, assigns it to the loop variable i, and then runs the loop body. Here the loop body is a single line that prints the value of i.

Output:

start generating...
0
finish once...
start generating...
1
finish once...
start generating...
2
finish once...
start generating...
3
finish once...
start generating...
4
finish once...

Process finished with exit code 0

Because generator(5) returns a generator object, it can be used directly in a for loop to traverse all of its values. On each iteration the generator produces the next value, which is assigned to the loop variable i and printed. Because values are produced on demand, every value is surrounded by the lines 'start generating...' and 'finish once...', which show at which stage each piece of output is produced. When the generator has no more values, the loop ends and the program finishes.

def generator(n):
    for i in range(n):
        print('start generating...')
        yield i
        print('finish once...')

g = generator(5)
while True:
    try:
        print(next(g))
    except StopIteration:
        break

This code is similar to the previous two examples and defines the same generator function. The difference is that a while loop and a try...except statement are used to get the values. Each pass of the while loop calls next() to get the next value of the generator and prints it. When the generator has no more values, next() raises StopIteration, the except clause catches it, and the break statement exits the loop.

This approach is more flexible: you control exactly when each value is requested, and you decide how to respond when the generator is exhausted.

Important points:

① Code execution pauses at yield and returns the result; the next time the generator is resumed, it continues from the paused position

② If the generator has finished producing data and you ask for the next item again, a StopIteration exception is raised, indicating that iteration has stopped

③ A while loop does not handle the StopIteration exception; the handling has to be added manually

④ A for loop handles the StopIteration exception automatically, which is more convenient, so the for loop is recommended.

☆ yield keyword and return keyword

If you don't understand yield very well, you can first look at yield as a sibling of return. They are all used in functions and perform the duty of returning some kind of result.

The difference between the two is:

A function with return returns its result directly: the function terminates, does not run any further, and its local variables are destroyed;

def example():
    x = 1
    return x

result = example()
print(result)

A function with yield returns an iterable generator object; you can use a for loop or call next() to traverse the generator object and extract the results.

def example():
    x = 1
    y = 10
    while x < y:
        yield x
        x += 1

gen = example()
print(gen)
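Printing a generator object only shows its representation, not its values. As a small sketch of how the values are actually pulled out, the example below uses a hypothetical count_up function (not part of the original code) and extracts results with next() and list().

```python
def count_up(start, stop):
    """Hypothetical generator function: yields start, start + 1, ..., stop - 1."""
    x = start
    while x < stop:
        yield x
        x += 1

gen = count_up(1, 10)
first = next(gen)   # runs the function body up to the first yield
rest = list(gen)    # resumes after that yield and drains the remaining values

print(first)
print(rest)
```

After the generator has been drained it is exhausted, so a second list(gen) would return an empty list.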

☆ Why use yield generator

 
 
 
 

# memory_profiler is a third-party package; install it with pip first
import memory_profiler as mem


# nums = [1, 2, 3, 4, 5]
# print([i*i for i in nums])


nums = list(range(10000000))
print('Memory before operation:', mem.memory_usage())
# list
# square_nums = [n * n for n in nums]
# generator
square_nums = (n * n for n in nums)
print('Memory after operation:', mem.memory_usage())

This code uses the memory_profiler module to measure the memory usage of the program.

It first imports the memory_profiler module and defines a list nums containing 10000000 numbers, then prints the memory usage before the operation. Next, the generator expression (n * n for n in nums) creates a new sequence square_nums that yields the square of each element in nums. A generator expression looks like a list comprehension, but it uses parentheses instead of square brackets and produces a generator object instead of a list. Its elements are computed one at a time, only when needed, rather than all at once and stored in memory.

Finally, the program prints the memory usage after the operation.

After running the code, you can see that the memory usage barely changes after the generator expression is created, because a generator expression does not compute and store its elements in advance. If you switch to the commented-out list comprehension, all 10000000 squares are stored and the memory usage rises noticeably. Generator expressions therefore reduce the memory footprint of a program without changing what it computes.
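If memory_profiler is not installed, a rough comparison is possible with the standard library alone. The sketch below uses sys.getsizeof, which reports only the size of the container object itself (for the list, its array of references, not the integers it points to), so treat the numbers as an indication rather than a full measurement. The smaller range is an assumption chosen to keep the example quick.

```python
import sys

nums = range(100_000)

squares_list = [n * n for n in nums]   # every value is computed and stored now
squares_gen = (n * n for n in nums)    # values are computed later, one at a time

list_size = sys.getsizeof(squares_list)
gen_size = sys.getsizeof(squares_gen)

print('list object size in bytes:', list_size)
print('generator object size in bytes:', gen_size)
```

The generator object stays small no matter how many values it will eventually produce.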

☆ yield and Fibonacci sequence

There is a famous sequence in mathematics called the Fibonacci sequence.

Requirements: starting from 0 and 1, each subsequent number is obtained by adding the previous two numbers. Leaving out the initial 0, the sequence looks like this:

Example: 1, 1, 2, 3, 5, 8, 13, 21, 34, ...

Now we use a generator to implement this Fibonacci sequence. Every time a value is taken, an algorithm is used to generate the next data. The generator only generates one data per call, which can save a lot of memory.

 
 
 
 

def fib(max):
    n, a, b = 0, 0, 1
    while n < max:
        yield b  # use yield
        # print(b)
        a, b = b, a + b
        n = n + 1

for n in fib(5):
    print(n)

This code uses a generator function to produce Fibonacci numbers and a for loop to print the first 5 numbers of the sequence.

First look at the definition of the generator function fib. It accepts a parameter max, the number of values to generate. Three variables are defined inside the function: n counts how many values have been produced so far, a is the previous number, and b is the current number in the sequence. Their initial values are 0, 0, and 1. The loop then uses the yield statement to return the current number b, updates a and b to compute the next number, and adds 1 to n. The loop runs until n reaches max.

Next comes the for loop. fib(5) calls the generator function and returns a generator object. The for loop fetches the numbers from this generator one at a time, as next() would, until the generator is exhausted after 5 values, and prints each number n. The output is the first 5 numbers of the sequence: 1, 1, 2, 3, and 5.

Note that fib stops on its own after max values because of the while n < max condition. A generator whose loop never ends would keep producing values, and the caller would have to limit the number of iterations or break out of the loop manually. In either case, because the function uses yield, only one number is produced per iteration instead of the whole sequence at once, which saves memory.
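For a generator that really is unbounded, the caller has to decide how many values to take. The sketch below assumes a hypothetical fib_forever variant (not in the original code) and limits it with itertools.islice from the standard library.

```python
from itertools import islice

def fib_forever():
    """Hypothetical unbounded variant: yields 1, 1, 2, 3, 5, ... without end."""
    a, b = 0, 1
    while True:
        yield b
        a, b = b, a + b

# islice stops after five values, so the endless loop is never run to completion.
first_five = list(islice(fib_forever(), 5))
print(first_five)
```

Calling list(fib_forever()) directly would never finish, which is why the limit belongs on the caller's side here.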

3. Deep and shallow copy

☆ Several concepts

  • Variable: an entry in a system table, with space for a link that points to an object

  • Object: a block of allocated memory that stores the value it represents

  • Reference: the pointer, created automatically, from a variable to an object

  • Type: belongs to the object, not to the variable

  • Immutable objects: objects that cannot be modified once created, including numeric types, strings, Booleans, and tuples

(The value in the memory that the object occupies cannot be changed. "Changing" such a variable therefore creates a new object at a new address, and the variable then points to this new address.)

  • Mutable objects: objects that can be modified, including lists, dictionaries, and sets

(The value in the memory that the object occupies can be changed. When the object is modified through a variable (more precisely, a reference), the value is changed directly: nothing is copied and no new address is allocated. In plain terms, it is an in-place change.)

When we write:

a = "python"

What the Python interpreter does:

① Create variable a

② Create an object (allocate a block of memory) to store the value 'python'

③ Connect variables and objects through pointers, and the connection from variables to objects is called reference (variable reference object)


☆ assignment

Assignment: copies only the reference to the object; no new memory space is allocated.

Assignment does not create an independent object. It just puts a new label on the original block of data, so when the data is changed through one label, the change is also visible through the other label.

☆ shallow copy

Shallow copy: creates a new object whose contents are references to the elements of the original object.

It is called a shallow copy because it copies only one layer, the outermost object itself; the elements inside are copied only as references.

Case 1: Assignment

 

Case 2: shallow copy of mutable type

 

Case 3: shallow copy of immutable type

 

Note: shallow copying of immutable types will not open up new memory space for the copied object, but only copy the reference of this object

Shallow copy has three forms: the slice operation, the factory function (list()), and the copy function in the copy module.

For example: lst = [1, 2, [3, 4]]

Slice operation: lst1 = lst[:] or lst1 = [each for each in lst]

Note: [:] is the same as [0:], meaning a slice from index 0 to the end. It returns a new list.

Factory function: lst1 = list(lst)

copy function: lst1 = copy.copy(lst)

But lst contains a nested list [3, 4], and if we modify that nested list, the situation is different.

Shallow copy should be discussed in two cases:

1) When the value being shallow copied is an immutable object (string, tuple, numeric type), the result is the same as with assignment: the id of the copy (the id() function returns the memory address of an object) is the same as the id of the original value.

2) When the value being shallow copied is a mutable object (list, dictionary, set), the result is a "not so independent" object. There are two cases:

The first case: the copied object contains no complex sub-objects. Changing the original value does not affect the shallow copy, and changing the shallow copy does not affect the original value. The id of the original value is different from the id of the shallow copy.

The second case: the copied object contains complex sub-objects (for example, an element of a list is itself a list). As long as the complex sub-object is not changed, changes to the shallow copy do not affect the original value. But changing the complex sub-object in the original value also affects the shallow copy.
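The three forms and the two cases can be checked directly with the is operator. This is a minimal sketch using the lst value from the text; the variable names by_slice, by_factory, and by_copy are chosen only for this illustration, and the tuple result reflects CPython behavior.

```python
import copy

lst = [1, 2, [3, 4]]

by_slice = lst[:]
by_factory = list(lst)
by_copy = copy.copy(lst)

for shallow in (by_slice, by_factory, by_copy):
    assert shallow is not lst     # the outer list is a new object
    assert shallow[2] is lst[2]   # the nested list is shared

# Changing the shared nested list is visible through every shallow copy.
lst[2].append(5)
print(by_slice, by_factory, by_copy)

# For an immutable object, copy.copy hands back the same object.
t = (1, 2, 3)
same_tuple = copy.copy(t) is t
print(same_tuple)
```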

☆ deep copy

Deep copy: Corresponding to shallow copy, deep copy copies all elements of the object, including multi-level nested elements. The object that is deep copied is a brand new object that is no longer associated with the original object.

Therefore, changing the original copied object will not affect the new object that has been copied. There is only one form, the deepcopy function in the copy module.

Mutable type deep copy:

 

Deep copy of immutable type: a deep copy of an immutable object whose contents are all immutable does not allocate new memory for the copy; it only copies the reference to the object
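The statement about immutable types has one caveat worth seeing in code: a tuple that contains a mutable element is still copied. The sketch below shows both situations; the identity result for the plain tuple is CPython behavior, and the variable names are assumptions for this illustration.

```python
import copy

plain = (1, 2, 3)         # immutable all the way down
nested = (1, 2, [3, 4])   # immutable container with a mutable element

same_object = copy.deepcopy(plain) is plain
deep_nested = copy.deepcopy(nested)

print(same_object)
print(deep_nested is nested, deep_nested[2] is nested[2])
```

In the second case both the tuple and the list inside it are new objects, so later changes to nested[2] do not reach deep_nested.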

☆ Case presentation

Case 1: Assignment, shallow copy, and deep copy of a mutable object

import copy
a = [1, 2, 3]

print("===== Assignment =====")
b = a
print(a)
print(b)
print(id(a))
print(id(b))

print("===== Shallow Copy =====")
b = copy.copy(a)
print(a)
print(b)
print(id(a))
print(id(b))

print("===== Deep Copy =====")
b = copy.deepcopy(a)
print(a)
print(b)
print(id(a))
print(id(b))

Result (the id values differ from run to run):

===== Assignment =====
[1, 2, 3]
[1, 2, 3]
37235144
37235144
===== Shallow Copy =====
[1, 2, 3]
[1, 2, 3]
37235144
37191432
===== Deep Copy =====
[1, 2, 3]
[1, 2, 3]
37235144
37210184

Summary:

Assignment: values equal, addresses equal

copy (shallow copy): values equal, addresses not equal

deepcopy (deep copy): values equal, addresses not equal

Case 2: Deep and shallow copy for mutable objects (outer layer changes elements)

 
 
 
 

import copy
l = [1, 2, 3, [4, 5]]

l1 = l                 # assignment
l2 = copy.copy(l)      # shallow copy
l3 = copy.deepcopy(l)  # deep copy
l.append(6)

print(l)
print(l1)
print(l2)
print(l3)

Result:

[1, 2, 3, [4, 5], 6]  # l: element 6 appended
[1, 2, 3, [4, 5], 6]  # l1: also gains element 6
[1, 2, 3, [4, 5]]     # l2: unchanged
[1, 2, 3, [4, 5]]     # l3: unchanged

Case 3: Deep and shallow copy for mutable objects (inner layer changes elements)

 
 
 
 

import copy
l = [1, 2, 3, [4, 5]]

l1 = l                 # assignment
l2 = copy.copy(l)      # shallow copy
l3 = copy.deepcopy(l)  # deep copy
l[3].append(6)

print(l)
print(l1)
print(l2)
print(l3)

Result:

[1, 2, 3, [4, 5, 6]]  # l[3]: element 6 appended
[1, 2, 3, [4, 5, 6]]  # l1: also gains element 6
[1, 2, 3, [4, 5, 6]]  # l2: also gains element 6
[1, 2, 3, [4, 5]]     # l3: unchanged

Summary:

① When an element is added to the outer layer, the shallow copy does not change with the original list; when an element is added to the inner layer, the shallow copy changes too.

② No matter how the original list changes, the deep copy remains unchanged.

③ The object created by assignment always changes along with the original list.

2. Overview of regular expressions

1. Why learn regular expressions

In real development work you often need to find strings that follow some complex rule, such as email addresses, image URLs, or mobile phone numbers. When you want to match or find strings that follow such rules, you can use regular expressions.

In practice, regular expressions play an important role in text processing, data analysis, web crawling, data cleaning, and other fields. Here are a few reasons to learn them:

  • Text processing: In text processing, we need to perform operations such as searching, replacing, and splitting text, and regular expressions can help us complete these tasks quickly and accurately. For example, we can use regular expressions to search for text that contains a specific word or phrase, or to replace some characters in a piece of text with other characters. 

  • Data analysis: In data analysis, we need to perform operations such as cleaning, extracting, and converting data, and regular expressions can help us complete these tasks quickly and efficiently. For example, we can use regular expressions to extract numbers, dates, phone numbers and other information in a piece of text, or convert some data formats. 

  • Web crawler: In web crawlers, we need to parse, extract, filter and other operations on web pages, and regular expressions can help us complete these tasks quickly and accurately. For example, we can use regular expressions to extract information such as links, pictures, and videos in web pages, or to filter out web pages that meet certain conditions. 

  • Programming language: In programming language, regular expression is also a commonly used tool, which can help us perform string matching, replacement, segmentation and other operations. For example, in Python, we can use the re module to use regular expressions for string manipulation. 

To sum up, learning regular expressions can help us improve work efficiency in text processing, data analysis, web crawlers, programming and other fields, and it is also one of the necessary skills for programmers.

2. What is a regular expression

A regular expression describes a string-matching pattern. It can be used to check whether a string contains a certain kind of substring, to replace the matched substrings, or to extract substrings that meet a certain condition from a string, and so on.

Pattern: a specific string pattern, composed of special symbols.

A certain kind: this can also be understood as fuzzy matching.

Exact match: select * from blog where title='python';

Fuzzy matching: select * from blog where title like '%python%';

Regular expressions are not unique to Python. They are also supported in languages such as Java, PHP, Go, and JavaScript.

3. Functions of regular expressions

① Data verification (form verification, such as mobile phone, email, IP address)

② Data retrieval (data retrieval, data capture)

③ Data hiding (1356235 Mr. Wang)

④ Data filtering (forum sensitive keyword filtering) …

3. Introduction of re module

1. What is the re module

When you need to match strings with regular expressions in Python, you can use the re module.

2. Three steps for using the re module

# Step 1: Import the re module
import re

pattern = r'hello'      # the regular expression (sample value)
string = 'hello world'  # the string to match (sample value)

# Step 2: Use the match method to perform the matching operation
result = re.match(pattern, string, flags=0)
# Step 3: If the match succeeds, use the group method to extract the data
if result:
    print(result.group())

Parameters of the match function:

  • pattern: the regular expression to match

  • string: the string to match

  • flags: flag bits that control how the regular expression is matched, such as whether matching is case-sensitive or multi-line. See: Regex Modifiers - Optional Flags

The re.match method returns a Match object if the match succeeds; otherwise it returns None.

We can use the Match object's group(num) or groups() method to get the matched data.

A regular expression can take optional flag modifiers that control how the pattern is matched. Multiple flags can be combined with bitwise OR (|). For example, re.I | re.M sets both the I and M flags:

  • re.I: makes matching case-insensitive

  • re.L: locale-aware matching, meant for multi-language character-set environments. For example, with byte strings \w stands for [a-zA-Z0-9_], that is, English letters, digits, and the underscore, so by default it cannot match "é" or "ç" in a French environment; with the L flag it can. It does not help with Chinese text, which it still cannot match. In Python 3, str patterns are Unicode-aware by default, and re.L can be used only with bytes patterns.

  • re.M: multi-line matching, affects ^ and $

  • re.S: makes . match all characters, including newlines

  • re.U: parses characters according to the Unicode character set. This flag affects \w, \W, \b, \B. In Python 3 this is already the default for str patterns.

  • re.X: VERBOSE mode. Whitespace and # comments inside the pattern are ignored, which lets you lay out a complex regular expression (for example, one that matches email addresses) so that it is easier to read.
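A short sketch of the flags in use may help. The sample text and the phone-style pattern are assumptions made for this illustration; only standard re flags are used.

```python
import re

text = "Python\npython rocks"

# re.I ignores case; re.M lets ^ match at the start of every line.
starts = re.findall(r"^python", text, flags=re.I | re.M)
print(starts)

# Without re.S the dot cannot match the newline between the two lines.
no_dotall = re.match(r"Python.python", text)
with_dotall = re.match(r"Python.python", text, flags=re.S)
print(no_dotall is None, with_dotall is not None)

# re.X allows whitespace and # comments inside the pattern.
phone = re.compile(r"""
    1        # first digit
    [3-9]    # second digit
    \d{9}    # remaining nine digits
""", re.X)
is_phone = bool(phone.match("13566128753"))
print(is_phone)
```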

3. Related methods of re module

☆ re.match(pattern, string, flags=0)

  • Matches from the beginning of the string. If the match succeeds, a Match object is returned; otherwise None is returned

Parameter Description:

  • pattern: The regular expression that needs to be matched.

  • string: The string to match.

  • flags: optional parameter, used to control how the regular expression is matched.

The re.match() function will match the regular expression from the beginning of the string. If the match is successful, it returns a Match object, otherwise it returns None. The Match object contains the matching result information, which can be obtained by calling the method of the Match object.

It should be noted that re.match() will only match the beginning of the string. If you need to match the regular expression in the entire string, you can use the re.search() function.

For example, the following code demonstrates how to use the re.match() function to match a regular expression from the beginning of a string:

 
 
 
 

import re

string = "hello, world"
pattern = r"hello"

match_obj = re.match(pattern, string)
if match_obj:
    print("Match successful")
else:
    print("Match failed")

The output is:

Match successful

In the above code, we use the re.match() function to match a regular expression from the beginning of the string. Since the beginning of the string is "hello", the match is successful.
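The difference between re.match() and re.search() shows up as soon as the pattern is not at the very start of the string. A minimal sketch, with a sample string chosen for this illustration:

```python
import re

text = "say hello, world"

# re.match only tries at position 0, so it fails here.
at_start = re.match(r"hello", text)
print(at_start)

# re.search scans the whole string and finds the first occurrence.
found = re.search(r"hello", text)
print(found.group(), found.start())
```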

☆ re.findall(pattern, string, flags=0)

Parameter Description:

  • pattern: The regular expression that needs to be matched.

  • string: The string to match.

  • flags: optional parameter, used to control how the regular expression is matched.

The re.findall() function searches a string for a regular expression and returns all matching results. The return result is a list, and each element in the list is a matching result.

For example, the following code demonstrates how to use the re.findall() function to search for numbers in a string:

 
 
 
 

import re

string = "I have 2 apples and 3 oranges"
pattern = r"\d+"

result = re.findall(pattern, string)
print(result)

The output is:

['2', '3']

In the above code, we use the re.findall() function to search for numbers in the string, and since the string contains the two numbers 2 and 3, a list containing these two numbers is returned.

☆ re.finditer(pattern, string, flags=0)

  • Works like findall above, but returns an iterator of Match objects

Parameter Description:

  • pattern: The regular expression that needs to be matched.

  • string: The string to be searched.

  • flags: optional parameter, the matching mode:

  • re.I makes matching case-insensitive (I stands for Ignore case)

  • re.S makes . match all characters including newlines

  • re.M multi-line mode, affects ^ and $

The re.finditer() function searches a string for a regular expression and returns all matching results. The return result is an iterator, and the matching results can be obtained one by one through the iterator.

For example, the following code demonstrates how to use the re.finditer() function to search for numbers in a string:

 
 
 
 

import re

string = "I have 2 apples and 3 oranges"
pattern = r"\d+"

iter_obj = re.finditer(pattern, string)
for match_obj in iter_obj:
    print(match_obj.group())

The output is:

 
 
 
 

2
3

In the above code, we use the re.finditer() function to search for numbers in the string. Since the string contains the two numbers 2 and 3, the returned iterator yields two Match objects, one for each number. We traverse the iterator with a for loop, get the matches one by one, and print the matched text.
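Because finditer yields Match objects rather than plain strings, each result also carries its position. A small sketch using the same sample string:

```python
import re

string = "I have 2 apples and 3 oranges"

# Collect the matched text together with its start and end index.
positions = [(m.group(), m.start(), m.end()) for m in re.finditer(r"\d+", string)]
print(positions)
```

This is the main practical reason to choose finditer over findall: the position information is available without a second search.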

4. Quick Start with Regular Expressions

Case 1: Find whether there is a number "8" in a string

 
 
 
 

import re


result = re.findall('8', '13566128753')
# print(result)
if result:
    print(result)
else:
    print('No data was matched')

Case 2: Find out if there are numbers in a string

 
 
 
 

import re


result = re.findall(r'\d', 'a1b2c3d4f5')
# print(result)
if result:
    print(result)
else:
    print('No data was matched')

Case 3: Find whether there is a non-number in a string

 
 
 
 

import re


result = re.findall(r'\D', 'a1b2c3d4f5')
# print(result)
if result:
    print(result)
else:
    print('No data was matched')

4. Detailed explanation of regular expressions

Writing a regular expression takes three steps: what to check, how much to check, and where to check.

Regular expressions are usually composed of two kinds of data: ordinary characters and metacharacters.

Ordinary characters: 0123456789abcd@...

Metacharacters: symbols with a special meaning in regular expressions => [0-9], ^, *, +, ?

1. What to check

Code and function:

  • . (dot): matches any single character (except \n)

  • [ ]: matches one of the characters listed inside [ ]; technical term => character cluster

  • [^specified characters]: matches one character other than the specified characters; the ^ here is called a caret

  • \d: matches a digit, i.e. 0-9

  • \D: matches a non-digit

  • \s: matches whitespace, i.e. spaces, tabs, and the like

  • \S: matches non-whitespace

  • \w: matches non-special characters, i.e. a-z, A-Z, 0-9, _

  • \W: matches special characters, i.e. anything that is not a letter, digit, or underscore

Common writing of character clusters:

① [abcdefg] means match any character in abcdefg characters (1)

② [aeiou] means to match any one of the five characters a, e, i, o, u

③ [az] means match any one of the 26 characters between az

④ [AZ] means match any one of the 26 characters between AZ

⑤ [0-9] means match any one of the 10 characters between 0-9

⑥ [0-9a-zA-Z] means match any character between 0-9, az, AZ

Combining a character class with a leading caret (^) negates it:

① [^aeiou] matches any one character except a, e, i, o, u

② [^a-z] matches any one character except a-z

\d is equivalent to [0-9] for ASCII text: it matches any one digit from 0 to 9

\D is equivalent to [^0-9] for ASCII text: it matches exactly one non-digit character
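
As a quick check of the table and the character-class rules above, the following sketch runs a few of the patterns through re.findall. The sample string is made up purely for illustration.

```python
import re

# Hypothetical sample text, chosen only to exercise each pattern
sample = 'a1 B2_c3!'

digits = re.findall(r'\d', sample)         # digits only
vowels = re.findall(r'[aeiou]', sample)    # any one of the listed vowels
not_lower = re.findall(r'[^a-z]', sample)  # anything except a-z
word_chars = re.findall(r'\w', sample)     # letters, digits, underscore

print(digits)      # ['1', '2', '3']
print(vowels)      # ['a']
print(not_lower)   # ['1', ' ', 'B', '2', '_', '3', '!']
print(word_chars)  # ['a', '1', 'B', '2', '_', 'c', '3']
```

Each pattern here matches exactly one character at a time, which is why every list element is a single character.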

2. How much to check

Code     Function
*        Matches the previous character 0 or more times, i.e. it is optional (0 to many)
+        Matches the previous character 1 or more times, i.e. at least once (1 to many)
?        Matches the previous character 0 or 1 times, i.e. once or not at all
{m}      Matches the previous character exactly m times, e.g. \d{11} for a mobile phone number
{m,}     Matches the previous character at least m times, e.g. \w{3,} requires at least 3 occurrences with no upper limit
{m,n}    Matches the previous character m to n times, e.g. \w{6,10} allows 6 to 10 occurrences

Basic syntax: a character matcher (such as ., \w, or \S) followed by a quantifier that says how many times to match. For example, \w{6,10} matches 6 to 10 word characters (no space is allowed inside the braces), and .* matches any character 0 or more times.
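
The quantifiers can be compared side by side with a small sketch. The test strings below are invented for illustration; only the \d{11} pattern comes from the table above.

```python
import re

# {m}: exactly 11 digits, the mobile-number pattern from the table
phone_ok = re.fullmatch(r'\d{11}', '13566128753') is not None
phone_short = re.fullmatch(r'\d{11}', '1356612875') is not None  # only 10 digits

# Hypothetical text used to compare *, + and ? applied to the letter b
text = 'a ab abbb'
star = re.findall(r'ab*', text)      # b may appear 0 or more times
plus = re.findall(r'ab+', text)      # b must appear at least once
optional = re.findall(r'ab?', text)  # b appears 0 or 1 times

# {m,n}: between 2 and 3 digits, as many as possible
ranged = re.findall(r'\d{2,3}', '1 22 333 4444')

print(phone_ok, phone_short)  # True False
print(star)                   # ['a', 'ab', 'abbb']
print(plus)                   # ['ab', 'abbb']
print(optional)               # ['a', 'ab', 'ab']
print(ranged)                 # ['22', '333', '444']
```

Quantifiers are greedy by default, so \d{2,3} takes three digits from 4444 and leaves a single 4 that is too short to match again.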

3. Where to check

Code     Function
^        Matches the start of the string
$        Matches the end of the string
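
A minimal sketch of the two anchors follows. The 'hello world' inputs are made up for illustration; the mobile-number pattern 1[3-9]\d{9} is the one that appears in the toolbox link in this section.

```python
import re

# ^ anchors the match to the start of the string
at_start = re.search(r'^hello', 'hello world')
not_at_start = re.search(r'^world', 'hello world')

# $ anchors the match to the end of the string
at_end = re.search(r'world$', 'hello world')

# Using both anchors forces the whole string to fit the pattern
phone_pattern = r'^1[3-9]\d{9}$'
valid = re.search(phone_pattern, '13566128753')
invalid = re.search(phone_pattern, 'tel:13566128753')

print(at_start is not None)      # True
print(not_at_start is not None)  # False
print(at_end is not None)        # True
print(valid is not None)         # True
print(invalid is not None)       # False
```

Without the anchors, the last input would match, because the 11 digits still occur somewhere inside the string.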

Extension: regular expression toolbox

https://c.runoob.com/front-end/854/

https://c.runoob.com/front-end/7625/#!flags=&re=1[3-9]\d{9}

Crawlers => XPath, a tool specialized for crawlers

crawler + regular expressions

crawler + XPath

Focus on learning SQL

5. Several important concepts

1. Subexpression (also known as grouping)

In regular expressions, the content enclosed by a pair of parentheses is called a "subexpression".

 
 
 
 

re.search(r'\d(\d)(\d)','abcdef123ghijklmn')

Note: The r before a Python regular expression marks a raw string. It tells Python to keep the content inside the quotation marks exactly as written, which avoids the trouble of escaping backslashes more than once.

In the regular expression \d(\d)(\d), each (\d) is a subexpression; the two pairs of parentheses define two subexpressions.

Note: If the pattern contains groups, findall returns a list of what the groups matched rather than the full matches. findall is therefore a poor fit for grouped patterns; use search (one match) or finditer (multiple matches) instead.
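
The difference described in the note can be seen with the same pattern and string used in this section. This is a small sketch, not part of the original tutorial code.

```python
import re

str1 = 'abcdef123ghijklmn'
pattern = r'\d(\d)(\d)'

# With groups in the pattern, findall returns only the group contents
found = re.findall(pattern, str1)

# finditer yields match objects, so the full match is still available
full_matches = [m.group() for m in re.finditer(pattern, str1)]

print(found)         # [('2', '3')]
print(full_matches)  # ['123']
```

The leading digit 1 is part of the match but sits outside both groups, so findall silently drops it.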

2. Capture

When a regular expression matches content in the string, the content matched by each subexpression is automatically stored in a corresponding buffer. The buffers are numbered starting from 1; in Python they are read with group(1), group(2), and so on.

Case presentation:

 
 
 
 

import re


# Match three consecutive digits and capture the second and third ones
str1 = 'abcdef123ghijklmn'
result = re.search(r'\d(\d)(\d)', str1)
print(result.group())   # the full match: 123
print(result.group(1))  # the first group: 2
print(result.group(2))  # the second group: 3

3. Backreference

In regular expressions, we can refer to the content of a buffer with \n, where n is the buffer number (for example \1 or \2). This is called a "backreference".

① 4 consecutive digits: re.search(r'\d\d\d\d', str1)

This matches 1234, 5678, 6789

② 4 consecutive digits that are all the same, such as 1111, 2222, 3333, 4444, 5555:

re.search(r'(\d)\1\1\1', str1)

4. A few practice questions

① Find four consecutive numbers, such as: 3569

Answer: \d\d\d\d or \d{4}

② Find the same four consecutive numbers, such as: 1111

Answer: (\d)\1\1\1

③ Find mirrored numbers, such as: 1221, 3443

Answer: (\d)(\d)\2\1. The first () is stored in buffer 1 and is referenced with \1; the second () is stored in buffer 2 and is referenced with \2.

④ Find characters, such as: AABB, TTMM (hint: for A-Z, use the regular expression [A-Z])

Answer: ([A-Z])\1([A-Z])\2

⑤ Find the same four consecutive digits or four consecutive characters (hint: \w)

Answer: (\w)\1\1\1, which matches for example 1111, aaaa, bbbb
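
The practice answers can be checked with a short sketch. The test string below is hypothetical and was put together so that it contains one example for each question.

```python
import re

# Hypothetical test string with one sample per practice question
text = 'id 3569, 1111, 1221, AABB, aaaa'

# No groups in the pattern, so findall returns the full matches
four_digits = re.findall(r'\d{4}', text)

# Patterns with groups: use finditer to get the full match
same_four = [m.group() for m in re.finditer(r'(\w)\1\1\1', text)]
mirrored = [m.group() for m in re.finditer(r'(\d)(\d)\2\1', text)]
pairs = [m.group() for m in re.finditer(r'([A-Z])\1([A-Z])\2', text)]

print(four_digits)  # ['3569', '1111', '1221']
print(same_four)    # ['1111', 'aaaa']
print(mirrored)     # ['1111', '1221']
print(pairs)        # ['AABB']
```

Note that 1111 also satisfies the pattern from question ③, because nothing in (\d)(\d)\2\1 requires the two captured digits to differ.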

6. Other methods of regular expressions

1. The alternation operator

| matches any one of several alternative rules. Case: match the string hellojava or hellopython

 
 
 
 

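import re


# Renamed from str to avoid shadowing the built-in str type
str1 = 'hellojava, hellopython'
# finditer returns an iterator, which is truthy even when it is empty,
# so collect the matches into a list before testing it
result = list(re.finditer(r'hello(java|python)', str1))
if result:
    for i in result:
        print(i.group())
else:
    print('No data was matched')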

2. Group alias

Code           Function
(?P<name>)     Gives the group an alias (name)
(?P=name)      References the string matched by the group whose alias is name

Case: match <book></book>

 
 
 
 

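# Import the module
import re

str1 = '<book></book>'
# (?P<mark>\w+) names the opening tag; (?P=mark) requires the same text in the closing tag
result = re.search(r'<(?P<mark>\w+)></(?P=mark)>', str1)

print(result.group())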
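
A named group can also be read back by its alias. The sketch below reuses the pattern from the case above; the mismatched input '<book></page>' is an invented counterexample.

```python
import re

pattern = r'<(?P<mark>\w+)></(?P=mark)>'

result = re.search(pattern, '<book></book>')
tag = result.group('mark')     # read the group by its alias
by_number = result.group(1)    # the same group is still number 1
as_dict = result.groupdict()   # all named groups as a dictionary

# Hypothetical input whose closing tag differs from the opening tag
mismatch = re.search(pattern, '<book></page>')

print(tag)        # book
print(by_number)  # book
print(as_dict)    # {'mark': 'book'}
print(mismatch)   # None
```

Because (?P=mark) must repeat exactly what the named group captured, the mismatched tags do not match at all.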

3. Comprehensive case

①Requirement: In the list ["apple", "banana", "orange", "pear"], match apple and pear

 
 
 
 

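import re

list1 = ["apple", "banana", "orange", "pear"]
str1 = str(list1)
# Collect the matches into a list: the iterator from finditer is always truthy
result = list(re.finditer(r'(apple|pear)', str1))
if result:
    for i in result:
        print(i.group())
else:
    print('No data was matched')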

② Requirement: match email addresses from 163, 126, qq, and similar providers

 
 
 
 

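import re

# Note: the sample addresses appear obscured as '[email protected]' in the source;
# replace them with real test addresses before running
email = '[email protected], [email protected], [email protected]'
# Raw string, and \. so that the dot matches a literal '.' only
result = list(re.finditer(r'\w+@(qq|126|163)\.com', email))
if result:
    for i in result:
        print(i.group())
else:
    print('No data was matched')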

③ Requirement: match data such as qq:10567 and extract the text qq and the QQ number

 
 
 
 

import re

str1 = 'qq:10567'
# Split on the colon: result[0] is the label, result[1] is the number
result = re.split(r':', str1)
if result:
    print(f'{result[0]} number: {result[1]}')
else:
    print('No data was matched')
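
The split-based solution works when the input is known to be well formed. As an alternative sketch (not the original author's solution), the same two pieces can be pulled out with capture groups, which also checks the shape of the input. The second input 'qq:abc' is an invented counterexample.

```python
import re

pattern = r'(qq):(\d+)'

# fullmatch requires the whole string to fit the pattern
match = re.fullmatch(pattern, 'qq:10567')
label = match.group(1)   # the text before the colon
number = match.group(2)  # the digits after the colon

# Hypothetical malformed input: the part after the colon is not numeric
bad = re.fullmatch(pattern, 'qq:abc')

print(label, number)  # qq 10567
print(bad)            # None
```

With re.split, 'qq:abc' would still be split into two parts, whereas the grouped pattern rejects it.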

Origin blog.csdn.net/Blue92120/article/details/131332545