This article looks at one of Python's most useful tools: regular expressions. Regular expressions are a text processing tool that can be used to match, search, replace, and parse text. We will show step by step how to use regular expressions in Python, including the basic syntax, common usage, and some advanced techniques. In the final "One More Thing" section, we'll explore a little-known but very useful regular expression feature.

Simple regular expression matching

In Python, the re module provides support for regular expressions. Let's start with the simplest character matching.

import re

# Check whether the string contains the letter "a"
txt = "Hello, world!"
match = re.search("a", txt)
print(match)  # Output: None, because "a" does not occur in the string

In this example, we used the re.search() function to check whether a string contains "a". It returns a match object if the pattern is found and None otherwise. This is the most basic kind of matching, but it is already useful. For example, you can check whether an email address contains "@" this way.
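The "@" check mentioned above might be sketched as follows. The helper name contains_at_sign and the sample addresses are made up for illustration, and the check only tests for the presence of "@". It does not validate the address.

```python
import re

def contains_at_sign(address: str) -> bool:
    """Return True if the string contains an "@" character (hypothetical helper)."""
    # re.search scans the whole string and returns a match object or None.
    return re.search("@", address) is not None

print(contains_at_sign("alice@example.com"))  # True
print(contains_at_sign("alice.example.com"))  # False

# A match object also records where the match was found.
at_match = re.search("@", "alice@example.com")
print(at_match.start())  # 5
```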

Use metacharacters

The real power of regular expressions lies in metacharacters such as ., *, ?, and []. The following example uses the . (dot) metacharacter, which matches any single character except a newline.

txt = "Hello, world!"
match = re.search("H.llo", txt)
print(match.group())  # Output: Hello

In this example, the . character matches "e", so "H.llo" matches "Hello".
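The other metacharacters listed above work in a similar way. The following is a minimal sketch with made-up sample strings, showing *, ?, and [] one at a time.

```python
import re

# "*" repeats the preceding item zero or more times.
star_match = re.search("Hello*", "Hellooo, world!")
print(star_match.group())  # Hellooo

# "?" makes the preceding item optional.
optional_matches = [re.search("colou?r", word).group() for word in ("color", "colour")]
print(optional_matches)  # ['color', 'colour']

# "[]" matches any single character listed inside the brackets.
set_match = re.search("gr[ae]y", "A grey cat")
print(set_match.group())  # grey
```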

Use a predefined character set

Sometimes we want to match a class of characters rather than one specific character. For example, we might want to match any digit. Python's regular expressions provide predefined character sets for this purpose. For instance, \d represents any digit.

txt = "123 Hello, world!"
match = re.search(r"\d+", txt)  # raw string, so the backslash reaches the regex engine unchanged
print(match.group())  # Output: 123

In this example, \d+ matches the run of digits "123".
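Besides \d, the re module also offers \w (letters, digits, and underscore) and \s (whitespace). Here is a small sketch of all three. The sample string is invented for illustration.

```python
import re

sample = "Order 66 shipped on 2024-01-15"

# \d+ finds every run of digits.
digits = re.findall(r"\d+", sample)
print(digits)  # ['66', '2024', '01', '15']

# \w+ finds runs of word characters; "-" is not a word character.
words = re.findall(r"\w+", sample)
print(words)  # ['Order', '66', 'shipped', 'on', '2024', '01', '15']

# \s+ matches runs of whitespace, which makes it handy for splitting.
parts = re.split(r"\s+", sample)
print(parts)  # ['Order', '66', 'shipped', 'on', '2024-01-15']
```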

Grouping and capturing

We can use parentheses () to create subpatterns, or groups, and use the group() method of the match object to retrieve what each group captured.

txt = "123 Hello, world!"
match = re.search(r"(\d+) (Hello),", txt)
print(match.group(1))  # Output: 123
print(match.group(2))  # Output: Hello
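As a small extension of the example above, the sketch below shows a few other ways to read captured groups from the same match object: group(0) for the whole match, groups() for all groups at once, and span() for a group's position.

```python
import re

txt = "123 Hello, world!"
match = re.search(r"(\d+) (Hello),", txt)

# group(0) is the entire matched text, not just one group.
print(match.group(0))  # 123 Hello,

# groups() returns every captured group as a tuple, which unpacks nicely.
number, greeting = match.groups()
print(number, greeting)  # 123 Hello

# span(n) gives the start and end index of group n in the original string.
print(match.span(1))  # (0, 3)
```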

Use lookahead assertions

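A lookahead is an advanced technique that lets us require that something follows the match without consuming it. For example, we might want to extract the text of every sentence that ends with a period, without including the period itself.

txt = "Hello. My name is Python. Nice to meet you."
# [^.\s][^.]* matches a run of non-period characters that starts with a non-space;
# (?=\.) requires a period right after it without consuming it
matches = re.findall(r"[^.\s][^.]*(?=\.)", txt)
for match in matches:
    print(match)  # Output (one per line): Hello, My name is Python, Nice to meet you

In this example, [^.\s][^.]*(?=\.) matches each sentence that is followed by a period, but the period is not consumed, so it does not appear in the results. The character class [^.] is used rather than .*? because a pattern that can match the empty string would also return an empty match at each period.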
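Lookaheads also come in a negative form, (?!...), which requires that something does not follow. The sketch below contrasts the two on an invented price list. The currency codes and variable names are assumptions made for illustration.

```python
import re

# Made-up sample data for illustration.
prices = "apple 30USD, pear 45EUR, plum 12USD"

# Positive lookahead: digits that are followed by "USD".
# "USD" itself is checked but not included in the match.
usd_amounts = re.findall(r"\d+(?=USD)", prices)
print(usd_amounts)  # ['30', '12']

# Negative lookahead: digits that are NOT followed by "USD".
# The extra \d alternative stops the engine from backtracking to a
# shorter number such as "3" in "30USD" and reporting that instead.
non_usd_amounts = re.findall(r"\d+(?!\d|USD)", prices)
print(non_usd_amounts)  # ['45']
```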

Character sets and ranges

We've already discussed predefined character sets such as \d. Sometimes, however, we need a custom character set, which we can define with square brackets []. For example, we can create a character set that contains only lowercase letters.

txt = "Hello, World!"
match = re.search("[a-z]+", txt)
print(match.group())  # Output: ello

In this example, [a-z]+ matches the run of consecutive lowercase letters "ello". Note that the initial letter "H" of "Hello" is not matched because it is capitalized.
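Custom sets can combine several ranges, list individual characters, or be negated with a leading ^. A minimal sketch follows. The sample string is made up.

```python
import re

txt = "Hello, World! 2024"

# Several ranges can be combined inside one set.
letters = re.findall(r"[A-Za-z]+", txt)
print(letters)  # ['Hello', 'World']

# A set can simply list the characters it accepts.
vowels = re.findall(r"[aeiou]", txt)
print(vowels)  # ['e', 'o', 'o']

# A leading ^ negates the set: anything that is not a letter or whitespace.
others = re.findall(r"[^A-Za-z\s]+", txt)
print(others)  # [',', '!', '2024']
```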

Greedy and non-greedy matching

Quantifiers in Python's regular expressions are greedy by default, which means they match as many characters as possible. Sometimes we want non-greedy matching instead, which we get by adding a question mark ? after the quantifier.

txt = "12345"
match = re.search(r"\d+?", txt)
print(match.group())  # Output: 1

In this example, \d+? performs a non-greedy match, so only the single digit "1" is matched.
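The difference is easier to see when a delimiter appears more than once. The sketch below uses a made-up HTML-like string. It illustrates greediness only and is not a recommended way to parse HTML.

```python
import re

html = "<b>bold</b> and <i>italic</i>"

# Greedy: .+ runs to the last ">" in the string.
greedy = re.search(r"<.+>", html).group()
print(greedy)  # <b>bold</b> and <i>italic</i>

# Non-greedy: .+? stops at the first ">" it can.
lazy = re.search(r"<.+?>", html).group()
print(lazy)  # <b>

# With findall, the non-greedy form picks out each tag separately.
all_tags = re.findall(r"<.+?>", html)
print(all_tags)  # ['<b>', '</b>', '<i>', '</i>']
```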

Zero-width assertions

Zero-width assertions let us impose conditions on the text around a match without including that text in the match. For example, the lookbehind pattern (?<=a)b matches every "b" that comes immediately after an "a".

txt = "cab, dab"
matches = re.findall("(?<=a)b", txt)
for match in matches:
    print(match)  # Output: b (printed twice, once per match)

In this example, (?<=a)b matches every "b" that follows an "a". The "a" itself is not part of the match.
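Lookbehind has a negative form as well, (?<!...). The following sketch applies both to an invented receipt string. Note that in Python's re module the pattern inside a lookbehind must match text of a fixed length.

```python
import re

# Made-up sample data for illustration.
txt = "Cost: $40, discount: 15%, total: $34"

# Positive lookbehind: digits that come right after a dollar sign.
dollar_amounts = re.findall(r"(?<=\$)\d+", txt)
print(dollar_amounts)  # ['40', '34']

# Negative lookbehind: digits not preceded by "$" or by another digit.
# Excluding a preceding digit prevents matching the tail of "$40".
other_numbers = re.findall(r"(?<![$\d])\d+", txt)
print(other_numbers)  # ['15']
```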

Use compiled regular expressions

If your program uses the same regular expression many times, you can compile it into a regular expression object with re.compile(). The compiled object can be reused directly, which avoids processing the pattern string again on each call and can make the code more efficient.

pattern = re.compile(r"\d+")
txt = "123 Hello, world!"
match = pattern.search(txt)
print(match.group())  # Output: 123

In this example, we first compile the regular expression \d+, and then use the pattern.search() method to match.
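A compiled pattern is most useful when it is applied to many strings. The sketch below reuses one pattern object across a made-up list of lines. The variable names are illustrative.

```python
import re

number_pattern = re.compile(r"\d+")
lines = ["123 Hello", "no digits here", "room 42, floor 7"]

# The same compiled object offers search, findall, sub, and so on.
found = [number_pattern.findall(line) for line in lines]
print(found)  # [['123'], [], ['42', '7']]

# sub() replaces every match; here each number is masked with "#".
masked = number_pattern.sub("#", lines[2])
print(masked)  # room #, floor #
```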

One More Thing

So far, we have explored the basics of regular expressions in Python. In this final "One More Thing" section, I want to share a feature that is not mentioned often but is very useful when dealing with complex text patterns: named groups.

Named groups allow us to assign a name to a capturing group and then refer to it by that name later in the code, instead of by its position. This is very useful when a pattern contains many groups.

txt = "James: 1234567890"
match = re.search(r"(?P<name>\w+): (?P<phone>\d+)", txt)
print(match.group('name'))  # Output: James
print(match.group('phone'))  # Output: 1234567890

In this example, we've used the named groups (?P<name>\w+) and (?P<phone>\d+) to match a name and a phone number, and used the group() method to retrieve them.
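Named groups can be read in a few other ways too. The sketch below, which reuses the same sample string, shows groupdict(), index-style access on the match object, and the \g<name> reference in a replacement string. The variable name contact_pattern is made up.

```python
import re

txt = "James: 1234567890"
contact_pattern = re.compile(r"(?P<name>\w+): (?P<phone>\d+)")
match = contact_pattern.search(txt)

# groupdict() returns all named groups as a dictionary.
print(match.groupdict())  # {'name': 'James', 'phone': '1234567890'}

# Match objects also support index-style access by group name.
print(match["name"])  # James

# In a replacement string, \g<name> refers to a named group.
swapped = contact_pattern.sub(r"\g<phone> (\g<name>)", txt)
print(swapped)  # 1234567890 (James)
```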

Regular expressions are a very powerful tool, and I hope this article helps you master their use in Python.

If you found this helpful, please follow the author's WeChat public account: [Python full perspective] TeahLead_KrisChang. The author has 10+ years of experience in the Internet and artificial intelligence industry and 10+ years of experience in technology and business team management, holds a bachelor's degree in Software Engineering from Tongji and a master's degree in Engineering Management from Fudan, is an Aliyun certified senior cloud service architect, and has led an AI product business with hundreds of millions in revenue.

The Complete Guide to Regular Expressions in Python