Foreword:
I have always wanted to write a blog post recording how to use regular expressions, so that I can look things up easily later. Regular expressions are used so often that they are well worth learning, so this post gives an entry-level introduction. I hope it is useful to you. If you find mistakes, please point them out so that we can learn from each other.
1. The concept of regular expressions
The concept is simple: a regular expression uses a single string to describe and match a set of strings that follow a certain syntactic rule.
2. Regular expression scenarios
Some scenarios where regular expressions are used:
- Batch extraction / replacement of strings that follow a pattern
- Advanced text editors
- Office software
- Programming languages (Java / JS / Golang / PHP, etc.)
- Validation of user input (IP addresses, special order-number formats, etc.)
- Development of template libraries and tag libraries
- Web crawlers (crawling bots)
- Efficient batch text processing
3. Tool recommendation
Here I recommend RegexBuddy, a tool for learning and testing regular expressions. It is paid software. If you need a cracked version, you can go to the following link: [download and install](http://www.ddooo.com/softdown/135270.htm).
4. A first look at regular expressions and metacharacters
The simplest relatives of regular expressions are the wildcards on the Windows or Linux command line: for example, * stands for a string of any length, and ? stands for any single character.
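To make the wildcard idea concrete, here is a minimal Python sketch using the standard fnmatch module, which implements shell-style wildcards. The file names are made up purely for illustration.

```python
import fnmatch

# Hypothetical file names, used only for illustration.
names = ["a.txt", "ab.txt", "notes.md"]

# "*" stands for any run of characters (including none).
star_matches = fnmatch.filter(names, "*.txt")

# "?" stands for exactly one character.
question_matches = fnmatch.filter(names, "?.txt")

print(star_matches)      # ['a.txt', 'ab.txt']
print(question_matches)  # ['a.txt']
```

Regular expressions follow the same idea of describing many strings with one pattern, but with a much richer syntax.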
4.1 The concept of metacharacters
Let's look at this table:
Metacharacter | Explanation |
---|---|
. | Matches any character except a newline |
\w | Matches a letter, digit, underscore, or Chinese character |
\s | Matches any whitespace character |
\d | Matches a digit |
\b | Matches the beginning or end of a word |
^ | Matches the beginning of the string |
$ | Matches the end of the string |
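The metacharacters in the table can be tried out with a short script. This is a minimal sketch using Python's built-in re module; the sample strings are invented for illustration, and other regex engines may differ in details (for example, in whether \w covers Chinese characters).

```python
import re

text = "order 42 shipped"

# "." matches any character except a newline.
dot = re.findall(r"c.t", "cat cot c\nt")

# "\w" matches word characters; in Python 3 this includes Chinese characters.
words = re.findall(r"\w+", "正则 regex_1")

# "\s" matches whitespace, "\d" matches a digit.
spaces = re.findall(r"\s", text)
digits = re.findall(r"\d", text)

# "\b" matches a word boundary, so only the standalone word is found.
whole_word = re.findall(r"\bcat\b", "cat concat cats")

# "^" anchors at the start of the string, "$" at the end.
starts = re.search(r"^order", text) is not None
ends = re.search(r"shipped$", text) is not None

print(dot)         # ['cat', 'cot']
print(words)       # ['正则', 'regex_1']
print(spaces)      # [' ', ' ']
print(digits)      # ['4', '2']
print(whole_word)  # ['cat']
print(starts, ends)  # True True
```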
4.2 Negated metacharacters
Syntax | Explanation |
---|---|
\W | Matches any character that is not a letter, digit, underscore, or Chinese character |
\S | Matches any character that is not whitespace |
\D | Matches any character that is not a digit |
\B | Matches a position that is not the beginning or end of a word |
[^x] | Matches any character except x |
[^aeiou] | Matches any character except a, e, i, o, and u |
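A small sketch of the negated forms, again using Python's re module with made-up sample strings:

```python
import re

text = "a1 b2_c!"

# "\D" matches anything that is not a digit.
non_digits = re.findall(r"\D", text)

# "\W" matches anything that is not a word character.
non_word = re.findall(r"\W", text)

# "\S+" matches runs of non-whitespace characters.
non_space = re.findall(r"\S+", text)

# "[^aeiou]" matches any single character that is not one of a, e, i, o, u.
consonants = re.findall(r"[^aeiou]", "education")

# "\B" requires a position that is NOT a word boundary,
# "\b" requires one; compare where "at" is found in each case.
inside_word = re.search(r"\Bat", "cat at").start()
at_boundary = re.search(r"\bat", "cat at").start()

print(non_digits)   # ['a', ' ', 'b', '_', 'c', '!']
print(non_word)     # [' ', '!']
print(non_space)    # ['a1', 'b2_c!']
print(consonants)   # ['d', 'c', 't', 'n']
print(inside_word, at_boundary)  # 1 4
```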
Note: Pay attention to escaping. To match a character that has a special meaning, such as the dot or the question mark, as a literal character, escape it with the backslash \; otherwise it will be treated as a metacharacter and will not match what you intend.
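The effect of escaping is easy to see in a short experiment. The following sketch assumes Python's re module; re.escape is a standard helper that escapes a literal string for you.

```python
import re

text = "Is 3.14 pi? Or 3x14?"

# Unescaped: "." is a metacharacter, so it also matches the "x".
unescaped = re.findall(r"3.14", text)

# Escaped: "\." matches only a literal dot.
escaped = re.findall(r"3\.14", text)

# "\?" matches a literal question mark.
question = re.findall(r"pi\?", text)

# re.escape builds the escaped pattern from a literal string.
auto_pattern = re.escape("3.14")

print(unescaped)     # ['3.14', '3x14']
print(escaped)       # ['3.14']
print(question)      # ['pi?']
print(auto_pattern)  # 3\.14
```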
The symbols above can be tested and explored further in RegexBuddy.
5. Using regular expressions
5.1 Repetition
Syntax | Explanation |
---|---|
* | Repeat zero or more times |
+ | Repeat one or more times |
? | Repeat zero or one time |
{n} | Repeat n times |
{n,} | Repeat n or more times |
{n,m} | Repeat n to m times |
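Here is a minimal sketch of each quantifier in Python's re module. The helper name full is hypothetical and only wraps re.fullmatch so that each line reads as a yes/no check against the whole string.

```python
import re

def full(pattern, s):
    """Return True if the whole string s matches pattern (illustrative helper)."""
    return re.fullmatch(pattern, s) is not None

results = {
    "ab* vs 'a'": full(r"ab*", "a"),             # zero b's are allowed
    "ab+ vs 'a'": full(r"ab+", "a"),             # at least one b is required
    "colou?r vs 'color'": full(r"colou?r", "color"),  # the u is optional
    "\\d{3} vs '123'": full(r"\d{3}", "123"),    # exactly three digits
    "\\d{3} vs '12'": full(r"\d{3}", "12"),      # too few digits
    "\\d{2,} vs '12345'": full(r"\d{2,}", "12345"),    # two or more digits
    "\\d{2,4} vs '12345'": full(r"\d{2,4}", "12345"),  # five digits is too many
}

for label, ok in results.items():
    print(label, "->", ok)
```

With these inputs, the checks for ab+, \d{3} against 12, and \d{2,4} against 12345 are the ones that fail; the rest succeed.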
Another point to know is alternation (branch conditions):
- Use | to separate the alternative rules
- The branches are tested from left to right. As soon as one branch matches, the remaining branches are ignored.
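The left-to-right rule means that the order of the branches matters. A minimal sketch in Python's re module, with an invented sample word:

```python
import re

text = "category"

# "cat" is listed first and matches, so "category" is never tried.
short_first = re.search(r"cat|category", text).group()

# Putting the longer branch first lets it win.
long_first = re.search(r"category|cat", text).group()

print(short_first)  # cat
print(long_first)   # category
```

A practical consequence: when one alternative is a prefix of another, it is usually safer to list the longer one first.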
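When reading other people's regular expressions, you will often see [0-9], which is equivalent to \d for ordinary ASCII digits. Square brackets [] define a character class: list the allowed characters (or ranges of characters) inside the brackets, and the class matches any one of them.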
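The relationship between [0-9] and \d can be checked directly. The sketch below uses Python's re module; note that in Python 3, \d on text strings also accepts digits from other scripts unless the re.ASCII flag is given, which is a detail of this engine rather than something stated in the text above.

```python
import re

text = "tel: 010-1234"

# For ordinary ASCII digits the two spellings find the same things.
with_class = re.findall(r"[0-9]+", text)
with_escape = re.findall(r"\d+", text)

# A character class can mix single characters and ranges.
hex_digits = re.findall(r"[0-9a-f]+", "ff 7g")

# "\u0663" is an Arabic-Indic digit three: \d accepts it, [0-9] does not.
unicode_d = re.findall(r"\d", "\u0663")
ascii_class = re.findall(r"[0-9]", "\u0663")
ascii_d = re.findall(r"\d", "\u0663", flags=re.ASCII)

print(with_class)   # ['010', '1234']
print(with_escape)  # ['010', '1234']
print(hex_digits)   # ['ff', '7']
print(len(unicode_d), len(ascii_class), len(ascii_d))  # 1 0 0
```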
5.2 Grouping of regular expressions
Grouping turns a sub-expression into a single unit. Wrap the sub-expression in parentheses () to form a group, which makes it easy to split the matched string into parts.
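As a small illustration of grouping, the following Python sketch splits a date into its parts and also repeats a group as a unit. The date string is an invented example.

```python
import re

# Three groups split the match into year, month, and day.
m = re.search(r"(\d{4})-(\d{2})-(\d{2})", "date: 2024-05-17")
whole = m.group(0)   # the entire match
year = m.group(1)    # first group
parts = m.groups()   # all groups as a tuple

# A group can also be repeated as a unit: "(ab)+" means ab, abab, ababab, ...
repeated = re.fullmatch(r"(ab)+", "ababab") is not None

print(whole)     # 2024-05-17
print(year)      # 2024
print(parts)     # ('2024', '05', '17')
print(repeated)  # True
```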
This brings in the concepts of greedy and lazy matching.
Greedy matching repeats as many times as possible; lazy matching, by contrast, repeats as few times as possible.
Syntax | Explanation |
---|---|
*? | Repeat any number of times, but repeat as little as possible |
+? | Repeat 1 or more times, but repeat as little as possible |
?? | Repeat 0 or 1 times, but repeat as little as possible |
{n,m}? | Repeat n to m times, but repeat as little as possible |
{n,}? | Repeat n times or more, but repeat as little as possible |
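The difference between greedy and lazy repetition shows up clearly on text with repeated delimiters. A minimal sketch in Python's re module, with an invented HTML-like snippet:

```python
import re

text = "<b>one</b><b>two</b>"

# Greedy: ".*" grabs as much as possible, so the match spans both tags.
greedy = re.findall(r"<b>.*</b>", text)

# Lazy: ".*?" stops at the first place where the rest of the pattern fits.
lazy = re.findall(r"<b>.*?</b>", text)

# The same applies to counted repetition.
greedy_count = re.search(r"\d{2,4}", "123456").group()
lazy_count = re.search(r"\d{2,4}?", "123456").group()

print(greedy)        # ['<b>one</b><b>two</b>']
print(lazy)          # ['<b>one</b>', '<b>two</b>']
print(greedy_count)  # 1234
print(lazy_count)    # 12
```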
5.3 Simple demo
1. Example 1
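Suppose we want to match a repeated word such as ceshi ceshi or home home. What should we do?
Answer: Use grouping. Capture a word in a group, then require one or more whitespace characters followed by the same word again, using the backreference \1.
The answer is as follows: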
\b(?<one>\w+)\b\s+\1\b
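To try this pattern in Python, one adjustment is needed: the named-group syntax (?<one>...) shown above is the spelling used by some engines, while Python's re module writes named groups as (?P<one>...). The sketch below is an adaptation under that assumption, with a made-up sample sentence.

```python
import re

text = "ceshi ceshi and home home but not home house"

# Python spelling of the named group; \1 refers back to the first group.
pattern = r"\b(?P<one>\w+)\b\s+\1\b"
repeated = [m.group(0) for m in re.finditer(pattern, text)]

# The backreference can also be written by name in Python.
pattern_named = r"\b(?P<one>\w+)\b\s+(?P=one)\b"
repeated_named = [m.group(0) for m in re.finditer(pattern_named, text)]

# The captured word itself is available through the group name.
captured = [m.group("one") for m in re.finditer(pattern, text)]

print(repeated)        # ['ceshi ceshi', 'home home']
print(repeated_named)  # ['ceshi ceshi', 'home home']
print(captured)        # ['ceshi', 'home']
```

Note that "home house" is not matched, because the trailing \b forces the second word to be exactly the captured one.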
2. Example 2
Look at the following sentence: I'm singing while you're dancing. For each word that ends in ing, find the part of the word before the ing.
Answer: This uses a zero-width assertion, namely (?=exp), the zero-width positive lookahead assertion. It asserts that the expression exp can be matched immediately after the position where it appears, without including exp in the match.
The answer is as follows:
\b\w+(?=ing\b)
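Running the lookahead pattern in Python's re module shows that the ing itself is checked but not consumed. The second pattern is an added comparison, not part of the original answer, for the case where the whole word is wanted.

```python
import re

sentence = "I'm singing while you're dancing."

# Lookahead: "ing" must follow, but it is not part of the match.
stems = re.findall(r"\b\w+(?=ing\b)", sentence)

# For comparison: without the lookahead, "ing" is included in the match.
full_words = re.findall(r"\b\w+ing\b", sentence)

print(stems)       # ['sing', 'danc']
print(full_words)  # ['singing', 'dancing']
```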
There is another kind of zero-width assertion: (?<=exp), the zero-width positive lookbehind assertion. It asserts that the expression exp can be matched immediately before the position where it appears.
For example, take the sentence I'm reading a book. and, for the word that starts with re, find the part of the word after the re.
(?<=\bre)\w+\b
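The lookbehind pattern can be checked the same way. This is a sketch in Python's re module, which requires the lookbehind expression to have a fixed width; \bre satisfies that. The second pattern is an added comparison for matching the whole word.

```python
import re

sentence = "I'm reading a book."

# Lookbehind: "re" at a word start must precede, but is not part of the match.
tails = re.findall(r"(?<=\bre)\w+\b", sentence)

# For comparison: without the lookbehind, "re" is included in the match.
full_words = re.findall(r"\bre\w+\b", sentence)

print(tails)       # ['ading']
print(full_words)  # ['reading']
```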
6. Summary
This post will be updated as I continue learning ...