Quick start with regular expressions

Foreword: I have long wanted to write a blog post recording how to use regular expressions, so that I can look things up later. Regular expressions are used so often that they are worth learning, so this post gives an entry-level introduction. I hope it is useful to you. If there are mistakes, please correct me so that we can learn from each other.

1. The concept of regular expressions

The concept is simple: a regular expression is a single string that describes, and is used to match, a set of strings that follow a certain syntactic rule.

2. Regular expression scenarios

Some scenarios where regular expressions are used:

  • Batch extraction / replacement of strings that follow a pattern
  • Search and replace in advanced text editors
  • Search and replace in office software
  • Programming languages (Java / JS / Go / PHP, etc.)
  • Validation of user input (IP addresses, special order number formats, etc.)
  • Development of template libraries and tag libraries
  • Web crawlers
  • Efficient batch text processing

3. Tool recommendation

Here I recommend RegexBuddy, a tool for learning and testing regular expressions. It is paid software. If you need a cracked version, you can go to the following link: [download and install](http://www.ddooo.com/softdown/135270.htm).

4. A first look at regular expressions and metacharacters

The simplest form of this kind of pattern matching is the wildcard syntax on the Windows or Linux command line, for example: * stands for a string of any length, ? stands for any single character, and so on.
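As a rough illustration of those command-line wildcards, here is a minimal Python sketch using the standard `fnmatch` module. The file names are made up for the example.

```python
import fnmatch

# Hypothetical file names, used only for illustration
names = ["a.txt", "ab.txt", "notes.md"]

# "*" stands for a string of any length (including empty)
star = fnmatch.filter(names, "*.txt")

# "?" stands for exactly one character
question = fnmatch.filter(names, "?.txt")

print(star)      # ['a.txt', 'ab.txt']
print(question)  # ['a.txt']
```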

4.1 The concept of metacharacters

Let's look at this table:

Metacharacter   Explanation
.               Match any character except a newline
\w              Match a letter, digit, underscore, or Chinese character
\s              Match any whitespace character
\d              Match a digit
\b              Match the beginning or end of a word
^               Match the beginning of the string
$               Match the end of the string
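The metacharacters in the table can be tried out with a short Python sketch. The sample text is invented, and the behaviour of `\w` on Chinese characters depends on the engine; Python 3 string patterns are Unicode-aware by default, which is what this example assumes.

```python
import re

# Hypothetical sample text for illustration
text = "Hi 中文_1 2024"

digits = re.findall(r"\d", text)      # every single digit
words = re.findall(r"\w+", text)      # runs of letters, digits, underscores, Chinese characters
spaces = re.findall(r"\s", text)      # whitespace characters
first = re.findall(r"^\w+", text)     # word anchored at the start of the string
last = re.findall(r"\d+$", text)      # digits anchored at the end of the string
whole = re.findall(r"\bHi\b", text)   # "Hi" as a whole word
any_char = re.findall(r"H.", text)    # "H" followed by any one character

print(digits)  # ['1', '2', '0', '2', '4']
print(words)   # ['Hi', '中文_1', '2024']
```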

4.2 Negated metacharacters

Syntax          Explanation
\W              Match any character that is not a letter, digit, underscore, or Chinese character
\S              Match any character that is not whitespace
\D              Match any non-digit character
\B              Match a position that is not the beginning or end of a word
[^x]            Match any character except x
[^aeiou]        Match any character except a, e, i, o, u
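A small Python sketch of the negated forms, using invented sample strings. It is only meant to show that each form matches exactly what its counterpart does not.

```python
import re

# Hypothetical sample text for illustration
text = "a1 b2_c!"

non_digits = re.findall(r"\D", text)   # everything except 1 and 2
non_word = re.findall(r"\W", text)     # the space and "!"
non_space = re.findall(r"\S", text)    # everything except the space

# A negated character class: any character except the listed vowels
non_vowels = re.findall(r"[^aeiou]", "audio")

# \b requires a word boundary before "cat", \B requires that there is none
at_boundary = re.findall(r"\bcat", "cat concat")
inside_word = re.findall(r"\Bcat", "cat concat")

print(non_word)    # [' ', '!']
print(non_vowels)  # ['d']
```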

Note: pay attention to escaping. Characters such as the dot and the question mark have a special meaning, so to match them literally you need to escape them with the escape character \ (for example \. and \?). Otherwise the pattern will not match what you intend.
You can test the symbols above in RegexBuddy to deepen your understanding.
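To make the escaping rule concrete, here is a minimal Python sketch with made-up sample text. `re.escape` is a standard-library helper that escapes special characters for you.

```python
import re

# Hypothetical sample text for illustration
text = "file.txt filextxt what?"

# Unescaped dot: "." matches any character, so "filextxt" matches too
unescaped = re.findall(r"file.txt", text)

# Escaped dot: only a literal "." matches
escaped = re.findall(r"file\.txt", text)

# Escaped question mark: matches a literal "?"
question = re.findall(r"what\?", text)

# re.escape builds the escaped form of a literal string
auto = re.escape("1.5?")

print(unescaped)  # ['file.txt', 'filextxt']
print(escaped)    # ['file.txt']
print(auto)       # 1\.5\?
```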

5. Using regular expressions

5.1 Repetition

Syntax          Explanation
*               Repeat zero or more times
+               Repeat one or more times
?               Repeat zero or one time
{n}             Repeat exactly n times
{n,}            Repeat n or more times
{n,m}           Repeat n to m times
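The quantifiers can be checked one by one with a short Python sketch. The helper `matches` is a hypothetical name introduced here; it uses `re.fullmatch` so that the whole string has to fit the pattern.

```python
import re

def matches(pattern: str, s: str) -> bool:
    """Return True if the whole string s fits the pattern (illustrative helper)."""
    return re.fullmatch(pattern, s) is not None

print(matches(r"ab*", "a"))           # True:  zero "b" is allowed
print(matches(r"ab+", "a"))           # False: at least one "b" is required
print(matches(r"colou?r", "color"))   # True:  the "u" is optional
print(matches(r"\d{3}", "12"))        # False: exactly three digits required
print(matches(r"\d{2,}", "12345"))    # True:  two or more digits
print(matches(r"\d{2,4}", "12345"))   # False: more than four digits
```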

Another point to know is alternation (branch conditions):

  • Use | to separate the alternative rules
  • The alternatives are tested from left to right. As soon as one branch matches, the remaining branches are not tried.
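The left-to-right rule has a practical consequence: the order of the branches matters. A minimal Python sketch, with an invented example string:

```python
import re

text = "JavaScript"

# "Java" is tried first and succeeds, so the longer branch is never tried
short_first = re.search(r"Java|JavaScript", text).group()

# Putting the longer alternative first lets it win
long_first = re.search(r"JavaScript|Java", text).group()

print(short_first)  # Java
print(long_first)   # JavaScript
```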

When reading other people's regular expressions you will often see [0-9], which is equivalent to \d. Square brackets [] define a character class: the characters to choose from are listed inside the brackets, and the class matches any one of them. Pay attention to this notation.
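A short Python sketch of character classes, using made-up sample text. One caveat: in Python 3 string patterns `\d` also matches non-ASCII digits, so the equivalence with `[0-9]` holds for ASCII input like the text here.

```python
import re

# Hypothetical sample text for illustration
text = "room 101, floor 7"

with_class = re.findall(r"[0-9]+", text)    # digits written as a character class
with_escape = re.findall(r"\d+", text)      # the same using \d

# A class can list single characters and ranges
vowels = re.findall(r"[aeiou]", "regex")
mixed = re.findall(r"[a-c1-3]", "a1d4c3")

print(with_class)   # ['101', '7']
print(with_escape)  # ['101', '7']
print(mixed)        # ['a', '1', 'c', '3']
```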

5.2 Grouping of regular expressions

Grouping turns a sub-expression into a unit. We group with parentheses (), which makes it easy to split up the matched string.
This section also introduces the concepts of greedy and lazy matching.
Greedy means: repeat as many times as possible. Lazy, on the contrary, means: repeat as few times as possible.

Syntax          Explanation
*?              Repeat any number of times, but as few times as possible
+?              Repeat one or more times, but as few times as possible
??              Repeat zero or one time, but as few times as possible
{n,m}?          Repeat n to m times, but as few times as possible
{n,}?           Repeat n or more times, but as few times as possible
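The difference between greedy and lazy repetition is easiest to see side by side. A minimal Python sketch with an invented sample string:

```python
import re

# Hypothetical sample text for illustration
text = 'say "one" and "two"'

# Greedy: .* grabs as much as possible, from the first quote to the last
greedy = re.findall(r'".*"', text)

# Lazy: .*? stops at the nearest closing quote
lazy = re.findall(r'".*?"', text)

print(greedy)  # ['"one" and "two"']
print(lazy)    # ['"one"', '"two"']

# The same effect on the other quantifiers
plus_greedy = re.match(r"a+", "aaa").group()       # 'aaa'
plus_lazy = re.match(r"a+?", "aaa").group()        # 'a'
range_lazy = re.match(r"a{2,3}?", "aaa").group()   # 'aa'
optional_lazy = re.match(r"a??", "a").group()      # ''
```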

5.3 Simple demo

1. Example 1
How do we match a repeated word such as "ceshi ceshi" or "home home"?
Answer: use grouping. Capture a word in a group, then require whitespace followed by the same word again by referring back to the group with \1.

The answer is as follows:
\b(?<one>\w+)\b\s+\1\b
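Here is a sketch of Example 1 in Python. Note one difference that is an adaptation, not part of the original answer: Python's `re` module writes named groups as `(?P<one>...)`, while the `(?<one>...)` form above is the one used by engines such as .NET and Java. The sample text is invented.

```python
import re

# Same pattern as above, with Python's named-group syntax (?P<name>...)
pattern = re.compile(r"\b(?P<one>\w+)\b\s+\1\b")

# Hypothetical sample text for illustration
text = "ceshi ceshi and home home but not ceshi cheshi"

# The full repeated-word matches
repeated = [m.group(0) for m in pattern.finditer(text)]

# The captured word itself, read back through the group name
captured = [m.group("one") for m in pattern.finditer(text)]

print(repeated)  # ['ceshi ceshi', 'home home']
print(captured)  # ['ceshi', 'home']
```

The pair "ceshi cheshi" at the end is not matched, because the second word is not identical to the captured one.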

2. Example 2
Look at the following sentence: I'm singing while you're dancing. Find the words ending in ing.
Answer: this uses a zero-width assertion. (?=exp) is a zero-width positive lookahead: it asserts that the expression exp can be matched immediately after the current position, without consuming any characters. The pattern below therefore matches only the part of each word before ing (sing and danc).

The answer is as follows:
\b\w+(?=ing\b)
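A minimal Python sketch of the lookahead, next to a plain pattern that includes the ending, to show what "zero-width" means in practice:

```python
import re

text = "I'm singing while you're dancing."

# Lookahead: "ing" must follow, but it is not part of the match
stems = re.findall(r"\b\w+(?=ing\b)", text)

# Without the lookahead, "ing" is consumed and included in the match
whole_words = re.findall(r"\b\w+ing\b", text)

print(stems)        # ['sing', 'danc']
print(whole_words)  # ['singing', 'dancing']
```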

There is another kind of zero-width assertion: (?<=exp) is a zero-width positive lookbehind. It asserts that the expression exp can be matched immediately before the current position.
For example: I'm reading a book. Find the word that starts with re in this sentence. The pattern below matches the rest of that word after re (ading):

(?<=\bre)\w+\b
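The lookbehind can be sketched in Python the same way. Python's `re` module only accepts fixed-width lookbehind expressions; `\bre` qualifies because `\b` has zero width and `re` is two characters.

```python
import re

text = "I'm reading a book."

# Lookbehind: "re" at a word start must precede, but it is not part of the match
after_re = re.findall(r"(?<=\bre)\w+\b", text)

# Without the lookbehind, the prefix is included in the match
whole_word = re.findall(r"\bre\w*\b", text)

print(after_re)    # ['ading']
print(whole_word)  # ['reading']
```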

6. Summary

This post will continue to be updated as I learn.

Origin blog.csdn.net/qq_39397165/article/details/105334663