Memorize the basic elements of regular expressions

Regular expressions have three common uses: validating data, finding text that meets given requirements, and splitting and replacing text.

Regular expressions, simply put, are rules for describing strings. In a regular expression, ordinary characters keep their literal meaning. For example, the character a matches the a after H in "Hanmeimei is a girl" as well as the a after is, which works the same way as the ordinary string search we use every day.
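As a minimal sketch of literal matching, the snippet below uses Python's standard `re` module (the choice of Python is an assumption; the article does not name a language) to locate every a in the sentence quoted above.

```python
import re

text = "Hanmeimei is a girl"

# An ordinary character in a pattern matches itself, so the pattern "a"
# finds every literal "a" in the text, just like a plain string search.
positions = [m.start() for m in re.finditer("a", text)]

print(positions)  # [1, 13]: the a after H, and the a after "is"
```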

Regular expressions can also do things that ordinary search and replace cannot. Their real power lies in finding text that matches a given rule.

Suppose you want to find all the digits in a text. Without regular expressions, you might have to search for each digit from 0 to 9 by hand, ten separate searches, which is tedious. With regular expressions it is much easier: \d stands for any one of the ten digits 0-9.

\d{11} means a single digit repeated 11 times, that is, 11 consecutive digits. If the text contains only names and mobile phone numbers, this pattern is enough to pick out the mobile phone numbers.
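The two patterns above can be tried out as follows. This is only a sketch in Python's `re` module; the names and phone numbers in the sample text are made up for illustration.

```python
import re

# Hypothetical sample text containing only names and mobile phone numbers.
text = "Li Lei 13812345678, Han Meimei 13987654321"

# \d matches one digit at a time, so this returns every digit separately.
digits = re.findall(r"\d", text)

# \d{11} matches a run of exactly 11 consecutive digits.
phones = re.findall(r"\d{11}", text)

print(len(digits))  # 22
print(phones)       # ['13812345678', '13987654321']
```

If the text could also contain longer digit runs (order numbers, for instance), \d{11} alone would match the first 11 digits of those as well, so a real validator would need a stricter pattern.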

Metacharacters are characters that have a special meaning in regular expressions. They are the basic building blocks: a regular expression is built from a series of metacharacters, together with ordinary characters.

1. Special single characters

The dot (.) matches any single character other than a newline, \d matches any single digit, \w matches any single letter, digit, or underscore, and \s matches any single whitespace character. Their uppercase counterparts \D, \W, and \S match the opposite: any single character that is not a digit, not a word character, and not whitespace, respectively.
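A quick way to get a feel for these classes is to run each one against the same short string. The sketch below uses Python's `re` module; the sample string is arbitrary. Note that in Python 3, \d and \w on `str` patterns also match non-ASCII digits and letters unless the `re.ASCII` flag is used.

```python
import re

# Arbitrary sample: letter, underscore, digit, space, letter, hyphen, digit, newline.
sample = "a_1 B-2\n"

any_char = re.findall(r".", sample)    # everything except the newline
digits = re.findall(r"\d", sample)     # digits only
word_chars = re.findall(r"\w", sample) # letters, digits, underscore
spaces = re.findall(r"\s", sample)     # whitespace, including the newline

non_digits = re.findall(r"\D", sample) # opposite of \d
non_word = re.findall(r"\W", sample)   # opposite of \w
non_space = re.findall(r"\S", sample)  # opposite of \s

print(digits)      # ['1', '2']
print(word_chars)  # ['a', '_', '1', 'B', '2']
print(non_word)    # [' ', '-', '\n']
```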

2. Whitespace characters

Besides the special single characters, you will certainly run into whitespace such as spaces and newlines when processing text. These are the same escapes commonly used when writing code, such as \n for a newline and \t for a tab.

Regular expressions use the same notation, such as \n or \r, for specific whitespace characters; you just need to remember them. In most scenarios \s is enough, since it matches any single whitespace character.
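The difference between a specific escape such as \t and the general \s can be seen in a small sketch. It uses Python's `re` module, and the sample line is invented for illustration.

```python
import re

# Hypothetical two-row text mixing a tab, a Windows line ending, and spaces.
line = "name\tage\r\nTom  18\n"

tabs = re.findall(r"\t", line)        # only tab characters
newlines = re.findall(r"\n", line)    # only line feeds
whitespace = re.findall(r"\s", line)  # any whitespace character

# \s+ treats any run of whitespace as one separator.
fields = re.split(r"\s+", line.strip())

print(len(tabs), len(newlines), len(whitespace))  # 1 2 6
print(fields)  # ['name', 'age', 'Tom', '18']
```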

3. Quantifiers

In regular expressions, the asterisk (*) means zero or more occurrences, the plus sign (+) means one or more occurrences, the question mark (?) means zero or one occurrence, and {m,n} means between m and n occurrences.

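In the original example, \d+ yields 3 matches while \d* yields 6 on the same text. The difference arises because \d* is allowed to match zero digits, so it also produces empty matches at positions where no digit is present.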
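To see why * tends to report more matches than +, here is a minimal sketch in Python's `re` module. The sample strings are hypothetical and not taken from the original article, so the match counts differ from the ones quoted above; the exact placement of empty matches can also vary between regex engines.

```python
import re

# Hypothetical sample text, not taken from the original article.
text = "a1b22"

plus = re.findall(r"\d+", text)  # one or more digits: only real runs of digits
star = re.findall(r"\d*", text)  # zero or more digits: may also match the empty string

# Dropping the empty matches from the \d* result leaves exactly the \d+ result.
non_empty = [m for m in star if m]

optional = re.findall(r"colou?r", "color colour")  # "u" may appear 0 or 1 times
bounded = re.findall(r"\d{2,3}", "1 22 333 4444")  # 2 to 3 digits in a row

print(plus)      # ['1', '22']
print(optional)  # ['color', 'colour']
print(bounded)   # ['22', '333', '444']
```

In the last line, 4444 contributes only 444: the quantifier is greedy and takes at most three digits, and the single remaining 4 is too short to match.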

4. Ranges

In regular expressions, four kinds of symbols express ranges: the pipe (|), square brackets ([]), the hyphen inside brackets, and the caret inside brackets.

The first is the pipe symbol (|). It separates multiple regular expressions and means that matching any one of them is enough. For example, ab|bc matches both ab and bc. This is very useful when a pattern has to cover several cases.
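The ab|bc example can be checked with a short sketch in Python's `re` module; the test strings are chosen for illustration.

```python
import re

pattern = re.compile(r"ab|bc")

# Either alternative is accepted as a complete match.
matches_ab = pattern.fullmatch("ab") is not None
matches_bc = pattern.fullmatch("bc") is not None
matches_ac = pattern.fullmatch("ac") is not None

# When searching, alternatives are tried left to right at each position,
# so in "abc" the engine matches "ab" and the leftover "c" matches nothing.
found = pattern.findall("ab bc abc")

print(matches_ab, matches_bc, matches_ac)  # True True False
print(found)                               # ['ab', 'bc', 'ab']
```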

Square brackets [] denote a choice among characters and match any single character listed inside, so any vowel can be written as [aeiou]. Inside the brackets, a hyphen denotes a range, so [a-z] matches any lowercase letter. If the first character inside the brackets is a caret (^), the meaning is negated: the class matches any single character that is not listed inside.
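The three bracket forms can be compared side by side. This is a small sketch in Python's `re` module with arbitrary sample words.

```python
import re

vowels = re.findall(r"[aeiou]", "regular")       # any one of the listed characters
lowercase = re.findall(r"[a-z]", "Ab3cD")        # a hyphen inside brackets gives a range
non_vowels = re.findall(r"[^aeiou]", "regular")  # a leading caret negates the class

print(vowels)      # ['e', 'u', 'a']
print(lowercase)   # ['b', 'c']
print(non_vowels)  # ['r', 'g', 'l', 'r']
```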

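The pipe is often combined with parentheses, which limit how far the alternation extends. For example, a resource may start with http://, https://, or ftp://, so we can write (https?|ftp):// to describe the protocol part of the resource. Here s? makes the s optional.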
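A minimal sketch of this pattern in Python's `re` module follows. The helper name `scheme_of` and the example.com addresses are hypothetical, introduced only for illustration.

```python
import re

# (https?|ftp) captures the protocol name; :// must follow it literally.
PROTOCOL = re.compile(r"(https?|ftp)://")


def scheme_of(url):
    """Return the protocol if the URL starts with http://, https:// or ftp://, else None."""
    m = PROTOCOL.match(url)  # match() anchors at the start of the string
    return m.group(1) if m else None


print(scheme_of("http://example.com"))          # http
print(scheme_of("https://example.com"))         # https
print(scheme_of("ftp://example.com"))           # ftp
print(scheme_of("mailto:someone@example.com"))  # None
```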

This article is my Day 18 study note for August. The content comes from the Geek Time course "Introduction to Regular Expressions Course", which I recommend.

Origin blog.csdn.net/key_3_feng/article/details/132370720