Simple understanding of regular expressions

1. Introduction

Simply put, regular expressions are a powerful tool for pattern matching and replacement. They appear in almost every tool that grew out of UNIX systems, such as the vi editor, the Perl and PHP scripting languages, and the awk and sed command-line programs. Client-side scripting languages like JavaScript support them as well. Regular expressions have thus outgrown any single language or system and become a widely accepted concept and feature.

Regular expressions let users construct a matching pattern from a series of special characters and then compare that pattern against a target object such as a data file, program input, or form input on a web page. Depending on whether the target object contains the pattern, the program executes the corresponding branch.

For example, one of the most common applications of regular expressions is to verify that an email address entered by a user online has the correct format. If the regular expression confirms that the address is well formed, the form data filled in by the user is processed normally. If the address does not match the regular expression pattern, a prompt message pops up asking the user to re-enter a correct email address. This shows that regular expressions play a pivotal role in the validation logic of web applications.
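The validation step described above can be sketched in JavaScript as follows. This is only a minimal illustration: the pattern is deliberately simplified and is an assumption, not a complete email validator, and the names EMAIL_PATTERN and looksLikeEmail are hypothetical.

```javascript
// Simplified pattern (an assumption for illustration): one or more characters
// that are neither whitespace nor "@", then "@", then a domain part, a dot,
// and a final label.
const EMAIL_PATTERN = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;

// Hypothetical helper: returns true when the input looks like an email address.
function looksLikeEmail(input) {
  return EMAIL_PATTERN.test(input);
}

console.log(looksLikeEmail("user@example.com")); // true
console.log(looksLikeEmail("user@example"));     // false, no dot after the "@"
```

A form handler would process the data when the helper returns true and show the prompt message otherwise.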

2. Basic Syntax

/hard/

The part between the "/" delimiters is the pattern to be matched in the target object. The user only needs to place the text to be searched for between the "/" delimiters. To let users define patterns more flexibly, regular expressions provide special "metacharacters". Metacharacters are characters that carry a special meaning in a regular expression. The ones introduced first specify how often their leading character (that is, the character immediately before the metacharacter) may appear in the target object.

The more commonly used metacharacters include: "+", "*", and "?".

a. The "+" metacharacter specifies that its leading character must appear one or more times in a row in the target object.
b. The "*" metacharacter specifies that its leading character may appear zero or more times in a row in the target object.
c. The "?" metacharacter specifies that its leading character may appear zero times or once in the target object.

Let's see how these metacharacters are used.

/fo+/

Because the above regular expression contains the "+" metacharacter, it matches strings in the target object such as "fool", "fo", or "football", in which one or more letters o appear consecutively after the letter f.

/eg*/

Because the above regular expression contains the "*" metacharacter, it matches strings in the target object such as "easy", "ego", or "egg", in which zero or more letters g appear consecutively after the letter e.

/Wil?/

Because the above regular expression contains the "?" metacharacter, it matches "Win" or "Wilson" in the target object, in which zero or one letter l appears after the letter i.
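To see exactly which part of a string each quantifier consumes, we can try the sample patterns /fo+/, /eg*/ and /Wil?/ in JavaScript. The helper firstMatch is a hypothetical name introduced only for this sketch.

```javascript
// Hypothetical helper: returns the first matched substring, or null.
function firstMatch(pattern, text) {
  const result = text.match(pattern);
  return result === null ? null : result[0];
}

// "+" : one or more "o" after "f"
console.log(firstMatch(/fo+/, "football")); // "foo"
console.log(firstMatch(/fo+/, "f"));        // null, at least one "o" is required

// "*" : zero or more "g" after "e"
console.log(firstMatch(/eg*/, "egg"));      // "egg"
console.log(firstMatch(/eg*/, "easy"));     // "e"

// "?" : zero or one "l" after "Wi"
console.log(firstMatch(/Wil?/, "Wilson"));  // "Wil"
console.log(firstMatch(/Wil?/, "Win"));     // "Wi"
```

Note that the match is only the part of the string covered by the pattern, not the whole word.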

In addition to these metacharacters, users can specify exactly how many times a character may appear in the matched object. For example:

/jim{2,6}/

The above regular expression specifies that the character m must appear 2 to 6 times in a row in the matched object. It therefore matches strings such as "jimmy" or "jimmmmmy".
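A short JavaScript sketch of the counted quantifier follows. The names jimPattern and longName are hypothetical. It also shows a detail that is easy to miss: without locators, the pattern still finds a match inside a string that has more than six letters m, it simply stops consuming after the sixth.

```javascript
const jimPattern = /jim{2,6}/;

console.log(jimPattern.test("jimmy")); // true, "m" appears twice
console.log(jimPattern.test("jim"));   // false, only one "m"

// A name with eight "m" characters, built with repeat() to avoid miscounting.
const longName = "ji" + "m".repeat(8) + "y";
const matched = longName.match(jimPattern)[0];

// The match covers "ji" plus six "m" characters, never more.
console.log(matched === "ji" + "m".repeat(6)); // true
```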

After this preliminary look at how regular expressions are used, let's turn to several other important metacharacters.
\s: matches a single whitespace character, including space, tab and newline;
\S: matches any single character that is not whitespace;
\d: matches a single digit from 0 to 9;
\w: matches a single letter, digit or underscore character;
\W: matches any single character that \w does not match;
.: matches any single character except a newline.

(Note: \s and \S, as well as \w and \W, can be regarded as inverses of each other.)
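The following JavaScript sketch tries these metacharacters on one sample string. The string and the variable names are assumptions made for illustration, and the "g" flag (a JavaScript feature not covered above) is used only to collect every match instead of just the first one.

```javascript
// Sample text containing a space, a tab, another space and a newline.
const sample = "Order 66:\tship_to A-9\n";

const digits = sample.match(/\d/g);        // every single digit
const words = sample.match(/\w+/g);        // runs of letters, digits, underscores
const spaces = sample.match(/\s/g).length; // number of whitespace characters

console.log(digits); // [ '6', '6', '9' ]
console.log(words);  // [ 'Order', '66', 'ship_to', 'A', '9' ]
console.log(spaces); // 4

// \S and \W are the inverses of \s and \w.
console.log("a b".replace(/\S/g, "x")); // "x x"
console.log("a-b".replace(/\W/g, "_")); // "a_b"

// "." matches any character except a newline.
console.log(/^.$/.test("\n")); // false
```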

In addition to the metacharacters introduced above, regular expressions have another special kind of character: the locator (often called an anchor). A locator specifies where the matching pattern must appear in the target object.

The more commonly used locators include "^", "$", "\b" and "\B". The "^" locator specifies that the matching pattern must appear at the beginning of the target string, and the "$" locator specifies that it must appear at the end of the target string. The "\b" locator specifies that the matching pattern must appear at a word boundary, that is, at the beginning or the end of a word in the target string. The "\B" locator specifies that the position must not be a word boundary, so that side of the match can be neither the beginning nor the end of a word. We can therefore regard "^" and "$", and "\b" and "\B", as two pairs of complementary locators. For example:

/^hell/
Because the above regular expression contains the "^" locator, it matches strings in the target object that begin with "hell", "hello" or "hellhound".

/ar$/
Because the above regular expression contains the "$" locator, it matches strings in the target object that end with "car", "bar" or "ar".

/\bbom/
Because the above regular expression pattern starts with the "\b" locator, it matches words in the target object that start with "bom", such as "bomb" or "bom".

/man\b/
Because the above regular expression pattern ends with the "\b" locator, it matches words in the target object that end with "man", such as "human", "woman" or "man".
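The difference between the four locators can be checked with a small JavaScript sketch. The patterns /^bom/ and /man$/ and the sample strings are assumptions built from the words used above, and the constant names are hypothetical.

```javascript
const startsWithBom = /^bom/;  // "^": pattern must start the string
const endsWithMan = /man$/;    // "$": pattern must end the string
const wordStartBom = /\bbom/;  // "\b": pattern must start a word
const wordEndMan = /man\b/;    // "\b": pattern must end a word
const innerMan = /\Bman/;      // "\B": "man" must not start a word

console.log(startsWithBom.test("bomb"));       // true
console.log(startsWithBom.test("a bomb"));     // false, string starts with "a"
console.log(endsWithMan.test("woman"));        // true
console.log(endsWithMan.test("manual"));       // false, string ends with "ual"

console.log(wordStartBom.test("a bomb"));      // true, "bom" starts the word "bomb"
console.log(wordStartBom.test("abomb"));       // false, "bom" is inside a word
console.log(wordEndMan.test("human being"));   // true, "man" ends the word "human"
console.log(wordEndMan.test("many"));          // false, "y" follows inside the word

console.log(innerMan.test("human"));           // true, "man" is preceded by "u"
console.log(innerMan.test("man"));             // false, "man" starts the word
```

Note that "^" and "$" refer to the whole string, while "\b" and "\B" refer to individual words inside it.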

To let users define matching patterns more flexibly, regular expressions allow a range of characters to be specified in the pattern instead of one specific character. For example:

/[A-Z]/
The above regular expression matches any uppercase letter from A to Z.

/[a-z]/
The above regular expression matches any lowercase letter from a to z.

/[0-9]/
The above regular expression matches any digit from 0 to 9.

/([a-z][A-Z][0-9])+/
The above regular expression matches any string made up of one or more repetitions of a lowercase letter, an uppercase letter and a digit, in that order, such as "aB0". Note that "()" can be used in a regular expression to group part of a pattern, and everything inside the "()" must appear in the target object. The above regular expression therefore does not match a string such as "abc", because the second and third characters of "abc" are lowercase letters rather than an uppercase letter and a digit.
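As a minimal sketch, the grouped range pattern can be tested in JavaScript. The "^" and "$" locators are added here as an assumption so that the whole string has to consist of such triples, and the name triples is hypothetical.

```javascript
// One or more groups of: lowercase letter, uppercase letter, digit.
// "^" and "$" are added so the entire string must follow that shape.
const triples = /^([a-z][A-Z][0-9])+$/;

console.log(triples.test("aB0"));    // true, one group
console.log(triples.test("aB0cD1")); // true, the group repeats twice
console.log(triples.test("abc"));    // false, "b" is not uppercase
console.log(triples.test("aB0c"));   // false, trailing "c" is an incomplete group

// Without the locators, a match anywhere inside the string is enough.
console.log(/([a-z][A-Z][0-9])+/.test("xxaB0")); // true
```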

If we want an "or" operation similar to the one in programming logic, choosing one of several alternative patterns for matching, we can use the pipe character "|". For example:

/to|too|2/
The above regular expression matches "to", "too", or "2" in the target object.
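One detail worth checking in practice is the order of the alternatives. In JavaScript the alternatives are tried from left to right, so with /to|too|2/ the word "too" is already matched by the shorter "to". The sketch below compares both orders on the assumed sample string "to too 2"; the variable names are hypothetical.

```javascript
const text = "to too 2";

// "to" is tried first, so only the first two letters of "too" are matched.
const shortFirst = text.match(/to|too|2/g);
console.log(shortFirst); // [ 'to', 'to', '2' ]

// Putting the longer alternative first lets "too" match in full.
const longFirst = text.match(/too|to|2/g);
console.log(longFirst);  // [ 'to', 'too', '2' ]
```

Both patterns find a match in each of the three words, but only the second one captures "too" as a whole.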

Another commonly used operator in regular expressions is the negation "[^]". Unlike the locator "^" introduced earlier, the negation "[^]" matches any single character that is not listed inside the brackets. For example:

/[^A-C]/
The above pattern matches any character in the target object except A, B, and C. Generally speaking, when "^" appears as the first character inside "[]", it is a negation operator; when "^" is outside "[]", or there is no "[]" at all, it is a locator.
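The two roles of "^" can be seen side by side in a short JavaScript sketch. The sample strings and the names outsideAtoC and startsOutsideAtoC are assumptions for illustration.

```javascript
// Inside "[]": negation. Collect every character that is not A, B or C.
const outsideAtoC = "ABCDE".match(/[^A-C]/g);
console.log(outsideAtoC); // [ 'D', 'E' ]

// Outside "[]": locator. The first "^" anchors the pattern to the start of
// the string, the second "^" negates the range.
const startsOutsideAtoC = /^[^A-C]/;
console.log(startsOutsideAtoC.test("Delta")); // true, starts with "D"
console.log(startsOutsideAtoC.test("Apple")); // false, starts with "A"
```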

Finally, when users need to match a metacharacter itself as a literal character in the pattern, they can use the escape character "\". For example:

/Th\*/
The above regular expression matches "Th*" rather than "The" in the target object.
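A brief JavaScript sketch shows the difference the escape character makes. The constant names are hypothetical. The last lines also show that when a pattern is written as a string for the RegExp constructor, the backslash itself has to be doubled.

```javascript
const literalStar = /Th\*/; // "\*" is a literal asterisk
const zeroOrMoreH = /Th*/;  // "*" is a quantifier on "h"

console.log(literalStar.test("Th*")); // true
console.log(literalStar.test("The")); // false, there is no "*" in "The"
console.log(zeroOrMoreH.test("The")); // true, "T" followed by one "h"

// Same pattern built from a string: the backslash is written twice.
const fromString = new RegExp("Th\\*");
console.log(fromString.test("Th*"));  // true
```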

Origin blog.csdn.net/weixin_43465609/article/details/107908059