Syntax rules for regular expressions

1. Line locators (^ and $)

  Line locators describe the boundaries of a string: "^" marks the start of a line and "$" marks the end of a line. For example:

  ^tm : This expression requires the string tm to appear at the start of the line, so "tm equal Tomorrow Moon" matches.

  tm$ : This expression requires the string tm to appear at the end of the line, so "Tomorrow Moon equal tm" matches.

  If the string may appear anywhere in the text, write the pattern without a locator: tm
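
  The three cases above can be checked with a short sketch. It assumes Python's re module as the engine (the text itself does not name one here), and the variable names are only illustrative.

```python
import re

# Sample strings taken from the text above.
starts_with_tm = "tm equal Tomorrow Moon"
ends_with_tm = "Tomorrow Moon equal tm"
samples = (starts_with_tm, ends_with_tm)

# ^tm : "tm" must sit at the start of the line.
head_hits = [bool(re.search(r"^tm", s)) for s in samples]

# tm$ : "tm" must sit at the end of the line.
tail_hits = [bool(re.search(r"tm$", s)) for s in samples]

# tm : no locator, so "tm" may appear anywhere in the string.
anywhere_hits = [bool(re.search(r"tm", s)) for s in samples + ("atmosphere",)]

print(head_hits)      # [True, False]
print(tail_hits)      # [False, True]
print(anywhere_hits)  # [True, True, True]
```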

2. Word delimiters (\b, \B)

  The word delimiter \b marks a word boundary, so wrapping a string in \b requires it to be a complete word. For example: \btm\b

  The uppercase \B means the opposite of \b: the matched string must not be a complete word, but part of another word or string. For example: \Btm\B
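
  A minimal sketch of the difference, again assuming Python's re module. The sample text is made up for illustration, and the lists hold the start offsets of each match.

```python
import re

# Hypothetical sample: "tm" alone, at the end of a word, at the start of a
# word, and in the middle of a word.
text = "tm atm tmp atmo"

# \btm\b : "tm" must be a complete word.
whole_word_starts = [m.start() for m in re.finditer(r"\btm\b", text)]

# \Btm\B : "tm" must have word characters on both sides.
inner_starts = [m.start() for m in re.finditer(r"\Btm\B", text)]

print(whole_word_starts)  # [0]   the standalone "tm"
print(inner_starts)       # [12]  the "tm" inside "atmo"
```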

3. Character class ([ ])

  Regular expressions are case-sensitive by default. To ignore case, you can use a square bracket expression "[]": the match succeeds as long as the character being matched appears inside the brackets. Note that one bracket expression matches exactly one character. For example, to match the string tm regardless of case, write the expression as: [Tt][Mm]
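
  As a quick check of [Tt][Mm], the sketch below (assuming Python's re module; the candidate list is illustrative) uses fullmatch so that the whole string has to be covered by the two bracket expressions.

```python
import re

pattern = re.compile(r"[Tt][Mm]")

# Each bracket expression consumes exactly one character,
# so only two-character strings can match in full.
candidates = ["tm", "TM", "Tm", "tM", "tn", "tmm"]
matches = [c for c in candidates if pattern.fullmatch(c)]

print(matches)  # ['tm', 'TM', 'Tm', 'tM']
```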

  The POSIX-style predefined character classes are shown in the table:

  

4. Selection character (|)

  Another way to implement the matching pattern above is the selection character (|), which can be read as "or". The example above can also be written as (T|t)(M|m): the letter T or t, followed by the letter M or m.

  The difference between "[]" and "|" is that "[]" matches only a single character, while each alternative of "|" can be a string of any length. If you do not mind the verbosity, the example above can also be written as: TM|tm|Tm|tM
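
  The three spellings can be compared side by side. This is a sketch assuming Python's re module; the last two lines use a made-up pattern (cat|horse) only to show that alternatives may differ in length.

```python
import re

patterns = [r"[Tt][Mm]", r"(T|t)(M|m)", r"TM|tm|Tm|tM"]
candidates = ["tm", "TM", "Tm", "tM", "tn", "t", "tmm"]

# For every pattern, record which candidates match in full.
results = {p: [bool(re.fullmatch(p, c)) for c in candidates] for p in patterns}

for p, row in results.items():
    print(p, row)  # each row: [True, True, True, True, False, False, False]

# Unlike a bracket expression, "|" can choose between whole strings.
long_alternative = bool(re.fullmatch(r"cat|horse", "horse"))
print(long_alternative)  # True
```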

 

5. Hyphen (-)

  Variable names may only start with a letter or an underscore. Without a shorthand, a regular expression that matches the first letter of a variable name would have to list every letter: [a,b,c,d…A,B,C,D…]

  Listing every letter this way is tedious, so regular expressions provide the hyphen "-" to denote a range of characters. The example above can be written as: [a-zA-Z]
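
  A small sketch of the range in use, assuming Python's re module and a few made-up names. Note that [a-zA-Z] covers letters only, so a name starting with an underscore is not matched by it.

```python
import re

first_letter = re.compile(r"[a-zA-Z]")

# Hypothetical names; re.match only looks at the start of each string.
names = ["count", "Total", "9lives", "_private"]
starts_with_letter = [bool(first_letter.match(n)) for n in names]

print(starts_with_letter)  # [True, True, False, False]
```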

6. Exclude characters ([^])

  The example above matches names that follow the naming rules. To do the reverse and match names that break the rules, regular expressions provide the "^" character. This metacharacter appeared earlier as the start-of-line locator; placed first inside square brackets, it means exclusion instead.

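  For example, [^a-zA-Z] matches any single character that is not a letter. Anchored and extended with the underscore as ^[^a-zA-Z_], it matches variable names whose first character is neither a letter nor an underscore.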
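
  The sketch below flags names that break the naming rule. It assumes Python's re module and adds the underscore to the excluded set (^[^a-zA-Z_]) so that names starting with an underscore are not reported; the names themselves are made up.

```python
import re

# Outside the brackets "^" anchors to the start; inside, it negates the set.
bad_start = re.compile(r"^[^a-zA-Z_]")

names = ["count", "_private", "9lives", "-flag"]
invalid = [n for n in names if bad_start.search(n)]

print(invalid)  # ['9lives', '-flag']
```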

7. Qualifiers (? * + {n,m})

  To match repeated letters or strings, use qualifiers. There are six main qualifiers, as shown in the table:
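
  As an illustration, the sketch below assumes the six qualifiers are the usual ?, *, +, {n}, {n,} and {n,m}, and tries each on a few made-up words with Python's re module.

```python
import re

# pattern -> words to test; the comment names the qualifier being shown.
cases = {
    r"colou?r":    ["color", "colour", "colouur"],   # ?     zero or one
    r"go*gle":     ["ggle", "gogle", "google"],      # *     zero or more
    r"go+gle":     ["ggle", "gogle", "google"],      # +     one or more
    r"go{2}gle":   ["gogle", "google", "gooogle"],   # {n}   exactly n
    r"go{2,}gle":  ["gogle", "google", "gooogle"],   # {n,}  at least n
    r"go{1,2}gle": ["gogle", "google", "gooogle"],   # {n,m} between n and m
}

results = {
    pattern: [bool(re.fullmatch(pattern, w)) for w in words]
    for pattern, words in cases.items()
}

for pattern, row in results.items():
    print(pattern, row)
```

  With these inputs, go+gle rejects "ggle" while go*gle accepts it, which is the whole difference between "one or more" and "zero or more".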

  

8. The dot character (.)

  The dot character (.) matches any single character except a newline.

  For example, it can be used to match words that start with s, end with t, and contain exactly one letter in between.

  The format is as follows: ^s.t$, which matches words such as sat, set, and sit.

  As another example, to match a word whose first letter is r, whose third letter is s, and whose last letter is t, the regular expression is: ^r.s.*t$
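
  Both dot patterns can be tried out as follows. The sketch assumes Python's re module, writes the patterns as ^s.t$ and ^r.s.*t$, and uses made-up word lists.

```python
import re

three_letter = re.compile(r"^s.t$")
s_words = ["sat", "set", "sit", "st", "seat"]
s_hits = [w for w in s_words if three_letter.search(w)]
print(s_hits)  # ['sat', 'set', 'sit']

# r, any character, s, then anything, then t.
rst = re.compile(r"^r.s.*t$")
r_words = ["rest", "roast", "resist", "rust", "rt"]
r_hits = [w for w in r_words if rst.search(w)]
print(r_hits)  # ['rest', 'resist', 'rust']

# The dot does not match a newline by default.
newline_hit = three_letter.search("s\nt")
print(newline_hit)  # None
```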

9. Escape character (\)

  The escape character (\) in regular expressions works much as it does in PHP: it turns special characters (such as ".", "?", "\", etc.) into ordinary characters. Take IP addresses of the form 127.0.0.1 as an example. If you use the dot character directly, the expression is: [0-9]{1,3}(.[0-9]{1,3}){3}

  This is wrong, because "." matches any character. The expression matches not only IPs such as 127.0.0.1 but also strings such as 127101011. To match a literal ".", precede it with the escape character (\). The corrected regular expression is: [0-9]{1,3}(\.[0-9]{1,3}){3}
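
  The effect of the escape can be seen by running both expressions on the two strings from the text. This is a sketch assuming Python's re module; fullmatch is used so the whole string must fit the pattern.

```python
import re

unescaped = re.compile(r"[0-9]{1,3}(.[0-9]{1,3}){3}")   # "." = any character
escaped = re.compile(r"[0-9]{1,3}(\.[0-9]{1,3}){3}")    # "\." = a literal dot

samples = ["127.0.0.1", "127101011"]

unescaped_hits = [bool(unescaped.fullmatch(s)) for s in samples]
escaped_hits = [bool(escaped.fullmatch(s)) for s in samples]

print(unescaped_hits)  # [True, True]   the digit-only string slips through
print(escaped_hits)    # [True, False]
```

  Neither expression checks that each number is at most 255; they only check the overall shape.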

10. Backslash (\)

  Besides escaping special characters, the backslash has other uses. It can represent some non-printable characters, as shown in the table:

  

  It can also specify predefined character sets, as shown in the table:
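
  As an illustration, the sketch below assumes the common sequences \t (tab), \d (digit), \w (word character), \s (whitespace) and \D (non-digit), as supported by Python's re module. The sample string is made up.

```python
import re

# Hypothetical sample containing an underscore, digits, spaces and a tab.
text = "id_42 = 7\tok"

digits = re.findall(r"\d+", text)      # runs of digits
words = re.findall(r"\w+", text)       # runs of letters, digits, underscore
spaces = re.findall(r"\s", text)       # single whitespace characters
non_digits = re.findall(r"\D+", text)  # runs of anything except digits
has_tab = bool(re.search(r"\t", text)) # \t stands for the tab character

print(digits)      # ['42', '7']
print(words)       # ['id_42', '7', 'ok']
print(spaces)      # [' ', ' ', '\t']
print(non_digits)  # ['id_', ' = ', '\tok']
print(has_tab)     # True
```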

  

  Another use of the backslash is to define assertions. \b and \B have already been covered; the others are shown in the table:

  

11. Parentheses (())

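  The first use of parentheses is to change the scope of qualifiers and operators such as "|", "*", and "^". Consider the following expression.

  (thir|four)th matches the word thirth or fourth. Without the parentheses, the expression would match the word thir or fourth instead.

  The second use of parentheses is grouping, that is, forming subexpressions. For example, (\.[0-9]{1,3}){3} repeats the group (\.[0-9]{1,3}) three times. Backreferences, covered later, are directly tied to grouping.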
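
  A short sketch of both uses, assuming Python's re module: the first part contrasts (thir|four)th with thir|fourth, and the second reads back what a group captured.

```python
import re

grouped = re.compile(r"(thir|four)th")  # "th" follows either alternative
ungrouped = re.compile(r"thir|fourth")  # "thir" alone, or "fourth"

words = ["thirth", "fourth", "thir"]
grouped_hits = [bool(grouped.fullmatch(w)) for w in words]
ungrouped_hits = [bool(ungrouped.fullmatch(w)) for w in words]

print(grouped_hits)    # [True, True, False]
print(ungrouped_hits)  # [False, True, True]

# Parentheses also capture the text matched by the subexpression.
captured = grouped.fullmatch("fourth").group(1)
print(captured)  # four
```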

12. Backreferences
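
  As a minimal illustration of a backreference, the sketch below assumes the common \1 notation, which refers to the text captured by the first group, and uses Python's re module. The sentence is made up.

```python
import re

# (\w+) captures a word; \1 then requires the same word to appear again.
doubled_word = re.compile(r"\b(\w+) \1\b")

text = "this is is a test"
match = doubled_word.search(text)

print(match.group(0))  # is is
print(match.group(1))  # is
```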

13. Pattern modifiers

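  Pattern modifiers set the matching mode, that is, they specify how a regular expression should be interpreted and applied.

  Each language has its own mode settings. The main modes in PHP are shown in the table: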
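
  For comparison, the sketch below shows the same idea in Python, where modes are passed as flags rather than written after the pattern. It assumes that re.IGNORECASE, re.MULTILINE and re.DOTALL play roughly the role of PHP's i, m and s modifiers; the sample strings are made up.

```python
import re

# Ignore case: "tm" also matches "TM".
ignore_case = bool(re.fullmatch(r"tm", "TM", flags=re.IGNORECASE))

# Multiline: "^" matches at the start of every line, not only the first.
text = "first\ntm second"
multiline_hits = re.findall(r"^tm", text, flags=re.MULTILINE)
default_hits = re.findall(r"^tm", text)

# Dot-all: "." also matches a newline.
dot_all = bool(re.fullmatch(r"s.t", "s\nt", flags=re.DOTALL))

print(ignore_case)     # True
print(multiline_hits)  # ['tm']
print(default_hits)    # []
print(dot_all)         # True
```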

  
