Learning commonly used regular expression syntax

Regular expressions are a string matching tool that helps us search, extract, and replace strings. They are a general computer science concept rather than a feature of one particular language, and are widely used in Python, JavaScript, Java and other languages.

Table of contents

1. What is a regular expression?

2. Regular expressions in JavaScript

3. How to use regular expressions

3.1. Instance methods of regular objects

3.2. String methods

4. Use of modifier flag

5. Rules

5.1. Character classes 

5.2. Anchors

5.3. Escape characters

6. Sets and Ranges

7. Quantifiers

8. Greedy and lazy modes

9. Capture groups

9.1. As individual items in the resulting array 

9.2. Treat parentheses as a whole

9.3. Naming of capturing groups

9.4. Exclusion of capture groups

9.5. OR of capturing groups


1. What is a regular expression?

  • Regular expression (often abbreviated as RegExp or regex);
  • A regular expression is a pattern, written as a single string, that describes and matches a set of strings following a certain syntax rule;
  • Many programming languages support string manipulation using regular expressions.

2. Regular expressions in JavaScript

In JavaScript, regular expressions are created using the RegExp class, and there is also an equivalent literal syntax. A regular expression consists of two parts: the pattern (the matching rule) and the modifiers (flags).

const re1 = new RegExp("hello", "i")
// Create a regular expression with the RegExp class: "hello" is the pattern, "i" is the modifier
const re2 = /hello/i
// Create a regular expression with a literal
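
The two forms produce equivalent objects, but the constructor takes the pattern as a string, so backslashes have to be doubled. A minimal sketch of the difference (the variable names and sample text are illustrative, not from the original):

```javascript
// Literal form: the pattern is written between slashes.
const literal = /\d+/g

// Constructor form: the pattern is a string, so "\d" must be written "\\d".
const constructed = new RegExp("\\d+", "g")

// Both objects expose the pattern and the flags as properties.
console.log(literal.source, literal.flags)
console.log(constructed.source, constructed.flags)

// The constructor is handy when the pattern is only known at run time.
const keyword = "hello"
const dynamic = new RegExp(keyword, "i")
console.log(dynamic.test("Say HELLO"))
```

The literal is usually easier to read; the constructor is mainly useful when the pattern has to be built from a variable.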

3. How to use regular expressions

There are two ways to use a regular expression:

  1. Call the instance methods (exec and test) of the regular expression object (RegExp);
  2. Pass the regular expression to the String methods (match, matchAll, replace, search, split).

3.1. Instance methods of regular objects

exec

A RegExp method that searches a string for a match and returns a result array (or null if no match is found).

let message = "hello ABC, abc ,ASbc, AABC"
const re1 = /abc/ig
console.log(re1.exec(message)) // the first match: "ABC" at index 6

test

A RegExp method that tests for a match in a string, returning true or false.

let message = "hello ABC, abc ,ASbc, AABC"
const re1 = /abc/ig
console.log(re1.test(message)) //true
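
One detail worth knowing when the g flag is set: exec and test remember where the previous match ended (in the lastIndex property) and continue from there on the next call. The sketch below reuses the sample string from above with illustrative variable names and collects every match by calling exec repeatedly:

```javascript
const text = "hello ABC, abc ,ASbc, AABC"
const re = /abc/ig

// Call exec until it returns null; each call resumes at re.lastIndex.
const found = []
let m
while ((m = re.exec(text)) !== null) {
  found.push([m[0], m.index])
}
console.log(found)

// After the null result, lastIndex is reset to 0.
console.log(re.lastIndex)
```

This is also why calling test twice on the same global regular expression can give different answers; a regular expression without the g flag always starts from the beginning.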

3.2. String methods

match
A String method that searches a string for matches, returning an array, or null if no match is found.
let message = "hello ABC  abc ASbc AABC"
const re1 = /abc/ig
const result = message.match(re1)
console.log(result) // ["ABC", "abc", "ABC"]

matchAll
A String method that finds all matches in a string, returning an iterator.

The regular expression passed to matchAll must have the g flag; otherwise a TypeError is thrown.

let message = "hello ABC  abc ASbc AABC"
const re1 = /abc/ig
const result = message.matchAll(re1)
console.log(result) // an iterator object, not an array

The return value is an iterator, so it can be traversed with a for...of loop.

for (const item of result) {
  console.log(item)
}

search
A String method that tests for a match in a string, returning the index of the first match, or -1 if there is none.
replace
A String method that searches a string for a match and replaces the matched substring with a replacement string.
split
A String method that uses a regular expression or a fixed string to split a string, and stores the resulting substrings in an array.
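
These three methods have no example above, so here is a short sketch using the same kind of sample string (the variable names are illustrative):

```javascript
const sentence = "hello ABC  abc ASbc AABC"

// search: index of the first match, or -1 if there is none.
const firstIndex = sentence.search(/abc/i)

// replace: with the g flag every match is replaced, not just the first one.
const masked = sentence.replace(/abc/ig, "***")

// split: the separator here is "one or more whitespace characters".
const words = sentence.split(/\s+/)

console.log(firstIndex) // 6
console.log(masked)     // "hello ***  *** ASbc A***"
console.log(words)      // ["hello", "ABC", "abc", "ASbc", "AABC"]
```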

4. Use of modifier flag

There are three common modifiers:

g (global): match all occurrences instead of stopping at the first
i (ignoreCase): ignore case
m (multiline): multi-line matching, so ^ and $ match at the start and end of each line

const re1 = /abc/ig
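
The effect of m is the least obvious of the three. A small sketch, assuming a made-up three-line string, that compares the same pattern with and without it (the ^ anchor used here is covered in section 5.2):

```javascript
const lines = "abc\nABC\nxyz"

// Without m, ^ only matches at the very start of the whole string.
const withoutM = lines.match(/^abc/ig)

// With m, ^ also matches right after each line break.
const withM = lines.match(/^abc/igm)

console.log(withoutM) // ["abc"]
console.log(withM)    // ["abc", "ABC"]
```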

5. Rules

5.1. Character classes 

◼ A character class is a special notation that matches any character from a particular set.

\d ("d" from "digit") Digits: characters from 0 to 9.
let message = "hello123 ABC545  abc6546 AS56346bc AA6546BC"
const re1 = /\d/ig
const result = message.match(re1)
console.log(result) // every digit as a separate item: ["1", "2", "3", "5", ...]

\s ("s" from "space") Whitespace characters: space, tab \t, newline \n and a few rarer characters such as \v, \f and \r.
let message = "hello123 A   BC545  abc6546 AS56346bc AA6546BC"
const re1 = /\s/ig
const result = message.match(re1)
console.log(result) // every whitespace character as a separate item

\w ("w" from "word") "Word" character: Latin letter or digit or underscore _.
let message = "hello123 A   BC545  abc6546 AS56346bc AA6546BC"
const re1 = /\w/ig
const result = message.match(re1)
console.log(result) // every letter and digit as a separate item

. (dot) The dot is a special character class that matches "any character except a newline".
let message = "hello123 A   BC545  abc6546 AS56346bc AA6546BC"
const re1 = /./ig
const result = message.match(re1)
console.log(result) // every character, including the spaces

Inverse classes
\D Non-digit: any character except \d, such as a letter.
\S Non-whitespace: any character except \s, such as a letter.
\W Non-word character: any character except \w, such as a non-Latin letter or a space.
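
Inverse classes are often used to strip out everything that is not wanted. A brief sketch with a made-up phone number string (the + quantifier it uses is covered in section 7):

```javascript
const phone = "+1 (555) 010-9999"

// \D matches every non-digit, so removing those leaves only the digits.
const digitsOnly = phone.replace(/\D/g, "")

// \S+ matches runs of non-whitespace characters.
const chunks = phone.match(/\S+/g)

console.log(digitsOnly) // "15550109999"
console.log(chunks)     // ["+1", "(555)", "010-9999"]
```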

5.2. Anchors

◼ The symbols ^ and $ have special meanings in regular expressions; they are called "anchors".

The symbol ^ matches the beginning of the text;

let message = "ABcder AbCdess abkkkiruj bAcurj"
const re1 = /^abc/ig
const result = re1.test(message)
console.log(result) //true

The symbol $ matches the end of the text;

let message = "ABcder AbCdess abkkkiruj bAcurj"
const re1 = /urj$/ig
const result = re1.test(message)
console.log(result) //true

Word boundary
A word boundary \b is a check, just like ^ and $: it does not match a character, it tests whether the current position in the string is a word boundary.
let message = "ABCd ABC ABCDD ABCDF"
const re1 = /\bABC\b/ig
const result = message.match(re1)
console.log(result) // ["ABC"]

Only ABC standing alone as a whole word is matched; ABC as part of a longer word is not.

5.3. Escape characters

If a special character is to be used as an ordinary character, it needs to be escaped:
    Just put a backslash ( \ ) in front of it;
◼ Common characters that need to be escaped: [ ] \ ^ $ . | ? * + ( )
    The slash '/' is not a special character, but it must also be escaped in a regular expression literal, because it marks the start and end of the literal;
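
A short sketch of both cases, using made-up sample strings: the dot as a special character that needs escaping, and the slash, which only needs escaping inside a literal:

```javascript
// An unescaped dot matches any character; an escaped dot matches only ".".
const unescaped = /3.14/.test("3x14")
const escaped = /3\.14/.test("3x14")
console.log(unescaped, escaped) // true false

// The slash must be escaped inside a literal, but not in a constructor string.
const slashLiteral = /\d\/\d/
const slashConstructed = new RegExp("\\d/\\d")
console.log(slashLiteral.test("1/2"), slashConstructed.test("1/2")) // true true
```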

6. Sets and Ranges

◼ Sometimes we only need to match one of several possible characters:

  • Several characters or character classes in square brackets […] mean "search for any one of the given characters";
◼ Sets
  • For example, [eao] matches any one of the 3 characters 'a', 'e' or 'o';
let message = "abc aac aec adc"
const re1 = /a[abed]c/ig
const result = message.match(re1)
console.log(result) // ["abc", "aac", "aec", "adc"]

Ranges
  • Square brackets can also enclose character ranges;
  • For example, [a-z] will match letters in the range from a to z, and [0-5] means digits from 0 to 5;
  • [0-9A-F] means two ranges: it searches for a character that satisfies the digits 0 to 9 or the letters A to F;
  • \d - same as [0-9];
  • \w - same as [a-zA-Z0-9_];
let message = "5a 4d 7b 1d"
const re1 = /[0-9][a-z]/ig
const result = message.match(re1)
console.log(result) // ["5a", "4d", "7b", "1d"]
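
The [0-9A-F] range mentioned above is typical for hexadecimal digits. A small sketch with made-up input, looking for an x followed by two hexadecimal digits:

```javascript
const codes = "0xAF 0x1G 0x09"

// x followed by exactly two characters, each a digit or a letter from A to F.
const hexByte = /x[0-9A-F][0-9A-F]/g
const hexMatches = codes.match(hexByte)

console.log(hexMatches) // ["xAF", "x09"]
```

"x1G" is skipped because G is outside both ranges.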

7. Quantifiers

Quantity {n}
  • An exact count: {5} means the preceding character appears exactly 5 times
let message = "aaaajccccccaaajjcccccaaaaajccaa"
const re1 = /a{3}/ig
const result = message.match(re1)
console.log(result) // ["aaa", "aaa", "aaa"]

  • A range of counts: {3,5} means 3 to 5 times
let message = "aaaajccccccaaajjcccccaaaaajccaa"
const re1 = /a{3,5}/ig
const result = message.match(re1)
console.log(result) // ["aaaa", "aaa", "aaaaa"]

Shorthands:

  • +: stands for "one or more", equivalent to {1,}
let message = "aaaajccccccaaajjcccccaaaaajccaa"
const re1 = /a+/ig
const result = message.match(re1)
console.log(result) // ["aaaa", "aaa", "aaaaa", "aa"]

  • ?: stands for "zero or one", equivalent to {0,1}. In other words, it makes the character optional
  • *: stands for "zero or more", equivalent to {0,}. That is, the character can appear any number of times or not at all
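
The ? and * shorthands have no example above; the following sketch, with a made-up sample string, shows how they differ:

```javascript
const spelling = "color colour colouur"

// u? : the letter u is optional (zero or one time).
const optionalU = spelling.match(/colou?r/g)

// u* : the letter u may appear any number of times, including none.
const anyU = spelling.match(/colou*r/g)

console.log(optionalU) // ["color", "colour"]
console.log(anyU)      // ["color", "colour", "colouur"]
```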

8. Greedy and lazy modes

Suppose we want to match the two book titles separately. First look at what is matched by default.

let message = "两本书《一本书》和《两本书》"
const re1 = /《.+》/ig
const result = message.match(re1)
console.log(result) // ["《一本书》和《两本书》"]

As the output shows, everything from the opening 《 of the first title to the closing 》 of the second is matched as one string, whereas we wanted the two titles matched separately.

By default, a quantifier matches as many characters as possible: after finding content that satisfies the pattern it keeps extending the match, and gives characters back only as far as needed for the rest of the pattern to match.
This matching method is called greedy mode (Greedy).

Quantifiers in lazy mode do the opposite of those in greedy mode.
  • As soon as the pattern is satisfied, the quantifier stops and does not continue matching further;
  • We can add a question mark '?' after the quantifier to enable it;
  • So the quantifier becomes *? or +?, and even '?' becomes ??
let message = "两本书《一本书》和《两本书》"
const re1 = /《.+?》/ig
const result = message.match(re1)
console.log(result) // ["《一本书》", "《两本书》"]

9. Capture groups

Part of a pattern can be enclosed in parentheses (...); this is called a "capturing group".

This does two things:
  • It makes parts of a match available as separate items in the result array;
  • It treats the content inside the parentheses as a single unit, so a quantifier after the parentheses applies to the whole group;

9.1. As individual items in the resulting array 

For example, in the book example above, if we want to use 《》 to find each book and also get its title, we can use capture groups.

let message = "两本书《一本书》和《两本书》"
const re1 = /(《)(.+?)(》)/ig
const result = message.matchAll(re1)
for (const item of result) {
  console.log(item)
}

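We split the pattern into three groups, use matchAll to get an iterator, and iterate over it. Each item holds the full match at index 0, followed by the three groups. To print only the titles, read group 2. The iterator returned by matchAll can be consumed only once, so matchAll is called again here:

for (const item of message.matchAll(re1)) {
  console.log(item[2]) // 一本书, then 两本书
}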

9.2. Treat parentheses as a whole

If you want to match abc repeated two or more times, writing abc{2,} without parentheses only matches ab followed by two or more c, because a quantifier applies only to the single character immediately before it.

let message = "abcdddabcabcccccabcaaaa"
const re1 = /(abc){2,}/ig
const result = message.match(re1)
console.log(result) // ["abcabc"]

After the parentheses are added, the content inside them is treated as a whole. The example above matches abc repeated two or more times.

9.3. Naming of capturing groups

In the book-title example above, the groups are referred to by number.

For more complex patterns, counting parentheses is inconvenient. We have a better option: give the groups names.
This is done by placing ?<name> immediately after the opening parenthesis.
let message = "两本书《一本书》和《两本书》"
const re1 = /(?<name1>《)(?<name2>.+?)(?<name3>》)/ig
const result = message.matchAll(re1)
for (const item of result) {
  console.log(item)
}

This is more convenient when reading the results: each named group is available on the groups property of the match, for example item.groups.name2.
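
As a sketch of reading a named group, the following collects only the titles. The group names open, title and close are illustrative choices, not names from the original:

```javascript
const message = "两本书《一本书》和《两本书》"
const re = /(?<open>《)(?<title>.+?)(?<close>》)/g

const titles = []
for (const item of message.matchAll(re)) {
  // Named groups live on item.groups; item[2] holds the same value.
  titles.push(item.groups.title)
}

console.log(titles) // ["一本书", "两本书"]
```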

9.4. Exclusion of capture groups

In the book example above, if you do not want to capture 《 and 》, you can exclude those groups from capturing.

A group can be excluded by adding ?: at its beginning.

let message = "两本书《一本书》和《两本书》"
const re1 = /(?:《)(?<name2>.+?)(?:》)/ig
const result = message.matchAll(re1)
for (const item of result) {
  console.log(item)
}
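
To make the difference visible, this sketch compares the result arrays of a capturing and a non-capturing version of the same pattern (variable names are illustrative):

```javascript
const message = "两本书《一本书》和《两本书》"

const capturing = [...message.matchAll(/(《)(.+?)(》)/g)]
const nonCapturing = [...message.matchAll(/(?:《)(.+?)(?:》)/g)]

// Each match array holds the full match followed by one entry per capturing group.
console.log(capturing[0].length)    // 4: full match + 3 groups
console.log(nonCapturing[0].length) // 2: full match + 1 group
console.log(nonCapturing[0][1])     // "一本书"
```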

9.5. OR of capturing groups

OR (alternation) in regular expressions is simply a logical "or".

  • In regular expressions, it is written with a vertical bar |;
  • It is usually used together with a capture group, which lists the alternative values;
let message = "cbacbadddabcabccaccaabcaaaa"
const re1 = /(abc|cba){2,}/ig
const result = message.match(re1)
console.log(result) // ["cbacba", "abcabc"]
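
The reason | is usually combined with a group is that, without one, it splits the entire pattern. A sketch with a made-up sentence (a non-capturing group is used here only to limit the scope of the alternation):

```javascript
const pets = "I love cats. I love dogs. dogs bark."

// Without a group, | splits the whole pattern: "I love cats" OR "dogs".
const ungrouped = pets.match(/I love cats|dogs/g)

// With a group, only the part inside the parentheses alternates.
const grouped = pets.match(/I love (?:cats|dogs)/g)

console.log(ungrouped) // ["I love cats", "dogs", "dogs"]
console.log(grouped)   // ["I love cats", "I love dogs"]
```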

Origin blog.csdn.net/m0_51636525/article/details/125659329