Java Regex: regular expressions

There are two types of characters in regular expressions:

  • Ordinary characters, which match themselves literally
  • Metacharacters, which have special meanings

Character set

Between matching one specific character and matching any character lies the character group (also called a character class): square brackets [] match any single character listed inside them. For example, [abc] matches a, b, or c.
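
A minimal Java sketch of a character group, using only java.util.regex. The class CharClassDemo and its helper findAll are hypothetical names chosen for illustration. Note that String.matches succeeds only if the entire string matches, while Matcher.find looks for matches anywhere in the input.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CharClassDemo {
    // Hypothetical helper: concatenates every substring the regex finds, left to right
    static String findAll(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        StringBuilder sb = new StringBuilder();
        while (m.find()) {
            sb.append(m.group());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // String.matches requires the WHOLE string to match, so [abc] accepts exactly one of a, b, c
        System.out.println("b".matches("[abc]"));   // true
        System.out.println("d".matches("[abc]"));   // false
        System.out.println("ab".matches("[abc]"));  // false: a group matches a single character
        // Each match of [aeiou] is one vowel
        System.out.println(findAll("[aeiou]", "regular expression")); // euaeeio
    }
}
```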

Metacharacter

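  1. \ (backslash): escape character; makes the next metacharacter literal (\. matches a dot) or gives a letter a special meaning (\d)
  2. . (dot): any single character except a line terminator (by default), see https://blog.csdn.net/qq_37335220/article/details/114166483
  3. *: zero or more occurrences of the preceding element
  4. ?: zero or one occurrence of the preceding element
  5. +: one or more occurrences of the preceding element
  6. -: inside square brackets, a range of characters, e.g. [a-z], see https://blog.csdn.net/qq_37335220/article/details/114168725
  7. ^: start of the input (start of each line in MULTILINE mode); as the first character inside square brackets, negation, e.g. [^0-9], see https://blog.csdn.net/qq_37335220/article/details/114169068
  8. {: starts a repetition count such as {m,n}
  9. |: alternation, matches either the expression on its left or the one on its right
  10. $: end of the input, also just before a final line terminator (end of each line in MULTILINE mode)
  11. \A: start of the input
  12. \Z: end of the input, before a final line terminator if there is one
  13. \z: absolute end of the input
  14. \b: word boundary
  15. \t: tab character
  16. \n: newline character
  17. \d: a digit, equivalent to [0-9]
  18. \w: a word character, equivalent to [a-zA-Z_0-9]
  19. (: starts a group
  20. \s: a whitespace character, equivalent to [ \t\n\x0B\f\r]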
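
A few of these metacharacters in action, as a minimal Java sketch. The class MetacharDemo and its helper contains are hypothetical names for illustration. Keep in mind that each regex backslash is doubled inside a Java string literal, so the regex \d is written "\\d".

```java
import java.util.regex.Pattern;

class MetacharDemo {
    // Hypothetical helper: true if the regex matches somewhere in the input (not necessarily all of it)
    static boolean contains(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        // \d+ : one or more digits, matched against the whole string
        System.out.println("2024".matches("\\d+"));               // true
        // . does not match a line terminator by default
        System.out.println("a\nb".matches("a.b"));                // false
        // \. is an escaped dot, i.e. a literal '.'
        System.out.println("a.b".matches("a\\.b"));               // true
        System.out.println("axb".matches("a\\.b"));               // false
        // \b marks a word boundary
        System.out.println(contains("\\bcat\\b", "a cat sat"));   // true
        System.out.println(contains("\\bcat\\b", "concatenate")); // false
        // \Z may stop before a final line terminator; \z is the absolute end
        System.out.println(contains("end\\Z", "the end\n"));      // true
        System.out.println(contains("end\\z", "the end\n"));      // false
    }
}
```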

Quantifier

  1. +: one or more occurrences of the preceding character. For example, the regular expression ab+c matches abc, abbc, or abbbc, but not ac.
  2. *: zero or more occurrences of the preceding character. For example, ab*c matches ac, abc, or abbbc.
  3. ?: zero or one occurrence of the preceding character, i.e. it may or may not appear. For example, ab?c matches both abc and ac, but not abbc.
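
The three quantifiers can be compared side by side with a small Java sketch. QuantifierDemo and its table method are hypothetical names; the sketch relies on String.matches, which tests the entire input against the regex.

```java
class QuantifierDemo {
    // Hypothetical helper: one line per regex, listing whether each input matches in full
    static String table(String[] regexes, String[] inputs) {
        StringBuilder out = new StringBuilder();
        for (String regex : regexes) {
            out.append(regex).append(':');
            for (String in : inputs) {
                // String.matches tests the entire input against the regex
                out.append(' ').append(in).append('=').append(in.matches(regex));
            }
            out.append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String[] regexes = {"ab+c", "ab*c", "ab?c"};
        String[] inputs = {"ac", "abc", "abbc", "abbbc"};
        System.out.print(table(regexes, inputs));
        // ab+c: ac=false abc=true abbc=true abbbc=true
        // ab*c: ac=true abc=true abbc=true abbbc=true
        // ab?c: ac=true abc=true abbc=false abbbc=false
    }
}
```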

The more general syntax for the number of occurrences is {m,n}, with no spaces around the comma. It matches from m to n occurrences, both inclusive. If there is no upper limit, n can be omitted: {m,} means at least m occurrences. If m equals n, write {m}, meaning exactly m occurrences. For example, ab{1,10}c means b can appear 1 to 10 times.
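
A minimal Java sketch of the three counted forms. The class BoundedRepetitionDemo and its abc helper are hypothetical; String.repeat assumes Java 11 or later.

```java
class BoundedRepetitionDemo {
    // Hypothetical helper: "a", then n copies of "b", then "c" (String.repeat needs Java 11+)
    static String abc(int n) {
        return "a" + "b".repeat(n) + "c";
    }

    public static void main(String[] args) {
        // {1,10}: b occurs 1 to 10 times, both bounds inclusive
        System.out.println(abc(0).matches("ab{1,10}c"));   // false
        System.out.println(abc(10).matches("ab{1,10}c"));  // true
        System.out.println(abc(11).matches("ab{1,10}c"));  // false
        // {2,}: at least 2 occurrences, no upper bound
        System.out.println(abc(50).matches("ab{2,}c"));    // true
        // {3}: exactly 3 occurrences
        System.out.println(abc(3).matches("ab{3}c"));      // true
        System.out.println(abc(2).matches("ab{3}c"));      // false
    }
}
```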

Grouping

An expression can be enclosed in parentheses () to form a group; for example, in a(bc)d, bc is a group. Groups can be nested, as in a(de(fg)). By default every group has a number: numbering starts at 1 and increases from left to right in the order in which the opening parentheses appear. Take the expression:

a(bc)((de)(fg))

The string abcdefg matches this expression: group 1 is bc, group 2 is defg, group 3 is de, and group 4 is fg. Group 0 is special: it contains the entire matched string, here abcdefg.
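
The numbering can be checked with java.util.regex.Matcher, whose groupCount() excludes group 0 and whose group(int) returns the text of a numbered group. GroupNumberingDemo and groups are hypothetical names for this sketch.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupNumberingDemo {
    // Hypothetical helper: groups 0..groupCount() of a full match, or null if there is no match
    static String[] groups(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.matches()) {
            return null;
        }
        // groupCount() does not include group 0, hence the +1
        String[] result = new String[m.groupCount() + 1];
        for (int i = 0; i <= m.groupCount(); i++) {
            result[i] = m.group(i);
        }
        return result;
    }

    public static void main(String[] args) {
        String[] g = groups("a(bc)((de)(fg))", "abcdefg");
        for (int i = 0; i < g.length; i++) {
            System.out.println("group " + i + ": " + g[i]);
        }
        // group 0: abcdefg, group 1: bc, group 2: defg, group 3: de, group 4: fg
    }
}
```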

The substring matched by each group is remembered, as if captured, and can be referred to later; that is why these default groups are called capturing groups.

Square brackets [] match one of several characters; parentheses () combined with the metacharacter | match one of several subexpressions, for example:

(https?|ftp|file)
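
A minimal Java sketch of this alternation, assuming (for illustration only) that the scheme is followed by "://". AlternationDemo and scheme are hypothetical names; Matcher.lookingAt anchors the match at the start of the input without requiring the whole input to match.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AlternationDemo {
    // Group 1 captures whichever alternative matched; the "://" suffix is an assumption of this sketch
    private static final Pattern SCHEME = Pattern.compile("(https?|ftp|file)://");

    // Hypothetical helper: the scheme at the start of the URL, or null if no alternative matches
    static String scheme(String url) {
        Matcher m = SCHEME.matcher(url);
        return m.lookingAt() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(scheme("https://example.com"));        // https
        System.out.println(scheme("ftp://example.com"));          // ftp
        System.out.println(scheme("file:///tmp/notes.txt"));      // file
        System.out.println(scheme("mailto:someone@example.com")); // null
    }
}
```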

In a regular expression, a backslash followed by a group number refers back to the text matched by that earlier group. This is called a backreference, for example:

<(\w)>(.*)</\1>
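
Here \1 requires the closing tag to repeat exactly the tag name captured by group 1. A minimal Java sketch with the same pattern (backslashes doubled in the string literal); BackreferenceDemo and content are hypothetical names. Because the pattern uses \w rather than \w+, only single-character tag names such as b or i match.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BackreferenceDemo {
    // \1 must repeat exactly the text captured by group 1 (the tag name)
    private static final Pattern TAG = Pattern.compile("<(\\w)>(.*)</\\1>");

    // Hypothetical helper: the content between matching open and close tags, or null
    static String content(String html) {
        Matcher m = TAG.matcher(html);
        return m.matches() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        System.out.println(content("<b>bold</b>")); // bold
        System.out.println(content("<b>bold</i>")); // null: the closing tag differs
    }
}
```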

Referring to groups by number can be confusing. Instead, a group can be given a name and referred to by that name. The syntax for naming a group is (?<name>X), and the syntax for referring to it is \k<name>. For example, the example above can be rewritten as:

<(?<ele>\w)>(.*)</\k<ele>>
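
In Java, Matcher.group(String) reads a named group, and a named group still receives a number in the usual left-to-right order, so here ele is also group 1. NamedGroupDemo and parse are hypothetical names for this sketch.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NamedGroupDemo {
    // (?<ele>\w) names the group "ele"; \k<ele> refers back to it by name
    private static final Pattern TAG = Pattern.compile("<(?<ele>\\w)>(.*)</\\k<ele>>");

    // Hypothetical helper: {tag name, content} for a matching pair of tags, or null
    static String[] parse(String html) {
        Matcher m = TAG.matcher(html);
        if (!m.matches()) {
            return null;
        }
        // A named group is still numbered: "ele" is group 1, (.*) is group 2
        return new String[] {m.group("ele"), m.group(2)};
    }

    public static void main(String[] args) {
        String[] r = parse("<i>italic</i>");
        System.out.println(r[0] + " / " + r[1]);     // i / italic
        System.out.println(parse("<i>italic</b>")); // null
    }
}
```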

In Java, regular expressions are written as strings, and the backslash \ is also an escape character in Java string syntax. So every \ in a regular expression must be written as two backslashes in a Java string literal, that is, "\\" (for example, \d becomes "\\d"). To match the backslash character itself, the regular expression is \\, which takes four backslashes in a Java string literal, that is, "\\\\".
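
A minimal Java sketch of the two layers of escaping. BackslashDemo and splitOnBackslash are hypothetical names; Pattern.quote is the standard library method that turns an arbitrary string into a regex matching it literally, which avoids manual doubling.

```java
import java.util.regex.Pattern;

class BackslashDemo {
    // Hypothetical helper: splits on a literal backslash; the regex is \\, written "\\\\" in Java source
    static String[] splitOnBackslash(String s) {
        return s.split("\\\\");
    }

    public static void main(String[] args) {
        String path = "C:\\temp\\logs";                          // the actual text is C:\temp\logs
        // Regex \d (one digit) is written "\\d" in Java source
        System.out.println("7".matches("\\d"));                  // true
        // Regex C:\\temp\\logs matches the literal text C:\temp\logs
        System.out.println(path.matches("C:\\\\temp\\\\logs"));  // true
        // Pattern.quote escapes the whole string, so no manual doubling is needed
        System.out.println(path.matches(Pattern.quote(path)));   // true
        System.out.println(String.join(" | ", splitOnBackslash(path))); // C: | temp | logs
    }
}
```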


Origin blog.csdn.net/qq_37335220/article/details/114157532