JavaSE personal trash review notes 09: Java regular expressions

Java regular expressions
A string is itself a simple regular expression. For example, the regular expression Hello World matches the string "Hello World".

The dot (.) is also a regular expression: it matches any single character, such as "a" or "1".

The java.util.regex package mainly consists of the following three classes:

Pattern class:
A Pattern object is the compiled representation of a regular expression. The Pattern class has no public constructor. To create a Pattern object, call one of its public static compile methods, which return a Pattern object. These methods take a regular expression as their first parameter.

Matcher class:
A Matcher object is the engine that interprets a pattern and performs match operations against an input string. Like Pattern, Matcher has no public constructor. Call the matcher method of a Pattern object to obtain a Matcher object.
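
As a minimal sketch of the two steps described above, a Pattern is obtained from Pattern.compile and a Matcher from Pattern.matcher. The class name PatternMatcherSketch, the helper firstNumber, and the sample input are illustrative and not part of the original notes.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PatternMatcherSketch {

    // Hypothetical helper: returns the first run of digits in the input,
    // or null if the input contains no digit.
    static String firstNumber(String input) {
        Pattern pattern = Pattern.compile("[0-9]+"); // static factory, no public constructor
        Matcher matcher = pattern.matcher(input);    // the Matcher is created by the Pattern
        return matcher.find() ? matcher.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(firstNumber("order 42 shipped")); // prints 42
    }
}
```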

PatternSyntaxException:
PatternSyntaxException is an unchecked exception class that indicates a syntax error in a regular expression pattern.

The following example uses the regular expression .*runoob.* to check whether a string contains the substring runoob:

import java.util.regex.*;

class RegexExample1 {
    public static void main(String[] args) {
        String content = "I am noob " +
            "from runoob.com.";

        // .* on both sides lets the pattern cover the whole string
        String pattern = ".*runoob.*";

        boolean isMatch = Pattern.matches(pattern, content);
        System.out.println("Does the string contain the substring 'runoob'? " + isMatch);
    }
}

Output of the example:

Does the string contain the substring 'runoob'? true
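
Pattern.matches succeeds only when the whole input matches, which is why the pattern above is wrapped in .* on both sides. As a hedged alternative sketch, Matcher.find looks for the pattern anywhere in the input, and an invalid pattern can be detected by catching PatternSyntaxException. The class name ContainsSketch and both helper methods are illustrative.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class ContainsSketch {

    // True if the regex matches anywhere in the input (no leading/trailing .* needed).
    static boolean contains(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    // True if the regex compiles. PatternSyntaxException is unchecked,
    // so the compiler does not force this try/catch.
    static boolean isValidRegex(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String content = "I am noob from runoob.com.";
        System.out.println(contains("runoob", content));        // true
        System.out.println(Pattern.matches("runoob", content)); // false: not the whole string
        System.out.println(isValidRegex("(unclosed"));          // false: unclosed group
    }
}
```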

Regular expression syntax
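In other languages, \\ means: I want to insert an ordinary (literal) backslash into the regular expression, please do not give it any special meaning.
In Java, \\ means: I am inserting a regular-expression backslash, so the character after it has special meaning. In Java source code the regular expression \d is therefore written as the string literal "\\d", and a literal backslash is matched with "\\\\".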
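
Because the Java compiler also treats the backslash as an escape character inside string literals, each backslash of a regular expression has to be doubled in source code. A small sketch of this, with an illustrative class name and helper methods that are not from the original notes:

```java
import java.util.regex.Pattern;

public class BackslashSketch {

    // The regex \d+ is written "\\d+" in Java source.
    static boolean isDigits(String s) {
        return Pattern.matches("\\d+", s);
    }

    // To match one literal backslash the regex is \\, which is "\\\\" in Java source.
    static boolean isTempPath(String s) {
        return Pattern.matches("C:\\\\temp", s);
    }

    public static void main(String[] args) {
        System.out.println(isDigits("2024"));        // true
        System.out.println(isTempPath("C:\\temp"));  // true: the text is C:\temp

        // Pattern.quote escapes a whole literal, so the dot below is not a wildcard.
        System.out.println(Pattern.matches(Pattern.quote("a.b"), "a.b")); // true
        System.out.println(Pattern.matches(Pattern.quote("a.b"), "axb")); // false
    }
}
```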

Character

Description

\

Marks the next character as a special character, a literal, a backreference, or an octal escape. For example, "n" matches the character "n", while "\n" matches a newline character. The sequence "\\" matches "\", and "\(" matches "(".

^

Matches the position at the beginning of the input string. If the Pattern.MULTILINE flag is set, ^ also matches the position after a line terminator such as "\n" or "\r".

$

Matches the position at the end of the input string. If the Pattern.MULTILINE flag is set, $ also matches the position before a line terminator such as "\n" or "\r".

*

Matches the preceding character or subexpression zero or more times. For example, "zo*" matches "z" and "zoo". * is equivalent to {0,}.

+

Matches the preceding character or subexpression one or more times. For example, "zo+" matches "zo" and "zoo" but not "z". + is equivalent to {1,}.

?

Matches the preceding character or subexpression zero or one time. For example, "do(es)?" matches "do" or "does". ? is equivalent to {0,1}.

{n}

n is a non-negative integer. Match exactly n times. For example, "o{2}" does not match the "o" in "Bob", but it matches the two "o"s in "food".

{n,}

n is a non-negative integer. Match at least n times. For example, "o{2,}" does not match the "o" in "Bob", but matches all o in "foooood". "o{1,}" is equivalent to "o+". "o{0,}" is equivalent to "o*".

{n,m}

m and n are non-negative integers, where n <= m. Matches at least n times and at most m times. For example, "o{1,3}" matches the first three o's in "fooooood". "o{0,1}" is equivalent to "o?". Note: you cannot insert spaces between the comma and the numbers.

?

When this character immediately follows any other quantifier (*, +, ?, {n}, {n,}, {n,m}), the matching mode is "non-greedy". A non-greedy pattern matches as little of the searched string as possible, while the default greedy pattern matches as much as possible. For example, in the string "oooo", "o+?" matches only a single "o", and "o+" matches all the "o"s.

.

Matches any single character except line terminators such as "\n" and "\r". To match any character including line terminators, use a pattern such as "[\s\S]" or compile the pattern with the Pattern.DOTALL flag.

(pattern)

Matches pattern and captures the matched subexpression. In Java, the captured text is retrieved with Matcher.group(n), and it can be referenced in a replacement string as $1, $2, and so on ($0 is the whole match). To match literal parentheses, use "\(" or "\)".

(?:pattern)

Matches pattern but does not capture the matched subexpression, that is, it is a non-capturing match and does not store the match for later use. This is useful when combining pattern parts with the "or" character (|). For example, "industr(?:y|ies)" is a more economical expression than "industry|industries".

(?=pattern)

Positive lookahead. Matches at a position where the text that follows matches pattern. It is a non-capturing match, that is, the match is not stored for later use. For example, "Windows (?=95|98|NT|2000)" matches "Windows" in "Windows 2000" but not "Windows" in "Windows 3.1". A lookahead does not consume characters: after a match, the search for the next match starts immediately after the previous match, not after the characters that made up the lookahead.

(?!pattern)

Negative lookahead. Matches at a position where the text that follows does not match pattern. It is a non-capturing match, that is, the match is not stored for later use. For example, "Windows (?!95|98|NT|2000)" matches "Windows" in "Windows 3.1" but not "Windows" in "Windows 2000". A lookahead does not consume characters: after a match, the search for the next match starts immediately after the previous match, not after the characters that made up the lookahead.

x | y

Matches x or y. For example, "z|food" matches "z" or "food". "(z|f)ood" matches "zood" or "food".

[xyz]

Character set. Matches any one of the characters it contains. For example, "[abc]" matches the "a" in "plain".

[^xyz]

Negated character set. Matches any character that is not listed. For example, "[^abc]" matches "p", "l", "i", "n" in "plain".

[a-z]

Character range. Matches any character in the specified range. For example, "[a-z]" matches any lowercase letter in the range "a" to "z".

[^a-z]

Negated character range. Matches any character not in the specified range. For example, "[^a-z]" matches any character that is not in the range "a" to "z".

\b

Matches a word boundary, that is, the position between a word character and a non-word character such as a space. For example, "er\b" matches the "er" in "never" but not the "er" in "verb".

\B

Matches a non-word boundary. "er\B" matches the "er" in "verb", but not the "er" in "never".

\cx

Matches the control character indicated by x. For example, \cM matches Control-M, the carriage return. The value of x must be in the range A-Z or a-z. If it is not, c is assumed to be the literal character "c".

\d

Matches a digit character. Equivalent to [0-9].

\D

Matches a non-digit character. Equivalent to [^0-9].

\f

Matches a form feed. Equivalent to \x0c and \cL.

\n

Matches a newline character. Equivalent to \x0a and \cJ.

\r

Matches a carriage return character. Equivalent to \x0d and \cM.

\s

Matches any whitespace character, including spaces, tabs, and form feeds. In Java this is equivalent to [ \t\n\x0B\f\r].

\S

Matches any non-whitespace character. In Java this is equivalent to [^ \t\n\x0B\f\r].

\t

Matches a tab. Equivalent to \x09 and \cI.

\v

Matches a vertical tab in many regex flavors (equivalent to \x0b and \cK). In Java 8 and later, \v matches any vertical whitespace character, which includes \n, \x0B, \f, and \r.

\w

Matches any word character, including underscores. Equivalent to "[A-Za-z0-9_]".

\W

Match any non-word character. Equivalent to "[^A-Za-z0-9_]".

\xn

Matches n, where n is a hexadecimal escape code. The hexadecimal escape code must be exactly two digits long. For example, "\x41" matches "A". "\x041" is equivalent to "\x04" followed by "1". ASCII codes are allowed in regular expressions.

\num

Matches num, where num is a positive integer: a backreference to the text captured by group num. For example, "(.)\1" matches two consecutive identical characters.

\n

Identifies an octal escape code or backreference. If there are at least n capturing subexpressions before \n, then n is a backreference. Otherwise, if n is an octal number (0-7), then n is an octal escape code.

\nm

Identifies an octal escape code or backreference. If there are at least nm capture subexpressions before \nm, then nm is a backreference. If there are at least n captures before \nm, then n is a backreference followed by the character m. If neither of the preceding conditions exist, \nm matches the octal value nm, where n and m are octal numbers (0-7).

\nml

When n is an octal number (0-3), m and l are octal numbers (0-7), match the octal escape code nml.

\un

Matches n, where n is a Unicode character written as a four-digit hexadecimal number. For example, \u00A9 matches the copyright symbol (©).
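
Several rows of the table above can be checked directly in Java. The following is a minimal sketch, assuming an illustrative class SyntaxSketch and a hypothetical helper firstMatch that returns the first match of a pattern (or null); the sample strings are taken from the table where possible.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SyntaxSketch {

    // Returns the first match of regex in input, or null when nothing matches.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Greedy vs. non-greedy quantifier.
        System.out.println(firstMatch("o+", "oooo"));   // oooo
        System.out.println(firstMatch("o+?", "oooo"));  // o

        // Positive and negative lookahead: the version number is not consumed.
        System.out.println(firstMatch("Windows (?=95|98|NT|2000)", "Windows 2000")); // "Windows "
        System.out.println(firstMatch("Windows (?!95|98|NT|2000)", "Windows 2000")); // null

        // Backreference: \1 repeats whatever group 1 captured.
        System.out.println(firstMatch("(.)\\1", "hello")); // ll

        // Word boundary: only the "er" that ends "never" qualifies.
        System.out.println(firstMatch("er\\b", "verb never")); // er

        // Capturing group read with Matcher.group(int).
        Matcher m = Pattern.compile("industr(y|ies)").matcher("industries");
        if (m.matches()) {
            System.out.println(m.group(1)); // ies
        }
    }
}
```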

How to write regular expressions in Java

  1. Matching only
    1). Implementation 1: match a single digit.
public void regex1() {
    // The string to match
    String str = "8";

    // The regular expression
    String regex = "[0-9]";

    // Pattern.matches returns true if the whole string matches and false otherwise; here it returns true.
    boolean flag = Pattern.matches(regex, str);

    System.out.println(flag);
}

    2). Implementation 2: match 3 to 5 letters (inclusive), upper- or lowercase.

public void regex2() {
    // The string to match
    String str = "hello";

    // The regular expression
    String regex = "[a-zA-Z]{3,5}";

    // Print the result of the match; here it is true.
    System.out.println(str.matches(regex));
}

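    3). Implementation 3 (the fastest of the three when the same regular expression is reused, because the pattern needs to be compiled only once): match an 11-digit phone number. Rules: the first digit is 1, the second digit is one of 2, 3, 7, 8, and the remaining 9 digits must not include 4.

public void regex3() {
    // The string to match
    String str = "13656231253";

    // The regular expression
    String regex = "1[2378][0-35-9]{9}";

    // Compile the given regular expression into a pattern. If many matches are needed
    // with the same regex, move this line into a static initializer and reuse the instance p.
    Pattern p = Pattern.compile(regex);

    // Create a matcher that matches the given input against this pattern.
    Matcher m = p.matcher(str);

    // Try to match the entire input against the pattern.
    boolean flag = m.matches();

    // Print the result of the match; here it is true.
    System.out.println(flag);
}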
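
The comment in implementation 3 suggests compiling the pattern once and reusing it. A minimal sketch of that idea follows; the class name PhoneValidator and the method isValidPhone are illustrative. A Pattern instance is immutable and safe to share between threads, while each call creates its own Matcher.

```java
import java.util.regex.Pattern;

public class PhoneValidator {

    // Compiled once when the class is loaded and reused for every call.
    private static final Pattern PHONE = Pattern.compile("1[2378][0-35-9]{9}");

    static boolean isValidPhone(String candidate) {
        return PHONE.matcher(candidate).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValidPhone("13656231253")); // true
        System.out.println(isValidPhone("13656241253")); // false: contains a 4 after the second digit
        System.out.println(isValidPhone("1365623125"));  // false: only 10 digits
    }
}
```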


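  2. Replacing.
public void regexReplace() {
    // The string to process
    String str = "12a6B985Ccv65";

    // Regular expression for runs of letters
    String regex = "[a-zA-Z]+";

    // Regular expression for runs of digits
    String regex2 = "\\d+";

    // Replace each run of letters with the & symbol; prints 12&6&985&65
    System.out.println(str.replaceAll(regex, "&"));

    // Replace each single digit or run of digits with 0; prints 0a0B0Ccv0
    System.out.println(str.replaceAll(regex2, "0"));
}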
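
The replacement string passed to replaceAll may also refer to capturing groups with $1, $2, and so on. The following sketch is an illustration under that assumption, reusing the sample string above; the class and method names are hypothetical.

```java
public class ReplaceGroupSketch {

    // Wraps every run of digits in square brackets; $1 refers to capturing group 1.
    static String bracketNumbers(String input) {
        return input.replaceAll("(\\d+)", "[$1]");
    }

    public static void main(String[] args) {
        System.out.println(bracketNumbers("12a6B985Ccv65")); // [12]a[6]B[985]Ccv[65]
    }
}
```

Because $ and \ have special meaning in the replacement string, a replacement that should be taken literally can be passed through Matcher.quoteReplacement first.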
  3. Splitting: split the string at uppercase letters.
public void outputStr() {
    String str = "oneTtowTthreeDfourJfive";

    String regex = "[A-Z]";

    String[] arr = str.split(regex);

    for (String s : arr) {
        System.out.print(s + " ");
    }
}

Output: one tow three four five
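
In the example above the uppercase letters are consumed as delimiters and disappear from the result. As a hedged variation, a zero-width lookahead can split in front of each uppercase letter so that the letter is kept. The class name, method name, and input below are illustrative.

```java
import java.util.Arrays;

public class SplitSketch {

    // Splits before each uppercase letter; the lookahead is zero-width,
    // so the letter stays at the start of the next piece.
    static String[] splitBeforeUppercase(String input) {
        return input.split("(?=[A-Z])");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitBeforeUppercase("oneTwoThree")));
        // [one, Two, Three]
    }
}
```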

start and end methods
The following example counts the occurrences of the word "cat" in the input string and prints the start and end index of each match:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexMatches
{
    private static final String REGEX = "\\bcat\\b";
    private static final String INPUT =
                                    "cat cat cat cattie cat";

    public static void main(String[] args) {
        Pattern p = Pattern.compile(REGEX);
        Matcher m = p.matcher(INPUT); // get a matcher object
        int count = 0;

        while (m.find()) {
            count++;
            System.out.println("Match number " + count);
            System.out.println("start(): " + m.start());
            System.out.println("end(): " + m.end());
        }
    }
}

Compiling and running the example above produces the following output:

Match number 1
start(): 0
end(): 3
Match number 2
start(): 4
end(): 7
Match number 3
start(): 8
end(): 11
Match number 4
start(): 19
end(): 22
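
The loop over find() above can be wrapped in a small counting helper, which also makes the effect of the \b word boundaries visible. This is a sketch with an illustrative class name and a hypothetical countMatches method.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CountSketch {

    // Counts the non-overlapping matches of regex in input.
    static int countMatches(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String input = "cat cat cat cattie cat";
        System.out.println(countMatches("\\bcat\\b", input)); // 4
        System.out.println(countMatches("cat", input));       // 5: also the "cat" in "cattie"
    }
}
```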

The matches and lookingAt methods
The matches and lookingAt methods both try to match an input sequence against a pattern. The difference is that matches requires the entire sequence to match, while lookingAt does not.
Although lookingAt does not need to match the entire input, the match must start at the first character.

Both methods always start at the beginning of the input string.

The following example illustrates the difference:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexMatches
{
    private static final String REGEX = "foo";
    private static final String INPUT = "fooooooooooooooooo";
    private static final String INPUT2 = "ooooofoooooooooooo";
    private static Pattern pattern;
    private static Matcher matcher;
    private static Matcher matcher2;

    public static void main(String[] args) {
        pattern = Pattern.compile(REGEX);
        matcher = pattern.matcher(INPUT);
        matcher2 = pattern.matcher(INPUT2);

        System.out.println("Current REGEX is: " + REGEX);
        System.out.println("Current INPUT is: " + INPUT);
        System.out.println("Current INPUT2 is: " + INPUT2);

        System.out.println("lookingAt(): " + matcher.lookingAt());
        System.out.println("matches(): " + matcher.matches());
        System.out.println("lookingAt(): " + matcher2.lookingAt());
    }
}

Compiling and running the example above produces the following output:

Current REGEX is: foo
Current INPUT is: fooooooooooooooooo
Current INPUT2 is: ooooofoooooooooooo
lookingAt(): true
matches(): false
lookingAt(): false
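
For comparison, the three ways of applying a pattern (matches, lookingAt, and find) can be placed side by side. This is a minimal sketch with illustrative names; only the pattern foo comes from the example above.

```java
import java.util.regex.Pattern;

public class AnchoringSketch {

    static final Pattern FOO = Pattern.compile("foo");

    // The whole input must match.
    static boolean wholeInput(String s) { return FOO.matcher(s).matches(); }

    // The match must start at the first character but may end early.
    static boolean prefixOnly(String s) { return FOO.matcher(s).lookingAt(); }

    // The match may start anywhere in the input.
    static boolean anywhere(String s) { return FOO.matcher(s).find(); }

    public static void main(String[] args) {
        for (String s : new String[] {"foo", "foooo", "oofoo"}) {
            System.out.println(s + ": matches=" + wholeInput(s)
                    + " lookingAt=" + prefixOnly(s)
                    + " find=" + anywhere(s));
        }
        // foo:   matches=true  lookingAt=true  find=true
        // foooo: matches=false lookingAt=true  find=true
        // oofoo: matches=false lookingAt=false find=true
    }
}
```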

The replaceFirst and replaceAll methods
The replaceFirst and replaceAll methods replace text that matches a regular expression. The difference is that replaceFirst replaces only the first match, while replaceAll replaces all matches.

The following example illustrates replaceAll:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexMatches
{
    private static String REGEX = "dog";
    private static String INPUT = "The dog says meow. " +
                                    "All dogs say meow.";
    private static String REPLACE = "cat";

    public static void main(String[] args) {
        Pattern p = Pattern.compile(REGEX);
        // get a matcher object
        Matcher m = p.matcher(INPUT);
        INPUT = m.replaceAll(REPLACE);
        System.out.println(INPUT);
    }
}
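
The example above only calls replaceAll. As a small sketch of the difference between the two methods, using the same input and pattern but with illustrative class and method names:

```java
import java.util.regex.Pattern;

public class ReplaceFirstSketch {

    static final Pattern DOG = Pattern.compile("dog");

    // Replaces only the first occurrence of "dog".
    static String replaceFirstDog(String input) {
        return DOG.matcher(input).replaceFirst("cat");
    }

    // Replaces every occurrence of "dog".
    static String replaceAllDogs(String input) {
        return DOG.matcher(input).replaceAll("cat");
    }

    public static void main(String[] args) {
        String input = "The dog says meow. All dogs say meow.";
        System.out.println(replaceFirstDog(input)); // The cat says meow. All dogs say meow.
        System.out.println(replaceAllDogs(input));  // The cat says meow. All cats say meow.
    }
}
```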

The appendReplacement and appendTail methods
The Matcher class also provides the appendReplacement and appendTail methods for text replacement.

The following example illustrates them:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexMatches
{
    private static String REGEX = "a*b";
    private static String INPUT = "aabfooaabfooabfoobkkk";
    private static String REPLACE = "-";

    public static void main(String[] args) {
        Pattern p = Pattern.compile(REGEX);
        // get a matcher object
        Matcher m = p.matcher(INPUT);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            // append the text before the match, then the replacement
            m.appendReplacement(sb, REPLACE);
        }
        // append the remaining text after the last match
        m.appendTail(sb);
        System.out.println(sb.toString());
    }
}

Compiling and running the example above produces the following output:

-foo-foo-foo-kkk
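
With a fixed replacement string the loop above gives the same result as replaceAll. The appendReplacement and appendTail pair becomes more useful when the replacement is computed from each match. The following is a sketch under that assumption; the class name and the doubleNumbers helper are illustrative.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AppendReplacementSketch {

    // Doubles every integer in the input; the replacement is computed per match.
    static String doubleNumbers(String input) {
        Matcher m = Pattern.compile("\\d+").matcher(input);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            int value = Integer.parseInt(m.group());
            // Appends the text since the previous match, then the computed replacement.
            m.appendReplacement(sb, String.valueOf(value * 2));
        }
        m.appendTail(sb); // copy the text after the last match
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(doubleNumbers("3 apples and 12 pears")); // 6 apples and 24 pears
    }
}
```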

Special thanks
Part of the code above is reproduced from:
Author: Vic_is_new_Here
Link: https://www.jianshu.com/p/3c076c6b2dc8
Source: Jianshu

Origin blog.csdn.net/qq_45864370/article/details/108551823