Slightly more advanced usage of java regular expressions

1. Basic grammar review

symbol	meaning	Example	Example description	Other examples
`\\b`	Match the boundary of the target string, the boundary refers to the end of the string or before the space	`wang\\b`	wang matching boundary
`\\B`	Match non-boundary of target string	`wang\\B`	Match non-boundary wang
`\\d和\\D`,`\\w和\\W`	`\\d和\\D`means `[0-9]`sum `[^0-9]`, `\\w和\\W`means `[0-9a-zA-Z]`sum`[^0-9a-zA-Z]`
`	`	Match \| the expression before or after	ab\|cd	Match ab or cd
`？`				? Only the character closest to it takes effect.
`(?!)`		`a(?i)bc`	Match abc characters, abc is case-insensitive, only b is case-insensitive
``
``
``
``

Basic grammar test:

public static void main(String[] args) {
    
    
        String content = "a1_2\n1c_8挖";
//        String regStr = "[a-z]";//匹配a-z之间任意一个字符
//        String regStr = "[^a-z]";//匹配不在a-z之间任意一个字符
//        String regStr = "//D";//匹配不是数字的任意一个字符
//        String regStr = "\\w";//匹配数字、字母（大小写）、下划线的任意一个字符;等价于[a-zA-z0-9_]
//         String regStr = "[a-zA-z0-9.]";
//        String regStr = ".";//匹配除\n之外的所有任意一个字符

//        String regStr = "\\s";//匹配空白字符（包括空格、制表、回车）
//        String regStr = "[abcd]";//匹配在abcd中的任意一个字符
//        String regStr = "";//匹配abc字符,abc不区分大小写，只有b不区分大小写："a((?i)b)c"
        
        //String regStr = "\\d|挖"; //选择匹配符
        //String regStr = "_(?!.*_)"; //选择匹配符
        String regStr = "(_)(\\d)"; //选择匹配符 
        
        //创建pattern是制定大小写不敏感
        Pattern pattern = Pattern.compile(regStr/*,Pattern.CASE_INSENSITIVE*/);
        Matcher matcher = pattern.matcher(content);

        while (matcher.find()){
    
    
            System.out.println("找到字符串【"+matcher.group(0)+"】符合条件");//输出匹配到的整个字符串
            System.out.println("找到字符串【"+matcher.group(1)+"】符合条件");//输出匹配到的第一个分组
            System.out.println("找到字符串【"+matcher.group(2)+"】符合条件");//输出匹配到的第二个分组
            System.out.println("起始位置："+matcher.start());
            System.out.println("结束位置："+matcher.end());
        }
    }

//输出
找到字符串【_2】符合条件
找到字符串【_】符合条件
找到字符串【2】符合条件
起始位置：2
结束位置：4
找到字符串【_8】符合条件
找到字符串【_】符合条件
找到字符串【8】符合条件
起始位置：7
结束位置：9

2. Grouping

Grouping is divided into capturing grouping and non-capturing grouping. It will be more economical if it is not captured, so if you don’t need to capture it, you don’t need to capture it

(1) Capture grouping example

    public static void main(String[] args) {
    
    
        String content = "industry1551jkh4546askindustriesldsk4444";
        String regStr = "(?<thisaname>\\d\\d)\\d\\d";
        //创建pattern是制定大小写不敏感
        Pattern pattern = Pattern.compile(regStr/*,Pattern.CASE_INSENSITIVE*/);
        Matcher matcher = pattern.matcher(content);

        while (matcher.find()){
    
    
            System.out.println("找到："+matcher.group(0));
            System.out.println("找到[通过索引]："+matcher.group(1));
            System.out.println("找到[通过组名]："+matcher.group("thisaname"));
        }
    }
//输出
找到：1551
找到[通过索引]：15
找到[通过组名]：15
找到：4546
找到[通过索引]：45
找到[通过组名]：45
找到：4444
找到[通过索引]：44
找到[通过组名]：44

(2) Non-capturing grouping

    public static void main(String[] args) {
    
    
        String content = "industry1551jkh4546askindustriesldsk4444wwq你好，wwqhello，wwqisagunis";
      
      //（1）
        String regStr = "industr(y|ies)";//分组会被捕获
      //（2）
        //String regStr = "industr(?:y|ies)";//分组不被捕获,效果和「industr(y|ies)」一样只是不捕获
      //（3）
        //String regStr = "wwq(?=hello|is)";//分组不被捕获。只匹配包含分组条件的前面的匹配字符，此处为匹配后面为hello或者is的wwq
      //（4）
        //String regStr = "wwq(?!hello|is)";//分组不被捕获。只匹配包含分组条件的前面的匹配字符，此处为匹配后面不为hello或者is的wwq

        //创建pattern是制定大小写不敏感
        Pattern pattern = Pattern.compile(regStr/*,Pattern.CASE_INSENSITIVE*/);
        Matcher matcher = pattern.matcher(content);

        while (matcher.find()){
    
    
            System.out.println("找到："+matcher.group(0));
        }
    }
//输出
（1）对应输出：matcher.group(1)输出y或ies
  找到：industry
  找到：industries
（2）对应输出：matcher.group(1)异常
  找到：industry
  找到：industries
（3）对应输出：matcher.group(1)异常
  找到：wwq
  找到：wwq
（4）对应输出：matcher.group(1)异常
  找到：wwq

3. Greedy matching

String str="abcaxc";
Patter p="ab.*c";

默认是贪婪匹配，匹配到的字符是abcaxc

若p改为ab.*c?
则改为非贪婪匹配，匹配到的字符是abc

4. Backreference

After the content in the parentheses is captured, it can be used after the parentheses, which is called a backreference. This kind of reference can be inside or outside the expression, the internal backreference is used \\分组号, and the external backreference is used$分组号

	public static void main(String[] args) {
    
    				
				String content = "我..........我要要....要....学学学....编程!";
        String regStr = "\\.";
        //创建pattern是制定大小写不敏感
        Pattern pattern = Pattern.compile(regStr/*,Pattern.CASE_INSENSITIVE*/);

        //将...去掉
        //Matcher matcher = pattern.matcher(content);
        //content = matcher.replaceAll("");
        content = content.replaceAll(regStr, "");
        System.out.println(content);
        /*
            1. "(.)\\1+"，首先匹配出重复的字：(我)我，(要)要要，(学)学学
            2. 将其替换为分组中的字，即 我我-->我，要要要-->要，学学学-->学
         */
        String res = Pattern.compile("(.)\\1+"/*内部反向引用*/).matcher(content).replaceAll("$1"/*外部反向引用*/);
        System.out.println(res);
	}

//输出
我我要要要学学学编程!
我要学编程!

5. 0-wide assertion

The zero-width assertion is used to determine whether the condition for continuing to match is met, and it does not match the string itself

symbol	illustrate	Example	Example description
`?=exp`	Stop when exp is matched
`?!exp`	Stop when no exp is matched
`?<=exp`	Continue if exp is matched
`?<!exp`	If exp is not matched, continue

（1）?=exp

For example, to match the contents of cooking, singing, and doing except ing, only the contents of cooking, sing, and do are taken. At this time, the increment expression can be used to [a-z]*(?=ing)match

Note: The execution steps of the lookahead assertion are as follows: first find the first ing from the right end of the string to be matched (that is, the expression in the lookahead assertion) and then match the previous expression. If it cannot be matched, continue Find the second ing and match the string before the second ing. If it can match, it will match, which conforms to the greediness of the regular.

public static void main(String[] args) {
    
    
        String str = "755iwat888iwat";
        String regexp = "\\d*(?=iwat)";//匹配到exp就停止

        Matcher matcher = Pattern.compile(regexp).matcher(str);
        while (matcher.find()){
    
    
            System.out.println(matcher.group(0));
        }
    }
//输出
755
//解析，遇到第一个iwat就不再继续往下匹配了

（2）?<=exp

For example, (?<=abc).*it can match defg in abcdefg

Note: The late assertion is exactly the opposite of the look-ahead assertion. Its execution steps are as follows: first find the first abc (that is, the expression in the look-ahead assertion) from the leftmost end of the string to be matched, and then match the following Expression, if it cannot be matched, continue to search for the second abc and then match the string after the second abc. If it can be matched, it will be matched.

public static void main(String[] args) {
    
    
        String str = "755iwat888iwat999";
        String regexp = "(?<=iwat)\\d*".trim();//只匹配iwat后面的连续数字

        Matcher matcher = Pattern.compile(regexp).matcher(str);
        while (matcher.find()){
    
    
            System.out.println(matcher.group(0).trim());
        }
    }
//输出
888
999

(3) ?!exp(and ?=expthe opposite)

For example, (?!exp) represents the position before "exp". If "exp" is not true, match this position; if "exp" is true, it will not match.

Example 1:

		public static void main(String[] args) {
    
    
        String str = "755iwat888iwat888999";
        String regexp = "888(?!.*888)".trim();//最后一次出现的888
        Matcher matcher = Pattern.compile(regexp).matcher(str);
        while (matcher.find()){
    
    
            System.out.println(matcher.group(0));
            System.out.println(matcher.start());
        }
    }
//输出
888
14

Example 2:

public static void main(String[] args) {
    
    
        String str = "755iwat888iwat888999";
        String regexp = "[0-9]*(?!iwat)".trim();//匹配后面不是iwat的连续数字
        Matcher matcher = Pattern.compile(regexp).matcher(str);
        while (matcher.find()){
    
    
            System.out.println(matcher.group(0));
            //System.out.println(matcher.start());
        }
    }
//输出
75
88
888999

（4）?<!exp

And ?<=expthe opposite, put it in front and use

public static void main(String[] args) {
    
    
        String str = "755iwat888iwat888999";
        String regexp = "(?<!iwat)[0-9]*".trim();//匹配前面不是iwat的连续数字，应是755，88，88999
        Matcher matcher = Pattern.compile(regexp).matcher(str);
        while (matcher.find()){
    
    
            System.out.println(matcher.group(0));
            //System.out.println(matcher.start());
        }
    }
//输出
755
88
88999

6. Commonly used APIs

//  Pattern类的方法
0. Pattern.compile(regStr/*,Pattern.CASE_INSENSITIVE*/)//返回一个Pattern
1. Pattern.matches(regStr,str)//判断模式能否将字符串整体匹配出来，返回一个boolean
		System.out.println(Pattern.matches(regStr,content));
2. Pattern.matcher(content)//返回一个Matcher

//  Matcher类的方法
3. Matcher.group(0)//返回匹配到的第0个分组
4. matcher.start()//返回匹配到字符串的开始位置
5. matcher.end()
6. matcher.find()//是否有匹配到的字符串的部分序列
7. matcher.matches()//字符串是否从头到尾完全匹配了(matches方法会吞掉一个matcher匹配的分组)
8. matcher.replaceAll(str)//将匹配到的字符串全部替换为str，返回替换后的字符串
  

//String
9. str.replaceAll(regex,replacement)//使用replacement替换str中匹配的regex部分，返回替换后的字符串
10. str.matches(regex)//等价于Pattern.matches(regex, str)，返回boolean
11. str.split(regex)//按照匹配的regex部分对str进行分割，返回一个字符数组，返回的数组中不包括匹配到的部分