Summary of the classic usage of regular expressions in Java - capturing groups

Common uses of regular expressions: matching, splitting, replacing, and extracting (pulling substrings of a specified format out of a string)

[ The use of regular expressions under the String class ]

The String class has several commonly used methods that take a regular expression:

// Returns true if the whole string matches the regular expression regex, otherwise false
boolean  matches(String regex)

// Replaces every substring that matches the regular expression with the given replacement
String   replaceAll(String regex, String replacement)

// Splits the string around matches of the regular expression and returns a String array
String[] split(String regex)

These three methods cover matching, splitting, and replacing. String offers only this limited set of operations, so the regex classes Pattern and Matcher exist to provide more powerful features.
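The three String methods can be tried out with a minimal sketch like the one below. The class name StringRegexDemo, the helper method names, and the sample inputs are illustrative assumptions, not part of the original post. Note that a regex backslash has to be doubled inside a Java string literal, so \d is written "\\d".

```java
import java.util.Arrays;

class StringRegexDemo {
    // matches(): true only if the WHOLE string consists of digits.
    static boolean isAllDigits(String s) {
        return s.matches("\\d+");
    }

    // replaceAll(): collapses every run of whitespace into a single space.
    static String collapseSpaces(String s) {
        return s.replaceAll("\\s+", " ");
    }

    // split(): cuts on commas, allowing optional whitespace around each comma.
    static String[] splitOnCommas(String s) {
        return s.split("\\s*,\\s*");
    }

    public static void main(String[] args) {
        System.out.println(isAllDigits("12345"));                      // true
        System.out.println(isAllDigits("123a5"));                      // false
        System.out.println(collapseSpaces("a   b \t c"));              // a b c
        System.out.println(Arrays.toString(splitOnCommas("x , y,z"))); // [x, y, z]
    }
}
```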

[Using regular expression objects]
Steps:

1. Compile the regular expression into a Pattern object
    Pattern p = Pattern.compile(regex);
    p.split(str); // splitting
2. Get a Matcher object from the Pattern object
    Matcher m = p.matcher(str);
3. Operate on the string through the methods of the Matcher object
    // Matching: attempts to match the entire region against the pattern.
    m.matches();

    // Replacing: replaces every subsequence of the input sequence that matches the pattern with the given replacement string.
    m.replaceAll(String replacement)

    // Extracting: attempts to find the next subsequence of the input sequence that matches the pattern.
    m.find();

In general, the String methods are enough for everything except extraction.
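The three steps can be sketched as follows. The class name PatternStepsDemo, the DIGITS constant, and the helper names are assumptions made for illustration. The sketch compiles the pattern once and reuses it, which is the usual reason to prefer Pattern over the String methods when the same regex is applied many times.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PatternStepsDemo {
    // Step 1: compile the regex once; the Pattern object can be reused.
    static final Pattern DIGITS = Pattern.compile("\\d+");

    static boolean isNumber(String s) {
        Matcher m = DIGITS.matcher(s); // Step 2: get a Matcher for this input
        return m.matches();            // Step 3: match against the whole input
    }

    static String maskDigits(String s) {
        // Replaces every run of digits with a single '#'.
        return DIGITS.matcher(s).replaceAll("#");
    }

    static String[] splitOnDigits(String s) {
        // Splitting is available directly on the Pattern.
        return DIGITS.split(s);
    }

    public static void main(String[] args) {
        System.out.println(isNumber("2024"));                        // true
        System.out.println(isNumber("v2024"));                       // false
        System.out.println(maskDigits("room 12, floor 3"));          // room #, floor #
        System.out.println(Arrays.toString(splitOnDigits("a1b22c"))); // [a, b, c]
    }
}
```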

Let's look at what each of these does:

// Matching: attempts to match the entire region against the pattern.
boolean m.matches();

// Replacing: replaces every subsequence of the input sequence that matches the pattern with the given replacement string.
String m.replaceAll(String replacement)

// Extracting: attempts to find the next subsequence of the input sequence that matches the pattern.
boolean m.find();

Matching stops as soon as the input fails to match, and false is returned immediately. Splitting and replacing operate on the whole string: every place in the string that satisfies the regular expression is processed, so a single call acts on the entire string.
Extraction is different. Each call to boolean m.find() checks whether there is another matching substring and returns a boolean. It is therefore usually used in a while loop: first call m.find() to see whether a next match exists, then call String m.group() to get the matched substring.
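The find()/group() loop described above might look like the sketch below. The class name FindLoopDemo, the helper findAll, and the sample input are illustrative assumptions rather than code from the original post.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FindLoopDemo {
    // Collects every substring of input that matches regex, in order.
    static List<String> findAll(String regex, String input) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {        // is there a next match?
            found.add(m.group()); // the text of that match
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(findAll("\\d+", "order 66, aisle 7, shelf 301")); // [66, 7, 301]
        System.out.println(findAll("\\d+", "no numbers here"));              // []
    }
}
```

Calling m.group() without a successful find() (or matches()) first throws IllegalStateException, which is why the loop always checks find() before reading the group.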

[Summary of common regular expression constructs]

1. Character classes

[abc] a, b, or c (simple class)
[^abc] any character except a, b, or c (negation)
[a-zA-Z] a through z or A through Z, inclusive (range)

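2. Predefined character classes

. any character (may or may not match line terminators)
\d a digit: [0-9]
\D a non-digit: [^0-9]
\s a whitespace character: [ \t\n\x0B\f\r]
\S a non-whitespace character: [^\s]
\w a word character: [a-zA-Z_0-9]
\W a non-word character: [^\w]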
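A few of these character classes in use, as a minimal sketch. The class name CharClassDemo, the helper names, and the sample strings are assumptions for illustration only.

```java
class CharClassDemo {
    // \D matches any non-digit, so removing all of them keeps only the digits.
    static String keepDigits(String s) {
        return s.replaceAll("\\D", "");
    }

    // A simple character class listing the vowels in both cases.
    static String stripVowels(String s) {
        return s.replaceAll("[aeiouAEIOU]", "");
    }

    // \w matches letters, digits, and underscore.
    static boolean isWordCharsOnly(String s) {
        return s.matches("\\w+");
    }

    public static void main(String[] args) {
        System.out.println(keepDigits("+1 (555) 010-9999"));   // 15550109999
        System.out.println(stripVowels("Regular Expression")); // Rglr xprssn
        System.out.println(isWordCharsOnly("user_name42"));    // true
        System.out.println(isWordCharsOnly("user name"));      // false
    }
}
```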

3. Greedy quantifiers

X?       X, once or not at all
X*       X, zero or more times
X+       X, one or more times
X{n}     X, exactly n times
X{n,}    X, at least n times
X{n,m}   X, at least n but not more than m times
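The quantifiers above can be checked with a short sketch. The class name QuantifierDemo and the helper firstMatch are illustrative assumptions. The last two calls show what "greedy" means in practice: the quantifier takes as much input as it can while still allowing the overall match to succeed.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuantifierDemo {
    // Returns the first substring of input matched by regex, or null if none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println("color".matches("colou?r"));  // true  (u zero times)
        System.out.println("colour".matches("colou?r")); // true  (u once)
        System.out.println("a".matches("ab*"));          // true  (b zero times)
        System.out.println("a".matches("ab+"));          // false (b needs at least one)
        System.out.println("2024".matches("\\d{4}"));    // true
        System.out.println("7".matches("\\d{2,}"));      // false

        // Greedy: {2,4} takes four digits even though two would be enough.
        System.out.println(firstMatch("\\d{2,4}", "1234567")); // 1234
        // Greedy: .* runs to the LAST b, not the first one.
        System.out.println(firstMatch("a.*b", "aXbXb"));       // aXbXb
    }
}
```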

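4. Boundary matchers

^ the beginning of a line
$ the end of a line
\b a word boundary
\B a non-word boundary
\A the beginning of the input
\G the end of the previous match
\Z the end of the input but for the final terminator, if any
\z the end of the input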
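A small sketch of two boundary matchers follows. The class name BoundaryDemo and the helper names are assumptions for illustration. One detail worth knowing: in Java, ^ and $ match at every line boundary only when the pattern is compiled with Pattern.MULTILINE; by default ^ matches only at the start of the whole input.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BoundaryDemo {
    // \b on both sides replaces "cat" only when it stands alone as a word.
    static String catToDog(String text) {
        return text.replaceAll("\\bcat\\b", "dog");
    }

    // Counts how many times the pattern is found in the input.
    static int countMatches(Pattern p, String input) {
        Matcher m = p.matcher(input);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        System.out.println(catToDog("cat concat cat.")); // dog concat dog.

        String lines = "one\ntwo\nthree";
        // Default mode: ^ matches only at the start of the input.
        System.out.println(countMatches(Pattern.compile("^\\w+"), lines));                    // 1
        // MULTILINE mode: ^ also matches right after each line terminator.
        System.out.println(countMatches(Pattern.compile("^\\w+", Pattern.MULTILINE), lines)); // 3
    }
}
```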

[Capturing groups in regular expressions]
In regular expressions, parentheses ( ) wrap several elements into a group. The group can then be treated as one larger element, so a quantifier can be applied to it as a whole.

[ATCG]+ // [ATCG] matches any one of the four characters A, T, C, G; [ATCG]+ means one or more of them, i.e. it matches strings made up only of A, T, G, C, such as ATGGCTAGCGATG

(ATGC)+ means the whole sequence ATGC appears one or more times, such as ATGCATGCATGC....
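The difference between quantifying a character class and quantifying a group can be seen in a minimal sketch. The class name GroupQuantifierDemo and the method names are illustrative assumptions.

```java
class GroupQuantifierDemo {
    // One or more characters, each of which is A, T, C, or G.
    static boolean isDnaSequence(String s) {
        return s.matches("[ATCG]+");
    }

    // One or more repetitions of the exact sequence ATGC.
    static boolean isRepeatedAtgc(String s) {
        return s.matches("(ATGC)+");
    }

    public static void main(String[] args) {
        System.out.println(isDnaSequence("ATGGCTAGCGATG"));  // true
        System.out.println(isRepeatedAtgc("ATGGCTAGCGATG")); // false
        System.out.println(isRepeatedAtgc("ATGCATGCATGC"));  // true
        System.out.println(isDnaSequence(""));               // false, + requires at least one
    }
}
```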

Groups are numbered by counting their opening parentheses from left to right. In the expression ((A)(B(C))), there are four such groups:

1 ((A)(B(C)))
2 (A)
3 (B(C))
4 (C) 

During extraction, pass a group number to m.group(int num) to get the text matched by that group. Group 0 always stands for the entire match.

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class B {
    public static void main(String[] args) {
        String pattern = "((A)(B(C)))D";
        // Create the Pattern object
        Pattern r = Pattern.compile(pattern);
        // Now create the Matcher object
        Matcher m = r.matcher("ABCDEABCDF");
        while (m.find()) {
            System.out.println("group(0) : " + m.group(0)); // the whole match
            System.out.println("group(1) : " + m.group(1)); // text matched by the 1st parenthesis ((A)(B(C)))
            System.out.println("group(2) : " + m.group(2)); // text matched by the 2nd parenthesis (A)
            System.out.println("group(3) : " + m.group(3)); // text matched by the 3rd parenthesis (B(C))
            System.out.println("group(4) : " + m.group(4)); // text matched by the 4th parenthesis (C)
            System.out.println("---------------");
        }
    }
}

The output is as follows

group(0) : ABCD
group(1) : ABC
group(2) : A
group(3) : BC
group(4) : C
---------------
group(0) : ABCD
group(1) : ABC
group(2) : A
group(3) : BC
group(4) : C
---------------

Groups can also be referenced inside the pattern itself. In (Ax)\\1+(BM)\\2+, \\1+ means that the text captured by the first group (Ax) must appear again one or more times, and \\2+ means that the text captured by the second group (BM) must likewise appear again one or more times.

public static void main(String[] args) {
        String pattern = "(Ax)\\1+(BM)\\2+";
        //String pattern = "(Ax)(BM)\\1+\\2+";    // this does not match the input below
        //String pattern = "(Ax)(BM)\\1+\\2{1,}"; // this does not match either
        // Create the Pattern object
        Pattern r = Pattern.compile(pattern);
        // Now create the Matcher object
        Matcher m = r.matcher("AxAxBMBMBMBM");
        while(m.find()) {
            System.out.println("group(0) : " + m.group(0)); // the whole match
            System.out.println("group(1) : " + m.group(1)); // text matched by the 1st parenthesis (Ax)
            System.out.println("group(2) : " + m.group(2)); // text matched by the 2nd parenthesis (BM)
            System.out.println("---------------");
        }
    }

Output:

group(0) : AxAxBMBMBMBM
group(1) : Ax
group(2) : BM
---------------


Wildcards can also be used for advanced find-and-replace in Word documents, which is a very practical technique. For details, see another blog post: "WPS and Word documents under Office: using wildcards for advanced replacement".

When using replaceAll(), you can use $num in the replacement string,
such as str = str.replaceAll("(.)\\1+","$1"); where $1 stands for the text captured by the first group of the regular expression in the preceding argument.

public static void main(String[] args) {
        String pattern = "(.)\\1+(BB)\\2+";

        // Create the Pattern object
        Pattern r = Pattern.compile(pattern);
        // Now create the Matcher object
        Matcher m = r.matcher("...BBBBCC..BBBBCC");
        String res = m.replaceAll("$1$2"); // each match is replaced with .BB
        System.out.println(res);
    }

Output:

.BBCC.BBCC

【Example 1】

/* 我我...我我...我我我我...要要要要...要要要要...
学学学学学...学学编编...编编编编..编..编...程程
...程程程 --> 我要学编程 ("I want to learn programming") */
    public class RegexTest
    {
        public static void main(String[] args){
            test();
        }

        /*
         * 1. Cure the stutter: collapse the repeated characters.
         */
        public static void test(){
            String str = "我我...我我...我我我我...要要要要...要要要要...学学学学学...学学编编...编编编编..编..编...程程...程程程";

            //1. Remove the dots from the string by replacing them with nothing.
            str = str.replaceAll("\\.+","");

            //2. Collapse each run of a repeated character into a single character.
            str = str.replaceAll("(.)\\1+","$1");
            System.out.println(str); // prints 我要学编程
        }
    }

【Example 2】

import java.util.TreeSet;

public class RegexTest
{
    public static void main(String[] args){
        test();
    }

    /*
     * Sort IP addresses.
     * 192.168.10.34 127.0.0.1 3.3.3.3 105.70.11.55
     */

    public static void test(){
        String ip_str = "192.168.10.34 127.0.0.1 3.3.3.3 105.70.11.55";

        //1. To compare the IPs as plain strings, every segment must have the same number of digits.
        //So pad with zeros: a segment needs at most two extra zeros, so prepend two zeros to every segment.

        ip_str = ip_str.replaceAll("(\\d+)","00$1");
        System.out.println(ip_str);

        //2. Then keep only the last 3 digits of each segment.
        ip_str = ip_str.replaceAll("0*(\\d{3})","$1");
        System.out.println(ip_str);

        //3. Split out the individual IP addresses.
        String[] ips = ip_str.split(" +");

        //4. A TreeSet keeps the padded strings in natural (lexicographic) order.
        TreeSet<String> ts = new TreeSet<String>();

        for(String ip : ips){
            ts.add(ip);
        }

        //5. Strip the leading zeros again when printing.
        for(String ip : ts){
            System.out.println(ip.replaceAll("0*(\\d+)","$1"));
        }
    }
}

[Regular expression lookaround]

Here is a related practice question from Nowcoder:
https://www.nowcoder.com/questionTerminal/758401c48ddc4deebb955821e175614d

In Java, use a regular expression to extract the part of a string that comes before the first ASCII left parenthesis. For example, for Beijing(Haidian District)(Chaoyang District)(Xicheng District) the result is: Beijing. The regular expression is ( A )

A ".*?(?=\()"
B ".*?(?=()"
C ".*(?=\()"
D ".*(?=()"

This question involves greedy versus non-greedy matching and lookaround.
For greedy and non-greedy matching, see "Regular expression greedy and non-greedy mode".
For an analysis of lookaround, see "In-depth understanding of the concept and usage of regular expression lookaround".
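The four options can be compared with a short sketch. The class name LookaheadQuizDemo, the helpers firstMatch and compiles, and the shortened sample string are assumptions for illustration. Inside a Java string literal the escaped parenthesis \( is written "\\(".

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class LookaheadQuizDemo {
    // Returns the first substring of input matched by regex, or null if none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    // Reports whether the regex is syntactically valid.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String s = "Beijing(Haidian)(Chaoyang)(Xicheng)";

        // A: reluctant .*? stops at the first position followed by "(".
        System.out.println(firstMatch(".*?(?=\\()", s)); // Beijing
        // C: greedy .* runs as far as possible, up to the LAST "(".
        System.out.println(firstMatch(".*(?=\\()", s));  // Beijing(Haidian)(Chaoyang)

        // B and D: the unescaped "(" opens a group, leaving the lookahead unclosed.
        System.out.println(compiles(".*?(?=()")); // false
        System.out.println(compiles(".*(?=()"));  // false
    }
}
```

The lookahead (?=\() only checks that a left parenthesis follows; it consumes nothing, so the parenthesis itself is not part of the returned match.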

