Java regular expression examples

 

Character

Explanation

\

Marks the next character as a special character, a literal, a back-reference, or an octal escape. For example, "n" matches the character "n", while "\n" matches a newline. The sequence "\\" matches "\", and "\(" matches "(".

^

Matches the start of the input string. If multiline mode is enabled (the RegExp object's Multiline property; Pattern.MULTILINE in Java), ^ also matches the position after "\n" or "\r".

$

Matches the end of the input string. If multiline mode is enabled (the RegExp object's Multiline property; Pattern.MULTILINE in Java), $ also matches the position before "\n" or "\r".

*

Matches the preceding character or subexpression zero or more times. For example, "zo*" matches "z" and "zoo". * is equivalent to {0,}.

+

Matches the preceding character or subexpression one or more times. For example, "zo+" matches "zo" and "zoo", but not "z". + is equivalent to {1,}.

?

Matches the preceding character or subexpression zero or one time. For example, "do(es)?" matches "do" or "does". ? is equivalent to {0,1}.

{n}

n is a nonnegative integer. Matches exactly n times. For example, "o{2}" does not match the "o" in "Bob", but matches the two "o"s in "food".

{n,}

n is a nonnegative integer. Matches at least n times. For example, "o{2,}" does not match the "o" in "Bob", but matches all the "o"s in "foooood". "o{1,}" is equivalent to "o+", and "o{0,}" is equivalent to "o*".

{n,m}

m and n are nonnegative integers, where n <= m. Matches at least n and at most m times. For example, "o{1,3}" matches the first three "o"s in "fooooood". "o{0,1}" is equivalent to "o?". Note: no spaces are allowed between the comma and the numbers.

?

When this character immediately follows any other quantifier (*, +, ?, {n}, {n,}, {n,m}), the match is "non-greedy". A non-greedy pattern matches the shortest possible string, whereas the default greedy pattern matches the longest possible string. For example, in the string "oooo", "o+?" matches only a single "o", while "o+" matches all the "o"s.

.

Matches any single character except line terminators such as "\r" and "\n". To match any character including "\r" and "\n", use a pattern such as "[\s\S]".

(pattern)

Matches pattern and captures the matched subexpression. The captured matches can be retrieved from the resulting match collection through the $0...$9 properties (in Java, through Matcher.group(n)). To match the parenthesis characters themselves, use "\(" or "\)".

(?:pattern)

Matches pattern but does not capture the match; that is, it is a non-capturing group whose match is not stored for later use. This is useful for combining parts of a pattern with the "or" character (|). For example, "industr(?:y|ies)" is a more economical expression than "industry|industries".

(?=pattern)

Positive lookahead. Matches the search string at any position where a string matching pattern begins. It is a non-capturing match, so the match is not stored for later use. For example, "Windows(?=95|98|NT|2000)" matches the "Windows" in "Windows2000", but not the "Windows" in "Windows3.1". Lookaheads do not consume characters: after a match occurs, the search for the next match begins immediately after the last match, not after the characters that made up the lookahead.

(?!pattern)

Negative lookahead. Matches the search string at any position where a string not matching pattern begins. It is a non-capturing match, so the match is not stored for later use. For example, "Windows(?!95|98|NT|2000)" matches the "Windows" in "Windows3.1", but not the "Windows" in "Windows2000". Lookaheads do not consume characters: after a match occurs, the search for the next match begins immediately after the last match, not after the characters that made up the lookahead.

x|y

Matches x or y. For example, "z|food" matches "z" or "food". "(z|f)ood" matches "zood" or "food".

[xyz]

Character set. Matches any one of the enclosed characters. For example, "[abc]" matches the "a" in "plain".

[^xyz]

Negated character set. Matches any character that is not enclosed. For example, "[^abc]" matches the "p", "l", "i", and "n" in "plain".

[a-z]

Character range. Matches any character within the specified range. For example, "[a-z]" matches any lowercase letter from "a" to "z".

[^a-z]

Negated character range. Matches any character not within the specified range. For example, "[^a-z]" matches any character that is not in the range "a" to "z".

\b

Matches a word boundary, that is, the position between a word and a space. For example, "er\b" matches the "er" in "never", but not the "er" in "verb".

\B

Matches a non-word boundary. "er\B" matches the "er" in "verb", but not the "er" in "never".

\cx

Matches the control character indicated by x. For example, \cM matches Control-M, or carriage return. The value of x must be in the range A-Z or a-z. If it is not, c is assumed to be the literal character "c".

\d

Matches a digit character. Equivalent to [0-9].

\D

Matches a non-digit character. Equivalent to [^0-9].

\f

Matches a form feed. Equivalent to \x0c and \cL.

\n

Matches a newline. Equivalent to \x0a and \cJ.

\r

Matches a carriage return. Equivalent to \x0d and \cM.

\s

Matches any whitespace character, including spaces, tabs, and form feeds. Equivalent to [ \f\n\r\t\v].

\S

Matches any non-whitespace character. Equivalent to [^ \f\n\r\t\v].

\t

Matches a tab. Equivalent to \x09 and \cI.

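\v

Matches a vertical tab. Equivalent to \x0b and \cK. (In Java 8 and later, \v instead matches any vertical whitespace character, of which the vertical tab is one.)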

\w

Matches any word character, including the underscore. Equivalent to "[A-Za-z0-9_]".

\W

Matches any non-word character. Equivalent to "[^A-Za-z0-9_]".

\xn

Matches n, where n is a hexadecimal escape code. A hexadecimal escape code must be exactly two digits long. For example, "\x41" matches "A". "\x041" is equivalent to "\x04" followed by "1". This allows ASCII codes to be used in regular expressions.

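\num

Matches num, where num is a positive integer. This is a back-reference to a captured match. For example, "(.)\1" matches two consecutive identical characters.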

\n

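Identifies either an octal escape code or a back-reference. If \n is preceded by at least n captured subexpressions, n is a back-reference. Otherwise, if n is an octal digit (0-7), n is an octal escape code. (Note that java.util.regex only recognizes octal escapes written with a leading zero, such as \0n.)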

\nm

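Identifies either an octal escape code or a back-reference. If \nm is preceded by at least nm captured subexpressions, nm is a back-reference. If \nm is preceded by at least n captures, n is a back-reference followed by the literal character m. If neither condition holds, \nm matches the octal value nm, where n and m are octal digits (0-7).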

\nml

When n is an octal digit (0-3) and m and l are octal digits (0-7), matches the octal escape code nml.

\un

Matches n, where n is a Unicode character expressed as four hexadecimal digits. For example, \u00A9 matches the copyright symbol (©).
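
Several entries in the table above (greedy versus non-greedy quantifiers, lookahead, back-references, and word boundaries) are easier to follow when run. The following is a minimal sketch, not part of the original article: the class name QuantifierLookaroundDemo and the helper firstMatch are hypothetical, and the sample inputs are written without spaces (for example "Windows2000").

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
class QuantifierLookaroundDemo {

    // Returns the first substring of input matched by regex, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Greedy: o+ takes as many o's as possible.
        System.out.println(firstMatch("o+", "oooo"));    // oooo
        // Non-greedy: o+? stops after the first o.
        System.out.println(firstMatch("o+?", "oooo"));   // o

        // Positive lookahead: "Windows" only when followed by a listed version.
        System.out.println(firstMatch("Windows(?=95|98|NT|2000)", "Windows2000")); // Windows
        System.out.println(firstMatch("Windows(?=95|98|NT|2000)", "Windows3.1"));  // null

        // Negative lookahead: "Windows" only when NOT followed by a listed version.
        System.out.println(firstMatch("Windows(?!95|98|NT|2000)", "Windows3.1"));  // Windows

        // Back-reference: group 1 captures one character and the regex requires it again.
        System.out.println(firstMatch("(.)\\1", "hello")); // ll

        // Word boundary: "er" at the end of a word versus "er" inside a word.
        System.out.println(firstMatch("er\\b", "never"));  // er
        System.out.println(firstMatch("er\\b", "verb"));   // null
        System.out.println(firstMatch("er\\B", "verb"));   // er
    }
}
```

The helper uses Matcher.find() rather than Pattern.matches(), because the table describes what part of a string a construct matches, and find() searches for a matching substring instead of requiring the whole input to match.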

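The ^, $, and . entries in the table depend on matching flags. In Java these are passed to Pattern.compile; the sketch below, with a hypothetical class name FlagsDemo and helper countMatches, shows the effect of Pattern.MULTILINE and Pattern.DOTALL on a small three-line input chosen for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
class FlagsDemo {

    // Counts how many times the compiled pattern is found in input.
    static int countMatches(Pattern p, String input) {
        Matcher m = p.matcher(input);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        String text = "one\ntwo\nthree";

        // Without MULTILINE, ^ matches only at the very start of the input.
        System.out.println(countMatches(Pattern.compile("^\\w+"), text)); // 1
        // With MULTILINE, ^ also matches right after each line terminator.
        System.out.println(countMatches(Pattern.compile("^\\w+", Pattern.MULTILINE), text)); // 3

        // By default . does not match a line terminator, so this fails.
        System.out.println(Pattern.matches("one.two", "one\ntwo")); // false
        // A class such as [\s\S] matches any character, including a newline.
        System.out.println(Pattern.matches("one[\\s\\S]two", "one\ntwo")); // true
        // DOTALL makes . match line terminators as well.
        System.out.println(Pattern.compile("one.two", Pattern.DOTALL)
                .matcher("one\ntwo").matches()); // true
    }
}
```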
package niuke;

import java.util.regex.Pattern;

public class MeituanTest1 {
    public static void main(String[] args) {

//        test1();
//        test2();
//        test3();
//        test4();
//        test5();
//        test6();
//        test7();
//          test8();
        test9();
    }

    // In a Java string literal, \ is the escape character, so the regex \\
    // (which matches one literal backslash) is written as "\\\\" in source code.
    public static void test1(){
        String str="\\";
//        String patternStr="^x\\w*@tal\\w*\\.\\w*";
        String patternStr="\\\\";
        boolean result = Pattern.matches(patternStr, str);
        if (result) {
            System.out.println("String " + str + " matches pattern " + patternStr);
        }
        else{
            System.out.println("String " + str + " does not match pattern " + patternStr);
        }
    }


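    // The simplest regular expression is a pattern that matches one given String exactly:
    // the pattern is identical to the text to be matched. The static Pattern.matches method
    // checks whether a String matches a given pattern, as in this example.
    public static void test2(){
        String str="java";
        String patternStr="java";
        boolean result = Pattern.matches(patternStr, str);
        if (result) {
            System.out.println("String " + str + " matches pattern " + patternStr);
        }
        else{
            System.out.println("String " + str + " does not match pattern " + patternStr);
        }
    }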


    // Match a run of repeated characters: a* matches zero or more consecutive a's.
    public static void test3(){
        String str="jaaav";
        String patternStr="j(a*)v";
        boolean result = Pattern.matches(patternStr, str);
        if (result) {
            System.out.println("String " + str + " matches pattern " + patternStr);
        }
        else{
            System.out.println("String " + str + " does not match pattern " + patternStr);
        }
    }

    // Square brackets [] match exactly one of the characters listed inside them.
    // The pattern "b[aeiou]n" matches only strings that start with b, end with n,
    // and have one of a, e, i, o, u in between,
    // so the first five array elements match and the last two do not.
    public static void test4(){
        String[] dataArr = { "ban", "ben", "bin", "bon" ,"bun","byn","baen"};
        for (String str : dataArr) {
            String patternStr="b[aeiou]n";
            boolean result = Pattern.matches(patternStr, str);
            if (result) {
                System.out.println("String " + str + " matches pattern " + patternStr);
            }
            else{
                System.out.println("String " + str + " does not match pattern " + patternStr);
            }
        }
    }

    // To match alternatives that are more than one character long, [] cannot be used.
    // Use () together with | instead: () forms a group and | means "or".
    // The pattern b(ee|ea|oo)n matches been, bean, and boon,
    // so the first three elements match and the last two do not.
    public static void test5(){
        String[] dataArr = { "been", "bean", "boon", "buin" ,"bynn"};
        for (String str : dataArr) {
            String patternStr="b(ee|ea|oo)n";
            boolean result = Pattern.matches(patternStr, str);
            if (result) {
                System.out.println("String " + str + " matches pattern " + patternStr);
            }
            else{
                System.out.println("String " + str + " does not match pattern " + patternStr);
            }
        }
    }

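    // String.split accepts a regular expression. The pattern below matches a comma,
    // a single whitespace character, or a semicolon, so split treats any one of them
    // as a delimiter and breaks the string into an array of strings.
    public static void test6(){
        String str="salary,position name;age gender";
        String[] dataArr =str.split("[,\\s;]");
        for (String strTmp : dataArr) {
            System.out.println(strTmp);
        }
    }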

    // o{2,5} matches between two and five o's, so the first three elements match;
    // "goooooogle" (six o's) and "ggle" (no o) do not.
    public static void test7(){
        String[] dataArr = { "google", "gooogle", "gooooogle", "goooooogle","ggle"};
        for (String str : dataArr) {
            String patternStr = "g(o{2,5})gle";
            boolean result = Pattern.matches(patternStr, str);
            if (result) {
                System.out.println("String " + str + " matches pattern " + patternStr);
            } else {
                System.out.println("String " + str + " does not match pattern " + patternStr);
            }
        }
    }


    // [a-c] matches one character in the range a to c,
    // so Tan, Tbn, and Tcn match while Ton and Twn do not.
    public static void test8(){
        String[] dataArr = { "Tan", "Tbn", "Tcn", "Ton","Twn"};
        for (String str : dataArr) {
            String regex = "T[a-c]n";
            boolean result = Pattern.matches(regex, str);
            if (result) {
                System.out.println("String " + str + " matches pattern " + regex);
            } else {
                System.out.println("String " + str + " does not match pattern " + regex);
            }
        }
    }

    // Match a string that starts with x and contains @tal and a dot.
    public static void test9(){
        String str="[email protected]";
        String patternStr="^x\\w*(@tal)\\w*\\.\\w*";
        boolean result = Pattern.matches(patternStr, str);
        if (result) {
            System.out.println("String " + str + " matches pattern " + patternStr);
        }
        else{
            System.out.println("String " + str + " does not match pattern " + patternStr);
        }
    }
    
}
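
All of the tests above call the static Pattern.matches method, which recompiles the pattern on every call and succeeds only when the entire input matches. As a minimal sketch of the alternative, the hypothetical class below compiles the pattern from test4 once and contrasts Matcher.matches() (whole input) with Matcher.find() (any substring); the class and method names are assumptions made for illustration.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
class MatchesVsFindDemo {

    // Compile once and reuse: Pattern instances are immutable and safe to share.
    private static final Pattern B_VOWEL_N = Pattern.compile("b[aeiou]n");

    // True only when the whole string matches the pattern.
    static boolean wholeStringMatches(String s) {
        return B_VOWEL_N.matcher(s).matches();
    }

    // True when any substring matches the pattern.
    static boolean containsMatch(String s) {
        return B_VOWEL_N.matcher(s).find();
    }

    public static void main(String[] args) {
        String[] dataArr = { "ban", "urban", "byn" };
        for (String str : dataArr) {
            System.out.println(str + ": matches=" + wholeStringMatches(str)
                    + ", find=" + containsMatch(str));
        }
        // ban: matches=true, find=true
        // urban: matches=false, find=true
        // byn: matches=false, find=false
    }
}
```

Because matches() already anchors the pattern at both ends of the input, a leading ^ such as the one in test9 does not change the result there; it only matters with find().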

A real-world example:

One log line:

121.56.62.86 - z.m.tv.sohu.com [29/May/2019:14:52:59  0800] "jsonlog={"atype":"apps","channelid":"315","cv":"7.2.0","enterid":"0","imei":"865939038074368","mfo":"vivo","mfov":"vivo Y67A","mos":"android","mosv":"6.0","msg":"imp","mtype":"6","passport":"","pro":"1","sim":"1","startid":"1559111507899","tkey":"4ff95d6e341645c9ce550d3908c130289a70a3e8","uid":"bcfefc656480478864ed429caf0fddaa","vids":[{"catecode":"101154;101147","datatype":2,"idx":"0002","mdu":"0004","memo":"{\"from_page\":\"1\",\"abmod\":\"\"}","pg":"61000","playlistid":9096123,"scn":"02","site":1,"time":1559111502980,"vid":2849803},{"catecode":"101154;101147","datatype":2,"idx":"0003","mdu":"0004","memo":"{\"from_page\":\"1\",\"abmod\":\"\"}","pg":"61000","playlistid":9096123,"scn":"02","site":1,"time":1559111502980,"vid":2849806},{"catecode":"101154;101147","datatype":2,"idx":"0004","mdu":"0004","memo":"{\"from_page\":\"1\",\"abmod\":\"\"}","pg":"61000","playlistid":9096123,"scn":"02","site":1,"time":1559111502980,"vid":2851842},{"catecode":"101154;101147","datatype":2,"idx":"0005","mdu":"0004","memo":"{\"from_page\":\"1\",\"abmod\":\"\"}","pg":"61000","playlistid":9096123,"scn":"02","site":1,"time":1559111502981,"vid":2851846},{"catecode":"101154;101147","datatype":2,"idx":"0006","mdu":"0004","memo":"{\"from_page\":\"1\",\"abmod\":\"\"}","pg":"61000","playlistid":9096123,"scn":"02","site":1,"time":1559111502981,"vid":2853761},{"catecode":"101154;101147","datatype":2,"idx":"0007","mdu":"0004","memo":"{\"from_page\":\"1\",\"abmod\":\"\"}","pg":"61000","playlistid":9096123,"scn":"02","site":1,"time":1559111502981,"vid":2853764},{"catecode":"101154;101147","datatype":2,"idx":"0001","mdu":"0002","memo":"{\"from_page\":\"1\",\"abmod\":\"\"}","pg":"61000","playlistid":9096124,"scn":"02","site":1,"time":1559111502982,"vid":2843185},{"catecode":"101154;101147","datatype":2,"idx":"0002","mdu":"0002","memo":"{\"from_page\":\"1\",\"abmod\":\"\"}","pg":"61000","playlistid":9096124,"scn":"02","site":1,"time":155911
1502982,"vid":2843182},{"catecode":"101109","datatype":2,"idx":"0003","mdu":"0002","memo":"{\"from_page\":\"1\",\"abmod\":\"\"}","pg":"61000","playlistid":9098247,"scn":"02","site":1,"time":1559111502982,"vid":2887825},{"catecode":"101154;101147","datatype":2,"idx":"0001","mdu":"0004","memo":"{\"from_page\":\"1\",\"abmod\":\"\"}","pg":"61000","playlistid":9096123,"scn":"02","site":1,"time":1559111507927,"vid":2847533},{"catecode":"101154;101147","datatype":2,"idx":"0008","mdu":"0004","memo":"{\"from_page\":\"1\",\"abmod\":\"\"}","pg":"61000","playlistid":9096123,"scn":"02","site":1,"time":1559112778890,"vid":2854747}],"webtype":"WiFi"}" 204 0 "okhttp/3.12.2"
Regex:
private static Pattern pattern = Pattern
    .compile("^([0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}.*) - .* \\[(.*)\\] \"jsonlog=(.*)\" [0-9]{3} [0-9]{1,5} \"(.*)\"$");
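
To show how the four capture groups might be read back, here is a minimal sketch. It uses a variant of the pattern in which the dots of the IP address are escaped, and it runs against an abridged, hypothetical log line whose JSON payload is shortened to two fields. The class name LogLineDemo and the method parse are assumptions, not part of the original code.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
class LogLineDemo {

    // Groups: 1 = client IP, 2 = timestamp, 3 = JSON payload, 4 = user agent.
    private static final Pattern LOG_PATTERN = Pattern.compile(
            "^([0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}.*) - .* \\[(.*)\\] "
            + "\"jsonlog=(.*)\" [0-9]{3} [0-9]{1,5} \"(.*)\"$");

    // Returns {ip, timestamp, json, userAgent}, or null when the line does not match.
    static String[] parse(String line) {
        Matcher m = LOG_PATTERN.matcher(line);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3), m.group(4) };
    }

    public static void main(String[] args) {
        // Abridged sample line (assumption): same layout as the log above, shorter JSON.
        String line = "121.56.62.86 - z.m.tv.sohu.com [29/May/2019:14:52:59 +0800] "
                + "\"jsonlog={\"msg\":\"imp\",\"webtype\":\"WiFi\"}\" 204 0 \"okhttp/3.12.2\"";
        String[] fields = parse(line);
        if (fields == null) {
            System.out.println("line did not match");
            return;
        }
        System.out.println("ip        = " + fields[0]);
        System.out.println("timestamp = " + fields[1]);
        System.out.println("json      = " + fields[2]);
        System.out.println("userAgent = " + fields[3]);
    }
}
```

The JSON group relies on greedy matching plus the fixed tail (status code, size, quoted user agent) to find where the payload ends, so the extracted text would still need a real JSON parser afterwards.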


Origin blog.csdn.net/xuehuagongzi000/article/details/78007650