[Java Advanced Grammar] (1) Regular Expressions: Do you really know how to use regular expressions?

1️⃣ Concept

Regular expressions are a text-based pattern-matching method for searching, retrieving, and replacing one or more sequences of characters. Java provides many facilities to support regular expressions, including the classes in the java.util.regex package. In this article, we'll explain Java's regular expression syntax, common operations, and tips in detail.

2️⃣ Grammar

Regular expressions use a specific syntax to define patterns. The main elements of Java's regular expression syntax are:

  • Strings
    Letters, digits, and most punctuation marks match themselves literally. For example, "abcde" matches the string "abcde";

  • Character classes
    Character classes match any one character from a set of characters. Square brackets ([]) mark the beginning and end of the character class. For example, [abcd] matches any one of 'a', 'b', 'c', or 'd'.
    Character classes also support range notation. For example, [0-9] represents any one of the digits 0 through 9. You can also nest character classes: for example, [a-z&&[^bc]] represents any letter from 'a' to 'z' except 'b' and 'c'.

  • Predefined character classes
    Predefined character classes are shorthand for commonly used character classes. Here are some common predefined character classes:

    • \d: A digit. Equivalent to [0-9];
    • \D: A non-digit. Equivalent to [^0-9];
    • \s: A whitespace character, including spaces, tabs, and newlines;
    • \S: A non-whitespace character, that is, anything other than spaces, tabs, newlines, and other whitespace;
    • \w: A word character: letters, digits, and the underscore. Equivalent to [a-zA-Z0-9_];
    • \W: A non-word character, that is, anything other than letters, digits, and the underscore.
  • Quantifiers
    Quantifiers are used to specify the number of times a pattern is matched. Here are some common quantifiers:

    • *: Matches the preceding element zero or more times. For example, a*b matches 'b', 'ab', 'aab', etc.;
    • +: Matches the preceding element one or more times. For example, a+b matches 'ab', 'aab', 'aaab', etc.;
    • ?: Matches the preceding element zero or one time. For example, a?b matches 'b' or 'ab';
    • {n,m}: Matches the preceding element at least n and at most m times. For example, a{1,3}b matches 'ab', 'aab', 'aaab';
    • {n}: Matches the preceding element exactly n times. For example, a{2}b matches 'aab';
    • {n,}: Matches the preceding element at least n times. For example, a{2,}b matches 'aab', 'aaab', 'aaaab', etc.

    Quantifiers can also be made non-greedy (reluctant). By default a quantifier matches as much as possible; appending ? to it makes it match as little as possible. For example, against the input 'acbcb', a.+b matches the whole 'acbcb', while a.+?b matches only 'acb'.

  • Locators
    Locators (anchors) specify the position at which a pattern must match. Here are some common locators:

    • ^: Beginning of the string. For example, ^abc matches strings starting with 'abc';
    • $: End of the string. For example, abc$ matches strings ending with 'abc';
    • \b: Word boundary. For example, \bfoo\b matches the word 'foo' in a string, but not the 'foo' inside 'foobar' or 'football'.
  • Special characters
    In regular expressions, some characters have special meanings and must be escaped with a backslash (\) to be matched literally. Here are some common special characters:

    • .: Matches any single character except a newline;
    • |: The OR operator. For example, a|b matches 'a' or 'b';
    • (): The grouping operator. A subexpression in parentheses is treated as a unit and captured as a group whose contents can be retrieved later. For example, (ab)* matches zero or more consecutive 'ab' strings;
    • (?idmsux): Inline flags that change how the rest of the pattern is matched. For example, (?i) means to ignore case.
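
The syntax elements above can be tried out with a short program. The following is only a sketch: the class name SyntaxTourDemo, the helper firstMatch, and the sample inputs are made up for illustration and are not part of the original article.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SyntaxTourDemo {

    // Hypothetical helper: returns the first substring of input matched by regex, or null if none.
    public static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Character class with intersection: 'a' to 'z' except 'b' and 'c'
        System.out.println(Pattern.matches("[a-z&&[^bc]]", "a")); // true
        System.out.println(Pattern.matches("[a-z&&[^bc]]", "b")); // false

        // Predefined class plus quantifier: exactly three digits
        System.out.println(Pattern.matches("\\d{3}", "123"));     // true

        // Greedy vs. non-greedy quantifier on the same input
        System.out.println(firstMatch("a.+b", "acbcb"));          // acbcb
        System.out.println(firstMatch("a.+?b", "acbcb"));         // acb

        // Word boundaries: 'foo' only as a whole word
        System.out.println(firstMatch("\\bfoo\\b", "football foo")); // foo
        System.out.println(firstMatch("\\bfoo\\b", "foobar"));       // null

        // Inline flag (?i): ignore case
        System.out.println(Pattern.matches("(?i)abc", "ABC"));    // true
    }
}
```

Note that every backslash in a regular expression has to be doubled inside a Java string literal, which is why \d is written as "\\d" here.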

3️⃣ Java operation API

Java's regular expression API provides many methods for common operations. The most common operations are matching, finding, and replacing.

The main classes and methods are listed below, grouped by class, with each method followed by its effect:

Pattern
  • static boolean matches(String regex, CharSequence input): Matches the regular expression against the entire input and returns a boolean indicating whether it matches
  • static Pattern compile(String regex): Compiles the given regular expression into a pattern and returns it
  • Matcher matcher(CharSequence input): Creates a matcher that will match the given input against this pattern. The matcher's methods can be used to obtain information about each match

Matcher
  • boolean find(): Searches the target string for the next substring that matches the regular expression. Returns a boolean indicating whether a matching substring was found
  • int start(): Returns the start position of the matched substring in the target string
  • int end(): Returns the end position of the matched substring in the target string (the index just past its last character)
  • String group(): Returns the matched substring
  • String group(int group): Returns the substring matched by a capture group. A capture group is a subexpression of the regular expression enclosed in parentheses
  • String replaceFirst(String replacement): Replaces the first substring in the target string that matches the regular expression with the given string and returns the result
  • String replaceAll(String replacement): Replaces all substrings in the target string that match the regular expression with the given string and returns the result

String
  • String[] split(String regex): Splits the string around matches of the regular expression and returns a string array
  • String replaceAll(String regex, String replacement): Replaces every substring of this string matching the given regular expression with the given string
  • String replaceFirst(String regex, String replacement): Replaces the first substring of this string matching the given regular expression with the given string

3.1 Matching

Java's java.util.regex.Pattern class provides the static method boolean matches(String regex, CharSequence input) for string matching. This method matches the regular expression against the entire string and returns a boolean indicating whether it matched.

Here is an example using the matches() method:

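import java.util.regex.*;

public class RegexDemo {

    public static void main(String[] args) {
        // ".*" on both sides lets "hello" appear anywhere in the input
        String pattern = ".*hello.*";
        String input = "hello, world!";
        // matches() succeeds only if the whole input matches the pattern
        boolean isMatch = Pattern.matches(pattern, input);
        System.out.println(isMatch);
    }
}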

The above code will output:

true
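
Because matches() has to cover the entire input, a pattern without the surrounding .* would fail on the same string. The sketch below contrasts whole-string matching with the substring search done by find(); the class name MatchVsFindDemo and its two helper methods are hypothetical names chosen for illustration.

```java
import java.util.regex.Pattern;

public class MatchVsFindDemo {

    // True only if the whole input matches the regex
    public static boolean wholeMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    // True if any substring of the input matches the regex
    public static boolean partialMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        String input = "hello, world!";
        System.out.println(wholeMatch("hello", input));     // false
        System.out.println(partialMatch("hello", input));   // true
        System.out.println(wholeMatch(".*hello.*", input)); // true
    }
}
```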

3.2 Search

Java's java.util.regex.Matcher class provides several methods to find substrings in a string that match a regular expression. The most commonly used are find() and group().

The find() method searches the target string for the next substring that matches the regular expression and returns a boolean indicating whether one was found. If a match is found, its start and end positions can be obtained with the start() and end() methods, and the matched substring itself with the group() method. Once find() has found all matching substrings, calling it again returns false.

Here is a sample:

import java.util.regex.*;

public class RegexDemo {

    public static void main(String[] args) {
        String pattern = "you";
        String input = "How are you? How have you been?";

        // Compile the pattern, then create a matcher for the input
        Pattern p = Pattern.compile(pattern);
        Matcher m = p.matcher(input);

        // Each call to find() advances to the next match
        while (m.find()) {
            System.out.println("Found match at index " + m.start() + ": " + m.group());
        }
    }
}

The above code will output:

Found match at index 8: you
Found match at index 22: you

If you need to extract individual parts of a match, you can use "capturing groups" in regular expressions. A capture group is created by enclosing a subexpression in parentheses, and groups are numbered from 1 in the order of their opening parentheses. For example, in the regular expression To(d{2}), the parentheses create capture group 1. The whole expression matches 'To' followed by two consecutive 'd' characters, and group 1 captures just those two characters.
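
As a rough illustration of group numbering, the sketch below pulls the parts of a date out of a sentence. The date pattern, the class name CaptureGroupDemo, and the helper firstDate are assumptions made for this example, not something taken from the original text.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CaptureGroupDemo {

    // Hypothetical pattern: year-month-day, with one capture group per part
    private static final Pattern DATE = Pattern.compile("(\\d{4})-(\\d{2})-(\\d{2})");

    // Returns {year, month, day} for the first date in the input, or null if there is none
    public static String[] firstDate(String input) {
        Matcher m = DATE.matcher(input);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3) };
    }

    public static void main(String[] args) {
        Matcher m = DATE.matcher("Released on 2023-06-13.");
        if (m.find()) {
            System.out.println(m.group(0));     // whole match: 2023-06-13
            System.out.println(m.group(1));     // 2023
            System.out.println(m.group(2));     // 06
            System.out.println(m.group(3));     // 13
            System.out.println(m.groupCount()); // 3
        }
    }
}
```

Group 0 always stands for the whole match, so group() and group(0) return the same string.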

A Matcher object holds the state of a matching operation. This state can be accessed via the find(), start(), end(), and group() methods. The find() method looks for the next match in the target string; start() and end() return the start and end positions of the most recent match; group() returns the matched substring, or the substring of a specified group, from the most recent match.

3.3 Replacement

Java's java.util.regex.Matcher class provides the replaceFirst() and replaceAll() methods to perform replacement operations.

The replaceFirst() method replaces the first substring in the target string that matches the regular expression with the given replacement string, and returns the resulting string. For example:

import java.util.regex.*;

public class RegexDemo {

    public static void main(String[] args) {
        String pattern = "llo";
        String input = "hello, world!";

        Pattern p = Pattern.compile(pattern);
        Matcher m = p.matcher(input);

        // Only the first occurrence of "llo" is replaced
        String output = m.replaceFirst("foo");
        System.out.println(output);
    }
}

The above code will output:

hefoo, world!

The replaceAll() method replaces all substrings in the target string that match the regular expression with the given replacement string, and returns the resulting string. For example:

import java.util.regex.*;

public class RegexDemo {

    public static void main(String[] args) {
        String pattern = "l";
        String input = "hello, world!";

        Pattern p = Pattern.compile(pattern);
        Matcher m = p.matcher(input);

        // Every occurrence of "l" is replaced
        String output = m.replaceAll("x");
        System.out.println(output);
    }
}

The above code will output:

hexxo, worxd!
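
The replacement string is not always taken literally: $1, $2, and so on refer to capture groups of the match. The sketch below uses the String.replaceAll() convenience method to show this, together with Matcher.quoteReplacement() for replacements that should be inserted literally. The class name ReplacementDemo, the helper names, and the sample inputs are made up for illustration.

```java
import java.util.regex.Matcher;

public class ReplacementDemo {

    // Turns "last, first" into "first last" using group references in the replacement
    public static String swapNames(String input) {
        return input.replaceAll("(\\w+), (\\w+)", "$2 $1");
    }

    // Replaces every run of digits with a literal string that may contain '$' or '\'
    public static String maskDigits(String input, String literal) {
        return input.replaceAll("\\d+", Matcher.quoteReplacement(literal));
    }

    public static void main(String[] args) {
        System.out.println(swapNames("Doe, John"));       // John Doe
        System.out.println(maskDigits("cost: 42", "$N")); // cost: $N
    }
}
```

Without quoteReplacement(), the "$N" in the second call would be read as a group reference instead of literal text.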

3.4 Split

Java's java.lang.String class provides the split() method, which splits the string around matches of the regular expression and returns a string array.

Here is a sample:

import java.util.Arrays;

public class RegexDemo {

    public static void main(String[] args) {
        // One or more whitespace characters act as a single delimiter
        String pattern = "\\s+";
        String input = "apple banana   cherry";

        String[] tokens = input.split(pattern);
        System.out.println(Arrays.toString(tokens));
    }
}

The above code will output:

[apple, banana, cherry]

In the above code, we split the string input on runs of whitespace characters (including spaces, tabs, and newlines) and store the result in the tokens array.
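
split() has a few edge cases around empty strings that are easy to overlook. The following sketch shows them, including the two-argument split(regex, limit) overload; the class name SplitDemo and the helper splitKeepingEmpties are hypothetical.

```java
import java.util.Arrays;

public class SplitDemo {

    // Splits and keeps trailing empty strings by passing a negative limit
    public static String[] splitKeepingEmpties(String input, String regex) {
        return input.split(regex, -1);
    }

    public static void main(String[] args) {
        // A delimiter at the start yields a leading empty string
        System.out.println(Arrays.toString(" a b".split("\\s+")));              // [, a, b]
        // Trailing empty strings are dropped by default
        System.out.println(Arrays.toString("a,b,,".split(",")));                // [a, b]
        // A negative limit keeps them
        System.out.println(Arrays.toString(splitKeepingEmpties("a,b,,", ","))); // [a, b, , ]
        // A positive limit caps the number of pieces
        System.out.println(Arrays.toString("a,b,c".split(",", 2)));             // [a, b,c]
    }
}
```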

4️⃣ Tips

In addition to basic regular expression syntax and operations, there are some practical tips that can help us better use Java's regular expression capabilities.

4.1 Precompiled regular expressions

If you need to use the same regular expression multiple times, you can precompile it into a Pattern object. This avoids having to recompile the regular expression each time it is used, which improves efficiency.

Here is a sample that compiles the regular expression into a Pattern object before using it:

import java.util.regex.*;

public class RegexDemo {

    public static void main(String[] args) {
        String pattern = "\\d+";
        String input = "123-456-7890";

        // Compile once; the Pattern object can be reused for other inputs
        Pattern p = Pattern.compile(pattern);
        Matcher m = p.matcher(input);

        // Print each run of digits
        while (m.find()) {
            System.out.println(m.group());
        }
    }
}

The above code will output:

123
456
7890
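
The benefit of precompiling shows up when the same pattern is applied to many inputs. One common way to arrange this, sketched below, is to keep the Pattern in a static final field and create a new Matcher for each input. The class name PrecompiledDemo, the method digitRuns, and the sample inputs are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PrecompiledDemo {

    // Compiled once when the class is loaded; Pattern objects are immutable and thread-safe
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    // Returns every run of digits found in the input
    public static List<String> digitRuns(String input) {
        List<String> runs = new ArrayList<>();
        // A new Matcher is created per input; only the Pattern is shared
        Matcher m = DIGITS.matcher(input);
        while (m.find()) {
            runs.add(m.group());
        }
        return runs;
    }

    public static void main(String[] args) {
        String[] inputs = { "123-456-7890", "room 42", "no digits" };
        for (String input : inputs) {
            System.out.println(digitRuns(input));
        }
        // prints:
        // [123, 456, 7890]
        // [42]
        // []
    }
}
```

Matcher objects, unlike Pattern objects, are not safe to share between threads, which is why the sketch creates one per call.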

4.2 Embed conditional expressions

Java's regular expression syntax has no conditional construct of its own, but Java's ternary conditional operator can be combined with the regex API to choose a replacement string based on what was matched. This is often used to apply a different replacement string to a particular match. For example:


import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexDemo {

    public static void main(String[] args) {
        String pattern = "\\[(logged|critical)\\]: (.*?)\\n";
        String input = "[logged]: Hello, world!\n[critical]: Error: Invalid input\n";

        Pattern p = Pattern.compile(pattern);
        Matcher m = p.matcher(input);

        while (m.find()) {
            System.out.println("string = " + m.group());
            System.out.println("string part 1 = " + m.group(1));
            System.out.println("string part 2 = " + m.group(2));
            // The ternary picks ONE replacement string based on group 1 of the current match.
            // replaceAll() then resets the matcher and applies that string to EVERY match.
            String output = m.replaceAll(m.group(1).equals("logged") ? "$2" : "***");
            System.out.println("replaceAll result = " + output);
        }
    }
}

The above code will output:

string = [logged]: Hello, world!

string part 1 = logged
string part 2 = Hello, world!
replaceAll result = Hello, world!Error: Invalid input

In the above code, we used a regular expression to match a specific pattern in a string.
The regular expression matches text of the form "[logged]: ...\n" or "[critical]: ...\n". The word inside the square brackets ('logged' or 'critical') is stored in the first capture group, and the text after the colon, up to but not including the newline, is stored in the second capture group.
Then the replaceAll() method is called with a replacement string chosen by a conditional expression on the value of the first capture group.

Let's analyze why this is the result. In the first loop iteration the condition m.group(1).equals("logged") is true, so the replacement string is "$2". replaceAll() resets the matcher and replaces every match in the input, not just the current one, with the contents of that match's second capture group, so the result is: Hello, world!Error: Invalid input. The input string itself is not modified, but replaceAll() has already scanned the matcher to the end of the input, so the next call to find() in the while condition finds nothing and the loop ends.
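
If the goal is to choose a replacement separately for each match, one option is Matcher's appendReplacement() and appendTail() methods, which build the result one match at a time. The following is a sketch of that approach under the same input as above; the class name ConditionalReplaceDemo and the method redact are hypothetical, and this is an alternative technique rather than the code discussed in this section.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ConditionalReplaceDemo {

    private static final Pattern LINE = Pattern.compile("\\[(logged|critical)\\]: (.*?)\\n");

    // Keeps the message of "logged" entries and masks "critical" entries with ***
    public static String redact(String input) {
        Matcher m = LINE.matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // The replacement is chosen for the current match only
            String replacement = m.group(1).equals("logged") ? m.group(2) : "***";
            // quoteReplacement() keeps '$' and '\' in the message from being interpreted
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        // Copy whatever follows the last match
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String input = "[logged]: Hello, world!\n[critical]: Error: Invalid input\n";
        System.out.println(redact(input)); // Hello, world!***
    }
}
```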

4.3 Using zero-width assertions

Java's regular expressions also support a special syntax called "zero-width assertions". A zero-width assertion matches a position rather than characters: it requires the text before or after the current position to meet a condition, without consuming that text or including it in the match. Commonly used zero-width assertions include positive lookahead, negative lookahead, positive lookbehind, and negative lookbehind assertions.

Here's an example code that uses a positive lookahead assertion:


import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexDemo {

    public static void main(String[] args) {
        // A word, only if it is followed by whitespace and "is"
        String pattern = "\\w+(?=\\s+is)";
        String input = "John is a man, and Mary is a woman.";

        Pattern p = Pattern.compile(pattern);
        Matcher m = p.matcher(input);

        // The lookahead text is checked but not included in group()
        while (m.find()) {
            System.out.println(m.group());
        }
    }
}

The above code will output:

John
Mary

In the above code, we used a regular expression to find each word that is followed by whitespace and the string 'is'. Here (?=\\s+is) is a positive lookahead assertion: it requires the match to be followed by whitespace and 'is', without including them in the match.
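
The other three kinds of assertion follow the same idea. Below is a small sketch of negative lookahead (?!...), positive lookbehind (?<=...), and negative lookbehind (?<!...); the class name LookaroundDemo, the helper findAll, and the sample strings are invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookaroundDemo {

    // Hypothetical helper: collects every match of regex in input
    public static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        // Positive lookbehind: digits preceded by "USD"
        System.out.println(findAll("(?<=USD)\\d+", "USD100 EUR200 USD300"));  // [100, 300]
        // Negative lookahead: whole numbers not followed by '%'
        System.out.println(findAll("\\b\\d+\\b(?!%)", "50% of 80 users"));    // [80]
        // Negative lookbehind: whole numbers not preceded by '$'
        System.out.println(findAll("(?<!\\$)\\b\\d+", "$50 and 7 items"));    // [7]
    }
}
```

The \b anchors in the last two patterns keep the engine from matching only part of a number, such as the '0' in '50'.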

5️⃣ Application scenarios

Regular expressions are a language for describing string patterns, with powerful matching and searching capabilities. They are widely used in computer programs and systems. Typical application scenarios include:

  • Text search/match: You can use regular expressions to search for or match text that follows a specific format, such as phone numbers, email addresses, or domain names;

  • Data validation: Regular expressions can be used to check whether input data conforms to specific formats and rules, such as password strength, date format, or ID number;

  • Real-time log file analysis: You can use regular expressions to scan log data as it is generated, match the patterns that need to be monitored, and analyze the log file content;

  • Code search and replace in an editor/IDE: You can use regular expressions to search and replace code in editors or IDEs to improve work efficiency.
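
As one small illustration of the data validation scenario, the sketch below checks whether a string has the shape of a date. The format rule (yyyy-MM-dd), the class name ValidationDemo, and the method looksLikeDate are assumptions chosen for this example.

```java
import java.util.regex.Pattern;

public class ValidationDemo {

    // Hypothetical rule: four digits, dash, two digits, dash, two digits
    private static final Pattern DATE_SHAPE = Pattern.compile("\\d{4}-\\d{2}-\\d{2}");

    // Checks only the shape of the input, not whether it is a real calendar date
    public static boolean looksLikeDate(String s) {
        return s != null && DATE_SHAPE.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(looksLikeDate("2023-06-13")); // true
        System.out.println(looksLikeDate("2023-6-13"));  // false
        System.out.println(looksLikeDate("2023-13-99")); // true: right shape, impossible date
    }
}
```

As the last line shows, a regular expression of this kind validates format only; checking that the value is a real date would need additional logic.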

Regular expressions are available across many programming languages and operating systems, and mainstream programming languages such as Java, Python, Perl, JavaScript, C++, and C# all support them. In short, regular expressions are versatile and flexible, and are often used when writing programs and processing text data.

Origin blog.csdn.net/LVSONGTAO1225/article/details/131191553