Java Basics--Regular Expressions

Table of contents

Common metacharacters are as follows

Pattern class

Matcher class

example

PatternSyntaxException class

example


During development we often need to restrict the form of certain strings. A registration form, for example, usually limits the length and format of an email address or a mobile phone number. Checks like these are typically implemented with regular expressions. A regular expression is a single string that describes, or matches, a set of strings that follow certain syntactic rules. In other words, it is a rule.

A regular expression is a text pattern made up of ordinary characters (for example, the letters a through z) and special characters called metacharacters. Metacharacters such as quantifiers describe how the character preceding them may occur in the target string.

Common metacharacters are as follows

Metacharacter   Functional description
\               Escape character; for example, "\." matches a literal "." and "\\n" matches the two-character text "\n"
^               Matches the start of the input (or of a line in multiline mode)
$               Matches the end of the input (or of a line in multiline mode)
*               Matches the preceding element zero or more times
+               Matches the preceding element one or more times
?               Matches the preceding element zero or one time
.               Matches any single character (by default, not a line terminator)
{n}             Matches the preceding element exactly n times
{n,}            Matches the preceding element at least n times
{n,m}           With n <= m, matches the preceding element at least n and at most m times
x|y             Matches x or y
[xyz]           Character set; matches any one of the characters it contains
[a-z]           Matches any one lowercase letter from a to z
[A-Z]           Matches any one uppercase letter from A to Z
[a-zA-Z]        Matches any one letter, uppercase or lowercase
[A-z]           Not equivalent to [a-zA-Z]: besides letters, it also matches the six characters between Z and a ([ \ ] ^ _ `)
\d              A digit: [0-9]
\D              A non-digit: [^0-9]
\s              A whitespace character: [ \t\n\x0B\f\r]
\S              A non-whitespace character: [^\s]
\w              A word character: [a-zA-Z_0-9]
\b              A word boundary
\B              A non-word boundary
\A              The start of the input
\G              The end of the previous match
\Z              The end of the input, except for the final line terminator (if any)
\z              The end of the input
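To make the table concrete, here is a minimal sketch that checks a few of the metacharacters against whole strings. The class name MetacharacterDemo and the helper fullMatch are illustrative names, not part of the original article; the only library call used is Pattern.matches. Note that a backslash has to be doubled inside a Java string literal, so the regex \d is written "\\d".

```java
import java.util.regex.Pattern;

class MetacharacterDemo {
    // Hypothetical helper: true only if the WHOLE input matches the regex.
    static boolean fullMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(fullMatch("\\d{3}", "123"));       // true: exactly three digits
        System.out.println(fullMatch("\\d{3}", "1234"));      // false: four digits
        System.out.println(fullMatch("ab*c", "ac"));          // true: b may occur zero times
        System.out.println(fullMatch("ab+c", "ac"));          // false: b must occur at least once
        System.out.println(fullMatch("colou?r", "color"));    // true: u is optional
        System.out.println(fullMatch("cat|dog", "dog"));      // true: either alternative
        System.out.println(fullMatch("[a-zA-Z]+", "Hello"));  // true: letters only
        System.out.println(fullMatch("[A-z]", "_"));          // true: [A-z] is wider than the letters
        System.out.println(fullMatch("a.c", "a\nc"));         // false: "." skips line terminators by default
        System.out.println(fullMatch("\\w+\\s\\w+", "hello world")); // true
    }
}
```

The [A-z] line shows why [a-zA-Z] is the safer way to write "any letter".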

Java's regular expressions are very similar to Perl's syntax. In Java, use the java.util.regex package to process regular expressions, mainly involving three classes: Pattern, Matcher, and PatternSyntaxException.

  1. The Pattern class is the compiled representation of a regular expression. It has no public constructor; to create a Pattern object, call its public static compile method, which takes a regular expression as its argument.
  2. The Matcher class is the engine that interprets a pattern and performs match operations on an input string. It has no public constructor either; a Matcher object is obtained by calling the matcher method of a Pattern object and is then used for the matching operations.
  3. The PatternSyntaxException class is an unchecked exception that indicates a syntax error in a regular expression pattern.

Pattern class

The Pattern class represents a compiled regular expression. It provides methods for creating patterns, matching input against them, and splitting input around matches.

To use the Pattern class, call its static compile method, which compiles a regular expression into a Pattern object. Here are some commonly used methods:

  1. static Pattern compile(String regex): Compiles the given regular expression into a Pattern object.
  2. Matcher matcher(CharSequence input): Returns a new Matcher object that matches the given input against this pattern.
  3. String pattern(): Returns the regular expression from which this Pattern object was compiled.
  4. String[] split(CharSequence input): Splits the input around matches of this pattern and returns the pieces as an array of strings.
  5. static boolean matches(String regex, CharSequence input): Compiles the regular expression and attempts to match the entire input sequence against it, returning true if the match succeeds.

In addition to the methods above, the Pattern class provides others, such as quote(String s), flags(), and split(CharSequence input, int limit). Replacement operations are performed through the Matcher class.
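As a rough illustration of the methods listed above, the following sketch compiles a pattern once and reuses it for splitting, then uses the static matches shortcut. The class name PatternDemo, the helper splitCsv, and the sample strings are assumptions made for this example.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class PatternDemo {
    // Compile once and reuse: a comma with optional surrounding whitespace.
    private static final Pattern COMMA = Pattern.compile("\\s*,\\s*");

    // Hypothetical helper: splits a comma-separated line into trimmed fields.
    static String[] splitCsv(String line) {
        return COMMA.split(line);
    }

    public static void main(String[] args) {
        // pattern() returns the source text of the compiled expression.
        System.out.println(COMMA.pattern());

        // split() cuts the input around every match of the pattern.
        System.out.println(Arrays.toString(splitCsv("apple , banana,cherry"))); // [apple, banana, cherry]

        // The static matches() compiles the regex and tests the entire input.
        System.out.println(Pattern.matches("\\d+", "2024"));  // true
        System.out.println(Pattern.matches("\\d+", "2024a")); // false

        // quote() turns a string into a literal pattern, so "." loses its special meaning.
        System.out.println(Pattern.matches(Pattern.quote("a.b"), "axb")); // false
    }
}
```

Because the static matches method recompiles the expression on every call, a pattern that is used repeatedly is usually compiled once and kept in a field, as COMMA is here.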

Matcher class

The Matcher class performs regular expression matching operations on a character sequence. A Matcher is not an instance of Pattern; it is created from a Pattern object and provides the methods that carry out the actual matching.

To obtain a Matcher object, call the matcher(CharSequence input) method of a Pattern object, passing in the character sequence to be matched.

Here are some common methods of the Matcher class:

  1. boolean matches(): Attempts to match the entire input sequence against the pattern, returning true if the match succeeds.
  2. boolean lookingAt(): Attempts to match the input against the pattern starting at the beginning; unlike matches(), the match does not have to cover the entire input.
  3. boolean find(): Finds the next match in the input sequence, returning true if a match was found.
  4. String group(): Returns the subsequence matched by the previous match operation.
  5. int start(): Returns the start index of the previous match.
  6. int end(): Returns the index just after the last character of the previous match.
  7. String replaceAll(String replacement): Replaces every part of the input that matches the pattern with the given replacement string.
  8. String replaceFirst(String replacement): Replaces the first part of the input that matches the pattern with the given replacement string.
  9. Matcher reset(CharSequence input): Resets the Matcher object and gives it a new input sequence to match.
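The difference between matches(), lookingAt(), and find() is easiest to see side by side. The sketch below is one possible illustration; the class name MatcherDemo, the helper findAll, and the sample sentence are invented for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatcherDemo {
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    // Hypothetical helper: records every match as "text@start-end".
    static List<String> findAll(String input) {
        List<String> hits = new ArrayList<>();
        Matcher m = DIGITS.matcher(input);
        while (m.find()) {
            hits.add(m.group() + "@" + m.start() + "-" + m.end());
        }
        return hits;
    }

    public static void main(String[] args) {
        String text = "42 apples and 7 pears";
        Matcher m = DIGITS.matcher(text);

        System.out.println(m.matches());   // false: the whole input is not digits
        System.out.println(m.lookingAt()); // true: the input starts with digits
        System.out.println(findAll(text)); // [42@0-2, 7@14-15]

        // reset(CharSequence) reuses the same Matcher with new input.
        m.reset("2024");
        System.out.println(m.matches());   // true

        System.out.println(DIGITS.matcher("a1b22").replaceAll("#"));   // a#b#
        System.out.println(DIGITS.matcher("a1b22").replaceFirst("#")); // a#b22
    }
}
```

Note that end() is exclusive, so text.substring(m.start(), m.end()) gives the same string as m.group().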

example

Here's an example showing how to use the Pattern and Matcher classes for regular expression matching and replacement:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexExample {
    public static void main(String[] args) {
        // user@example.com is a placeholder; the address is redacted in the source.
        String input = "Please contact me by email: user@example.com or visit my website https://www.example.com for more information.";

        // Define the regular expressions
        String emailPatternString = "\\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}\\b";
        String urlPatternString = "https?://[\\w.-]+\\.[A-Za-z]{2,}";

        // Create the Pattern objects
        Pattern emailPattern = Pattern.compile(emailPatternString);
        Pattern urlPattern = Pattern.compile(urlPatternString);

        // Create the Matcher objects
        Matcher emailMatcher = emailPattern.matcher(input);
        Matcher urlMatcher = urlPattern.matcher(input);

        // Find and print the first matching email address
        if (emailMatcher.find()) {
            String matchedEmail = emailMatcher.group();
            System.out.println("Matched email address: " + matchedEmail);
        }

        // Check whether the input contains a URL
        boolean containsUrl = urlMatcher.find();
        System.out.println("Input contains a URL: " + containsUrl);

        // The find() call above already consumed the first match, so reset
        // the matcher before looping over all matching URLs
        urlMatcher.reset();
        int count = 0;
        while (urlMatcher.find()) {
            count++;
            String matchedUrl = urlMatcher.group();
            System.out.println("Matched URL " + count + ": " + matchedUrl);
        }

        // Replace the email address and the URL (replaceAll resets the matcher first)
        String modifiedInput = emailMatcher.replaceAll("[email protected]");
        String modifiedInput1 = urlMatcher.replaceAll("https://www.abc.com");

        System.out.println("------------------");
        System.out.println("Modified input: " + modifiedInput);
        System.out.println("Modified input: " + modifiedInput1);
    }
}

Output:

Matched email address: user@example.com
Input contains a URL: true
Matched URL 1: https://www.example.com
------------------
Modified input: Please contact me by email: [email protected] or visit my website https://www.example.com for more information.
Modified input: Please contact me by email: user@example.com or visit my website https://www.abc.com for more information.

PatternSyntaxException class

PatternSyntaxException is an exception class in Java used to indicate a syntax error found while parsing a regular expression.

This class is defined in the java.util.regex package and extends IllegalArgumentException, which is itself a subclass of RuntimeException, so it is an unchecked exception. It carries detailed information about the syntax error to help developers debug and fix the pattern.

The PatternSyntaxException class has the following methods:

  1. String getDescription(): Returns the description of the syntax error.
  2. int getIndex(): Returns the approximate index in the pattern where the error occurs, or -1 if the index is not known.
  3. String getMessage(): Returns a multi-line message containing the description, the index, the erroneous pattern, and a visual marker of the error position within the pattern.
  4. String getPattern(): Returns the regular expression pattern that caused the syntax error.

When a regular expression is compiled with Pattern.compile() or with another method that compiles a pattern, such as Pattern.matches(), a PatternSyntaxException is thrown if the expression contains a syntax error.

example

Here is sample code that shows how to catch a PatternSyntaxException and inspect it:

import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class PatternSyntaxExample {
    public static void main(String[] args) {
        String invalidPattern = "*abc";

        try {
            // Changing "*abc" to "^abc" makes the pattern compile without error
            Pattern.compile(invalidPattern);
        } catch (PatternSyntaxException e) {
            System.out.println("Regular expression syntax error: " + e.getDescription());
            System.out.println("Error index: " + e.getIndex());
            System.out.println("Error message: " + e.getMessage());
            System.out.println("Offending pattern: " + e.getPattern());
        }
    }
}

In the example above, we try to compile the regular expression "*abc", which contains a syntax error: the quantifier * has nothing before it to repeat. Compiling it throws a PatternSyntaxException, which we handle by catching the exception and printing the information it carries.

Output:

Regular expression syntax error: Dangling meta character '*'
Error index: 0
Error message: Dangling meta character '*' near index 0
*abc
^
Offending pattern: *abc
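A common use of this exception is validating a pattern that comes from user input or configuration before using it. The following is a minimal sketch of that idea; the class RegexValidator and its methods isValidRegex and syntaxError are hypothetical helpers, not part of java.util.regex.

```java
import java.util.Optional;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class RegexValidator {
    // Hypothetical helper: true if the string compiles as a regular expression.
    static boolean isValidRegex(String regex) {
        return !syntaxError(regex).isPresent();
    }

    // Hypothetical helper: the error description, or empty if the regex compiles.
    static Optional<String> syntaxError(String regex) {
        try {
            Pattern.compile(regex);
            return Optional.empty();
        } catch (PatternSyntaxException e) {
            return Optional.of(e.getDescription());
        }
    }

    public static void main(String[] args) {
        System.out.println(isValidRegex("^abc")); // true
        System.out.println(isValidRegex("*abc")); // false
        System.out.println(isValidRegex("(abc")); // false: unclosed group
        System.out.println(syntaxError("*abc").orElse("no error"));
    }
}
```

Because PatternSyntaxException is unchecked, the compiler does not force this try/catch; wrapping the compile call is a deliberate choice for patterns that are not hard-coded.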

Origin blog.csdn.net/m0_74293254/article/details/132286694