At first glance: regular expressions

Contents

Case introduction

regular expression

Naming conventions

Structure and composition

Usage scenarios

Regular check in Java

regular metacharacters

Regular: normal characters

Regular: \d 

Regular: \D

Regular: \w

Regular: \W 

Regular: \s

Regular: \S

Regular: .

Regular: | 

Regular: [abc]

Regular: [^abc]

Regular: [a-z]

Regular: [^a-z]

Regular: \num

Regular: ?

Regular: +

Regular: {n}

Regular: {n,m}

Regular: *


Case introduction

Before talking about regular expressions, let's start with a scenario and work our way up to them.

You have probably had this experience: you register an account on a website, and when you set a password, the site tells you the allowed password length and the rules the password must follow (as shown in the figure below).

According to the figure above, the password rules come down to two conditions:

(1) The length is 6-16 characters;

(2) The password must contain digits, uppercase letters, lowercase letters, and special characters (from a specified set);

Now suppose we don't know regular expressions. As a programmer, how would you implement such a password check?

The following is a verification method I wrote (sample):

/**
 * Checks whether the user's password satisfies the password rules.
 *
 * @param password the password entered by the user
 * @return true if the rules are satisfied; false otherwise
 */
public static boolean checkPassword(String password) {
    // The password must not be null or empty
    if (password == null || password.isEmpty()) {
        return false;
    }
    // Check the password length (6-16 characters)
    int len = password.length();
    if (len < 6 || len > 16) {
        return false;
    }
    // One flag for each of the four required kinds of character
    boolean hasNumber = false;
    boolean hasSmallLetter = false;
    boolean hasBigLetter = false;
    boolean hasSpecialChar = false;
    // Split the password into single characters and check each one
    char[] chars = password.toCharArray();
    for (char c : chars) {
        // Is it a digit 0-9?
        if (c >= '0' && c <= '9') {
            hasNumber = true;
            continue;
        }
        // Is it a lowercase letter a-z?
        if (c >= 'a' && c <= 'z') {
            hasSmallLetter = true;
            continue;
        }
        // Is it an uppercase letter A-Z?
        if (c >= 'A' && c <= 'Z') {
            hasBigLetter = true;
            continue;
        }
        // Is it one of the specified special characters?
        // indexOf returns 0 for the first character in the list, so compare with >= 0
        if ("~@#S%*_-+=:.?".indexOf(c) >= 0) {
            hasSpecialChar = true;
            continue;
        }
        // A character outside all four kinds breaks the rules
        return false;
    }
    // The password is valid only if all four kinds are present
    return hasNumber && hasSmallLetter && hasBigLetter && hasSpecialChar;
}

Is this method written correctly? Let's verify it with several sets of passwords:

All 8 groups of passwords we listed gave the expected results, which indicates that our method is OK.

But for such a simple rule check we wrote nearly 30 lines of code. Doesn't that feel a bit cumbersome? The rules are very simple, yet the code is long. Is there any way to simplify it? Of course there is! This is where today's protagonist, the regular expression, comes on stage.

The following is a verification method based on regular expressions with the same verification function:

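/**
 * Checks via a regular expression whether the user's password satisfies the password rules.
 *
 * @param password the password entered by the user
 * @return true if the rules are satisfied; false otherwise
 */
public static boolean checkPasswordByRegex(String password) {
    // Requires: import java.util.regex.Pattern;
    // The null check mirrors checkPassword; {6,16} enforces the 6-16 character length from rule (1)
    return password != null && Pattern.matches("^(?=.*[0-9])(?=.*[a-z])(?=.*[A-Z])(?=.*[~@#S%*_\\-+=:.?])[A-Za-z0-9~@#S%*_\\-+=:.?]{6,16}$", password);
}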

And that is the whole method. Let's call it with the same sample data as before:

The results are again in line with our expectations. Without regular expressions the check took nearly 30 lines of code; with a regular expression it is condensed into a single line. In other words, regular expressions can make our code much shorter.
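To make the one-line pattern less opaque, here is a sketch that assembles the same kind of expression from named parts. The class name, constant names, and sample passwords are made up for illustration; the special-character set is copied verbatim from the code above, and the length follows the 6-16 rule stated earlier. Each (?=...) part is a lookahead: it checks that a character of the given kind occurs somewhere ahead without consuming any input.

```java
import java.util.regex.Pattern;

final class PasswordRegexDemo {
    // Special characters copied verbatim from the article's example.
    static final String SPECIAL = "~@#S%*_\\-+=:.?";

    static final Pattern PASSWORD = Pattern.compile(
            "^"
            + "(?=.*[0-9])"                      // at least one digit somewhere
            + "(?=.*[a-z])"                      // at least one lowercase letter
            + "(?=.*[A-Z])"                      // at least one uppercase letter
            + "(?=.*[" + SPECIAL + "])"          // at least one special character
            + "[A-Za-z0-9" + SPECIAL + "]{6,16}" // only allowed characters, 6 to 16 of them
            + "$");

    static boolean isValid(String password) {
        return password != null && PASSWORD.matcher(password).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("Abc12~")); // true
        System.out.println(isValid("abc123")); // false: no uppercase letter, no special character
        System.out.println(isValid("Ab1~"));   // false: too short
    }
}
```

Compiling the pattern once into a constant also avoids recompiling it on every call, which Pattern.matches does internally.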

At the same time, regular expressions come with a learning cost. If you don't understand them, a pattern like the one above will look confusing, and if something goes wrong you will not be able to fix it.

So regular expressions are still worth learning. At the very least, the next time a colleague writes one, you won't find yourself thinking, "What is this? Why can't I understand it?"

Regular expression

What is a regular expression? From the case above you may already have a rough idea: it is a single string that describes a set of rules (the part in the red box indicated by the arrow in the figure below).

Naming conventions

Regular expressions are called Regular Expressions in English, so we usually combine the first few letters of the two words and name related variables regex or regexp (singular), or regexps (plural).

For example:

As another example, Java's String class has several replacement methods that accept regular expressions, and the corresponding parameter is likewise named regex.
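As a small illustration of this naming habit, the sketch below keeps each pattern in a variable called regex and passes it to String methods whose first parameter is a regular expression. String.replaceAll, String.replaceFirst, and String.split are real JDK methods; the class name, method names, and sample text are invented for the example.

```java
import java.util.Arrays;

final class RegexNamingDemo {
    static String maskDigits(String input) {
        String regex = "\\d+";               // one or more digits
        return input.replaceAll(regex, "#"); // replace every run of digits
    }

    static String maskFirstDigits(String input) {
        String regex = "\\d+";
        return input.replaceFirst(regex, "#"); // replace only the first run
    }

    static String[] splitOnCommas(String input) {
        String regex = ",";
        return input.split(regex);
    }

    public static void main(String[] args) {
        System.out.println(maskDigits("a1b22c333"));                 // a#b#c#
        System.out.println(maskFirstDigits("a1b22c333"));            // a#b22c333
        System.out.println(Arrays.toString(splitOnCommas("x,y,z"))); // [x, y, z]
    }
}
```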

Structure and composition

Regular expressions usually consist of ordinary characters and metacharacters.

Ordinary characters: characters that stand only for themselves and carry no other meaning, such as the uppercase and lowercase letters and the digits we use every day.

Metacharacters: characters that, besides being characters, also carry a special meaning (the figure below is an excerpt of some metacharacters).

In fact, learning regular expressions is mostly a matter of learning the metacharacters.

Usage scenarios

Once we have learned regular expressions, where can we use them?

(1) Validating strings against rules (for example, in the case introduction above we used a regular expression to check whether a password complies with the rules).

(2) Replacing parts of a string (for example, removing all uppercase and lowercase letters from a string, or replacing them with a specified symbol).

(3) Extracting the required characters from a string (for example, extracting all the digits in a string to form a new string).
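The three scenarios can be sketched roughly as follows. This is only an illustration: the class name, method names, sample input, and the "exactly six digits" rule are assumptions made for the example, while Pattern, Matcher, and String.replaceAll are standard JDK APIs.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ScenarioDemo {
    // (1) Validation: hypothetical rule, the input must be exactly six digits.
    static boolean isSixDigits(String input) {
        return Pattern.matches("\\d{6}", input);
    }

    // (2) Replacement: remove every uppercase and lowercase letter.
    static String removeLetters(String input) {
        return input.replaceAll("[A-Za-z]", "");
    }

    // (2) Replacement: replace every letter with a specified symbol.
    static String maskLetters(String input) {
        return input.replaceAll("[A-Za-z]", "*");
    }

    // (3) Extraction: collect all digits into a new string.
    static String extractDigits(String input) {
        Matcher matcher = Pattern.compile("\\d").matcher(input);
        StringBuilder digits = new StringBuilder();
        while (matcher.find()) {
            digits.append(matcher.group());
        }
        return digits.toString();
    }

    public static void main(String[] args) {
        System.out.println(isSixDigits("123456"));     // true
        System.out.println(removeLetters("a1-B2-c3")); // 1-2-3
        System.out.println(maskLetters("a1-B2-c3"));   // *1-*2-*3
        System.out.println(extractDigits("a1-B2-c3")); // 123
    }
}
```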

Regular check in Java

One of the main uses of regular expressions is validating strings. In Java, the following call is all you need:

boolean result = Pattern.matches(regex, input);

where:

regex is the regular expression (the validation rule) we need to write;

input is the string we want to validate;

result is the outcome of the validation: true means the string passed, and false means it did not.
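One detail worth knowing: Pattern.matches succeeds only if the entire input matches the pattern, not just a part of it. The sketch below contrasts it with Matcher.find, which looks for a match anywhere in the input. The class and method names are illustrative.

```java
import java.util.regex.Pattern;

final class FullMatchDemo {
    // True only if the whole input matches the pattern.
    static boolean matchesWhole(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    // True if the pattern matches anywhere inside the input.
    static boolean containsMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(matchesWhole("abc", "abc"));    // true
        System.out.println(matchesWhole("abc", "xabcx"));  // false
        System.out.println(containsMatch("abc", "xabcx")); // true
    }
}
```

This whole-input behavior is why the examples in the rest of the article describe the complete string rather than a fragment of it.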

regular metacharacters

Regular: normal characters

When a regular expression consists of ordinary characters only (no metacharacters), an input string passes validation only if it is identical to the regular expression.

The specific effects are as follows:

Note: To save space, the examples below do not repeat the full verification code; each one can be checked with the Pattern.matches call shown above.

Regular: \d 

\d matches a single digit.

For example:

aaa\d : the string must be aaa followed by exactly one digit.

aaa\dbbb : there is one digit between aaa and bbb.

aaa\d\d : aaa followed by two digits.

Note: In a Java string literal, \ is itself the escape character, so a metacharacter that starts with \ needs an extra \ in Java code, that is, \\ (for example, \d is written as "\\d"). For other languages, consult the relevant documentation.
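Here is how the \d examples might look in Java code, with the doubled backslash from the note above. The class and method names are made up for the example.

```java
import java.util.regex.Pattern;

final class DigitDemo {
    // The regex aaa\d becomes "aaa\\d" in Java source code.
    static boolean aaaDigit(String input) {
        return Pattern.matches("aaa\\d", input);
    }

    // The regex aaa\d\d: aaa followed by two digits.
    static boolean aaaTwoDigits(String input) {
        return Pattern.matches("aaa\\d\\d", input);
    }

    public static void main(String[] args) {
        System.out.println(aaaDigit("aaa7"));      // true
        System.out.println(aaaDigit("aaa78"));     // false: one digit too many
        System.out.println(aaaTwoDigits("aaa78")); // true
    }
}
```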

Regular: \D

\D matches a single non-digit character, the opposite of \d above.

For example:

\D\D\D : a string of length 3 that contains no digits.

111\D222 : there must be exactly one non-digit character between 111 and 222.
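A minimal sketch of the two \D patterns in Java (class and method names are illustrative). Note that a space or punctuation mark also counts as a non-digit.

```java
import java.util.regex.Pattern;

final class NonDigitDemo {
    static boolean threeNonDigits(String input) {
        return Pattern.matches("\\D\\D\\D", input);
    }

    static boolean nonDigitBetween(String input) {
        return Pattern.matches("111\\D222", input);
    }

    public static void main(String[] args) {
        System.out.println(threeNonDigits("a-c"));      // true
        System.out.println(threeNonDigits("a1c"));      // false: contains a digit
        System.out.println(nonDigitBetween("111x222")); // true
        System.out.println(nonDigitBetween("1110222")); // false: 0 is a digit
    }
}
```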

Regular: \w

\w matches a single letter (upper or lower case), digit, or underscore.

For example:

12\w45 : there must be exactly one letter, digit, or underscore between 12 and 45.
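The pattern above can be tried like this (names are illustrative). One thing to be aware of: in Java, \w by default covers only the ASCII letters a-z and A-Z, the digits 0-9, and the underscore, so an accented letter does not match unless the pattern is compiled with extra flags.

```java
import java.util.regex.Pattern;

final class WordCharDemo {
    static boolean wordCharBetween(String input) {
        return Pattern.matches("12\\w45", input);
    }

    public static void main(String[] args) {
        System.out.println(wordCharBetween("12a45")); // true
        System.out.println(wordCharBetween("12_45")); // true
        System.out.println(wordCharBetween("12-45")); // false: - is not a word character
        // e with acute accent (U+00E9) is a letter, but not an ASCII one
        System.out.println(wordCharBetween("12\u00e945")); // false
    }
}
```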

Regular: \W 

\W is the opposite of \w: the character at this position is not a letter, a digit, or an underscore.

That is, special symbols (except the underscore), spaces, and so on all match.

For example:

12\W45 : there must be exactly one character between 12 and 45 that is not a letter, digit, or underscore.
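A short Java sketch of the uppercase \W pattern 12\W45 (the class and method names are assumptions for the example):

```java
import java.util.regex.Pattern;

final class NonWordCharDemo {
    static boolean nonWordCharBetween(String input) {
        return Pattern.matches("12\\W45", input);
    }

    public static void main(String[] args) {
        System.out.println(nonWordCharBetween("12-45")); // true
        System.out.println(nonWordCharBetween("12 45")); // true: a space is not a word character
        System.out.println(nonWordCharBetween("12_45")); // false: underscore belongs to \w
        System.out.println(nonWordCharBetween("12a45")); // false
    }
}
```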

Regular: \s

\s matches a single whitespace character, such as a space, a tab (Tab key), or a line break.

For example:

88\s99 : there must be exactly one whitespace character between 88 and 99.

(My editor replaces each tab with 4 spaces, so I won't list a tab example here.)

Regular: \S

\S is the opposite of \s and matches a single non-whitespace (visible) character.

For example:

88\S99 : there must be exactly one non-whitespace character between 88 and 99.
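In code, a tab is easy to write with the \t escape, so both \s and \S can be tried together. A minimal sketch, with made-up class and method names:

```java
import java.util.regex.Pattern;

final class WhitespaceDemo {
    static boolean whitespaceBetween(String input) {
        return Pattern.matches("88\\s99", input);
    }

    static boolean nonWhitespaceBetween(String input) {
        return Pattern.matches("88\\S99", input);
    }

    public static void main(String[] args) {
        System.out.println(whitespaceBetween("88 99"));    // true: space
        System.out.println(whitespaceBetween("88\t99"));   // true: tab
        System.out.println(whitespaceBetween("8899"));     // false: nothing in between
        System.out.println(nonWhitespaceBetween("88x99")); // true
        System.out.println(nonWhitespaceBetween("88 99")); // false
    }
}
```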

Regular: .

. (dot) matches any single character except line terminators such as "\n" and "\r".

For example:

.... : any four characters.
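The line-break exception can be seen in a small experiment. In the sketch below (names are illustrative), the second method uses the standard Pattern.DOTALL flag, which makes the dot match line terminators as well.

```java
import java.util.regex.Pattern;

final class DotDemo {
    static boolean anyFour(String input) {
        return Pattern.matches("....", input);
    }

    // Same pattern, but with DOTALL the dot also matches line terminators.
    static boolean anyFourDotAll(String input) {
        return Pattern.compile("....", Pattern.DOTALL).matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(anyFour("ab c"));        // true
        System.out.println(anyFour("abc"));         // false: only three characters
        System.out.println(anyFour("abc\n"));       // false: the dot skips \n by default
        System.out.println(anyFourDotAll("abc\n")); // true
    }
}
```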

Regular: | 

| (vertical bar) expresses an OR relationship: the string being checked must match one of the alternatives.

For example:

aa|bb|cc : the input string must be one of aa, bb, or cc.

Note that if other characters come before or after the alternatives, the alternatives must be wrapped in ( ).

For example:

xx(aa|bb|cc)yy : the input string must start with xx, end with yy, and have one of aa, bb, or cc in the middle.
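To see why the parentheses matter, the sketch below compares the grouped pattern with an ungrouped variant, xxaa|bb|ccyy, which Java reads as three alternatives: xxaa, bb, or ccyy. The class and method names are invented for the example.

```java
import java.util.regex.Pattern;

final class AlternationDemo {
    static boolean oneOfThree(String input) {
        return Pattern.matches("aa|bb|cc", input);
    }

    static boolean grouped(String input) {
        return Pattern.matches("xx(aa|bb|cc)yy", input);
    }

    // Without parentheses the alternatives are xxaa, bb, and ccyy.
    static boolean ungrouped(String input) {
        return Pattern.matches("xxaa|bb|ccyy", input);
    }

    public static void main(String[] args) {
        System.out.println(oneOfThree("bb"));    // true
        System.out.println(grouped("xxbbyy"));   // true
        System.out.println(ungrouped("xxbbyy")); // false
        System.out.println(ungrouped("xxaa"));   // true
    }
}
```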

Regular: [abc]

[ ] matches any one of the characters listed inside the brackets.

For example:

a[bcd]e : the character between a and e must be one of b, c, or d.

Note: With |, each alternative can be a single character or a whole string. Square brackets always stand for exactly one character.
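The difference described in the note can be checked with a quick experiment. In this sketch the pattern a(bc|d)e is an extra example added for contrast, and all names are illustrative.

```java
import java.util.regex.Pattern;

final class CharClassDemo {
    // Exactly one character out of b, c, d between a and e.
    static boolean oneOfBcd(String input) {
        return Pattern.matches("a[bcd]e", input);
    }

    // Either the string bc or the single character d between a and e.
    static boolean bcOrD(String input) {
        return Pattern.matches("a(bc|d)e", input);
    }

    public static void main(String[] args) {
        System.out.println(oneOfBcd("ace"));  // true
        System.out.println(oneOfBcd("abce")); // false: brackets match a single character
        System.out.println(bcOrD("abce"));    // true: the alternative bc is a string
    }
}
```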

Regular: [^abc]

[^ ] matches any one character that is not listed inside the brackets.

For example:

a[^bcd]e : the character between a and e can be anything except b, c, and d.
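A small sketch of the negated class (names are made up). It also shows that [^bcd] still has to match one character: an empty gap between a and e does not pass.

```java
import java.util.regex.Pattern;

final class NegatedClassDemo {
    static boolean notBcd(String input) {
        return Pattern.matches("a[^bcd]e", input);
    }

    public static void main(String[] args) {
        System.out.println(notBcd("axe")); // true
        System.out.println(notBcd("a1e")); // true
        System.out.println(notBcd("abe")); // false: b is excluded
        System.out.println(notBcd("ae"));  // false: one character is still required
    }
}
```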

Regular: [a-z]

[value1-value2] matches any one character from value1 to value2, inclusive. This form is most often used for ranges of uppercase letters, lowercase letters, and digits.

For example:

a[b-d]e : equivalent to a[bcd]e, because b-d covers the three characters b, c, and d.

a[0-9]e : there is one digit between a and e, which is equivalent to a\de (as mentioned earlier, \d means a digit).
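The range examples can be tried as follows, including the equivalence between [0-9] and \d. Class and method names are assumptions made for the sketch.

```java
import java.util.regex.Pattern;

final class RangeDemo {
    static boolean bToD(String input) {
        return Pattern.matches("a[b-d]e", input);
    }

    static boolean digitClass(String input) {
        return Pattern.matches("a[0-9]e", input);
    }

    // The regex a\de: a, then one digit, then e.
    static boolean digitShorthand(String input) {
        return Pattern.matches("a\\de", input);
    }

    public static void main(String[] args) {
        System.out.println(bToD("ace"));           // true
        System.out.println(bToD("aee"));           // false: e is outside b-d
        System.out.println(digitClass("a5e"));     // true
        System.out.println(digitShorthand("a5e")); // true
    }
}
```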

 

Regular: [^a-z]

[^value1-value2] matches any one character that lies outside the range from value1 to value2 (inclusive).

For example:

a[^1-3]e : the character between a and e can be anything except 1, 2, or 3.
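For completeness, the negated range in Java (illustrative names):

```java
import java.util.regex.Pattern;

final class NegatedRangeDemo {
    static boolean notOneToThree(String input) {
        return Pattern.matches("a[^1-3]e", input);
    }

    public static void main(String[] args) {
        System.out.println(notOneToThree("a4e")); // true
        System.out.println(notOneToThree("axe")); // true
        System.out.println(notOneToThree("a2e")); // false: 2 is inside 1-3
    }
}
```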

Regular: \num

Here num stands for a number. When \ is followed by a number, it matches the same text that was matched by the pair of parentheses with that number.

For example, take the string abcd. If we wrap c in parentheses and write \1 after the string, giving ab(c)d\1, then \1 refers to c, because \1 stands for whatever the 1st pair of parentheses matched.

ab(c)d\1 : equivalent to abcdc.

If we also wrap the d in parentheses and write \2 at the end, giving ab(c)(d)\1\2, then \2 stands for the character d, because the 2nd pair of parentheses matched d. The entire expression is therefore equivalent to abcdcd.

ab(c)(d)\1\2 : equivalent to abcdcd, and also equivalent to ab(cd)\1.
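Back-references become more interesting when the group can match different text each time. The sketch below checks the fixed pattern ab(c)(d)\1\2 and adds one extra pattern, (\d)\1, which matches any digit followed by the same digit again. That second pattern and all names are additions made for illustration.

```java
import java.util.regex.Pattern;

final class BackreferenceDemo {
    // \1 repeats what group 1 matched (c), \2 what group 2 matched (d).
    static boolean fixedGroups(String input) {
        return Pattern.matches("ab(c)(d)\\1\\2", input);
    }

    // One digit, then the very same digit once more.
    static boolean doubledDigit(String input) {
        return Pattern.matches("(\\d)\\1", input);
    }

    public static void main(String[] args) {
        System.out.println(fixedGroups("abcdcd")); // true
        System.out.println(fixedGroups("abccdd")); // false
        System.out.println(doubledDigit("77"));    // true
        System.out.println(doubledDigit("78"));    // false: second digit differs
    }
}
```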

Regular: ?

? matches the preceding subexpression zero or one time.

For example:

abc?de : the matching strings are abde (c matched 0 times) and abcde (c matched once).
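Note that in abc?de the ? applies only to the c directly in front of it. To make a longer piece optional, wrap it in parentheses, as in the additional pattern ab(cd)?e used in this sketch (the names are made up):

```java
import java.util.regex.Pattern;

final class OptionalDemo {
    // Only the single character c is optional.
    static boolean optionalC(String input) {
        return Pattern.matches("abc?de", input);
    }

    // The whole group cd is optional.
    static boolean optionalGroup(String input) {
        return Pattern.matches("ab(cd)?e", input);
    }

    public static void main(String[] args) {
        System.out.println(optionalC("abde"));      // true
        System.out.println(optionalC("abcde"));     // true
        System.out.println(optionalC("abccde"));    // false: c appears twice
        System.out.println(optionalGroup("abe"));   // true
        System.out.println(optionalGroup("abcde")); // true
    }
}
```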

Regular: +

+ matches the preceding subexpression one or more times (that is, at least once).

For example:

abc+de : there is at least one c between ab and de.
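A quick check of the + example in Java (illustrative names):

```java
import java.util.regex.Pattern;

final class OneOrMoreDemo {
    static boolean oneOrMoreC(String input) {
        return Pattern.matches("abc+de", input);
    }

    public static void main(String[] args) {
        System.out.println(oneOrMoreC("abcde"));   // true
        System.out.println(oneOrMoreC("abcccde")); // true
        System.out.println(oneOrMoreC("abde"));    // false: at least one c is required
    }
}
```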

Regular: {n}

Here n is a non-negative integer. {n} matches the preceding subexpression exactly n times.

For example:

abc{3}de : there are exactly 3 c's between ab and de.

ab(xx|yy){3}de : between ab and de there are exactly 3 occurrences of xx or yy in total, in any combination.
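Both {3} examples can be tried as below. The sample inputs and names are invented; the second method shows that the three repetitions of the group may mix xx and yy freely.

```java
import java.util.regex.Pattern;

final class ExactCountDemo {
    static boolean threeCs(String input) {
        return Pattern.matches("abc{3}de", input);
    }

    static boolean threeGroups(String input) {
        return Pattern.matches("ab(xx|yy){3}de", input);
    }

    public static void main(String[] args) {
        System.out.println(threeCs("abcccde"));        // true
        System.out.println(threeCs("abccde"));         // false: only two c's
        System.out.println(threeGroups("abxxyyxxde")); // true: xx, yy, xx
        System.out.println(threeGroups("abxxyyde"));   // false: only two groups
    }
}
```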

Regular: {n,m}

Both n and m are non-negative integers, with n<=m. {n,m} matches the preceding subexpression at least n times and at most m times.

For example:

abc{2,3}de : there are 2 to 3 c's between ab and de.
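The lower and upper bounds are both enforced, which a small test makes visible (names are illustrative):

```java
import java.util.regex.Pattern;

final class RangeCountDemo {
    static boolean twoOrThreeCs(String input) {
        return Pattern.matches("abc{2,3}de", input);
    }

    public static void main(String[] args) {
        System.out.println(twoOrThreeCs("abcde"));    // false: one c is too few
        System.out.println(twoOrThreeCs("abccde"));   // true
        System.out.println(twoOrThreeCs("abcccde"));  // true
        System.out.println(twoOrThreeCs("abccccde")); // false: four c's are too many
    }
}
```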

Regular: *

* matches the preceding subexpression any number of times, including zero.

For example:

abc*de : there can be any number of c's (including none) between ab and de.
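The three shorthand quantifiers can all be written with braces: ? is the same as {0,1}, + is the same as {1,}, and * is the same as {0,}. The sketch below checks the * example next to its brace form; the open-ended {0,} syntax is standard in Java, and the names are made up.

```java
import java.util.regex.Pattern;

final class ZeroOrMoreDemo {
    static boolean anyCs(String input) {
        return Pattern.matches("abc*de", input);
    }

    // Brace form of the same quantifier: zero or more c's.
    static boolean anyCsBraces(String input) {
        return Pattern.matches("abc{0,}de", input);
    }

    public static void main(String[] args) {
        System.out.println(anyCs("abde"));           // true: zero c's
        System.out.println(anyCs("abccccde"));       // true
        System.out.println(anyCs("abxde"));          // false: x is not c
        System.out.println(anyCsBraces("abccccde")); // true
    }
}
```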

Origin blog.csdn.net/sunnyzyq/article/details/122840555