Using regular expressions in MySQL

Regular expressions match text by comparing a pattern to a text string. For example, regular expressions can be used to extract phone numbers from a file, find repeated words in an article, replace sensitive vocabulary in an article, etc. Regular expressions are powerful and flexible and are often used for very complex queries.

In MySQL, the REGEXP keyword specifies the regular expression pattern to match against. The WHERE clause provides basic support for regular expressions. The basic syntax is as follows:

attribute name REGEXP 'matching method'

Here, "attribute name" is the name of the field to be queried, and "matching method" is the pattern to match against. A pattern can contain many special matching characters, each with a different meaning. The following table lists those commonly used with the REGEXP operator.

Table: commonly used regular expression characters

Basic character matching

    Let's start with a very simple example: retrieving all rows whose profession contains the text '工程' ("engineering"). The statement looks very similar to a fuzzy match with LIKE; it simply replaces LIKE with REGEXP. So why use a regular expression here instead of LIKE? I will leave that question open for now and compare the two matching methods at the end.

 select profession from tb_user where profession REGEXP '工程';


Match a single character

    "." is a special character in regular expressions that matches any single character in the text, whether a letter, a digit, or a Chinese character.

select profession from tb_user where profession  REGEXP '.程';


Perform OR matching

    When the search condition is any one of two or more text strings (either this string or that one), use OR matching, which is written with the "|" character.

select email from tb_user where email REGEXP 'qq|sina';

    The statement above matches email addresses in the table that contain "qq" or "sina". To match more than two alternatives, add further "|" symbols followed by the corresponding strings in the same way.

    If you want to match any one of several specific characters, use the "[]" symbol.

select email from tb_user where email REGEXP '[abc]ao';

    This statement matches email addresses that contain "a", "b", or "c" immediately followed by "ao" (that is, "aao", "bao", or "cao").

    In fact, "[]" is another form of "|": [abc] is shorthand for a|b|c. So when should the former be used instead of the latter? The following example illustrates the situation.

 select email from tb_user where email REGEXP 'a|b|cao';

    This statement looks as if it should mean the same as the "[]" version, so why are the results different? MySQL assumes that you want to match "a" or "b" or "cao", so the result is not what we want. In this case, you need "[]" to define the set of characters to match. Unless the alternative characters are enclosed in a set, "|" applies to the entire string.
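
    The difference between the two patterns can also be checked outside the database. The following is a minimal sketch in Python, assuming that Python's re.search behaves like REGEXP for these simple patterns (both look for a match anywhere in the string). The sample addresses and the helper name matching are made up for illustration and are not taken from tb_user.

```python
import re

# Hypothetical sample values; not taken from the tb_user table.
emails = ["xiaobao@qq.com", "alice@sina.com", "jerry@qq.com", "cao123@163.com"]

def matching(pattern, values):
    """Return the values in which the pattern matches somewhere."""
    return [v for v in values if re.search(pattern, v)]

# Set: one of a, b, c followed directly by "ao".
set_hits = matching(r"[abc]ao", emails)

# Ungrouped alternation: "a" OR "b" OR "cao", each tried on its own.
alt_hits = matching(r"a|b|cao", emails)

# Grouping the alternatives in parentheses restores the intended meaning.
grouped_hits = matching(r"(a|b|c)ao", emails)

print(set_hits)
print(alt_hits)
print(grouped_hits)
```

    The ungrouped pattern also returns alice@sina.com, simply because that address contains the letter "a".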

    Sets can also be defined as ranges of characters. For example, [0-9] matches any digit, and [a-z] matches any alphabetic character.

select email from tb_user where email REGEXP '[0-3]';


Match special characters

    Regular expressions are built from special characters with specific meanings, such as ".", "[]", and "|". So how do we match these special characters themselves in the text?
    To match a special character, prefix it with \\. For example, \\- searches for - in the text. This is called escaping, and every character with a special meaning in regular expressions must be escaped this way. Two backslashes are needed because MySQL itself interprets one of them when parsing the string literal, and the regular expression library interprets the other. (For the same reason, matching \ itself requires \\\\.)

select email from tb_user where email REGEXP '\\.';

    This matches email addresses in which "." appears in the column value.
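
    The effect of the escape can be illustrated with a small Python sketch. It assumes that Python's re module treats "." and "\." the same way as the MySQL regular expression library; the sample values are hypothetical. A Python raw string hands the backslash to the regex engine unchanged, which corresponds to what MySQL's engine receives after the SQL parser has consumed one of the two backslashes in '\\.'.

```python
import re

# Hypothetical sample values for illustration.
values = ["tom@qq.com", "no dot here", "a.b"]

# Unescaped "." matches any single character, so every non-empty value matches.
unescaped = [v for v in values if re.search(r".", v)]

# Escaped "\." matches only a literal dot.
escaped = [v for v in values if re.search(r"\.", v)]

# The regex engine must receive exactly two characters: backslash + dot.
# In a MySQL string literal this is written '\\.' because the SQL parser
# consumes one backslash; a Python raw string passes it through as-is.
pattern_length = len(r"\.")

print(unescaped)
print(escaped)
```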
    

Match multiple instances

    All the patterns used so far match a single occurrence. For finer control over the number of matches, use the repetition metacharacters, which are listed in the table at the beginning.
    For example, to find whether text such as "[666 eng]" exists, you can use repetition metacharacters. The following examples show three different searches. Their meanings can be read from the table above, so I will not explain them one by one. Note, however, that a repetition metacharacter controls the number of occurrences of the character (or set) immediately before it, and only that.

-- 1: "+" requires one or more digits
select profession from tb_user where profession REGEXP '\\[[0-9]+ eng\\]';
-- 2: "*" allows zero or more digits
select profession from tb_user where profession REGEXP '\\[[0-9]* eng\\]';
-- 3: "{3,}" requires at least three digits
select profession from tb_user where profession REGEXP '\\[[0-9]{3,} eng\\]';
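
    How the three patterns differ can be seen in the following Python sketch. It assumes that Python's re module interprets +, * and {3,} the same way as REGEXP does; the sample values are hypothetical, and each pattern is written with a single backslash because a Python raw string needs no extra escaping layer.

```python
import re

# Hypothetical profession values for illustration.
values = ["[666 eng]", "[5 eng]", "[ eng]", "[12 eng]"]

def matching(pattern):
    """Return the sample values in which the pattern matches somewhere."""
    return [v for v in values if re.search(pattern, v)]

one_or_more = matching(r"\[[0-9]+ eng\]")       # at least one digit
zero_or_more = matching(r"\[[0-9]* eng\]")      # digits are optional
three_or_more = matching(r"\[[0-9]{3,} eng\]")  # at least three digits

print(one_or_more)
print(zero_or_more)
print(three_or_more)
```

    In each case the repetition applies only to the set [0-9] immediately before it, not to the bracket or to " eng".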


Locators

    To match text at a specific position, use locators: ^ matches the beginning of the text, and $ matches the end of the text.

select profession from tb_user where profession REGEXP '^软';

    ^ has a dual purpose in regular expressions. At the start of a set "[]", it negates the set, so [^0-9] matches any character that is not a digit. Everywhere else, it acts as a locator for the beginning of the text.
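
    Both uses of ^ can be tried in a short Python sketch, assuming that Python's re module handles ^, $ and negated sets like REGEXP does for these patterns. The sample values are hypothetical.

```python
import re

# Hypothetical profession values:
# "软件工程" = software engineering, "计算机软件" = computer software.
values = ["软件工程", "计算机软件"]

# Outside a set, ^ anchors the match to the beginning of the text.
starts_with = [v for v in values if re.search(r"^软", v)]

# $ anchors the match to the end of the text.
ends_with = [v for v in values if re.search(r"软件$", v)]

# Inside a set, ^ negates it: [^0-9] matches any character that is not a digit.
non_digits = re.findall(r"[^0-9]", "123abc")

print(starts_with)
print(ends_with)
print(non_digits)
```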
    

Comparison of regular expressions and like fuzzy matching

    LIKE compares against the entire column value. If the search text appears inside the column value but is not the complete value, LIKE will not find it (unless the wildcard character % is used).
    REGEXP matches within column values. If the search text appears anywhere in the column value, the row is found.
    Put simply, LIKE must match the whole string, while REGEXP can match a substring. This is a very important difference.
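
    The contrast can be reproduced in a self-contained sketch. Note the assumptions: it uses Python's built-in sqlite3 module with an in-memory SQLite database as a stand-in for MySQL, and because SQLite has no built-in REGEXP it registers a user function backed by Python's re.search. The table and column names follow the examples above; the rows and the helper name emails are hypothetical.

```python
import re
import sqlite3

def regexp(pattern, value):
    """Backs the REGEXP operator: true if pattern matches anywhere in value."""
    return value is not None and re.search(pattern, value) is not None

conn = sqlite3.connect(":memory:")
# In SQLite, "X REGEXP Y" is evaluated as the user function regexp(Y, X).
conn.create_function("regexp", 2, regexp)

conn.execute("CREATE TABLE tb_user (email TEXT)")
conn.executemany(
    "INSERT INTO tb_user VALUES (?)",
    [("tom@qq.com",), ("amy@sina.com",), ("qq",)],  # hypothetical rows
)

def emails(where):
    """Run the query with the given WHERE condition and return the emails."""
    sql = f"SELECT email FROM tb_user WHERE {where} ORDER BY email"
    return [row[0] for row in conn.execute(sql)]

like_whole = emails("email LIKE 'qq'")           # whole value must be 'qq'
like_wild = emails("email LIKE '%qq%'")          # wildcards allow a substring
regexp_sub = emails("email REGEXP 'qq'")         # substring match by default
regexp_anchored = emails("email REGEXP '^qq$'")  # anchors force a whole-value match

print(like_whole, like_wild, regexp_sub, regexp_anchored)
```

    With the ^ and $ locators, REGEXP can be made to behave like LIKE without wildcards, matching only the complete value.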

Origin blog.csdn.net/ccjjjjdff/article/details/130789871