Regular expressions: a skill you cannot afford not to learn

Disclaimer: This article is the blogger's original work and is released under the CC 4.0 BY-SA license. When reproducing it, please attach the original source link and this statement.
Original link: https://blog.csdn.net/cblstc/article/details/100080296

Foreword

Regular expressions are a great tool, but I never really understood them and was working blind. So I decided to study them systematically, aiming to understand them well enough to solve problems in my daily work and improve my efficiency.

Regular expressions: must-know essentials

Getting started

Matching any character

c.t -> cat, cut, etc.

Matching one of a set of characters

[A-Za-z0-9] -> a single letter or digit

Negation

[^0-9] -> a single non-digit character

Matching special characters

\d -> a digit
\D -> a non-digit
\w -> a letter, a digit, or _
\s -> a whitespace character (backspace [\b] excluded)
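
As a quick check of the patterns above, here is a minimal sketch using Python's built-in re module. The sample strings and variable names are made up for illustration; other regex engines may differ in small details.

```python
import re

# c.t : "." stands for any single character
dot_hits = re.findall(r"c.t", "cat cut cot ct")

# [A-Za-z0-9] : one letter or digit; [^0-9] : one non-digit
alnum_hits = re.findall(r"[A-Za-z0-9]", "a-1 B")
non_digit_hits = re.findall(r"[^0-9]", "a1b2")

# \d, \w and \s shorthand classes applied to the same sample
digit_hits = re.findall(r"\d", "x9_ y")
word_hits = re.findall(r"\w", "x9_ y")
space_hits = re.findall(r"\s", "x9_ y")

print(dot_hits, alnum_hits, non_digit_hits)
print(digit_hits, word_hits, space_hits)
```

Note that "ct" is not matched by c.t, because the dot requires exactly one character between c and t.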

Changing case

\UJava\E -> JAVA (everything up to \E becomes uppercase)
\ujava\E -> Java (the next character becomes uppercase)
\LJAVA\E -> java (everything up to \E becomes lowercase)
\lJAVA\E -> jAVA (the next character becomes lowercase)

Changing case is very practical. In an editor, we can use (word to find) to capture the word, and then replace it with \U$1\E to make it all uppercase, with \u$1\E to capitalize its first letter, and so on.
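
The \U, \u, \L, \l and \E escapes are features of editor and Perl-style replacement strings. Python's re module does not offer them, so the sketch below swaps in a different technique that gives the same result: a replacement function passed to re.sub. The sample text and variable names are assumptions for illustration.

```python
import re

text = "java is fun, java is everywhere"

# Same effect as replacing (java) with \U$1\E: uppercase the whole capture.
all_upper = re.sub(r"(java)", lambda m: m.group(1).upper(), text)

# Same effect as \u$1: uppercase only the first character of the capture.
first_upper = re.sub(
    r"(java)",
    lambda m: m.group(1)[0].upper() + m.group(1)[1:],
    text,
)

print(all_upper)
print(first_upper)
```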

Repeated matches

\d+ -> one or more digits
\d* -> zero or more digits
\d? -> zero or one digit
\d{3,5} -> 3 to 5 digits
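
The quantifiers above can be tried out with a short Python sketch. The sample strings are invented; the \b variant is an addition of mine to show how to stop {3,5} from matching part of a longer number.

```python
import re

numbers = "1 22 333 4444 666666"

one_or_more = re.findall(r"\d+", numbers)            # every run of digits
three_to_five = re.findall(r"\d{3,5}", numbers)      # also grabs 5 digits of 666666
whole_numbers = re.findall(r"\b\d{3,5}\b", numbers)  # only complete 3-5 digit numbers

versions = "v v1 v23"
zero_or_one = re.findall(r"v\d?", versions)   # "v" plus at most one digit
zero_or_more = re.findall(r"v\d*", versions)  # "v" plus any number of digits

print(one_or_more, three_to_five, whole_numbers)
print(zero_or_one, zero_or_more)
```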

Greedy and lazy matching

* and + are greedy metacharacters. For example, take this HTML code:

<h1>Hello</h1>
<h1>Hello everyone</h1>

If the regular expression is <h1>.*</h1>, it matches everything from the first <h1> to the last </h1> (when both headings sit on one line, or when . is allowed to match line breaks). That is not what we expect, and it happens because a greedy match grabs the longest string it can.
Switching to lazy matching solves this problem: append ? to a greedy metacharacter to make it lazy. The expression in this example becomes <h1>.*?</h1>
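
A minimal Python sketch of the difference. I put both headings on one line here, because by default . does not match a line break; the sample string is an assumption.

```python
import re

html = "<h1>Hello</h1><h1>Hello everyone</h1>"

# Greedy: .* runs as far as possible, so both headings end up in one match.
greedy = re.findall(r"<h1>.*</h1>", html)

# Lazy: .*? stops at the first </h1> it can reach, giving one match per heading.
lazy = re.findall(r"<h1>.*?</h1>", html)

print(greedy)
print(lazy)
```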

Position matching

\bhelloworld\b -> matches helloworld as a whole word; helloworldjava is not matched
\B-\B -> matches a - that has no word boundary on its left or right (for example, a hyphen surrounded by spaces)
^Helloworld -> starts with Helloworld; note the difference from negation, where ^ is written inside []
Helloworld$ -> ends with Helloworld
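
The four position patterns can be checked with a small Python sketch. The sample sentences are made up for illustration.

```python
import re

# \b...\b : helloworld only as a whole word
whole_word = re.findall(r"\bhelloworld\b", "helloworld helloworldjava")

# \B-\B : only the hyphen with spaces around it qualifies, not the one in "nine-digit"
hyphen = re.search(r"\B-\B", "nine-digit color - coded")

# ^ and $ : anchored to the start and end of the string
starts = re.search(r"^Helloworld", "Helloworld again") is not None
not_at_start = re.search(r"^Helloworld", "say Helloworld") is not None
ends = re.search(r"Helloworld$", "say Helloworld") is not None

print(whole_word, hyphen.start(), starts, not_at_start, ends)
```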

Multiline mode

If we need to match at the beginning and end of every line in a string, we can use multiline mode to change the behavior of ^ and $. In this mode, ^ also matches right after a line break and $ also matches right before one. For example, to match all comment lines, we can use the regular expression (?m)^\s*//.*$
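
Here is a small Python sketch of the comment pattern in multiline mode. The source snippet is invented; note that a comment trailing after code on the same line is not matched, because the pattern requires // to be the first non-blank thing on the line.

```python
import re

source = (
    "int a = 1;\n"
    "// first comment\n"
    "    // indented comment\n"
    "int b = 2; // trailing"
)

# (?m) makes ^ and $ work per line instead of per string.
comments = re.findall(r"(?m)^\s*//.*$", source)

print(comments)
```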

Subexpressions

Why do subexpressions exist? Suppose we want to match the word Helloworld repeated several times. The regular expression Helloworld* matches strings such as Helloworlddd, because * applies only to the final d. The correct approach is to wrap the word in parentheses: (Helloworld)*
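
The difference between the two patterns shows up clearly when the whole string has to match. A minimal Python sketch, with variable names of my own choosing:

```python
import re

# Without parentheses, * repeats only the last character "d".
d_repeated = re.fullmatch(r"Helloworld*", "Helloworlddd") is not None
word_twice_plain = re.fullmatch(r"Helloworld*", "HelloworldHelloworld") is not None

# With parentheses, * repeats the whole word.
word_twice_grouped = re.fullmatch(r"(Helloworld)*", "HelloworldHelloworld") is not None
d_repeated_grouped = re.fullmatch(r"(Helloworld)*", "Helloworlddd") is not None

print(d_repeated, word_twice_plain, word_twice_grouped, d_repeated_grouped)
```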

Advanced

Backreferences: matching consistently

For example, HTML uses the tags h1 to h6 for headings. To match all of them we could use <[hH][1-6]>.*?</[hH][1-6]>, but if the HTML contains a malformed element such as <h1>Title</h2>, this pattern happily matches it too, which is wrong. We need backreferences.
With a backreference, the expression becomes <[hH]([1-6])>.*?</[hH]\1>, where \1 stands for whatever the first subexpression matched: if the first subexpression matched 2, then \1 matches only 2.
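
A short Python sketch comparing the two heading patterns on a well-formed and a malformed tag pair. The sample strings and names are assumptions for illustration.

```python
import re

loose = re.compile(r"<[hH][1-6]>.*?</[hH][1-6]>")
strict = re.compile(r"<[hH]([1-6])>.*?</[hH]\1>")  # \1 must repeat the opening level

good = "<h1>Title</h1>"
bad = "<h1>Title</h2>"

loose_accepts_bad = loose.fullmatch(bad) is not None
strict_accepts_bad = strict.fullmatch(bad) is not None
strict_accepts_good = strict.fullmatch(good) is not None

print(loose_accepts_bad, strict_accepts_bad, strict_accepts_good)
```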

A common example: I use the WebStorm editor, and a page contains the string http://www.baidu.com that I want to replace with <a href="http://www.baidu.com">http://www.baidu.com</a>. I can use the regular expression (http:.*) to match the URL, and then replace it with <a href="$1">$1</a> (WebStorm uses $ to refer to a matched group, much like a placeholder in a string).
Used well, regular expressions are a real benefit to mankind!
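
The same replacement can be sketched in Python, where the replacement string refers to the group as \1 instead of $1. The surrounding text is made up.

```python
import re

text = "Visit http://www.baidu.com"

# (http:.*) captures the URL; \1 re-inserts it twice in the replacement.
linked = re.sub(r"(http:.*)", r'<a href="\1">\1</a>', text)

print(linked)
```

Because .* runs to the end of the line, this pattern is only safe when nothing follows the URL on that line; a narrower pattern such as (http:\S*) would be one way to stop at the first whitespace.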

Lookaround

Lookahead

What is lookahead? Sometimes a character must be present for the match to succeed, but we do not want it included in the result. For example, to match the protocol of a URL without including the : in the match, we can use \w+(?=:) instead of \w+:, because the latter includes the : in the result.
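
A minimal Python sketch of the two variants side by side; the URL is just a sample.

```python
import re

url = "https://www.baidu.com"

with_colon = re.match(r"\w+:", url).group()         # the colon is consumed
without_colon = re.match(r"\w+(?=:)", url).group()  # the colon is only checked

print(with_colon, without_colon)
```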

Lookbehind

Lookbehind is the opposite of lookahead. For example, to match the number in a price without including the $ symbol, we can use (?<=\$).*$.
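
A short Python sketch of the price example. The second pattern, (?<=\$)\d+, is a variation of mine for the case where more text follows the price, since .*$ takes everything up to the end of the line.

```python
import re

# The article's pattern: everything after the "$" up to the end of the line.
to_line_end = re.search(r"(?<=\$).*$", "Price: $100").group()

# With trailing text, .*$ takes it too; \d+ keeps only the digits.
with_trailing = re.search(r"(?<=\$).*$", "Price: $100 each").group()
digits_only = re.search(r"(?<=\$)\d+", "Price: $100 each").group()

print(to_line_end, with_trailing, digits_only)
```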

Combining both

Suppose we need to get the content inside <div>Helloworld</div>. We can use (?<=<div>).*?(?=</div>)
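
Combining lookbehind and lookahead returns only the inner text. A minimal Python sketch, with a second div added to the sample to show that the lazy .*? keeps the matches separate:

```python
import re

html = "<div>Helloworld</div><div>again</div>"

# The tags are required on both sides but are not part of the result.
contents = re.findall(r"(?<=<div>).*?(?=</div>)", html)

print(contents)
```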

Negative lookaround

Negation means the match must not be preceded or followed by the given characters. For example, to match the number 10 in the string "$100 can buy 10 apples", we can use \b(?<!\$)\d+\b
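
The sentence from the example can be run through Python directly. The comparison without the lookbehind is an addition of mine to show what the negation filters out.

```python
import re

sentence = "$100 can buy 10 apples"

all_numbers = re.findall(r"\b\d+\b", sentence)          # price and quantity
quantities = re.findall(r"\b(?<!\$)\d+\b", sentence)    # numbers not preceded by "$"

print(all_numbers, quantities)
```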

Giving up

Embedded conditions (for awareness only)

Embedded conditions are relatively complicated; a basic understanding is enough.

Backreference conditions

Syntax: (?(backreference)true_regex|false_regex)
We may run into this situation: if a left parenthesis was matched, we also want to match the right parenthesis, but if there is no left parenthesis, we do not want a right parenthesis to match either. Suppose we want to match both of these phone numbers:

123-456-789
(123)456-789

We can use (\()?\d{3}(?(1)\)|-)\d{3}-\d{3}
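
Python's re module supports this kind of condition on a numbered group, so the pattern can be tried as is. The two rejected candidates are made-up inputs with mismatched parentheses.

```python
import re

# Group 1 optionally captures "("; (?(1)\)|-) then requires ")" if it did,
# and "-" if it did not.
phone = re.compile(r"(\()?\d{3}(?(1)\)|-)\d{3}-\d{3}")

candidates = ["123-456-789", "(123)456-789", "(123-456-789", "123)456-789"]
accepted = [c for c in candidates if phone.fullmatch(c)]

print(accepted)
```

fullmatch is used here on purpose: with search, the pattern would still find 123-456-789 inside "(123-456-789" by simply starting after the stray parenthesis.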

Lookaround conditions

Syntax: (?(lookaround)true_regex)
Suppose we want to match the first and third lines:

11111
22222-
33333-44444

We can use \d{5}(?(?=-)-\d{5})
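
As far as I know, Python's re module accepts only a group number or name as the condition, not a lookaround, so the pattern above cannot be used there directly. The sketch below swaps in an alternation of a positive and a negative lookahead, which is my own rewrite and should behave the same on these three lines.

```python
import re

# Emulates \d{5}(?(?=-)-\d{5}): either a "-" follows and must be completed
# by five more digits, or no "-" follows at all.
zip_like = re.compile(r"\d{5}(?:(?=-)-\d{5}|(?!-))")

lines = ["11111", "22222-", "33333-44444"]
matched = [line for line in lines if zip_like.search(line)]

print(matched)
```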

Summary

This post contains a lot of symbols, so mistakes are likely. I hope experienced readers will point them out!

Postscript

Ever since reading the book "Regular Expressions: Must-Know Essentials", I have found that search and replace with regular expressions in an editor is incredibly powerful. It is no exaggeration to say that regular expressions are an essential skill for every developer.

Reference material

Ben Forta, "Regular Expressions: Must-Know Essentials" (Revised Edition)
