Regular Expressions: Basic Knowledge

1)   Overview

A regular expression describes a pattern that defines a set of strings. It can be used to check whether a string contains a certain substring, to replace the substrings that match, or to extract from a string the substrings that meet a certain criterion.

A regular expression is a text pattern made up of ordinary characters (for example, the letters a to z) and special characters (called "metacharacters"). The pattern describes one or more strings to match when searching text. The regular expression acts as a template: a character pattern that is matched against the string being searched.
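The three uses named in the overview (checking, replacing, extracting) can be sketched with Python's standard re module. The sample text and the variable names are made up for illustration; they are not from the original article.

```python
import re

# Hypothetical sample text, chosen only for this illustration.
text = "Order 66 shipped; order 67 is pending."

# Check whether the string contains a substring that matches the pattern.
has_number = re.search(r"[0-9]+", text) is not None

# Replace every substring that matches the pattern.
masked = re.sub(r"[0-9]+", "#", text)

# Extract every substring that meets the criterion.
numbers = re.findall(r"[0-9]+", text)

print(has_number)  # True
print(masked)      # Order # shipped; order # is pending.
print(numbers)     # ['66', '67']
```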

2)   Basic symbols (these are special characters; to match the character itself, put the escape character \ in front of it)

^ Matches the start position of the string. The exception is inside square brackets [], where a leading ^ negates the set: [^abc] matches any character that is not a, b, or c.

$ Matches the end position of the string

* Matches zero or more times

+ Matches one or more times (at least once)

? Matches zero or one time

. Matches any single character (in most engines, any character except a newline by default)

| Means "or": matches either of the two alternatives

() Parentheses group the enclosed expression so that it is matched as a single unit (a subexpression)

[] Square brackets define a set or range of characters; for example, [0-9a-zA-Z] matches any one digit or letter

{} Braces define the number of matches: {n} matches exactly n times, {n,} matches at least n times, and {n,m} matches at least n and at most m times

\ Escape character. To match one of the basic symbols literally, put \ in front of it; for example, \* matches the * character

\w Matches a letter, a digit, or an underscore; \W matches any other character

\d Matches a digit; \D matches a non-digit
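As a rough illustration of the symbols listed above, the following sketch exercises each one with Python's re module. The sample strings and variable names are assumptions made for this example, and details such as what \w covers can differ slightly between regex engines.

```python
import re

# ^ and $ anchor the match to the start and end of the string.
anchored_ok = re.search(r"^cat$", "cat") is not None      # True
anchored_no = re.search(r"^cat$", "concat") is not None   # False

# . matches any single character (except a newline by default).
dot_ok = re.search(r"^c.t$", "cut") is not None           # True

# | chooses between alternatives; () groups them into one unit.
alt_ok = re.search(r"^(cat|dog)s$", "dogs") is not None   # True

# [] defines a set; a leading ^ inside the brackets negates it.
alnum = re.findall(r"[0-9a-zA-Z]", "a-1_B")               # ['a', '1', 'B']
non_digits = re.findall(r"[^0-9]", "a1b2")                # ['a', 'b']

# \ escapes a special character so that it matches itself.
stars = re.findall(r"\*", "2*3*4")                        # ['*', '*']

# \d and \w are shorthand classes; \D and \W are their complements.
digits = re.findall(r"\d", "a1b2")                        # ['1', '2']
non_word = re.findall(r"\W", "a_b-c!")                    # ['-', '!']
```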

 

3)   Non-printing characters

Character Description

\cx Matches the control character specified by x. For example, \cM matches Control-M, that is, a carriage return. The value of x must be A-Z or a-z; otherwise, c is treated as a literal 'c' character.

\f Matches a form feed. Equivalent to \x0c and \cL.

\n Matches a newline. Equivalent to \x0a and \cJ.

\r Matches a carriage return. Equivalent to \x0d and \cM.

\s Matches any whitespace character, including spaces, tabs, form feeds, and so on. Equivalent to [ \f\n\r\t\v]. Note that Unicode-aware regular expressions also match the full-width space character.

\S Matches any non-whitespace character. Equivalent to [^ \f\n\r\t\v].

\t Matches a tab. Equivalent to \x09 and \cI.

\v Matches a vertical tab. Equivalent to \x0b and \cK.
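The equivalences in this list can be checked with a short Python sketch. Python's re module does not support the \cx form, so only the \x equivalents are compared here; the sample text and variable names are invented for the example.

```python
import re

# Hypothetical sample containing one tab, CR, LF, form feed and vertical tab.
text = "name:\tAda\r\nrole:\x0cengineer\x0b"

# Each escape should find the same characters as its \x equivalent.
pairs = [(r"\f", r"\x0c"), (r"\n", r"\x0a"), (r"\r", r"\x0d"),
         (r"\t", r"\x09"), (r"\v", r"\x0b")]
same = all(
    len(re.findall(a, text)) == 1 and re.findall(a, text) == re.findall(b, text)
    for a, b in pairs
)

# \S+ collects runs of non-whitespace; \s+ matches runs of whitespace.
tokens = re.findall(r"\S+", text)
collapsed = re.sub(r"\s+", " ", text).strip()

# With str patterns, Python's \s is Unicode-aware and matches the
# full-width (ideographic) space U+3000.
fullwidth = re.search(r"\s", "\u3000") is not None

print(same)       # True
print(tokens)     # ['name:', 'Ada', 'role:', 'engineer']
print(collapsed)  # name: Ada role: engineer
print(fullwidth)  # True
```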

4)   Quantifiers

A quantifier specifies how many times a given component of the regular expression must appear for the match to succeed. There are six kinds: *, +, ?, {n}, {n,}, and {n,m}.

* Matches the preceding subexpression zero or more times. For example, zo* matches "z" and "zoo". * is equivalent to {0,}.

+ Matches the preceding subexpression one or more times. For example, 'zo+' matches "zo" and "zoo", but does not match "z". + is equivalent to {1,}.

? Matches the preceding subexpression zero or one time. For example, "do(es)?" matches "do", the "does" in "does", and the "do" in "doxy". ? is equivalent to {0,1}.

{n} n is a non-negative integer. Matches exactly n times. For example, 'o{2}' does not match the 'o' in "Bob", but matches the two o's in "food".

{n,} n is a non-negative integer. Matches at least n times. For example, 'o{2,}' does not match the 'o' in "Bob", but matches all the o's in "foooood". 'o{1,}' is equivalent to 'o+'. 'o{0,}' is equivalent to 'o*'.

{n,m} m and n are non-negative integers, where n <= m. Matches at least n times and at most m times. For example, "o{1,3}" matches the first three o's in "fooooood". 'o{0,1}' is equivalent to 'o?'. Note that there must be no spaces between the comma and the two numbers.
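The quantifier examples above can be reproduced with a small Python sketch. The helper name whole and the variable names are hypothetical. The last check reflects how Python's re module handles a space inside the braces; other engines may behave differently.

```python
import re

def whole(pattern, s):
    """Return True if the pattern matches the entire string s."""
    return re.fullmatch(pattern, s) is not None

# * allows zero repetitions, + requires at least one.
zo_star = [whole(r"zo*", s) for s in ("z", "zo", "zoo")]  # [True, True, True]
zo_plus = [whole(r"zo+", s) for s in ("z", "zo", "zoo")]  # [False, True, True]

# ? makes the preceding group optional.
does = [re.search(r"do(es)?", s).group() for s in ("do", "does", "doxy")]

# {n}, {n,} and {n,m} count repetitions.
o2_bob = re.search(r"o{2}", "Bob")                    # None: only one 'o'
o2_food = re.search(r"o{2}", "food").group()          # 'oo'
o2_plus = re.search(r"o{2,}", "foooood").group()      # the whole run of o's
o1_3 = re.search(r"o{1,3}", "fooooood").group()       # 'ooo'

# The shorthand quantifiers agree with their brace forms.
sample = "fo foo fooo f"
star_eq = re.findall(r"fo*", sample) == re.findall(r"fo{0,}", sample)
plus_eq = re.findall(r"fo+", sample) == re.findall(r"fo{1,}", sample)
opt_eq = re.findall(r"fo?", sample) == re.findall(r"fo{0,1}", sample)

# In Python's re, a space inside the braces stops them from being read as a
# quantifier; the text "{1, 3}" is then matched literally.
spaced = re.search(r"o{1, 3}", "fooooood")            # None

print(does)  # ['do', 'does', 'do']
```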

Origin www.cnblogs.com/newxu/p/11801552.html