Java regular expressions: complex text operations

Regular expressions
1. Strengths and uses of regular expressions
  A powerful and flexible text-processing tool;
  most programming languages, databases, text editors, and development environments support regular expressions.
2. Definition of a regular expression
  As the name suggests, a regular expression describes a rule, and that rule can match a whole class of strings.

3. Regular expression syntax

(1) Ordinary characters
  Letters, digits, Chinese characters, underscores, and punctuation with no special meaning are "ordinary characters." An ordinary character in an expression matches the same character in the string being matched.
(2) Simple escape characters

\n Represents a newline character
\t Represents a tab character
\\ Represents \ itself
\^, \$, \., \(, \), \{, \}, \?, \+, \*, \|, \[, \] Match these characters themselves
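
In Java source code a regular expression is written as a string literal, so every backslash in the pattern has to be doubled. The following is a minimal sketch of that rule; the class name `EscapeDemo`, its helper methods, and the sample strings are illustrative assumptions, not part of the original article.

```java
import java.util.regex.Pattern;

public class EscapeDemo {
    // The regex \$\d+\.\d+ written as a Java string literal: each backslash is doubled.
    static final Pattern PRICE = Pattern.compile("\\$\\d+\\.\\d+");

    // Returns true when the whole input looks like "$3.50".
    static boolean isPrice(String input) {
        return PRICE.matcher(input).matches();
    }

    // Pattern.quote escapes all special symbols, so the text is searched for literally.
    static boolean containsLiteral(String text, String literal) {
        return Pattern.compile(Pattern.quote(literal)).matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(isPrice("$3.50"));                 // true
        System.out.println(isPrice("3x50"));                  // false
        System.out.println(containsLiteral("a+b=c", "a+b"));  // true
    }
}
```

`Pattern.quote` is a convenient alternative to escaping each special symbol by hand when the text to match comes from user input.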

(3) Standard character sets

  These are case sensitive; the uppercase form has the opposite meaning (for example, \D matches any character that is not a digit).

\d Any one digit from 0 to 9
\w Any one letter, digit, or underscore, i.e. any one of A-Z, a-z, 0-9, _
\s Any one whitespace character, including space, tab, and newline
. The dot matches any one character except a newline. To match every character including "\n", [\s\S] is commonly used
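
The standard sets can be tried out with `Pattern.matches`, which requires the whole input to match. This is a small sketch under assumed names (`CharClassDemo` and its methods are hypothetical); it also shows the dot failing on a newline while `[\s\S]` succeeds.

```java
import java.util.regex.Pattern;

public class CharClassDemo {
    // \d\w\s : one digit, then one word character, then one whitespace character.
    static boolean digitWordSpace(String s) {
        return Pattern.matches("\\d\\w\\s", s);
    }

    // \D is the opposite of \d: any one character that is not a digit.
    static boolean nonDigit(String s) {
        return Pattern.matches("\\D", s);
    }

    // By default "." does not match a newline.
    static boolean dotBetween(String s) {
        return Pattern.matches("a.b", s);
    }

    // [\s\S] matches any one character, newline included.
    static boolean anyBetween(String s) {
        return Pattern.matches("a[\\s\\S]b", s);
    }

    public static void main(String[] args) {
        System.out.println(digitWordSpace("7_ "));  // true
        System.out.println(nonDigit("5"));          // false
        System.out.println(dotBetween("a\nb"));     // false
        System.out.println(anyBetween("a\nb"));     // true
    }
}
```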

(4) Custom character sets

  [] Square brackets match any one of the characters listed inside them.

  Special symbols placed inside square brackets lose their special meaning, except ^ and - (the backslash also still escapes).
  A standard character set other than the dot, when placed inside square brackets, adds its whole set to the custom set. For example, [\d.\-+] matches a digit, a dot, +, or -.

[ab5@] Matches "a" or "b" or "5" or "@"
[^abc] Matches any one character other than "a", "b", "c"
[f-k] Matches any one letter from "f" to "k"
[^A-F0-3] Matches any one character other than "A" to "F" and "0" to "3"
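
As a rough illustration of custom sets in Java, the sketch below uses the `[\d.\-+]` example from the text, a range, and a negated set. The class `CustomSetDemo` and its method names are assumptions made for this example.

```java
import java.util.regex.Pattern;

public class CustomSetDemo {
    // [\d.\-+]+ : one or more characters, each a digit, a dot, a minus sign, or a plus sign.
    static boolean isNumericLike(String s) {
        return Pattern.matches("[\\d.\\-+]+", s);
    }

    // [f-k] : exactly one letter in the range f to k.
    static boolean inRangeFtoK(String s) {
        return Pattern.matches("[f-k]", s);
    }

    // [^A-F0-3] matches everything outside the two ranges, so removing those
    // matches keeps only the characters A-F and 0-3.
    static String keepAtoFand0to3(String s) {
        return s.replaceAll("[^A-F0-3]", "");
    }

    public static void main(String[] args) {
        System.out.println(isNumericLike("-3.14"));     // true
        System.out.println(isNumericLike("3,14"));      // false
        System.out.println(inRangeFtoK("g"));           // true
        System.out.println(keepAtoFand0to3("AG03z4"));  // A03
    }
}
```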

(5) Quantifiers: special symbols that specify how many times the preceding expression is matched

  Greedy matching (match as many characters as possible; this is the default)
  Non-greedy matching (match as few characters as possible; add a "?" after the quantifier)

{n} The expression is repeated exactly n times
{m,n} The expression is repeated at least m times and at most n times
{m,} The expression is repeated at least m times
? The expression matches zero or one time, equivalent to {0,1}
+ The expression appears at least once, equivalent to {1,}
* The expression appears zero or more times, equivalent to {0,}
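
The difference between greedy and non-greedy matching is easiest to see side by side. This is a minimal sketch; `QuantifierDemo.firstMatch` is a hypothetical helper that returns the first match of a pattern, and the sample inputs are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuantifierDemo {
    // Returns the first substring of input matched by regex, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String html = "<b>x</b>";
        // Greedy: .+ takes as many characters as it can.
        System.out.println(firstMatch("<.+>", html));          // <b>x</b>
        // Non-greedy: .+? stops at the first ">" that lets the match succeed.
        System.out.println(firstMatch("<.+?>", html));         // <b>
        // {m,n} is greedy too, and becomes non-greedy with a trailing "?".
        System.out.println(firstMatch("\\d{2,3}", "12345"));   // 123
        System.out.println(firstMatch("\\d{2,3}?", "12345"));  // 12
    }
}
```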

(6) Character boundaries

^ Matches at the position where the string starts
$ Matches at the position where the string ends
\b Matches a word boundary
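
Boundaries match positions rather than characters. The sketch below counts matches with and without `\b`, and uses `^` and `$` to require that a string consists only of digits. `BoundaryDemo` and its methods are assumed names for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BoundaryDemo {
    // Counts how many times regex matches inside input.
    static int count(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    // ^\d+$ : digits from the start of the string to its end.
    static boolean allDigits(String input) {
        return Pattern.compile("^\\d+$").matcher(input).find();
    }

    public static void main(String[] args) {
        String text = "concatenate cat";
        System.out.println(count("cat", text));        // 2, also counts the "cat" inside "concatenate"
        System.out.println(count("\\bcat\\b", text));  // 1, only the standalone word
        System.out.println(allDigits("123"));          // true
        System.out.println(allDigits("123abc"));       // false
    }
}
```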

Regular expression matching modes:

  • IGNORECASE ignore-case mode

    Case is ignored when matching.

    By default, regular expressions are case-sensitive.

  • SINGLELINE single-line mode

    The entire text is treated as one string, with only one beginning and one end.
    The dot "." can match any character, including a newline (\n).

  • MULTILINE multiline mode

    Each line is treated as a string with its own beginning and end, so ^ and $ match at every line.
    After specifying MULTILINE, use \A and \Z to match only at the start and end of the whole string.
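
In `java.util.regex` these modes are passed as flags to `Pattern.compile`: `Pattern.CASE_INSENSITIVE` for ignore-case mode, `Pattern.DOTALL` for single-line mode, and `Pattern.MULTILINE` for multiline mode. The sketch below exercises each one; the class `ModeDemo` and its helper methods are hypothetical names chosen for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ModeDemo {
    // Returns true when the whole input matches regex under the given flags.
    static boolean matches(String regex, int flags, String input) {
        return Pattern.compile(regex, flags).matcher(input).matches();
    }

    // Counts the matches of regex in input under the given flags.
    static int count(String regex, int flags, String input) {
        Matcher m = Pattern.compile(regex, flags).matcher(input);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        String twoLines = "one\ntwo";
        System.out.println(matches("hello", 0, "HELLO"));                         // false
        System.out.println(matches("hello", Pattern.CASE_INSENSITIVE, "HELLO"));  // true
        System.out.println(matches("a.b", Pattern.DOTALL, "a\nb"));               // true
        System.out.println(count("^\\w+$", 0, twoLines));                         // 0
        System.out.println(count("^\\w+$", Pattern.MULTILINE, twoLines));         // 2
        // \A still means the start of the whole input, even in multiline mode.
        System.out.println(count("\\A\\w+", Pattern.MULTILINE, twoLines));        // 1
    }
}
```

Flags can be combined with the bitwise OR operator, for example `Pattern.CASE_INSENSITIVE | Pattern.MULTILINE`.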

(7) Alternation and grouping

Expression  Effect
|  Alternation: an "or" relationship between the left and right expressions; matches either the left or the right.
()  Capturing group:
  (1) When a quantifier is applied, the expression in parentheses is quantified as a whole.
  (2) When retrieving match results, the content matched by the expression in parentheses can be obtained separately.
  (3) Each pair of parentheses is assigned a number, numbered automatically from 1 in the order of the left parentheses. Group number zero is the text matched by the entire regular expression.
(?:Expression)  Non-capturing group: some expressions have to use (), but the content matched by the sub-expression inside does not need to be saved. A non-capturing group avoids the side effects of using ().
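
Group numbering can be checked directly with `Matcher.group(int)` and `Matcher.groupCount()`. This is a sketch under assumed names: `GroupDemo.groups` is a hypothetical helper that returns group 0 followed by every numbered group.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupDemo {
    // Returns group 0 (the whole match) followed by each capturing group,
    // or an empty array when the input does not match as a whole.
    static String[] groups(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.matches()) {
            return new String[0];
        }
        String[] out = new String[m.groupCount() + 1];
        for (int i = 0; i <= m.groupCount(); i++) {
            out[i] = m.group(i);
        }
        return out;
    }

    public static void main(String[] args) {
        // Two capturing groups, numbered 1 and 2 by their left parentheses.
        System.out.println(Arrays.toString(groups("(\\d{4})-(\\d{2})", "2019-06")));
        // (?:ab)+ is quantified as a whole but is not numbered, so (c) is group 1.
        System.out.println(Arrays.toString(groups("(?:ab)+(c)", "ababc")));
        // Alternation: either side may match.
        System.out.println(Arrays.toString(groups("cat|dog", "dog")));
    }
}
```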

Backreference (\nnn):

  Each pair of () is assigned a number, numbered automatically from 1 in the order of the left parentheses.
  A backreference refers to the string already captured by the group with that number.
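
A typical use of a backreference is finding repeated text. The sketch below detects a doubled character with `\1` and collapses a repeated word; in the replacement string Java writes the group as `$1`. `BackrefDemo` and its method names are assumptions for this example.

```java
import java.util.regex.Pattern;

public class BackrefDemo {
    // (\w)\1 : a word character followed by the same character again.
    static boolean hasDoubledChar(String s) {
        return Pattern.compile("(\\w)\\1").matcher(s).find();
    }

    // \b(\w+) \1\b : a word, a space, then the same word; keep one copy.
    static String collapseRepeatedWords(String s) {
        return s.replaceAll("\\b(\\w+) \\1\\b", "$1");
    }

    public static void main(String[] args) {
        System.out.println(hasDoubledChar("hello"));                // true
        System.out.println(hasDoubledChar("world"));                // false
        System.out.println(collapseRepeatedWords("the the cat"));   // the cat
    }
}
```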

(8) Lookaround (zero-width assertions)

  A lookaround is only a sub-expression: the content it matches is not included in the final match result, so it is zero-width.
  It requires that a certain condition holds at the current position. It checks whether the characters before or after the current position meet the specified condition, but it does not consume those characters; what it matches is a position.
  During matching, if a sub-expression matches characters rather than a position, and those characters are saved in the final match result, the sub-expression is said to consume characters. If it matches only a position, or the content it matches is not saved in the final result, the sub-expression is zero-width. Whether a sub-expression consumes characters or is zero-width depends on whether what it matches is saved in the final match result.

(?=exp) Positive lookahead: asserts that what follows this position matches exp
(?<=exp) Positive lookbehind: asserts that what precedes this position matches exp
(?!exp) Negative lookahead: asserts that what follows this position does not match exp
(?<!exp) Negative lookbehind: asserts that what precedes this position does not match exp
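
The four assertions can be compared on small inputs. In this sketch the returned match never includes the text checked by the assertion, which is what zero-width means in practice. `LookaroundDemo`, its helpers, and the sample strings are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookaroundDemo {
    // Returns the first substring of input matched by regex, or null if there is none.
    static String first(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    // Returns the start index of the first match, or -1 if there is none.
    static int firstIndex(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.start() : -1;
    }

    public static void main(String[] args) {
        // (?=px): digits followed by "px"; the "px" itself is not part of the match.
        System.out.println(first("\\d+(?=px)", "12px 5em"));       // 12
        // (?<=\$): digits preceded by a dollar sign.
        System.out.println(first("(?<=\\$)\\d+", "cost $42"));     // 42
        // (?!bar): "foo" not followed by "bar".
        System.out.println(firstIndex("foo(?!bar)", "foobar foobaz"));  // 7
        // (?<!\$): a number that does not start right after a dollar sign.
        System.out.println(first("(?<!\\$)\\b\\d+", "$42 17"));    // 17
    }
}
```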

4. List of common regular expressions

Match Chinese characters [\u4e00-\u9fa5]
Match blank lines \n\s*\r
Match HTML tags <(\S*?)[^>]*>.*?</\1>|<.*? />
Match leading and trailing whitespace ^\s*|\s*$
Match email addresses \w+([-+.]\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*
Match URLs [a-zA-Z]+://[^\s]*
Match domestic (Chinese) phone numbers \d{3}-\d{8}|\d{4}-\d{7}
Match Tencent QQ numbers [1-9][0-9]{4,}
Match Chinese postal codes [1-9]\d{5}(?!\d)
Match ID card numbers \d{15}|\d{18}
Match IP addresses \d+\.\d+\.\d+\.\d+
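
A few of the listed patterns, written as Java string literals, might be used as in the sketch below. The class `CommonPatternsDemo` and its method names are assumptions. Note that these patterns check only the shape of the input: the IP address pattern, for instance, also accepts 999.999.999.999.

```java
import java.util.regex.Pattern;

public class CommonPatternsDemo {
    static final Pattern POSTAL_CODE = Pattern.compile("[1-9]\\d{5}(?!\\d)");
    static final Pattern QQ_NUMBER = Pattern.compile("[1-9][0-9]{4,}");
    static final Pattern IP_SHAPE = Pattern.compile("\\d+\\.\\d+\\.\\d+\\.\\d+");
    // One or more characters in the Chinese character range from the list.
    static final Pattern CHINESE = Pattern.compile("[\\u4e00-\\u9fa5]+");

    static boolean isPostalCode(String s) {
        return POSTAL_CODE.matcher(s).matches();
    }

    static boolean isQqNumber(String s) {
        return QQ_NUMBER.matcher(s).matches();
    }

    static boolean looksLikeIp(String s) {
        return IP_SHAPE.matcher(s).matches();
    }

    static boolean isChinese(String s) {
        return CHINESE.matcher(s).matches();
    }

    // Removes leading and trailing whitespace with the pattern from the list.
    static String trim(String s) {
        return s.replaceAll("^\\s*|\\s*$", "");
    }

    public static void main(String[] args) {
        System.out.println(isPostalCode("100080"));      // true
        System.out.println(isPostalCode("012345"));      // false
        System.out.println(isQqNumber("10001"));         // true
        System.out.println(looksLikeIp("192.168.0.1"));  // true
        System.out.println("[" + trim("  hi  ") + "]");  // [hi]
    }
}
```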

5. Using regular expressions in Java programs
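
A minimal sketch of the usual `java.util.regex` workflow: compile a `Pattern`, obtain a `Matcher` for the input, then loop with `find()` and read each match with `group()`. The class `RegexUsageDemo`, the helper `findAll`, and the sample text are assumptions for illustration. The `String` methods `replaceAll` and `split` accept the same pattern syntax.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexUsageDemo {
    // Collects every substring of text matched by regex, in order of appearance.
    static List<String> findAll(String regex, String text) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "a1b22c333";
        System.out.println(findAll("\\d+", text));          // [1, 22, 333]
        System.out.println(text.replaceAll("\\d+", "#"));   // a#b#c#
        // split takes a regex: a comma with optional whitespace on either side.
        System.out.println(String.join("|", "a, b,c".split("\\s*,\\s*")));  // a|b|c
    }
}
```

When the same pattern is applied many times, compiling it once into a `Pattern` constant avoids recompiling it on every call.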

 


Origin www.cnblogs.com/mxj961116/p/10963059.html