Python regular expressions for beginners

Python regex: r"(?<=str1).+?(?=str2)"

A regular expression uses a single string to describe and match a series of strings that follow a given syntactic rule. In many text editors, regular expressions are used to search for and replace text that matches a pattern.

(?<=str1)

The regular expression that follows must be preceded by str1.

(?=str2)

The regular expression that comes before must be followed by str2.

These two expressions only state conditions; they are not part of the actual match.
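As a minimal sketch of how the two conditions combine, the snippet below pulls out the text between two markers. The sample string and the markers `<b>` and `</b>` are made up for illustration and stand in for str1 and str2.

```python
import re

# Hypothetical sample text; "<b>" and "</b>" stand in for str1 and str2.
text = "name: <b>piaoyu</b> done"

# The lookbehind and lookahead only assert what surrounds the match;
# they are not included in the result.
pattern = re.compile(r"(?<=<b>).+?(?=</b>)")
between = pattern.search(text).group(0)
print(between)  # piaoyu
```

Note that Python's re module requires the lookbehind part to have a fixed width, which a literal string such as str1 always has.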

 

'.' Matches any character except the newline (\n).

'*' Matches the preceding pattern 0 or more times (greedy, i.e. matches as much as possible).

'+' Matches the preceding pattern 1 or more times (greedy).

'?' Matches the preceding pattern 0 or 1 times (greedy).

'*?', '+?', '??' Non-greedy versions of the three special characters above (match as little as possible).

'^' Matches the start of the string, i.e. the beginning of the line.

'$' Matches the end of the string, i.e. the end of the line (if the string ends with a newline \n, it matches just before that \n).

'{m,n}' Matches the preceding pattern m to n times (greedy), i.e. at least m times and at most n times.

'{m,n}?' The non-greedy version of '{m,n}' above.

'\\' '\' is the escape character: when a special character is preceded by \, it loses its special meaning, so \+ stands for just the + sign itself.

'[]' Specifies a set of characters; if ^ is the first character, the set is complemented. For example, [0-9] stands for any digit and [^0-9] stands for any character except a digit.

'|' Alternation: A|B matches A or B.

'(...)' Groups the pattern inside the parentheses; the part of the string it matches can be retrieved afterwards.
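Several of the special characters above can be seen working together in a short sketch. The sample line is made up for illustration.

```python
import re

# Hypothetical sample line, used only for illustration.
line = "error 404: page not found"

# ^ anchors at the start, | chooses between two words, [0-9]{3} is exactly
# three digits, and each pair of parentheses captures a part we can retrieve.
m = re.search(r"^(error|warning) ([0-9]{3})", line)
kind, code = m.group(1), m.group(2)
print(kind, code)  # error 404

# \+ matches a literal plus sign because the backslash removes its special meaning.
plus_signs = re.findall(r"\+", "1+2+3")
print(plus_signs)  # ['+', '+']
```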

 

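import re

def get_undistort_flag(output):  # hypothetical name; the original shows only the function body
    # Note: in a raw string, \\n matches a literal backslash followed by "n", not a newline
    p1 = r"(?<={0}).+?(?=\\n)".format('Undistort Flag: ')
    pattern1 = re.compile(p1)                # compile the regular expression
    matcher1 = re.search(pattern1, output)   # search the source text for the part that matches the regular expression
    rtv_lag = matcher1.group(0)
    print('~~~~~~~~`', rtv_lag)
    return rtv_lag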
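The snippet above calls matcher1.group(0) directly, but re.search returns None when nothing matches, and that call then raises AttributeError. Below is a minimal sketch that guards against this. The function name `find_flag` and the sample text are hypothetical, and the sketch assumes the source text contains real newlines.

```python
import re

def find_flag(output, label="Undistort Flag: "):
    """Return the text after `label` up to the end of its line, or None."""
    # re.escape protects any special characters in the label.
    # `.` stops at a newline, so (?=\n|$) ends the match at the line end.
    pattern = re.compile(r"(?<={0}).+?(?=\n|$)".format(re.escape(label)))
    matcher = pattern.search(output)
    return matcher.group(0) if matcher else None

sample = "Mode: fast\nUndistort Flag: true\nDone\n"  # made-up output text
found = find_flag(sample)
missing = find_flag("no flag here")
print(found)    # true
print(missing)  # None
```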

 

Python regular expression study:

Basics:

import re

key = r"<h1>hello world<h1>"  # source text
p1 = r"<h1>.+<h1>"  # the regular expression we wrote; it is explained below
pattern1 = re.compile(p1)
print(pattern1.findall(key))  # notice that this uses findall rather than search

 

findall returns a list of all the elements that match; even when only one element matches, it is still returned as a list.
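To make the difference concrete, here is a small sketch comparing search and findall on the same kind of input. The second input string is made up to show the no-match case.

```python
import re

key = "<h1>hello world<h1>"
pattern1 = re.compile(r"<h1>.+<h1>")

# search returns a match object (or None); findall returns a list of strings.
first = pattern1.search(key).group(0)
all_matches = pattern1.findall(key)
no_matches = pattern1.findall("plain text")
print(first)        # <h1>hello world<h1>
print(all_matches)  # ['<h1>hello world<h1>']
print(no_matches)   # []
```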

Regular expression    Characters matched
[0-9]                 Any one of the digits 0123456789
[a-z]                 Any one lowercase letter
[A-Z]                 Any one uppercase letter
\d                    Equivalent to [0-9]
\D                    Equivalent to [^0-9]; matches any non-digit
\w                    Equivalent to [a-zA-Z0-9_]; matches letters of either case, digits, and the underscore
\W                    Equivalent to [^a-zA-Z0-9_]; the negation of the previous one

These equivalences hold for ASCII text. With Python 3 str patterns, \d and \w also match Unicode digits and letters unless the re.ASCII flag is set.
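A quick way to see what each shorthand class picks up is to run it over the same string. The sample text below is made up for illustration.

```python
import re

sample = "user_01 paid $25.50"  # made-up text

digits = re.findall(r"\d+", sample)      # runs of digits
words = re.findall(r"\w+", sample)       # runs of letters, digits, underscore
non_words = re.findall(r"\W+", sample)   # runs of everything else
print(digits)     # ['01', '25', '50']
print(words)      # ['user_01', 'paid', '25', '50']
print(non_words)  # [' ', ' $', '.']
```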

 

An example of an imprecise match

I want to match the text between @ and ., i.e. ygomi:

 

>>> import re
>>> key = 'aaaapiaoyu.qiu@ygomi.com.cn'
>>> p1 = r"@.+\."
>>> pattern1 = re.compile(p1)
>>> pattern1.findall(key)
['@ygomi.com.']     (more was matched than intended)
>>>
>>> p1 = r"@.+?\."     (added ?)
>>> pattern1 = re.compile(p1)
>>> pattern1.findall(key)
['@ygomi.']

 

The reason is that regular expressions are "greedy" by default. "+" means the preceding character is repeated one or more times, but we never said exactly how many times. So the engine greedily matches as many characters as it can, which in this case means matching up to the last ".".

How do we solve this problem? Simply add a "?" after the "+". Adding the "?" turns the greedy "+" into the lazy "+?". The same applies to [abc]+, \w*, and the like.
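The same greedy and lazy behaviour can be checked on [abc]+ and \w* with a short sketch. The sample text is made up for illustration.

```python
import re

text = "abcabc!"  # made-up text

greedy_set = re.match(r"[abc]+", text).group(0)   # takes as many as possible
lazy_set = re.match(r"[abc]+?", text).group(0)    # stops after one character
greedy_word = re.match(r"\w*c", text).group(0)    # runs to the last "c"
lazy_word = re.match(r"\w*?c", text).group(0)     # stops at the first "c"
print(greedy_set, lazy_set, greedy_word, lazy_word)  # abcabc a abcabc abc
```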

 

To control the number of repetitions precisely, use {a,b}, where a <= number of matches <= b.

For example, given sas, saas, and saaas, we want only sas and saas:

 

import re

key = r"saas and sas and saaas"
p1 = r"sa{1,2}s"
pattern1 = re.compile(p1)
print(pattern1.findall(key))

Output: ['saas', 'sas']

 

If you omit the 2 in {1,2}, giving {1,}, it means at least one match, which is equivalent to +.

If you omit the 1 in {1,2}, giving {,2}, it means at most two matches.
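The two open-ended forms can be compared in a small sketch. The sample text is made up and includes "ss" to show that {,2} also allows zero repetitions.

```python
import re

key = "ss sas saas saaas"  # made-up text

at_least_one = re.findall(r"sa{1,}s", key)  # one or more "a"
same_as_plus = re.findall(r"sa+s", key)     # the equivalent + form
at_most_two = re.findall(r"sa{,2}s", key)   # zero, one, or two "a"
print(at_least_one)  # ['sas', 'saas', 'saaas']
print(same_as_plus)  # ['sas', 'saas', 'saaas']
print(at_most_two)   # ['ss', 'sas', 'saas']
```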

 

Regular expression metacharacters and their roles:

Metacharacter    Explanation
.                Matches any character (except a newline by default)
|                Logical OR operator
[ ]              Matches any one character in the set
[^]              Negated character set
-                Defines a range (inside a character set)
\                Escapes the next character (makes a special character ordinary, or an ordinary one special)
*                Matches the preceding character or subexpression zero or more times
*?               Lazy version of *
+                Matches the preceding character or subexpression one or more times
+?               Lazy version of +
?                Matches the preceding character or subexpression zero or one time
{n}              Matches the preceding character or subexpression exactly n times
{m,n}            Matches the preceding character or subexpression at least m and at most n times
{n,}             Matches the preceding character or subexpression at least n times
{n,}?            Lazy version of {n,}
^                Matches the beginning of the string
\A               Matches the beginning of the string
$                Matches the end of the string
[\b]             Backspace character
\c               Matches a control character (not supported by Python's re module)
\d               Matches any digit
\D               Matches any character other than a digit
\t               Matches a tab
\w               Matches any letter, digit, or underscore
\W               Matches any character that is not a letter, digit, or underscore
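The table lists ^ and \A with the same description. In Python they differ only when the re.MULTILINE flag is set, which the sketch below illustrates. The two-line sample text is made up, and the use of re.MULTILINE is an addition for illustration, not something the table itself covers.

```python
import re

text = "first line\nsecond line"  # made-up two-line text

caret_default = re.findall(r"^\w+", text)                   # start of string only
caret_multiline = re.findall(r"^\w+", text, re.MULTILINE)   # start of every line
a_multiline = re.findall(r"\A\w+", text, re.MULTILINE)      # still start of string only
line_ends = re.findall(r"\w+$", text, re.MULTILINE)         # end of every line
print(caret_default)    # ['first']
print(caret_multiline)  # ['first', 'second']
print(a_multiline)      # ['first']
print(line_ends)        # ['line', 'line']
```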



 


Origin www.cnblogs.com/mianbaoshu/p/12068720.html