An introductory tutorial on regular expressions in Python [Classic]

This article introduces regular expressions in Python, with a detailed look at the commonly used metacharacters and functions, how to use them, and what to watch out for. It is shared here for your reference.

Preamble:

First of all, what is a regular expression?

For example, suppose we want to determine whether the string "adi_e32fv,Ls" contains the substring "e32f". As another example, suppose a txt file contains one million names, and we want to find every name with the surname "Wang" that ends in "Wu" and print it. The result would be: "Wang Wu", "Wang Da Wu", "Wang Xiao Wu", ...

Previously we would do this with string functions, but the code becomes very complicated. With a regular expression, a single call such as re.findall('Wang.*?Wu', txt1) is enough. Regular expressions are also basic knowledge for writing web crawlers: you can use them to collect the strings in an HTML page, such as URLs, that meet your requirements. The following is a personal summary of the basics of regular expressions.
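
The two tasks above can be sketched roughly as follows. This assumes the names are romanized and stored one per line; the sample text txt1 is made up for illustration, and the code uses Python 3 print syntax rather than the Python 2.7 used in the rest of this article.

```python
import re

# Task 1: does the string contain the substring "e32f"?
has_sub = re.search('e32f', 'adi_e32fv,Ls') is not None
print(has_sub)  # True

# Task 2: find names that start with "Wang" and end with "Wu".
# txt1 is a hypothetical stand-in for the contents of the names file.
txt1 = 'Wang Wu\nLi Si\nWang Xiao Wu\nZhang San\nWang Da Wu\n'
names = re.findall('Wang.*?Wu', txt1)
print(names)  # ['Wang Wu', 'Wang Xiao Wu', 'Wang Da Wu']
```

Because '.' does not match a newline by default, each match stays within one line of the sample text.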

(Environment: 32-bit Windows 8; tools: Python 2.7.9 + Eclipse.)

Main text:

1. First, import Python's re module (import re).

2. Metacharacters: . ^ $ * + ? {} [] \ | ()

The re module's findall(pattern, string) method returns a list of all substrings of string that match pattern. For example, matching 'dit' in the string 'dit dot det,dct dit dot' gives:

str1 = 'dit dot det,dct dit dot'
print re.findall('dit',str1)

Result: ['dit', 'dit']

Role of |: 'dit|dct' matches either dit or dct.

str1 = 'dit dot det,dct dit dot'
print re.findall('dit|dct',str1)

result:

['dit', 'dct', 'dit']

Role of []: [ic] matches either i or c. For example, 'd[ic]t' matches both dit and dct, and is equivalent to 'dit|dct'.

str1 = 'dit dot det,dct dit dot'
print re.findall('d[ic]t',str1)

result:

['dit', 'dct', 'dit']
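
As a small sketch of the equivalence described above (Python 3 print syntax is assumed here; the range [a-e] is an extra illustration not used elsewhere in this article):

```python
import re

str1 = 'dit dot det,dct dit dot'

# A character class and an alternation can describe the same set of matches.
by_class = re.findall('d[ic]t', str1)
by_alt = re.findall('dit|dct', str1)
print(by_class == by_alt)  # True

# A range such as [a-e] matches any one character from a to e.
by_range = re.findall('d[a-e]t', str1)
print(by_range)  # ['det', 'dct']
```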

Role of ^ (first use): inside brackets, as in [^ic], it negates the set, i.e., it matches any character except i and c:

str1 = 'dit dot det,dct dit dot'
print re.findall('d[^ic]t',str1)

Result: ['dot', 'det', 'dot']

Role of ^ (second use): anchors the match to the beginning of the string. Here the string begins with dit, not dct:

str1 = 'dit dot det,dct dit dot'
print re.findall('^dit',str1)
print re.findall('^dct',str1)

result:

['dit']
[]

Role of $: anchors the match to the end of the string. Here the string ends with dot, not dct:

str1 = 'dit dot det,dct dit dot'
print re.findall('dot$',str1)
print re.findall('dct$',str1)

result:

['dot']
[]
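
The anchors apply to the whole string by default. As a hedged sketch (the two-line sample text is an assumption, and Python 3 print syntax is used), the re.MULTILINE flag makes ^ and $ also match at line boundaries:

```python
import re

text = 'dit dot\ndct dot'

# By default, ^ and $ anchor to the start and end of the whole string.
start_default = re.findall('^dct', text)
end_default = re.findall('dot$', text)
print(start_default)  # []
print(end_default)    # ['dot']

# With re.MULTILINE, they also match at the start and end of each line.
start_multi = re.findall('^dct', text, re.MULTILINE)
end_multi = re.findall('dot$', text, re.MULTILINE)
print(start_multi)  # ['dct']
print(end_multi)    # ['dot', 'dot']
```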

Role of .: 'd.t' matches d and t with any single character (except a newline) between them:

str1 = 'dit dot det,dct dit dot'
print re.findall('d.t',str1)

Result:
['dit', 'dot', 'det', 'dct', 'dit', 'dot']

Role of +: 'di+t' matches one or more 'i' between d and t:

str1 = 'd dt dit diit det'
print re.findall('di+t',str1)

Result:
['dit', 'diit']

Role of *: 'di*t' matches zero or more 'i' between d and t:

str1 = 'd dt dit diit det'
print re.findall('di*t',str1)

Result:
['dt', 'dit', 'diit']

'.' is frequently used together with '+' or '*': '.+' matches one or more arbitrary characters, and '.*' matches zero or more arbitrary characters:

str1 = 'd dt dit diit det'
print re.findall('d.+t',str1)
print re.findall('d.*t',str1)

Result:
['d dt dit diit det']
['d dt dit diit det']

Role of ? (first use): in the '.+' result above, shorter substrings such as 'dit' and 'det' also satisfy 'd.+t', but the output is the longest substring that satisfies the pattern, 'd dt dit diit det'. This is called greedy matching. To get the shortest match instead, add '?' after the '+'. (Note: the same applies to '*'; just add '?' after it.)

str1 = 'd dt dit diit det'
print re.findall('d.+?t',str1)

Result: ['d dt', 'dit', 'diit', 'det']
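
The difference between greedy and non-greedy matching shows up clearly when extracting tags from HTML. This is only a sketch: the html string is a made-up sample, and Python 3 print syntax is assumed.

```python
import re

# Hypothetical HTML fragment used only for illustration.
html = '<b>one</b> and <b>two</b>'

# Greedy: .* runs as far as possible, up to the last </b>.
greedy = re.findall('<b>.*</b>', html)
# Non-greedy: .*? stops at the first </b> it can reach.
lazy = re.findall('<b>.*?</b>', html)

print(greedy)  # ['<b>one</b> and <b>two</b>']
print(lazy)    # ['<b>one</b>', '<b>two</b>']
```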

Role of ? (second use): in 'di?t' the i is optional, so both dt and dit satisfy the pattern:

str1 = 'd dt dit diit det'
print re.findall('di?t',str1)

Result: ['dt', 'dit']

Role of {} (first use): 'di{n}t' matches exactly n 'i' characters between d and t:

str1 = 'dt dit diit diiit diiiit'
print re.findall('di{2}t',str1)

result:

['diit']

Role of {} (second use): 'di{n,m}t' matches n to m 'i' characters between d and t:

str1 = 'dt dit diit diiit diiiit'
print re.findall('di{1,3}t',str1)

Result: ['dit', 'diit', 'diiit']

Either n or m can be omitted. {n,} means n or more; {,m} means 0 to m; {,} means any number, the same as '*':

str1 = 'dt dit diit diiit diiiit'
print re.findall('di{1,}t',str1)
print re.findall('di{,3}t',str1)
print re.findall('di{,}t',str1)

result:

['dit', 'diit', 'diiit', 'diiiit']
['dt', 'dit', 'diit', 'diiit']
['dt', 'dit', 'diit', 'diiit', 'diiiit']
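
As a quick check of how the brace forms relate to the other quantifiers, here is a minimal sketch (Python 3 print syntax assumed) comparing them on the same string:

```python
import re

str1 = 'dt dit diit diiit diiiit'

# {1,} behaves like +, {0,} like *, and {0,1} like ?.
plus_like = re.findall('di{1,}t', str1) == re.findall('di+t', str1)
star_like = re.findall('di{0,}t', str1) == re.findall('di*t', str1)
opt_like = re.findall('di{0,1}t', str1) == re.findall('di?t', str1)
print(plus_like, star_like, opt_like)  # True True True

# A bounded range still matches only the counts inside the range.
bounded = re.findall('di{2,3}t', str1)
print(bounded)  # ['diit', 'diiit']
```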

Role of \ (first use): escapes a metacharacter so that it is treated as a literal character:

str1 = '^abc ^abc'
print re.findall('^abc',str1)
print re.findall('\^abc',str1)

result:

[]
['^abc', '^abc']
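
Two related conveniences are raw strings and re.escape(). The sketch below assumes Python 3 print syntax, and the sample string is extended with '1+1=2' purely for illustration:

```python
import re

str1 = '^abc ^abc 1+1=2'

# A raw string (r'...') passes the backslash through to the re module unchanged.
carets = re.findall(r'\^abc', str1)
print(carets)  # ['^abc', '^abc']

# re.escape() adds the backslashes needed to match a string literally.
literal = '1+1'
found = re.findall(re.escape(literal), str1)
print(found)  # ['1+1']
```

Writing patterns as raw strings avoids having to think about which backslash sequences Python itself would interpret first.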

Role of \ (second use): introduces predefined character classes, such as \d (digits) and \w (word characters):

str1 = '12 abc 345 efgh'
print re.findall('\d+',str1)
print re.findall('\w+',str1)

result:

['12', '345']
['12', 'abc', '345', 'efgh']
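
A few more predefined classes follow the same idea. This sketch (Python 3 print syntax assumed) adds \D and \s, which are not covered in the example above:

```python
import re

str1 = '12 abc 345 efgh'

digits = re.findall(r'\d+', str1)      # runs of digits
non_digits = re.findall(r'\D+', str1)  # runs of anything that is not a digit
spaces = re.findall(r'\s', str1)       # single whitespace characters

print(digits)      # ['12', '345']
print(non_digits)  # [' abc ', ' efgh']
print(spaces)      # [' ', ' ', ' ']
```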

Role of (): after the pattern matches, findall returns only the contents captured inside the '()' groups. With more than one group, each match is returned as a tuple:

str1 = '12abcd34'
print re.findall('12abcd34',str1)
print re.findall('1(2a)bcd34',str1)
print re.findall('1(2a)bc(d3)4',str1)

result:

['12abcd34']
['2a']
[('2a', 'd3')]
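
Groups are handy for pulling structured pieces out of text. The following is a sketch with a made-up key=value string (Python 3 print syntax assumed); it also shows the non-capturing form (?:...), which groups without affecting what findall returns:

```python
import re

# Hypothetical input used only for illustration.
text = 'width=80 height=24'

# Two capturing groups: findall returns a list of (key, value) tuples.
pairs = re.findall(r'(\w+)=(\d+)', text)
print(pairs)  # [('width', '80'), ('height', '24')]

# (?:...) groups without capturing, so findall returns the whole match.
whole = re.findall(r'(?:\w+)=(?:\d+)', text)
print(whole)  # ['width=80', 'height=24']
```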

3. Main functions of the re module: findall(), finditer(), match(), search(), compile(), split(), sub(), subn().

re.findall(pattern,string,flags = 0)

Role: scans string from left to right for substrings that match pattern and returns the results as a list.

str1 = 'ab cd'
print re.findall('\w+',str1)

Result: ['ab', 'cd']

re.finditer(pattern,string,flags = 0)

Role: the same as re.findall, except that the results are returned as an iterator of match objects.

str1 = 'ab cd'
iter1 = re.finditer('\w+',str1)
for a in iter1:
  print a.group(),a.span()

result:

ab (0, 2)
cd (3, 5)

(Note: a.group() returns the matched string, and a.span() returns the start and end positions of the match.)

re.search(pattern,string,flags = 0)

Role: scans string from left to right for the first location that matches pattern. Returns None if there is no match, otherwise a match object.

str1 = 'ab cd'
result = re.search('cd',str1)
if result is None:
  print 'None'
else:
  print result.group(),result.start(),result.end()

Result: cd 3 5

re.match(pattern,string,flags = 0)

Role: checks whether the beginning of string matches pattern. Returns a match object if it does, otherwise None.

str1 = 'ab cd'
result = re.match('cd',str1)
if result is None:
  print 'None'
else:
  print result.group(),result.start(),result.end()

Result: None
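
The difference between search and match can be seen side by side in this short sketch (Python 3 print syntax assumed):

```python
import re

str1 = 'ab cd'

# match only looks at the beginning of the string.
m_cd = re.match('cd', str1)
m_ab = re.match('ab', str1)
print(m_cd)          # None
print(m_ab.group())  # ab

# search looks for the first match anywhere in the string.
s_cd = re.search('cd', str1)
print(s_cd.group())  # cd

# Anchoring the pattern with ^ makes search behave like match here.
s_anchored = re.search('^cd', str1)
print(s_anchored)    # None
```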

re.compile(pattern,flags = 0)

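Role: compiles pattern into a regular expression object and returns it. Compiling once and reusing the object can speed up matching when the same pattern is used many times (the re module also caches recently used patterns internally, so the gain is often modest).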

str1 = 'ab cd'
pre = re.compile('ab')
print pre.findall(str1)

Result: ['ab']
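
A typical use of a compiled pattern is to build it once and apply it to many strings. This is a minimal sketch with a made-up list of lines (Python 3 print syntax assumed):

```python
import re

# Compile once, then reuse the pattern object on many strings.
word = re.compile(r'\w+')

lines = ['ab cd', 'ef', '']  # hypothetical input lines
counts = [len(word.findall(line)) for line in lines]
print(counts)  # [2, 1, 0]

# Pattern objects offer the same methods as the module-level functions.
replaced = word.sub('X', 'ab cd')
print(replaced)  # X X
```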

re.split(pattern,string,maxsplit = 0,flags = 0)

Role: splits string at every place where pattern matches:

str1 = 'ab.c.de'
str2 = '12+34-56*78/90'
print re.split('\.',str1)
print re.split('[\+\-\*/]',str2)

result:

['ab', 'c', 'de']
['12', '34', '56', '78', '90']
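
The maxsplit parameter and capturing groups change what split returns. A short sketch (Python 3 print syntax assumed):

```python
import re

str2 = '12+34-56*78/90'

# maxsplit limits the number of splits; the remainder is returned unsplit.
limited = re.split(r'[+\-*/]', str2, maxsplit=2)
print(limited)  # ['12', '34', '56*78/90']

# Putting the pattern in a group keeps the delimiters in the result.
with_ops = re.split(r'([+\-*/])', str2)
print(with_ops)  # ['12', '+', '34', '-', '56', '*', '78', '/', '90']
```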

re.sub(pattern,repl,string,count = 0,flags = 0)

Role: replaces every substring of string that matches pattern with repl:

str1 = 'abcde'
print re.sub('bc','123',str1)

Result: a123de

re.subn(pattern,repl,string,count = 0,flags = 0)

Role: the same as re.sub(), except that it returns a tuple that also contains the number of replacements made:

str1 = 'abcdebce'
print re.subn('bc','123',str1)

Result: ('a123de123e', 2)
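
The count and repl parameters allow a few more variations. In this sketch (Python 3 print syntax assumed), the helper function upper is a hypothetical name introduced only for illustration:

```python
import re

str1 = 'abcdebce'

# count limits how many matches are replaced.
once = re.sub('bc', '123', str1, count=1)
print(once)  # a123debce

# repl may be a function that receives the match object.
def upper(match):
    return match.group().upper()

shouted = re.sub('bc', upper, str1)
print(shouted)  # aBCdeBCe

# \1 and \2 in repl refer to the captured groups.
swapped = re.sub(r'(b)(c)', r'\2\1', str1)
print(swapped)  # acbdecbe
```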

Origin blog.csdn.net/haoxun05/article/details/104286450