String processing in Python

Splitting strings

Splitting on a single delimiter

>>> a = '1,2,3,4'
>>> a.split(',')
['1', '2', '3', '4']
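As a small sketch beyond the example above: with an explicit delimiter, str.split() keeps empty fields, while calling it with no argument splits on runs of whitespace and drops empty strings. The optional maxsplit argument limits the number of splits. The sample strings here are made up for illustration.

```python
# Sample data (illustrative only, not from the original article).
row = '1,2,,4'
words = '  alpha   beta gamma '

# An explicit delimiter keeps the empty field between adjacent delimiters.
fields = row.split(',')

# No argument: split on any run of whitespace, ignoring leading/trailing space.
tokens = words.split()

# maxsplit=1 splits only at the first delimiter.
head, rest = row.split(',', 1)

print(fields)
print(tokens)
print(head, rest)
```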

Splitting on multiple delimiters

>>> line = 'asdf fjdk; afed, fjek, asdf, foo'
>>> import re
>>> re.split(r'[;,\s]\s*', line)  # match the delimiters with a regular expression
['asdf', 'fjdk', 'afed', 'fjek', 'asdf', 'foo']

To keep the delimiters in the result list, use a capturing group:

>>> fields = re.split(r'(;|,|\s)\s*', line)
>>> fields
['asdf', ' ', 'fjdk', ';', 'afed', ',', 'fjek', ',', 'asdf', ',', 'foo']

If you do not want to keep the delimiters but still need to group parts of the pattern, use a non-capturing group:

>>> re.split(r'(?:,|;|\s)\s*', line)
['asdf', 'fjdk', 'afed', 'fjek', 'asdf', 'foo']
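One reason to keep the delimiters is to rebuild a string later. The following is a minimal sketch of that idea, assuming the same sample line as above; the variable names values, delimiters and rebuilt are chosen here for illustration.

```python
import re

line = 'asdf fjdk; afed, fjek, asdf, foo'

# The capturing group makes re.split() return the delimiters as well.
fields = re.split(r'(;|,|\s)\s*', line)

values = fields[::2]               # even positions: the data
delimiters = fields[1::2] + ['']   # odd positions: the delimiters, padded to equal length

# Rebuild a string with the same delimiters. Whitespace that followed a
# delimiter was consumed by \s* and is not restored.
rebuilt = ''.join(v + d for v, d in zip(values, delimiters))

print(values)
print(delimiters)
print(rebuilt)
```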

 

Matching the beginning or end of a string

To check how a string starts or ends, use the str.startswith() and str.endswith() methods:

>>> filename = 'spam.txt'
>>> filename.endswith('.txt')
True
>>> filename.startswith('file:')
False
>>> url = 'http://www.python.org'
>>> url.startswith('http:')
True

To check against several possible matches, pass a tuple of candidates:

>>> import os
>>> filenames = os.listdir('.')
>>> filenames
['Makefile', 'foo.c', 'bar.py', 'spam.c', 'spam.h']

>>> [name for name in filenames if name.endswith(('.c', '.h'))]
['foo.c', 'spam.c', 'spam.h']
>>> any(name.endswith('.py') for name in filenames)
True
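The argument really has to be a tuple: a list or set is rejected with a TypeError. The sketch below shows the failure and the usual fix of converting with tuple(); the schemes list is a hypothetical example.

```python
# Hypothetical list of accepted URL schemes (illustrative only).
schemes = ['http:', 'https:', 'ftp:']
url = 'http://www.python.org'

try:
    url.startswith(schemes)       # a list is not accepted
    accepted_list = True
except TypeError:
    accepted_list = False

# Converting the list to a tuple works.
matches = url.startswith(tuple(schemes))

print(accepted_list, matches)
```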

The same checks can also be done with slicing or with the re module:

>>> url = 'http://www.python.org'
>>> url[:5] == 'http:' or url[:6] == 'https:' or url[:4] == 'ftp:'
True
>>> import re
>>> url = 'http://www.python.org'
>>> re.match('http:|https:|ftp:', url)
<_sre.SRE_Match object at 0x101253098>

 

Matching strings with shell wildcards:

 *          Matches any number of characters, including none
 ?          Matches exactly one character
 [char]     Matches any one character listed in the brackets
 [!char]    Matches any one character not listed in the brackets
 [:alnum:]  Matches any letter or digit
 [:alpha:]  Matches any letter
 [:digit:]  Matches any digit
 [:lower:]  Matches any lowercase letter
 [:upper:]  Matches any uppercase letter

The [:class:] forms are POSIX shell character classes; Python's fnmatch module, used below, documents only the first four patterns.

>>> from fnmatch import fnmatch, fnmatchcase
>>> fnmatch('foo.txt', '*.txt')
True
>>> fnmatch('foo.txt', '?oo.txt')
True
>>> fnmatch('Dat45.csv', 'Dat[0-9]*')
True
>>> names = ['Dat1.csv', 'Dat2.csv', 'config.ini', 'foo.py']
>>> [name for name in names if fnmatch(name, 'Dat*.csv')]
['Dat1.csv', 'Dat2.csv']

fnmatch() follows the case-sensitivity rules of the underlying operating system, which differ from one system to another:

>>> # On OS X (Mac)
>>> fnmatch('foo.txt', '*.TXT')
False
>>> # On Windows
>>> fnmatch('foo.txt', '*.TXT')
True

If this difference matters, use fnmatchcase() instead. It matches exactly according to the case of the pattern you supply. For example:

>>> fnmatchcase('foo.txt', '*.TXT')
False

>>> fnmatchcase('foo.txt', '*.txt')
True
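If you want the opposite behaviour, matching that ignores case on every operating system, one possible approach is to lowercase both the name and the pattern before calling fnmatchcase(). This is only a sketch: fnmatch_nocase is a hypothetical helper name, and lowercasing the pattern also changes bracket sets such as [A-Z].

```python
from fnmatch import fnmatchcase

def fnmatch_nocase(name, pattern):
    """Hypothetical helper: wildcard match that ignores case on every OS."""
    return fnmatchcase(name.lower(), pattern.lower())

print(fnmatch_nocase('foo.txt', '*.TXT'))
print(fnmatchcase('foo.txt', '*.TXT'))
```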

 

These functions are also useful for strings that are not filenames:

>>> addresses = [
...     '5412 N CLARK ST',
...     '1060 W ADDISON ST',
...     '1039 W GRANVILLE AVE',
...     '2122 N CLARK ST',
...     '4802 N BROADWAY',
... ]
>>> from fnmatch import fnmatchcase
>>> [addr for addr in addresses if fnmatchcase(addr, '* ST')]
['5412 N CLARK ST', '1060 W ADDISON ST', '2122 N CLARK ST']
>>> [addr for addr in addresses if fnmatchcase(addr, '54[0-9][0-9] *CLARK*')]
['5412 N CLARK ST']

Summary: fnmatch sits between simple string methods and full regular expressions. If simple wildcards are enough to process your data, fnmatch() or fnmatchcase() is a good choice. If you need to match actual file names on disk, it is best to use the glob module.
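To illustrate the difference, glob works against the file system rather than against a list of strings. The sketch below creates a temporary directory with a few empty files (the file names are made up for illustration) and lists the ones matching a wildcard.

```python
import glob
import os
import tempfile

# Create a throwaway directory with a few empty files (illustrative names).
with tempfile.TemporaryDirectory() as tmpdir:
    for name in ('Dat1.csv', 'Dat2.csv', 'config.ini', 'foo.py'):
        open(os.path.join(tmpdir, name), 'w').close()

    # glob.glob() returns the paths of existing files that match the pattern.
    paths = glob.glob(os.path.join(tmpdir, 'Dat*.csv'))
    found = sorted(os.path.basename(p) for p in paths)

print(found)
```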

 

Matching and searching strings

For simple matching, the string methods are enough, for example str.find(), str.startswith() and str.endswith().

For more complex matching, use regular expressions and the re module:

>>> text1 = '11/27/2012'
>>> text2 = 'Nov 27, 2012'
>>>
>>> import re
>>> # Simple matching: \d+ means match one or more digits
>>> if re.match(r'\d+/\d+/\d+', text1):
...     print('yes')
... else:
...     print('no')
...
yes
>>> if re.match(r'\d+/\d+/\d+', text2):
...     print('yes')
... else:
...     print('no')
...
no
>>>

re.match() always tries to match at the start of the string. It returns a Match object if the pattern matches there, and None otherwise.
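Because only the start is anchored, re.match() also accepts strings with extra text after the match. The sketch below contrasts it with re.search(), which looks anywhere in the string, and re.fullmatch(), which requires the whole string to match; the sample strings are chosen for illustration.

```python
import re

text = 'PyCon starts 3/13/2013.'
pattern = r'\d+/\d+/\d+'

at_start = re.match(pattern, text)     # None: the string does not start with a date
anywhere = re.search(pattern, text)    # finds the date later in the string

# match() only anchors the start, so trailing text is still accepted.
prefix_only = re.match(pattern, '11/27/2012abcdef')
# fullmatch() requires the entire string to match the pattern.
whole = re.fullmatch(pattern, '11/27/2012abcdef')

print(at_start, anywhere.group(), prefix_only.group(), whole)
```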

To reuse the same pattern many times, compile it into a pattern object first:

>>> datepat = re.compile(r'\d+/\d+/\d+')
>>> if datepat.match(text1):
...     print('yes')
... else:
...     print('no')
...
yes
>>> if datepat.match(text2):
...     print('yes')
... else:
...     print('no')
...
no

To match somewhere other than the start of the string, use re.search() or re.findall(). re.search() returns a Match object for the first match found anywhere in the string, or None if there is no match.

re.findall() returns a list of all matching strings.

 

If the pattern contains capturing groups, re.findall() returns the groups instead of the whole match; with several groups, each match becomes a tuple of the group contents. The examples below use a version of the date pattern with capturing groups:

>>> datepat = re.compile(r'(\d+)/(\d+)/(\d+)')
>>> m = datepat.match('11/27/2012')
>>> m
<_sre.SRE_Match object at 0x1005d2750>
>>> # Extract the contents of each group
>>> m.group(0)
'11/27/2012'
>>> m.group(1)
'11'
>>> m.group(2)
'27'
>>> m.group(3)
'2012'
>>> m.groups()
('11', '27', '2012')
>>> month, day, year = m.groups()
>>>
>>> # Find all matches (notice splitting into tuples)
>>> text = 'Today is 11/27/2012. PyCon starts 3/13/2013.'
>>> datepat.findall(text)
[('11', '27', '2012'), ('3', '13', '2013')]
>>> for month, day, year in datepat.findall(text):
...     print('{}-{}-{}'.format(year, month, day))
...
2012-11-27
2013-3-13

findall() returns all results at once as a list. To process the matches one at a time as an iterator, use finditer():

>>> for m in datepat.finditer(text):
...     print(m.groups())
...
('11', '27', '2012')
('3', '13', '2013')
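Since finditer() yields Match objects rather than plain strings, the position of each match is available too. A minimal sketch, assuming the same grouped date pattern and sample text as above:

```python
import re

datepat = re.compile(r'(\d+)/(\d+)/(\d+)')
text = 'Today is 11/27/2012. PyCon starts 3/13/2013.'

# Each item is a Match object, so start and end offsets are available
# in addition to the matched text.
positions = [(m.group(0), m.start(), m.end()) for m in datepat.finditer(text)]

for matched, start, end in positions:
    print(matched, start, end)
```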

 

Searching and replacing strings

For a simple search and replace, use str.replace():

>>> text = 'yeah, but no, but yeah, but no, but yeah'
>>> text.replace('yeah', 'yep')
'yep, but no, but yep, but no, but yep'

For more complex search and replace, use re.sub():

>>> text = 'Today is 11/27/2012. PyCon starts 3/13/2013.'
>>> import re
>>> re.sub(r'(\d+)/(\d+)/(\d+)', r'\3-\1-\2', text)
'Today is 2012-11-27. PyCon starts 2013-3-13.'

In the replacement string, a backslash followed by a number, such as \3, refers to the corresponding capture group in the pattern.
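Numbered references become hard to read as patterns grow. As an alternative sketch, the re module also supports named groups, which the replacement string can reference with \g<name>; the group names month, day and year are chosen here for illustration.

```python
import re

text = 'Today is 11/27/2012. PyCon starts 3/13/2013.'

# Named groups; the names are illustrative, not from the original article.
datepat = re.compile(r'(?P<month>\d+)/(?P<day>\d+)/(?P<year>\d+)')

# \g<name> in the replacement refers to the group with that name.
result = datepat.sub(r'\g<year>-\g<month>-\g<day>', text)

print(result)
```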

For more complex substitutions, pass a callback function. It receives each Match object and returns the replacement text:

>>> from calendar import month_abbr
>>> datepat = re.compile(r'(\d+)/(\d+)/(\d+)')
>>> def change_date(m):
...     mon_name = month_abbr[int(m.group(1))]
...     return '{} {} {}'.format(m.group(2), mon_name, m.group(3))
...
>>> datepat.sub(change_date, text)
'Today is 27 Nov 2012. PyCon starts 13 Mar 2013.'

If you want to know how many substitutions were made, in addition to getting the result, use re.subn() instead:

>>> newtext, n = datepat.subn(r'\3-\1-\2', text)
>>> newtext
'Today is 2012-11-27. PyCon starts 2013-3-13.'
>>> n
2

To ignore case when matching, pass the re.IGNORECASE flag to the re functions:

>>> text = 'UPPER PYTHON, lower python, Mixed Python'
>>> re.findall('python', text, flags=re.IGNORECASE)
['PYTHON', 'python', 'Python']
>>> re.sub('python', 'snake', text, flags=re.IGNORECASE)
'UPPER snake, lower snake, Mixed snake'

This example has a small flaw: the replacement text does not follow the case of the text it replaces. This can be fixed with a helper function:

def matchcase(word):
    def replace(m):
        text = m.group()
        if text.isupper():
            return word.upper()
        elif text.islower():
            return word.lower()
        elif text[0].isupper():
            return word.capitalize()
        else:
            return word
    return replace
>>> re.sub('python', matchcase('snake'), text, flags=re.IGNORECASE)
'UPPER SNAKE, lower snake, Mixed Snake'

 

Origin www.cnblogs.com/BeautifulWorld/p/11719709.html