python3-cookbook Notes: Chapter 2 (Strings and Text)

For each problem it covers, python3-cookbook gives three parts: the problem, a solution, and a discussion. It either presents the best Python 3 solution to a particular problem or explores how features of Python 3 itself, such as data structures and functions, can be put to better use on that problem. The book is a great help for deepening and improving your Python 3 programming skills, especially when it comes to improving the performance of Python programs. If you have the time, reading it is strongly recommended.

This article is a set of study notes. It covers only the parts of the book that I wrote up based on my own work needs and everyday usage. Most of the sample code is taken directly from the original book, but all of it has been verified in a Python 3.6 environment. Anyone interested can read the full text.

python3-cookbook：https://python3-cookbook.readthedocs.io/zh_CN/latest/index.html

2.1 Splitting a string on multiple delimiters

For ordinary splitting, str.split is enough. For more complex text, regular expressions are the tool of choice, and the re module also has a split function for splitting strings. Note that if the pattern contains a capturing group in parentheses, the text matched by the group is also included in the result list.

>>> import re
>>> line = 'asdf fjdk; afed, fjek,asdf, foo'
>>> re.split(r'[;,\s]\s*', line)
['asdf', 'fjdk', 'afed', 'fjek', 'asdf', 'foo']
>>> fields = re.split(r'(;|,|\s)\s*', line)  # text matched by the group also appears in the result
>>> fields
['asdf', ' ', 'fjdk', ';', 'afed', ',', 'fjek', ',', 'asdf', ',', 'foo']
>>>
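
As a small sketch of how the capturing-group behaviour can be used or avoided: slicing the result separates fields from delimiters, and a non-capturing group `(?:...)` keeps the grouping without adding delimiters to the result. The variable names here are only illustrative.

```python
import re

line = 'asdf fjdk; afed, fjek,asdf, foo'

# Capturing group: the matched delimiters are kept in the result list.
fields = re.split(r'(;|,|\s)\s*', line)
values = fields[::2]        # even indexes hold the fields
delimiters = fields[1::2]   # odd indexes hold the delimiter after each field

# Non-capturing group (?:...): same grouping, delimiters are dropped.
plain = re.split(r'(?:,|;|\s)\s*', line)

print(values)
print(delimiters)
print(plain)
```

Keeping the delimiters is handy if the string has to be rebuilt later; otherwise the non-capturing form gives a clean list of fields.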

2.3 Matching strings with shell wildcards

When plain string methods are not enough but you don't want the complexity of regular expressions, consider fnmatch.fnmatch or fnmatch.fnmatchcase. Both match strings using the common Unix shell wildcards. The difference is that the former follows the case-sensitivity rules of the operating system, while the latter matches case exactly as you wrote the pattern.

>>> from fnmatch import fnmatch, fnmatchcase
>>> fnmatch('foo.txt', '*.txt')
True
>>> fnmatch('foo.txt', '?oo.txt')
True
>>> fnmatch('Dat45.csv', 'Dat[0-9]*')
True
>>>
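
To illustrate the case-sensitive variant, here is a minimal sketch that filters a list with fnmatchcase. The file names are made up for the example.

```python
from fnmatch import fnmatchcase

# Hypothetical file names, invented for this illustration.
names = ['Dat1.csv', 'Dat2.csv', 'config.ini', 'foo.py', 'DAT3.CSV']

# fnmatchcase() compares case exactly as written, on every platform.
csv_files = [name for name in names if fnmatchcase(name, 'Dat*.csv')]
upper_files = [name for name in names if fnmatchcase(name, 'DAT*.CSV')]

print(csv_files)
print(upper_files)
```

With fnmatch instead of fnmatchcase, the result for the same patterns could differ between operating systems, since it depends on the platform's case rules.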

2.13 Aligning strings

Alignment is part of string formatting. For ordinary left, right, and center alignment you can use the string methods ljust, rjust, and center, or the built-in format function and the string format method. The book recommends format, because it is richer and more powerful for formatting strings.

Aligning strings may not seem like a common task, but one scenario I have run into is binary string representations: you may need to pad the string with 0s or 1s to a width of 8 or 16 characters, and that is when the alignment functions come in handy.

>>> text = 'Hello World'
>>> text.ljust(20)
'Hello World         '
>>> text.rjust(20)
'         Hello World'
>>> text.center(20)
'    Hello World     '
>>> text.rjust(20, '=')
'=========Hello World'
>>> text.center(20, '*')
'****Hello World*****'
>>>

>>> # Format strings
>>> format(text, '>20')
'         Hello World'
>>> format(text, '<20')
'Hello World         '
>>> format(text, '^20')
'    Hello World     '
>>> format(text, '=<20s')
'Hello World========='
>>> format(text, '*^20s')
'****Hello World*****'
>>> # Format numbers
>>> x = 1.2345
>>> format(x, '^10.2f')
'   1.23   '
>>> # The string format method
>>> '{:>10s} {:>10s}'.format('Hello', 'World')
'     Hello      World'
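
The binary-string scenario mentioned above might look like the following sketch. The value 5 and the variable names are arbitrary choices for the example.

```python
value = 5

bits = format(value, 'b')          # binary digits without padding
padded_8 = bits.rjust(8, '0')      # right-align in 8 columns, fill with '0'
padded_16 = format(bits, '0>16')   # same idea with a format spec: fill '0', align '>', width 16
direct_8 = format(value, '08b')    # convert and zero-pad the integer in one step

print(bits)
print(padded_8)
print(padded_16)
print(direct_8)
```

When the starting point is an integer, the `'08b'` spec avoids the intermediate string; rjust or a fill-and-align spec is the option when only the string is available.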

2.16 Reformatting text to a fixed column width

You may run into this problem when printing or displaying text in a terminal. In that case, use textwrap to specify the output column width.

>>> import textwrap
>>> s = "Look into my eyes, look into my eyes, the eyes, the eyes, the eyes, not around the eyes, don't look around the eyes, look into my eyes, you're under."
>>> print(textwrap.fill(s, 70))
Look into my eyes, look into my eyes, the eyes, the eyes, the eyes,
not around the eyes, don't look around the eyes, look into my eyes,
you're under.
>>> print(textwrap.fill(s, 40))
Look into my eyes, look into my eyes,
the eyes, the eyes, the eyes, not around
the eyes, don't look around the eyes,
look into my eyes, you're under.
>>> print(textwrap.fill(s, 40, initial_indent='    '))
    Look into my eyes, look into my
eyes, the eyes, the eyes, the eyes, not
around the eyes, don't look around the
eyes, look into my eyes, you're under.
>>> print(textwrap.fill(s, 40, subsequent_indent='    '))
Look into my eyes, look into my eyes,
    the eyes, the eyes, the eyes, not
    around the eyes, don't look around
    the eyes, look into my eyes, you're
    under.
>>>
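
Since the usual target is a terminal, one possible sketch is to take the width from the terminal itself. The helper name wrap_to_terminal is hypothetical; shutil.get_terminal_size is a standard library function that falls back to 80 columns when the size cannot be determined.

```python
import shutil
import textwrap


def wrap_to_terminal(text, width=None):
    """Wrap text to `width` columns, defaulting to the terminal width."""
    if width is None:
        # Falls back to 80 columns when no terminal size is available.
        width = shutil.get_terminal_size().columns
    return textwrap.fill(text, width)


s = ("Look into my eyes, look into my eyes, the eyes, the eyes, the eyes, "
     "not around the eyes, don't look around the eyes, look into my eyes, "
     "you're under.")

print(wrap_to_terminal(s))       # width taken from the terminal
print(wrap_to_terminal(s, 30))   # explicit width
```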

2.17 Handling HTML and XML entities in text

When working with HTML or XML text, you may want to replace entities such as &entity; or &#code; with the corresponding text, or do the reverse. The utility functions that come with the corresponding parser are enough for this. Of course, if you are familiar with the parser itself, there may be a better way.

>>> import html
>>> s = 'Elements are written as "<tag>text</tag>".'
>>> print(s)
Elements are written as "<tag>text</tag>".
>>> print(html.escape(s))
Elements are written as &quot;&lt;tag&gt;text&lt;/tag&gt;&quot;.
>>> print(html.escape(s, quote=False))
Elements are written as "&lt;tag&gt;text&lt;/tag&gt;".
>>> 
>>> s = 'Spicy &quot;Jalape&#241;o&quot.'
>>> html.unescape(s)  # HTMLParser.unescape() was removed in Python 3.9
'Spicy "Jalapeño".'
>>> 
>>> from xml.sax.saxutils import unescape
>>> t = 'The prompt is &gt;&gt;&gt;'
>>> unescape(t)
'The prompt is >>>'
>>>
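
The "vice versa" direction can be sketched as a round trip. This example assumes the goal is ASCII-only output and uses the standard `xmlcharrefreplace` error handler of str.encode for that; the sample string is made up.

```python
import html

s = 'Spicy "Jalapeño" <tag>'

# Escape the special characters &, <, > and quotes.
escaped = html.escape(s)

# Emit ASCII only: non-ASCII characters become numeric character references.
ascii_bytes = s.encode('ascii', errors='xmlcharrefreplace')

# html.unescape() reverses both named and numeric references.
restored_from_escaped = html.unescape(escaped)
restored_from_ascii = html.unescape(ascii_bytes.decode('ascii'))

print(escaped)
print(ascii_bytes)
print(restored_from_escaped == s, restored_from_ascii == s)
```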

2.18 Tokenizing text

A string can be tokenized with regular expressions that use named capturing groups, whose syntax is "(?P<group_name>...)". This is useful for parsing formulas in user-defined strings.

One approach to consider is to use the pattern's scanner object and wrap it in a generator.

import re

NAME = r'(?P<NAME>[a-zA-Z_][a-zA-Z_0-9]*)'
NUM = r'(?P<NUM>\d+)'
PLUS = r'(?P<PLUS>\+)'
TIMES = r'(?P<TIMES>\*)'
EQ = r'(?P<EQ>=)'
WS = r'(?P<WS>\s+)'

master_pat = re.compile('|'.join([NAME, NUM, PLUS, TIMES, EQ, WS]))
scanner = master_pat.scanner('foo = 42')
for m in iter(scanner.match, None):
    print(m.lastgroup, m.group())

NAME foo
WS  
EQ =
WS  
NUM 42
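
The generator wrapping mentioned above might be sketched as follows. The Token tuple and the generate_tokens name are illustrative choices; note that the scanner() method of compiled patterns is not covered in the official re documentation, so treat its use as an assumption about the current implementation.

```python
import re
from collections import namedtuple

# A small record type for the tokens; the name is an illustrative choice.
Token = namedtuple('Token', ['type', 'value'])

NAME = r'(?P<NAME>[a-zA-Z_][a-zA-Z_0-9]*)'
NUM = r'(?P<NUM>\d+)'
PLUS = r'(?P<PLUS>\+)'
TIMES = r'(?P<TIMES>\*)'
EQ = r'(?P<EQ>=)'
WS = r'(?P<WS>\s+)'

master_pat = re.compile('|'.join([NAME, NUM, PLUS, TIMES, EQ, WS]))


def generate_tokens(pat, text):
    """Yield one Token per match until the scanner stops matching."""
    scanner = pat.scanner(text)
    for m in iter(scanner.match, None):
        yield Token(m.lastgroup, m.group())


# Drop whitespace tokens while consuming the generator.
tokens = [tok for tok in generate_tokens(master_pat, 'foo = 23 + 42 * 10')
          if tok.type != 'WS']
for tok in tokens:
    print(tok)
```

Because the tokens are produced lazily, filtering steps such as skipping whitespace can be chained without building intermediate lists.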
