Wrapping up Chapter 2.

2.13 Aligning strings

Problem: how to format a string with some kind of alignment.
Solution: for basic alignment, use the ljust(), rjust(), and center() string methods.

text = 'Hello world'
text.ljust(20)

'Hello world         '

text.rjust(20)

'         Hello world'

text.center(20)

'    Hello world     '

Each of these methods also accepts an optional fill character.

text.center(20,'-')

'----Hello world-----'

The format() function can do the same: use the <, >, or ^ character followed by the desired width.

format(text,'>20')

'         Hello world'

format(text,'<20')

'Hello world         '

format(text,'^20')

'    Hello world     '

A fill character can be given as well, placed before the alignment character.

format(text,'*>20')

'*********Hello world'

format(text,'*^20')

'****Hello world*****'

The same format codes work in the format() method when several values are formatted at once.

'{:*>10s}{:->10s}'.format('hello','world')

'*****hello-----world'

format() works with any value, not only strings.

x = 1.23545
format(x,'>10')

'   1.23545'

format(x,'^10.2f')

'   1.24   '

Older code often uses the % operator for the same purpose.

'%-20s'%text

'Hello world         '

'%20s'%text

'         Hello world'
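
In Python 3.6 and later, f-strings accept the same format specifications as format(). A minimal sketch reusing text from above; the width variable is only an illustrative name, and the width is passed as a nested replacement field:

```python
text = 'Hello world'
width = 20  # illustrative name, not part of the notes above

# Same alignment codes as format(): < left, > right, ^ center
left = f'{text:<{width}}'
right = f'{text:>{width}}'
centered = f'{text:*^{width}}'  # '*' is the fill character

print(repr(left))
print(repr(right))
print(repr(centered))
```

Nesting {width} inside the format spec keeps the column width in one place if it has to change later.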

2.14 Combining and concatenating strings

Problem: combine several strings into one.
Solution: if the strings to combine are in a sequence or other iterable, the fastest way is the join() method.

parts = ['Is','Chicago','Not','Chicago']
' '.join(parts)

'Is Chicago Not Chicago'

','.join(parts)

'Is,Chicago,Not,Chicago'

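If you are only joining a few strings, + works well enough.
Joining a large number of strings with + is very inefficient because of the memory copies and garbage collection it causes.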

a = 'Is Chicago'
b = 'Not Chicago'
a+' '+b

'Is Chicago Not Chicago'

To combine two string literals in source code, simply place them next to each other.

a = 'Hello' 'World'
a

'HelloWorld'

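Avoid building a string like this; it is usually much slower than join() because each += can create a new string object.

s = ''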
for p in parts:
    s += p

A handy trick: use a generator expression to convert the data to strings and join them in one step.

data = ['AVMC',100,23.33]
','.join(str(d) for d in data)

'AVMC,100,23.33'

a = 'hello'
b = 'world'
c = 'haha'
print(a + ':' + b + ':' + c)#ugly
print(':'.join([a,b,c]))#ugly too
print(a,b,c,sep=':')#better,but...

hello:world:haha
hello:world:haha
hello:world:haha

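Choosing between string concatenation and separate I/O operations
If the two strings are small, version 1 below performs better, because every I/O system call has an inherent cost. If the two strings are large, version 2 may be more efficient, since it avoids building a large temporary string and copying large blocks of memory.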

#version 1
f.write(str1 + str2)

#version 2
f.write(str1)
f.write(str2)

If you need to output a large number of small strings, consider writing a generator function that uses yield to emit the fragments.

def sample():
    yield 'Is'
    yield 'Chicago'
    yield 'Not'
    yield 'Chicago'
text = ' '.join(sample())
text

'Is Chicago Not Chicago'

Or the fragments can be sent straight to I/O.

for part in parts:
    f.write(part)

A hybrid scheme that buffers fragments before writing them is also possible.

def combine(source,maxsize):
    parts = []
    size = 0
    for part in source:
        parts.append(part)
        size += len(part)
        if size > maxsize:
            yield ''.join(parts)
            parts = []
            size = 0
    yield ''.join(parts)

def sample():
    yield 'Is'
    yield 'Chicago'
    yield 'Not'
    yield 'Chicago'

with open(r'data_file\ch2_3_test','w') as f:
    for part in combine(sample(),32768):
        f.write(part)
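
To see the buffering at work, the hybrid approach can be tried with a tiny maxsize and an in-memory file. This is only a sketch: io.StringIO stands in for a real file, the fragments list and the maxsize of 8 are illustrative, and the final yield sits after the loop so that leftover fragments are flushed exactly once.

```python
import io

def combine(source, maxsize):
    """Buffer fragments and emit a joined chunk once the buffer exceeds maxsize."""
    parts = []
    size = 0
    for part in source:
        parts.append(part)
        size += len(part)
        if size > maxsize:
            yield ''.join(parts)
            parts = []
            size = 0
    # Flush whatever is left (may be an empty string)
    yield ''.join(parts)

fragments = ['Is', 'Chicago', 'Not', 'Chicago']  # illustrative data
chunks = list(combine(fragments, 8))
print(chunks)

buf = io.StringIO()  # stands in for a real file object
for chunk in combine(fragments, 8):
    buf.write(chunk)
print(buf.getvalue())
```

The code writing to the file never needs to know how the fragments were grouped; it just writes whatever chunks arrive.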

2.15 Interpolating variables in strings

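Problem: create a string in which embedded variable names are replaced by the string representation of their values.
Solution: use the format() method.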

s = '{name} has {n} messages'
s.format(name='Gucci',n=34)

'Gucci has 34 messages'

If the values to substitute already exist as variables in the current scope, combine format_map() with vars().

name = 'Gucci'
n = 34
s.format_map(vars())

'Gucci has 34 messages'

vars() also works with object instances.

class Info:
    def __init__(self,name,n):
        self.name = name
        self.n = n

a = Info('GUUCI',23)
a.name

'GUUCI'

s.format_map(vars(a))

'GUUCI has 23 messages'

One drawback of format() and format_map() is that they cannot cope with missing values.

s.format(name = 'Jack')

---------------------------------------------------------------------------

KeyError                                  Traceback (most recent call last)

<ipython-input-45-fd35ed0a4233> in <module>()
----> 1 s.format(name = 'Jack')


KeyError: 'n'

One way around this is to define a dict subclass with a __missing__() method, wrap the input in it, and pass the wrapper to format_map().

class safesub(dict):
    def __missing__(self,key):
        return '{'+ key +'}'

del n  # make sure n is undefined
s.format_map(safesub(vars()))

'Gucci has {n} messages'
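
The standard library's string.Template offers a comparable tolerance for missing values, at the cost of a different placeholder syntax ($name instead of {name}). A minimal sketch, assuming the template text can be rewritten in that syntax:

```python
from string import Template

t = Template('$name has $n messages')

# substitute() raises KeyError on a missing value, like format()
full = t.substitute(name='Gucci', n=34)

# safe_substitute() leaves unknown placeholders untouched
partial = t.safe_substitute(name='Gucci')

print(full)
print(partial)
```

Template does not support format specifications such as widths or alignment, so it suits simple substitution only.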

2.16 Reformatting text to a fixed number of columns

Problem: you have long strings that you want to reformat to a given column width.
Solution: use the textwrap module.

s = "Look into my eyes, look into my eyes,the eyes,the eyes,\
the eyes,not around the eyes,don't look around the eyes,\
look into my eyes,you're under."

import textwrap
print(textwrap.fill(s,70))

Look into my eyes, look into my eyes,the eyes,the eyes,the eyes,not
around the eyes,don't look around the eyes,look into my eyes,you're
under.

print(textwrap.fill(s,40))

Look into my eyes, look into my eyes,the
eyes,the eyes,the eyes,not around the
eyes,don't look around the eyes,look
into my eyes,you're under.

print(textwrap.fill(s,40,initial_indent="    "))

    Look into my eyes, look into my
eyes,the eyes,the eyes,the eyes,not
around the eyes,don't look around the
eyes,look into my eyes,you're under.

print(textwrap.fill(s,40,subsequent_indent='    '))

Look into my eyes, look into my eyes,the
    eyes,the eyes,the eyes,not around
    the eyes,don't look around the
    eyes,look into my eyes,you're under.

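The textwrap module is handy for preparing text for printing, especially when the output should fit the terminal width, which os.get_terminal_size() reports.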

import os
os.get_terminal_size().columns
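
os.get_terminal_size() raises OSError when the output is not attached to a terminal (for example when it is piped), so a wrapper with a fallback can help. A small sketch using shutil.get_terminal_size(), which accepts a fallback size; fill_to_terminal and fallback_columns are hypothetical names chosen for this example:

```python
import shutil
import textwrap

def fill_to_terminal(text, fallback_columns=80):
    """Wrap text to the current terminal width, or to a fallback width."""
    # shutil.get_terminal_size() checks the COLUMNS environment variable first,
    # then queries the terminal, and finally uses the fallback (columns, lines).
    columns = shutil.get_terminal_size(fallback=(fallback_columns, 24)).columns
    return textwrap.fill(text, width=columns)

s = ("Look into my eyes, look into my eyes, the eyes, the eyes, "
     "the eyes, not around the eyes, don't look around the eyes, "
     "look into my eyes, you're under.")
print(fill_to_terminal(s))
```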

2.17 Handling HTML and XML entities in text

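Problem: you want to replace HTML or XML entities such as &entity; or &#code; with the corresponding text, or you need to escape certain characters in text (such as <, >, or &).
Solution: to escape special characters, use html.escape().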

s = 'Element are written as "<tag>text</tag>".'
import html
print(s)

Element are written as "<tag>text</tag>".

print(html.escape(s))

Element are written as &quot;&lt;tag&gt;text&lt;/tag&gt;&quot;.

print(html.escape(s,quote=False))

Element are written as "&lt;tag&gt;text&lt;/tag&gt;".

If you are producing ASCII text and want to embed character references for the non-ASCII characters, pass errors='xmlcharrefreplace' to the relevant I/O-related function, such as encode().

s = 'Spicy Jalapeño'
s.encode('ascii',errors = 'xmlcharrefreplace')

b'Spicy Jalape&#241;o'

To go the other way and replace entities with the corresponding characters, use html.unescape(). The older HTMLParser().unescape() method was deprecated and was removed in Python 3.9.

s = 'Spicy &quot;Jalape&#241;o&quot.'
import html
html.unescape(s)

'Spicy "Jalapeño".'

t = 'The prompt is &gt;&gt;&gt;'
from xml.sax.saxutils import unescape
unescape(t)

'The prompt is >>>'
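
As a quick check that html.escape() and html.unescape() are inverses for ordinary text, a round trip can be sketched as follows; the sample string is illustrative only:

```python
import html

original = 'Spicy "Jalapeño" & <b>hot</b>'  # illustrative sample text

escaped = html.escape(original)     # escapes &, <, > and, by default, quotes
restored = html.unescape(escaped)   # converts the entities back

print(escaped)
print(restored == original)
```

Note that html.escape() leaves non-ASCII characters such as ñ alone; only the markup-significant characters are replaced.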

2.18 Tokenizing text

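Problem: you have a string that you want to parse from left to right into a stream of tokens.
Solution: see the example below.
The (?P<TOKENNAME>...) syntax assigns a name to a pattern.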

text = 'foo = 23 + 42 * 10'
tokens = [('NAME','foo'),('EQ','='),('NUM','23'),
          ('PLUS','+'),('NUM','42'),('TIMES','*'),('NUM','10')]
import re 
NAME = r'(?P<NAME>[a-zA-Z][a-zA-Z_0-9]*)'
NUM = r'(?P<NUM>\d+)'
PLUS = r'(?P<PLUS>\+)'
TIMES = r'(?P<TIMES>\*)'
EQ = r'(?P<EQ>=)'
WS = r'(?P<WS>\s)'

master_pat = re.compile('|'.join([NAME,NUM,PLUS,TIMES,EQ,WS]))
scanner = master_pat.scanner('foo=42')
scanner.match()

<_sre.SRE_Match object; span=(0, 3), match='foo'>

scanner.match()

<_sre.SRE_Match object; span=(3, 4), match='='>

_.lastgroup,_.group()

('EQ', '=')

scanner.match()
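
The repeated scanner.match() calls can be wrapped in a generator that yields one token per match. The sketch below assumes the same named-group patterns with NUM matching digits (\d+); Token and generate_tokens are illustrative names, and scanner() is an undocumented method of compiled patterns in CPython, so treat this as an assumption rather than a stable API.

```python
import re
from collections import namedtuple

Token = namedtuple('Token', ['type', 'value'])  # illustrative container

NAME = r'(?P<NAME>[a-zA-Z][a-zA-Z_0-9]*)'
NUM = r'(?P<NUM>\d+)'
PLUS = r'(?P<PLUS>\+)'
TIMES = r'(?P<TIMES>\*)'
EQ = r'(?P<EQ>=)'
WS = r'(?P<WS>\s+)'

master_pat = re.compile('|'.join([NAME, NUM, PLUS, TIMES, EQ, WS]))

def generate_tokens(pat, text):
    """Yield a Token for each successive match, stopping at the first failure."""
    scanner = pat.scanner(text)  # undocumented CPython feature
    # iter(callable, sentinel) keeps calling scanner.match() until it returns None
    for m in iter(scanner.match, None):
        yield Token(m.lastgroup, m.group())

# Drop whitespace tokens while consuming the stream
tokens = [tok for tok in generate_tokens(master_pat, 'foo = 23 + 42 * 10')
          if tok.type != 'WS']
for tok in tokens:
    print(tok)
```

The order of the alternatives matters: re tries them left to right, so a longer pattern should come before any shorter pattern that is a prefix of it.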

2.19 I did not understand this one; to be filled in later.

2.20 Performing text operations on byte strings

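Problem: how to perform common text operations (stripping, searching, and so on) on byte strings.
Solution: byte strings support most of the built-in string operations, as follows: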

data = b'Hello World'
data[0:5]

b'Hello'

data.startswith(b'Hell')

True

data.split()

[b'Hello', b'World']

data.replace(b'Hello',b'Hello Jack')

b'Hello Jack World'

The same operations also work on byte arrays.

data = bytearray(b'Hello World')
data[0:4]

bytearray(b'Hell')

data.startswith(b'Hello')

True

data.split()

[bytearray(b'Hello'), bytearray(b'World')]

data.replace(b'Hello',b'Peace')

bytearray(b'Peace World')

Regular expressions work too, but the pattern itself must also be a byte string.

data = b'FOO:BAR,SPAM'
import re 
re.split('[:,]',data)

---------------------------------------------------------------------------

TypeError                                 Traceback (most recent call last)

<ipython-input-104-ca780206615e> in <module>()
      1 data = b'FOO:BAR,SPAM'
      2 import re
----> 3 re.split('[:,]',data)


d:\program filles\python\lib\re.py in split(pattern, string, maxsplit, flags)
    210     and the remainder of the string is returned as the final element
    211     of the list."""
--> 212     return _compile(pattern, flags).split(string, maxsplit)
    213 
    214 def findall(pattern, string, flags=0):


TypeError: cannot use a string pattern on a bytes-like object

re.split(b'[:,]',data)

[b'FOO', b'BAR', b'SPAM']

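In most cases, operations on text strings also work on byte strings, but there are differences: indexing a byte string returns an integer rather than a single character.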

a = 'Hello World'
b = b'Hello World'
print(a[0],'   ',b[0])
print(a[1],'   ',b[1])

H     72
e     101
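
When a one-byte value is wanted instead of an integer, slicing rather than indexing is the usual workaround. A short sketch; the variable names are illustrative:

```python
b = b'Hello World'

first_as_int = b[0]          # indexing gives the integer byte value
first_as_bytes = b[0:1]      # slicing gives a bytes object of length 1
first_as_text = chr(b[0])    # convert the integer to a one-character str
rebuilt = bytes([72, 101])   # build bytes from a list of integer values

print(first_as_int, first_as_bytes, first_as_text, rebuilt)
```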

Also, byte strings do not have a nice printed representation unless they are decoded first.

print(b)

b'Hello World'

print(b.decode('ascii'))

Hello World

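Likewise, byte strings do not support the format() method; %-style formatting of bytes is available only from Python 3.5 onward.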
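
If formatted output is needed as bytes, one common approach is to format an ordinary text string and then encode it. The sketch below also shows the % operator applied directly to bytes, which assumes Python 3.5 or later; the sample values are illustrative:

```python
# Format as text first, then encode the result to bytes
formatted = '{:10s} {:10d} {:10.2f}'.format('ACME', 100, 490.1).encode('ascii')
print(formatted)

# On Python 3.5+, %-formatting also works on bytes directly
direct = b'%s has %d messages' % (b'ACME', 100)
print(direct)
```

With % on bytes, %s expects a bytes-like argument rather than a str.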

Python Cookbook study notes ch2_03
