Python 3 tip: making good use of string.punctuation

Preface

When working with strings, if your code starts to feel overly complicated, try the string module: it provides a number of handy constants.

>>> import string
>>> dir(string)
['Formatter', 'Template', '_ChainMap', '_TemplateMetaclass', '__all__', '__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__spec__', '_re', '_string', 'ascii_letters', 'ascii_lowercase', 'ascii_uppercase', 'capwords', 'digits', 'hexdigits', 'octdigits', 'printable', 'punctuation', 'whitespace']
>>> string.ascii_lowercase  # all lowercase ASCII letters
'abcdefghijklmnopqrstuvwxyz'
>>> string.ascii_uppercase  # all uppercase ASCII letters
'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
>>> string.hexdigits        # all hexadecimal digit characters
'0123456789abcdefABCDEF'
>>> string.whitespace       # all ASCII whitespace characters
' \t\n\r\x0b\x0c'
>>> string.punctuation      # all ASCII punctuation characters
'!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~'

Problem

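Task: count how many times each word occurs in a file or a string. Because the text contains punctuation, splitting it directly leaves the punctuation attached to the words. In the example sentence used below, for instance, "time," and "time." come out as tokens distinct from "time", and "scenery!!!" keeps its exclamation marks.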

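Splitting on each punctuation mark explicitly is possible, but when the sentence is long and contains many different kinds of punctuation it quickly becomes tedious.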
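To see the problem concretely, here is a minimal sketch that applies a plain str.split() to the example sentence from the next section and counts the result with collections.Counter. Because the punctuation stays glued to the words, the counter never sees "time" on its own.

```python
from collections import Counter

s = ("We met at the wrong time, but separated at the right time. "
     "The most urgent is to take the most beautiful scenery!!! "
     "the deepest wound was the most real emotions.")

naive_words = s.split()                  # splits on whitespace only
print(naive_words[4:6])                  # ['wrong', 'time,']

counts = Counter(naive_words)
print(counts["time"])                    # 0: the two occurrences are 'time,' and 'time.'
print(counts["time,"], counts["time."])  # 1 1
```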

Solution

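Idea: first replace every punctuation character in the sentence with a space, then split the result with split(). This is exactly where string.punctuation comes in handy.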

The code:

>>> import string  # remember to import the string module first
>>> s = "We met at the wrong time, but separated at the right time. The most urgent is to take the most beautiful scenery!!! the deepest wound was the most real emotions."
>>> for i in s:  # iterates over the original string; rebinding s below does not affect the loop
...     if i in string.punctuation:  # if the character is punctuation, replace it with a space
...         s = s.replace(i, " ")
...
>>> s
'We met at the wrong time  but separated at the right time  The most urgent is to take the most beautiful scenery    the deepest wound was the most real emotions '
>>> s.split()  # split on runs of whitespace
['We', 'met', 'at', 'the', 'wrong', 'time', 'but', 'separated', 'at', 'the', 'right', 'time', 'The', 'most', 'urgent', 'is', 'to', 'take', 'the', 'most', 'beautiful', 'scenery', 'the', 'deepest', 'wound', 'was', 'the', 'most', 'real', 'emotions']
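The transcript stops at the word list, while the task asks for word counts. Below is a minimal sketch that finishes the job. It swaps the character-by-character replace loop for str.maketrans/str.translate, which maps every punctuation character to a space in a single pass; this is an alternative technique, not the original author's code. The helper name count_words is hypothetical, and lowercasing (so "The" and "the" count as one word) is an assumption about what should be treated as the same word.

```python
import string
from collections import Counter

# Translation table mapping each ASCII punctuation character to a single space
PUNCT_TO_SPACE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def count_words(text):
    """Hypothetical helper: count words after blanking out ASCII punctuation."""
    cleaned = text.translate(PUNCT_TO_SPACE)  # one pass instead of repeated str.replace
    # Lowercasing is an assumption: "The" and "the" are treated as the same word
    return Counter(word.lower() for word in cleaned.split())


s = ("We met at the wrong time, but separated at the right time. "
     "The most urgent is to take the most beautiful scenery!!! "
     "the deepest wound was the most real emotions.")

counts = count_words(s)
print(counts["the"], counts["most"], counts["time"])  # 6 3 2
print(sum(counts.values()))                           # 30
```

To count words in a file instead of a string, the same helper can be applied to the file's contents after reading it.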

Of course, this problem can also be solved with a regular expression:

>>> import re
>>> s = "We met at the wrong time, but separated at the right time. The most urgent is to take the most beautiful scenery!!! the deepest wound was the most real emotions."
>>> re.findall(r'\b\w+\b', s)
['We', 'met', 'at', 'the', 'wrong', 'time', 'but', 'separated', 'at', 'the', 'right', 'time', 'The', 'most', 'urgent', 'is', 'to', 'take', 'the', 'most', 'beautiful', 'scenery', 'the', 'deepest', 'wound', 'was', 'the', 'most', 'real', 'emotions']

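There are many ways to solve a problem, and trying several of them is good exercise for your thinking. Whenever string manipulation starts to feel cumbersome, remember the string module and check whether it offers a simpler solution.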


Reposted from blog.csdn.net/kongsuhongbaby/article/details/83181768