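Problem description:
English text is normally split on spaces, but punctuation marks and other special characters are not separated from the words next to them. To split English text into units of words and punctuation marks (or other special characters), a space has to be inserted between each such character and the adjacent word first.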
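A minimal illustration of the problem, using the sample sentence that appears later in this post: splitting on whitespace alone leaves the apostrophe and the final period attached to their words.

```python
# Sample sentence taken from the example later in this post.
sentence = "You'll notice a significant difference in height between this compact camera and a DSLR camera."

# str.split() with no arguments splits on runs of whitespace only,
# so punctuation stays glued to the neighbouring word.
tokens = sentence.split()

print(tokens[0])   # You'll
print(tokens[-1])  # camera.
```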
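Solution:
Define a function process_sentence() that takes the text to be processed, sentence, and returns the processed sentence with spaces inserted around the special characters.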
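import re

def process_sentence(sentence):
    # Special characters that should be separated by a space, e.g. the apostrophe in ’s.
    # '-' is deliberately left out: many sub and obj entries use '-' as a hyphen,
    # so hyphens in the original sentence must not be padded with spaces.
    special_chars = [',', '.', '\'', '’', '“', '”', '(', ')', '[', ']', '{', '}', ':', ';', '?', '!']
    for char in special_chars:
        # re.escape keeps regex metacharacters such as '[' or '.' literal.
        pattern = rf'({re.escape(char)})'
        if char == '(':
            # Special case: an opening parenthesis gets the space after it.
            sentence = re.sub(pattern, r'\1 ', sentence)
        else:
            # Every other special character gets the space before it.
            sentence = re.sub(pattern, r' \1', sentence)
    return sentence

sentence = "You'll notice a significant difference in height between this compact camera and a DSLR camera."
new_sentence = process_sentence(sentence)
new_sentence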
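The result is:
"You 'll notice a significant difference in height between this compact camera and a DSLR camera ."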
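Once the spaces are in place, a plain split on whitespace yields one token per word or special character. The sketch below is one possible variant, not the original post's code: it pads all characters in a single pass with one compiled pattern and then splits. The names SPECIAL_CHARS, process_sentence_single_pass and tokenize are hypothetical and introduced here only for illustration.

```python
import re

# Same characters as in process_sentence(); '-' is left out on purpose
# so that hyphenated terms stay in one piece.
SPECIAL_CHARS = ",.'’“”()[]{}:;?!"

# One character class built from the escaped characters.
_PATTERN = re.compile("[" + re.escape(SPECIAL_CHARS) + "]")

def _pad(match):
    """Return the matched character with a space after '(' or before anything else."""
    char = match.group(0)
    return char + " " if char == "(" else " " + char

def process_sentence_single_pass(sentence):
    """Hypothetical single-pass variant of process_sentence()."""
    return _PATTERN.sub(_pad, sentence)

def tokenize(sentence):
    """Pad the special characters, then split on whitespace."""
    return process_sentence_single_pass(sentence).split()

print(tokenize("You'll see (sub-obj) pairs."))
```

Because the inserted spaces are never special characters themselves, padding everything in one pass should give the same string as applying one substitution per character.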