How to split an English text so that words, punctuation marks, and other units are each separated by spaces

Table of contents

Problem Description:

Solution:


Problem Description:

English text uses spaces to separate words, but punctuation marks and other special characters are not separated from the words they touch. To make every word and every punctuation mark its own space-separated unit, a space has to be inserted between each special character and the adjacent word.

Solution:

Define a function process_sentence() that takes the sentence to be processed and returns it with a space inserted next to each special character.

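import re

def process_sentence(sentence):
    # Special characters that need a space added next to them (this also splits off clitics such as ’s).
    # '-' is deliberately left out: many sub and obj values use '-' as a hyphen,
    # so no space may be added around it in the original sentence.
    special_chars = [',', '.', '\'', '’', '“', '”', '(', ')', '[', ']', '{', '}', ':', ';', '?', '!']

    for char in special_chars:
        # re.escape keeps characters such as '[' and ']' literal inside the character class
        pattern = rf'([{re.escape(char)}])'
        if char == '(':  # special case: a left parenthesis gets its space after it
            sentence = re.sub(pattern, r'\1 ', sentence)
        else:  # every other special character gets a space in front of it
            sentence = re.sub(pattern, r' \1', sentence)
    return sentence

sentence = "You'll notice a significant difference in height between this compact camera and a DSLR camera."
new_sentence = process_sentence(sentence)
print(new_sentence)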

The result is:

You 'll notice a significant difference in height between this compact camera and a DSLR camera .
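As a possible variation, the loop over individual characters can be collapsed into one compiled character class, and the spaced sentence can then be split into a list of units. This is only a sketch under the same rules as above (space before every special character, space after a left parenthesis). The names SPECIAL_CHARS, _BEFORE, and process_sentence_single_pass are hypothetical and not part of the original post.

```python
import re

# Same characters as in process_sentence(), except '(' which is handled separately.
SPECIAL_CHARS = [',', '.', "'", '’', '“', '”', ')', '[', ']', '{', '}', ':', ';', '?', '!']

# One character class that matches any of the special characters.
_BEFORE = re.compile('([' + ''.join(re.escape(c) for c in SPECIAL_CHARS) + '])')

def process_sentence_single_pass(sentence):
    # Insert a space in front of every special character in one substitution.
    sentence = _BEFORE.sub(r' \1', sentence)
    # A left parenthesis gets its space after it instead.
    return sentence.replace('(', '( ')

spaced = process_sentence_single_pass("Wait (really)?")
print(spaced)          # Wait ( really ) ?
print(spaced.split())  # ['Wait', '(', 'really', ')', '?']
```

Calling split() on the result turns each word and each punctuation mark into its own list element, since consecutive spaces are treated as a single separator.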

Origin blog.csdn.net/weixin_41862755/article/details/131648070