Python: separate English words, punctuation marks, etc. with spaces

Problem Description:

In ordinary English text, spaces separate words, but punctuation marks are written directly against the neighboring word with no space in between. In particular, two parts joined by an apostrophe (such as III's or EOS've) are read as a single word when the text is split on spaces.
The purpose of this program is to rewrite an English sentence so that every word and every punctuation mark is separated from its neighbors by a space.
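
For example, splitting the sample sentence from this post with Python's built-in str.split(), which with no arguments splits on whitespace only, leaves the punctuation glued to the words:

```python
# Split on whitespace only, as str.split() does when called with no arguments
sentence = "The Sony A7 III's write speed, is faster than my previous camera (the Canon EOS've Rebel T6) allowing me."
tokens = sentence.split()

# Punctuation stays attached: tokens include "III's", "speed,", "(the", "T6)" and "me."
print(tokens)
```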

Solution:

Take the following sentence as an example:

sentence = "The Sony A7 III's write speed, is faster than my previous camera (the Canon EOS've Rebel T6) allowing me."

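import re

# Note: compared with the sentence above, this copy has an extra '!' before "faster"
sentence = "The Sony A7 III's write speed, is !faster than my previous camera (the Canon EOS've Rebel T6) allowing me."

# Special characters that should be separated from neighboring words by a space
# ('--' is omitted: the '-' entry already pads every hyphen, so '--' only doubled the spaces)
special_chars = [',', '.', '\'', '’', '“', '”', '(', ')', '[', ']', '{', '}', ':', ';', '?', '!', '-']

# Opening brackets and quotes get a space after them; all other characters get a space before them
opening_chars = {'(', '[', '{', '“'}

for char in special_chars:
    # re.escape() makes characters such as '.', '(' and '[' match literally
    pattern = re.escape(char)
    if char in opening_chars:
        sentence = re.sub(f'({pattern})', r'\1 ', sentence)
    else:
        sentence = re.sub(f'({pattern})', r' \1', sentence)

# Collapse the double spaces created where a space was already present, then trim the ends
sentence = re.sub(r' {2,}', ' ', sentence).strip()

print(sentence)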
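
The loop rewrites the sentence once per special character and pads only one side of each mark, so a mark glued to the following word (such as the '!' in "!faster") stays attached. A minimal alternative sketch, assuming the same idea of keeping an apostrophe with the letters after it, matches the tokens directly in one pass with re.findall() and joins them with spaces. The pattern TOKEN_RE and the function tokenize are hypothetical names used only for illustration.

```python
import re

# Hypothetical pattern; alternatives are tried left to right at each position:
#   [’']\w+   an apostrophe plus the letters after it, e.g. "'s", "'ve"
#   \w+       a run of letters, digits or underscores, e.g. "Sony", "A7"
#   --        a double hyphen kept as one token
#   [^\w\s]   any other single character that is neither a word character nor whitespace
TOKEN_RE = re.compile(r"[’']\w+|\w+|--|[^\w\s]")


def tokenize(sentence: str) -> str:
    """Return the sentence with every word and punctuation mark separated by one space."""
    return " ".join(TOKEN_RE.findall(sentence))


sentence = "The Sony A7 III's write speed, is !faster than my previous camera (the Canon EOS've Rebel T6) allowing me."
# prints: The Sony A7 III 's write speed , is ! faster than my previous camera ( the Canon EOS 've Rebel T6 ) allowing me .
print(tokenize(sentence))
```

Unlike the loop, this separates "!" from "faster". It also splits decimals such as 3.5 and hyphenated words such as well-known into several tokens, so the pattern would need adjusting if those must stay intact.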

Output:


Origin blog.csdn.net/weixin_41862755/article/details/130636572