1. Problem statement

The attachment contains 2 Python source files and 3 text files, covering the two questions below. Referring to the provided .py templates, modify the code to implement the following functions. "The Analects of Confucius" is one of the classic works of the Confucian school and mainly records the words and deeds of Confucius and his disciples. An online version of the text is provided in the file "The Analects of Confucius.txt" (论语.txt in the code). (This problem has 2 questions in total; the first question is answered locally.)

Question 1: Modify the code in PY301-1.py to extract the original text from "The Analects of Confucius.txt" and save the output in the examinee's folder as "Analects-Original Text.txt" (论语-原文.txt in the code). Specific requirements: keep only the content under the [Original Text] (【原文】) labels, drop the labels themselves, strip the spaces at the beginning and end of each line, and leave no blank lines. The parentheses and the numbers inside them mark annotation items in the source file; keep them. For the output format, refer to "Analects-Original Text-Output Example.txt". Note: the sample output file is only for understanding the output format and serves no other purpose.

Question 2: Modify the code in PY301-2.py to further purify "Analects-Original Text.txt" or "The Analects of Confucius.txt": remove all parentheses and the numbers inside them from each line of text, and save the result as "Analects-Purified Original Text.txt" (论语-提纯原文.txt in the code). For the output format, refer to "Analects-Purified Original Text-Output Example.txt". Note: the sample output file is only for understanding the output format and serves no other purpose.

Tip: It is recommended to use IDLE, Python's integrated development environment, to write, debug and verify the programs.

2. Answer to Question 1

1. The official solution

Question 1. Extracting the region that follows [Original Text] differs from single-line extraction, because the region spans several lines. A flag is therefore needed, which is the variable a. When a line containing 【原文】 ([Original Text]) is read, a is set to 1, and the lines read afterwards are treated as part of that [Original Text] block. When a line containing 【注释】 ([Comment]) is read, a is set to 0, indicating that the program has left the [Original Text] region. The value of a then determines whether a line is written to the new file.

# -*- coding:utf-8 -*-
'''
This is a python123.io file.
'''
fi = open("论语.txt", "r")
fo = open("论语-原文.txt", "w")
a=0
for line in fi:
    if a==1 and line.count("【注释】")==0 and line.count("【原文】")==0:
        line = line.strip(" \n")
        if line.strip():  # skip the line if it is empty
            fo.write('{}\n'.format(line))
    if line.count("【原文】")>0:
        a=1
    if line.count("【注释】")>0:
        a=0
fi.close()
fo.close()
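
The flag logic described above can also be tried without any files. The following is a minimal sketch, not the official solution: the function name extract_original and the sample lines are made up for illustration, and it assumes the tags appear as 【原文】 and 【注释】 exactly as in the listing above.

```python
def extract_original(lines):
    """Return the stripped, non-empty lines found inside 【原文】 blocks."""
    inside = False  # plays the role of the flag variable a
    kept = []
    for line in lines:
        if "【原文】" in line:
            inside = True    # entering an original-text block
        elif "【注释】" in line:
            inside = False   # leaving the block
        elif inside and line.strip():
            kept.append(line.strip())
    return kept


# Made-up sample input, not taken from the real file.
sample = [
    "【原文】\n",
    "  first line(1)  \n",
    "\n",
    "second line\n",
    "【注释】\n",
    "(1) a note\n",
    "【原文】\n",
    "third line\n",
]
print(extract_original(sample))
```

Keeping the logic in a function that takes any iterable of lines makes it easy to check on a small list before running it on the real file.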

2. Personal Optimized Edition

Replace the line.count() tests with the in / not in operators, and replace line.count("【原文】")>0 with line.strip()=="【原文】". The equality test assumes that each tag stands alone on its line. The optimized version is as follows:

fi = open("论语.txt", "r", encoding="utf-8")
fo = open("论语-原文.txt", "w", encoding="utf-8")
flag = 0  # 1 while inside a 【原文】 block, 0 otherwise
for line in fi:
    if flag == 1 and "【原文】" not in line and "【注释】" not in line:
        line = line.strip()
        if line:  # skip blank lines
            fo.write(line + "\n")
    if line.strip() == "【原文】":
        flag = 1
    if line.strip() == "【注释】":
        flag = 0
fi.close()
fo.close()

3. Regular expression solution

We use re.findall() from the re module to capture the content between 【原文】 ([Original Text]) and the next 【 tag, then strip each line and write the non-empty ones to the output file. The code is as follows:

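import re
with open("论语.txt", "r", encoding="utf-8") as fi, open("论语-原文.txt", "w", encoding="utf-8") as fo:
    # Capture everything between 【原文】 and the next 【 tag; re.S lets . match newlines
    blocks = re.findall(r"【原文】(.*?)【", fi.read(), re.S)
    for block in blocks:
        # Strip every line of the block and drop the blank ones
        for line in block.splitlines():
            line = line.strip()
            if line:
                fo.write(line + "\n")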
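
One limitation of a pattern that stops at the next 【 is that a final 【原文】 block with no tag after it would not be matched. Whether the real file ends that way is not known here, so the following is only a hedged sketch on made-up sample text: it uses a lookahead, (?=【|\Z), so that a block may end either at the next tag or at the end of the text.

```python
import re

# Made-up sample text; the real file layout is assumed to look similar.
sample = "【原文】\n  line one(1)\n\nline two\n【注释】\n(1) note\n【原文】\nline three\n"

# (?=【|\Z) stops each match at the next tag or at the end of the text
# without consuming it, so the final block is captured too.
blocks = re.findall(r"【原文】(.*?)(?=【|\Z)", sample, re.S)

# Strip every line and drop the blank ones.
lines = [ln.strip() for block in blocks for ln in block.splitlines() if ln.strip()]
print(lines)
```

Because the lookahead does not consume the following 【, the next search can start right at that tag.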

3. Answer to Question 2

The task is to remove the parentheses and the numbers inside them, which is simpler than the previous question.

1. Official solution: use replace to replace

fi = open("论语-原文.txt",'r')
fo = open("论语-提纯原文.txt",'w')
for line in fi:
    for k in range(100):
        line=line.replace('(' + str(k) + ')', '')
    fo.write(line)
fi.close()
fo.close()

2. Personal solution: use re.sub() to replace

Use re.sub(pattern, replacement, string) to replace every parenthesized number with an empty string, then write the result to the output file. The code is as follows:

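import re
fi = open("论语-原文.txt", 'r', encoding="utf-8")
fo = open("论语-提纯原文.txt", 'w', encoding="utf-8")
# \(\d+\) matches a pair of parentheses enclosing one or more digits
text = re.sub(r"\(\d+\)", "", fi.read())
fo.write(text)  # write the purified text to the output file
fi.close()
fo.close()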
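
To see how the two approaches differ, the sketch below applies both to a made-up string. The helper names purify_replace and purify_regex are hypothetical, and the sample assumes ASCII parentheses as in the listings above. The replace loop only covers the numbers 0 to 99, while \d+ matches a number of any length.

```python
import re


def purify_replace(line):
    # The replace-based approach: try every number from 0 to 99.
    for k in range(100):
        line = line.replace("(" + str(k) + ")", "")
    return line


def purify_regex(line):
    # \d+ matches one or more digits, so any annotation number is covered.
    return re.sub(r"\(\d+\)", "", line)


sample = "alpha(1) beta(12) gamma(123)"  # made-up text
print(purify_replace(sample))
print(purify_regex(sample))
```

If the annotation numbers in the real file never exceed 99, both approaches give the same result.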

4. Matters needing attention

Replacing with a regular expression is relatively simple, and re is part of the standard library, so no additional installation is required.
The encoding="utf-8" argument may not be needed when answering online, but it must be passed during the exam, because without it open() falls back to the platform's default encoding, which may not be UTF-8.
Opening files with with open() is more convenient, because the file is closed automatically. Assigning the handle to a variable also works, but be sure to call close() at the end.
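
As a small illustration of the last two notes, the sketch below writes and reads a file with an explicit encoding inside with blocks. It uses a temporary directory and a hypothetical file name, demo.txt, so that it does not touch the exam files.

```python
import os
import tempfile

# A temporary directory keeps the sketch from touching real files.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.txt")  # hypothetical file name

    # The with statement closes the file automatically, even on errors.
    with open(path, "w", encoding="utf-8") as fo:
        fo.write("【原文】\n")

    with open(path, "r", encoding="utf-8") as fi:
        content = fi.read()

# Both handles are already closed here; no explicit close() call was needed.
closed_after_with = fo.closed and fi.closed
print(content, closed_after_with)
```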

Python Level 2 Comprehensive Application Questions: "The Analects of Confucius"