Extracting the content of specific regions of a file in Python

1. Problem description:

There are two Python source files and three text files in the candidate's folder, corresponding to the two questions. Modify the code as instructed in the files to implement the following:
"The Analects" is one of the classic works of the Confucian school and mainly records the words and deeds of Confucius and his disciples. An online version of "The Analects" is provided as the file "论语.txt" ("The Analects.txt"). Its content alternates between the "original text" and the "annotations", sentence by sentence: the original text is marked with the 【原文】 ([Original]) tag and the annotations with the 【注释】 ([Comment]) tag. See the "论语.txt" file for the exact layout of its body text.
Question 1 (10 points): Modify the code in PY301-1.py to extract the original text from "论语.txt" and save it in the candidate's folder as "论语-原文.txt" ("The Analects-Original.txt"). Specific requirements: keep only the content under the 【原文】 tags in "论语.txt", without the tags themselves; remove the spaces at the beginning and end of each line; and output no blank lines. The numbers in parentheses within the original text, such as (1), refer to annotation items in the source file; keep these markers, as shown in the sample output file. See "The Analects-Original-Output Example.txt" for the expected output format. Note: the sample output file is only meant to help candidates understand the output format and serves no other purpose.
Question 2 (10 points): Modify the code in PY301-2.py to further refine "论语-原文.txt" (or "论语.txt"): remove every pair of parentheses, together with the number inside it, from each line of text, and save the result as "论语-提纯原文.txt" ("The Analects-Purified Original.txt"). See "The Analects-Purified Original-Output Example.txt" for the expected output format. Note: the sample output file is only meant to help candidates understand the output format and serves no other purpose.

Source: this is the final comprehensive application question in the Level 2 Python exam of China's National Computer Rank Examination.

2. Analysis:

① Question 1: The main skill here is reading one txt file, extracting part of its content, and writing that part to another txt file. Unlike earlier file-reading exercises, only the content of certain regions of the file is wanted, so each line must be checked as it is read. The lines can be read one at a time with readline(), or by iterating directly over the file object returned by open() in a for loop. Because each region spans several lines, a flag variable records whether the current line is inside a region: when a line containing 【原文】 is encountered, set flag to True and treat the following lines as original text; when a line containing 【注释】 is encountered, set flag back to False to indicate that the program has left the original-text region. The value of flag then decides whether a line is written to the new file. The provided answer simply iterates over the file object returned by open() with a for loop instead of calling a read method. This differs from the earlier approach and is worth learning: each iteration yields one line of the file.
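The flag logic described above can be sketched as a small function that works on any iterable of lines, so it can be tried without the real file. The function name `extract_original` and the sample lines are made up for illustration; they only assume the 【原文】/【注释】 layout described in the question.

```python
def extract_original(lines):
    """Return the stripped, non-empty lines found between an 【原文】 marker
    and the next 【注释】 marker (hypothetical helper for illustration)."""
    in_original = False  # the flag variable: True while inside an original-text block
    result = []
    for line in lines:
        if "【原文】" in line:
            in_original = True   # entering an original-text block
            continue             # the marker line itself is not kept
        if "【注释】" in line:
            in_original = False  # leaving the block; annotations follow
            continue
        text = line.strip()      # drop leading/trailing spaces and the newline
        if in_original and text:  # skip blank lines
            result.append(text)
    return result


# Made-up miniature of the input layout
sample = [
    "【原文】\n",
    "  子曰：学而时习之(1)，不亦说乎？\n",
    "\n",
    "【注释】\n",
    "(1)习：温习。\n",
    "【原文】\n",
    "有朋自远方来，不亦乐乎？\n",
]
print(extract_original(sample))
# ['子曰：学而时习之(1)，不亦说乎？', '有朋自远方来，不亦乐乎？']
```

Passing a list here, or an open file object in the real task, works the same way because the function only iterates over its argument.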

② Question 2: Building on Question 1, further clean up the extracted original text by removing the parentheses and the numbers inside them, such as (1) and (2). A simple approach is to replace each of these strings with the empty string, which effectively deletes it; str.replace() does exactly this.
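A minimal sketch comparing the replace-based idea with a regular-expression alternative. The function names are hypothetical; `max_note=23` mirrors the `range(23)` used in the answer's code, and both functions assume the markers use ASCII parentheses (full-width （1） markers would need a different pattern). The regex version is a swapped-in technique, not the exam's reference solution.

```python
import re


def purify_with_replace(line, max_note=23):
    # Delete "(0)", "(1)", ..., "(max_note - 1)" one at a time, as in the
    # answer's code; markers with larger numbers are left untouched.
    for i in range(max_note):
        line = line.replace("(" + str(i) + ")", "")
    return line


def purify_with_regex(line):
    # Alternative: one pattern removes "(" + one or more digits + ")" for any number.
    return re.sub(r"\(\d+\)", "", line)


text = "子曰：学而时习之(1)，不亦说乎(2)？"
print(purify_with_replace(text))  # 子曰：学而时习之，不亦说乎？
print(purify_with_regex(text))    # 子曰：学而时习之，不亦说乎？
```

The replace loop needs an upper bound on the note numbers, while the regex needs none; for inputs within the bound both give the same result.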

③ Iterating directly over the file object returned by open() in a for loop is also worth learning. It yields the same lines as reading the entire file and then traversing it line by line (each line still ends with "\n"), but it reads the file incrementally instead of loading all of it into memory at once.
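To check that claim without a real file, the sketch below uses io.StringIO, which behaves like a text file opened for reading; the content is made up. Iterating over the object and calling readlines() produce the same list of lines, each still carrying its "\n".

```python
import io

# io.StringIO stands in for a text file opened with open(..., "r")
content = "第一行\n第二行\n第三行\n"

f = io.StringIO(content)
lines_by_iteration = [line for line in f]  # iterate the file object directly

f = io.StringIO(content)
lines_by_readlines = f.readlines()         # read all lines into a list at once

print(lines_by_iteration == lines_by_readlines)  # True
print(repr(lines_by_iteration[0]))               # '第一行\n'
```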

3. The code is as follows:

Question 1:

My own solution:

if __name__ == '__main__':
    fi = open("论语.txt", "r")
    fo = open("论语-原文.txt", "w")
    txt = fi.readline()
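    flag = False  # True while inside an 【原文】 (original text) block
    # readline() returns "" only at end of file (a blank line comes back as "\n"),
    # so an empty result means the whole file has been read
    while txt:
        if "【原文】" in txt:
            flag = True
            txt = fi.readline()
            continue
        if "【注释】" in txt:
            flag = False
        # strip() removes leading/trailing spaces as well as the newline, as required
        txt = txt.strip()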
        if flag and txt:
            fo.write(txt + "\n")
        txt = fi.readline()
    fi.close()
    fo.close()
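The while loop above works because readline() returns "" only at end of file, whereas a blank line comes back as "\n", which is truthy. The built-in two-argument form iter(callable, sentinel) packages the same loop: it keeps calling the callable until it returns the sentinel. A minimal sketch, with a hypothetical helper name and io.StringIO standing in for the file:

```python
import io


def read_lines_with_sentinel(f):
    # iter(f.readline, "") calls f.readline() repeatedly and stops as soon as
    # it returns "" (end of file), replacing the manual while-loop bookkeeping.
    return list(iter(f.readline, ""))


f = io.StringIO("【原文】\n子曰\n\n【注释】\n")
print(read_lines_with_sentinel(f))
# ['【原文】\n', '子曰\n', '\n', '【注释】\n']
```

Note that the blank line "\n" is returned rather than ending the loop, just as in the hand-written version.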

Answer provided:

if __name__ == '__main__':
    fi = open("论语.txt", "r")
    fo = open("论语-原文.txt", "w")
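    flag = False  # True while inside an 【原文】 (original text) block
    # Iterate directly over the file object returned by open()
    for line in fi:
        if "【原文】" in line:
            flag = True
            continue
        if "【注释】" in line:
            flag = False
        # strip() removes leading/trailing spaces as well as the newline, as required
        line = line.strip()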
        if flag and line:
            fo.write(line + "\n")
    fi.close()
    fo.close()
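The same answer can also be written with `with` statements, which close both files even if an error occurs partway through. This is a sketch, not the exam's reference code: the function name `extract_original_file` is hypothetical, and the `encoding` parameter is an assumption. Leaving it as None keeps the platform default, as the code above does; pass "utf-8" or "gbk" if the actual file uses a different encoding.

```python
def extract_original_file(src, dst, encoding=None):
    # Both files are closed automatically when the with-block ends.
    with open(src, "r", encoding=encoding) as fi, open(dst, "w", encoding=encoding) as fo:
        flag = False  # True while inside an 【原文】 block
        for line in fi:
            if "【原文】" in line:
                flag = True
                continue
            if "【注释】" in line:
                flag = False
            line = line.strip()  # remove surrounding spaces and the newline
            if flag and line:
                fo.write(line + "\n")
```

For the exam task the call would be `extract_original_file("论语.txt", "论语-原文.txt")`.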

Question 2:

if __name__ == '__main__':
    fi = open("论语-原文.txt", "r")
    fo = open("论语-提纯原文.txt", "w")
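    # Iterating directly over the file object also reads the file line by line
    for line in fi:
        # Each line is a str (including its trailing "\n")
        # Delete the markers "(0)" through "(22)"; larger numbers would be left in place
        for i in range(23):
            line = line.replace("(" + str(i) + ")", "")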
        fo.write(line)
    fi.close()
    fo.close()


Original post: blog.csdn.net/qq_39445165/article/details/115218309