Batch-merging text files with Python

I recently enrolled in a course on Coursera (a well-known MOOC platform): Machine Design, offered by the Georgia Institute of Technology.

Machine Design is a highly specialized course. It mainly covers failure theories such as static and fatigue failure, common mechanical structures, mechanical system design, and so on.

Looking over the course information and syllabus, I found that the course is taught in English and has no Chinese resources. It also uses many mechanical engineering terms. I know some of these terms from previous work and school, but only in Chinese; in English I am starting from zero. To make sure the later study goes smoothly, I downloaded the subtitle transcripts of the course videos and made a plan for myself: before each week begins, preview the transcripts to understand the main content of the coming week's lessons, especially the mechanical engineering terms used.

The first week's transcripts alone came to 11 files. Since I have been learning Python, I wrote a Python program that reads the txt transcript files in a batch, combines them, and saves the result to a single Markdown document. This avoids the tedium of opening 11 txt files one by one and copying and pasting each into the Markdown document.

First, required materials

  • Several subtitle transcript text files
  • A Python development environment

Second, steps

  • Iterate over the transcript files in order, opening each one and reading its text content

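    # Iterate over the transcript text files and read their contents
    text = []  # one entry per transcript file
    for n in range(rangew):  # rangew: number of transcript files
      # fname: name of the n-th file (how it is built is not shown here)
      with open(fname, encoding='utf-8') as f:
        ftxt = f.read()
      text.append(ftxt)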

    Where:

    The variable fname stores the name of the file to open.

    The content of the txt document is obtained with Python's file.read() method, and the returned text is assigned to ftxt.

    The variable text is a list that stores the contents of each transcript document. The list.append() method adds the content of ftxt to the end of text.

  • Save the collected content to a new Markdown document

    Testing confirmed that Python can open and save Markdown files:

    # Write to the Markdown document
    with open('subtitle.md', 'w', encoding='utf-8') as fmd:
      for t in text:
        fmd.write(t)

    This is the first use of Python's open() function here: open() opens a file and returns a file object.

    open() takes two parameters here: the first is the file name, subtitle.md above, and the second is the mode, which determines how the file is opened, w above. With mode w, the call creates a document named subtitle.md and opens it for writing; if the file already exists, its previous content is discarded.

    The program then iterates over the contents collected above and writes them to the Markdown document in order.

  • Result

    Figure 1: Top of the document generated by the code

    Figure 2: Document generated by the code (boundary between two transcripts)

    In short, the program does what it was meant to do, but the merged document is not comfortable to read:

    1. The generated file has no table of contents, so it is hard to jump quickly to a particular transcript;
    2. There are no clear boundaries between transcripts, which makes reading and searching inconvenient;
    3. Line breaks follow the timing of the video subtitles, so sentences are interrupted by many line breaks, which makes reading inconvenient.
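
The read and write steps of this first version can be sketched as one self-contained program. This is only a minimal sketch under assumptions: the original does not show how fname is built, so the helper find_transcripts and the idea that all transcripts are .txt files in one folder are hypothetical, as is the function name merge_text_files.

```python
from pathlib import Path


def find_transcripts(folder):
    """Hypothetical helper: all .txt files in `folder`, sorted by name."""
    return sorted(Path(folder).glob("*.txt"))


def merge_text_files(paths, out_path):
    """Concatenate the text files in `paths` into `out_path`, in order."""
    text = []
    for p in paths:
        # Read each file as UTF-8; `with` closes the file even on error
        with open(p, encoding="utf-8") as f:
            text.append(f.read())

    # 'w' creates the output file, or truncates it if it already exists
    with open(out_path, "w", encoding="utf-8") as out:
        for t in text:
            out.write(t)

    # Return the number of files merged
    return len(text)
```

A call such as merge_text_files(find_transcripts('week1'), 'subtitle.md') would then produce the merged file, where week1 is a made-up folder name. Note that sorting by name is lexical, so a file named 10.txt sorts before 2.txt; zero-padded names such as 02.txt avoid that.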

Third, modifications

On analysis, these problems are actually simple to solve:

  1. Add a table of contents to the Markdown document

    To add a table of contents in Markdown, in editors that support the marker, just add `[toc]` before the body of the document.

    This is solved here by writing [toc] to the document before the first transcript is written.

  2. Add a title to each transcript

    To make each transcript easy to recognize and read within the document, a title needs to be added at the start of each transcript.

    This is solved here by adding the corresponding strings before and after each text that is read.

  3. Find and replace the line breaks \n in the document

    Python's replace() method is used here to replace the line breaks \n in the text with spaces.

    replace() requires two parameters: the first is the old string (the string to be replaced), and the second is the new string that takes its place.
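
As a small illustration of the replace() step, the sketch below applies it to a made-up subtitle fragment. The function name unwrap_lines and the sample text are hypothetical and not part of the original program.

```python
def unwrap_lines(raw):
    """Turn subtitle-style line breaks into spaces."""
    # Files read in text mode already have '\r\n' translated to '\n',
    # but strings from other sources may not, so normalize first
    unified = raw.replace("\r\n", "\n")
    # str.replace(old, new) returns a new string; `raw` itself is unchanged
    return unified.replace("\n", " ")


sample = "static failure\ntheories are covered\nin week one.\n"
print(unwrap_lines(sample))
```

A line break at the very end of the text also becomes a space, so the result ends with a trailing space; calling .strip() on the result removes it if that matters.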

The revised code:

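# 1. Iterate over the transcript text files and read their contents
text = []  # one entry per transcript file
for n in range(rangew):  # rangew: number of transcript files
    # fname: name of the n-th file (how it is built is not shown here)
    with open(fname, encoding='utf-8') as f:
        ftxt = f.read()

    # Replace line breaks with spaces
    chtxt = ftxt.replace('\n', ' ')

    # Insert a heading for this transcript
    # (a space after '#' is required for a heading in CommonMark)
    addfilename = '# Subtitle-' + str(n+1) + '\n' + chtxt
    text.append(addfilename)

# 2. Write to the Markdown document
with open('transcript.md', 'w', encoding='utf-8') as fmd:
    # Write the table of contents marker
    fmd.write('[toc]\n')

    for t in text:
        fmd.write(t)

        # After each transcript, add a line break to separate it from the next
        fmd.write('\n')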
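
The formatting logic can also be separated from file handling, which makes it easy to check without touching the disk. The following is a sketch of that idea, not the author's method: build_markdown and title_prefix are hypothetical names, and it swaps replace('\n', ' ') for split() plus join(), which collapses every run of whitespace into a single space.

```python
def build_markdown(texts, title_prefix="Subtitle-"):
    """Join transcripts into one Markdown string with a [toc] marker and headings."""
    parts = ["[toc]\n"]
    for n, body in enumerate(texts, start=1):
        # split() with no arguments splits on any run of whitespace,
        # so line breaks and repeated spaces all become one space
        flat = " ".join(body.split())
        parts.append(f"# {title_prefix}{n}\n\n{flat}\n")
    # Joining with '\n' leaves a blank line between consecutive parts
    return "\n".join(parts)


print(build_markdown(["static failure\ntheories", "fatigue\nfailure"]))
```

The returned string can then be written in one call, for example with open('transcript.md', 'w', encoding='utf-8') and a single write().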

Results after the code changes:

Figure 3: Document generated by the modified code

Figure 4: Document generated by the modified code (boundary between two transcripts)

Fourth, summary

Comparing the Markdown documents generated before and after the code changes, the one produced by the modified code looks much better and is very easy to read.

When I need to merge transcripts in a batch again, I can adjust this code and finish in no time, which leaves more time for my own thinking.

Origin www.cnblogs.com/mrsin/p/12514279.html