20230809 Use python3 to process SRT format subtitles (DOCX) obtained by Google Translate under WIN10

2023/8/9 19:02


I like to watch documentaries and other foreign-language videos. After extracting the subtitles with clipping/PR2023/AUTOSUB, I use Google Translate to translate them, which produces Simplified Chinese DOCX documents.
After a DOCX document is converted into a TXT document, the subtitle serial numbers have to be rewritten to obtain the final Simplified Chinese SRT file.
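
The article does not show how the DOCX was turned into TXT, so the following is only a minimal sketch of one possible way, using nothing but the standard library. It assumes each subtitle line is its own paragraph in the DOCX; line breaks inside a paragraph are ignored. The names `docx_to_lines` and `docx_to_txt` are made up for this illustration.

```python
import zipfile
import xml.etree.ElementTree as ET

# XML namespace used by WordprocessingML elements such as <w:p> and <w:t>
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_to_lines(docx_path):
    """Return the text of every paragraph in a DOCX file, one str per paragraph."""
    # A DOCX file is a ZIP archive; the body text lives in word/document.xml
    with zipfile.ZipFile(docx_path) as z:
        root = ET.fromstring(z.read("word/document.xml"))
    lines = []
    for para in root.iter(W_NS + "p"):
        # Join all text runs of the paragraph; an empty paragraph gives ""
        lines.append("".join(t.text or "" for t in para.iter(W_NS + "t")))
    return lines


def docx_to_txt(docx_path, txt_path, encoding="utf-8"):
    """Write the paragraphs of a DOCX file to a text file, one per line."""
    with open(txt_path, "w", encoding=encoding) as out:
        for line in docx_to_lines(docx_path):
            out.write(line + "\n")
```

For example, `docx_to_txt("1.docx", "1.txt")` would write a UTF-8 text file; the file name `1.docx` is hypothetical. Passing `encoding="gbk"` would produce an ANSI file on a Simplified Chinese Windows system.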


google.py

# Copy the first 4559 lines of 1.txt into 12.txt
f=open("12.txt","wb")

f1 = open("1.txt")
for n in range(1,4560):
    line = f1.readline()
    # f is opened in binary mode, so the str read from f1 must be encoded to bytes
    f.write(line.encode())

f1.close()
f.close()


google12.py
J:\! ! ! ! Documentation 20230625\en2cn\20230809 Use python3 to process SRT format subtitles (DOCX) obtained by Google Translate under WIN10\py>python google12.py > test.srt

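f_path=r'1.txt'

temp = 1    # 1 means the next line is a subtitle serial number
xuhao = 1   # serial number to print for the next subtitle block

with open(f_path) as f:
    lines = f.readlines()

for line in lines:
    if temp == 1:
        # Print the running counter in place of the original serial-number line
        print(str(xuhao))
        temp=0
    else:
        if len(line) == 1:
            # A line holding only the newline is the blank separator between blocks
            temp=1
            xuhao = xuhao+1
        print(line.rstrip())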
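
The renumbering idea in google12.py can also be sketched as a reusable function. This is an illustrative variant, not the script itself: the name `renumber_srt` is hypothetical, and unlike the script it treats whitespace-only lines as blank and skips extra blank lines between blocks instead of counting them as a serial-number line.

```python
def renumber_srt(lines):
    """Replace the first line of each blank-line-separated block with 1, 2, 3, ..."""
    out = []
    number = 1
    expect_number = True  # True while the next non-blank line is a serial number
    for line in lines:
        text = line.rstrip("\r\n")
        is_blank = text.strip() == ""
        if expect_number:
            if is_blank:
                continue  # ignore stray blank lines before a block
            out.append(str(number))  # drop the old serial number, emit the new one
            expect_number = False
        elif is_blank:
            out.append("")  # keep one blank separator after the block
            number += 1
            expect_number = True
        else:
            out.append(text)  # timestamp or subtitle text, copied unchanged
    return out
```

The function returns the lines without trailing newlines, so a caller could write them back with `"\n".join(...)` or print them one by one.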


txt2srt3all.py
[Converts all ANSI-encoded TXT subtitles in the current directory to SRT subtitles; subdirectories are not processed!]

# coding=utf-8
import os

# Get the current directory
path = os.getcwd()
# List all files in the current directory
files = os.listdir(path)

# Traverse all files
for file in files:
    # Determine whether the file is a txt file
    if file.endswith('.txt'):
        # Construct the output file name, e.g. 1.txt -> 1.cn.srt
        new_file = file.replace('.txt', '.cn.srt')

        # The output is written as bytes; str.encode() defaults to UTF-8
        f2=open(new_file,"wb")

        temp = 1    # 1 means the next line is a subtitle serial number
        xuhao = 1   # serial number to write for the next subtitle block

        # No encoding is given, so the file is read with the system default (ANSI) encoding
        with open(file) as f:
            lines = f.readlines()

        for line in lines:
            if temp == 1:
                # Write the running counter in place of the original serial-number line
                f2.write(str(xuhao).encode())
                f2.write(str('\n').encode())
                temp=0
            else:
                if len(line) == 1:
                    # A line holding only the newline ends the current subtitle block
                    temp=1
                    xuhao = xuhao+1
                f2.write(line.encode())

        f2.close()
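
The file selection in the script can also be sketched with `pathlib`. This is an assumption-laden alternative rather than the script's own method: the helper name `srt_targets` is made up here, `is_file()` is used so that a subdirectory whose name happens to end in `.txt` is skipped, and only the final `.txt` suffix is replaced.

```python
from pathlib import Path


def srt_targets(directory="."):
    """Return (source, target) path pairs for the .txt files directly inside directory."""
    pairs = []
    for src in sorted(Path(directory).iterdir()):
        # is_file() skips subdirectories, even ones whose name ends in .txt
        if src.is_file() and src.suffix == ".txt":
            # Only the last suffix is replaced: a.b.txt -> a.b.cn.srt
            pairs.append((src, src.with_name(src.stem + ".cn.srt")))
    return pairs
```

Each pair could then be fed to the renumbering loop, opening the first path for reading and the second for writing.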

 


LOG:
J:\! ! ! ! Documentation 20230625\en2cn\20230809 Use python3 under WIN10 to process SRT format subtitles (DOCX) obtained by Google Translate\ansi's TXT>dir
 Volume in drive J is 18680688682
 Volume Serial Number is 2A59-69C0

 Directory of J:\! ! ! ! Documentation 20230625\en2cn\20230809 Use python3 to process SRT format subtitles (DOCX) obtained by Google Translate under WIN10\ansi TXT

2023/08/09 19:11 <DIR> .
2023/08/09 19:11 <DIR> ..
2023/08/09 12:22 67,713 August 7.txt
2023/08/09 12:22 113,997 AC3EN2.silhouette.txt
2023/08/09 12:22 67,713 path_to_your_word_file.txt
2023/08/09 12:22 75,347 Red.Eye.2005.2160p.BluRay.REMUX.HEVC.DTS-HD.MA.5.1-FGT.eng9.txt
2023/08/09 19:11 1,715 txt2srt3all.py
2023/08/07 22:29 1,671 txt2srt3xuhao56.py
               6 files 328,156 bytes
               2 directories 50,770,313,216 available bytes

J:\! ! ! ! Documentation 20230625\en2cn\20230809 Use python3 to process SRT format subtitles (DOCX) obtained by Google Translate under WIN10\ansi's TXT>python txt2srt3all.py

J:\! ! ! ! Documentation 20230625\en2cn\20230809 Use python3 under WIN10 to process SRT format subtitles (DOCX) obtained by Google Translate\ansi's TXT>dir
 Volume in drive J is 18680688682
 Volume Serial Number is 2A59-69C0

 Directory of J:\! ! ! ! Documentation 20230625\en2cn\20230809 Use python3 to process SRT format subtitles (DOCX) obtained by Google Translate under WIN10\ansi TXT

2023/08/09 19:11 <DIR> .
2023/08/09 19:11 <DIR> ..
2023/08/09 19:11 71,024 August 7.cn.srt
2023/08/09 12:22 67,713 August 7.txt
2023/08/09 19:11 120,955 AC3EN2.silhouette.cn.srt
2023/08/09 12:22 113,997 AC3EN2.silhouette.txt
2023/08/09 19:11 71,024 path_to_your_word_file.cn.srt
2023/08/09 12:22 67,713 path_to_your_word_file.txt
2023/08/09 19:11 81,213 Red.Eye.2005.2160p.BluRay.REMUX.HEVC.DTS-HD.MA.5.1-FGT.eng9.cn.srt
2023/08/09 12:22 75,347 Red.Eye.2005.2160p.BluRay.REMUX.HEVC.DTS-HD.MA.5.1-FGT.eng9.txt
2023/08/09 19:11 1,715 txt2srt3all.py
2023/08/07 22:29 1,671 txt2srt3xuhao56.py
              10 files 672,372 bytes
               2 directories 50,769,960,960 available bytes

J:\! ! ! ! Document organization 20230625\en2cn\20230809 Using python3 to process SRT format subtitles (DOCX) obtained by Google Translate under WIN10\ansi TXT>

 


utf8txt2srt3all.py
[Converts all UTF8-encoded TXT subtitles in the current directory to SRT subtitles; subdirectories are not processed!]

# coding=utf-8
import os

# Get the current directory
path = os.getcwd()
# List all files in the current directory
files = os.listdir(path)

# Traverse all files
for file in files:
    # Determine whether the file is a txt file
    if file.endswith('.txt'):
        # Construct a new file name
        #new_file = file.replace('.txt', '.json')
        #new_file = file.replace('.txt', '.srt')
        new_file = file.replace('.txt', '.cn.srt')
        # Rename the file
        #os.rename(os.path.join( path, file), os.path.join(path, new_file))
        
        
        #f2=open(new_file,"wb")
        #with open(new_file, "w", encoding="UTF-8") as txt_file:
        #f2 = open(new_file, "wb", encoding="UTF-8")
        f2 = open(new_file, "w",encoding="UTF-8")
        
        temp = 1
        xuhao = 1;
        
        #with open(f_path) as f:
        #with open(file) as f:
        #with open(new_file, "w", encoding="UTF-8") as txt_file:
        #with open(file, "w", encoding="UTF-8") as f:
        with open(file, "r", encoding="UTF-8") as f:
            lines = f.readlines()
        
        for line in lines:
            if temp == 1:
                #f2.write(str(xuhao).encode())
                #f2.write(str('\n').encode())
                f2.write(str(xuhao))
                f2.write(str('\n'))
                temp=0
            else:
                if len(line) == 1:
                    #f2.write(line.encode())
                    xuhao = xuhao+1
                    temp=1

                f2.write(line)

        f2.close()


Reference:
https://pythonjishu.com/nwbuyryewwscpxl/
How to batch rename files using Python


python docx utf8 read and write
https://deepinout.com/python/python-qa/t_how-to-read-and-write-unicode-utf-8-files-in-python.html
How to read and write Unicode (UTF-8) files in Python?

 

Debugging record for the UTF8 script. It has to be written quite differently from the ANSI one!

Microsoft Windows [Version 10.0.19045.2311]
(c) Microsoft Corporation. all rights reserved.

C:\Users\Administrator>cd J:\! ! ! ! Document arrangement 20230625\en2cn\20230809 Use python3 to process SRT format subtitles (DOCX)\utf8i TXT obtained by Google Translate under WIN10

C:\Users\Administrator>j:

J:\! ! ! ! Documentation 20230625\en2cn\20230809 Use python3 under WIN10 to process SRT format subtitles (DOCX) obtained by Google Translate\utf8i's TXT>dir
 Volume in drive J is 18680688682
 Volume Serial Number is 2A59-69C0

 Directory of J:\! ! ! ! Documentation 20230625\en2cn\20230809 Use python3 to process SRT format subtitles (DOCX) obtained by Google Translate under WIN10\utf8i TXT

2023/08/09 19:14 <DIR> .
2023/08/09 19:14 <DIR> ..
2023/08/09 12:27 78,650 August 7.txt
2023/08/09 12:27 133,327 AC3EN2.silhouette.txt
2023/08/09 12:27 78,650 path_to_save_txt+utf8_file.txt
2023/08/09 12:27 78,650 path_to_your_word_file.txt
2023/08/09 19:11 1,715 txt2srt3all.py
               5 files 370,992 bytes
               2 directories 50,769,956,864 available bytes

J:\! ! ! ! Documentation 20230625\en2cn\20230809 Use python3 to process SRT format subtitles (DOCX) obtained by Google Translate under WIN10\utf8i TXT>
J:\! ! ! ! Documentation 20230625\en2cn\20230809 Use python3 under WIN10 to process SRT format subtitles (DOCX) obtained by Google Translate\utf8i's TXT>python txt2srt3all.py
Traceback (most recent call last):
  File "J:\!!! Documentation Organize 20230625\en2cn\20230809 Use python3 to process SRT format subtitles (DOCX) obtained by Google Translate under WIN10\utf8i's TXT\txt2srt3all.py", line 34, in <module>
    lines = f.readlines()
UnicodeDecodeError: 'gbk' codec can't decode byte 0xb7 in position 82: illegal multibyte sequence
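
The UnicodeDecodeError above comes from `open()` being called without an `encoding` argument, so Python falls back to the locale default (gbk here) while the files in this directory are UTF-8. A minimal sketch of one way around keeping two separate scripts is to try several candidate encodings in order. The helper name `read_lines_any` and the candidate order are assumptions for illustration only; pure-ASCII files decode under either encoding, so the reported encoding is just the first one that worked.

```python
def read_lines_any(path, encodings=("utf-8-sig", "gbk")):
    """Read a text file, trying each candidate encoding until one decodes cleanly.

    Returns a (lines, encoding_used) tuple.
    """
    if not encodings:
        raise ValueError("at least one candidate encoding is required")
    last_error = None
    for enc in encodings:
        try:
            with open(path, "r", encoding=enc) as f:
                return f.readlines(), enc
        except UnicodeDecodeError as err:
            last_error = err  # remember the failure and try the next candidate
    raise last_error
```

"utf-8-sig" is tried first because it also accepts UTF-8 files that start with a byte order mark, which some Windows editors add.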

J:\! ! ! ! Documentation 20230625\en2cn\20230809 Use python3 to process SRT format subtitles (DOCX) obtained by Google Translate under WIN10\utf8i's TXT>python txt2srt3all.py

J:\! ! ! ! Documentation 20230625\en2cn\20230809 Use python3 under WIN10 to process SRT format subtitles (DOCX) obtained by Google Translate\utf8i's TXT>python utf8txt2srt3all.py
Traceback (most recent call last):
  File "J:\!!! Documentation Organize 20230625\en2cn\20230809 Use python3 under WIN10 to process SRT format subtitles (DOCX) obtained by Google Translate\utf8i's TXT\utf8txt2srt3all.py", line 23, in <module>
    f2 = open(new_file, "wb", encoding="UTF-8")
ValueError: binary mode doesn't take an encoding argument

J:\! ! ! ! Documentation 20230625\en2cn\20230809 Use python3 under WIN10 to process SRT format subtitles (DOCX) obtained by Google Translate\utf8i's TXT>python utf8txt2srt3all.py
Traceback (most recent call last):
  File "J:\!!! Documentation Organize 20230625\en2cn\20230809 Use python3 to process SRT format subtitles (DOCX) obtained by Google Translate under WIN10\utf8i's TXT\utf8txt2srt3all.py", line 33, in <module>
    lines = f.readlines()
io.UnsupportedOperation: not readable

J:\! ! ! ! Documentation 20230625\en2cn\20230809 Use python3 under WIN10 to process SRT format subtitles (DOCX) obtained by Google Translate\utf8i's TXT>python utf8txt2srt3all.py
Traceback (most recent call last):
  File "J:\!!! Documentation Organize 20230625\en2cn\20230809 Use python3 to process SRT format subtitles (DOCX) obtained by Google Translate under WIN10\utf8i's TXT\utf8txt2srt3all.py", line 38, in <module>
    f2.write(str(xuhao).encode())
TypeError: write() argument must be str, not bytes

J:\! ! ! ! Documentation 20230625\en2cn\20230809 Use python3 under WIN10 to process SRT format subtitles (DOCX) obtained by Google Translate\utf8i's TXT>python utf8txt2srt3all.py
Traceback (most recent call last):
  File "J:\!!! Documentation Organize 20230625\en2cn\20230809 Use python3 to process SRT format subtitles (DOCX) obtained by Google Translate under WIN10\utf8i's TXT\utf8txt2srt3all.py", line 40, in <module>
    f2.write(str('\n').encode())
TypeError: write() argument must be str, not bytes
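
The ValueError and the two TypeErrors above all come from mixing text mode and binary mode. As a small sketch of the two consistent combinations (the function names are hypothetical, and `newline="\n"` is an added assumption so that both variants produce identical bytes on every platform):

```python
def write_srt_text(path, lines):
    """Text mode: pass str objects; open() performs the UTF-8 encoding."""
    with open(path, "w", encoding="utf-8", newline="\n") as f:
        for line in lines:
            f.write(line)  # str in, encoded by the file object


def write_srt_binary(path, lines):
    """Binary mode: no encoding argument allowed; each str is encoded by hand."""
    with open(path, "wb") as f:
        for line in lines:
            f.write(line.encode("utf-8"))  # bytes in, written unchanged
```

The ANSI script uses the binary style and the UTF8 script uses the text style; passing `encoding` together with "wb", or passing bytes to a text-mode file, produces the errors recorded in the log.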

J:\! ! ! ! Documentation 20230625\en2cn\20230809 Use python3 to process SRT format subtitles (DOCX) obtained by Google Translate under WIN10\utf8i's TXT>python utf8txt2srt3all.py

J:\! ! ! ! Documentation 20230625\en2cn\20230809 Use python3 to process SRT format subtitles (DOCX) obtained by Google Translate under WIN10\utf8i's TXT>python utf8txt2srt3all.py

J:\! ! ! ! Documentation 20230625\en2cn\20230809 Use python3 to process SRT format subtitles (DOCX) obtained by Google Translate under WIN10\utf8i's TXT>python utf8txt2srt3all.py

J:\! ! ! ! Documentation 20230625\en2cn\20230809 Use python3 under WIN10 to process SRT format subtitles (DOCX) obtained by Google Translate\utf8i's TXT>dir
 Volume in drive J is 18680688682
 Volume Serial Number is 2A59-69C0

 Directory of J:\! ! ! ! Documentation 20230625\en2cn\20230809 Use python3 to process SRT format subtitles (DOCX) obtained by Google Translate under WIN10\utf8i TXT

2023/08/09  19:29    <DIR>          .
2023/08/09  19:29    <DIR>          ..
2023/08/09  19:29            75,580 8月7日.cn.srt
2023/08/09  12:27            78,650 8月7日.txt
2023/08/09  19:29           128,367 AC3EN2.剪影.cn.srt
2023/08/09  12:27           133,327 AC3EN2.剪影.txt
2023/08/09  19:29            75,580 path_to_save_txt+utf8_file.cn.srt
2023/08/09  12:27            78,650 path_to_save_txt+utf8_file.txt
2023/08/09  19:29            75,580 path_to_your_word_file.cn.srt
2023/08/09  12:27            78,650 path_to_your_word_file.txt
2023/08/09  19:29            86,176 Red.Eye.2005.2160p.BluRay.REMUX.HEVC.DTS-HD.MA.5.1-FGT.eng9.cn.srt
2023/08/09  19:28            89,228 Red.Eye.2005.2160p.BluRay.REMUX.HEVC.DTS-HD.MA.5.1-FGT.eng9.txt
2023/08/09  19:11             1,715 txt2srt3all.py
2023/08/09  19:24             1,568 utf8txt2srt3all.py
              12 files 903,071 bytes
               2 directories 50,767,888,384 available bytes

J:\! ! ! ! Documentation 20230625\en2cn\20230809 Using python3 to process SRT format subtitles (DOCX) obtained by Google Translate under WIN10\utf8i TXT>

 

Origin blog.csdn.net/wb4916/article/details/132196113