20230809 Use python3 to process SRT format subtitles (DOCX) obtained by Google Translate under WIN10
2023/8/9 19:02
Since I like to watch documentaries and other foreign-language videos, I first recognize the subtitles with clipping/PR2023/AUTOSUB and then use Google Translate to translate them into Simplified Chinese DOCX documents.
After a DOCX document is converted into a TXT document, the subtitle serial numbers have to be rewritten to obtain the final Simplified Chinese SRT document.
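The renumbering step described above can be tried on an in-memory string before touching any files. This is only a minimal sketch, not one of the scripts in these notes: the function name renumber_srt, the sample text, and the damaged serial numbers "1 ." and "2 ." are assumptions made up for illustration, since the notes do not show what the numbers look like after translation.

```python
def renumber_srt(text):
    """Replace the first line of every blank-line-separated block
    with a running counter 1, 2, 3, ..."""
    out = []
    counter = 1
    expect_number = True  # True when the next line is a serial-number line
    for line in text.splitlines():
        if expect_number:
            out.append(str(counter))  # the old number line is discarded
            expect_number = False
        else:
            if line.strip() == "":
                # A blank line ends a block; the next line is a number line
                expect_number = True
                counter += 1
            out.append(line)
    return "\n".join(out) + "\n"


# Hypothetical input with damaged serial numbers (ASCII only for portability)
sample = (
    "1 .\n"
    "00:00:01,000 --> 00:00:02,000\n"
    "Hello\n"
    "\n"
    "2 .\n"
    "00:00:03,000 --> 00:00:04,000\n"
    "World\n"
)
print(renumber_srt(sample))
```

Unlike the scripts below, which test len(line) == 1, this sketch treats any whitespace-only line as a block separator; that is a design choice of the sketch, not something taken from the original scripts.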
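google.py
# Copy the first 4559 lines of 1.txt into 12.txt
#f=open("./1574/%03d.ts"%(n+1),"wb")
f = open("12.txt", "wb")
#f = open("p:\\ts\\1574.txt")
f1 = open("1.txt")
#for n in range(1,4000):
for n in range(1, 4560):
    line = f1.readline()
    #f.write(response.content)
    #f.write(line)
    # f is opened in binary mode, so the str must be encoded to bytes;
    # a file object has no decode() method, so f.decode() cannot work
    f.write(line.encode())
f.close()
f1.close()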
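google12.py
J:\! ! ! ! Documentation 20230625\en2cn\20230809 Use python3 to process SRT format subtitles (DOCX) obtained by Google Translate under WIN10\py>python google12.py > test.srt
f_path = r'1.txt'
temp = 1    # 1 means the next line is a serial-number line
xuhao = 1   # running subtitle serial number
with open(f_path) as f:
    lines = f.readlines()
for line in lines:
    if temp == 1:
        # Print the new serial number instead of the old number line
        print(str(xuhao))
        temp = 0
    else:
        if len(line) == 1:
            # A line holding only the newline separates two subtitle blocks
            #print("jiangedian!")
            temp = 1
            xuhao = xuhao + 1
        print(line.rstrip())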
txt2srt3all.py
[Converts all ANSI-encoded TXT subtitles in the directory into SRT subtitles, but does not process subdirectories!]
# coding=utf-8
import os
# Get the current directory
path = os.getcwd()
# List all files in the current directory
files = os.listdir(path)
# Traverse all files
for file in files:
    # Determine whether the file is a txt file
    if file.endswith('.txt'):
        # Construct a new file name
        #new_file = file.replace('.txt', '.json')
        #new_file = file.replace('.txt', '.srt')
        new_file = file.replace('.txt', '.cn.srt')
        # Rename the file
        #os.rename(os.path.join( path, file), os.path.join(path, new_file))
        f2 = open(new_file, "wb")
        #f_path=r'C:\Users\Admin\Desktop\shapenetcore_partanno_segmentation_benchmark_v0_normal_2\00000001\0.txt'
        #f_path =r'1.txt'
        #f_path=file
        temp = 1    # 1 means the next line is a serial-number line
        xuhao = 1   # running subtitle serial number
        #with open(f_path) as f:
        # No encoding is given, so the system default (ANSI) encoding is used
        with open(file) as f:
            lines = f.readlines()
        for line in lines:
            if temp == 1:
                #print(str(xuhao))
                #f.decode().write(line)
                #f2.decode().write(str(xuhao))
                #f2.write(str(xuhao))
                # f2 is binary, so every str is encoded before writing
                f2.write(str(xuhao).encode())
                f2.write(str('\n').encode())
                temp = 0
            else:
                if len(line) == 1:
                    # A line holding only the newline separates two blocks
                    #print("jiangedian!")
                    temp = 1
                    xuhao = xuhao + 1
                #print(line.rstrip())
                #f.decode().write(line)
                #f2.decode().write(line.rstrip())
                #f2.write(line.rstrip())
                f2.write(line.encode())
        #f=open(new_file,"wb")
        f2.close()
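One detail of the script above is the way the output name is built: str.replace swaps every '.txt' in the name, not only the extension. A minimal sketch of an alternative using os.path.splitext follows; the helper name srt_name and the file name notes.txt.old.txt are hypothetical and only serve the illustration.

```python
import os


def srt_name(txt_name):
    """Return the .cn.srt name for a .txt file, touching only the extension."""
    root, _ext = os.path.splitext(txt_name)
    return root + ".cn.srt"


# With an ordinary name both approaches agree
print(srt_name("Red.Eye.2005.eng9.txt"))
print("Red.Eye.2005.eng9.txt".replace(".txt", ".cn.srt"))

# With '.txt' in the middle of the name they differ
print(srt_name("notes.txt.old.txt"))
print("notes.txt.old.txt".replace(".txt", ".cn.srt"))
```

For the file names that appear in the log below, both forms give the same result, so this only matters if a name contains '.txt' more than once.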
LOG:
J:\! ! ! ! Documentation 20230625\en2cn\20230809 Use python3 under WIN10 to process SRT format subtitles (DOCX) obtained by Google Translate\ansi's TXT>dir
The volume in drive J is 18680688682.
The serial number of the volume is 2A59-69C0
Directory of J:\! ! ! ! Documentation 20230625\en2cn\20230809 Use python3 to process SRT format subtitles (DOCX) obtained by Google Translate under WIN10\ansi's TXT
2023/08/09  19:11    <DIR>          .
2023/08/09  19:11    <DIR>          ..
2023/08/09  12:22            67,713 August 7.txt
2023/08/09  12:22           113,997 AC3EN2.silhouette.txt
2023/08/09  12:22            67,713 path_to_your_word_file.txt
2023/08/09  12:22            75,347 Red.Eye.2005.2160p.BluRay.REMUX.HEVC.DTS-HD.MA.5.1-FGT.eng9.txt
2023/08/09  19:11             1,715 txt2srt3all.py
2023/08/07  22:29             1,671 txt2srt3xuhao56.py
               6 files        328,156 bytes
               2 directories  50,770,313,216 available bytes
J:\! ! ! ! Documentation 20230625\en2cn\20230809 Use python3 to process SRT format subtitles (DOCX) obtained by Google Translate under WIN10\ansi's TXT>python txt2srt3all.py
J:\! ! ! ! Documentation 20230625\en2cn\20230809 Use python3 under WIN10 to process SRT format subtitles (DOCX) obtained by Google Translate\ansi's TXT>dir
The volume in drive J is 18680688682.
The serial number of the volume is 2A59-69C0
Directory of J:\! ! ! ! Documentation 20230625\en2cn\20230809 Use python3 to process SRT format subtitles (DOCX) obtained by Google Translate under WIN10\ansi's TXT
2023/08/09  19:11    <DIR>          .
2023/08/09  19:11    <DIR>          ..
2023/08/09  19:11            71,024 August 7.cn.srt
2023/08/09  12:22            67,713 August 7.txt
2023/08/09  19:11           120,955 AC3EN2.silhouette.cn.srt
2023/08/09  12:22           113,997 AC3EN2.silhouette.txt
2023/08/09  19:11            71,024 path_to_your_word_file.cn.srt
2023/08/09  12:22            67,713 path_to_your_word_file.txt
2023/08/09  19:11            81,213 Red.Eye.2005.2160p.BluRay.REMUX.HEVC.DTS-HD.MA.5.1-FGT.eng9.cn.srt
2023/08/09  12:22            75,347 Red.Eye.2005.2160p.BluRay.REMUX.HEVC.DTS-HD.MA.5.1-FGT.eng9.txt
2023/08/09  19:11             1,715 txt2srt3all.py
2023/08/07  22:29             1,671 txt2srt3xuhao56.py
              10 files        672,372 bytes
               2 directories  50,769,960,960 available bytes
J:\! ! ! ! Document organization 20230625\en2cn\20230809 Using python3 to process SRT format subtitles (DOCX) obtained by Google Translate under WIN10\ansi TXT>
utf8txt2srt3all.py
[Converts all UTF8-encoded TXT subtitles in the directory into SRT subtitles, but does not process subdirectories!]
# coding=utf-8
import os
# Get the current directory
path = os.getcwd()
# List all files in the current directory
files = os.listdir(path)
# Traverse all files
for file in files:
    # Determine whether the file is a txt file
    if file.endswith('.txt'):
        # Construct a new file name
        #new_file = file.replace('.txt', '.json')
        #new_file = file.replace('.txt', '.srt')
        new_file = file.replace('.txt', '.cn.srt')
        # Rename the file
        #os.rename(os.path.join( path, file), os.path.join(path, new_file))
        #f2=open(new_file,"wb")
        #with open(new_file, "w", encoding="UTF-8") as txt_file:
        #f2 = open(new_file, "wb", encoding="UTF-8")
        # Text mode with an explicit encoding: write str, not bytes
        f2 = open(new_file, "w", encoding="UTF-8")
        temp = 1    # 1 means the next line is a serial-number line
        xuhao = 1   # running subtitle serial number
        #with open(f_path) as f:
        #with open(file) as f:
        #with open(new_file, "w", encoding="UTF-8") as txt_file:
        #with open(file, "w", encoding="UTF-8") as f:
        with open(file, "r", encoding="UTF-8") as f:
            lines = f.readlines()
        for line in lines:
            if temp == 1:
                #f2.write(str(xuhao).encode())
                #f2.write(str('\n').encode())
                f2.write(str(xuhao))
                f2.write(str('\n'))
                temp = 0
            else:
                if len(line) == 1:
                    # A line holding only the newline separates two blocks
                    #f2.write(line.encode())
                    xuhao = xuhao + 1
                    temp = 1
                f2.write(line)
        f2.close()
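The two scripts above differ mainly in how they open the files, so one possible way to avoid keeping two copies is to try UTF-8 first and fall back to GBK. The following is a minimal sketch under stated assumptions: the helper name read_lines_any and the encoding order ("utf-8-sig", then "gbk") are my own choices and not part of the original scripts, and "gbk" is assumed to be the ANSI code page because that is the codec named in the debugging record further down.

```python
import os
import tempfile


def read_lines_any(path, encodings=("utf-8-sig", "gbk")):
    """Try each encoding in turn and return the decoded lines.

    "utf-8-sig" also strips a UTF-8 byte order mark if one is present.
    """
    for enc in encodings:
        try:
            with open(path, "r", encoding=enc) as f:
                return f.readlines()
        except UnicodeDecodeError:
            continue  # wrong guess, try the next encoding
    raise ValueError("could not decode %s with %s" % (path, encodings))


# Demo: the same subtitle text saved in both encodings reads back identically
demo_text = "1\n00:00:01,000 --> 00:00:02,000\n\u4f60\u597d\n"
with tempfile.TemporaryDirectory() as tmp:
    for enc in ("utf-8", "gbk"):
        p = os.path.join(tmp, "demo_%s.txt" % enc)
        with open(p, "w", encoding=enc, newline="") as f:
            f.write(demo_text)
        print(enc, "".join(read_lines_any(p)) == demo_text)
```

UTF-8 is tried first because it is strict: GBK-encoded Chinese text is normally rejected by the UTF-8 decoder, whereas UTF-8 bytes can sometimes be decoded as GBK into wrong characters without any error.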
Reference:
https://pythonjishu.com/nwbuyryewwscpxl/
How to batch rename files using Python
python docx utf8 read and write
https://deepinout.com/python/python-qa/t_how-to-read-and-write-unicode-utf-8-files-in-python.html
How to read and write Unicode (UTF-8) files in Python?
Debugging record of the UTF8 script (it has to be written quite differently from the ANSI one!)
Microsoft Windows [Version 10.0.19045.2311]
(c) Microsoft Corporation. All rights reserved.
C:\Users\Administrator>cd J:\! ! ! ! Documentation 20230625\en2cn\20230809 Use python3 to process SRT format subtitles (DOCX) obtained by Google Translate under WIN10\utf8i's TXT
C:\Users\Administrator>j:
J:\! ! ! ! Documentation 20230625\en2cn\20230809 Use python3 under WIN10 to process SRT format subtitles (DOCX) obtained by Google Translate\utf8i's TXT>dir
The volume in drive J is 18680688682.
The serial number of the volume is 2A59-69C0
Directory of J:\! ! ! ! Documentation 20230625\en2cn\20230809 Use python3 to process SRT format subtitles (DOCX) obtained by Google Translate under WIN10\utf8i's TXT
2023/08/09  19:14    <DIR>          .
2023/08/09  19:14    <DIR>          ..
2023/08/09  12:27            78,650 8月7日.txt
2023/08/09  12:27           133,327 AC3EN2.剪影.txt
2023/08/09  12:27            78,650 path_to_save_txt+utf8_file.txt
2023/08/09  12:27            78,650 path_to_your_word_file.txt
2023/08/09  19:11             1,715 txt2srt3all.py
               5 files        370,992 bytes
               2 directories  50,769,956,864 available bytes
J:\! ! ! ! Documentation 20230625\en2cn\20230809 Use python3 to process SRT format subtitles (DOCX) obtained by Google Translate under WIN10\utf8i TXT>
J:\! ! ! ! Documentation 20230625\en2cn\20230809 Use python3 under WIN10 to process SRT format subtitles (DOCX) obtained by Google Translate\utf8i's TXT>python txt2srt3all.py
Traceback (most recent call last):
  File "J:\!!! Documentation Organize 20230625\en2cn\20230809 Use python3 to process SRT format subtitles (DOCX) obtained by Google Translate under WIN10\utf8i's TXT\txt2srt3all.py", line 34, in <module>
    lines = f.readlines()
UnicodeDecodeError: 'gbk' codec can't decode byte 0xb7 in position 82: illegal multibyte sequence
J:\! ! ! ! Documentation 20230625\en2cn\20230809 Use python3 to process SRT format subtitles (DOCX) obtained by Google Translate under WIN10\utf8i's TXT>python txt2srt3all.py
J:\! ! ! ! Documentation 20230625\en2cn\20230809 Use python3 under WIN10 to process SRT format subtitles (DOCX) obtained by Google Translate\utf8i's TXT>python utf8txt2srt3all.py
Traceback (most recent call last):
  File "J:\!!! Documentation Organize 20230625\en2cn\20230809 Use python3 under WIN10 to process SRT format subtitles (DOCX) obtained by Google Translate\utf8i's TXT\utf8txt2srt3all.py", line 23, in <module>
    f2 = open(new_file, "wb", encoding="UTF-8")
ValueError: binary mode doesn't take an encoding argument
J:\! ! ! ! Documentation 20230625\en2cn\20230809 Use python3 under WIN10 to process SRT format subtitles (DOCX) obtained by Google Translate\utf8i's TXT>python utf8txt2srt3all.py
Traceback (most recent call last):
  File "J:\!!! Documentation Organize 20230625\en2cn\20230809 Use python3 to process SRT format subtitles (DOCX) obtained by Google Translate under WIN10\utf8i's TXT\utf8txt2srt3all.py", line 33, in <module>
    lines = f.readlines()
io.UnsupportedOperation: not readable
J:\! ! ! ! Documentation 20230625\en2cn\20230809 Use python3 under WIN10 to process SRT format subtitles (DOCX) obtained by Google Translate\utf8i's TXT>python utf8txt2srt3all.py
Traceback (most recent call last):
  File "J:\!!! Documentation Organize 20230625\en2cn\20230809 Use python3 to process SRT format subtitles (DOCX) obtained by Google Translate under WIN10\utf8i's TXT\utf8txt2srt3all.py", line 38, in <module>
    f2.write(str(xuhao).encode())
TypeError: write() argument must be str, not bytes
J:\! ! ! ! Documentation 20230625\en2cn\20230809 Use python3 under WIN10 to process SRT format subtitles (DOCX) obtained by Google Translate\utf8i's TXT>python utf8txt2srt3all.py
Traceback (most recent call last):
  File "J:\!!! Documentation Organize 20230625\en2cn\20230809 Use python3 to process SRT format subtitles (DOCX) obtained by Google Translate under WIN10\utf8i's TXT\utf8txt2srt3all.py", line 40, in <module>
    f2.write(str('\n').encode())
TypeError: write() argument must be str, not bytes
J:\! ! ! ! Documentation 20230625\en2cn\20230809 Use python3 to process SRT format subtitles (DOCX) obtained by Google Translate under WIN10\utf8i's TXT>python utf8txt2srt3all.py
J:\! ! ! ! Documentation 20230625\en2cn\20230809 Use python3 to process SRT format subtitles (DOCX) obtained by Google Translate under WIN10\utf8i's TXT>python utf8txt2srt3all.py
J:\! ! ! ! Documentation 20230625\en2cn\20230809 Use python3 to process SRT format subtitles (DOCX) obtained by Google Translate under WIN10\utf8i's TXT>python utf8txt2srt3all.py
J:\! ! ! ! Documentation 20230625\en2cn\20230809 Use python3 under WIN10 to process SRT format subtitles (DOCX) obtained by Google Translate\utf8i's TXT>dir
The volume in drive J is 18680688682.
The serial number of the volume is 2A59-69C0
Directory of J:\! ! ! ! Documentation 20230625\en2cn\20230809 Use python3 to process SRT format subtitles (DOCX) obtained by Google Translate under WIN10\utf8i's TXT
2023/08/09  19:29    <DIR>          .
2023/08/09  19:29    <DIR>          ..
2023/08/09  19:29            75,580 8月7日.cn.srt
2023/08/09  12:27            78,650 8月7日.txt
2023/08/09  19:29           128,367 AC3EN2.剪影.cn.srt
2023/08/09  12:27           133,327 AC3EN2.剪影.txt
2023/08/09  19:29            75,580 path_to_save_txt+utf8_file.cn.srt
2023/08/09  12:27            78,650 path_to_save_txt+utf8_file.txt
2023/08/09  19:29            75,580 path_to_your_word_file.cn.srt
2023/08/09  12:27            78,650 path_to_your_word_file.txt
2023/08/09  19:29            86,176 Red.Eye.2005.2160p.BluRay.REMUX.HEVC.DTS-HD.MA.5.1-FGT.eng9.cn.srt
2023/08/09  19:28            89,228 Red.Eye.2005.2160p.BluRay.REMUX.HEVC.DTS-HD.MA.5.1-FGT.eng9.txt
2023/08/09  19:11             1,715 txt2srt3all.py
2023/08/09  19:24             1,568 utf8txt2srt3all.py
              12 files        903,071 bytes
               2 directories  50,767,888,384 available bytes
J:\! ! ! ! Documentation 20230625\en2cn\20230809 Using python3 to process SRT format subtitles (DOCX) obtained by Google Translate under WIN10\utf8i TXT>
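The ValueError, io.UnsupportedOperation and TypeError tracebacks in the debugging record above all follow from the same rule: a file opened in binary mode takes bytes and no encoding argument, while a file opened in text mode takes an encoding and str. The sketch below reproduces the three errors in a temporary directory and then shows the working forms; the file name demo.cn.srt and the dictionary keys are made up for this illustration.

```python
import io
import os
import tempfile

errors = {}
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.cn.srt")

    # Binary mode does not accept encoding=; open() itself raises
    try:
        open(path, "wb", encoding="UTF-8")
    except ValueError as exc:
        errors["wb with encoding"] = type(exc).__name__

    with open(path, "w", encoding="UTF-8") as f:
        # A file opened with "w" is write-only
        try:
            f.readlines()
        except io.UnsupportedOperation as exc:
            errors["readlines on w"] = type(exc).__name__
        # Text mode takes str, not bytes
        try:
            f.write(str(1).encode())
        except TypeError as exc:
            errors["bytes to text file"] = type(exc).__name__
        # Working form in text mode: plain str
        f.write(str(1))
        f.write("\n")

    # Binary mode is the mirror image: bytes are required, so .encode() is needed
    with open(path, "ab") as fb:
        fb.write("2\n".encode())

    with open(path, "r", encoding="UTF-8") as f:
        content = f.read()

print(errors)
print(repr(content))
```

This is why the ANSI script, which opens its output with "wb", calls .encode() on everything it writes, while the UTF8 script, which opens its output with "w" and encoding="UTF-8", writes the strings directly.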