Data processing | Operations on txt files (Python scripts)

1. Delete duplicate values in a txt file

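def remove_duplicates(src='./newFile.txt', dst='./test.txt'):
    """Copy src to dst, keeping only the first occurrence of each line."""
    # src: txt file that contains duplicate values (one value per line)
    # dst: new txt file written after the duplicates are removed
    seen = set()
    with open(src, 'r', encoding='utf-8') as f_read, \
            open(dst, 'w', encoding='utf-8') as f_write:
        for line in f_read:
            line = line.rstrip('\n')
            if line not in seen:  # first time this value appears: keep it
                f_write.write(line + '\n')
                seen.add(line)


remove_duplicates()
print('Done')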

Example:

The original file is shown in the figure: it contains repeated values, and the txt file has only one column.

After deletion, no value in this column appears more than once.
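If the file fits in memory, the same order-preserving deduplication can be written more compactly. A minimal sketch, assuming Python 3.7+, where dict preserves insertion order, so `dict.fromkeys` keeps the first occurrence of each line. The helper name `unique_lines` is hypothetical.

```python
def unique_lines(lines):
    """Return the lines with duplicates removed, keeping first-occurrence order."""
    # Strip the trailing newline so "1\n" and a final "1" (no newline) compare equal
    stripped = (line.rstrip('\n') for line in lines)
    # dict keys are unique and keep insertion order (Python 3.7+)
    return list(dict.fromkeys(stripped))


if __name__ == '__main__':
    sample = ['3\n', '1\n', '3\n', '2\n', '1']
    print(unique_lines(sample))  # ['3', '1', '2']
```

An open file object can be passed directly, e.g. `with open('newFile.txt', encoding='utf-8') as f: values = unique_lines(f)`. Unlike the `set` version above, this reads the whole file into memory before anything is written.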

2. Extract one column from a batch of txt files and save it to a new txt file

Code explanation: iterate over all txt files in the folder, read the desired column from each line, and write it to a new file.

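import glob

# Directory holding the txt files; glob collects every file in it ending in .txt
files = glob.glob("/workspace/yolo/data/dataset/labels0208/*.txt")

# Create the output file; a relative path is created in the directory the script is run from
with open("newFile.txt", 'w', encoding='utf-8') as newFile:
    for file in files:
        with open(file, 'r', encoding='utf-8') as FA:
            for line in FA:
                # split() with no argument splits on any run of spaces or tabs;
                # pass a delimiter (e.g. split(',')) for other formats
                fields = line.split()
                if not fields:
                    continue  # skip blank lines
                newFile.write(fields[0] + '\n')  # write the 1st column of the file
                # newFile.write(fields[0] + '\t' + file + '\n')  # write the 1st column and the file name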

Example:

Content of the generated file after running:
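A variant that takes the column index as a parameter may be easier to reuse. This is a sketch under stated assumptions: `extract_column` and `write_column` are hypothetical helpers, columns are whitespace-separated (as in YOLO label files, where column 0 is the class ID), and files are processed in sorted order because `glob.glob` returns paths in arbitrary order.

```python
import glob


def extract_column(pattern, col=0):
    """Collect column `col` from every line of every file matching `pattern`."""
    values = []
    # glob returns paths in arbitrary order; sort them for reproducible output
    for path in sorted(glob.glob(pattern)):
        with open(path, 'r', encoding='utf-8') as f:
            for line in f:
                fields = line.split()  # split on any run of spaces or tabs
                if len(fields) > col:  # skip blank or too-short lines
                    values.append(fields[col])
    return values


def write_column(values, out_path):
    """Write one value per line to out_path."""
    with open(out_path, 'w', encoding='utf-8') as out:
        out.writelines(v + '\n' for v in values)
```

For the folder above this would be called as `write_column(extract_column("/workspace/yolo/data/dataset/labels0208/*.txt", col=0), "newFile.txt")`. Keep the output file out of the matched folder, or give it a different extension, so a later run does not read it back in as input.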


Origin blog.csdn.net/weixin_44649780/article/details/129040854