Use Python to batch delete specified files from a folder on Windows or Linux

  • Situation: When dozens or hundreds of files in a folder need to be deleted, selecting them one by one takes time and effort, especially on Linux. It is much easier to delete them in a batch.

    When evaluating training samples (images) and test samples (images), you need to find out whether errors come from the data itself or from your model. To do that, pick out the misclassified samples and check whether the label is wrong or the model is simply not trained well. When the sample itself is at fault, it needs to be deleted.

    Among more than 30,000 training samples, over 400 incorrectly labeled ones were found and need to be deleted from the data set.
   

    Write the paths of the files to be deleted into a txt file, using Python's os module to list the files. Then read the paths back from the txt file and call os.remove() on each one to delete the file it points to.

    Because it is more convenient for me to inspect the samples on Windows, I write the file paths into the txt file on Windows. This is the first script.

    The batch-deletion code is then run on the Linux server. This is the second script. (So pay attention to the paths when copying this code: the Windows paths shown here must be changed to match the machine the script runs on!)
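The path problem mentioned above can also be handled in code instead of editing the list by hand. The following is only a minimal sketch, assuming the Linux copy of the data set uses the same file names as the Windows copy: it drops the Windows folder from a listed path and joins the bare file name onto a Linux folder. The function name remap_to_linux, the folder /data/train/4_classes and the file name img_001.jpg are hypothetical and do not come from the original scripts.

```python
import posixpath


def remap_to_linux(listed_path, linux_dir):
    """Return linux_dir joined with the file name taken from listed_path."""
    # Remove the trailing newline and normalise backslashes, so the split
    # works no matter which separator the path was written with.
    normalised = listed_path.strip().replace("\\", "/")
    # Everything after the last separator is the bare file name.
    name = normalised.rsplit("/", 1)[-1]
    return posixpath.join(linux_dir, name)


# Hypothetical example: a line from the list written on Windows.
print(remap_to_linux("E:/WrongData/reduce/img_001.jpg\n", "/data/train/4_classes"))
# prints /data/train/4_classes/img_001.jpg
```

The returned path can then be passed to os.remove() on the server.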

    

 

  1. First collect the mislabeled files in one folder and write their paths to data_reduce.txt.
    import os

    # The mislabeled files are collected in the reduce folder.
    anchor_dir = 'E:/WrongData/reduce/'
    anchor_files = os.listdir(anchor_dir)
    count = 0
    # Create a txt file under the WrongData folder that stores
    # the path of every mislabeled file, one per line.
    with open('E:/WrongData/data_reduce.txt', 'w') as f:
        for name in anchor_files:
            path1 = anchor_dir + name + '\n'
            print(path1)
            f.write(path1)
            count += 1
    # Number of paths written.
    print(count)
  2. Read the file paths from data_reduce.txt, then use os.remove() to delete those files.
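    import os

    # Training-set folder (not used below: each listed line is already a full path).
    data_dir = "E:/train/4_classes/"
    # List of paths to delete, in the format written in step 1
    # (this run uses data_reduce_4.txt; adjust the name and path to your setup).
    with open('E:/WrongData/data_reduce_4.txt', mode='r') as file_handle:
        for line in file_handle:
            # Strip the trailing newline '\n' from the path.
            image_path = line.strip('\n')
            if not image_path:
                continue  # skip empty lines
            print(image_path)
            os.remove(image_path)
    print("remove ok!")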
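Since os.remove() cannot be undone and raises an error on the first path that does not exist, it can help to preview the deletion first. The following is only a sketch of one way to do that, not part of the original scripts: the function name delete_listed_files and its dry_run parameter are hypothetical, and the list file is assumed to hold one path per line, as written in step 1.

```python
import os


def delete_listed_files(list_path, dry_run=True):
    """Delete every file named in list_path (one path per line).

    Returns (deleted, missing). With dry_run=True nothing is removed;
    'deleted' then holds the files that would have been removed.
    """
    deleted, missing = [], []
    with open(list_path, "r", encoding="utf-8") as f:
        for line in f:
            path = line.strip()
            if not path:
                continue  # skip blank lines
            if not os.path.isfile(path):
                # Record paths that do not exist instead of raising an error.
                missing.append(path)
                continue
            if not dry_run:
                os.remove(path)
            deleted.append(path)
    return deleted, missing
```

Calling delete_listed_files(list_path) with the default dry_run=True only reports what would happen. Once the two returned lists look right, call it again with dry_run=False to actually delete the files.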

 

Origin blog.csdn.net/qq_32998593/article/details/87981162