Removing Broken Images with Python

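  While processing images that a colleague had crawled, I found that a large batch of them were broken, owing to image-type and network problems during the crawl. The broken files need to be removed, and a brief record kept.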

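  Key points:

  In Python, the what() function in the imghdr module can be used to judge whether an image file is damaged. It returns None when the file's header matches no known image format, which is treated here as a sign of damage; otherwise it returns the image type, such as jpeg. See the imghdr documentation: https://docs.python.org/3/library/imghdr.html

  The progressbar module displays the progress of the processing loop.

  The os module handles operations on local folders and files.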

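  Task:

  Choose the folder that holds the images to process (including its subfolders), collect the image files, and determine each file's type. If a file is damaged (its type is None), delete it and record its path in a local txt file.

  Code: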

#!/usr/bin/env python
# -*- coding:utf-8 -*-
# __author__ = "NYA"

import os
import imghdr
from progressbar import ProgressBar

"""
    Use imghdr.what() to detect the image type and remove broken files.
"""

path = '/home/lab/images'

# Collect every file under path, including subfolders.
original_images = []
for root, dirs, filenames in os.walk(path):
    for filename in filenames:
        original_images.append(os.path.join(root, filename))
original_images = sorted(original_images)
print('totalNum:', len(original_images))

error_images = []
progress = ProgressBar()
# Open the log in text mode: the paths written below are str,
# so binary mode ('wb') would raise TypeError on Python 3.
with open('/home/lab/check_error.txt', 'w', encoding='utf-8') as f:
    for filename in progress(original_images):
        # what() returns None when the header matches no known image type.
        if imghdr.what(filename) is None:
            f.write(filename + '\n')
            os.remove(filename)
            error_images.append(filename)
print('errorFileNum:', len(error_images))

  


Reposted from www.cnblogs.com/nyatom/p/10782898.html