Python office automation (2)

Continued from Python office automation (1).

File and Directory Operations

Use the shutil library

The shutil library is also part of the Python standard library. It works with files, folders, and archives, and provides functions for copying, moving, compressing, and decompressing them.

Function        Description
copy            Copy a file's contents and permissions
copy2           Copy a file's contents and metadata
copyfile        Copy the contents of one file to another file
copyfileobj     Copy the contents of one open file object to another
copytree        Copy an entire directory tree
move            Recursively move a file or directory; the source no longer exists afterward
rmtree          Delete a directory and all its contents
make_archive    Create an archive and return its file path
unpack_archive  Unpack an archive

Copying a file is more subtle than it looks. A file has two parts: its data, and the metadata that describes it, such as the access time, modification time, and owner. So before copying, decide whether you need only the content or the metadata as well.
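
The difference between copying content only and copying content plus metadata can be checked with a small experiment. The following is a minimal sketch and not part of the original article: it works in a temporary directory, the file names are made up, and it backdates the source file so that a change in modification time is easy to see. Under these assumptions, copy2 should be the only function whose copy keeps the source's modification time.

```python
import os
import shutil
import tempfile

def compare_copies(workdir):
    """Copy one file three ways and report which copies keep the source mtime."""
    src = os.path.join(workdir, 'source.txt')  # hypothetical file name
    with open(src, 'w', encoding='utf-8') as f:
        f.write('hello')
    # Backdate the source so a fresh copy gets a clearly different timestamp.
    old_time = 1_000_000_000  # seconds since the epoch (a date in 2001)
    os.utime(src, (old_time, old_time))

    results = {}
    for name, func in [('copyfile', shutil.copyfile),
                       ('copy', shutil.copy),
                       ('copy2', shutil.copy2)]:
        dst = os.path.join(workdir, name + '.txt')
        func(src, dst)
        # True when the copy kept the source's modification time
        # (2 seconds of tolerance for coarse file system timestamps).
        results[name] = abs(os.path.getmtime(dst) - old_time) < 2
    return results

with tempfile.TemporaryDirectory() as tmp:
    mtime_kept = compare_copies(tmp)
    print(mtime_kept)
```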

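import shutil
# shutil.copyfile(A, B) copies only the contents of file A to file B. A and B must both be files, not directories, and B must be writable.
shutil.copyfile('./python_zen.txt', './copy01.txt')
# Create the folder copyfiles by hand first
# Copy a file into a folder
shutil.copy('./copy01.txt', './copyfiles/')
# Copy an entire folder (the destination must not exist yet)
shutil.copytree('./copyfiles/', './copyfiles01/')
# Move a file (and rename it)
shutil.move('./copyfiles01/copy01.txt', './copyfiles/copy02.txt')
# Move a whole directory: the directory itself is moved into copyfiles
shutil.move('./copyfiles01/', './copyfiles/')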

In the os module, both os.rmdir and os.removedirs require the directory being deleted to be empty; otherwise an error is raised. shutil.rmtree deletes the entire directory whether or not it is empty.
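
To see the difference between the two approaches, here is a small sketch that is not from the original article. It builds a non-empty directory inside a temporary folder (the names `not_empty` and `note.txt` are made up), tries os.rmdir first, and then falls back to shutil.rmtree.

```python
import os
import shutil
import tempfile

def try_remove(workdir):
    """Show that os.rmdir refuses a non-empty directory while shutil.rmtree removes it."""
    target = os.path.join(workdir, 'not_empty')  # hypothetical directory name
    os.mkdir(target)
    with open(os.path.join(target, 'note.txt'), 'w', encoding='utf-8') as f:
        f.write('some content')

    try:
        os.rmdir(target)      # only works on empty directories
        rmdir_failed = False
    except OSError:
        rmdir_failed = True   # the directory still exists at this point

    shutil.rmtree(target)     # removes the directory and everything in it
    return rmdir_failed, os.path.exists(target)

with tempfile.TemporaryDirectory() as tmp:
    rmdir_failed, still_exists = try_remove(tmp)
    print(rmdir_failed, still_exists)
```

Because rmtree deletes without asking, it is worth double-checking the path before calling it on real data.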

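# Delete an entire directory
shutil.rmtree('./copyfiles/copyfiles01/')
# Delete a single file
import os
os.unlink('./copy01.txt')
# Compress
# Arguments: path and name of the archive to create (without extension), format, path of the files to compress
# (the archive name 压缩包 means "archive")
shutil.make_archive('./压缩包', 'zip', base_dir='./copyfiles/')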

make_archive accepts more parameters, which are described in the shutil documentation, but the ones shown here are enough for everyday use.

# Decompress
# Arguments: path and name of the archive, folder to extract into (解压文件 means "extracted files")
shutil.unpack_archive('./压缩包.zip', './解压文件')
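
The compress and decompress calls above can be combined into a round trip. The sketch below is an illustration under assumptions, not the article's own code: it uses a temporary directory and made-up names (`data`, `backup`, `unpacked`), and it passes root_dir in addition to base_dir so the paths stored in the archive are relative to a known folder.

```python
import os
import shutil
import tempfile

def archive_round_trip(workdir):
    """Zip a folder with make_archive, unpack it elsewhere, and list the result."""
    data_dir = os.path.join(workdir, 'data')  # hypothetical folder to compress
    os.mkdir(data_dir)
    with open(os.path.join(data_dir, 'a.txt'), 'w', encoding='utf-8') as f:
        f.write('alpha')

    # The base name has no extension; make_archive appends '.zip' and returns the path.
    # root_dir is the folder that archive paths are relative to;
    # base_dir is the entry inside root_dir that actually gets archived.
    archive_path = shutil.make_archive(
        os.path.join(workdir, 'backup'), 'zip', root_dir=workdir, base_dir='data')

    out_dir = os.path.join(workdir, 'unpacked')
    shutil.unpack_archive(archive_path, out_dir)  # format is inferred from the extension
    return archive_path, sorted(os.listdir(os.path.join(out_dir, 'data')))

with tempfile.TemporaryDirectory() as tmp:
    archive_path, unpacked = archive_round_trip(tmp)
    print(os.path.basename(archive_path), unpacked)
```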

File search

glob

glob is a file-related module in the Python standard library. It finds files whose names match a given pattern.

import glob
# Pattern syntax: * matches any number of characters, ? matches a single character, [] matches one character from a range such as [0-9]
glob.glob('*.txt')
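
The pattern rules in the comment above can be tried out on a few sample files. This is a minimal sketch with made-up file names in a temporary directory. It also uses the `**` pattern with `recursive=True`, which the original snippet does not cover, to search subfolders.

```python
import glob
import os
import tempfile

def demo_glob(workdir):
    """Create a few sample files and match them with different glob patterns."""
    os.mkdir(os.path.join(workdir, 'sub'))
    for rel in ['a.txt', 'b1.txt', 'c.csv', os.path.join('sub', 'd.txt')]:
        with open(os.path.join(workdir, rel), 'w', encoding='utf-8') as f:
            f.write('x')

    def names(pattern, recursive=False):
        # glob.escape protects any special characters in the directory part.
        full_pattern = os.path.join(glob.escape(workdir), pattern)
        found = glob.glob(full_pattern, recursive=recursive)
        # Report sorted paths relative to workdir, using '/' as the separator.
        return sorted(os.path.relpath(p, workdir).replace(os.sep, '/') for p in found)

    return {
        'top_level_txt': names('*.txt'),
        'single_char': names('?.txt'),
        'digit_before_ext': names('*[0-9].txt'),
        'recursive_txt': names('**/*.txt', recursive=True),
    }

with tempfile.TemporaryDirectory() as tmp:
    matches = demo_glob(tmp)
    for label, paths in matches.items():
        print(label, paths)
```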

fnmatch

fnmatch is also part of the Python standard library. It is a module dedicated to file name matching and can be used for more complex matching tasks.

# Find every file under the target folder whose name ends with a digit before the extension
import os, fnmatch
for foldName, subfolders, filenames in os.walk('./'):
    for filename in filenames:
        if fnmatch.fnmatch(filename, '*[0-9].*'):
            print(filename)

fnmatchcase is similar to fnmatch, except that it always performs a case-sensitive comparison, while fnmatch follows the case rules of the operating system.

Both of these functions return True or False, whereas the filter function returns a list of the file names that match.

fileList=[]
for foldName,subfolders,filenames in os.walk('./'):
    for filename in filenames:
        fileList.append(filename)
print('fileList:\n',fileList)
print(fnmatch.filter(fileList,'*[0-9].*'))
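
The three functions can be compared side by side on a fixed list of names. The sketch below uses made-up file names, so it does not depend on what is in the current folder. The result of fnmatch for the upper-case extension depends on the operating system, so no output is shown for it.

```python
import fnmatch

names = ['report1.TXT', 'report2.txt', 'notes.txt', 'image9.png']  # made-up file names

# fnmatchcase always compares case-sensitively, on every operating system.
case_sensitive = [n for n in names if fnmatch.fnmatchcase(n, '*.txt')]

# fnmatch normalizes case the way the operating system does, so 'report1.TXT'
# matches '*.txt' on Windows but not on Linux.
os_dependent = [n for n in names if fnmatch.fnmatch(n, '*.txt')]

# filter returns the matching names directly instead of True or False.
ends_with_digit = fnmatch.filter(names, '*[0-9].*')

print(case_sensitive)
print(os_dependent)
print(ends_with_digit)
```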

hashlib

Duplicate files may have different names, so you cannot identify them by file name and file size alone. A more reliable and still simple approach is to compare MD5 values: in practice, two files with the same MD5 value can be treated as having the same content.
The hashlib library that comes with Python provides a way to compute the MD5 value of a file.

import hashlib
m = hashlib.md5()
# Open in binary mode so the bytes are hashed exactly as stored
with open('./python_zen.txt', 'rb') as f:
    m.update(f.read())
md5_value = m.hexdigest()
print(md5_value)
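
The snippet above reads the whole file into memory at once, which may be a problem for very large files. One possible variation, shown here as a sketch rather than the article's method, feeds the file to the hash in chunks. The helper name `file_md5` and the 1 MB chunk size are assumptions chosen for illustration.

```python
import hashlib

def file_md5(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, 'rb') as f:
        # iter() with a sentinel keeps calling f.read(chunk_size) until it returns b''.
        for chunk in iter(lambda: f.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()
```

Calling file_md5('./python_zen.txt') should give the same value as the one-shot version, since update() can be called repeatedly with consecutive pieces of the data.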

# Print a directory tree
import os
def filetree(path, depth):
    if depth == 0:
        print('Folder: ' + path)
    for file in os.listdir(path):
        print('|    ' * depth + '+--' + file)
        directory = os.path.join(path, file)
        # Recurse into subdirectories, one level deeper
        if os.path.isdir(directory):
            filetree(directory, depth + 1)
filetree('./', 0)

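# Create an empty folder by hand, then delete every empty folder
import os, shutil
path = './'
for file in os.listdir(path):
    directory = os.path.join(path, file)
    if os.path.isdir(directory) and len(os.listdir(directory)) == 0:
        print(directory, os.listdir(directory))
        shutil.rmtree(directory)

# Delete duplicate files
import os, hashlib
path = './重复文件'  # the folder name means "duplicate files"
seen_md5 = set()  # MD5 values seen so far (avoids shadowing the built-in list)
print('Contents of the duplicate-files folder:')
for foldName, subfolders, filenames in os.walk(path):
    for filename in filenames:
        print(foldName, filename)
print('Duplicate files:')
for file in os.listdir(path):
    fileName = os.path.join(path, file)
    if not os.path.isfile(fileName):
        continue  # skip subfolders, which cannot be opened as files
    m = hashlib.md5()
    with open(fileName, 'rb') as mfile:
        m.update(mfile.read())
    md5_value = m.hexdigest()
    if md5_value in seen_md5:
        print(fileName)
        os.unlink(fileName)  # delete the duplicate file
    else:
        seen_md5.add(md5_value)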
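
Since the script above deletes files immediately, it can help to list the duplicates first and review them. The following is a hedged sketch of such a dry run, not part of the original article: the helper name `find_duplicates` is made up, and unlike the script above it also walks into subfolders with os.walk.

```python
import hashlib
import os
from collections import defaultdict

def find_duplicates(root):
    """Group files under root by MD5 digest and return only groups with 2+ files."""
    by_digest = defaultdict(list)
    for folder, _subfolders, filenames in os.walk(root):
        for filename in filenames:
            full_path = os.path.join(folder, filename)
            with open(full_path, 'rb') as f:
                digest = hashlib.md5(f.read()).hexdigest()
            by_digest[digest].append(full_path)
    # Nothing is deleted here; the caller decides what to do with each group.
    return [sorted(paths) for paths in by_digest.values() if len(paths) > 1]
```

Each returned group contains paths with identical content, so one file per group can be kept and the rest removed with os.unlink once the list looks right.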

Origin blog.csdn.net/weixin_46322367/article/details/129501509