Python data compression and archiving

Preface

Despite the ever-growing storage capacity of modern computer systems, the amount of data being generated never stops increasing. Lossless compression algorithms trade the time needed to compress or decompress data for the space needed to store it, making up for shortfalls in storage capacity. Python provides interfaces to some of the most popular compression libraries, so programs can read and write files using different compression formats.

zlib and gzip expose the GNU zip library, and bz2 gives access to the more recent bzip2 format. Both formats operate on streams of data, regardless of the input format, and provide interfaces that transparently read and write compressed files. These modules can be used to compress a single file of data.

The standard library also includes modules for managing archive formats, which combine several files into a single file that can be managed as a unit. tarfile reads and writes the Unix tape archive format; it is an old standard, but its flexibility keeps it in wide use today. zipfile handles the zip archive format, popularized by the PC program pkzip. Originally used under MS-DOS and Windows, it is now also used on other platforms thanks to the simplicity of its API and the portability of the format.

zlib: GNU zlib compression

Introduction

The zlib module provides a low-level interface to many of the functions in the zlib compression library from the GNU project.

Processing data in memory

import zlib
import binascii
 
 
'''
The simplest way to use zlib requires holding all of the data to be compressed or decompressed in memory
'''
original_data = b"this is a original text"
print(f"original: {len(original_data)}, {original_data}")
'''
original: 23, b'this is a original text'
'''

# compress
compressed_data = zlib.compress(original_data)
print(f"compressed: {len(compressed_data)}, {compressed_data}, {binascii.hexlify(compressed_data)}")
'''
compressed: 29, b'x\x9c+\xc9\xc8,V\x00\xa2D\x85\xfc\xa2\xcc\xf4\xcc\xbc\xc4\x1c\x85\x92\xd4\x8a\x12\x00d\xb7\x08\x90', b'789c2bc9c82c5600a24485fca2ccf4ccbcc41c8592d48a120064b70890'
'''

# decompress
decompressed_data = zlib.decompress(compressed_data)
print(f"decompressed: {len(decompressed_data)}, {decompressed_data}, {binascii.hexlify(decompressed_data)}")
'''
decompressed: 23, b'this is a original text', b'746869732069732061206f726967696e616c2074657874'
'''
 
'''
Both the compress function and the decompress function take a byte sequence as argument and return a byte sequence.
As the example shows, the compressed version of a small amount of data can be larger than the uncompressed version. The actual result depends on the input data.
 
zlib supports several compression levels. The default, zlib.Z_DEFAULT_COMPRESSION, is -1 and corresponds to a hard-coded compromise value.
There are ten compression levels, 0 through 9 (with -1 selecting the default). The higher the level, the more computation is done, but the smaller the resulting data. Level 0 means no compression at all.
'''
input_data = b"this is a input_data" * 1024
print(f"{'level':>8} {'original':>8} {'compressed':>10}")
print(f"{'-' * 8} {'-' * 8} {'-' * 10}")
for i in range(0, 10):
    compressed_data = zlib.compress(input_data, i)
    print(f"{i:>8} {len(input_data):>8} {len(compressed_data):>10}")
'''
   level original compressed
-------- -------- ----------
       0    20480      20491
       1    20480        167
       2    20480        167
       3    20480        167
       4    20480         93
       5    20480         93
       6    20480         93
       7    20480         93
       8    20480         93
       9    20480         93
'''
# as the output shows, level 4 already reaches the minimum for this input;
# raising the level further cannot make the result any smaller

Incremental compression and decompression

import zlib
import binascii
 
 
'''
The in-memory approach just shown has drawbacks, chiefly that the system needs enough memory to hold the compressed and uncompressed versions at the same time, which makes it impractical for real-world use cases.
The alternative is to process the data incrementally, so that the entire data set does not have to fit in memory.
'''
compressor = zlib.compressobj(1)
with open(r"C:\python37\Lib\asyncio\base_futures.py", "rb") as f:
    while True:
        # read a small block from the file each time through the loop
        block = f.read(64)
        if not block:
            break
        # call the compress method of the compressobj instance,
        # not the module-level zlib.compress, passing in the small block
        compressed = compressor.compress(block)
        # the compressor maintains an internal buffer of compressed data.
        # Because the compression algorithm depends on checksums and minimum
        # block sizes, the compressor may not be ready to return data each
        # time it receives more input; if a complete compressed block is not
        # ready yet, it returns an empty byte string
        if compressed:
            print(f"compressed: {binascii.hexlify(compressed)}")
        else:
            print("buffering...")
    # when all of the input has been fed in, flush forces the compressor
    # to close the final block and return the rest of the compressed data
    remaining = compressor.flush()
    print(f"flushed: {binascii.hexlify(remaining)}")
'''
compressed: b'7801'
buffering...
buffering...
buffering...
buffering...
buffering...
buffering...
buffering...
buffering...
buffering...
buffering...
buffering...
buffering...
buffering...
buffering...
buffering...
buffering...
buffering...
buffering...
buffering...
buffering...
buffering...
buffering...
buffering...
buffering...
buffering...
buffering...
buffering...
buffering...
buffering...
buffering...
buffering...
buffering...
buffering...
flushed: b'9555df6fdb36107e37e0ffe1963c48061c61e963310718d2b40b5018039237232028ea54b3a14581a4daba86fff7dd919222bb4db7d1862d9177dfdd7df783424863848015e48bf96c3ed3bbd6ba00ca36aa730e9b50d45de81cfa4294d2e328e0b0754697ac523bbb83027acddaba9d0c628ba645e7f9fcce39ebc8c06b98451498cf6e65a3d018ac7ea1702a339f3dea1dda2efc42632ac1decc67ca48efe1bef9228dae1e820c18d5f3f8bb783b9f01ad8b8b8bc72d82a52064d0b601eda1b101882dfb152bd00d842ded79d62f483a615f4204f44034c0fbc85c319f89bfefd6efeed71f8884ac7fcc68f7f6cff5edddc78f77ef787f7ce193f7f7ebfb87bfd2c1f04cfbfca9b0265f5252725b7e9e387cbb45f51c2dcb17db299c47f6d52165b2f1f0e83a84af5b6c80f439b0419ca2a278280b40def37ef5055dd05e379f128c0e1e4d0dd243d5a9e7abb06f91d2ba6b89a2d220947bf018028983907edf286d45f25494c6aa673a203698de078c8a3b2a30a63231155ddf5987506190daf85e38b2cb5a2900c8b7d2cb101c875f0811f329c412b2d78c660b904d952c330eafa8fb8a9343b2d7b6c1d8158976d1d7b6a22228a57af6b92a27f4a79287ba6b54ac98490908c1fd22c41889d7df91b26eb0618c9edc3a56181ff545c87eaa92ab23269f5f39fd831b653e783278c1123d49bd50df87c5b9efc2dbce291c1196d4ff315486d035240757707deecac478b9f9fd89bebdfb68266a6fced5b2c3710987635624fdfc2730cb6960e5e63a62ff007e033fc7fee37004ae9d9b7f35735a071cef6b2b9270056f96ff5de524aeab931886c464aa5c6d0eaa3c3e4d5a7ae8935827baa96d9e1a67c8ec25e4a9491670754353c72597a8a0fe4fd9313095d3266117224d2f9e682e5f3c8d6578720cab158c03699256caf62087df14b6d329c98d3311657ed97421db169b2aafb3516375f801e43777ccfab4b3221a7f8e75099d472af4780541b060f44e071ac8c81df5296cc1d6f18dae86b60b4b40dfa2d2d4b6fbd34c5ec69143436e0fc6d2d4225e6946d1adc5868745975f6702d1d61b2cf8bf4f4f418dcda7137f59ed34d824b23aa4ff31b80981431ffb0969538cf3eef5a3f95193bb379a7e494b6a71119c54c8f36a025e3bb9e30134907f2ebaa1d27d6161ea4a9d298774e75520031c220ecd80e3dbfef1fa698cafaf77569ecffe01c96e8a7a'
'''
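The reverse direction works the same way: zlib.decompressobj processes compressed input incrementally. The following is a minimal sketch, not from the original; the sample text and the 64-byte block size are arbitrary choices for illustration.

```python
import zlib

# compress some sample data in one shot, then decompress it incrementally
original = b"incremental decompression demo " * 200
compressed = zlib.compress(original)

decompressor = zlib.decompressobj()
result = bytearray()
for i in range(0, len(compressed), 64):
    # feed the decompressor one small block at a time; it buffers input
    # internally and returns whatever output is ready so far
    result += decompressor.decompress(compressed[i:i + 64])
# flush returns any remaining buffered output
result += decompressor.flush()

print(bytes(result) == original)  # True
```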

Mixed content stream

import zlib

'''
In situations where compressed and uncompressed data are mixed together, decompressobj can also be used
'''
data1 = open(r"C:\python37\Lib\asyncio\base_futures.py", "rb").read()
data2 = open(r"C:\python37\Lib\asyncio\queues.py", "rb").read()
# compress data1
compressed = zlib.compress(data1)
# concatenate the compressed data1 directly with the raw data2
combined = compressed + data2

decompressor = zlib.decompressobj()
# combined holds the compressed data1 followed by the uncompressed data2
decompressed = decompressor.decompress(combined)

print(decompressed == data1)  # True
print(decompressor.unused_data == data2)  # True
'''
Decompressing combined (compressed data plus the uncompressed data2) yields decompressed, which is the result of decompressing compressed, and is therefore equal to data1.
decompressor.unused_data holds the data that was not decompressed; that is data2, because data2 was never compressed in the first place, so it is left in unused_data.

Note: although combined contains both compressed and uncompressed data, decompressing it returns only the result of decompressing the compressed portion.

'''

Checksum

import zlib
 
 
'''
Besides the compression and decompression functions, zlib includes two functions for computing checksums of data: adler32 and crc32.
Neither checksum is cryptographically secure; they are intended only for data-integrity verification
'''
data = open(r"C:\python37\Lib\asyncio\base_futures.py", "rb").read()
cksum = zlib.adler32(data)
print(cksum)  # 3379464826
print(zlib.adler32(data, cksum))  # 3762754818
 
 
cksum = zlib.crc32(data)
print(cksum)  # 2713689583
print(zlib.crc32(data, cksum))  # 1555820967
 
'''
Both functions take the same arguments: a byte string containing the data and an optional value to be used as a starting point for the checksum.
They return an unsigned 32-bit integer, which can be passed back on a subsequent call as the starting-point argument to produce a running checksum
'''

gzip: GNU zip file read and write

Introduction

The gzip module provides a file-like interface to GNU zip files, using zlib to compress and decompress the data.

Write compressed files

import gzip
import os
import io
 
 
'''
The module-level function open creates an instance of the file-like GzipFile class, which provides the usual methods for reading and writing byte strings
'''
with gzip.open("1.zip", "wb") as output:
    # to write data into a compressed file, open it in "wb" mode to get a file handle,
    # then wrap the handle with io.TextIOWrapper so that Unicode text is
    # encoded into bytes suitable for compression
    with io.TextIOWrapper(output, encoding="utf-8") as enc:
        enc.write("这里一份样例")
print(f"1.zip contains {os.stat('1.zip').st_size} bytes")  # 1.zip contains 52 bytes
# different amounts of compression can be used by passing a compresslevel argument; valid values are 0 through 9, inclusive
# lower values give faster processing at the cost of less compression; higher values compress harder but may be slower. A value of 0 means no compression at all
 
 
import hashlib
data = open(r"C:\python37\Lib\asyncio\base_futures.py", "r", encoding="utf-8").read()
for i in range(0, 10):
    with gzip.open(f"{i}.zip", "wb", compresslevel=i) as output:
        with io.TextIOWrapper(output, encoding="utf-8") as enc:
            enc.write(data)
    # stat the file that was just written, f"{i}.zip", not a fixed name
    size = os.stat(f"{i}.zip").st_size
    cksum = hashlib.md5(open(f"{i}.zip", "rb").read()).hexdigest()
    print(f"level:{i} size:{size} checksum:{cksum}")
'''
level:0 size:52 checksum:ea861ad27ef828cf9f80483e0822cd6b
level:1 size:31 checksum:ca5ee0bc498b7d2f7405cb9cfc908934
level:2 size:31 checksum:6dd8091f40801ef855a21d7c20f097e7
level:3 size:31 checksum:da1cd3487569e32b46b9cb6f2d001dd2
level:4 size:31 checksum:518db8ed8da67cc14b108231a91f9e7c
level:5 size:31 checksum:1b70e02b2ae177ab64b39af4e6a682b9
level:6 size:31 checksum:3fe7959c01a0c8086fa306bf7135479d
level:7 size:31 checksum:70ef8711a74f823c985021b04811ae9e
level:8 size:31 checksum:055f666e3c7aae0df8b6ffd728c8e895
level:9 size:31 checksum:ad9740d40737b05c9840e28a612cd267
'''
# in the output, the middle column shows the size in bytes of the generated file. For a given input,
# a higher compression value does not necessarily reduce the storage space; results vary with the input data

Read compressed data

import gzip
import io
 
 
'''
To read data back from a previously compressed file, open the file in binary read mode; that way no text-based translation of line endings or Unicode decoding is performed
'''
with gzip.open(r"1.zip", "wb") as output:
    with io.TextIOWrapper(output, encoding="utf-8") as enc:
        enc.write("蛤蛤蛤蛤嗝")
 
with gzip.open(r"1.zip", "rb") as input_file:
    with io.TextIOWrapper(input_file, encoding="utf-8") as dec:
        print(dec.read())  # 蛤蛤蛤蛤嗝
 
 
# seek can also be used to change the read position
with gzip.open(r"1.zip", "rb") as input_file:
    with io.TextIOWrapper(input_file, encoding="utf-8") as dec:
        # skip three bytes, i.e. one UTF-8-encoded Chinese character;
        # for this content the offset must be a multiple of 3, otherwise
        # read() fails with a decoding error
        dec.seek(3)
        print(dec.read())  # 蛤蛤蛤嗝

Working with streams

import gzip
import io
import binascii
 
 
'''
The GzipFile class can be used to wrap other types of data streams so that they can use compression too.
This is useful, for example, when data is being transmitted over a socket or through an existing (already open) file handle.
A BytesIO buffer can also be used with GzipFile to operate on data in memory.
'''
 
uncompressed_data = b"this same line, over and over\n"*10
print("uncompressed: ", len(uncompressed_data))  # uncompressed:  300
print(uncompressed_data)
'''
uncompressed:  300
b'this same line, over and over\nthis same line, over and over\nthis same line, over and over\nthis same line, over and over\nthis same line, over and over\nthis same line, over and over\nthis same line, over and over\nthis same line, over and over\nthis same line, over and over\nthis same line, over and over\n'
'''
 
buf = io.BytesIO()
# writing through f compresses the data and stores it in buf
with gzip.GzipFile(mode="wb", fileobj=buf) as f:
    f.write(uncompressed_data)
 
compressed_data = buf.getvalue()
print("compressed: ", len(compressed_data))
print(binascii.hexlify(compressed_data))
'''
compressed:  51
b'1f8b08004b87945c02ff2bc9c82c56284ecc4d55c8c9cc4bd551c82f4b2d5248cc4b0133b84a46659164018d0503202c010000'
'''
 
# how is the compressed data turned back into the original?
# change the mode from wb to rb and wrap the compressed data in a BytesIO
with gzip.GzipFile(mode="rb", fileobj=io.BytesIO(compressed_data)) as f:
    print(f.read())  # b'this same line, over and over\nthis same line, over and over\nthis same line, over and over\nthis same line, over and over\nthis same line, over and over\nthis same line, over and over\nthis same line, over and over\nthis same line, over and over\nthis same line, over and over\nthis same line, over and over\n'
 
 
'''
Writing: open with mode "wb" and pass fileobj; data written through the GzipFile is compressed automatically and stored in the BytesIO instance:
with gzip.GzipFile(mode="wb", fileobj=io.BytesIO()) as f:
    f.write(uncompressed_data)
    
Reading: open with mode "rb", passing the compressed data wrapped in a BytesIO instance as fileobj; read then automatically decompresses the data from the BytesIO and returns the original uncompressed bytes:
with gzip.GzipFile(mode="rb", fileobj=io.BytesIO(compressed_data)) as f:
    f.read()
'''
 
# the functionality is similar to the zlib module's, but gzip works at a higher level; using gzip is recommended

bz2: bzip2 compression

Introduction

The bz2 module is an interface to the bzip2 library, used to compress data for storage or transmission. Its API is almost identical to zlib's, so it is not described in detail here.
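Since the section gives no bz2 example, here is a minimal sketch (the sample data is made up) showing how closely the API mirrors zlib's:

```python
import bz2

data = b"this is a sample text for bz2 " * 100

# one-shot API, the same shape as zlib.compress / zlib.decompress
compressed = bz2.compress(data, compresslevel=9)
print(len(data), len(compressed))
print(bz2.decompress(compressed) == data)  # True

# incremental API, mirroring zlib.compressobj
compressor = bz2.BZ2Compressor()
chunks = [compressor.compress(data), compressor.flush()]
print(bz2.decompress(b"".join(chunks)) == data)  # True
```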

tarfile: tar archive access

Introduction

The tarfile module provides read and write access to Unix tar archives, including compressed files. Beyond the POSIX standard, it supports several GNU tar extensions, and it can handle some special Unix file types (such as hard and symbolic links) and device nodes. Although tarfile implements a Unix format, it can also be used to create and read tar archives on Windows.

Test tar file

import tarfile
 
'''
is_tarfile returns a boolean indicating whether the file name passed in refers to a valid tar file
'''
# False here because this is a zip file whose extension was renamed to .tar;
# the name alone does not make it a valid tar file
print(tarfile.is_tarfile("1.tar"))  # False
# if the file does not exist, an exception is raised

Reading metadata from the archive

import tarfile
 
'''
The TarFile class can be used to work with a tar archive directly
'''
with tarfile.open("1.tar", "r") as t:
    # getnames returns the names of all files in the archive
    print(t.getnames())  # ['1.docx', '1.txt', '2.txt']
    # getmembers returns information about every file in the archive
    print(t.getmembers())  # [<TarInfo '1.docx' at 0x9e05d90>, <TarInfo '1.txt' at 0x9e05e58>, <TarInfo '2.txt' at 0x9e05f20>]
    # when the file name is known, getmember can be called with that name
    print(t.getmember("1.txt"))
    print(t.getmember("1.txt").name)  # 1.txt
    print(t.getmember("1.txt").size)  # 17

Extract files from the archive

import tarfile
 
'''
To access the data of an archive member inside a program, use the extractfile method, passing the member's name
'''
with tarfile.open("1.tar", "r") as t:
    # the member is opened here; f is analogous to the handle
    # obtained when opening a regular file
    f = t.extractfile("1.txt")
    # the content is returned as bytes
    print(f.read().decode("utf-8"))  # aaaaa三生三世
 
# to unpack the archive and write its members to the file system,
# use extract or extractall
with tarfile.open("1.tar", "r") as t:
    # which member to extract, and into which directory
    t.extract("1.txt", r"C:\python37")
    import os
    print("1.txt" in os.listdir(r"C:\python37"))  # True
    os.remove(r"C:\python37\1.txt")
    print("1.txt" in os.listdir(r"C:\python37"))  # False
     
 
# extractall can also be called with a target directory;
# it unpacks every file in the archive into that directory
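The extractall call can be sketched as follows. This example is not from the original; it builds its own small archive in a temporary directory first (the file names are made up) so that it is self-contained:

```python
import os
import tarfile
import tempfile

work = tempfile.mkdtemp()

# build a small archive so the example is self-contained
src = os.path.join(work, "a.txt")
with open(src, "w", encoding="utf-8") as f:
    f.write("hello")
archive = os.path.join(work, "demo.tar")
with tarfile.open(archive, "w") as t:
    t.add(src, arcname="a.txt")

# extractall unpacks every member of the archive into the target directory
dest = os.path.join(work, "out")
with tarfile.open(archive, "r") as t:
    t.extractall(dest)
print(os.listdir(dest))  # ['a.txt']
```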

Creating a new archive

import tarfile
 
'''
To create a new archive, open the TarFile with mode "w"
'''
with tarfile.open("python.tar", "w") as t:
    # the files to archive are simply passed to add
    t.add(r"C:\python37\python.exe")
    t.add(r"C:\python37\python37.dll")
    t.add(r"C:\python37\python3.dll")
 
 
with tarfile.open("python.tar", "r") as t:
    for m in t.getmembers():
        print(m.name, m.size)
    '''
    python37/python.exe 99992
    python37/python37.dll 3844760
    python37/python3.dll 59032
    '''

Archiving members with alternate names

import tarfile
 
'''
When adding a file to an archive under a different name, obtain a TarInfo object with gettarinfo, then pass it to addfile
'''
 
with tarfile.open("python.tar", "w") as t:
    # gettarinfo can be called on any open TarFile, passing the name of an existing file.
    # arcname specifies the name stored in the archive, so the member added
    # here is called sss.py rather than 3.py
    info = t.gettarinfo("3.py", arcname="sss.py")
    print(info.name)  # sss.py
    # addfile needs the TarInfo object rather than a plain file name;
    # for a regular file, the open file object must be passed as well,
    # so that addfile can copy info.size bytes of data into the archive
    with open("3.py", "rb") as f:
        t.addfile(info, f)
 
with tarfile.open("python.tar", "r") as t:
    print(t.getnames())  # ['sss.py']
# what happened to the earlier members? They are gone: mode "w" truncates the archive

Writing data from sources other than files

import tarfile
import io

'''
Sometimes data needs to be written into an archive directly from memory, instead of being written to a file first and then adding that file to the archive
'''
text = "this is the data to write to archive"
data = text.encode("utf-8")

with tarfile.open("addfile.tar", "w") as t:
    # the file does not exist on disk, but a TarInfo object describing it can be built.
    # For an existing file, t.gettarinfo returns the TarInfo; for in-memory data, construct one by hand
    info = tarfile.TarInfo("exists.txt")
    # the size must be set explicitly, because addfile copies exactly info.size bytes
    info.size = len(data)
    # addfile accepts two arguments: the TarInfo object and a BytesIO buffer
    # containing the content; the buffered bytes are written into the
    # (automatically created) archive member
    t.addfile(info, io.BytesIO(data))

with tarfile.open("addfile.tar", "r") as t:
    print(t.getmembers())  # [<TarInfo 'exists.txt' at 0x2857edb98e0>]

Append to archive

import tarfile
import io
 
# in short, this is the difference between w and a: to append, use mode a rather than w
def w():
    with tarfile.open("addfile.tar", "w") as t:
        t.add(r"C:\python37\python.exe")
 
    with tarfile.open("addfile.tar", "w") as t:
        t.add(r"C:\python37\python37.dll")
 
    with tarfile.open("addfile.tar", "w") as t:
        t.add(r"C:\python37\python3.dll")
 
    with tarfile.open("addfile.tar", "r") as t:
        print(t.getnames())
 
 
w()  # ['python37/python3.dll']
 
 
def a():
    with tarfile.open("addfile.tar", "a") as t:
        t.add(r"C:\python37\python.exe")
 
    with tarfile.open("addfile.tar", "a") as t:
        t.add(r"C:\python37\python37.dll")
 
    with tarfile.open("addfile.tar", "a") as t:
        t.add(r"C:\python37\python3.dll")
 
    with tarfile.open("addfile.tar", "r") as t:
        print(t.getnames())
 
 
a()  # ['python37/python3.dll', 'python37/python.exe', 'python37/python37.dll', 'python37/python3.dll']

zipfile: zip archive access

Introduction

The zipfile module can be used to read and write zip archive files, the format popularized by the PC program PKZIP.

Test zip file

import zipfile
 
'''
is_zipfile returns a boolean indicating whether the file name passed in refers to a valid zip file
'''
print(zipfile.is_zipfile("1.zip"))  # True
 
# unlike tarfile, which raises an exception when the file does not exist,
# zipfile does not raise an error; it simply returns False
print(zipfile.is_zipfile("nonexistent.zip"))  # False

Reading metadata from the archive

import zipfile
 
'''
The ZipFile class can be used to work with a zip archive.
It supports methods for reading data about existing archives, as well as modifying archives by adding more files.
'''
with zipfile.ZipFile("1.zip", "r") as zf:
    print(zf.namelist())  # ['1.txt']
 
# namelist is similar to tarfile's getnames: it returns only the file names
# for detailed information, like getmembers on a tarfile.open handle, use infolist
# and like getmember, which takes a file name and returns that member's details,
# ZipFile provides the getinfo method
with zipfile.ZipFile("1.zip", "r") as zf:
    print(zf.infolist())  # [<ZipInfo filename='1.txt' external_attr=0x20 file_size=0>]
    f = zf.getinfo("1.txt")
    print(f.filename)  # 1.txt
    print(f.file_size)  # 0

Extract files from the archive

import zipfile
 
'''
To access a member of the archive, use the read method, passing the file name; this is similar to extractfile under tarfile.open,
except that zipfile's read does not return a file handle: it returns the file's content directly
'''
with zipfile.ZipFile("1.zip", "r") as zf:
    # the file is empty, so there is nothing to show
    print(zf.read("1.txt"))  # b''
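ZipFile also offers extract and extractall methods analogous to tarfile's. A minimal self-contained sketch (the names are made up, and writestr, covered below, is used to build the sample archive):

```python
import os
import tempfile
import zipfile

work = tempfile.mkdtemp()

# build a sample archive containing one member
path = os.path.join(work, "demo.zip")
with zipfile.ZipFile(path, "w") as zf:
    zf.writestr("1.txt", b"hello")

# extract one member into a target directory;
# extractall(dest) would unpack every member instead
dest = os.path.join(work, "out")
with zipfile.ZipFile(path, "r") as zf:
    zf.extract("1.txt", dest)
print(open(os.path.join(dest, "1.txt"), "rb").read())  # b'hello'
```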

Creating a new archive

import zipfile
 
'''
To create a new archive, instantiate ZipFile with mode "w".
Any existing file is truncated and a new archive is started. To add files, use the write method
'''
with zipfile.ZipFile("python.zip", "w") as zf:
    zf.write(r"C:\python37\python.exe")
    zf.write(r"C:\python37\python37.dll")
    zf.write(r"C:\python37\python3.dll")
 
with zipfile.ZipFile("python.zip", "r") as zf:
    print(zf.namelist())  # ['python37/python.exe', 'python37/python37.dll', 'python37/python3.dll']
 
# by default, the contents of the archive are not compressed (ZIP_STORED)
# to add compression, the zlib module is required; pass
# compression=zipfile.ZIP_DEFLATED, optionally with a compresslevel
 
 
with zipfile.ZipFile("python.zip", "w",
                     compression=zipfile.ZIP_DEFLATED, compresslevel=9) as zf:
    zf.write(r"C:\python37\python.exe")
    zf.write(r"C:\python37\python37.dll")
    zf.write(r"C:\python37\python3.dll")
     
# this time the archive members are compressed

Archiving members with alternate names

import zipfile
 
'''
A member can also be given a different name as it is added.
With tarfile this takes two steps: call t.gettarinfo("old.py", arcname="new.py"), then t.addfile.
With ZipFile, simply pass the alternate name to write: zf.write("old.py", arcname="new.py")
'''
with zipfile.ZipFile("python.zip", "w") as zf:
    zf.write(r"C:\python37\python.exe", "PYTHON.EXE")
    zf.write(r"C:\python37\python37.dll", "PYTHON37.DLL")
    zf.write(r"C:\python37\python3.dll", "PYTHON3.DLL")
 
with zipfile.ZipFile("python.zip", "r") as zf:
    print(zf.namelist())  # ['PYTHON.EXE', 'PYTHON37.DLL', 'PYTHON3.DLL']

Writing data from sources other than files

import zipfile
 
'''
Sometimes a zip archive must be written using data from a source other than an existing file,
rather than first writing the data to a file and then adding that file to the zip archive.
'''
msg = b"this data did not exist in a file"
 
with zipfile.ZipFile("writestr.zip", "w") as zf:
    zf.writestr("from_string.txt", msg)
'''
from_string.txt does not exist on disk; writestr creates the member, writes msg into it, and adds it to the zip archive.
This is simpler than the tarfile equivalent, which creates a TarInfo instance with tarfile.TarInfo(name),
puts the encoded bytes into a BytesIO, and calls addfile with both the TarInfo instance and the BytesIO buffer.
 
With zipfile, a single call to writestr is enough
'''
with zipfile.ZipFile("writestr.zip", "r") as zf:
    print(zf.namelist())  # ['from_string.txt']
    print(zf.read("from_string.txt"))  # b'this data did not exist in a file'

Append to archive

import zipfile
 
'''
As with tarfile, appending only requires changing the mode from "w" to "a".
'''
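A minimal sketch of appending (the archive name here is made up): create an archive with mode "w", then reopen it with mode "a" to add another member:

```python
import zipfile

# create an archive with one member
with zipfile.ZipFile("append_demo.zip", "w") as zf:
    zf.writestr("first.txt", b"one")

# mode "a" keeps the existing members and appends new ones
with zipfile.ZipFile("append_demo.zip", "a") as zf:
    zf.writestr("second.txt", b"two")

with zipfile.ZipFile("append_demo.zip", "r") as zf:
    names = zf.namelist()
print(names)  # ['first.txt', 'second.txt']
```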


Origin www.cnblogs.com/traditional/p/11876817.html