Python: handling tar.bz2 files in memory

If a tar.bz2 file is downloaded over the network, you can decompress it and read its contents directly in memory, rather than saving the archive to disk, decompressing it there, and then reading the result. This avoids the extra disk I/O.

For tar files compressed with gzip, see: https://stackoverflow.com/questions/15352668/download-and-decompress-gzipped-file-in-memory
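
As a rough illustration of the gzip case, here is a minimal sketch that reads one member from a tar.gz archive held in memory. It is not taken from the linked answer; the helper names `read_member_from_tar_gz` and `build_tar_gz` are hypothetical, and the archive is built locally so that the example runs without a network connection.

```python
import io
import tarfile


def read_member_from_tar_gz(data: bytes, member_name: str) -> bytes:
    """Read one member from a gzip-compressed tar archive held in memory."""
    # "r:gz" tells tarfile to expect gzip compression.
    with tarfile.open(fileobj=io.BytesIO(data), mode="r:gz") as tar:
        member = tar.extractfile(member_name)  # KeyError if the name is missing
        if member is None:  # directories and similar entries have no file body
            raise ValueError(f"{member_name!r} is not a regular file")
        return member.read()


def build_tar_gz(files: dict) -> bytes:
    """Build a small tar.gz archive in memory (for demonstration only)."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as tar:
        for name, payload in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    return buffer.getvalue()


archive = build_tar_gz({"res_test.csv": b"a,b\n1,2\n"})
print(read_member_from_tar_gz(archive, "res_test.csv"))
```

With a real download, `data` would be the response body, for example `requests.get(url).content`.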

For tar files compressed with bz2, see: https://stackoverflow.com/questions/46291529/how-to-decompress-tar-bz2-in-memory-with-python

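import tarfile
from io import BytesIO

import requests

# Placeholder: replace with the URL that serves your tar.bz2 archive.
url = "https://example.com/archive.tar.bz2"
filename = "res_test.csv"  # name of the member to read from the archive


def decompress_tar_bz2_from_net(url, filename):
    """
    Download a tar.bz2 archive and read one member entirely in memory,
    instead of saving the archive to disk and decompressing it there.

    :param url: URL of the tar.bz2 archive
    :param filename: name of the member inside the archive
    :return: contents of the member, as bytes
    """
    response = requests.get(url)
    response.raise_for_status()  # fail early on HTTP errors
    fileobj = BytesIO(response.content)
    # "r:bz2" opens the in-memory buffer as a bzip2-compressed tar archive.
    with tarfile.open(fileobj=fileobj, mode="r:bz2") as tar:
        # extractfile() raises KeyError if the member does not exist.
        contents = tar.extractfile(filename).read()
    return contents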
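
The function above reads a single named member. If the member names are not known in advance, one possible variation is to read every regular file in the archive into a dictionary. The sketch below assumes the archive bytes are already in memory; `read_all_members` is a hypothetical helper, and the sample archive is built locally so the code runs offline.

```python
import io
import tarfile


def read_all_members(data: bytes) -> dict:
    """Return {member name: contents} for every regular file in a tar.bz2."""
    result = {}
    with tarfile.open(fileobj=io.BytesIO(data), mode="r:bz2") as tar:
        for member in tar.getmembers():
            if member.isfile():  # skip directories, links, and so on
                result[member.name] = tar.extractfile(member).read()
    return result


# Build a small tar.bz2 archive in memory so the sketch runs offline.
sample = {"res_test.csv": b"a,b\n1,2\n", "notes.txt": b"hello"}
buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode="w:bz2") as tar:
    for name, payload in sample.items():
        info = tarfile.TarInfo(name=name)
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))

files = read_all_members(buffer.getvalue())
print(sorted(files))
```

Keeping the download separate from the parsing also makes the parsing step easy to test without network access.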

For more usage examples, see: https://github.com/buxizhizhoum/tool_scripts/blob/master/app/bin/decompress_in_memory.py
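
One further variation, not taken from the linked script: for large archives, `tarfile` can also read from a non-seekable stream using the `"r|bz2"` mode, so the compressed archive does not have to be buffered in full first. The sketch below is a minimal illustration under that assumption; `iter_tar_bz2_stream` and `OneWayStream` are hypothetical names, and the stream is simulated locally instead of coming from the network.

```python
import io
import tarfile


def iter_tar_bz2_stream(stream):
    """Yield (name, contents) for each regular file, reading sequentially."""
    # "r|bz2" treats the input as a forward-only stream of tar blocks,
    # so members must be consumed in the order they appear.
    with tarfile.open(fileobj=stream, mode="r|bz2") as tar:
        for member in tar:
            if member.isfile():
                yield member.name, tar.extractfile(member).read()


class OneWayStream:
    """Stand-in for a network stream: supports read() only, no seeking."""

    def __init__(self, data: bytes):
        self._buffer = io.BytesIO(data)

    def read(self, size=-1):
        return self._buffer.read(size)


# Build a small tar.bz2 archive in memory so the sketch runs offline.
buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode="w:bz2") as tar:
    for name, payload in [("a.txt", b"1"), ("b.txt", b"22")]:
        info = tarfile.TarInfo(name=name)
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))

members = list(iter_tar_bz2_stream(OneWayStream(buffer.getvalue())))
print(members)
```

With `requests`, the stream would presumably be `requests.get(url, stream=True).raw`. Each member's contents are still read into memory one at a time, so this mainly helps when the archive as a whole is large.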


Origin blog.csdn.net/zhizhengguan/article/details/130428215