I have files in two directories and want to check whether the files in one directory are identical to those in the other. The problem is that the files in one of the directories are gzipped. The only way I know to do this is to decompress each file, run diff in bash, then compress the file again.
There are ~200 files of about 5 GB each, so I'd like to avoid that if possible.
Is there another way to do this? Perhaps in Python (3)? I found this module: https://docs.python.org/3/library/filecmp.html
I'm not sure how to compare a gzip file with a regular file, since one will be read in as bytes and the other as unicode.
import gzip, filecmp
path_1 = "path/to/query_1.txt"
path_2 = "path/to/query_2.txt.gz"
In bash:
diff path/to/query_1.txt <(zcat path/to/query_2.txt.gz)
<(command) is process substitution: it connects the enclosed command's standard output to a filename that another process can then open and read from. Bare-bones /bin/sh does not understand it, but bash, zsh and ksh all do.
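Since the question also asks about Python, here is a minimal sketch of the same streaming idea using only the standard gzip module. It opens both files in binary mode, so both sides are read as bytes and the bytes-versus-unicode concern does not come up. The function name same_content and the chunk_size parameter are made up for illustration; nothing is written to disk, and only one chunk per file is held in memory at a time.

```python
import gzip


def same_content(plain_path, gz_path, chunk_size=1 << 20):
    """Return True if the plain file holds exactly the bytes that the
    gzip file decompresses to. Both files are streamed in chunks."""
    # "rb" on both sides: compare raw bytes, no text decoding involved.
    with open(plain_path, "rb") as plain, gzip.open(gz_path, "rb") as packed:
        while True:
            a = plain.read(chunk_size)
            b = packed.read(chunk_size)
            if a != b:
                # Contents differ, or one file ended before the other.
                return False
            if not a:
                # Both reads returned b"": both files ended together.
                return True
```

With the paths from the question, this would be called as same_content("path/to/query_1.txt", "path/to/query_2.txt.gz"). Unlike diff, it only reports whether the contents match, not where they differ.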