When working with Linux systems, you will encounter several compression tools: gzip, bzip2, zip, xz, and their corresponding decompression tools. For the usage of these tools and a comparison of their compression ratios and compression times, see: Learning Archive Compression Tools in Linux.
So what is pigz? Simply put, it is gzip with support for parallel compression. By default, pigz uses the number of online logical CPUs for concurrent compression; if that number cannot be detected, it falls back to 8 threads. You can also specify the thread count explicitly with -p. Note that its CPU usage is correspondingly high.
Official website: http://zlib.net/pigz
Without further ado, let's start testing.
$ yum install pigz
$ pigz --help
Usage: pigz [options] [files ...]
will compress files in place, adding the suffix '.gz'. If no files are
specified, stdin will be compressed to stdout. pigz does what gzip does,
but spreads the work over multiple processors and cores when compressing.
Options:
-0 to -9, -11 Compression level (11 is much slower, a few % better)
--fast, --best Compression levels 1 and 9 respectively
-b, --blocksize mmm Set compression block size to mmmK (default 128K)
-c, --stdout Write all processed output to stdout (won't delete)
-d, --decompress Decompress the compressed input
-f, --force Force overwrite, compress .gz, links, and to terminal
-F --first Do iterations first, before block split for -11
-h, --help Display a help screen and quit
-i, --independent Compress blocks independently for damage recovery
-I, --iterations n Number of iterations for -11 optimization
-k, --keep Do not delete original file after processing
-K, --zip Compress to PKWare zip (.zip) single entry format
-l, --list List the contents of the compressed input
-L, --license Display the pigz license and quit
-M, --maxsplits n Maximum number of split blocks for -11
-n, --no-name Do not store or restore file name in/from header
-N, --name Store/restore file name and mod time in/from header
-O --oneblock Do not split into smaller blocks for -11
-p, --processes n Allow up to n compression threads (default is the
number of online processors, or 8 if unknown)
-q, --quiet Print no messages, even on error
-r, --recursive Process the contents of all subdirectories
-R, --rsyncable Input-determined block locations for rsync
-S, --suffix .sss Use suffix .sss instead of .gz (for compression)
-t, --test Test the integrity of the compressed input
-T, --no-time Do not store or restore mod time in/from header
-v, --verbose Provide more verbose output
-V --version Show the version of pigz
-z, --zlib Compress to zlib (.zz) instead of gzip format
-- All arguments after "--" are treated as files
Original directory size
$ du -sh /tmp/hadoop
2.3G /tmp/hadoop
Compress with gzip (1 thread)
# Compression time:
$ time tar -zvcf hadoop.tar.gz /tmp/hadoop
real 0m49.935s
user 0m46.205s
sys 0m3.449s
# Compressed size:
$ du -sh hadoop.tar.gz
410M hadoop.tar.gz
Decompress the gzip-compressed archive
$ time tar xf hadoop.tar.gz
real 0m17.226s
user 0m14.647s
sys 0m4.957s
Compress with pigz (4 threads)
# Compression time:
$ time tar -cf - /tmp/hadoop | pigz -p 4 > hadoop.tgz
real 0m13.596s
user 0m48.181s
sys 0m2.045s
# Compressed size:
$ du -sh hadoop.tgz
411M hadoop.tgz
Decompress the pigz-compressed archive
$ time pigz -p 4 -d hadoop.tgz
real 0m17.508s
user 0m12.973s
sys 0m5.037s
As you can see, pigz cut the compression time by more than two-thirds compared to gzip, but its CPU consumption is several times higher. My test machine is only a 4-thread virtual machine, and during compression pigz drove CPU usage to nearly 100%. pigz is therefore a good fit for scenarios that demand high compression throughput and can tolerate a brief spike in CPU usage.
Of course, pigz does not keep getting faster as the thread count increases; there is a bottleneck. According to one comparison published online: 8 threads are 41.2% faster than 4 threads, 16 threads are 27.9% faster than 8 threads, and 32 threads are only about 3% faster than 16 threads. The higher the thread count, the smaller the marginal speedup. You can test this further yourself.
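Those diminishing returns can be checked empirically with a small loop. A rough benchmark sketch, assuming pigz is installed; the input file, its size, and the thread counts below are arbitrary placeholders, not the figures from the comparison above:

```shell
#!/bin/bash
# Time pigz at several thread counts against the same input file.
# Output goes to /dev/null so only compression speed is measured.
INPUT=/tmp/pigz-bench.bin
dd if=/dev/urandom of="$INPUT" bs=1M count=32 2>/dev/null
for n in 1 2 4 8; do
    echo "threads=$n"
    time pigz -p "$n" -c "$INPUT" > /dev/null
done
```

On a 4-thread VM like the one tested above, you would expect little or no gain beyond -p 4.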
Translated from:
Linux command: pigz multi-threaded compression tool
http://www.ywnds.com/?p=10332
Reference:
tar+pigz+ssh for big-data compression and transfer