The dd, split, and csplit commands

The most commonly used tool on Linux for generating and slicing files is dd. It is feature-rich, but it cannot extract data by line, and it cannot directly split a file evenly by size or by line count (unless it is driven by a loop). The other two data-splitting tools, split and csplit, handle these requirements easily. csplit is an enhanced version of split.

When dealing with very large files, an efficient approach is to cut the large file into several small pieces, let multiple processes or threads work on the pieces, and finally merge the results. The sort command, for example, cuts a large file into multiple temporary small files as part of its underlying algorithm.
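The split, process, merge idea can be sketched in a few lines of shell. This is only an illustration under assumed names: big.txt, the part_ prefix, and the "count lines containing 7" task are all hypothetical stand-ins for a real workload.

```shell
# Work in a scratch directory so nothing else is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# A 1000-line sample file stands in for the "very large" file.
seq 1 1000 > big.txt

# Cut it into pieces of 250 lines each: part_aa, part_ab, ...
split -l 250 big.txt part_

# Process every piece in its own background job.
# Each job writes its result to a separate file.
for piece in part_??; do
    grep -c '7' "$piece" > "$piece.count" &
done
wait    # block until all background jobs have finished

# Merge step: add up the per-piece counts.
total=$(cat part_??.count | awk '{ sum += $1 } END { print sum }')
echo "lines containing 7: $total"
```

The merged total should match what `grep -c 7 big.txt` reports on the unsplit file.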

dd command

dd reads data from the file specified by if and writes it to the file specified by of. bs specifies the block size used for reading and writing, and count specifies the number of blocks to copy; bs multiplied by count is the total size of the output file. skip specifies how many blocks at the start of the if file to skip when reading, and seek specifies how many blocks at the start of the of file to skip before writing.
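A tiny example may make skip and seek easier to picture. The file names src.bin and dst.bin are hypothetical, and conv=notrunc is an extra dd option not discussed in this article; it is added here so that dd leaves the rest of the output file in place instead of truncating it.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf 'AAAABBBBCCCCDDDD' > src.bin    # four blocks of 4 bytes
printf '................' > dst.bin    # 16 placeholder bytes

# bs=4 count=1 : copy exactly one 4-byte block
# skip=2       : ignore the first 2 blocks of src.bin, so reading starts at "CCCC"
# seek=1       : ignore the first 1 block of dst.bin, so writing starts at byte 4
# conv=notrunc : do not truncate dst.bin after the written block
dd if=src.bin of=dst.bin bs=4 count=1 skip=2 seek=1 conv=notrunc 2>/dev/null

cat dst.bin; echo
```

After the command, dst.bin holds `....CCCC........`: only the second block of the output was overwritten.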

[root@master nginx]# dd if=/dev/zero of=/tmp/abc.1 bs=1M count=20
[root@master nginx]# ls /tmp/abc.1 -lh
-rw-r--r-- 1 root root 20M 11月 29 16:45 /tmp/abc.1

if is the input file and of is the output file. bs accepts unit suffixes such as c (1 byte), w (2 bytes), b (512 bytes), kB (1000 bytes), K (1024 bytes), MB (1000*1000 bytes), M (1024*1024 bytes), and likewise GB and G. Therefore, do not casually append the letter B to a unit.
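The difference the trailing B makes is easy to check. The sketch below assumes GNU dd (as shipped with CentOS); the output file names are hypothetical.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# One block each, with and without the trailing B.
dd if=/dev/zero of=k.bin  bs=1K  count=1 2>/dev/null   # 1024 bytes
dd if=/dev/zero of=kb.bin bs=1kB count=1 2>/dev/null   # 1000 bytes
dd if=/dev/zero of=m.bin  bs=1M  count=1 2>/dev/null   # 1024*1024 bytes
dd if=/dev/zero of=mb.bin bs=1MB count=1 2>/dev/null   # 1000*1000 bytes

# Print the exact byte count of every file.
wc -c k.bin kb.bin m.bin mb.bin
```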

Suppose the existing file access.log.bak is 271M in size and needs to be split in such a way that it can be restored afterwards, with the first small file being 20M.

[root@master ~]# dd if=/root/access.log.bak of=/root/access.txt bs=1M count=20
20+0 records in
20+0 records out
20971520 bytes (21 MB) copied, 0.651758 s, 32.2 MB/s
[root@master ~]# ls -lh access.txt
-rw-r--r-- 1 root root 20M 11月 29 16:49 access.txt

Now generate the second small file. Because the size of the second file is not known in advance, the count option is not specified. The second file starts at the 20M mark, so the first 20M of access.log.bak must be skipped. With bs=1M, the number of blocks to skip is therefore 20.

[root@master ~]# dd if=/root/access.log.bak of=/root/access1.txt bs=1M skip=20
250+1 records in
250+1 records out
263095966 bytes (263 MB) copied, 10.0785 s, 26.1 MB/s
[root@master ~]# ls access1.txt -lh
-rw-r--r-- 1 root root 251M 11月 29 16:54 access1.txt
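The same skip-based approach can be driven by a loop to cut a file into fixed-size pieces and then restore it. The following is a minimal sketch rather than a definitive script: sample.bak is a small random file standing in for access.log.bak, and the piece.N naming is a hypothetical choice.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# 2,560,000 bytes of random data stand in for the real log file.
head -c 2560000 /dev/urandom > sample.bak

size=$(wc -c < sample.bak)
block=1048576                                # 1 MiB per piece
pieces=$(( (size + block - 1) / block ))     # number of pieces, rounded up

# Piece i is one block long and starts after skipping i blocks.
i=0
while [ "$i" -lt "$pieces" ]; do
    dd if=sample.bak of="piece.$i" bs="$block" count=1 skip="$i" 2>/dev/null
    i=$((i + 1))
done

# Restore by concatenating the pieces in numeric order.
: > restored.bak
i=0
while [ "$i" -lt "$pieces" ]; do
    cat "piece.$i" >> restored.bak
    i=$((i + 1))
done

# cmp prints nothing and returns 0 when the files are identical.
cmp sample.bak restored.bak && echo "restored file is identical"
```

The pieces are concatenated by an explicit counter instead of a `piece.*` glob, because a glob sorts `piece.10` before `piece.2` once there are more than ten pieces.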

split command

The split tool cuts a file into multiple small files. Because it generates multiple files, you must specify the unit of splitting (both splitting by line count and splitting by file size are supported), and the naming of the small files has to be handled as well, for example the file name prefix and suffix. If no prefix is given explicitly, the default prefix is "x".

The following is a description of the command syntax:

split [OPTION]... [INPUT [PREFIX]]

-a N: generate suffixes of length N; the default is 2
-b N: each small file is N in size, i.e. split the file by size. Supports K, M, G, T (multiples of 1024) or KB, MB, GB (multiples of 1000) and so on; the default unit is bytes
-l N: each small file has N lines, i.e. split the file by line count
-d, --numeric-suffixes[=N]: use numeric suffixes instead of the default alphabetic suffixes, starting from N (default 0). A start value can only be given with the long form. For example, numeric suffixes of length 2 are 00, 01, 02
--additional-suffix=STRING: append an additional suffix to each small file, such as ".log". Some older versions do not support this option; CentOS 7.2 already supports it.
-n CHUNKS: split the file according to the CHUNKS specification. The valid forms of CHUNKS are (see the usage below): N, l/N, l/K/N, K/N, r/N, r/K/N
--filter=CMD: do not write the pieces directly to files; instead, pipe each piece to CMD as its input. If CMD needs the name of the output file, split provides it in the $FILE variable. See the example below

INPUT: the input file to split; use "-" to split standard input
PREFIX: the prefix of the small files; if not specified, it defaults to "x"
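As an illustration of the last two options, the sketch below assumes a reasonably recent GNU split. The file nums.txt and the chunk_ and gz_ prefixes are hypothetical. In the CHUNKS forms, N splits into N pieces by bytes, l/N splits into N pieces without breaking lines, r/N distributes lines round-robin, and adding K prints only the K-th piece to standard output.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

seq 1 10 > nums.txt

# l/3: three pieces of roughly equal size, never cutting a line in half.
split -n l/3 nums.txt chunk_

# l/2/3: write only the 2nd of the 3 pieces to standard output, no files.
split -n l/2/3 nums.txt > second.txt

# --filter: every piece of 4 lines is piped to gzip instead of being written
# directly. The single quotes matter: $FILE must be expanded by the shell that
# split starts for each piece (gz_aa, gz_ab, ...), not by the calling shell.
split -l 4 --filter='gzip > $FILE.gz' nums.txt gz_

ls
```

Decompressing the gz_ pieces in order, for example with `gzip -dc gz_aa.gz gz_ab.gz gz_ac.gz`, gives back the original content.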

Basic Usage
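A minimal sketch of everyday usage, assuming GNU split. The file data.txt and the part_ and log_ prefixes are hypothetical names chosen for the example.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

seq 1 100 > data.txt    # 100 lines, 292 bytes

# Split by line count with the default prefix "x": xaa, xab, xac.
split -l 40 data.txt

# Split by size (100 bytes each), numeric suffixes of length 3, prefix "part_":
# part_000, part_001, part_002.
split -b 100 -d -a 3 data.txt part_

# Append ".log" to every piece: log_aa.log, log_ab.log.
split -l 50 --additional-suffix=.log data.txt log_

ls

# Restore: the suffixes sort in order, so a glob concatenates them correctly.
cat part_* > restored.txt
cmp data.txt restored.txt && echo "restored file is identical"
```

Splitting by size can cut a line in the middle, which is harmless when the pieces are only concatenated again, but matters if each piece is processed line by line. In that case -l is the safer choice.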

Origin www.cnblogs.com/liujunjun/p/11958956.html