Splitting large files on Linux

Log file analysis is part of daily work. When a log file is too large, analyzing it on Linux with tools such as vim, cat, grep, and awk becomes a nightmare, in particular because of:

  • Slow execution: the file's contents have to be loaded into memory, which involves a large number of disk reads;

  • High resource consumption: loading a 4G log file takes at least 4G of memory, and what about an even larger one?

  • Results are hard to reuse: the analysis filters the file through pipes and processes the output, so work done on a large file is hard to reuse;

  • Difficult file transfer: when a large file has to be sent to someone else for analysis, its size consumes a lot of bandwidth.

1 The pain of viewing large files

Hadoop, the offline big data processing framework, can handle these scenarios. However, Hadoop jobs take a long time to compute and require writing MapReduce tasks, which makes this approach considerably harder. Hadoop works by cutting a large file into multiple small files, which MapReduce then processes in parallel as multiple tasks. Linux provides an easy-to-use tool, split, that can likewise cut a file into multiple small files.

split provides two ways to cut a file:

  • By number of lines: the -l parameter specifies how many lines each piece gets

  • By size: the -b parameter specifies the size of each piece
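
As a minimal sketch of the two modes, the following generates a throwaway sample file and splits it once by lines and once by bytes. It assumes GNU coreutils split; the file name sample.log and the prefixes by-line- and by-size- are made up for illustration.

```shell
#!/bin/sh
# Work in a scratch directory so nothing in the current directory is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Build a 1000-line sample file (hypothetical name).
seq 1 1000 > sample.log

# Mode 1: at most 300 lines per piece; -d gives numeric suffixes (00, 01, ...).
split -l 300 -d sample.log by-line-

# Mode 2: at most 1K (1024 bytes) per piece.
split -b 1K -d sample.log by-size-

# Count the pieces produced by each mode.
line_pieces=$(ls by-line-* | wc -l)
size_pieces=$(ls by-size-* | wc -l)
echo "by lines: $line_pieces pieces, by size: $size_pieces pieces"
```

In both modes the last piece simply holds whatever is left over, so it is usually smaller than the others.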

2.1 Splitting by number of lines

There is no large log on hand locally, so a small log is used for the demonstration. The output file name prefix is set to split-line, and the -d parameter makes the suffixes numeric.

[root@iZ1la3d1xbmukrZ ~]# wc -l err_20190907.log
3427 err_20190907.log
[root@iZ1la3d1xbmukrZ ~]# split -l 300 -d --verbose err_20190907.log split-line
creating file ‘split-line00’
creating file ‘split-line01’
creating file ‘split-line02’
creating file ‘split-line03’
creating file ‘split-line04’
creating file ‘split-line05’
creating file ‘split-line06’
creating file ‘split-line07’
creating file ‘split-line08’
creating file ‘split-line09’
creating file ‘split-line10’
creating file ‘split-line11’
[root@iZ1la3d1xbmukrZ ~]# ls -lh split-line0[0-9]
-rw-r--r-- 1 root root 28K Mar 11 19:50 split-line00
-rw-r--r-- 1 root root 27K Mar 11 19:50 split-line01
-rw-r--r-- 1 root root 24K Mar 11 19:50 split-line02
-rw-r--r-- 1 root root 24K Mar 11 19:50 split-line03
-rw-r--r-- 1 root root 23K Mar 11 19:50 split-line04
-rw-r--r-- 1 root root 18K Mar 11 19:50 split-line05
-rw-r--r-- 1 root root 26K Mar 11 19:50 split-line06
-rw-r--r-- 1 root root 25K Mar 11 19:50 split-line07
-rw-r--r-- 1 root root 24K Mar 11 19:50 split-line08
-rw-r--r-- 1 root root 24K Mar 11 19:50 split-line09

Once the line count is specified, split does the cutting automatically, starting a new file every 300 lines, and the -d parameter names the files with numeric suffixes. After cutting, each file is around 24K, which makes it much easier to parse. If this produces too many files, you can increase the number of lines per piece to keep the analysis manageable.
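
The number of files follows directly from the line counts: with 3427 lines and 300 lines per piece, split produces 11 full pieces plus one partial piece, which matches split-line00 through split-line11. A small sketch of that arithmetic using shell integer arithmetic (the variable names are illustrative):

```shell
#!/bin/sh
total_lines=3427      # from `wc -l err_20190907.log`
lines_per_piece=300   # the value passed to -l

# Ceiling division: round up so a partial last piece is counted.
pieces=$(( (total_lines + lines_per_piece - 1) / lines_per_piece ))

# Lines left over for the final piece.
last_piece_lines=$(( total_lines - (pieces - 1) * lines_per_piece ))

echo "$pieces pieces, last piece has $last_piece_lines lines"
# prints: 12 pieces, last piece has 127 lines
```

By the same arithmetic, raising -l to 1000 would give 4 pieces for this file.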

2.2 Splitting by size

Besides cutting by number of lines, split also supports cutting by file size, specified with the -b parameter. The size accepts the units K, M, G, T, P, E, and Z. The following demonstrates cutting into 30K pieces:

[root@iZ1la3d1xbmukrZ ~]# split -b 30K -d --verbose err_20190907.log split-size
creating file ‘split-size00’
creating file ‘split-size01’
creating file ‘split-size02’
creating file ‘split-size03’
creating file ‘split-size04’
creating file ‘split-size05’
creating file ‘split-size06’
creating file ‘split-size07’
creating file ‘split-size08’
creating file ‘split-size09’
[root@iZ1la3d1xbmukrZ ~]# ll -h
total 916K
-rw-r--r--  1 root root 273K Mar 11 19:47 err_20190907.log
-rw-r--r--  1 root root  28K Mar 11 19:50 split-line00
-rw-r--r--  1 root root  27K Mar 11 19:50 split-line01
-rw-r--r--  1 root root  24K Mar 11 19:50 split-line02
-rw-r--r--  1 root root  24K Mar 11 19:50 split-line03
-rw-r--r--  1 root root  23K Mar 11 19:50 split-line04
-rw-r--r--  1 root root  18K Mar 11 19:50 split-line05
-rw-r--r--  1 root root  26K Mar 11 19:50 split-line06
-rw-r--r--  1 root root  25K Mar 11 19:50 split-line07
-rw-r--r--  1 root root  24K Mar 11 19:50 split-line08
-rw-r--r--  1 root root  24K Mar 11 19:50 split-line09
-rw-r--r--  1 root root  24K Mar 11 19:50 split-line10
-rw-r--r--  1 root root 9.8K Mar 11 19:50 split-line11
-rw-r--r--  1 root root  30K Mar 11 19:52 split-size00
-rw-r--r--  1 root root  30K Mar 11 19:52 split-size01
-rw-r--r--  1 root root  30K Mar 11 19:52 split-size02
-rw-r--r--  1 root root  30K Mar 11 19:52 split-size03
-rw-r--r--  1 root root  30K Mar 11 19:52 split-size04
-rw-r--r--  1 root root  30K Mar 11 19:52 split-size05
-rw-r--r--  1 root root  30K Mar 11 19:52 split-size06
-rw-r--r--  1 root root  30K Mar 11 19:52 split-size07
-rw-r--r--  1 root root  30K Mar 11 19:52 split-size08
-rw-r--r--  1 root root 2.8K Mar 11 19:52 split-size09
drwxr-xr-x 13 root root 4.0K Mar  3 18:03 utils
[root@iZ1la3d1xbmukrZ ~]#
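
One thing to be aware of with -b: pieces are cut at exact byte offsets, so a log line can end up divided between two pieces. GNU split also has a -C (--line-bytes) option that caps the piece size while keeping lines whole. A minimal sketch on a generated file, assuming GNU coreutils (sample.log and the prefixes bytes- and lines- are hypothetical names):

```shell
#!/bin/sh
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# 1000 identical records of 10 bytes each (9 characters plus a newline).
yes 'log-entry' | head -n 1000 > sample.log

# -b cuts at exact byte offsets: 1024 is not a multiple of 10,
# so the first piece ends in the middle of a line.
split -b 1K -d sample.log bytes-

# -C keeps lines whole while staying at or under 1K per piece.
split -C 1K -d sample.log lines-

# Size in bytes of the first piece from each run.
bytes_first=$(wc -c < bytes-00)
lines_first=$(wc -c < lines-00)
echo "-b first piece: $bytes_first bytes, -C first piece: $lines_first bytes"
```

For line-oriented logs that will be fed to grep or awk afterwards, -C or -l is usually the safer choice, since no record is broken in two.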

2.3 Merging multiple files

split cuts a large file into multiple small files. What if you need to combine multiple small files back into one?
You can use cat with output redirection. The following demonstrates merging two small files into one:

[root@iZ1la3d1xbmukrZ ~]# cat split-line00 split-line01 >two-file-merge
[root@iZ1la3d1xbmukrZ ~]# ll
total 624
-rw-r--r--  1 root root 279331 Mar 11 19:47 err_20190907.log
-rw-r--r--  1 root root  28017 Mar 11 19:50 split-line00
-rw-r--r--  1 root root  27386 Mar 11 19:50 split-line01
-rw-r--r--  1 root root  24354 Mar 11 19:50 split-line02
-rw-r--r--  1 root root  24409 Mar 11 19:50 split-line03
-rw-r--r--  1 root root  23434 Mar 11 19:50 split-line04
-rw-r--r--  1 root root  18207 Mar 11 19:50 split-line05
-rw-r--r--  1 root root  26139 Mar 11 19:50 split-line06
-rw-r--r--  1 root root  25057 Mar 11 19:50 split-line07
-rw-r--r--  1 root root  24536 Mar 11 19:50 split-line08
-rw-r--r--  1 root root  23926 Mar 11 19:50 split-line09
-rw-r--r--  1 root root  23863 Mar 11 19:50 split-line10
-rw-r--r--  1 root root  10003 Mar 11 19:50 split-line11
-rw-r--r--  1 root root  55403 Mar 11 19:51 two-file-merge

Merging works by reading the files and redirecting the output, so it runs into the same performance problems with large files. Use it only when needed.
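
To rebuild the whole file rather than just two pieces, a shell glob can hand every piece to cat in order, since the numeric suffixes sort correctly. A minimal sketch on a generated file, with cmp used to confirm that the merged result is identical to the original (the file names here are illustrative):

```shell
#!/bin/sh
workdir=$(mktemp -d)
cd "$workdir" || exit 1

seq 1 1000 > sample.log
split -l 300 -d sample.log piece-

# The glob expands in sorted order: piece-00 piece-01 piece-02 piece-03.
cat piece-* > merged.log

# cmp -s exits with status 0 only when the files are byte-for-byte identical.
if cmp -s sample.log merged.log; then
    result=identical
else
    result=different
fi
echo "merged file is $result"
```

The same check is useful after transferring pieces to another machine, to make sure nothing was lost or reordered along the way.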

Origin www.cnblogs.com/dalianpai/p/12464980.html