Text-processing tools: text analysis tools

Text analysis tools

Text statistics: wc
Sort text: sort
Compare files: diff and patch

(A) Collect text statistics: wc

wc counts the lines, words, characters and bytes of a file. If no file is given, it reads its data from STDIN.

Common options
-l count only lines
-w count only words
-c count only bytes
-m count only characters
-L show the length of the longest line (a GNU extension, not part of POSIX)

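The options can be tried without the files used in this article. The sketch below feeds a made-up three-line sample to wc on STDIN and prints each count on its own; the sample text and the function names count and wc_summary are illustrative assumptions. For plain ASCII text the character and byte counts are equal.

```shell
# A made-up three-line sample; printf adds the final newline below.
sample='hello world
foo bar baz
x'

# count OPTION: run wc with one option on the sample, read from STDIN.
# tr removes the padding spaces that some wc implementations print.
count() {
    printf '%s\n' "$sample" | wc "$1" | tr -d ' '
}

wc_summary() {
    echo "lines=$(count -l) words=$(count -w) chars=$(count -m) bytes=$(count -c)"
}

wc_summary    # prints: lines=3 words=6 chars=26 bytes=26
```
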
(1) Show the line, word and byte counts (the default output)

[root@centos72 ~]# wc /app/passwd
 19  27 841 /app/passwd
[root@centos72 ~]# wc /app/f1
4 4 12 /app/f1
[root@centos72 ~]# ll /app/passwd
-rw-r--r--. 1 root root 841 May 7 18:00 /app/passwd
[root@centos72 ~]# ll /app/f1
-rw-r--r--. 1 root root 12 May 7 20:35 /app/f1

The byte counts reported by wc (841 and 12) match the file sizes shown by ll.

(2) Use wc with a pipe

Like head, wc reads from standard input (the keyboard) when no file is given, so it can be combined with a pipe.

wc counts as a word any sequence of characters separated by whitespace.

[root@centos72 ~]# w | wc
      5      41     336
[root@centos72 ~]# w
 20:53:56 up  8:33,  3 users,  load average: 0.16, 0.05, 0.06
USER     TTY      FROM             LOGIN@   IDLE   JCPU   PCPU WHAT
root     tty1                      13Jan19  8:12m  0.28s  0.28s -bash
root     pts/0    192.168.137.1    15:13    2:39m  0.10s  0.10s -bash
root     pts/1    192.168.137.1    12:41    4.00s  2.00s  0.02s w

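[root@centos72 ~]# echo 0 1 2 3 > /app/f4
[root@centos72 ~]# /app/f4 | wc
-bash: /app/f4: Permission denied
      0       0       0
[root@centos72 ~]# cat /app/f4 | wc
      1       4       8
[root@centos72 ~]# cat /app/f4
0 1 2 3
[root@centos72 ~]# echo 0 1 2 3 > /app/f5
[root@centos72 ~]# cat /app/f5
0 1 2 3
[root@centos72 ~]# cat /app/f5 | wc
      1       4       8

Typing /app/f4 on its own tells the shell to execute the file, which fails with "Permission denied" because the file is not executable; wc then receives no input and reports 0 0 0. Use cat to send the file's contents into the pipe instead. The 8 bytes are the seven characters of "0 1 2 3" plus the trailing newline.
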
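Besides a pipe, wc can also take its input from a file argument or a < redirection. The sketch below writes the same 0 1 2 3 line to a temporary file (the variable name demo_file is illustrative) and compares the three forms; only the file-argument form prints the file name.

```shell
# Write the sample line to a temporary file; remove it when the shell exits.
demo_file=$(mktemp)
trap 'rm -f "$demo_file"' EXIT
printf '0 1 2 3\n' > "$demo_file"

# File argument: wc opens the file itself and prints its name after the count.
wc -c "$demo_file"

# Redirection and pipe: wc reads STDIN, so only the count is printed.
wc -c < "$demo_file"
cat "$demo_file" | wc -c
```
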
(3) Count lines, words, bytes and characters separately

[root@centos72 ~]# cat /app/passwd | wc -l
19
[root@centos72 ~]# cat /app/passwd | wc -w
27
[root@centos72 ~]# cat /app/passwd | wc -c
841
[root@centos72 ~]# cat /app/passwd | wc -m
841

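Note that bytes and characters are not the same thing: the byte count is the size of the data as stored in the file. A Chinese character is one character, but in UTF-8 it takes several bytes.

In the example below the file holds 4 bytes but only 2 characters: the character 我 takes 3 bytes and the trailing newline 1 byte. wc -m counts characters according to the current locale; in a UTF-8 locale the result is 2.

[root@centos72 ~]# echo 我 > f1
[root@centos72 ~]# wc f1
1 1 4 f1
[root@centos72 ~]# wc -m f1
2 f1
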
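To see the individual bytes, use hexdump. The file f2 holds another single Chinese character (你) followed by a newline: the character is stored as the three bytes e4 bd a0 and the newline as 0a.

[root@centos72 ~]# hexdump -C f2
00000000  e4 bd a0 0a                                       |....|
00000004
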
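The byte/character difference can be reproduced without typing Chinese input. This sketch writes 我 as its three UTF-8 bytes with printf octal escapes (so the script itself stays ASCII), counts the bytes, dumps them with od (a POSIX tool; hexdump -C shows the same bytes) and counts characters. The locale name C.UTF-8 is an assumption: if it is not installed, wc -m stays in the C locale, counts bytes and prints 4 instead of 2.

```shell
# char_line: the UTF-8 bytes of 我 (e6 88 91) followed by a newline.
char_line() {
    printf '\346\210\221\n'
}

# The byte count does not depend on the locale: 3 + 1 = 4.
char_line | wc -c

# Show the raw bytes in hexadecimal.
char_line | od -An -tx1

# The character count depends on the locale (assumed name C.UTF-8).
char_line | LC_ALL=C.UTF-8 wc -m
```
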
Count the login sessions on the system (who prints one line per session):

[root@centos72 ~]# who
root     tty1         2019-01-13 00:35
root     pts/0        2019-05-07 15:13 (192.168.137.1)
root     pts/1        2019-05-07 12:41 (192.168.137.1)
[root@centos72 ~]# who | wc -l
3

(4) -L: show the length of the longest line

[root@centos72 ~]# who
root     tty1         2019-01-13 00:35
root     pts/0        2019-05-07 15:13 (192.168.137.1)
root     pts/1        2019-05-07 12:41 (192.168.137.1)
[root@centos72 ~]# who | wc -l
3
[root@centos72 ~]# who | wc -L
54

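Because wc -L is a GNU extension, it may be missing on other systems. The sketch below runs a portable awk alternative next to wc -L on made-up sample lines (the function names sample_lines and longest_awk are illustrative). The two can disagree: GNU wc -L measures display width, so a tab counts as up to 8 columns, while awk's length() counts characters.

```shell
# Made-up sample: the longest line, "abcde", has 5 characters.
sample_lines() {
    printf '%s\n' 'ab' 'abcde' 'abc'
}

# Portable: keep the largest length() seen; "+ 0" prints 0 for empty input.
longest_awk() {
    awk '{ if (length($0) > max) max = length($0) } END { print max + 0 }'
}

sample_lines | longest_awk    # prints 5

# GNU-specific: maximum display width.
sample_lines | wc -L
```
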
(B) Sort text: sort

sort writes the sorted text to STDOUT; the original file is not modified.
sort [options] file(s)
Common options
-r reverse the order (largest first)
-n compare by numeric value
-f ignore (fold) upper and lower case
-u (unique) remove duplicate lines from the output
-t c use the character c as the field delimiter
-k X sort on field X, with fields split by the -t delimiter c. -k X makes the key run from field X to the end of the line; -k X,X restricts it to field X alone. -k can be given several times.

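The difference between text and numeric comparison can be seen on a few passwd-style lines. In this sketch the lines and the function name sample_passwd are made up, the key is limited to the third field with -k3,3, and LC_ALL=C is set for the text sort so the order is byte by byte and does not depend on the locale.

```shell
# Made-up passwd-style lines: name:password:UID:GID.
sample_passwd() {
    printf '%s\n' \
        'root:x:0:0' \
        'wang:x:1000:1000' \
        'bin:x:1:1' \
        'operator:x:11:0' \
        'daemon:x:2:2'
}

# Text order on field 3: 0, 1, 1000, 11, 2.
sample_passwd | LC_ALL=C sort -t: -k3,3

# Numeric order on field 3 (the n modifier on the key): 0, 1, 2, 11, 1000.
sample_passwd | sort -t: -k3,3n
```
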
(1) Default: compare lines as text

By default sort compares lines character by character, and without a file name it reads from the keyboard (STDIN). Here -t: sets the separator to a colon and -k3 sorts on the third field, the UID. Because the UIDs are compared as text, 1000 sorts before 11 and 192 before 2.

[root@centos72 ~]# sort -t: -k3 /app/passwd
root:x:0:0:root:/root:/bin/bash
wang:x:1000:1000:wang:/home/wang:/bin/bash
operator:x:11:0:operator:/root:/sbin/nologin
bin:x:1:1:bin:/bin:/sbin/nologin
games:x:12:100:games:/usr/games:/sbin/nologin
ftp:x:14:50:FTP User:/var/ftp:/sbin/nologin
systemd-network:x:192:192:systemd Network Management:/:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin
adm:x:3:4:adm:/var/adm:/sbin/nologin
lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin
sync:x:5:0:sync:/sbin:/bin/sync
shutdown:x:6:0:shutdown:/sbin:/sbin/shutdown
halt:x:7:0:halt:/sbin:/sbin/halt
sshd:x:74:74:Privilege-separated SSH:/var/empty/sshd:/sbin/nologin
mail:x:8:12:mail:/var/spool/mail:/sbin/nologin
dbus:x:81:81:System message bus:/:/sbin/nologin
postfix:x:89:89::/var/spool/postfix:/sbin/nologin
polkitd:x:999:998:User for polkitd:/:/sbin/nologin
nobody:x:99:99:Nobody:/:/sbin/nologin

(2) -n: sort by numeric value

[root@centos72 ~]# sort -n -t: -k3 /app/passwd
root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin
adm:x:3:4:adm:/var/adm:/sbin/nologin
lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin
sync:x:5:0:sync:/sbin:/bin/sync
shutdown:x:6:0:shutdown:/sbin:/sbin/shutdown
halt:x:7:0:halt:/sbin:/sbin/halt
mail:x:8:12:mail:/var/spool/mail:/sbin/nologin
operator:x:11:0:operator:/root:/sbin/nologin
games:x:12:100:games:/usr/games:/sbin/nologin
ftp:x:14:50:FTP User:/var/ftp:/sbin/nologin
sshd:x:74:74:Privilege-separated SSH:/var/empty/sshd:/sbin/nologin
dbus:x:81:81:System message bus:/:/sbin/nologin
postfix:x:89:89::/var/spool/postfix:/sbin/nologin
nobody:x:99:99:Nobody:/:/sbin/nologin
systemd-network:x:192:192:systemd Network Management:/:/sbin/nologin
polkitd:x:999:998:User for polkitd:/:/sbin/nologin
wang:x:1000:1000:wang:/home/wang:/bin/bash

(3) -r: reverse the order (largest first)

[root@centos72 ~]# sort -nr -t: -k3 /app/passwd
wang:x:1000:1000:wang:/home/wang:/bin/bash
polkitd:x:999:998:User for polkitd:/:/sbin/nologin
systemd-network:x:192:192:systemd Network Management:/:/sbin/nologin
nobody:x:99:99:Nobody:/:/sbin/nologin
postfix:x:89:89::/var/spool/postfix:/sbin/nologin
dbus:x:81:81:System message bus:/:/sbin/nologin
sshd:x:74:74:Privilege-separated SSH:/var/empty/sshd:/sbin/nologin
ftp:x:14:50:FTP User:/var/ftp:/sbin/nologin
games:x:12:100:games:/usr/games:/sbin/nologin
operator:x:11:0:operator:/root:/sbin/nologin
mail:x:8:12:mail:/var/spool/mail:/sbin/nologin
halt:x:7:0:halt:/sbin:/sbin/halt
shutdown:x:6:0:shutdown:/sbin:/sbin/shutdown
sync:x:5:0:sync:/sbin:/bin/sync
lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin
adm:x:3:4:adm:/var/adm:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin
bin:x:1:1:bin:/bin:/sbin/nologin
root:x:0:0:root:/root:/bin/bash

Sort on the first field, the user name:

[root@centos72 ~]# sort -t: -k1 /app/passwd
adm:x:3:4:adm:/var/adm:/sbin/nologin
bin:x:1:1:bin:/bin:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin
dbus:x:81:81:System message bus:/:/sbin/nologin
ftp:x:14:50:FTP User:/var/ftp:/sbin/nologin
games:x:12:100:games:/usr/games:/sbin/nologin
halt:x:7:0:halt:/sbin:/sbin/halt
lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin
mail:x:8:12:mail:/var/spool/mail:/sbin/nologin
nobody:x:99:99:Nobody:/:/sbin/nologin
operator:x:11:0:operator:/root:/sbin/nologin
polkitd:x:999:998:User for polkitd:/:/sbin/nologin
postfix:x:89:89::/var/spool/postfix:/sbin/nologin
root:x:0:0:root:/root:/bin/bash
shutdown:x:6:0:shutdown:/sbin:/sbin/shutdown
sshd:x:74:74:Privilege-separated SSH:/var/empty/sshd:/sbin/nologin
sync:x:5:0:sync:/sbin:/bin/sync
systemd-network:x:192:192:systemd Network Management:/:/sbin/nologin
wang:x:1000:1000:wang:/home/wang:/bin/bash

Show only the user name and UID, sorted by UID in descending numeric order:

[root@centos72 ~]# cut -d: -f1,3 /app/passwd | sort -nr -t: -k2
wang:1000
polkitd:999
systemd-network:192
nobody:99
postfix:89
dbus:81
sshd:74
ftp:14
games:12
operator:11
mail:8
halt:7
shutdown:6
sync:5
lp:4
adm:3
daemon:2
bin:1
root:0

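Since -k can be given more than once, a later key only breaks ties left by an earlier one. This sketch sorts made-up passwd-style lines (the function name sample_accounts is illustrative) by GID as a number and, among equal GIDs, by user name; LC_ALL=C keeps the name comparison byte by byte.

```shell
# Made-up passwd-style lines: name:password:UID:GID.
sample_accounts() {
    printf '%s\n' \
        'sync:x:5:0' \
        'root:x:0:0' \
        'games:x:12:100' \
        'halt:x:7:0' \
        'bin:x:1:1'
}

# Key 1: field 4 (GID), numeric. Key 2: field 1 (name), used for ties.
# Expected order: halt, root, sync (GID 0), bin (GID 1), games (GID 100).
sample_accounts | LC_ALL=C sort -t: -k4,4n -k1,1
```
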
Create a test file with one number per line:

[root@centos72 ~]# cat > f3
1
2
3
2
4
5
6
2
3
5

^C^C
[root@centos72 ~]# cat f3
1
2
3
2
4
5
6
2
3
5


By default the lines are sorted as character strings (the empty line in f3 sorts first):

[root@centos72 ~]# sort f3

1
2
2
2
3
3
4
5
5
6

(4) -u: remove duplicate lines from the output

Remove the duplicate numbers:

[root@centos72 ~]# sort -u f3

1
2
3
4
5
6

[root@centos72 ~]# echo 11 >> f3
[root@centos72 ~]# echo 22 >> f3
[root@centos72 ~]# echo 33 >> f3
[root@centos72 ~]# cat f3
1
2
3
2
4
5
6
2
3
5

11
22
33
[root@centos72 ~]# sort -u f3

1
11
2
22
3
33
4
5
6

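Add -n to compare the lines as numbers, so that 11, 22 and 33 sort after 6 instead of between the single digits. The order of the options (-nu or -un) does not matter:

[root@centos72 ~]# sort -nu f3

1
2
3
4
5
6
11
22
33
[root@centos72 ~]# sort -un f3

1
2
3
4
5
6
11
22
33
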
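The same effect can be reproduced on a made-up list of numbers (the function name numbers is illustrative). The sketch compares text and numeric de-duplication and checks that, for whole lines, sort | uniq gives the same result as sort -u. LC_ALL=C is set for the text sorts so the order is byte by byte.

```shell
# A made-up list of numbers with duplicates.
numbers() {
    printf '%s\n' 1 2 3 2 11 22 3 33
}

# Text comparison: 11 sorts right after 1.
numbers | LC_ALL=C sort -u | tr '\n' ' '; echo     # 1 11 2 22 3 33

# Numeric comparison: the two-digit numbers come last.
numbers | sort -un | tr '\n' ' '; echo             # 1 2 3 11 22 33

# For whole lines, sort | uniq matches sort -u.
numbers | LC_ALL=C sort | uniq | tr '\n' ' '; echo
```
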
(C) Remove repeated lines: uniq

uniq removes consecutive duplicate lines from its input.
uniq [OPTION]... [FILE]...
-c: prefix each line with the number of times it occurs in a row
-d: show only lines that are repeated
-u: show only lines that are never repeated
Lines count as repeated only when they are identical and adjacent, so uniq is often combined with sort, which brings identical lines together: sort userlist.txt | uniq -c

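The adjacency rule is easy to check on a small made-up sample (the function name letters is illustrative). In this sketch, uniq alone keeps separate runs of the same line, while sorting first lets uniq and uniq -c see every occurrence together.

```shell
# A made-up sample with repeated lines, not all of them adjacent.
letters() {
    printf '%s\n' a b a a bb bb c bb cc cc
}

# uniq alone keeps the separate runs of "a" and "bb": 7 lines remain.
letters | uniq | wc -l

# After sort, identical lines are adjacent: 5 distinct lines remain.
letters | sort | uniq | wc -l

# uniq -c on sorted input gives the total count of each distinct line.
letters | sort | uniq -c
```
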
(1) Remove consecutive duplicate lines

Create a test file:

[root@centos72 ~]# cat  >  f4
a
b
a
a
bb
bb
c
bb
cc
cc
^C
[root@centos72 ~]# cat f4
a
b
a
a
bb
bb
c
bb
cc
cc

[root@centos72 ~]# uniq f4
a
b
a
bb
c
bb
cc

(2) -c: show how many times each line occurs in a row

[root@centos72 ~]# uniq -c f4
      1 a
      1 b
      2 a
      2 bb
      1 c
      1 bb
      2 cc

(3) -u: show only lines that are not repeated

[root@centos72 ~]# uniq -u f4
a
b
c
bb

(4) -d: show only repeated lines (one copy of each run)

[root@centos72 ~]# uniq -d f4
a
bb
cc

(5) Use uniq in a pipeline

Take the first space-separated field, the client IP address, from the Apache access log:

[root@centos72 ~]# cut -d" " -f1 /var/log/httpd/access_log
192.168.137.1
192.168.137.1
192.168.137.1
192.168.137.1
192.168.137.1
192.168.137.1
192.168.137.1
192.168.137.1
192.168.137.1
192.168.137.1
[root@centos72 ~]# cut -d" " -f1 /var/log/httpd/access_log | uniq
192.168.137.1

Example 1: Extract the IP addresses with the most visits from the access log, in descending order of visits (head keeps the first 10 lines; use head -3 for the top three)

The commands below need an access log that contains real traffic. If the web service has not been running on the test machine, you can instead upload a saved access log from your computer, as done with rz further down.

[root@centos72 ~]# cut -d" " -f1 /var/log/httpd/access_log | sort -n | uniq -c | sort -nr | head | tr -s ' ' | cut -d " " -f3
192.168.137.1

Upload a larger access log from the local computer with rz (part of the lrzsz package):

[root@centos72 ~]# rz

[root@centos72 ~]# ls
aaa  aa.txt  access_log  anaconda-ks.cfg  f1  f2  f3  f4
[root@centos72 ~]# ll -ht
total 14M
-rw-r--r--. 1 root root 25 May 7 22:12 f4
-rw-r--r--. 1 root root 30 May 7 22:04 f3
-rw-r--r--. 1 root root 4 May 7 21:08 f2
-rw-r--r--. 1 root root 4 May 7 21:03 f1
-rw-r--r--. 1 root root 27 May 7 19:11 aa.txt
-rw-r--r--. 1 root root 9 May 7 13:28 aaa
-rw-------. 1 root root 1.6K Jan 13 00:22 anaconda-ks.cfg
-rw-r--r--. 1 root root 14M Dec 1 15:45 access_log
[root@centos72 ~]# ll -hS
total 14M
-rw-r--r--. 1 root root 14M Dec 1 15:45 access_log
-rw-------. 1 root root 1.6K Jan 13 00:22 anaconda-ks.cfg
-rw-r--r--. 1 root root 30 May 7 22:04 f3
-rw-r--r--. 1 root root 27 May 7 19:11 aa.txt
-rw-r--r--. 1 root root 25 May 7 22:12 f4
-rw-r--r--. 1 root root 9 May 7 13:28 aaa
-rw-r--r--. 1 root root 4 May 7 21:03 f1
-rw-r--r--. 1 root root 4 May 7 21:08 f2

Count the visits per IP address and sort by count, largest first:

[root@centos72 ~]# cut -d" " -f1 access_log | sort -n | uniq -c | sort -nr
 159091 172.18.56.3
   4004 192.168.27.6
     24 172.18.0.100

If there are no more than 10 distinct hosts, head can be left out, since it only keeps the first 10 lines:

[root@centos72 ~]# cut -d" " -f1 access_log | sort -n | uniq -c | sort -nr | head
 159091 172.18.56.3
   4004 192.168.27.6
     24 172.18.0.100

Squeeze each run of spaces into a single space with tr -s:

[root@centos72 ~]# cut -d" " -f1 access_log | sort -n | uniq -c | sort -nr | head | tr -s ' '
 159091 172.18.56.3
 4004 192.168.27.6
 24 172.18.0.100

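Use a space as the delimiter and take the third field. Because each line still starts with one space, field 1 is empty, field 2 is the count and field 3 is the IP address:

[root@centos72 ~]# cut -d" " -f1 access_log | sort -n | uniq -c | sort -nr | head | tr -s ' ' | cut -d " " -f3
172.18.56.3
192.168.27.6
172.18.0.100
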
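The pipeline can be tried without a real web server. This sketch runs the same steps on a made-up six-line log in Apache common log format, using documentation addresses (192.0.2.x) and illustrative function names. One deliberate swap: awk '{ print $2 }' replaces tr -s plus cut -f3, because awk splits on runs of blanks and ignores the leading padding; this is an alternative, not the command used above.

```shell
# Made-up access log lines; the client IP is the first field.
sample_log() {
    cat <<'EOF'
192.0.2.5 - - [07/May/2019:20:00:01 +0800] "GET / HTTP/1.1" 200 12
192.0.2.7 - - [07/May/2019:20:00:02 +0800] "GET / HTTP/1.1" 200 12
192.0.2.5 - - [07/May/2019:20:00:03 +0800] "GET /a HTTP/1.1" 404 209
192.0.2.9 - - [07/May/2019:20:00:04 +0800] "GET / HTTP/1.1" 200 12
192.0.2.5 - - [07/May/2019:20:00:05 +0800] "GET / HTTP/1.1" 200 12
192.0.2.7 - - [07/May/2019:20:00:06 +0800] "GET /b HTTP/1.1" 200 30
EOF
}

# top_ips N: the N most frequent client IPs, most frequent first.
# sort groups identical IPs, uniq -c counts them, sort -nr orders by count.
top_ips() {
    cut -d' ' -f1 | sort | uniq -c | sort -nr | head -n "$1" | awk '{ print $2 }'
}

sample_log | top_ips 3    # 192.0.2.5 (3 visits), 192.0.2.7 (2), 192.0.2.9 (1)
```
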
Example 2: Count connections per remote IP

Use ss -nt to find which remote IP addresses hold the most concurrent connections (for example, the top three).

[root@centos72 ~]# ss -tn
State      Recv-Q Send-Q       Local Address:Port                      Peer Address:Port
ESTAB      0      52          192.168.137.72:22                       192.168.137.1:57568
ESTAB      0      0           192.168.137.72:22                       192.168.137.1:58228

Keep only the established connections and turn each run of spaces into a single colon, so the colon becomes the only delimiter:

[root@centos72 ~]# ss -tn | grep ESTAB | tr -s ' ' :
ESTAB:0:52:192.168.137.72:22:192.168.137.1:57568:
ESTAB:0:0:192.168.137.72:22:192.168.137.1:58228:

With the colon as the delimiter, the sixth field is the peer IP address:

[root@centos72 ~]# ss -tn | grep ESTAB | tr -s ' ' : | cut -d: -f6
192.168.137.1
192.168.137.1

Sort numerically, which also places identical addresses next to each other for uniq:

[root@centos72 ~]# ss -tn | grep ESTAB | tr -s ' ' : | cut -d: -f6 | sort -n
192.168.137.1
192.168.137.1

Remove the duplicates with uniq:

[root@centos72 ~]# ss -tn | grep ESTAB | tr -s ' ' : | cut -d: -f6 | sort -n | uniq
192.168.137.1

Add -c to show how many connections each address has:

[root@centos72 ~]# ss -tn | grep ESTAB | tr -s ' ' : | cut -d: -f6 | sort -n | uniq -c
      2 192.168.137.1

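To finish the task from the start of Example 2 (the top three peers), the count can be sorted again and cut to three lines. This sketch runs the pipeline on made-up ss -tn output with documentation addresses, so it works without live connections; the function names sample_ss and peer_counts are illustrative. Turning spaces into colons only works for IPv4 peers, because IPv6 addresses contain colons of their own.

```shell
# Made-up `ss -tn` output with hypothetical addresses.
sample_ss() {
    cat <<'EOF'
State      Recv-Q Send-Q    Local Address:Port      Peer Address:Port
ESTAB      0      52        192.0.2.72:22           192.0.2.1:57568
ESTAB      0      0         192.0.2.72:22           192.0.2.1:58228
ESTAB      0      0         192.0.2.72:80           198.51.100.4:40112
EOF
}

# peer_counts: established connections per peer IP, busiest first,
# keeping at most three peers.
peer_counts() {
    grep ESTAB | tr -s ' ' : | cut -d: -f6 | sort | uniq -c | sort -nr | head -n 3
}

sample_ss | peer_counts    # 2 connections from 192.0.2.1, 1 from 198.51.100.4
```
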

Source: www.cnblogs.com/wang618/p/11063855.html