awk, a useful data processing tool (in detail)

I. The awk editor

1. Working principle:

awk reads text line by line, splits each line into fields (on spaces or tabs by default), stores the fields in built-in variables, and runs the editing commands whose pattern or condition matches the line.

sed usually processes a whole line at a time, whereas awk splits each line into "fields" and processes those. awk also reads its input line by line, and results, including field data, can be output with the print statement. Within an awk command, the logical operators "&&" (and), "||" (or) and "!" (not) are available, as are simple arithmetic operators: +, -, *, /, % and ^ for addition, subtraction, multiplication, division, remainder and exponentiation respectively.
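
As a small sketch of the operators described above (the sample input and the variable names sample and result are made up for illustration), the command below combines a logical condition with arithmetic on the fields:

```shell
# Hypothetical sample input: name, quantity, unit price (space-separated).
sample='apple 3 2
pear 10 5
plum 7 4'

# Keep lines where quantity > 2 AND price is NOT 5, then print the name,
# the total (quantity * price) and the quantity squared.
result=$(printf '%s\n' "$sample" | awk '($2 > 2) && !($3 == 5) {print $1, $2 * $3, $2 ^ 2}')
echo "$result"
```

The pear line is skipped because its price is 5, so only the apple and plum lines are printed.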

2. Command format:

awk  options   'pattern or condition {action}'  file1   file2...
awk  -f   script_file   file1    file2...
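
The second form can be tried with a throwaway script file. This is only a sketch: the script path comes from mktemp and the two input lines are invented.

```shell
# Write a tiny awk program to a temporary script file (path chosen by mktemp).
script=$(mktemp)
cat > "$script" <<'EOF'
# Print the line number and the first field of every line.
{ print NR ": " $1 }
EOF

# Same effect as: awk '{ print NR ": " $1 }'
result=$(printf 'alpha beta\ngamma delta\n' | awk -f "$script")
rm -f "$script"
echo "$result"
```

Keeping the program in a file avoids shell quoting problems once the awk code grows beyond a one-liner.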

3. Common built-in variables of awk (usable directly):

FS: Field separator. Specifies how each line of text is split into fields; the default is a space or tab. Setting it has the same effect as the "-F" option.
NF: The number of fields in the line currently being processed.
NR: The line number (ordinal number) of the line currently being processed.
$0: The entire content of the line currently being processed.
$n: The nth field (column n) of the line currently being processed.
FILENAME: The name of the file being processed.
RS: Record separator. When awk reads data from a file, it splits the data into records according to RS and processes one record at a time. The default value is '\n'.
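
A short sketch of these variables in action, using two made-up colon-separated records shaped like /etc/passwd entries:

```shell
# Made-up records; the second one has an extra field on purpose.
sample='root:x:0:0
alice:x:1000:1000:extra'

# FS is set in BEGIN (same effect as -F ":"). For each line print the line
# number (NR), the field count (NF), the first field and the last field ($NF).
result=$(printf '%s\n' "$sample" | awk 'BEGIN {FS=":"} {print NR, NF, $1, $NF}')
echo "$result"
```

Because NF holds the field count, $NF always refers to the last field of the current line.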

II. How to use awk

1. Output text by line

①Output all content

awk  '{print}'  file
awk  '{print $0}'  file
[root@localhost ~] # awk  '{print}'  shuzi.txt 
one
two
three
four
five
[root@localhost ~] # awk  '{print $0}'  shuzi.txt 
one
two
three
four
five

②Output the content of the specified line

awk 'NR==1, NR==3{print}'  file  # print lines 1 through 3
awk '(NR>=1) && (NR<=3) {print}' file # print lines 1 through 3

awk 'NR==1 || NR==3 {print}' file # print line 1 and line 3

[root@localhost ~] # awk 'NR==1, NR==3{print}' shuzi.txt 
one
two
three
[root@localhost ~] # awk '(NR>=1) && (NR<=3)  {print}' shuzi.txt 
one
two
three
[root@localhost ~] # awk  'NR==1 || NR==3 {print}' shuzi.txt 
one
three

③Output the content of odd or even lines

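awk '(NR%2)==1{print}'  file # remainder 1 after dividing by 2 means odd: print odd-numbered lines
awk '(NR%2)==0{print}'  file # remainder 0 after dividing by 2 means even: print even-numbered lines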
[root@localhost ~] # awk '(NR%2)==1{print}' shuzi.txt 
one
three
five
[root@localhost ~] # awk '(NR%2)==0{print}' shuzi.txt 
two
four

④ Output the content of the line starting or ending with the specified character string

awk '/^root/{print}' /etc/passwd # print lines that start with root
awk '/nologin$/{print}' /etc/passwd # print lines that end with nologin
[root@localhost ~] # awk '/^root/{print}' /etc/passwd
root:x:0:0:root:/root:/bin/bash
[root@localhost ~] # awk '/nologin$/{print}' /etc/passwd
bin:x:1:1:bin:/bin:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin
adm:x:3:4:adm:/var/adm:/sbin/nologin
lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin
mail:x:8:12:mail:/var/spool/mail:/sbin/nologin
operator:x:11:0:operator:/root:/sbin/nologin
games:x:12:100:games:/usr/games:/sbin/nologin
ftp:x:14:50:FTP User:/var/ftp:/sbin/nologin
nobody:x:99:99:Nobody:/:/sbin/nologin
...

⑤ Count the lines that match a specified string

Note: the action in a BEGIN block runs before awk starts processing the input text; awk then processes the text, and finally runs the action in the END block. The END{} block is usually where output statements, such as printing the result, are placed.

awk 'BEGIN {x=0};/\/bin\/bash$/{x++};END {print x}' /etc/passwd
# Count the lines that end with /bin/bash; equivalent to
grep  -c "/bin/bash$"  /etc/passwd
[root@localhost ~] # awk 'BEGIN {x=0};/\/bin\/bash$/{x++};END {print x}' /etc/passwd
2
[root@localhost ~] # grep  -c "/bin/bash$"  /etc/passwd
2
[root@localhost ~] # grep "/bin/bash$"  /etc/passwd
root:x:0:0:root:/root:/bin/bash
muhonghuan:x:1000:1000:muhonghuan:/home/muhonghuan:/bin/bash
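
The same BEGIN / main / END flow can also accumulate values instead of counting matches. A minimal sketch on invented input (the BEGIN initialization is optional, since awk variables start out as 0):

```shell
# BEGIN runs once before any input, the middle block once per line,
# and END once after the last line, where NR holds the total line count.
result=$(printf 'a 10\nb 20\nc 30\n' | awk 'BEGIN {sum = 0} {sum += $2} END {print "total=" sum, "avg=" sum / NR}')
echo "$result"
```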

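2. Output text by field

# Print the third field of each line (fields separated by ":")
awk -F ":" '{print $3}' /etc/passwd

# Print the first and third fields of each line (fields separated by ":")
awk -F ":" '{print $1,$3}' /etc/passwd

# Print the first and third fields of lines whose third field is less than 5
awk -F ":" '$3<5{print $1,$3}' /etc/passwd

# Print lines whose third field is not less than 200
awk -F ":" '!($3<200){print}' /etc/passwd
awk 'BEGIN {FS=":"};{if ($3>=200){print}}'  /etc/passwd

# ($3>$4)?$3:$4 is the ternary operator: if the third field is greater than the fourth, max gets the third field's value; otherwise max gets the fourth field's value
awk -F ":" '{max=($3>$4)?$3:$4; print max}' /etc/passwd

# Print each line with its line number; NR (the number of the line being processed) increases by 1 for every record read
awk -F ":" '{print NR,$0}' /etc/passwd

# Print the first field of lines (split on colons) whose seventh field contains /bash
awk -F ":" '$7~"/bash"{print $1}' /etc/passwd

# Print the first and second fields of lines whose first field contains root and that have 7 fields (NF: number of fields in the current line)
awk -F ":"  '($1~"root")&&(NF==7){print $1,$2}' /etc/passwd

# Print all lines whose seventh field is neither /bin/bash nor /sbin/nologin
awk -F ":" '($7!="/bin/bash")&&($7!="/sbin/nologin"){print}' /etc/passwd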
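
Because /etc/passwd differs from machine to machine, the field tests above can be tried on a fixed sample instead. The four records below are made up for illustration:

```shell
# Invented passwd-style records: name:x:UID:GID:comment:home:shell
sample='root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
sync:x:5:0:sync:/sbin:/bin/sync
demo:x:1000:1000:demo:/home/demo:/bin/bash'

# Names of accounts whose shell (field 7) contains "/bash".
bash_users=$(printf '%s\n' "$sample" | awk -F ":" '$7 ~ "/bash" {print $1}')

# Larger of UID (field 3) and GID (field 4) per line, via the ternary operator.
max_ids=$(printf '%s\n' "$sample" | awk -F ":" '{max = ($3 > $4) ? $3 : $4; print $1, max}')

echo "$bash_users"
echo "$max_ids"
```

Fields that look like numbers are compared numerically, which is why 1000 ranks above 5 here rather than being compared as text.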

3. Invoke Shell commands through pipes and double quotes

① Count the colon-separated segments of a string. Output statements such as printing the result usually go in the END{} block

[root@localhost ~] # echo $PATH | awk 'BEGIN{RS=":"};END{print NR}'
5
[root@localhost ~] # echo $PATH
/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/root/bin

②Count the number of users using bash

[root@localhost ~] # awk -F ":" '/bash$/{print | "wc -l"}' /etc/passwd
2
[root@localhost ~] # grep -c "bash$" /etc/passwd
2
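
The print | "command" form works with any shell command, not only wc. As a sketch on made-up input, every printed field below is sent to a single sort process, which emits its result once the pipe is closed:

```shell
# All print statements share one "sort" process; close() ends the pipe
# so that sort can write its sorted output.
result=$(printf 'pear\napple\nfig\n' | awk '{print $1 | "sort"} END {close("sort")}')
echo "$result"
```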

③View the proportion of memory usage

First inspect the output of free -m, then take the Mem line: divide the third field (used) by the sum of the third and fourth fields (used + free) and multiply by 100 to get the memory usage percentage

[root@localhost ~] # free -m
              total        used        free      shared  buff/cache   available
Mem:           1823         278        1145           9         398        1336
Swap:          4095           0        4095
[root@localhost ~] # free -m | awk '/Mem:/ {print int($3/($3+$4)*100)}'
19
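
To check the arithmetic without depending on the live system, the same expression can be run on the Mem line copied from the free -m output above. The printf variant with one decimal place is an addition for illustration, not part of the original command:

```shell
# Mem line copied from the sample free -m output shown above.
mem_line='Mem:           1823         278        1145           9         398        1336'

# used / (used + free) * 100, truncated to an integer with int().
pct=$(printf '%s\n' "$mem_line" | awk '/Mem:/ {print int($3 / ($3 + $4) * 100)}')

# The same ratio with one decimal place, using printf instead of int().
pct1=$(printf '%s\n' "$mem_line" | awk '/Mem:/ {printf "%.1f\n", $3 / ($3 + $4) * 100}')

echo "$pct $pct1"
```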

④Check the CPU idle rate

Because top is an interactive, continuously refreshing view, -b -n 1 runs it in batch mode for a single iteration so that the output is produced only once. The full command works as follows: print one snapshot (top -b -n 1); keep only the Cpu line (grep Cpu); split it on commas and print the fourth column (awk -F ',' '{print $4}'); then print the first value of that column (awk '{print $1}')

[root@localhost ~] # top -b -n 1 | grep Cpu | awk -F ',' '{print $4}' | awk '{print $1}'
100.0
[root@localhost ~] # top -b -n 1
top - 20:54:28 up  8:28,  2 users,  load average: 0.00, 0.01, 0.05
Tasks: 139 total,   1 running, 138 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.0 us,  6.2 sy,  0.0 ni, 93.8 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
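
The grep and the two awk calls can also be folded into one awk invocation. This is a sketch run against the %Cpu(s) line copied from the snapshot above, so it assumes the same top output layout:

```shell
# %Cpu(s) line copied from the sample top output shown above.
cpu_line='%Cpu(s):  0.0 us,  6.2 sy,  0.0 ni, 93.8 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st'

# Match the Cpu line, take the fourth comma-separated field (" 93.8 id"),
# split it on whitespace and print the first piece.
idle=$(printf '%s\n' "$cpu_line" | awk -F ',' '/Cpu/ {split($4, part, " "); print part[1]}')
echo "$idle"
```

On a live system the printf would be replaced by top -b -n 1.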

⑤Check the last system restart time

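# Show the time of the last system boot, which corresponds to the "up" duration reported by uptime; "second ago" means the time that many seconds in the past; +"%F %H:%M:%S" is the same time format as +"%Y-%m-%d %H:%M:%S"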
date -d "$(awk -F "." '{print $1}' /proc/uptime) second ago" +"%F %H:%M:%S"

[root@localhost ~] # uptime
 20:57:15 up  8:31,  1 user,  load average: 0.00, 0.01, 0.05
[root@localhost ~] # date -d "$(awk -F "." '{print $1}' /proc/uptime) second ago" +"%F %H:%M:%S"
2021-01-03 12:25:41

⑥Call the w command and use it to count the number of online users

The first two lines of the w output are header information, not logged-in users, so subtract 2 from the count

[root@localhost ~] # awk 'BEGIN {while (("w" | getline) > 0) n++; print n-2}'
1
[root@localhost ~] # w
 20:59:45 up  8:34,  1 user,  load average: 0.00, 0.01, 0.05
USER     TTY      FROM             LOGIN@   IDLE   JCPU   PCPU WHAT
root     pts/0    192.168.2.1      19:57    1.00s  0.06s  0.00s w

⑦Call hostname and output the current hostname

[root@localhost ~] # hostname
localhost.localdomain
[root@localhost ~] # awk 'BEGIN {"hostname" | getline ; {print $0}}'
localhost.localdomain

⑧View cpu usage rate

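# Capture each value with command substitution $(...)
cpu_us=$(top -b -n 1 | grep Cpu | awk '{print $2}')
cpu_sy=$(top -b -n 1 | grep Cpu | awk -F ',' '{print $2}' | awk '{print $1}')
# Shell $((...)) arithmetic is integer-only, so add the decimal values in awk
cpu_sum=$(awk -v us="$cpu_us" -v sy="$cpu_sy" 'BEGIN {print us + sy}')
echo "$cpu_sum"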

Supplement: getline

  • When getline has no redirection ("<" or "|") on either side, it acts on the current input file and reads the next line into the variable var that follows it, or into $0 if no variable is given. Because awk has already read a line before getline runs, getline effectively returns every other line.
  • When getline has a redirection ("<" or "|") on either side, it acts on the redirected input (a file or a command's output). That input has only just been opened and awk has not read any line from it, so getline returns its first line, not every other line.
[root@localhost ~] # cat shuzi.txt 
one
two
three
four
five
six
seven
eight
nine
ten

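# Print the odd-numbered lines
awk reads line 1, print outputs line 1, getline fetches line 2;
awk reads line 3, print outputs line 3, getline fetches line 4;
awk reads line 5, print outputs line 5, getline fetches line 6;
and so on
[root@localhost ~] # awk '{print $0; getline}' shuzi.txt 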
one
three
five
seven
nine

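# Print the even-numbered lines
awk reads line 1, getline fetches line 2, print outputs line 2;
awk reads line 3, getline fetches line 4, print outputs line 4;
awk reads line 5, getline fetches line 6, print outputs line 6;
and so on
[root@localhost ~] # awk '{getline; print $0}' shuzi.txt 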
two
four
six
eight
ten
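
The redirected form described in the second bullet can be sketched with getline var < file. The temporary file and the variable names file, line, last and n are assumptions made for this example:

```shell
# Temporary two-line file standing in for any input file (path from mktemp).
f=$(mktemp)
printf 'first\nsecond\n' > "$f"

# getline line < file reads successive lines of that file into "line",
# starting from its first line. It returns 1 on success, 0 at end of file
# and -1 on error, hence the "> 0" test in the loop condition.
result=$(awk -v file="$f" 'BEGIN {while ((getline line < file) > 0) {n++; last = line}; close(file); print n, last}')
rm -f "$f"
echo "$result"
```

Since the whole program runs in BEGIN, awk reads no main input here; every line comes from the redirected file.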

Origin blog.csdn.net/qq_35456705/article/details/112155852