[shell] basic usage of awk



1. Overview and usage scenarios of awk

  • awk is a programming language and a standard Linux/Unix tool, used mainly to process text and data. Input can come from standard input, one or more files, or the output of other commands.

  • How awk processes text: it scans the input line by line, from the first line to the last by default, looks for lines that match a given pattern, and performs the actions you specify on those lines.

  • The name awk is formed from the first letters of its three authors' last names: Alfred Aho, Peter Weinberger, and Brian Kernighan.

  • gawk is the GNU version of awk, which provides some extensions from Bell Labs and GNU.

[wqf@b1i10 test]$ which awk
/usr/bin/awk
[wqf@b1i10 test]$ ll /usr/bin/awk
lrwxrwxrwx. 1 root root 4 Sep 21  2021 /usr/bin/awk -> gawk
  • The examples below use GNU gawk. On this Linux system awk is a symbolic link to gawk, so the commands are simply written as awk.
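The line-by-line, pattern-then-action model described above can be sketched as follows. This is a minimal illustration; the three sample records and the variable names are made up for the example.

```shell
# Hypothetical sample data: three colon-separated records.
sample=$(printf 'root:x:0\nbin:x:1\nlp:x:4\n')

# awk reads the input one line at a time. The action in braces runs only
# for lines that match the pattern /^b/ (lines starting with "b").
matched=$(printf '%s\n' "$sample" | awk '/^b/ {print "matched: " $0}')
echo "$matched"
```

Only the second record starts with "b", so it is the only line the action is applied to.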

2. How to use awk

1. Using the command line mode

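Syntax:

awk options 'program' filename
Note: the program is enclosed in single quotes. To reference a shell variable inside the program, close the single quotes and wrap the variable in double quotes, or pass it in with -v.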

Common options:

-F  define the field separator; the default is whitespace (spaces and tabs)
-v  define a variable and assign it a value
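The two ways of getting a shell variable into an awk program can be compared in a short sketch. The variable name user and the sample records are assumptions made for the illustration.

```shell
# Hypothetical shell variable holding the user name to look up.
user="root"

# Option 1: pass the value in with -v; inside awk it is a plain variable.
by_v=$(printf 'root:x:0\nbin:x:1\n' | awk -F: -v name="$user" '$1 == name {print $3}')

# Option 2: close the single quotes, insert the double-quoted shell
# variable, then reopen the single quotes.
by_quote=$(printf 'root:x:0\nbin:x:1\n' | awk -F: '$1 == "'"$user"'" {print $3}')

echo "$by_v $by_quote"
```

Both forms select the same record here. The -v form is usually easier to read, because the awk program stays inside a single pair of quotes.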

The 'program' part:

  • Patterns: a regular expression is enclosed in slashes, written /regular expression/. Line-number conditions and ranges also work as patterns.
'/root/{awk statements}'
'NR==1,NR==5{awk statements}'
'/^root/,/^ftp/{awk statements}'
  • {awk statement 1; awk statement 2; ...}
'{print $0;print $1}'
'NR==5{print $0}'
Note: awk statements are separated by semicolons.
  • BEGIN…END…
    • BEGIN: executed once, before any input is read
    • END: executed once, after all input has been processed
'BEGIN{awk statements};{main body};END{awk statements}'
'BEGIN{awk statements};{main body}'
'{main body};END{awk statements}'
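As a minimal sketch of the BEGIN / main body / END layout, the following counts input lines. The three-line input and the variable names are invented for the example.

```shell
# BEGIN runs before any input, the middle block runs once per line,
# and END runs after the last line has been read.
result=$(printf 'a\nb\nc\n' | awk 'BEGIN {print "start"} {n++} END {print "lines: " n}')
echo "$result"
```

The counter n needs no declaration; awk treats an unset variable as 0 the first time it is incremented.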

2. Use in script mode (awk -f awkScript.sh file)

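Writing the script:

#!/bin/awk -f
# The line above is the shebang. What follows is the program that would
# otherwise go inside awk's quotes; do not quote it here. Separate multiple
# statements with semicolons or newlines.
BEGIN {
    FS = ":"
}
NR==1,NR==3 {
    print $1 "\t" $NF
}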

Script execution:

Method 1:
awk options -f awk-script-file  text-file-to-process
awk -f awk.sh filename

sed -f sed.sh -i filename

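Method 2 (the script must be executable and start with the shebang line):
./awk-script-file (or its absolute path)    text-file-to-process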
./awk.sh filename

./sed.sh filename
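Method 1 can be tried end to end with a throwaway script. This is a sketch only: the file names first_last.awk and users.txt, and the three sample records, are assumptions for the illustration.

```shell
workdir=$(mktemp -d)

# Hypothetical script file: the program is written without surrounding quotes.
cat > "$workdir/first_last.awk" <<'EOF'
BEGIN { FS = ":" }
NR == 1, NR == 2 { print $1 "\t" $NF }
EOF

# Hypothetical input file with three passwd-style records.
printf 'root:x:0:0:root:/root:/bin/bash\nbin:x:1:1:bin:/bin:/sbin/nologin\nlp:x:4:7:lp:/var/spool/lpd:/sbin/nologin\n' > "$workdir/users.txt"

# Run the script with -f; only lines 1 and 2 fall inside the range pattern.
script_out=$(awk -f "$workdir/first_last.awk" "$workdir/users.txt")
echo "$script_out"
rm -r "$workdir"
```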

3. awk built-in variables

Variable        Description                                                        Example
$0              The entire current record (line)
$1, $2 ... $n   The individual fields of the line, split on the separator          awk -F: '{print $1,$3}'
NF              Number of fields (columns) in the current record                   awk -F: '{print NF}'
$NF             The last column; $(NF-1) is the second-to-last column
FNR/NR          Line number; NR counts across all input, FNR restarts per file
FS              Input field separator                                              'BEGIN{FS=":"};{print $1,$3}'
OFS             Output field separator, a space by default                         'BEGIN{OFS="\t"};{print $1,$3}'
RS              Input record separator, a newline by default                       'BEGIN{RS="\t"};{print $0}'
ORS             Output record separator, a newline by default                      'BEGIN{ORS="\n\n"};{print $1,$3}'
FILENAME        Name of the current input file
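The difference between NR and FNR only shows up with more than one input file. A small sketch, assuming two made-up files one.txt and two.txt:

```shell
workdir=$(mktemp -d)
printf 'a\nb\n' > "$workdir/one.txt"
printf 'c\n'    > "$workdir/two.txt"

# NR keeps counting across files; FNR restarts at 1 for each file.
# The subshell created by $( ) keeps the cd from affecting the caller.
counters=$(cd "$workdir" && awk '{print FILENAME, NR, FNR, $0}' one.txt two.txt)
echo "$counters"
rm -r "$workdir"
```

The line from two.txt is reported with NR 3 but FNR 1.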

Examples of common built-in variables

File preparation:

[wqf@b1i10 rm_test]$ cat -n 1.txt
     1  root:x:0:0:root:/roots:/bin/bash
     2  bin:x:1:1:bin:/bins:/sbin/nologin
     3  daemon123:x:2:2:daemon:/sbins:/sbin/nologin
     4  adm:x:3:4:adm:/var/adms:/sbin/nologin
     5  lp:x:4:7:lp:/var/spool/lpds:/sbin/nologin
     6  298374837483
     7  172.16.0.254
     8  10.1.1.1
## Print all lines
[wqf@b1i10 rm_test]$ awk '{print $0}' 1.txt 
root:x:0:0:root:/roots:/bin/bash
bin:x:1:1:bin:/bins:/sbin/nologin
daemon123:x:2:2:daemon:/sbins:/sbin/nologin
adm:x:3:4:adm:/var/adms:/sbin/nologin
lp:x:4:7:lp:/var/spool/lpds:/sbin/nologin
298374837483
172.16.0.254
10.1.1.1

## Print lines 1 through 5 (the second command prints only lines 1 and 5)
[wqf@b1i10 rm_test]$ awk 'NR==1,NR==5{print $0}' 1.txt 
root:x:0:0:root:/roots:/bin/bash
bin:x:1:1:bin:/bins:/sbin/nologin
daemon123:x:2:2:daemon:/sbins:/sbin/nologin
adm:x:3:4:adm:/var/adms:/sbin/nologin
lp:x:4:7:lp:/var/spool/lpds:/sbin/nologin
[wqf@b1i10 rm_test]$ awk 'NR==1||NR==5{print $0}' 1.txt
root:x:0:0:root:/roots:/bin/bash
lp:x:4:7:lp:/var/spool/lpds:/sbin/nologin

## Print lines 2 through 5
[wqf@b1i10 rm_test]$ awk 'NR>=2&&NR<=5{print $0}' 1.txt
bin:x:1:1:bin:/bins:/sbin/nologin
daemon123:x:2:2:daemon:/sbins:/sbin/nologin
adm:x:3:4:adm:/var/adms:/sbin/nologin
lp:x:4:7:lp:/var/spool/lpds:/sbin/nologin

## Print the first and last column of each line, using : as the field separator
[wqf@b1i10 rm_test]$ awk -F: '{print $1,$NF}' 1.txt
root /bin/bash
bin /sbin/nologin
daemon123 /sbin/nologin
adm /sbin/nologin
lp /sbin/nologin
298374837483 298374837483
172.16.0.254 172.16.0.254
10.1.1.1 10.1.1.1

## Print the number of columns in each line (using : as the field separator)
[wqf@b1i10 rm_test]$ awk -F: '{print NF}' 1.txt
7
7
7
7
7
1
1
1

## Print the first and second-to-last column of each line, using : as the field separator
[wqf@b1i10 rm_test]$ awk -F: '{print $1,$(NF-1)}' 1.txt
root /roots
bin /bins
daemon123 /sbins
adm /var/adms
lp /var/spool/lpds
298374837483 298374837483
172.16.0.254 172.16.0.254
10.1.1.1 10.1.1.1

## Print the first column, second-to-last column, last column, and column count of each line, using : as the field separator
[wqf@b1i10 rm_test]$ awk -F: '{print $1,$(NF-1),$NF,NF}' 1.txt
root /roots /bin/bash 7
bin /bins /sbin/nologin 7
daemon123 /sbins /sbin/nologin 7
adm /var/adms /sbin/nologin 7
lp /var/spool/lpds /sbin/nologin 7
298374837483 298374837483 298374837483 1
172.16.0.254 172.16.0.254 172.16.0.254 1
10.1.1.1 10.1.1.1 10.1.1.1 1

## Print lines that contain the string root
[wqf@b1i10 rm_test]$ awk '/root/{print $0}' 1.txt
root:x:0:0:root:/roots:/bin/bash

[wqf@b1i10 rm_test]$ awk '/root/' 1.txt
root:x:0:0:root:/roots:/bin/bash

## Print the first and last column of lines that contain the string root
[wqf@b1i10 rm_test]$ awk -F: '/root/{print $1,$NF}' 1.txt
root /bin/bash

## awk pattern-action statements are separated by semicolons
[wqf@b1i10 rm_test]$ awk 'NR==1,NR==5;/^root/{print $0}' 1.txt 
root:x:0:0:root:/roots:/bin/bash ## matched by NR==1,NR==5
root:x:0:0:root:/roots:/bin/bash ## matched by /^root/{print $0}
bin:x:1:1:bin:/bins:/sbin/nologin ## matched by NR==1,NR==5
daemon123:x:2:2:daemon:/sbins:/sbin/nologin ## matched by NR==1,NR==5
adm:x:3:4:adm:/var/adms:/sbin/nologin ## matched by NR==1,NR==5
lp:x:4:7:lp:/var/spool/lpds:/sbin/nologin ## matched by NR==1,NR==5

## ~ means "matches"; !~ means "does not match"
## Among lines 1 through 5, print the lines that start with root (whole line)
[wqf@b1i10 rm_test]$ awk 'NR>=1 && NR<=5 && $0 ~/^root/{print $0}' 1.txt 
root:x:0:0:root:/roots:/bin/bash        hello   world

## Among lines 1 through 5, print the lines that do not start with root
[wqf@b1i10 rm_test]$ awk 'NR>=1 && NR<=5 && $0 !~/^root/{print $0}' 1.txt 
ibin:x:1:1:bin:/bins:/sbin/nologin      heima
daemon123:x:2:2:daemon:/sbins:/sbin/nologin
adm:x:3:4:adm:/var/adms:/sbin/nologin
lp:x:4:7:lp:/var/spool/lpds:/sbin/nologin

Examples of built-in variable separators

FS: the field (column) separator used when reading input; OFS: the field separator used when writing output

## The default output field separator is a space
[wqf@b1i10 rm_test]$ awk -F: '{print $1,$NF}' 1.txt 
root /bin/bash
bin /sbin/nologin
daemon123 /sbin/nologin
adm /sbin/nologin
lp /sbin/nologin
298374837483 298374837483
172.16.0.254 172.16.0.254
10.1.1.1 10.1.1.1

## Set the output field separator to @@@
[wqf@b1i10 rm_test]$ awk -F: 'BEGIN{OFS="@@@"};{print $1,$NF}' 1.txt
root@@@/bin/bash
bin@@@/sbin/nologin
daemon123@@@/sbin/nologin
adm@@@/sbin/nologin
lp@@@/sbin/nologin
298374837483@@@298374837483
172.16.0.254@@@172.16.0.254
10.1.1.1@@@10.1.1.1

## Set the input field separator to : and the output field separator to @@@
[wqf@b1i10 rm_test]$ awk 'BEGIN{FS=":";OFS="@@@"};{print $1,$NF}' 1.txt
root@@@/bin/bash
bin@@@/sbin/nologin
daemon123@@@/sbin/nologin
adm@@@/sbin/nologin
lp@@@/sbin/nologin
298374837483@@@298374837483
172.16.0.254@@@172.16.0.254
10.1.1.1@@@10.1.1.1

## String literals in double quotes can be included in the output
[wqf@b1i10 rm_test]$  awk 'BEGIN{FS=":"};{print $1,"******"$NF}' 1.txt
root ******/bin/bash
bin ******/sbin/nologin
daemon123 ******/sbin/nologin
adm ******/sbin/nologin
lp ******/sbin/nologin
298374837483 ******298374837483
172.16.0.254 ******172.16.0.254
10.1.1.1 ******10.1.1.1

[wqf@b1i10 rm_test]$  awk 'BEGIN{FS=":"};{print "Username:"$1,"******Shell:"$NF}' 1.txt
Username:root ******Shell:/bin/bash
Username:bin ******Shell:/sbin/nologin
Username:daemon123 ******Shell:/sbin/nologin
Username:adm ******Shell:/sbin/nologin
Username:lp ******Shell:/sbin/nologin
Username:298374837483 ******Shell:298374837483
Username:172.16.0.254 ******Shell:172.16.0.254
Username:10.1.1.1 ******Shell:10.1.1.1

RS: the record (line) separator used when reading input; ORS: the record separator used when writing output

File preparation: append tab-separated content to the first two lines of the source file

[wqf@b1i10 rm_test]$ cat 1.txt
root:x:0:0:root:/roots:/bin/bash        hello   world
ibin:x:1:1:bin:/bins:/sbin/nologin      heima
daemon123:x:2:2:daemon:/sbins:/sbin/nologin
adm:x:3:4:adm:/var/adms:/sbin/nologin
lp:x:4:7:lp:/var/spool/lpds:/sbin/nologin
298374837483
172.16.0.254
10.1.1.1
## Use a tab (\t) as the record separator when reading the file
[wqf@b1i10 rm_test]$ awk -F: 'BEGIN{RS="\t"};{print $1}' 1.txt
root
hello
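world ## world and ibin belong to the same record (they are separated by a newline, not a tab); the record is split on : and the first field is printed
ibin
heima
daemon123 ## heima and daemon123 likewise belong to the same record; the record is split on : and the first field is printed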

## Use a tab (\t) as the record separator when writing output
[wqf@b1i10 rm_test]$ awk -F: 'BEGIN{ORS="\t"};{print $1}' 1.txt
root    ibin    daemon123       adm     lp      298374837483    172.16.0.254    10.1.1.1 
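A compact way to see RS and ORS at work is to feed awk a single line and split or join it. The sketch below uses made-up key=value input and illustrative variable names.

```shell
# RS=";" turns each semicolon-separated chunk into its own record.
# The input has no trailing newline, so the last record is exactly "c=3".
keys=$(printf 'a=1;b=2;c=3' | awk -F= 'BEGIN {RS=";"} {print $1}')
echo "$keys"

# ORS="," writes a comma after every record instead of a newline,
# which also leaves a trailing comma after the last one.
joined=$(printf 'a\nb\nc\n' | awk 'BEGIN {ORS=","} {print $0}')
echo "$joined"
```

Because ORS is written after every record, including the last, the joined output ends with a separator; the tab-separated output above ends with a trailing tab for the same reason.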

4. Advanced application of awk

1. Example of formatted output print and printf

%s  string, e.g. %-20s
%d  integer
-   left-align (the default is right-aligned)
printf does not add a newline at the end of the line automatically; add \n
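The format specifiers above can be combined in one call. The sketch below also uses %6.2f for a fixed-point number, which is a standard printf conversion but is not covered in the list above; the fruit data is invented.

```shell
# %-8s  : string, left-aligned, padded to 8 characters
# %4d   : integer, right-aligned, padded to 4 characters
# %6.2f : number with 2 decimals, right-aligned, padded to 6 characters
formatted=$(printf 'apple:3:1.5\nkiwi:12:0.25\n' | awk -F: '{printf "%-8s|%4d|%6.2f\n", $1, $2, $3}')
echo "$formatted"
```

The width is a minimum, not a limit: a value longer than the width is printed in full and pushes the following columns to the right.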

print: similar to echo "hello world" (a newline is added automatically)

[wqf@b1i10 ~]$ date|awk '{print "Month:"$2"\nyear:"$NF}'
Month:May
year:2023

printf: similar to echo -n (no newline is added automatically)

## Pad each column to a fixed minimum width and left-align it
[wqf@b1i10 rm_test]$ awk -F: '{printf "%-15s %-10s %-15s\n", $1,$2,$3}'  3.txt
[wqf@b1i10 rm_test]$ awk 'BEGIN{FS=":"};{printf "%-15s %-10s %-15s\n", $1,$2,$3}'  3.txt
root            x          0              
bin             x          1              
daemon          x          2              
adm             x          3              
lp              x          4              
sync            x          5              
shutdown        x          6              
halt            x          7              
mail            x          8              
operator        x          11             
games           x          12             
ftp             x          14             
nobody          x          99             
systemd-network x          192            
dbus            x          81             
polkitd         x          999            
apache          x          48             
unbound         x          998            
libstoragemgmt  x          997            
colord          x          996      

## Pad each column to a fixed minimum width (right-aligned) and add | separators
[wqf@b1i10 rm_test]$ awk -F: '{printf "|%15s| %10s| %15s|\n", $1,$2,$3}' 3.txt
|           root|          x|               0|
|            bin|          x|               1|
|         daemon|          x|               2|
|            adm|          x|               3|
|             lp|          x|               4|
|           sync|          x|               5|
|       shutdown|          x|               6|
|           halt|          x|               7|
|           mail|          x|               8|
|       operator|          x|              11|
|          games|          x|              12|
|            ftp|          x|              14|
|         nobody|          x|              99|
|systemd-network|          x|             192|
|           dbus|          x|              81|
|        polkitd|          x|             999|
|         apache|          x|              48|
|        unbound|          x|             998|
| libstoragemgmt|          x|             997|
|         colord|          x|             996|

## Pad each column to a fixed minimum width, left-align it, and add | separators
[wqf@b1i10 rm_test]$ awk -F: '{printf "|%-15s| %-10s| %-15s|\n", $1,$2,$3}' 3.txt
|root           | x         | 0              |
|bin            | x         | 1              |
|daemon         | x         | 2              |
|adm            | x         | 3              |
|lp             | x         | 4              |
|sync           | x         | 5              |
|shutdown       | x         | 6              |
|halt           | x         | 7              |
|mail           | x         | 8              |
|operator       | x         | 11             |
|games          | x         | 12             |
|ftp            | x         | 14             |
|nobody         | x         | 99             |
|systemd-network| x         | 192            |
|dbus           | x         | 81             |
|polkitd        | x         | 999            |
|apache         | x         | 48             |
|unbound        | x         | 998            |
|libstoragemgmt | x         | 997            |
|colord         | x         | 996            |

2. Awk variable definition

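  • Variables defined with -v are referenced inside awk without a $ prefix
## Comparison: with a $ prefix, $NUM is read as "field number NUM" (here field 3), not as the value of NUM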
[wqf@b1i10 rm_test]$ awk -v NUM=3 -F: '{print $NUM}' 1.txt
0
1
2
3
4




[wqf@b1i10 rm_test]$ awk -v NUM=3 -F: '{print NUM}' 1.txt
3
3
3
3
3
3
3
3

[wqf@b1i10 rm_test]$ awk -v num=1 'BEGIN{print num}' 
1
[wqf@b1i10 rm_test]$ awk -v num=1 'BEGIN{print $num}' 
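The last command prints an empty line because $num means field number 1, and no record has been read yet inside BEGIN. The same behaviour can be put to use deliberately, to choose a column from the shell. A sketch with a hypothetical variable col and made-up input:

```shell
# Hypothetical shell variable selecting which column to print.
col=2

# n is the variable's value (2); $n is the content of field number 2.
picked=$(printf 'a:b:c\nd:e:f\n' | awk -F: -v n="$col" '{print n, $n}')
echo "$picked"
```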

3. Use of BEGIN...END in awk

Example:

[wqf@b1i10 rm_test]$ awk -F: 'BEGIN{print "Login_shell\t\tLogin_home\n*******************"};{print $NF"\t\t"$(NF-1)};END{print "************************"}' 3.txt
Login_shell             Login_home
*******************
/bin/bash               /root
/sbin/nologin           /bin
/sbin/nologin           /sbin
/sbin/nologin           /var/adm
/sbin/nologin           /var/spool/lpd
/bin/sync               /sbin
/sbin/shutdown          /sbin
/sbin/halt              /sbin
/sbin/nologin           /var/spool/mail
/sbin/nologin           /root
/sbin/nologin           /usr/games
/sbin/nologin           /var/ftp
/sbin/nologin           /
/sbin/nologin           /
/sbin/nologin           /
/sbin/nologin           /
/sbin/nologin           /usr/share/httpd
/sbin/nologin           /etc/unbound
/sbin/nologin           /var/run/lsm
/sbin/nologin           /var/lib/colord
************************

[wqf@b1i10 rm_test]$ awk 'BEGIN{FS=":";print "Login_shell\tLogin_home\n*******************"};{print $NF"\t"$(NF-1)};END{print "************************"}' 3.txt
Login_shell     Login_home
*******************
/bin/bash       /root
/sbin/nologin   /bin
/sbin/nologin   /sbin
/sbin/nologin   /var/adm
/sbin/nologin   /var/spool/lpd
/bin/sync       /sbin
/sbin/shutdown  /sbin
/sbin/halt      /sbin
/sbin/nologin   /var/spool/mail
/sbin/nologin   /root
/sbin/nologin   /usr/games
/sbin/nologin   /var/ftp
/sbin/nologin   /
/sbin/nologin   /
/sbin/nologin   /
/sbin/nologin   /
/sbin/nologin   /usr/share/httpd
/sbin/nologin   /etc/unbound
/sbin/nologin   /var/run/lsm
/sbin/nologin   /var/lib/colord
************************

[wqf@b1i10 rm_test]$ awk -F: 'BEGIN{OFS="\t\t";print"u_name\t\th_dir\t\tshell\n***************************"};{printf "%-20s %-20s %-20s\n",$1,$(NF-1),$NF};END{print "****************************"}' 3.txt
[wqf@b1i10 rm_test]$ awk -F: 'BEGIN{print "u_name\t\th_dir\t\tshell" RS "*****************"}  {printf "%-15s %-20s %-20s\n",$1,$(NF-1),$NF}END{print "***************************"}' 3.txt
u_name          h_dir           shell
*****************
root            /root                /bin/bash           
bin             /bin                 /sbin/nologin       
daemon          /sbin                /sbin/nologin       
adm             /var/adm             /sbin/nologin       
lp              /var/spool/lpd       /sbin/nologin       
sync            /sbin                /bin/sync           
shutdown        /sbin                /sbin/shutdown      
halt            /sbin                /sbin/halt          
mail            /var/spool/mail      /sbin/nologin       
operator        /root                /sbin/nologin       
games           /usr/games           /sbin/nologin       
ftp             /var/ftp             /sbin/nologin       
nobody          /                    /sbin/nologin       
systemd-network /                    /sbin/nologin       
dbus            /                    /sbin/nologin       
polkitd         /                    /sbin/nologin       
apache          /usr/share/httpd     /sbin/nologin       
unbound         /etc/unbound         /sbin/nologin       
libstoragemgmt  /var/run/lsm         /sbin/nologin       
colord          /var/lib/colord      /sbin/nologin       
***************************

4. Comprehensive application of awk and regular expressions

Operator   Description
==         equal to
!=         not equal to
>          greater than
<          less than
>=         greater than or equal to
<=         less than or equal to
~          matches
!~         does not match
!          logical NOT
&&         logical AND
||         logical OR
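The match operators and logical NOT can express the same condition in two ways. A minimal sketch on two invented records (user name and login shell):

```shell
# Hypothetical two-column input: user name and login shell.
users=$(printf 'root:/bin/bash\nbin:/sbin/nologin\n')

# Negating a match with ! ...
with_not=$(printf '%s\n' "$users" | awk -F: '!($2 ~ /nologin$/) {print $1}')

# ... selects the same records as the "does not match" operator.
with_nomatch=$(printf '%s\n' "$users" | awk -F: '$2 !~ /nologin$/ {print $1}')

echo "$with_not $with_nomatch"
```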

Example:

## Print from line 1 through the first line that starts with lp
[wqf@b1i10 rm_test]$ awk -F: 'NR==1,/^lp/{print $0}' 1.txt
root:x:0:0:root:/roots:/bin/bash        hello   world
ibin:x:1:1:bin:/bins:/sbin/nologin      heima
daemon123:x:2:2:daemon:/sbins:/sbin/nologin
adm:x:3:4:adm:/var/adms:/sbin/nologin
lp:x:4:7:lp:/var/spool/lpds:/sbin/nologin

## Print lines 1 through 5
[wqf@b1i10 rm_test]$ awk -F: 'NR==1,NR==5{print $0}'  1.txt
root:x:0:0:root:/roots:/bin/bash        hello   world
ibin:x:1:1:bin:/bins:/sbin/nologin      heima
daemon123:x:2:2:daemon:/sbins:/sbin/nologin
adm:x:3:4:adm:/var/adms:/sbin/nologin
lp:x:4:7:lp:/var/spool/lpds:/sbin/nologin

## Print from the line that starts with lp through line 10 (the file has only 8 lines, so output runs to the end)
[wqf@b1i10 rm_test]$ awk -F: '/^lp/,NR==10{print $0}' 1.txt
lp:x:4:7:lp:/var/spool/lpds:/sbin/nologin
298374837483
172.16.0.254
10.1.1.1

## Print from the line that starts with root through the line that starts with lp
[wqf@b1i10 rm_test]$ awk -F: '/^root/,/^lp/{print $0}' 1.txt
root:x:0:0:root:/roots:/bin/bash        hello   world
ibin:x:1:1:bin:/bins:/sbin/nologin      heima
daemon123:x:2:2:daemon:/sbins:/sbin/nologin
adm:x:3:4:adm:/var/adms:/sbin/nologin
lp:x:4:7:lp:/var/spool/lpds:/sbin/nologin


## Print lines that start with root or with lp
[wqf@b1i10 rm_test]$ awk -F: '/^root/ || /^lp/{print $0}' 1.txt
root:x:0:0:root:/roots:/bin/bash        hello   world
lp:x:4:7:lp:/var/spool/lpds:/sbin/nologin

[wqf@b1i10 rm_test]$ awk -F: '/^root/;/^lp/{print $0}' 1.txt
root:x:0:0:root:/roots:/bin/bash        hello   world
lp:x:4:7:lp:/var/spool/lpds:/sbin/nologin

## Print the first column (using : as the separator) of lines that start with root, and the whole line for lines that start with lp
[wqf@b1i10 rm_test]$ awk -F: '/^root/{print $1};/^lp/{print $0}' 1.txt
root
lp:x:4:7:lp:/var/spool/lpds:/sbin/nologin

## Show lines 5 through 10
[wqf@b1i10 rm_test]$ awk -F':' 'NR>=5 && NR<=10 {print $0}' 3.txt  
lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin
sync:x:5:0:sync:/sbin:/bin/sync
shutdown:x:6:0:shutdown:/sbin:/sbin/shutdown
halt:x:7:0:halt:/sbin:/sbin/halt
mail:x:8:12:mail:/var/spool/mail:/sbin/nologin
operator:x:11:0:operator:/root:/sbin/nologin 

## Among lines 1 through 39, print the whole line for lines that end with bash
[wqf@b1i10 rm_test]$ awk 'NR>=1 && NR<=39 && $0 ~ /bash$/{print $0}' 2.txt
root:x:0:0:root:/root:/bin/bash

## Among lines 1 through 5, print lines that start with root
[wqf@b1i10 rm_test]$ awk 'NR>=1 && NR<=5 && $0 ~ /^root/{print $0}' 1.txt
root:x:0:0:root:/roots:/bin/bash        hello   world

## Among lines 1 through 5, print lines that do not start with root
[wqf@b1i10 rm_test]$ awk 'NR>=1 && NR<=5 && $0 !~ /^root/{print $0}' 1.txt
ibin:x:1:1:bin:/bins:/sbin/nologin      heima
daemon123:x:2:2:daemon:/sbins:/sbin/nologin
adm:x:3:4:adm:/var/adms:/sbin/nologin
lp:x:4:7:lp:/var/spool/lpds:/sbin/nologin

## Print lines that end with bash
[wqf@b1i10 rm_test]$ awk '/bash$/{print $0}'    3.txt
root:x:0:0:root:/root:/bin/bash

[wqf@b1i10 rm_test]$ awk '/bash$/' 3.txt
root:x:0:0:root:/root:/bin/bash

## Print the whole line when the 7th column contains bash
[wqf@b1i10 rm_test]$ awk -F: '$7 ~ /bash/' 1.txt
root:x:0:0:root:/roots:/bin/bash

## Print the names of users who can log in to the system
[wqf@b1i10 rm_test]$ awk -F: '$0 ~ /\/bin\/bash/{print $1}' 1.txt
root

Understand the difference between ; and ||. On this file both commands print the same lines, because no line matches both patterns:

[wqf@b1i10 rm_test]$ awk 'NR>=3 && NR<=8 ; /bash$/' 1.txt
root:x:0:0:root:/roots:/bin/bash
daemon123:x:2:2:daemon:/sbins:/sbin/nologin
adm:x:3:4:adm:/var/adms:/sbin/nologin
lp:x:4:7:lp:/var/spool/lpds:/sbin/nologin
298374837483
172.16.0.254
10.1.1.1

[wqf@b1i10 rm_test]$ awk 'NR>=3 && NR<=8 || /bash$/' 1.txt
root:x:0:0:root:/roots:/bin/bash
daemon123:x:2:2:daemon:/sbins:/sbin/nologin
adm:x:3:4:adm:/var/adms:/sbin/nologin
lp:x:4:7:lp:/var/spool/lpds:/sbin/nologin
298374837483
172.16.0.254
10.1.1.1
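The two forms differ only when a line matches both patterns. A small sketch, using three invented lines where the second one matches both the line range and /bash$/:

```shell
# Hypothetical input: line 2 is inside the range AND ends with "bash".
data=$(printf 'one\ntwo bash\nthree\n')

# With ; there are two independent rules, so line 2 is printed twice.
with_semicolon=$(printf '%s\n' "$data" | awk 'NR>=2 && NR<=3 ; /bash$/')

# With || there is one rule with a combined condition, so line 2 is printed once.
with_or=$(printf '%s\n' "$data" | awk 'NR>=2 && NR<=3 || /bash$/')

echo "$with_semicolon"
echo "---"
echo "$with_or"
```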

5. Awk script programming

1) Flow control statement
a) if structure

Syntax:

The if statement in the shell, for comparison:

if [ xxx ];then
xxx
fi

awk options 'pattern { if(expression){statement1;statement2;...}}'  filename

Example:

[wqf@b1i10 rm_test]$ awk -F: '{if($3>=500 && $3<=60000) {print $1,$3}}' 3.txt
polkitd 999
unbound 998
libstoragemgmt 997
colord 996

[wqf@b1i10 rm_test]$  awk -F: '{if($3==0) {print $1" is an administrator"}}' 3.txt
root is an administrator

[wqf@b1i10 rm_test]$ awk 'BEGIN{if('$(id -u)'==0) {print "admin"}}' 

b) if...else structure

Syntax:

The if...else statement in the shell, for comparison:
if [ xxx ];then
    xxxxx

else
    xxx
fi

awk options 'pattern {if(expression){statement;statement;...}else{statement;statement;...}}'  filename

Example:

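[wqf@b1i10 rm_test]$ awk -F: '{if($3>=500 && $3 != 65534) {print $1" is a regular user"} else {print $1,"is not a regular user"}}' 3.txt 
root is not a regular user
bin is not a regular user
daemon is not a regular user
adm is not a regular user
lp is not a regular user
sync is not a regular user
shutdown is not a regular user
halt is not a regular user
mail is not a regular user
operator is not a regular user
games is not a regular user
ftp is not a regular user
nobody is not a regular user
systemd-network is not a regular user
dbus is not a regular user
polkitd is a regular user
apache is not a regular user
unbound is a regular user
libstoragemgmt is a regular user
colord is a regular user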

[wqf@b1i10 rm_test]$ awk 'BEGIN{if( '$(id -u)'>=500 && '$(id -u)' !=65534 ) {print "regular user"} else {print "not a regular user"}}'
regular user

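c) if...else if...else structure

Syntax:

The if...elif...else statement in the shell, for comparison:
if [xxxx];then
    xxxx
elif [xxx];then
    xxx
....
else
...
fi

awk options 'pattern { if(expression1){statement;statement;...}else if(expression2){statement;statement;...}else if(expression3){statement;statement;...}else{statement;statement;...}}'  filename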

Example:

[wqf@b1i10 rm_test]$ awk -F: '{ if($3==0) {print $1,": administrator"} else if($3>=1 && $3<=499 || $3==65534 ) {print $1,": system user"} else {print $1,": regular user"}}' 3.txt
root : administrator
bin : system user
daemon : system user
adm : system user
lp : system user
sync : system user
shutdown : system user
halt : system user
mail : system user
operator : system user
games : system user
ftp : system user
nobody : system user
systemd-network : system user
dbus : system user
polkitd : regular user
apache : system user
unbound : regular user
libstoragemgmt : regular user
colord : regular user


[wqf@b1i10 rm_test]$  awk -F: '{if($3==0) {i++} else if($3>=1 && $3<=499 || $3==65534 ) {j++} else {k++}};END{print "Administrators:"i "\nSystem users:"j"\nRegular users:"k }' 3.txt
Administrators:1
System users:15
Regular users:4


[wqf@b1i10 rm_test]$ awk -F: '{if($3==0){i++} else if($3>999){k++} else{j++}} END{print "Administrators: "i; print "Regular users: "k; print "System users: "j}' /etc/passwd 
Administrators: 1
Regular users: 134
System users: 45 

## Print the user name for system users and the default shell for regular users
[wqf@b1i10 rm_test]$ awk -F: '{if($3>=1 && $3<500 || $3 == 65534) {print $1} else if($3>=500 && $3<=60000 ) {print $NF}}' /etc/passwd
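One detail worth knowing about the counting examples: a counter that is never incremented prints as an empty string, not as 0. A hedged sketch on invented input that contains no UID 0; the variable names are illustrative.

```shell
# Hypothetical input with no administrator (no UID 0) in it.
input=$(printf 'bin:x:1\nlp:x:4\n')

# admins is never incremented, so it is still unset in END.
raw=$(printf '%s\n' "$input" | awk -F: '$3 == 0 {admins++} END {print "admins:" admins}')

# Adding 0 forces a numeric value, so the unset counter prints as 0.
counts=$(printf '%s\n' "$input" | awk -F: '$3 == 0 {admins++} $3 != 0 {others++} END {print "admins:" (admins+0), "others:" (others+0)}')

echo "$raw"
echo "$counts"
```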

2) Loop statement
a) for loop

The equivalent for loop in bash, for comparison:
for ((i=1;i<=5;i++));do echo $i;done

## Print 1 to 5
[wqf@b1i10 rm_test]$ awk 'BEGIN {for(i=1;i<=5;i++) {print i}}'
1
2
3
4
5

## Sum the odd numbers from 1 to 10
[wqf@b1i10 rm_test]$ for ((i=1;i<=10;i+=2));do echo $i;done|awk '{sum+=$0};END{print sum}'
25

## Print the odd numbers from 1 to 10
[wqf@b1i10 rm_test]$ awk 'BEGIN{for(i=1;i<=10;i+=2) {print i}}'
[wqf@b1i10 rm_test]$ awk 'BEGIN{for(i=1;i<=10;i+=2) print i}'
1
3
5
7
9

## Compute the sum of 1 to 5
[wqf@b1i10 rm_test]$ awk 'BEGIN{sum=0;for(i=1;i<=5;i++) sum+=i;print sum}'
[wqf@b1i10 rm_test]$ awk 'BEGIN{for(i=1;i<=5;i++) (sum+=i);{print sum}}'
[wqf@b1i10 rm_test]$ awk 'BEGIN{for(i=1;i<=5;i++) (sum+=i);print sum}'
15

b) while loop

## Print 1 to 5
[wqf@b1i10 rm_test]$ awk 'BEGIN{i=1;while(i<=5) {print i;i++}}'
1
2
3
4
5

## Print the odd numbers from 1 to 10
[wqf@b1i10 rm_test]$ awk 'BEGIN{i=1;while(i<=10) {print i;i+=2}}'
1
3
5
7
9

## Compute the sum of 1 to 5
[wqf@b1i10 rm_test]$ awk 'BEGIN{i=1;sum=0;while(i<=5) {sum+=i;i++}; print sum}'
[wqf@b1i10 rm_test]$ awk 'BEGIN{i=1;while(i<=5) {(sum+=i) i++};print sum}'
15

c) nested loops

The nested loop in bash, for comparison:
#!/bin/bash
for ((y=1;y<=5;y++))
do
    for ((x=1;x<=$y;x++))
    do
        echo -n $x    
    done
echo
done

[wqf@b1i10 ~]$ awk 'BEGIN {for(y=1;y<=5;y++) { for(x=1;x<=y;x++) {printf "%d", x};print ""}}'
[wqf@b1i10 ~]$ awk 'BEGIN {y=1;while(y<=5) { for(x=1;x<=y;x++) {printf "%d", x};y++;print ""}}'
1
12
123
1234
12345
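In day-to-day use a for loop most often walks over the fields of each line, from 1 to NF. A minimal sketch that sums every number on a line; the two input lines are made up.

```shell
# For each line: reset the total, add up fields 1..NF, then print
# the field count followed by the sum.
sums=$(printf '1 2 3\n10 20\n' | awk '{total = 0; for (i = 1; i <= NF; i++) total += $i; print NF, total}')
echo "$sums"
```

Resetting total at the start of the action matters: awk variables keep their value from one line to the next.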

6. Awk arithmetic operations

Arithmetic operators:

+ - * / %(modulo) ^(exponentiation, e.g. 2^3)

Example:

[wqf@b1i10 ~]$ awk 'BEGIN{print 1+1}'
2
[wqf@b1i10 ~]$ awk 'BEGIN{print 2**3}'
8
[wqf@b1i10 ~]$ awk 'BEGIN{print 2/3}'
0.666667
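The remaining operators can be tried the same way. This sketch uses ^ for exponentiation, since the ** form above is not accepted by every awk implementation, and it adds int() and printf, which are standard awk features but are not part of the operator list above.

```shell
# 7 % 3      : remainder of the division
# 2 ^ 3      : exponentiation
# int(7 / 2) : division is floating-point; int() truncates the result
# printf     : controls how many decimals are shown
math=$(awk 'BEGIN {print 7 % 3, 2 ^ 3, int(7 / 2); printf "%.2f\n", 2 / 3}')
echo "$math"
```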

5. Practical applications of awk

1) Determine why a number did not hit the rules

Requirements

The file holds per-number statistics computed from call records. Extract what each column means, then determine which condition the number fails to meet and therefore why it was not hit.

File preparation

[wqf@b1i10 data_val]$ cat test_result.log
130****3963|0752|12|1.2|0.5833333333333334|0.5833333333333334|27|0.31|0|130****3963

Implementation process
1. First use print or printf to print the meaning of each field. With printf the output format can be adjusted.

## printf
[wqf@b1i10 ~]$ filename=/apps/wqf/cdc_model/bin/data_val/test_result.log
[wqf@b1i10 ~]$ calling_num=`awk -F"|" '{printf "%-10s\t%-10s\n","Number:"$1, "Outgoing calls:"$3}' $filename`
[wqf@b1i10 ~]$ calling_hour_num=`awk -F"|" '{printf "%-10s\t%-10s\n","Number:"$1,  "More than 15 outgoing calls within some 1-hour window:"$9}' $filename`
[wqf@b1i10 ~]$ called_dispersion_rate=`awk -F"|" '{printf "%-10s\t%-10s\n","Number:"$1 , " Callee dispersion rate:"$4}' $filename`
[wqf@b1i10 ~]$ calling_called_area_same_rate=`awk -F"|" '{printf "%-10s\t%-10s\n","Number:"$1,  " Share of calls to callees in the caller home area:"$5}' $filename`
[wqf@b1i10 ~]$ calling_visit_area_same_rate=`awk -F"|" '{printf "%-10s\t%-10s\n","Number:"$1, " Share of calls to callees in the caller calling area:"$6}' $filename`
[wqf@b1i10 ~]$ calling_rate=`awk -F"|" '{printf "%-10s\t%-10s\n","Number:"$1," Outgoing-call ratio:"$8}' $filename`
[wqf@b1i10 ~]$ whether_calling=`awk -F"|" '{printf "%-10s\t%-10s\n","Number:"$1,"Any outgoing calls between 06:00 and 9:00:"$10}' $filename`

## print
[wqf@b1i10 ~]$ calling_num=`awk -F"|" '{print "Number:"$1, "Outgoing calls:" $3}' $filename`
[wqf@b1i10 ~]$ calling_hour_num=`awk -F"|" '{print "Number:"$1 ,"More than 15 outgoing calls within some 1-hour window:" $9}' $filename`
[wqf@b1i10 ~]$ called_dispersion_rate=`awk -F"|" '{print "Number:"$1 ," Callee dispersion rate:" $4}' $filename`
[wqf@b1i10 ~]$ calling_called_area_same_rate=`awk -F"|" '{print "Number:"$1, " Share of calls to callees in the caller home area:" $5}' $filename`
[wqf@b1i10 ~]$ calling_visit_area_same_rate=`awk -F"|" '{print "Number:"$1 ," Share of calls to callees in the caller calling area:" $6}' $filename`
[wqf@b1i10 ~]$ calling_rate=`awk -F"|" '{print "Number:"$1, " Outgoing-call ratio:" $8}' $filename`
[wqf@b1i10 ~]$ whether_calling=`awk -F"|" '{print "Number:"$1 ," Any outgoing calls between 06:00 and 9:00:" $10}' $filename`

[wqf@b1i10 ~]$ echo -e $calling_num"\n"$calling_hour_num"\n"$called_dispersion_rate"\n"$calling_called_area_same_rate"\n"$calling_visit_area_same_rate"\n"$calling_rate"\n"$whether_calling
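Number:130****3963 Outgoing calls:12
Number:130****3963 More than 15 outgoing calls within some 1-hour window:0
Number:130****3963 Callee dispersion rate:1.2
Number:130****3963 Share of calls to callees in the caller home area:0.5833333333333334
Number:130****3963 Share of calls to callees in the caller calling area:0.5833333333333334
Number:130****3963 Outgoing-call ratio:0.31
Number:130****3963 Any outgoing calls between 06:00 and 9:00:130****3963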
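The seven separate awk calls can also be folded into a single one that prints one labelled line per field. This is a sketch of that alternative, not the author's method; it reuses the sample record from test_result.log, and the English labels and variable names are chosen here for illustration.

```shell
# Sample record copied from test_result.log above.
record='130****3963|0752|12|1.2|0.5833333333333334|0.5833333333333334|27|0.31|0|130****3963'

# One awk invocation, one printf per field of interest.
report=$(printf '%s\n' "$record" | awk -F'|' '{
    printf "Number: %s\n", $1
    printf "Outgoing calls: %s\n", $3
    printf "Callee dispersion rate: %s\n", $4
}')
echo "$report"
```

Quoting "$report" in the echo keeps the line breaks, so no echo -e with manual \n separators is needed.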

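2. Use a chained if...else if...else structure in awk to determine which condition is not met

[wqf@b1i10 data_val]$ awk 'BEGIN{FS="|"} {
  if ($3<30 || $4>=1.3) {
    printf "%-10s\t%-10s\t%-10s\n","Outgoing calls:"$3,"Callee dispersion rate:"$4,$1" does not satisfy: outgoing calls >= 30 and callee dispersion rate < 1.3"
  } else if($9!=15 || $4>=1.3) {
    printf "%-10s\t%-10s\n%-10s\n","Outgoing calls >= 15 within some 1-hour window:"$9,"Callee dispersion rate:"$4,$1": does not satisfy the rule: outgoing calls >= 15 within some 1-hour window and callee dispersion rate < 1.2"
  } else if($5>0.3) {
    printf "%-10s\n%-10s\n","Share of calls to callees in the caller home area:"$5,$1": does not satisfy: share of calls to callees in the caller home area <= 30%"
  } else if($6>0.3) {
    printf "%-10s\n%-10s\n","Share of calls to callees in the caller calling area:"$6,$1": does not satisfy: share of calls to callees in the caller calling area <= 30%"
  } else if($7<0.9) {
    printf "%-10s\n%-10s\n"," Outgoing-call ratio:"$7,$1": does not satisfy: outgoing-call ratio >= 90%"
  }
}'  test_result.log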

2) Find all rows where the second column is greater than 0.97, and count them

File preparation

[wwqf@b1i10 result_data]$ head result_20230319_1.txt
neg     pos     calling_nbr     res
0.0023  0.9977  13302873173     1
0.041   0.959   13310815518     1
0.0286  0.9714  13316151376     1
0.0485  0.9515  13316270892     1
0.0083  0.9917  13316593420     1
0.0159  0.9841  13316765627     1
0.0399  0.9601  13318208857     1
0.0261  0.9739  13318445264     1
0.0172  0.9828  13318611412     1

Implementation process

[wqf@b1i10 result_data]$ awk -F ' ' '$2>0.97{print $0}' result_20230319_1.txt | wc -l
6883
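One caveat: the file starts with a header line, and awk compares a non-numeric field such as pos with 0.97 as strings, so the header also passes the $2>0.97 test and is included in the count. A hedged sketch of the effect on a three-line sample (header plus the first two data rows shown above), with NR > 1 used to skip the header:

```shell
# Header plus the first two data rows from result_20230319_1.txt (tab-separated).
sample=$(printf 'neg\tpos\tcalling_nbr\tres\n0.0023\t0.9977\t13302873173\t1\n0.041\t0.959\t13310815518\t1\n')

# Counts the header too: "pos" > "0.97" is true as a string comparison.
with_header=$(printf '%s\n' "$sample" | awk '$2 > 0.97 {n++} END {print n+0}')

# NR > 1 skips the first line, so only data rows are counted.
without_header=$(printf '%s\n' "$sample" | awk 'NR > 1 && $2 > 0.97 {n++} END {print n+0}')

echo "$with_header $without_header"
```

Counting inside awk with a counter and END also removes the need for the extra wc -l process.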

3) View the space occupied by a Hive table

Method 1:
1. View the column information and the storage path of the Hive table

desc formatted table_name;
-- The path shown after Location: is the storage path of the Hive table

2. View the file sizes with hadoop fs -du followed by the table storage path. The example below shows only part of the output. The first column is the size of each file in bytes.

[wqf@b1i10 ~]$ hadoop fs -du hdfs://b1/apps/wqf/hive/wqf/cdc_black_history_chaiji_user_list_info
31852445  95557335  hdfs://b1/apps/wqf/hive/wqf/cdc_v1_6_501_all_sample_feature_2023031922/000000_0
31837999  95513997  hdfs://b1/apps/wqf/hive/wqf/cdc_v1_6_501_all_sample_feature_2023031922/000001_0

3. Compute the total size of the Hive table in GB

[wqf@b1i10 ~]$ hadoop fs -du hdfs://b1/apps/wqf/hive/wqf/cdc_black_history_chaiji_user_list_info |awk '{SUM += $1} END{print SUM/(1024*1024*1024)"G"}'
1.03685G

Method 2:

1. First find the storage path of the table, as described in step 1 of Method 1.
2. View the file sizes with hadoop fs -ls followed by the table storage path. The example below shows only part of the output. The fifth column is the size of each file in bytes.

[wqf@b1i10 ~]$ hadoop fs -ls hdfs://b1/apps/wqf/hive/wqf/cdc_black_history_chaiji_user_list_info
-rw-r--r--   3 wqf wqf 31852445 2023-03-24 17:01 hdfs://b1/apps/wqf/hive/wqf/cdc_v1_6_501_all_sample_feature_2023031922/000000_0
-rw-r--r--   3 wqf wqf 31837999 2023-03-24 17:01 hdfs://b1/apps/wqf/hive/wqf/cdc_v1_6_501_all_sample_feature_2023031922/000001_0

3. Compute the total size of the Hive table in GB

[wqf@b1i10 ~]$ hadoop fs -ls hdfs://b1/apps/wqf/hive/wqf/cdc_v1_6_501_all_sample_feature_2023031922 | awk -F' ' '{SUM+=$5}END {print SUM/(1024*1024*1024)"G"}'
[wqf@b1i10 ~]$ hadoop fs -ls hdfs://b1/apps/wqf/hive/wqf/cdc_v1_6_501_all_sample_feature_2023031922 | awk -F ' ' '{print $5}'|awk '{a+=$1}END {print a/(1024*1024*1024)"G"}'
1.03685G
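The summing step can be tried without a Hadoop cluster by feeding awk some captured output. The sketch below uses the two sizes from the hadoop fs -du listing above with shortened, hypothetical paths, and formats the result with printf instead of print.

```shell
# Simulated `hadoop fs -du` output: sizes from the listing above,
# paths shortened and hypothetical.
du_output='31852445  95557335  /warehouse/t/000000_0
31837999  95513997  /warehouse/t/000001_0'

# Sum column 1 (bytes) and print the total in GiB with four decimals.
total=$(printf '%s\n' "$du_output" | awk '{sum += $1} END {printf "%.4f GiB\n", sum / (1024 * 1024 * 1024)}')
echo "$total"
```

Using printf with an explicit precision avoids depending on awk's default number formatting, which is what produces the 1.03685 in the output above.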

Origin blog.csdn.net/sodaloveer/article/details/130499353