Using the Linux awk tool (repost)

Added by zhj: awk is very powerful. It is a simple programming language in its own right, and there is a whole book devoted to it, "Effective AWK Programming". It supports numbers, strings, and arrays. Variables do not need to be defined before use, because every variable has a default initial value. It also supports if, for, and other control statements.
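The default-initial-value behaviour mentioned above can be sketched as follows. This is a minimal illustration that assumes any POSIX awk; the names n, s, and count are hypothetical and chosen only for the example.

```shell
# An unset awk variable acts as 0 in numeric context and "" in string context.
defaults=$(awk 'BEGIN {
    print n + 1            # n was never assigned, so it counts as 0
    print "[" s "]"        # s was never assigned, so it is the empty string
    count["x"]++           # array elements start from 0 as well
    print count["x"]
}')
echo "$defaults"
```

The three printed lines should be 1, [] and 1, which is why counters in awk can be incremented without any prior declaration.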

Original: https://blog.51cto.com/13866901/2166164?tdsourcetag=s_pctim_aiomsg

Operating environment: CentOS 6 on VMware

I. Introduction to awk

awk is a very useful data-processing tool. Whereas sed usually operates on a whole line at a time, awk tends to split each line into fields and process those, which makes it well suited to small data-processing jobs. awk is also a report generator: it formats the contents of a file for display. Formatting here does not mean formatting a file system; it means arranging the file's contents into various layouts for output. The awk used on Linux is gawk (GNU awk); on this system awk is a link to gawk, so running awk or gawk gives the same result.


II. Basic usage of awk

awk [OPTIONS] 'program' FILE1 FILE2 ...

program: PATTERN {ACTION STATEMENTS}

program is the awk program text; PATTERN selects which records to process; ACTION STATEMENTS say what to do with them. There may be several statements, separated by semicolons.

OPTIONS: -F SEP sets the input field separator; -v VAR=VALUE assigns a variable.
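As a small sketch of the PATTERN {ACTION} form, the command below feeds three made-up records to awk on standard input. The sample data and the variable name selected are assumptions for illustration only; they are not taken from ceshi.txt.

```shell
# Three records arrive on stdin; the field separator is a colon.
# The pattern $3 >= 100 selects records, the action prints fields 1 and 3.
selected=$(printf 'root:x:0\nalice:x:100\nbob:x:250\n' |
    awk -F: '$3 >= 100 { print $1, $3 }')
echo "$selected"
```

Only the records whose third field is at least 100 reach the action, so root is skipped while alice and bob are printed.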

for example:

cat /etc/passwd
 


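cat ceshi.txt | awk -v FS=: '{print $1,$3}' (split each line on colons and print the first and third fields; the default separator is whitespace; note that the awk program must be enclosed in single quotes)

cat ceshi.txt | awk -v FS=: '{print $1"XXXX"$3}' ("XXXX" stands for any literal text and must be enclosed in double quotes)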


III. Variables

1. Built-in variables

  • FS: input field separator, whitespace by default

  • RS: input record separator, newline by default

  • OFS: output field separator, a single space by default

  • ORS: output record separator, newline by default

  • NF: number of fields in the current record

  • print NF prints the number of fields in the current line

  • print $NF prints the value of the last field of the current line

  • NR: number of records read so far

  • FNR: record number within the current file (counted separately for each file)

  • FILENAME: name of the current file

  • ARGC: number of command-line arguments

  • ARGV: array holding the command-line arguments
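The following sketch shows a few of these built-in variables working together. The two-line input and the variable name report are made up for the example.

```shell
# Hypothetical two-line input, fields separated by colons.
# FS controls how input is split, OFS how print joins its arguments.
report=$(printf 'a:b:c\nd:e\n' |
    awk 'BEGIN { FS = ":"; OFS = "-" } { print NR, NF, $NF }')
echo "$report"
```

For each record this prints the record number, the field count, and the last field, joined by the OFS value.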

2. User-defined variables

(1) -v VAR=VALUE (variable names are case-sensitive). Here the variable's value is printed once for every line in ceshi.txt.

awk -v fan="cool" '{print fan}' ceshi.txt

(2) Defining a variable inside the program

awk 'BEGIN{FS=":";abc=1}{print $abc}' ceshi.txt


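IV. Formatted output in awk

Syntax: printf FORMAT, item1, item2, ...

  • FORMAT is required

  • Unlike print, printf does not append a newline automatically; write \n explicitly

  • FORMAT needs one format specifier for each item that follows, otherwise the item is not displayed

Format specifiers:

  • %c prints a single character (a numeric argument is treated as a character code)

  • %d, %i print a decimal integer

  • %e, %E print a number in scientific notation

  • %f prints a floating-point number

  • %g, %G print a number in scientific notation or floating-point format, whichever is shorter

  • %s prints a string

  • %u prints an unsigned integer

  • %% prints a literal % sign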
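A short sketch of several of these specifiers in one place. The values are arbitrary and the variable name formatted is only for the example.

```shell
# %d truncates to an integer, %5.2f pads to width 5 with 2 decimals,
# %c turns the code 65 into a character, %% prints a literal percent sign.
formatted=$(awk 'BEGIN {
    printf "%d|%5.2f|%s|%c|%d%%\n", 42.9, 3.14159, "awk", 65, 80
    printf "%-6s|%6s|\n", "left", "right"   # left-aligned, then right-aligned
}')
echo "$formatted"
```

The second printf shows the effect of the - flag: %-6s pads on the right, while %6s pads on the left.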

Example:

cat ceshi.txt | awk -F: '{printf "%-10s%s\n",$1,$3}'


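V. awk operators

  • Arithmetic operators, e.g. A+B A-B A*B A/B

  • String operator: concatenation

  • Assignment operators, e.g. = += /= %=

  • Comparison operators, e.g. > >= < <= == !=

  • Pattern-matching operators: ~ (the left side is matched by the pattern on the right), !~ (the left side is not matched by the pattern on the right)

  • Logical operators: && (and), || (or), ! (not)

  • Conditional expression: selector ? if-true-expression : if-false-expression

  • Function call: call a function to process data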
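Several of these operators can be seen together in the sketch below. The input names and ages are invented for the example, and ops is just a variable holding the output.

```shell
ops=$(printf 'tom 17\nanna 30\n' | awk '
    $1 ~ /^t/  { print $1 " starts with t" }      # regex match plus concatenation
    $1 !~ /^t/ { print $1 " does not start with t" }
    { print $1, ($2 >= 18 && $2 < 65 ? "adult" : "other") }
')
echo "$ops"
```

Placing two strings next to each other concatenates them, and the conditional expression picks one of two values without needing a full if statement.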

Examples

Use df to check disk usage on the current system, and list the name and usage of every disk whose usage is at least 20 percent:

df|awk -v FS=% '$0 ~ "/dev/sd" {print $1}'|awk '$NF>=20 {printf "DevName:%-10s Used:%s%%\n",$1,$5}'

awk -v FS=: '{usertype=($3>=5?"Big User":"Small User");printf "UserName:%-15s Type:%s\n",$1,usertype}' ceshi.txt


VI. awk control statements

  • if (condition) {statements} [else {statements}]

awk -F: '{if($3>=5){printf "%-10s%s\n",$1,$3}}' ceshi.txt
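The optional else branch could be used as in the sketch below. The two sample records and the 1000 threshold are assumptions made for illustration, not values from ceshi.txt.

```shell
# Label each account by comparing the third field with a threshold.
labels=$(printf 'root:x:0\nalice:x:1000\n' |
    awk -F: '{ if ($3 >= 1000) { print $1, "regular" } else { print $1, "system" } }')
echo "$labels"
```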
 


  • while loop: while (condition) {statements}

echo {1..10} |awk '{n=1;while(n<=NF){if($n%2==0){print $n,"even"}else {print $n,"odd"};n++}}'

  • do-while loop: do {statements} while (condition)

  • for loop: for (init; condition; increment) {statements}

awk 'BEGIN{for(i=1;i<=1000;i++){sum+=i};print sum}'

  • break

  • continue

  • delete array

  • exit

  • next: stop processing the current line early and move on to the next line

awk -F: '{if($3%2==0)next;print $1,$3}' ceshi.txt
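The do-while loop and the break and continue statements listed in this section have no example of their own, so here is a minimal sketch. The loop bounds are arbitrary.

```shell
loops=$(awk 'BEGIN {
    i = 0
    do {                       # the body runs once before the test
        i++
        if (i == 2) continue   # skip the rest of this iteration
        if (i == 5) break      # leave the loop entirely
        printf "%d ", i
    } while (i < 10)
    print "done"
}')
echo "$loops"
```

The value 2 is skipped by continue, and the loop stops for good when i reaches 5, so only 1, 3 and 4 are printed before "done".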
 



VII. awk performance test

Experiment: add up the integers from 1 to 1,000,000 in four different ways and compare the running times.

(1) time seq -s "+" 1000000 |bc

(2) time awk 'BEGIN{for(i=1;i<=1000000;i++){sum+=i};print sum}'

(3) time awk 'BEGIN{i=1;while(i<=1000000){sum+=i;i++};print sum}'

(4) time for ((i=1;i<=1000000;i++));do let sum+=i; done;echo $sum


VIII. awk arrays

An array is a table that holds a series of elements.

The format is as follows:

                  abc[1]="libai"

                  abc[2]="lihei"

abc is the array name; [1] and [2] are the subscripts, which can be thought of as the first and second elements; "libai" and "lihei" are the element values.

Examples:

awk 'BEGIN{fan[0]="libai";fan[1]="lihei";print fan[0]}'

awk 'BEGIN{fan[0]="libai";fan[1]="lihei";print fan[1]}'

awk 'BEGIN{fan[0]="libai";fan[1]="lihei";for (i in fan)print i}'

awk -F: '{fan[NR]=$1;print NR,fan[NR]}' ceshi.txt

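Use an array to count the number of requests from each IP address (create a file, named a here, with one IP address per line):

awk '{array[$1]++} END {for(key in array) print array[key],key}' a|sort -rn

About array[$1]++:

(1) When awk reads the first line, it looks up the array element array["contents of the first line"] and applies ++ to it.

(2) That element has not been defined yet, but because it is used with the ++ operator, awk treats it as the number 0 and then increments it, so its value becomes 1.

(3) Whenever a later line has the same contents, the same element is incremented again, so the final value is the number of times that line occurred.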
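The counting steps above can be tried without creating a file, as in this sketch. The four addresses are invented, and hits and counts are hypothetical names.

```shell
# Each line increments the counter stored under its own address;
# the END block prints every counter, and sort -rn orders them numerically.
counts=$(printf '10.0.0.1\n10.0.0.2\n10.0.0.1\n10.0.0.1\n' |
    awk '{ hits[$1]++ } END { for (ip in hits) print hits[ip], ip }' |
    sort -rn)
echo "$counts"
```

The address that appears three times ends up with a count of 3 and is listed first.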



IX. awk functions

1. Built-in functions

(1) Numeric functions: rand() returns a random number between 0 and 1

awk 'BEGIN{print rand()}'

Running this command several times reveals a problem: rand() returns the same value every time. To get a different value on each run, seed the generator with srand() first.

awk 'BEGIN{srand();print rand()}'

Now the generated number changes from run to run, but it is a decimal fraction less than 1. To get a random integer, scale the value and take its integer part with int().

awk 'BEGIN{srand();print 100*rand()}'
 
awk 'BEGIN{srand();print int(100*rand())}'
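The same idea extends to an integer in a chosen range. The sketch below simulates a six-sided die; the range 1 to 6 and the variable name roll are only an example.

```shell
# rand() is in [0, 1), so int(6 * rand()) is 0..5 and adding 1 gives 1..6.
roll=$(awk 'BEGIN { srand(); print int(6 * rand()) + 1 }')
echo "$roll"
```

Note that srand() with no argument seeds from the current time, so two runs within the same second may print the same value.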
 


(2) String functions

length([s]) returns the length of the string s, or of $0 if s is omitted

For example:

awk '{print $0, length()}' abc.txt (length of each whole line)

awk '{for(i=1;i<=NF;i++){print $i,length($i)}}' abc.txt (length of each field)
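A self-contained sketch of both forms of length(), using a made-up input line instead of abc.txt:

```shell
# length() with no argument measures $0; with an argument it measures that string.
lens=$(printf 'hello awk\n' | awk '{ print length(), length($1), length($2) }')
echo "$lens"
```

The whole line has 9 characters including the space, while the two fields have 5 and 3.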
 


gsub(r, s [, t]) searches the string t for text matching the pattern r and replaces every match with s; t defaults to $0

For example:

awk '{gsub("h","H",$1);print $0}' abc.txt

awk '{gsub("h","H",$0);print $0}' abc.txt

sub(r, s [, t]) searches the string t for text matching the pattern r and replaces only the first match with s; t defaults to $0

For example:

awk '{sub("h","H");print $0}' abc.txt (replaces only the first match within the target string)
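The difference between sub and gsub is easiest to see side by side. In this sketch the input line and the variable names line, first, and n are invented; both functions also return the number of replacements made, which the example captures for gsub.

```shell
subs=$(printf 'hah hoh\n' | awk '{
    line = $0
    n = gsub(/h/, "H", line)   # replaces every match, returns the count
    first = $0
    sub(/h/, "H", first)       # replaces only the first match
    print n, line, first
}')
echo "$subs"
```

Working on copies (line and first) leaves $0 untouched, which is useful when the original record is still needed later.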
 


split(s, a [, r]) splits the string s on the separator r and stores the pieces in the array a, numbered from 1; it returns the number of pieces

For example:

awk -v aa="李大;李二;李三" 'BEGIN{split(aa,lishijiazu,";");for(i in lishijiazu){print lishijiazu[i]}}'

awk -v aa="cc;ff;dd;ee" 'BEGIN{split(aa,lishijiazu,";");for(i in lishijiazu){print lishijiazu[i]}}'

With for (i in array), the elements may be printed in a different order from the one they had in the string. To print them in their original order, loop over the indexes from 1 to the count returned by split():

awk -v aa="cc;ff;dd;ee" 'BEGIN{ abc=split(aa,lishijiazu,";");for(i=1;i<=abc;i++){print i,lishijiazu[i]}}'

2. User-defined functions

Functions are a basic building block of a program. awk lets us define our own functions, so a large program can be split into several functions, each of which can be tested independently.

The general format of a user-defined function is:

function function_name(argument1,argument2,...)
   { function body }

function_name is the name of the user-defined function. It must begin with a letter or underscore, and the remaining characters may be any combination of digits, letters, and underscores; awk reserved words cannot be used as function names. A function can accept several comma-separated parameters. Parameters are not mandatory, so a user-defined function may take none at all. The function body consists of one or more awk statements.
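As a minimal sketch of this format, the example below defines and calls one function. The function max is a hypothetical helper written for this illustration; it is not built into awk.

```shell
result=$(awk '
    # max() is a hypothetical helper; it returns the larger of two numbers.
    function max(a, b) {
        return (a > b) ? a : b
    }
    BEGIN { print max(3, 7), max(10, 2) }
')
echo "$result"
```

The function definition sits at the top level of the program, outside any pattern or action, and can then be called from any rule.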

 


Origin www.cnblogs.com/ajianbeyourself/p/11221077.html