Added by zhj: awk is very powerful. It is a simple programming language in its own right, and there is a whole book devoted to it, "Effective AWK Programming". It supports numbers, strings, and arrays. Variables do not need to be declared before use, because each type of data has a default initial value. It also supports if/for and other control statements.
Original: https://blog.51cto.com/13866901/2166164?tdsourcetag=s_pctim_aiomsg
Operating environment: CentOS 6 on VMware
First, an introduction to awk
awk is a very handy data-processing tool. Whereas sed usually operates on a whole line at a time, awk tends to split each line into several fields and process those, which makes it well suited to small data-processing jobs. awk is also a report generator: it formats file content. "Formatting" here does not mean formatting a file system; it means laying out the contents of a file in various ways for display. On Linux the awk we use is GNU awk, gawk for short. awk is in fact a link to gawk, so using awk and using gawk on this system are the same thing.
Second, the basic usage of awk
awk [OPTIONS] 'program' FILE1 FILE2
program: PATTERN { ACTION STATEMENT }
program: the awk program text. PATTERN: the condition that selects which records the action applies to. ACTION STATEMENT: the statements to run; there may be several, separated by semicolons.
OPTIONS: -F FS sets the input field separator; -v VAR=VALUE assigns a value to a variable.
for example:
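As a minimal sketch of the usage above, the following selects records with a pattern and prints two fields. The sample records are invented for illustration (they only imitate the /etc/passwd layout), and the shell variable names `sample` and `out` are hypothetical.

```shell
# Sample colon-separated records (invented for illustration).
sample='root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin'

# -F: sets the input field separator. The pattern $3 >= 1 selects records,
# and the action prints the first and last fields of each selected record.
out=$(printf '%s\n' "$sample" | awk -F: '$3 >= 1 { print $1, $NF }')
echo "$out"
```

Only the second record passes the pattern, because the third field of the first record is 0.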
Third, variables
1, built-in variables
- FS: input field separator; default is whitespace (runs of spaces and tabs)
- RS: input record separator; default is a newline
- OFS: output field separator; default is a single space
- ORS: output record separator; default is a newline
- NF: number of fields in the current line
- print NF: displays the number of fields in the current line
- print $NF: displays the value of the last field of the current line
- NR: number of records read so far, counted across all files
- FNR: record number counted separately for each file
- FILENAME: name of the current file
- ARGC: number of command-line arguments
- ARGV: array holding the arguments given on the command line
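A short sketch of a few of these variables in action. The two input lines are invented for illustration, and `out` is a hypothetical shell variable.

```shell
# OFS is set in BEGIN so that it applies to every print.
# NR is the record number, NF the field count, $NF the last field.
out=$(printf 'a b c\nd e\n' | awk 'BEGIN { OFS = "-" } { print NR, NF, $NF }')
echo "$out"
```

The first line has three fields ending in c and the second has two ending in e, so the output is `1-3-c` followed by `2-2-e`.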
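2, user-defined variables
(1) -v VAR=VALUE (variable names are case-sensitive). Here the value of the variable is printed once for every line in the file ceshi.txt.
(2) Defining a variable inside the program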
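Both ways of defining a variable can be sketched as follows. To keep the example self-contained, three lines of inline input stand in for ceshi.txt; the variable name `name` and its value are hypothetical.

```shell
# (1) -v assigns the variable before the program starts; the value is
# printed once per input line (three lines here).
out1=$(printf '1\n2\n3\n' | awk -v name="zhangsan" '{ print name }')
echo "$out1"

# (2) The same variable defined inside the program, in a BEGIN block.
out2=$(awk 'BEGIN { name = "zhangsan"; print name }')
echo "$out2"
```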
Fourth, formatted output in awk
Syntax: printf FORMAT, item1, item2
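- FORMAT is required
- Unlike print, printf does not append a newline automatically; add \n explicitly
- FORMAT must contain one format specifier for each item that follows, otherwise the item is not displayed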
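Format specifiers:
- %c: displays a character (a numeric argument is treated as a character code; for a string argument the first character is printed)
- %d, %i: displays a decimal integer
- %e, %E: displays a number in scientific notation
- %f: displays a floating-point number
- %g, %G: displays a number in scientific notation or floating-point format, whichever is shorter
- %s: displays a string
- %u: displays an unsigned integer
- %%: displays a literal percent sign; two percent signs are needed to print one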
Example:
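A small sketch that exercises several of the specifiers above in one call. The values are arbitrary, and `out` is a hypothetical shell variable.

```shell
# %-8s left-justifies a string in 8 columns, %5d right-justifies an
# integer in 5 columns, %.2f keeps two decimals, %c turns the code 65
# into a character, and %% prints a literal percent sign.
out=$(awk 'BEGIN { printf "%-8s|%5d|%.2f|%c|%d%%\n", "disk", 42, 3.14159, 65, 20 }')
echo "$out"
```

Note the explicit \n at the end of FORMAT; without it the next output would continue on the same line.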
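Fifth, awk operators
- Arithmetic operators, e.g. A+B A-B A*B A/B
- String operator: concatenation, written by placing two strings next to each other
- Assignment operators, e.g. = += /= %=
- Comparison operators, e.g. == != > >= < <=
- Pattern-matching operators: ~ tests whether the left side matches the pattern on the right; !~ tests whether it does not match
- Logical operators: && (AND), || (OR)
- Conditional expression: selector ? if-true-expression : if-false-expression
- Function call: calls a function to process data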
Example:
Use the df command to check current disk usage, and print the name and usage of every disk whose usage is at least 20 percent.
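One possible way to write this is sketched below. So that the result is reproducible, the example feeds awk a made-up sample of df output instead of calling df; the device names and numbers are invented, and `df_sample`, `out`, and `use` are hypothetical names.

```shell
# Sample df output (invented for illustration). On a real system,
# pipe the output of df -P into the same awk program instead.
df_sample='Filesystem 1K-blocks Used Available Use% Mounted on
/dev/sda1 10000 1500 8500 15% /boot
/dev/sda2 50000 20000 30000 40% /
tmpfs 4000 800 3200 20% /dev/shm'

out=$(printf '%s\n' "$df_sample" | awk '
NR > 1 {                      # skip the header line
    use = $5
    sub(/%/, "", use)         # strip the percent sign
    if (use + 0 >= 20)        # adding 0 forces a numeric comparison
        print $1, $5
}')
echo "$out"
```

The -P option asks df for the POSIX output format, which keeps each file system on a single line so that the field numbers stay stable.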
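Sixth, awk control statements
- if (condition) {statements} [else {statements}]
- while loop: while (condition) {statements}
- do-while loop: do {statements} while (condition)
- for loop: for (init; condition; increment) {statements}
- break
- continue
- delete array
- exit
- next: ends processing of the current line early and moves on to the next line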
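The sketch below combines several of these statements. The input values and the names `kind`, `total`, and `out` are made up for illustration.

```shell
out=$(printf '3\n-1\n8\nstop\n5\n' | awk '
$1 == "stop" { exit }            # exit: stop reading input altogether
$1 < 0       { next }            # next: skip the rest of the rules for this line
{
    if ($1 % 2 == 0)
        kind = "even"
    else
        kind = "odd"
    total = 0
    for (i = 1; i <= $1; i++)    # for loop: 1 + 2 + ... + $1
        total += i
    print $1, kind, total
}')
echo "$out"
```

The line -1 is skipped by next, and the line 5 is never reached because exit fires on "stop" first.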
Seventh, awk performance test
Experiment: what is the sum of the integers from 1 to 1000000? Compute it in several ways and time each one.
(1)time seq -s "+" 1000000 | bc
(2)time awk 'BEGIN{for(i=1;i<=1000000;i++){sum+=i};print sum}'
(3)time awk 'BEGIN{i=1;while(i<=1000000){sum+=i;i++};print sum}'
(4)time for ((i=1;i<=1000000;i++)); do let sum+=i; done; echo $sum
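Eighth, awk arrays
An array is a table containing a series of elements.
The format is as follows:
abc[1]="libai"
abc[2]="lihei"
abc is the array name, and [1] and [2] are the array subscripts, which can be thought of as the first and second elements of the array. "libai" and "lihei" are the element values.
Example:
Use an array to count the number of visits from each IP address (first create a file that stores the IP addresses).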
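About array[$1]++:
(1) When awk reads the first line, it evaluates this array expression, which at that point is array["first field of the first line"]++.
(2) That element has not been defined yet, but because it is followed by the ++ operator, awk automatically treats array["first field of the first line"] as the number 0 and then applies ++, so the resulting value is 1.
(3) Every time a later line has the same first field, ++ is applied to the same element again, so the number of times the operation ran equals the number of times that value appeared.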
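The counting idiom described in these steps can be sketched as follows. Instead of a file, the example uses a short inline list of addresses invented for illustration; `ips`, `count`, and `out` are hypothetical names.

```shell
# Sample access list, one IP address per line (invented for illustration).
ips='192.168.1.10
192.168.1.20
192.168.1.10
192.168.1.30
192.168.1.10'

# count[$1]++ tallies each address; the END block prints the totals.
# for (ip in count) visits keys in no guaranteed order, so the result
# is piped through sort to make the output stable.
out=$(printf '%s\n' "$ips" | awk '{ count[$1]++ } END { for (ip in count) print ip, count[ip] }' | sort)
echo "$out"
```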
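Ninth, awk functions
1, built-in functions
(1) Numeric functions: rand() returns a random number between 0 and 1
Calling rand() on its own reveals a problem: the value it returns is the same on every run, so rand() needs to be used together with srand().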
With srand() in place, the generated random number changes from run to run. The result is still a decimal less than 1. To get a random integer, multiply it by the desired range and use the int() function to take the integer part.
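A minimal sketch of this combination, assuming a random integer from 0 to 99 is wanted (the range and the variable name `n` are arbitrary choices for illustration):

```shell
# srand() with no argument seeds the generator from the current time,
# so runs at least a second apart start from different seeds.
# int(100 * rand()) scales the value and drops the fractional part.
n=$(awk 'BEGIN { srand(); print int(100 * rand()) }')
echo "$n"
```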
(2) String functions
length([s]) returns the length of the string s, or of the current line if s is omitted
for example
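A short sketch of length() with and without an argument; the input text is arbitrary and `out` is a hypothetical shell variable.

```shell
# length($1) measures the first field; length() measures the whole line.
out=$(printf 'hello world\n' | awk '{ print length($1), length() }')
echo "$out"
```

The first field "hello" has 5 characters and the whole line, including the space, has 11.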
gsub(r, s [, t]) searches the string t for matches of the pattern r and replaces every match with s; if t is omitted, the current line is used
for example
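A minimal sketch of gsub() on an arbitrary input line; `n` and `out` are hypothetical names.

```shell
# gsub replaces every "-" in the current line and returns the number
# of replacements it made.
out=$(printf 'a-b-c\n' | awk '{ n = gsub(/-/, ":"); print n, $0 }')
echo "$out"
```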
sub(r, s [, t]) searches the string t for matches of the pattern r and replaces only the first match with s; if t is omitted, the current line is used
for example
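The same input run through sub() shows the difference from gsub(); `n` and `out` are hypothetical names.

```shell
# sub replaces only the first "-" in the current line and returns 1
# when a replacement was made.
out=$(printf 'a-b-c\n' | awk '{ n = sub(/-/, ":"); print n, $0 }')
echo "$out"
```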
split(s, a [, r]) cuts the string s into pieces using r as the delimiter and stores the pieces in the array a
for example
When the array is traversed with for (i in a), the elements may come out in a different order from the one in which they appear in the string. To print them in order, we can do the following:
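One way to keep the original order, sketched under the assumption that the pieces are wanted in the order they appear in the string, is to loop over the numeric subscripts using the count that split() returns. The string and the names `parts`, `n`, and `out` are made up for illustration.

```shell
out=$(awk 'BEGIN {
    n = split("10:20:30:40", parts, ":")   # n is the number of pieces
    for (i = 1; i <= n; i++)               # walk the subscripts 1..n in order
        print i, parts[i]
}')
echo "$out"
```

split() always numbers the pieces from 1, so a counting loop visits them in string order regardless of how for-in would traverse the array.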
2, user-defined functions
Functions are a basic building block of a program. awk lets us define our own functions, so a large program can be divided into several functions, each of which can be tested independently.
The general format of a user-defined function is:
function function_name(argument1, argument2, ...)
{
    function body
}
function_name is the name of the user-defined function. The name should begin with a letter, and the rest may be any combination of digits, letters, and underscores; awk reserved words cannot be used as function names. A function can accept several comma-separated parameters. Parameters are not mandatory, so we can also create a user-defined function without any. The function body consists of one or more awk statements.
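A minimal sketch of a user-defined function. The function `max` and the values passed to it are hypothetical, chosen only to illustrate the definition and the call.

```shell
out=$(awk '
# Hypothetical helper: returns the larger of two numbers.
function max(a, b) {
    return (a > b) ? a : b
}
BEGIN { print max(3, 7), max(10, 2) }')
echo "$out"
```

The function is defined at the top level of the program, outside any pattern-action rule, and can then be called from any rule.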