awk
Basic syntax
awk -Fs '/pattern/ {action}' input-file
(or)
awk -Fs '{action}' input-file
-F sets the field separator (s above). If it is not specified, awk splits fields on runs of whitespace (spaces and tabs).
/pattern/ and {action} must be enclosed in single quotes.
/pattern/ is optional. If it is omitted, awk processes every record in the input file. If a pattern is given, awk processes only the records that match it.
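A minimal sketch of both forms. The two passwd-style lines are made-up sample data fed on standard input, so no real file is needed:

```shell
# Hypothetical passwd-style sample data, made up for this example.
sample='root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin'

# Form 1: pattern plus action. Only lines matching /^root/ are processed.
with_pattern=$(printf '%s\n' "$sample" | awk -F: '/^root/ {print $1, $7}')

# Form 2: action only. Every line is processed.
no_pattern=$(printf '%s\n' "$sample" | awk -F: '{print $1}')

echo "$with_pattern"
echo "$no_pattern"
```

The first command prints only the root entry; the second prints the first field of every line.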
awk program structure (BEGIN, body, and END blocks)
BEGIN block
Syntax of the BEGIN block:
BEGIN { awk-commands }
The commands in the BEGIN block run only once, at the start, before awk executes the body block.
The BEGIN block is well suited to printing a report header and to initializing variables.
The BEGIN block can contain one or more awk commands.
The keyword BEGIN must be written in uppercase.
The BEGIN block is optional.
Body block
/pattern/ {action}
The body block runs once for each input line that is read.
END block
END { awk-commands } runs only once, after all input has been read.
awk -F ":" '/^root/{print }' passwd
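The three blocks can be seen working together in a small sketch. The comma-separated sample (name,amount) and the variable name total are made up for illustration:

```shell
# Hypothetical comma-separated sample (name,amount), made up for this example.
report=$(printf 'alice,10\nbob,32\n' | awk -F, '
BEGIN { print "name amount"; total = 0 }   # runs once, before any input
      { print $1, $2; total += $2 }        # runs once per input line
END   { print "total", total }             # runs once, after the last line
')
echo "$report"
```

The header comes from BEGIN, one line per record comes from the body, and the sum is only available in END.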
Built-in variables
awk 'BEGIN {FS = ","} {print $2, $3}' employee.txt
awk 'BEGIN {print "test1", "test2"}'
Without the comma, awk does not insert OFS, and the values are printed with nothing between them.
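To make the role of the comma visible, this sketch sets OFS to "-" (an arbitrary choice for the example) and prints the same two strings both ways:

```shell
# With a comma, print joins the values with OFS (set to "-" here for visibility).
with_comma=$(awk 'BEGIN {OFS = "-"; print "test1", "test2"}')
# Without a comma, the two strings are concatenated and OFS is not used.
without_comma=$(awk 'BEGIN {OFS = "-"; print "test1" "test2"}')
echo "$with_comma"      # test1-test2
echo "$without_comma"   # test1test2
```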
$ gawk 'BEGIN {print "Hello World!"} {print $0} END {print "byebye"}' data1
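Built-in variable reference
$0 the entire record
$1 the first data field in the record
$2 the second data field in the record
$n the n-th data field in the record
FIELDWIDTHS a space-separated list of numbers defining the exact width of each field (gawk)
FS input field separator
RS input record separator
OFS output field separator
ORS output record separator
ARGC number of command-line arguments
ARGIND index in ARGV of the file currently being processed (gawk)
ARGV array containing the command-line arguments
CONVFMT conversion format for numbers (see the printf statement); the default is %.6g
ENVIRON associative array of the current shell environment variables and their values
ERRNO description of the system error that occurred when reading or closing an input file failed (gawk)
FILENAME name of the data file currently used as input
FNR record number within the current data file
IGNORECASE when set to a nonzero value, letter case is ignored in string and pattern matching (gawk)
NF number of fields in the current record
NR number of input records processed so far
OFMT output format for numbers; the default is %.6g
RLENGTH length of the substring matched by the match function
RSTART starting position of the substring matched by the match function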
Examples:
Number of command-line arguments
awk '{print ARGC}' /etc/fstab /etc/inittab
Individual command-line arguments
awk 'BEGIN {print ARGV[0]}' /etc/fstab /etc/inittab
awk '{print FILENAME, "record number is",NR,"FNR IS" ,FNR }' awk passwd
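The difference between NR and FNR only shows up with more than one input file. This sketch creates two throwaway files (first.txt and second.txt are hypothetical names) in a temporary directory:

```shell
# Two throwaway input files created just for this example.
tmpdir=$(mktemp -d)
printf 'a\nb\n' > "$tmpdir/first.txt"
printf 'c\nd\ne\n' > "$tmpdir/second.txt"

# NR keeps counting across files; FNR restarts at 1 for each file.
counts=$(cd "$tmpdir" && awk '{print FILENAME, NR, FNR}' first.txt second.txt)
echo "$counts"
rm -rf "$tmpdir"
```

On the first line of second.txt, NR is 3 while FNR is back to 1.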
Variables
awk variable names begin with a letter or an underscore; subsequent characters can be digits, letters, or underscores. awk keywords cannot be used as variable names.
awk variables can be used directly without being declared first. If you want to initialize a variable, it is best done in the BEGIN block, which runs only once.
Custom variables
Define them with the -v option or assign them directly in the program.
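Both ways of defining a variable, as a short sketch. The names greeting and count are made up for the example:

```shell
# -v assigns the variable before the program starts, so BEGIN can already see it.
from_option=$(awk -v greeting="hello" 'BEGIN {print greeting, "world"}')

# A variable can also be assigned inside the program; an unset variable
# acts as 0 in numeric context, so count needs no declaration.
from_program=$(printf 'x\ny\nz\n' | awk '{count++} END {print "lines:", count}')

echo "$from_option"    # hello world
echo "$from_program"   # lines: 3
```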
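printf: formatted output
Formatted output: printf "FORMAT", item1, item2, ...
(1) FORMAT must be specified.
(2) printf does not append a newline automatically; the newline control character \n has to be written explicitly.
(3) FORMAT needs one format specifier for each item that follows.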
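A small sketch of the three rules, using made-up input (a label and a number per line). The vertical bars are only there to make the column widths visible:

```shell
# %-8s left-justifies a string in 8 columns, %5.1f prints a number with one
# decimal place in 5 columns; the newline \n must be written explicitly.
formatted=$(printf 'disk 81.256\ncpu 7.5\n' | awk '{printf "%-8s|%5.1f|\n", $1, $2}')
echo "$formatted"
```

Each item ($1 and $2) has its own specifier in the format string, in the same order.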
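Unary operators
Operator Description
+ unary plus; returns the number itself
- negation
++ increment
-- decrement
Arithmetic operators
Operator Description
+ addition
- subtraction
* multiplication
/ division
% modulo (remainder)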
awk 'NR%2 == 0 {print NR,$0}' passwd
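String operators
Strings are concatenated by writing them next to each other; awk has no explicit concatenation operator.
Assignment operators
Operator Description
= assignment
+= add and assign
-= subtract and assign
*= multiply and assign
/= divide and assign
%= take the remainder and assign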
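Comparison operators
> greater than
>= greater than or equal to
< less than
<= less than or equal to
== equal to
!= not equal to
&& logical AND
|| logical OR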
Regular expression operators
Operator Description
~ matches
!~ does not match
awk -F: '$1~"ro"' passwd    # the first field contains ro
$ awk 'BEGIN { FS=":"; print "begin test" } {print $1} END {print "it is end"}' passwd
Match operator
$1 ~ /^data/
gawk -F: '$4 == 0 {print $1}' /etc/passwd
Line ranges
awk -F: '/^root\>/,/^nobody\>/ {print $1}' /etc/passwd
awk -F: '(NR>=10 && NR<=20) {print NR,$1}' /etc/passwd    # the parentheses are optional
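The two ways of selecting a range of lines, sketched on the output of seq so the result is easy to check. paste is used only to put the selected lines on one line:

```shell
# Range pattern: start at the line matching /^3$/, stop after the line matching /^5$/.
by_pattern=$(seq 10 | awk '/^3$/,/^5$/ {print}' | paste -s -d ' ' -)
# Record-number test: both comparisons must name NR.
by_number=$(seq 10 | awk 'NR>=7 && NR<=9 {print}' | paste -s -d ' ' -)
echo "$by_pattern"   # 3 4 5
echo "$by_number"    # 7 8 9
```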
awk control structures
if
Single-line form
if (conditional-expression) {statements; ...}
Multi-line form
if (conditional-expression)
{
action1;    # executed in order
action2;
}
if else
if (conditional-expression)
action1
else
action2
if (condition) {statements; ...} else {statements; ...}
Ternary operator
conditional-expression ? action1 : action2;
while
while (condition)
{
actions
}
while (condition) {statements; ...}
do-while
do
{
action
}
while (condition)
for
for (initialization; condition; increment/decrement)
for (expr1; expr2; expr3) {statements; ...}
if-then-else statement:
if (condition) statement1; else statement2
while statement:
while (condition)
{
statements
}
do-while statement:
do {
statements
} while (condition)
for statement:
for (variable assignment; condition; iteration process)
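The control structures above can be exercised together in one BEGIN block. This is only a sketch; all variable names (n, parity, size, wsum, runs, fact) are made up for the example:

```shell
result=$(awk 'BEGIN {
    # if / else
    n = 7
    if (n % 2 == 0) parity = "even"; else parity = "odd"

    # ternary operator: an expression, so its value can be assigned
    size = (n > 5) ? "big" : "small"

    # while loop: sum 1..4
    i = 1; wsum = 0
    while (i <= 4) { wsum += i; i++ }

    # do-while loop: the body runs at least once, even though the test is false
    runs = 0
    do { runs++ } while (runs < 0)

    # for loop: 3 factorial
    fact = 1
    for (j = 1; j <= 3; j++) fact *= j

    print parity, size, wsum, runs, fact
}')
echo "$result"   # odd big 10 1 6
```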
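Examples
seq 10 | awk 'i=0{print $0}'    # the assignment i=0 evaluates to 0 (false), so nothing is printed
seq 10 | awk 'i=1{print $0}'    # i=1 evaluates to 1 (true), so every line is printed; the braces play no part in this
seq 10 | awk 'i=!i{print i, $0}'    # i starts out unset, so !i is true (1) and the line is printed; on the next line i flips to false (0) and nothing is printed, so only odd-numbered lines appear
seq 10 | awk '!(i=!i){print i, $0}'    # the same toggle negated; prints the even-numbered lines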
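The toggle trick can be checked on a shorter input. In this sketch the action is omitted, so awk falls back to printing the line whenever the pattern is true; paste only joins the output onto one line:

```shell
# i=!i flips i between 1 and 0 on every line; the pattern is true on odd lines.
odd=$(seq 6 | awk 'i=!i' | paste -s -d ' ' -)
# Negating the toggle makes the pattern true on even lines instead.
even=$(seq 6 | awk '!(i=!i)' | paste -s -d ' ' -)
echo "$odd"    # 1 3 5
echo "$even"   # 2 4 6
```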
Get the disk usage and display it
df -h | awk -F "[[:space:]]+|%" '/^\/dev\/sd/{ if ($5>10) print $1, $5}'
awk '/^[[:space:]]*linux16/ {i=1;while (i<= NF) {print $i,length($i);i++} }' /boot/grub2/grub.cfg
for
for (variable assignment; condition; iteration process)
{for-body}
awk 'BEGIN{wkd["mo"]="monday";wkd["fr"]="friday";wkd["sat"]="saturday"; for (i in wkd) {print i, wkd[i]}}'
awk 'BEGIN{sum=0; for (i=1;i<=100;i++){ sum+=i} print sum }'
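next:
Ends processing of the current line early and moves straight on to the next line (the next iteration of awk's own implicit loop over the input).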
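A minimal sketch of next with two rules and made-up input. Lines starting with # hit the first rule and never reach the second one:

```shell
# Lines starting with # are skipped by next, so the numbering rule never sees them.
kept=$(printf '# comment\nalpha\n# another\nbeta\n' | awk '
/^#/ { next }
     { n++; print n ": " $0 }
')
echo "$kept"
```

Only alpha and beta are numbered and printed.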
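Arrays
array[index-expression]
index-expression:
(1) Any string can be used; a string literal must be enclosed in double quotes.
(2) If an array element does not yet exist, awk creates it automatically when it is referenced and initializes its value to the empty string.
(3) To test whether an element exists in an array, use the form "index in array"; to traverse the array, use "for (index in array)".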
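Points (2) and (3) interact: a plain reference creates the element, while the in test does not. A small sketch, with the array name color and its keys made up for the example:

```shell
checks=$(awk 'BEGIN {
    color["apple"] = "red"

    # "in" tests membership without creating the element.
    if ("pear" in color) a = "yes"; else a = "no"

    # Merely referencing color["kiwi"] creates it with an empty value ...
    if (color["kiwi"] == "") b = "empty"

    # ... so afterwards the membership test succeeds.
    if ("kiwi" in color) c = "yes"; else c = "no"

    n = 0
    for (k in color) n++      # iterate over all indexes
    print a, b, c, n
}')
echo "$checks"   # no empty yes 2
```

The final count is 2 (apple and kiwi) because the test on "pear" left the array untouched.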
Count the connections in each TCP state
netstat -tan | awk '/^tcp/ {state[$NF]++} END {for (i in state) {print i, state[i]}}'
Take the top ten IPs from access_log and add them to the firewall
awk '{ip[$1]++} END{for (i in ip) {print i, "connections " ip[i]}}' access_log | sort -nr -k 3 | head
Add an address to the iptables firewall
iptables -A INPUT -s IP -j REJECT
Top ten IPs connected to this host
awk '{split($5,ip,":"); count[ip[1]]++} END{for (i in count) print i, "connections", count[i]}' ss.log | sort -nr -k 3 | head
awk -F "[[:space:]]+|:" '{ip[$6]++} END{for (i in ip) {print "summary", i, "links", ip[i]}}' ss.log | sort -nr -k4
Count the IPs in the log, using only lines that start with a digit
awk '/^[0-9]/ {ip[$1]++} END{for (i in ip) print i, ip[i]}' access_log
Add IPs with more than 100 connections to the firewall
while true; do
awk '/^[0-9]/ {ip[$1]++} END{for (i in ip) {if (ip[i]>100) print i}}' access_log | while read -r line; do echo "$line"; iptables -A INPUT -s "$line" -j REJECT; done
sleep 10
done
Generating random numbers
awk 'BEGIN {srand(); for (i = 1; i <= 10; i++) {print rand()}}'
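String functions
• length([s]): returns the length of the given string (of $0 if s is omitted)
• sub(r,s,[t]): searches the string t for text matching the pattern r and replaces the first match with s
• gsub(r,s,[t]): like sub, but replaces every match
echo "2008:08:08 08:08:08" | awk 'gsub(/:/,"-",$0)'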
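The sketch below contrasts sub with gsub on the same timestamp. It also shows why a bare gsub call works as a pattern: gsub returns the number of replacements, which is nonzero (true) whenever something was replaced, so awk prints the modified line by default.

```shell
s='2008:08:08 08:08:08'
# sub replaces only the first match; gsub replaces every match.
first_only=$(echo "$s" | awk '{sub(/:/, "-"); print}')
all=$(echo "$s" | awk '{gsub(/:/, "-"); print}')
# gsub returns the number of replacements; length returns the string length.
stats=$(echo "$s" | awk '{n = gsub(/:/, "-"); print n, length($0)}')
echo "$first_only"   # 2008-08:08 08:08:08
echo "$all"          # 2008-08-08 08-08-08
echo "$stats"        # 4 19
```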
Exercises:
1 blog.magedu.com
2 www.magedu.com
3 hhhh.magedu.com
4 dddd.magedu.com
5 b333.magedu.com
6 bkkk.magedu.com
7 ssss.magedu.com
8 wog.magedu.com
9 ulog.magedu.com
Extract the host name
awk -F "[ .]" '{print $2}' soho.txt    # choose the separators first, then pick the field
Count how often each filesystem type appears in fstab
awk '/^UUID/{fs[$3]++} END{for (i in fs) {print i,fs[i] } } ' fstab
Count how often each word appears in fstab
grep -wEo "[[:alpha:]]+" fstab | awk '{word[$1]++} END{for (i in word) {print i,word[i] } } '
Extract the digits
echo 'Yd$C@M05MB%9&Bdh7dq+YVixp3vpw' | awk 'gsub(/[^[:digit:]]/," ",$0)'
Generate random numbers
awk 'BEGIN{srand(); for (i=1;i<=200;i++) { if (i==200 ) {printf "%d", int(rand()*100) ;}else {printf "%d,", int(rand()*100) }} }'
Find the maximum and minimum of the random numbers above
awk -F "," ' { MAX=$1;MIN=$1; for (i=1;i<=NF;i++) {if ( $i>= MAX ) { MAX=$i } ; if ( $i <= MIN) { MIN=$i } } } END{ print "MAX=",MAX, "MIN=" ,MIN } ' soho.txt
http://mail.magedu.com/index.html
http://www.magedu.com/test.html
http://study.magedu.com/index.html
http://blog.magedu.com/index.html
http://www.magedu.com/images/logo.jpg
Extract the fully qualified domain names
Choose the separator, select the field, then count and print
awk -F"/" '{FQ[$3]++} END{ for(i in FQ ) print i,FQ[i] }' soho.txt | sort -rn -k 2
Example problem:
inode|beginnumber|endnumber|counts|
106|3363120000|3363129999|10000|
106|3368560000|3368579999|20000|
310|3337000000|3337000100|101|
310|3342950000|3342959999|10000|
310|3362120960|3362120961|2|
311|3313460102|3313469999|9898|
311|3313470000|3313499999|30000|
311|3362120962|3362120963|2|
Expected output format
310|3337000000|3362120961|10103|
311|3313460102|3362120963|39900|
106|3363120000|3368579999|30000|
awk -F'|' -v OFS='|' '/^[0-9]/{inode[$1]++; if(!bn[$1]){bn[$1]=$2} else if(bn[$1]>$2){bn[$1]=$2}; if(en[$1]<$3)en[$1]=$3; cnt[$1]+=$(NF-1)} END{for(i in inode)print i,bn[i],en[i],cnt[i]}' soho.txt
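The same grouping idea can be checked without soho.txt by feeding a few of the sample rows on standard input. This is a sketch of one possible variant, not the only way to write it: it uses the in test to detect the first row of each inode, prints an empty last field so that every line ends in |, and pipes through sort because the order of for (i in array) is not defined.

```shell
merged=$(printf '%s\n' \
  'inode|beginnumber|endnumber|counts|' \
  '106|3363120000|3363129999|10000|' \
  '106|3368560000|3368579999|20000|' \
  '310|3337000000|3337000100|101|' \
  '310|3342950000|3342959999|10000|' \
  '310|3362120960|3362120961|2|' |
awk -F'|' -v OFS='|' '
/^[0-9]/ {
    if (!($1 in bn) || $2 < bn[$1]) bn[$1] = $2   # smallest begin number
    if ($3 > en[$1]) en[$1] = $3                  # largest end number
    cnt[$1] += $4                                 # running total of counts
}
END { for (i in bn) print i, bn[i], en[i], cnt[i], "" }
' | sort)
echo "$merged"
```

For the five data rows used here, the 106 and 310 lines match the expected output listed above.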
Use awk to compute the total size of the files in a directory
find . -maxdepth 1 -type f -ls | awk '{sum+=$7} END {print sum} '
Find the 10 IPs with the most connections to this host
netstat -an | head | awk -F "[[:space:]]+|:" 'NR>2 {print $6}'
netstat -an | awk -F "[[:space:]]+|:" 'NR>2 {ip[$6]++} END{for (i in ip) print i, ip[i]}' | sort -nr -k 2 | head