awk in Detail, Part 2

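Opinion: the best way to learn to write scripts is to start writing them! Once you have written a fair amount, go back and read the rules of the language (the awk manual).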

1. Variables and system constants:

Beginner problem 1: compute the average compression multiple for each bit rate

The file bitRate contains:

resolution=1280,bitRate=1000000,YUV/H264multiple=202)
resolution=1280,bitRate=1000000,YUV/H264multiple=232)
resolution=1280,bitRate=1000000,YUV/H264multiple=173)
resolution=1280,bitRate=2000000,YUV/H264multiple=156)
resolution=1280,bitRate=2000000,YUV/H264multiple=146)
resolution=1280,bitRate=2000000,YUV/H264multiple=153)
resolution=1280,bitRate=3000000,YUV/H264multiple=153)
resolution=1280,bitRate=3000000,YUV/H264multiple=154)
resolution=1280,bitRate=4000000,YUV/H264multiple=72)
resolution=1280,bitRate=4000000,YUV/H264multiple=73)
resolution=1280,bitRate=4000000,YUV/H264multiple=75)
......

Write the shell script:

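#! /bin/bash
#
# Caution: this awk program may contain bugs!
# It assumes that records with the same bitRate are adjacent in the input;
# if they are interleaved, the same bitRate is reported more than once.
awk '
BEGIN { FS = "[(),=]" }            # split on "(", ")", "," and "="
#NF == 7 { print $4, $6 }
NR == 1 {                          # the first record opens the first group
    iBR = $4
    sum += $6                      # awk statements are separated by a newline or a semicolon
    ++times
    privBR = $4
    next
}
$4 == iBR {                        # same bitRate: keep accumulating
    sum += $6
    ++times
    privBR = $4
}
$4 != iBR {                        # bitRate changed: report the finished group
    iBR = $4
    print "bitRate", privBR, "rlt:", sum / times
    times = 1
    sum = $6
    privBR = $4
}
END { if (times) print "bitRate", privBR, "rlt:", sum / times }
' "$@"   # | awk '{print $2,$1,$3,$4}' | sort
# input lines look like: "resolution=1280,bitRate=2000000,YUV/H264multiple=144)"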
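
To see why the script reads $4 and $6, it can help to split a single record by hand. The following is only a minimal sketch, assuming a POSIX awk and a bracket expression as the field separator; show_fields is a hypothetical helper name, not part of the original script.

```shell
#!/bin/sh
# show_fields is a hypothetical helper, not part of the original script.
# It splits one record on "(", ")", "," or "=" and prints NF, $4 and $6.
show_fields() {
    printf '%s\n' "$1" | awk -F'[(),=]' '{ print NF, $4, $6 }'
}

show_fields 'resolution=1280,bitRate=1000000,YUV/H264multiple=202)'
# prints: 7 1000000 202
```

The trailing ")" is itself a separator, so it leaves an empty seventh field behind; that is why the commented-out rule in the script tests NF == 7.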

 

 

 

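2. Using awk statements and awk built-in functions

Beginner problem 2: I want to quickly test a regular expression I have written against whatever strings I can think of.

Run the script: $ ./CheckPattern1.sh  mypattern

          > Xxx

          Ok!

          > xxdd

          Wrong!

Write the shell script CheckPattern1.sh: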

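#!/bin/sh
#
currentTTY=$(tty)
echo "$currentTTY"

# Get the pattern from the command-line parameters.
# Note: $* is a shell special parameter (parent process, akin to a global),
# whereas $0, $1, $2, $3 ... inside awk are awk variables (child process,
# akin to locals). Passing a variable from the shell into awk is easy;
# going the other way seems more awkward.
testTP="$*"
echo "$testTP"

awk -v TP="$testTP" -v CTTY="$currentTTY" 'BEGIN {
    print "the pattern is:", TP;
    # getline returns 1 for each line read, 0 at end of input, -1 on error
    while ((getline str < CTTY) > 0) {
        print "input str is :", str;
        if (str ~ TP) { print "pattern:", TP, "str:", str, "rlt:", "match!" }
        else          { print "pattern:", TP, "str:", str, "rlt:", "mismatching!" }
    }
}'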
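
The comment in the script observes that getting a value from awk back into the shell is the awkward direction. One common approach, sketched here on the assumption that the value can simply be printed on awk's standard output, is command substitution. The variable names width, height and area are illustrative only.

```shell
#!/bin/sh
# Shell -> awk: pass values in with -v.
# awk -> shell: capture whatever awk prints with $(...).
# "width", "height" and "area" are illustrative names only.
width=6
height=7
area=$(awk -v w="$width" -v h="$height" 'BEGIN { print w * h }')
echo "area=$area"
# prints: area=42
```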

 

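The question is how to get the pattern given on the shell command line into the BEGIN block. The solution used here is awk's -v option, which assigns TP and CTTY before BEGIN runs. $testTP is wrapped in double quotes because, if the pattern contains a space, the shell would split the unquoted expansion into several words: only the first word would be assigned to TP, and awk would misinterpret the remaining words as separate arguments (the program text or input file names). But why pass the pattern on the shell command line at all? Used that way, the pattern cannot be changed on the fly; you have to quit the program, start it again and retype the pattern on the command line. An improved version follows: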
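
The effect of the double quotes can be checked without awk at all. This is a small sketch under the assumption of a POSIX shell with the default IFS; count_args is a hypothetical helper and the pattern is made up for the demonstration.

```shell
#!/bin/sh
# count_args is a hypothetical helper that reports how many words it received.
count_args() { echo "$#"; }

testTP='^he llo'          # a made-up pattern containing one space

count_args $testTP        # unquoted: the shell splits it, prints 2
count_args "$testTP"      # quoted: it stays one word, prints 1

# Quoted, the whole pattern reaches awk as the value of TP.
awk -v TP="$testTP" 'BEGIN { print "[" TP "]" }'
# prints: [^he llo]
```

One caveat worth knowing: awk processes escape sequences in a -v value, so a pattern containing backslashes may need them doubled.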

Beginner problem 3: enter any regular-expression pattern, then quickly test it against whatever strings I can think of.

Write the shell script CheckPattern2.sh:

#!/bin/sh
#check pattern program
echo "welcome to use the check pattern program v.0.0.1"
echo "caution:the pattern only awk's verion, grep or sed maybe mismatch!"
echo "usage $./checkpattern2.sh"
currentTTY=$(tty)
echo "$currentTTY"
awk -v CTTY="$currentTTY" 'BEGIN {
    #print "the pattern is:","\""TP"\"";
    while (1 > 0) {
        print "\n";
        print "current pattern is:", "\"" TP "\"";
        print "input \"K\" to check pattern; \"G\" to change pattern; \"E\" to exit >> ";
        if ((getline str < CTTY) <= 0) break;   # stop at end of input or on error
        if (str == "K") {
            print "please input string...";
            if ((getline str < CTTY) <= 0) break;
            print "string is :", "\"" str "\"";
            if (str ~ TP) { print "pattern:", "\"" TP "\"" "\n" "rlt:", "match!" }
            else          { print "pattern:", "\"" TP "\"" "\n" "rlt:", "mismatching!" }
            continue;
        }
        if (str == "G") {
            print "please input pattern...";
            if ((getline TP < CTTY) <= 0) break;
            print "pattern is :", "\"" TP "\"";
            continue;
        }
        if (str == "E") {
            break;
        } else {
            print "cmd input error!"
        }
    }
}'
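
A read loop like the one above only terminates at end of input if it tests the return value of getline: `getline var < file` returns 1 when it reads a line, 0 at end of input and -1 on error. The sketch below shows that idiom on an ordinary temporary file instead of the terminal; read_all and the temporary file are assumptions made for the demonstration.

```shell
#!/bin/sh
# read_all is a hypothetical helper: it reads every line of the file named
# by $1 with getline and stops as soon as getline no longer returns 1.
read_all() {
    awk -v SRC="$1" 'BEGIN {
        while ((getline line < SRC) > 0) {
            n++
            print n ": " line
        }
        close(SRC)
    }'
}

tmp=$(mktemp)
printf '%s\n' alpha beta > "$tmp"
read_all "$tmp"
rm -f "$tmp"
# prints:
# 1: alpha
# 2: beta
```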

Sample run:

current pattern is: "^hello{2,}"

input "K" to check pattern; "G" to change pattern; "E" to exit >>

K

please input string...

hello

string is : "hello"

pattern: "^hello{2,}"

rlt: mismatching!

current pattern is: "^hello{2,}"

input "K" to check pattern; "G" to change pattern; "E" to exit >>

K

please input string...

hellooo

string is : "hellooo"

pattern: "^hello{2,}"

rlt: match!
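
In the run above, `{2,}` applies only to the preceding `o`, so `^hello{2,}` requires at least two o's after `hell`. Not every awk honours interval expressions (some older implementations treat the braces literally), so the sketch below spells the same pattern as `^helloo+` and uses the built-in match() function to show what was matched. probe is a hypothetical helper name.

```shell
#!/bin/sh
# probe is a hypothetical helper: $1 = pattern, $2 = string to test.
# match() returns the start position (0 if there is no match) and sets
# the built-in variables RSTART and RLENGTH.
probe() {
    awk -v TP="$1" -v s="$2" 'BEGIN {
        if (match(s, TP))
            print "match at " RSTART ", length " RLENGTH
        else
            print "no match"
    }'
}

probe '^helloo+' 'hello'      # prints: no match
probe '^helloo+' 'hellooo'    # prints: match at 1, length 7
```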

Cool, isn't it?

Typing K every time is inconvenient, so let's change the program:
#!/bin/sh
#check pattern program
echo "welcome to use the check pattern program v.0.0.1"
echo "caution:the pattern only awk's verion, grep or sed maybe mismatch!"
echo "usage $./checkpattern.sh"
currentTTY=$(tty)
echo "$currentTTY"
awk -v CTTY="$currentTTY" 'BEGIN {
    #print "the pattern is:","\""TP"\"";
    while (1 > 0) {
        print "\n";
        print "current pattern is:", "\"" TP "\"";
        print "input string to check pattern; \"G\" to change pattern; \"E\" to exit >> ";
        if ((getline str < CTTY) <= 0) break;   # stop at end of input or on error
        if (str == "G") {
            print "please input pattern...";
            if ((getline TP < CTTY) <= 0) break;
            print "pattern is :", "\"" TP "\"";
            continue;
        }
        if (str == "E") {
            break;
        }
        # check whether the string matches the pattern
        print "string is :", "\"" str "\"";
        if (str ~ TP) { print "pattern:", "\"" TP "\"" "\n" "rlt:", "match!"; }
        else          { print "pattern:", "\"" TP "\"" "\n" "rlt:", "mismatching!"; }
    }
}'

Sample run:

sfjiang@sf-vm:~/Desktop/AwkTest$ ./checkPattern2.sh

welcome to use the check pattern program v.0.0.1

caution:the pattern only awk's verion, grep or sed maybe mismatch!

usage $./checkpattern.sh

/dev/pts/0

current pattern is: ""

input string to check pattern; "G" to change pattern; "E" to exit >>

me

string is : "me"

pattern: ""

rlt: match!

current pattern is: ""

input string to check pattern; "G" to change pattern; "E" to exit >>

G

please input pattern...

^me(.*)it$

pattern is : "^me(.*)it$"

current pattern is: "^me(.*)it$"

input string to check pattern; "G" to change pattern; "E" to exit >>

meiiit

string is : "meiiit"

pattern: "^me(.*)it$"

rlt: match!

current pattern is: "^me(.*)it$"

input string to check pattern; "G" to change pattern; "E" to exit >>

mit

string is : "mit"

pattern: "^me(.*)it$"

rlt: mismatching!

current pattern is: "^me(.*)it$"

input string to check pattern; "G" to change pattern; "E" to exit >>
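
The same check can also be run without a terminal, which is handy for repeating a test. This is a minimal sketch, assuming the candidate strings arrive one per line on standard input; check_all is a hypothetical helper, and the pattern and strings are the ones from the run above.

```shell
#!/bin/sh
# check_all is a hypothetical helper: $1 = pattern, stdin = one string per line.
# The pattern is held in a variable, so ~ treats it as a dynamic regex.
check_all() {
    awk -v TP="$1" '{ print $0 ": " (($0 ~ TP) ? "match!" : "mismatching!") }'
}

printf '%s\n' meiiit mit | check_all '^me(.*)it$'
# prints:
# meiiit: match!
# mit: mismatching!
```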

3. Using arrays

Beginner problem 4: compute the average compression ratio for each bit rate

The file bitRate contains:

resolution=1280,bitRate=2000000,YUV/H264multiple=156)
resolution=1280,bitRate=1000000,YUV/H264multiple=202)
resolution=1280,bitRate=1000000,YUV/H264multiple=232)
resolution=1280,bitRate=3000000,YUV/H264multiple=154)
resolution=1280,bitRate=1000000,YUV/H264multiple=173)
resolution=1280,bitRate=2000000,YUV/H264multiple=146)
resolution=1280,bitRate=2000000,YUV/H264multiple=153)
resolution=1280,bitRate=3000000,YUV/H264multiple=153)
resolution=1280,bitRate=4000000,YUV/H264multiple=72)
resolution=1280,bitRate=4000000,YUV/H264multiple=73)
resolution=1280,bitRate=5000000,YUV/H264multiple=55)
......

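Beginner problem 1 tried to solve this, but the awk program written there is not robust enough (it only works when records with the same bit rate are adjacent), and its logic is rather tangled. With awk arrays, the problem can be solved clearly and naturally!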

#! /bin/bash
###
awk '
BEGIN { FS = "[(),=]" }     # split on "(", ")", "," and "="
{
    ++times[$4];
    sum[$4] += $6;
}
END {
    for (i in sum)          # special for loop: iterates over the subscripts of sum
    {
        print "BitRate:", i, ", average compress:", sum[i] / times[i];
    }
}
' "$@"

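The awk code here is much clearer than the code for beginner problem 1, mainly thanks to awk's arrays and to the awk for statement that iterates over array subscripts. The preface of C++ Primer mentions that, because of a language's limitations, one often needs many extra techniques to work around its shortcomings, and that does seem to be true. Without the for statement "extracting" the subscript values, we would probably need two arrays to associate the very large "subscripts" with the array values, and iterating would be harder still.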
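
One practical point about the for (key in array) loop: the order in which keys are visited is not specified, so the report lines can come out in any order. A minimal sketch that makes the output stable by piping it through sort -n; avg_by_bitrate is a hypothetical helper, and the NF >= 6 guard (which skips blank or elided lines) is an addition for the demonstration.

```shell
#!/bin/sh
# avg_by_bitrate is a hypothetical helper: it reads records on stdin and
# prints "bitRate average" pairs. for (k in sum) visits the keys in no
# guaranteed order, so the result is piped through sort -n.
avg_by_bitrate() {
    awk -F'[(),=]' '
        NF >= 6 { sum[$4] += $6; times[$4]++ }
        END     { for (k in sum) print k, sum[k] / times[k] }
    ' | sort -n
}

avg_by_bitrate <<'EOF'
resolution=1280,bitRate=2000000,YUV/H264multiple=156)
resolution=1280,bitRate=1000000,YUV/H264multiple=202)
resolution=1280,bitRate=1000000,YUV/H264multiple=232)
resolution=1280,bitRate=3000000,YUV/H264multiple=154)
resolution=1280,bitRate=2000000,YUV/H264multiple=146)
EOF
# prints:
# 1000000 217
# 2000000 151
# 3000000 154
```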

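Having written these few awk programs, we can now form a rough impression of its programming style. awk feels freer than C. For example, awk statements can be separated either by a newline or by a semicolon, and the last statement (usually followed by a closing brace }) needs no separator at all. In C, by contrast, statements must be terminated by semicolons and by nothing else.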
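
The two separator styles can be compared side by side. A small sketch; with_newlines and with_semicolons are hypothetical names for the same three-statement program written both ways.

```shell
#!/bin/sh
# with_newlines / with_semicolons are hypothetical names for one program
# written twice: statements separated by newlines, then by semicolons.
with_newlines() {
    awk 'BEGIN {
        a = 1
        b = 2
        print a + b
    }'
}
with_semicolons() {
    awk 'BEGIN { a = 1; b = 2; print a + b }'
}

with_newlines      # prints: 3
with_semicolons    # prints: 3
```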

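awk's arrays are far more powerful than C's. C takes numeric types, character types, structs and so on as its basic objects, whereas awk hides the details of individual characters and operates directly on strings. Arrays are one of awk's signature features: a subscript (key) can be either a number or a string. Arrays, like variables, are created automatically on first use, and their scope is global (from first use until awk exits), apart from function parameters, which are local.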
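
These properties are easy to try out in a BEGIN block. A minimal sketch; array_demo is a hypothetical name for a throwaway demonstration, and the keys are made up.

```shell
#!/bin/sh
# array_demo is a hypothetical name for this throwaway demonstration.
array_demo() {
    awk 'BEGIN {
        n["apple"]++; n["apple"]++      # string subscript, created on first use
        n[42] = "x"                     # numeric subscript
        print n["apple"], n[42]

        state = ("pear" in n) ? "present" : "absent"
        print "pear is " state

        unused = n["pear"]              # merely reading an element creates it
        state = ("pear" in n) ? "present" : "absent"
        print "pear is " state
    }'
}

array_demo
# prints:
# 2 x
# pear is absent
# pear is present
```

Because a plain read creates the element, the `in` operator is the safe way to test for a key without changing the array.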

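awk's freedom and power call for some conventions, such as ending statements with a semicolon as in C. When operating on variables and arrays, enclose the group of operations in braces to express them as one action. For example, if the braces around the ++times[$4]; sum[$4]+=$6; block in the code above were removed, awk would print the input lines, because the ++ and += operations would then sit outside any action and be treated as patterns whose default action is to print the line.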
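
The difference is easy to observe on two lines of input. A small sketch; with_braces and without_braces are hypothetical names for the two variants.

```shell
#!/bin/sh
# with_braces / without_braces are hypothetical names for the two variants.
# Inside braces, ++n is an action; outside, it is a pattern, and a true
# pattern with no action prints the current line.
with_braces() {
    awk '{ ++n } END { print "lines:", n }'
}
without_braces() {
    awk '
        ++n
        END { print "lines:", n }'
}

printf '%s\n' a b | with_braces
# prints:
# lines: 2
printf '%s\n' a b | without_braces
# prints:
# a
# b
# lines: 2
```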


Reposted from blog.csdn.net/sf_jiang/article/details/78875750