awk in Detail: One of the Linux Three Musketeers (grep, sed, awk)

Part One: Introduction and awk expression examples

  • The name of an odd-looking language (named after its authors Aho, Weinberger, and Kernighan)

  • A tool for pattern scanning and processing, report generation, and data processing.

On Linux, awk is not only a command but also a programming language. It can process data and generate reports (much like Excel). The data may come from one or more files, directly from standard input, or through a pipe. You can run awk one-liners directly on the command line, or write awk programs for more complex applications.

sed, the stream editor, processes text as a stream, like flowing water.

One, the awk environment

The awk discussed in this article is gawk, the GNU version of awk.

[root@creditease awk]# cat /etc/redhat-release
CentOS Linux release 7.5.1804 (Core)
[root@creditease awk]# uname -r
3.10.0-862.el7.x86_64
[root@creditease awk]# ll `which awk`
lrwxrwxrwx. 1 root root 4 Nov  7 14:47 /usr/bin/awk -> gawk 
[root@creditease awk]# awk --version
GNU Awk 4.0.2

Two, awk format

An awk instruction consists of a pattern, an action, or a combination of the two.


  • Pattern: similar to pattern matching in sed. A pattern can be an expression, or a regular expression placed between two forward slashes. For example, NR==1 is a pattern; it can be understood as a condition.

  • Action: one or more statements enclosed in braces, separated by semicolons. Patterns and actions together form awk's basic format.

Three, records and fields

Name	Meaning
record	a record, i.e. a line
field	a field, i.e. a column (also called a domain or region)

1) NF (number of fields) is the number of fields (columns) in the current record; $NF is the last field.

2) The $ symbol selects a field (column): $1, $2, $NF.

3) NR (number of records) is the line number. awk keeps the current record number in the built-in variable NR, which is automatically incremented by 1 after each record is processed.

4) FS (set with -F) is the field separator, i.e. the column delimiter that splits a line into multiple columns.
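
The four variables above can be seen together in one command. A minimal sketch, recreating the sample file awk.txt used throughout this article:

```shell
# Recreate the article's sample file (three '#'-separated records).
printf 'ABC#DEF#GHI#GKL$123\nBAC#DEF#GHI#GKL$213\nCBA#DEF#GHI#GKL$321\n' > awk.txt

# For each record print the line number (NR), the field count (NF),
# and the last field ($NF), splitting on '#'.
awk -F '#' '{print NR, NF, $NF}' awk.txt
# Prints: "1 4 GKL$123", "2 4 GKL$213", "3 4 GKL$321"
```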

3.1 specified delimiter

[root@creditease awk]# awk -F "#" '{print $NF}' awk.txt 
GKL$123
GKL$213
GKL$321
[root@creditease awk]# awk -F '[#$]' '{print $NF}' awk.txt 
123
213
321

3.2 Basic use: condition plus action

[root@creditease awk]# cat awk.txt 
ABC#DEF#GHI#GKL$123
BAC#DEF#GHI#GKL$213
CBA#DEF#GHI#GKL$321
[root@creditease awk]# awk -F "#" 'NR==1{print $1}' awk.txt
ABC

3.3 Condition only

 [root@creditease awk]# awk -F "#" 'NR==1' awk.txt
ABC#DEF#GHI#GKL$123

When no action is given, the default action is {print $0}.

3.4 Action only

[root@creditease awk]# awk -F "#" '{print $1}' awk.txt
ABC
BAC
CBA

When no pattern is given, every line is processed.

3.5 Multiple patterns and actions

[root@creditease awk]# awk -F "#" 'NR==1{print $NF}NR==3{print $NF}' awk.txt 
GKL$123
GKL$321

3.6 Understanding $0

In awk, $0 is the entire line.

[root@creditease awk]# awk '{print $0}' awk_space.txt
ABC DEF GHI GKL$123
BAC DEF GHI GKL$213
CBA DEF GHI GKL$321

3.7 FNR

FNR is similar to NR, but when processing multiple files it does not keep incrementing: it restarts from 1 for each file (multi-file processing is covered later).

[root@creditease awk]# awk '{print NR}' awk.txt awk_space.txt 
1
2
3
4
5
6
[root@creditease awk]# awk '{print FNR}' awk.txt awk_space.txt 
1
2
3
1
2
3

Four, regular expressions

Like sed, awk can match the input text against patterns.
awk also supports a large number of regular-expression metacharacters, most of them the same as sed's; regular expressions are an essential tool for mastering the three musketeers.

Metacharacters supported by awk regular expressions

Most metacharacters are supported by default; the following are not, and need an extra command-line option:

Metacharacter	Meaning	Example	Explanation
x{m}	x repeated m times	/cool{5}/	Note that without parentheses or brackets the repetition applies only to the preceding character, so /cool{5}/ matches "coo" plus five "l"s, i.e. "coolllll", while /(cool){2,}/ matches "coolcool", "coolcoolcool", and so on.
x{m,}	x repeated at least m times	/(cool){2,}/	Ditto
x{m,n}	x repeated at least m times but no more than n times; requires --posix or --re-interval (this form cannot be used without one of those options)	/(cool){5,6}/	Ditto

By default, a regular expression is matched against the whole line, and if it matches, the action is executed. Sometimes, however, you only want a specific field to match the regular expression.

For example:

To take the lines of /etc/passwd whose fifth field ($5) contains the string mail, you need two extra match operators. awk has exactly two operators for matching regular expressions:

Match operator	Meaning
~	the record or field matches the expression
!~	the record or field does not match the expression

4.1 Regular examples

1) Display the GHI column of awk.txt

[root@creditease awk]# cat awk.txt 
ABC#DEF#GHI#GKL$123
BAC#DEF#GHI#GKL$213
CBA#DEF#GHI#GKL$321
[root@creditease awk]# awk -F "#" '{print $3}' awk.txt 
GHI
GHI
GHI
[root@creditease awk]# awk -F "#" '{print $(NF-1)}' awk.txt 
GHI
GHI
GHI

2) Display the lines containing 321

[root@creditease awk]# awk '/321/{print $0}' awk.txt 
CBA#DEF#GHI#GKL$321

3) With # as the delimiter, display the lines whose first column starts with B, or whose last column ends with 1

[root@creditease awk]# awk -F "#" '$1~/^B/{print $0}$NF~/1$/{print $0}' awk.txt 
BAC#DEF#GHI#GKL$213
CBA#DEF#GHI#GKL$321

4) With # as the delimiter, display the lines whose first column starts with B or C

[root@creditease awk]# awk -F "#" '$1~/^B|^C/{print $0}' awk.txt 
BAC#DEF#GHI#GKL$213
CBA#DEF#GHI#GKL$321
[root@creditease awk]# awk -F "#" '$1~/^[BC]/{print $0}' awk.txt 
BAC#DEF#GHI#GKL$213
CBA#DEF#GHI#GKL$321
[root@creditease awk]# awk -F "#" '$1~/^(B|C)/{print $0}' awk.txt 
BAC#DEF#GHI#GKL$213
CBA#DEF#GHI#GKL$321
[root@creditease awk]# awk -F "#" '$1!~/^A/{print $0}' awk.txt 
BAC#DEF#GHI#GKL$213
CBA#DEF#GHI#GKL$321

Five, comparison expressions

awk is a programming language and can perform fairly complex tests. When a condition is true, awk executes the associated action. Comparison expressions are mainly used to test a particular field, for example printing only scores of 80 or higher, which requires comparing that field.

The table below lists the relational operators awk supports; they can compare numbers, strings, and regular expressions. When an expression is true its value is 1, otherwise 0; only when the expression is true does awk execute the associated action.

Relational operators supported by awk

Operator	Meaning	Example
<	less than	x<y
<=	less than or equal to	x<=y
==	equal to	x==y
!=	not equal to	x!=y
>=	greater than or equal to	x>=y
>	greater than	x>y

5.1 Comparison expression examples

Display lines 2 and 3 of awk.txt (no /regex/ pattern is needed):

[root@creditease awk]# awk 'NR==2{print $0}NR==3{print $0}' awk.txt 
BAC#DEF#GHI#GKL$213
CBA#DEF#GHI#GKL$321
[root@creditease awk]# awk 'NR>=1{print $0}' awk.txt 
ABC#DEF#GHI#GKL$123
BAC#DEF#GHI#GKL$213
CBA#DEF#GHI#GKL$321
[root@creditease awk]# awk '/BAC/,/CBA/{print $0}' awk.txt 
BAC#DEF#GHI#GKL$213
CBA#DEF#GHI#GKL$321

Part Two: awk variables and execution

The full structure of an awk program is as follows:

awk 'BEGIN{ commands } pattern { commands } END{ commands }' file

One, the BEGIN module

The BEGIN module is executed before awk reads any input. It is often used to set the values of built-in variables such as ORS, RS, FS, and OFS, and it can run without any input file at all.

Two, awk built-in variables (predefined variables)

Variable	Meaning
$0	the current record, i.e. the entire line
$1,$2,$3...$n	the nth field of the current record; fields are separated by FS
FS	input field separator; the default is whitespace (field separator)
NF	the number of fields in the current record, i.e. the number of columns (number of fields)
NR	the number of records read so far, i.e. the line number, starting at 1 (number of records)
RS	input record separator; the default is a newline (record separator)
OFS	output field separator; the default is a space (output field separator)
FNR	the record number within the current file; reset for each file
FILENAME	the name of the file currently being processed

Special note: FS and RS support regular expressions.

2.1 First use: setting built-in variables

[root@creditease awk]# awk 'BEGIN{RS="#"}{print $0}' awk.txt 
ABC
DEF
GHI
GKL$123
BAC
DEF
GHI
GKL$213
CBA
DEF
GHI
GKL$321
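
Similarly, BEGIN can set FS and OFS together before any input is read. A minimal sketch using the article's awk.txt:

```shell
printf 'ABC#DEF#GHI#GKL$123\nBAC#DEF#GHI#GKL$213\nCBA#DEF#GHI#GKL$321\n' > awk.txt

# Split input fields on '#', join output fields with '-'.
awk 'BEGIN{FS="#";OFS="-"}{print $1,$2}' awk.txt
# Prints: ABC-DEF, BAC-DEF, CBA-DEF
```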

2.2 Second use: printing a marker

[root@creditease awk]# awk 'BEGIN{print "=======start======"}{print $0}' awk.txt 
=======start======
ABC#DEF#GHI#GKL$123
BAC#DEF#GHI#GKL$213
CBA#DEF#GHI#GKL$321

2.3 Arithmetic with awk

[root@creditease files]# awk 'BEGIN{a=8;b=90;print a+b,a-c,a/b,a%b}'
98 8 0.0888889 8

(c is uninitialized and treated as 0, so a-c evaluates to 8.)

Three, the END module

The END module runs after awk has finished reading all input; it is generally used to output final results (for example, accumulated totals or arrays). Like BEGIN, it can also print an end-of-output marker.

3.1 First use: printing a marker

[root@creditease awk]# awk 'BEGIN{print "=======start======"}{print $0}END{print "=======end======"}' awk.txt
=======start======
ABC#DEF#GHI#GKL$123
BAC#DEF#GHI#GKL$213
CBA#DEF#GHI#GKL$321
=======end======

3.2 Second use: accumulation

1) Count the blank lines in /etc/services

Using grep, sed, and awk:

[root@creditease awk]# grep "^$" /etc/services  |wc -l
17
[root@creditease awk]# sed -n '/^$/p' /etc/services |wc -l
17
[root@creditease awk]# awk '/^$/' /etc/services |wc -l
17
[root@creditease awk]# awk '/^$/{i=i+1}END{print i}' /etc/services
17

2) Arithmetic

How do you express 1 + 2 + 3 + ... + 100 = 5050 in awk?

[root@creditease awk]# seq 100|awk '{i=i+$0}END{print i}'
5050

Four, awk summary

1. Use only one BEGIN module and one END module: BEGIN{}BEGIN{} or END{}END{} is incorrect.

2. The middle pattern{action} modules can appear any number of times.

Five, awk execution flow

The awk execution process:

1. Perform command-line assignments (-F or -v)

2. Execute the contents of the BEGIN block

3. Start reading the input file

4. Test whether each condition (pattern) holds

  • If it holds, execute the corresponding action
  • Read the next line and test again
  • Continue until the end of the last file

5. Finally, execute the contents of the END block
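
The whole flow is visible in a single command; a minimal sketch:

```shell
# BEGIN runs before any input, the pattern/action runs once per line,
# and END runs after the last line has been read.
seq 3 | awk 'BEGIN{print "before"} NR>1{print "line", NR} END{print "after"}'
# Prints: before, line 2, line 3, after
```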

Part Three: awk arrays and syntax

One, awk arrays

1.1 Array structure

An awk array maps a subscript to a value, for example:

people[police]=110

people[doctor]=120

[root@creditease awk]# awk 'BEGIN{word[0]="credit";word[1]="easy";print word[0],word[1]}'
credit easy
[root@creditease awk]# awk 'BEGIN{word[0]="credit";word[1]="easy";for(i in word)print word[i]}'
credit
easy

1.2 Array types

Indexed arrays: numeric subscripts
Associative arrays: string subscripts

1.3 Associative arrays in awk

Given the following text, where the left column is a random letter and the right column is a random number, add up the numbers that follow the same letter and output the totals in letter order:

a  1
b  3
c  2
d  7
b  5
a  3 
g  2
f  6

Using $1 as the subscript, build the array with a[$1]=a[$1]+$2 (or a[$1]+=$2), then use END with a for loop to output the result:

[root@creditease awk]# awk '{a[$1]=a[$1]+$2}END{for(i in a)print i,a[i]}' jia.txt 
a 4
b 8
c 2
d 7
f 6
g 2
Note: for(i in a) does not iterate in input order; to sort the output, pipe it through sort.
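
Piping the output through sort puts the letters back in order. A sketch, recreating jia.txt from this section:

```shell
printf 'a 1\nb 3\nc 2\nd 7\nb 5\na 3\ng 2\nf 6\n' > jia.txt

# Sum column 2 per key in column 1, then sort the totals by key.
awk '{a[$1]+=$2}END{for(i in a)print i,a[i]}' jia.txt | sort
# Prints: a 4, b 8, c 2, d 7, f 6, g 2
```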

1.4 Indexed arrays in awk

Arrays with numeric subscripts.

Generate the numbers 1-10 with seq and print only the odd-numbered lines:

[root@creditease awk]# seq 10|awk '{a[NR]=$0}END{for(i=1;i<=NR;i+=2){print a[i]}}'
1
3
5
7
9

Generate the numbers 1-10 with seq and omit the last 3 lines:

[root@creditease awk]# seq 10|awk '{a[NR]=$0}END{for(i=1;i<=NR-3;i++){print a[i]}}'
1
2
3
4
5
6
7
Explanation: simply change the range of i; this technique is often used to hide the last few lines of a file.

1.5 Deduplication with awk arrays

a++ versus ++a

[root@creditease awk]# awk 'BEGIN{print a++}'
0
[root@creditease awk]# awk 'BEGIN{print ++a}'
1
[root@creditease awk]# awk 'BEGIN{a=1;b=a++;print a,b}'
2 1
[root@creditease awk]# awk 'BEGIN{a=1;b=++a;print a,b}'
2 2

Note:

In both cases a ends up incremented by 1.

b=a++ first assigns the value of a to b, then adds 1 to a.

b=++a first adds 1 to a, then assigns the value of a to b.

Deduplicate the following text on the second column:

[root@creditease awk]# cat qc.txt 
2018/10/20   xiaoli     13373305025
2018/10/25   xiaowang   17712215986
2018/11/01   xiaoliu    18615517895 
2018/11/12   xiaoli     13373305025
2018/11/19   xiaozhao   15512013263
2018/11/26   xiaoliu    18615517895
2018/12/01   xiaoma     16965564525
2018/12/09   xiaowang   17712215986
2018/11/24   xiaozhao   15512013263
Solution 1:
[root@creditease awk]# awk '!a[$2]++' qc.txt 
2018/10/20   xiaoli     13373305025
2018/10/25   xiaowang   17712215986
2018/11/01   xiaoliu    18615517895 
2018/11/19   xiaozhao   15512013263
2018/12/01   xiaoma     16965564525
Explanation:
!a[$2]++ is the pattern (condition); the command can also be written as awk '!a[$2]++{print $0}' qc.txt.
In a[$2]++ the "++" comes after, so the current value is used first and then incremented:
!a[$2]++ takes the current value of a[$2], tests whether !a[$2] holds (true when the value is 0), and then adds 1.
Note: this method keeps the first occurrence of each value, in file order.
Solution 2:
[root@creditease awk]# awk '++a[$2]==1' qc.txt 
2018/10/20   xiaoli     13373305025
2018/10/25   xiaowang   17712215986
2018/11/01   xiaoliu    18615517895 
2018/11/19   xiaozhao   15512013263
2018/12/01   xiaoma     16965564525
Explanation:
++a[$2]==1 is the pattern (condition): a line is printed only when the incremented value of a[$2] equals 1.
In ++a[$2] the "++" comes first, so the value is incremented before it is used:
++a[$2]==1 first adds 1, then takes the value of a[$2] and tests whether it equals 1.
Note: this method also keeps the first occurrence of each value, in file order.
Solution 3:
[root@creditease awk]# awk '{a[$2]=$0}END{for(i in a){print a[i]}}' qc.txt
2018/11/12   xiaoli     13373305025
2018/11/26   xiaoliu    18615517895
2018/12/01   xiaoma     16965564525
2018/12/09   xiaowang   17712215986
2018/11/24   xiaozhao   15512013263

Explanation:
Note: this method keeps the last occurrence of each value, counting from the end of the file.

1.6 Processing multiple files with awk (arrays, NR, FNR)

Use awk to take the first column of file1.txt and the second column of file2.txt, then redirect the result to a new file new.txt:

[root@creditease awk]# cat file1.txt 
a b
c d
e f
g h
i j
[root@creditease awk]# cat file2.txt 
1 2
3 4
5 6
7 8
9 10
[root@creditease awk]# awk 'NR==FNR{a[FNR]=$1}NR!=FNR{print a[FNR],$2}' file1.txt file2.txt 
a 2
c 4
e 6
g 8
i 10
Explanation: NR==FNR is true while the first file is being processed; NR!=FNR is true for the second file.
Note: when the two files have different line counts, put the file with more lines first.
To write the output into a new file new.txt:
[root@creditease awk]# awk 'NR==FNR{a[FNR]=$1}NR!=FNR{print a[FNR],$2>"new.txt"}' file1.txt file2.txt 
[root@creditease awk]# cat new.txt 
a 2
c 4
e 6
g 8
i 10
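
When the two files do not line up row by row, a common variant (not from the original article) keys the array on a join field instead of FNR; map.txt here is a hypothetical lookup file:

```shell
printf 'a b\nc d\ne f\ng h\ni j\n' > file1.txt
printf 'a 2\nc 4\ne 6\n' > map.txt    # hypothetical lookup file keyed by letter

# First pass (NR==FNR): remember column 2 for each key in map.txt.
# Second pass: print each key of file1.txt with its mapped value, or "-".
awk 'NR==FNR{m[$1]=$2;next}{print $1, (($1 in m) ? m[$1] : "-")}' map.txt file1.txt
# Prints: a 2, c 4, e 6, g -, i -
```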

1.7 Analyzing a log file with awk: counting visits per site

[root@creditease awk]# cat url.txt 
http://www.baidu.com
http://mp4.video.cn
http://www.qq.com
http://www.listeneasy.com
http://mp3.music.com
http://www.qq.com
http://www.qq.com
http://www.listeneasy.com
http://www.listeneasy.com
http://mp4.video.cn
http://mp3.music.com
http://www.baidu.com
http://www.baidu.com
http://www.baidu.com
http://www.baidu.com
[root@creditease awk]# awk -F "[/]+" '{h[$2]++}END{for(i in h) print i,h[i]}' url.txt
www.qq.com 3
www.baidu.com 5
mp4.video.cn 2
mp3.music.com 2
www.listeneasy.com 3
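
A small variant (not in the original) also sorts the sites by visit count, busiest first:

```shell
# Count visits per host ($2 after splitting on runs of '/'),
# print "count host", and sort numerically in reverse.
awk -F '[/]+' '{h[$2]++}END{for(i in h)print h[i],i}' url.txt | sort -rn
# The first line is the busiest site: 5 www.baidu.com
```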

Two, basic awk syntax

2.1 The sub and gsub functions

These perform substitution.

Format: sub(r, s, target)   gsub(r, s, target)

[root@creditease awk]# cat sub.txt 
ABC DEF AHI GKL$123
BAC DEF AHI GKL$213
CBA DEF GHI GKL$321
[root@creditease awk]# awk '{sub(/A/,"a");print $0}' sub.txt 
aBC DEF AHI GKL$123
BaC DEF AHI GKL$213
CBa DEF GHI GKL$321
[root@creditease awk]# awk '{gsub(/A/,"a");print $0}' sub.txt 
aBC DEF aHI GKL$123
BaC DEF aHI GKL$213
CBa DEF GHI GKL$321
Note: sub replaces only the first match in each line, like sed 's###';
    gsub replaces every match in the line, like sed 's###g'.
[root@creditease awk]# awk '{sub(/A/,"a",$1);print $0}' sub.txt 
aBC DEF AHI GKL$123
BaC DEF AHI GKL$213
CBa DEF GHI GKL$321
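
gsub also returns the number of replacements it made, which is handy for counting matches per line. A sketch, recreating sub.txt from above:

```shell
printf 'ABC DEF AHI GKL$123\nBAC DEF AHI GKL$213\nCBA DEF GHI GKL$321\n' > sub.txt

# n receives how many "A"s gsub replaced on the current line.
awk '{n=gsub(/A/,"a"); print n, $0}' sub.txt
# Prints: "2 aBC DEF aHI GKL$123", "2 BaC DEF aHI GKL$213", "1 CBa DEF GHI GKL$321"
```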

Exercise:

0001|20081223efskjfdj|EREADFASDLKJCV
0002|20081208djfksdaa|JDKFJALSDJFsddf
0003|20081208efskjfdj|EREADFASDLKJCV
0004|20081211djfksdaa1234|JDKFJALSDJFsddf
With '|' as the delimiter, remove the digits that precede the letters in the second field, leaving everything else unchanged. Expected output:
0001|efskjfdj|EREADFASDLKJCV
0002|djfksdaa|JDKFJALSDJFsddf
0003|efskjfdj|EREADFASDLKJCV
0004|djfksdaa1234|JDKFJALSDJFsddf

Method:
awk -F '|'  'BEGIN{OFS="|"}{sub(/[0-9]+/,"",$2);print $0}' sub_hm.txt
awk -F '|'  -v OFS="|" '{sub(/[0-9]+/,"",$2);print $0}' sub_hm.txt

2.2 Using if and else

Input:

AA

BC

AA

CB

CC

AA

Expected result:

AA YES

BC NO YES

AA YES

CB NO YES

CC NO YES

AA YES

1) [root@creditease awk]# awk '{if($0~/AA/){print $0" YES"}else{print $0" NO YES"}}' ifelse.txt 
AA YES
BC NO YES
AA YES
CB NO YES
CC NO YES
AA YES
Explanation: with if and else: if $0 matches AA, print $0 " YES"; otherwise (else) print $0 " NO YES".
2)[root@creditease awk]# awk '$0~/AA/{print $0" YES"}$0!~/AA/{print $0" NO YES"}' ifelse.txt 
AA YES
BC NO YES
AA YES
CB NO YES
CC NO YES
AA YES
Explanation: with regex matching: when $0 matches AA, print "YES"; otherwise print "NO YES".

2.3 Using next

The same exercise, implemented with next.

next: skip all the code that follows it (for the current record).

 [root@creditease awk]# awk '$0~/AA/{print $0" YES";next}{print $0" NO YES"}' ifelse.txt 
AA YES
BC NO YES
AA YES
CB NO YES
CC NO YES
AA YES
Explanation:
{print $0" NO YES"} runs for every line by default; when $0~/AA/ matches, {print $0" YES";next} runs first,
and because that action contains next, the following action is skipped.
So if $0~/AA/ matches, "YES" is printed and next skips the rest; if it does not match, the action after next runs.

2.4 printf (output without a newline) and next

printf: prints without appending a newline.

In the following text, whenever "Description:" has nothing after it on the same line, merge the next line into that line.

Packages: Hello-1
Owner: me me me me
Other: who care?
Description:
Hello world!
Other2: don't care
Desired result:
Packages: Hello-1
Owner: me me me me
Other: who care?
Description: Hello world!
Other2: don't care
1)[root@creditease awk]# awk '/^Desc.*:$/{printf $0}!/Desc.*:$/{print $0}' printf.txt 
Packages: Hello-1
Owner: me me me me
Other: who care?
Description:Hello world!
Other2: don't care
Explanation: when /^Desc.*:$/ matches, print the line with printf (no newline); lines that do not match are printed whole.
2) With if and else
[root@creditease awk]# awk '{if(/Des.*:$/){printf $0}else{print $0}}' printf.txt 
Packages: Hello-1
Owner: me me me me
Other: who care?
Description:Hello world!
Other2: don't care
3) With next
[root@creditease awk]# awk '/Desc.*:$/{printf $0;next}{print $0}' printf.txt 
Packages: Hello-1
Owner: me me me me
Other: who care?
Description:Hello world!
Other2: don't care
Note: this can be shortened to awk '/Desc.*:$/{printf $0;next}1' printf.txt  ## 1 is the pattern, and the default action is {print $0}

2.5 Deduplicate, count, and redirect to the right file

For the following text, count how many times each line occurs; write entries whose count is greater than 2 into gt2.txt and entries whose count is 2 or less into le2.txt:

[root@creditease files]# cat qcjs.txt 
aaa
bbb
ccc
aaa
ddd
bbb
rrr
ttt
ccc
eee
ddd
rrr
bbb
rrr
bbb
[root@creditease awk]# awk '{a[$1]++}END{for(i in a){if(a[i]>2){print i,a[i]>"gt2.txt"}else{print i,a[i]>"le2.txt"}}}' qcjs.txt 
[root@creditease awk]# cat gt2.txt 
rrr 3
bbb 4
[root@creditease awk]# cat le2.txt 
aaa 2
ccc 2
eee 1
ttt 1
ddd 2
Explanation: inside the braces, print can redirect directly to a new file; the file name is enclosed in double quotes, e.g. {print $1 >"xin.txt"}.

Three, awk precautions

a) NR==FNR  ## cannot be written as NR=FNR (= means assignment in awk)

b) NR!=FNR  ## NR is not equal to FNR

c) {a=1;a[NR]} raises an error: a scalar variable and an array cannot share the same name

d) printf does not append a newline when printing

e) Inside {print ...} the output can be redirected directly to a new file, with the file name in double quotes, e.g. {print $1 >"xin.txt"}

f) When the pattern (condition) evaluates to 0, the action that follows is not executed; when it is non-zero, the action is executed.
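
Rule f) is easy to verify; a quick sketch:

```shell
seq 3 | awk '1'      # the pattern 1 is always true: every line is printed
seq 3 | awk '0'      # the pattern 0 is never true: nothing is printed
seq 4 | awk 'NR%2'   # NR%2 is non-zero on odd lines: prints 1 and 3
```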

Author: Qin Wei

Source: CreditEase Institute of Technology

Reproduced from: https://blog.51cto.com/14159827/2411009
