Awk command notes

awk 'BEGIN { commands } PATTERN { commands } END { commands }'
     BEGIN block        body block           END block
  1. BEGIN and END are case-sensitive and must be written in uppercase.
  2. Spaces around the braces are optional.
  3. Built-in variable names are uppercase (NR, NF, FS, ...).

Execution process:
1) Execute the BEGIN block first.
   This is equivalent to loop initialization.
2) For each input record, execute the body block.
   This is equivalent to executing the loop body.
   Records are separated by newlines by default, so multi-line text is processed one line at a time.
3) Default processing for each record (each line of text):
   split the line into fields on the delimiter (by default, runs of spaces and tabs) and assign them to $1, $2, ... ($0 is the entire line);
   then process or print each field as needed.
4) After the loop ends, execute the END block.
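
The execution order above can be traced with a small sketch. The two-line input and the variable name trace are made up for illustration; they do not come from the log file used elsewhere in these notes.

```shell
# Feed two made-up records to awk and record which block prints what.
trace=$(printf 'a b\nc d\n' | awk '
BEGIN { print "BEGIN: no input read yet" }
      { print "record " NR ": first field is " $1 }
END   { print "END: " NR " records read" }')
printf '%s\n' "$trace"
```

The BEGIN line should appear once before any record, the body line once per record, and the END line once after the last record.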

Pattern types:
1) Regular expression: /some string/
2) Relational expression: $2>10, NR%2==0
3) Pattern-matching expression: ~ (matches) and !~ (does not match)
4) Range:
   'NR==1, NR==10' selects lines 1 through 10
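
A minimal sketch of the relational, !~, and range patterns, run on a made-up four-line input (the names data, big, not_ab, and range are only for this example):

```shell
# Made-up sample: a name and a number per line.
data='alpha 5
beta 20
gamma 15
delta 8'

# Relational expression: names whose second field is greater than 10.
big=$(printf '%s\n' "$data" | awk '$2 > 10 { print $1 }')

# Negated match: names that do not start with a or b.
not_ab=$(printf '%s\n' "$data" | awk '$1 !~ /^[ab]/ { print $1 }')

# Range: records 2 through 3, printed whole by the default action.
range=$(printf '%s\n' "$data" | awk 'NR==2, NR==3')

printf '%s\n---\n%s\n---\n%s\n' "$big" "$not_ab" "$range"
```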

$ cat /tmp/test.log 
2023-05-18 05:08:56.965846   460  4936 I SOMEIP  : 2023-05-18 05:08:56.871641 [info] [TID:571]  repetitionsBaseDelay_ = 100
2023-05-18 05:08:56.965863   460  4936 I SOMEIP  : 2023-05-18 05:08:56.871645 [info] [TID:571]  repetitionsMax_ = 3
2023-05-18 05:08:30.896556  1538  1538 I SettingsIFImpl: line 1
2023-05-18 05:08:30.896556  1538  1538 I SettingsIFImpl: line 2

# Lines containing SOMEIP
$ awk '/SOMEIP/' /tmp/test.log
2023-05-18 05:08:56.965846   460  4936 I SOMEIP  : 2023-05-18 05:08:56.871641 [info] [TID:571]  repetitionsBaseDelay_ = 100
2023-05-18 05:08:56.965863   460  4936 I SOMEIP  : 2023-05-18 05:08:56.871645 [info] [TID:571]  repetitionsMax_ = 3

# Lines not containing SOMEIP
$ awk '!/SOMEIP/' /tmp/test.log
2023-05-18 05:08:30.896556  1538  1538 I SettingsIFImpl: line 1
2023-05-18 05:08:30.896556  1538  1538 I SettingsIFImpl: line 2

# For lines whose field $6 matches the pattern, print the fields that follow $6 (from $8 on, since $7 is the ":")
$ awk 'BEGIN{ ORS=" " } $6~/SOMEIP/{ for(i=8; i<=NF; i++) { print $i } print "\n" }' /tmp/test.log
2023-05-18 05:08:56.871641 [info] [TID:571] repetitionsBaseDelay_ = 100 
 2023-05-18 05:08:56.871645 [info] [TID:571] repetitionsMax_ = 3 
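
Because ORS is set to a space, the command above leaves a trailing space on each line and a leading space on the next. One possible alternative, sketched here on a single log line copied from the sample (the variable names line and msg are assumptions), builds the message in a variable and prints it once:

```shell
line='2023-05-18 05:08:56.965863   460  4936 I SOMEIP  : 2023-05-18 05:08:56.871645 [info] [TID:571]  repetitionsMax_ = 3'

# Join fields 8..NF with single spaces, then print with the default ORS.
msg=$(printf '%s\n' "$line" | awk '$6 ~ /SOMEIP/ {
    out = $8
    for (i = 9; i <= NF; i++) out = out " " $i
    print out
}')
printf '%s\n' "$msg"
```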

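Built-in variables:

FILENAME: name of the current input file
NR: number of input records processed so far, across all input files
FNR: record number within the current file
NF: number of fields in the current record; $NF is the last field
ARGC: number of command-line arguments
ARGV: array of command-line arguments
$0: the full text of the current record (line)
$n: the nth field of the current record, for example $1, $2

FS: input field separator
OFS: output field separator
RS: input record separator
ORS: output record separator
FIELDWIDTHS: fixed field widths used to split records (gawk extension)

$ awk '{print FILENAME, NF, $0}' /tmp/test.log 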
/tmp/test.log 8 2023-05-18 05:08:30.896556  1538  1538 I SettingsIFImpl: line 1
/tmp/test.log 8 2023-05-18 05:08:30.896556  1538  1538 I SettingsIFImpl: line 2

$ awk '{print FILENAME "line>>" NR, $0}' /tmp/test.log 
/tmp/test.logline>>1 2023-05-18 05:08:30.896556  1538  1538 I SettingsIFImpl: line 1
/tmp/test.logline>>2 2023-05-18 05:08:30.896556  1538  1538 I SettingsIFImpl: line 2
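
The difference between NR and FNR only shows up with more than one input file. A small sketch, using two throwaway files in a temporary directory (the file names one.txt and two.txt are made up), also sets OFS so the effect of the comma in print is visible:

```shell
tmp=$(mktemp -d)
printf 'a\nb\n' > "$tmp/one.txt"
printf 'c\n'    > "$tmp/two.txt"

# NR keeps counting across files; FNR restarts at 1 for each file.
counts=$(cd "$tmp" && awk 'BEGIN { OFS = "|" } { print FILENAME, NR, FNR, $0 }' one.txt two.txt)
printf '%s\n' "$counts"
# one.txt|1|1|a
# one.txt|2|2|b
# two.txt|3|1|c

rm -rf "$tmp"
```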

Body block:

/pattern/ { commands }

1) The commands run for each record that matches the regular expression.
2) There can be multiple pattern/command pairs, for example:
(1) If a record matches pattern1, commands1 runs; if it matches pattern2, commands2 runs.
(2) next skips the remaining patterns and commands for the current record (similar to an if/else relationship). Without next, every pattern is tested against every record.

/pattern1/ { commands1; next }
/pattern2/ { commands2 }
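
To see the effect of next, the same two rules can be run with and without it. The input words and the variable names with_next and without_next are made up for this sketch:

```shell
# "banana" matches both /a/ and /an/; "apple" matches only /a/.
with_next=$(printf 'apple\nbanana\n' |
    awk '/a/ { print "first: " $0; next } /an/ { print "second: " $0 }')

without_next=$(printf 'apple\nbanana\n' |
    awk '/a/ { print "first: " $0 } /an/ { print "second: " $0 }')

printf 'with next:\n%s\nwithout next:\n%s\n' "$with_next" "$without_next"
```

With next, the second rule never sees a record that the first rule already handled; without it, banana is reported by both rules.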

3) If {commands} is omitted, the default action is print $0, so the pattern acts purely as a filter.
(1) With a simple string pattern, this degenerates into grep.
(2) Built-in variables and expressions allow more complex filters, such as keeping only the even-numbered lines.

$ awk '/Max/' /tmp/test.log 
$ awk 'NR%2==0' /tmp/test.log

4) Inside commands, print can redirect its output to a file or to a pipe.

awk 'NR%2==0 { print $1 > "/tmp/part.log" }' /tmp/test.log
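
The line above shows redirection to a file. Redirection to a pipe looks similar; here is a minimal sketch that sends the first field of each record through sort (the input and the variable name sorted are made up):

```shell
# print ... | "cmd" starts cmd once and feeds it every printed line;
# close("sort") in END makes sort finish and emit its output.
sorted=$(printf 'pear 3\napple 1\nfig 2\n' |
    awk '{ print $1 | "sort" } END { close("sort") }')
printf '%s\n' "$sorted"
```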

Each command block:
1) A block can contain multiple statements, either one per line (no trailing semicolon needed) or separated by semicolons on one line.
2) Control structures such as if-else, for, and while are supported.
3) Arrays are supported; the index can be a number or a string, so an array behaves like a map.
4) User-defined functions are supported:
function find_min(num1, num2) { if (num1 < num2) return num1; return num2 }
5) Built-in functions:
(1) Mathematical functions: sin, cos, log, sqrt, int, rand
(2) String functions: gsub, sub, substr, index, length, match, split, tolower, toupper, sprintf, strtonum (gawk extension)
sub(reg, str [, target])
a) The first substring of target that matches reg is replaced by str.
b) target is the string to modify; the default is $0, and it can also be a variable or a field $n.
c) gsub has the same prototype as sub but replaces every match of reg, whereas sub replaces only the first.
gensub(reg, str, h [, target]) (gawk extension)
a) h is the number of the match to replace, or "g"/"G" to replace all matches.
b) In str, "\\n" (n is a digit from 1 to 9) refers to the text matched by the nth parenthesized group in reg.
c) gensub returns the resulting string and leaves target unchanged, so the result is usually passed to print.
(3) Time functions: mktime, strftime, systime
(4) Bit operation functions: and, or, xor, compl, lshift, rshift
(5) Other functions and statements: close, fflush, exit, delete, getline, next, nextfile, return, system
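
A short sketch that puts the find_min function from item 4 into a runnable program, together with a few of the string functions listed above. The argument values and the variable name result are made up for illustration:

```shell
result=$(awk '
# User-defined functions are declared outside of any block.
function find_min(num1, num2) {
    if (num1 < num2)
        return num1
    return num2
}
BEGIN {
    print find_min(7, 3)
    # substr is 1-based; index returns the 1-based position of the match.
    print toupper(substr("someip", 1, 4)), length("someip"), index("someip", "ip")
}')
printf '%s\n' "$result"
# 3
# SOME 6 5
```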

# sub and gsub examples
# sub
$ cat /tmp/test.log 
2023-05-18 05:08:56.965846   460  4936 I SOMEIP  : 2023-05-18 05:08:56.871641 [info] [TID:571] client_timer_18215_1 repetitionsBaseDelay_ = 100
2023-05-18 05:08:56.965863   460  4936 I SOMEIP  : 2023-05-18 05:08:56.871645 [info] [TID:571] client_timer_18215_1 repetitionsMax_ = 3

# sub replaces only the first match in each line
$ awk '{ sub(/2023-05-18/, "1234-56-78"); print $0 }' /tmp/test.log 
1234-56-78 05:08:56.965846   460  4936 I SOMEIP  : 2023-05-18 05:08:56.871641 [info] [TID:571] client_timer_18215_1 repetitionsBaseDelay_ = 100
1234-56-78 05:08:56.965863   460  4936 I SOMEIP  : 2023-05-18 05:08:56.871645 [info] [TID:571] client_timer_18215_1 repetitionsMax_ = 3

# gsub replaces every match in the line
$ awk '{ gsub(/2023-05-18/, "1234-56-78"); print $0 }' /tmp/test.log 
1234-56-78 05:08:56.965846   460  4936 I SOMEIP  : 1234-56-78 05:08:56.871641 [info] [TID:571] client_timer_18215_1 repetitionsBaseDelay_ = 100
1234-56-78 05:08:56.965863   460  4936 I SOMEIP  : 1234-56-78 05:08:56.871645 [info] [TID:571] client_timer_18215_1 repetitionsMax_ = 3

# gsub removes leading and trailing spaces
$ awk '{gsub(/^ +| +$/,"")} {print "=" $0 "="}' onefile.txt
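
Since onefile.txt is not shown, here is the same trimming idea on a made-up string, plus a sketch of the optional third argument of sub that limits the replacement to one field (the variable names trimmed and field are assumptions):

```shell
# Trim leading and trailing spaces from $0.
trimmed=$(printf '  hello world  \n' |
    awk '{ gsub(/^ +| +$/, "") } { print "=" $0 "=" }')

# Replace the first "a" in field 2 only; field 1 is left alone.
field=$(printf 'banana banana\n' | awk '{ sub(/a/, "A", $2); print }')

printf '%s\n%s\n' "$trimmed" "$field"
# =hello world=
# banana bAnana
```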

Split records with multiple delimiters:

$ cat test.log
2023-05-18 05:08:56.965846   460  4936 I SOMEIP

# Split records on several delimiters (space, ".", and ":"); by default only whitespace is used
$ awk -F "[ .:]" '{print $1,$2,$3,$4,$5}' test.log 
2023-05-18 05 08 56 965846

# Setting FS is equivalent to the -F option
$ awk 'BEGIN{FS="[ .:]"} {print $1,$2,$3,$4,$5}' test.log 
2023-05-18 05 08 56 965846
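
One detail worth knowing: with a bracket expression such as "[ .:]", every single delimiter character ends a field, so the runs of spaces in the log line produce empty fields. Adding + treats a run as one delimiter. A sketch on the same sample line (the variable names line, single, and runs are made up):

```shell
line='2023-05-18 05:08:56.965846   460  4936 I SOMEIP'

# Each space is its own delimiter: 460 ends up in $8 after two empty fields.
single=$(printf '%s\n' "$line" | awk -F '[ .:]' '{ print NF, $8 }')

# A run of delimiters counts once: 460 is now $6.
runs=$(printf '%s\n' "$line" | awk -F '[ .:]+' '{ print NF, $6 }')

printf '%s\n%s\n' "$single" "$runs"
# 12 460
# 9 460
```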

Custom variables:
1) They are more useful in script files than on the command line.
2) On the command line, variable assignments are written after the script text (which directly follows awk) and before the input file.

$ awk '{ print name"="age }' name=tom age=12 /tmp/test.log 
tom=12
tom=12
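
Assignments placed after the script are processed only when awk reaches them in the argument list, so they are not yet set in the BEGIN block. The -v option sets the variable before BEGIN runs. A minimal sketch, reading one made-up record from standard input via "-" (the variable names late and early are assumptions):

```shell
prog='BEGIN { print "BEGIN sees [" name "]" } { print "body sees [" name "]" }'

# Assignment as an operand: empty in BEGIN, set in the body.
late=$(printf 'x\n' | awk "$prog" name=tom -)

# Assignment with -v: set everywhere, including BEGIN.
early=$(printf 'x\n' | awk -v name=tom "$prog" -)

printf '%s\n---\n%s\n' "$late" "$early"
```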

Arrays:
1) Arrays are associative (a map type).
2) They do not need to be declared before use.
3) The iteration order of a for...in loop is unspecified, while a for (i=1...) loop visits the indices in order.

$ awk '
BEGIN {
str="this is a string"; 
len=split(str, array, " "); 
print length(array), len; 
for (i in array) 
	print i": "array[i]; 
}'

4 4
1: this
2: is
3: a
4: string

$ awk 'BEGIN {
str="this is a string"; 
len=split(str, array, " "); 
print length(array), len; 
for (i=1; i<=len; i++) 
	print i": "array[i]; 
}'
4 4
1: this
2: is
3: a
4: string

$ awk 'BEGIN{ 
arr["one"]=1; 
arr["two"]=2; 
arr["three"]=3; 

for (item in arr) 
	print item"->"arr[item] 
}'
three->3
two->2
one->1
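
A common use of associative arrays is counting. The sketch below counts made-up log tags, then uses delete and the in operator; the input and the names count, tag, and counts are assumptions for this example. Only one key is left at the end, so the unspecified for...in order does not matter here.

```shell
counts=$(printf 'I SOMEIP\nI SettingsIFImpl\nI SOMEIP\n' | awk '
{ count[$2]++ }                      # elements spring into existence on first use
END {
    delete count["SettingsIFImpl"]   # remove one element
    if (!("SettingsIFImpl" in count))
        print "SettingsIFImpl removed"
    for (tag in count)
        print tag, count[tag]
}')
printf '%s\n' "$counts"
```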

Flow control:

$ cat /tmp/file.txt 
line 1
line 2
line 3
line 4
line 5
line 6

# if
$ awk '{ 
if (NR % 2 == 0) 
{
	print $0
} 
else if (NR %3 == 0) 
{
	print $0
} 
}' /tmp/file.txt 

line 2
line 3
line 4
line 6

# while loop
$ awk 'BEGIN{ 
count=3; 
while (count>0) 
{
	print count; 
	count--;
} 
}' /tmp/file.txt 
3
2
1

# for loop
$ awk 'BEGIN{ 
for(count=3; count>0; count--) 
{
	print count;
} 
}' /tmp/file.txt 
3
2
1

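Control statements:

break: exit the enclosing while/for loop.
continue: skip to the next iteration of the loop.
next: skip to the next record. If the body block is viewed as the loop body, next works like continue.
exit: in the body block, exit stops reading records and runs the END block; in the END block, exit terminates the program. If the body block is viewed as the loop body, exit works like break.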
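
A small sketch of next and exit acting on the record loop, using five made-up lines like those in /tmp/file.txt (the variable name out is an assumption):

```shell
out=$(printf 'line 1\nline 2\nline 3\nline 4\nline 5\n' | awk '
NR == 2 { next }     # skip record 2, like continue
NR == 4 { exit }     # stop reading at record 4, like break; END still runs
{ print }
END { print "END after record " NR }')
printf '%s\n' "$out"
# line 1
# line 3
# END after record 4
```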

Numerical calculation example:


$ cat /tmp/num.txt
1
2
3
4
5
# Compute the mean of the column
$ awk 'BEGIN{ sum=0; } { sum+=$1; } END{ print sum/NR }' /tmp/num.txt 
3

$ cat /tmp/num.txt
1 2 3 4 5
11 22 33 44 55
10 20 30 40 50
# Compute the mean of each row
$ awk '{ sum=0; for(i=1; i<=NF; i++) sum+=$i; print sum/NF }' /tmp/num.txt 
3
33
30
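
The two examples above average a single column and each row. Averaging every column takes an array of running sums; the following is one possible sketch on the same three rows (the names sum, n, and stats are made up, and the two-decimal format is a choice for this example):

```shell
stats=$(printf '1 2 3 4 5\n11 22 33 44 55\n10 20 30 40 50\n' | awk '
{
    if (NF > n) n = NF                  # remember the widest row
    for (i = 1; i <= NF; i++) sum[i] += $i
}
END {
    for (i = 1; i <= n; i++)
        printf "%.2f%s", sum[i] / NR, (i < n ? " " : "\n")
}')
printf '%s\n' "$stats"
# 7.33 14.67 22.00 29.33 36.67
```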


awk also supports user-defined functions; they have not been needed so far, so they are not covered in more detail here.

Origin blog.csdn.net/yinminsumeng/article/details/130618023