Shell Introductory Study Notes (15): Case Studies of awk, One of the Three Musketeers

Disclaimer: This is an original article by blogger hanchao5272. Please credit the source and keep the original link when reposting, thank you! https://blog.csdn.net/hanchao5272/article/details/89205963

Series catalog and reference portal: Shell Introductory Study Notes - Prologue

awk Case Study

1. Log analysis

Log Format:

{date} {time} {log_level} {thread_id} {code_line_at_class} - http_{url}_request_请求耗时={use_time} {result}

Other lines follow other formats.

Count the number of requests for each URL

[root@103-32-150-sh-100-M01 log]# awk '{a[$7]++}END{for(v in a) print v,a[v]}' info.log |grep  请求参数

Filter URLs with a request count of at least 30

awk '{a[$7]++}END{for(v in a) {if(a[v] >= 30) print v,a[v]}}' info.log |grep  请求参数

Count URL request frequencies, sort, and take the top 3

awk '/请求参数/{a[$7]++}END{for(v in a) print v,a[v] |"sort -k2 -nr |head -3"}' info.log
  • sort -k2: sort by the second column
  • sort -n: compare the sort column numerically
  • sort -r: sort in descending order

Count the number of requests for each URL within a specified time range

awk '$1" "$2 >= "2019-03-26 12:00:00,000" && $1" "$2 <= "2019-03-26 16:00:00,000" && $7~/请求参数/{a[$7]++}END{for(v in a) print v,a[v]}' info.log
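The counting idiom used throughout this section can be sanity-checked on synthetic data (the URLs below are invented; in the real log the URL sits in $7, here in $1):

```shell
printf '%s\n' /a /b /a /c /a /b |
  awk '{a[$1]++}END{for(v in a) print v,a[v]}' |
  sort -k2 -nr
```

This prints `/a 3`, `/b 2`, `/c 1`, one per line.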

2. Comparing two files

[worker@c2-a02-126-10-4 hanchao]$ cat a.txt
1
2
3
4
5
[worker@c2-a02-126-10-4 hanchao]$ cat b.txt
3
4
5
6
7

Find the records common to both files:

[worker@c2-a02-126-10-4 hanchao]$ awk 'FILENAME=="a.txt"{print FILENAME,$0} FILENAME=="b.txt"{print FILENAME"="$0}' a.txt b.txt
a.txt 1
a.txt 2
a.txt 3
a.txt 4
a.txt 5
b.txt=3
b.txt=4
b.txt=5
b.txt=6
b.txt=7
[worker@c2-a02-126-10-4 hanchao]$ awk 'FILENAME=="a.txt"{arr[$0]} FILENAME=="b.txt"{if($0 in arr) print $0}' a.txt b.txt
3
4
5

Find the records present in b.txt but not in a.txt:

[worker@c2-a02-126-10-4 hanchao]$ awk 'FILENAME=="a.txt"{arr[$0]} FILENAME=="b.txt"{if($0 in arr){} else print $0}' a.txt b.txt
6
7
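Since both sample files happen to be sorted, the same intersection and difference can also be computed with comm (a standard coreutils alternative, not part of the original article; temp files under /tmp are used here to avoid clobbering the article's a.txt and b.txt):

```shell
printf '%s\n' 1 2 3 4 5 > /tmp/a.txt
printf '%s\n' 3 4 5 6 7 > /tmp/b.txt
comm -12 /tmp/a.txt /tmp/b.txt   # lines common to both: 3 4 5
comm -13 /tmp/a.txt /tmp/b.txt   # lines only in b.txt: 6 7
```

Note that comm requires its inputs to be sorted, while the awk version does not.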

3. Merging two files

[worker@c2-a02-126-10-4 hanchao]$ cat a.txt
zhangsan 20
lisi 23
wangwu 29
[worker@c2-a02-126-10-4 hanchao]$ cat b.txt
zhangsan man
lisi woman
wangwu man
[worker@c2-a02-126-10-4 hanchao]$ awk 'FILENAME=="a.txt"{arr[$1]=$0} FILENAME=="b.txt"{if($1 in arr) print arr[$1],$2}' a.txt b.txt
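The same key-based merge can also be done with join, shown here as an alternative sketch (not from the original article). join requires both inputs sorted on the key, so the output comes out in alphabetical key order rather than file order:

```shell
printf '%s\n' 'zhangsan 20' 'lisi 23' 'wangwu 29' | sort > /tmp/a.sorted
printf '%s\n' 'zhangsan man' 'lisi woman' 'wangwu man' | sort > /tmp/b.sorted
join /tmp/a.sorted /tmp/b.sorted
```

This prints `lisi 23 woman`, `wangwu 29 man`, `zhangsan 20 man`.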

4. Merging lines that share a common feature

[worker@c2-a02-126-10-4 hanchao]$ cat a.txt
article/id_请求耗时=20ms
article/level_请求耗时=900ms
article/search_请求耗时=5ms
article/id_请求耗时=49ms
article/level_请求耗时=703ms
article/year_请求耗时=11ms
article/level_请求耗时=713ms
article/level_请求耗时=5ms
article/year_请求耗时=10ms
article/level_请求耗时=69ms
article/id_请求耗时=46ms
article/level_请求耗时=4ms
article/level_请求耗时=3ms
article/id_请求耗时=23ms
article/level_请求耗时=166ms
article/id_请求耗时=19ms
article/id_请求耗时=22ms
article/year_请求耗时=35ms
article/level_请求耗时=705ms
article/id_请求耗时=43ms
article/level_请求耗时=64ms
article/year_请求耗时=9ms
article/id_请求耗时=54ms
article/level_请求耗时=715ms

Collect all elapsed times for each URL onto a single line:

# Result
[worker@c2-a02-126-10-4 hanchao]$ awk -vFS="=" -vOFS="=" '{a[$1]=a[$1]" "$2}END{for(v in a)print v,a[v]}' a.txt
article/level_请求耗时= 900ms 703ms 713ms 5ms 69ms 4ms 3ms 166ms 705ms 64ms 715ms
article/search_请求耗时= 5ms
article/id_请求耗时= 20ms 49ms 46ms 23ms 19ms 22ms 43ms 54ms
article/year_请求耗时= 11ms 10ms 35ms 9ms

# Execution process, step by step
[worker@c2-a02-126-10-4 hanchao]$ awk -vFS="=" -vOFS="=" '{a[$1]=a[$1]" "$2;print $1,a[$1]}' a.txt | grep "article/level_"
article/level_请求耗时= 900ms
article/level_请求耗时= 900ms 703ms
article/level_请求耗时= 900ms 703ms 713ms
article/level_请求耗时= 900ms 703ms 713ms 5ms
article/level_请求耗时= 900ms 703ms 713ms 5ms 69ms
article/level_请求耗时= 900ms 703ms 713ms 5ms 69ms 4ms
article/level_请求耗时= 900ms 703ms 713ms 5ms 69ms 4ms 3ms
article/level_请求耗时= 900ms 703ms 713ms 5ms 69ms 4ms 3ms 166ms
article/level_请求耗时= 900ms 703ms 713ms 5ms 69ms 4ms 3ms 166ms 705ms
article/level_请求耗时= 900ms 703ms 713ms 5ms 69ms 4ms 3ms 166ms 705ms 64ms
article/level_请求耗时= 900ms 703ms 713ms 5ms 69ms 4ms 3ms 166ms 705ms 64ms 715ms
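The `a[$1]=a[$1]" "$2` accumulation idiom, reduced to a minimal invented key=value sample (`sort` makes the `for (v in a)` output order deterministic):

```shell
printf '%s\n' x=1ms y=2ms x=3ms |
  awk -F'=' -vOFS='=' '{a[$1]=a[$1]" "$2}END{for(v in a) print v,a[v]}' |
  sort
```

This prints `x= 1ms 3ms` and `y= 2ms`.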

5. Transposing columns and rows

[worker@c2-a02-126-10-4 hanchao]$ cat a.txt
1 2 3 4
5 6 7 8
9
# How the first column is built up
[worker@c2-a02-126-10-4 hanchao]$ awk '{for(i=1;i<=NF;i++) {a[i]=a[i]$i" ";if(i==1)print a[i]}}' a.txt
1
1 5
1 5 9
# Processing all columns
[worker@c2-a02-126-10-4 hanchao]$ awk '{for(i=1;i<=NF;i++) {a[i]=a[i]$i" ";}}END{for(v in a) print a[v]}' a.txt | sort -k1
1 5 9
2 6
3 7
4 8
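A variant of the transpose that avoids the trailing space left by `a[i]=a[i]$i" "`, checked on an invented 2x2 matrix. Note it relies on rectangular input, since NF in the END block refers only to the last record:

```shell
printf '1 2\n3 4\n' |
  awk '{for(i=1;i<=NF;i++) a[i]=(a[i]==""?$i:a[i]" "$i)}END{for(i=1;i<=NF;i++) print a[i]}'
```

This prints `1 3` and `2 4`.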

6. Splitting a string into characters

# The key is specifying FS=''
[worker@c2-a02-126-10-4 hanchao]$ echo hello |awk -vFS='' '{for(i=1;i<=NF;i++) print $i}'
h
e
l
l
o
# With split, the for-in iteration order is not guaranteed
[worker@c2-a02-126-10-4 hanchao]$ echo hello |awk '{split($0,a,"''");for(v in a) print a[v]}'
l
o
h
e
l
# Implemented via length and substr
[worker@c2-a02-126-10-4 hanchao]$ echo hello |awk '{for(i=1;i<=length($0);i++)print substr($0,i,1)}'
h
e
l
l
o
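Outside awk, the same per-character split can be done with standard tools such as fold or grep (alternatives, not from the original article):

```shell
echo hello | fold -w1    # wrap at a width of one column
echo hello | grep -o .   # print each regex match (any single character) on its own line
```

Both print h, e, l, l, o, one character per line.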

7. Accumulating values across lines

[worker@c2-a02-126-10-4 hanchao]$ cat a.txt
zhangsan 2000
lisi 3000
zhangsan 1000
wangwu 1000
zhangsan 100
lisi 200
[worker@c2-a02-126-10-4 hanchao]$ awk '{name[$1];cost[$1]+=$2}END{for(v in name){print v,cost[v]}}' a.txt
zhangsan 3100
wangwu 1000
lisi 3200
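The name array above is actually redundant: iterating over cost alone yields the same result. A minimal check on invented inline data:

```shell
printf '%s\n' 'a 1' 'b 2' 'a 3' |
  awk '{cost[$1]+=$2}END{for(v in cost) print v,cost[v]}' |
  sort
```

This prints `a 4` and `b 2`.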

8. Getting the maximum value of a column

[worker@c2-a02-126-10-4 hanchao]$ cat a.txt
article/id_请求耗时=20ms
article/level_请求耗时=900ms
article/search_请求耗时=5ms
article/id_请求耗时=49ms
article/level_请求耗时=703ms
article/year_请求耗时=11ms
article/level_请求耗时=713ms
article/level_请求耗时=5ms
article/year_请求耗时=10ms
article/level_请求耗时=69ms
article/id_请求耗时=46ms
article/level_请求耗时=4ms
article/level_请求耗时=3ms
article/id_请求耗时=23ms
article/level_请求耗时=166ms
article/id_请求耗时=19ms
article/id_请求耗时=22ms
article/year_请求耗时=35ms
article/level_请求耗时=705ms
article/id_请求耗时=43ms
article/level_请求耗时=64ms
article/year_请求耗时=9ms
article/id_请求耗时=54ms
article/level_请求耗时=715ms

# Find the maximum elapsed time for each request
[worker@c2-a02-126-10-4 hanchao]$ cat a.txt |sed 's/ms//;' |awk -F '=' '{if($2>max[$1]) max[$1]=$2}END{for(v in max) print v,max[v]}'
article/level_请求耗时 900
article/search_请求耗时 5
article/id_请求耗时 54
article/year_请求耗时 35
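The comparison above works because both operands look numeric; adding +0 forces a numeric comparison explicitly, which is safer when values might be compared as strings. A minimal check on invented data:

```shell
printf '%s\n' k=5 k=900 k=69 |
  awk -F'=' '{if($2+0 > max[$1]+0) max[$1]=$2}END{for(v in max) print v,max[v]}'
```

This prints `k 900`.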

9. Removing the first and last lines

[worker@c2-a02-126-10-4 hanchao]$ seq 5 |awk 'NR>2{print s}{s=$0}'
2
3
4
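The same trim can be expressed more directly with sed (a standard alternative, not from the original article):

```shell
seq 5 | sed '1d;$d'   # delete the first line (1d) and the last line ($d)
```

This prints 2, 3, 4, one per line.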
