Use of awk and log filtering

awk

1.What is awk

awk is a programming language for text and data processing on Linux/Unix. Its input can come from standard input (stdin), one or more files, or the output of other commands. It supports advanced features such as user-defined functions and dynamic regular expressions, making it a powerful text-processing tool. It can be used directly on the command line, but is more commonly run as a script. awk also provides many built-in facilities, such as arrays and functions, much like the C language; this flexibility is awk's greatest strength.
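
For a first taste, here is a minimal, hedged sketch (demo.txt is just a placeholder file name; any text file works):

awk 'BEGIN{print "hello, awk"}'      #run a block without reading any input
hello, awk
awk '{print $1}' demo.txt            #print the first whitespace-separated field of each line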

2.awk command format and options

2.1 Syntax
awk [options] 'script' var=value file(s)
awk [options] -f scriptfile var=value file

Command options

  • **-F fs:** fs specifies the input field separator. fs can be a string or a regular expression, e.g. -F:. The default separator is one or more spaces or tabs.
  • **-v var=value:** assigns a value to a user-defined variable, passing an external value into awk.
  • **-f scriptfile:** reads the awk program from a script file.
  • **-m[fr] val:** sets an intrinsic limit to val. The -mf option limits the maximum number of blocks allocated to val; -mr limits the maximum number of records. These two options are extensions in the Bell Labs version of awk and do not apply to standard awk.
2.2 awk variables

Variables come in two kinds: built-in variables and custom variables. Custom variables are passed in by placing the -v option before them.

2.2.1 awk’s built-in variables (predefined variables)

Note: the tags [A], [N], [P], and [G] indicate the first tool to support each variable: [A]=awk, [N]=nawk, [P]=POSIX awk, [G]=gawk.

(1) Variable description

[A] :

  • NF is the number of fields in the current record. $NF refers to the last field and $(NF-1) to the second-to-last field.

  • NR is the number of records read so far, i.e. the current line number. With multiple files, the count continues across files: the second file's line numbers continue from where the first file ended.

  • OFMT is the output format for numbers (default is %.6g).

  • OFS output field delimiter (default is a space).

  • ORS output record delimiter (default is a newline).

  • RS record separator (default is a newline character).

  • FS field separator (default is any space).

  • FILENAME The name of the current input file.

[N] :

  • ARGC is the number of command-line arguments.

  • ARGV contains an array of command line arguments.

  • ERRNO Description of the last system error.

  • RSTART is the starting position of the string matched by the match function.

  • RLENGTH is the length of the string matched by the match function.

  • SUBSEP is the array subscript separator (default is "\034").

[G]:

  • ARGIND is the index in ARGV of the current file being processed (ARGV indices start at 0).

  • IGNORECASE If true, performs a case-ignoring match.

  • CONVFMT is the number-to-string conversion format (default is %.6g).

  • FIELDWIDTHS List of field widths (space separated).

[P] :

  • ENVIRON environment variable associative array.

  • FNR is the same as NR, but relative to the current file.

(2) Example

[root@along ~]# cat awkdemo
hello:world
linux:redhat:lalala:hahaha
along:love:you
[root@along ~]# awk -v FS=':' '{print $1,$2}' awkdemo  #FS specifies the input field separator
hello world
linux redhat
along love
[root@along ~]# awk -v FS=':' -v OFS='---' '{print $1,$2}' awkdemo  #OFS specifies the output field separator
hello---world
linux---redhat
along---love
[root@along ~]# awk -v RS=':' '{print $1,$2}' awkdemo
hello
world linux
redhat
lalala
hahaha along
love
you
[root@along ~]# awk -v FS=':' -v ORS='---' '{print $1,$2}' awkdemo
hello world---linux redhat---along love---
[root@along ~]# awk -F: '{print NF}' awkdemo
2
4
3
[root@along ~]# awk -F: '{print $(NF-1)}' awkdemo  #show the second-to-last column
hello
lalala
love
[root@along ~]# awk '{print NR}' awkdemo awkdemo1
1
2
3
4
5
[root@along ~]# awk 'END{print NR}' awkdemo awkdemo1
5
[root@along ~]# awk '{print FNR}' awkdemo awkdemo1
1
2
3
1
2
[root@along ~]# awk '{print FILENAME}' awkdemo
awkdemo
awkdemo
awkdemo
[root@along ~]# awk 'BEGIN {print ARGC}' awkdemo awkdemo1
3
[root@along ~]# awk 'BEGIN {print ARGV[0]}' awkdemo awkdemo1
awk
[root@along ~]# awk 'BEGIN {print ARGV[1]}' awkdemo awkdemo1
awkdemo
[root@along ~]# awk 'BEGIN {print ARGV[2]}' awkdemo awkdemo1
awkdemo1
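
A few of the variables listed above are not covered by the demos; hedged sketches follow (the ENVIRON value naturally depends on your environment):

awk 'BEGIN{OFMT="%.2f"; print 3.1415926}'        #OFMT controls how print formats numbers
3.14
awk 'BEGIN{if(match("hello:world", /:/)) print RSTART, RLENGTH}'
6 1
awk 'BEGIN{print ENVIRON["HOME"]}'               #depends on the calling environment
/root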
2.2.2 Custom variables

Custom variables (variable names are case-sensitive)

(1)-v var=value

① Define variables first, then execute the action print

[root@along ~]# awk -v name="along" -F: '{print name":"$0}' awkdemo
along:hello:world
along:linux:redhat:lalala:hahaha
along:along:love:you

② Define variables after executing the action print

[root@along ~]# awk -F: '{print name":"$0;name="along"}' awkdemo
:hello:world
along:linux:redhat:lalala:hahaha
along:along:love:you

(2) Define directly in program

You can also put the actions in a script file and call it directly with -f:

[root@along ~]# cat awk.txt
{name="along"; print name,$1}
[root@along ~]# awk -F: -f awk.txt awkdemo
along hello
along linux
along along

3. awk operations and comparisons

Like any programming language, awk supports a variety of operations, essentially the same set as C. It also provides a series of built-in functions for arithmetic (such as log, sqrt, cos, sin) and for string manipulation (such as length, substr), which greatly extend its computing power. Relational tests are part of every language's conditional logic, and awk is no exception: it allows many kinds of tests, including pattern matching with ~ (match) and !~ (no match). As an extension of testing, awk also supports logical operators.

3.1 Operators
3.1.1 Arithmetic operators
Operator   Description
+ -        addition, subtraction
* / %      multiplication, division, modulus
+ - !      unary plus, unary minus, logical negation
^ **       exponentiation
++ --      increment and decrement, as prefix or postfix
3.1.2 Logical operators
Operator   Description
||         logical OR
&&         logical AND
3.1.3 Regular expression operators
Operator   Description
~  !~      matches / does not match a regular expression
^          beginning of line
$          end of line
.          any single character except newline
*          zero or more of the preceding character
.*         any string of characters
[]         any one character listed in the brackets
[^]        any one character not listed in the brackets
^[^]       lines beginning with a character not listed in the brackets
[a-z]      lowercase letters
[A-Z]      uppercase letters
[a-Z]      lowercase and uppercase letters
[0-9]      digits
\<         beginning of a word (words are generally delimited by spaces or special characters; a consecutive run of characters is treated as one word)
\>         end of a word

Note: in awk, regular expressions are written between slashes, as in /regex/.
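
A brief sketch of ~ and !~ against the awkdemo file used earlier:

awk -F: '$1 ~ /^a/ {print $1}' awkdemo       #first field starts with a
along
awk -F: '$1 !~ /^a/ {print $1}' awkdemo      #first field does not start with a
hello
linux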

3.1.4 Relational operators
Operator   Description
<          less than
<=         less than or equal to
>          greater than
>=         greater than or equal to
!=         not equal to
==         equal to
3.1.5 Other operators
Operator   Description
$          field reference
(space)    string concatenation
?:         C-style conditional (ternary) expression
in         tests whether a key exists in an array
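
To make the first three rows concrete, here is a small sketch using the same awkdemo file (the in operator is demonstrated in the array section later):

awk -F: '{n=1; print $n}' awkdemo                            #$ applied to a variable is a field reference
hello
linux
along
awk -F: '{print $1 "-" $2}' awkdemo                          #adjacent strings are concatenated
hello-world
linux-redhat
along-love
awk -F: '{print (NF > 2 ? "long" : "short"), $0}' awkdemo    #?: conditional expression
short hello:world
long linux:redhat:lalala:hahaha
long along:love:you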
3.2 Functions

Commonly used built-in functions:

  • tolower(): converts characters to lowercase.
  • length(): returns the length of a string.
  • substr(): returns a substring.
  • sin(): sine.
  • cos(): cosine.
  • sqrt(): square root.
  • rand(): random number.
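
Since tolower() and substr() are not demonstrated elsewhere in this post, a quick hedged sketch:

awk 'BEGIN{s="Hello:World"; print tolower(s), length(s), substr(s,1,5)}'
hello:world 11 Hello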

4. Process control statements

awk's while, do-while and for statements allow break and continue to control the flow of a loop, and statements such as exit to leave the program; if provides conditional branching. awk's flow-control statements and syntax are very close to C's. With these statements, much of what would otherwise be written as shell scripts can be handed to awk, and it runs very fast. The usage of each statement is described below.

4.1 Conditional statements
4.1.1 awk conditions

awk allows you to specify an output condition, so that only the lines meeting the condition are output.

The output condition should be written before the action.

awk 'condition action' filename

Please see the example below.

awk -F ':' '/usr/ {print $1}' demo.txt
root
daemon
bin
sys

In the code above, the print action is preceded by a regular expression, so only the lines containing usr are output.

The next two examples output only the odd-numbered lines, and only the lines after the third line, respectively.

# print odd-numbered lines
awk -F ':' 'NR % 2 == 1 {print $1}' demo.txt
root
bin
sync

# print the lines after the third line
awk -F ':' 'NR >3 {print $1}' demo.txt
sys
sync

The following example prints the rows whose first field equals the specified value.

awk -F ':' '$1 == "root" {print $1}' demo.txt
root

awk -F ':' '$1 == "root" || $1 == "bin" {print $1}' demo.txt
root
bin
4.1.2 if statement

(1) Syntax

if(condition){statement} [else statement]    two-branch

if(condition1){statement1}
else if(condition2){statement2}
else{statement3}    multi-branch

(2) Usage scenario: making a conditional test on the whole line, or on a particular field, that awk has read.

(3) Example

[root@along ~]# awk -F: '{if($3>10 && $3<1000)print $1,$3}' /etc/passwd
operator 11
games 1
[root@along ~]# awk -F: '{if($NF=="/bin/bash") print $1,$NF}' /etc/passwd
root /bin/bash
along /bin/bash
---print lines with more than 2 fields
[root@along ~]# awk -F: '{if(NF>2) print $0}' awkdemo
linux:redhat:lalala:hahaha
along:love:you
---if field 3 >= 1000 it is a Common user, otherwise root or Sysuser
[root@along ~]# awk -F: '{if($3>=1000) {printf "Common user: %s\n",$1} else{printf "root or Sysuser: %s\n",$1}}' /etc/passwd
root or Sysuser: root
root or Sysuser: bin
Common user: along
---device name and utilization for disks whose usage exceeds 40%
[root@along ~]# df -h|awk -F% '/^\/dev/{print $1}'|awk '$NF > 40{print $1,$NF}'
/dev/mapper/cl-root 43
---test>90 is very good; 60<test<=90 is good; otherwise no pass
[root@along ~]# awk 'BEGIN{ test=100;if(test>90){print "very good"}else if(test>60){ print "good"}else{print "no pass"}}'
very good
[root@along ~]# awk 'BEGIN{ test=80;if(test>90){print "very good"}else if(test>60){ print "good"}else{print "no pass"}}'
good
[root@along ~]# awk 'BEGIN{ test=50;if(test>90){print "very good"}else if(test>60){ print "good"}else{print "no pass"}}'
no pass
4.2 Loop statement
4.2.1 while loop

(1) Syntax

while(condition){statement}

Note: If the condition is "true", the loop will be entered; if the condition is "false", the loop will be exited.

(2) Usage scenarios

Use it when processing several fields of a line in a similar way, one by one.

Use it when processing each element of an array in turn.

(3) Examples

---for lines beginning with along, split on :, print each word of the line and its length
[root@along ~]# awk -F: '/^along/{i=1;while(i<=NF){print $i,length($i); i++}}' awkdemo
along 5
love 4
you 3
---split on :, print each word whose length is at least 6, and its length
[root@along ~]# awk -F: '{i=1;while(i<=NF) {if(length($i)>=6){print $i,length($i)}; i++}}' awkdemo
redhat 6
lalala 6
hahaha 6
---compute 1+2+3+...+100 = 5050
[root@along ~]# awk 'BEGIN{i=1;sum=0;while(i<=100){sum+=i;i++};print sum}'
5050
4.2.2 do-while loop

(1) Syntax

do{statement}while(condition)

Meaning: the loop body is executed at least once, whether the condition is true or false.

(2) Compute 1+2+3+...+100 = 5050

[root@along ~]# awk 'BEGIN{sum=0;i=1;do{sum+=i;i++}while(i<=100);print sum}'
5050
4.2.3 for loop

(1) Syntax

for(expr1;expr2;expr3){statement}

(2) Special form: iterating over the elements of an array

for(var in array){for-body}

(3) Examples

---print each word of every line and its length
[root@along ~]# awk -F: '{for(i=1;i<=NF;i++) {print$i,length($i)}}' awkdemo
hello 5
world 5
linux 5
redhat 6
lalala 6
hahaha 6
along 5
love 4
you 3
---compute the average score for males (m) and females (f)
[root@along ~]# cat sort.txt
xiaoming m 90
xiaohong f 93
xiaohei m 80
xiaofang f 99
[root@along ~]# awk '{m[$2]++;score[$2]+=$3}END{for(i in m){printf "%s:%6.2f\n",i,score[i]/m[i]}}' sort.txt
m: 85.00
f: 96.00
4.3 Other statements
  • break: when used inside a while or for loop, it exits the loop.

  • continue: when used inside a while or for loop, it skips straight to the next iteration.

  • next: reads in the next input line and returns to the top of the script, skipping any remaining actions for the current line.

  • exit: leaves the main input loop and passes control to END if an END rule exists; if there is no END rule, or if exit is used inside END, the script terminates. (A short demo of next and exit follows.)
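
A hedged sketch of next and exit, again using the awkdemo file from earlier:

awk -F: '/^linux/{next}{print $1}' awkdemo                    #skip lines starting with linux
hello
along
awk -F: 'NR==2{exit}{print $1}END{print "done"}' awkdemo      #stop reading at line 2, then run END
hello
done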

5. awk arrays

Arrays are the soul of awk; array handling is the most indispensable part of processing text with it. Because array indices (subscripts) can be either numbers or strings, awk's arrays are called associative arrays. Arrays in awk do not need to be declared in advance, and neither do their sizes; elements are initialized to 0 or the empty string, depending on context.

array[index-expression]

(1) Any string can be used as an index; string indices must be enclosed in double quotes.

(2) If an array element does not exist, referencing it makes awk create it automatically and initialize its value to the empty string.

(3) To test whether an element exists in an array, use the form "index in array" (a short demo follows this list).

(4) To iterate over every element of an array, use a for loop: **for(var in array)** {for-body}
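
A minimal sketch of the in membership test mentioned in note (3):

awk 'BEGIN{a["x"]=1; if("x" in a) print "x exists"; if(!("y" in a)) print "y does not exist"}'
x exists
y does not exist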

5.1 Defining arrays

Numbers as array indices (subscripts):

Array[1]="sun"
Array[2]="kai"

Strings as array indices (subscripts):

Array["first"]="www"
Array"[last"]="name"
Array["birth"]="1987"
5.2 Using arrays

(1) Basic array usage examples

[root@along ~]# cat awkdemo2
aaa
bbbb
aaa
123
123
123
---remove duplicate lines
[root@along ~]# awk '!arr[$0]++' awkdemo2
aaa
bbbb
123
---print the file contents along with how many times each line has occurred so far
[root@along ~]# awk '{!arr[$0]++;print $0,arr[$0]}' awkdemo2
aaa 1
bbbb 1
aaa 2
123 1
123 2
123 3

Analysis: each line is used as the array subscript. The first time a line is seen, arr[$0] is empty (0), so !arr[$0] is true (1) and the line is printed; the ++ then makes arr[$0] non-zero. When the same line comes in again, it maps to the same subscript, arr[$0] is already non-zero, so !arr[$0] is false and the line is not printed, while ++ keeps counting. Therefore each repeated line is printed only once.

(2) Iterating over an array

awk associative arrays store key => value pairs in no particular order.

[root@along ~]# awk 'BEGIN{abc["ceo"]="along";abc["coo"]="mayun";abc["cto"]="mahuateng";for(i in abc){print i,abc[i]}}'
coo mayun
ceo along
cto mahuateng
[root@along ~]# awk '{for(i=1;i<=NF;i++)abc[$i]++}END{for(j in abc)print j,abc[j]}' awkdemo2
aaa 2
bbbb 1
123 3

6. Handling numbers and strings

6.1 Handling numbers
  • rand(): returns a random number between 0 and 1. It needs a seed set by srand(); without a seed it always outputs 0.237788.

Demo:

[root@along ~]# awk 'BEGIN{print rand()}'
0.237788
[root@along ~]# awk 'BEGIN{srand(); print rand()}'
0.51692
[root@along ~]# awk 'BEGIN{srand(); print rand()}'
0.189917
---get a random number between 1 and 50
[root@along ~]# awk 'BEGIN{srand(); print int(rand()*100%50)+1}'
12
[root@along ~]# awk 'BEGIN{srand(); print int(rand()*100%50)+1}'
24
6.2 Handling strings
  • length([s]): returns the length of the string
  • sub(r,s,[t]): searches string t for the first match of pattern r and replaces it with s
  • gsub(r,s,[t]): searches string t for all matches of pattern r and replaces them with s
  • split(s,array,[r]): splits string s on the separator r and stores the pieces in array, with indices 1, 2, ...

Demo:

[root@along ~]# echo "2008:08:08 08:08:08" | awk 'sub(/:/,"-",$1)'
2008-08:08 08:08:08
[root@along ~]# echo "2008:08:08 08:08:08" | awk 'gsub(/:/,"-",$0)'
2008-08-08 08-08-08
[root@along ~]# echo "2008:08:08 08:08:08" | awk '{split($0,i,":")}END{for(n in i){print n,i[n]}}'
4 08
5 08
1 2008
2 08
3 08 08

7. awk scripts

(1) Writing an awk script

Write the awk program as a script file that can be called with -f or executed directly.

Example:

[root@along ~]# cat f1.awk
{if($3>=1000)print $1,$3}
[root@along ~]# cat f2.awk
#!/bin/awk -f
{if($3 >= 1000)print $1,$3}
[root@along ~]# chmod +x f2.awk
[root@along ~]# ./f2.awk -F: /etc/passwd
along 1000

(2) Passing parameters to an awk script

Format:

awkfile var=value var2=value2... Inputfile

Note: variables passed this way are not available inside the BEGIN block; they become usable only after the first input line has been read. To give awk a variable's value before BEGIN executes, use the **-v option**; each variable specified on the command line needs its own -v. (A short demo of the difference follows the example below.)

Example:

[root@along ~]# cat test.awk
#!/bin/awk -f
{if($3>=min && $3<=max)print $1,$3}
[root@along ~]# chmod +x test.awk
[root@along ~]# ./test.awk -F: min=100 max=200 /etc/passwd
systemd-network 192
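
To illustrate the note above about BEGIN, a hedged sketch (/etc/passwd is just convenient input; any file works):

awk -v a=1 'BEGIN{print "a=" a}' /etc/passwd                            #-v values are visible in BEGIN
a=1
awk 'BEGIN{print "b=" b} NR==1{print "b=" b; exit}' b=2 /etc/passwd     #var=value is not set until input starts
b=
b=2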

8. awk filtering exercise

Filtering log entries within a given time range

Copy a log file from /var/log/ and write a script that filters out the entries falling within the desired time range.

[root@localhost /]# cat time.log      #contents of the copied log file
[2023-07-30T17:59:05+0800] INFO === Started libdnf-0.63.0 ===
[2023-07-30T17:59:06+0800] INFO === Started libdnf-0.63.0 ===
[2023-07-30T17:59:06+0800] INFO === Started libdnf-0.63.0 ===
[2023-07-31T17:10:29+0800] INFO === Started libdnf-0.63.0 ===
[2023-07-31T17:21:57+0800] INFO === Started libdnf-0.63.0 ===
[2023-07-31T17:22:02+0800] INFO === Started libdnf-0.63.0 ===
[2023-07-31T17:22:03+0800] INFO === Started libdnf-0.63.0 ===
[2023-07-31T17:26:35+0800] INFO === Started libdnf-0.63.0 ===
[2023-07-31T17:26:46+0800] INFO === Started libdnf-0.63.0 ===
[2023-07-31T17:26:47+0800] INFO === Started libdnf-0.63.0 ===
[2023-07-31T17:26:47+0800] INFO === Started libdnf-0.63.0 ===
[2023-07-31T17:30:30+0800] INFO === Started libdnf-0.63.0 ===
[2023-07-31T17:32:07+0800] INFO === Started libdnf-0.63.0 ===
[2023-07-31T17:32:08+0800] INFO === Started libdnf-0.63.0 ===
[2023-07-31T17:32:08+0800] INFO === Started libdnf-0.63.0 ===
[2023-07-31T18:17:11+0800] INFO === Started libdnf-0.63.0 ===
[2023-08-01T16:07:13+0800] INFO === Started libdnf-0.63.0 ===
[2023-08-02T01:18:14+0800] INFO === Started libdnf-0.63.0 ===
[2023-08-02T01:19:37+0800] INFO === Started libdnf-0.63.0 ===
[2023-08-02T01:19:52+0800] INFO === Started libdnf-0.63.0 ===
[2023-08-02T01:19:53+0800] INFO === Started libdnf-0.63.0 ===
[2023-08-02T01:19:53+0800] INFO === Started libdnf-0.63.0 ===
[2023-08-02T01:19:54+0800] INFO === Started libdnf-0.63.0 ===
[2023-08-02T09:48:37+0800] INFO === Started libdnf-0.63.0 ===
[2023-08-02T09:54:20+0800] INFO === Started libdnf-0.63.0 ===
[2023-08-02T09:54:21+0800] INFO === Started libdnf-0.63.0 ===
[2023-08-02T09:54:22+0800] INFO === Started libdnf-0.63.0 ===
[2023-08-02T10:31:20+0800] INFO === Started libdnf-0.63.0 ===
[2023-08-02T10:31:36+0800] INFO === Started libdnf-0.63.0 ===
[2023-08-02T10:31:36+0800] INFO === Started libdnf-0.63.0 ===
[2023-08-02T10:31:36+0800] INFO === Started libdnf-0.63.0 ===
[2023-08-02T10:32:12+0800] INFO === Started libdnf-0.63.0 ===
[2023-08-02T10:34:31+0800] INFO === Started libdnf-0.63.0 ===
[2023-08-02T10:34:33+0800] INFO === Started libdnf-0.63.0 ===
[2023-08-02T10:34:33+0800] INFO === Started libdnf-0.63.0 ===
[2023-08-02T11:12:38+0800] INFO === Started libdnf-0.63.0 ===
[2023-08-02T14:22:38+0800] INFO === Started libdnf-0.63.0 ===
[2023-08-05T23:57:34+0800] INFO === Started libdnf-0.63.0 ===
[2023-08-05T23:57:48+0800] INFO === Started libdnf-0.63.0 ===
[2023-08-05T23:58:40+0800] INFO === Started libdnf-0.63.0 ===

awk (gawk) provides the mktime() function, which converts a date/time into an epoch value.

[root@localhost /]# awk 'BEGIN{print mktime("2023 08 01 03 42 40")}'
1690832560
#convert the time 2023-08-01 03:42:40 into a timestamp so it can be compared easily
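
The script below also relies on two GNU awk features, the three-argument match() and patsplit(); a hedged sketch of what each does:

echo "[2023-08-02T01:18:14+0800] INFO" | awk '{match($0,"^.*\\[(.*)\\].*",a); print a[1]}'
2023-08-02T01:18:14+0800
awk 'BEGIN{n=patsplit("2023-08-02T01:18:14",d,"[0-9]{1,4}"); print n, d[1], d[6]}'
6 2023 14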

Write the script file (saved here as awkdemo.awk, matching the invocation below):

BEGIN{
  #build the baseline epoch values used for the range filter
  which_time1 = mktime("2023 08 02 00 00 00")
  which_time2 = mktime("2023 08 04 00 00 00")
}

{
  #extract the timestamp string from the log line
  match($0,"^.*\\[(.*)\\].*",arr)

  #convert the extracted string to an epoch value with the strptime1 function defined below
  tmp_time = strptime1(arr[1])

  #compare the log line's timestamp with the baseline timestamps and print lines within the range
  if(tmp_time > which_time1 && tmp_time < which_time2){
    print
  }
}

#function that rebuilds the extracted time string and returns an epoch timestamp
function strptime1(str,arr,Y,M,D,H,m,S){
  #use patsplit to pull the numbers out of the time string
  patsplit(str,arr,"[0-9]{1,4}")
  Y=arr[1]
  M=arr[2]
  D=arr[3]
  H=arr[4]
  m=arr[5]
  S=arr[6]
  return mktime(sprintf("%s %s %s %s %s %s",Y,M,D,H,m,S))
}

Run the script against the saved time.log file to pick out the entries in the desired range.

[root@localhost /]# awk -f awkdemo.awk time.log 
[2023-08-02T01:18:14+0800] INFO === Started libdnf-0.63.0 ===
[2023-08-02T01:19:37+0800] INFO === Started libdnf-0.63.0 ===
[2023-08-02T01:19:52+0800] INFO === Started libdnf-0.63.0 ===
[2023-08-02T01:19:53+0800] INFO === Started libdnf-0.63.0 ===
[2023-08-02T01:19:53+0800] INFO === Started libdnf-0.63.0 ===
[2023-08-02T01:19:54+0800] INFO === Started libdnf-0.63.0 ===
[2023-08-02T09:48:37+0800] INFO === Started libdnf-0.63.0 ===
[2023-08-02T09:54:20+0800] INFO === Started libdnf-0.63.0 ===
[2023-08-02T09:54:21+0800] INFO === Started libdnf-0.63.0 ===
[2023-08-02T09:54:22+0800] INFO === Started libdnf-0.63.0 ===
[2023-08-02T10:31:20+0800] INFO === Started libdnf-0.63.0 ===
[2023-08-02T10:31:36+0800] INFO === Started libdnf-0.63.0 ===
[2023-08-02T10:31:36+0800] INFO === Started libdnf-0.63.0 ===
[2023-08-02T10:31:36+0800] INFO === Started libdnf-0.63.0 ===
[2023-08-02T10:32:12+0800] INFO === Started libdnf-0.63.0 ===
[2023-08-02T10:34:31+0800] INFO === Started libdnf-0.63.0 ===
[2023-08-02T10:34:33+0800] INFO === Started libdnf-0.63.0 ===
[2023-08-02T10:34:33+0800] INFO === Started libdnf-0.63.0 ===
[2023-08-02T11:12:38+0800] INFO === Started libdnf-0.63.0 ===
[2023-08-02T14:22:38+0800] INFO === Started libdnf-0.63.0 ===
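
As a side note, because these ISO-8601 timestamps all carry the same +0800 offset and therefore sort lexicographically, roughly the same filter can be sketched as a one-liner (an alternative added here for illustration, not the original author's method); it prints the same lines as above:

awk -F'[][]' '$2 > "2023-08-02" && $2 < "2023-08-04"' time.log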

Origin blog.csdn.net/qq_44829421/article/details/132126629