A detailed explanation of the Linux awk command

awk

A programming language for text and data manipulation

Supplementary Note

awk is a programming language for processing text and data on Linux/Unix. Input can come from standard input (stdin), one or more files, or the output of other commands. It supports advanced features such as user-defined functions and dynamic regular expressions, which makes it a powerful programming tool on Linux/Unix. It can be used directly on the command line, but it is more often run as a script. awk offers many features, such as arrays and functions, that resemble those of C, and flexibility is its greatest strength.

awk command format and options

Syntax

awk [options] 'script' var=value file(s)
awk [options] -f scriptfile var=value file(s)

Common command options

  • -F fs  Sets the input field separator to fs, which can be a string or a regular expression, for example -F:. The default separator is any run of spaces or tabs.
  • -v var=value  Assigns a user-defined variable before the program starts, passing an external value into awk.
  • -f scriptfile  Reads the awk program from a script file.
  • -m[fr] val  Sets an internal limit to val: -mf limits the maximum number of fields and -mr limits the maximum record size. Both are extensions of the Bell Labs version of awk and are not available in standard awk.
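
As a rough illustration of the -F and -v options above, the sketch below splits a colon-separated line and prefixes its first field with a value passed in from the shell. The sample record and the variable name prefix are made up for this example.

```shell
# Split on ":" with -F: and pass an external value in with -v.
# The input line and the name "prefix" are illustrative only.
result=$(printf 'root:x:0:0\n' | awk -F: -v prefix="user=" '{ print prefix $1 }')
echo "$result"
```

With this input the command should print user=root.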

awk patterns and actions

An awk script consists of patterns and actions.

Pattern

A pattern can be any of the following:

  • /regular expression/: matches using extended regular expression syntax.
  • Relational expressions: built with comparison operators; they can compare strings or numbers.
  • Pattern-matching expressions: use the operators ~ (matches) and !~ (does not match).
  • BEGIN block, pattern block, END block: see "How awk works".
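
The first three pattern kinds can be tried side by side. This is only a sketch: the name/score data is invented for the example, and each command prints the names selected by one kind of pattern.

```shell
# Sample data: name and score (made up for illustration).
data='alice 91
bob 58
carol 77'

# /regex/ pattern: lines containing "ar"
by_regex=$(printf '%s\n' "$data" | awk '/ar/ { print $1 }')
# Relational expression: second field greater than 60
by_relation=$(printf '%s\n' "$data" | awk '$2 > 60 { print $1 }')
# Match operator: first field starting with "b"
by_match=$(printf '%s\n' "$data" | awk '$1 ~ /^b/ { print $1 }')

echo "$by_regex"
echo "$by_relation"
echo "$by_match"
```

Here the regex selects carol, the relational test selects alice and carol, and the ~ test selects bob.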

Action

An action consists of one or more commands, functions, and expressions, separated by newlines or semicolons and enclosed in curly braces. The main parts are:

  • variable or array assignment
  • output command
  • built-in function
  • control flow statement

Basic structure of awk script

awk 'BEGIN{ print "start" } pattern{ commands } END{ print "end" }' file

An awk script usually consists of three parts: a BEGIN block, a general block that can use pattern matching, and an END block. All three parts are optional, and any of them may be omitted. The script is usually enclosed in single quotes, for example:

awk 'BEGIN{ i=0 } { i++ } END{ print i }' filename

How awk works

awk 'BEGIN{ commands } pattern{ commands } END{ commands }'
  • Step 1: Execute the statements in the BEGIN{ commands } block.
  • Step 2: Read a line from the file or standard input (stdin) and execute the pattern{ commands } block. awk scans the input line by line, repeating this step from the first line to the last until all input has been read.
  • Step 3: When the end of the input stream is reached, execute the END{ commands } block.

The BEGIN block is executed before awk starts reading lines from the input stream. It is optional. Statements such as variable initialization and printing the header of an output table usually go in the BEGIN block.

The END block is executed after awk has read all lines from the input stream. Summary work, such as printing the results of analyzing all the lines, is done in the END block. It is also optional.

The commands in the pattern block are the most important part, but this block is optional too. If a pattern is given without an action block, the default action { print } is executed, which prints each matching line. The pattern block is executed once for every line awk reads.

example

echo -e "A line 1\nA line 2" | awk 'BEGIN{ print "Start" } { print } END{ print "End" }'
Start
A line 1
A line 2
End

When print is used without arguments, it prints the current line. When the arguments to print are separated by commas, they are printed with a space between them. Inside a print statement, double-quoted strings written next to values are concatenated with them, for example:

echo | awk '{ var1="v1"; var2="v2"; var3="v3"; print var1,var2,var3; }' 
v1 v2 v3

Use double quotes:

echo | awk '{ var1="v1"; var2="v2"; var3="v3"; print var1"="var2"="var3; }'
v1=v2=v3

{ } works like a loop body that iterates over each line of the file. Usually, variable initialization statements (such as i=0) and statements that print a header go in the BEGIN block, and statements that print results go in the END block.

awk built-in variables (predefined variables)

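Note: [A][N][P][G] indicates the first tool to support the variable: [A]=awk, [N]=nawk, [P]=POSIX awk, [G]=gawk

 **$n**  The nth field of the current record; for example, $1 is the first field and $2 is the second.
 **$0**  The full text of the current line being processed.
[N]  **ARGC**  The number of command-line arguments.
[G]  **ARGIND**  The position of the current file on the command line (its index in ARGV, which is numbered from 0).
[N]  **ARGV**  An array containing the command-line arguments.
[P]  **CONVFMT**  The number conversion format (default %.6g).
[P]  **ENVIRON**  An associative array of environment variables.
[G]  **ERRNO**  A description of the last system error.
[G]  **FIELDWIDTHS**  A space-separated list of field widths.
[A]  **FILENAME**  The name of the current input file.
[P]  **FNR**  Like NR, but relative to the current file.
[A]  **FS**  The field separator (by default, any whitespace).
[G]  **IGNORECASE**  If true, matching ignores case.
[A]  **NF**  The number of fields in the current record.
[A]  **NR**  The number of records read so far; during execution it corresponds to the current line number.
[A]  **OFMT**  The output format for numbers (default %.6g).
[A]  **OFS**  The output field separator (default: a single space).
[A]  **ORS**  The output record separator (default: a newline).
[A]  **RS**  The input record separator (default: a newline).
[N]  **RSTART**  The position of the first character of the string matched by the match function.
[N]  **RLENGTH**  The length of the string matched by the match function.
[N]  **SUBSEP**  The array subscript separator (default \034).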
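
The difference between NR and FNR only shows up with more than one input file. The sketch below creates two throwaway files (their names and contents are assumptions made for the demonstration) and prints FILENAME, NR and FNR for every line.

```shell
# Two throwaway files created just for this demonstration.
dir=$(mktemp -d)
printf 'a\nb\n' > "$dir/one.txt"
printf 'c\n' > "$dir/two.txt"

# NR keeps counting across files; FNR restarts at 1 for each file.
result=$(cd "$dir" && awk '{ print FILENAME, NR, FNR, $0 }' one.txt two.txt)
echo "$result"

rm -rf "$dir"
```

The last line printed should be two.txt 3 1 c: the third record overall, but the first record of the second file.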

Escape sequences

\\ A literal backslash
\$ A literal $
\t Tab
\b Backspace
\r Carriage return
\n Newline
\c Suppress the trailing newline

example

echo -e "line1 f2 f3\nline2 f4 f5\nline3 f6 f7" | awk '{print "Line No:"NR", No of fields:"NF, "$0="$0, "$1="$1, "$2="$2, "$3="$3}' 
Line No:1, No of fields:3 $0=line1 f2 f3 $1=line1 $2=f2 $3=f3
Line No:2, No of fields:3 $0=line2 f4 f5 $1=line2 $2=f4 $3=f5
Line No:3, No of fields:3 $0=line3 f6 f7 $1=line3 $2=f6 $3=f7

Use print $NF to print the last field of a line, $(NF-1) to print the second-to-last field, and so on:

echo -e "line1 f2 f3\n line2 f4 f5" | awk '{print $NF}'
f3
f5
echo -e "line1 f2 f3\n line2 f4 f5" | awk '{print $(NF-1)}'
f2
f4

Print the second and third fields of each line:

awk '{ print $2,$3 }' filename

Count the number of lines in a file:

awk 'END{ print NR }' filename

The command above uses only the END block. As each line is read, awk updates NR to that line's number. When the last line has been read, NR holds its line number, so NR in the END block is the number of lines in the file.

An example of accumulating the first field value in each row:

seq 5 | awk 'BEGIN{ sum=0; print "Sum:" } { print $1"+"; sum+=$1 } END{ print "equals"; print sum }'
Sum:
1+
2+
3+
4+
5+
equals
15

Pass external variable value to awk

External values (not from stdin) can be passed to awk with the -v option:

VAR=10000
echo | awk -v VARIABLE=$VAR '{ print VARIABLE }'

Another way to pass external variables:

var1="aaa"
var2="bbb"
echo | awk '{ print v1,v2 }' v1=$var1 v2=$var2

When the input comes from a file:

awk '{ print v1,v2 }' v1=$var1 v2=$var2 filename

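In this form, the variable assignments are given as space-separated command-line arguments placed after the program text (the BEGIN, {} and END blocks). They take effect only when awk reaches them in the argument list, so they are not yet set while the BEGIN block runs.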
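
The two ways of passing a variable differ in when the value becomes visible. A small sketch, reusing the variable name v1 from the examples above: with -v the value is already set in BEGIN, while a command-line assignment is not.

```shell
# -v assignments are visible in BEGIN; assignments given as arguments are
# processed later, when awk reaches them in the argument list.
with_v=$(echo | awk -v v1=aaa 'BEGIN { print "[" v1 "]" }')
with_arg=$(echo | awk 'BEGIN { print "[" v1 "]" }' v1=aaa)
echo "$with_v $with_arg"
```

The first command prints [aaa]; the second prints [] because v1 is still empty while BEGIN runs.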

Find the PID of a process

netstat -antup | grep 7770 | awk '{ print $NF }' | awk -F/ '{ print $1 }'

awk operators and tests

Like any programming language, awk supports a range of operations, which are largely the same as those in C. awk also provides built-in arithmetic functions (such as log, sqrt, cos, and sin) and string functions (such as length and substr), which greatly extend what it can compute. Relational tests, the basis of conditional branching, are part of every programming language, and awk is no exception: it allows many kinds of tests. For pattern matching it provides the operators ~ (matches) and !~ (does not match). As an extension to testing, awk also supports logical operators.

Arithmetic operators

Operator  Description
+ -  Addition, subtraction
* / %  Multiplication, division, remainder
+ - !  Unary plus, unary minus, logical NOT
^ **  Exponentiation
++ --  Increment or decrement, as a prefix or suffix

example:

awk 'BEGIN{a="b";print a++,++a;}'
0 2

Note: When used with arithmetic operators, operands are automatically converted to numbers; a string with no leading numeric part converts to 0.
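
The conversion rule can be checked directly. In this sketch (the sample strings are arbitrary), a string with a numeric prefix contributes that prefix, a string with no numeric prefix contributes 0, and a purely numeric string behaves like the number.

```shell
# "3x" converts to 3, "abc" converts to 0, "10" converts to 10.
result=$(awk 'BEGIN { a = "3x"; b = "abc"; print a + 1, b + 1, "10" + 5 }')
echo "$result"
```

This should print 4 1 15.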

Assignment operators

Operator  Description
= += -= *= /= %= ^= **=  Assignment

Example:

a+=5; is equivalent to a=a+5; the other operators work the same way.

Logical operators

Operator  Description
||  Logical OR
&&  Logical AND

example:

awk 'BEGIN{a=1;b=2;print (a>5 && b<=2),(a>5 || b<=2);}'
0 1

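Regular expression operators

Operator  Description
~ !~  Matches / does not match a regular expression
^  Start of line
$  End of line
.  Any single character
*  Zero or more of the preceding character
.*  Any sequence of characters
[]  Any one character in the bracket expression
[^]  Any one character not in the bracket expression
^[^]  A line that starts with a character not in the bracket expression
[a-z]  Lowercase letters
[A-Z]  Uppercase letters
[a-zA-Z]  Lowercase and uppercase letters
[0-9]  Digits
\<  Start of a word (GNU awk); words are usually delimited by spaces or special characters, and a run of consecutive characters counts as a word
\>  End of a word (GNU awk)

A regular expression must be enclosed in slashes: /regex/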

example:

awk 'BEGIN{a="100testa";if(a ~ /^100*/){print "ok";}}'
ok

Relational operators

Operator  Description
< <= > >= != ==  Relational comparison

example:

awk 'BEGIN{a=11;if(a >= 9){print "ok";}}'
ok

Note: > and < can perform either a string comparison or a numeric comparison. If either operand is a string, the operands are compared as strings; only when both operands are numbers is the comparison numeric. String comparison follows ASCII order.
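
A quick way to see both kinds of comparison is to compare 10 and 9 as numbers, as string constants, and as input fields. This is a sketch with arbitrary values; the third case relies on awk treating fields that look like numbers as numbers.

```shell
numeric=$(awk 'BEGIN { print (10 < 9) }')          # numbers: 10 < 9 is false
string=$(awk 'BEGIN { print ("10" < "9") }')       # strings: "1" sorts before "9"
fields=$(echo '10 9' | awk '{ print ($1 < $2) }')  # numeric-looking fields compare as numbers
echo "$numeric $string $fields"
```

The three results should be 0, 1 and 0.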

Other operators

Operator  Description
$  Field reference
space  String concatenation
?:  C-style conditional expression
in  Tests whether a key exists in an array

example:

awk 'BEGIN{a="b";print a=="b"?"ok":"err";}'
ok
awk 'BEGIN{a="b";arr[0]="b";arr[1]="c";print (a in arr);}'
0
awk 'BEGIN{a="b";arr[0]="b";arr["b"]="c";print (a in arr);}'
1

Operator precedence table

The higher the level, the higher the priority.

Advanced input and output in awk

Reading the next record

Using the next statement in awk: awk matches the input line by line; when next is encountered, the remaining statements for the current line are skipped and processing moves on to the next line. The next statement is often used for merging multiple lines:

cat text.txt
a
b
c
d
e

awk 'NR%2==1{next}{print NR,$0;}' text.txt
2 b
4 d

When the line number divided by 2 leaves a remainder of 1, the current line is skipped, and the print NR,$0 that follows is not executed. Processing starts again with the next line, and the program re-evaluates NR%2 from the top. When the line number is 2, for example, the block print NR,$0 is executed.
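
Another common use of next, sketched here with made-up input, is to drop lines that should not be processed at all, such as comments and blank lines, before the main block sees them.

```shell
# Skip comment lines and empty lines, then print the second field of the rest.
result=$(printf '# comment\nvalue 1\n\nvalue 2\n' | awk '/^#/ || NF == 0 { next } { print $2 }')
echo "$result"
```

Only the two value lines reach the second block, so the output is 1 and 2 on separate lines.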

In the following file, each line that starts with "web" needs to be skipped and its content merged onto each of the lines that follow it:

cat text.txt
web01[192.168.2.100]
httpd            ok
tomcat               ok
sendmail               ok
web02[192.168.2.101]
httpd            ok
postfix               ok
web03[192.168.2.102]
mysqld            ok
httpd               ok

awk '/^web/{T=$0;next;}{print T":\t"$0;}' text.txt
web01[192.168.2.100]:   httpd            ok
web01[192.168.2.100]:   tomcat               ok
web01[192.168.2.100]:   sendmail               ok
web02[192.168.2.101]:   httpd            ok
web02[192.168.2.101]:   postfix               ok
web03[192.168.2.102]:   mysqld            ok
web03[192.168.2.102]:   httpd               ok

Reading a single record

getline usage in awk: getline is needed for input redirection. It takes input from standard input, a pipe, or a file other than the one currently being processed. It reads the next line of that input and sets built-in variables such as NF, NR, and FNR. getline returns 1 if a record was read, 0 at end of file, and -1 on an error such as failing to open the file.

getline syntax: getline var, where the variable var receives the content of the line read.

How getline behaves overall:

  • When there is no | or < redirection around it: getline operates on the current input file and reads its next line into the variable var, or into $0 if no variable is given. Because awk has already read one line before getline runs, the lines getline returns are every other line.
  • When there is a | or < redirection around it: getline operates on the redirected input. Since that input has just been opened and awk has not read any line from it, getline returns its first line rather than every other line.
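
The "every other line" effect of a plain getline can be seen with four lines of input. In this sketch (input values are arbitrary), each line read by the main loop is immediately replaced by the one after it.

```shell
# The main loop reads line 1, getline replaces $0 with line 2, and so on,
# so only every second line is printed.
result=$(printf '1\n2\n3\n4\n' | awk '{ getline; print }')
echo "$result"
```

This prints 2 and 4.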

Example:

Run the Linux date command, pipe its output to getline, assign the output to the user-defined variable out, and print it:

awk 'BEGIN{ "date" | getline out; print out }' test

Run the shell date command and pipe its output to getline; getline reads from the pipe and assigns the input to out, the split function splits out into the array mon, and the second element of mon is printed:

awk 'BEGIN{ "date" | getline out; split(out,mon); print mon[2] }' test

The output of ls is passed to getline as input, and the loop makes getline read one line at a time from that output and print it to the screen. No input file is needed here, because the BEGIN block runs before any input file is opened.

awk 'BEGIN{ while( "ls" | getline) print }'

Closing files

awk lets a program close an input or output file with the close statement.

close("filename")

filename can be a file opened by getline (or stdin, or a variable containing the file name), or the exact command string used with getline. It can also be an output file (or stdout, or a variable containing the file name), or the exact command string used with an output pipe.
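
One case where close matters is writing a file and reading it back in the same program. The sketch below uses a temporary file created with mktemp and passes its name in as the variable f; both are assumptions made for the example.

```shell
tmp=$(mktemp)
result=$(awk -v f="$tmp" 'BEGIN {
    print "hello" > f      # open f for writing
    close(f)               # flush and close it so it can be re-read
    getline line < f       # reopen f for reading
    print line
}')
echo "$result"
rm -f "$tmp"
```

With the close call in place, the program reads back the hello it just wrote.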

Output to a file

awk can write results to a file in the following ways:

echo | awk '{printf("hello world!\n") > "datafile"}'
# or
echo | awk '{printf("hello world!\n") >> "datafile"}'

Setting the field delimiter

The default field delimiter is whitespace. You can specify a delimiter explicitly with -F "delimiter":

awk -F: '{ print $NF }' /etc/passwd
# or
awk 'BEGIN{ FS=":" } { print $NF }' /etc/passwd

In the BEGIN block, you can set the output field delimiter with OFS="delimiter".
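
OFS is applied when print joins comma-separated arguments, and also when a field is assigned and $0 is rebuilt. A small sketch with an invented input line:

```shell
# OFS joins the arguments of print ...
result=$(echo 'root:x:0' | awk 'BEGIN { FS = ":"; OFS = "-" } { print $1, $3 }')
# ... and assigning any field (here $1 = $1) rebuilds $0 with OFS.
rebuilt=$(echo 'root:x:0' | awk 'BEGIN { FS = ":"; OFS = "-" } { $1 = $1; print }')
echo "$result"
echo "$rebuilt"
```

The first command prints root-0 and the second prints root-x-0.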

Flow control statements

In awk's while, do-while, and for statements, break and continue can be used to control the flow, and exit can be used to quit. break terminates the loop currently executing and continues with the statement after the loop. if selects between branches. awk's flow control statements and syntax are similar to those of C. With these statements, many shell programs can be handed over to awk, which runs them quickly. The usage of each statement follows.

Conditional statements

if(expression)
  statement1
else
  statement2

When a branch contains several statements, they must be enclosed in {}; using braces also makes the code easier to read. awk branches can be nested, in this form:

if(expression)
  {statement1}
else if(expression)
  {statement2}
else
  {statement3}

Example:

awk 'BEGIN{
test=100;
if(test>90){
  print "very good";
  }
  else if(test>60){
    print "good";
  }
  else{
    print "no pass";
  }
}'

very good

Each statement can be terminated with a semicolon (;).

Loop statements

# while statement

while(expression)
  {statement}

Example:

awk 'BEGIN{
test=100;
total=0;
while(i<=test){
  total+=i;
  i++;
}
print total;
}'
5050

# for loop

There are two formats for for loops:

Format 1:

for(variable in array)
  {statement}

Example:

awk 'BEGIN{
for(k in ENVIRON){
  print k"="ENVIRON[k];
}

}'
TERM=linux
G_BROKEN_FILENAMES=1
SHLVL=1
pwd=/root/text
...
logname=root
HOME=/root
SSH_CLIENT=192.168.1.21 53087 22

Note: ENVIRON is a built-in awk variable, and it is an associative array.

Format 2:

for(initialization; condition; expression)
  {statement}

Example:

awk 'BEGIN{
total=0;
for(i=0;i<=100;i++){
  total+=i;
}
print total;
}'
5050

# do loop

do
{statement} while(condition)

example:

awk 'BEGIN{ 
total=0;
i=0;
do {total+=i;i++;} while(i<=100)
  print total;
}'
5050

other statements

  • break  When used in a while or for loop, the break statement exits the loop.
  • continue  When used in a while or for loop, the continue statement moves on to the next iteration.
  • next  Reads the next input line and returns to the top of the script, which avoids running further actions on the current input line.
  • exit  Ends the main input loop and transfers control to END, if an END rule exists. If no END rule is defined, or exit is used inside END, the script terminates.
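
The behaviour of exit can be sketched with a few lines of arbitrary input: the main loop stops at the matching line, but the END block still runs.

```shell
# Stop reading at the line equal to 3; END is still executed afterwards.
result=$(printf '1\n2\n3\n4\n' | awk '$1 == 3 { exit } { print } END { print "done" }')
echo "$result"
```

This prints 1, 2 and then done; lines 3 and 4 are never printed.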

Using arrays

Arrays are the soul of awk; array handling is central to processing text with it. Because array indices (subscripts) can be numbers or strings, awk arrays are called associative arrays. Arrays do not have to be declared in advance, and neither does their size. Array elements are initialized to 0 or the empty string, depending on context.
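
A typical use of these properties is counting occurrences. In the sketch below (the input words and the array name count are invented), every word becomes a string subscript, and the elements start at 0 without any declaration.

```shell
# count[word] starts out empty (treated as 0) and is incremented per occurrence.
result=$(printf 'a b a\nc a b\n' | awk '
    { for (i = 1; i <= NF; i++) count[$i]++ }
    END { print count["a"], count["b"], count["c"] }')
echo "$result"
```

For this input the counts are 3 2 1.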

array definition

Numbers as array indices (subscripts):

Array[1]="sun"
Array[2]="kai"

String as array index (subscript):

Array["first"]="www"
Array["last"]="name"
Array["birth"]="1987"

print Array[1] prints sun, print Array[2] prints kai, and print Array["birth"] prints 1987.

Reading array values

{ for(item in array) {print array[item]}; }       # output order is unspecified
{ for(i=1;i<=len;i++) {print array[i]}; }         # len is the length of the array

Array related functions

Get the length of the array:

awk 'BEGIN{info="it is a test";lens=split(info,tA," ");print length(tA),lens;}'
4 4

length returns the length of a string or the number of elements in an array; split splits a string into an array and returns the number of elements produced.

awk 'BEGIN{info="it is a test";split(info,tA," ");print asort(tA);}'
4

asort (a gawk extension) sorts an array and returns its length.

Output array content (unordered, ordered output):

awk 'BEGIN{info="it is a test";split(info,tA," ");for(k in tA){print k,tA[k];}}'
4 test
1 it
2 is
3 a 

Because awk arrays are associative, for…in visits the elements in no particular order. To get the elements in order, access them by subscript.

awk 'BEGIN{info="it is a test";tlen=split(info,tA," ");for(k=1;k<=tlen;k++){print k,tA[k];}}'
1 it
2 is
3 a
4 test

Note: split() numbers array subscripts from 1, unlike C arrays, which start at 0.

Testing whether a key exists and deleting a key:

# Incorrect test:
awk 'BEGIN{tB["a"]="a1";tB["b"]="b1";if(tB["c"]!="1"){print "no found";};for(k in tB){print k,tB[k];}}' 
no found
a a1
b b1
c

Something odd happens above: tB["c"] was never defined, yet the loop shows that the key now exists with an empty value. Because awk arrays are associative, merely referencing a key creates that element automatically.

# Correct test:
awk 'BEGIN{tB["a"]="a1";tB["b"]="b1";if( "c" in tB){print "ok";};for(k in tB){print k,tB[k];}}'  
a a1
b b1

if(key in array) tests whether the array contains the key, without creating it.

# Deleting a key:
awk 'BEGIN{tB["a"]="a1";tB["b"]="b1";delete tB["a"];for(k in tB){print k,tB[k];}}'                     
b b1

delete array[key] removes the element stored under key from the array.

Using two-dimensional and multidimensional arrays

awk's multidimensional arrays are really one-dimensional arrays; more precisely, awk does not store multidimensional arrays at all, but it provides an access syntax that simulates them. For example, array[2,4]=1 is allowed. awk joins the subscripts with a special string, SUBSEP (\034). In this example, the key actually stored in the associative array is 2\0344.

As with membership tests on one-dimensional arrays, multidimensional arrays can use the syntax if ( (i,j) in array ), but the subscripts must be enclosed in parentheses. As with iterating over one-dimensional arrays, multidimensional arrays are traversed with for ( item in array ). Unlike one-dimensional arrays, the split() function must be used to recover the individual subscript components.
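
The membership test and the SUBSEP-based key can be sketched together. The array name grid and its single element are invented for the example.

```shell
result=$(awk 'BEGIN {
    grid[2, 4] = 1
    if ((2, 4) in grid) print "found"            # subscripts in parentheses
    if (!((4, 2) in grid)) print "missing"       # a different key is not present
    for (key in grid) {                          # key is "2" SUBSEP "4"
        split(key, part, SUBSEP)
        print part[1], part[2]
    }
}')
echo "$result"
```

This prints found, missing and then 2 4.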

awk 'BEGIN{
for(i=1;i<=9;i++){
  for(j=1;j<=9;j++){
    tarr[i,j]=i*j; print i,"*",j,"=",tarr[i,j];
  }
}
}'
1 * 1 = 1
1 * 2 = 2
1 * 3 = 3
1 * 4 = 4
1 * 5 = 5
1 * 6 = 6 
...
9 * 6 = 54
9 * 7 = 63
9 * 8 = 72
9 * 9 = 81

Array elements can be read with array[k,k2].

Another way:

awk 'BEGIN{
for(i=1;i<=9;i++){
  for(j=1;j<=9;j++){
    tarr[i,j]=i*j;
  }
}
for(m in tarr){
  split(m,tarr2,SUBSEP); print tarr2[1],"*",tarr2[2],"=",tarr[m];
}
}'

Built-in functions

awk's built-in functions fall into four groups: arithmetic functions, string functions, other general functions, and time functions.

Arithmetic functions

Function  Description
atan2( y, x )  Returns the arctangent of y/x.
cos( x )  Returns the cosine of x; x is in radians.
sin( x )  Returns the sine of x; x is in radians.
exp( x )  Returns e raised to the power x.
log( x )  Returns the natural logarithm of x.
sqrt( x )  Returns the square root of x.
int( x )  Returns the value of x truncated to an integer.
rand( )  Returns a random number n, where 0 <= n < 1.
srand( [expr] )  Sets the seed of the rand function to the value of expr, or to the time of day if expr is omitted. Returns the previous seed value.

for example:

awk 'BEGIN{OFMT="%.3f";fs=sin(1);fe=exp(10);fl=log(10);fi=int(3.1415);print fs,fe,fl,fi;}'
0.841 22026.466 2.303 3

OFMT sets the output data format to retain 3 decimal places.

Get a random number:

awk 'BEGIN{srand();fr=int(100*rand());print fr;}'
78
awk 'BEGIN{srand();fr=int(100*rand());print fr;}'
31
awk 'BEGIN{srand();fr=int(100*rand());print fr;}'
41 

String functions

Function  Description
gsub( Ere, Repl, [ In ] )  Works exactly like sub, except that every occurrence of the regular expression is replaced.
sub( Ere, Repl, [ In ] )  Replaces the first occurrence of the extended regular expression Ere in the string In with the string Repl, and returns the number of substitutions. An ampersand (&) in Repl is replaced by the text of In that matched Ere. If In is not given, the default is the entire record ($0).
index( String1, String2 )  Returns the position, counting from 1, at which String2 occurs in String1, or 0 (zero) if String2 does not occur in String1.
length [(String)]  Returns the length of String in characters. If String is not given, returns the length of the entire record ($0).
blength [(String)]  Returns the length of String in bytes. If String is not given, returns the length of the entire record ($0). This function is an extension found in some implementations and is not part of POSIX awk.
substr( String, M, [ N ] )  Returns a substring of N characters taken from String, starting at position M, where the first character of String is position 1. If N is not given, the substring runs from position M to the end of String.
match( String, Ere )  Returns the position, counting from 1, at which the extended regular expression Ere occurs in String, or 0 (zero) if it does not occur. RSTART is set to the return value. RLENGTH is set to the length of the matched string, or to -1 if no match was found.
split( String, A, [Ere] )  Splits String into the array elements A[1], A[2], ..., A[n] and returns n. The split is made on the extended regular expression Ere, or on the current field separator (FS) if Ere is not given. Elements of A are created as strings unless the context indicates that an element should also have a numeric value.
tolower( String )  Returns String with each uppercase character changed to lowercase. The mapping between uppercase and lowercase is defined by the LC_CTYPE category of the current locale.
toupper( String )  Returns String with each lowercase character changed to uppercase. The mapping between uppercase and lowercase is defined by the LC_CTYPE category of the current locale.
sprintf(Format, Expr, Expr, . . . )  Formats the Expr arguments according to the printf format string Format and returns the resulting string.

Note: Every Ere argument can be a regular expression.

Using gsub and sub

awk 'BEGIN{info="this is a test2010test!";gsub(/[0-9]+/,"!",info);print info}'
this is a test!test!

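This finds the text in info that matches the regular expression /[0-9]+/, replaces it with "!", and stores the result back in info. If info is not given, $0 is used by default.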
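
To compare sub with gsub, and to see what & does in the replacement, here is a small sketch on an arbitrary string. Both functions modify s in place and return the number of substitutions.

```shell
# sub replaces only the first match; & stands for the matched text.
first=$(awk 'BEGIN { s = "a1b22c"; n = sub(/[0-9]+/, "<&>", s); print n, s }')
# gsub replaces every match.
all=$(awk 'BEGIN { s = "a1b22c"; n = gsub(/[0-9]+/, "<&>", s); print n, s }')
echo "$first"
echo "$all"
```

The first command prints 1 a<1>b22c and the second prints 2 a<1>b<22>c.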

Finding a substring (using index)

awk 'BEGIN{info="this is a test2010test!";print index(info,"test")?"ok":"no found";}'
ok

If the substring is not found, index returns 0.

Regular expression matching (using match)

awk 'BEGIN{info="this is a test2010test!";print match(info,/[0-9]+/)?"ok":"no found";}'
ok

Extracting a substring (using substr)

awk 'BEGIN{info="this is a test2010test!";print substr(info,4,10);}'
s is a tes

Starting at the 4th character, extract a substring 10 characters long.
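
match and substr combine naturally: match sets RSTART and RLENGTH, and substr can then cut out exactly the matched text. The sketch reuses the sample string from the examples above.

```shell
# After a successful match, RSTART and RLENGTH describe the matched text.
result=$(awk 'BEGIN {
    s = "this is a test2010test!"
    if (match(s, /[0-9]+/)) print RSTART, RLENGTH, substr(s, RSTART, RLENGTH)
}')
echo "$result"
```

The digits start at position 15 and are 4 characters long, so this prints 15 4 2010.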

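Splitting a string (using split)

awk 'BEGIN{info="this is a test";split(info,tA," ");print length(tA);for(k in tA){print k,tA[k];}}'
4
4 test
1 this
2 is
3 a

This splits info and creates the array tA dynamically. Note that awk's for…in loop is unordered; it does not run from subscript 1 to n, so take care when using it.

Formatting string output (using sprintf)

Format string syntax:

A format string has two parts: ordinary characters, which are output as-is, and format specifiers, which begin with "%" followed by one or more specifier characters that determine the output format.

Format  Description
%d  Signed decimal integer
%u  Unsigned decimal integer
%f  Floating-point number
%s  String
%c  Single character
%p  Pointer value
%e  Floating-point number in exponential notation
%x %X  Unsigned hexadecimal integer
%o  Unsigned octal integer
%g  Chooses the more suitable representation automatically

awk 'BEGIN{n1=124.113;n2=-1.224;n3=1.2345; printf("%.2f,%.2u,%.2g,%X,%o\n",n1,n2,n3,n1,n1);}'
124.11,18446744073709551615,1.2,7C,174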
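
The example above uses printf; sprintf takes the same format string but returns the result instead of printing it. A minimal sketch with arbitrary values, showing left-justified padding, zero padding, rounding and hexadecimal output:

```shell
# sprintf builds the string; print then outputs it.
result=$(awk 'BEGIN { s = sprintf("%-5s|%05d|%.2f|%x", "ab", 42, 3.14159, 255); print s }')
echo "$result"
```

This should print ab   |00042|3.14|ff.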

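General functions

Function  Description
close( Expression )  Closes the file or pipe that was opened by a print or printf statement, or by a call to getline, with the same string-valued Expression. Returns 0 if the file or pipe is closed successfully, and a nonzero value otherwise. The close statement is required if you intend to write a file and then read it later in the same program.
system( Command )  Executes the command given by Command and returns its exit status. Equivalent to the system subroutine.
Expression | getline [ Variable ]  Reads the next record from the output of the command given by Expression, piped to getline, and assigns it to Variable. If Variable is not given, $0 and NF are set from the record read.
getline [ Variable ] < Expression  Reads the next record of input from the file named by Expression and sets Variable to that record. As long as the stream stays open and Expression evaluates to the same string, each subsequent call to getline reads another record. If Variable is not given, $0 and NF are set from the record read.
getline [ Variable ]  Sets Variable to the next input record read from the current input file. If Variable is not given, $0 is set to the record, and NF, NR, and FNR are also set.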

Opening an external file (using close)

awk 'BEGIN{while("cat /etc/passwd"|getline){print $0;};close("cat /etc/passwd");}'
root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin

Reading an external file line by line (using getline)

awk 'BEGIN{while(getline < "/etc/passwd"){print $0;};close("/etc/passwd");}'
root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin
awk 'BEGIN{print "Enter your name:";getline name;print name;}'
Enter your name:
chengmo
chengmo

Calling an external program (using system)

awk 'BEGIN{b=system("ls -al");print b;}'
total 42092
drwxr-xr-x 14 chengmo chengmo     4096 09-30 17:47 .
drwxr-xr-x 95 root   root       4096 10-08 14:01 ..

The return value b is the exit status of the command.
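
The return value is easiest to see with commands that produce no output. This sketch assumes the standard true and false utilities are available.

```shell
# system returns the exit status of the command it ran.
result=$(awk 'BEGIN { ok = system("true"); bad = system("false"); print ok, bad }')
echo "$result"
```

true exits with status 0 and false with status 1, so this prints 0 1.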

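Time functions

Function  Description
mktime( "YYYY MM DD HH MM SS[ DST]" )  Builds a timestamp from a date specification string.
strftime([format [, timestamp]])  Formats time output, converting a timestamp to a time string; see the table of format specifiers below.
systime()  Returns the current timestamp: the whole number of seconds since January 1, 1970 (not counting leap seconds).

Creating a specific time (using mktime)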

awk 'BEGIN{tstamp=mktime("2001 01 01 12 12 12");print strftime("%c",tstamp);}'
2001年01月01日 星期一 12时12分12秒
awk 'BEGIN{tstamp1=mktime("2001 01 01 12 12 12");tstamp2=mktime("2001 02 01 0 0 0");print tstamp2-tstamp1;}'
2634468

Computing the difference between two times; this also shows how strftime is used.

awk 'BEGIN{tstamp1=mktime("2001 01 01 12 12 12");tstamp2=systime();print tstamp2-tstamp1;}' 
308201392

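strftime date and time format specifiers

Format  Description
%a  Abbreviated weekday name (Sun)
%A  Full weekday name (Sunday)
%b  Abbreviated month name (Oct)
%B  Full month name (October)
%c  Local date and time
%d  Day of the month as a decimal number
%D  Date in the form 08/20/99
%e  Day of the month, padded with a space if it has only one digit
%H  Hour in 24-hour format as a decimal number
%I  Hour in 12-hour format as a decimal number
%j  Day of the year, counting from January 1
%m  Month as a decimal number
%M  Minute as a decimal number
%p  AM/PM indicator for 12-hour time
%S  Second as a decimal number
%U  Week of the year as a decimal number (Sunday as the first day of the week)
%w  Day of the week as a decimal number (Sunday is 0)
%W  Week of the year as a decimal number (Monday as the first day of the week)
%x  Local date representation (08/20/99)
%X  Local time representation (12:00:00)
%y  Two-digit year (99)
%Y  Four-digit year (1999)
%%  Percent sign (%)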

Origin blog.csdn.net/u012581020/article/details/131435946