awk command:
The Three Musketeers of Linux Text Processing: grep, sed, awk
awk (gawk): a report generator that produces formatted text output
Basic usage:
gawk [options] 'program' file file ...
The program format is PATTERN{ACTION_STATEMENT}. ACTION_STATEMENT consists of one or more statements separated by semicolons (;).
The most common output actions are print and printf.
# awk -F: '$3>50{print $0}' /etc/passwd output lines with uid greater than 50
Options:
-F fs: specify the input field separator fs; when -F is not used, fields are separated by runs of whitespace (spaces and tabs)
gawk -F: '{print $1,$3}' /etc/passwd
gawk -F: '{print $1,$3,"user"}' /etc/passwd
-v var=val: assign the value val to the variable var
1. awk output command: print
print item1, item2, ...
Key Points:
(1) Each item is separated by a comma, and the output is separated by an output delimiter;
(2) Each output item can be a string or a numerical value, a field ($n) of the current record, a variable or an awk expression; numerical values are implicitly converted to strings for output;
(3) If the item after print is omitted, it is equivalent to print $0; to output "blank", use print "";
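The three points above can be checked on a single record. This is only a sketch: the sample line and the shell variable name out are made up for illustration, and no real file is read.

```shell
# The input record is invented for this sketch.
out=$(printf 'alice:x:1001\n' | awk -F: '{
    print $1, $3    # comma: items are joined by the output separator (a space by default)
    print $1 $3     # no comma: the two items are simply concatenated
    print ""        # prints an empty line
    print           # no items: same as print $0
}')
printf '%s\n' "$out"
```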
2. Variables
2.1 Built-in variables
FS: input field separator, the default is a single space, which makes awk split on runs of spaces and tabs;
gawk -v FS=":" '{print $1,$3}' /etc/passwd
RS: input record separator (what ends each record on input), the default is the newline character;
awk -v RS=" " '{print $0}' /etc/passwd
OFS: output field separator (placed between fields on output), the default is a single space;
ORS: output record separator (placed after each record on output), the default is a newline;
NF: number of fields in the current record (a count of fields, not of characters);
awk '{print NF}' /etc/passwd print the number of fields in each line
awk '{print $NF}' /etc/passwd print the last field
NR: the number of records (lines) read so far, counted continuously across all input files;
awk '{print NR}' /etc/passwd /etc/issue
awk '{print NR,$0}' /etc/passwd /etc/issue prefix each line with a line number
FNR: the number of records (lines) read so far in the current file, counted separately for each file;
awk '{print FNR}' /etc/passwd /etc/issue
awk '{print FNR,$0}' /etc/passwd /etc/issue
FILENAME: the current file name;
awk '{print FILENAME,$0}' /etc/passwd /etc/issue prefix each line with the filename
ARGC: the number of command line arguments, counting the command name awk itself but not the options or the program text;
awk '{print ARGC}' /etc/issue
ARGV: array that holds the command line arguments; ARGV[0] is the command name (awk);
awk '{print ARGV[0]}' /etc/issue
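OFS and ORS have no example above, and the difference between NR and FNR only shows with more than one file. The following sketch covers both; the input data, the temporary files and the shell variable names are assumptions made for illustration.

```shell
# OFS replaces the space that print puts between comma-separated items.
out_ofs=$(printf 'a:b:c\nd:e\n' | awk -F: -v OFS='-' '{ print NR, NF, $NF }')
printf '%s\n' "$out_ofs"

# ORS replaces the newline that print appends after each record.
out_ors=$(printf 'x\ny\n' | awk -v ORS=' | ' '{ print }')
printf '%s\n' "$out_ors"

# NR keeps counting across files, FNR restarts at 1 for each file.
tmp=$(mktemp -d)
printf 'a\nb\n' > "$tmp/one"
printf 'c\n' > "$tmp/two"
out_nr=$(awk '{ print NR, FNR }' "$tmp/one" "$tmp/two")
rm -r "$tmp"
printf '%s\n' "$out_nr"
```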
2.2 Custom variables
-v var=val: variable names are case sensitive
Where to define the variable:
(1) Variables can be defined in the program;
awk '{file="passwd";print file,$1}' /etc/passwd
(2) Define variables through the -v option;
awk -v file="passwd" '{print file,$1}' /etc/passwd
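A common use of -v is to hand a shell variable to the awk program. A minimal sketch, assuming a made-up threshold min_uid and invented input lines:

```shell
min_uid=1000   # hypothetical threshold held in a shell variable
out=$(printf 'root:x:0\nalice:x:1001\n' |
    awk -F: -v min="$min_uid" '$3 >= min { print $1 }')   # min is visible inside awk
printf '%s\n' "$out"
```

Unlike a variable assigned inside the main block, a variable set with -v already has its value when a BEGIN block runs.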
3. printf command
Format: printf format, item1, item2, ...
awk 'BEGIN{printf "%d\n",6}'
Key Points:
(1) format is required;
(2) printf does not append a newline automatically; the line separator (\n) must be given explicitly;
(3) In the format, a format character needs to be specified for each subsequent item;
Format characters: all start with %, followed by a character
%c: display the character that corresponds to a numeric character code (or the first character of a string);
%d, %i: display decimal integers;
%e, %E: Display values in scientific notation;
%f: display as a floating point number;
%g, %G: Display values in scientific notation or floating point format;
%s: string
%u: unsigned integer
%%: show % itself
Modifier:
#[.#]: the first # specifies the minimum display width, such as %30s; the second # specifies the precision (for %f, the number of digits after the decimal point);
awk -F: '{printf "%20s:%5d\n",$1,$3}' /etc/passwd
-: left-aligned (the default is right-aligned)
awk -F: '{printf "%20s:%-5d\n",$1,$3}' /etc/passwd
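The format characters and modifiers can be tried without any input file by calling printf from a BEGIN block. The values below are arbitrary, and the brackets are only there to make the padding visible:

```shell
out=$(awk 'BEGIN {
    printf "[%5d]\n", 42          # right-aligned in a width of 5
    printf "[%-5d]\n", 42         # left-aligned in a width of 5
    printf "[%8.2f]\n", 3.14159   # width 8, two digits after the decimal point
    printf "[%c]\n", 65           # the character whose code is 65
    printf "[%s%%]\n", 50         # %% prints a literal percent sign
}')
printf '%s\n' "$out"
```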
4. Operator:
Arithmetic operators:
x+y, x-y, x*y, x/y, x^y, x%y
-x: negative value
+x: convert to numeric
String operator: concatenation has no symbol; writing two values next to each other joins them
Assignment operator:
=, +=, -=, *=, /=, %=, ^=
++, --
Comparison operator:
>, >=, <, <=, ==, !=
Pattern matcher:
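~: the left side matches the regular expression on the right
!~: the left side does not match it
# awk -F: '$1~/root/{print $7}' /etc/passwd print the shell of every user whose name matches /root/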
Logical operators:
&&
||
Conditional expression:
selector?if-true-expression:if-false-expression
Example: # awk -F: '{usertype=($3>=500?"common user":"sysuser or admin");printf "%20s:%-s\n",$1,usertype}' /etc/passwd
Function call:
function_name(argu1,argu2,...)
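Several of the operators above can be combined in one short program. This is a sketch on invented input; the variable names kind and label are not from the notes.

```shell
out=$(printf 'root:0\nalice:1001\n' | awk -F: '{
    kind = ($2 >= 1000) ? "common" : "system"   # conditional expression
    label = $1 "/" kind                          # concatenation by juxtaposition
    if ($1 ~ /^ro/) label = label "*"            # ~ tests a regular expression match
    print label, 2^3, 7%3                        # exponentiation and remainder
}')
printf '%s\n' "$out"
```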
5. PATTERN
(1) /regular expression/: Only process lines that can be matched by /regular expression/;
awk -F: '/^\<root\>/{print $3}' /etc/passwd
(2) relational expression: an expression that evaluates to true or false; a result that is non-zero or a non-empty string counts as "true", anything else as "false";
awk -F: '$3>=500{print $1,$3}' /etc/passwd
awk -F: '$5~/root/{print $0}' /etc/passwd
(3) line ranges: similar to address ranges in sed or vim, written as startpattern,endpattern (for example /pat1/,/pat2/); bare line numbers are not accepted, so use a condition on NR instead, such as NR>=2&&NR<=10;
(4) BEGIN/END: special patterns; the BEGIN block runs once before any input is read, and the END block runs once after all input has been processed;
awk -F: 'BEGIN{print "username","shell\n----------------"}$7~/bash\>/{print $1,$7}END{print "--------------\n"}' /etc/passwd
awk -F: 'BEGIN{username="username";shell="shell";printf "%10s%10s\n",username,shell;print "-----------"}$7~/bash\>/{printf "%10s%10s\n",$1,$7}END{print "---------------"}' /etc/passwd
(5) empty: empty pattern, matching any line;
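Line ranges are the only pattern type above without an example. A small sketch on invented input, once with a pattern range and once with a condition on NR:

```shell
# Pattern range: from the first line matching START to the next line matching END.
out_range=$(printf 'one\nSTART\ntwo\nEND\nthree\n' |
    awk '/START/,/END/ { print NR ": " $0 }')
printf '%s\n' "$out_range"

# Line numbers: a relational expression on NR; with no action the line is printed.
out_lines=$(printf 'one\nSTART\ntwo\nEND\nthree\n' | awk 'NR>=2 && NR<=3')
printf '%s\n' "$out_lines"
```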
6. Common actions
(1) Expressions
(2) Control statements
(3) Input statements
(4) Output statements
7. Control Statements
if (condition) statement [ else statement ]
while (condition) statement
do statement while (condition)
for (expr1; expr2; expr3) statement
for (var in array) statement
break
continue
delete array[index]
delete array
exit [ expression ]
{ statements }
7.1 if-else
Usage: test a condition on the whole line, or on individual fields of the line, that awk has read;
Syntax:
if (condition) statement [ else statement ]
if (condition) { statements; } [ else { statements; }]
# awk -F: '{if ($3>=500) print $1," is a common user." }' /etc/passwd
# awk -F: '{if ($3>=500) {print $1," is a common user."} else {print $1," is a system user or admin."}}' /etc/passwd
# awk '{if (NF>6) print NF, $0 }' /etc/inittab
7.2 while loop
Usage: usually used to loop over the fields of the current line;
Syntax: while (condition) statement
while (condition) { statements }
The loop body runs as long as the condition is true and the loop exits once it becomes false;
# awk '{i=1;while(i<=NF){printf "%20s:%d\n",$i,length($i); i++}}' /etc/inittab
# awk '{i=1;while(i<=NF){if (length($i)>5) {printf "%20s:%d\n",$i,length($i);} i++}}' /etc/inittab
7.3 do-while loop
Syntax: do statement while (condition)
do { do-while-body } while (condition)
Meaning: the loop body is executed at least once, because the condition is checked after the body;
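The "at least once" behaviour shows up on an empty line, where NF is 0. A sketch on invented input (one line with three fields, followed by an empty line):

```shell
out=$(printf 'a b c\n\n' | awk '{
    i = 1
    do {
        printf "[%s]", $i     # runs once even when the line has no fields
        i++
    } while (i <= NF)
    print ""
}')
printf '%s\n' "$out"
```

A while loop with the same condition would print nothing inside the brackets for the empty line, because its body would never run.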
7.4 for loops
Syntax: for (expr1; expr2; expr3) statement
for (expr1; expr2; expr3) { statements }
# awk '{for(i=1;i<=NF;i++) {printf "%s:%d\n", $i, length($i)}}' /etc/inittab
awk also has a form of the for loop dedicated to iterating over array elements:
Syntax: for (var in array) { for-body }
7.5 switch (gawk extension)
Syntax: switch (expression) {case VALUE or /REGEXP/: statement; ...; default: statementN}
7.6 break and continue
break: exit the innermost enclosing loop (unlike the shell builtin, awk's break takes no count argument)
continue: end the current iteration early and go directly to the next iteration
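A short sketch of both statements inside a for loop over the fields of one invented line:

```shell
out=$(printf '1 2 3 4 5 6\n' | awk '{
    for (i = 1; i <= NF; i++) {
        if ($i % 2 == 0) continue   # skip even fields
        if ($i > 4) break           # stop at the first odd field greater than 4
        printf "%s ", $i
    }
    print ""
}')
printf '%s\n' "$out"
```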
7.7 next
End the processing of the current line early and move on to the next line
# awk -F: '{if($3%2!=0) next;print $1,$3}' /etc/passwd
8. Array
Associative array: array[index-expression]
index-expression: any string can be used as an index; numeric indices are allowed too, and arrays that awk fills itself (for example with split()) are numbered from 1
If an array element does not exist, referencing it makes awk create the element with an empty string as its value. Therefore, to test whether an element exists in the array, use "index in array";
a["mon"]="Monday"
print a["mon"]
To iterate over the elements of an array, use for (var in array) { for-body }; note that var takes each index of the array in turn (in no guaranteed order), so the element itself is array[var]
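The side effect of referencing a missing element, and the "index in array" test that avoids it, can be sketched like this (the array and its keys are made up for illustration):

```shell
out=$(awk 'BEGIN {
    a["mon"] = "Monday"
    if ("tue" in a) { print "tue exists" } else { print "tue missing" }
    x = a["tue"]                  # a bare reference creates the element
    if ("tue" in a) { print "tue exists now" }
    delete a["tue"]               # remove a single element again
    n = 0
    for (k in a) n++              # k takes each remaining index
    print n
}')
printf '%s\n' "$out"
```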
Example: count the occurrences of each word, first over the whole file and then separately for each line
# awk '{for(i=1;i<=NF;i++) {count[$i]++}}END{for(j in count) {print j,count[j]}}' awk.txt
# awk '{for(i=1;i<=NF;i++) {count[$i]++};for(j in count) {print j,count[j]};delete count;print "---------------"}' awk.txt
# ss -tan | awk '!/^State/{state[$1]++}END{for (i in state) {print i,state[i]}}'
# netstat -tan | awk '/^tcp/{state[$NF]++}END{for(i in state){print i,state[i]}}'
Exercise: count the number of occurrences of each IP in the httpd access log;
# awk '{ip[$1]++}END{for(i in ip){print i,ip[i]}}' /var/log/httpd/access_log
9. Function
9.1 Built-in functions
Numerical processing:
rand(): returns a random number n with 0 <= n < 1; call srand() first to get a different sequence on each run;
String handling:
length([s]): returns the length of the string s; without an argument, the length of $0
sub(r, s [, t]): search the string t (default $0) for the pattern r and replace the first match with the string s;
sub(/ab/,"AB",$0)
gsub(r, s [, t]): search the string t (default $0) for the pattern r and replace every match with the string s; returns the number of replacements;
split(s, a [, r]): split the string s using r as the delimiter (default FS) and store the pieces in the array a, numbered from 1; returns the number of pieces;
# netstat -tan | awk '/^tcp/{len=split($5,client,":");ip[client[len-1]]++}END{for(i in ip){print i,ip[i]}}'
substr(s, i [, n]): return the substring of s that starts at position i (counting from 1) and is n characters long; without n, everything up to the end of s;
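The string functions above can be tried from a BEGIN block. The strings and the variable names s, t, n and part are arbitrary choices for this sketch:

```shell
out=$(awk 'BEGIN {
    s = "foo bar foo"
    t = s; sub(/foo/, "X", t); print t            # replaces the first match only
    t = s; n = gsub(/foo/, "X", t); print t, n    # replaces all matches, returns the count
    n = split("10.0.0.1:80", part, ":")           # pieces are stored in part[1], part[2], ...
    print n, part[1], part[2]
    print substr("abcdef", 2, 3), length("abcdef")
}')
printf '%s\n' "$out"
```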
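Time functions:
systime(): returns the current time as the number of seconds since the epoch (gawk extension);
Bit manipulation functions:
and(v1,v2): returns the bitwise AND of v1 and v2 (gawk extension);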
9.2 Custom Functions
function f_name(p,q)
{
...
}
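A minimal sketch of a user-defined function, filling in the skeleton above. The function name max and the input numbers are hypothetical:

```shell
out=$(printf '3 9\n7 2\n' | awk '
function max(p, q) {          # hypothetical helper, not part of the notes
    return (p > q) ? p : q
}
{ print max($1, $2) }')
printf '%s\n' "$out"
```

The function definition sits outside any PATTERN{ACTION} block and can be called from every action in the program.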
Exercises:
filename awkfile
++++++++++++++++
Mike Harrington:[510] 548-1278:250:100:175
Christian Dobbins:[408] 538-2358:155:90:201
Susan Dalsass:[206] 654-6279:250:60:50
Archie McNichol:[206] 548-1348:250:100:175
Jody Savage:[206] 548-1278:15:188:150
Guy Quigley:[916] 343-6410:250:100:175
Dan Savage:[406] 298-7744:450:300:275
Nancy McNeil:[206] 548-1278:250:80:75
John Goldenrod:[916] 348-4278:250:100:175
Chet Main:[510] 548-5258:50:95:135
Tom Savage:[408] 926-3456:250:168:200
Elizabeth Stachelin:[916] 440-1763:175:75:300
The database above contains names, phone numbers and the donations made in each of the past three months.
1. Show all phone numbers
awk -F: '{print $2}' awkfile
2. Display Dan's phone number
awk -F: '/Dan/{print $2}' awkfile
3. Display Susan's name and phone number
awk -F: '/Susan/{print $1,$2}' awkfile
4. Show all surnames starting with D
awk '$2~/^D/{split($2,arr,":"); print arr[1]}' awkfile
5. Display all names starting with a C or E
awk '$1~/^[CE]/{print $1}' awkfile
6. Display all names with only four characters
awk '{if(length($1)==4) print $1}' awkfile
7. Display the names of all people with area code 916
awk -F: '$2~/^\[916\]/{print $1}' awkfile
8. Display Mike's donations, each value prefixed with $, such as $250$100$175
awk -F: '$1~/^Mike/{printf "$%s$%s$%s\n", $3, $4, $5}' awkfile
9. Display each last name followed by a comma and the first name, such as Savage, Jody
awk -F: '{split($1, arr, " "); printf "%s, %s\n", arr[2], arr[1]}' awkfile