awk is both a Linux command and a programming language for processing data and generating reports. The name awk comes from the surname initials of its three authors (Aho, Weinberger, and Kernighan) and has no other meaning.
Syntax
awk [options] '[pattern]{actions}' [var1=value1 var2=value2 ...] file ...
awk [options] -f scripts_filename [var1=value1 var2=value2 ...] file ...
Options
-F
: Field separator used to split each line; the default is whitespace (spaces, tabs, and so on)
-f SCRIPTS_FILE
: Run the specified script file
-v
: Declare a variable
awk -v a=1 -v b=2 'BEGIN{print a,b}'
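As a minimal sketch of the options above, the following combines -F and -v. The sample line and the variable name prefix are made up for illustration.

```shell
# Hypothetical sample input with colon-separated fields
line='alice:23:admin'
# -F: splits each record on ":"; -v sets prefix before the program starts
opts_out=$(printf '%s\n' "$line" | awk -F: -v prefix='user=' '{ print prefix $1, $3 }')
echo "$opts_out"   # user=alice admin
```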
Patterns
BEGIN
- Specifies actions that run before the first record is processed
- Used for initialization: defining built-in or user variables and setting up formats
END
- Specifies actions that run after the last record has been processed
- Used to output summaries
(empty)
- The action is applied to every line
/RegExp/
: Processes lines that match the regular expression
/RegExp1/, /RegExp2/
: Processes a range of lines, such as /^a/, /^b/
- Conditional expression
- Relational operators: >, <, >=, <=, ==, != (comparing strings or numbers)
- Regular expression matching: ~, !~
awk -F: '$1 ~ /^a.*e$/{print $1,$3}'
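The pattern types listed above can appear together in one program. The sketch below uses made-up input (a name and a number per line) and hypothetical variable names sum and big.

```shell
pattern_out=$(printf 'apple 3\nbanana 5\navocado 7\n' | awk '
BEGIN  { print "start" }     # runs before the first record
/^a/   { sum += $2 }         # regex pattern: lines starting with "a"
$2 > 4 { big++ }             # relational pattern on the second field
END    { print sum, big }    # runs after the last record
')
echo "$pattern_out"
```

The BEGIN line is printed first, then the END block reports the sum over the lines starting with "a" and the count of lines whose second field exceeds 4.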
Actions
- Variable or array assignment
var=value
name="alice"
- Array
a["age"]="haha"
a[1]="xxx"
- Formatted output
- A specified field
$N
refers to field N, and $0 to the whole line. Variables are referenced without an extra $
print
printf
: Used almost the same way as in C; placeholders such as %s and %d can be used
- Built-in functions
sub
substr
- And so on
- Control flow statements
if/else
for
while
awk script file format
- Multiple statements on one line are separated by semicolons:
action1;action2
- Statements on separate lines do not need to be separated by semicolons
- If an action follows a pattern, its opening brace must be on the same line as the pattern (as in the usual Java style)
BEGIN {
action1
action2
}
- Comments start with #
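The rules above can be tried with a small script file run through -f. This is only a sketch: the script is written to a temporary file, and the variable names total and n are assumptions.

```shell
script=$(mktemp)
cat > "$script" <<'EOF'
# Count the input lines and sum the first field
BEGIN { total = 0 }
{
    total += $1; n++    # two statements on one line, separated by a semicolon
}
END { print n, total }
EOF
script_out=$(printf '1\n2\n3\n' | awk -f "$script")
rm -f "$script"
echo "$script_out"   # 3 6
```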
Records and fields
- Record: each line in the file is called a record; records are separated by line feeds
- The input data is expected to have a fixed format
- It is not an unstructured, endless string
- Built-in variables:
$0
: The contents of the current line, that is, the current record
ORS
: Output record separator; defaults to a newline
RS
: Input record separator; defaults to a newline
NR
: Number of records: the total number of records processed so far
FNR
: File's number of records: the record number within the file currently being processed. awk can handle multiple files, and FNR restarts counting when a new file is read, so FNR ≤ NR
- Field: each record consists of multiple fields, separated by a delimiter
- Built-in variables:
OFS
: Output field separator; the default is a space
FS
: Input field separator; the default is whitespace
NF
: Number of fields in the current record
- Field splitting:
- Change FS with the -F option:
awk -F: '{print $1}'
- To use multiple field separators, put the delimiters inside []:
echo -e "a:b\tc" | awk -F'[:\t]' '{print $1, $3}'
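A short sketch of how NR, NF, and OFS behave, using made-up colon-separated input:

```shell
# NR is the record number, NF the field count, $NF the last field;
# OFS is inserted wherever print arguments are separated by commas
fields_out=$(printf 'a:b:c\nd:e\n' | awk -F: 'BEGIN { OFS = "-" } { print NR, NF, $1, $NF }')
echo "$fields_out"
```

The first record has three fields and the second has two, so $NF refers to a different column on each line.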
Formatted output
- Arguments may be variables, arithmetic expressions, or strings
- Strings must be enclosed in double quotes:
echo 300 2 3 4 | awk '{print "hello" }'
- Between parameters separated by commas:
echo 300 2 3 4 | awk '{print $1,$2 }'
- Output can be redirected:
echo 300 2 3 4 | awk '$1 * $2 > 500{print $3+$4 >> "/tmp/test" }'
- Output can be sent through a pipe:
echo 300 2 3 4 | awk '$1 * $2 > 500{print $3+$4 | "grep 7" }'
- Note that all output of the action goes through the same pipe
- Escape sequences
\n
: Newline
\t
: Tab
\r
: Carriage return
\047
: The character with octal value 47 (a single quote)
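The \047 escape is handy because an awk program on the command line is usually wrapped in single quotes itself. A minimal sketch with made-up input:

```shell
# \047 produces a single quote without ending the shell's quoted string
quote_out=$(echo alice | awk '{ print "name=\047" $1 "\047" }')
echo "$quote_out"
```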
printf
- Writes a formatted string to standard output; the f stands for format
- Does not add a newline at the end of the line
- The format control string is enclosed in ""
- Modifiers
-
: Left-justify; output is right-justified by default
+
: Show the sign of a number (+ or -)
- Format conversion specifiers
%c
: Character
%s
: String
%d
: Decimal integer
%f
: Floating-point number
[root@localhost ~]# echo alice 23 01 |awk ' {printf "The name is: %-15s ID is %8d\n",$1,$3}'
The name is: alice           ID is        1
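The modifiers and conversion specifiers can be combined with a width and a precision. The sketch below uses made-up input; the brackets only make the padding visible.

```shell
# %-5s left-justifies in 5 columns, %5.2f right-justifies with 2 decimals,
# %+d always prints the sign
printf_out=$(echo 'pi 3.14159 42' | awk '{ printf "[%-5s][%5.2f][%+d]\n", $1, $2, $3 }')
echo "$printf_out"   # [pi   ][ 3.14][+42]
```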
Exercise
- Contents of the test file:
Mary 2143 78 84 77
Jack 2321 66 78 45
Tom 2122 48 77 71
Mike 2537 87 97 95
Bob 2415 40 57 62
- Expected output table:
Lineno.  Name    No.     Math    English   Computer   Total
------------------------------------------------------------
1        Mary    2143    78      84        77         239
2        Jack    2321    66      78        45         189
3        Tom     2122    48      77        71         196
4        Mike    2537    87      97        95         279
5        Bob     2415    40      57        62         159
------------------------------------------------------------
Total:                   319     393       350
Avg:                     63.8    78.6      70
- Using an .awk script file:
BEGIN {
printf "%-10s%-10s%-10s%-10s%-10s%-10s%-10s\n","LNO","Name","No","Math","English","Computer","Total"
printf "------------------------------------------------------------------\n"
}
{
math+=$3;english+=$4;com+=$5;printf "%-10s%-10s%-10s%-10s%-10s%-10s%-10s\n",NR,$1,$2,$3,$4,$5,$3+$4+$5
}
END {
printf "------------------------------------------------------------------\n"
printf "%-30s%-10s%-10s%-10s\n","Total:",math,english,com
printf "%-30s%-10s%-10s%-10s\n","Avg:",math/NR,english/NR,com/NR
}
- Using a single command:
awk 'BEGIN{math=0;eng=0;com=0;printf "Lineno. Name No. Math English Computer Total\n";printf "------------------------------------------------------------\n"}{math+=$3; eng+=$4; com+=$5;printf "%-8s %-7s %-7s %-7s %-9s %-10s %-7s \n",NR,$1,$2,$3,$4,$5,$3+$4+$5} END{printf "------------------------------------------------------------\n";printf "%-24s %-7s %-9s %-20s \n","Total:",math,eng,com;printf "%-24s %-7s %-9s %-20s \n","Avg:",math/NR,eng/NR,com/NR}' test
Program structure
Relational expressions
< , <= , >, >= , == , !=
~ and !~
: Match and non-match operators, used to test whether a record or field matches an expression
awk '$1 ~ /[Ll]ove/{print}' file
awk '$1 !~ /[Ll]ove/' file
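To see both operators side by side, the sketch below counts matching and non-matching first fields in made-up input; the counters m and n are hypothetical names.

```shell
match_out=$(printf 'love you\nLove me\nhate it\n' | awk '
$1 ~ /^[Ll]ove$/  { m++ }    # first field is "love" or "Love"
$1 !~ /^[Ll]ove$/ { n++ }    # first field is anything else
END { print m, n }')
echo "$match_out"   # 2 1
```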
Compound patterns
&&
: Logical AND
echo | awk 'a>b && a!=0 {print a}' a=2 b=1
||
: Logical OR
echo | awk 'a>b || a!=0 {print a}' a=1 b=2
!
: Logical NOT
echo | awk '!(a!=0) {print a}' a=0 b=2
Conditional assignment
var=(EXP)?var1:var2
: This is the ternary expression
echo | awk '{max=(a>b)? a:b; print max}' a=1 b=2
Range patterns
/RegExp1/,/RegExp2/
awk '/^a/,/^b/{print $0}' file
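A range pattern selects every record from the first match of the start expression through the next match of the end expression, inclusive. A minimal sketch with made-up input:

```shell
# Prints lines 2 through 4: from the line starting with "a" to the line starting with "b"
range_out=$(printf 'x\napple\nmid\nbanana\ny\n' | awk '/^a/,/^b/ { print NR ": " $0 }')
echo "$range_out"
```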
Arithmetic
+,-,*,/,%,^ or **
: Addition, subtraction, multiplication, division, remainder, exponentiation
++,--
: Increment and decrement by one
=,+=,-=,*=,/=,%=,^=,**=
: Assignment operators
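The operators can be checked quickly in a BEGIN block, which needs no input. The sketch uses ^ rather than ** on the assumption that ^ is the more widely supported spelling.

```shell
arith_out=$(awk 'BEGIN {
    x = 7; y = 2
    print x + y, x - y, x * y, x / y, x % y, x ^ y
    x += 3; x++          # compound assignment, then increment
    print x
}')
echo "$arith_out"
```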
Functions
Built-in functions
String Functions
- sub(/RegExp/,"STR1"[,"STR2"])
: Substitute
- Finds the leftmost longest substring that matches the regular expression and replaces it with the string STR1
- If a target string (STR2) is specified, the search and replacement are done in it; otherwise the current record is used
- Only the first match on each line is replaced
- Example: replace the content in STR2 that matches RegExp with STR1
echo "abca" | awk '{sub(/a/,"b");print}'
echo "ab:ca" | awk -F: '{sub(/a/,"b",$1);print}'
- gsub(/RegExp/,"STR1"[,"STR2"])
: Global substitute; used the same way as sub
- index("string","substring")
: Returns the position of the first occurrence of the substring in the string. Positions are counted from 1; returns 0 if the substring is not found
echo "ab:cd" | awk -F: '{print index($1,"a")}'
echo "ab:cd" | awk -F: '{print index($1,"b")}'
- substr("string",start,length)
: Returns the substring of the string starting at the specified position
echo "abcdef" | awk '{print substr($0,1,3)}'
- match("string",/regex/)
: Returns the position at which the regular expression matches in the string, or 0 if there is no match
- The built-in variable RSTART is the starting position of the matched substring in the string; positions start at 1
- The built-in variable RLENGTH is the length of the matched substring
- Use substr to extract the matched substring
[root@localhost ~]# echo "abcdefABC" | awk '{match($0,/[[:upper:]]+/);print RSTART,RLENGTH;print substr($0,RSTART,RLENGTH)}'
7 3
ABC
- split("string",array,"separator")
: Splits the string into an array, using the third argument as the field separator
echo "18/01/31" | awk '{split($0,date,"/");print date[1],date[2],date[3]}'
- length("string")
: Returns the length of the string (in characters)
- blength("string")
: Returns the length of the string (in bytes); not available in every awk implementation
- tolower("string")
: Converts every uppercase character in the string to lowercase
- toupper("string")
: Converts every lowercase character in the string to uppercase
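The difference between sub and gsub, and a few of the other string functions, can be seen in the sketch below. The input strings and the variable names s, n, and d are made up for illustration.

```shell
# sub replaces only the first match (here in the copy s);
# gsub replaces every match in $0 and returns the number of replacements
sub_out=$(echo 'abca' | awk '{ s = $0; sub(/a/, "X", s); n = gsub(/a/, "X"); print s, $0, n }')
echo "$sub_out"   # Xbca XbcX 2

str_out=$(echo '2018/01/31' | awk '{
    n = split($0, d, "/")    # d[1]="2018", d[2]="01", d[3]="31"
    print n, d[1], length($0), index($0, "/"), substr($0, 6, 2), toupper("abc")
}')
echo "$str_out"   # 3 2018 10 5 01 ABC
```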
Time Functions
- mktime("YYYY MM DD HH MM SS")
: Generates a timestamp from a string of numbers
[root@localhost ~]# awk 'BEGIN{tstamp=mktime("2013 01 04 12 12 12"); print tstamp;}'
1357319532
- systime()
: Returns the current timestamp: the integer number of seconds from January 1, 1970 to the current time (not counting leap seconds)
- strftime([format [, timestamp]])
: Formats a timestamp as a string; if the timestamp is omitted, the current time is used
[root@localhost ~]# awk 'BEGIN{tstamp=mktime("2013 01 04 12 12 12"); print strftime("%c", tstamp);}'
2013年01月04日 星期五 12时12分12秒
[root@localhost ~]# awk 'BEGIN{tstamp=mktime("2013 01 04 12 12 12"); print strftime("%c");}'
2020年03月24日 星期二 05时29分42秒
[root@localhost ~]# awk 'BEGIN{tstamp=mktime("2013 01 04 12 12 12"); print strftime();}'
二 3月 24 05:32:04 EDT 2020
Arithmetic function
int(x)
: Truncates x to an integer
rand()
: Returns a random number n, where 0 ≤ n < 1
sqrt(x)
: Square root
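A quick check of these functions in a BEGIN block. Since rand() is random, the sketch only verifies that its result falls in the documented range.

```shell
math_out=$(awk 'BEGIN {
    print int(3.9), int(-3.9), sqrt(16)   # int truncates toward zero
    r = rand()
    print (r >= 0 && r < 1)               # 1 when r is in the range [0, 1)
}')
echo "$math_out"
```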
Other functions
system("CMD")
: Runs a system command
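system also returns the exit status of the command, which can be stored and tested. A minimal sketch, with rc as a hypothetical variable name:

```shell
# The command writes to standard output itself; rc receives its exit status
sys_out=$(awk 'BEGIN { rc = system("echo hello from the shell"); print "exit status:", rc }')
echo "$sys_out"
```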
Custom Functions
# Format
function function_name(param1, param2, ...) {
statements
return expression
}
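As an illustration of the format above, here is a sketch of a user-defined function. The name max and its parameters are made up for this example.

```shell
func_out=$(echo '3 8' | awk '
function max(a, b) {         # returns the larger of its two arguments
    return (a > b) ? a : b
}
{ print max($1, $2) }')
echo "$func_out"   # 8
```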
Variables
User-defined variables
- Variable names consist of letters, digits, and underscores, and cannot start with a digit
- awk infers a variable's data type from the context
- String values are enclosed in ""
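The type inference can be observed directly: the same variable behaves as a number under + and as a string under concatenation. A minimal sketch with made-up values:

```shell
# x holds the string "10"; x + y treats it as a number, x y concatenates
type_out=$(awk 'BEGIN { x = "10"; y = 3; print x + y, x y }')
echo "$type_out"   # 13 103
```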
Built-in variables
Variable | Description |
---|---|
$0 | The current record (as a single variable) |
$1~$n | The n-th field of the current record; fields are separated by FS |
FS | Input field separator; the default is a space |
OFS | Output field separator; the default is a space |
RS | Input record separator; the default is a newline |
ORS | Output record separator; the default is a newline |
NF | The number of fields in the current record, that is, the number of columns |
NR | The number of records read so far, that is, the line number, starting at 1 |
FNR | The record number within the current file |
ARGC | The number of command-line arguments |
ARGV | An array of the command-line arguments |
FILENAME | The name of the current input file |
IGNORECASE | If true, matching ignores case |
RSTART | The start position of the string matched by the match function |
RLENGTH | The length of the string matched by the match function |
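The difference between NR and FNR only shows up with more than one input file. The sketch below creates two temporary files (one.txt and two.txt are made-up names) and prints FILENAME, NR, and FNR for each record.

```shell
dir=$(mktemp -d)
printf 'a\nb\n' > "$dir/one.txt"
printf 'c\n' > "$dir/two.txt"
# NR keeps counting across files; FNR restarts at 1 for each new file
vars_out=$(cd "$dir" && awk '{ print FILENAME, NR, FNR }' one.txt two.txt)
rm -rf "$dir"
echo "$vars_out"
```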
Redirection
Output redirection
>
: Writes the output to a file, overwriting its contents
echo | awk '{print "abc" > "test" }'
>>
: Appends the output to a file
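A sketch of the two operators, writing to a temporary file passed in through a hypothetical variable f. Within a single awk run, > truncates the file only when it is first opened, so both lines of the first command are kept.

```shell
out=$(mktemp)
printf '1\n2\n' | awk -v f="$out" '{ print "line " $1 > f }'    # overwrite
printf '3\n'    | awk -v f="$out" '{ print "line " $1 >> f }'   # append
redir_out=$(cat "$out")
rm -f "$out"
echo "$redir_out"
```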
Input redirection
getline
- If a record is read, getline returns 1; at end of file it returns 0; if an error occurs, such as failing to open the file, it returns -1. It can therefore be used in flow-control statements such as while
awk 'BEGIN{getline < "-";print}'
awk 'BEGIN{getline < "test.txt"; print $0}'
awk 'BEGIN{ while( (getline line < "test") > 0 ){ print line } }'
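Because getline returns -1 on error, comparing its result with 0 avoids an endless loop when the file cannot be opened. A minimal sketch, assuming a temporary file passed in through a hypothetical variable f:

```shell
data=$(mktemp)
printf 'x\ny\n' > "$data"
getline_out=$(awk -v f="$data" 'BEGIN {
    while ((getline line < f) > 0) n++   # stops at end of file (0) or on error (-1)
    close(f)
    print n
}')
rm -f "$data"
echo "$getline_out"   # 2
```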
Pipes
You can open a pipe in awk. A pipe to a given command stays open until it is closed: if the same command is used again before the first pipe is closed, the data is still sent to the first pipe, because the earlier data and the pipe's file descriptor remain connected. Use close() to close a pipe.
awk '{print $1, $2 | "sort" } END {close("sort")}' test
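Closing the pipe also lets awk wait for the command to finish before it continues. In the sketch below (made-up input), the sorted lines appear before the final message because close("sort") is called first.

```shell
pipe_out=$(printf 'b\na\n' | awk '
{ print | "sort" }                        # every record goes to the same sort process
END { close("sort"); print "done" }       # close flushes the pipe and waits for sort
')
echo "$pipe_out"
```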
Conditional statements
- if/else
if (expression) {
    statement; ...
} else if (expression) {
    statement; ...
} else {
    statement;
}
Loops
- while
while (expression) {
    statement; ...
}
- for
for (i=1;i<=NF;i++){
    statement; ...
}
for (i in Array){
    # Note: the i retrieved here is a key of Array
    print i,Array[i]
}
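The statements above can be combined in one action. The sketch below walks over the fields of a made-up line, counts even and odd values, and then reduces the sum with a while loop; all variable names are assumptions.

```shell
loop_out=$(echo '4 9 2' | awk '{
    for (i = 1; i <= NF; i++) {
        if ($i % 2 == 0) even++; else odd++
        sum += $i
    }
    while (sum > 10) sum -= 10    # 15 becomes 5
    print even, odd, sum
}')
echo "$loop_out"   # 2 1 5
```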
Program control
next
: Skips the current record and reads the next one
exit N
: Exits awk with exit status N
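A sketch of both statements with made-up input. Note that the END block still runs after exit, and the shell sees the status passed to exit.

```shell
ctl_rc=0
ctl_out=$(printf '1\n#skip\n2\nstop\n3\n' | awk '
/^#/     { next }       # skip comment lines and move on to the next record
/^stop$/ { exit 3 }     # stop reading input; the END block still runs
         { sum += $1 }
END      { print sum }
') || ctl_rc=$?
echo "$ctl_out $ctl_rc"   # 3 3
```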
Arrays
- Subscripts may be numbers or strings (like a dictionary)
- Array elements are not stored in order
- The index is commonly called a key
- Variables can be used as array subscripts
- Field values can be used as array subscripts
- Loop through the elements of an array:
for (i in Arrayname) {print Arrayname[i]}
- The i read here is the key, not the value
- Count the number of TCP connections in each state
[root@localhost ~]# netstat -ant | awk '/^tcp/ {++state[$NF]} END {for(key in state) print key,"\t",state[key]}'
LISTEN 12
ESTABLISHED 2
- Count TCP connections by IP address and by state
[root@localhost ~]# netstat -ant | awk '/^tcp/ {n=split($(NF-1),array,":");if(n<=2)++S[array[(1)]];else++S[array[(4)]];++s[$NF];++N} END {for(a in S){printf("%-20s %s\n", a, S[a]);++I}printf("%-20s %s\n","TOTAL_IP",I);for(a in s) printf("%-20s %s\n",a, s[a]);printf("%-20s %s\n","TOTAL_LINK",N);}'
* 6
192.168.159.1 2
0.0.0.0 6
TOTAL_IP 3
LISTEN 12
ESTABLISHED 2
TOTAL_LINK 14
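The same counting idiom works on any input that does not depend on netstat. The sketch below counts made-up request methods; because for (k in count) visits keys in no particular order, the result is piped through sort.

```shell
arr_out=$(printf 'GET\nPOST\nGET\nGET\n' | awk '
{ count[$1]++ }                                # the field value is the array key
END { for (k in count) print k, count[k] }     # k is the key, count[k] the value
' | sort)
echo "$arr_out"
```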