awk text processing
awk is a pattern scanning and processing language. With the awk command on Linux, text can be processed quickly and efficiently: awk scans each line of text and performs the specified action on it.
awk was created in 1977. It borrows from the C programming language, and its name comes from the surnames of its three designers: Alfred Aho, Peter Weinberger, and Brian Kernighan. There are many versions of awk; the one used here is GNU Awk (gawk) on Ubuntu. On macOS, gawk can be installed with Homebrew.
Usage
awk can be run directly from the command line, or the program can be written to a file with the .awk suffix and run from there. awk processes text record by record (by default, line by line) and applies the specified action to each record it receives.
Command line execution
$ awk [ -F fs ] [ -v var=value ] 'pattern {action}' [ file ... ]
Here -F specifies the field separator, and -v assigns a value to a variable, such as an awk built-in variable, before the program runs.
For example, suppose /etc/passwd has the following contents:
root:x:0:0:root:/root:/usr/bin/zsh
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
bin:x:2:2:bin:/bin:/usr/sbin/nologin
sys:x:3:3:sys:/dev:/usr/sbin/nologin
To output the contents of each line, use:
$ awk '{print $0}' /etc/passwd
Here $0 holds the text of the line currently being scanned.
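Besides built-in variables, -v can also set a variable of your own before the program starts. A minimal sketch, assuming a hypothetical variable named prefix and two sample rows fed through printf in place of the real /etc/passwd:

```shell
# Sample rows standing in for /etc/passwd (assumed data for this sketch).
sample='root:x:0:0:root:/root:/usr/bin/zsh
bin:x:2:2:bin:/bin:/usr/sbin/nologin'

# -F sets the field separator; -v defines "prefix", a hypothetical user variable.
labeled=$(printf '%s\n' "$sample" | awk -F ':' -v prefix='user=' '{ print prefix $1 }')
printf '%s\n' "$labeled"
```

Each output line is the value of prefix followed by the first field of the row.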
File execution
An .awk file can be written in three parts, as follows:
# passwd.awk
BEGIN{
FS="\n";
print "Before action";
}
{
print $0;
}
END{
print "After action";
}
The BEGIN block defines what happens before any row is processed. It can be used to set awk built-in variables, which then apply to every row processed afterward.
The END block defines what happens after all of the text has been processed, and can be used to output summary information.
The block between BEGIN and END runs once for each row. BEGIN and END blocks can also be used when awk is run directly from the command line.
Running the file from the command line with awk's -f option (for example, awk -f passwd.awk /etc/passwd) produces:
Before action
root:x:0:0:root:/root:/usr/bin/zsh
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
bin:x:2:2:bin:/bin:/usr/sbin/nologin
sys:x:3:3:sys:/dev:/usr/sbin/nologin
After action
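The same three-part structure also works inline on the command line. A minimal sketch, assuming two sample rows supplied through printf rather than the real /etc/passwd:

```shell
# Sample rows standing in for /etc/passwd (assumed data for this sketch).
sample='root:x:0:0:root:/root:/usr/bin/zsh
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin'

# BEGIN sets FS once, the middle block runs per row, END prints a summary.
result=$(printf '%s\n' "$sample" |
  awk 'BEGIN { FS = ":" } { print $1 } END { print NR " rows" }')
printf '%s\n' "$result"
```

Setting FS in BEGIN has the same effect here as passing -F ':' on the command line.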
Variables
$ + number
$0 is the whole line being scanned, $1 is the first field after the line is split, $2 is the second field, and so on.
To output the user names (the first field) in /etc/passwd, run the following:
$ awk -F ':' '{print $1}' /etc/passwd
root
daemon
bin
sys
Taking the first row, root:x:0:0:root:/root:/usr/bin/zsh, as an example: awk first splits it on the : separator set with -F into the fields root x 0 0 root /root /usr/bin/zsh, then outputs the first one, root.
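Any field of the split row can be addressed the same way. A small sketch using the first row of the listing above, where $7 happens to be the login shell:

```shell
# One sample row, copied from the /etc/passwd listing used in this article.
row='root:x:0:0:root:/root:/usr/bin/zsh'

# After splitting on ":", $1 is the user name and $7 is the seventh field.
fields=$(printf '%s\n' "$row" | awk -F ':' '{ print $1, $7 }')
printf '%s\n' "$fields"
```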
Special Variables
- FS (field separator)
FS is the input field separator, set to : in the examples above. Its default value is a single space, which awk treats specially: fields are split on runs of spaces, tabs, and newlines. FS can be set on the command line with -F, or in a BEGIN block with FS=, to a string or a regular expression. For example:
$ awk -F ":" '{print $1,$2,$3}' /etc/passwd
root x 0
daemon x 1
bin x 2
sys x 3
- OFS (output field separator)
OFS is the separator placed between output fields. The example above used the default, a space. It can be changed by setting the OFS variable:
$ awk -F ":" -v OFS="-" '{print $1,$2,$3}' /etc/passwd
root-x-0
daemon-x-1
bin-x-2
sys-x-3
- RS (record separator)
In the previous examples awk processed the text line by line, storing each line as one record, because the default record separator RS is "\n". Not all text is organized one record per line the way CSV files are, for example:
# people.txt
P1
male
15

p2
female
20

p3
male
19
This file separates records with "\n\n" and, within each record, separates fields with "\n". It can be processed as follows (a multi-character RS is a gawk extension; POSIX leaves its behavior unspecified):
$ awk -F "\n" -v RS="\n\n" '{print $1,$2,$3}' people.txt
P1 male 15
p2 female 20
p3 male 19
- ORS (output record separator)
Just as RS separates input records, ORS sets the separator written after each output record:
$ awk -F "\n" -v RS="\n\n" -v ORS="\n***\n" '{print $1,$2,$3}' people.txt
P1 male 15
***
p2 female 20
***
p3 male 19
***
- NR (number of records)
NR is the ordinal number of the record currently being processed. In an END block, NR is the total number of records processed.
$ awk -F ":" '{print "line" NR ":" $1,$2,$3}' /etc/passwd
line1:root x 0
line2:daemon x 1
line3:bin x 2
line4:sys x 3
When multiple files are processed, the count keeps accumulating across them:
$ awk -F ":" '{print "record" NR ":" $1,$2,$3}' people.txt /etc/passwd
record1:P1
record2:male
record3:15
record4:
record5:p2
record6:female
record7:20
record8:
record9:p3
record10:male
record11:19
record12:root x 0
record13:daemon x 1
record14:bin x 2
record15:sys x 3
- NF (number of fields)
NF is the number of fields in the current record, so its value depends on FS:
# use ":" as the separator
$ awk -F ":" '{print "record" NR " with " NF " fields:" $1,$2,$3}' /etc/passwd
record1 with 7 fields:root x 0
record2 with 7 fields:daemon x 1
record3 with 7 fields:bin x 2
record4 with 7 fields:sys x 3
# use "o" as the separator
$ awk -F "o" '{print "record" NR " with " NF " fields:" $1,$2,$3}' /etc/passwd
record1 with 7 fields:r  t:x:0:0:r
record2 with 5 fields:daem n:x:1:1:daem n:/usr/sbin:/usr/sbin/n
record3 with 3 fields:bin:x:2:2:bin:/bin:/usr/sbin/n l gin
record4 with 3 fields:sys:x:3:3:sys:/dev:/usr/sbin/n l gin
- FILENAME
FILENAME is the name of the file currently being processed:
$ awk -F ":" '{print FILENAME}' /etc/passwd people.txt
/etc/passwd
/etc/passwd
/etc/passwd
/etc/passwd
people.txt
people.txt
people.txt
people.txt
people.txt
people.txt
people.txt
people.txt
people.txt
people.txt
people.txt
This value is only meaningful once record processing has started, so printing FILENAME in a BEGIN block yields an empty string.
- FNR
As shown earlier, NR keeps accumulating when multiple files are processed. FNR, by contrast, is the ordinal number of the record within the current file:
$ awk -F ":" '{print "record" FNR ":" $1,$2,$3}' people.txt /etc/passwd
record1:P1
record2:male
record3:15
record4:
record5:p2
record6:female
record7:20
record8:
record9:p3
record10:male
record11:19
record1:root x 0
record2:daemon x 1
record3:bin x 2
record4:sys x 3
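The counters in this list can be compared side by side in a single run. A minimal sketch, assuming two throwaway files (one.txt and two.txt, hypothetical names created in a temporary directory only for this example):

```shell
# Two throwaway files created only for this sketch.
tmpdir=$(mktemp -d)
printf 'a:b:c\nd:e\n' > "$tmpdir/one.txt"
printf 'f:g:h:i\n' > "$tmpdir/two.txt"

# FS is set in BEGIN; NR keeps counting across files, FNR restarts per file,
# and NF reports how many fields the current record has.
summary=$(cd "$tmpdir" && awk 'BEGIN { FS = ":" }
  { print FILENAME, "NR=" NR, "FNR=" FNR, "NF=" NF }' one.txt two.txt)
printf '%s\n' "$summary"
rm -rf "$tmpdir"
```

The third output line shows NR=3 but FNR=1, because it is the third record overall and the first record of two.txt.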
Built-in functions
awk provides built-in functions that make text processing and arithmetic easier, including length() for the length of a string, rand() for a random number, and sin() and cos() for sine and cosine.
These functions are documented in the official manual.
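A quick way to try these functions is a BEGIN-only program, which needs no input file. A minimal sketch with literal arguments chosen so the results are easy to check by hand:

```shell
# length() counts characters; sin(0) is 0 and cos(0) is 1.
# rand() is left out because its value depends on the implementation and seed.
funcs=$(awk 'BEGIN { print length("daemon"); printf "%.2f %.2f\n", sin(0), cos(0) }')
printf '%s\n' "$funcs"
```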
Record Filtering
All of the examples above act on every record, but records can also be filtered with conditions.
Regular expression matching
A regular expression pattern can be used to match records:
$ awk -F ':' '/root/ {print $1,$2,$3}' /etc/passwd
root x 0
This selects the records that contain root.
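A bare /root/ pattern matches anywhere in the record, so it also selects rows where root appears in another field. To restrict the match to one field, awk's ~ operator can be used. A minimal sketch, assuming a hypothetical admin row whose home directory is /root:

```shell
# The second row is hypothetical: its user name is not root, but its home directory is.
sample='root:x:0:0:root:/root:/usr/bin/zsh
admin:x:1000:1000:admin:/root:/bin/sh'

# Matches "root" anywhere in the record, so both rows are selected.
any_match=$(printf '%s\n' "$sample" | awk -F ':' '/root/ { print $1 }')

# Matches only when the first field is exactly "root".
name_match=$(printf '%s\n' "$sample" | awk -F ':' '$1 ~ /^root$/ { print $1 }')

printf 'anywhere: %s\n' $any_match
printf 'first field: %s\n' "$name_match"
```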
Conditional expressions
Built-in variables and functions can also be combined to filter records:
# Output records whose first field is longer than 3 characters and that come after the 1st record
$ awk -F ':' 'length($1)>3 && NR>1 {print $0}' /etc/passwd
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
The if statement
awk also provides an if statement:
# Output the records whose 3rd field is 0
$ awk -F ':' '{if ($3==0)print $0}' /etc/passwd
root:x:0:0:root:/root:/usr/bin/zsh
awk also has a for statement, similar in form to the one in C:
$ awk -v ORS="," 'BEGIN{ for(i=1;i<5;i++) print i}'
1,2,3,4,
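The for and if statements can be combined to walk over the fields of a record. A minimal sketch, assuming a made-up input row a:bb:ccc:

```shell
long_fields=$(printf 'a:bb:ccc\n' | awk -F ':' '{
  for (i = 1; i <= NF; i++)      # visit every field of the record
    if (length($i) > 1)          # keep fields longer than one character
      print i, $i                # print the field number and its value
}')
printf '%s\n' "$long_fields"
```

Here $i uses the loop variable as the field number, so the same code works for records with any number of fields.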
Strings and Numbers
awk supports arithmetic and logical operators, and it converts between strings and numbers directly: adding 0 (+0) forces a value to a number, and concatenating it with an empty string forces it to a string:
$ awk 'BEGIN{print "origin\tnumber\tstring"}{print $0,"\t",$0+0,"\t",$0 ""}' people.txt
origin number string
P1 0 P1
male 0 male
15 15 15
0
p2 0 p2
female 0 female
20 20 20
0
p3 0 p3
male 0 male
19 19 19
Here $0 "" concatenates the original record with an empty string. Concatenation can give surprising results in some cases, though. Consider the following statement:
$ awk 'BEGIN { print -12 " " -24 }'
-| -12-24
A space was expected between -12 and -24, but it does not appear. This is because the arithmetic operators have higher precedence than concatenation, so the expression is parsed as follows:
-12 (" " - 24)
⇒ -12 (0 - 24)
⇒ -12 (-24)
⇒ -12-24
To get the right result, add parentheses:
$ awk 'BEGIN { print -12 " " (-24) }'
-| -12 -24
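The conversion rules in this section can also be checked with a few literals. A minimal sketch, using values picked only for illustration; it shows that the type of a value changes how a comparison is done:

```shell
conv=$(awk 'BEGIN {
  print "3x" + 0          # the leading digits are used as the number: 3
  print "abc" + 0         # no leading number, so the value is 0
  print (10 < 9)          # numeric comparison: 0 (false)
  print (10 "" < 9 "")    # string comparison of "10" and "9": 1 (true)
}')
printf '%s\n' "$conv"
```

The last line is true because, compared as strings, "10" sorts before "9" character by character.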