I believe many people have used the Linux awk command, but few have really studied it, because most of us program by searching Baidu. Today let's take a moment to go over it briefly.
I just googled it and found articles by two well-known experts. I believe many of you know them.
My article uses theirs as a reference and adds some content.
Awk is an application for processing text files, and almost all Linux systems come with this program.
It is called AWK after the first letters of the family names of its three creators: Alfred Aho, Peter Weinberger, and Brian Kernighan.
awk syntax
awk [options] 'script' var=value file(s)
or
awk [options] -f scriptfile var=value file(s)
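As a rough illustration of the two invocation forms above, the sketch below runs the same one-line program inline and from a script file. The file names people.txt and names.awk, the sample data, and the variable suffix are all made up for this example.

```shell
# Hypothetical sample data, created in a temporary directory.
workdir=$(mktemp -d)
printf '%s\n' 'alice 30' 'bob 25' > "$workdir/people.txt"

# Form 1: inline script. suffix='!' is a var=value operand placed before the file.
inline_out=$(awk '{print $1 suffix}' suffix='!' "$workdir/people.txt")

# Form 2: the same program stored in a script file and loaded with -f.
printf '%s\n' '{print $1 suffix}' > "$workdir/names.awk"
file_out=$(awk -f "$workdir/names.awk" suffix='!' "$workdir/people.txt")

echo "$inline_out"
echo "$file_out"
rm -rf "$workdir"
```

Both forms should print the same two lines, since only the place where the program text lives differs.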
Option descriptions:
- -F fs or --field-separator fs
Specify the input field separator. fs is a string or a regular expression, such as -F:.
- -v var=value or --assign var=value
Assign a user-defined variable.
- -f scriptfile or --file scriptfile
Read the awk program from a script file.
- -mf nnn and -mr nnn
Set a memory limit to the value nnn: -mf limits the maximum number of fields, and -mr limits the maximum record size. Both options come from the Bell Labs version of awk and are not part of standard awk.
- -W compat or --compat, -W traditional or --traditional
Run in compatibility mode, so that gawk behaves exactly like standard awk and all gawk extensions are ignored.
- -W copyleft or --copyleft, -W copyright or --copyright
Print short copyright information.
- -W help or --help, -W usage or --usage
Print all awk options and a short description of each option.
- -W lint or --lint
Print warnings about constructs that are dubious or not portable to other awk implementations.
- -W lint-old or --lint-old
Print warnings about constructs that are not portable to the original version of Unix awk.
- -W posix
Turn on compatibility mode with these additional restrictions: \x escape sequences are not recognized; a newline does not act as a field separator when FS is a single space; the keyword func is not recognized as a synonym for function; the operators ** and **= cannot be used in place of ^ and ^=; fflush() is not available.
- -W re-interval or --re-interval
Allow interval expressions in regular expressions, such as the repetition count in /a{2,3}/.
- -W source program-text or --source program-text
Use program-text as awk source code; it can be mixed with the -f option.
- -W version or --version
Print version and bug-report information.
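One detail worth seeing in action is the difference between -v var=value and a var=value operand from the syntax line. The sketch below assumes a made-up variable name, greeting, and a throwaway one-line input file.

```shell
# Hypothetical one-line input file.
input=$(mktemp)
echo 'row' > "$input"

# -v assigns the variable before BEGIN runs, so BEGIN already sees it.
with_v=$(awk -v greeting=hello 'BEGIN {print "[" greeting "]"}')

# A var=value operand is assigned only when awk reaches it in the argument
# list, which happens after BEGIN: BEGIN prints [], the main rule prints [hello].
with_operand=$(awk 'BEGIN {print "[" greeting "]"} {print "[" greeting "]"}' greeting=hello "$input")

echo "$with_v"
echo "$with_operand"
rm -f "$input"
```

So if a value is needed inside BEGIN, -v is the safer choice.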
Basic usage
Let's take the content of the xttblog.txt file as an example.
2 this is a test
3 Are you like awk
5 www.xttblog.com,amateur grass
This's a test
10 There are orange,apple,mongo
Running awk '{print $1,$4}' xttblog.txt prints the first and fourth fields of each line. By default, each line is split on spaces or tabs.
To format the output, use printf instead: awk '{printf "%-8s %-10s\n",$1,$4}' xttblog.txt.
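To try print and printf side by side, here is a small self-contained sketch. It assumes a made-up file name, sample.txt, holding four of the example lines.

```shell
# Sample file: four lines taken from the example data; sample.txt is a made-up name.
workdir=$(mktemp -d)
cat > "$workdir/sample.txt" <<'EOF'
2 this is a test
3 Are you like awk
This's a test
10 There are orange,apple,mongo
EOF

# print joins its arguments with a single space (the default OFS).
plain=$(awk '{print $1,$4}' "$workdir/sample.txt")

# printf pads each field: %-8s and %-10s left-align in 8 and 10 columns.
padded=$(awk '{printf "%-8s %-10s\n",$1,$4}' "$workdir/sample.txt")

echo "$plain"
echo "$padded"
rm -rf "$workdir"
```

Lines with fewer than four fields simply print an empty fourth field.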
Let's look at an example of specifying a delimiter:
awk -F sets the field separator; it is equivalent to setting the built-in variable FS.
Use "," as the separator:
$ awk -F, '{print $1,$2}' xttblog.txt
2 this is a test
3 Are you like awk
5 www.xttblog.com amateur grass
This's a test
10 There are orange apple
Use the built-in variable FS instead.
$ awk 'BEGIN{FS=","} {print $1,$2}' xttblog.txt
2 this is a test
3 Are you like awk
5 www.xttblog.com amateur grass
This's a test
10 There are orange apple
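The two ways of setting the separator can be compared directly. This sketch feeds one of the example lines through both forms.

```shell
line='10 There are orange,apple,mongo'

# -F, on the command line...
with_flag=$(echo "$line" | awk -F, '{print $1,$2}')

# ...and FS="," in a BEGIN block split the line the same way.
with_fs=$(echo "$line" | awk 'BEGIN{FS=","} {print $1,$2}')

echo "$with_flag"
echo "$with_fs"
```

With "," as the separator, everything before the first comma becomes $1, spaces included.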
Use multiple separators. The bracket expression [ ,] makes either a space or a comma end a field.
$ awk -F '[ ,]' '{print $1,$2,$5}' xttblog.txt
2 this test
3 Are awk
5 www.xttblog.com
This's a
10 There apple
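To see exactly how a line is cut up by several separators, it can help to print every field with its index. A minimal sketch using one of the example lines:

```shell
line='10 There are orange,apple,mongo'

# With -F '[ ,]' every space and every comma ends a field.
fields=$(echo "$line" | awk -F '[ ,]' '{for (i = 1; i <= NF; i++) print i "=" $i}')

echo "$fields"
```

The line ends up with six fields, which is why $5 is apple in the example above.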
Now let's look at setting variables, which is done with awk -v.
$ awk -v a=1 '{print $1,$1+a}' xttblog.txt
2 3
3 4
5 6
This's 1
10 11
Example of setting two variables.
$ awk -v a=1 -v b=s '{print $1,$1+a,$1b}' xttblog.txt
2 3 2s
3 4 3s
5 6 5s
This's 1 This'ss
10 11 10s
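The outputs above follow from how awk converts values: + treats both sides as numbers (a non-numeric string counts as 0), while writing two values next to each other concatenates them as strings. A small sketch with a made-up input line:

```shell
# a is used as a number, b as a string.
out=$(echo "This's 7" | awk -v a=1 -v b=s '{print $1+a, $2+a, $2 b}')

echo "$out"
```

Here $1+a is 0+1, $2+a is 7+1, and $2 b glues "7" and "s" together.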
The power of awk lies in its awk script.
$ awk -f {awk script} {file name}
Awk also supports operators.
Example: filter the rows whose first column is greater than 2.
$ awk '$1>2' xttblog.txt
3 Are you like awk
5 www.xttblog.com,amateur grass
This's a test
10 There are orange,apple,mongo
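Note that This's a test passes the $1>2 filter even though its first column is not a number: when a field does not look numeric, awk compares it as a string. If only numeric rows are wanted, one possible approach is to force a numeric comparison by adding 0. The file name sample.txt below is made up.

```shell
workdir=$(mktemp -d)
cat > "$workdir/sample.txt" <<'EOF'
2 this is a test
3 Are you like awk
This's a test
10 There are orange,apple,mongo
EOF

# $1>2 falls back to string comparison when $1 is not numeric.
loose=$(awk '$1>2' "$workdir/sample.txt")

# Adding 0 forces $1 to a number, so "This's" becomes 0 and is filtered out.
strict=$(awk '$1+0 > 2' "$workdir/sample.txt")

echo "$strict"
rm -rf "$workdir"
```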
Filter the rows where the first column is equal to 2.
$ awk '$1==2 {print $1,$3}' xttblog.txt
2 is
Filter rows where the first column is greater than 2 and the second column is equal to "Are".
$ awk '$1>2 && $2=="Are" {print $1,$2,$3}' xttblog.txt
3 Are you
Awk also supports built-in variables.
$ awk 'BEGIN{printf "%4s %4s %4s %4s %4s %4s %4s %4s %4s\n","FILENAME","ARGC","FNR","FS","NF","NR","OFS","ORS","RS";printf "---------------------------------------------\n"} {printf "%4s %4s %4s %4s %4s %4s %4s %4s %4s\n",FILENAME,ARGC,FNR,FS,NF,NR,OFS,ORS,RS}' xttblog.txt
FILENAME ARGC FNR FS NF NR OFS ORS RS
xttblog.txt 2 1 5 1
xttblog.txt 2 2 5 2
xttblog.txt 2 3 2 3
xttblog.txt 2 4 3 4
xttblog.txt 2 5 4 5
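In the table above, the FS, OFS, ORS and RS columns look empty because their default values are whitespace: a space for FS and OFS, a newline for ORS and RS. The sketch below makes two of them visible with brackets and prints NR, NF and the last field for each line. sample.txt is a made-up file name.

```shell
workdir=$(mktemp -d)
cat > "$workdir/sample.txt" <<'EOF'
2 this is a test
3 Are you like awk
This's a test
10 There are orange,apple,mongo
EOF

# Brackets make the default FS and OFS (a single space each) visible.
separators=$(awk 'BEGIN {printf "FS=[%s] OFS=[%s]\n", FS, OFS}')

# NR is the record number, NF the field count, $NF the last field.
summary=$(awk '{print NR, NF, $NF}' "$workdir/sample.txt")

echo "$separators"
echo "$summary"
rm -rf "$workdir"
```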
Print the overall record number NR and the per-file record number FNR, followed by the first three fields.
$ awk '{print NR,FNR,$1,$2,$3}' xttblog.txt
1 1 2 this is
2 2 3 Are you
3 3 5 www.xttblog.com,amateur grass
4 4 This's a test
5 5 10 There are
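With a single input file NR and FNR are always equal, so the difference only shows up with two or more files. A minimal sketch, assuming two made-up files a.txt and b.txt:

```shell
workdir=$(mktemp -d)
printf '%s\n' 'a1' 'a2' > "$workdir/a.txt"
printf '%s\n' 'b1' > "$workdir/b.txt"

# NR keeps counting across files; FNR restarts at 1 for each file.
counts=$(cd "$workdir" && awk '{print FILENAME, NR, FNR}' a.txt b.txt)

echo "$counts"
rm -rf "$workdir"
```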
Specify the output separator.
$ awk '{print $1,$2,$5}' OFS=" $ " xttblog.txt
2 $ this $ test
3 $ Are $ awk
5 $ www.xttblog.com,amateur grass $
This's $ a $
10 $ There $
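OFS is only used when awk joins fields itself, for example between the comma-separated arguments of print. A bare print outputs the line unchanged unless a field has been assigned, which makes awk rebuild the line. A short sketch with a made-up input line:

```shell
# A bare print outputs the record unchanged, so OFS is not applied.
unchanged=$(echo 'a b c' | awk 'BEGIN {OFS="-"} {print}')

# Assigning to any field, even $1=$1, makes awk rebuild $0 using OFS.
rebuilt=$(echo 'a b c' | awk 'BEGIN {OFS="-"} {$1=$1; print}')

echo "$unchanged"
echo "$rebuilt"
```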
Awk also supports regular expressions and string matching.
For lines whose second column contains "th", print the second and fourth columns.
$ awk '$2 ~ /th/ {print $2,$4}' xttblog.txt
this a
In the command above, ~ is the match operator, and the pattern is written between the two slashes.
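The pattern on the right of ~ does not have to be written between slashes. It can also come from a string variable, which is handy when the pattern is passed in from the shell. In this sketch, pat is a made-up variable name.

```shell
# The right-hand side of ~ can be a string variable holding the pattern.
# pat is a hypothetical variable name; ^th matches "th" at the start of $2.
matched=$(printf '%s\n' '2 this is a test' '3 Are you like awk' | awk -v pat='^th' '$2 ~ pat {print $2,$4}')

echo "$matched"
```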
Next, print the lines that contain ",".
$ awk '/,/' xttblog.txt
5 www.xttblog.com,amateur grass
10 There are orange,apple,mongo
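Ignore case when matching. IGNORECASE is a gawk extension: with gawk the line This's a test is matched as well, while other awk implementations treat IGNORECASE as an ordinary variable and print only the lowercase match.
$ awk 'BEGIN{IGNORECASE=1} /this/' xttblog.txt
2 this is a test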
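If the script has to run on an awk other than gawk, one portable alternative (not part of the original example) is to lowercase the line before matching:

```shell
# tolower() is a standard awk function, so this does not rely on IGNORECASE.
matches=$(printf '%s\n' '2 this is a test' '3 Are you like awk' "This's a test" | awk 'tolower($0) ~ /this/')

echo "$matches"
```

This matches both "this" and "This" regardless of which awk is installed.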
Negate a pattern match. !~ negates a match against one field, and ! in front of /th/ negates a match against the whole line.
$ awk '$2 !~ /th/ {print $2,$4}' xttblog.txt
Are like
www.xttblog.com,amateur grass
a
There orange,apple,mongo
$ awk '!/th/ {print $2,$4}' xttblog.txt
Are like
www.xttblog.com,amateur grass
a
There orange,apple,mongo
Awk also supports script files, which by convention end with .awk.
An awk script has two important keywords, BEGIN and END.
BEGIN { statements to run before any input is read }
END { statements to run after all lines have been processed }
{ statements to run for each input line }
Let's look at a simple xttblog.awk script.
#!/bin/awk -f
# Runs before any input is read.
# The opening brace must be on the same line as BEGIN.
BEGIN {
    math = 0
    english = 0
    computer = 0
    printf "NAME NO. MATH ENGLISH COMPUTER TOTAL\n"
    printf "---------------------------------------------\n"
}
# Runs for every input line
{
    math += $3
    english += $4
    computer += $5
    printf "%-6s %-6s %4d %8d %8d %8d\n", $1, $2, $3, $4, $5, $3+$4+$5
}
# Runs after all input has been processed.
# The opening brace must be on the same line as END.
END {
    printf "---------------------------------------------\n"
    printf " TOTAL:%10d %8d %8d \n", math, english, computer
    printf "AVERAGE:%10.2f %8.2f %8.2f\n", math/NR, english/NR, computer/NR
}
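The script reads five columns per line: name, number, and three scores. To try the idea end to end, here is a shortened variant that only accumulates and reports the totals and averages. The score data and the file names score.txt and total.awk are made up for this sketch.

```shell
workdir=$(mktemp -d)

# Made-up score data: name, student number, math, english, computer.
cat > "$workdir/score.txt" <<'EOF'
Marry 2143 78 84 77
Jack 2321 66 78 45
EOF

# A shortened variant of the script: accumulate per line, report in END.
cat > "$workdir/total.awk" <<'EOF'
{
    math += $3
    english += $4
    computer += $5
}
END {
    printf "TOTAL: %d %d %d\n", math, english, computer
    printf "AVERAGE: %.2f %.2f %.2f\n", math/NR, english/NR, computer/NR
}
EOF

report=$(awk -f "$workdir/total.awk" "$workdir/score.txt")

echo "$report"
rm -rf "$workdir"
```

In END, NR still holds the number of lines read, which is why it can be used as the divisor for the averages.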
That script is fairly long, so let's also look at a few hello world one-liners.
echo | awk '{print "Hello,World!"}'
echo | awk 'BEGIN {print "Hello,World!"}'
awk 'BEGIN {print "Hello,World!"}'
echo "hello world" | awk '{print}'
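The reason the first one-liner needs echo while the BEGIN version does not is that a main rule runs once per input line, whereas BEGIN runs before any input is read. A quick sketch that feeds both forms empty input:

```shell
# A main rule runs once per input line, so with empty input it prints nothing.
main_rule=$(awk '{print "Hello,World!"}' < /dev/null)

# BEGIN runs before any input is read, so it prints even with empty input.
begin_rule=$(awk 'BEGIN {print "Hello,World!"}' < /dev/null)

echo "main rule: [$main_rule]"
echo "BEGIN rule: [$begin_rule]"
```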
Awk is very powerful and well suited to operations work, and Java developers can benefit from learning it too. I suggest you bookmark this article!