linux awk

Introduction
awk is a powerful text-analysis tool. Where grep searches and sed edits, awk is strongest at analyzing data and generating reports. Put simply, awk reads a file line by line, splits each line into fields using whitespace as the default delimiter, and then analyzes and processes those fields.

There are three common versions of awk: awk, nawk and gawk. Unless stated otherwise, awk here refers to gawk, the GNU version of AWK.

awk takes its name from the initials of the surnames of its creators, Alfred Aho, Peter Weinberger and Brian Kernighan. AWK is in fact a language in its own right: the AWK programming language, which its three creators describe as a "pattern scanning and processing language". It lets you write short programs that read input files, sort data, process data, perform calculations on the input, and generate reports, among many other things.

 

Usage
awk 'pattern {action}' {filenames}
Although the operations can be complex, the syntax always has this shape: pattern is what awk looks for in the data, and action is the sequence of commands to execute when a match is found. Curly braces ({}) need not appear everywhere in a program; they group the instructions that belong to a particular pattern. A pattern is typically a regular expression enclosed in slashes.
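
The shapes a rule can take (pattern only, action only, pattern plus action) can be sketched on a few made-up lines. The sample text and the shell variable names below are assumptions chosen for illustration only.

```shell
# Hypothetical three-line input; this data does not come from a real system.
data='root console
alice pts/0
root pts/1'

# Pattern only: matching lines are printed unchanged.
pattern_only=$(printf '%s\n' "$data" | awk '/root/')

# Action only: the action runs on every line.
action_only=$(printf '%s\n' "$data" | awk '{print $2}')

# Pattern plus action: the action runs only on matching lines.
both=$(printf '%s\n' "$data" | awk '/root/ {print $2}')

printf '%s\n---\n%s\n---\n%s\n' "$pattern_only" "$action_only" "$both"
```

Capturing each result in a variable is only a convenience for comparing the three forms side by side; in everyday use the awk command is normally run directly.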

The most basic job of the awk language is to scan files or strings and extract information according to specified rules. Once awk has extracted the information, further text operations can be performed on it. Complete awk scripts are often used to format the information in text files.

awk normally processes a file one line at a time: it receives each line in turn and executes the corresponding commands to process the text.

 


There are three ways to call awk


1. Command line mode
awk [-F field-separator] 'commands' input-file(s)
Here commands are the actual awk commands, [-F field-separator] is optional, and input-file(s) are the files to process.
Within each line of the file, every item delimited by the field separator is called a field. When -F is not given, the default field separator is whitespace (spaces or tabs).
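
As a small sketch of the difference between the default separator and an explicit -F, the following splits two made-up lines. The input strings and variable names are illustrative assumptions.

```shell
# Default splitting: runs of spaces and tabs act as a single separator.
default_split=$(printf 'a   b\tc\n' | awk '{print NF}')

# Explicit separator: -F, makes the comma the field separator,
# so the space inside "y z" no longer splits anything.
comma_split=$(printf 'x,y z,w\n' | awk -F, '{print $2}')

echo "$default_split"   # number of whitespace-separated fields
echo "$comma_split"     # second comma-separated field
```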

2. The shell script method
Put all the awk commands into a file, make that file executable, and name the awk command interpreter on the first line of the script; the program is then run by typing the script name.
That is, where the first line of a shell script would be: #!/bin/sh
it is replaced with: #!/bin/awk -f (the path to awk may differ between systems)

3. Put all the awk commands into a separate file, then call:
awk -f awk-script-file input-file(s)
where the -f option loads the awk script from awk-script-file, and input-file(s) is the same as above.
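
A minimal sketch of the third method, assuming mktemp is available to create a scratch file. The program text and the passwd-style input are made up for illustration.

```shell
# Write a small awk program to a temporary file (name chosen by mktemp).
script=$(mktemp)
cat > "$script" <<'EOF'
# Print the first colon-separated field of every line.
BEGIN { FS = ":" }
{ print $1 }
EOF

# Run the program with -f; the input is hypothetical passwd-style text.
names=$(printf 'root:x:0\ndaemon:x:1\n' | awk -f "$script")
rm -f "$script"

echo "$names"
```

Keeping the program in a file becomes worthwhile once it grows beyond what is comfortable to quote on a single command line.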

This chapter focuses on the command line method.

 

Introductory examples
Assume the output of last -n 5 is as follows:

[root@www ~]# last -n 5 <== show only the first five lines
root pts/1 192.168.1.100 Tue Feb 10 11:21 still logged in
root pts/1 192.168.1.100 Tue Feb 10 00:46 - 02:28 (01:41)
root pts/1 192.168.1.100 Mon Feb 9 11:41 - 18:30 (06:48)
dmtsai pts/1 192.168.1.100 Mon Feb 9 11:41 - 11:41 (00:00)
root tty1 Fri Sep 5 14:09 - 14:10 (00:01)
To show only the accounts of the five most recent logins:

#last -n 5 | awk '{print $1}'
root
root
root
dmtsai
root
The awk workflow is as follows: read one record, records being separated by the newline character '\n'; split the record into fields according to the specified field separator; $0 then refers to the whole record, $1 to the first field and $n to the nth field. The default field separator is whitespace (spaces or tabs), so $1 is the logged-in user, $3 is the user's IP address, and so on.
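
To make the field numbering concrete, here is a sketch that feeds one record in the same layout as the last output through awk. The shell variable names are assumptions for illustration.

```shell
# One record copied from the sample last output above.
line='root pts/1 192.168.1.100 Tue Feb 10 11:21 still logged in'

user=$(printf '%s\n' "$line" | awk '{print $1}')    # first field
ip=$(printf '%s\n' "$line" | awk '{print $3}')      # third field
whole=$(printf '%s\n' "$line" | awk '{print $0}')   # the entire record

echo "$user"
echo "$ip"
echo "$whole"
```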

 

To display only the account names in /etc/passwd:

#cat /etc/passwd |awk -F ':' '{print $1}'
root
daemon
bin
sys
This is an example of awk with only an action: the action {print $1} is executed for every line.

-F specifies the field separator as ':'.

 

To display only the accounts in /etc/passwd and their corresponding shells, separated by a tab:

#cat /etc/passwd |awk -F ':' '{print $1"\t"$7}'
root /bin/bash
daemon /bin/sh
bin /bin/sh
sys /bin/sh

To display only the accounts in /etc/passwd and their shells separated by commas, with the column header name,shell added as the first line and "blue,/bin/nosh" added as the last line:


cat /etc/passwd |awk -F ':' 'BEGIN {print "name,shell"} {print $1","$7} END {print "blue,/bin/nosh"}'
name,shell
root,/bin/bash
daemon,/bin/sh
bin,/bin/sh
sys,/bin/sh
....
blue,/bin/nosh

The awk workflow is as follows: BEGIN is executed first; then the file is read one record at a time, records being separated by the newline character '\n'. Each record is split into fields according to the specified field separator, so that $0 is the whole record, $1 the first field and $n the nth field, and then the action corresponding to the matching pattern is executed. awk then reads the second record, and so on until all records have been read; finally the END block is executed.
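
The order of execution can be sketched with a two-line made-up input: BEGIN runs once before any input, the main action runs once per record, and END runs once after the last record. The strings printed here are illustrative only.

```shell
# Two hypothetical input records, "a" and "b".
order=$(printf 'a\nb\n' | awk '
    BEGIN { print "start" }          # before the first record is read
          { print "record: " $0 }    # once for every record
    END   { print "end" }            # after the last record
')

echo "$order"
```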

 

Search /etc/passwd for all lines with the root keyword

#awk -F: '/root/' /etc/passwd
root:x:0:0:root:/root:/bin/bash
This is an example of using a pattern: the action is executed only for lines that match the pattern (here, root). If no action is specified, the whole line is printed by default.

The search supports regular expressions, for example, look for those starting with root: awk -F: '/^root/' /etc/passwd
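
The following sketch contrasts an unanchored pattern with an anchored one on made-up passwd-style lines. The third command goes one step beyond the text above and uses awk's ~ operator to test a single field instead of the whole line; the data and variable names are assumptions.

```shell
# Hypothetical passwd-style records.
data='root:x:0:0:root:/root:/bin/bash
operator:x:11:0:operator:/root:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin'

# "root" anywhere in the line (operator matches through its home directory).
anywhere=$(printf '%s\n' "$data" | awk -F: '/root/ {print $1}')

# "root" only at the start of the line.
at_start=$(printf '%s\n' "$data" | awk -F: '/^root/ {print $1}')

# Match against one field only: shells ending in "nologin".
no_login=$(printf '%s\n' "$data" | awk -F: '$7 ~ /nologin$/ {print $1}')

printf '%s\n---\n%s\n---\n%s\n' "$anywhere" "$at_start" "$no_login"
```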

 

Search /etc/passwd for all lines with the root keyword and display the corresponding shell

# awk -F: '/root/{print $7}' /etc/passwd
/bin/bash
Here the action {print $7} is specified.

 

awk built-in variables
awk has many built-in variables that hold environment information. These variables can be changed; some of the most commonly used are listed below.


ARGC      Number of command-line arguments
ARGV      Array of command-line arguments
ENVIRON   Array of the system environment variables
FILENAME  Name of the file awk is currently reading
FNR       Record number within the current file
FS        Input field separator, equivalent to the command-line -F option
NF        Number of fields in the current record
NR        Number of records read so far
OFS       Output field separator
ORS       Output record separator
RS        Input record separator

In addition, the $0 variable refers to the entire record. $1 represents the first field of the current line, $2 represents the second field of the current line, and so on.
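
A short sketch of several of these variables working together, on made-up input. FS and OFS are set in BEGIN, and $NF (the field whose number is NF, i.e. the last field) is used as a convenience that the list above does not mention.

```shell
# Two hypothetical records with different numbers of fields.
data='a:b:c
d:e'

# Print record number, field count, first field and last field,
# joined by the output field separator "-".
report=$(printf '%s\n' "$data" | awk 'BEGIN {FS=":"; OFS="-"} {print NR, NF, $1, $NF}')

echo "$report"
```

Setting FS inside BEGIN has the same effect here as passing -F ':' on the command line.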

 

Print, for each line of /etc/passwd, the file name, the line number, the number of columns and the complete line content:

#awk -F ':' '{print "filename:" FILENAME ",linenumber:" NR ",columns:" NF ",linecontent:"$0}' /etc/passwd
filename:/etc/passwd,linenumber:1,columns:7,linecontent:root:x:0:0:root:/root:/bin/bash
filename:/etc/passwd,linenumber:2,columns:7,linecontent:daemon:x:1:1:daemon:/usr/sbin:/bin/sh
filename:/etc/passwd,linenumber:3,columns:7,linecontent:bin:x:2:2:bin:/bin:/bin/sh
filename:/etc/passwd,linenumber:4,columns:7,linecontent:sys:x:3:3:sys:/dev:/bin/sh

Using printf instead of print can make the code more concise and readable

awk -F ':' '{printf("filename:%s,linenumber:%s,columns:%s,linecontent:%s\n",FILENAME,NR,NF,$0)}' /etc/passwd

print and printf
awk provides both print and printf for producing output.

The arguments to print can be variables, numbers or strings. Strings must be enclosed in double quotes, and arguments are separated by commas. Without the commas, the arguments are concatenated with nothing between them. The comma stands for the output field separator (OFS), which is a space by default.

printf is used in much the same way as printf in the C language and formats its output according to a format string. When the output is complex, printf is easier to use and makes the code easier to understand.
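
The difference between the comma, plain concatenation and printf can be sketched as follows. The literal values and variable names are arbitrary illustrations.

```shell
# Comma: arguments are joined by OFS, a single space by default.
with_comma=$(awk 'BEGIN {print "a", "b"}')

# No comma: the strings are concatenated directly.
no_comma=$(awk 'BEGIN {print "a" "b"}')

# Changing OFS changes what the comma produces.
custom_ofs=$(awk 'BEGIN {OFS=":"; print "a", "b"}')

# printf: C-style format string (left-justified string, padded integer, 2 decimals).
formatted=$(awk 'BEGIN {printf("%-5s|%3d|%.2f\n", "ab", 7, 3.14159)}')

printf '%s\n%s\n%s\n%s\n' "$with_comma" "$no_comma" "$custom_ofs" "$formatted"
```

Unlike print, printf does not append a newline on its own, which is why the format string ends in \n.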

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

awk programming
Variables and assignment

In addition to awk's built-in variables, awk can also customize variables.

The following counts the number of accounts in /etc/passwd

awk '{count++;print $0;} END{print "user count is ", count}' /etc/passwd
root:x:0:0:root:/root:/bin/bash
......
user count is 40
count is a custom variable. The earlier action blocks contained only a single print, but print is just a statement, and an action {} can contain several statements separated by ;.

 

count is not initialized in the example above. Although it defaults to 0, it is better practice to initialize it to 0 explicitly:

awk 'BEGIN {count=0;print "[start]user count is ", count} {count=count+1;print $0;} END{print "[end]user count is ", count}' /etc/passwd
[start]user count is 0
root:x:0:0:root:/root:/bin/bash
...
[end]user count is 40
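
As a sketch of the default just mentioned: a variable that has never been assigned behaves as 0 in a numeric context and as the empty string in a string context. The variable names x and n are hypothetical.

```shell
# Unset variable used as a number.
as_number=$(awk 'BEGIN {print x + 0}')

# Unset variable used as a string (brackets make the empty value visible).
as_string=$(awk 'BEGIN {print "[" x "]"}')

# A counter therefore works without explicit initialization.
running=$(printf 'a\nb\nc\n' | awk '{n++} END {print n}')

printf '%s\n%s\n%s\n' "$as_number" "$as_string" "$running"
```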

Count the number of bytes occupied by files in a folder

ls -l |awk 'BEGIN {size=0;} {size=size+$5;} END{print "[end]size is ", size}'
[end]size is 8657198

If displayed in units of M:

ls -l |awk 'BEGIN {size=0;} {size=size+$5;} END{print "[end]size is ", size/1024/1024,"M"}'
[end]size is 8.25889 M
Note that the statistics do not include the subdirectories of the folder.

 

Conditional statements

The conditional statements in awk are borrowed from the C language and take the following forms:


if (expression) {
statement;
statement;
... ...
}

if (expression) {
statement;
} else {
statement2;
}

if (expression) {
statement1;
} else if (expression1) {
statement2;
} else {
statement3;
}
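
A minimal runnable instance of the if / else if / else form, classifying made-up sizes in bytes. The thresholds and labels are arbitrary assumptions.

```shell
# Hypothetical sizes in bytes, one per line.
data='10
5000
2000000'

labels=$(printf '%s\n' "$data" | awk '{
    if ($1 < 1024) {
        print $1, "small"        # below 1 KiB
    } else if ($1 < 1048576) {
        print $1, "medium"       # below 1 MiB
    } else {
        print $1, "large"
    }
}')

echo "$labels"
```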

Count the number of bytes occupied by files in a folder, excluding entries whose size is 4096 (usually directories):

ls -l |awk 'BEGIN {size=0;print "[start]size is ", size} {if($5!=4096){size=size+$5;}} END{print "[end]size is ", size/1024/1024,"M"}'
[end]size is 8.22339 M

Loop statements

The loop statements in awk are also borrowed from the C language: while, do/while, for, break and continue are supported, with the same semantics as in C.
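
The loop keywords can be sketched in three small programs that need no input file, since everything runs in BEGIN. The variable names and bounds are illustrative assumptions.

```shell
# for: sum the integers 1..5.
total=$(awk 'BEGIN {for (i = 1; i <= 5; i++) sum += i; print sum}')

# while with break and continue: collect the odd numbers below 8.
odds=$(awk 'BEGIN {
    i = 0
    while (1) {
        i++
        if (i >= 8) break            # leave the loop
        if (i % 2 == 0) continue     # skip even numbers
        if (out != "") out = out " "
        out = out i
    }
    print out
}')

# do/while: the body always runs at least once.
countdown=$(awk 'BEGIN {n = 3; do {s = s n; n--} while (n > 0); print s}')

printf '%s\n%s\n%s\n' "$total" "$odds" "$countdown"
```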

 

Arrays

Because array subscripts in awk can be numbers or strings, they are usually called keys. Keys and values are stored in an internal hash table. Since a hash table does not keep its entries in sequence, the contents of an array may not be displayed in the order you expect. Like variables, arrays are created automatically when first used, and awk determines automatically whether they store numbers or strings. Arrays in awk are generally used to collect information from records: computing sums, counting words, tracking how many times a pattern is matched, and so on.
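
A sketch of an array keyed by strings, counting how often each shell appears in made-up input. It uses the for (key in array) form to visit the keys; because that traversal order is unspecified, the result is piped through sort to make it predictable.

```shell
# Hypothetical "account shell" pairs.
data='alice /bin/bash
bob /bin/sh
carol /bin/bash'

# n[$2]++ creates the entry on first use and increments it afterwards.
counts=$(printf '%s\n' "$data" | awk '{n[$2]++} END {for (s in n) print s, n[s]}' | sort)

echo "$counts"
```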

 

Show the accounts in /etc/passwd:


awk -F ':' 'BEGIN {count=0;} {name[count] = $1;count++;} END{for (i = 0; i < NR; i++) print i, name[i]}' /etc/passwd
0 root
1 daemon
2 bin
3 sys
4 sync
5 games
 …

Here a for loop is used to iterate over the array

 

There is a lot of content in awk programming. Only simple and common usages are listed here. For more information, please refer to http://www.gnu.org/software/gawk/manual/gawk.html
