awk(4) - Introduction to awk

1 Introduction
1.1 Features: awk is a programming language whose programs run directly, without being compiled in advance. It has built-in pipe support: an awk program can send the data it is processing to the shell and read the shell's results back. Pipes make it easy for awk to use system resources.
A common approach is to write several small tools and let shell pipes pass the data from one awk tool to the next, so that a big problem is solved in small steps. If performance matters, these small tools can later be rewritten in C.
sed processes a line of data as a whole; awk splits each line into fields and processes the fields.

1.2 Syntax:
awk 'Condition Type 1 {Instruction 1} Condition Type 2{Instruction 2} ...' filename
The awk program is enclosed in single quotation marks, and the instructions for each condition are enclosed in curly brackets. The program can be followed by one or more file names; if none is given, standard input is used. awk processes each line field by field, and by default fields are separated by spaces or tabs.

1.3 awk processing flow:
1. Read a line and put its data into $0, $1, $2, ... $0 is the entire line; $1, $2, $3, ... are the columns produced by splitting the line on the delimiter.
2. Evaluate each condition to decide whether to execute the "instruction" that follows it. A condition is true when its value is neither 0 nor an empty string. If there is no condition before the braces, the instruction is executed unconditionally.
3. Work through all the "condition type {instruction}" pairs in order.
4. If there is more data, repeat steps 1 to 3 until all lines are processed.
5. If there are multiple files, process them one by one.
6. If there are multiple scripts and one file, each line of data runs through the instructions of every script in sequence.
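The processing flow above can be sketched with a small, hypothetical input (the names and numbers are made up for illustration). The two rules are tried in order for every line; the first has a condition, the second has none:

```shell
# Sample data is invented for this sketch; out holds what awk prints.
out=$(printf 'alice 30\nbob 45\n' | awk '
  $2 > 40 { print NR ": " $1 " is over 40" }       # runs only when the condition is true
          { print NR ": whole line is [" $0 "]" }  # no condition: runs for every line
')
echo "$out"
# 1: whole line is [alice 30]
# 2: bob is over 40
# 2: whole line is [bob 45]
```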

1.4 Condition type
1.4.1 Relational operators
 > < >= <= == !=
such as: x>34{command}
1.4.2 ~ matches, !~ does not match
such as: A~B tests whether string A matches the regular expression B, for example whether A contains B
"banana"~/an/
1.4.3 &&, ||, !
Each of the two condition types above yields a logical value; these values can be combined with &&, ||, and ! to form a new logical value.
1.4.4 Range condition (two conditions separated by a comma)
Example:
FNR>=22 && FNR<=28{print " " $0}
can be written as
FNR==22,FNR==28{print " " $0}
Explanation: For this type of condition, awk sets up a switch. When the first condition is met, the switch is turned on and the instruction is executed for that line and every following line. When the second condition is met, the instruction is executed for that line too, and then the switch is turned off and the instruction stops being executed.
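A minimal sketch of a range condition on ten numbered input lines (the line numbers 3 and 5 are arbitrary choices for illustration). It suggests that the range includes both the line that opens the switch and the line that closes it:

```shell
# Ten input lines containing the numbers 1..10.
nums=$(printf '%s\n' 1 2 3 4 5 6 7 8 9 10)
range=$(echo "$nums" | awk 'FNR==3,FNR==5 {print}')      # range condition
both=$(echo "$nums" | awk 'FNR>=3 && FNR<=5 {print}')    # equivalent && condition
echo "$range"
# 3
# 4
# 5
```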

1.5 awk commands
See http://blog.csdn.net/convict_eva/article/details/74988695


1.6 awk built-in variables
1) NF: the number of columns (fields) in the current line ($0), as split by the delimiter.
2) NR: the number of lines read so far (counting from 1). If there are multiple files, this value keeps accumulating across them.
3) FNR: like NR, except that FNR restarts from 1 each time a new file is opened.
4) FS: the input field separator. The default is a space, which makes awk split on runs of spaces and tabs.
5) OFS: the output field separator, inserted between the comma-separated items of print, as in print $1,$2. The default is a single space.
6) RS: the input line (record) separator. The default is newline (\n).
7) ORS: the output line (record) separator, appended by print. The default is newline (\n).
8) FILENAME: the name of the file being processed, exactly as it was given as an argument after the awk program. When the data comes from a pipe, the value shown in this article is - (standard input); depending on the awk implementation it may be empty instead.
9) OFMT: the output format for numbers. The default is %.6g, that is, at most 6 significant digits,
such as: print 2/3 output: 0.666667
RSTART, RLENGTH: see the match() function of awk, http://blog.csdn.net/convict_eva/article/details/74987793
10) SUBSEP: the subscript separator for arrays. The default is \034.
In fact, arrays in awk accept only strings as subscripts, such as arr["jamin"]. Numbers can still be used, and even multi-dimensional subscripts such as arr[1,22]. Before awk accesses arr[1,22], it replaces the subscript with the string "1\03422" and then uses arr["1\03422"] in place of arr[1,22].
11) ARGV[], ARGC
ARGC is an integer: the number of command-line arguments, counting ARGV[0] but not counting options such as -v and -f, the values that belong to them, or the program text. ARGC can be used to work out how many files were given, and it can also be modified (e.g. ARGC=1). When ARGC is set to 1, awk assumes there is no file to process and does not open the files named in ARGV[1], ARGV[2], ..., but the command-line parameters can still be read through ARGV[1], ARGV[2], ... If several files follow the awk command, ARGC must cover all of them for awk to process them all.
ARGV[] holds the command-line arguments as strings. Counting starts from 0: ARGV[0] is awk, and the parameters start from 1, similar to a shell script.
Example that prints the parameters. The script arg.awk is as follows:
awk 'BEGIN{
        for(i=0;i<ARGC;i++){
            print ARGV[i]
        }
}' $*
Execute:
$ ./arg.awk a b c de
awk
a
b
c
de

Example:
$ echo 'A125 Jenny 100 210' | awk '{print $0 "\n" $1, $2, $3, $4 "\n" "NF=" NF "," "NR=" NR "," "FILENAME=" FILENAME ",FS=" FS}'
A125 Jenny 100 210 #$0
A125 Jenny 100 210 #$1,... $4
NF=4,NR=1,FILENAME=-,FS=
Description: FILENAME=- means no file was specified; the data was read directly from standard input.
NF: the line has 4 columns, so the value is 4.
NR: only one line has been read, so the value is 1.
FS is a space by default. The space is there after FS= at the end of the last output line, although it is invisible unless you select the text in the terminal.
print is awk's output command; by default it writes to the screen. Items separated by commas are output with a space between them (the built-in variable OFS, which can be changed; the default is a space), as shown in the second line of the result above.
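The following sketch illustrates NR, FNR, and SUBSEP. The temporary directory and the file names one and two are assumptions made for the example, and mktemp is assumed to be available:

```shell
dir=$(mktemp -d)
printf 'a\nb\n' > "$dir/one"
printf 'c\n'    > "$dir/two"
# NR keeps counting across files, FNR restarts for each file.
counts=$(cd "$dir" && awk '{print FILENAME, NR, FNR}' one two)
echo "$counts"
# one 1 1
# one 2 2
# two 3 1

# arr[1,22] is stored under the single string key "1" SUBSEP "22".
subsep=$(awk 'BEGIN {
  arr[1,22] = "x"
  key = 1 SUBSEP 22
  if (key in arr) print "same key"
  n = split(key, parts, SUBSEP)
  print n, parts[1], parts[2]
}')
echo "$subsep"
# same key
# 2 1 22
rm -rf "$dir"
```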


2 Using awk
2.1 Writing awk commands in a file
The execution syntax is as follows:
$ awk -f test.awk file_name
$ awk -f test.awk -f test2.awk file_name (for each line of data, the instructions in test.awk are executed first, then the instructions in test2.awk)
Example:
A file named pay has the following content:
A125 Jenny 100 210
A341 Dan 110 215
P158 Max 130 209
P148 John 125 220
A123 Linda 95 210
Description:
Column 1: employee number (A is the assembly department, P is the packaging department)
Column 2: employee name
Column 3: salary
Column 4: working hours
Example 1: Raise the salary of the assembly department by 5%; then, for all employees (not just the assembly department), raise any salary below 100 to 100.
$ awk '$1~/^A.*/{$3 *= 1.05} $3<100{$3=100}{print $1,$2,$3,$4}' pay
A125 Jenny 105 210
A341 Dan 115.5 215
P158 Max 130 209
P148 John 125 220
A123 Linda 100 210
You can also write the awk commands to a file (wage.awk here; commands written in a file can be laid out over several lines, as in the example below):

$1~/^A.*/ {$3 *= 1.05}
$3<100 {$3=100}
{print $1,$2,$3,$4}
Execute the following command, the result is the same:
$ awk -f wage.awk pay

2.2 Using arrays in awk (associative arrays, similar to key-value pairs)
Features of arrays in awk:
1) Subscripts are strings rather than numbers (numbers can be used, and awk converts them to strings internally). Such as: arr["a"]=1
2) Arrays do not need to be declared in advance. (Variables do not need to be declared in advance either.)
Example:
The file reg, which records the courses each student has chosen, is as follows:
Mary OS Arch. Discrete
Steve DS Algorithm Arch.
Wang Discrete Graphics OS
Lisa Graphics AI
Lily Discrete Algorithm
The first column is the student's name, and the remaining columns are the courses chosen. The task is to count how many students chose each course.
The awk program file course.awk is as follows:

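{
    # count every course on the line: columns 2 through NF inclusive
    for(i=2;i<=NF;i++){
        Number[$i]++
    }
}
END {
    # after all lines are read, print each course with its count
    for(course in Number){
        print course,Number[course]
    }
}
$ awk -f course.awk reg
Discrete 3
OS 2
DS 1
Graphics 2
Algorithm 2
Arch. 2
AI 1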
Description:
1. In course.awk, the array Number[] is used without being declared in advance.
2. The first instruction goes through each line from the second column onward and uses the course name as the array subscript for counting. Number[$i]++ and Number[$i]=Number[$i]+1 are equivalent.
3. A $ followed by a variable selects a column by number: when i=2, $i is $2, the second column of data.
4. The second instruction traverses the array; the subscript is the course name and the value is the number of students who chose that course. The order in which for(course in Number) visits the subscripts is not defined, so the output lines may appear in a different order.
5. END is a reserved word in awk and acts as a condition. The END condition is met when awk has finished processing all the data and is about to exit.
6. BEGIN is another reserved word in awk. It is also a condition, the counterpart of END. The instruction attached to BEGIN is executed once, before awk reads the first line of data. A typical use is to change the default separator.
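A small sketch of the array features described above, using invented data. It assumes that piping the result through sort is acceptable when a stable output order is wanted, since the traversal order of for...in is not defined:

```shell
# A numeric subscript and its string form name the same element.
same=$(awk 'BEGIN { arr[1] = "one"; print arr["1"] }')
echo "$same"
# one

# Count the words in columns 2..NF; LC_ALL=C sort makes the order predictable.
counts=$(printf 'x OS DS\ny OS\n' | awk '
  { for (i = 2; i <= NF; i++) count[$i]++ }
  END { for (c in count) print c, count[c] }
' | LC_ALL=C sort)
echo "$counts"
# DS 1
# OS 2
```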


2.3 Calling shell commands from awk
awk can call shell commands and use pipes to pass data to and from the system.
Example: The content of the script online.awk is as follows:
BEGIN{
    # count the lines printed by who; getline returns 1 per line, 0 at end, -1 on error
    while(("who"|getline) > 0) n++
    print n
}
$ awk -f online.awk
Description:
1) awk does not have to process a text file; the command above runs awk without any file name.
2) | is awk's pipe symbol. awk treats the string "who" before the pipe as a shell command and sends it to the shell for execution. The output of the command is sent back to the awk program through the pipe.
3) awk has only one input instruction: getline. It has two output instructions: print and printf.
For awk getline, refer to http://blog.csdn.net/convict_eva/article/details/74989777
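A hedged sketch of reading command output with getline. The command string built from echo is a stand-in for who, chosen only so that the result is predictable; close() is called so that the command could be run again later:

```shell
out=$(awk 'BEGIN {
  cmd = "echo a; echo b; echo c"
  # getline returns 1 for each line read, 0 at end of output, -1 on error.
  while ((cmd | getline line) > 0) n++
  close(cmd)
  print n, line      # line still holds the last line that was read
}')
echo "$out"
# 3 c
```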


2.4 Output to file
> writes to a newly created file
>> appends to a file
Example:
The script outfile.awk is as follows:
BEGIN{
    print "this is test line." > "outfile1"
    print "this is first line." > "outfile1"
    print "this is second line." >> "outfile1"
}
$ awk -f outfile.awk 
Execution result: all three lines of data are written to the file outfile1.
Instructions:
1. The file name must be enclosed in "". Without the quotes it is treated as a variable; awk variables need not be declared in advance, and an unset variable has the value empty string or 0.
2. With >, awk creates a new file (emptying any existing file) the first time the file is written during a run, and later writes in the same run are appended to it; a new file is not created for every print. With >>, if the file already exists the output is appended to it; if the file does not exist, it is created and then appended to.
So however many times the script above is executed, outfile1 contains only 3 lines. If every > in the script is replaced with >>, the output of each execution is kept, and the file grows by 3 lines per run.
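The difference between > and >> across runs can be checked with a sketch like the following. The temporary directory is an assumption of the example (mktemp is assumed to be available), and the file name is passed in with -v:

```shell
dir=$(mktemp -d)
f="$dir/outfile1"
# Run the same program twice with >: each run starts the file afresh.
for run in 1 2; do
  awk -v f="$f" 'BEGIN { print "first" > f; print "second" > f }'
done
after_truncate=$(awk 'END { print NR }' "$f")
# Run a program twice with >>: each run adds to the existing file.
for run in 1 2; do
  awk -v f="$f" 'BEGIN { print "again" >> f }'
done
after_append=$(awk 'END { print NR }' "$f")
echo "$after_truncate $after_append"
# 2 4
rm -rf "$dir"
```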


2.5 Using system resources
The file sort contains the following:
1 9
3 7
5 5
7 3
2 8
4 6
6 4
8 2
0 10
Sort the data numerically by the second column and append the result to the file outfile1:
$ awk '{print $1,$2 | "sort -n -k 2 -t\" \" >> outfile1"}' sort
Explanation: All the print statements write to the same pipe. However many lines there are, sort receives the complete data and sorts it only after awk has executed every print and closed the pipe.
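A minimal sketch of the same idea with three of the sample lines. It assumes a sort command on the PATH; the explicit close() is an addition for illustration, so that the sorted data is written before the END message:

```shell
out=$(printf '1 9\n3 7\n5 5\n' | awk '
  { print $1, $2 | "sort -n -k 2" }   # every print goes to the same pipe
  END {
    close("sort -n -k 2")             # sort sees end of input and emits its result
    print "done"
  }
')
echo "$out"
# 5 5
# 3 7
# 1 9
# done
```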


2.6 Write awk program in shell script
Example: Simulate the cat command
The content of the script file p is as follows:

awk '{
    print
}' $*
$./p test
Description:
1) The p file must have execute permission.
2) Two spaces in the p file are required: i) the space between awk and ' on the first line; ii) the space between }' and $*.
3) The shell does not understand awk instructions, so the awk program must be wrapped in ''.
4) Always use "" to wrap strings inside awk programs, never '', to avoid confusion with the shell's quoting.
5) $* is shell syntax; it stands for all the parameters given after the command.
6) You can specify multiple files, and awk processes them in the order given. An error is reported if a file cannot be opened (it does not exist, or there is no permission). (If the program consists only of a BEGIN condition, it is executed without opening any file, so a missing file causes no error.)
7) If awk is executed without any file name, standard input is used as the source of the data.
Such as: execute
$ ./p
or
$ awk '{print}' # i.e. print $0; the $0 can be left out. Whatever is typed in is printed.

2.7 Modify the separator of awk
2.7.1 By default awk splits fields on spaces (or tabs). You can set FS to change the separator string. It must be set in BEGIN; otherwise the new value does not take effect for the first line of data.
Example:
$ awk 'BEGIN{FS=":"} {print $1}' /etc/passwd # If FS is set without BEGIN, the first line is printed whole.
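A sketch of the usual ways to set the separator, on invented colon-separated input. The -F option is standard awk but is not mentioned above, so treat it as an addition:

```shell
data=$(printf 'a:b\nc:d')
in_begin=$(echo "$data" | awk 'BEGIN { FS = ":" } { print $1 }')
with_F=$(echo "$data" | awk -F: '{ print $1 }')
# Setting FS inside the main rule comes too late for the line already read,
# so the first line is normally printed whole.
too_late=$(echo "$data" | awk '{ FS = ":"; print $1 }')
echo "$in_begin"
# a
# c
```

Printing $too_late next to $in_begin shows how the first line is treated by the awk at hand.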

2.8 awk call function
2.8.1 awk internal function
Example:

awk '
BEGIN{
    x=1
    y=2
    test_f(x)
    print a,x,y
}
function test_f(x){
    a=3
    x+=1
    y=3
    print a,x,y
}
' $*
Execution result:
3 2 3
3 1 3
Explanation:
1) Any variable of the main program can be used within the function.
2) Variables set within the function (except parameters) can also be used outside the function.
3) Whether it appears in the function or in the main program, a variable with the same name is the same global variable (except for function parameters).
4) The disadvantage of this feature is that calling a function can modify the main program's own variables, as happens to y in the example. A value passed as a parameter, such as x in the example, can be modified inside the function without affecting the value in the main program.
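awk has no keyword for local variables. A common idiom, sketched here with hypothetical names (add_one, tmp, total), is to list extra parameters that the caller never passes; they then behave as local variables:

```shell
out=$(awk '
  function add_one(x,    tmp) {   # tmp is local because it is an extra parameter
    tmp = x + 1
    total = tmp                   # total is not a parameter, so it is global
    return tmp
  }
  BEGIN {
    tmp = "untouched"
    r = add_one(5)
    print r, total, tmp           # the global tmp is not changed by the call
  }
')
echo "$out"
# 6 6 untouched
```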

2.8.2 Use system("cmd") to call shell-defined functions

#!/bin/bash
function test(){
    echo "this is test" $1
}
export -f test
awk '
    BEGIN{
        system("test abc")
        "test abc" | getline r;print "r:",r
    }
'
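Execution result:
this is test abc
r: this is test abc
Description: the pipe uses the standard output of the shell command as the input of awk. Exported functions are a bash feature, so this example works only when the shell that awk starts for system() and for pipes is bash.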

2.9 Handling multiple lines.

The awk built-in variable RS (default \n) determines how the input is split into lines (records): awk cuts the input document at each occurrence of RS and processes the pieces one by one. Changing this value changes the way awk splits lines.
2.9.1 RS="" makes awk split the document at blank lines, so that each block of text between blank lines is treated as one line.
Example:
The file sign is as follows (the entries are separated by blank lines):

name   time
zhang  8:50


li 9:00
wang 9:10

zhou 8:10

xie 9:30
Split on blank lines and print each record:
$ awk 'BEGIN{RS=""}{print $0,NR,"line"}' sign
name   time
zhang  8:50 1 line
li 9:00
wang 9:10 2 line
zhou 8:10 3 line
xie 9:30 4 line
1. Several lines delimited by blank lines are processed as a single line (record).
2. awk ignores blank lines at the beginning and end of the document.
3. It is easiest to think of \n as an ordinary character inside such a record. If FS is then set to \n, each original line becomes one field, so the data can still be handled line by line.
4. RS can also be a regular expression (supported by gawk and some other implementations; not guaranteed by every awk).
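A minimal sketch of note 3, with RS set to the empty string and FS set to \n so that each line of a block becomes one field. The two-line records are invented for the example:

```shell
out=$(printf 'zhang\n8:50\n\nli\n9:00\n' | awk '
  BEGIN { RS = ""; FS = "\n" }              # blank lines separate records, newlines separate fields
  { print NR ": name=" $1 ", time=" $2 }
')
echo "$out"
# 1: name=zhang, time=8:50
# 2: name=li, time=9:00
```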

Application: find a keyword in a file and, for each match, print from 10 lines before it to 15 lines after it.

awk '
BEGIN{
    file_name=ARGV[1]
    keyword=ARGV[2]
    # cat -n numbers the lines, grep keeps the matching ones, the inner awk prints their line numbers
    while(("cat -n " file_name " | grep " keyword " | awk \x27{print $1}\x27 " | getline num) > 0){
        printLine(num)
    }
}
function printLine(num){
    start=num-10
    if(start<1) start=1    # sed line addresses start at 1
    end=num+15
    print "----------------------------------------------------------------------"
    system("sed -n \x27"start","end "p\x27 " file_name)
    print "----------------------------------------------------------------------"
}
' $*
Execution:
$./awk file_name keyword
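Description:
1. \x27 is the escape sequence for a single quote. A literal single quote cannot be written inside the awk program because the whole program is wrapped in single quotes for the shell.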

