Linux Advanced: awk in Brief

Brief introduction

awk is a powerful text-analysis tool. Where grep searches and sed edits, awk is particularly strong at data analysis and report generation. Put simply, awk reads a file line by line, splits each line into fields using whitespace as the default delimiter, and then processes the resulting pieces.

There are three versions of awk: awk, nawk, and gawk. Unless otherwise stated, awk here refers to gawk, the GNU implementation of AWK.

The name awk comes from the first letters of the surnames of its creators: Alfred Aho, Peter Weinberger, and Brian Kernighan. AWK is in fact a language of its own, the AWK programming language, which its three creators formally defined as a "pattern scanning and processing language." It lets you write short programs that read input files, sort data, process data, perform calculations on the input, and generate reports, among many other things.

Usage

awk 'pattern {action}' {filenames}

Although the operations can be complex, the syntax always takes this form: pattern is what awk looks for in the data, and action is the series of commands executed when a match is found. Curly braces ({}) need not always appear in a program; they group a series of instructions under a particular pattern. The pattern is typically a regular expression, enclosed in slashes.

The most basic function of the awk language is to browse files or strings and extract information from them according to specified rules; once the information is extracted, other text operations can follow. Complete awk scripts are typically used to format the information in text files.

Usually, awk processes a file one line at a time: it receives each line of the file and then runs the corresponding commands to process the text.
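As a minimal sketch of the pattern/action form described above, the following runs three small programs against inline sample data: a pattern with no action, an action with no pattern, and both together. The sample lines and the shell variable names are made up for illustration.

```shell
# Hypothetical sample data: three lines of "name role".
sample='alice admin
bob user
carol admin'

# Pattern only: matching lines are printed unchanged (the default action).
pattern_only=$(printf '%s\n' "$sample" | awk '/admin/')

# Action only: the action runs for every line.
action_only=$(printf '%s\n' "$sample" | awk '{ print $1 }')

# Pattern and action: the action runs only for matching lines.
both=$(printf '%s\n' "$sample" | awk '/admin/ { print $1 }')

printf '%s\n---\n%s\n---\n%s\n' "$pattern_only" "$action_only" "$both"
```

Piping the data in with printf keeps the sketch independent of any file on disk; a file name after the program would work the same way.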

Invoking awk

There are three ways to invoke awk:

1. Command line

awk [-F field-separator] 'commands' input-file(s)

Here, commands are the actual awk commands, and [-F field-separator] is optional. input-file(s) are the files to be processed.

In awk, each item in a line that is delimited by the field separator is called a field. When no -F field separator is given, the default field separator is whitespace.
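A small sketch of the two cases just described, using inline input rather than a real file. The input strings and variable names are illustrative only.

```shell
# With no -F, runs of spaces and tabs separate the fields.
default_fs=$(printf 'one   two\tthree\n' | awk '{ print $2 }')

# With -F ':', each colon separates the fields
# (the sample line is modeled on an /etc/passwd entry).
colon_fs=$(printf 'root:x:0:0:root:/root:/bin/bash\n' | awk -F ':' '{ print $7 }')

echo "$default_fs"
echo "$colon_fs"
```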

2. Shell-style script

Put all the awk commands into a file, make the file executable, and name the awk interpreter on the first line of the script; the script can then be invoked by typing its name.

This is the counterpart of the first line of a shell script: #!/bin/sh

For an awk script it is replaced with: #!/bin/awk -f (the -f makes awk read the script file as its program; the path to awk may differ on your system)

3. Put all the awk commands into a separate file, then call:

awk -f awk-script-file input-file(s)

Here the -f option loads the awk script from awk-script-file; input-file(s) is the same as above.
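The third form might look like the sketch below. It assumes mktemp is available to create a scratch file; the program text and the two input lines are made up for illustration.

```shell
# Write a small awk program to a temporary file (the name is chosen by mktemp).
script=$(mktemp)
cat > "$script" <<'EOF'
# Print the first colon-separated field of every line.
BEGIN { FS = ":" }
{ print $1 }
EOF

# Run the program from the file with -f; the input comes from stdin here.
script_out=$(printf 'root:x:0\ndaemon:x:1\n' | awk -f "$script")
echo "$script_out"

rm -f "$script"
```

Keeping the program in a file avoids shell quoting problems once a script grows beyond a line or two.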

This chapter focuses on the command line.

Starter examples

----------------------------------------------------------------------------------------------------------------

1. Suppose last -n 5 outputs only the following five lines:

[root@www ~]# last -n 5

root pts/1 192.168.1.100 Tue Feb 10 11:21 still logged in

root pts/1 192.168.1.100 Tue Feb 10 00:46 - 02:28 (01:41)

root pts/1 192.168.1.100 Mon Feb 9 11:41 - 18:30 (06:48)

dmtsai pts/1 192.168.1.100 Mon Feb 9 11:41 - 11:41 (00:00)

root tty1 Fri Sep 5 14:09 - 14:10 (00:01)

2. Display only the accounts of the five most recent logins:

#last -n 5 | awk '{print $1}'

root

root

root

dmtsai

root

The awk workflow is as follows: read one record, delimited by the newline character '\n'; split the record into fields according to the specified field separator; and populate the fields, where $0 represents the entire record, $1 the first field, and $n the n-th field. The default field separator is a space or a tab, so $1 is the logged-in user, $3 is the user's login IP, and so on.
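To make the record and field splitting concrete, here is a short sketch that feeds awk a single line modeled on the last output above. The labels and variable names are illustrative.

```shell
# One record, modeled on a line of `last` output.
line='root pts/1 192.168.1.100 Tue Feb 10 11:21 still logged in'

fields_out=$(printf '%s\n' "$line" | awk '{
    print "record: " $0    # $0 is the entire record
    print "user: " $1      # $1 is the first field
    print "ip: " $3        # $3 is the third field
}')
echo "$fields_out"
```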

3. Display only the accounts in /etc/passwd:

#cat /etc/passwd |awk -F ':' '{print $1}'

root

daemon

bin

sys

This is an example of awk with only an action: the action {print $1} is executed for every line. -F sets the field separator to ':'.

4. Display the accounts in /etc/passwd together with each account's shell, separated by a tab:

#cat /etc/passwd |awk -F ':' '{print $1"\t"$7}'

root /bin/bash

daemon /bin/sh

bin /bin/sh

sys /bin/sh

6. Display the accounts in /etc/passwd together with each account's shell, separated by a comma; add a header line "name,shell" before all rows, and add "blue,/bin/nosh" as the last line.

cat /etc/passwd |awk -F ':' 'BEGIN {print "name,shell"} {print $1","$7} END {print "blue,/bin/nosh"}'

name,shell

root,/bin/bash

daemon,/bin/sh

bin,/bin/sh

sys,/bin/sh

....

blue,/bin/nosh

In this case, the awk workflow is as follows: first execute BEGIN; then read the file, taking one record delimited by the newline character \n, splitting the record into fields by the specified field separator, and populating the fields, where $0 represents the entire record, $1 the first field, and $n the n-th field; then execute the action that corresponds to the pattern. awk then reads the second record, and so on until all records have been read, and finally executes the END block.
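The order of execution can be seen in a minimal sketch like this one, which uses two made-up input lines and prints a marker from each block:

```shell
order_out=$(printf 'a\nb\n' | awk '
    BEGIN { print "begin" }        # runs once, before any input is read
          { print "record " $0 }   # runs once for each record
    END   { print "end" }          # runs once, after the last record
')
echo "$order_out"
```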

7. Find every line in /etc/passwd that contains the keyword root:

#awk -F: '/root/' /etc/passwd

root:x:0:0:root:/root:/bin/bash

This is an example of using a pattern: the action is executed only for lines that match the pattern (here, root). No action is specified, so the default applies, which is to print each matching line.

The search supports regular expressions. For example, to find lines that begin with root:

awk -F: '/^root/' /etc/passwd

8. Find every line in /etc/passwd that contains the keyword root and display the corresponding shell:

# awk -F: '/root/{print $7}' /etc/passwd

/bin/bash

Here the action {print $7} is specified explicitly.
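The difference between /root/ and /^root/ shows up when a line contains root somewhere other than at the start. The sketch below uses three hypothetical passwd-style lines (not taken from a real system) to illustrate it.

```shell
# Hypothetical passwd-style entries; "operator" has /root as its home directory.
passwd_sample='root:x:0:0:root:/root:/bin/bash
operator:x:11:0:operator:/root:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin'

# /root/ matches any line containing "root" anywhere.
contains_root=$(printf '%s\n' "$passwd_sample" | awk -F: '/root/ { print $1 }')

# /^root/ matches only lines that begin with "root".
starts_root=$(printf '%s\n' "$passwd_sample" | awk -F: '/^root/ { print $1 }')

echo "$contains_root"
echo "$starts_root"
```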

awk built-in variables

---------------------------------------------------------------------------------------------------------------

awk has many built-in variables that hold environment information. These variables can be changed; some of the most commonly used are:

ARGC        number of command-line arguments

ARGV        array of the command-line arguments

ENVIRON     array of the system environment variables

FILENAME    name of the file awk is currently reading

FNR         record number within the current file

FS          input field separator, equivalent to the command-line option -F

NF          number of fields in the current record

NR          number of records read so far

OFS         output field separator (a space by default)

ORS         output record separator (a newline by default)

RS          input record separator (a newline by default)

In addition, $0 refers to the entire record, $1 to the first field of the current line, $2 to the second field of the current line, and so on.

1. For /etc/passwd, print the file name, the line number of each line, the number of columns in each line, and the full content of the line:
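A brief sketch of a few of these variables in action, using two made-up colon-separated lines. Setting FS in BEGIN is shown as an alternative to -F, and OFS is changed so its effect on print is visible.

```shell
vars_out=$(printf 'a:b:c\nd:e\n' | awk '
    BEGIN { FS = ":"; OFS = "-" }   # FS: same effect as -F ":"; OFS joins print arguments
    { print NR, NF, $1 }            # record number, field count, first field
')
echo "$vars_out"
```

The second line has only two fields, so NF differs from record to record while NR keeps counting up.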

#awk -F ':' '{print "filename:" FILENAME ",linenumber:" NR ",columns:" NF ",linecontent:"$0}' /etc/passwd

filename:/etc/passwd,linenumber:1,columns:7,linecontent:root:x:0:0:root:/root:/bin/bash

filename:/etc/passwd,linenumber:2,columns:7,linecontent:daemon:x:1:1:daemon:/usr/sbin:/bin/sh

filename:/etc/passwd,linenumber:3,columns:7,linecontent:bin:x:2:2:bin:/bin:/bin/sh

filename:/etc/passwd,linenumber:4,columns:7,linecontent:sys:x:3:3:sys:/dev:/bin/sh

2. Using printf instead of print can make the code more concise and easier to read:

awk -F ':' '{printf("filename:%10s,linenumber:%s,columns:%s,linecontent:%s\n",FILENAME,NR,NF,$0)}' /etc/passwd

print and printf

awk provides both print and printf for producing output.

The arguments to print can be variables, numbers, or strings. Strings must be enclosed in double quotes, and arguments are separated by commas. Without commas, the arguments are concatenated and cannot be told apart. The comma here plays the role of the output field separator, which by default is a space.

printf works much like printf in C and formats its output according to a format string. For complex output, printf is easier to use and makes the code easier to understand.
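The difference between a comma and plain juxtaposition in print, and a C-style printf call, can be sketched as follows. The strings and numbers are arbitrary examples.

```shell
print_out=$(awk 'BEGIN {
    print "a", "b"                                # comma: joined with a space by default
    print "a" "b"                                 # no comma: concatenated
    printf("%-5s|%3d|%.2f\n", "ab", 7, 3.14159)   # C-style format string
}')
echo "$print_out"
```

Note that printf does not add a newline on its own, which is why the format string ends in \n.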

awk Programming

----------------------------------------------------------------------------------------------------------------------------

Variables and assignments

In addition to awk's built-in variables, you can define your own variables.

1. Count the accounts in /etc/passwd:

awk '{count++;print $0;} END{print "user count is ", count}' /etc/passwd

root:x:0:0:root:/root:/bin/bash

......

user count is 40

Here count is a user-defined variable. The earlier action {} blocks contained only a single print, but print is just one statement; an action {} may contain multiple statements, separated by semicolons (;).

count is not initialized here. Although it defaults to 0, the more careful approach is to initialize it to 0 explicitly:

awk 'BEGIN {count=0;print "[start]user count is ", count} {count=count+1;print $0;} END{print "[end]user count is ", count}' /etc/passwd

[start]user count is 0

root:x:0:0:root:/root:/bin/bash

...

[end]user count is 40
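The default values mentioned above can be checked with a short sketch. The variable names n and s are arbitrary; neither is assigned before use.

```shell
init_out=$(awk 'BEGIN {
    print "numeric: " (n + 0)     # an unset variable used as a number is 0
    print "string: [" s "]"       # an unset variable used as a string is empty
    n++
    print "after n++: " n
}')
echo "$init_out"
```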

2. Total the number of bytes occupied by the files in a directory:

ls -l |awk 'BEGIN {size=0;} {size=size+$5;} END{print "[end]size is ", size}'

[end]size is 8657198

3. Total the number of bytes occupied by the files in a directory, displayed in MB:

ls -l |awk 'BEGIN {size=0;} {size=size+$5;} END{print "[end]size is ", size/1024/1024,"M"}'

[end]size is 8.25889 M

Note that this total does not include the contents of subdirectories.

Conditional statements

awk's conditional statements are borrowed from C and take the following forms:

if (expression) {
    statement;
    ... ...
}

if (expression) {
    statement;
} else {
    statement2;
}

if (expression) {
    statement1;
} else if (expression1) {
    statement2;
} else {
    statement3;
}

1. Total the number of bytes occupied by the files in a directory, excluding entries whose size is 4096 (usually directories):

ls -l |awk 'BEGIN {size=0;print "[start]size is ", size} {if($5!=4096){size=size+$5;}} END{print "[end]size is ", size/1024/1024,"M"}'

[end]size is 8.22339 M
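As a sketch of the three-branch form, the following classifies a few made-up sizes. The thresholds and labels are chosen only for illustration.

```shell
sizes_out=$(printf '100\n4096\n5000000\n' | awk '{
    if ($1 == 4096) {
        label = "directory-sized"
    } else if ($1 > 1024 * 1024) {
        label = "large"
    } else {
        label = "small"
    }
    print $1, label
}')
echo "$sizes_out"
```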

Loop statements

awk's loops are likewise borrowed from C: it supports while, do/while, for, break, and continue, with the same semantics as in C.
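A compact sketch of each loop keyword follows. It runs entirely inside a BEGIN block, so no input is needed; the numbers are arbitrary.

```shell
loop_out=$(awk 'BEGIN {
    # for: sum 1..5, skipping 3 with continue
    sum = 0
    for (i = 1; i <= 5; i++) {
        if (i == 3) continue
        sum += i
    }
    print "for sum: " sum

    # while: leave an endless loop early with break
    i = 0
    while (1) {
        i++
        if (i >= 4) break
    }
    print "while stopped at: " i

    # do/while: the body runs at least once, even if the condition is false
    n = 10
    do { n++ } while (n < 5)
    print "do/while n: " n
}')
echo "$loop_out"
```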

Arrays

In awk, array subscripts can be numbers or strings, and a subscript is usually called a key. Keys and values are stored internally in a hash table. Because a hash is not stored in order, when you display the contents of an array you will find that they do not appear in the order you might expect.

Like variables, arrays are created automatically when they are first used, and awk automatically determines whether they store numbers or strings. In general, awk arrays are used to collect information from records: for computing totals, counting words, tracking how many times a pattern is matched, and so on.

1. Display the accounts in /etc/passwd:

awk -F ':' 'BEGIN {count=0;} {name[count] = $1;count++;}; END{for (i = 0; i < NR; i++) print i, name[i]}' /etc/passwd

0 root

1 daemon

2 bin

3 sys

4 sync

5 games

......

Here a for loop is used to iterate over the array.
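Arrays with string keys are where the hash behavior described earlier becomes visible. The sketch below counts how often each shell appears in three made-up "account:shell" lines and walks the array with for (key in array); because the traversal order is unspecified, the output is piped through sort to make it stable.

```shell
shell_counts=$(printf 'root:/bin/bash\ndaemon:/bin/sh\nbin:/bin/sh\n' | awk -F ':' '
    { count[$2]++ }                  # the key is a string: the shell path
    END {
        for (sh in count)            # traversal order is unspecified
            print sh, count[sh]
    }
' | LC_ALL=C sort)
echo "$shell_counts"
```

With numeric keys that run from 0 to NR-1, as in the example above, an ordinary counting for loop gives a predictable order instead.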
