Understanding awk

One. Introduction

Awk is a programming language for processing text and data on Linux/Unix.
The data can come from standard input, one or more files, or the output of other commands. It supports advanced features such as user-defined functions and dynamic regular expressions, and it is a powerful programming tool on Linux/Unix.
It can be used directly on the command line, but more often it is run as a script.
Awk processes text and data as follows: it scans the input line by line, from the first line to the last, looking for lines that match a specific pattern, and performs the actions you specify on those lines. If no action is specified, each matching line is printed to standard output (the screen). If no pattern is specified, the action is applied to every line.
The name awk is formed from the first letters of its three authors' last names: Alfred Aho, Peter Weinberger, and Brian Kernighan.
Gawk is the GNU version of awk; it adds a number of extensions from Bell Labs and GNU.
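
To try the scanning behaviour described above on a small scale, the sketch below feeds awk three made-up lines through standard input instead of a file. The sample text and the `sample` function name are assumptions chosen only for illustration.

```shell
# Three made-up input lines (hypothetical sample data).
sample() {
    printf 'root line one\nuser line two\nroot line three\n'
}

# Pattern only, no action: each matching line is printed unchanged.
sample | awk '/^root/'

# Action only, no pattern: the action runs for every line.
sample | awk '{print "seen:", $0}'
```

The first command should print only the two lines that start with "root"; the second prints all three lines, each prefixed with "seen:".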

Two. The two syntax forms of awk

awk [options] 'commands' filenames
awk [options] -f awk-script-file filenames
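
As a rough illustration of the two forms, the sketch below runs the same one-line program first inline and then from a script file passed with -f. The temporary file created with mktemp and the sample input are assumptions made for this example.

```shell
# Write a tiny awk program to a temporary script file (hypothetical content).
script=$(mktemp)
cat > "$script" <<'EOF'
# Print the first field of every line.
{ print $1 }
EOF

# Form 1: program text given inline on the command line.
printf 'alpha beta\ngamma delta\n' | awk '{ print $1 }'

# Form 2: the same program read from the script file with -f.
printf 'alpha beta\ngamma delta\n' | awk -f "$script"

rm -f "$script"
```

Both commands should produce the same output, since only the place where the program text lives differs.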

options:

-F  Specify a custom field separator for the input being processed; the default separator is whitespace (spaces or tabs)

command:

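BEGIN{ }    Actions performed before any input is processed
{ }         Actions performed on each line of input
END{ }      Actions performed after all input has been processed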

For example:

awk 'BEGIN{print "hello"} {print "^ok"} END{print "hello"}' /etc/passwd

BEGIN{} is usually used to define some variables, such as BEGIN{FS=":"; OFS="--"}
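
The three blocks can be seen working together in the sketch below. It uses made-up passwd-style records rather than the real /etc/passwd, and the `records` function name is an assumption for illustration.

```shell
# Made-up passwd-style records (hypothetical data, not read from /etc/passwd).
records() {
    printf 'root:x:0\nalice:x:1000\nbob:x:1001\n'
}

# BEGIN sets the separators once, the middle block runs for every line,
# and END runs once after the last line, when NR holds the line count.
records | awk 'BEGIN{FS=":"; OFS="--"} {print $1, $3} END{print "lines: " NR}'
# root--0
# alice--1000
# bob--1001
# lines: 3
```

The END block joins its two values by concatenation instead of a comma, so OFS is not inserted into the summary line.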

Three. The working principle of awk

    awk -F:  '{print $1,$3}' /etc/passwd
(Use a colon as the separator and print the first and third fields of each line)

(1) Awk processes the file one line at a time. Each line read becomes the current input and is assigned to the internal variable $0. A line is also called a record, and it ends with a newline character.
(2) The line is then split into fields at each : (by default, at spaces or tabs). Each field is stored in a numbered variable, starting from $1. Some older awk implementations limit a record to 100 fields.

(3) How does awk know which character separates the fields? The internal variable FS holds the field separator. Initially FS is a blank character, which splits on spaces and tabs; in the command above, -F: sets it to a colon.

(4) When awk prints fields, it uses the built-in print function. Awk puts a space between the printed fields. This space is the value of the internal variable OFS, the output field separator. Each comma in the print statement is replaced by OFS, so the output separator can be controlled by changing the value of OFS.

(5) After printing, awk reads the next line from the file into $0, overwriting the previous content, then splits the new string into fields and processes it. This continues until all lines have been processed.
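
The role of the comma and OFS in step (4) is easy to check by hand. The sketch below uses a single sample line in the usual /etc/passwd layout; the variable name `line` and its content are assumptions for illustration.

```shell
# One sample record in passwd layout (hypothetical data).
line='root:x:0:0:root:/root:/bin/bash'

# A comma between fields is replaced by OFS (a single space by default).
echo "$line" | awk -F: '{print $1, $3}'                  # root 0

# Without the comma the two values are concatenated with nothing between them.
echo "$line" | awk -F: '{print $1 $3}'                   # root0

# Changing OFS changes what the comma turns into.
echo "$line" | awk -F: 'BEGIN{OFS="-"} {print $1, $3}'   # root-0
```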

Four. Built-in variables related to records and fields:

$0: the contents of the line currently being processed
NR: the number of the line currently being processed, counted across everything awk has read so far. Numbering does not restart for each file, for example with /etc/passwd and ./a.sh
FNR: the number of the line currently being processed within its own file
NF: the total number of fields in the line currently being processed
$NF: the value of the last field of the line currently being processed
FS: the input field separator; the default is whitespace
OFS: the output field separator; the default is a space
ORS: the output record separator; the default is a newline

awk '{print $0}' /etc/passwd
awk '{print NR}' /etc/passwd h.sh
awk '{print FNR}' /etc/passwd h.sh
awk 'BEGIN{FS=":"}{print NF}' /etc/passwd
awk 'BEGIN{FS="/"}{print $NF}' /etc/passwd
awk 'BEGIN{FS=":"} {print $1,$3}' /etc/passwd
awk 'BEGIN{FS=":"; OFS="+++"} /^root/{print $1,$2}' /etc/passwd
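
The difference between NR and FNR only shows up with more than one input file. The sketch below creates two small temporary files to make it visible; the file names and their contents are assumptions for illustration.

```shell
# Two throwaway input files (hypothetical names and contents).
dir=$(mktemp -d)
printf 'a1\na2\n' > "$dir/first.txt"
printf 'b1\nb2\nb3\n' > "$dir/second.txt"

# NR keeps counting across files; FNR restarts at 1 for each file.
awk '{print $0, NR, FNR}' "$dir/first.txt" "$dir/second.txt"
# a1 1 1
# a2 2 2
# b1 3 1
# b2 4 2
# b3 5 3

rm -rf "$dir"
```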

For example

Combine each line of the file into one line

By default ORS is a newline, so every record is printed on its own line; here it is set to spaces instead, which joins the records onto one line.

awk 'BEGIN{ORS="  "} {print $0}' /etc/passwd 
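
A smaller variant of the same idea, using three made-up input lines and a comma as the record separator, makes the effect easier to see. The END block is an addition for this sketch: it prints a final newline so the shell prompt starts on a fresh line.

```shell
# Join three lines with commas; printf in END ignores ORS and emits a plain newline.
printf 'one\ntwo\nthree\n' | awk 'BEGIN{ORS=","} {print $0} END{printf "\n"}'
# one,two,three,
```

ORS is written after every record, including the last one, which is why the output ends with a trailing comma.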

Five. Formatted output

printf function

awk -F: '{printf "%-15s %-10s %-15s\n", $1,$2,$3}' /etc/passwd
awk -F: '{printf "|%-15s| %-10s| %-15s|\n", $1,$2,$3}' /etc/passwd

awk -F: '{printf "|%-15s| %-10s| %-5.2f\n", $1,$2,$3}' /etc/passwd
  • %s String
  • %d Decimal integer
  • %f Floating point; %.2f prints 2 digits after the decimal point
  • %-15s Occupies 15 characters; - means left-justified (right-justified by default)
  • printf does not add a newline at the end of the line automatically; add \n to the format string
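
The width and alignment rules above can be checked against a short input. The sketch below uses two made-up records; the widths 8, 5, and 8.2 are arbitrary choices for this example, not values taken from the commands above.

```shell
# Left-justified string, right-justified integer, float with 2 decimals.
printf 'root:x:0\nalice:x:1000\n' |
    awk -F: '{printf "|%-8s|%5d|%8.2f|\n", $1, $3, $3}'
# |root    |    0|    0.00|
# |alice   | 1000| 1000.00|
```

The vertical bars make the padding visible: the names are padded on the right, the numbers on the left.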

Six. awk patterns and actions

Any awk statement consists of a pattern and an action.
The pattern determines when the action is triggered.
If the pattern is omitted, the action is executed for every line.
The pattern can be any conditional expression, compound expression, or regular expression.
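
As a sketch of conditional and compound patterns, the example below filters made-up passwd-style records by the numeric value of the third field. The `users` function and its three records are assumptions for illustration.

```shell
# Made-up records in passwd layout (hypothetical data).
users() {
    printf 'root:x:0:0:root:/root:/bin/bash\n'
    printf 'daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin\n'
    printf 'alice:x:1000:1000:Alice:/home/alice:/bin/bash\n'
}

# Conditional pattern: a numeric comparison on the third field.
users | awk -F: '$3 >= 1000 {print $1}'           # alice

# Compound pattern: two conditions joined with && (logical AND).
users | awk -F: '$3 > 0 && $3 < 1000 {print $1}'  # daemon
```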

The pattern can be a regular expression:

  • Match a regular expression against the entire line (containment):

Tests whether the line currently being processed contains text matching the specified regular expression.
The regular expression is written between two slashes: /regex/

awk '/^root/' /etc/passwd
awk '$0 ~ /^root/' /etc/passwd
awk '!/root/' /etc/passwd
awk '$0 !~ /^root/' /etc/passwd
  • Match a regular expression against a field:

Use the matching operators (~ and !~):
field ~ /regex/

awk -F: '$1 ~ /^root/' /etc/passwd
awk -F: '$NF !~ /bash$/' /etc/passwd
  • For exact string equality, use ==

Strings must be enclosed in double quotes;
!= means not equal

awk -F: '$NF == "/bin/bash"' /etc/passwd
awk -F: '$1 == "root"' /etc/passwd
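
The difference between a regular expression match and an exact comparison shows up when one value is a prefix of another. The sketch below uses made-up records, including a hypothetical user named rooter, to make that visible.

```shell
# Made-up records; "rooter" is a hypothetical name that starts with "root".
names() {
    printf 'root:x:0\nrooter:x:1001\nalice:x:1000\n'
}

# Regex match: every first field that starts with "root".
names | awk -F: '$1 ~ /^root/ {print $1}'    # root, rooter

# Exact comparison: only the field that equals "root".
names | awk -F: '$1 == "root" {print $1}'    # root

# Negated comparison: every first field that is not exactly "root".
names | awk -F: '$1 != "root" {print $1}'    # rooter, alice
```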

Origin blog.csdn.net/weixin_49844466/article/details/107847111