awk text processing


awk is a pattern-scanning and text-processing language. On Linux, the awk command lets you process text quickly and efficiently: it scans the input line by line and runs the specified commands on each line.

awk was created in 1977 and borrows from the C programming language. Its name comes from the surnames of its three designers: Alfred Aho, Peter Weinberger, and Brian Kernighan. There are many versions of awk; the one used here is GNU Awk (gawk) on Ubuntu. On macOS you can install gawk with Homebrew.

Usage

awk can be run directly from the command line, or you can write the program to a file with the .awk suffix and execute that file. awk processes text one line at a time, performing the specified actions on each line it receives.

Command line execution

$ awk [ -F fs ] [ -v var=value ] 'pattern {action}' [ file ...  ]

Here -F specifies the field separator, and -v assigns a value to a variable (including awk's built-in variables) before the program runs.

For example, suppose the file /etc/passwd contains:

root:x:0:0:root:/root:/usr/bin/zsh
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
bin:x:2:2:bin:/bin:/usr/sbin/nologin
sys:x:3:3:sys:/dev:/usr/sbin/nologin

To print every line, you can use:

$ awk '{print $0}' /etc/passwd

Here $0 refers to the whole line currently being scanned.
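
As a small illustration of the -v option, the sketch below passes a value into awk as a variable and prepends it to every line. The sample input and the variable name prefix are made up for this example and are not part of the original article.

```shell
# Sample input in /etc/passwd format (hypothetical data for illustration).
sample='root:x:0:0:root:/root:/usr/bin/zsh
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin'

# -v assigns the awk variable "prefix" (a made-up name) before the program starts.
# Placing two values next to each other in print concatenates them.
labelled=$(printf '%s\n' "$sample" | awk -v prefix='> ' '{ print prefix $0 }')
printf '%s\n' "$labelled"
```

Each input line comes back unchanged except for the leading "> " marker.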

File execution

An .awk file can be written in three parts, as follows:

# passwd.awk

BEGIN{
  FS="\n";
  print "Before action";
}
{
  print $0;
}
END{
  print "After action";
}

The BEGIN block defines what happens before any line is processed. It can be used to set awk's built-in variables; values set there take effect for every line processed afterward.

The END block defines what happens after all the text has been processed; it can be used to print summary information.

The block between BEGIN and END runs once for each line. BEGIN and END blocks can also be used when awk is run directly from the command line.

After saving the file, run it from the command line with the -f option (for example, awk -f passwd.awk /etc/passwd); the output is:

Before action
root:x:0:0:root:/root:/usr/bin/zsh
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
bin:x:2:2:bin:/bin:/usr/sbin/nologin
sys:x:3:3:sys:/dev:/usr/sbin/nologin
After action
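
The same three-part structure also works in a one-liner. The sketch below counts input lines using BEGIN, a per-line block, and END; the sample data and the variable name count are assumptions made for illustration.

```shell
# Sample input in /etc/passwd format (hypothetical data for illustration).
sample='root:x:0:0:root:/root:/usr/bin/zsh
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
bin:x:2:2:bin:/bin:/usr/sbin/nologin'

# BEGIN runs once before any input, the middle block runs once per line,
# and END runs once after the last line has been read.
summary=$(printf '%s\n' "$sample" | awk 'BEGIN { count = 0 } { count++ } END { print count " lines" }')
printf '%s\n' "$summary"
```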

Variables

$ + number

$0 is the whole line being scanned, $1 is the first field after the line is split, $2 is the second field, and so on.

To print the user names (the first field) in /etc/passwd, run the following command:

$ awk -F ':' '{print $1}' /etc/passwd
root
daemon
bin
sys

Take the first line, root:x:0:0:root:/root:/usr/bin/zsh, as an example. awk first splits it on the separator : set by -F into the fields root, x, 0, 0, root, /root and /usr/bin/zsh, and then prints the first field, root.
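
The splitting described above can be checked on a single line. This sketch prints the first and seventh fields (user name and login shell); the choice of fields is only for illustration.

```shell
line='root:x:0:0:root:/root:/usr/bin/zsh'

# Split on ":" and print field 1 (user name) and field 7 (login shell).
# The comma in print joins the two values with a space by default.
fields=$(printf '%s\n' "$line" | awk -F ':' '{ print $1, $7 }')
printf '%s\n' "$fields"
```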

Special Variables

  1. FS (field separator)

    FS is the input field separator, such as the : set above. Its default value is a single space, which makes awk split fields on runs of whitespace. It can be set on the command line with -F, or in a BEGIN block with FS=, to a string or a regular expression. For example:

    $ awk -F ":" '{print $1,$2,$3}' /etc/passwd
    root x 0
    daemon x 1
    bin x 2
    sys x 3
  2. OFS (output field separator)

    OFS is the output field separator. The example above uses the default, a space, between output fields; set the OFS variable to change it:

    $ awk -F ":" -v OFS="-" '{print $1,$2,$3}' /etc/passwd
    root-x-0
    daemon-x-1
    bin-x-2
    sys-x-3
  3. RS (record separator)

    In the previous examples awk processes the text line by line, storing each line as one record, because the default record separator RS is "\n". Some text, unlike CSV files and similar formats, is not organized with one record per line, for example:

    # people.txt
    
    P1
    male
    15
    
    p2
    female
    20
    
    p3
    male
    19

    The file above separates records with "\n\n", and each record in turn separates its fields with "\n". It can be processed like this (gawk treats a multi-character RS as a regular expression):

    $ awk -F "\n" -v RS="\n\n" '{print $1,$2,$3}' people.txt 
    P1 male 15
    p2 female 20
    p3 male 19
  4. ORS (output record separator)

    Like RS for input, ORS sets the output record separator:

    $ awk -F "\n" -v RS="\n\n" -v ORS="\n***\n" '{print $1,$2,$3}' people.txt
    P1 male 15
    ***
    p2 female 20
    ***
    p3 male 19
    ***
  5. NR (number of records)

    NR is the number of the record currently being processed. In an END block, NR is the total number of records processed.

    $ awk -F ":" '{print "line" NR ":" $1,$2,$3}' /etc/passwd           
    line1:root x 0
    line2:daemon x 1
    line3:bin x 2
    line4:sys x 3

    When processing multiple files, the record count keeps accumulating across files:

    $ awk -F ":" '{print "record" NR ":" $1,$2,$3}' people.txt /etc/passwd
    record1:P1  
    record2:male  
    record3:15  
    record4:  
    record5:p2  
    record6:female  
    record7:20  
    record8:  
    record9:p3  
    record10:male  
    record11:19  
    record12:root x 0
    record13:daemon x 1
    record14:bin x 2
    record15:sys x 3
  6. NF (number of fields)

    NF is the number of fields in the current record, so its value depends on the FS setting:

    # use ":" as the separator
    $ awk -F ":" '{print "record" NR " with " NF " fields:" $1,$2,$3}' /etc/passwd
    record1 with 7 fields:root x 0
    record2 with 7 fields:daemon x 1
    record3 with 7 fields:bin x 2
    record4 with 7 fields:sys x 3
    
    # use "o" as the separator
    $ awk -F "o" '{print "record" NR " with " NF " fields:" $1,$2,$3}' /etc/passwd
    record1 with 7 fields:r  t:x:0:0:r
    record2 with 5 fields:daem n:x:1:1:daem n:/usr/sbin:/usr/sbin/n
    record3 with 3 fields:bin:x:2:2:bin:/bin:/usr/sbin/n l gin
    record4 with 3 fields:sys:x:3:3:sys:/dev:/usr/sbin/n l gin
  7. FILENAME

    FILENAME is the name of the file currently being processed:

    $ awk -F ":" '{print FILENAME}' /etc/passwd people.txt
    /etc/passwd
    /etc/passwd
    /etc/passwd
    /etc/passwd
    people.txt
    people.txt
    people.txt
    people.txt
    people.txt
    people.txt
    people.txt
    people.txt
    people.txt
    people.txt
    people.txt

    This value is only meaningful once record processing has started, so printing FILENAME in a BEGIN block gives an empty string.

  8. FNR

    As shown earlier, NR keeps accumulating when several files are processed, whereas FNR is the record number within the current file:

    $ awk -F ":" '{print "record" FNR ":" $1,$2,$3}' people.txt /etc/passwd
    record1:P1  
    record2:male  
    record3:15  
    record4:  
    record5:p2  
    record6:female  
    record7:20  
    record8:  
    record9:p3  
    record10:male  
    record11:19  
    record1:root x 0
    record2:daemon x 1
    record3:bin x 2
    record4:sys x 3 
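
For the RS item in the list above, blank-line-separated records can also be read in paragraph mode by setting RS to an empty string, which POSIX awk defines and which does not rely on gawk's multi-character RS. The sketch below assumes a POSIX-compatible awk, feeds the same data as people.txt through a pipe, and prints NR and NF alongside the fields; using "," as OFS is an arbitrary choice for this example.

```shell
# RS = "" selects paragraph mode: records are separated by blank lines.
# FS = "\n" makes every line of a record one field; OFS = "," joins the output.
records=$(printf 'P1\nmale\n15\n\np2\nfemale\n20\n\np3\nmale\n19\n' |
  awk 'BEGIN { RS = ""; FS = "\n"; OFS = "," } { print NR, NF, $1, $2, $3 }')
printf '%s\n' "$records"
```

Each output line starts with the record number and the field count, followed by the three fields of that record.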

Built-in functions

awk provides a number of built-in functions for text processing and arithmetic, including length() for the length of a string, rand() for a random number, and sin() and cos() for sine and cosine.

These functions are documented in the official manual.
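
As a quick sketch of how built-in functions are called, the example below applies length() together with three other standard string functions, toupper(), substr() and index(), to a single word. The input word is arbitrary, and the extra functions are not mentioned in the article itself.

```shell
# length(s): number of characters; toupper(s): upper-case copy;
# substr(s, 1, 3): first three characters; index(s, t): 1-based position of t in s.
result=$(printf 'daemon\n' | awk '{ print length($0), toupper($0), substr($0, 1, 3), index($0, "mon") }')
printf '%s\n' "$result"
```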

Record Filtering

All of the examples above perform the action on every record; in practice you can also filter records with conditions.

Regular expression matching

A regular expression pattern can be used to match records:

$ awk -F ':' '/root/ {print $1,$2,$3}' /etc/passwd 
root x 0

This selects the records that contain root.
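
A pattern like /root/ is tested against the whole record. To match against a single field instead, awk offers the ~ operator. The sketch below, on made-up sample data, prints the names of users whose login shell (field 7) ends in nologin.

```shell
# Sample input in /etc/passwd format (hypothetical data for illustration).
sample='root:x:0:0:root:/root:/usr/bin/zsh
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
bin:x:2:2:bin:/bin:/usr/sbin/nologin'

# $7 ~ /nologin$/ is true only when field 7 ends with "nologin".
nologin_users=$(printf '%s\n' "$sample" | awk -F ':' '$7 ~ /nologin$/ { print $1 }')
printf '%s\n' "$nologin_users"
```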

Conditional expressions

awk's built-in variables and functions can also be combined to filter records:

# print the records after the first one whose first field is longer than 3 characters
$ awk -F ':' 'length($1)>3 && NR>1 {print $0}' /etc/passwd
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin

The if statement

awk also provides an if statement:

# print the records whose third field is 0
$ awk -F ':' '{if ($3==0)print $0}' /etc/passwd
root:x:0:0:root:/root:/usr/bin/zsh

awk also has a for statement, in a form similar to that of the C language:

$ awk -v ORS="," 'BEGIN{ for(i=1;i<5;i++) print i}'  
1,2,3,4,
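
A common use of the for statement is to walk over the fields of a record with NF as the upper bound. The sketch below sums the numbers on one line; the input values and the variable name sum are made up for illustration.

```shell
# Loop over fields 1..NF; $i is the i-th field of the current record.
total=$(printf '3 4 5\n' | awk '{ sum = 0; for (i = 1; i <= NF; i++) sum += $i; print sum }')
printf '%s\n' "$total"
```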

Strings and numbers

awk supports arithmetic and logical operators, and it converts freely between strings and numbers: adding 0 (+0) forces a value to a number, and concatenating it with an empty string ("") converts it to a string:

$ awk 'BEGIN{print "origin\tnumber\tstring"}{print $0,"\t",$0+0,"\t",$0 ""}' people.txt
origin  number  string
P1       0       P1
male     0       male
15       15      15
         0       
p2       0       p2
female   0       female
20       20      20
         0       
p3       0       p3
male     0       male
19       19      19

Here $0 "" concatenates the original record with an empty string. Concatenation can give surprising results in some cases, though; consider the following statement:

$ awk 'BEGIN { print -12 " " -24 }'
-12-24

We wanted a space between -12 and -24 but did not get one, because the arithmetic operator has higher precedence than concatenation. The expression is parsed as follows:

   -12 (" " - 24)
⇒ -12 (0 - 24)
⇒ -12 (-24)
⇒ -12-24

To get the right result, group the operands with parentheses:

$ awk 'BEGIN { print -12 " " (-24) }'
-12 -24
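
The string-to-number conversion described in this section is what makes simple totals work on mixed input. In the sketch below, which feeds the same data as people.txt through a pipe, non-numeric and empty lines convert to 0, so only 15, 20 and 19 contribute to the sum. The variable name sum is an assumption made for this example.

```shell
# "sum += $0" uses each line as a number; lines such as "P1", "male" or "" count as 0.
# In END, putting "sum=" next to sum converts the number back to a string.
total=$(printf 'P1\nmale\n15\n\np2\nfemale\n20\n\np3\nmale\n19\n' |
  awk '{ sum += $0 } END { print "sum=" sum }')
printf '%s\n' "$total"
```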

Origin blog.csdn.net/weixin_34122548/article/details/91364721