Three Musketeers: awk text processing in detail

awk command format:

awk [ options ] 'program' file ...

Program: [/PATTERN/] {ACTION statement; ...}
PATTERN part: decides when, and on which events, the action statements are triggered;
BEGIN, END

ACTION statement: the concrete processing applied to the data, usually placed inside {};
print, printf

 

awk basic concepts:

Separator: a delimiter (there is an input delimiter and an output delimiter)

Record: a line of data terminated by a newline is called a record; $0 holds the entire record

Field: each data segment obtained by splitting a record on the delimiter is called a field

 

How awk works:

1. First, the statements in the BEGIN {ACTION statement; ...} block are executed;

2. Next, awk reads a line from a file or from standard input and, depending on whether PATTERN matches, executes the ACTION statement block that follows it; this is repeated line by line until all the data has been read;
3. Finally, after all input has been processed and before the awk process exits, the statements in the END {ACTION statement; ...} block are executed;

Note:
1) The BEGIN statement block is executed before awk starts processing any data; it is typically used to generate a header; this block is optional;
2) The END statement block is executed after all data has been processed; it is typically used to summarize data; this block is optional;
3) The PATTERN statement block is the most important part of the command. Either the PATTERN or the ACTION may be omitted, but not both: if ACTION is omitted, the default action is print, that is, each matching line is displayed; if PATTERN is omitted, the ACTION applies to every line;
4) When awk executes the PATTERN statement block, it loops over the records by default;
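
The execution order described above can be seen in a small sketch. The sample records and the function name flow_demo are made up for illustration and do not come from the original examples.

```shell
# Hypothetical sample input: three "name:uid" records.
sample='alice:1001
bob:999
carol:1002'

flow_demo() {
  awk -F: '
    BEGIN      { print "NAME" }            # runs once, before any input is read
    $2 >= 1000 { print $1; count++ }       # runs for each record where the pattern is true
    END        { print count " matched" }  # runs once, after the last record
  '
}

printf '%s\n' "$sample" | flow_demo
```

The header comes first, then the names whose second field is at least 1000, then the summary line from END.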

 

Common options:

-f: load the program from the specified file, rather than taking it from the command line;

-F: specify the input field separator; the default is whitespace;

-v var=val, --assign var=val: declare a custom variable and assign it a value;
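
As a rough sketch of how the three options combine, the snippet below writes a one-line program to a temporary file and runs it with -f. The variable name prefix, the sample input, and the temporary file are assumptions chosen for the illustration.

```shell
# Hypothetical program file; mktemp picks the actual path.
prog=$(mktemp)
cat > "$prog" <<'EOF'
# Print the first field of every record, prefixed by the variable "prefix".
{ print prefix $1 }
EOF

# -F sets the input field separator, -v assigns a variable before the
# program starts, and -f reads the program text from the file.
result=$(printf 'root:x:0\nbin:x:1\n' | awk -F: -v prefix='user=' -f "$prog")
printf '%s\n' "$result"
rm -f "$prog"
```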

 

awk common usage:

1) Variables: built-in variables

FS: input field separator; defaults to whitespace

OFS: output field separator; defaults to a single space

Example: awk -v FS=':' -v OFS=':' '{print $1, $3, $7}' /etc/passwd

RS: input record (line) separator; defaults to newline

ORS: output record (line) separator; defaults to newline

Example: awk -v RS=":" -v ORS="#" '{print $0}' /etc/passwd

NF: the number of fields in the current record

Example: awk -F ":" '{print NF}' /etc/passwd     (displays the number of fields in each line of the passwd file)

         awk -F ":" '{print $NF}' /etc/passwd    (displays the last field of each line of the passwd file)

NR: the total number of input records read so far; when a single file is processed, the value of NR can serve as the line number of each line

Example: awk '{print NR, $0}' /etc/fstab

FNR: the record number counted separately for each input file

Example: # awk '{print FNR}' /etc/fstab /etc/issue

FILENAME: the name of the file currently being processed

Example: # awk 'END {print FILENAME}' /etc/fstab
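
To see several of these variables side by side, here is a minimal sketch that runs one program over two small files. The file names one.txt and two.txt and their contents are invented for the demonstration; they are created in a temporary directory and removed afterwards.

```shell
# Two throwaway files stand in for real inputs.
dir=$(mktemp -d)
printf 'a:b\nc:d:e\n' > "$dir/one.txt"
printf 'f\n'          > "$dir/two.txt"

# NR keeps counting across files, FNR restarts at 1 for each file,
# NF is the field count of the current record, $NF is its last field,
# and OFS is placed between the comma-separated print items.
counters=$(cd "$dir" && awk -F: -v OFS=' | ' '{ print FILENAME, NR, FNR, NF, $NF }' one.txt two.txt)
printf '%s\n' "$counters"
rm -r "$dir"
```

The third line of output belongs to two.txt: its NR is 3 while its FNR is back to 1.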

2) Variables: custom variables

Definition: -v var_name=value (variable names are case sensitive)

Example: awk -v var='hello' -F: '{print $1 "," var}' /etc/passwd

 

Commonly used actions:

1) print: plain output to standard output

Format: print item1, item2, ...

Notes: items are separated by commas; each item may be a string, a number, a field of the current record, a variable, or an awk expression

Example: awk '{print $1, $3, $NF}' /etc/issue

2) printf: output in a specified format

Format: printf "FORMAT", item1, item2, ...

Notes: an output format must be given; printf does not append a newline by default, so to end the line you must write the newline escape \n explicitly; FORMAT needs one format specifier for each item that follows;

Commonly used FORMAT specifiers: a % followed by a letter; a number between the % and the letter sets the display width, for example %20s

    %c: display the character corresponding to an ASCII code;
    %d, %i: display a decimal integer;
    %e, %E: display a floating-point number in scientific notation;
    %f, %F: display a floating-point number in decimal form;
    %g, %G: display a floating-point number in scientific or decimal notation, whichever is shorter;
    %u: display an unsigned decimal number;
    %s: display a string;
    %x, %X: display an unsigned hexadecimal integer;
    %%: display a literal %;

  Modifiers:

    #[.#]: the first number controls the display width; the second number sets the precision after the decimal point;
    for example: %5s, %8.3f

    -: left-align the output; the default is right-aligned;
    +: display the sign of a number;

Example: # awk -F: '{printf "%20s:%-+5d\n", $1, $3}' /etc/passwd
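
The effect of the width, precision, - and + modifiers is easier to judge with brackets around each item. The following is only an illustrative sketch; the input records and the function name fmt_demo are assumptions.

```shell
fmt_demo() {
  # %-8s  : string, left-aligned in 8 columns
  # %5d   : integer, right-aligned in 5 columns
  # %+d   : integer with an explicit sign
  # %8.3f : floating point, 8 columns wide, 3 digits after the decimal point
  awk -F: '{ printf "[%-8s][%5d][%+d][%8.3f]\n", $1, $2, $2, $3 }'
}

printf 'root:0:3.14159\nalice:1001:2.5\n' | fmt_demo
```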

 

Operators:

Arithmetic operators:

    Binary operators take two operands: x+y, x-y, x*y, x/y, x^y, x%y

    Unary: +x, -x

Example: awk -F: 'END {print 100^2}' /etc/passwd

String operator:

There is no operator symbol; writing strings next to each other concatenates them

Assignment operators: =, +=, -=, *=, /=, ^=, %=, ++, --

Comparison operators: ==, !=, >, >=, <, <=

Example: awk -F: '$3>=1000 {print $1}' /etc/passwd

Pattern matching operators:

~: true if the string on the left matches the PATTERN on the right

!~: true if the string on the left does not match the PATTERN on the right

Example: awk -F: '$NF ~ /bash/ {print $0}' /etc/passwd

Logical operators: &&, ||, !

Example: awk -F: '$3>=1000 && $3<=1000000 {print $0}' /etc/passwd

Conditional expression:

selector ? if-true-expression : if-false-expression

Example: awk -F: '{usertype = ($3>=1000 ? "Common User" : "SuperUser or Sysuser"); print $1 ":", usertype}' /etc/passwd

root: SuperUser or Sysuser
bin: SuperUser or Sysuser
daemon: SuperUser or Sysuser
adm: SuperUser or Sysuser
lp: SuperUser or Sysuser

...

 

PATTERN part:

1) empty: the empty pattern; every line of the file is processed without distinction

2) [!] /REGEXP/: process only the lines that [do not] match the pattern

Example: awk -F: '/^r/ {print $0}' /etc/passwd

3) relational expression

      $3>1000

      $NF~/bash/

Example: awk -F: '$3>=1000 {print $1}' /etc/passwd

Example: awk -F ":" '$3>100 {print $1,":",$7}' /etc/passwd

4) line range

   Relational expressions combined with logical operators: FNR>=10 && FNR<=20

Example: awk 'NR>=15&&NR<=20{print NR,$0}'   /etc/passwd

   /REGEXP1/,/REGEXP2/: starting from a line that matches REGEXP1 and ending at the next line that matches REGEXP2, all lines in between (both ends included) are processed; every such group in the input is selected, so however many groups there are, that many are shown;

Example: awk -F: '/^r/,/^a/ {print NR, $0}' /etc/passwd
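
A range pattern can match more than once in the same input. The sketch below uses made-up marker lines (start, end) and a hypothetical function name range_demo to show two separate groups being selected.

```shell
range_demo() {
  # Print every line from one matching /^start/ through the next one
  # matching /^end/, both included; lines outside a range are skipped.
  awk '/^start/,/^end/ { print NR ": " $0 }'
}

printf 'x\nstart 1\na\nend 1\ny\nstart 2\nend 2\nz\n' | range_demo
```

Lines 1, 5 and 8 of the input fall outside both groups and are not printed.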

 

BEGIN / END patterns:

BEGIN {}: a statement block executed only once, before the first line of input is processed; mostly used to output header information in a specific format;

Example: awk -F: 'BEGIN{printf "%20s %10s %20s\n","USERNAME","USERID","SHELL"}NR>=15&&NR<=20{printf "%20s %10s %20s\n",$1,$3,$7}' /etc/passwd

END {}: a statement block executed only once, after all the text has been processed but before the awk command exits; mostly used to output aggregated data;

Example: # awk -F: 'BEGIN{printf "%20s %10s %20s\n","USERNAME","USERID","SHELL"}NR>=15&&NR<=20{printf "%20s %10s %20s\n",$1,$3,$7}END{print "------------------------------------------------------------------\n",NR " users"}' /etc/passwd

Note: the BEGIN block, the PATTERN block and the END block are usually written in this order: BEGIN {} PATTERN {} END {}

 

Commonly used ACTIONS

1) Expressions

2) Compound statements

3) Input statements

4) Output statements

5) Control statements

 

Control statements:

if...else: if (condition) statement [else statement]

while loop: while (condition) statement

do...while statement: do statement while (condition)

for loop:

   for (expr1; expr2; expr3) statement

   for (var in array) statement

break and continue statements:

   break

   continue

exit [expression]

switch...case statement: switch (expression) { case value|regex : statement ... [ default: statement ] }

next statement: next

 

1) if...else statement

  Syntax: if (condition) statement [else statement]. Usually used to test a condition on an entire line or on a single field;

  Example: # awk -F: '{if($3>=1000){print "CommonUser:",$1}else{print "Sysuser:",$1}}' /etc/passwd

           # awk '/^[^#]/{if(NF==6){print}}'  /etc/fstab

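2) while loop

  Syntax: while (condition) statement. Used to apply the same or a similar operation to each of several fields in a line, one by one; also used to traverse the elements of an array

  Behavior: the loop is entered if the condition is true, and is left as soon as the condition becomes false

  Example: print the length of each field: awk '{i=1;while(i<=NF){print $i,length($i);i++}}'   testfile

3) do...while statement

  Syntax: do statement while (condition)

  Meaning: the same as the while loop, except that statement is executed at least once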
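
The difference between the two loops only shows up when the condition is false from the start, for example on an empty line where NF is 0. A minimal sketch, with an invented function name loop_demo and invented input:

```shell
loop_demo() {
  awk '{
    w = 0; i = 1
    while (i <= NF) { w++; i++ }       # body may run zero times
    d = 0; i = 1
    do { d++; i++ } while (i <= NF)    # body always runs at least once
    print "NF=" NF, "while=" w, "do=" d
  }'
}

# The second input line is empty, so NF is 0 there.
printf 'a b c\n\n' | loop_demo
```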

4) for loop

  Syntax: for (expr1; expr2; expr3) statement

        expr1: initial assignment of the loop variable

        expr2: loop condition

        expr3: how the loop variable is updated

   Example: awk '{for(i=1;i<=NF;i++){print $i,length($i)}}'   testfile

5) switch...case statement (a gawk extension): used to compare a value against strings

       switch (expression) { case value|regex:statement;case value2|regex2:statement;... [ default: statement ] }

6) break and continue statements: loop control, used when looping over multiple fields

      Example: awk '{for(i=1;i<=NF;i++){if(length($i)<5){continue}else{print $i,length($i)}}}' testfile
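
The example above only exercises continue. As a sketch of how break differs, the snippet below stops scanning a line at a sentinel field; the word "stop", the input line and the function name scan_fields are all made up for illustration.

```shell
scan_fields() {
  awk '{
    for (i = 1; i <= NF; i++) {
      if ($i == "stop") break        # leave the loop entirely
      if (length($i) < 3) continue   # skip this field, go on to the next
      print $i
    }
  }'
}

printf 'alpha be gamma stop delta\n' | scan_fields
```

Here "be" is skipped by continue, and "delta" is never reached because break ends the loop at "stop".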

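7) next statement

      While awk is processing data, next ends the processing of the current line early and moves straight on to the next line;

      Example: awk -F: '{if($3%2==1){next}else{print $1,$3}}' /etc/passwd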

 

Arrays

    User-defined arrays are generally associative arrays: array_name[index_expression]

    Note: index_expression may be any string, but a string literal must be enclosed in double quotes;

    Example: awk 'BEGIN{name["leader"]="zhangsan";name["mem1"]="lisi";name["mem2"]="bob";print "Leader:",name["leader"],"Member:",name["mem1"],name["mem2"]}'

             awk 'BEGIN{name["leader"]="zhangsan";name["mem1"]="lisi";name["mem2"]="bob";for(i in name){print name[i]}}'

             Count the number of connections in each TCP state across all services on the current system.

             netstat -nalt | awk '/^tcp\>/{state[$NF]++}END{for(stat in state){printf "%15s: %-10d\n",stat,state[stat]}}'
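
The same counting idiom can be tried without netstat. The sketch below feeds three invented lines to a hypothetical function count_states; because the order in which for (key in array) visits keys is unspecified, the output is piped through sort to make it stable.

```shell
count_states() {
  # Use the last field of each line as the array index and count occurrences,
  # then print one "key count" line per distinct key.
  awk '{ state[$NF]++ } END { for (s in state) printf "%s %d\n", s, state[s] }' | sort
}

printf 'tcp a ESTABLISHED\ntcp b LISTEN\ntcp c ESTABLISHED\n' | count_states
```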

Origin www.cnblogs.com/wzylhj/p/12156721.html