Commonly used text-processing commands for shell scripts (5): awk


On Linux, awk is both a command and a programming language for processing data and generating reports. The name awk is formed from the initials of its three authors (Aho, Weinberger, and Kernighan) and has no other meaning.

Syntax

  • awk [options] '[pattern]{actions}' [var1=value1 var2=value2 ...] file ...
  • awk [options] -f scripts_filename [var1=value1 var2=value2 ...] file ...

Options

  • -F: Field separator used to split each record; the default is whitespace (spaces, tabs, etc.)
  • -f SCRIPTS_FILE: Run the awk program stored in the specified script file
  • -v: Assign a variable before the program starts
    • awk -v a=1 -v b=2 '{print a,b}'
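
As a minimal sketch of how -F and -v combine, the following splits each line on ":" and prepends a value passed in from the shell. The sample input and the variable name prefix are made up for illustration.

```shell
# Split on ':' and prefix the first field with a value passed via -v.
printf 'root:x:0\nalice:x:1000\n' |
awk -F: -v prefix="user=" '{print prefix $1}'
```

With this input the command should print user=root and user=alice on separate lines.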

Patterns

  • BEGIN
    • Specifies actions that run before the first record is processed
    • Typically used for initialization: setting built-in or user-defined variables and printing headers
  • END
    • Specifies actions that run after the last record has been processed
    • Typically used to output a summary
  • Main processing (the pattern may be empty)
    • With an empty pattern, the actions apply to every line
    • /RegExp/: Process only the lines that match the regular expression
    • /RegExp1/, /RegExp2/: Process a range of lines, from a line matching RegExp1 through the next line matching RegExp2, such as /^a/, /^b/
    • Conditional expression
      • Relational operators: >, <, >=, <=, ==, !=, comparing strings or numbers
      • Regular expression matching: ~, !~, for example awk -F: '$1 ~ /^a.*e$/{print $1,$3}'
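
The pattern types listed above can be tried together in one small program. This is only a sketch: the four input lines are invented, and the labels printed in front of each name exist just to show which rule fired.

```shell
# Sample input is invented for illustration.
printf 'apple 3\nbanana 5\ncherry 7\ndate 9\n' |
awk '
BEGIN         { print "start" }          # runs before the first record
/^b/, /^c/    { print "range:", $1 }     # from a line starting with b through one starting with c
$2 > 6        { print "big:", $1 }       # relational expression
$1 ~ /^a.*e$/ { print "regex:", $1 }     # field matched against a regular expression
END           { print "records:", NR }   # runs after the last record
'
```

Every rule is tested against every record, so a line such as "cherry 7" is reported by both the range rule and the relational rule.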

Actions

  • Variable or array assignment
    • var=value
    • name="alice"
    • Array
      • a["age"]="haha"
      • a[1]="xxx"
  • Formatted output
    • A field is written $N and $0 is the whole line; other variables are referenced without a $ prefix
    • print
    • printf: Used almost exactly as in C, with placeholders such as %s and %d
  • Built-in functions
    • sub
    • substr
    • etc.
  • Control flow statements
    • if/else
    • for
    • while

awk script file format

  • Multiple statements on one line are separated by semicolons: action1;action2

  • Statements on separate lines do not need to be separated by semicolons

  • If an action follows a pattern, its opening brace must be on the same line as the pattern (as in the usual Java style)

    BEGIN {
    	action1
    	action2
    }
    
  • Comments start with #
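
To see these formatting rules in a runnable form, the sketch below writes a small program to a temporary file and runs it with -f. The file is created with mktemp, and the name sum.awk in the comment is hypothetical.

```shell
# Write a small awk program to a temporary file, then run it with -f.
script=$(mktemp)
cat > "$script" <<'EOF'
# sum.awk (hypothetical name): add up the first field
BEGIN { total = 0 }
{
    total += $1; count++    # two statements on one line, separated by a semicolon
}
END {
    print "sum:", total
    print "lines:", count
}
EOF
printf '1\n2\n3\n' | awk -f "$script"
rm -f "$script"
```

For the three input lines 1, 2, and 3 this should report a sum of 6 over 3 lines.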

Records and fields

  • Record: each line in the file is called a record; records are separated by newlines
    • awk assumes the input data has a fixed, structured format
    • rather than being one endless string
    • Built-in variables:
      • $0: The contents of the current line, that is, the current record
      • ORS: Output record separator, defaults to a newline
      • RS: Record separator (input), defaults to a newline
      • NR: Number of records, the total number of records read so far
      • FNR: File's number of record, the record number within the file currently being processed. awk can process several files; FNR restarts from 1 when a new file is read, so FNR <= NR
  • Field: each record consists of fields, which are separated by a delimiter
    • Built-in variables:
      • OFS: Output field separator, defaults to a space
      • FS: Field separator (input), defaults to whitespace
      • NF: Number of fields in the current record
    • Field splitting:
      • Set FS with the -F option: awk -F: '{print $1}'
      • To use several field separators, put them inside []: echo -e "a:b\tc" | awk -F'[:\t]' '{print $1, $3}'
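
A short sketch of the record and field variables in action. The first command shows NR, NF, and OFS on invented input; the second uses two temporary files (hypothetical names one and two) to show FNR restarting while NR keeps counting.

```shell
# NR, NF, $NF and a custom OFS on colon-separated input.
printf 'a:b:c\nd:e\n' | awk -F: 'BEGIN{OFS="-"} {print NR, NF, $1, $NF}'

# FNR restarts at 1 for each file, NR does not.
tmp=$(mktemp -d)
printf 'x\ny\n' > "$tmp/one"
printf 'z\n'    > "$tmp/two"
awk '{print $0, NR, FNR}' "$tmp/one" "$tmp/two"
rm -rf "$tmp"
```

The first command should print 1-3-a-c and 2-2-d-e; the second should print "x 1 1", "y 2 2", and "z 3 1".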

Formatted output

print

  • Arguments may be variables, arithmetic expressions, or string constants
  • String constants must be enclosed in double quotes: echo 300 2 3 4 | awk '{print "hello" }'
  • Arguments are separated by commas: echo 300 2 3 4 | awk '{print $1,$2 }'
  • Output can be redirected: echo 300 2 3 4 | awk '$1 * $2 > 500{print $3+$4 >> "/tmp/test" }'
  • Output can be sent through a pipe: echo 300 2 3 4 | awk '$1 * $2 > 500{print $3+$4 | "grep 7" }'
    • Note that the output of every execution of this action goes through the same pipe
  • Escape sequences
    • \n: Newline
    • \t: Tab
    • \r: Carriage return
    • \047: The character with octal value 47 (a single quote)
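
The difference between comma-separated arguments and plain concatenation is easy to miss, so here is a small sketch using the same sample input as the bullets above. It also shows \047 producing a single quote inside a single-quoted shell string.

```shell
echo "300 2 3 4" | awk '{
    print $1, $2            # comma: values joined with OFS (a space by default)
    print $1 $2             # no comma: plain string concatenation
    print "sum:", $3 + $4   # arithmetic expression as an argument
    print "\047" $1 "\047"  # \047 is the single quote character
}'
```

This should print "300 2", "3002", "sum: 7", and '300' on four lines.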

printf

  • Writes formatted output to standard output; the f stands for format

  • Does not append a newline at the end of the line

  • The format control string is enclosed in double quotes ("")

  • Modifiers

    • -: Left-justify (right-justified by default)
    • +: Show the sign (+ or -) of a number
  • Format conversion specifiers, introduced by %

    • %c: Character
    • %s: String
    • %d: Decimal integer
    • %f: Floating-point number
[root@localhost ~]# echo alice 23 01 |awk ' {printf "The name is: %-15s ID is %8d\n",$1,$3}'
The name is: alice           ID is        1
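
As a further sketch, the modifiers and specifiers listed above can be combined in a single format string. The brackets are only there to make the padding visible, and the input line is invented.

```shell
# %-8s left-justifies in 8 columns, %5d right-justifies in 5,
# %+d forces a sign, %.2f keeps two decimals, %c prints the first character.
echo "alice 23 1.5" |
awk '{printf "[%-8s][%5d][%+d][%.2f][%c]\n", $1, $2, $2, $3, $1}'
```

The expected line is [alice   ][   23][+23][1.50][a].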

Exercise

  • Contents of the file test:
Mary   2143 78 84 77
Jack    2321 66 78 45
Tom     2122 48 77 71
Mike    2537 87 97 95
Bob     2415 40 57 62
  • Expected output table:
Lineno.   Name    No.    Math   English   Computer    Total
------------------------------------------------------------
1        Mary    2143    78      84        77         239     
2        Jack    2321    66      78        45         189     
3        Tom     2122    48      77        71         196     
4        Mike    2537    87      97        95         279     
5        Bob     2415    40      57        62         159     
------------------------------------------------------------
Total:                   319     393       350                  
Avg:                     63.8    78.6      70                   
  • Using an .awk script file:
BEGIN {
	printf "%-10s%-10s%-10s%-10s%-10s%-10s%-10s\n","LNO","Name","No","Math","English","Computer","Total"
	printf "------------------------------------------------------------------\n"
}
{
	math+=$3;english+=$4;com+=$5;printf "%-10s%-10s%-10s%-10s%-10s%-10s%-10s\n",NR,$1,$2,$3,$4,$5,$3+$4+$5
}
END {
	printf "------------------------------------------------------------------\n"
	printf "%-30s%-10s%-10s%-10s\n","Total:",math,english,com
	printf "%-30s%-10s%-10s%-10s\n","Avg:",math/NR,english/NR,com/NR
}
  • Using a single command:
awk 'BEGIN{math=0;eng=0;com=0;printf "Lineno.   Name    No.    Math   English   Computer    Total\n";printf "------------------------------------------------------------\n"}{math+=$3; eng+=$4; com+=$5;printf "%-8s %-7s %-7s %-7s %-9s %-10s %-7s \n",NR,$1,$2,$3,$4,$5,$3+$4+$5} END{printf "------------------------------------------------------------\n";printf "%-24s %-7s %-9s %-20s \n","Total:",math,eng,com;printf "%-24s %-7s %-9s %-20s \n","Avg:",math/NR,eng/NR,com/NR}' test

Programming structure

Relational expressions

  • < , <= , >, >= , == , !=
  • ~ and !~: Match / non-match operators, which test whether a record or field matches a regular expression
    • awk '$1 ~ /[Ll]ove/{print}' file
    • awk '$1 !~ /[Ll]ove/' file

Compound patterns

  • &&: Logical AND
    • echo | awk 'a>b && a!=0 {print a}' a=2 b=1
  • ||: Logical OR
    • echo | awk 'a>b || a!=0 {print a}' a=1 b=2
  • !: Logical NOT
    • echo | awk '!(a!=0) {print a}' a=0 b=2

Conditional assignment

  • var=(EXP)?var1:var2: This is the ternary (conditional) expression
    • echo | awk '{max=(a>b)?a:b; print max}' a=1 b=2

Range patterns

  • /RegExp1/,/RegExp2/
    • awk '/^a/,/^b/{print $0}' file

Arithmetic

  • +,-,*,/,%,^ or **: Addition, subtraction, multiplication, division, remainder, exponentiation
  • ++,--: Unary increment and decrement
  • =,+=,-=,*=,/=,%=,^=,**=: Assignment operators
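
The operators in this section can be exercised together. The following is a sketch on invented two-column input, combining a compound pattern, a conditional assignment, and several arithmetic operators.

```shell
printf '3 4\n10 2\n0 0\n' | awk '
$1 > 0 && $2 > 0 {
    max = ($1 > $2) ? $1 : $2          # conditional (ternary) assignment
    print $1 + $2, $1 * $2, $1 % $2, $1 ^ 2, max
}
!($1 || $2) { print "both zero" }      # logical NOT applied to a logical OR
'
```

For this input the output should be "7 12 3 9 4", "12 20 0 100 10", and "both zero".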

Functions

Built-in functions

String Functions
  • sub(/RegExp/,"STR1"[,STR2]): Substitute, replace

    • Finds the leftmost longest substring that matches the regular expression and replaces it with the replacement string STR1
    • If a target string STR2 is given, the search and replacement happen in it; otherwise the current record ($0) is used
    • Only the first match in each line is replaced
    • Examples: search STR2 (or the current record) for content matching RegExp and replace it with STR1
      • echo "abca" | awk '{sub(/a/,"b");print}'
      • echo "ab:ca" | awk -F: '{sub(/a/,"b",$1);print}'
  • gsub(/RegExp/,"STR1"[,STR2]): Global substitute; replaces every match, and is otherwise used the same way as sub

  • index("string","substring"): Returns the position of the first occurrence of the substring in the string, counting from 1; returns 0 if it is not found

    • echo "ab:cd" | awk -F: '{print index($1,"a")}'
    • echo "ab:cd" | awk -F: '{print index($1,"b")}'
  • substr("string",start,length): Returns the substring of the given length starting at the specified position

    • echo "abcdef" | awk '{print substr($0,1,3)}'
  • match("string",/RegExp/): Returns the position at which the regular expression matches in the string, or 0 if there is no match

    • The built-in variable RSTART holds the starting position of the matched substring in the string; positions start at 1

    • The built-in variable RLENGTH holds the length of the matched substring

    • The matched substring can be extracted with substr

    • echo "abcdefABC" | awk '{match($0,/[[:upper:]]+/);print RSTART,RLENGTH;print substr($0,RSTART,RLENGTH)}'

      [root@localhost ~]# echo "abcdefABC" | awk '{match($0,/[[:upper:]]+/);print RSTART,RLENGTH;print substr($0,RSTART,RLENGTH)}'
      7 3
      ABC
      
  • split("string",array,"separator"): Splits the string into an array, using the third argument as the field separator

    • echo "18/01/31" | awk '{split($0,date,"/");print date[1],date[2],date[3]}'
  • length("string"): Returns the length of the string (in characters)

  • blength("string"): Returns the length of the string (in bytes); this function exists only in some awk implementations and is not part of POSIX awk or gawk

  • tolower("string"): Converts every uppercase character in the string to lowercase

  • toupper("string"): Converts every lowercase character in the string to uppercase
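
Several of the string functions above can be checked in one go. This is a sketch on an invented sentence; the variable names s, n, cnt, and d are arbitrary, and only functions available in POSIX awk are used.

```shell
echo "Hello World, hello awk" | awk '{
    s = $0
    n = gsub(/[Hh]ello/, "bye", s)       # gsub returns the number of replacements
    print n, s
    print index($0, "World")             # 1-based position, 0 if absent
    print substr($0, 7, 5)
    print toupper(substr($0, 1, 5)), length($0)
    cnt = split("18/01/31", d, "/")      # split returns the number of pieces
    print cnt, d[1], d[3]
}'
```

Because gsub is applied to the copy s, $0 is left untouched for the later calls. The expected lines are "2 bye World, bye awk", "7", "World", "HELLO 22", and "3 18 31".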

Time Functions
  • mktime("YYYY MM DD HH MM SS"): Generates a timestamp from a string of numbers in this format

    • awk 'BEGIN{tstamp=mktime("2013 01 04 12 12 12"); print tstamp}'
    [root@localhost ~]# awk 'BEGIN{tstamp=mktime("2013 01 04 12 12 12"); print tstamp;}'
    1357319532
    
  • systime(): Returns the current timestamp, the integer number of seconds from January 1, 1970 to the current time (not counting leap seconds)

  • strftime([format [, timestamp]]): Formats a timestamp as a string; if the timestamp is omitted, the current time is used

    [root@localhost ~]# awk 'BEGIN{tstamp=mktime("2013 01 04 12 12 12"); print strftime("%c", tstamp);}'
    2013年01月04日 星期五 12时12分12秒
    [root@localhost ~]# awk 'BEGIN{tstamp=mktime("2013 01 04 12 12 12"); print strftime("%c");}'
    2020年03月24日 星期二 05时29分42秒
    [root@localhost ~]# awk 'BEGIN{tstamp=mktime("2013 01 04 12 12 12"); print strftime();}'
    二 3月 24 05:32:04 EDT 2020
    
Arithmetic functions
  • int(x): Truncates x to an integer
  • rand(): Returns a random number n, where 0 ≤ n < 1
  • sqrt(x): Square root
Other functions
  • system("CMD"): Runs a system command

Custom Functions

# Format
function function_name(param1, param2, ...) {
	statements
	return expression
}
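
A minimal sketch of a user-defined function following the format above. The name max2 is hypothetical, and the extra parameter tmp (set apart by spaces) is the usual awk convention for declaring a local variable.

```shell
printf '3 9\n12 5\n' | awk '
# max2 is a hypothetical helper; tmp is a local variable
function max2(a, b,    tmp) {
    tmp = (a > b) ? a : b
    return tmp
}
{ print max2($1, $2) }
'
```

For the two input lines this should print 9 and then 12.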

Variables

User-defined variables

  • Variable names consist of letters, digits, and underscores, and cannot start with a digit
  • awk infers a variable's data type from the context
  • String values are enclosed in double quotes ("")

Built-in variables

Variable      Description
$0            The current record (as a single variable)
$1 .. $n      The n-th field of the current record; fields are separated by FS
FS            Input field separator, default whitespace
OFS           Output field separator, default a space
RS            Input record separator, default newline
ORS           Output record separator, default newline
NF            Number of fields in the current record, that is, the number of columns
NR            Number of records read so far, that is, the line number, starting at 1
FNR           Record number within the current file
ARGC          Number of command-line arguments
ARGV          Array of command-line arguments
FILENAME      Name of the current input file
IGNORECASE    If true, matching ignores case (gawk only)
RSTART        Start position of the string matched by match()
RLENGTH       Length of the string matched by match()

Redirection

Output redirection

  • >: Write to a file, which is truncated when first opened: echo | awk '{print "abc" > "test" }'
  • >>: Append to a file

Input redirection

  • getline
    • getline returns 1 if it reads a record, 0 at end of file, and -1 on error (for example, if the file cannot be opened); it can be combined with flow-control statements such as while
    • awk 'BEGIN{getline < "-";print}'
    • awk 'BEGIN{getline < "test.txt"; print $0}'
    • awk 'BEGIN{ while( (getline line < "test") > 0 ){ print line } }'
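
The three return values of getline can be observed with a sketch like the one below. It reads a temporary file created with mktemp, and it assumes that the path /nonexistent/file does not exist on the system.

```shell
tmp=$(mktemp)
printf 'one\ntwo\n' > "$tmp"
awk -v f="$tmp" 'BEGIN {
    # comparing with > 0 ends the loop on end of file (0) and on error (-1)
    while ((getline line < f) > 0) {
        n++
    }
    close(f)
    print n, "lines, last:", line
    # assumed not to exist, so getline should return -1
    if ((getline line < "/nonexistent/file") < 0) {
        print "open failed"
    }
}'
rm -f "$tmp"
```

Testing the result against 0 matters: a bare while (getline line < file) would loop forever if the file could not be opened, because -1 counts as true.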

Pipes

You can open pipes inside awk. A pipe is identified by its command string and stays open until it is closed or awk exits: every later print to the same command string sends its data to the same running process instead of starting a new one. Use close() to close a pipe.

  • awk '{print $1, $2 | "sort" } END {close("sort")}' test
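
A small sketch of why close() matters when output order is important. The input lines are invented; all of them are sent to one sort process, which only produces its result once the pipe is closed.

```shell
printf 'b 2\na 1\nc 3\n' | awk '
{ print $1, $2 | "sort" }        # every record goes to the same sort process
END {
    close("sort")                # sort sees end of input and prints its result
    print "done"                 # printed after the sorted lines
}'
```

With the close() call in place, the sorted lines "a 1", "b 2", "c 3" should appear before "done".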

Conditional statements

  • if/else

    if (expression) {
    	statement;
    	...
    }
    else if (expression) {
    	statement;
    	...
    }
    else {
    	statement;
    }
    

Loops

  • while

    while (expression) {
    	statement;
    	...
    }
    
  • for

    for (i=1;i<=NF;i++){
    	statement;
    	...
    }
    
    for (i in Array){
    	# Note: i here is a key of Array, not a value
    	print i,Array[i]
    }
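
The conditional and loop skeletons above can be filled in as follows. This is a sketch on an invented line of three numbers; the thresholds 10 and 6 and the variable name size are arbitrary.

```shell
echo "5 12 7" | awk '{
    # classify each field with if / else if / else
    for (i = 1; i <= NF; i++) {
        if ($i > 10) {
            size = "large"
        } else if ($i > 6) {
            size = "medium"
        } else {
            size = "small"
        }
        print i, $i, size
    }
    # print the fields in reverse order with a while loop
    i = NF
    while (i > 0) {
        printf "%s%s", $i, (i > 1 ? " " : "\n")
        i--
    }
}'
```

The expected output is "1 5 small", "2 12 large", "3 7 medium", followed by "7 12 5".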
    

Program control

  • next: Stop processing the current record and read the next one
  • exit N: Stop awk execution with exit status N (END actions, if any, still run)
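
A sketch of next and exit working together on invented input. The marker line STOP and the exit status 3 are arbitrary choices for the example.

```shell
status=0
printf '# comment\nalpha\nbeta\nSTOP\ngamma\n' | awk '
/^#/      { next }            # skip comment lines; later rules never see them
/^STOP$/  { exit 3 }          # stop reading input; END still runs
          { n++; print n, $0 }
END       { print "processed", n }
' || status=$?
echo "exit status: $status"
```

This should print "1 alpha", "2 beta", "processed 2", and then "exit status: 3"; the line gamma is never read.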

Arrays

  • Subscripts may be numbers or strings (like a dictionary)
  • Array elements are not stored in order
  • The index is commonly called a key
    • Variables can be used as array subscripts
    • Field values can be used as array subscripts
    • Loop through the elements of an array: for (i in Arrayname) {print Arrayname[i]}
      • The i read out here is the key, not the value
  • Count TCP connections by state
[root@localhost ~]# netstat -ant | awk '/^tcp/ {++state[$NF]} END {for(key in state) print key,"\t",state[key]}'
LISTEN 	 12
ESTABLISHED 	 2
  • Count TCP connections by IP address and by state
[root@localhost ~]# netstat -ant | awk '/^tcp/ {n=split($(NF-1),array,":");if(n<=2)++S[array[(1)]];else++S[array[(4)]];++s[$NF];++N} END {for(a in S){printf("%-20s %s\n", a, S[a]);++I}printf("%-20s %s\n","TOTAL_IP",I);for(a in s) printf("%-20s %s\n",a, s[a]);printf("%-20s %s\n","TOTAL_LINK",N);}'
*                    6
192.168.159.1        2
0.0.0.0              6
TOTAL_IP             3
LISTEN               12
ESTABLISHED          2
TOTAL_LINK           14
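
The same counting idiom can be tried without netstat. The sketch below uses invented request lines and counts them by their first field; because for (k in count) visits keys in no particular order, the result is piped through sort to make it stable.

```shell
printf 'GET /a\nPOST /b\nGET /c\nGET /a\n' | awk '
{ count[$1]++ }                  # the field value is used as the key
END {
    for (k in count) {           # k is the key, count[k] is the value
        print k, count[k]
    }
}' | sort
```

For this input the output should be "GET 3" followed by "POST 1".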

Origin blog.csdn.net/weixin_42511320/article/details/105081966