Shell programming: awk, a highly practical text analysis tool (with detailed worked examples)

AWK command introduction

  • AWK is a programming language for processing text files and a powerful text analysis tool
  • The name AWK is formed from the first letters of the surnames of its three creators: Alfred Aho, Peter Weinberger, and Brian Kernighan

Working principle

  • awk reads the text line by line, splits each line into fields (separated by spaces or tabs by default), stores the fields in built-in variables, and runs the given actions on the lines that match a pattern or condition
  • sed is usually used to process whole lines, while awk tends to split a line into multiple "fields" and then process them. awk also reads its input line by line, and results and field data can be printed with the print statement. In awk expressions, the logical operators "&&", "||", and "!" mean "and", "or", and "not"; simple arithmetic is also available: +, -, *, /, %, and ^ are addition, subtraction, multiplication, division, remainder, and exponentiation respectively
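
The logical and arithmetic operators described above can be tried on a few lines of input. This is a minimal sketch; the names and scores in `data` are hypothetical sample values, not taken from the article.

```shell
#!/bin/sh
# Hypothetical sample data: a name followed by two scores.
data='alice 7 3
bob 2 9
carol 8 8'

# Keep lines where both scores exceed 2 AND the name is NOT "carol",
# then print the name, the sum, the remainder, and the square of the first score.
result=$(printf '%s\n' "$data" |
  awk '($2 > 2) && ($3 > 2) && !($1 == "carol") { print $1, $2 + $3, $2 % $3, $2 ^ 2 }')

printf '%s\n' "$result"   # prints: alice 10 1 49
```

Only the first line passes all three conditions: bob fails `$2 > 2`, and carol is excluded by the `!` test.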

Command format

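awk is generally invoked in one of two ways: with the program given on the command line, as in awk [options] 'pattern {action}' file ..., or with the program stored in a file, as in awk -f script-file file .... The sketch below runs the same one-line program both ways; the sample input and the temporary script file are assumptions made for illustration.

```shell
#!/bin/sh
# Store a small awk program in a temporary file (hypothetical example program).
prog=$(mktemp)
printf '%s\n' '/b/ { print NR ": " $0 }' > "$prog"

# Form 1: program given inline on the command line.
inline=$(printf 'abc\nxyz\nbar\n' | awk '/b/ { print NR ": " $0 }')

# Form 2: the same program read from a file with -f.
from_file=$(printf 'abc\nxyz\nbar\n' | awk -f "$prog")

rm -f "$prog"
printf '%s\n' "$inline"   # lines 1 and 3 contain "b"
```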

AWK common built-in variables

These variables can be used directly in an awk program

  • FS: Field (column) separator for each line of input; the default is whitespace (spaces or tabs). Setting FS has the same effect as the "-F" option
  • NF: The number of fields in the line currently being processed
  • NR: The line number (ordinal) of the line currently being processed
  • $0: The entire content of the line currently being processed
  • $n: The nth field (nth column) of the line currently being processed
  • FILENAME: The name of the file being processed
  • RS: Record (line) separator. awk splits its input into records according to RS and processes one record at a time. The default value is "\n"
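
To see these variables in action, the following sketch prints several of them for a small colon-separated file. The file content is hypothetical sample data written to a temporary file.

```shell
#!/bin/sh
# Hypothetical colon-separated sample data in a temporary file.
tmp=$(mktemp)
printf 'root:x:0:0\nbin:x:1:1:extra\n' > "$tmp"

# FS set in BEGIN; NR = record number, NF = field count, $NF = last field.
vars=$(awk 'BEGIN { FS = ":" } { print NR, NF, $1, $NF }' "$tmp")

# FILENAME holds the name of the file being read.
fname=$(awk 'NR == 1 { print FILENAME }' "$tmp")

# RS set to ":" makes every colon-separated item its own record.
count=$(printf 'a:b:c' | awk 'BEGIN { RS = ":" } END { print NR }')

rm -f "$tmp"
printf '%s\n' "$vars"    # prints "1 4 root 0" then "2 5 bin extra"
printf '%s\n' "$count"   # prints 3
```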

Application examples

  • A routine operations and maintenance task: to list the usernames, user IDs, group IDs, and so on from /etc/passwd, you can use the awk command

Output text by line

  • awk '{print}' testfile2
    Output every line; print with no arguments prints the whole current line
  • awk '{print $0}' testfile2
    Same result; $0 stands for the whole line
  • awk 'NR==1,NR==3{print}' testfile2
    Output lines 1 to 3 (range pattern)
  • awk '(NR>=1)&&(NR<=3) {print}' testfile2
    Output lines 1 to 3
  • awk 'NR==1;NR==3{print}' testfile2
    Output lines 1 and 3; NR==1 has no action, so the default action (print) applies
  • awk 'NR==1||NR==3{print}' testfile2
    Output lines 1 and 3
  • awk '(NR%2)==1{print}' testfile2
    Output all odd-numbered lines
  • awk '(NR%2)==0{print}' testfile2
    Output all even-numbered lines
  • awk '/^root/{print}' /etc/passwd
    Output the lines starting with root
  • awk '/bash$/{print}' /etc/passwd
    Output the lines ending in bash
  • awk 'BEGIN {x=0};/\/bin\/bash$/{x++};END {print x}' /etc/passwd
    Set x=0, add 1 to x for every line ending in /bin/bash, then print x; the final value of x is the number of matching lines
  • The action of the BEGIN pattern runs before awk starts processing the specified text; awk then processes the text, and finally runs the action of the END pattern. The END{} block is typically where statements such as printing the results are placed
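
The line-selection patterns and the BEGIN/END blocks above can be checked without touching testfile2 or /etc/passwd. This is a minimal sketch on five hypothetical sample lines.

```shell
#!/bin/sh
# Hypothetical five-line sample input.
sample='one
two
three
four
five'

# Range pattern: from the line where NR==2 through the line where NR==4.
range=$(printf '%s\n' "$sample" | awk 'NR==2, NR==4')

# Two separate rules; both use the default action (print).
first_third=$(printf '%s\n' "$sample" | awk 'NR==1; NR==3')

# Odd-numbered lines only.
odd=$(printf '%s\n' "$sample" | awk '(NR % 2) == 1')

# BEGIN initialises the counter, the middle rule counts lines starting
# with "t", and END prints the total after all input is read.
count=$(printf '%s\n' "$sample" | awk 'BEGIN { n = 0 } /^t/ { n++ } END { print n }')

printf '%s\n' "$range"   # two, three, four
printf '%s\n' "$count"   # 2
```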

Output text by field

  • awk -F ":" '{print $3}' /etc/passwd
    Output the third field of each line (fields separated by a colon)
  • awk -F ":" '{print $1,$3}' /etc/passwd
    Output the first and third fields of each line
  • awk -F ":" '$3<5{print $1,$3}' /etc/passwd
    Output the first and third fields of the lines whose third field is less than 5
  • awk -F ":" '!($3<200){print}' /etc/passwd
    Output the lines whose third field is not less than 200; "!" negates the condition
  • awk 'BEGIN {FS=":"};{if($3>200){print}}' /etc/passwd
    The BEGIN block runs first and sets FS; then the lines whose third field is greater than 200 are printed
  • awk -F ":" '{max=($3>$4)?$3:$4;{print max}}' /etc/passwd
    ($3>$4)?$3:$4 is the ternary operator: if the third field is greater than the fourth, the third field is assigned to max; otherwise the fourth field is assigned to max
  • awk '{print NR,$0}' /etc/passwd
    Output each line preceded by its line number; NR increases by 1 each time a record is processed
  • awk -F ":" '$7~"/bash"{print $1}' /etc/passwd
    Output the first field of the lines (colon-separated) whose seventh field contains /bash
  • awk -F ":" '($1~"root")&&(NF==7){print $1,$2}' /etc/passwd
    Output the first and second fields of the lines that have 7 fields and whose first field contains root
  • awk -F ":" '($7!="/bin/bash")&&($7!="/sbin/nologin"){print}' /etc/passwd
    Output all lines whose seventh field is neither /bin/bash nor /sbin/nologin
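
Because /etc/passwd differs from system to system, the field-based commands above are easier to verify on fixed input. The sketch below uses four hypothetical passwd-style lines; the accounts are made up for illustration.

```shell
#!/bin/sh
# Hypothetical passwd-style sample (7 colon-separated fields per line).
passwd_sample='root:x:0:0:root:/root:/bin/bash
daemon:x:2:2:daemon:/sbin:/sbin/nologin
alice:x:1000:1000:Alice:/home/alice:/bin/bash
svc:x:998:1001:Service:/var/svc:/bin/false'

# Name and UID where the UID (field 3) is below 5.
low_uid=$(printf '%s\n' "$passwd_sample" | awk -F ":" '$3 < 5 { print $1, $3 }')

# Names whose login shell (field 7) contains /bash.
bash_users=$(printf '%s\n' "$passwd_sample" | awk -F ":" '$7 ~ "/bash" { print $1 }')

# Ternary operator: the larger of UID (field 3) and GID (field 4).
max_id=$(printf '%s\n' "$passwd_sample" |
  awk -F ":" '{ max = ($3 > $4) ? $3 : $4; print $1, max }')

# Shell is neither /bin/bash nor /sbin/nologin.
other=$(printf '%s\n' "$passwd_sample" |
  awk -F ":" '($7 != "/bin/bash") && ($7 != "/sbin/nologin") { print $1 }')

printf '%s\n' "$low_uid"   # root 0, daemon 2
printf '%s\n' "$other"     # svc
```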

Invoke Shell commands through pipes and double quotes

  • echo $PATH | awk 'BEGIN{RS=":"};END{print NR}'
    Count the colon-separated entries in $PATH. Statements such as printing the results are typically placed in the END{} block
  • awk -F: '/bash$/{print | "wc -l"}' /etc/passwd
    Pipe the matching lines to the wc -l command to count the users whose shell is bash; equivalent to grep -c "bash$" /etc/passwd
  • free -m | awk '/Mem:/ {print int($3/($3+$4)*100)"%"}'
    Show the current memory usage as a percentage, computed as used/(used+free)
  • top -b -n 1 | awk -F "," 'NR==3{print $4}' | awk '{print $1}'
    Show the current CPU idle percentage (-b -n 1 runs top in batch mode for a single iteration)
  • date -d "$(awk -F "." '{print $1}' /proc/uptime) second ago" +"%F %H:%M:%S"
    Show the time of the last system boot (the moment uptime counts from); "second ago" counts back that many seconds from now, and +"%F %H:%M:%S" is equivalent to the +"%Y-%m-%d %H:%M:%S" time format
  • awk 'BEGIN {while ("w" | getline) n++ ; {print n-2}}'
    Run the w command and count the logged-in users; the two header lines printed by w are subtracted
  • awk 'BEGIN {"hostname" | getline ; {print $0}}'
    Run hostname and output the current hostname
  • When getline has no redirection ("<" or "|") around it, it reads from awk's current input: awk has already read line 1, so getline fetches the next line, line 2. Because getline also updates $0, NF, NR, FNR, and other internal variables, $0 is now line 2 rather than line 1, and that is what gets printed
  • When getline has a redirection ("<" or "|"), it reads from the named file or command instead. That input has only just been opened and has not been read by awk's main loop, so getline returns its first line rather than every other line
  • cat 0923.txt | awk '{print $0; getline}'
    Print first, then getline: lines 1, 3, 5, 7, 9, ... are displayed
  • cat 0923.txt | awk '{getline; print $0}'
    getline first, then print: lines 2, 4, 6, 8, ... are displayed
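
The two behaviours of getline described above can be reproduced with seq in place of 0923.txt. This is a minimal sketch; the input 1 to 6 and the three-line echo command are assumptions chosen so the results are easy to check.

```shell
#!/bin/sh
# Plain getline consumes the NEXT line of awk's own input.
# Print first, then skip a line: odd lines survive.
odd=$(seq 1 6 | awk '{ print $0; getline }')

# Skip to the next line first, then print: even lines survive.
even=$(seq 1 6 | awk '{ getline; print $0 }')

# "command | getline var" reads from the command's output instead,
# starting at its first line; here the lines are simply counted.
lines=$(awk 'BEGIN {
  cmd = "echo a; echo b; echo c"
  while ((cmd | getline line) > 0) n++
  close(cmd)
  print n
}')

printf '%s\n' "$odd"     # 1 3 5 (one per line)
printf '%s\n' "$even"    # 2 4 6 (one per line)
printf '%s\n' "$lines"   # 3
```

Testing the return value with "> 0" is a defensive choice: getline returns 0 at end of input and -1 on error, so a bare while (cmd | getline) would loop forever if the command could not be run.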

Advanced application

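  • seq 9 | sed 'H;g' | awk -v RS='' '{for(i=1;i<=NF;i++)printf("%dx%d=%d%s", i, NR, i*NR, i==NR?"\n":"\t")}'
    Print the 9x9 multiplication table: sed 'H;g' turns the numbers into blank-line-separated blocks (1, then 1 2, then 1 2 3, ...), and RS='' makes awk read each block as one record
  • ls -l *.txt | awk '{sum+=$5} END {print sum}'
    Total the sizes (in bytes) of the .txt files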
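
Both commands above can be tried at a smaller scale. The sketch below builds a 3x3 table instead of 9x9 and sums the sizes of two files created in a temporary directory; the file names and contents are hypothetical.

```shell
#!/bin/sh
# 3x3 version of the multiplication table. With RS set to the empty string,
# each blank-line-separated block is one record, so record k has k fields.
table=$(seq 3 | sed 'H;g' |
  awk -v RS='' '{ for (i = 1; i <= NF; i++)
                    printf("%dx%d=%d%s", i, NR, i * NR, i == NR ? "\n" : "\t") }')
printf '%s\n' "$table"

# Sum the size column (field 5 of ls -l) for two small files.
dir=$(mktemp -d)
printf 'abc' > "$dir/a.txt"      # 3 bytes
printf 'hello' > "$dir/b.txt"    # 5 bytes
total=$(ls -l "$dir"/*.txt | awk '{ sum += $5 } END { print sum }')
rm -rf "$dir"
printf '%s\n' "$total"           # prints 8
```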

Origin blog.csdn.net/weixin_53496398/article/details/114916349