Awk common usage (Part 1)

1. Extract the fields of a column

For illustration, a small data file (ip_cluster.txt) is prepared as follows:

ip               core_num     model_name
11.20.51.204     16           example_gbdt
11.20.51.205     16           example_gbdt
11.20.51.203     16           example_gbdt
11.20.246.134    16           example_gbdt
11.20.246.133    16           example_gbdt
11.20.246.131    8            example_dnn
11.20.246.130    8            example_dnn
11.20.246.129    8            example_dnn
11.20.246.128    8            example_dnn
11.20.244.121    8            example_dnn
  • Extract the machines that run the example_gbdt model
awk '{if ($3 == "example_gbdt") {print $1}}' ip_cluster.txt

# If the model name is stored in the shell variable model_name, pay attention to the quoting:
# the single-quoted awk program is closed before $model_name and reopened after it
model_name=example_gbdt
awk '{if ($3 == "'$model_name'") {print $1}}' ip_cluster.txt

The command above can also drop the if () and use the condition directly as a pattern in front of the action. Personally, I feel that keeping the explicit if makes the program easier to understand.
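
As a rough sketch of the pattern form mentioned above, the condition can stand on its own in front of the action. The sketch also passes the shell variable in through awk's -v option, which avoids the quote juggling. The file demo_cluster.txt and its three rows are made up for illustration, and name is an arbitrary awk variable.

```shell
# Hypothetical sample file: a header plus two rows in the style of ip_cluster.txt
printf '%s\n' \
  'ip core_num model_name' \
  '11.20.51.204 16 example_gbdt' \
  '11.20.246.131 8 example_dnn' > demo_cluster.txt

model_name=example_gbdt

# Pattern form: the condition stands alone, without if ()
# -v name="$model_name" copies the shell variable into the awk variable "name"
matched=$(awk -v name="$model_name" '$3 == name {print $1}' demo_cluster.txt)
echo "$matched"
```

Only the first data row has example_gbdt in column 3, so only its ip is printed.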

2. Interactive input

If you leave out the file name (ip_cluster.txt), awk stops and waits for interactive input. Each time you enter a line (called a record in awk), awk processes it once.

You can also feed input through a pipe, for example:

cat ip_cluster.txt | awk ...
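
A small sketch of the pipe form, with printf standing in for any command that produces lines (the two input lines are invented for illustration). Because no file name is given, awk reads its records from standard input.

```shell
# No file argument: awk reads records from standard input
first_fields=$(printf '11.20.51.204 16\n11.20.246.131 8\n' | awk '{print $1}')
echo "$first_fields"
```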

3. Define variables

# Set a built-in variable
awk 'BEGIN {OFS = ","} {print $1, $2, $3}' ip_cluster.txt

# Define ordinary variables (prints 3 once for every input line)
awk 'BEGIN {a = 1; b = 2} {print a + b}' ip_cluster.txt
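
Variables become more useful when they carry state from one record to the next. The sketch below is one possible illustration: it sums the core_num column, using an END block (standard awk, run once after the last record, though not covered in this article). The file demo_cluster.txt and the variable name total are assumptions made for the example.

```shell
# Hypothetical sample file: a header plus two data rows
printf '%s\n' \
  'ip core_num model_name' \
  '11.20.51.204 16 example_gbdt' \
  '11.20.246.131 8 example_dnn' > demo_cluster.txt

# total starts at 0 in BEGIN, grows on every data row
# ($1 != "ip" skips the header), and is printed once in END
total=$(awk 'BEGIN {total = 0} $1 != "ip" {total += $2} END {print total}' demo_cluster.txt)
echo "$total"
```

With the two data rows above, the sum is 16 + 8 = 24.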

4. Built-in variables

Commonly used variables are:

  • NR: Number of Records, the count of records read so far; it can be understood as the current line number
  • NF: Number of Fields, the number of fields in the current record; it can be understood as the number of columns
  • FS: Field Separator, the input field separator
  • OFS: Output Field Separator, the output field separator
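
The four variables can be seen together in one short sketch. The comma-separated file demo_fields.csv and its contents are invented for illustration.

```shell
# Hypothetical comma-separated input with 3 fields, then 2 fields
printf 'a,b,c\nd,e\n' > demo_fields.csv

# FS = "," splits each input line on commas;
# OFS = " | " is placed between the arguments of print
report=$(awk 'BEGIN {FS = ","; OFS = " | "} {print NR, NF, $1}' demo_fields.csv)
echo "$report"
```

Each output line shows the record number, the field count of that record, and its first field.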

5. Import multiple files

Multiple files can be listed directly after the awk program. They are processed as if the files were concatenated into one.
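
A minimal sketch of this behavior, using two made-up files (demo_part1.txt and demo_part2.txt): NR keeps counting across the file boundary, just as it would on a single concatenated input.

```shell
# Two hypothetical input files
printf '11.20.51.204 16\n11.20.51.205 16\n' > demo_part1.txt
printf '11.20.246.131 8\n' > demo_part2.txt

# NR does not restart at the second file
numbered=$(awk '{print NR, $1}' demo_part1.txt demo_part2.txt)
echo "$numbered"
```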

6. Column assignment (manual masking)

awk '{$3 = "xxxx"; print $0}' ip_cluster.txt

# Output
ip core_num xxxx
11.20.51.204 16 xxxx
11.20.51.205 16 xxxx
11.20.51.203 16 xxxx
11.20.246.134 16 xxxx
11.20.246.133 16 xxxx
11.20.246.131 8 xxxx
11.20.246.130 8 xxxx
11.20.246.129 8 xxxx
11.20.246.128 8 xxxx
11.20.244.121 8 xxxx
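
Two details of the command above are worth a sketch. First, it also overwrites the header's third column. Second, assigning to a field makes awk rebuild the record with OFS (a single space by default), which is why the column alignment disappears. One possible variant, assuming the header is always the first line, masks only the data rows. The file demo_cluster.txt is made up for illustration.

```shell
# Hypothetical sample file with columns padded by three spaces
printf '%s\n' \
  'ip   core_num   model_name' \
  '11.20.51.204   16   example_gbdt' \
  '11.20.246.131   8   example_dnn' > demo_cluster.txt

# Mask column 3 only when NR > 1; the trailing {print} prints every record
masked=$(awk 'NR > 1 {$3 = "xxxx"} {print}' demo_cluster.txt)
echo "$masked"
```

The header is printed untouched, padding included, while the modified rows come out joined by single spaces.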

7. Cooperation with regular expressions

# The regular expression is written between two slashes
awk '/example_.*/ {print $0}' ip_cluster.txt

A brief review of regular expression basics:

  • ^ matches the start, $ matches the end
  • [] matches one character from a set (an OR relationship), such as [xyz], [a-zA-Z], [^a-z]. Note: "^" inside square brackets means negation
  • * matches zero or more occurrences
  • + matches one or more occurrences
  • ? matches zero or one occurrence
  • {} sets a repeat count: ab{3}c matches abbbc; {} can also be a range, such as {3,4} or {3,}
  • () groups a sequence into a single unit: (ab)+c matches ababc
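
Several of these constructs can be tried directly in awk. The sketch below uses the ~ operator (standard awk, which matches a single field against a regular expression instead of the whole line). The file demo_cluster.txt and the model name example_dnn_v2 are invented for illustration. Support for {} counts varies between awk implementations, so the sketch sticks to the other constructs.

```shell
# Hypothetical sample file
printf '%s\n' \
  '11.20.51.204 16 example_gbdt' \
  '11.20.246.131 8 example_dnn' \
  '11.20.246.130 8 example_dnn_v2' > demo_cluster.txt

# ^ and $ anchor the match, \. is a literal dot, [0-9]+ is one or more digits
subnet=$(awk '$1 ~ /^11\.20\.246\.[0-9]+$/ {print $1}' demo_cluster.txt)
echo "$subnet"

# (_v2)? makes the whole group optional
dnn=$(awk '$3 ~ /^example_dnn(_v2)?$/ {print $3}' demo_cluster.txt)
echo "$dnn"
```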

8. Summary

Awk is very powerful, and it is especially expressive on data organized in rows and columns. Paired with a data file, it can serve as a small database.

Origin www.cnblogs.com/anhongyu/p/12725666.html