1. Extract the fields of a column
To make the examples concrete, a small data file (ip_cluster.txt) is prepared as follows:
ip core_num model_name
11.20.51.204 16 example_gbdt
11.20.51.205 16 example_gbdt
11.20.51.203 16 example_gbdt
11.20.246.134 16 example_gbdt
11.20.246.133 16 example_gbdt
11.20.246.131 8 example_dnn
11.20.246.130 8 example_dnn
11.20.246.129 8 example_dnn
11.20.246.128 8 example_dnn
11.20.244.121 8 example_dnn
- Extract the IPs of the machines that run the example_gbdt model
awk '{if ($3 == "example_gbdt") {print $1}}' ip_cluster.txt
# If the model name is stored in the shell variable model_name, note how the quotes differ
model_name=example_gbdt
awk '{if ($3 == "'$model_name'") {print $1}}' ip_cluster.txt
The command above can also drop the if () and write the condition directly as a pattern. Personally, though, I find the program easier to understand with the if spelled out.
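As a hedged sketch of the condition-only form mentioned above, the comparison can be written as a pattern in front of the action, and the shell variable can be handed over with awk's -v option instead of splicing quotes. The file ip_sample.txt (a shortened copy of ip_cluster.txt) and the awk variable name m are assumptions made for this illustration.

```shell
# Hypothetical shortened copy of ip_cluster.txt, so the example is self-contained
cat > ip_sample.txt <<'EOF'
ip core_num model_name
11.20.51.204 16 example_gbdt
11.20.51.205 16 example_gbdt
11.20.246.131 8 example_dnn
EOF

# Pattern form: the condition stands alone, the action follows in braces
awk '$3 == "example_gbdt" {print $1}' ip_sample.txt

# -v copies the shell variable into the awk variable m (name chosen for this sketch)
model_name=example_gbdt
awk -v m="$model_name" '$3 == m {print $1}' ip_sample.txt
```

Both commands print the two example_gbdt addresses. The -v form keeps the awk program inside a single pair of quotes, which tends to be easier to read than the quote-splicing version.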
2. Interactive input
If you leave out the file argument (ip_cluster.txt), awk stops and waits for interactive input. Each time you enter a line (called a record in awk), awk processes it once.
You can also feed the input through a pipe, for example
cat ip_cluster.txt | awk ...
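A minimal sketch of reading from a pipe, assuming two sample records typed inline with printf rather than taken from the real file:

```shell
# With no file argument, awk reads standard input, supplied here by the pipe
printf '11.20.51.204 16 example_gbdt\n11.20.246.131 8 example_dnn\n' | awk '{print $1}'

# A file operand of "-" also means standard input
printf '11.20.51.204 16 example_gbdt\n11.20.246.131 8 example_dnn\n' | awk '{print $3}' -
```

The first command prints the two addresses, the second prints the two model names.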
3. Define variables
# Set a built-in variable
awk 'BEGIN {OFS = ","} {print $1, $2, $3}' ip_cluster.txt
# Define ordinary variables (with no file given, this prints 3 for each line read from standard input)
awk 'BEGIN {a = 1; b = 2} {print a + b}'
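Two variations may be useful here, shown as a sketch. A program that consists only of a BEGIN block reads no input, so it exits without waiting. A variable can also be set from the command line with -v. The variable name prefix and the sample records are assumptions for illustration.

```shell
# BEGIN-only program: no input is read, so awk prints the sum and exits
awk 'BEGIN {a = 1; b = 2; print a + b}'

# -v sets an ordinary variable before the program starts (hypothetical name: prefix)
printf '11.20.51.204 16\n11.20.246.131 8\n' | awk -v prefix="host:" '{print prefix $1}'
```

The second command relies on awk's string concatenation: placing two values next to each other joins them.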
4. Built-in variables
Commonly used built-in variables are
- NR: Number of Records, which can be understood as the number of lines read so far
- NF: Number of Fields, which can be understood as the number of columns in the current line
- FS: Field Separator, the input separator
- OFS: Output Field Separator, the output separator
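The four variables listed above can be seen together in a short sketch. The file ip_sample.txt is a hypothetical shortened copy of ip_cluster.txt, and the colon-separated line is made up for the FS example.

```shell
# Hypothetical shortened copy of ip_cluster.txt
cat > ip_sample.txt <<'EOF'
ip core_num model_name
11.20.51.204 16 example_gbdt
11.20.246.131 8 example_dnn
EOF

# NR > 1 skips the header; NR and NF are printed next to the first field
awk 'NR > 1 {print NR, NF, $1}' ip_sample.txt

# FS splits the input on ":", OFS joins the printed fields with ","
printf 'a:b:c\n' | awk 'BEGIN {FS = ":"; OFS = ","} {print $1, $3}'
```

Using NR > 1 as a pattern is a common way to skip a header row in files like this one.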
5. Read multiple files
Several files can be listed directly after the awk program. They are processed as if they had been concatenated into one file.
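A small sketch of the concatenation behavior, assuming two hypothetical files part_a.txt and part_b.txt. FNR and FILENAME are standard awk built-ins that the list above does not cover: FNR is the record number within the current file, and FILENAME is the name of the file being read.

```shell
# Two hypothetical input files created for this example
printf 'ip core_num model_name\n11.20.51.204 16 example_gbdt\n' > part_a.txt
printf '11.20.246.131 8 example_dnn\n' > part_b.txt

# NR keeps counting across files, while FNR restarts at 1 in each file
awk '{print FILENAME, NR, FNR, $1}' part_a.txt part_b.txt
```

The last line printed comes from part_b.txt with NR equal to 3 and FNR equal to 1, which shows that awk treats the files as one continuous stream while still tracking file boundaries.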
6. Column assignment, manual coding
awk '{$3 = "xxxx"; print $0}' ip_cluster.txt
# Output
ip core_num xxxx
11.20.51.204 16 xxxx
11.20.51.205 16 xxxx
11.20.51.203 16 xxxx
11.20.246.134 16 xxxx
11.20.246.133 16 xxxx
11.20.246.131 8 xxxx
11.20.246.130 8 xxxx
11.20.246.129 8 xxxx
11.20.246.128 8 xxxx
11.20.244.121 8 xxxx
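The output above also overwrites the header. As a sketch, the assignment can be limited to the data rows with an NR > 1 pattern. The file ip_sample.txt is a hypothetical shortened copy of ip_cluster.txt.

```shell
# Hypothetical shortened copy of ip_cluster.txt
cat > ip_sample.txt <<'EOF'
ip core_num model_name
11.20.51.204 16 example_gbdt
11.20.246.131 8 example_dnn
EOF

# Only rows after the header get column 3 overwritten; every row is printed
awk 'NR > 1 {$3 = "xxxx"} {print}' ip_sample.txt

# Assigning a field rebuilds the line with OFS, so only the modified rows use commas
awk 'BEGIN {OFS = ","} NR > 1 {$3 = "xxxx"} {print}' ip_sample.txt
```

In the second command the header keeps its spaces because none of its fields were assigned, so awk never rebuilds that line.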
7. Cooperation with regular expressions
# The regular expression is written between two slashes
awk '/example_.*/ {print $0}' ip_cluster.txt
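A pattern between slashes is tested against the whole line. As a sketch, the ~ and !~ operators restrict the match to a single column. The file ip_sample.txt is a hypothetical shortened copy of ip_cluster.txt.

```shell
# Hypothetical shortened copy of ip_cluster.txt
cat > ip_sample.txt <<'EOF'
ip core_num model_name
11.20.51.204 16 example_gbdt
11.20.246.131 8 example_dnn
EOF

# ~ tests one field against the regular expression
awk '$3 ~ /^example_g/ {print $1}' ip_sample.txt

# !~ is the negated form; NR > 1 keeps the header out of the result
awk 'NR > 1 && $3 !~ /gbdt$/ {print $1}' ip_sample.txt
```

The first command prints the example_gbdt address, the second prints the address whose model name does not end in gbdt.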
A brief review of basic regular expression syntax
- ^ start of the string, $ end of the string
- [] matches one character from a set (an OR relationship), such as [xyz], [a-zA-Z], [^a-z]. Note: "^" inside square brackets means negation
- * matches zero or more occurrences
- + matches one or more occurrences
- ? matches zero or one occurrence
- {} gives a repetition count: ab{3}c matches abbbc; {} can also be a range, such as {3,4} or {3,}
- () groups a sequence so it is treated as a whole: (ab)+c matches ababc
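Most of the operators reviewed above can be tried directly in awk. In this sketch the test strings are made up for illustration. The {} form is left out because some older awk implementations do not enable interval expressions by default.

```shell
# ^, $, () and +: one or more "ab" groups followed by c, anchored at both ends
printf 'ababc\nabc\nc\nxyz\n' | awk '/^(ab)+c$/'

# ?: the u may or may not be present
printf 'color\ncolour\ncolouur\n' | awk '/^colou?r$/'

# [^a-z]: lines containing at least one character that is not a lowercase letter
printf 'abc\nab1\n' | awk '/[^a-z]/'

# [0-9.]+: lines made up only of digits and dots
printf '11.20.51.204\nexample_dnn\n' | awk '/^[0-9.]+$/'
```

A pattern with no action prints every matching line, so each command acts as a simple filter.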
8. Summary
Awk is very powerful and is especially expressive on row-and-column (tabular) data. Given a data file, it can serve as a small database.