awk in practice

There was some very interesting news recently: Brian Kernighan, the 80-year-old computer scientist known as the K in K&R, is still contributing code to awk and has been adding Unicode support. For the detailed background and interview, see the original article.

I am a little ashamed to say that although I have used awk many times over the past few years, deliberately or not, I forget it after each use, look it up again, and have never organized what I know systematically. I will take this opportunity to review the positioning and common usage of awk, a classic tool in the same family as grep, sed, sort, and curl.

Today we start with awk.

awk

On Unix-like systems, awk is a built-in tool commonly used for data filtering and text processing. Like sed and grep, it is essentially a filter.

The awk tool defines its own scripting language for processing text data through a series of actions. If you are interested, see The GNU Awk User's Guide for the complete documentation.

awk requires no compilation and lets users work with variables, numeric functions, string functions, and logical operators. With it, developers can write compact but effective scripts that define a set of text search patterns and the actions to run when a pattern matches.

Many people wonder why the tool is called awk. The name comes from the surnames of its three creators:

Aho, Weinberger, Kernighan

Capabilities

What can we do with awk?

  1. Scan the file content line by line;
  2. Split the input line of data into multiple fields;
  3. Pattern matching on input data;
  4. Perform actions on the matched lines.
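
A minimal sketch of these four steps in a single command. The three-line input is made up for illustration and is not part of the article's sample data:

```shell
# awk reads the input line by line (1), splits each line into fields (2),
# tests the pattern $2 > 1 against each line (3), and runs the action in
# braces on the lines that match (4).
printf 'apple 1\nbanana 2\ncherry 3\n' | awk '$2 > 1 { print $1 }'
```

This prints banana and cherry, the first field of the two lines whose second field is greater than 1.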

Basic syntax

The awk command has the following format:

awk options 'selection_criteria {action}' input-file > output-file

In fact, if you forget it, just run awk on your Mac or Linux machine and a usage hint appears:

$ awk

usage: awk [-F fs] [-v var=value] [-f progfile | 'prog'] [file ...]

Two of these options are:

-f program-file : Reads the AWK program source from the file 
                  program-file, instead of from the 
                  first command line argument.
-F fs            : Use fs for the input field separator
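
As a small sketch of -F, and of -v from the usage line above, assuming a made-up colon-separated input (the names, numbers, and the variable name prefix are chosen for this example only):

```shell
# -F: sets the input field separator to a colon,
# so $1 is the text before the first colon.
printf 'alice:1001:/home/alice\nbob:1002:/home/bob\n' | awk -F: '{ print $1 }'

# -v assigns an awk variable before the program starts.
# "prefix" is a hypothetical variable name used only here.
printf 'alice:1001:/home/alice\n' | awk -F: -v prefix='user-' '{ print prefix $1 }'
```

The first command prints alice and bob; the second prints user-alice.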

Practical usage

Suppose we have a file employee.txt with the following contents:

ajay manager account 45000
sunil clerk account 25000
varun manager sales 50000
amit manager account 47000
tarun peon sales 15000
deepak clerk sales 23000
sunil peon sales 13000
satvik director purchase 80000 

By default, awk prints all lines of the specified file:

$ awk '{print}' employee.txt

Since no pattern is given, the print action applies to every line, and print with no arguments prints the entire line.

So running the command above prints the original text:

ajay manager account 45000
sunil clerk account 25000
varun manager sales 50000
amit manager account 47000
tarun peon sales 15000
deepak clerk sales 23000
sunil peon sales 13000
satvik director purchase 80000 

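OK, if that were all, awk would not be very useful. The structure of employee.txt is easy to see: the first column is the employee name, and the second column looks like the job title.

Now suppose we want to select all the managers. We can run: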

$ awk '/manager/ {print}' employee.txt 

This prints:

ajay manager account 45000
varun manager sales 50000
amit manager account 47000 

The part before the braces is the pattern to match, and the part inside {} is the action to perform.
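
One caveat worth sketching: /manager/ is a regular expression tested against the whole line, so it would also match a line where "manager" appears in some other column. The sketch below recreates the sample file so that it runs on its own, and restricts the match to the second column:

```shell
# Recreate the sample file from the article.
cat > employee.txt <<'EOF'
ajay manager account 45000
sunil clerk account 25000
varun manager sales 50000
amit manager account 47000
tarun peon sales 15000
deepak clerk sales 23000
sunil peon sales 13000
satvik director purchase 80000
EOF

# Compare only the second field instead of searching the whole line.
awk '$2 == "manager" { print }' employee.txt
```

On this data the result is the same three lines as before, but the match no longer depends on where the word happens to appear.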

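Besides the pattern, the action can also be adjusted. What if we do not want the default behavior of print and instead want to split a line into multiple fields?

By default, awk splits each line on whitespace and stores the fields in the variables $n. For example, if a line has 4 words, the split results are stored in the four variables $1, $2, $3, and $4. Note that $0 represents the entire line.

So, to print each employee's name and salary (the last column) from employee.txt, we can do this: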

$ awk '{print $1,$4}' employee.txt 

This prints:

ajay 45000
sunil 25000
varun 50000
amit 47000
tarun 15000
deepak 23000
sunil 13000
satvik 80000 

The first and fourth columns, exactly as expected.
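
The comma inside print matters. A small sketch, using one line of the sample data as input:

```shell
# With a comma, print separates the values with the output field
# separator OFS, which is a single space by default.
echo 'ajay manager account 45000' | awk '{ print $1, $4 }'

# Without a comma, the two values are concatenated with nothing between them.
echo 'ajay manager account 45000' | awk '{ print $1 $4 }'

# Setting OFS changes what the comma produces.
echo 'ajay manager account 45000' | awk -v OFS=',' '{ print $1, $4 }'
```

The three commands print "ajay 45000", "ajay45000", and "ajay,45000" respectively.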

Besides the $n variables mentioned above, awk also provides some built-in variables:

NR

NR (Number of Records) is a built-in variable, not a command. It keeps a running count of the input records read so far. Records are usually lines, and awk runs the pattern/action statements once for each record in a file. For example:

$ awk '{print NR,$0}' employee.txt 

Each line is now prefixed with its line number:

1 ajay manager account 45000
2 sunil clerk account 25000
3 varun manager sales 50000
4 amit manager account 47000
5 tarun peon sales 15000
6 deepak clerk sales 23000
7 sunil peon sales 13000
8 satvik director purchase 80000 

We can also add a separator. For example, to print the line number and the first column separated by -, we can do this:

$ awk '{print NR "-" $1 }' employee.txt

The output is:

1-ajay
2-sunil
3-varun
4-amit
5-tarun
6-deepak
7-sunil
8-satvik

Of course, we can also use NR to output specific lines:

$ awk 'NR==3, NR==6 {print NR,$0}' employee.txt 

The two comma-separated conditions form a range pattern, so this command prints lines 3 through 6. The output is:

3 varun manager sales 50000
4 amit manager account 47000
5 tarun peon sales 15000
6 deepak clerk sales 23000 

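Sometimes we want to print the number of lines in a file. We can read NR in the END rule: by then every line has been counted, so NR equals the total line count: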

$ awk 'END { print NR }' employee.txt 

The result here is 8, as expected.
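
The same END rule is a natural place to report totals accumulated while reading. A sketch that sums the salary column, recreating the sample file so that it runs on its own (the variable name total is chosen for this example):

```shell
# Recreate the sample file from the article.
cat > employee.txt <<'EOF'
ajay manager account 45000
sunil clerk account 25000
varun manager sales 50000
amit manager account 47000
tarun peon sales 15000
deepak clerk sales 23000
sunil peon sales 13000
satvik director purchase 80000
EOF

# Add the fourth column of every line to total, then print the line count,
# the sum, and the average once all input has been read.
awk '{ total += $4 } END { print NR, total, total / NR }' employee.txt
```

For the sample data this prints "8 298000 37250".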

NF

NF (Number of Fields) is a built-in variable, not a command. It holds the number of fields in the current input record, so $NF refers to the last column.

$ awk '{print $1,$NF}' employee.txt 

After running it, we see the first column and the last column printed:

ajay 45000
sunil 25000
varun 50000
amit 47000
tarun 15000
deepak 23000
sunil 13000
satvik 80000 

NR + NF

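Now that we have NR and NF, what can we do by combining them?

Suppose we want to find the line numbers of empty lines, that is, lines that contain no data at all. How do we print them?

For testing, I added an empty line as line 5, so the file now looks like this: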

ajay manager account 45000
sunil clerk account 25000
varun manager sales 50000
amit manager account 47000

tarun peon sales 15000
deepak clerk sales 23000
sunil peon sales 13000
satvik director purchase 80000

It is quite simple: an empty line always has NF equal to 0, and NR gives the line number, so we can do this:

$ awk 'NF==0 {print NR}' employee.txt

The result is 5, as expected.
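
Two related points, sketched with made-up three-line inputs rather than the article's file. A bare NF pattern is true whenever a line has at least one field, so it keeps only the non-empty lines. Also, with the default field separator, a line containing only spaces has no fields either, so NF==0 matches it too:

```shell
# Keep only lines that have at least one field (drops the empty line).
printf 'a b\n\nc d\n' | awk 'NF'

# The second line here holds three spaces; it still has NF equal to 0.
printf 'a b\n   \nc d\n' | awk 'NF == 0 { print NR }'
```

The first command prints "a b" and "c d"; the second prints 2.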

length

awk provides the length function to compute the length of a string. For example, to find all lines longer than 80 characters, we can do this:

$ awk 'length($0) > 80' employee.txt

Some readers may wonder why there is no print here.

Try it and you will see that matching lines are printed anyway: when a pattern has no action, the default action is to print the whole line.
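
No line in the sample file is longer than 80 characters, so the command above prints nothing for it. A sketch with a lower threshold of 25 (a value picked only to make the output visible), recreating the sample file so that it runs on its own:

```shell
# Recreate the sample file from the article (no trailing spaces).
cat > employee.txt <<'EOF'
ajay manager account 45000
sunil clerk account 25000
varun manager sales 50000
amit manager account 47000
tarun peon sales 15000
deepak clerk sales 23000
sunil peon sales 13000
satvik director purchase 80000
EOF

# Pattern only, no action: lines longer than 25 characters are printed.
awk 'length($0) > 25' employee.txt
```

This prints the ajay, amit, and satvik lines, which are 26, 26, and 30 characters long.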

Matching with an if statement

Sometimes we want to match on the value of a particular column. Suppose we want to find the rows whose third column equals sales. Here the test is written as an if statement inside the action rather than as a pattern:

$ awk '{ if($3 == "sales") print $0;}' employee.txt

The output after running it:

varun manager sales 50000
tarun peon sales 15000
deepak clerk sales 23000
sunil peon sales 13000
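
The same test can also be written as a pattern, which is usually shorter. A sketch, recreating the sample file so that it runs on its own; the second command adds a salary threshold of 20000, a value picked only for illustration:

```shell
# Recreate the sample file from the article.
cat > employee.txt <<'EOF'
ajay manager account 45000
sunil clerk account 25000
varun manager sales 50000
amit manager account 47000
tarun peon sales 15000
deepak clerk sales 23000
sunil peon sales 13000
satvik director purchase 80000
EOF

# The comparison used as a pattern; with no action, matching lines are printed.
awk '$3 == "sales"' employee.txt

# Conditions can be combined with &&: sales staff earning more than 20000.
awk '$3 == "sales" && $4 > 20000 { print $1, $4 }' employee.txt
```

The first command prints the same four lines as the if version; the second prints "varun 50000" and "deepak 23000".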

Numeric calculations

Sometimes there is no input file at all and we just want to compute something directly, for example in a for loop. You can refer to this example:

$ awk 'BEGIN { for(i=1;i<=6;i++) print "square of", i, "is",i*i; }' 

Notice that we did not pass an input file. The for loop lives in the action of the BEGIN rule, which runs before any input is read, and prints one line for each i from 1 to 6. Output:

square of 1 is 1
square of 2 is 4
square of 3 is 9
square of 4 is 16
square of 5 is 25
square of 6 is 36

BEGIN and END rules have no default action, so we must give them an explicit action such as print.
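
BEGIN and END can also wrap ordinary per-line rules. A sketch using two lines of the sample data as input; the header text "name salary" and the "rows:" label are made up for this example:

```shell
# BEGIN runs once before any input, the middle rule runs for every line,
# and END runs once after the last line.
printf 'ajay manager account 45000\nsunil clerk account 25000\n' |
  awk 'BEGIN { print "name salary" } { print $1, $4 } END { print "rows:", NR }'
```

This prints the header, then "ajay 45000" and "sunil 25000", then "rows: 2".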

Summary

Today we reviewed the basic concepts of awk and worked through some practical cases to understand its common usage. These are far from everything: awk provides a complete language for text processing.

What is listed here is just the tip of the iceberg. If you are interested, refer to the User's Guide for in-depth study; it will pay off in your text-processing work.

Thanks for reading!

Origin juejin.im/post/7136957973674328072