Linux: advanced shell scripting (sed/gawk)

table of Contents

Sed command advanced

Hold space and pattern space

Sed advanced article example application

Advanced gawk command


 

#This article is a study note

 

Sed command advanced

1>n command: move to the next line

The lowercase n command will tell the sed editor to move to the next line of the data stream for processing

 

1.1 Principle of n command

The n command is simply to read the next line in advance, covering the previous line of the model space (not deleted, so it is still printed to standard output). If the command is not executed successfully (not skip: the front-end conditions do not match), then give up. Any command, and re-execute sed for the newly read content.

example:

Take out even lines from aaa file

1

2

3

4

5

6

7

8

9

10

cat aaa 

This is 1    

This is 2    

This is 3    

This is 4    

This is 5    

     

sed -n 'n;p' aaa         //-n表示隐藏默认输出内容    

This is 2    

This is 4


Note: Read This is 1, execute the n command, the mode space is This is 2, execute p, print the content of the mode space This is 2, then read This is 3, execute the n command, the mode space is This is 4. Execute p, print the content of mode space This is 4, then read This is 5, execute the n command, because there is no more, so exit, and give up the p command.

Therefore, the even-numbered lines are printed out in the end.

 

2>N command: merge text lines

The uppercase N will add the next text line to the existing text in the pattern space, so that the two text lines are merged into the same pattern space, and the text lines are still separated by line breaks. If you want to find text that may be scattered on two lines in a file, this is a very useful function.

 

2.1 Principle of N command:

The N command is simply to append the next line to the pattern space, and at the same time treat the two lines as one line, but there is still a \n newline character between the two lines. If the command is not executed successfully (not skip: the front-end conditions do not match), then Give up any commands afterwards, and re-execute sed for the newly read content.

example:

Read odd lines from aaa file

1

2

3

4

5

6

7

8

9

10

11

cat aaa   

This is 1   

This is 2   

This is 3   

This is 4   

This is 5   

                                                     

sed -n '$!N;P' aaa            

This is 1   

This is 3   

This is 5

In the comment, 1 represents This is 1 2 represents This is 2 and so on

Note: Read 1, $! condition is satisfied (not the last line), execute N command, get 1\n2, execute P, print 1, read 3, $! condition is satisfied (not the last line), execute N command, get Out 3\n4, execute P, print 3, read 5, $! condition is not met, skip N, execute P, print 5

2.2 Configuration example

例1
$ cat data2.txt
this is the header line.
this is the first data line.
this is the second data line.
this is the last line.

$ sed '/first/{N;s/\n/ /}' data2.txt
this is the header line.
this is the first data line. this is the second data line.
this is the last line.

例2
$ cat data3.txt
on Tuesday,the linux system
administrator's group meeting will be held.
all system administrators should attend.
thank you for your attendance.

$ sed 'N;s/system.administrator/desktop user/' data3.txt
on Tuesday,the linux desktop user's group meeting will be held.
all desktop users should attend.
thank you for your attendance.

 

3>Only delete the command in the previous line, D command

When there are multiple matching lines, the D command will only delete the first line in the pattern space

 

3.1 Working principle of D command

The D command deletes the content from the beginning of the current mode space to \n (not transmitted to standard output), abandons the subsequent commands, but re-executes sed on the remaining mode space.

D command example

Read the last line from the aaa file

1

2

3

4

5

6

7

8

9

cat aaa   

This is 1   

This is 2   

This is 3   

This is 4   

This is 5   

                                                

sed 'N;D' aaa           

This is 5

Note: Read 1, execute N, get 1\n2, execute D, get 2, execute N, get 2\n3, execute D, get 3, and so on, get 5, execute N, condition Fail to exit, because there is no -n parameter, so output 5

 

3.2 Configuration example


$ cat data5.txt

this is the header line.
this is a data line.

this is the last line.

例1:
$ sed '/^$/{N;/header/D}' data5.txt
this is the header line.
this is a data line.

this is the last line.

例2:打印最后两行
sed '$!N;$!D' data5.txt

this is the last line.

 

4>Only print the previous line, P command

When there are multiple matching lines, the P command will only print the first line in the pattern space

$ sed -n 'N;/system\nadministrator/P' data3.txt
on Tuesday,the linux system

5>Exclude command (!)

例1
$ cat data2.txt
this is the header line.
this is the first data line.
this is the second data line.
this is the last line.

$ sed -n '/headeer/!p' data2.txt
this is the first data line.
this is the second data line.
this is the last line.

6> Branch. Used to change the data flow

Format: [address]b [:label]

The address parameter determines which rows of data will trigger the branch command, and the label parameter defines the location to jump to.

例1
$ cat data2.txt
this is the header line.
this is the first data line.
this is the second data line.
this is the last line.
$ sed '{2,3b;s/this is /is this/;s/line./test?/}' data2.txt
is this the header test?
this is the first data line.
this is the second data line.
is this the last test?

例2
$ sed '{/first/b jump1;s/this is the/no jump on/
>:jump1
>s/this is the/jump here on/}' data2.txt

no jump on header line
jump here on first data line
no jump on second data line
no jump on last line

例3
$ sed -n '{
> :start
> s/,//1p
> b start
> }'

 

7> Test (test) command. Used to change the execution flow of the sed editor script, which is equivalent to the "or" in the logical operator

例1
$ cat data2.txt
this is the header line.
this is the first data line.
this is the second data line.
this is the last line.
$ sed '{
>s/first/matched/
>t
>s/this is the/no match on/
}' data2.txt
no match on header line
this is the matched data line
no match on second data line
no match on last line

例2
$ echo "this,is,a,test,to,remove,commas."|sed -n '{
>:start
>s/,/ /1p
>b start
}'
#当无需替换时,测试命令不会跳转而是继续执行剩下的脚本

 

8>Alternative mode/backreference (&)

The & symbol is used to represent the matching pattern in the replace command. When you want to match part of the string, you can use parentheses to define the sub-pattern in the pattern, and use \1, \2, etc. to represent the calling sub-pattern.

例1
$ echo 'the cat sleeps in his hat'|sed 's/.at/"&"/g'
the "cat" sleeps in his "hat"

例2
$ echo 'the system administrator manual'|sed '
>s/\(system\) administrator/\1 user/'
the system user manual

例3
$ echo '1234567'|sed '{
>:start
>s/\(.*[0-9]\)\([0-9]\{3\}\)/\1,\2/
>t start
>}'
1,234,567

 

Hold space and pattern space

Mode space: it is the buffer area where the sed editor stores the text to be processed. Sed will read one line of text at a time, put it into the pattern space, and execute editing commands.

Keep space: a buffer for temporarily saving some lines.

There are five commands to manipulate the holding space:

h Copy the pattern space to the hold space (will overwrite)
H Append pattern space to hold space
g Copy hold space to pattern space
G Append holding space to pattern space
x Exchange the content of pattern space and hold space

 

例1
$ cat data2.txt
this is the header line.
this is the first data line.
this is the second data line.
this is the last line.
$ sed -n '{1!G;h;$p}' data2.txt
this is the last line.
this is the second data line.
this is the first data line.
this is the header line.

 

 

Sed advanced article example application

Sample text:

cat data2.txt
  this is the header line.
  this is the first data line.
  this is the second data line.
  this is the last line.


实例:
例1:加倍行间距,每行后面加一行空白行;
$ sed 'G' data2.txt
this is the header line.

this is the first data line.

this is the second data line.

this is the last line.

$ sed '$!G' data2.txt      #每行后面加一空白行,最后一行不加
this is the header line.

this is the first data line.

this is the second data line.

this is the last line.
$

 

例2:给文件中的行编号;
$ sed '=' data2.txt | sed 'N;s/\n'/ /'
1 this is the header line.
2 this is the first data line.
3 this is the second data line.
4 this is the last line.
#如果有空白行的话,需要注意先去除空白行
例3:打印末尾10行
$ sed '{
> :start
> $q;N;11,$D
> b start
> }' data7.txt
例4:删除空白行
$ sed '/./,/^$/!d' data8.txt     #删除连续空白行;
$ sed '/./,$!d' data9.txt        #删除开头n行空白行;
$ sed '{                         #删除末尾空白行;
> :start
> /^\n*$/{$d;N;b start}
> }'

 

Advanced gawk command

1>Built-in variables

FIELDWIDTHS A series of numbers separated by spaces that define the exact width of each data field
FS 输入字段分隔符,默认是空格符
RS 输入记录分隔符,每行是一条记录,默认是换行符
OFS 输出字段分隔符,默认是空格符
ORS 输出记录分隔符,每行是一条记录,默认是换行符

 

数据变量:

NF 字段总数
NR 已处理的行数
ENVIRON

关联数组组成的shell环境变量值,格式如下

print ENVIRON["HOME"]

   
例1:将逗号分隔符替换为“-”,并输出前三项
$ cat data1
data11,data12,data13,data14,data15
data21,data22,data23,data24,data25
data31,data32,data33,data34,data35

$ gawk 'BEGIN{FS=",";OFS="-"} {print $1,$2,$3}' data1
data11-data12-data13
data21-data22-data23
data31-data32-data33

 

例2:将每行当作一个字段,把空白行当作记录分隔符
$ cat data2
riley mullen
123 main street
chicago,il 60601
555-1234

frank williams
456 oak street
indianapolis ,in 46201
5559876

haley snell
4231 elm street
detroit,mi 48201
555-4938

$ gawk 'BEGIN{FS="\n";RS=""} {print $1,$4}' data2
riley mullen 555-1234
frank williams 555-9876
haley snell 555-4938
例3:按固定宽度分隔字段
$ cat data1b
1005.3247596.37
115-2.349194.00
05810.1298100.1

$ gawk 'BEGIN{FIELDWIDTHS="3 5 2 5"}{print $1,$2,$3,$4}' data1b
100 5.324 75 96.37
115 -2.34 91 94.00
058 10.12 98 100.1
例4:在脚本中使用变量赋值
$ gawk '
> BEGIN{
> testing="this is a test"
> print testing
> testing=46
> print testing
> }'

$ this a testing
$ 45
遍历关联数组
$ gawk 'BEGIN{
> var[a]=1
> var[g]=2
> var[u]=3
> for (test in var)
> {
> print "index:",test," "- Value:",var[test]
> }
> }'

index: u  - Value: 3
index: a  - Value: 1
index: g  - Value: 2

删除数组元素值:
delete array[index]
匹配限定
$ gawk -F: '$1 ~ /rich/{print $1,$NF}' data1
rich /bin/bash
#这个例子会在第一个数据字段中查找文本rich。如果在记录中找到了这个模式,它会打印
该记录的第一个和最后一个数据字段值;
数学表达式应用
$ gawk -F: '$4 == 0 {print $1}' /etc/passwd
root
sync
shutdown
halt
operator
结构化命令,if语句
if (condition)
statement1
也可以将它放在一行上,像这样:
if (condition) statement1

$ gawk '{if ($1 > 20) print $1}' data4
$ gawk '{
> if ($1 > 20)
> {
> x=$1 * 2
> print x
> }
> }' data4


$ gawk '{
> if ($1 > 20)
> {
> x = $1 * 2
> print x
> } else
> {
> x = $1 / 2
> print x
> }}' data4

可以在单行上使用else子句,但必须在if语句部分之后使用分号。
if (condition) statement1; else statement2


结构化命令,while语句
while (condition)
{
statements
}

$ gawk '{
> total = 0
> i = 1
> while (i < 4)
> {
> total += $i
> i++
> }
> avg = total / 3
> print "Average:",avg
> }' data5


$ gawk '{
> total = 0
> i = 1
> while (i < 4)
> {
> total += $i
> if (i == 2)
> break
> i++
> }
> avg = total / 2
> print "The average of the first two data elements is:",avg
> }' data5

结构化命令,do-while语句类似于while语句,但会在检查条件语句之前执行命令。下面是do-while语
句的格式。
do
{
statements
} while (condition)
这种格式保证了语句会在条件被求值之前至少执行一次。当需要在求值条件前执行语句时,
这个特性非常方便。
$ gawk '{
> total = 0
> i = 1
> do
> {
> total += $i
> i++
> } while (total < 150)
> print total }' data5



for语句是许多编程语言执行循环的常见方法。gawk编程语言支持C风格的for循环。
for( variable assignment; condition; iteration process)
将多个功能合并到一个语句有助于简化循环。
$ gawk '{
> total = 0
> for (i = 1; i < 4; i++)
> {
> total += $i
> }
> avg = total / 3
> print "Average:",avg
> }' data5
格式化输出,printf,和c语言用法一样
c 将一个数作为ASCII字符显示
d 显示一个整数值
i 显示一个整数值(跟d一样)
e 用科学计数法显示一个数
f 显示一个浮点值
g 用科学计数法或浮点数显示(选择较短的格式)
o 显示一个八进制值
s 显示一个文本字符串
x 显示一个十六进制值
X 显示一个十六进制值,但用大写字母A~F


注意,你需要在printf命令的末尾手动添加换行符来生成新行。没添加的话,printf命令
会继续在同一行打印后续输出。

printf默认输出是右对齐,可通过“-”来控制成左对齐:
$ gawk 'BEGIN{FS="\n"; RS=""} {printf "%-16s %s\n", $1, $4}' data2
Riley Mullen (312)555-1234
Frank Williams (317)555-9876
Haley Snell (313)555-4938
$

处理浮点数:
$ gawk '{
> total = 0
> for (i = 1; i < 4; i++)
> {
> total += $i
> }
> avg = total / 3
> printf "Average: %5.1f\n",avg
> }' data5
Average: 128.3
Average: 137.7
Average: 176.7
$
#打印test.txt的第3行至第5行
awk 'NR==3,NR==5 {print}' test.txt
#打印test.txt的第3行至第5行的第一列与最后一列
awk 'NR==3,NR==5 {print $1,$NF}' test.txt

#打印test.txt中长度大于80的行号
awk 'length($0)>80 {print NR}' test.txt

#计算test.txt中第一列的总和
cat test.txt|awk '{sum+=$1}END{print sum}'

#添加自定义字符
ifconfig eth0|grep 'Bcast'|awk '{print "ip_" $2}'


#格式化输出
awk -F: '{printf "% -12s % -6s % -8s\n",$1,$2,$NF}' /etc/passwd

 

注:本章内容为读书笔记,摘自《Linux命令行与shell脚本编程大全》第3版。

Guess you like

Origin blog.csdn.net/qq_35229961/article/details/90693541