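Linux Bash Text Processing: sed

sed is one of the most powerful text-processing tools on Linux. It is capable and flexible, and well worth learning properly.

sed is a stream editor for filtering and transforming text.

By default, sed reads the current line into a temporary buffer (the pattern space), applies the given commands to it, and then prints the buffer to standard output. The -n option suppresses this automatic printing. None of this touches the input file unless you ask for it: the -i option writes the result back to the original file, so use it with care.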
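
The effect of -n and -i can be checked on a throwaway file. The sketch below is only an illustration: the temporary directory and the file name demo.txt are made up for the example, and the -i.bak form assumes a sed that accepts a backup suffix attached to -i, as GNU sed does.

```shell
# Work in a throwaway directory so no real file is touched.
tmpdir=$(mktemp -d)
printf 'alpha\nbeta\n' > "$tmpdir/demo.txt"

# Default: every line is auto-printed, and p prints line 1 a second time.
default_out=$(sed '1p' "$tmpdir/demo.txt")

# -n suppresses auto-printing, so only the explicit p produces output.
quiet_out=$(sed -n '1p' "$tmpdir/demo.txt")

# -i.bak edits the file in place and keeps the original as demo.txt.bak.
sed -i.bak 's/alpha/ALPHA/' "$tmpdir/demo.txt"
edited=$(cat "$tmpdir/demo.txt")
backup=$(cat "$tmpdir/demo.txt.bak")

printf '%s\n---\n%s\n---\n%s\n' "$default_out" "$quiet_out" "$edited"
rm -rf "$tmpdir"
```

Keeping a backup suffix is a cheap safety net while experimenting with -i.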

Invocation

From the command line

sed -e 'command' input_file

From a script file

sed -f script_file input_file
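
As a small sketch of the two invocation styles, the same pair of commands can be given with -e on the command line or stored one per line in a script file and loaded with -f. The file names input.txt and edit.sed are hypothetical and exist only for this example.

```shell
tmpdir=$(mktemp -d)
printf 'one\n\ntwo\n' > "$tmpdir/input.txt"

# A hypothetical sed script: one command per line.
printf '%s\n' '/^$/d' 's/one/ONE/' > "$tmpdir/edit.sed"

# Run the script file, then the same commands passed with -e.
from_file=$(sed -f "$tmpdir/edit.sed" "$tmpdir/input.txt")
from_cli=$(sed -e '/^$/d' -e 's/one/ONE/' "$tmpdir/input.txt")

printf '%s\n' "$from_file"
rm -rf "$tmpdir"
```

A script file tends to pay off once the command list grows beyond what is comfortable to quote on one line.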

The examples below walk through the common sed commands and options. Unless noted otherwise, sed here means sed (GNU sed) 4.2.2.

First, create a sample text file.

cv@cv:~/myfiles$ touch test.txt
cv@cv:~/myfiles$ man sed | head -n 30 | tail -n 28 > test.txt
cv@cv:~/myfiles$ cat test.txt

 1 NAME
 2        sed - stream editor for filtering and transforming text
 3 
 4 SYNOPSIS
 5        sed [OPTION]... {script-only-if-no-other-script} [input-file]...
 6 
 7 DESCRIPTION
 8        Sed  is  a  stream  editor.  A stream editor is used to perform basic text transformations on an input stream (a file or input from a pipeline).  While in some ways similar to an
 9        editor which permits scripted edits (such as ed), sed works by making only one pass over the input(s), and is consequently more efficient.  But it is sed's ability to filter text
10        in a pipeline which particularly distinguishes it from other types of editors.
11 
12        -n, --quiet, --silent
13 
14               suppress automatic printing of pattern space
15 
16        -e script, --expression=script
17 
18               add the script to the commands to be executed
19 
20        -f script-file, --file=script-file
21 
22               add the contents of script-file to the commands to be executed
23 
24        --follow-symlinks
25 
26               follow symlinks when processing in place
27 
28        -i[SUFFIX], --in-place[=SUFFIX]

test.txt

Deleting

d       Delete pattern space.  Start next cycle.

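# Delete line 2
cv@cv:~/myfiles$ sed '2d' test.txt
# Delete lines 2 through 10, inclusive
cv@cv:~/myfiles$ sed '2,10d' test.txt

# Delete empty lines. This removes only truly empty lines, not lines that consist of spaces, tabs, or other whitespace
cv@cv:~/myfiles$ sed '/^$/d' test.txt

# To delete only some of the empty lines, for example those within the first four lines, restrict the command to a line range
cv@cv:~/myfiles$ sed '1,4{/^$/d}' test.txt

# Replace the leading whitespace of the given lines with a tab (this uses s rather than d; the g flag is redundant here because ^ matches at most once per line)
cv@cv:~/myfiles$ sed '6,10s/^[[:space:]]*/\t/g' test.txt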
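
The note above says that /^$/d leaves whitespace-only lines alone. A minimal sketch of one way to remove those as well is to allow optional whitespace between the anchors. The sample function and its four-line input are invented for this illustration.

```shell
# Sample input: "a", an empty line, a line holding a space and a tab, "b".
sample() { printf 'a\n\n \t\nb\n'; }

# Removes only the truly empty line; the whitespace-only line survives.
only_empty=$(sample | sed '/^$/d')

# Removes empty lines and lines made up solely of whitespace.
all_blank=$(sample | sed '/^[[:space:]]*$/d')

printf '%s\n' "$all_blank"
```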

With these basics in place, we can apply the changes to the original file and look at the result.

cv@cv:~/myfiles$ sed -i '1,6{/^$/d};12,${/^$/d}' test.txt
cv@cv:~/myfiles$ cat test.txt
NAME
       sed - stream editor for filtering and transforming text
SYNOPSIS
       sed [OPTION]... {script-only-if-no-other-script} [input-file]...
DESCRIPTION
       Sed  is  a  stream  editor.  A stream editor is used to perform basic text transformations on an input stream (a file or input from a pipeline).  While in some ways similar to an
       editor which permits scripted edits (such as ed), sed works by making only one pass over the input(s), and is consequently more efficient.  But it is sed's ability to filter text
       in a pipeline which particularly distinguishes it from other types of editors.

       -n, --quiet, --silent
              suppress automatic printing of pattern space
       -e script, --expression=script
              add the script to the commands to be executed
       -f script-file, --file=script-file
              add the contents of script-file to the commands to be executed
       --follow-symlinks
              follow symlinks when processing in place
       -i[SUFFIX], --in-place[=SUFFIX]

Printing

-n, --quiet, --silent    suppress automatic printing of pattern space
p    Print the current pattern space.

# Print lines 3 through 5
cv@cv:~/myfiles$ sed -n '3,5p' test.txt
SYNOPSIS
       sed [OPTION]... {script-only-if-no-other-script} [input-file]...
DESCRIPTION
# Print the line number of every matching line
cv@cv:~/myfiles$ sed -n '/[sS]ed/=' test.txt
2
4
6
7

# Print the content of every matching line
cv@cv:~/myfiles$ sed -n '/[sS]ed/p' test.txt
       sed - stream editor for filtering and transforming text
       sed [OPTION]... {script-only-if-no-other-script} [input-file]...
       Sed  is  a  stream  editor.  A stream editor is used to perform basic text transformations on an input stream (a file or input from a pipeline).  While in some ways similar to an
       editor which permits scripted edits (such as ed), sed works by making only one pass over the input(s), and is consequently more efficient.  But it is sed's ability to filter text

# To show both the line number and the content, combine the two commands
cv@cv:~/myfiles$ sed -n -e '/[sS]ed/=' -e '/[sS]ed/p' test.txt
2
       sed - stream editor for filtering and transforming text
4
       sed [OPTION]... {script-only-if-no-other-script} [input-file]...
6
       Sed  is  a  stream  editor.  A stream editor is used to perform basic text transformations on an input stream (a file or input from a pipeline).  While in some ways similar to an
7
       editor which permits scripted edits (such as ed), sed works by making only one pass over the input(s), and is consequently more efficient.  But it is sed's ability to filter text
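
Repeating the address for each -e works, but the pattern has to be written twice. As a sketch of an alternative, the two commands can be grouped in braces under a single address. The sample function and its input are made up for this example.

```shell
sample() { printf 'NAME\nsed - stream editor\nSYNOPSIS\nSed is a stream editor\n'; }

# Two expressions, each repeating the same address.
two_expr=$(sample | sed -n -e '/[sS]ed/=' -e '/[sS]ed/p')

# One address, with both commands grouped in braces.
grouped=$(sample | sed -n '/[sS]ed/{=;p;}')

printf '%s\n' "$grouped"
```

Both forms print the line number on its own line, followed by the matching line.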

Adding one or more lines

a text    Append text, which has each embedded newline preceded by a backslash.
i text    Insert text, which has each embedded newline preceded by a backslash.
$         Match the last line.

# Append "my name is lee" after every line
cv@cv:~/myfiles$ sed 'a my name is lee' test.txt

# Append it after line 2
cv@cv:~/myfiles$ sed '2a my name is lee' test.txt

# Append it after the last line
cv@cv:~/myfiles$ sed '$a my name is lee' test.txt

# Insert it before the last line
cv@cv:~/myfiles$ sed '$i my name is lee' test.txt
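
The placement of a and i is easier to see on a short numbered input. This sketch uses seq to generate three lines and relies on the one-line "a text" / "i text" form, which is a GNU sed convenience. The inserted strings are arbitrary.

```shell
# a appends after the addressed line.
after_two=$(seq 3 | sed '2a after two')

# i inserts before the addressed line; $ addresses the last line.
before_last=$(seq 3 | sed '$i before last')

# $a appends after the last line.
after_last=$(seq 3 | sed '$a the end')

printf '%s\n---\n%s\n---\n%s\n' "$after_two" "$before_last" "$after_last"
```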

Replacing

c text    Replace the selected lines with text, which has each embedded newline preceded by a backslash.

# Replace line 2 with "my name is lee"
cv@cv:~/myfiles$ sed '2c my name is lee' test.txt

# Replace lines 2 through the last with a single "my name is lee", that is, delete the selected lines and insert the given line once
cv@cv:~/myfiles$ sed '2,$c my name is lee' test.txt
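
A short sketch of how c behaves with a single address and with a range, again using seq for input. The behaviour shown for the range, where the text is emitted once for the whole range, is that of GNU sed; the replacement string is arbitrary.

```shell
# Single address: only line 2 is replaced.
one_line=$(seq 3 | sed '2c replaced')

# Range: GNU sed deletes lines 2..$ and prints the text once.
whole_range=$(seq 5 | sed '2,$c replaced')

printf '%s\n---\n%s\n' "$one_line" "$whole_range"
```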

Besides replacing whole lines as above, sed offers a more powerful pattern-based substitution.

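# Replace sed or Sed with SED. Without the g flag only the first match on each line is replaced
cv@cv:~/myfiles$ sed 's/[sS]ed/\U&/' test.txt

# With the g flag every match on each line is replaced. [sS]ed also matches inside longer words, so "used" becomes "uSED"
cv@cv:~/myfiles$ sed 's/[sS]ed/\U&/g' test.txt

# The p flag prints each line on which a substitution was made. Without -n, sed still auto-prints every line, so the changed lines appear twice; p is therefore usually combined with -n
cv@cv:~/myfiles$ sed 's/[sS]ed/\U&/p' test.txt

# With -n (--quiet/--silent), only the changed lines are printed
cv@cv:~/myfiles$ sed -n 's/[sS]ed/\U&/p' test.txt

# Restrict the substitution to lines 2 through 4. Here p and gp give the same output only because each of these lines contains at most one match
cv@cv:~/myfiles$ sed -n '2,4s/[sS]ed/\U&/p' test.txt
cv@cv:~/myfiles$ sed -n '2,4s/[sS]ed/\U&/gp' test.txt
# Separate multiple commands with ';'. The s command also accepts another delimiter in place of the usual '/'
cv@cv:~/myfiles$ sed -n '2,4s/[sS]ed/\U&/gp;6,7s#editor#EdItOr#gp' test.txt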
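
The g and p flags are easiest to tell apart on a line that contains several matches. The sketch below uses a made-up sentence and a two-line input. \U is a GNU sed extension.

```shell
line='Sed is used; sed works'

# No g flag: only the first match on the line is changed.
first_only=$(printf '%s\n' "$line" | sed 's/[sS]ed/\U&/')

# g flag: every match is changed, including the "sed" inside "used".
every=$(printf '%s\n' "$line" | sed 's/[sS]ed/\U&/g')

# p flag without -n: the changed line is printed twice.
twice=$(printf 'sed\nawk\n' | sed 's/sed/SED/p')

# p flag with -n: only the changed line is printed, once.
once=$(printf 'sed\nawk\n' | sed -n 's/sed/SED/p')

printf '%s\n%s\n' "$first_only" "$every"
```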

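A line matched by one pattern can also be used to replace lines matched by another. The h command copies the pattern space into the hold space, much like copying to the clipboard on Windows, and the g command copies the hold space back into the pattern space, overwriting the line that is currently there. For example: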

    h H    Copy/append pattern space to hold space.
    g G    Copy/append hold space to pattern space.

# Find the line containing NAME and use it to replace every line containing pipeline (no -i is given, so the file itself is unchanged; the output is shown below)
cv@cv:~/myfiles$ sed -e '/NAME/h' -e '/pipeline/g' test.txt
NAME
       sed - stream editor for filtering and transforming text
SYNOPSIS
       sed [OPTION]... {script-only-if-no-other-script} [input-file]...
DESCRIPTION
NAME
       editor which permits scripted edits (such as ed), sed works by making only one pass over the input(s), and is consequently more efficient.  But it is sed's ability to filter text
NAME

       -n, --quiet, --silent
              suppress automatic printing of pattern space
       -e script, --expression=script
              add the script to the commands to be executed
       -f script-file, --file=script-file
              add the contents of script-file to the commands to be executed
       --follow-symlinks
              follow symlinks when processing in place
       -i[SUFFIX], --in-place[=SUFFIX]
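
The difference between g (overwrite) and G (append) shows up clearly on a tiny input. This is a sketch with a made-up four-line sample, not part of the test file used above.

```shell
sample() { printf 'NAME\nfoo\nin a pipeline\nbar\n'; }

# h saves the NAME line; g overwrites the pipeline line with it.
replaced=$(sample | sed -e '/NAME/h' -e '/pipeline/g')

# G instead appends the saved line after the pipeline line.
appended=$(sample | sed -e '/NAME/h' -e '/pipeline/G')

printf '%s\n---\n%s\n' "$replaced" "$appended"
```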

Transforming text

\b    matches the empty string at the edge of a word
\B    matches the empty string provided it's not at the edge of a word

cv@cv:~/myfiles$ echo "one two three btw is the abbr of by the way whether twher is meaningful? SHA random code twfdoetw tw wsr239wfgrte seeyoutwtommorrow" >> test.txt

# \U& converts the matched text to upper case
# \b matches the empty string at the start or end of a word; \btw matches tw at the start of a word
cv@cv:~/myfiles$ sed -n 's/\btw/\U&/gp' test.txt
one TWo three btw is the abbr of by the way whether TWher is meaningful? SHA random code TWfdoetw TW wsr239wfgrte seeyoutwtommorrow

# tw\b matches tw at the end of a word
cv@cv:~/myfiles$ sed -n 's/tw\b/\U&/gp' test.txt
one two three bTW is the abbr of by the way whether twher is meaningful? SHA random code twfdoeTW TW wsr239wfgrte seeyoutwtommorrow

# \btw\b matches tw only as a whole word; longer words such as twtw do not match
cv@cv:~/myfiles$ sed -n 's/\btw\b/\U&/gp' test.txt
one two three btw is the abbr of by the way whether twher is meaningful? SHA random code twfdoetw TW wsr239wfgrte seeyoutwtommorrow

# \B matches the empty string anywhere except at a word edge; \Btw matches tw that is not at the start of a word (it may still be at the end)
cv@cv:~/myfiles$ sed -n 's/\Btw/\U&/gp' test.txt
one two three bTW is the abbr of by the way whether twher is meaningful? SHA random code twfdoeTW tw wsr239wfgrte seeyouTWtommorrow

# tw\B matches tw that is not at the end of a word
cv@cv:~/myfiles$ sed -n 's/tw\B/\U&/gp' test.txt
one TWo three btw is the abbr of by the way whether TWher is meaningful? SHA random code TWfdoetw tw wsr239wfgrte seeyouTWtommorrow

# \Btw\B matches tw only strictly inside a word, at neither the start nor the end
cv@cv:~/myfiles$ sed -n 's/\Btw\B/\U&/gp' test.txt
one two three btw is the abbr of by the way whether twher is meaningful? SHA random code twfdoetw tw wsr239wfgrte seeyouTWtommorrow

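# \btw\B matches tw at the start of a word that continues past it; the standalone word tw does not match, but the first tw of twfdoetw does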
cv@cv:~/myfiles$ sed -n 's/\btw\B/\U&/gp' test.txt
one TWo three btw is the abbr of by the way whether TWher is meaningful? SHA random code TWfdoetw tw wsr239wfgrte seeyoutwtommorrow

# \Btw\b matches tw at the end of a word that is longer than tw itself
cv@cv:~/myfiles$ sed -n 's/\Btw\b/\U&/gp' test.txt
one two three bTW is the abbr of by the way whether twher is meaningful? SHA random code twfdoeTW tw wsr239wfgrte seeyoutwtommorrow
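
The edge cases are easier to verify on three short words. The sketch below uses the made-up input "tw twtw atwb". \b, \B and \U are GNU sed extensions.

```shell
words='tw twtw atwb'

# Whole word only: the standalone "tw".
whole=$(printf '%s\n' "$words" | sed 's/\btw\b/\U&/g')

# At the start of a longer word: the first "tw" of "twtw".
at_start=$(printf '%s\n' "$words" | sed 's/\btw\B/\U&/g')

# At the end of a longer word: the second "tw" of "twtw".
at_end=$(printf '%s\n' "$words" | sed 's/\Btw\b/\U&/g')

# Strictly inside a word: the "tw" of "atwb".
inside=$(printf '%s\n' "$words" | sed 's/\Btw\B/\U&/g')

printf '%s\n%s\n%s\n%s\n' "$whole" "$at_start" "$at_end" "$inside"
```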

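Joining two lines

Joining lines requires the N command.

As the manual excerpt below says, N appends the next input line to the pattern space, so the two lines are held as one string with an embedded \n between them. sed normally processes one line at a time; N tells it to pull the next line into the pattern space as well. Uppercase P prints the pattern space up to the first embedded newline, that is, its first line, whereas lowercase p prints the whole pattern space.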

N    Append the next line of input into the pattern space.
P    Print up to the first embedded newline of the current pattern space.(Capital)
p    Print the current pattern space.(Lowercase)

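In the command below, N reads lines 1 and 2 into the pattern space and P prints the first of them, so 1 is output. The next cycle reads 3 and 4 and prints 3. The last cycle reads 5, yet nothing is printed. When there is no next line to read, N makes sed exit without running the remaining commands, so P never executes, and because of -n the pattern space is not auto-printed either.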

cv@cv:~/myfiles$ seq 5 | sed -n 'N;P'
1
3

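In contrast, $!N applies N to every line except the last. On the last line N is skipped and sed goes straight on to P, which prints the first line of the pattern space, namely 5.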

cv@cv:~/myfiles$ seq 5 | sed -n '$!N;P'
1
3
5

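When the last line, 5, is read, N finds no next line and sed exits without running the substitution or p. Because -n is given, the unprocessed pattern space is not printed, so 5 is missing. Without -n, GNU sed would print the pattern space before exiting and 5 would still appear.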

cv@cv:~/myfiles$ seq 5 | sed -n 'N;s/\n/ /;p'
1 2
3 4

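With $!N, N is skipped on the last line. The substitution then fails because the pattern space contains no embedded newline, but the separate p command still runs and prints 5. Note that this p is a command of its own, placed after the ';' that ends the s command. Used as a flag of s instead (s/\n/ /p), it prints only lines on which a substitution was made, so 5 is missing from the last two results below.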

cv@cv:~/myfiles$ seq 5 | sed -n '$!N;s/\n/ /;p'
1 2
3 4
5
cv@cv:~/myfiles$ seq 5 | sed -n 'N;s/\n/ /p'
1 2
3 4
cv@cv:~/myfiles$ seq 5 | sed -n '$!N;s/\n/ /p'
1 2
3 4
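
Putting these pieces together, one way to join lines in pairs while keeping a trailing odd line is to drop -n and let auto-printing do the output. This is a sketch only; the comma separator is an arbitrary choice.

```shell
# Join each pair of lines with a comma; $!N skips N on the last line,
# so the unpaired 5 is auto-printed unchanged.
joined=$(seq 5 | sed '$!N;s/\n/,/')

# With -n and P, only the first line of each pair is printed.
firsts=$(seq 4 | sed -n '$!N;P')

printf '%s\n---\n%s\n' "$joined" "$firsts"
```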

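Pattern matching

# First remove the test line appended earlier
cv@cv:~/myfiles$ sed -i '$d' test.txt
# Delete every line that contains either pattern; \| is alternation (a GNU extension in basic regular expressions)
cv@cv:~/myfiles$ sed '/\(file\|script\)/d' test.txt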
NAME
       sed - stream editor for filtering and transforming text
SYNOPSIS
DESCRIPTION
       in a pipeline which particularly distinguishes it from other types of editors.

       -n, --quiet, --silent
              suppress automatic printing of pattern space
       --follow-symlinks
              follow symlinks when processing in place
       -i[SUFFIX], --in-place[=SUFFIX]

# Delete from the first line matching Sed through the next line matching pipeline, inclusive
cv@cv:~/myfiles$ sed '/Sed/,/pipeline/d' test.txt
NAME
       sed - stream editor for filtering and transforming text
SYNOPSIS
       sed [OPTION]... {script-only-if-no-other-script} [input-file]...
DESCRIPTION

       -n, --quiet, --silent
              suppress automatic printing of pattern space
       -e script, --expression=script
              add the script to the commands to be executed
       -f script-file, --file=script-file
              add the contents of script-file to the commands to be executed
       --follow-symlinks
              follow symlinks when processing in place
       -i[SUFFIX], --in-place[=SUFFIX]
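
As a sketch of the same two ideas on a small input: alternation can also be written without backslashes when extended regular expressions are enabled, and a regex range deletes everything from its first match through its second. The -E option is assumed to be available (it is in current GNU sed), and the sample lines are invented for this example.

```shell
sample() { printf 'keep\nscript line\nfile line\nstart\nmiddle\nend\ntail\n'; }

# Basic regex with GNU \| alternation, as in the example above.
bre=$(sample | sed '/\(file\|script\)/d')

# Extended regex: the same alternation without backslashes.
ere=$(sample | sed -E '/file|script/d')

# Regex range: delete from the "start" line through the "end" line.
ranged=$(sample | sed '/start/,/end/d')

printf '%s\n---\n%s\n' "$ere" "$ranged"
```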

To be continued
