grep: a text-processing tool

The grep family

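grep: searches one or more files for a given regular expression and prints every line that contains a match

egrep: extended grep, which supports more regular-expression metacharacters

fgrep: fixed grep, sometimes also called fast grep; it interprets every character in the pattern literally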
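The difference between regular-expression matching and literal matching can be sketched as follows. The sample text is invented for illustration, and the sketch uses grep -F, which is the option form of fgrep.

```shell
# Sample input (made up for illustration): one line with a literal dot, one without.
sample=$(printf 'a.b\naxb\n')

# As a regular expression, '.' matches any single character, so both lines match.
regex_hits=$(printf '%s\n' "$sample" | grep -c 'a.b')

# With -F (what fgrep does), '.' is an ordinary character, so only 'a.b' matches.
fixed_hits=$(printf '%s\n' "$sample" | grep -F -c 'a.b')

echo "regex: $regex_hits, fixed: $fixed_hits"
```

Literal matching is the safer choice when the search string comes from data that may contain characters such as . or *.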

grep command format

grep [options] PATTERN filename filename ...

# grep 'Tom' /etc/passwd
# grep 'bash shell' /etc/test

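Exit status returned by grep

Exit status	Meaning
0	At least one line matching PATTERN was found
1	No line matching PATTERN was found
2	An error occurred, for example a specified file could not be found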
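A minimal sketch of how the three exit statuses might be observed in a script. The temporary file and its contents are hypothetical, chosen only so that each case is triggered once.

```shell
# Temporary sample file (hypothetical contents, for illustration only).
tmp=$(mktemp)
printf 'tom\nalice\n' > "$tmp"

# 0: at least one line matched.
found=0;   grep -q 'tom' "$tmp" || found=$?

# 1: no line matched.
missing=0; grep -q 'bob' "$tmp" || missing=$?

# 2: an error occurred, here a file that does not exist (-s hides the message).
error=0;   grep -s 'tom' "$tmp.does-not-exist" || error=$?

echo "$found $missing $error"
rm -f "$tmp"
```

Because the status is 0 only on a match, grep -q fits naturally into an if condition, for example: if grep -q 'tom' file; then ...; fi.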

grep can also read its input from standard input or from a pipe, not only from files. For example:

# grep 'tom'
# ps aux | grep 'sshd'
# ll | grep '^d'
# grep 'alice' /etc/passwd /etc/shadow /etc/group
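The pipe and standard-input cases can be tried without touching any system files. In this sketch the listing text and the account lines are invented stand-ins for real ls -l output and /etc/passwd entries.

```shell
# Feed grep from a pipe: only lines starting with 'd' pass through.
# The listing text is invented sample data standing in for `ls -l` output.
dirs=$(printf 'drwxr-xr-x docs\n-rw-r--r-- notes.txt\ndrwxr-xr-x src\n' | grep '^d')

# Feed grep from a here-document on standard input.
hits=$(grep -c 'tom' <<'EOF'
tom:x:1000
jerry:x:1001
EOF
)

printf '%s\n' "$dirs"
echo "lines containing tom: $hits"
```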

Metacharacters used by grep

grep: uses the basic metacharacter set

^, $, ., *, [], [^], \< \>, \(\), \{\}

egrep (or grep -E): uses the extended metacharacter set

?, +, { }, |, ( )

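There are two ways to use the extended metacharacters:

1. With plain grep, put a \ in front of the extended metacharacter (for example \+ or \|; GNU grep accepts these forms)
2. Use egrep or grep -E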
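The two approaches listed above can be compared side by side. This is a sketch that assumes GNU grep, since \? in basic syntax is a GNU extension; the sample words are invented.

```shell
# Invented sample lines.
sample=$(printf 'color\ncolour\ncolr\n')

# Basic syntax: the extended operator ? must be written \? (a GNU extension in BRE).
bre=$(printf '%s\n' "$sample" | grep 'colou\?r')

# Extended syntax: ? is a metacharacter without the backslash.
ere=$(printf '%s\n' "$sample" | grep -E 'colou?r')

# Both forms select the same lines: color and colour.
printf '%s\n' "$bre"
[ "$bre" = "$ere" ] && echo "same result"
```

When a pattern uses several of these operators, grep -E usually reads more easily than a basic pattern full of backslashes.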

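\w  any letter, digit, or underscore (a word character), [a-zA-Z0-9_]; 'l[a-zA-Z0-9_]*ve' is equivalent to 'l\w*ve'

\W  any character other than a letter, digit, or underscore (a non-word character), [^a-zA-Z0-9_]; 'love[^a-zA-Z0-9_]+' is equivalent to 'love\W+'

\b  word boundary; '\<love\>' is equivalent to '\blove\b'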
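A short sketch of \w, \W and \b on invented sample lines. These shorthand escapes are GNU extensions, so the sketch assumes GNU grep.

```shell
# Invented sample lines.
sample=$(printf 'love\nlove!\nglove\nl0_ve\n')

# \w covers letters, digits and underscore, so 'l\w*ve' also matches l0_ve.
w_hits=$(printf '%s\n' "$sample" | grep -c 'l\w*ve')

# 'love\W' needs a non-word character right after love: only 'love!' qualifies.
nonword=$(printf '%s\n' "$sample" | grep 'love\W')

# '\blove\b' requires a word boundary on both sides, so 'glove' is rejected.
bounded=$(printf '%s\n' "$sample" | grep '\blove\b')

echo "lines matching l\\w*ve: $w_hits"
printf '%s\n' "$nonword" "$bounded"
```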

grep examples

grep -E or egrep

# egrep 'NW' datafile   print the lines of datafile that contain NW

# egrep 'NW' d*         search every file whose name starts with d for lines that contain NW

# egrep '^n' datafile   print the lines of datafile that start with n

# egrep '4$' datafile   print the lines of datafile that end with 4

# egrep TB Savage datafile   search the files Savage and datafile for TB
# egrep 'TB Savage' datafile search datafile for the string 'TB Savage'

# egrep '5\..' datafile     lines containing 5. followed by any single character

# egrep '\.5' datafile      lines containing .5

# egrep '^[we]' datafile    lines that start with w or e

# egrep '[^0-9]' datafile   lines containing at least one non-digit character

# egrep '[A-Z][A-Z] [A-Z]' datafile  lines containing two uppercase letters, then a space, then one uppercase letter

# egrep 'ss*' datafile      lines containing at least one s; equivalent to 's+' here

# egrep '[a-z]{9}' datafile lines containing nine consecutive lowercase letters

# egrep '\<north' datafile  lines containing a word that begins with north

# egrep '\<north\>' datafile    lines containing the word north

# egrep '\<[a-r].*n\>' datafile lines containing a word that starts with a lowercase letter from a to r, followed by any characters of any length, up to an n at the end of a word

# egrep '^n\w*\W' datafile      lines that start with n, followed by zero or more word characters (letters, digits, underscore), then one non-word character

# egrep '\bnorth\b' datafile    lines containing the word north; equivalent to '\<north\>'

# egrep 'NW|EA' datafile    lines containing NW or EA

# egrep '3+' datafile       lines containing one or more 3s

# egrep '2\.?[0-9]' datafile    lines containing 2, then an optional ., then one digit

# egrep '(no)+' datafile    lines containing one or more consecutive occurrences of no

# egrep 'S(h|u)' datafile   lines containing Sh or Su

# egrep 'Sh|u' datafile     lines containing Sh or u
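The last two examples differ only in the parentheses, and the effect is easy to miss. The sketch below uses an invented four-line stand-in for datafile to show how grouping changes which lines match.

```shell
# Invented stand-in for the datafile used in the examples.
data=$(mktemp)
printf 'Sharon Gray\nSuan Chin\nPatricia Hemenway\nsouthern region\n' > "$data"

# 'S(h|u)': S followed by h or u.
grouped=$(grep -E -c 'S(h|u)' "$data")

# 'Sh|u': either the string Sh, or any u anywhere in the line.
ungrouped=$(grep -E -c 'Sh|u' "$data")

echo "grouped: $grouped, ungrouped: $ungrouped"
rm -f "$data"
```

Without the parentheses the alternation applies to the whole pattern on each side of |, which is why the line 'southern region' matches the second pattern only.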

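grep options

-E, --extended-regexp: interpret PATTERN as an extended regular expression

-i, --ignore-case: ignore case distinctions

-l, --files-with-matches: print only the names of the files that contain a match

-n, --line-number: prefix each output line with its line number within the file

-c, --count: print the number of matching lines

-s, --no-messages: suppress error messages about nonexistent or unreadable files

-q, --quiet, --silent: quiet mode; suppress all normal output

-v, --invert-match: invert the match and print only the lines that do not match

-r, --recursive: search directories recursively, i.e. grep -r PATTERN directoryname

--color: highlight the matched text in color

-o, --only-matching: print only the part of each line that matched the pattern

-A NUM, --after-context=NUM: print each matching line plus the NUM lines after it (for example -A5)

-B NUM, --before-context=NUM: print each matching line plus the NUM lines before it (for example -B5)

-C NUM, --context=NUM: print each matching line plus NUM lines before and after it (for example -C5)

-w, --word-regexp: match whole words only, never part of a word. For example, if the text contains liker and you are searching for like, -w keeps liker from matching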
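Several of these options can be tried on a small file. The sketch below uses an invented four-line sample; the variable names are only for illustration.

```shell
# Invented sample text.
f=$(mktemp)
printf 'I like tea\nShe is a liker\nLIKE it or not\nnothing here\n' > "$f"

count_i=$(grep -i -c 'like' "$f")     # -i -c: case-insensitive count of matching lines
whole=$(grep -w 'like' "$f")          # -w: whole word only, so 'liker' is skipped
numbered=$(grep -n 'liker' "$f")      # -n: prefix the line with its line number
inverted=$(grep -v -i 'like' "$f")    # -v: lines that do NOT match
only=$(grep -o 'lik[a-z]*' "$f")      # -o: print just the matched text
after=$(grep -A1 'liker' "$f")        # -A1: the matching line plus one line after it

printf '%s\n' "$count_i" "$whole" "$numbered" "$inverted" "$only" "$after"
rm -f "$f"
```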

Examples

# Print the lines containing root in the three files /etc/passwd, /etc/shadow and /etc/hosts
[root@hadoop04 shell_sed]# egrep 'root' /etc/passwd /etc/shadow /etc/hosts
/etc/passwd:root:x:0:0:root:/root:/bin/bash
/etc/passwd:operator:x:11:0:operator:/root:/sbin/nologin
/etc/shadow:root:$6$4KlcFsuc9xAZczx3$DlUn31VBk05.T4tjSUxhkLjhlKijm7T6RvOa7Az8Ni7kHyDW7fevTtHFFdA1BGFl5UehKdqEHDWnL22yadnLa.::0:99999:7:::

# List the names of the files that contain a line matching root
[root@hadoop04 shell_sed]# egrep -l 'root' /etc/passwd /etc/shadow /etc/hosts
/etc/passwd
/etc/shadow


# Show the number of lines matching root in each file
[root@hadoop04 shell_sed]# egrep -c 'root' /etc/passwd /etc/shadow /etc/hosts
/etc/passwd:2
/etc/shadow:1
/etc/hosts:0


# Prefix each matching line with its line number in the file
[root@hadoop04 shell_sed]# egrep -n 'root' /etc/passwd /etc/shadow /etc/hosts
/etc/passwd:1:root:x:0:0:root:/root:/bin/bash
/etc/passwd:10:operator:x:11:0:operator:/root:/sbin/nologin
/etc/shadow:1:root:$6$4KlcFsuc9xAZczx3$DlUn31VBk05.T4tjSUxhkLjhlKijm7T6RvOa7Az8Ni7kHyDW7fevTtHFFdA1BGFl5UehKdqEHDWnL22yadnLa.::0:99999:7:::


# Here -w is the pattern; escape it with \ so that grep does not treat it as an option
[root@hadoop04 shell_sed]# grep --help | grep '\-w'
  -w, --word-regexp         force PATTERN to match only whole words
  -H, --with-filename       print the file name for each match
  -L, --files-without-match print only names of FILEs containing no match
  -l, --files-with-matches  print only names of FILEs containing matches


# Show each line matching -R plus the 5 lines after it
[root@hadoop04 shell_sed]# grep --help | grep -A5 '\-R'
  -R, --dereference-recursive
                            likewise, but follow all symlinks
      --include=FILE_PATTERN
                            search only files that match FILE_PATTERN
      --exclude=FILE_PATTERN
                            skip files and directories matching FILE_PATTERN
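Besides escaping the leading dash as in the examples above, a pattern that begins with - can also be passed with -e or after --. The sketch below compares the three forms on invented help-like text. Newer GNU grep releases may print a warning about the stray backslash in '\-w', so the other two forms may be preferable there.

```shell
# Invented help-like text.
text=$(printf '  -w, --word-regexp\n  -v, --invert-match\n')

# Escaping the dash so the pattern is not read as an option.
escaped=$(printf '%s\n' "$text" | grep '\-w')

# -e marks the next argument as the pattern.
with_e=$(printf '%s\n' "$text" | grep -e '-w')

# -- ends option parsing, so '-w' is read as the pattern.
with_dashes=$(printf '%s\n' "$text" | grep -- '-w')

printf '%s\n' "$escaped" "$with_e" "$with_dashes"
```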
