Common Linux Commands: grep

Introduction

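grep (short for Global Regular Expression Print) is a powerful text search tool. It searches text line by line against a given regular expression, whether the text comes from piped output, files, or whole directories, and prints or counts the matching lines. It is one of the most frequently used Linux commands.

Running grep --help prints a summary of the syntax, but grep has so many options that seeing them all at once can be overwhelming if you are not yet familiar with the tool. This article takes a practical approach: it covers the common uses of grep through examples and reproduces the full option list at the end. For every option used, the full English long name is given in parentheses.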

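1. Filtering (matching) output

Looking for a running process is probably one of the most common uses of grep. ps -ef lists every process running on the server; take finding the sshd processes as an example: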

$ ps -ef | grep sshd

The output looks like this:

root       1290      1      0   19:07   ?           00:00:00    /usr/sbin/sshd
root       1607   1290      0   23:09   ?           00:00:00    sshd: root@pts/0 
root       1640   1611      0   23:20   pts/0       00:00:00    grep sshd

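Note that the grep sshd process itself also shows up in the output. Most of the time this does not get in the way when reading the result by eye, but sometimes we want the grep process filtered out. The -v (invert-match) option prints only the lines that do not match: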

$ ps -ef | grep sshd | grep -v grep
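
A common alternative, which is not part of the original article, is to wrap one character of the pattern in brackets. The regular expression [s]shd still matches the text sshd, but grep's own command line now contains the literal text [s]shd, which the pattern does not match, so the second grep is no longer needed. The sketch below runs both approaches against canned lines standing in for ps -ef output (the two sample variables are made up for illustration), so the result does not depend on what is running on your machine.

```shell
#!/bin/sh
# What `ps -ef` might show while `grep sshd` is running (sample data).
ps_plain='root 1290 1 0 19:07 ? 00:00:00 /usr/sbin/sshd
root 1640 1611 0 23:20 pts/0 00:00:00 grep sshd'

# What `ps -ef` might show while `grep [s]shd` is running (sample data).
ps_bracket='root 1290 1 0 19:07 ? 00:00:00 /usr/sbin/sshd
root 1640 1611 0 23:20 pts/0 00:00:00 grep [s]shd'

# Approach from the article: match sshd, then drop lines containing "grep".
with_v=$(printf '%s\n' "$ps_plain" | grep sshd | grep -v grep)

# Bracket trick: [s]shd matches "sshd" but not the literal text "[s]shd".
with_bracket=$(printf '%s\n' "$ps_bracket" | grep '[s]shd')

printf '%s\n' "$with_v"
printf '%s\n' "$with_bracket"
```

Both commands print only the /usr/sbin/sshd line. On a live system the equivalent would be ps -ef | grep '[s]shd'.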

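2. Searching in a file

The basic form for searching a file is grep string file_name, which looks for string in file_name and prints the matching lines. Suppose test.txt contains the following: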

abc
def
Abc

To find the string abc in the file, simply run:

$ grep abc test.txt
-------------------
abc

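Now suppose test.txt is very large and we want to know which line each match is on, so that we can jump straight to it when editing the file. The -n (line-number) option prints the line number: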

$ grep -n abc test.txt
--------------------
1:abc

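The commands above did not match the Abc line. That is the expected result, since grep is case-sensitive by default, but in some situations we need to ignore case. The -i (ignore-case) option does that: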

$ grep -i abc test.txt
--------------------
abc
Abc
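
Short options can be combined. As a minimal sketch, the script below recreates the three-line test.txt from above in a temporary directory (the directory is an assumption made so the example is self-contained) and uses -i and -n together.

```shell
#!/bin/sh
# Recreate the sample file in a throwaway directory.
workdir=$(mktemp -d)
printf 'abc\ndef\nAbc\n' > "$workdir/test.txt"

# -i ignores case, -n prefixes each matching line with its line number.
numbered=$(grep -in abc "$workdir/test.txt")
printf '%s\n' "$numbered"
# Prints:
# 1:abc
# 3:Abc

rm -rf "$workdir"
```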

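3. Matching with regular expressions

Consider the following scenario: from the access log below, print every visitor IP (the visitor IP is the text before the first |).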

#access.log
----------------

11.0.21.12|-|-|[10/Aug/2018:17:47:42 +0800]
11.0.23.13|-|-|[10/Aug/2018:17:47:42 +0800]
140.143.145.44|111.111.111.111|-|[10/Aug/2018:17:47:42 +0800]

Use the -E (extended-regexp) option to match with an extended regular expression:

$ grep -E '^([0-9]{1,3}\.){3}[0-9]{1,3}' access.log
-------------------------------------
11.0.21.12|-|-|[10/Aug/2018:17:47:42 +0800]
11.0.23.13|-|-|[10/Aug/2018:17:47:42 +0800]
140.143.145.44|111.111.111.111|-|[10/Aug/2018:17:47:42 +0800]

But this prints every matching line in full. Use the -o (only-matching) option to print only the matched part of each line:

$ grep -Eo '^([0-9]{1,3}\.){3}[0-9]{1,3}' access.log
----------------------------------
11.0.21.12
11.0.23.13
140.143.145.44

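Because ^ anchors the pattern to the start of the line, the server IP (the second field on the last line) is left out of the output.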
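
To see what the ^ anchor contributes, here is a small sketch that rebuilds the three log lines in a temporary file (the temporary path is an assumption for illustration) and runs the pattern with and without the anchor. The literal dot is written as \. here. Without ^, -o prints every dotted quad it finds on a line, so the server IP shows up as well.

```shell
#!/bin/sh
# Recreate the sample access log in a temporary file.
log=$(mktemp)
cat > "$log" <<'EOF'
11.0.21.12|-|-|[10/Aug/2018:17:47:42 +0800]
11.0.23.13|-|-|[10/Aug/2018:17:47:42 +0800]
140.143.145.44|111.111.111.111|-|[10/Aug/2018:17:47:42 +0800]
EOF

# Loose dotted-quad pattern: it does not check that each number is <= 255.
ip='([0-9]{1,3}\.){3}[0-9]{1,3}'

# Anchored: only an address at the very start of a line is printed.
anchored=$(grep -Eo "^$ip" "$log")

# Unanchored: every address on the line is printed, one per output line.
unanchored=$(grep -Eo "$ip" "$log")

printf 'anchored:\n%s\n' "$anchored"
printf 'unanchored:\n%s\n' "$unanchored"

rm -f "$log"
```

The anchored run prints three addresses; the unanchored run prints four, ending with 111.111.111.111.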

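4. Searching directories

So far grep may not look all that useful, since finding a string in a single small file is not hard anyway. It becomes far more convenient when you need to search a project directory that contains many files.

Consider the following project directory: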

$ ls
apps  etl  etl_dba.py  nohup.out  simple.celery.log  simple.celery.pid  utils

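We want to see every setting related to rabbitmq. The -r (recursive) option searches all files under a directory; for a binary file that matches, grep prints only a "Binary file ... matches" notice instead of the matching line.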

$ grep -r amqp ./

./etl/celeryconfig.py:# BROKER_URL = 'amqp://unicorn_etl:[email protected]:5672/unicorn_etl_hp'
./etl/celeryconfig.py:BROKER_URL = 'amqp://unicorn_etl:[email protected]:5672/unicorn_etl_hp'
./simple.celery.log: Connected to amqp://unicorn_etl:**@10.14.50.17:5672/unicorn_etl_hp
./simple.celery.log: Connected to amqp://unicorn_etl:**@10.14.50.17:5672/unicorn_etl_hp

Use -h (no-filename) to suppress the file name prefix:

$ grep -rh amqp ./

# BROKER_URL = 'amqp://unicorn_etl:[email protected]:5672/unicorn_etl_hp'
BROKER_URL = 'amqp://unicorn_etl:[email protected]:5672/unicorn_etl_hp'
Connected to amqp://unicorn_etl:**@10.14.50.17:5672/unicorn_etl_hp
Connected to amqp://unicorn_etl:**@10.14.50.17:5672/unicorn_etl_hp

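Now suppose we want to change the configuration. If the output showed both the file and the line number, locating the right place would be much easier. Adding the -n option does exactly that: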

$ grep -rn amqp ./

Binary file ./etl/celeryconfig.pyc matches
./etl/celeryconfig.py:14:# BROKER_URL = 'amqp://unicorn_etl:[email protected]:5672/unicorn_etl_hp'
./etl/celeryconfig.py:15:BROKER_URL = 'amqp://unicorn_etl:[email protected]:5672/unicorn_etl_hp'
./simple.celery.log:1: Connected to amqp://unicorn_etl:**@10.14.50.17:5672/unicorn_etl_hp
./simple.celery.log:12: Connected to amqp://unicorn_etl:**@10.14.50.17:5672/unicorn_etl_hp

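As the output shows, some of the results come from the simple.celery.log log file. Use --exclude to skip files, quoting the pattern so that the shell does not expand it before grep sees it:

$ grep -rh --exclude='*.log' amqp ./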

Binary file ./etl/celeryconfig.pyc matches
# BROKER_URL = 'amqp://unicorn_etl:[email protected]:5672/unicorn_etl_hp'
BROKER_URL = 'amqp://unicorn_etl:[email protected]:5672/unicorn_etl_hp'

Use --color to highlight the matched text:

$ grep -rh --color --exclude='*.log' amqp ./

Binary file ./etl/celeryconfig.pyc matches
# BROKER_URL = 'amqp://unicorn_etl:[email protected]:5672/unicorn_etl_hp'
BROKER_URL = 'amqp://unicorn_etl:[email protected]:5672/unicorn_etl_hp'
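
The following sketch builds a tiny throwaway project tree and shows --exclude next to its counterpart --include, which restricts the search to files matching a pattern. All file names and the amqp URL in it are made up for illustration and are not the project from the article. The glob patterns are quoted so that grep, not the shell, interprets them.

```shell
#!/bin/sh
# Build a small sample tree in a temporary directory.
proj=$(mktemp -d)
mkdir -p "$proj/etl"
printf '# settings\nBROKER_URL = "amqp://user:**@broker.example:5672/vhost"\n' \
    > "$proj/etl/celeryconfig.py"
printf 'Connected to amqp://user:**@broker.example:5672/vhost\n' \
    > "$proj/worker.log"

cd "$proj" || exit 1

# Search everything except log files.
no_logs=$(grep -rn --exclude='*.log' amqp .)

# Search only Python sources.
only_py=$(grep -rn --include='*.py' amqp .)

printf '%s\n' "$no_logs"
printf '%s\n' "$only_py"

# Clean up the sample tree.
cd / && rm -rf "$proj"
```

In this sample tree both commands report the same single match, on line 2 of etl/celeryconfig.py.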

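Summary

The usages above cover most everyday needs, but grep offers far more than this. Run grep --help to see every option. With the basics above in hand, combining the help output with your own requirements to use grep flexibly should not be difficult.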

$ grep --help

Usage: grep [OPTION]... PATTERN [FILE]...
Search for PATTERN in each FILE or standard input.
PATTERN is, by default, a basic regular expression (BRE).
Example: grep -i 'hello world' menu.h main.c

Regexp selection and interpretation:
  -E, --extended-regexp     PATTERN is an extended regular expression (ERE)
  -F, --fixed-strings       PATTERN is a set of newline-separated fixed strings
  -G, --basic-regexp        PATTERN is a basic regular expression (BRE)
  -P, --perl-regexp         PATTERN is a Perl regular expression
  -e, --regexp=PATTERN      use PATTERN for matching
  -f, --file=FILE           obtain PATTERN from FILE
  -i, --ignore-case         ignore case distinctions
  -w, --word-regexp         force PATTERN to match only whole words
  -x, --line-regexp         force PATTERN to match only whole lines
  -z, --null-data           a data line ends in 0 byte, not newline

Miscellaneous:
  -s, --no-messages         suppress error messages
  -v, --invert-match        select non-matching lines
  -V, --version             display version information and exit
      --help                display this help text and exit

Output control:
  -m, --max-count=NUM       stop after NUM matches
  -b, --byte-offset         print the byte offset with output lines
  -n, --line-number         print line number with output lines
      --line-buffered       flush output on every line
  -H, --with-filename       print the file name for each match
  -h, --no-filename         suppress the file name prefix on output
      --label=LABEL         use LABEL as the standard input file name prefix
  -o, --only-matching       show only the part of a line matching PATTERN
  -q, --quiet, --silent     suppress all normal output
      --binary-files=TYPE   assume that binary files are TYPE;
                            TYPE is 'binary', 'text', or 'without-match'
  -a, --text                equivalent to --binary-files=text
  -I                        equivalent to --binary-files=without-match
  -d, --directories=ACTION  how to handle directories;
                            ACTION is 'read', 'recurse', or 'skip'
  -D, --devices=ACTION      how to handle devices, FIFOs and sockets;
                            ACTION is 'read' or 'skip'
  -r, --recursive           like --directories=recurse
  -R, --dereference-recursive
                            likewise, but follow all symlinks
      --include=FILE_PATTERN
                            search only files that match FILE_PATTERN
      --exclude=FILE_PATTERN
                            skip files and directories matching FILE_PATTERN
      --exclude-from=FILE   skip files matching any file pattern from FILE
      --exclude-dir=PATTERN directories that match PATTERN will be skipped.
  -L, --files-without-match print only names of FILEs containing no match
  -l, --files-with-matches  print only names of FILEs containing matches
  -c, --count               print only a count of matching lines per FILE
  -T, --initial-tab         make tabs line up (if needed)
  -Z, --null                print 0 byte after FILE name

Context control:
  -B, --before-context=NUM  print NUM lines of leading context
  -A, --after-context=NUM   print NUM lines of trailing context
  -C, --context=NUM         print NUM lines of output context
  -NUM                      same as --context=NUM
      --group-separator=SEP use SEP as a group separator
      --no-group-separator  use empty string as a group separator
      --color[=WHEN],
      --colour[=WHEN]       use markers to highlight the matching strings;
                            WHEN is 'always', 'never', or 'auto'
  -U, --binary              do not strip CR characters at EOL (MSDOS/Windows)
  -u, --unix-byte-offsets   report offsets as if CRs were not there
                            (MSDOS/Windows)
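
Three of the options listed above that the article does not walk through are -c (count), -w (word-regexp) and -A (after-context). As a hedged sketch, the script below tries them on a four-line log whose contents are invented for illustration.

```shell
#!/bin/sh
# Invented sample log for demonstration purposes.
log=$(mktemp)
cat > "$log" <<'EOF'
error: disk full
warning: disk almost full
info: errors cleared
error: network down
EOF

# -c prints the number of matching lines instead of the lines themselves.
count_all=$(grep -c error "$log")

# -w matches whole words only, so the line with "errors" no longer counts.
count_word=$(grep -cw error "$log")

# -A 1 also prints one line of trailing context after each match.
context=$(grep -A 1 'disk full' "$log")

printf '%s %s\n' "$count_all" "$count_word"
printf '%s\n' "$context"

rm -f "$log"
```

The first printf prints 3 2, and the context output is the "error: disk full" line followed by the "warning: disk almost full" line.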


Reposted from www.cnblogs.com/nicky-160330/p/9456789.html