Text-processing Three Musketeers: grep and regular expressions

1、grep

1. What are grep, egrep and fgrep?

  1. grep is a powerful text-search tool on Linux systems. It searches text with regular expressions and prints the matching lines (the matched text can be highlighted in red, e.g. with --color=auto). The name stands for Global Regular Expression Print, and any user can run it.

  2. grep works like this: it searches one or more files for a pattern (the template). If the pattern contains spaces, it must be quoted; every argument after the pattern is treated as a file name. The results are sent to standard output, and the original files are not modified.

  3. grep can be used in shell scripts because it reports the result of a search through its exit status: 0 if a match was found, 1 if nothing matched, and 2 if an error occurred (for example, the file does not exist). Scripts can use this return value to automate text processing.

  4. egrep = grep -E: extended regular expressions (apart from \<, \> and \b, metacharacters are written without the leading \)

  5. fgrep = grep -F: no regular expression support; the pattern is matched as a plain string
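
To see the difference between the three flavors, here is a minimal sketch run against a throwaway sample file (the file and its contents are made up for illustration). It assumes GNU grep, since \? in a basic regular expression is a GNU extension:

```shell
# Throwaway sample file in a temporary directory (contents are illustrative)
tmp=$(mktemp -d)
printf 'color\ncolour\na.c\nabc\n' > "$tmp/words.txt"

# BRE: a bare ? is literal; \? (GNU extension) means "0 or 1 times"
grep 'colou\?r' "$tmp/words.txt"     # prints color and colour
# ERE (grep -E / egrep): the same pattern without the backslash
grep -E 'colou?r' "$tmp/words.txt"   # prints color and colour
# Regular expression: . matches any character, so a.c and abc both match
grep 'a.c' "$tmp/words.txt"          # prints a.c and abc
# Fixed string (grep -F / fgrep): the dot is just a dot
grep -F 'a.c' "$tmp/words.txt"       # prints a.c
```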

   Role: a text-search tool that checks the target text line by line against a user-specified "pattern" and prints the matching lines
   Pattern: the filter condition, written with regular-expression metacharacters and literal text characters
   Usage: grep [OPTIONS] PATTERN [FILE...]
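
Item 3 above describes grep's exit status. A minimal sketch of a script branching on it; the function name check_pattern and the sample file are hypothetical, and GNU grep is assumed:

```shell
# Hypothetical helper: report the outcome of a quiet grep as a word
check_pattern() {
    # -q prints nothing; && ... || ... captures the status without
    # aborting a script that runs under set -e
    grep -q -- "$1" "$2" 2>/dev/null && status=0 || status=$?
    case $status in
        0) echo "match" ;;
        1) echo "no match" ;;
        *) echo "error" ;;   # 2: e.g. the file does not exist
    esac
}

tmp=$(mktemp -d)
printf 'root:x:0:0:root:/root:/bin/bash\n' > "$tmp/passwd.sample"
check_pattern root "$tmp/passwd.sample"     # match
check_pattern nobody "$tmp/passwd.sample"   # no match
check_pattern root "$tmp/missing"           # error
```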

Options:

 --color=auto: highlight the matched text in color
 -v: show the lines that do NOT match the pattern
 -i: ignore case
 -n: show the line numbers of matching lines
 -c: count the matching lines
 -o: show only the matched string
 -q: quiet mode; output nothing
 -A #: after, also show # lines after each match
 -B #: before, also show # lines before each match
 -C #: context, also show # lines before and after each match
 -e: give several patterns combined with a logical OR, e.g. grep -e 'cat' -e 'dog' file
 -w: match whole words only
 -E: use extended regular expressions (ERE)
 -F: equivalent to fgrep; no regular expression support
 -f file: read the patterns from file, one per line
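
A quick way to try several of these options side by side, assuming GNU grep and a small made-up sample file:

```shell
tmp=$(mktemp -d)
printf 'Root login\nroot\nrooted tree\nuser bin\n' > "$tmp/sample.txt"

grep -c root "$tmp/sample.txt"          # 2: -c counts matching lines (case-sensitive)
grep -ic root "$tmp/sample.txt"         # 3: -i also matches "Root"
grep -w root "$tmp/sample.txt"          # root: -w skips "rooted"
grep -vn root "$tmp/sample.txt"         # 1:Root login and 4:user bin
grep -o 'roo[a-z]*' "$tmp/sample.txt"   # root and rooted: -o prints only the match
grep -q root "$tmp/sample.txt" && echo "found"   # -q itself prints nothing
```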


2. grep in practice

Example 1:

Sort the disk usage percentages (Use%) in descending order:

[root@ansibledata]#df | grep  /dev/sd |tr -s " " % |cut -d% -f5 |sort -nr
17
5
1
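
Because df output differs from machine to machine, the sketch below feeds the same pipeline a hypothetical df listing (the devices and numbers are invented). It shows why tr -s is needed: once each run of spaces is squeezed into a single %, the Use% value lands in field 5.

```shell
# Hypothetical df output, so the pipeline can be tried anywhere
sample_df() {
cat <<'EOF'
Filesystem     1K-blocks    Used Available Use% Mounted on
/dev/sda2       52403200 8902656  43500544  17% /
devtmpfs          918792       0    918792   0% /dev
/dev/sda3       20961280  989772  19971508   5% /data
/dev/sda1        1038336  135100    903236  14% /boot
EOF
}

# tr turns spaces into % and squeezes repeats, so "17% /" becomes "17%/"
sample_df | grep /dev/sd | tr -s " " % | cut -d% -f5 | sort -nr   # 17, 14, 5
```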

Example 2:

[root@ansibledata]#cat /etc/passwd | grep -nA1 root   # also show 1 line after each line containing root
1:root:x:0:0:root:/root:/bin/bash
2-bin:x:1:1:bin:/bin:/sbin/nologin
--
10:operator:x:11:0:operator:/root:/sbin/nologin
11-games:x:12:100:games:/usr/games:/sbin/nologin
[root@ansibledata]#cat /etc/passwd | grep -nB1 root   # also show 1 line before each line containing root
1:root:x:0:0:root:/root:/bin/bash
--
9-mail:x:8:12:mail:/var/spool/mail:/sbin/nologin
10:operator:x:11:0:operator:/root:/sbin/nologin
[root@ansibledata]#cat /etc/passwd | grep -nC1 root   # also show 1 line before and after each line containing root
1:root:x:0:0:root:/root:/bin/bash
2-bin:x:1:1:bin:/bin:/sbin/nologin
--
9-mail:x:8:12:mail:/var/spool/mail:/sbin/nologin
10:operator:x:11:0:operator:/root:/sbin/nologin
11-games:x:12:100:games:/usr/games:/sbin/nologin
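
The same context options on a predictable input (the numbers 1 to 10, generated with coreutils seq) make the output format easy to read: ":" after the line number marks a matching line, "-" marks a context line, and "--" separates groups that are not adjacent. GNU grep is assumed:

```shell
tmp=$(mktemp -d)
seq 1 10 > "$tmp/nums.txt"

grep -n -A1 '^5$' "$tmp/nums.txt"               # 5:5 then 6-6
grep -n -B1 '^5$' "$tmp/nums.txt"               # 4-4 then 5:5
grep -n -C1 -e '^2$' -e '^8$' "$tmp/nums.txt"   # 1-1 2:2 3-3 -- 7-7 8:8 9-9
```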

Example 3:

[root@ansibledata]#grep -e root -e wang /etc/passwd   # lines containing root or wang
root:x:0:0:root:/root:/bin/bash
operator:x:11:0:operator:/root:/sbin/nologin
wang:x:1001:1001::/home/wang:/bin/bash

Example 4:

[root@ansibledata]#grep root /etc/passwd | grep bash   # lines containing both root and bash
root:x:0:0:root:/root:/bin/bash
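
Examples 3 and 4 contrast OR (several -e patterns) with AND (a pipeline of greps). A self-contained sketch on a made-up, shortened passwd-style file:

```shell
tmp=$(mktemp -d)
printf 'root:/bin/bash\noperator:/root:/sbin/nologin\nwang:/bin/bash\nbin:/sbin/nologin\n' > "$tmp/users.txt"

# OR: a line is printed if it matches any of the -e patterns
grep -e root -e wang "$tmp/users.txt"    # root..., operator..., wang...
# AND: each grep in the pipeline narrows the previous result
grep root "$tmp/users.txt" | grep bash   # root:/bin/bash
```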

Example 5:

[root@ansibledata]#grep -v root /etc/passwd   # show the lines that do not contain root
bin:x:1:1:bin:/bin:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin
adm:x:3:4:adm:/var/adm:/sbin/nologin
lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin
sync:x:5:0:sync:/sbin:/bin/sync
shutdown:x:6:0:shutdown:/sbin:/sbin/shutdown
halt:x:7:0:halt:/sbin:/sbin/halt

Example 6:

Print the lines that f1 and f2 have in common (each line of f1 is used as a pattern against f2):

[root@ansibledata]#grep -f f1 f2
aa
ccc
[root@ansibledata]#cat f1
aa
bbb
ccc
ee
[root@ansibledata]#cat f2
aa
ss
ccc
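
grep -f treats every line of f1 as a regular expression and prints any line of f2 that contains a match, so it is a substring test rather than a strict intersection. A minimal sketch, with f2 extended by a hypothetical extra line aaa to show the difference; adding -F (literal strings) and -x (whole-line match) gives an exact line-by-line intersection. Note that an empty line in the pattern file would match every line.

```shell
tmp=$(mktemp -d)
printf 'aa\nbbb\nccc\nee\n' > "$tmp/f1"
printf 'aa\nss\nccc\naaa\n' > "$tmp/f2"   # aaa is an added, hypothetical line

grep -f "$tmp/f1" "$tmp/f2"     # aa, ccc, aaa (aaa contains the pattern aa)
grep -Fxf "$tmp/f1" "$tmp/f2"   # aa, ccc (whole lines compared literally)
```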

2、Regular expressions

1. REGEXP (regular expression): a pattern written with a special class of characters together with literal text characters. Some of these characters (metacharacters) do not stand for themselves but provide control or wildcard functions.

2. Programs that support them: grep, sed, awk, vim, less, nginx, varnish, etc.

3. They come in two flavors:

   basic regular expressions: BRE
   extended regular expressions: ERE (grep -E, egrep)

4. Regular expression engine: the software module that checks and processes regular expressions; different engines use different algorithms. A widely used one is PCRE (Perl Compatible Regular Expressions).

Metacharacter categories: character matching, repetition (match counts), position anchoring, grouping

See man 7 regex for the full reference.

Basic regular expression metacharacters 

. matches any single character
[] matches any single character in the specified set, e.g. [wang] [0-9] [a-z] [a-zA-Z]
[^] matches any single character not in the specified set
[:alnum:] letters and digits
[:alpha:] any English letter, upper or lower case, i.e. A-Z, a-z
[:lower:] lowercase letters; [:upper:] uppercase letters
[:blank:] blank characters (space and tab)
[:space:] horizontal and vertical whitespace (a wider set than [:blank:])
[:cntrl:] control characters (backspace, bell, ...), non-printable
[:digit:] decimal digits; [:xdigit:] hexadecimal digits
[:graph:] printable characters excluding space
[:print:] printable characters
[:punct:] punctuation
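
The POSIX classes are only valid inside a bracket expression, so they are written with double brackets, as in [[:digit:]] (GNU grep rejects a bare [:digit:] with an error). A small sketch with made-up lines, counting matches per class:

```shell
tmp=$(mktemp -d)
printf 'abc\n123\nA-1\n \t \n' > "$tmp/classes.txt"   # last line: space, tab, space

grep -c '[[:digit:]]' "$tmp/classes.txt"      # 2: 123 and A-1
grep -c '[[:upper:]]' "$tmp/classes.txt"      # 1: A-1
grep -c '[[:punct:]]' "$tmp/classes.txt"      # 1: the - in A-1
grep -c '^[[:blank:]]*$' "$tmp/classes.txt"   # 1: the line of spaces and a tab
```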

Hands-on exercises:

(1) . matches any single character
[root@centos7-1 ~]#grep r..t /etc/passwd   # r, any two characters, then t
root:x:0:0:root:/root:/bin/bash
operator:x:11:0:operator:/root:/sbin/nologin
ftp:x:14:50:FTP User:/var/ftp:/sbin/nologin
[root@centos7-1 ~]#grep "home." /etc/passwd   # "home" followed by any one character
liu:x:1000:1000:liu:/home/liu:/bin/bash
wang:x:1001:1001::/home/wang:/bin/bash

   

 (2) [] matches any single character in the specified set, e.g. [wang] [0-9] [a-z] [a-zA-Z]

[root@centos7-1~]#grep "[wang]" /etc/passwd   # lines containing w, a, n or g

  

(3) [^] matches any single character not in the specified set

  grep "[^0-9]" /etc/fstab   # lines of the config file that contain at least one non-digit character

 

(4) ifconfig ens33 | grep netmask | grep "[[:digit:].]"   # match digits and dots

Note: [.] is a dot inside brackets and means a literal dot; a dot outside [] matches any single character.
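
A sketch of the difference, using two made-up lines (GNU grep assumed for \+):

```shell
tmp=$(mktemp -d)
printf 'version 1.2\nbuild 1x2\n' > "$tmp/dots.txt"

grep -c '1.2' "$tmp/dots.txt"     # 2: outside brackets . matches any character, x included
grep -c '1[.]2' "$tmp/dots.txt"   # 1: inside brackets . is a literal dot
grep -c '1\.2' "$tmp/dots.txt"    # 1: escaping with \ has the same effect
echo 'netmask 255.255.255.0' | grep -o '[[:digit:].]\+'   # 255.255.255.0
```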

Repetition: placed after a character to specify how many times the preceding character may occur

* matches the preceding character any number of times, including zero
greedy mode: matches as much as possible
.* any character, any length
\? matches the preceding character 0 or 1 times
\+ matches the preceding character at least once
\{n\} matches the preceding character exactly n times
\{m,n\} matches the preceding character at least m and at most n times
\{,n\} matches the preceding character at most n times
\{n,\} matches the preceding character at least n times
(1) * matches the preceding character any number of times, including zero
      greedy mode: matches as much as possible

  

(2) .* matches any character of any length

   

(3) .* any character of any length
grep "go.*gle" f1   # "go", anything, then "gle": on f1, every line with at least one o
grep "g.*gle" f1   # "g", anything (possibly nothing), then "gle": on f1, this also matches ggle

  

(4) \? matches the preceding character 0 or 1 times

(5) \+ matches the preceding character at least once
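
The exercises above can be reproduced with the f1 sample that appears later in this section (google, gooogle, gooooogle, ggle, gogle). The sketch counts the matching lines for each repetition operator; \?, \+ and \{,n\} are GNU extensions to basic regular expressions:

```shell
tmp=$(mktemp -d)
printf 'google\ngooogle\ngooooogle\nggle\ngogle\n' > "$tmp/f1"

grep -c 'go*gle' "$tmp/f1"        # 5: * also allows zero o's (ggle)
grep -c 'go\?gle' "$tmp/f1"       # 2: ggle, gogle
grep -c 'go\+gle' "$tmp/f1"       # 4: every line except ggle
grep -c 'go\{2\}gle' "$tmp/f1"    # 1: google
grep -c 'go\{2,\}gle' "$tmp/f1"   # 3: google, gooogle, gooooogle
grep -c 'go\{,2\}gle' "$tmp/f1"   # 3: google, ggle, gogle
```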

  

 

(6) Extract the IP address:

ifconfig ens33 | grep -o "[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}" | head -n1
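
Since ifconfig output differs per machine, this sketch runs the same grep against a hypothetical two-line ens33 listing. grep -o prints every match on its own line (address, netmask, broadcast), and head -n1 keeps the first one, the interface address:

```shell
# Hypothetical ifconfig output for illustration
sample_ifconfig() {
cat <<'EOF'
ens33: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.34.102  netmask 255.255.255.0  broadcast 192.168.34.255
EOF
}

sample_ifconfig | grep -o "[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}" | head -n1
# 192.168.34.102
```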

 

Position anchoring: fixes where the pattern must appear

^ anchors the start of the line; used at the far left of the pattern
$ anchors the end of the line; used at the far right of the pattern
^PATTERN$ the pattern must match the entire line
^$ empty line
^[[:space:]]*$ blank line (empty or whitespace only)
\< or \b anchors the start of a word; used on the left of the word pattern
\> or \b anchors the end of a word; used on the right of the word pattern
\<PATTERN\> matches the whole word

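(1) ^ anchors the start of the line; used at the far left of the pattern

(2) $ anchors the end of the line; used at the far right of the pattern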

[root@centos7-1data]#grep "bash$" /etc/passwd   # lines ending in bash
root:x:0:0:root:/root:/bin/bash
liu:x:1000:1000:liu:/home/liu:/bin/bash
wang:x:1001:1001::/home/wang:/bin/bash
[root@centos7-1data]#grep "^root" /etc/passwd   # lines starting with root
root:x:0:0:root:/root:/bin/bash
[root@centos7-1data]#cat f1
google
gooogle
gooooogle
ggle
gogle
[root@centos7-1data]#grep "^google$" f1   # only lines that are exactly google
google
[root@centos7-1data]#grep -v "^$" f1   # show non-empty lines
google
gooogle
gooooogle
ggle
gogle

grep  -v  "^#"  /etc/fstab   # show lines that do not start with #
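
Combining the anchors gives a common idiom for reading a configuration file without its comments and blank lines. A sketch with a made-up fstab-like file; the last command is an equivalent ERE one-liner:

```shell
tmp=$(mktemp -d)
printf '# comment\nUUID=abc / xfs defaults 0 0\n\n   \n' > "$tmp/fstab.sample"

grep -c '^$' "$tmp/fstab.sample"               # 1: only the truly empty line
grep -c '^[[:space:]]*$' "$tmp/fstab.sample"   # 2: empty or whitespace-only lines
grep -v '^#' "$tmp/fstab.sample" | grep -v '^[[:space:]]*$'   # UUID=abc / xfs defaults 0 0
grep -Ev '^[[:space:]]*(#|$)' "$tmp/fstab.sample"             # same result
```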

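(3) \< or \b anchors the start of a word; used on the left of the word pattern

(4) \> or \b anchors the end of a word; used on the right of the word pattern

(5) \<PATTERN\> matches the whole word

What separates words?

Answer: any character that is not a digit, a letter or an underscore.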

[root@centos7-1data]#grep "root\>" /etc/passwd   # words ending in root
root:x:0:0:root:/root:/bin/bash
operator:x:11:0:operator:/root:/sbin/nologin
[root@centos7-1data]#grep "\<root" /etc/passwd   # words starting with root
root:x:0:0:root:/root:/bin/bash
operator:x:11:0:operator:/root:/sbin/nologin
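
A sketch with made-up words showing that the underscore counts as part of a word while - does not; grep -w behaves like \<PATTERN\> (GNU grep assumed):

```shell
tmp=$(mktemp -d)
printf 'root\nchroot\nrooted\nmy_root\nroot-dir\n' > "$tmp/words.txt"

grep '\<root' "$tmp/words.txt"     # root, rooted, root-dir
grep 'root\>' "$tmp/words.txt"     # root, chroot, my_root, root-dir
grep '\<root\>' "$tmp/words.txt"   # root, root-dir
grep -w root "$tmp/words.txt"      # root, root-dir
```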

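 Grouping and back-references

(1) Syntax

① Grouping: \(\) binds one or more characters together so they are treated as a single unit

  The text matched by the pattern inside each pair of group parentheses is recorded by the regex engine in internal variables named \1, \2, \3, ...

② Back-references

A back-reference refers to the characters actually matched by the pattern in an earlier group, not to the pattern itself

\1 refers to the text matched by the pattern between the first left parenthesis (counting from the left) and its matching right parenthesis

\2 refers to the text matched between the second left parenthesis and its matching right parenthesis, and so on

& (in sed's replacement text; grep has no equivalent) stands for the entire string matched by the pattern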


 (2) Alternation (or): \|

[root@centos7-1data]#grep  "\(abc\)\{3\}" f2   # abc three times in a row
abcabcabc
[root@centos7-1data]#cat  > f2
xyz xyz 
abc xyz abc xyz 
abc xyz xyz abc
^C
[root@centos7-1data]#grep "\(abc\).*\(xyz\).*\1"  f2   # abc, then xyz, then abc again later in the line (\1 repeats what group 1 matched)
abc xyz abc xyz 
abc xyz xyz abc
[root@centos7-1data]#grep "\(abc\).*\(xyz\).*\2"  f2   # abc, then xyz, then xyz again later in the line (\2 repeats what group 2 matched)
abc xyz abc xyz 
abc xyz xyz abc
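
Back-references also work with a single captured character. A sketch with made-up input; the last line shows the ERE spelling, which GNU grep -E accepts:

```shell
tmp=$(mktemp -d)
printf 'abab\nabba\nxyzzy\nhello\n' > "$tmp/pairs.txt"

# \(.\) captures one character; \1 requires that same character again
grep '\(.\)\1' "$tmp/pairs.txt"    # abba, xyzzy, hello
# \(..\) captures two characters that must repeat immediately
grep '\(..\)\1' "$tmp/pairs.txt"   # abab
# ERE form of the first pattern
grep -E '(.)\1' "$tmp/pairs.txt"   # abba, xyzzy, hello
```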

 grep "^\(a\|b\)"  /etc/passwd   # lines starting with a or b

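ifconfig  ens33 | grep -o "\([0-9]\{1,3\}\.\)\{3\}[0-9]\{1,3\}"   # extract the IP address: the group \([0-9]\{1,3\}\.\) matches one octet plus its dot, \{3\} repeats it three times, and the final [0-9]\{1,3\} matches the last octet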

 3、Extended regular expressions:

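(1) Character matching:

  •  .  any single character
  •  []  any character in the specified set
  •  [^] any character not in the specified set

  Repetition:

  •  * : the preceding character any number of times
  •  ?  : 0 or 1 times
  •  + : 1 or more times
  •  {m} : exactly m times
  •  {m,n} : at least m, at most n times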

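(2) Position anchoring:

  •  ^ : start of line
  •  $ : end of line
  •  \<, \b : start of word
  •  \>, \b : end of word

  Grouping and alternation:

  •  Grouping: ()
  •  Back-references: \1, \2, ...
  •  Alternation (or):

    a|b  a or b

    C|cat  C or cat

    (C|c)at  Cat or cat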

(3) Summary

  Apart from \<, \b (start of word) and \>, \b (end of word), every other metacharacter drops its backslash in ERE.
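
To check the summary, the IP pattern from the basic-regex section can be written both ways. This sketch (the sample line is hypothetical) confirms that the BRE and ERE spellings extract the same addresses:

```shell
line='inet 192.168.34.102  netmask 255.255.255.0'

# BRE: grouping parentheses and interval braces need backslashes
bre=$(echo "$line" | grep -o '\([0-9]\{1,3\}\.\)\{3\}[0-9]\{1,3\}')
# ERE: the same pattern with those backslashes dropped
ere=$(echo "$line" | grep -oE '([0-9]{1,3}\.){3}[0-9]{1,3}')

echo "$bre"   # 192.168.34.102 and 255.255.255.0
[ "$bre" = "$ere" ] && echo "BRE and ERE results are identical"
```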

In practice:

(1) Show the base name

[root@centos7-1data]#echo  "/etc/rc.d/init.d/function" | grep  -oE "[^/]+$"
function

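(2) Show the directory name: two greps are used to deal with the trailing / (the first drops it, the second keeps everything up to the last /)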

[root@centos7-1data]#echo  "/etc/rc.d/init.d/function/" | grep  -oE ".*[^/]" | grep -Eo ".*/"
/etc/rc.d/init.d/
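
The two pipelines can be wrapped into small functions so they work with or without a trailing slash. The names base_name and dir_name are hypothetical, the sketch assumes GNU grep -o, and it does not handle the root path / (both functions print nothing for it):

```shell
# Hypothetical helper: last path component, ignoring one trailing /
base_name() {
    echo "$1" | grep -oE '[^/]+/?$' | grep -oE '[^/]+'
}
# Hypothetical helper: strip trailing slashes, then keep everything up to the last /
dir_name() {
    echo "$1" | grep -oE '.*[^/]' | grep -oE '.*/'
}

base_name /etc/rc.d/init.d/function/   # function
dir_name /etc/rc.d/init.d/function/    # /etc/rc.d/init.d/
dir_name /etc/rc.d/init.d/function     # /etc/rc.d/init.d/
```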

(3) Extract all IP addresses (each octet limited to 0-255)

[root@centos7-1data]#ifconfig | grep -Eo "(([1-9]?[0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])\.){3}([1-9]?[0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])"
192.168.34.102
255.255.255.0
192.168.34.255
192.168.34.117
255.255.255.0
192.168.34.255
127.0.0.1
255.0.0.0
192.168.122.1
255.255.255.0
192.168.122.255
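
The pattern bounds each octet to 0-255, but without anchoring it can still pull a valid-looking substring out of a longer string (from 256.1.1.1 it could extract 56.1.1.1). To validate a whole string, a sketch can add -x so the entire line must match; the function name is_ipv4 is hypothetical and GNU grep is assumed:

```shell
# One octet: 0-99 (no leading zero), 100-199, 200-249 or 250-255
octet='([1-9]?[0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])'

# Hypothetical helper: succeeds only if the whole argument is a dotted IPv4 address
is_ipv4() {
    # -x: the entire line must match; -q: only the exit status is used
    echo "$1" | grep -qxE "($octet\.){3}$octet"
}

for ip in 192.168.34.102 255.255.255.0 256.1.1.1 10.0.0 01.2.3.4; do
    if is_ipv4 "$ip"; then echo "$ip valid"; else echo "$ip invalid"; fi
done
```

With the octet alternatives above, leading zeros such as 01 are rejected as well.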

 

 

  

 

 

Source: www.cnblogs.com/struggle-1216/p/11822001.html