Shell programming: regular expressions and grep

Definition

A regular expression is a single string that describes, and matches, a set of strings that follow a given syntactic rule. Put simply, it is a method for matching strings: with a few special symbols, you can quickly find, delete, or replace specific strings.
A regular expression is a text pattern made up of ordinary characters and metacharacters. Ordinary characters include uppercase and lowercase letters, digits, punctuation marks, and some other symbols. Metacharacters are characters with a special meaning in regular expressions; they specify how the preceding character (the one located just before the metacharacter) may occur in the target text.

Use

Regular expressions are very important for system administrators. A running system generates a large amount of information; some of it is important, and some is merely informational. An administrator who reads all of this data directly cannot quickly locate the important parts, such as "user account login failed" or "service startup failed". Regular expressions make it possible to extract the "problematic" information quickly, which makes operations and maintenance work simpler and more convenient.

Classification

Regular expression syntax is divided into basic regular expressions and extended regular expressions, which differ in strictness and functionality. Basic regular expressions are the most fundamental part of commonly used regular expressions. Among the common file processing tools on Linux systems, grep and sed support basic regular expressions, while egrep and awk support extended regular expressions.
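
As a rough illustration of the difference between the two flavors, the sketch below runs the same interval pattern in both syntaxes. The sample words and the variable names bre and ere are made up for this example, and grep -E is used as the equivalent of egrep.

```shell
# Hypothetical sample input: one word per line.
words=$(printf '%s\n' 'god' 'good' 'goood')

# Basic regular expressions (plain grep): interval braces must be escaped.
bre=$(printf '%s\n' "$words" | grep 'go\{2,3\}d')

# Extended regular expressions (grep -E, same as egrep): braces are written bare.
ere=$(printf '%s\n' "$words" | grep -E 'go{2,3}d')

echo "$bre"   # good and goood, one per line
echo "$ere"   # the same two lines
```

Both commands select the same lines; only the way the braces are written changes.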

Levels

  • Basic regular expression
  • Extended regular expression

Text processing tools

  • grep
  • egrep
  • sed
  • awk

Composition

  1. Ordinary characters
    Uppercase and lowercase letters, digits, punctuation marks and some other symbols
  2. Metacharacters
    Special characters with special meaning in regular expressions

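Metacharacters

 1. \: escape character (lets a metacharacter with special meaning be used as an ordinary character)
For example: \. (a literal .), \$ (a literal $)
 2. ^: matches the position at the start of the string (starts with ...)
For example: ^a, ^the, ^#
 3. $: matches the position at the end of the string (ends with ...)
For example: word$
 4. .: matches any single character except \n
For example: go.d, go..d (go, then any two characters, then d)
 5. *: matches the preceding sub-expression zero or more times
For example: goo*d (god, good, goood, ...), go.*d (god, goad, gord, ...)
 6. [list]: matches one character from the list
For example: go[ola]d, [abc], [a-z], [a-z0-9]
 7. [^list]: matches any one character that is not in the list (the listed characters are excluded)
For example: [^a-z], [^0-9], [^A-Z0-9]
 8. \{n,m\}: matches the preceding sub-expression n to m times; there are three forms: \{n\}, \{n,\}, and \{n,m\}
For example: go\{2\}d (o appears exactly 2 times), go\{2,3\}d (o appears at least 2 and at most 3 times), go\{2,\}d (o appears 2 or more times)
 9. o\{1,\} is equivalent to o+ in extended syntax, and o\{0,\} is equivalent to o*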
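
To make the list above concrete, here is a minimal sketch that counts how many lines of a small sample each pattern selects. The sample lines and the helper function count are hypothetical and are not taken from the original test.txt.

```shell
# Hypothetical sample lines, made up for illustration.
sample=$(printf '%s\n' 'the god is good' '#comment' 'a word' 'go1d' 'gold.')

# count PATTERN: print how many sample lines match PATTERN (basic regex).
count() {
    printf '%s\n' "$sample" | grep -c -- "$1"
}

count '^#'          # 1: lines starting with #
count 'word$'       # 1: lines ending with "word"
count 'go.d'        # 3: go, any one character, d (good, go1d, gold)
count 'goo*d'       # 1: g, one or more o, d (god)
count 'go[0-9]d'    # 1: go, one digit, d
count 'go[^0-9]d'   # 2: go, one non-digit, d (good, gold)
count '\.$'         # 1: lines ending with a literal dot
```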

grep

Common options

  • -n: show line numbers
  • -i: ignore case
  • -v: invert the match (show lines that do not match)
  • []: not an option but pattern syntax; matches any one character from the set in brackets
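
The following sketch shows how these options change the output on a tiny throwaway file. The file contents and the variable names are assumptions made for this example.

```shell
# Hypothetical sample file; the contents are made up for illustration.
demo=$(mktemp)
printf '%s\n' 'The quick fox' 'over the dog' 'no match here' > "$demo"

with_n=$(grep -n 'the' "$demo")    # prefix each matching line with its line number
with_i=$(grep -in 'the' "$demo")   # additionally ignore case, so "The" matches too
with_v=$(grep -vn 'the' "$demo")   # show only the lines that do NOT match

echo "$with_n"   # 2:over the dog
echo "$with_i"   # 1:The quick fox and 2:over the dog
echo "$with_v"   # 1:The quick fox and 3:no match here
rm -f "$demo"
```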

Common format

  • ^: Matches the start of the input string. Inside a bracket expression, as the first character, it instead means that the listed characters are excluded. To match the "^" character itself, use "\^"
  • $: Matches the end of the input string. If the Multiline property of a RegExp (regular expression) object is set, "$" also matches just before '\n' or '\r' (line breaks). To match the "$" character itself, use "\$"
  • .: Matches any single character except "\n"
  • \: Backslash, also known as the escape character; removes the special meaning of the metacharacter or wildcard immediately following it
  • *: Matches the preceding sub-expression zero or more times. To match the "*" character itself, use "\*"
  • []: Character set. Matches any one of the characters it contains. For example, "[abc]" can match the "a" in "plain"
  • [n1-n2]: Character range. Matches any character in the specified range. For example, "[a-z]" can match any lowercase letter from "a" to "z"

Note: The hyphen (-) indicates a range only when it is inside a bracket expression and sits between two characters; if it appears at the beginning of the bracket expression, it stands for the hyphen itself.
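
A small sketch of the hyphen rule, using a made-up input of one character per line (the variable names are hypothetical):

```shell
# Hypothetical input: one character per line.
chars=$(printf '%s\n' a b c -)

# Between two characters, the hyphen forms a range: a, b and c match.
range=$(printf '%s\n' "$chars" | grep -c '[a-c]')

# At the start of the bracket expression, the hyphen is literal: - and a match.
literal=$(printf '%s\n' "$chars" | grep -c '[-a]')

echo "$range"     # 3
echo "$literal"   # 2
```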

  • {n}: n is a non-negative integer; matches exactly n times. For example, "o{2}" cannot match the "o" in "Bob", but it can match the "oo" in "food". In basic regular expressions (plain grep), the braces are written "\{" and "\}"
  • {n,}: n is a non-negative integer; matches at least n times. For example, "o{2,}" cannot match the "o" in "Bob", but it can match all the o's in "fooood"
  • "o{1,}" is equivalent to "o+" (at least 1 occurrence). "o{0,}" is equivalent to "o*" (at least 0 occurrences)
  • {n,m}: m and n are both non-negative integers with n<=m; matches at least n times and at most m times

Find lines that contain "the"

[root@server2 ~]# grep -n 'the' test.txt 

Find lines that do not contain "the"

[root@server2 ~]# grep -vn 'the' test.txt 

Find strings that start with sh, end with rt, and have i or o in the middle

[root@server2 ~]# grep -n 'sh[oi]rt' test.txt 

Find o repeated 2 times in a row, and o repeated 2 or more times in a row

[root@server2 ~]# grep -n 'o\{2\}' test.txt
[root@server2 ~]# grep -n 'o\{2,\}' test.txt
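
Note that without anchors these two patterns select the same lines, because any run of two or more o's also contains a run of exactly two. The sketch below, on made-up sample words, shows this, and uses -o to print only the matched text, which is where the two patterns differ. The -o option is not part of POSIX, but GNU and BSD grep both provide it.

```shell
# Hypothetical sample lines.
lines=$(printf '%s\n' 'Bob' 'food' 'fooood')

# Line selection is identical for both patterns.
exactly=$(printf '%s\n' "$lines" | grep -n 'o\{2\}')
atleast=$(printf '%s\n' "$lines" | grep -n 'o\{2,\}')

# The matched text differs: two separate "oo" matches versus one "oooo".
parts_exactly=$(printf 'fooood\n' | grep -o 'o\{2\}')
parts_atleast=$(printf 'fooood\n' | grep -o 'o\{2,\}')

echo "$exactly"         # 2:food and 3:fooood
echo "$parts_exactly"   # oo and oo, one per line
echo "$parts_atleast"   # oooo
```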

Find lines that contain a digit 0-9, and lines that contain at least one character that is not a digit

[root@server2 ~]# grep -n '[0-9]' test.txt
[root@server2 ~]# grep -n '[^0-9]' test.txt
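
The pattern '[^0-9]' selects any line that has at least one non-digit character, which is not the same as a line with no digits at all. A minimal sketch of the difference, on made-up sample lines with hypothetical variable names:

```shell
# Hypothetical sample lines.
sample=$(printf '%s\n' 'abc' 'abc123' '123')

# At least one non-digit character somewhere on the line.
has_nondigit=$(printf '%s\n' "$sample" | grep '[^0-9]')

# No digit anywhere on the line: invert the match for '[0-9]' instead.
no_digit=$(printf '%s\n' "$sample" | grep -v '[0-9]')

# Digits only: anchor the pattern at both ends.
only_digits=$(printf '%s\n' "$sample" | grep '^[0-9][0-9]*$')

echo "$has_nondigit"   # abc and abc123
echo "$no_digit"       # abc
echo "$only_digits"    # 123
```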

Find lines that start with "the", and lines that start with a lowercase letter a-z

[root@server2 ~]# grep -n '^the' test.txt
[root@server2 ~]# grep -n '^[a-z]' test.txt

Find lines that end with a period (.)

[root@server2 ~]# grep -n '\.$' test.txt

Find blank lines

[root@server2 ~]# grep -n '^$' test.txt
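
The pattern '^$' only matches lines that are completely empty. As a sketch, assuming a throwaway file whose contents are made up here, a POSIX character class can be used to also catch lines that contain only spaces or tabs:

```shell
# Hypothetical sample file: line 2 is empty, line 3 holds only spaces.
demo=$(mktemp)
printf 'first\n\n   \nlast\n' > "$demo"

empty_count=$(grep -c '^$' "$demo")                 # truly empty lines
blank_count=$(grep -c '^[[:space:]]*$' "$demo")     # empty or whitespace-only lines
nonblank=$(grep -v '^[[:space:]]*$' "$demo")        # everything else

echo "$empty_count"   # 1
echo "$blank_count"   # 2
echo "$nonblank"      # first and last
rm -f "$demo"
```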

Find strings that start with w, end with d, and have any two characters in between

[root@server2 ~]# grep -n 'w..d' test.txt

Find at least two consecutive o's

[root@server2 ~]# grep -n 'ooo*' test.txt

Find one or more consecutive digits 0-9, and find strings that start with w and end with d with any characters, repeated 0 or more times, in between

[root@server2 ~]# grep -n '[0-9][0-9]*' test.txt
[root@server2 ~]# grep -n 'w.*d' test.txt
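
To see exactly which part of a line each pattern matches, the sketch below uses -o (available in GNU and BSD grep, though not in POSIX) on a made-up sample line. It also shows that .* is greedy and extends to the last d on the line.

```shell
# Hypothetical sample line.
line='a wild world 2024'

two_between=$(printf '%s\n' "$line" | grep -o 'w..d')      # exactly two characters between w and d
any_between=$(printf '%s\n' "$line" | grep -o 'w.*d')      # as many characters as possible
digits=$(printf '%s\n' "$line" | grep -o '[0-9][0-9]*')    # one or more digits

echo "$two_between"   # wild
echo "$any_between"   # wild world
echo "$digits"        # 2024
```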

Origin blog.csdn.net/weixin_49343462/article/details/109636024