Overview

A regular expression (regex) is a pattern for manipulating strings. It is built from predefined special characters and combinations of them, which together form a "rule string" that expresses filtering logic to apply to other strings.
Regular expressions are commonly used in conditional tests to check whether a string matches a given format.
Regular expressions are composed of ordinary characters and metacharacters:
- Ordinary characters: uppercase and lowercase letters, digits, punctuation marks, and some other symbols
- Metacharacters: characters with a special meaning in a regular expression; they specify how their leading character may occur in the target text
- Leading character: the character before the metacharacter
The four metacharacters . [] ^ $ are supported by regular expression implementations in all languages, so they form the basic regular expression set. Regular expressions seem hard to understand mainly because of equivalent shorthand forms (for example, \d in Perl-style engines is equivalent to [0-9]), which confuse many beginners. Once you expand those shorthands back to their original form, writing a regex yourself is straightforward: you describe the pattern piece by piece, much as you would in words.
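
As a minimal sketch of the "conditional test" use described above, the POSIX shell function below relies on grep -q, which prints nothing and reports a match only through its exit status. The function name is_three_digits and the "exactly three digits" rule are made up for illustration.

```shell
#!/bin/sh
# Hypothetical rule for illustration: the string must be exactly three digits.
is_three_digits() {
    # grep -q prints nothing; its exit status (0 = matched) drives the if
    printf '%s\n' "$1" | grep -q '^[0-9][0-9][0-9]$'
}

for s in 123 12a 1234; do
    if is_three_digits "$s"; then
        echo "$s: matches"
    else
        echo "$s: does not match"
    fi
done
```

Only 123 matches. The ^ and $ anchors are what make 1234 fail; without them grep would accept it because its first three characters are digits.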

Common metacharacters in basic regular expressions

Supported tools: grep, egrep, sed, awk

Metacharacter	Description
\	Escape character: removes the special meaning of a special character, for example \!, \$; it also forms sequences such as \n
^	Matches the start of the line, for example: ^a, ^the, ^#, ^[a-z]
$	Matches the end of the line, for example: word$, ^$ (matches an empty line)
.	Matches any single character except the newline \n, for example: ea.y, e...y
*	Matches the preceding sub-expression 0 or more times, for example: goo*d, go.*d
[list]	Matches any one character in the list, for example: ea[sla]d, [abc], [a-z], [a-z0-9], [0-9] (matches any digit)
[^list]	Matches any one character not in the list, for example: [^0-9], [^A-Z0-9], [^a-z] (matches any character that is not a lowercase letter)
\{n\}	Matches the preceding sub-expression exactly n times, for example: go\{2\}d, '[0-9]\{2\}' (matches two digits)
\{n,\}	Matches the preceding sub-expression at least n times, for example: go\{2,\}d, '[0-9]\{2,\}' (matches two or more digits)
\{n,m\}	Matches the preceding sub-expression n to m times, for example: go\{2,3\}d, '[0-9]\{2,3\}' (matches two to three digits)
Note	In basic syntax a bare { } is an ordinary character, so most tools need a backslash before each brace to give it its interval meaning. egrep and awk use extended syntax and write {n}, {n,}, and {n,m} without the backslash.
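
The sketch below tries several of the basic metacharacters from the table with plain grep, which uses basic syntax by default. The sample lines and the helper name bre are invented for this illustration.

```shell
#!/bin/sh
# Made-up sample text, one item per line (note the empty line)
SAMPLE='gd
god
good
goood
easy
the end

1.5
125'

# Print the sample lines matching a basic regular expression
bre() {
    printf '%s\n' "$SAMPLE" | grep -- "$1"
}

bre '^the'            # ^ anchors the start: the end
printf '%s\n' "$SAMPLE" | grep -c '^$'   # ^$ is an empty line: 1
bre 'ea.y'            # . is any one character: easy
bre '^go*d$'          # * repeats o zero or more times: gd god good goood
bre '^[^a-z]'         # first character is not a lowercase letter: 1.5 125
bre '^go\{2\}d$'      # exactly two o: good
bre '^go\{2,\}d$'     # two or more o: good goood
bre '^1.5$'           # unescaped . matches any character: 1.5 125
bre '^1\.5$'          # \. matches only a literal dot: 1.5
```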

Extended regular expression metacharacters

Supported tools: egrep, awk

Metacharacter	Description
+	Matches the preceding sub-expression one or more times, for example: go+d (matches at least one o, such as god, good, goood)
?	Matches the preceding sub-expression 0 or 1 time, for example: go?d (matches gd or god)
()	Treats the enclosed string as a single unit that can be combined with +, ?, and *, for example: g(oo)+d (matches oo one or more times as a unit, such as good, gooood)
|	Matches either alternative, for example: g(oo|la)d (matches good or glad)
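
A similar sketch for the extended metacharacters uses grep -E, which accepts the same extended syntax as egrep (recent GNU grep releases warn that egrep is obsolescent). The sample words and the helper name ere are illustrative.

```shell
#!/bin/sh
# Made-up sample words, one per line
SAMPLE='gd
god
good
goood
glad
gold'

# Print the sample lines matching an extended regular expression
ere() {
    printf '%s\n' "$SAMPLE" | grep -E -- "$1"
}

ere '^go+d$'        # + : one or more o -> god good goood
ere '^go?d$'        # ? : zero or one o -> gd god
ere '^g(oo)+d$'     # (): "oo" repeated as a unit -> good (goood has an odd count)
ere '^g(oo|la)d$'   # | : either alternative -> good glad
```

Note that g(oo)+d rejects goood: the group repeats the two-character unit oo, so only an even number of o characters can match.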

Regular expression syntax support

Command or environment	.	[ ]	^	$	\( \)	\{ \}	?	+	|	( )
vi	Yes	Yes	Yes	Yes	Yes	-	-	-	-	-
Visual C++	Yes	Yes	Yes	Yes	Yes	-	-	-	-	-
awk	Yes	Yes	Yes	Yes	-	Yes (see note)	Yes	Yes	Yes	Yes
sed	Yes	Yes	Yes	Yes	Yes	Yes	-	-	-	-
delphi	Yes	Yes	Yes	Yes	Yes	-	Yes	Yes	Yes	Yes
python	Yes	Yes	Yes	Yes	Yes	Yes	Yes	Yes	Yes	Yes
java	Yes	Yes	Yes	Yes	Yes	Yes	Yes	Yes	Yes	Yes
javascript	Yes	Yes	Yes	Yes	Yes	-	Yes	Yes	Yes	Yes
php	Yes	Yes	Yes	Yes	Yes	-	-	-	-	-
perl	Yes	Yes	Yes	Yes	Yes	-	Yes	Yes	Yes	Yes
c#	Yes	Yes	Yes	Yes	Yes	Yes	Yes	Yes	Yes	Yes
Note on awk: awk writes intervals as {n,m} without backslashes. Older versions of gawk only enable interval expressions when run with the --posix or --re-interval option (gawk 4.0 and later enable them by default); see "interval expressions" in man awk.
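
To make the table concrete, the sketch below writes the same "exactly two o between g and d" rule in basic syntax (grep, sed) and extended syntax (grep -E), and shows that a bare + is an ordinary character in basic syntax. awk is left out on purpose, as an assumption about portability: some awk implementations, such as older mawk releases, may not support interval expressions.

```shell
#!/bin/sh
# "exactly two o between g and d" written for each syntax
printf 'good\n' | grep '^go\{2\}d$'         # basic syntax: braces need \
printf 'good\n' | grep -E '^go{2}d$'        # extended syntax: plain braces
printf 'good\n' | sed -n '/^go\{2\}d$/p'    # sed addresses use basic syntax

# \( \) groups in basic syntax; \1 reuses the group in the replacement
printf 'good\n' | sed 's/\(o\{2\}\)/[\1]/'   # prints g[oo]d

# In basic syntax a bare + is an ordinary character
printf 'good\ngo+d\n' | grep 'go+d'         # prints go+d
printf 'good\ngo+d\n' | grep -E 'go+d'      # prints good
```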

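Matching mobile phone numbers

Match complete 11-digit mobile numbers that start with 13 or 15; anything that is not 11 digits must not be matched. The pattern also accepts the digits grouped 3-4-4 with single spaces, as in 157 1519 2901.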

[root@localhost ~]# egrep "^(13|15)[0-9][ ]?[0-9]{4}[ ]?[0-9]{4}$" 8.txt
13133366888
157 1519 2901
[root@localhost ~]# vim 8.txt
1333333333#
155.5533385
18888888888
123456789?1
13133366888
123456789
88888787
157 1519 2901
1311885578
1567890
151234567890
13141516171819
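
The check can be reproduced without 8.txt using the sketch below. It keeps the same pattern but calls grep -E, the equivalent of egrep; the function name match_phones is hypothetical.

```shell
#!/bin/sh
# Same pattern as the egrep command above.
# 13|15 (2 digits) + [0-9] (1) + two blocks of 4 digits = 11 digits,
# with an optional single space before each 4-digit block.
match_phones() {
    grep -E '^(13|15)[0-9][ ]?[0-9]{4}[ ]?[0-9]{4}$'
}

# A few lines copied from 8.txt above; only 13133366888 and
# 157 1519 2901 are printed
printf '%s\n' '13133366888' '157 1519 2901' '18888888888' \
    '1311885578' '151234567890' '155.5533385' | match_phones
```

Because the two optional spaces are independent, a mixed grouping such as 1571519 2901 is accepted as well; requiring both spaces or neither would need an alternation of the two forms.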

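Matching email addresses

Match addresses in the required format at @sohu.com, @qq.com, @163.com, @wo.cn, and @sina.com.cn.
Address format: the username starts with a letter, may contain the two symbols - and . in the middle, must not end with a symbol, and is at least 6 characters long.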

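How the pattern works:
- Username: ^[a-zA-Z] requires a letter first; [a-zA-Z0-9.-]{4,} allows letters, digits, . and - in the middle (at least 4 characters); the final [a-zA-Z0-9] forbids ending with a symbol. 1 + 4 + 1 gives the minimum length of 6. Inside a bracket expression . and - are already literal (put - last so it is not read as a range), so no backslash is needed; in POSIX tools a backslash inside [ ] would itself be added to the set.
- Subdomain: ([a-zA-Z0-9_.-]+) allows uppercase and lowercase letters, digits, _, - and ., one or more times.
- Top-level domain: \. matches the literal dot (outside brackets the backslash is required), [A-Za-z]{2,5} requires 2 to 5 letters, and $ anchors the end of the line.
Note that the domain part only checks the general shape of a domain; it does not limit matches to the five providers listed above.

[root@localhost ~]# egrep "^[a-zA-Z][a-zA-Z0-9.-]{4,}[a-zA-Z0-9]@([a-zA-Z0-9_.-]+)\.([A-Za-z]{2,5})$" email.txt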
qwrqwrg@sohu.com
qfgqwg.gqt-gewg@qq.com
WQ.QR1131@sina.com.cn
wer123@sina.com
[root@localhost ~]# vim email.txt
qwrqwrg@sohu.com
qfgqwg.gqt-gewg@qq.com
qe88@163.com
QFQW SFG@wo.cn
WQ.QR1131@sina.com.cn
qwrqwr@sina.123
123wer$sina.com
wer123@sina.com
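
A self-contained version of the email check, runnable without email.txt, is sketched below. It keeps the pattern from the command above except that . and - inside the bracket expressions are written without backslashes, since inside [ ] they are already literal in POSIX syntax. The function name match_emails is illustrative.

```shell
#!/bin/sh
# Username: a letter, then 4+ of [a-zA-Z0-9.-], then a letter or digit (6+ chars)
# Domain: one or more of [a-zA-Z0-9_.-], a literal dot, then 2-5 letters
match_emails() {
    grep -E '^[a-zA-Z][a-zA-Z0-9.-]{4,}[a-zA-Z0-9]@([a-zA-Z0-9_.-]+)\.([A-Za-z]{2,5})$'
}

# The contents of email.txt from the session above
printf '%s\n' 'qwrqwrg@sohu.com' 'qfgqwg.gqt-gewg@qq.com' 'qe88@163.com' \
    'QFQW SFG@wo.cn' 'WQ.QR1131@sina.com.cn' 'qwrqwr@sina.123' \
    '123wer$sina.com' 'wer123@sina.com' | match_emails
```

This prints the same four addresses as the egrep session. Because only the general shape of the domain is checked, an address such as abcdefg@example.org is also accepted, while a username ending in a symbol, such as abcdef-@qq.com, is rejected.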

Detailed analysis of regular expressions in Shell
