Detailed analysis of regular expressions in Shell

Overview

  • A regular expression (often shortened to regex) is a logical formula for working with strings. It combines predefined special characters into a "rule string", and that rule string expresses a filtering logic to apply to text.
  • Regular expressions are usually used in judgment statements to check whether a string conforms to a given format.
  • Regular expressions are composed of ordinary characters and metacharacters
    • Ordinary characters: uppercase and lowercase letters, digits, punctuation marks, and some other symbols
    • Metacharacters: characters with a special meaning in regular expressions, which specify how the leading character may appear in the target string
    • Leading character: the character immediately before the metacharacter
  • The four metacharacters . [] ^ $ are supported by the regular expressions of every language, so they are the core of basic regular expressions. Regular expressions seem hard mainly because of equivalent notations (shorthand forms that stand for a longer expression), which confuse many beginners. If you expand each shorthand back into its original form, writing a regular expression yourself becomes very simple, much like talking.
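
As a small illustration of the points above, the sketch below uses a regular expression in a judgment statement and then tries the four basic metacharacters with grep. The helper name is_number and the sample words are made up for this example and do not come from the original text.

```shell
#!/bin/sh
# is_number is a hypothetical helper: it succeeds when $1 consists only of digits.
is_number() {
    printf '%s\n' "$1" | grep -q '^[0-9][0-9]*$'
}

# A regular expression used in a judgment statement.
if is_number "2024"; then
    echo "2024 is a number"
fi

# Sample input, one word per line (invented for this sketch).
words=$(printf '%s\n' easy eazy entry bay)

# ^ anchors the start, . matches any one character,
# [] lists the allowed characters, and $ anchors the end.
basic_hits=$(printf '%s\n' "$words" | grep '^ea.y$')
class_hits=$(printf '%s\n' "$words" | grep '^[be]a')

echo "$basic_hits"
echo "$class_hits"
```

The first pattern keeps only the four-letter words of the form ea?y, while the second keeps every word whose first letter is b or e and whose second letter is a.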

Common metacharacters in basic regular expressions

  • Supported tools: grep, egrep, sed, awk

Metacharacter   Description
\               Escape character; cancels the special meaning of the symbol that follows. For example: \!, \n, \$
^               Matches the position at the start of the string. For example: ^a, ^the, ^#, ^[a-z]
$               Matches the position at the end of the string. For example: word$, ^$ (matches a blank line)
.               Matches any single character other than \n. For example: ea.y, e...y
*               Matches the preceding sub-expression 0 or more times. For example: goo*d, go.*d
[list]          Matches one character from the list. For example: ea[sla]d, [abc], [a-z], [a-z0-9], [0-9] (matches any digit)
[^list]         Matches any one character that is not in the list. For example: [^0-9], [^A-Z0-9], [^a-z] (matches any character that is not a lowercase letter)
\{n\}           Matches the preceding sub-expression exactly n times. For example: go\{2\}d, '[0-9]\{2\}' (matches two digits)
\{n,\}          Matches the preceding sub-expression at least n times. For example: go\{2,\}d, '[0-9]\{2,\}' (matches two or more digits)
\{n,m\}         Matches the preceding sub-expression n to m times. For example: go\{2,3\}d, '[0-9]\{2,3\}' (matches two to three digits)

Note: in the basic syntax used by grep and sed, the braces must be escaped as \{ and \}. egrep and awk take {n}, {n,}, and {n,m} directly, with no "\" before the braces.
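
The rows above can be tried out with a few lines of shell. This is a minimal sketch with invented sample words. It uses grep -E in place of egrep, on the assumption that the two accept the same extended syntax on the reader's system.

```shell
#!/bin/sh
# Sample words, invented for this sketch.
sample=$(printf '%s\n' gd god good goood gold)

# * : zero or more of the preceding character (here the second "o").
star_hits=$(printf '%s\n' "$sample" | grep 'goo*d')

# \{2\} : exactly two of the preceding character, basic (grep) syntax.
bre_hits=$(printf '%s\n' "$sample" | grep 'go\{2\}d')

# {2,} : two or more, extended (grep -E) syntax, no backslashes needed.
ere_hits=$(printf '%s\n' "$sample" | grep -E 'go{2,}d')

# ^$ : count the blank lines in a three-line input.
blank_count=$(printf 'a\n\nb\n' | grep -c '^$')

printf '%s\n' "$star_hits" "$bre_hits" "$ere_hits" "$blank_count"
```

Note that gold never matches: none of the patterns allows an l between the o and the d.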

Extended regular expression metacharacters

  • Supported tools: egrep, awk
Metacharacter   Description
+               Matches the preceding sub-expression one or more times. For example: go+d (matches at least one o, such as god, good, goood)
?               Matches the preceding sub-expression 0 or 1 time. For example: go?d (matches gd or god)
()              Treats the string inside the parentheses as a single unit; can be combined with +, ?, and *. For example: g(oo)+d (matches oo as a unit one or more times, such as good, gooood)
|               Matches either of the alternatives. For example: g(oo|la)d (matches good or glad)
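
The four extended metacharacters can be checked the same way. The sketch below is only an illustration: the sample words are invented, grep -E stands in for egrep, and the ^ and $ anchors are added so that each pattern has to match the whole line.

```shell
#!/bin/sh
# Sample words, invented for this sketch.
sample=$(printf '%s\n' gd god good gooood glad gold)

# + : one or more "o".
plus_hits=$(printf '%s\n' "$sample" | grep -E '^go+d$')

# ? : zero or one "o".
opt_hits=$(printf '%s\n' "$sample" | grep -E '^go?d$')

# () with + : "oo" repeated as a unit, so only an even number of "o" matches.
group_hits=$(printf '%s\n' "$sample" | grep -E '^g(oo)+d$')

# | inside () : either "oo" or "la" between g and d.
alt_hits=$(printf '%s\n' "$sample" | grep -E '^g(oo|la)d$')

printf '%s\n' "$plus_hits" "$opt_hits" "$group_hits" "$alt_hits"
```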

Regular expression syntax support

Command or environment . [ ] ^ $ \(\) \{\} ? + | ()
vi supported supported supported supported supported
Visual C++ supported supported supported supported supported
awk supported supported supported supported (awk supports this syntax once the --posix or --re-interval option is added on the command line; see the interval expression section of man awk) supported supported supported supported
sed supported supported supported supported supported supported
delphi supported supported supported supported supported supported supported supported supported
python supported supported supported supported supported supported supported supported supported supported
java supported supported supported supported supported supported supported supported supported supported
javascript supported supported supported supported supported supported supported supported supported
php supported supported supported supported supported
perl supported supported supported supported supported supported supported supported supported
c# supported supported supported supported supported supported supported supported supported supported
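
As a quick check of the backslashed forms, the sketch below runs \(\) and \{\} through sed, which uses the basic syntax by default. The sample words are invented for this illustration.

```shell
#!/bin/sh
# Sample words, invented for this sketch.
sample=$(printf '%s\n' god good gooood)

# sed -n '/re/p' prints only the lines that match the basic regular expression.
# \{2\} : exactly two "o" between g and d.
sed_interval=$(printf '%s\n' "$sample" | sed -n '/go\{2\}d/p')

# \(oo\)\{2\} : the group "oo" repeated exactly twice.
sed_group=$(printf '%s\n' "$sample" | sed -n '/g\(oo\)\{2\}d/p')

printf '%s\n' "$sed_interval" "$sed_group"
```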

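Matching mobile phone numbers

  • Match, in full, the mobile phone numbers that begin with 13 or 15 and contain 11 digits in total; anything that does not have exactly 11 digits must not be matched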
[root@localhost ~]# egrep "^(13|15)[0-9][ ]?[0-9]{4}[ ]?[0-9]{4}$" 8.txt
13133366888
157 1519 2901
[root@localhost ~]# vim 8.txt
1333333333#
155.5533385
18888888888
123456789?1
13133366888
123456789
88888787
157 1519 2901
1311885578
1567890
151234567890
13141516171819
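
The same pattern can also be used to test a single value inside a script. The following is a minimal sketch: the function name is_mobile is hypothetical, and grep -E is used on the assumption that it behaves like egrep here.

```shell
#!/bin/sh
# is_mobile is a hypothetical helper built on the pattern used above:
# 13 or 15, one more digit, then two groups of four digits,
# each optionally preceded by a single space.
is_mobile() {
    printf '%s\n' "$1" | grep -Eq '^(13|15)[0-9][ ]?[0-9]{4}[ ]?[0-9]{4}$'
}

# Try a few of the values from 8.txt.
for number in "13133366888" "157 1519 2901" "18888888888" "1311885578"; do
    if is_mobile "$number"; then
        echo "match:    $number"
    else
        echo "no match: $number"
    fi
done
```

Because the pattern is anchored with ^ and $, a value with 10 or 12 digits fails even if part of it looks like a valid number.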

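Matching email addresses

  • Match the addresses at @sohu.com, @qq.com, @163.com, @wo.cn, and @sina.com.cn that satisfy the format requirements
  • Email format: the username begins with a letter, may contain at most two kinds of symbol ( - or . ) in the middle, must not end with a symbol, and is at least 6 characters long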
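[root@localhost ~]# egrep "^[a-zA-Z][a-zA-Z0-9.-]{4,}[a-zA-Z0-9]@([a-zA-Z0-9_.-]+)\.([A-Za-z]{2,5})$" email.txt
[Username: it starts with a letter and ends with a letter or digit, and because the minimum length is 6, the middle part uses {4,} for at least 4 more characters. Inside [] the . is already literal, and - is literal when written last, so neither needs a backslash there.
 Subdomain: may contain uppercase A-Z, lowercase a-z, digits 0-9, and the symbols "_", "-", and ".", one or more times.
 Top-level domain: it starts with a literal ., written \. outside the brackets, followed by 2 to 5 letters (A-Z or a-z), with $ marking the end.]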

qwrqwrg@sohu.com
qfgqwg.gqt-gewg@qq.com
WQ.QR1131@sina.com.cn
wer123@sina.com
[root@localhost ~]# vim email.txt
qwrqwrg@sohu.com
qfgqwg.gqt-gewg@qq.com
qe88@163.com
QFQW SFG@wo.cn
WQ.QR1131@sina.com.cn
qwrqwr@sina.123
123wer$sina.com
wer123@sina.com
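
The command above accepts any domain, so wer123@sina.com is matched even though @sina.com is not one of the five listed domains. If only those five should pass, one possible variant is sketched below. It assumes the same sample addresses as email.txt, writes them to a temporary file, and uses grep -E in place of egrep. The bracket expressions are written without backslashes, since . is literal inside [] and - is literal when it comes last.

```shell
#!/bin/sh
# Same sample addresses as email.txt, written to a temporary file.
email_file=$(mktemp)
cat > "$email_file" <<'EOF'
qwrqwrg@sohu.com
qfgqwg.gqt-gewg@qq.com
qe88@163.com
QFQW SFG@wo.cn
WQ.QR1131@sina.com.cn
qwrqwr@sina.123
123wer$sina.com
wer123@sina.com
EOF

# Assumption: only these five domains are acceptable.
domains='sohu\.com|qq\.com|163\.com|wo\.cn|sina\.com\.cn'

# Username rule as before; the domain part is now a fixed list of alternatives.
strict_hits=$(grep -E "^[a-zA-Z][a-zA-Z0-9.-]{4,}[a-zA-Z0-9]@(${domains})\$" "$email_file")

rm -f "$email_file"
echo "$strict_hits"
```

With this variant, qe88@163.com is still rejected because the username is shorter than 6 characters, and wer123@sina.com is rejected because of its domain.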

Origin blog.csdn.net/TaKe___Easy/article/details/114753552