Regular expressions and wildcards: how * and ? differ between the two

1. Introduction

Recently, for work, I needed to write some automation scripts that use regular expressions to match specific strings, so I read up on regular expressions. One reference said: * matches the preceding subexpression zero or more times. That made me wonder: does * specify a repetition count, or does it match any character? As I remembered it, * could also match any character.

After reading further, I realized that I had been confusing regular expressions with wildcards. * means different things in the two.

The same goes for another character: ?
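To make the confusion concrete, here is a minimal sketch that tests the same pattern text once as a wildcard and once as a regular expression. The helper names glob_match and regex_match are made up for this illustration; the wildcard test relies on the shell's case statement, and the regex test on grep -E with -x (whole-line match).

```bash
#!/usr/bin/env bash
# Hypothetical helpers: test one string against a pattern, first as a
# wildcard (glob), then as an extended regular expression.

glob_match() {   # usage: glob_match STRING PATTERN
  # An unquoted expansion used as a case pattern is interpreted as a wildcard pattern.
  case $1 in
    $2) echo yes ;;
    *)  echo no ;;
  esac
}

regex_match() {  # usage: regex_match STRING REGEX
  # -E: extended regex, -x: the regex must match the whole line, -q: quiet.
  if printf '%s\n' "$1" | grep -Eqx -e "$2"; then echo yes; else echo no; fi
}

glob_match  abc 'a*'   # yes: as a wildcard, * stands for any run of characters
regex_match abc 'a*'   # no:  as a regex, a* means "zero or more a" and nothing else
regex_match aaa 'a*'   # yes
glob_match  ab  'a?'   # yes: as a wildcard, ? stands for exactly one character
regex_match ab  'a?'   # no:  as a regex, a? means "zero or one a"
regex_match a   'a?'   # yes
```

The patterns are single-quoted at the call site so that the shell passes them to the helpers unchanged.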

2. Regular expressions and wildcard usage scenarios

Some tasks call for regular expressions, and others call for wildcards.

Since I mostly work in a Linux environment, I'll use the shell as the example:

Regular expression usage scenarios:

Regular expressions are mainly used to match strings inside files; the main tools are grep, awk, and sed. Put simply, they operate on the contents of a file, and they appear mostly in shell scripts.

Wildcard usage scenarios:

Wildcards are also known as filename expansion (globbing), so they are mainly used to match file names, mostly on the shell command line. Common commands used with them are:

ls, find, cp, mv, etc.

In summary: regular expressions match text content (typically in shell scripts), while wildcards match file names (typically on the shell command line).
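The split between "file names" and "file contents" can be sketched as follows. The scratch directory and the file names notes.txt and hello.txt are assumptions made up for this example.

```bash
#!/usr/bin/env bash
# Hypothetical scratch directory with two small files.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'hello world\n' > notes.txt
printf 'goodbye\n'     > hello.txt

# Wildcard: the shell expands hello* against file NAMES before ls runs.
by_name=$(ls hello*)

# Regex: grep applies ^hello to the CONTENTS of each file; -l lists matching files.
# The quotes keep the shell from touching the regex.
by_content=$(grep -l '^hello' notes.txt hello.txt)

echo "name match:    $by_name"      # hello.txt
echo "content match: $by_content"   # notes.txt

cd / && rm -rf "$workdir"
```

The file whose name starts with hello is not the file whose content starts with hello, which is exactly the distinction between the two kinds of pattern.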

3. The difference between regular expressions and wildcards

3.1 Basic usage of wildcards

Wildcards are mainly used in the shell command line. Common matching rules are:

The main usage of wildcards

wildcard | meaning | example
*        | matches any string of zero or more characters | a* matches any file whose name starts with a
?        | matches exactly one character | a?.txt matches ab.txt and ac.txt, but not abc.txt
[]       | matches any one of the characters inside the brackets | [abc].txt matches a.txt, b.txt, and c.txt
[!]      | matches any one character not listed inside the brackets | [!abc]* matches any file whose name does not start with a, b, or c
[a-z]    | matches any one character in the range a to z; can only be used to find files, not to create them | [a-z]* matches any file whose name starts with a lowercase letter
{a,b,z}  | comma-separated alternatives, each expanded in turn; can be used both to create and to find files | {a,b,z}* expands to a* b* z*, i.e. any file whose name starts with a, b, or z
{a..z}   | .. delimits a continuous range of characters | {a..z}* means files whose names start with any lowercase letter
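The matching rules above can be tried out in a scratch directory. This is a small sketch; the file names are made up for the example, and LC_ALL=C is set only so that the sort order of the results is predictable.

```bash
#!/usr/bin/env bash
export LC_ALL=C          # fixed locale so sort order and [a-z] behave predictably
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch a.txt b.txt c.txt d.txt ab.txt abc.txt   # hypothetical file names

star=$(echo a*)              # a.txt ab.txt abc.txt
question=$(echo a?.txt)      # ab.txt
set_match=$(echo [abc].txt)  # a.txt b.txt c.txt
negated=$(echo [!abc]*)      # d.txt
range=$(echo [a-z].txt)      # a.txt b.txt c.txt d.txt

printf '%s\n' "$star" "$question" "$set_match" "$negated" "$range"
cd / && rm -rf "$workdir"
```

Using echo instead of ls shows exactly what the shell substitutes for each pattern before any command sees it.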

In particular, it should be noted that:

Characters in the table above such as *, ?, and [] have special meanings. For example, * does not stand for the character itself but for any string of characters, so to match a literal * you need to escape it. The escape character is \, for example:

\* means * itself, so \*abc matches the string "*abc"
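A quick way to check the effect of escaping is to create a file whose name really contains a *. The sketch below assumes a scratch directory and two made-up file names, *abc and xabc.

```bash
#!/usr/bin/env bash
export LC_ALL=C            # fixed locale so the result order is predictable
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch -- '*abc' xabc       # one file is literally named *abc

unescaped=$(echo *abc)     # *abc xabc  (wildcard: every name ending in abc)
escaped=$(echo \*abc)      # *abc       (the backslash makes * literal)
quoted=$(echo '*abc')      # *abc       (quoting has the same effect)

printf '%s\n' "$unescaped" "$escaped" "$quoted"
cd / && rm -rf "$workdir"
```

Single quotes are often easier to read than backslashes when a pattern contains several special characters.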

To help you understand better, here are some demos:

The structure of the current directory is as follows:

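ls [hs]*   # list all files whose names start with h or s, including the contents of directories whose names start with h or s

touch test{1..3}.txt   # create test1.txt through test3.txt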
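The touch demo works because braces generate names rather than match them, which is why they can create files while [] cannot. A minimal sketch of that difference, assuming bash with its default options (nullglob and failglob unset) and an empty scratch directory:

```bash
#!/usr/bin/env bash
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Brace expansion happens before any file lookup, so it yields names
# even though the directory is still empty.
generated=$(echo test{1..3}.txt)   # test1.txt test2.txt test3.txt

# A wildcard that matches nothing is passed through unchanged.
before=$(echo test[1-3].txt)       # test[1-3].txt

touch test{1..3}.txt               # now the three files exist
after=$(echo test[1-3].txt)        # test1.txt test2.txt test3.txt

printf '%s\n' "$generated" "$before" "$after"
cd / && rm -rf "$workdir"
```

Brace expansion is a feature of bash (and of ksh and zsh), not of plain POSIX sh, so the script should be run with bash.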

3.2 Basic usage of regular expressions

Regular expressions are mainly used in shell scripts. Common usage is as follows:

The main usage of regular expressions

metacharacter | meaning | usage example
()     | groups a subexpression; whatever is enclosed in the parentheses is treated as a whole |
*      | the preceding character or subexpression is matched zero or more times | ab* matches a, ab, abb, abbb, etc.; h(ab)* matches h, hab, habab, etc.
?      | the preceding character or subexpression is matched zero or one time (extended regular expression) | h(ab)? matches h or hab, but not habab
+      | the preceding character or subexpression is matched one or more times (extended regular expression) |
.      | matches any single character except the newline \n |
[]     | matches any one of the characters listed in the brackets, one character only | [aeiou] matches each of o, o, and e in google, one character at a time
[^]    | matches any one character not listed in the brackets |
^      | matches the beginning of the line (when not inside []) | ^hello matches a line beginning with hello
$      | matches the end of the line |
\      | escape character, cancels the special meaning | \* matches the character * itself; here * no longer indicates a repetition count
{n}    | the preceding character appears exactly n times |
{n,}   | the preceding character appears at least n times |
{n,m}  | the preceding character appears between n and m times |
\1     | refers back to the text matched by the first parenthesized group; \2 does the same for the second group |
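Several rows of the table can be checked with grep. The helper full_match below is a made-up name for this sketch; it uses grep -E (extended syntax) with -x so that the regex has to match the whole string. The -o option used for the vowel example is available in GNU and BSD grep.

```bash
#!/usr/bin/env bash
# Hypothetical helper: does REGEX (extended syntax) match the whole STRING?
full_match() {   # usage: full_match REGEX STRING
  if printf '%s\n' "$2" | grep -Eqx -e "$1"; then echo yes; else echo no; fi
}

full_match 'ab*'     abbb    # yes: b repeated zero or more times
full_match 'h(ab)*'  habab   # yes: the group (ab) repeated twice
full_match 'h(ab)?'  hab     # yes: the group appears once
full_match 'h(ab)?'  habab   # no:  ? allows the group at most once
full_match 'ab+'     a       # no:  + needs at least one b
full_match 'a.c'     abc     # yes: . matches any single character
full_match 'a{2,3}'  aaaa    # no:  only two or three a's are allowed
full_match '[^0-9]+' abc     # yes: one or more non-digit characters

# -o prints each match on its own line: [aeiou] matches one vowel at a time.
vowels=$(printf 'google\n' | grep -o '[aeiou]' | tr -d '\n')
echo "$vowels"               # ooe

# Backreference, written here in basic regex syntax: \(ab\)\1 matches "abab".
backref=$(printf 'abab\nabba\n' | grep -x '\(ab\)\1')
echo "$backref"              # abab
```

In basic regular expressions (plain grep and sed without -E), grouping parentheses and braces are written with backslashes, as in the last example.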

As with wildcards, to match one of the special characters (metacharacters) in the table above literally, you need to escape it:

For example, \* means the * character itself; * then no longer means that the preceding character may appear any number of times.
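The effect of the escape can be seen by counting matching lines with grep -c. The three sample lines are made up for this sketch.

```bash
#!/usr/bin/env bash
lines=$(printf 'a*b\nab\naab\n')   # three sample lines; one contains a literal *

# Unescaped: a*b means "zero or more a, then b", which occurs in every line.
unescaped=$(printf '%s\n' "$lines" | grep -c 'a*b')    # 3

# Escaped: \* is a literal *, so only the line a*b matches.
escaped=$(printf '%s\n' "$lines" | grep -c 'a\*b')     # 1

# -F treats the whole pattern as plain text, so no escaping is needed.
fixed=$(printf '%s\n' "$lines" | grep -c -F 'a*b')     # 1

echo "$unescaped $escaped $fixed"   # 3 1 1
```

The regex is single-quoted in each command so that the shell neither expands * as a wildcard nor strips the backslash before grep sees it.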


Origin blog.csdn.net/weixin_43354152/article/details/130918302