Some things about POSIX regular expressions

Snowy Toad Jump 360 Cloud Computing

Heroine declaration

These special characters come up constantly in programming: * ? + [ ] { } ^ $ ( ) |. What special meaning does each one have? What are wildcards, BRE, ERE, and PCRE? This article introduces the first three.

Here are three questions. If you can answer all of them, this article will probably be a piece of cake for you and reading it would only waste your time, for which I apologize in advance. If you cannot, you should have a better grasp of these topics after reading, and you can come back and try the three questions again.

  1. What is a wildcard? Does [A-Z] match the same characters in every locale?
  2. When are wildcards used, and when are regular expressions used?
  3. What are the two types of POSIX regular expressions? What is the difference between them?

Introduction

Because the shell works with file names so often, the shell itself provides special characters that let you quickly specify a set of file names. These are wildcards.


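* matches any number of characters (including none)
? matches any single character
[characters] matches any single character in the set
[!characters] matches any single character not in the set
[[:class:]] matches any single character belonging to the specified character class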

Let's start with the first four:
* (all files, except hidden files whose names begin with a dot)
g* (files beginning with g)
b*.txt (.txt files beginning with b)
Data??? (files beginning with Data followed by exactly three characters, seven characters in total)
[abc]* (files beginning with a, b, or c)
abc[0-9][0-9] (files named abc followed by exactly two digits)
[A-Z]* (gives different results under different locale settings)

Let's look at a few examples. The following were run on CentOS.
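
As a rough sketch of the kind of test meant here, the script below creates a few files in a scratch directory and expands each pattern with echo. The file names and the scratch directory are made up for illustration, and LC_ALL=C is set so that matching and sorting are predictable.

```shell
#!/bin/sh
# Hypothetical file names, created in a throwaway directory.
export LC_ALL=C            # fixed locale so the results are predictable
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
touch g1.log b1.txt b2.txt Data123 Data1234 abc07 abc7 cat.c

echo g*              # files beginning with g           -> g1.log
echo b*.txt          # .txt files beginning with b      -> b1.txt b2.txt
echo Data???         # Data plus exactly 3 characters   -> Data123
echo [abc]*          # files beginning with a, b or c   -> abc07 abc7 b1.txt b2.txt cat.c
echo abc[0-9][0-9]   # abc plus exactly two digits      -> abc07
```

Note that Data1234 and abc7 are skipped by Data??? and abc[0-9][0-9], because ? and a bracket expression each stand for exactly one character.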

The [A-Z] difference

We see that the last one above does not output the result we want. Why?

Here is the story. When UNIX first appeared, it was used only in the United States, so only the ASCII characters 0-127 existed. Their sort order was ABC...XYZabc...xyz. As UNIX gradually spread to non-English-speaking countries, the ASCII character set was extended to use all 8 bits, adding characters 128-255 so that more languages could be accommodated.

To support this, the POSIX standard introduced the concept of a locale, which adjusts behavior to the language conventions of a given country. Some locales sort in dictionary order, which is aAbBcC...xXyYzZ.

As a result, in such a locale ls [A-Z]* actually matches names beginning with any letter except lowercase a, because the range from A to Z in that order covers AbBcC...zZ.

Let's take a look at an example, and actually modify the locale to see the result:

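
A minimal sketch of such a comparison follows. The file names are invented, and en_US.UTF-8 is only an assumption: it may not be installed on your system, and newer shells may treat ranges as plain ASCII in every locale, so the second result can legitimately differ from machine to machine. LC_ALL is used here because it overrides LANG.

```shell
#!/bin/sh
# Compare how the range [A-Z] expands under two locales.
# Hypothetical files in a throwaway directory.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
touch a.txt b.txt Z.txt

list_upper_range() {
    # $1 is the locale to test. A child shell is started so that the
    # locale is already in effect when the pattern is expanded.
    LC_ALL=$1 sh -c 'echo [A-Z]*'
}

list_upper_range C             # byte order A..Z: prints Z.txt
list_upper_range en_US.UTF-8   # on some systems also lists b.txt, but not a.txt
```

If both calls print the same thing, your shell or C library is probably using ASCII ranges regardless of locale.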

Above, we first set LANG="POSIX" and only then switched to another language, so that the difference is visible; when I tried switching directly, it did not work.

Where can wildcards be used? Any Unix command that accepts file names can use them: ls, cat, less, more, vim, grep, sed, awk...
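
The reason wildcards work with so many commands is that the shell expands them into file names before the command ever starts. The sketch below illustrates this with count_args, a hypothetical helper that only reports how many arguments it received; the file names are invented as well.

```shell
#!/bin/sh
# Hypothetical files in a throwaway directory.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
touch one.txt two.txt

# Hypothetical helper: prints the number of arguments it was given.
count_args() { echo "$#"; }

count_args *.txt     # unquoted: the shell expands the pattern, prints 2
count_args '*.txt'   # quoted: the pattern is passed literally, prints 1
```

This is also why patterns meant for grep, sed or awk themselves should be quoted: otherwise the shell may expand them first.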

Character class

Because ranges behave differently under different locales, the POSIX standard defines character classes that name the commonly used sets directly.

[:alnum:] matches any letter or digit
[:alpha:] matches any letter
[:digit:] matches any digit
[:lower:] matches any lowercase letter
[:upper:] matches any uppercase letter

Let's test it:

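
A small sketch of such a test, again with invented file names in a scratch directory. LC_ALL=C is set only to make the sort order of the output predictable.

```shell
#!/bin/sh
# Hypothetical file names, created in a throwaway directory.
export LC_ALL=C
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
touch Alpha.txt beta.txt 9lives.txt _hidden.txt

echo [[:upper:]]*    # begins with an uppercase letter   -> Alpha.txt
echo [[:lower:]]*    # begins with a lowercase letter    -> beta.txt
echo [[:digit:]]*    # begins with a digit               -> 9lives.txt
echo [[:alnum:]]*    # begins with a letter or digit     -> 9lives.txt Alpha.txt beta.txt
echo [![:alnum:]]*   # begins with anything else         -> _hidden.txt
```

The class name, including its own brackets and colons, goes inside the outer brackets, which is why the patterns have doubled brackets.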

These work well, but they fall short when you need to express a partial range such as [A-M].

POSIX regular expression

We often hear that one program supports POSIX BRE regular expressions while another supports POSIX ERE regular expressions, so let's talk about POSIX regular expressions. This is only a brief introduction to their differences and usage; what each metacharacter means is not covered in detail.

POSIX basic VS extended regular expression

POSIX divides regular expressions into basic regular expressions (BRE) and extended regular expressions (ERE).

So what is the difference between them? They are actually very similar; the difference lies in which metacharacters they support. BRE supports ^ $ . [ ] * \, and ERE supports ( ) { } ? + | in addition to those.

Application support

Programs that support BRE include sed, grep, ...; programs that support ERE include egrep, grep -E, awk, ...
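
To make the metacharacter difference concrete, here is a minimal sketch that runs the same pattern through grep in both modes. The three sample lines and the sample helper are invented for illustration.

```shell
#!/bin/sh
# Hypothetical sample text: three lines fed to grep on standard input.
sample() { printf '%s\n' 'ac' 'abbc' 'ab+c'; }

sample | grep 'ab+c'      # BRE: + is a literal plus sign, prints ab+c
sample | grep -E 'ab+c'   # ERE: + means "one or more b", prints abbc
```

The pattern is identical in both commands; only the flavor selected by -E changes what + means.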

Exercise:
The following uses the basic regular expressions supported by grep. To use an extended metacharacter such as { in a basic regular expression, write it as \{.


The following uses egrep to achieve the same result.

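
As a sketch of the two equivalent forms, the commands below match "exactly two a's followed by b" first as a BRE and then as an ERE. The sample lines are invented, and grep -E is used here as the equivalent of egrep.

```shell
#!/bin/sh
# Hypothetical sample text: three lines fed to grep on standard input.
sample() { printf '%s\n' 'ab' 'aab' 'a{2}b'; }

sample | grep 'a\{2\}b'     # BRE: \{2\} is the repeat count, prints aab
sample | grep -E 'a{2}b'    # ERE: {2} is the repeat count, prints aab
```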

The difference between the two is easiest to see in an example.

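
A minimal sketch of that difference, using invented sample lines: swapping the escaping between the two flavors turns the braces from a repeat count into literal characters.

```shell
#!/bin/sh
# Hypothetical sample text: two lines fed to grep on standard input.
sample() { printf '%s\n' 'aab' 'a{2}b'; }

sample | grep 'a{2}b'        # BRE: unescaped braces are literal, prints a{2}b
sample | grep -E 'a\{2\}b'   # ERE: escaped braces are literal, prints a{2}b
```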

The above example can be understood as follows:

In a basic regular expression, { is the literal string "{", while \{ is equivalent to { in an extended regular expression. In the examples above, normal use causes no problems. But if you add \ where it is not needed, or leave it out where it is needed, the results are unpredictable and may differ from one system to another. So once you have learned these rules, be sure to write your regular expressions correctly; otherwise, before long you may not understand what they mean yourself.

Origin blog.51cto.com/15127564/2668361