Regular expressions (often shortened to regex or regexp) use a single string to describe and match a set of strings that follow a given syntactic rule. They are a method of matching strings: with a handful of special symbols, they make it possible to quickly find, delete, and replace specific strings.
Regular expressions are commonly used in scripting and in text editors. Many text-processing tools and programming languages support them, such as the text processors grep, egrep, sed, and awk on Linux and the widely used Python language. Their matching capabilities are powerful enough to process large volumes of text quickly and efficiently.
Regular expression syntax is divided into basic regular expressions (BRE) and extended regular expressions (ERE), which differ in strictness and features. Basic regular expressions are the most fundamental part of the commonly used syntax. Among the common text-processing tools on Linux, grep and sed support basic regular expressions, while egrep and awk support extended regular expressions.
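One way to see the difference between the two syntaxes is to run the same pattern through both. The sketch below is only an illustration: the two sample lines and the variable names are invented, and "grep -E" is used as the equivalent of egrep.

```shell
# Two sample lines, invented for illustration.
sample='foo
o+'

# BRE (plain grep): "+" is an ordinary character, so this matches a literal "o+".
bre=$(printf '%s\n' "$sample" | grep 'o+')

# ERE (grep -E, equivalent to egrep): "+" means "one or more of the previous character".
ere=$(printf '%s\n' "$sample" | grep -E 'o+')

printf 'BRE matched:\n%s\n\nERE matched:\n%s\n' "$bre" "$ere"
```

With basic syntax only the line "o+" is selected; with extended syntax both lines are selected, because each contains at least one "o".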
Match strings that begin with "w" and end with "d", with two or more "o" characters in between:
grep -n 'wo\{2,\}d' test.txt
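To see what this pattern selects, it can be run against a few sample lines fed through a pipe instead of a file. The sample words and the variable names below are assumptions made for the sketch.

```shell
# Sample input, invented for illustration.
sample='wd
wod
wood
woooood'

# In basic syntax the interval braces are escaped: \{2,\} means "two or more".
# -n prefixes each matching line with its line number.
matches=$(printf '%s\n' "$sample" | grep -n 'wo\{2,\}d')

printf '%s\n' "$matches"
# 3:wood
# 4:woooood
```

Lines 1 and 2 are skipped because they contain fewer than two "o" characters between "w" and "d".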
1.6 Summary of metacharacters
character   usage
^           Matches the beginning of the input string. When used as the first character inside a bracket expression, it negates the character set. To match the "^" character itself, use "\^"
$           Matches the end of the input string
.           Matches any single character
\           Escapes a metacharacter so that it is treated as an ordinary character
*           Matches the preceding character zero or more times
[]          Matches any one of the characters inside the brackets
[^]         Negated character set. Matches any one character that is not listed. For example, "[^bc]" matches every letter in "plain"
[n1-n2]     Character range. Matches any character in the specified range. For example, "[a-z]" matches any lowercase letter from "a" to "z". Note: the hyphen (-) denotes a range only when it appears inside the brackets between two characters; at the beginning of the bracket expression it stands for the hyphen itself
{n}         n is a non-negative integer; matches exactly n times. For example, "o{2}" cannot match the "o" in "Bob", but it can match the "oo" in "food"
{n,}        n is a non-negative integer; matches at least n times. For example, "o{2,}" cannot match the "o" in "Bob", but it can match all the o's in "foooood". "o{1,}" is equivalent to "o+", and "o{0,}" is equivalent to "o*"
{n,m}       Both m and n are non-negative integers, where n<=m; matches at least n times and at most m times
+
Function: Matches the preceding character one or more times
?
Function: Matches the preceding character zero or one time
|
Function: Alternation (or); matches any one of several alternatives
()+
Function: Matches one or more repetitions of a group. Example: "egrep -n 'A(xyz)+C' test.txt" finds strings that begin with "A" and end with "C", with one or more "xyz" strings in between.
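Several of the metacharacters summarized above can be tried out on a small word list. This is a sketch under stated assumptions: the word list and variable names are made up, and "grep -E" stands in for egrep.

```shell
# Word list invented for illustration.
words='the
then
other
shirt
short
AC
AxyzC
AxyzxyzC'

# ^ anchors the match at the start of the line.
start=$(printf '%s\n' "$words" | grep '^the')               # the, then

# $ anchors the match at the end of the line.
end=$(printf '%s\n' "$words" | grep 't$')                   # shirt, short

# [^i] matches any single character except "i".
negated=$(printf '%s\n' "$words" | grep 'sh[^i]rt')         # short

# (xyz)+ requires the group "xyz" one or more times (extended syntax).
group=$(printf '%s\n' "$words" | grep -E 'A(xyz)+C')        # AxyzC, AxyzxyzC

# | chooses between alternatives (extended syntax).
either=$(printf '%s\n' "$words" | grep -E '^(the|other)$')  # the, other

printf '%s\n' "$start" "$end" "$negated" "$group" "$either"
```

Note that "AC" is not selected by 'A(xyz)+C', because "+" requires at least one occurrence of the group.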
3 Text processors
3.1 sed tool
sed (Stream EDitor) is a powerful yet simple tool for parsing and transforming text. It reads text, edits it according to the specified conditions (delete, replace, add, move, and so on), and finally outputs either all lines or only the processed lines. sed can also carry out fairly complex text processing without interaction, and is widely used in shell scripts to automate processing tasks.
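A minimal sketch of the two output modes, all lines versus only selected lines, run without interaction through a pipe. The three input lines and the variable names are assumptions for illustration; "d" (delete), "p" (print), and the "-n" option are standard sed features.

```shell
# Three sample lines, invented for illustration.
input='one
two
three'

# Default mode: every line is printed after editing; here line 2 is deleted.
all=$(printf '%s\n' "$input" | sed '2d')

# -n turns off automatic printing, so only lines explicitly printed with p appear.
only=$(printf '%s\n' "$input" | sed -n '2p')

printf 'all but line 2:\n%s\n\nonly line 2:\n%s\n' "$all" "$only"
```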
The sed workflow consists of three steps: read, execute, and display. Read: sed reads one line from the input stream (a file, a pipe, or standard input) and stores it in a temporary buffer, also known as the pattern space. Execute: by default, all sed commands are executed in order on the pattern space; unless a line address is specified, the commands are applied to every line in turn. Display: the modified content is sent to the output stream, after which the pattern space is cleared. This cycle repeats until all of the input has been processed. Because the commands operate on the pattern space, the input file itself is not changed in any way; to keep the result, redirect the output to a file.
The "operation" specifies the action to perform on the file, that is, the sed command. It normally takes the form "[n1[,n2]]action". n1 and n2 are optional and select the lines to operate on; for example, to act on lines 5 through 20, write "5,20action".
The nl command numbers the lines of a file; combining it with sed makes the result of each command easier to see.
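The line-address form and the use of nl can be sketched together as follows. The five-line temporary file is hypothetical and created only for this demonstration (mktemp is assumed to be available), and a shorter range, 2 through 4, is used in place of 5 through 20.

```shell
# Hypothetical five-line file created just for this demonstration.
tmp=$(mktemp)
printf 'a\nb\nc\nd\ne\n' > "$tmp"

# "2,4p" with -n prints only lines 2 through 4; nl supplies the line numbers.
printed=$(nl "$tmp" | sed -n '2,4p')

# "2,4d" deletes lines 2 through 4 from the output, leaving lines 1 and 5.
deleted=$(nl "$tmp" | sed '2,4d')

# sed only worked on its pattern space, so the file itself is unchanged.
after=$(cat "$tmp")
rm -f "$tmp"

printf '%s\n---\n%s\n---\n%s\n' "$printed" "$deleted" "$after"
```

Because nl has already attached a number to every line, it is easy to confirm which lines each address range selected.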
To perform replacements with sed, use the s (string substitution), c (whole line/block replacement), or y (character conversion) commands.
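The three replacement commands can be compared side by side. This is a minimal sketch: the two input lines and the variable names are invented, and the c command is written in the backslash-newline form so that it does not depend on one particular sed implementation.

```shell
# Two sample lines, invented for illustration.
text='the cat
the dog'

# s: replace the first "the" on each line with "a".
sub=$(printf '%s\n' "$text" | sed 's/the/a/')

# c: replace the whole of line 2 with new text.
changed=$(printf '%s\n' "$text" | sed '2c\
a bird')

# y: convert a, b, c to A, B, C, character by character.
mapped=$(printf '%s\n' "$text" | sed 'y/abc/ABC/')

printf '%s\n---\n%s\n---\n%s\n' "$sub" "$changed" "$mapped"
```

s works on matched strings, c on whole lines, and y on individual characters, which is why "the cat" becomes "the CAt" in the last case while "the dog" is left as it is.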
On Linux/UNIX systems, awk is a powerful text-processing tool. It reads the input text line by line, searches it according to the specified pattern, and formats or filters the content that matches, making quite complex text operations possible without interaction.
The result of an awk command can be printed with the print function. Within awk, the logical operators "&&", "||", and "!" mean "and", "or", and "not" respectively; simple arithmetic is also available, with +, -, *, /, %, and ^ denoting addition, subtraction, multiplication, division, remainder, and exponentiation.
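The logical and arithmetic operators can be tried on a small table of whitespace-separated fields. The data and the meaning of its columns (a name followed by two numbers) are assumptions invented for this sketch.

```shell
# Hypothetical records: name, first number, second number.
data='alice 30 5
bob 17 8
carol 45 2'

# && : both conditions must hold.
both=$(printf '%s\n' "$data" | awk '$2 >= 18 && $3 > 3 {print $1}')

# || : at least one condition must hold.
either=$(printf '%s\n' "$data" | awk '$2 < 18 || $3 < 3 {print $1}')

# ! : negates a condition.
negated=$(printf '%s\n' "$data" | awk '!($2 >= 18) {print $1}')

# Arithmetic on the first record: + - * / % ^ applied to 30 and 5.
math=$(printf '%s\n' "$data" | awk 'NR == 1 {print $2 + $3, $2 - $3, $2 * $3, $2 / $3, $2 % $3, $3 ^ 2}')

printf '%s\n---\n%s\n---\n%s\n---\n%s\n' "$both" "$either" "$negated" "$math"
```

Each awk program has the form "condition {action}"; the action runs only for lines on which the condition is true.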
The tr command is often used to replace, squeeze, and delete characters from standard input. It can translate one set of characters into another and is often used to write elegant one-liners.
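A minimal sketch of these three operations using tr, the standard utility that fits this description. The input strings and variable names are made up for illustration.

```shell
# Replace: translate lowercase letters to uppercase.
upper=$(printf 'hello world\n' | tr 'a-z' 'A-Z')

# Squeeze: collapse each run of repeated spaces into a single space.
squeezed=$(printf 'a   b    c\n' | tr -s ' ')

# Delete: remove every digit.
deleted=$(printf 'a1b2c3\n' | tr -d '0-9')

printf '%s\n%s\n%s\n' "$upper" "$squeezed" "$deleted"
# HELLO WORLD
# a b c
# abc
```

tr reads only from standard input, so it is normally placed after a pipe or given a redirected file.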