Linux Three Musketeers (grep, sed, awk)

Linux Three Musketeers

The Linux Three Musketeers are grep, sed, and awk. Of the three commands, grep is mainly for searching, sed is mainly for editing, and awk is mainly for splitting text into fields.

grep

grep stands for "global regular expression print". The grep command searches one or more files for a specific pattern, which can be a single character, a string, a word, or a sentence. It is one of the most commonly used text-processing tools on Linux. The regular expression metacharacters are as follows:

  • *: Matches the preceding character or subexpression zero or more times.
  • .: Matches any single character.
  • [xyz]: Matches any one of the characters inside the square brackets.
  • [^xyz]: Matches any one character that is not inside the square brackets.
  • ^: Anchors the match to the beginning of the line.
  • $: Anchors the match to the end of the line.
  • ?: Matches the preceding subexpression zero or one time.
  • +: Matches the preceding subexpression one or more times.
  • |: Alternation; matches the regular expression either before or after the | symbol.
  • {n,m}: Matches the preceding subexpression at least n times and at most m times. Unlike in basic regular expressions (BRE), no \ is needed in front of the braces.

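In GNU grep's basic regular expressions (BRE, the default), the characters ?, +, |, {, }, ( and ) stand for themselves; to use them as metacharacters, prefix them with \ as an escape character. In extended regular expressions (grep -E), they are special without the backslash.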
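The difference between basic and extended syntax can be sketched as follows. This is a minimal illustration that assumes GNU grep; the file words.txt and its contents are made up for the example.

```shell
# Scratch directory and sample data (hypothetical, for illustration only).
tmp=$(mktemp -d)
printf 'color\ncolour\ncolouur\ncat\ndog\n' > "$tmp/words.txt"

# ERE: ? is special without a backslash when -E is given.
ere=$(grep -E 'colou?r' "$tmp/words.txt")

# BRE: the interval has to be written \{0,1\} to be special.
bre=$(grep 'colou\{0,1\}r' "$tmp/words.txt")

# ERE alternation, anchored to the whole line with ^ and $.
alt=$(grep -E '^(cat|dog)$' "$tmp/words.txt")

# Negated bracket expression: lines whose first character is not c.
neg=$(grep '^[^c]' "$tmp/words.txt")

printf '%s\n' "$ere" "$bre" "$alt" "$neg"
rm -rf "$tmp"
```

Both the ERE and the BRE form select color and colour but not colouur, since the u may appear at most once.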

The grep command is used to search for a specific pattern in each file. When grep is used, the contents of each line containing the specified character pattern will be printed to the screen, but the grep command does not change the contents of the file.

grep command format: grep [options] pattern filename

The pattern here is either a string or a regular expression. Commonly used options are listed below.

  • -c: Print only the number of lines in the file that match the pattern.

  • -i: Ignore letter case in the pattern.

  • -l: List only the names of files that contain matching lines.

  • -n: Prefix each matching line with its line number.

  • -v: List lines that do not match the pattern.

  • -w: Match the expression only as a whole word; lines where it appears only as part of a longer word are not matched.

  • --color=auto or --color: Highlight the matched text in color.

  • -o: Print only the matched parts of a line rather than the entire line, each match on a separate line.

If you search multiple files, grep prefixes each matching line with the name of the file it was found in. If you search a single file, grep displays every line that contains the matching pattern, without a file name.
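The options above can be tried on a small scratch file. The sketch below uses made-up file names (fruit.txt, other.txt) and contents; only the grep options themselves come from the list.

```shell
# Scratch directory and sample data (hypothetical, for illustration only).
tmp=$(mktemp -d)
printf 'apple pie\nApple tart\npineapple\nbanana\n' > "$tmp/fruit.txt"
printf 'lemon tart\n' > "$tmp/other.txt"

count=$(grep -c 'apple' "$tmp/fruit.txt")      # count of matching lines
count_i=$(grep -ci 'apple' "$tmp/fruit.txt")   # same, ignoring case
word=$(grep -nw 'apple' "$tmp/fruit.txt")      # whole word, with line number
invert=$(grep -vi 'apple' "$tmp/fruit.txt")    # lines that do not match
only=$(grep -o 'an' "$tmp/fruit.txt")          # just the matched text

# Two files: each matching line is prefixed with its file name.
multi=$(cd "$tmp" && grep 'tart' fruit.txt other.txt)

printf '%s\n' "$count" "$count_i" "$word" "$invert" "$only" "$multi"
rm -rf "$tmp"
```

Note that -w rejects pineapple because apple is preceded by another word character there.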

sed

sed principle

sed is a stream editor: it edits a stream of data according to a set of rules supplied before processing begins. The commands are either entered on the command line or stored in a command file.

sed is a powerful tool for manipulating, filtering, and converting text content. Common functions include addition, deletion, modification, query, filtering, and line fetching.

The format of the sed command is as follows: sed [options] [sed-commands] [input-file]

  • options: Commonly used options are -n (suppress default output), -e (execute multiple editing commands), and -i (modify the source file directly).
  • sed-commands: It can be either a single sed command or a combination of multiple sed commands.
  • input-file: Optional; sed can also read its input from standard input, for example from a pipe.

sed reads a line from a file or pipeline, puts it in the pattern space, and processes it. After processing that line, it reads the next line and processes it in the same way. The pattern space is a temporary buffer inside sed that holds the content just read.

sed can process a single line or multiple lines. If no address range is specified in front of a sed command, the command is applied to all lines by default.

sed process

The workflow of sed consists of three steps: read, execute, and display.

Read: sed reads a line from the input stream (file, pipe, standard input) and stores it in a temporary buffer (also known as the pattern space).

Execute: By default, all sed commands are executed in order on the pattern space. Unless a line address is specified, the commands are applied to every line in turn.

Display: sed sends the modified content to the output stream. After the data is sent, the pattern space is cleared, and the process repeats until all the contents of the file have been processed.
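The display step is easy to observe. In this small sketch (the file lines.txt is a made-up example), an empty script shows the automatic output, p adds a second copy of each line, and -n turns the automatic output off.

```shell
# Scratch directory and sample data (hypothetical, for illustration only).
tmp=$(mktemp -d)
printf 'one\ntwo\nthree\n' > "$tmp/lines.txt"

auto=$(sed '' "$tmp/lines.txt")         # empty script: each line is displayed once
twice=$(sed 'p' "$tmp/lines.txt")       # p prints, then the display step prints again
second=$(sed -n '2p' "$tmp/lines.txt")  # -n: only the explicit p on line 2 prints

printf '%s\n' "$auto" "$twice" "$second"
rm -rf "$tmp"
```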

sed-options

sed command format

  • sed -e 'operation' file1 file2
  • sed -n -e 'operation' file1 file2
  • sed -f script-file file1 file2
  • sed -i -e 'operation' file1 file2

Common options:

  • -e or --expression: Adds the given command to the script used to process the input text. It can be omitted when there is only one command and is generally used to supply several commands.
  • -f or --file: Processes the input text with the commands in the given script file.
  • -h or --help: Display help.
  • -n or --quiet: Suppresses sed's automatic output of each line; combine it with the p command to print only selected lines.
  • -i: Modify the text file directly.
  • -r, -E: Use extended regular expressions.
  • -s: Treat multiple files as separate files, rather than as a single continuous stream.

Common operations:

  • s: Replace the specified string (substitute).
  • d: Delete the specified line (delete).
  • a: Add a line of specified content after the specified line (append).
  • i: Insert a line of specified content before the specified line (insert).
  • c: Replace the content of the selected line with the specified content (change).
  • y: Character transliteration; the source and destination character lists must be the same length.
  • p: Print. With an address it prints the addressed lines; without one it prints every line. Usually used with the -n option.
  • =: Print line number.
  • l: Print the line in an unambiguous form, showing non-printable characters as ASCII escape codes.
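A few of these operations are sketched below on a three-line scratch file. The one-line forms of a, i, and c assume GNU sed, and the file greek.txt is made up for the example.

```shell
# Scratch directory and sample data (hypothetical, for illustration only).
tmp=$(mktemp -d)
printf 'alpha\nbeta\ngamma\n' > "$tmp/greek.txt"

appended=$(sed '2a after beta' "$tmp/greek.txt")    # a: add a line after line 2
inserted=$(sed '2i before beta' "$tmp/greek.txt")   # i: add a line before line 2
changed=$(sed '2c replaced' "$tmp/greek.txt")       # c: replace line 2
upper=$(sed 'y/abg/ABG/' "$tmp/greek.txt")          # y: map a->A, b->B, g->G
lineno=$(sed -n '/gamma/=' "$tmp/greek.txt")        # =: line number of the match

printf '%s\n' "$appended" "$inserted" "$changed" "$upper" "$lineno"
rm -rf "$tmp"
```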

sed search

Use the sed command to view the contents of a file:

Method one: sed ' ' /etc/shadow

root@chengyan-virtual-machine:~# sed ' ' /etc/shadow
root:$6$lvkzBBp4$EL4M3jGWlhVG73hngVOXVO1o3vtTaLIt7uNrlkC1:19201:0:99999:7:::
daemon:*:17379:0:99999:7:::
bin:*:17379:0:99999:7:::
sys:*:17379:0:99999:7:::
sync:*:17379:0:99999:7:::
games:*:17379:0:99999:7:::
man:*:17379:0:99999:7:::
lp:*:17379:0:99999:7:::
mail:*:17379:0:99999:7:::

Method two: sed -n 'p ' /etc/shadow

root@chengyan-virtual-machine:~# sed -n 'p ' /etc/shadow
root:$6$lvkzBBp4$EL4M3jGWlhVG73hngVOXVO1o3vtTaLIt7uNrlkC1:19201:0:99999:7:::
daemon:*:17379:0:99999:7:::
bin:*:17379:0:99999:7:::
sys:*:17379:0:99999:7:::
sync:*:17379:0:99999:7:::
games:*:17379:0:99999:7:::
man:*:17379:0:99999:7:::
lp:*:17379:0:99999:7:::
mail:*:17379:0:99999:7:::

View the specified line:

root@chengyan-virtual-machine:~# sed -n '3p' /etc/shadow
bin:*:17379:0:99999:7:::

Using regular expressions: match lines starting with root

root@chengyan-virtual-machine:~# sed -n '/^root/p' /etc/shadow
root:$6$lvkzBBp4$EL4M3jGWlhVG73hngVOXVO1o3vtTaLIt7uNrlkC1:19201:0:99999:7:::

View consecutive lines: view the contents of lines 3-6

root@chengyan-virtual-machine:~# sed -n '3,6p' /etc/shadow 
bin:*:17379:0:99999:7:::
sys:*:17379:0:99999:7:::
sync:*:17379:0:99999:7:::
games:*:17379:0:99999:7:::

View the last line of the file:

root@chengyan-virtual-machine:~# sed -n '$p' /etc/shadow
sshd:*:18964:0:99999:7:::

sed delete

Deleting a line with sed does not actually remove it from the file: sed only displays the result of the deletion and leaves the file's contents unchanged. To really delete the content in the file, add the -i option.

Remove blank lines in text: sed '/^$/d' test.txt

root@chengyan-virtual-machine:~# cat -n test.txt 
     1
     2  1
     3  2
     4  3
     5  4
     6
     7  6
     8  7
     9  8
    10  9
    11
root@chengyan-virtual-machine:~# sed '/^$/d' test.txt 
1
2
3
4
6
7
8
9
root@chengyan-virtual-machine:~# 

Delete specified lines:

root@chengyan-virtual-machine:~# cat -n test.txt 
     1
     2  1
     3  2
     4  3
     5  4
     6
     7  6
     8  7
     9  8
    10  9
    11
root@chengyan-virtual-machine:~# sed '2d' test.txt 

2
3
4

6
7
8
9

root@chengyan-virtual-machine:~# 
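The difference between previewing a deletion and applying it with -i can be sketched as follows. The suffix .bak is an arbitrary choice that makes sed keep a backup copy, and the file contents are made up for the example.

```shell
# Scratch directory and sample data (hypothetical, for illustration only).
tmp=$(mktemp -d)
printf '\n1\n2\n\n3\n' > "$tmp/test.txt"

# Without -i the result is only displayed; the file keeps all 5 lines.
preview=$(sed '/^$/d' "$tmp/test.txt")
before=$(grep -c '' "$tmp/test.txt")

# With -i the file itself is rewritten; .bak keeps a copy of the original.
sed -i.bak '/^$/d' "$tmp/test.txt"
after=$(grep -c '' "$tmp/test.txt")
backup=$(grep -c '' "$tmp/test.txt.bak")

printf '%s\n' "$preview" "$before" "$after" "$backup"
rm -rf "$tmp"
```

Here grep -c '' simply counts the lines in a file, since the empty pattern matches every line.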

sed replacement

Command format: sed '[address]s/pattern/replacement/flags' file. Here address selects the lines to operate on, pattern is the string to be replaced, replacement is the new string, and flags are the substitution flags.

Flags and special symbols:

  • g: Replace every match in the line, not just the first one.

  • n: A number from 1 to 512; replace only the nth match in the line.

  • w file: Write the lines on which a substitution was made to the specified file.

  • &: In the replacement, stands for the text matched by the regular expression.

  • \n: In the replacement, stands for the nth substring captured with \( \) in the pattern.

  • \: Escape.

Replace test in the file with taget:

root@chengyan-virtual-machine:~# cat test1.txt 
This is a test file to test replace sed command.
root@chengyan-virtual-machine:~# sed 's/test/taget/g' test1.txt 
This is a taget file to taget replace sed command.
root@chengyan-virtual-machine:~# 
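The numeric flag, & and \n can be sketched on the same sentence. The date string in the last command is a made-up value used only to show capture groups.

```shell
line='This is a test file to test replace sed command.'

# No flag: only the first match in the line is replaced.
first=$(printf '%s\n' "$line" | sed 's/test/target/')

# Numeric flag: only the 2nd match in the line is replaced.
second=$(printf '%s\n' "$line" | sed 's/test/target/2')

# & stands for whatever the pattern matched.
wrapped=$(printf '%s\n' "$line" | sed 's/test/[&]/g')

# \1 \2 \3 refer to the groups captured with \( \).
swapped=$(printf '2024-01-15\n' | sed 's/\([0-9]*\)-\([0-9]*\)-\([0-9]*\)/\3.\2.\1/')

printf '%s\n' "$first" "$second" "$wrapped" "$swapped"
```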

sed addition

Add below the second line:

root@chengyan-virtual-machine:~# cat test.txt 

1
2
3
4

6
7
8
9

root@chengyan-virtual-machine:~# sed '2a ######' test.txt 

1
######
2
3
4

6
7
8
9

root@chengyan-virtual-machine:~#

awk

principle of awk

awk is a powerful text analysis tool. Compared with grep for searching and sed for editing, awk is particularly strong at analyzing data and generating reports. Simply put, it reads the file line by line, splits each line into fields using whitespace as the default delimiter, and then analyzes and processes the fields.

awk-options

Command format: awk [options] 'script' filename

Common options:

  • -F fs: Uses fs as the field delimiter for input lines. The default delimiter of the awk command is a space or a tab.
  • -f file: Read the awk script command from the script file instead of directly inputting the command on the command line.
  • -v var=val: Before the script starts executing, create the variable var and give it the initial value val.

The power of awk lies in the script, which consists of two parts: a matching rule and the commands to execute.

matching-rule { commands }

  • The matching rule selects which lines of the text the commands apply to; it can be given as a string or a regular expression.

  • The entire script is enclosed in single quotes, and the commands to execute are enclosed in curly braces.

root@chengyan-virtual-machine:~# cat test.txt 

1
2
3
4

6
7
8
9

root@chengyan-virtual-machine:~# awk '/^$/{print "Blank line"}' test.txt 
Blank line
Blank line
Blank line
root@chengyan-virtual-machine:~# 

Here /^$/ is a regular expression that matches blank lines in the text. The command being executed is print, which outputs the specified text.
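The matching rule does not have to be a regular expression; a comparison on a built-in variable works as well. A minimal sketch, using a made-up log file (app.log and its contents are illustrative only):

```shell
# Scratch directory and sample data (hypothetical, for illustration only).
tmp=$(mktemp -d)
printf 'error disk full\ninfo started\nerror timeout\n' > "$tmp/app.log"

# Regular expression rule: lines that start with "error".
errors=$(awk '/^error/ { print $2 }' "$tmp/app.log")

# Expression rule: only the second input line.
line2=$(awk 'NR == 2 { print $0 }' "$tmp/app.log")

# Expression rule: lines with more than two fields.
wide=$(awk 'NF > 2 { print $1, $NF }' "$tmp/app.log")

printf '%s\n' "$errors" "$line2" "$wide"
rm -rf "$tmp"
```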

One of awk's main features is its ability to process data in text files by automatically assigning a variable to each data element in a line. By default, awk will assign the following variables to the data fields it finds in the text line.

  • $0 represents the entire line of text.
  • $1 represents the first data field in the text line.
  • $2 represents the second data field in the text line.
  • $n represents the nth data field in the text line.

The default field separator of awk is any whitespace character. In a text line, the data fields are delimited by the field separator. When awk reads a line of text, it splits the line into data fields using the defined field separator.

root@chengyan-virtual-machine:~# cat data.txt 
One line of test txt.
Two lines of test txt.
Three lines of test text.
root@chengyan-virtual-machine:~# awk '{print $1}' data.txt 
One
Two
Three
root@chengyan-virtual-machine:~# 

The example above uses the field variable $1 to display only the first data field of each line of text. To read files that use a different field separator, specify it manually with the -F option.

awk allows multiple commands to be combined into a program. To use multiple commands in a program script on the command line, just put a semicolon between the commands.
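A short sketch of -F, together with the -v option from the option list. The input strings and the variable name label are made up for the example.

```shell
# -F: sets the field separator to a colon for a passwd-style line.
home=$(printf 'root:x:0:0:root:/root:/bin/bash\n' | awk -F: '{ print $1, $6 }')

# -F, splits on commas; -v passes a shell-side value into the awk variable label.
tagged=$(printf 'a,b,c\n' | awk -F, -v label='second=' '{ print label $2 }')

printf '%s\n' "$home" "$tagged"
```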

root@chengyan-virtual-machine:~# echo "My name is Rich" | awk '{$4="Christine";print $0}'
My name is Christine
root@chengyan-virtual-machine:~# awk '{
> $4="Christine";
> print $0
> }'
My name is Rich
My name is Christine
His name is wanghao
His name is Christine

After the opening single quotation mark is entered, the bash shell uses > to prompt for more input. You can add one command per line until you enter the closing single quotation mark. Because no file name is given on the command line, awk reads its data from standard input, so the program keeps waiting for the user to type text. To exit the program, press Ctrl+D.

root@chengyan-virtual-machine:~# cat awk.sh 
{print $1 "'s home directory is " $6}

root@chengyan-virtual-machine:~# awk -F : -f awk.sh /etc/passwd
root's home directory is /root
daemon's home directory is /usr/sbin
bin's home directory is /bin
sys's home directory is /dev
sync's home directory is /bin
games's home directory is /usr/games
man's home directory is /var/cache/man

In a script file, multiple commands can be specified, as long as one command is placed on one line.

keywords

BEGIN

In awk, you can also control when script commands run. By default, awk reads a line of text from the input and then executes the program script on that line, but sometimes script commands need to run before any data is processed. This requires the BEGIN keyword.

The BEGIN keyword will force awk to execute the script command specified after the keyword before reading the data.

root@chengyan-virtual-machine:~# awk 'BEGIN{print "The data file contents:"}
> {print $0}' data.txt
The data file contents:
One line of test txt.
Two lines of test txt.
Three lines of test text.
root@chengyan-virtual-machine:~#

This script has two parts. The commands in the BEGIN part run before awk processes any data, and the second part is what actually processes the data.

END

The END keyword allows specifying some script commands that awk will execute after reading the data.

root@chengyan-virtual-machine:~# awk 'BEGIN{print "The data file contents:"}
> {print $0}
> END{print "End of file"}' data.txt
The data file contents:
One line of test txt.
Two lines of test txt.
Three lines of test text.
End of file
root@chengyan-virtual-machine:~# 
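BEGIN and END are commonly paired to print a header and a summary around the per-line work. A small sketch with made-up data (the file sales.txt, its contents, and the variable total are illustrative only):

```shell
# Scratch directory and sample data (hypothetical, for illustration only).
tmp=$(mktemp -d)
printf 'widget 3\ngadget 5\ngizmo 2\n' > "$tmp/sales.txt"

report=$(awk '
BEGIN { print "item qty" }          # runs once, before any input is read
      { total += $2; print }        # runs for every line
END   { print "total", total }      # runs once, after the last line
' "$tmp/sales.txt")

printf '%s\n' "$report"
rm -rf "$tmp"
```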

variable

In an awk script, variables can be used to store and retrieve values. awk supports two types of variables: built-in variables and custom variables.

The built-in variables are created by awk itself and can be used directly. They hold information used when processing the fields and records of the data file. Custom variables are variables that users create themselves.

Common built-in variables include the data field variables ($0, $1, $2, ...) and the other variables listed below.

Field and record separator variables:

  • FIELDWIDTHS: A space-separated list of numbers defining the exact width of each data field (gawk only).
  • FNR: The record number within the current input file, often used when there are multiple input files.
  • NR: The record number in the overall input stream.
  • FS: Input field separator, the default is whitespace.
  • RS: Input record separator, the default is newline \n.
  • OFS: Output field separator, the default is a space.
  • ORS: Output record separator, the default is newline \n.

Environment information variables:

  • ARGC: The number of command-line arguments.
  • ARGIND: The index in ARGV of the file currently being processed (gawk only).
  • ARGV: An array containing the command-line arguments.
  • CONVFMT: Number conversion format, the default value is %.6g.
  • ENVIRON: An associative array of the current shell environment variables and their values.
  • ERRNO: A description of the system error when an error occurs while reading or closing an input file (gawk only).
  • FILENAME: The name of the current input file.
  • FNR: The number of records read so far from the current data file.
  • IGNORECASE: When set to a non-zero value, letter case is ignored in string comparisons and regular expression matching (gawk only).
  • NF: The number of fields in the current record.
  • OFMT: Number output format, the default value is %.6g.
  • RLENGTH: The length of the substring matched by the match function.
  • RSTART: The starting position of the substring matched by the match function.
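A few of the portable variables from these lists can be printed side by side. The file sample.txt and its contents are made up for the example.

```shell
# Scratch directory and sample data (hypothetical, for illustration only).
tmp=$(mktemp -d)
printf 'a b c\nd e\n' > "$tmp/sample.txt"

# FILENAME: current file, NR: record number, NF: field count, $NF: last field.
report=$(cd "$tmp" && awk '{ print FILENAME, NR, NF, $NF }' sample.txt)

printf '%s\n' "$report"
rm -rf "$tmp"
```

Because NF holds the number of fields in the current record, $NF is a convenient way to reach the last field of a line.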

FS/OFS

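The variables FS and OFS define how awk handles the data fields in the data stream: FS is the separator used to split each input line into fields, and OFS is the separator that print places between fields on output.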

root@chengyan-virtual-machine:~# cat data.txt 
data11,data12,data13,data14,data15
data21,data22,data23,data24,data25
data31,data32,data33,data34,data35
root@chengyan-virtual-machine:~# awk 'BEGIN{FS=",";OFS="-"}{print $1,$2,$3}' data.txt 
data11-data12-data13
data21-data22-data23
data31-data32-data33
root@chengyan-virtual-machine:~# awk 'BEGIN{FS=",";OFS="--"}{print $1,$2,$3}' data.txt 
data11--data12--data13
data21--data22--data23
data31--data32--data33
root@chengyan-virtual-machine:~# 

FIELDWIDTHS

The FIELDWIDTHS variable lets awk read records without relying on field separators. Some data has no separators at all; instead, each field occupies a fixed range of columns. In that case, FIELDWIDTHS must be set to match the layout of the data in the record. Once FIELDWIDTHS is set, awk ignores the FS variable and splits each record according to the given field widths.

root@chengyan-virtual-machine:~# cat data1.txt 
1005.3247596.37
115-2.349194.00
05810.1298100.1
root@chengyan-virtual-machine:~# awk 'BEGIN{FIELDWIDTHS="3 5 2 5"}{print $1,$2,$3,$4}' data1.txt 
1005.3247596.37   
115-2.349194.00   
05810.1298100.1   
root@chengyan-virtual-machine:~# 

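Because the widths in FIELDWIDTHS apply to every record, the variable is not suitable for variable-length fields. Note that FIELDWIDTHS is a gawk extension. The output above shows each record unsplit, which is what an awk without FIELDWIDTHS support (such as mawk) prints; with gawk, the first line would be printed as 100 5.324 75 96.37.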
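Where the installed awk has no FIELDWIDTHS support, the same fixed-width split can be sketched with substr, which is part of POSIX awk. The start positions 1, 4, 9, and 11 below are derived from the widths 3 5 2 5 used above.

```shell
# First record from data1.txt, split by column position instead of FIELDWIDTHS.
# substr(s, start, length) uses 1-based start positions.
fixed=$(printf '1005.3247596.37\n' |
  awk '{ print substr($0, 1, 3), substr($0, 4, 5), substr($0, 9, 2), substr($0, 11, 5) }')

printf '%s\n' "$fixed"
```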

RS/ORS

The variables RS and ORS define how awk handles the records in the data stream. By default, awk sets both RS and ORS to the newline character, so each line of text in the input data stream is a new record. In the example below, RS is set to an empty string so that blank lines separate records, and FS is set to \n so that each line within a record becomes a field.

root@chengyan-virtual-machine:~# cat data2.txt 
Riley Mullen
123 Main Street
Chicago,IL 60601
(312)555-1234

Frank Wiliams
456 Oak Street
Indianapolis,IN 46201
(317)555-9876

Haley Snell
4231 Elm Street
Detroit,MI 48201
(313)555-4938
root@chengyan-virtual-machine:~# awk 'BEGIN{FS="\n";RS=""}{print $1,$4}' data2.txt 
Riley Mullen (312)555-1234
Frank Wiliams (317)555-9876
Haley Snell (313)555-4938
root@chengyan-virtual-machine:~#
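The example above changes only the input side. A minimal sketch of the output side (ORS) and of a single-character RS, using made-up input strings:

```shell
# ORS: every print ends with ";" instead of a newline.
joined=$(printf 'a\nb\nc\n' | awk 'BEGIN { ORS = ";" } { print }')

# RS: records are separated by commas rather than newlines.
# The input has no trailing newline, so the last record is exactly "z".
split=$(printf 'x,y,z' | awk 'BEGIN { RS = "," } { print NR ": " $0 }')

printf '%s\n' "$joined" "$split"
```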

FNR/NR

The FNR variable contains the number of processed records in the current data file, and the NR variable contains the total number of processed records.

root@chengyan-virtual-machine:~# cat data.txt 
data11,data12,data13,data14,data15
data21,data22,data23,data24,data25
data31,data32,data33,data34,data35
root@chengyan-virtual-machine:~# awk '
> BEGIN{FS=","}
> {print $1, "FNR="FNR, "NR="NR}
> END{print "There were",NR,"records processed"}' data.txt data.txt
data11 FNR=1 NR=1
data21 FNR=2 NR=2
data31 FNR=3 NR=3
data11 FNR=1 NR=4
data21 FNR=2 NR=5
data31 FNR=3 NR=6
There were 6 records processed
root@chengyan-virtual-machine:~# 

As can be seen, when a single data file is used as input, the values of FNR and NR are the same. When multiple files are used as input, FNR is reset at the start of each data file, while NR keeps counting until all data files have been processed.
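A common use of this difference is the FNR == NR test, which is true only while the first file is being read (as long as that file is not empty). The sketch below uses two made-up files, allowed.txt and scores.txt, and a hypothetical array name ok.

```shell
# Scratch directory and sample data (hypothetical, for illustration only).
tmp=$(mktemp -d)
printf 'bob\ncarol\n' > "$tmp/allowed.txt"
printf 'alice 10\nbob 20\ncarol 30\n' > "$tmp/scores.txt"

# First file (FNR == NR): remember each name, then skip to the next record.
# Second file: print only the lines whose first field was remembered.
picked=$(awk 'FNR == NR { ok[$1] = 1; next } $1 in ok { print $1, $2 }' \
  "$tmp/allowed.txt" "$tmp/scores.txt")

printf '%s\n' "$picked"
rm -rf "$tmp"
```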

Origin blog.csdn.net/qq_41323475/article/details/127893816