The Three Musketeers of shell regular expression editing: sed and awk

One, sed editor

1. The definition of sed

(1) sed is a stream editor. It edits a data stream according to a set of rules that is supplied in advance, before it starts processing the data.

(2) sed processes the data stream according to commands that are either given on the command line or stored in a command text file.

2. The sed workflow consists of three steps, read, execute and display:

(1) Read: sed reads one line from the input stream (file, pipe or standard input) and stores it in a temporary buffer called the pattern space.

(2) Execute: sed runs all commands, in order, on the contents of the pattern space. Unless a line address is given, every command is applied to every line.

(3) Display: sed sends the modified content to the output stream and then empties the pattern space. The three steps repeat until every line of input has been processed.

Note: because all commands operate on the pattern space, the input file itself is not changed. To keep the result, redirect the output to a file or use the -i option to edit the file in place.
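The note above can be checked with a small experiment. This is a minimal sketch: the scratch file and the variable names are made up for illustration and are not part of the original article.

```shell
# Hypothetical scratch file; any small text file would do.
tmp=$(mktemp)
printf 'alpha\nbeta\n' > "$tmp"

# The edit happens in the pattern space, so only the output stream changes.
edited=$(sed 's/alpha/ALPHA/' "$tmp")
after=$(cat "$tmp")

printf 'sed output:\n%s\n' "$edited"
printf 'file on disk:\n%s\n' "$after"

# Redirect the output to keep the edited text in a new file.
sed 's/alpha/ALPHA/' "$tmp" > "$tmp.new"
saved=$(cat "$tmp.new")

rm -f "$tmp" "$tmp.new"
```

The original file still contains alpha after the first command; only the redirected copy holds the edited text.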

3. The sed command format:

sed -e 'operation' file1 file2…

sed -n -e 'operation' file1 file2…

sed -f scriptfile file1 file2…

sed -i -e 'operation' file1 file2…

sed -e 'n{
operation 1
operation 2
}' file1 file2…

4. Common options for sed:

-e or --expression=: Process the input with the given command. It can be omitted when there is only one command, and it is normally used when several commands are given.

-f or --file=: Process the input with the commands in the given script file.

-h or --help: Display help.

-n, --quiet or --silent: Suppress sed's automatic output. Combine it with the p command to print only the selected lines.

-i: Modify the target text file directly (in place).
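The effect of -n, -e and -f is easiest to see side by side. The following is a small sketch on made-up data; the file contents and variable names are illustrative assumptions.

```shell
# Hypothetical three-line input file.
tmp=$(mktemp)
printf 'one\ntwo\nthree\n' > "$tmp"

# Without -n, line 2 appears twice: once from automatic output, once from p.
dup=$(sed '2p' "$tmp")

# With -n, only what p prints is shown.
only=$(sed -n '2p' "$tmp")

# Several -e expressions are applied in order to every line.
multi=$(sed -e 's/one/1/' -e 's/two/2/' "$tmp")

# -f reads the same commands from a script file, one per line.
script=$(mktemp)
printf 's/one/1/\ns/two/2/\n' > "$script"
fromfile=$(sed -f "$script" "$tmp")

printf '%s\n---\n%s\n---\n%s\n' "$dup" "$only" "$multi"
rm -f "$tmp" "$script"
```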

5. Common operations:

s: Substitute, replace the text that matches a pattern.

d: Delete, delete the selected lines.

a: Append, add a line of the given content below the selected line.

i: Insert, add a line of the given content above the selected line.

c: Change, replace the selected line with the given content.

y: Transliterate characters. The source and target character lists must be the same length.

p: Print. With an address it prints the addressed lines; without one it prints every line. It is usually used with the "-n" option.

=: Print the line number.

l (lowercase L): Print the line in an unambiguous form, showing non-printable characters as escapes (for example a tab as \t) and marking the end of the line with $.

6. Print content

(1) Print content: sed -n -e 'p' testfile1

(2) Print line numbers: sed -n -e '=' testfile1

(3) Print line numbers and content: sed -n -e '=' -e 'p' testfile1

(4) Print non-printable characters in visible form: sed -n 'l' testfile1
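The screenshots for this section did not survive, so here is a small sketch of what the four print forms produce. The two-line sample file (with a tab in the first line) and the variable names are assumptions made for illustration.

```shell
# Hypothetical sample: line 1 contains a tab, line 2 is a single letter.
f=$(mktemp)
printf 'a\tb\nc\n' > "$f"

content=$(sed -n 'p' "$f")            # the lines themselves
numbers=$(sed -n '=' "$f")            # line numbers only
both=$(sed -n -e '=' -e 'p' "$f")     # number, then the line, alternating
visible=$(sed -n 'l' "$f")            # tab shown as \t, line end shown as $

# A common idiom: join each number with its line using N and s.
numbered=$(sed '=' "$f" | sed 'N;s/\n/ /')

printf '%s\n' "$numbers" "$both" "$visible" "$numbered"
rm -f "$f"
```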

7. Using addresses

The sed editor has 2 addressing modes:

(1) Select lines by number, either a single line or a numeric range

(2) Select lines by text pattern

sed -n '/user/p' /etc/passwd # print the lines containing user in /etc/passwd

sed -n '/^a/p' /etc/passwd # print the lines starting with a in /etc/passwd

sed -n '/bash$/p' /etc/passwd # print the lines ending with bash in /etc/passwd

sed -nr '/ftp|root/p' /etc/passwd # print the lines containing ftp or root in /etc/passwd; the | alternation needs extended regular expressions (-r)

sed -n '2,/nobody/p' /etc/passwd # print from the second line through the first line containing nobody

sed -n '2,/nobody/=' /etc/passwd # print the line numbers from the second line through the first line containing nobody

sed -nr '/ro{1,}t/p' /etc/passwd # -r enables extended regular expressions; print the lines of /etc/passwd where r is followed by one or more o and then t
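Both addressing modes can be tried on a small file without touching the real /etc/passwd. This is a sketch assuming GNU sed (the `1~2` step address and `-r` are GNU features); the sample data and the `names` helper are hypothetical.

```shell
# Hypothetical sample data in /etc/passwd format (not a real system file).
f=$(mktemp)
cat > "$f" <<'EOF'
root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
ftp:x:14:50:FTP User:/var/ftp:/sbin/nologin
nobody:x:99:99:Nobody:/:/sbin/nologin
alice:x:1000:1000::/home/alice:/bin/bash
EOF

# Helper: keep only the user name of each printed line, joined by spaces.
names() { cut -d: -f1 | paste -s -d ' ' -; }

single=$(sed -n '2p' "$f" | names)             # line 2 only
range=$(sed -n '2,4p' "$f" | names)            # lines 2 through 4
to_end=$(sed -n '4,$p' "$f" | names)           # line 4 through the last line
step=$(sed -n '1~2p' "$f" | names)             # GNU: every 2nd line from line 1
by_text=$(sed -n '/bash$/p' "$f" | names)      # lines ending with bash
either=$(sed -nr '/ftp|root/p' "$f" | names)   # extended-regex alternation
mixed=$(sed -n '2,/nobody/p' "$f" | names)     # line 2 through first nobody line

printf '%s\n' "$single" "$range" "$to_end" "$step" "$by_text" "$either" "$mixed"
rm -f "$f"
```

A numeric address and a pattern address can be mixed in one range, as the last command shows.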

8. Delete lines

[root@localhost ~]# sed '/^$/d' testfile1 # delete blank lines

[root@localhost ~]# sed -i '/^$/d' testfile1 # none of the earlier commands modified the file itself; to change the file contents, use the -i option to edit the file directly

[root@localhost ~]# sed '/nologin$/d' /etc/passwd # delete the lines ending with nologin

[root@localhost ~]# sed '/nologin$/!d' /etc/passwd # ! inverts the match, so this deletes every line except those ending with nologin

[root@localhost ~]# sed '/2/,/3/d' testfile2 # the first pattern switches deletion on and the second switches it off: deletion starts at the first line containing 2 and continues through the next line containing 3
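The delete commands above can be compared on one small file. This is a sketch assuming GNU sed (for `-i` without a suffix); the six-line sample and the `show` helper are made up for illustration.

```shell
# Hypothetical sample: lines 2 and 5 are blank.
f=$(mktemp)
printf 'line1\n\nline2 nologin\nline3\n\nline4\n' > "$f"

# Helper: join the output lines with commas so blank lines stay visible.
show() { paste -s -d ',' -; }

no_blank=$(sed '/^$/d' "$f" | show)           # blank lines removed
no_nologin=$(sed '/nologin$/d' "$f" | show)   # lines ending with nologin removed
only_nologin=$(sed '/nologin$/!d' "$f" | show)  # everything else removed
range=$(sed '/2/,/3/d' "$f" | show)           # from first "2" line through next "3" line

# -i changes the file itself, so work on a copy here.
cp "$f" "$f.copy"
sed -i '/^$/d' "$f.copy"
in_place=$(show < "$f.copy")

printf '%s\n' "$no_blank" "$no_nologin" "$only_nologin" "$range" "$in_place"
rm -f "$f" "$f.copy"
```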

9. Replace

Format: line-range s/old string/new string/flags

4 types of replacement flags:

Number: replace only the Nth match on each line

g: replace all matches on each line

p: print the lines on which a replacement was made; used with -n

w file: write the lines on which a replacement was made to the file

sed -n 's/root/admin/p' /etc/passwd # replace root with admin; by default only the first root on each line

sed -n 's/root/admin/2p' /etc/passwd # replace the second root on each line with admin

sed -n 's/root/admin/gp' /etc/passwd # replace every root with admin

sed 's/root//g' /etc/passwd # replace every root with nothing, which deletes it

sed '1,20 s/^/#/' /etc/passwd # add a # sign at the beginning of lines 1-20

sed '/^root/ s/$/#/' /etc/passwd # find the lines starting with root and add # at the end

sed -f script.sed testfile2 # run the commands stored in script.sed on testfile2

sed '1,20w out.txt' /etc/passwd # write lines 1-20 to the file out.txt
sed '1,20 s/^/#/w out.txt' /etc/passwd # add a # sign at the beginning of lines 1-20 and write those lines to out.txt

sed -n 's/\/bin\/bash/\/bin\/csh/p' /etc/passwd # each / inside the path must be escaped with a backslash
sed -n 's!/bin/bash!/bin/csh!p' /etc/passwd # use "!" as the delimiter instead and replace /bin/bash with /bin/csh
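The replacement flags are easier to tell apart on a line that contains the pattern several times. A small sketch on made-up data; the file names and variable names are illustrative only.

```shell
# Hypothetical sample: three roots on line 1, one on line 2.
f=$(mktemp)
out=$(mktemp)
printf 'root:root:root\nbin:root\n' > "$f"

first=$(sed 's/root/admin/' "$f")            # first match on each line
second=$(sed 's/root/admin/2' "$f")          # second match only; line 2 has none
all=$(sed 's/root/admin/g' "$f")             # every match
changed=$(sed -n 's/root/admin/2p' "$f")     # print only lines that were changed

# w writes the changed lines to a file; & stands for the matched text.
sed -n "s/^bin/#&/w $out" "$f"
written=$(cat "$out")

# A different delimiter avoids escaping the slashes in a path.
shell=$(printf '/bin/bash\n' | sed 's!/bin/bash!/bin/csh!')

printf '%s\n---\n%s\n---\n%s\n' "$first" "$second" "$all"
rm -f "$f" "$out"
```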

Insert, change and move lines:
sed '/45/c ABC' testfile2 # find the lines containing 45 and replace each whole line with ABC

sed '/45/ y/45/AB/' testfile2 # on the lines containing 45, convert every 4 to A and every 5 to B

sed '1,3a ABC' testfile2 # add ABC below each of lines 1-3

sed '1i ABC' testfile2 # insert ABC above line 1

sed '5r /etc/resolv.conf' testfile2 # read the file /etc/resolv.conf in below line 5 of testfile2

sed '/root/{H;d};$G' /etc/passwd # move the lines containing root to the end of the file: H appends the line to the hold space (like copying to a clipboard), d deletes it, and G on the last line ($) appends the hold space to the output (like pasting)

sed '1,2H;3,4G' /etc/passwd # copy lines 1-2 below line 3 and again below line 4
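A four-line file is enough to see what each of these commands does, including a side effect of H that is easy to miss. This sketch assumes GNU sed (one-line `a`, `i` and `c` text, and `}` directly after `d`); the sample files and names are hypothetical.

```shell
# Hypothetical samples: a four-line file and a one-line file to read in.
f=$(mktemp); other=$(mktemp)
printf '1\n2\n3\n4\n' > "$f"
printf 'X\n' > "$other"

# Helper: join the output lines with commas so blank lines stay visible.
show() { paste -s -d ',' -; }

appended=$(sed '2a ABC' "$f" | show)       # ABC below line 2
inserted=$(sed '1i ABC' "$f" | show)       # ABC above line 1
changed=$(sed '/3/c ABC' "$f" | show)      # the line containing 3 becomes ABC
mapped=$(sed 'y/12/AB/' "$f" | show)       # 1 -> A, 2 -> B
read_in=$(sed "2r $other" "$f" | show)     # contents of the other file below line 2

# H appends a newline plus the line to the hold space, so the first H on an
# empty hold space leaves a blank line in front of the pasted text.
moved=$(sed '/2/{H;d};$G' "$f" | show)
copied=$(sed '1,2H;3,4G' "$f" | show)

printf '%s\n' "$appended" "$inserted" "$changed" "$mapped" "$read_in" "$moved" "$copied"
rm -f "$f" "$other"
```

If the blank line is unwanted, one option is to strip it from the pattern space after G, for example with `s/\n\n/\n/`.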

Two, awk editor

On Linux/UNIX systems, awk is a powerful text-processing tool. It reads the input text line by line, searches according to the given pattern, and formats or filters the lines that match. Because it works non-interactively, it can carry out quite complex text operations, and it is widely used in shell scripts for automated configuration tasks.

1. Working principle:

awk reads text line by line, splits each line into fields (on spaces or tabs by default), saves the fields in built-in variables, and runs the editing commands according to the pattern or condition.

2. The difference between awk and sed editor:

The sed command usually processes a line as a whole, while awk tends to split a line into multiple "fields" and then process those. awk also reads its input line by line, and results, including field data, are displayed with the print function. awk expressions can use the logical operators "&&" for "and", "||" for "or" and "!" for "not". Simple arithmetic is also available: +, -, *, /, % and ^ represent addition, subtraction, multiplication, division, remainder and power respectively.
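A short sketch of these operators in use. The input numbers and variable names are made up for illustration.

```shell
# Arithmetic and logical operators evaluated once in a BEGIN block.
ops=$(awk 'BEGIN { print 7 % 3, 2 ^ 3, (1 && 0), (1 || 0), !0 }')

# The same operators used as a condition on fields:
# keep lines whose first field is greater than 2 and whose second is not greater than 5.
picked=$(printf '3 9\n12 4\n1 2\n' |
  awk '($1 > 2) && !($2 > 5) { print $1, $2, $1 % $2, $1 ^ 2 }')

printf '%s\n%s\n' "$ops" "$picked"
```

Only the line 12 4 passes the condition, so the second command prints the two fields followed by their remainder and the square of the first.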

3. Command format:

awk option 'pattern or condition {action}' file1 file2…

awk -f scriptfile file1 file2…

4. Common built-in variables of awk (can be used directly):

FS: Field separator. It specifies how each line of text is split into fields; the default is spaces or tabs. Setting it has the same effect as the "-F" option.

NF: The number of fields in the line currently being processed.

NR: The line number (record number) of the line currently being processed.

$0: The entire content of the line currently being processed.

$n: The nth field (nth column) of the line currently being processed.

FILENAME: The name of the file being processed.

RS: Record separator. When awk reads data, it splits the input into records according to RS and processes one record at a time. The default value is '\n'.
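The built-in variables can be inspected directly. A minimal sketch on made-up input; the data and variable names are illustrative assumptions.

```shell
# NR, NF, the first field and the last field ($NF) of each line.
rec=$(printf 'a:b:c\nd:e\n' | awk -F ':' '{ print NR, NF, $1, $NF }')

# Setting FS in BEGIN is equivalent to passing -F on the command line.
second=$(printf 'a:b:c\nd:e\n' | awk 'BEGIN { FS = ":" } { print $2 }')

# With RS set to ":", each colon-separated piece becomes its own record.
pieces=$(printf 'x:y:z' | awk 'BEGIN { RS = ":" } { print NR ": " $0 }')

# FILENAME holds the name of the file being read.
f=$(mktemp)
printf 'hello\n' > "$f"
name=$(awk '{ print FILENAME }' "$f")

printf '%s\n%s\n%s\n%s\n' "$rec" "$second" "$pieces" "$name"
rm -f "$f"
```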

5. Example:

(1) Output text by line

awk '{print}' testfile1 # output all content

awk '{print $0}' testfile1 # output all content

Output the content of specified lines

Output the content of odd and even lines

Output the content of lines beginning with or ending with a given string

Count the number of lines ending with a given string

awk 'BEGIN {x=0};/\/bin\/bash$/{x++};END {print x}' /etc/passwd # count the lines ending with /bin/bash; equivalent to grep -c "/bin/bash$" /etc/passwd
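The screenshots for the line-based examples are missing, so the following sketch shows typical ways to write them. These are not necessarily the commands the original pictures showed; the sample data and the `oneline` helper are hypothetical.

```shell
# Hypothetical sample data in /etc/passwd format (not a real system file).
f=$(mktemp)
cat > "$f" <<'EOF'
root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
ftp:x:14:50:FTP User:/var/ftp:/sbin/nologin
nobody:x:99:99:Nobody:/:/sbin/nologin
alice:x:1000:1000::/home/alice:/bin/bash
EOF

# Helper: join the output lines with spaces.
oneline() { paste -s -d ' ' -; }

second=$(awk -F: 'NR==2 { print $1 }' "$f" | oneline)            # one line
span=$(awk -F: 'NR>=2 && NR<=4 { print $1 }' "$f" | oneline)     # a range of lines
odd=$(awk -F: 'NR%2==1 { print $1 }' "$f" | oneline)             # odd lines
even=$(awk -F: 'NR%2==0 { print $1 }' "$f" | oneline)            # even lines
starts=$(awk -F: '/^root/ { print $1 }' "$f" | oneline)          # begins with root
ends=$(awk -F: '/nologin$/ { print $1 }' "$f" | oneline)         # ends with nologin
count=$(awk 'BEGIN { x = 0 } /\/bin\/bash$/ { x++ } END { print x }' "$f")

printf '%s\n' "$second" "$span" "$odd" "$even" "$starts" "$ends" "$count"
rm -f "$f"
```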

(2) Output text by field

awk -F ":" '{print $3}' /etc/passwd # output the third field of each line (fields separated by ":")

awk -F ":" '{print $1,$3}' /etc/passwd # output the first and third fields of each line (fields separated by ":")

awk -F ":" '$3<5{print $1,$3}' /etc/passwd # output the first and third fields of the lines where the value of the third field is less than 5

awk -F ":" '!($3<200){print}' /etc/passwd # output the lines where the value of the third field is not less than 200

awk 'BEGIN {FS=":"};{if ($3>=200){print}}' /etc/passwd # the BEGIN block runs first (setting the field separator to ":"), then each line is processed: if the value of the third field is greater than or equal to 200, the line is printed

awk -F ":" '{max=($3>$4)?$3:$4;{print max}}' /etc/passwd
# ($3>$4)?$3:$4 is the ternary operator: if the value of the third field is greater than the value of the fourth field, the third field is assigned to max, otherwise the fourth field is assigned to max

awk -F ":" '{print NR,$0}' /etc/passwd # output each line preceded by its line number; NR (the number of the line being processed) increases by 1 for every record read

awk -F ":" '$7~"/bash"{print $1}' /etc/passwd # output the first field of the lines whose seventh field contains /bash (fields separated by ":")

awk -F ":" '($1~"root")&&(NF==7){print $1,$2}' /etc/passwd # output the first and second fields of the lines whose first field contains root and that have 7 fields (NF is the number of fields in the line being processed)

awk -F ":" '($7!="/bin/bash")&&($7!="/sbin/nologin"){print}' /etc/passwd # output every line whose seventh field is neither /bin/bash nor /sbin/nologin
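To see what these field conditions select without depending on a particular system's /etc/passwd, they can be run on a small stand-in file. This is a sketch; the sample data and the `oneline` helper are hypothetical.

```shell
# Hypothetical sample data in /etc/passwd format (not a real system file).
f=$(mktemp)
cat > "$f" <<'EOF'
root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
sync:x:5:0:sync:/sbin:/bin/sync
ftp:x:14:50:FTP User:/var/ftp:/sbin/nologin
alice:x:1000:1000::/home/alice:/bin/bash
EOF

# Helper: join the output lines with spaces.
oneline() { paste -s -d ' ' -; }

low=$(awk -F ":" '$3<5 { print $1,$3 }' "$f")                       # UID below 5
high=$(awk -F ":" '!($3<200) { print $1 }' "$f" | oneline)          # UID 200 or above
max=$(awk -F ":" '{ max=($3>$4)?$3:$4; print max }' "$f" | oneline) # larger of UID and GID
bash=$(awk -F ":" '$7~"/bash" { print $1 }' "$f" | oneline)         # shell contains /bash
other=$(awk -F ":" '($7!="/bin/bash")&&($7!="/sbin/nologin") { print $1 }' "$f" | oneline)
rootnf=$(awk -F ":" '($1~"root")&&(NF==7) { print $1,$2 }' "$f")

printf '%s\n' "$low" "$high" "$max" "$bash" "$other" "$rootnf"
rm -f "$f"
```

An empty field, such as the fifth field of the alice line, still counts toward NF.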

(3) Invoking shell commands through pipes and double-quoted command strings

echo $PATH | awk 'BEGIN{RS=":"};END{print NR}' # count the number of colon-separated entries in PATH. The END{} block is the usual place for statements that print final results

awk -F ":" '/bash$/{print | "wc -l"}' /etc/passwd # call the wc -l command to count the users whose shell is bash; equivalent to grep -c "bash$" /etc/passwd

free -m | awk '/Mem:/ {print int($3/($3+$4)*100)"%"}' # view the current memory usage as a percentage (int() truncates the value to an integer, so no decimal part is shown)

top -b -n 1 | grep Cpu | awk -F ',' '{print $4}' | awk '{print $1}'
# view the current CPU idle rate (-b -n 1 runs top in batch mode for a single iteration)
# The whole command means: output one snapshot of the process list (top -b -n 1); keep only the Cpu line (grep Cpu); split it on commas and print the fourth column (awk -F ',' '{print $4}'); then print the first value of that column (awk '{print $1}')

date -d "$(awk -F "." '{print $1}' /proc/uptime) second ago" +"%F %H:%M:%S"
# display the time of the last system boot, similar to uptime -s: "second ago" shifts the time back by that many seconds, and +"%F %H:%M:%S" is equivalent to the +"%Y-%m-%d %H:%M:%S" time format

awk 'BEGIN {n=0; while ("w" | getline) n++; {print n-2}}'
# call the w command to count the users currently logged in; w shows details about each logged-in user, getline reads its output one line at a time, and n-2 is printed because the first two lines of the w output are headers rather than users
[root@gcc zhengze1]# w
 15:41:01 up 19:50, 1 user, load average: 0.00, 0.01, 0.05
USER TTY FROM LOGIN@ IDLE JCPU PCPU WHAT
root pts/0 192.168.200.1 11:19 5.00s 0.12s 0.01s w

awk 'BEGIN {"hostname" | getline; {print $0}}' # call the hostname command and output the current host name
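The pipe forms above depend on the machine they run on, so here is a sketch of the same techniques with fixed input. The command strings and variable names are assumptions chosen for illustration; comparing the getline result with `> 0` is a defensive choice that stops the loop if the command cannot be read, because getline returns -1 on error.

```shell
# Count colon-separated entries by using ":" as the record separator.
entries=$(printf '/usr/bin:/bin:/usr/local/bin\n' |
  awk 'BEGIN { RS = ":" } END { print NR }')

# Send matching lines into a shell command from inside awk.
# tr strips the padding some wc implementations add.
piped=$(printf 'a bash\nb sh\nc bash\n' |
  awk '/bash$/ { print | "wc -l" }' | tr -d ' ')

# Read one line of a command's output into a variable.
os=$(awk 'BEGIN { "uname -s" | getline os; print os }')

# Read every line of a command's output and count the lines.
count=$(awk 'BEGIN {
  cmd = "echo a; echo b; echo c"
  n = 0
  while ((cmd | getline line) > 0) n++
  close(cmd)
  print n
}')

printf '%s\n%s\n%s\n%s\n' "$entries" "$piped" "$os" "$count"
```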

When getline has no redirection ("<" or "|") on either side, it reads from the current input. awk has already read the first line, say line 1; getline then fetches the line that follows, line 2, and updates the corresponding internal variables such as NF, NR, FNR and $0. The value of $0 is therefore no longer line 1 but line 2, and that is what gets printed.
When getline has a redirection ("<" or "|") next to it, it reads from that redirected source instead. Because the source has just been opened and awk has not yet read any line from it, getline returns its first line, not every other line.

seq 10 | awk '{getline; print $0}' # prints the even lines
seq 10 | awk '{print $0; getline}' # prints the odd lines
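The two getline behaviours can be confirmed with seq. This is a small sketch; it also includes an input with an odd number of lines, where the final getline finds nothing to read and leaves $0 unchanged. The variable names are illustrative.

```shell
# Helper: join the output lines with spaces.
oneline() { paste -s -d ' ' -; }

# Plain getline replaces $0 with the next input line before printing.
even=$(seq 10 | awk '{ getline; print $0 }' | oneline)

# Printing first and then calling getline skips every second line.
odd=$(seq 10 | awk '{ print $0; getline }' | oneline)

# With 5 lines, the last getline hits end of input, so line 5 is printed as is.
uneven=$(seq 5 | awk '{ getline; print $0 }' | oneline)

# A piped getline reads from the command, starting at its first line.
first=$(awk 'BEGIN { "seq 3" | getline line; print line }')

printf '%s\n%s\n%s\n%s\n' "$even" "$odd" "$uneven" "$first"
```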

Origin blog.csdn.net/tefuiryy/article/details/111869874