Linux Bash Shell Programming (7): String Extraction and Processing (cut, printf, awk, sed, sort) with Examples

  In the previous section, we covered the basic functions and usage of regular expressions. In this section, we look at commands for string extraction, formatted output, and string processing.

cut command

The cut command extracts parts of each line of its input. It is most often used to cut one or more columns out of a file whose lines use a single, consistent delimiter character (which can be any character).

cut [options] <filename>
Option  Description
-b      Select only the specified bytes
-c      Select only the specified characters
-f      Select only the specified fields
-d      Use the specified delimiter instead of the default tab character (used with -f); the delimiter must be a single character
-s      Do not print lines that contain no delimiter (used with -f); such lines are printed by default
  • Exactly one of the first three options must be given, and they are mutually exclusive: -b selects by byte, -c selects by character, and -f selects by field, where fields are the parts of a line between delimiters (a field may contain several characters or be empty). Selecting by field is the most common
  • If a line does not contain the delimiter, cut -f prints the whole line unchanged, unless the -s option is specified
  • The -f option must be followed by a field list, which can be several field numbers separated by commas (1,3) or a range (a-b)
  • The -d option specifies the delimiter. Quote it when it is a space or a character that is special to the shell
  • The cut command has limitations (for example, it cannot treat a run of several spaces as one separator), but its syntax is simple and easy to use

Example:

#Extract every user name and its UID from the user configuration file passwd
cut -s -d ":" -f 1,3 passwd
root:0
daemon:1
bin:2
#Only part of the output is shown

#Extract each group and its additional members from the gshadow file
cut -d ":" -f 1,4 gshadow
root:
daemon:
bin:
cdrom:zheng
floppy:zheng
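#Only part of the output is shown

#List users whose login shell is /bin/bash (candidates for regular, non-system users; root must still be excluded)
zheng@Kali:~/temp$ grep "/bin/bash" /etc/passwd | cut -d ":" -f 1
root
postgres #A management account required by some services; it also has a bash login shell and must be excluded separately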
zheng
test
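
The field range, the -s option, and character selection described above can be tried on inline data. The following is a minimal sketch; the sample lines and variable names are made up for illustration and are not taken from a real system.

```shell
# Hypothetical sample data in passwd-like format; the last line has no ":" delimiter.
sample='root:x:0:0:root:/root:/bin/bash
zheng:x:1000:1000:zheng:/home/zheng:/bin/bash
no delimiter here'

# Field range 1-3; the ":" delimiter is kept between the output fields.
first_three=$(printf '%s\n' "$sample" | cut -d ':' -f 1-3)

# With -s, the line that contains no delimiter is dropped instead of passed through.
with_s=$(printf '%s\n' "$sample" | cut -s -d ':' -f 1)

# Character positions 1-4 of every line.
first_chars=$(printf '%s\n' "$sample" | cut -c 1-4)

printf '%s\n' "$first_three" "---" "$with_s" "---" "$first_chars"
```

Note that the undelimited line appears in full in the first result, because -s was not given there.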

printf command

printf is Bash's formatted output command. It prints strings and numbers in a format you define, and its syntax is similar to the printf function in C. printf is also available as an output statement inside awk.

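printf '<output type><output format>' <output content>
#The format string may also contain literal explanatory text, so the output content can be limited to variables and similar values
  • The output type and output format should be enclosed in quotes; single quotes keep the shell from interpreting them
  • The output content is generally numbers, variables, and so on, separated by spaces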
Output type  Description
%ns          Output a string; n is a number giving the minimum field width in characters (n can be omitted)
%ni          Output an integer; n is a number giving the minimum field width (n can be omitted)
%m.nf        Output a floating-point number; m is the minimum total width (integer digits, decimal point, and decimals) and n is the number of decimal places. For example, %8.2f prints a number with 2 decimal places in a field at least 8 characters wide

Example:

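#Treat each argument as a string and print it
printf '%s' 1 2 as 12 3
12as123 #No newline is printed after this output, so whatever comes next continues on the same line

printf '%s\n' 1 2 as 12 #Print as strings, adding a newline after each argument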
1
2
as
12

#Print as strings in groups of three, separated by spaces, with a newline after each group
printf '%s %s %s\n' 1 2 as 12 4 3
1 2 as
12 4 3

#The format string can also contain literal text, leaving the arguments to supply the variable values
printf 'Hello, %s\n' "Zheng"
Hello, Zheng
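
The width and precision numbers in the output types are easiest to see with a few padded values. This is a small sketch; the brackets are only there to make the padding visible, and LC_ALL=C is set on the assumption that the decimal point should be ".".

```shell
# Assumption: force the C locale so that "." is the decimal separator.
export LC_ALL=C

# %8s right-aligns in 8 columns, %-8s left-aligns, %05d zero-pads an integer to 5 digits.
line1=$(printf '[%8s][%-8s][%05d]' "abc" "abc" 42)

# %8.3f: at least 8 characters wide in total, 3 digits after the decimal point.
line2=$(printf '[%8.3f]' 3.14159)

printf '%s\n%s\n' "$line1" "$line2"
```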

awk command

  Compared with cut, the awk command is more powerful. It can split lines on runs of whitespace of varying length, and it supports functions, conditional tests, and flow control for processing the resulting strings. In exchange, its syntax is more complex than cut's and resembles a programming language.

awk 'pattern1{action1}pattern2{action2}...' <filename>
  • pattern: a condition, generally a relational expression (for example, x>1). It can be omitted, in which case no test is made and the action runs for every line

  • action: the action to perform, which can be formatted output (awk supports both printf and print) or a flow control statement

  • awk processes its input line by line

  • The difference between printf and print is that print automatically adds a newline at the end of its output, while printf does not

  • After reading a line, awk splits it into fields at the separator (a run of several spaces counts as one separator). Fields are referenced as $n, where n is a number: $0 is the whole line, $1 is the first column, $2 is the second column, and so on

  • awk provides the built-in variable FS as the field separator. By default it splits on spaces and tabs; to use any other character, FS must be set in advance, usually in a BEGIN block

  • BEGIN is used as a pattern. The action attached to BEGIN runs before awk reads the first line of input, so it is the place for commands that must run once in advance

  • END is used in the same way as BEGIN, and its action runs once after all input has been read

    For example, the separator must be defined before extracting fields from the passwd file:

    awk 'BEGIN{FS=":";print "Begin"}END{print "End"}{print $1 "\t" $3}' /etc/passwd
    Begin
    root    0
    daemon  1
    bin     2
    sys     3
    End
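    #As shown, the separator is defined at the start (before any data is read), so the first line is split and printed correctly
    
    #However, if the BEGIN condition is not used and the separator is set in the same action as the output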
    awk '{FS=":";print $1 "\t" $3}' /etc/passwd
    root:x:0:0:root:/root:/bin/bash
    daemon  1
    bin     2
    sys     3
    #The first line was already read and split before the separator was defined; setting FS does not re-split it, so the whole line is printed
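
    awk's -F option is another way to set the separator before any input is read, and a relational pattern can restrict which lines the action applies to. The sketch below uses made-up passwd-style lines and hypothetical variable names.

```shell
# Hypothetical passwd-style sample lines.
sample='root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
zheng:x:1000:1000:zheng:/home/zheng:/bin/bash'

# -F ':' sets the separator before the first line is read, like BEGIN{FS=":"}.
# The pattern $3 >= 1000 selects lines whose third field (the UID) is at least 1000.
regular=$(printf '%s\n' "$sample" | awk -F ':' '$3 >= 1000 {print $1 "\t" $3}')

# END runs once after the last line; NR holds the number of lines read.
count=$(printf '%s\n' "$sample" | awk 'END {print NR}')

printf '%s\n%s\n' "$regular" "$count"
```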
    

Example:

#The awk command below has only an action and no condition
df -h | awk '{print $1 "\t" $5 "\t" $6}'
Filesystem      Use%    Mounted
udev    0%      /dev
tmpfs   1%      /run
/dev/sda5       38%     /
tmpfs   0%      /dev/shm
tmpfs   0%      /run/lock
tmpfs   0%      /sys/fs/cgroup
/dev/sda1       28%     /boot
tmpfs   1%      /run/user/1000
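
When printf is used inside awk, it is safer to give it a fixed format string and pass the fields as arguments, because a "%" inside the data (such as "0%") would otherwise be read as part of the format. A minimal sketch, using made-up df-style lines rather than real output:

```shell
# Hypothetical df-style lines (not real output from any system).
sample='udev 3.9G 0 3.9G 0% /dev
/dev/sda5 50G 18G 30G 38% /'

# The format string is fixed; the fields are arguments, so the "%" in "0%"
# is treated as data. %-12s left-aligns, %5s right-aligns.
table=$(printf '%s\n' "$sample" | awk '{printf "%-12s %5s %s\n", $1, $5, $6}')

printf '%s\n' "$table"
```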

The following example checks how full the file system mounted on the root directory is and raises a warning when usage is too high.

First, build the pipeline step by step on the command line to extract the usage figure

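#Principle: use the df command to view file system usage information
df

#Filter out the line for the root mount. Every line ends with its mount point, and only the root directory's is a bare "/", which serves as the test
df | grep "/$"
#The regular expression syntax is covered in the previous section

#This yields a single space-separated line; use awk to extract the fifth column
df | grep "/$" | awk '{print $5}'

#Next, remove the percent sign so that only the number remains
df | grep "/$" | awk '{print $5}' | cut -d "%" -f 1
#The usage may have one or two digits, so cutting by field is the safer method

#The command returns exactly the information needed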
df | grep "/$" | awk '{print $5}' | cut -d "%" -f 1
38
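
As an alternative, the grep, awk, and cut steps can be collapsed into a single awk call that tests the mount-point field directly. This is a sketch only: root_usage is a hypothetical helper name, and it assumes the six-column layout that df -P prints, with the mount point in field 6.

```shell
# root_usage (hypothetical helper) reads df-style text on stdin and prints the
# usage number of the line whose mount point (field 6) is exactly "/".
root_usage() {
    awk '$6 == "/" { sub(/%/, "", $5); print $5 }'
}

# Hypothetical sample input standing in for the output of "df -P".
sample='Filesystem 1024-blocks Used Available Capacity Mounted on
udev 4000000 0 4000000 0% /dev
/dev/sda5 50000000 19000000 31000000 38% /'

usage=$(printf '%s\n' "$sample" | root_usage)
echo "$usage"
```

On a real system the helper would be fed from df, for example: df -P | root_usage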

With the usage figure in hand, write the pipeline into a script that compares it against a threshold:

#Contents of the script df.sh

#!/bin/bash

#Author:Zheng

declare -i a
a=$(df | grep "/$" | awk '{print $5}' | cut -d "%" -f 1)
if [ "$a" -lt 80 ]; then #Conditional statements are covered in a later section
        echo "Storage space normal" #a is less than 80
else
        echo "Warning:Not enough storage space" #a is 80 or greater
fi
echo -e "root storage used $a%"

Result:

zheng@Kali:~/Shell$ ./df.sh 
Storage space normal
root storage used 38%
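
One possible refinement is to wrap the comparison in a function that takes the percentage and the threshold as arguments and rejects non-numeric input. The function name usage_status and its return codes are assumptions made for this sketch, not part of the original script.

```shell
# usage_status PERCENT [LIMIT] - hypothetical helper; LIMIT defaults to 80.
usage_status() {
    local used=$1 limit=${2:-80}
    # Reject empty or non-numeric input before the integer comparison.
    case $used in
        ''|*[!0-9]*) echo "invalid usage value: '$used'" >&2; return 2 ;;
    esac
    if [ "$used" -lt "$limit" ]; then
        echo "Storage space normal"
    else
        echo "Warning:Not enough storage space"
        return 1
    fi
}

usage_status 38            # below the default limit of 80
usage_status 85 || true    # at or above the limit; returns status 1
```

The return status lets a calling script react to the warning without parsing the message text.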

awk has many other features (such as flow control and user-defined functions). For reasons of space they are not covered in more depth here; interested readers can consult other references.

sed command

sed is a lightweight stream editor available on almost all UNIX platforms. It can accept a data stream from a pipe, and it can select, replace, delete, and add data.

sed [options] '<script>' [file]
Option  Description
-n      Silent mode. By default sed prints every input line to the screen; with -n, only the lines explicitly printed by a sed action (such as p) are output
-i      Write the result of the sed edits directly back to the input file instead of printing it to the screen
Action  Description
a       Append: add one or more lines after the current line. When adding several lines, end every line except the last with "\" to indicate that the data continues
c       Change: replace the original line with the string after c. When the replacement has several lines, end every line except the last with "\" to indicate that the data continues
i       Insert: add one or more lines before the current line. When adding several lines, end every line except the last with "\"
d       Delete the specified lines
p       Print the specified lines
s       Substitute: replace one string with another. The format is "line-range s/old string/new string/g" (similar to vim)
  • It is generally recommended to use the -n option together with the p action; otherwise sed also prints every line it reads, so the printed lines appear twice
  • To append, insert, or change with several lines of text, write the first line after the action and a space, end it with a backslash, press Enter, and continue with the following lines
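
The effect of -n and -i can be checked safely on a throwaway file. In this sketch the file content is made up, and -i is written with a ".bak" suffix on the assumption that a backup copy is wanted; that form is accepted by both GNU and BSD sed.

```shell
# Work on a temporary file so no real file is touched (hypothetical content).
tmp=$(mktemp)
printf 'alpha\nbeta\ngamma\n' > "$tmp"

# Without -n, "2p" prints every line and prints line 2 a second time.
without_n=$(sed '2p' "$tmp")
# With -n, only the explicitly printed line appears.
with_n=$(sed -n '2p' "$tmp")

# -i edits the file in place; the ".bak" suffix keeps a backup of the original.
sed -i.bak 's/beta/BETA/' "$tmp"
edited=$(cat "$tmp")

printf '%s\n---\n%s\n---\n%s\n' "$without_n" "$with_n" "$edited"
rm -f "$tmp" "$tmp.bak"
```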

Example:

#Contents of the example file b
ID      Name    gender  Mark
1       LiHua   M       86
2       HZ      M       90
3       Cooper  M       89

#The tests start here
#Test 1: append action a (multiple lines)
sed '4a End\
> Hello World' b
ID      Name    gender  Mark
1       LiHua   M       86
2       HZ      M       90
3       Cooper  M       89
End
Hello World
#Test 2: line replacement action c
sed '4c Cooper Absent\
> End' b
ID      Name    gender  Mark
1       LiHua   M       86
2       HZ      M       90
Cooper Absent
End
#Test 3: insert action i
sed '1i Test Results' b
Test Results
ID      Name    gender  Mark
1       LiHua   M       86
2       HZ      M       90
3       Cooper  M       89
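#Test 4: delete lines
sed '2,4d' b #Note: the comma gives the start and end of a line range, not two separate lines
ID      Name    gender  Mark
#Test 5: print the specified line
sed -n '3p' b
2       HZ      M       90
#Test 6: string replacement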
sed '4s/M/F/g' b
ID      Name    gender  Mark
1       LiHua   M       86
2       HZ      M       90
3       Cooper  F       89
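
sed can also read from a pipe, select lines with a /regex/ address instead of a line number, and replace only the first match when the trailing g is left off. A minimal sketch follows; the data mirrors the example file b, with single spaces assumed in place of the original column alignment.

```shell
# Same rows as the example file b, supplied through a pipe (spacing simplified).
data='ID Name gender Mark
1 LiHua M 86
2 HZ M 90
3 Cooper M 89'

# A /regex/ address selects lines by content instead of by line number.
no_hz=$(printf '%s\n' "$data" | sed '/HZ/d')

# Without the trailing g, only the first match on each line is replaced.
first_only=$(printf 'a-a-a\n' | sed 's/a/b/')
all=$(printf 'a-a-a\n' | sed 's/a/b/g')

printf '%s\n%s\n%s\n' "$no_hz" "$first_only" "$all"
```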

sort command

The sort command sorts lines of text in a specified order

sort [options] [filename]
Option  Description
-f      Ignore case
-n      Sort numerically (the default is string comparison)
-r      Reverse the order
-t      Use the specified field delimiter (by default, fields are separated by whitespace)
-k n,m  Sort by the specified field range, starting at field n and ending at field m (if m is omitted, the key extends to the end of the line)
  • The fields given to -k are columns. A single starting field can also be given (-k n), in which case the key runs from field n to the end of the line

Example:

#The first few lines of the original passwd file
root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
bin:x:2:2:bin:/bin:/usr/sbin/nologin
sys:x:3:3:sys:/dev:/usr/sbin/nologin
sync:x:4:65534:sync:/bin:/bin/sync

#Sort as strings; each line begins with the user name, so this orders by user name
sort /etc/passwd
_apt:x:100:65534::/nonexistent:/usr/sbin/nologin
avahi:x:124:129:Avahi mDNS daemon,,,:/run/avahi-daemon:/usr/sbin/nologin
backup:x:34:34:backup:/var/backups:/usr/sbin/nologin

#Sort by group ID (the delimiter must be specified, and the sort key is numeric)
sort -t ":" -k 4 -n /etc/passwd
root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
bin:x:2:2:bin:/bin:/usr/sbin/nologin
sys:x:3:3:sys:/dev:/usr/sbin/nologin
lp:x:7:7:lp:/var/spool/lpd:/usr/sbin/nologin
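
To restrict the key to exactly one field, the start and end can be given as the same number (-k 3,3). The sketch below combines that with -n and -r, and also shows how string and numeric ordering differ; the sample lines and variable names are made up for illustration.

```shell
# Hypothetical passwd-style sample lines.
sample='sync:x:4:65534:sync:/bin:/bin/sync
root:x:0:0:root:/root:/bin/bash
bin:x:2:2:bin:/bin:/usr/sbin/nologin'

# -k 3,3 limits the key to field 3 only; -n compares it as a number and
# -r reverses the order, so the largest UID comes first.
by_uid_desc=$(printf '%s\n' "$sample" | sort -t ':' -k 3,3 -n -r | cut -d ':' -f 1,3)

# Compared as strings, "10" sorts before "9"; with -n the order is numeric.
as_text=$(printf '9\n10\n' | sort)
as_number=$(printf '9\n10\n' | sort -n)

printf '%s\n---\n%s\n---\n%s\n' "$by_uid_desc" "$as_text" "$as_number"
```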

Next section: Linux Bash Shell Programming (8): Conditional Judgment and Examples, where we start on conditional tests and flow control statements in Bash

Previous section: Linux Bash Shell Programming (6): Application examples of basic metacharacters in regular expressions

Origin blog.csdn.net/Zheng__Huang/article/details/108015558