Linux Bash Shell Programming (7): String Extraction and Processing (cut, printf, awk, sed, sort) with Examples
In the previous section, we learned the basic functions and usage of regular expressions. In this section, we study string extraction, formatted output, and string processing commands
cut command
The cut command is Bash's string extraction command. It cuts selected columns out of each line of a file whose columns are separated by a uniform delimiter (any single character)
cut [options] <filename>
Options | Description |
---|---|
-b | Select only the specified bytes |
-c | Select only the specified characters |
-f | Select only the specified fields |
-d | Use the specified delimiter (the default is the tab character); used in -f mode. The delimiter must be a single character |
-s | Do not output lines that contain no delimiter (used in -f mode); by default such lines are output |
- The first three options are mutually exclusive and one of them is required: -b selects by bytes, -c selects by characters, and -f selects by delimiter-separated fields (a field may contain several characters or be empty). Selecting by field is the most common
- If a line does not contain the delimiter, cut leaves it unchanged and outputs the entire line, unless the -s option is specified
- The -f option must be followed by field numbers, which can be several fields separated by commas or a range (a-b)
- The -d option specifies the delimiter, which should be quoted and may be a space
- The cut command has its limitations, but its syntax is simple and easy to use
Example:
#Extract every user name and its UID from the user configuration file passwd
cut -s -d ":" -f 1,3 passwd
root:0
daemon:1
bin:2
#Only part of the output is shown
#Extract every group and its member list from the gshadow file
cut -d ":" -f 1,4 gshadow
root:
daemon:
bin:
cdrom:zheng
floppy:zheng
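#Only part of the output is shown
#List the users whose login shell is /bin/bash, as a first step toward finding the non-root, non-system users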
zheng@Kali:~/temp$ grep "/bin/bash" /etc/passwd | cut -d ":" -f 1
root
postgres #An administrative account required by some services; it can also log in with bash and has to be excluded separately
zheng
test
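The effect of -s, field ranges, and character mode can be checked on a small piece of data. The following is a minimal sketch; the sample records are made up for illustration and are not taken from a real passwd file.

```shell
# Hypothetical sample data: three colon-separated records and one line without a delimiter.
sample=$(printf '%s\n' 'root:x:0:0' 'daemon:x:1:1' 'no delimiter here' 'bin:x:2:2')

# Without -s, the line that lacks ":" is passed through unchanged.
with_plain=$(printf '%s\n' "$sample" | cut -d ':' -f 1,3)

# With -s, that line is dropped.
only_fields=$(printf '%s\n' "$sample" | cut -s -d ':' -f 1,3)

# A range (a-b) selects consecutive fields; -c selects character positions instead.
range=$(printf '%s\n' 'root:x:0:0' | cut -d ':' -f 2-4)
chars=$(printf '%s\n' 'daemon:x:1:1' | cut -c 1-3)

printf '%s\n' "$with_plain" '---' "$only_fields" '---' "$range" "$chars"
```

The first group of output lines still contains "no delimiter here", while the second group, produced with -s, does not.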
printf command
printf is Bash's formatted output command (it is also available as an output statement inside awk). It prints strings and numbers in a format you define, with a syntax similar to the printf function in C
printf '<output type><output format>' <output content>
#The format string may also contain literal text, so the output content can consist of just the variables
- The output type and output format must be enclosed in quotes
- The output content is usually numbers, variables, and so on, separated by spaces
Output type | Description |
---|---|
%ns | Output a string; n is a number giving the minimum field width in characters (n can be omitted) |
%ni | Output an integer; n is a number giving the minimum field width in digits (n can be omitted) |
%m.nf | Output a floating-point number; m is the minimum total width (integer part, decimal point, and decimals) and n is the number of decimal places. For example, %8.2f outputs at least 8 characters in total, 2 of them decimal places |
- The output formats of the printf command (escape sequences such as \n and \t) are exactly the same as those of the echo command, so they are not repeated here. Please refer to Linux Bash Shell Programming (1): Shell Overview and Hello World Implementation
Example:
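#Treat every argument as a string
printf '%s' 1 2 as 12 3
12as123 #No newline is output after this line, so the next prompt follows immediately
printf '%s\n' 1 2 as 12 #Output as strings, adding a newline after each argument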
1
2
as
12
#Output as strings, three per line, separated by spaces and ending with a newline
printf '%s %s %s\n' 1 2 as 12 4 3
1 2 as
12 4 3
#The printf format string can also contain literal text, with the arguments supplying the variable parts
printf 'Hello, %s\n' "Zheng"
Hello, Zheng
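The width and precision numbers in the output types are easiest to understand by looking at the padding they produce. The following is a small sketch with made-up arguments; the "|" characters are only there to make the field boundaries visible.

```shell
# LC_ALL=C keeps "." as the decimal point regardless of the user's locale.
export LC_ALL=C

# %5s   : string right-aligned in a field at least 5 characters wide
# %-5s  : the same field width, left-aligned
# %03d  : integer padded with zeros to 3 digits
# %8.2f : float, at least 8 characters wide in total, 2 decimal places
line=$(printf '%5s|%-5s|%03d|%8.2f' ab cd 7 3.14159)
printf '%s\n' "$line"
```

Here "ab" is padded on the left, "cd" on the right, 7 becomes 007, and 3.14159 is rounded to 3.14 and padded to a width of 8.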
awk command
Compared with the cut command, the awk command is more powerful. It can extract fields separated by runs of spaces of varying length, and it supports functions, conditional tests, and flow control on the extracted strings. In return, its syntax is more complicated than that of cut and resembles a programming language.
awk 'pattern1{action1}pattern2{action2}...' <filename>
- pattern: a condition, generally a relational expression (for example, x>1). It can be omitted, in which case no test is made and the action is executed for every line
- action: what to do, which can be formatted output (awk supports printf and print) or flow control statements
- The awk command still processes its input line by line
- The difference between printf and print is that print automatically adds a newline at the end of its output, while printf does not
- After awk reads a line, it splits the content at the separator (runs of several spaces also count as one separator) and exposes the pieces as $n, where n is a number: $0 is the whole line, $1 is the first column, $2 is the second column, and so on
- awk provides a built-in variable FS that holds the field separator. Tabs and spaces work as separators by default; any other symbol has to be set in advance, usually in a BEGIN condition
- The BEGIN condition is used as a pattern. Its action is executed before awk reads the first line, so it can hold commands that need to run once in advance
- The END condition is used in the same way as BEGIN; its action is executed once after all the content has been read
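For example, the delimiter must be defined before extracting fields from the passwd file:
awk 'BEGIN{FS=":";print "Begin"}END{print "End"}{print $1 "\t" $3}' /etc/passwd
Begin
root 0
daemon 1
bin 2
sys 3
End
#As you can see, the delimiter is defined at the very start (before any data is read), so the first line is split and output correctly
#But if the BEGIN condition is not used and the delimiter definition is placed together with the formatted output
awk '{FS=":";print $1 "\t" $3}' /etc/passwd
root:x:0:0:root:/root:/bin/bash
daemon 1
bin 2
sys 3
#The first line had already been read in before the delimiter was defined and is not split again, so the whole line is output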
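As a complement, awk's -F option sets the field separator before any input is read, which has the same effect as assigning FS in BEGIN. The sketch below also puts a pattern in front of an action; the passwd-style lines are invented sample data, and the threshold 1000 is only an illustrative assumption about where regular UIDs start.

```shell
# Hypothetical passwd-style sample, so the example does not depend on the real /etc/passwd.
sample=$(printf '%s\n' \
  'root:x:0:0:root:/root:/bin/bash' \
  'daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin' \
  'zheng:x:1000:1000:zheng:/home/zheng:/bin/bash')

# -F ':' sets FS before the first line is read, like FS=":" inside BEGIN,
# so the first line is split correctly.
first_line=$(printf '%s\n' "$sample" | awk -F ':' 'NR == 1 {print $1 "\t" $3}')

# A pattern in front of the action: only lines whose third field is >= 1000 are printed.
regular=$(printf '%s\n' "$sample" | awk -F ':' '$3 >= 1000 {print $1}')

# END runs once after all input; NR then holds the number of lines read.
count=$(printf '%s\n' "$sample" | awk 'END {print NR}')

printf '%s\n' "$first_line" "$regular" "$count"
```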
Example:
#The following awk example has no pattern, only an action
df -h | awk '{printf "%s\t%s\t%s\n", $1, $5, $6}'
Filesystem Use% Mounted
udev 0% /dev
tmpfs 1% /run
/dev/sda5 38% /
tmpfs 0% /dev/shm
tmpfs 0% /run/lock
tmpfs 0% /sys/fs/cgroup
/dev/sda1 28% /boot
tmpfs 1% /run/user/1000
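When awk's printf is given data that contains a "%" sign, it is safer to keep the format string constant and pass the fields as arguments. A constant format string also allows column widths to be set. The following sketch uses made-up df-like rows and arbitrarily chosen widths.

```shell
# Hypothetical df-like sample rows (filesystem, use%, mount point).
rows=$(printf '%s\n' 'udev 0% /dev' '/dev/sda5 38% /')

# The format string is a constant and the fields are passed as arguments,
# so a "%" inside the data is printed literally.
# %-10s left-aligns the name in 10 columns, %4s right-aligns the percentage in 4.
table=$(printf '%s\n' "$rows" | awk '{printf "%-10s %4s %s\n", $1, $2, $3}')
printf '%s\n' "$table"
```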
The following example checks the usage of the file system mounted on the root directory and raises an alarm when it is too high
First, build up a pipeline on the command line that extracts the usage figure step by step
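#Principle: use the df command to view file system usage
df
#Filter for the root mount: each line ends with its mount point, and only the root directory is exactly "/", which can serve as the criterion
df | grep "/$"
#See the previous section for the regular expression
#This yields a single line separated by spaces; use awk to extract the fifth column
df | grep "/$" | awk '{print $5}'
#Next, remove the percent sign so that only the number remains
df | grep "/$" | awk '{print $5}' | cut -d "%" -f 1
#Since the usage may have one or two digits, cutting by field is the safe approach
#The result of the command is exactly the information we need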
df | grep "/$" | awk '{print $5}' | cut -d "%" -f 1
38
With the usage figure in hand, write the pipeline into a script and compare the value against a threshold:
#Contents of the script df.sh
#!/bin/bash
#Author:Zheng
declare -i a
a=$(df | grep "/$" | awk '{print $5}' | cut -d "%" -f 1)
if [ "$a" -lt 80 ]; then #Conditional statements are covered in a later section
echo "Storage space normal" #If a is less than 80
else
echo "Warning:Not enough storage space" #If a is 80 or greater
fi
echo -e "root storage used $a%"
Result:
zheng@Kali:~/Shell$ ./df.sh
Storage space normal
root storage used 38%
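The grep, awk, and cut stages of the pipeline can also be folded into a single awk program, since awk can test the mount point and strip the percent sign itself. The following is only a sketch of that alternative: root_usage is a hypothetical helper name, and the df output is canned sample text so that the result is reproducible.

```shell
# root_usage is a hypothetical helper: it reads df-style output on stdin and
# prints the usage number of the line whose last field (the mount point) is "/".
root_usage() {
  awk '$NF == "/" { sub(/%/, "", $5); print $5 }'
}

# Canned df output (an assumption, for a reproducible demonstration);
# on a real system one would run: df -P | root_usage
sample='Filesystem 1024-blocks Used Available Capacity Mounted on
udev 4010000 0 4010000 0% /dev
/dev/sda5 50000000 19000000 31000000 38% /
/dev/sda1 500000 140000 360000 28% /boot'

a=$(printf '%s\n' "$sample" | root_usage)

if [ "$a" -lt 80 ]; then
  status="Storage space normal"
else
  status="Warning: Not enough storage space"
fi
echo "$status"
echo "root storage used $a%"
```

Comparing the last field with "/" plays the role of grep "/$", and sub() replaces the cut step.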
The awk command has many other features, such as flow control and functions. Due to space limitations, we will not discuss them in more depth here. If you are interested, you can consult other references
sed command
sed is a lightweight stream editor included in almost all UNIX platforms; it can also accept data streams from pipes. sed can select, replace, delete, and add data
sed [options] {script} [file]
Options | Description |
---|---|
-n | Silent mode: by default sed outputs all data to the screen; with -n, only the lines explicitly processed by the sed command are output |
-i | Write the result of sed's modifications directly into the file that was read, instead of outputting it to the screen |
action | Description |
---|---|
a | Append: add one or more lines after the current line. When appending several lines, end every line except the last with "\" to indicate that the data continues |
c | Line replacement: replace the original line with the string after c. When the replacement spans several lines, end every line except the last with "\" to indicate that the data continues |
i | Insert: add one or more lines before the current line. When inserting several lines, end every line except the last with "\" |
d | Delete the specified lines |
p | Print: output the specified lines |
s | String replacement: replace one string with another. The format is "line range s/old string/new string/g" (similar to vim) |
- It is generally recommended to enable the -n option when printing with p; otherwise sed also outputs every line it reads, so the selected lines appear twice
- To append, insert, or replace with multiple lines, write the first line of content after the action, separated by a space, then type a backslash and press Enter to continue with the following lines
Example:
#Contents of the sample file b
ID Name gender Mark
1 LiHua M 86
2 HZ M 90
3 Cooper M 89
#Tests start here
#Test 1: append action a (multiple lines)
sed '4a End\
> Hello World' b
ID Name gender Mark
1 LiHua M 86
2 HZ M 90
3 Cooper M 89
End
Hello World
#Test 2: line replacement command
sed '4c Cooper Absent\
> End' b
ID Name gender Mark
1 LiHua M 86
2 HZ M 90
Cooper Absent
End
#Test 3: insert command
sed '1i Test Results' b
Test Results
ID Name gender Mark
1 LiHua M 86
2 HZ M 90
3 Cooper M 89
#Test 4: delete lines
sed '2,4d' b #Note: the comma marks the start and end of a line range, not individual lines
ID Name gender Mark
#Test 5: output the specified line
sed -n '3p' b
2 HZ M 90
#Test 6: string replacement
sed '4s/M/F/g' b
ID Name gender Mark
1 LiHua M 86
2 HZ M 90
3 Cooper F 89
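Two points from this section are easy to verify by counting: what -n changes when p is used, and what the trailing g changes in a substitution. The following sketch recreates the sample file b in a temporary location (the use of mktemp is my own choice, not part of the original examples).

```shell
# Recreate the sample file b from the examples in a temporary location.
b=$(mktemp)
printf '%s\n' 'ID Name gender Mark' '1 LiHua M 86' '2 HZ M 90' '3 Cooper M 89' > "$b"

# Without -n, "3p" prints every line plus an extra copy of line 3.
lines_default=$(sed '3p' "$b" | wc -l)
# With -n, only the line selected by p is printed.
lines_quiet=$(sed -n '3p' "$b" | wc -l)

# s without g replaces only the first match on the line; with g, every match.
first_only=$(sed -n '1s/e/E/p' "$b")
global=$(sed -n '1s/e/E/gp' "$b")

echo "without -n: $((lines_default)) lines, with -n: $((lines_quiet)) line"
echo "$first_only"
echo "$global"
rm -f "$b"
```

The file has four lines, so the first count is 5 and the second is 1. In the header line only the "e" in "Name" changes without g, while with g the two in "gender" change as well.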
sort command
The sort command sorts lines of text in a specified order
sort [options] [filename]
Options | Description |
---|---|
-f | Ignore case |
-n | Sort by numeric type (default string type) |
-r | Reverse order |
-t | Use the specified field delimiter (by default, fields are separated by blanks, that is, spaces and tabs) |
-k n,m | Sort by the specified field range, starting at field n and ending at field m (m defaults to the end of the line) |
- The fields given to the -k option are columns; a single field can also be specified (-k n), in which case the sort key runs from that field to the end of the line
Example:
#The first few lines of the original passwd file
root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
bin:x:2:2:bin:/bin:/usr/sbin/nologin
sys:x:3:3:sys:/dev:/usr/sbin/nologin
sync:x:4:65534:sync:/bin:/bin/sync
#Sort as strings (in effect, by user name)
sort /etc/passwd
_apt:x:100:65534::/nonexistent:/usr/sbin/nologin
avahi:x:124:129:Avahi mDNS daemon,,,:/run/avahi-daemon:/usr/sbin/nologin
backup:x:34:34:backup:/var/backups:/usr/sbin/nologin
#Sort by group ID (the delimiter must be specified and the key compared numerically)
sort -t ":" -k 4 -n /etc/passwd
root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
bin:x:2:2:bin:/bin:/usr/sbin/nologin
sys:x:3:3:sys:/dev:/usr/sbin/nologin
lp:x:7:7:lp:/var/spool/lpd:/usr/sbin/nologin
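The difference between numeric and string comparison, and the effect of limiting the key with -k n,m, shows up clearly on a few lines of data. The following sketch uses invented passwd-style records; cut and paste are only used to reduce each result to a single line of names.

```shell
# LC_ALL=C makes the string comparison byte-based and reproducible.
export LC_ALL=C

# Hypothetical passwd-style sample: name:password:UID:GID
sample=$(printf '%s\n' 'carol:x:1002:50' 'alice:x:1000:100' 'bob:x:1001:20')

# -k 4,4 limits the key to field 4 only; -n compares it as a number: 20 < 50 < 100.
numeric=$(printf '%s\n' "$sample" | sort -t ':' -k 4,4 -n | cut -d ':' -f 1 | paste -sd ' ' -)

# Without -n the same key is compared as a string: "100" < "20" < "50".
as_string=$(printf '%s\n' "$sample" | sort -t ':' -k 4,4 | cut -d ':' -f 1 | paste -sd ' ' -)

# -r reverses the numeric order.
reversed=$(printf '%s\n' "$sample" | sort -t ':' -k 4,4 -n -r | cut -d ':' -f 1 | paste -sd ' ' -)

echo "numeric:  $numeric"
echo "string:   $as_string"
echo "reversed: $reversed"
```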
index
Next section: Linux Bash Shell Programming (8): Conditional Judgment and Examples, where we start learning the conditional tests and flow control statements in Bash
Previous section: Linux Bash Shell Programming (6): Application examples of basic metacharacters in regular expressions