Linux - shell text processing

1. cut command
  1.1 Command Format
  1.2 Example command
2. awk command
  2.1 printf formatted output
  2.2 demo data
  2.3 awk command format
  2.4 awk conditions
  2.5 Examples
3. sed command
  3.1 Syntax
  3.2 Exercises
4. sort command
  4.1 Command Format
  4.2 Test Sample


1. cut command

By default, cut separates fields with the tab character (the Tab key).

1.1 Command Format

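cut [options] filename
-f column numbers:	the column(s) to extract
-d delimiter:	split columns on the specified delimiter
-c character range:	extract by character position instead of splitting on a delimiter. Positions start at 1 at the beginning of the line. "n-" means from character n to the end of the line; "n-m" means from character n to character m; "-m" means from character 1 to character m.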
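Running cut on a fixed string is a quick way to check that -c numbers characters from 1. The sample string below is made up for illustration. The three ranges match the "n-", "n-m" and "-m" forms listed above.

```shell
# Hypothetical sample string; cut -c counts the first character as position 1.
line='abcdefgh'

echo "$line" | cut -c 3-    # "n-":  character 3 to the end of the line -> cdefgh
echo "$line" | cut -c 2-4   # "n-m": characters 2 through 4            -> bcd
echo "$line" | cut -c -3    # "-m":  characters 1 through 3            -> abc
```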

1.2 Example command

Data:
vi student.txt
ID	Name	gender	Mark
1	Liming	M	86
2	Sc	M	90
3	Tg	M	83

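cut -f 2 student.txt   # extract the second column
grep -v 'Name' student.txt | cut -f 2   # drop the header line, then show the second column (cut must read the pipe, so no filename here)
cut -f 2,3 student.txt   # extract columns 2 and 3
cut -d ':' -f 1,2,3 /etc/passwd   # split each line of /etc/passwd on ":" and extract fields 1, 2 and 3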
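You can try the commands above without touching a real file. This sketch recreates the sample data with printf in a temporary directory. The temp-directory setup and the passwd-style line are assumptions made for the demo, not part of the original.

```shell
# Recreate the tab-separated sample file in a throwaway directory.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
printf 'ID\tName\tgender\tMark\n1\tLiming\tM\t86\n2\tSc\tM\t90\n3\tTg\tM\t83\n' > "$dir/student.txt"

cut -f 2 "$dir/student.txt"                    # Name, Liming, Sc, Tg
grep -v 'Name' "$dir/student.txt" | cut -f 2   # Liming, Sc, Tg: cut reads the pipe
cut -f 2,3 "$dir/student.txt"                  # columns 2 and 3, still joined by a tab

# -d with a made-up passwd-style line; the output keeps ":" between the fields.
echo 'demo:x:1001:1001::/home/demo:/bin/bash' | cut -d ':' -f 1,3   # demo:1001
```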

2. awk command

2.1 printf formatted output

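printf 'output type and output format' output content

Output types:
%ns:	output a string. n is a number giving the minimum field width; shorter strings are padded with spaces on the left (use %.ns to output at most n characters).
%ni:	output an integer. n is a number giving the minimum field width.
%m.nf:	output a floating-point number. m is the total field width and n is the number of digits after the decimal point. For example, %8.2f
prints a number at least 8 characters wide (the decimal point included) with 2 digits after the decimal point.

Output format (escape sequences):
\a:	alert (bell) sound
\b:	backspace, like the Backspace key
\f:	form feed
\n:	newline
\r:	carriage return (moves the cursor back to the start of the line)
\t:	horizontal tab, like the Tab key
\v:	vertical tab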
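Putting brackets around each field makes the width and precision rules easy to see. This is a minimal sketch using the shell's printf. It sets LC_ALL=C on the assumption that you want "." as the decimal point.

```shell
export LC_ALL=C   # make sure "." is the decimal separator

printf '[%5s]\n' ab         # [   ab]    width 5: padded on the left, not truncated
printf '[%.3s]\n' abcdef    # [abc]      precision .3 keeps at most 3 characters
printf '[%4i]\n' 42         # [  42]     integer in a field 4 wide
printf '[%8.2f]\n' 87.666   # [   87.67] 8 wide including the ".", rounded to 2 decimals
printf 'a\tb\n'             # \t puts a horizontal tab between a and b
```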

2.2 demo data

vi student.txt
ID	Name	PHP	Linux	MySQL	Average
1	Liming	82	95	86	87.66
2	Sc	74	96	87	85.66
3	Tg	99	83	93	91.66

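printf '%s' $(cat student.txt)   # every field is printed with no separator, so the output runs together
printf '%s\t %s\t %s\t %s\t %s\t %s\t \n' $(cat student.txt)   # print six fields per line in the given format
printf '%i\t %s\t %i\t %i\t %i\t %8.2f\t \n' \
$(cat student.txt | grep -v Name)   # skip the header and format each field as the given type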
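A single format line prints the whole table because printf reuses its format string until all arguments are used up. The unquoted $(...) splits the file into separate words, so a format with six conversions produces one row per six fields. The sketch recreates the 2.2 data in a temporary directory, which is an assumption for the demo.

```shell
export LC_ALL=C   # "." as the decimal point for %f
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
printf 'ID\tName\tPHP\tLinux\tMySQL\tAverage\n1\tLiming\t82\t95\t86\t87.66\n2\tSc\t74\t96\t87\t85.66\n3\tTg\t99\t83\t93\t91.66\n' > "$dir/student.txt"

# Unquoted on purpose: word splitting turns the file into 24 separate arguments,
# and the six-field format is applied four times.
printf '%s\t%s\t%s\t%s\t%s\t%s\n' $(cat "$dir/student.txt")

# Skip the header so the %i and %f columns really receive numbers.
printf '%i\t%s\t%i\t%i\t%i\t%8.2f\n' $(grep -v Name "$dir/student.txt")
```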

2.3 awk command format

awk 'condition1 {action1} condition2 {action2} ...' filename

awk '{printf $2 "\t" $6 "\n"}' student.txt   # print columns 2 and 6
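Here is a small sketch of the condition {action} form, run on the 2.2 data. student_data is a hypothetical helper that prints the sample instead of reading student.txt. The first command passes the fields to printf as arguments rather than building the format string from the data. That is a slightly safer variant of the command above, because a "%" inside a field can no longer be read as a conversion.

```shell
# Hypothetical helper: prints the tab-separated sample data from section 2.2.
student_data() {
  printf 'ID\tName\tPHP\tLinux\tMySQL\tAverage\n'
  printf '1\tLiming\t82\t95\t86\t87.66\n2\tSc\t74\t96\t87\t85.66\n3\tTg\t99\t83\t93\t91.66\n'
}

# No condition: the action runs for every line.
student_data | awk '{printf "%s\t%s\n", $2, $6}'

# Two condition {action} pairs: one for the header line, one for the data lines.
student_data | awk 'NR == 1 {print "header:", $2} NR > 1 {print $2, $6}'
```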

2.4 awk conditions

Condition type        Condition   Description
awk reserved words    BEGIN       Runs when the awk program starts, before any data is read.
                                  The action after BEGIN is executed only once, at the start.
                      END         Runs after awk has processed all the data, just before it exits.
                                  The action after END is executed only once, at the end.
Relational operators  >           Greater than
                      <           Less than
                      >=          Greater than or equal to
                      <=          Less than or equal to
                      ==          Equal to. Tests whether two values are equal; to assign a value
                                  to a variable, use "=" instead.
                      !=          Not equal to
                      A~B         True if string A contains a substring that matches expression B
                      A!~B        True if string A does not contain a substring that matches expression B
Regular expressions   /regex/     A regular expression written between two "/" characters; true when
                                  the current line matches it
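The condition types above can be tried on the 2.2 sample. student_data is a hypothetical helper that prints that data. NR > 1 skips the header line: "PHP" is not numeric, so awk would compare it as a string and the header could slip through.

```shell
# Hypothetical helper: prints the tab-separated sample data from section 2.2.
student_data() {
  printf 'ID\tName\tPHP\tLinux\tMySQL\tAverage\n'
  printf '1\tLiming\t82\t95\t86\t87.66\n2\tSc\t74\t96\t87\t85.66\n3\tTg\t99\t83\t93\t91.66\n'
}

student_data | awk 'NR > 1 && $3 >= 80 {print $2}'   # relational: PHP score >= 80 -> Liming, Tg
student_data | awk '$2 ~ /^L/ {print $2}'            # A~B: name starts with L     -> Liming
student_data | awk 'NR > 1 && $2 !~ /g/ {print $2}'  # A!~B: name has no "g"       -> Sc
student_data | awk '/Tg/ {print $6}'                 # /regex/ tested on the whole line -> 91.66
```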

awk execution flow

1) If there is a BEGIN condition, awk first runs the action defined for BEGIN.

2) awk reads the first line (this is where it starts if there is no BEGIN) and assigns the data to $0, $1, $2 and so on: $0 is the whole line, $1 the first field, $2 the second field.

3) awk checks each condition. If a condition is met, its action runs; otherwise awk moves on to the next line. An action with no condition runs on every line.

4) awk reads the next line and repeats steps 2 and 3.

5) After the last line, awk runs the END action, if there is one.
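The flow can be traced with a small program that prints once in BEGIN, accumulates per line, and reports in END. student_data is a hypothetical helper that prints the 2.2 sample. Averaging column 6 is just an illustrative task.

```shell
export LC_ALL=C   # "." as the decimal point in printf output
# Hypothetical helper: prints the tab-separated sample data from section 2.2.
student_data() {
  printf 'ID\tName\tPHP\tLinux\tMySQL\tAverage\n'
  printf '1\tLiming\t82\t95\t86\t87.66\n2\tSc\t74\t96\t87\t85.66\n3\tTg\t99\t83\t93\t91.66\n'
}

student_data | awk '
  BEGIN  { print "start" }   # step 1: runs once, before any line is read
  NR > 1 { sum += $6; n++ }  # steps 2-4: runs for each data line, header skipped
  END    { printf "%d students, mean average %.2f\n", n, sum / n }   # step 5: once, at the end
'
```

This prints "start" followed by "3 students, mean average 88.33".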

awk built-in variables

Variable   Role
$0         The entire line awk has just read. awk reads its input line by line, and $0 holds
           the whole current line.
$n         The nth field of the current line.
NF         The number of fields (columns) in the current line.
NR         The number of the line currently being processed, counted across all input.
FS         The input field separator. By default awk splits fields on any run of spaces or
           tabs; to use another separator (e.g. ":"), set FS.
ARGC       The number of command-line arguments.
ARGV       An array holding the command-line arguments.
FNR        The record number within the current input file (starts at 1 for each file).
OFMT       The output format for numbers (default "%.6g").
OFS        The output field separator (default a space).
ORS        The output record separator (default a newline).
RS         The input record separator (default a newline).
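Here is a short sketch showing NR, NF, FS (through the -F option), OFS and the difference between NR and FNR. The helper student_data, the passwd-style line and the two temporary files are all made up for the demo.

```shell
# Hypothetical helper: prints the tab-separated sample data from section 2.2.
student_data() {
  printf 'ID\tName\tPHP\tLinux\tMySQL\tAverage\n'
  printf '1\tLiming\t82\t95\t86\t87.66\n2\tSc\t74\t96\t87\t85.66\n3\tTg\t99\t83\t93\t91.66\n'
}

student_data | awk 'NR == 2 {print NR, NF}'   # 2 6: line 2 has six fields

# -F sets FS before any input is read; OFS is what print puts between fields.
echo 'root:x:0:0:root:/root:/bin/bash' | awk -F ':' 'BEGIN {OFS = "|"} {print $1, $3, $NF}'   # root|0|/bin/bash

# NR keeps counting across files, FNR restarts at 1 in each file.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
printf 'a\nb\n' > "$dir/f1"
printf 'c\n' > "$dir/f2"
awk '{print NR, FNR, $0}' "$dir/f1" "$dir/f2"   # 1 1 a / 2 2 b / 3 1 c
```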

2.5 Examples

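cat student.txt | grep -v Name | \
awk '$6 >= 87 {printf $2 "\n"}'   # if column 6 is at least 87, print column 2 (the name)
awk '$2 ~ /Sc/ {printf $6 "\n"}' student.txt   # print Sc's average score
cat /etc/passwd | grep "/bin/bash" | \
awk 'BEGIN {FS=":"} {printf $1 "\t" $3 "\n"}'   # name and UID of the users who can log in
Note: FS must be set in a BEGIN block (or with awk -F ':'). Setting it in the main block, as in {FS=":"}, only takes effect from the next line, so the first line is still split on whitespace.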
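The FS timing issue is easiest to see on a few made-up passwd-style lines (the variable users and its contents are hypothetical). The BEGIN and -F versions both split every line on ":". The last command shows what happens when FS is set in the main block instead.

```shell
# Made-up passwd-style lines; only alice and bob have /bin/bash as their shell.
users='alice:x:1000:1000::/home/alice:/bin/bash
daemon:x:2:2::/:/usr/sbin/nologin
bob:x:1001:1001::/home/bob:/bin/bash'

# FS set in BEGIN: applies from the very first line.
printf '%s\n' "$users" | grep '/bin/bash' | awk 'BEGIN {FS=":"} {printf "%s\t%s\n", $1, $3}'

# The -F option does the same thing.
printf '%s\n' "$users" | grep '/bin/bash' | awk -F ':' '{printf "%s\t%s\n", $1, $3}'

# FS set in the main block: the first line was already split on whitespace,
# so in awks that follow POSIX here (gawk, for example) $1 is the whole first line.
printf '%s\n' "$users" | grep '/bin/bash' | awk '{FS=":"} {print $1}'
```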

3. sed command

sed is mainly used to select, replace, delete and add lines of data.

3.1 Syntax

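sed [options] '[action]' filename
Options:
-n:	by default sed prints every input line to the screen; with this option, only the lines that sed explicitly prints (for example with the p action) are output.
-e:	allows several sed commands to be applied to the input data.
-f script file: read the sed commands from a script file, much like awk's -f option.
-r:	enable extended regular expressions in sed.
-i:	write sed's changes directly back into the file being read instead of printing them to the screen.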
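To see what -n actually changes, compare sed with and without it on a few made-up lines. This sketch uses only standard sed features.

```shell
# Without -n, sed prints every line once, and p prints line 2 a second time.
printf 'one\ntwo\nthree\n' | sed '2p'      # one two two three
# With -n, automatic printing is off, so only what p prints appears.
printf 'one\ntwo\nthree\n' | sed -n '2p'   # two
# Each -e adds another command to the same run.
printf 'one\ntwo\nthree\n' | sed -e 's/one/1/' -e 's/three/3/'   # 1 two 3
```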

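Actions:
a \:	append one or more lines after the current line. When appending several lines, end every line except the last with "\" to show that the text continues.
c \:	change: replace the line with the text after c. When replacing with several lines, end every line except the last with "\".
i \:	insert one or more lines before the current line. When inserting several lines, end every line except the last with "\".
d:	delete the specified lines.
p:	print the specified lines.
s:	string substitution: replace one string with another. The format is "range s/old/new/g" (similar to substitution in vim).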

3.2 Exercises

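sed -n '2p' student.txt   # print line 2
sed '2,4d' student.txt   # delete lines 2-4 from the output; the file itself is not changed
sed -i '2,4d' student.txt   # delete lines 2-4 and write the change back to the file (try it on a copy: the later examples assume the original file)
sed '2a hello' student.txt   # append "hello" after line 2
sed '2i hello \
world' student.txt   # insert two lines before line 2; the trailing "\" continues the text on the next line
cat student.txt | sed '2c No such person'   # replace line 2 with the given text
sed 's/old/new/g' filename   # general form of string substitution
sed '3s/74/99/g' student.txt   # replace "74" with "99" on line 3
sed '4s/^/#/g' student.txt   # comment out line 4 by adding "#" at its start
sed -e 's/Liming//g ; s/Tg//g' student.txt   # run several commands at once (separated by ";")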
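Here is a runnable version of the exercises above. It assumes GNU sed, because the one-line "2a text" form and -i without a backup suffix are GNU behaviour. The sample file is recreated in a temporary directory, and the in-place edit runs on a copy so the other commands still see all four lines. The variable names dir and f are hypothetical.

```shell
# Assumes GNU sed.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
f="$dir/student.txt"   # hypothetical path for the recreated 2.2 sample
printf 'ID\tName\tPHP\tLinux\tMySQL\tAverage\n1\tLiming\t82\t95\t86\t87.66\n2\tSc\t74\t96\t87\t85.66\n3\tTg\t99\t83\t93\t91.66\n' > "$f"

sed -n '2p' "$f"                         # just the Liming line
sed '2,4d' "$f"                          # only the header is printed; the file keeps 4 lines
sed '2a hello' "$f"                      # "hello" becomes line 3 of the output
sed '3s/74/99/g' "$f"                    # Sc's PHP score changes from 74 to 99
sed '4s/^/#/' "$f"                       # the Tg line starts with "#"
sed -e 's/Liming//g' -e 's/Tg//g' "$f"   # two commands, each passed with its own -e

# In-place editing, done on a copy so the sample above stays intact.
cp "$f" "$dir/copy.txt"
sed -i '2,4d' "$dir/copy.txt"
cat "$dir/copy.txt"                      # only the header line is left
```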

4. sort command

4.1 Command Format

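sort [options] filename
Options:
-f:	ignore case
-b:	ignore leading blanks on each line
-n:	sort numerically (the default is to sort as strings)
-r:	sort in reverse order
-u:	remove duplicate lines (the same effect as piping sorted output through uniq)
-t:	specify the field separator; by default, fields are separated by blanks (spaces or tabs)
-k n[,m]:	sort on the field range that starts at field n and ends at field m (to the end of the line if m is omitted)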

4.2 Test Sample

sort /etc/passwd   # sort the user information file
sort -r /etc/passwd   # sort in reverse order
sort -t ":" -k 3,3 /etc/passwd   # split each line on ":" and sort by field 3 (as strings)
sort -n -t ":" -k 3,3 /etc/passwd   # sort by field 3 as a number
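The difference between string and numeric sorting on field 3 shows up clearly with a few made-up passwd-style lines (the variable lines is hypothetical). LC_ALL=C is an assumption that keeps the string order locale-independent.

```shell
export LC_ALL=C   # predictable, byte-wise string comparison
lines='root:x:0:0
alice:x:1000:1000
daemon:x:2:2
bob:x:99:99'

printf '%s\n' "$lines" | sort -t ':' -k 3,3       # as strings: 0, 1000, 2, 99
printf '%s\n' "$lines" | sort -n -t ':' -k 3,3    # as numbers: 0, 2, 99, 1000
printf '%s\n' "$lines" | sort -rn -t ':' -k 3,3   # as numbers, reversed
```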

uniq
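The uniq command removes duplicate lines. It only collapses duplicates that are next to each other, so it gives the same result as "sort -u" only when the input is sorted first (sort | uniq). Command format:
[root@localhost ~]# uniq [options] filename
Options:
-i:	ignore case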
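This sketch shows that uniq only looks at adjacent lines, using a made-up word list (the variable words is hypothetical).

```shell
export LC_ALL=C   # predictable sort order
words='apple
apple
banana
apple'

printf '%s\n' "$words" | uniq              # apple banana apple: only the adjacent pair collapses
printf '%s\n' "$words" | sort | uniq       # apple banana
printf '%s\n' "$words" | sort -u           # apple banana, the same as sort | uniq
printf 'Linux\nlinux\nLINUX\n' | uniq -i   # Linux: -i treats the three lines as duplicates
```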

Counting command: wc
[root@localhost ~]# wc [options] filename
Options:
-l:	count lines only
-w:	count words only
-m:	count characters only
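Here is a quick check of the three counts on a made-up two-line text (the variable text is hypothetical). Newlines count as characters for -m, and LC_ALL=C makes one byte equal one character.

```shell
export LC_ALL=C
text='hello world
shell text processing'

printf '%s\n' "$text" | wc -l   # 2 lines
printf '%s\n' "$text" | wc -w   # 5 words
printf '%s\n' "$text" | wc -m   # 34 characters, including the two newlines
```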

 


Origin blog.csdn.net/misxu890312/article/details/90814846