Linux data stream redirection, sort, sed, awk


Data stream redirection

Data stream redirection: when a command is executed, it may read data from a file, process it, and then print the result to the screen. That printed result is split into two streams, standard output and standard error output.

Standard output: the normal (correct) information returned by a command

Standard error output: the error messages returned when a command fails

  • Standard input (stdin): file descriptor 0, used with < or <<
  • Standard output (stdout): file descriptor 1, used with > or >>
  • Standard error output (stderr): file descriptor 2, used with 2> or 2>>

The meaning of the operators above:

  • 1>: write the [correct data] to the specified file or device, overwriting it
  • 1>>: append the [correct data] to the specified file or device
  • 2>: write the [error data] to the specified file or device, overwriting it
  • 2>>: append the [error data] to the specified file or device

Example:

# The command below produces both normal (correct) log output and error output
[hadoop@bigdata01 tmp]$ find /home -name .bashrc
find: ‘/home/ruoze’: Permission denied
/home/hadoop/.bashrc
find: ‘/home/azkaban’: Permission denied

# Write the correct output and the error output to separate files; running the command produces both kinds of messages
find /home -name .bashrc > right.out 2>error.out

# Write both the error output and the correct output to a single file, preserving the output order
find /home -name .bashrc > all.out 2>&1  -- recommended
find /home -name .bashrc &> all.out 
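The stdin operators < and << from the list above are not shown in the transcript; a minimal sketch (right.out comes from the example above, notes.out is just an illustrative name):

# Feed an existing file to a command's standard input with <
sort < right.out

# Feed an inline "here document", terminated by the EOF marker, with <<
cat << EOF > notes.out
first line
second line
EOF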

sort command usage

sort sorts its input and can sort according to a specified data format; for example, numbers and text are sorted differently.

[root@bigdata01 ~]# sort --help
Usage: sort [OPTION]... [FILE]...
  or:  sort [OPTION]... --files0-from=F

  -b, --ignore-leading-blanks ignore leading blank characters
  -f, --ignore-case           ignore differences in case
  -k, --key=KEYDEF            sort via the specified key (field range)
  -r, --reverse               reverse the sort order
  -u, --unique                like uniq, output only one line for identical data
  -t, --field-separator=SEP   specify the field separator (by default fields are split on blanks/[Tab])
  -n, --numeric-sort          sort by [numeric value] (the default sorts as text)
  ...
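None of these sort options are demonstrated above; a small sketch against the comma-separated test.txt used later in this post (columns: id, name, number):

# Sort numerically by the third comma-separated field, in reverse order
sort -t ',' -n -r -k 3 test.txt

# Ignore case and keep only one copy of identical lines
sort -f -u test3.txt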

uniq usage:

[root@bigdata01 ~]# uniq --help
Usage: uniq [OPTION]... [INPUT [OUTPUT]]
  -c, --count           prefix each line with the number of occurrences
  -i, --ignore-case     ignore case

Example:

[hadoop@bigdata01 tmp]$ cat test3.txt 
hadoop
hadoop
hadoop
spark
spark
spark

[hadoop@bigdata01 tmp]$ cat test3.txt | uniq 
hadoop
spark
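The -c option listed above is handy for counting; note that uniq only collapses adjacent duplicates, so unsorted input should be piped through sort first. A minimal sketch:

# Count how many times each line occurs (sort first so duplicates become adjacent)
sort test3.txt | uniq -c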

wc usage:

[root@bigdata01 ~]# wc --help
Usage: wc [OPTION]... [FILE]...
  -c, --bytes            count bytes
  -m, --chars            count characters
  -l, --lines            count lines

Example:

[hadoop@bigdata01 tmp]$ cat test.txt |wc 
     16      16     227
[hadoop@bigdata01 tmp]$ cat test.txt |wc -l
16
[hadoop@bigdata01 tmp]$ cat test.txt |wc -m
227
[hadoop@bigdata01 tmp]$ cat test.txt |wc -w
16

Basic usage of the sed command

sed is a stream editor and an excellent tool for text processing; it works seamlessly with regular expressions and is remarkably powerful. During processing, the line currently being handled is stored in a temporary buffer called the "pattern space"; the sed command then operates on the contents of that buffer, and when processing is complete the buffer is printed to the screen. sed then moves on to the next line and repeats this until the end of the file. The file itself does not change unless you use redirection (or -i) to store the output. sed is mainly used to edit one or more files automatically: it can replace, delete, add, and select lines of data, simplifying repeated edits to files and making it possible to write conversion programs.

sed command format: sed [options] 'command' file(s)

sed script format: sed [options] -f scriptfile file(s)

Parameters:

 -e : perform the sed action editing directly on the command line; this is the default option

 -f : write the sed actions in a file and use -f filename to run the sed actions in that file

 -i : modify the file contents directly (in place)

 -n : print only the lines that match the pattern

 -r : support extended regular expressions

 -h or --help : display help

 -V or --version : display version information
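The -n option is normally paired with the p command; a minimal sketch using the files from the examples in this post:

# Print only lines 2 through 3 (without -n every line would be echoed as well)
sed -n '2,3p' test.txt

# Print only the lines that match a pattern
sed -n '/spark/p' test3.txt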

sed common commands

 a\ append text after the current line
 i\ insert text before the current line
 c\ replace the selected lines with new text
 d  delete the selected lines
 D  delete the first line of the pattern space
 s  replace the specified characters

Example:

# Replace the specified content
[root@bigdata01 tmp]# cat test3.txt 
HADOOP
HADOOP
HADOOP
spark
spark
spark
[root@bigdata01 tmp]# sed 's/HADOOP/hadoop/g' test3.txt ==> does not change the content of the original file
hadoop
hadoop
hadoop
spark
spark
spark
[root@bigdata01 tmp]# cat test3.txt 
HADOOP
HADOOP
HADOOP
spark
spark
spark
[root@bigdata01 tmp]# sed -i 's/HADOOP/hadoop/g' test3.txt ==> replaces the content of the original file in place
[root@bigdata01 tmp]# cat test3.txt 
hadoop
hadoop
hadoop
spark
spark
spark
[root@bigdata01 tmp]# 

Data search and display:

[root@bigdata01 tmp]# clear
[root@bigdata01 tmp]# nl test.txt |sed '2,5d'  ===> do not display lines 2 through 5
     1	1,hadoopdoophadoophaoddop,3
     6	6,flume,56
     7	7,kylin,2
     8	8,es,1
     9	9,hadoop,4
    10	10,hadoop,12
    11	11,flink,6
    12	12,flink,4
    13	13,spark,4
    14	14,spark,56
    15	15,spark,22
    16	16,spark,33
[root@bigdata01 tmp]# nl test.txt |sed '3,$d'  ===> do not display line 3 through the last line
     1	1,hadoopdoophadoophaoddop,3
     2	2,flinflinkflinkflinkflinkk,5
[root@bigdata01 tmp]# nl test.txt |sed '2,$d' ===> do not display line 2 through the last line; only line 1 is shown
     1	1,hadoopdoophadoophaoddop,3

Commonly used awk commands

awk is a powerful text-analysis tool. Simply put, awk reads the file line by line, splits each line into fields using whitespace (space, tab) as the default separator, and then performs various analyses on the resulting fields.

Compared with sed, which usually operates on a whole line at a time, awk prefers to split a line into several fields and process those, so awk is well suited to small, field-oriented data processing.

awk [-F field-separator] 'commands' input-file(s)

awk 'condition1{action1} condition2{action2} ...' filename

[-F separator] is optional, because awk uses spaces and tabs as the default field separators. If the fields in the text you want to process are separated by spaces or tabs, you do not need to specify this option; but if you want to process a file such as /etc/passwd, where each field is separated by a colon, you must specify the -F option.
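For example, to print just the first field (the login name) of /etc/passwd, the colon separator has to be given explicitly:

# /etc/passwd is colon-separated, so -F ':' is required
awk -F ':' '{print $1}' /etc/passwd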

  • NF: the number of fields in the current line ($0)
  • NR: the number of the line awk is currently processing
  • FS: the current field separator; the default is a space

# Original data file
[root@bigdata01 tmp]# cat tmpdata.txt  
1,hadoop,23323
2,spark,234244
3,kafka,897373
4,scrip,123947
5,bangk,627449
6,hadoop,13445
7,hadoop,23454

# Process with awk: -F specifies the separator; $1 is the first column and $2 the second; FS defaults to a space; NF is the number of fields in each line; NR is the current line number
[root@bigdata01 tmp]# cat tmpdata.txt |awk -F ',' '{print $1 "\t" $2 FS "\t" NF "\t" NR}'
1	hadoop,	3	1
2	spark,	3	2
3	kafka,	3	3
4	scrip,	3	4
5	bangk,	3	5
6	hadoop,	3	6
7	hadoop,	3	7

# Print the second column and deduplicate it with uniq (uniq only collapses adjacent duplicates, so hadoop still appears twice below)
[root@bigdata01 tmp]# cat tmpdata.txt |awk -F ',' '{print $2}'| uniq 
hadoop
spark
kafka
scrip
bangk
hadoop
[root@bigdata01 tmp]# 

# Print the second column, deduplicate adjacent lines with uniq, and count the lines (since the data is not sorted, hadoop is counted twice)
[root@bigdata01 tmp]# cat tmpdata.txt |awk -F ',' '{print $2}'| uniq | wc -l
6
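The 'condition {action}' form shown in the awk format above is not demonstrated in the transcript; a small sketch against the same tmpdata.txt (the sum variable and the END block are illustrative additions):

# Print only the rows whose second field is hadoop
awk -F ',' '$2 == "hadoop" {print $0}' tmpdata.txt

# Sum the third column and print the total after the last line is read
awk -F ',' '{sum += $3} END {print "total:", sum}' tmpdata.txt

For a true distinct count of the second column, sort before uniq: awk -F ',' '{print $2}' tmpdata.txt | sort | uniq | wc -l reports 5 rather than 6.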

Combining awk with the common commands above covers almost all day-to-day text-processing needs.
