Some operations on Linux files

Table of contents

Single file

1. View the number of lines in a file

2. Find and process duplicate lines in a file (uniq command)

2.1 Remove duplicate lines

2.2 Remove duplicate lines and count how many times each line appears

2.3 Show only the lines that appear exactly once

2.4 Show only the lines that appear more than once

2.5 Remove duplicate lines

2.6 Remove duplicate lines and count how many times each line appears

2.7 Show only the lines that appear exactly once

2.8 Show only the lines that appear more than once

Multiple files

1. Compare the contents of two files

2. Merge files

2.1 Merge two files

2.2 Intersection and union of two files


Single file

1. View the number of lines in the file

wc -l file # print the number of lines in file
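When the count is needed inside a script, the file name that wc prints after the number tends to get in the way. The sketch below, which assumes a throwaway sample file created just for illustration, reads the file from standard input so that only the number is printed.

```shell
# Create a small sample file (hypothetical name) with three lines.
tmpdir=$(mktemp -d)
printf 'a\nb\nc\n' > "$tmpdir/sample.txt"

# With a file argument, wc prints the count followed by the file name.
wc -l "$tmpdir/sample.txt"

# Reading from standard input prints only the count.
lines=$(wc -l < "$tmpdir/sample.txt")
lines=$((lines))   # drop the padding that some wc implementations add
echo "$lines"

rm -rf "$tmpdir"
```

Note that wc -l counts newline characters, so a last line without a trailing newline is not included in the total.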

2. Find and process duplicate lines in a file (uniq command)

Note when using uniq:
uniq only detects duplicate lines when they are adjacent, so on text it is usually combined with the sort command. To sort the input and remove duplicates in one step, use sort -u.
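The adjacency rule is easy to see on a tiny input. The following is a minimal sketch using a made-up sample file in which equal lines are never next to each other; it compares plain uniq, sort piped into uniq, and sort -u.

```shell
tmpdir=$(mktemp -d)
printf 'b\na\nb\na\n' > "$tmpdir/sample.txt"   # hypothetical sample data

# No two equal lines are adjacent, so uniq removes nothing.
plain=$(uniq "$tmpdir/sample.txt")

# Sorting first puts equal lines next to each other, so uniq can drop them.
sorted=$(sort "$tmpdir/sample.txt" | uniq)

# sort -u sorts and removes duplicates in a single command.
short=$(sort -u "$tmpdir/sample.txt")

printf '%s\n---\n%s\n---\n%s\n' "$plain" "$sorted" "$short"
rm -rf "$tmpdir"
```

The first section of the output still has four lines, while the second and third sections contain only a and b.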

  • Duplicate lines are adjacent

2.1 Remove duplicate lines

uniq file

2.2 Remove duplicate lines and count how many times each line appears

uniq -c file

2.3 Show only the lines that appear exactly once

uniq -u file

2.4 Show only the lines that appear more than once

uniq -d file

  • Duplicate lines are not adjacent: sort first

2.5 Remove duplicate lines

sort file | uniq

2.6 Remove duplicate lines and count how many times each line appears

sort file | uniq -c

2.7 Show only the lines that appear exactly once

sort file | uniq -u

2.8 Show only the lines that appear more than once

sort file | uniq -d
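A common follow-up to the counting form is to rank lines by how often they occur. The sketch below is one possible way to do it, assuming a small made-up sample file: the counts from uniq -c are passed through a second, numeric, reversed sort.

```shell
tmpdir=$(mktemp -d)
printf 'x\ny\nx\nz\nx\ny\n' > "$tmpdir/sample.txt"   # hypothetical sample data

# Count each distinct line, then order by count, highest first.
ranked=$(sort "$tmpdir/sample.txt" | uniq -c | sort -rn)
echo "$ranked"

# The first row holds the most frequent line and its count.
top_count=$(echo "$ranked" | head -n 1 | awk '{print $1}')
top_line=$(echo "$ranked" | head -n 1 | awk '{print $2}')
echo "most frequent: $top_line ($top_count times)"

rm -rf "$tmpdir"
```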

Multiple files

1. Compare the contents of two files

  • Lines in common: print the lines of file2 that contain, as a whole word, a match for some line of file1
grep -wf [file1] [file2] ...
  • Lines that differ: print the lines of file2 that match no line of file1
grep -wvf [file1] [file2] ...
  • Compare the files line by line and show how the first file must change to match the second
diff [options] file1 file2

diff file1 file2 # normal format
diff -c file1 file2 # context format
diff -u file1 file2 # unified format

   Common options

Option     Meaning
-b         ignore changes in the amount of white space
-B         ignore changes that only insert or delete blank lines
-i         ignore case differences
-w         ignore all white space
--normal   output in normal format (default)
-c         output in context format
-u         output in unified format

For example:

[wqf@hello rm_test]$ cat file1.txt 
aaaa
111
hello world
222
333
bbb

[wqf@hello rm_test]$ cat file2.txt
aaa
hello
111
222
bbb
333
world

[wqf@hello rm_test]$ grep -wf file1.txt file2.txt # lines in common
111
222
bbb
333
[wqf@hello rm_test]$ grep -wvf file1.txt file2.txt # lines that differ
aaa
hello
world
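One caveat with grep -wf is that each line of the first file is treated as a regular expression and matched as a word, not as a whole line. If exact whole-line matching is what is wanted, -F (fixed strings) together with -x (match the entire line) is a possible alternative. The sketch below uses two made-up files, a.txt and b.txt, to show the difference.

```shell
tmpdir=$(mktemp -d)
printf 'hello\n111\n' > "$tmpdir/a.txt"                # hypothetical pattern file
printf 'hello world\n111\n222\n' > "$tmpdir/b.txt"     # hypothetical data file

# -w: "hello" matches as a word inside "hello world", so that line is reported.
by_word=$(grep -wf "$tmpdir/a.txt" "$tmpdir/b.txt")

# -F -x: patterns are literal strings that must equal the whole line.
by_line=$(grep -Fxf "$tmpdir/a.txt" "$tmpdir/b.txt")

printf 'word match:\n%s\nwhole-line match:\n%s\n' "$by_word" "$by_line"
rm -rf "$tmpdir"
```

With these inputs the word match reports both "hello world" and "111", while the whole-line match reports only "111".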

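[wqf@hello rm_test]$ diff file1.txt file2.txt
1c1,2         line 1 of file1 must change (c = change) to match lines 1-2 of file2
< aaaa        "<" marks a line from the left-hand file (file1)
---           "---" separates the file1 lines from the file2 lines
> aaa         ">" marks a line from the right-hand file (file2)
> hello
3d3           line 3 of file1 must be deleted (d = delete) to match file2 at line 3
< hello world
5d4           line 5 of file1 must be deleted to match file2 at line 4
< 333
6a6,7         after line 6 of file1, lines 6-7 of file2 must be added (a = add)
> 333         the lines to add are 333 and world from file2
> world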

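[wqf@hello rm_test]$ diff -c file1.txt file2.txt
The first two lines of the output give the names and timestamps of the files being compared; "***" marks file1 and "---" marks file2.
*** file1.txt   2023-01-05 09:34:40.868721925 +0800
--- file2.txt   2023-01-05 09:35:02.513082078 +0800
***************     separator
*** 1,6 ****        "***" introduces file1; 1,6 means lines 1 to 6
! aaaa              "!" means the line must change to match file2
  111               
- hello world       "-" means the line must be deleted to match file2
  222
- 333               "-" means the line must be deleted to match file2
  bbb
--- 1,7 ----        "---" introduces file2; 1,7 means lines 1 to 7
! aaa               file1 must be changed to contain this line
! hello             file1 must be changed to contain this line
  111
  222
  bbb
+ 333               "+" means the line must be added to file1 to match file2
+ world             "+" means the line must be added to file1 to match file2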

[wqf@hello rm_test]$ diff -u file1.txt file2.txt
--- file1.txt   2023-01-05 09:34:40.868721925 +0800
+++ file2.txt   2023-01-05 09:35:02.513082078 +0800
@@ -1,6 +1,7 @@   lines 1-6 of file1 compared with lines 1-7 of file2
-aaaa    "-": aaaa must be deleted from file1
+aaa     "+": aaa must be added to file1
+hello   "+": hello must be added to file1
 111      unchanged
-hello world "-": hello world must be deleted from file1
 222      unchanged
-333      "-": 333 must be deleted from file1
 bbb      unchanged
+333      "+": 333 must be added to file1
+world    "+": world must be added to file1
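When a script only needs a yes or no answer to "are these two files the same?", the exit status is usually more convenient than the printed diff. The following is a minimal sketch with made-up file names; it relies on diff exiting with 0 for identical files and 1 for differing files, and on cmp -s, which prints nothing and reports only through its exit status.

```shell
tmpdir=$(mktemp -d)
printf 'aaa\n111\n' > "$tmpdir/one.txt"     # hypothetical sample files
printf 'aaa\n111\n' > "$tmpdir/same.txt"
printf 'aaa\n222\n' > "$tmpdir/other.txt"

# diff -q suppresses the line-by-line report; only the exit status is used here.
if diff -q "$tmpdir/one.txt" "$tmpdir/same.txt" >/dev/null; then
  same_result=identical
else
  same_result=different
fi

# cmp -s compares byte by byte and stays silent.
if cmp -s "$tmpdir/one.txt" "$tmpdir/other.txt"; then
  other_result=identical
else
  other_result=different
fi

echo "one vs same:  $same_result"
echo "one vs other: $other_result"
rm -rf "$tmpdir"
```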

Tip

Scenario: sometimes one file serves as the reference and other files must be brought in line with it. When there are many changes, this can be done by patching.

1) Find the differences and write them to a patch file
[wqf@hello rm_test]$ diff -uN file1.txt file2.txt > file.patch
-u: unified format
-N: treat a missing file as an empty file

2) Apply the patch to the file
[wqf@hello rm_test]$ patch file1.txt file.patch
patching file file1.txt

3) Verify the result
[wqf@hello rm_test]$ diff file1.txt file2.txt
[wqf@hello rm_test]$

2. Merge files

2.1 Merge two files

cat file1 file2 > file3 # one file on top, the other below
paste file1 file2 > file3 # one file on the left, the other on the right
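The difference between the two merges is easiest to see on a small input. The sketch below uses two made-up two-line files; it also passes -d to paste to join the columns with a comma, since paste separates them with a tab by default.

```shell
tmpdir=$(mktemp -d)
printf '1\n2\n' > "$tmpdir/left.txt"    # hypothetical sample files
printf 'a\nb\n' > "$tmpdir/right.txt"

# cat appends the second file below the first: 1, 2, a, b on four lines.
stacked=$(cat "$tmpdir/left.txt" "$tmpdir/right.txt")

# paste joins line N of each file side by side: "1,a" and "2,b".
side_by_side=$(paste -d ',' "$tmpdir/left.txt" "$tmpdir/right.txt")

printf 'cat:\n%s\npaste:\n%s\n' "$stacked" "$side_by_side"
rm -rf "$tmpdir"
```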

2.2 Intersection and union of two files

Precondition: neither file may contain duplicate lines of its own

2.2.1 Take the union of the two files (keep one copy of each duplicated line)

cat file1 file2 | sort | uniq > file3

2.2.2 Take the intersection of the two files (keep only lines that exist in both files)

cat file1 file2 | sort | uniq -d > file3

2.2.3 Remove the intersection from the union (keep only lines that exist in exactly one file)

cat file1 file2 | sort | uniq -u > file3
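The comm command is another option for these set operations, and it can also tell which file a unique line came from. The following is a sketch on made-up files, not a replacement for the pipelines above; comm expects both inputs to be sorted, so each file is sorted first.

```shell
tmpdir=$(mktemp -d)
printf 'banana\napple\ncherry\n' > "$tmpdir/a.txt"   # hypothetical sample files
printf 'date\ncherry\nbanana\n' > "$tmpdir/b.txt"

# comm requires sorted input.
sort "$tmpdir/a.txt" > "$tmpdir/a.sorted"
sort "$tmpdir/b.txt" > "$tmpdir/b.sorted"

# comm prints three columns: only in file 1, only in file 2, in both.
# -1, -2 and -3 each suppress the column with that number.
both=$(comm -12 "$tmpdir/a.sorted" "$tmpdir/b.sorted")     # intersection
only_a=$(comm -23 "$tmpdir/a.sorted" "$tmpdir/b.sorted")   # lines only in a.txt
only_b=$(comm -13 "$tmpdir/a.sorted" "$tmpdir/b.sorted")   # lines only in b.txt

printf 'both:\n%s\nonly a:\n%s\nonly b:\n%s\n' "$both" "$only_a" "$only_b"
rm -rf "$tmpdir"
```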

Origin blog.csdn.net/sodaloveer/article/details/125427502