Common commands for processing text in Linux

Simple text processing comes up often, and Linux ships with powerful command-line tools that make it convenient. This post introduces several commonly used text processing commands.

grep

Line filtering tool: prints the lines of a file that match a given pattern.

Syntax

grep [options] 'keywords' filename

Common options

  • -i ignore case
  • -v invert the match: print the lines that do not match the pattern
  • -w match whole words only
  • -o print only the matched part of each line
  • -c print the number of matching lines instead of the lines themselves
  • -n show line numbers
  • -r search directories recursively
  • -A N print each matching line and the N lines after it
  • -B N print each matching line and the N lines before it
  • -C N print each matching line and the N lines before and after it
  • -l list only the names of files that contain a match
  • -L list only the names of files that contain no match
  • -e specify the pattern explicitly; can be repeated to search for several patterns
  • -E use extended regular expressions
  • ^key matches lines that start with key
  • key$ matches lines that end with key
  • ^$ matches empty lines
  • --color=auto highlight the matched text in color

test

> cat test.txt
Hello
World
Linux
Ubuntu

> grep -inw 'linux' test.txt
3:Linux
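
A few of the other options above can be tried the same way. The following is a minimal sketch, not the only way to use them; the temporary directory and the extra "linux kernel" line are assumptions added for illustration.

```shell
# Hypothetical sample file, created in a temporary directory
tmp=$(mktemp -d)
printf 'Hello\nWorld\nLinux\nUbuntu\nlinux kernel\n' > "$tmp/test.txt"

# -c: count the lines that match, ignoring case (-i)
grep -ic 'linux' "$tmp/test.txt"        # prints 2

# -v: print the lines that do NOT match
grep -iv 'linux' "$tmp/test.txt"        # prints Hello, World, Ubuntu

# -o: print only the matched text, one match per line
grep -io 'linux' "$tmp/test.txt"        # prints Linux, linux

# -A 1: print each matching line plus one line after it
grep -A 1 'World' "$tmp/test.txt"       # prints World, Linux

# ^ and $ anchor the pattern to the start and end of the line
grep -n '^U.*u$' "$tmp/test.txt"        # prints 4:Ubuntu
```

Note that -c counts matching lines, not individual matches, so a line that contains the keyword twice still counts once.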

cut tool

Column extraction tool: extracts selected columns (fields or character ranges) from each line of a text file.

Syntax

cut [options] filename

Common options

  • -c select by character position
  • -d set a custom delimiter, the default is tab \t
  • -f is used with -d to specify which fields to extract

test

> cat test.txt
Hello World
Linux Ubuntu

# Split on spaces and extract the first column
> cut -d ' ' -f1 test.txt
Hello
Linux

# Extract characters 1 to 5 of each line
> cut -c1-5 test.txt
Hello
Linux
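
Field lists can also name several fields or an open-ended range. Here is a small sketch of that; the colon-separated users.txt file and its contents are made up for the example.

```shell
tmp=$(mktemp -d)
# Hypothetical colon-separated records: name:shell:uid
printf 'root:/bin/bash:0\nalice:/bin/zsh:1000\n' > "$tmp/users.txt"

# -d ':' sets the delimiter; -f1,3 keeps fields 1 and 3
cut -d ':' -f1,3 "$tmp/users.txt"
# root:0
# alice:1000

# -f2- keeps field 2 through the end of the line
cut -d ':' -f2- "$tmp/users.txt"
# /bin/bash:0
# /bin/zsh:1000

# -c1-3 keeps characters 1 to 3 of each line
cut -c1-3 "$tmp/users.txt"
# roo
# ali
```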

sort tool

The sort tool sorts the lines of a file. It treats each line as a unit and compares lines character by character, starting from the first character, then outputs them in ascending order. The comparison follows the current locale's collation order; in the C locale this is the ASCII code value.

Syntax

sort [options] file

Common options

  • -u remove duplicate lines
  • -r sort in descending order, the default is ascending
  • -o FILE write the result to FILE; unlike the redirection symbol >, FILE can safely be the input file itself
  • -n compare as numbers, the default is to compare as characters
  • -t set the field separator
  • -k N sort by the key that starts at column N
  • -b ignore whitespace characters at the beginning of each line

test

> cat test.txt
DDD
AAA
CCC
BBB

> sort test.txt
AAA
BBB
CCC
DDD

> cat test.txt
DDD 2
AAA 4
CCC 5
BBB 1

# Split on spaces and sort by the second column in ascending order
> sort -t ' ' -k2 test.txt
BBB 1
DDD 2
AAA 4
CCC 5
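
The single-digit values above sort the same way as characters or as numbers. The sketch below uses a made-up file with a two-digit value to show where -n matters, and combines it with -u, -r and -o; the file names are assumptions for the example.

```shell
tmp=$(mktemp -d)
printf 'DDD 2\nAAA 10\nCCC 5\nBBB 1\nCCC 5\n' > "$tmp/test.txt"

# Character comparison on column 2: "10" sorts before "2"
sort -t ' ' -k2,2 "$tmp/test.txt"
# BBB 1
# AAA 10
# DDD 2
# CCC 5
# CCC 5

# n compares column 2 as a number; -u keeps one line per distinct key
sort -t ' ' -k2,2n -u "$tmp/test.txt"
# BBB 1
# DDD 2
# CCC 5
# AAA 10

# r reverses the order; -o writes the result to a file
sort -t ' ' -k2,2nr -o "$tmp/sorted.txt" "$tmp/test.txt"
cat "$tmp/sorted.txt"
# AAA 10
# CCC 5
# CCC 5
# DDD 2
# BBB 1
```

Writing the key as -k2,2 limits the comparison to column 2 only, whereas -k2 alone compares from column 2 to the end of the line.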

uniq

uniq removes consecutive duplicate lines. Duplicates that are not adjacent are kept, so the input is usually sorted first.

Syntax

uniq [options] file

Common options

  • -u only show lines that are not repeated
  • -c prefix each line with the number of times it occurs
  • -d only show repeated lines, one per group

test

> cat test.txt
AAA
BBB
BBB
bbb
CCC
AAA

> uniq test.txt
AAA
BBB
bbb
CCC
AAA

> uniq -u test.txt
AAA
bbb
CCC
AAA
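
In the output above the two AAA lines both survive because they are not adjacent. A common way around this is to sort first, sketched below. LC_ALL=C is set here only as an assumption to make the sort order predictable (uppercase before lowercase).

```shell
tmp=$(mktemp -d)
printf 'AAA\nBBB\nBBB\nbbb\nCCC\nAAA\n' > "$tmp/test.txt"

# Sorting first makes the two AAA lines adjacent, so uniq can merge them
LC_ALL=C sort "$tmp/test.txt" | uniq
# AAA
# BBB
# CCC
# bbb

# -c prefixes each line with its count (the amount of padding varies)
LC_ALL=C sort "$tmp/test.txt" | uniq -c
# counts: 2 AAA, 2 BBB, 1 CCC, 1 bbb

# -d prints only the lines that occur more than once
LC_ALL=C sort "$tmp/test.txt" | uniq -d
# AAA
# BBB
```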

tee tool

The tee tool reads from standard input and writes to both standard output and one or more files. In other words, it splits the stream in two: one copy is shown on the screen and the other is saved to a file, which is overwritten by default.

Syntax

tee [options] [file]

Common options

  • -a append to existing file instead of overwriting it

test

> echo 'hello world' | tee test.txt
hello world
> cat test.txt
hello world
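
The difference between overwriting and appending with -a can be checked with a short sketch like the one below. The file names log.txt and raw.txt are made up, and the screen copy is sent to /dev/null only to keep the output short.

```shell
tmp=$(mktemp -d)

# Without -a, tee overwrites the file on every run
echo 'first'  | tee "$tmp/log.txt" > /dev/null
echo 'second' | tee "$tmp/log.txt" > /dev/null
cat "$tmp/log.txt"
# second

# With -a, tee appends to the existing file
echo 'third' | tee -a "$tmp/log.txt" > /dev/null
cat "$tmp/log.txt"
# second
# third

# tee in the middle of a pipeline: save the raw data and keep processing it
printf 'b\na\n' | tee "$tmp/raw.txt" | sort
# a
# b
```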

diff

The diff tool compares file differences line by line.

Syntax

diff [options] file1 file2

Common options

  • -b ignore changes in the amount of whitespace
  • -B ignore blank lines
  • -i ignore case
  • -w ignore all whitespace
  • --normal display in normal format (default)
  • -c display in context format
  • -u display in unified format

test

> cat file1.txt
AA
CC
BB
QQ

> cat file2.txt
CC
DD
AA

> diff file1.txt file2.txt
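1d0      # line 1 of the first file must be deleted to match the second file after line 0
< AA     # the second file would need this line added
3,4c2,3  # lines 3-4 of the first file must change to match lines 2-3 of the second file
< BB     # the second file would need this line added
< QQ     # the second file would need this line added
---
> DD     # the first file would need this line added
> AA     # the first file would need this line added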

> diff file1.txt file2.txt -y
AA  <
CC 	CC
BB 	| DD
QQ	| AA

Note

"|" means the two lines differ, "<" means the line exists only in the first file, and ">" means the line exists only in the second file.

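The -u and -i options from the list above can be tried with a small sketch like this one. The two files are made up for the example and differ from the ones used earlier.

```shell
tmp=$(mktemp -d)
printf 'AA\nCC\nBB\n'     > "$tmp/file1.txt"
printf 'AA\ncc\nBB\nDD\n' > "$tmp/file2.txt"

# -u: unified format; lines starting with - come from file1, + from file2
diff -u "$tmp/file1.txt" "$tmp/file2.txt"

# -i: ignore case, so CC and cc count as equal and only DD is reported
diff -i "$tmp/file1.txt" "$tmp/file2.txt"
# 3a4
# > DD

# diff exits with status 0 when the files are identical
diff "$tmp/file1.txt" "$tmp/file1.txt" && echo 'identical'
# identical
```

The unified output also starts with two header lines that name the files, which is why it is the format usually used for patches.
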
paste

The paste tool merges the corresponding lines of several files side by side.

Syntax

paste [options] [file1] [file2]

Common options

  • -d set a custom delimiter, the default is tab
  • -s serial mode: instead of merging the files in parallel, join all lines of each file into a single line, one output line per file

test

> cat file1.txt
AA
CC
BB
QQ

> cat file2.txt
CC
DD
AA

> paste file1.txt file2.txt
AA  CC
CC  DD
BB  AA
QQ
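
The -d and -s options are easier to understand from their output. This is a minimal sketch; the contents of the two files are made up for the example.

```shell
tmp=$(mktemp -d)
printf 'AA\nCC\nBB\n' > "$tmp/file1.txt"
printf '1\n2\n3\n'    > "$tmp/file2.txt"

# -d ',' joins the corresponding lines with a comma instead of a tab
paste -d ',' "$tmp/file1.txt" "$tmp/file2.txt"
# AA,1
# CC,2
# BB,3

# -s joins all lines of each file into one line, one output line per file
paste -s -d ',' "$tmp/file1.txt" "$tmp/file2.txt"
# AA,CC,BB
# 1,2,3

# A common use of -s: turn a column of numbers into a single expression
paste -s -d '+' "$tmp/file2.txt"
# 1+2+3
```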

tr

tr translates, squeezes, or deletes characters read from standard input. It is mainly used to delete control characters or to convert one set of characters into another, such as lowercase into uppercase.

Syntax

tr [options] string1 [string2]

Common options

  • -d delete every input character that appears in string1
  • -s squeeze each run of a repeated character into a single character
  • a-z the lowercase letters a to z
  • A-Z the uppercase letters A to Z
  • 0-9 the digits 0 to 9
  • [:alnum:] all letters and digits
  • [:alpha:] all letters
  • [:blank:] all horizontal whitespace
  • [:cntrl:] all control characters
  • [:digit:] all digits
  • [:graph:] all printable characters (excluding spaces)
  • [:lower:] all lowercase letters
  • [:print:] all printable characters (including spaces)
  • [:punct:] all punctuation characters
  • [:space:] all horizontal and vertical whitespace
  • [:upper:] all uppercase letters

test

# Convert all lowercase letters to uppercase
> echo 'hello WORLD' | tr '[:lower:]' '[:upper:]'
HELLO WORLD
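
Deleting and squeezing work the same way. Here is a short sketch with made-up input strings; the sets are quoted so that the shell does not try to expand them as file name patterns.

```shell
# -d deletes every character in the set (here: all digits)
echo 'a1b2c3' | tr -d '[:digit:]'
# abc

# -s squeezes each run of a repeated character into one
echo 'hello    world' | tr -s ' '
# hello world

# Two sets map characters position by position
echo 'hello' | tr 'a-z' 'A-Z'
# HELLO

# Delete carriage returns, for example from text with Windows line endings
printf 'line1\r\nline2\r\n' | tr -d '\r'
# line1
# line2
```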

Origin blog.csdn.net/flash_love/article/details/132122666