Regular expressions and text processors

Extended regular expression metacharacters

  • Extended regular expressions build on basic regular expressions by adding more metacharacters
  • Extended metacharacters
    1. +: Match the preceding sub-expression one or more times
    For example: go+d requires at least one o, so it matches god, good, goood, and so on
    2. ?: Match the preceding sub-expression 0 or 1 time
    For example: go?d will match gd or god
    3. (): Treat the string inside the parentheses as a single unit
    For example: (xyz)+ will match xyz as a whole one or more times, such as xyz or xyzxyz
    4. |: Alternation; match either the expression on the left or the one on the right
    For example: good|food will match good or food
    g(oo|la)d will match good or glad
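
The metacharacters above can be tried out with grep -E, which enables extended regular expressions. The following is a minimal sketch; the word list is made up for illustration, and the ^ and $ anchors are added so that each pattern has to match the whole line.

```shell
# Hypothetical sample words, one per line.
words='gd
god
good
goood
glad
food
xyzxyz'

# + : one or more "o" between g and d
plus=$(printf '%s\n' "$words" | grep -E '^go+d$')

# ? : zero or one "o" between g and d
optional=$(printf '%s\n' "$words" | grep -E '^go?d$')

# () with + : the group "xyz" repeated one or more times
group=$(printf '%s\n' "$words" | grep -E '^(xyz)+$')

# | inside a group: either "oo" or "la" between g and d
either=$(printf '%s\n' "$words" | grep -E '^g(oo|la)d$')

printf '%s\n' "$plus" "$optional" "$group" "$either"
```

Without the anchors, grep -E prints any line that contains a match anywhere, so go+d would also select a line such as "goodbye".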

Introduction to the awk tool

  • A powerful text-processing tool
  • Performs complex text operations without interaction
  • Command format

awk [options] 'pattern or condition {action}' file1 file2
awk -f script_file file1 file2

  • awk provides several special built-in variables (they can be used directly)

FS: the field separator for each line of text; the default is whitespace (spaces or tabs)
NF: the number of fields in the line currently being processed
NR: the line number (ordinal number) of the current line
$0: the entire content of the current line
$n: the nth field (nth column) of the current line
RS: the record separator; the default is \n, that is, one record per line
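
The built-in variables can be seen in action with a short experiment. This is a sketch only; the colon-separated records are invented sample data that loosely resemble /etc/passwd.

```shell
# Hypothetical colon-separated records.
data='root:x:0:0
daemon:x:1:1
nobody:x:65534'

# -F: sets FS to ":"; NR is the record number, NF the field count, $1 the first field.
summary=$(printf '%s\n' "$data" | awk -F: '{ print NR, NF, $1 }')

# $NF is the last field of each record, whatever the field count is.
last=$(printf '%s\n' "$data" | awk -F: '{ print $NF }')

# Setting RS to "," makes each comma-separated item its own record.
records=$(printf 'a,b,c' | awk 'BEGIN { RS = "," } { print NR ": " $0 }')

printf '%s\n' "$summary" "$last" "$records"
```

The third record has only three fields, so NF reports 3 for it and $NF picks up 65534.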

  • How awk works

awk reads the text line by line, splits each line into fields using whitespace as the default delimiter, saves the fields to built-in variables ($1, $2, and so on), and executes the editing command on the lines that match the pattern or condition
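
The read, split, test, and execute cycle described above can be sketched as follows. The names and scores are hypothetical sample data, and the threshold of 60 is chosen only for illustration.

```shell
# Hypothetical whitespace-separated records: name, score.
scores='alice 82
bob   47
carol 91'

# The condition ($2 >= 60) is tested on every line; the action runs only when it is true.
passed=$(printf '%s\n' "$scores" | awk '$2 >= 60 { print $1 }')

# A regular-expression pattern selects lines; the END block runs after the last line.
count=$(printf '%s\n' "$scores" | awk '/^[ab]/ { n++ } END { print n }')

printf '%s\n' "$passed" "$count"
```

The run of spaces in the second line is treated as a single delimiter, which is how the default field splitting behaves.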

sort tool

  • Sort according to different data types

Character sorting
Number sorting

  • Syntax

sort [options] file

  • Common options

-f: Ignore case
-b: Ignore leading blanks on each line
-M: Sort by month
-r: Reverse sort
-u: Same as uniq; output only one line for each group of identical lines
-t: Specify the field separator; by default, fields are separated by blanks (spaces or tabs)
-o <output file>: Write the sorted results to the specified file
-k: Specify the field (key) to sort on
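
A few of these options combined, as a minimal sketch. The fruit data is made up; -n (numeric comparison) is not in the list above but is the usual way to get the number sorting mentioned earlier; LC_ALL=C is set only to make the byte-wise ordering predictable across systems.

```shell
# Hypothetical "name:quantity" records, with one duplicate line.
data='banana:3
apple:10
cherry:2
apple:10'

# -t: uses ":" as the separator, -k2,2 sorts on the second field, -n compares numerically.
by_number=$(printf '%s\n' "$data" | LC_ALL=C sort -t: -k2,2 -n)

# -r reverses the order, -u keeps one copy of each identical line.
reverse_unique=$(printf '%s\n' "$data" | LC_ALL=C sort -r -u)

# -f ignores case, so "A" sorts before "b" even though it is upper case.
folded=$(printf 'b\nA\nc\n' | LC_ALL=C sort -f)

printf '%s\n' "$by_number" "$reverse_unique" "$folded"
```

Without -n the second field would be compared as text, and 10 would sort before 2 and 3.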

Use of the uniq tool

  • Commonly used options for uniq

-c: Count; prefix each line with the number of times it occurs
-d: Only display repeated lines
-u: Only display lines that appear once
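
The three options can be compared on the same input. This is a sketch with invented data; because uniq only compares adjacent lines, the input is sorted first. The width of the count column printed by -c differs between implementations, so awk is used here to normalize it.

```shell
# Hypothetical input with non-adjacent duplicates.
lines='a
a
b
a
c
c'

sorted=$(printf '%s\n' "$lines" | LC_ALL=C sort)

# -c: prefix each distinct line with its count (whitespace normalized by awk).
counts=$(printf '%s\n' "$sorted" | uniq -c | awk '{ print $1, $2 }')

# -d: only lines that occur more than once.
repeated=$(printf '%s\n' "$sorted" | uniq -d)

# -u: only lines that occur exactly once.
single=$(printf '%s\n' "$sorted" | uniq -u)

printf '%s\n' "$counts" "$repeated" "$single"
```

Running uniq -c on the unsorted input would report "a" twice, once with a count of 2 and once with a count of 1.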

Use of the tr tool

  • The commonly used options include the following

-c: Use the complement of the first character set, that is, operate on all characters that do not belong to it
-d: Delete all characters that belong to the first character set
-s: Squeeze each run of consecutively repeated characters into a single character
-t: Truncate the first character set to the length of the second character set before translating
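
A short sketch of the more portable options follows; the sample string is made up. The -t option is left out because not every tr implementation provides it.

```shell
# Hypothetical sample text with a run of spaces and some digits.
text='hello   world 123'

# No option: translate each character of the first set to the matching one in the second.
upper=$(printf '%s' "$text" | tr 'a-z' 'A-Z')

# -d: delete every character that belongs to the set.
no_digits=$(printf '%s' "$text" | tr -d '0-9')

# -s: squeeze each run of spaces into a single space.
squeezed=$(printf '%s' "$text" | tr -s ' ')

# -c with -d: delete everything that is NOT a digit.
digits_only=$(printf '%s' "$text" | tr -cd '0-9')

printf '[%s]\n' "$upper" "$no_digits" "$squeezed" "$digits_only"
```

The brackets in the final printf are there only to make leading and trailing spaces visible in the output.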

Origin blog.csdn.net/weixin_50346902/article/details/109661935