Regular expressions and text processors
Extended regular expression metacharacters
- Extended regular expressions (ERE) extend basic regular expressions with additional metacharacters; they are used by tools such as egrep, grep -E and awk
- Extended metacharacters
1. +: Matches the preceding subexpression one or more times
For example: go+d matches god, good, goood, and so on (at least one o)
2. ?: Matches the preceding subexpression zero or one time
For example: go?d matches gd or god
3. (): Treats the string inside the parentheses as a single unit
For example: (xyz)+ matches xyz repeated one or more times, such as xyz or xyzxyz
4. |: Matches either of the alternatives (logical OR)
For example: good|food matches good or food
g(oo|la)d matches good or glad
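To check the metacharacters above, here is a small POSIX shell sketch that runs each example pattern through grep -E against a made-up word list. The -x flag (match the whole line) is added here so that each pattern must match an entire word rather than part of one.

```shell
#!/bin/sh
# Sample data (made up): one candidate word per line.
words() { printf '%s\n' gd god good goood glad food xyz xyzxyz; }

# -E enables extended regular expressions; -x requires a whole-line match.
echo "go+d:";      words | grep -xE 'go+d'        # prints god, good, goood
echo "go?d:";      words | grep -xE 'go?d'        # prints gd, god
echo "(xyz)+:";    words | grep -xE '(xyz)+'      # prints xyz, xyzxyz
echo "good|food:"; words | grep -xE 'good|food'   # prints good, food
echo "g(oo|la)d:"; words | grep -xE 'g(oo|la)d'   # prints good, glad
```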
Introduction to the awk tool
- A powerful text-editing tool
- Performs complex text operations non-interactively
- Command format
awk [options] 'pattern or condition {action}' file1 file2 ...
awk -f script_file file1 file2 ...
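The two command forms can be compared side by side. In this sketch the input files, their contents and the script name filter.awk are all made up for illustration. The same program (pattern $2 > 4, action print $1) is given once inline and once from a script file loaded with -f.

```shell
#!/bin/sh
# Create two small sample input files in a temporary directory.
dir=$(mktemp -d)
printf 'apple 3\nbanana 5\n' > "$dir/file1"
printf 'cherry 7\n' > "$dir/file2"

# Form 1: program given inline, in single quotes.
# Pattern: $2 > 4 ; action: print the first field.
awk '$2 > 4 { print $1 }' "$dir/file1" "$dir/file2"    # prints banana, cherry

# Form 2: the same program stored in a (hypothetical) script file, loaded with -f.
printf '%s\n' '$2 > 4 { print $1 }' > "$dir/filter.awk"
awk -f "$dir/filter.awk" "$dir/file1" "$dir/file2"     # same output

rm -r "$dir"
```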
- awk provides several built-in variables that can be used directly
FS: The field separator for each line of text; by default, runs of spaces or tabs
NF: The number of fields in the line currently being processed
NR: The line number (ordinal) of the current line
$0: The entire text of the current line
$n: The nth field (column) of the current line, e.g. $1, $2
RS: The record separator; the default is \n, i.e. one record per line
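The following sketch prints several of these variables for each line. The colon-separated records are made-up data in the style of /etc/passwd. FS is set in a BEGIN block, which runs before any input is read.

```shell
#!/bin/sh
# Sample colon-separated records (made-up data).
data='root:x:0:0
alice:x:1000:1000
bob:x:1001:1001'

# For each line: NR = line number, NF = field count, $1 = first field, $0 = whole line.
printf '%s\n' "$data" | awk 'BEGIN { FS = ":" } { print NR, NF, $1, "|", $0 }'
# 1 4 root | root:x:0:0
# 2 4 alice | alice:x:1000:1000
# 3 4 bob | bob:x:1001:1001
```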
- How awk works
awk reads the input line by line and splits each line into fields, using spaces as the delimiter by default. It stores the fields in built-in variables ($1, $2, ..., with the whole line in $0) and then executes the editing actions on the lines that match the pattern or condition.
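This processing cycle can be observed directly. In the sketch below the input is made up: the first line mixes several spaces and a tab, which the default FS treats as a single separator. The second command shows a condition restricting which lines the action runs on.

```shell
#!/bin/sh
# Two lines of sample input; line 1 uses multiple spaces and a tab.
input() { printf 'alpha   beta\tgamma\ndelta epsilon\n'; }

# Default splitting: line 1 has 3 fields, line 2 has 2. $NF is the last field.
input | awk '{ print NR ": " NF " fields, last = " $NF }'
# 1: 3 fields, last = gamma
# 2: 2 fields, last = epsilon

# The condition NF == 2 selects only the lines the action is applied to.
input | awk 'NF == 2 { print "two-field line:", $0 }'
# two-field line: delta epsilon
```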
sort tool
- Sorts lines according to different data types
Character sorting
Numeric sorting
- Syntax
sort [options] [file...]
- Common options
-f: Ignore case
-b: Ignore leading blanks on each line
-M: Sort by month
-r: Reverse sort
-u: Like uniq, output only one copy of each set of identical lines
-t: Specify the field separator; by default, fields are separated by blanks (spaces or tabs)
-o <output file>: Write the sorted result to the specified file
-k: Specify the sort key (the field or field range to sort on)
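The sketch below combines several of these options on made-up name:dept:salary records. It also uses -n (numeric sort), which is not in the list above, to contrast character sorting with numeric sorting. LC_ALL=C is set as an assumption so that the ordering is plain byte order on any system.

```shell
#!/bin/sh
# Byte-order comparisons regardless of the user's locale.
export LC_ALL=C

# Sample data (made up): name:dept:salary, one record per line.
data='carol:ops:900
alice:dev:1500
bob:dev:1200
dave:ops:80'

# -t: sets ":" as separator; -k3,3n sorts numerically on field 3 only.
printf '%s\n' "$data" | sort -t: -k3,3n    # dave, carol, bob, alice

# Adding r to the key reverses it: highest salary first.
printf '%s\n' "$data" | sort -t: -k3,3nr   # alice, bob, carol, dave

# Without n, field 3 is compared as text: "1200" < "1500" < "80" < "900".
printf '%s\n' "$data" | sort -t: -k3,3     # bob, alice, dave, carol

# -u keeps one copy of duplicate lines; -o writes to a file instead of stdout.
out=$(mktemp)
printf 'pear\napple\npear\n' | sort -u -o "$out"
cat "$out"                                  # apple, pear
rm -f "$out"
```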
Use of the uniq tool
- Commonly used options for uniq
-c: Count occurrences, prefixing each output line with the number of times it repeats
-d: Only display lines that are repeated
-u: Only display lines that appear once.
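uniq only compares adjacent lines, so its input is normally sorted first. This sketch applies the three options to a made-up fruit list. The exact column padding of uniq -c varies between implementations, so only the counts are noted in the comments.

```shell
#!/bin/sh
export LC_ALL=C

# Sample data (made up), deliberately unsorted.
fruits='pear
apple
pear
kiwi
apple
pear'

printf '%s\n' "$fruits" | sort | uniq -c   # apple 2, kiwi 1, pear 3
printf '%s\n' "$fruits" | sort | uniq -d   # apple, pear
printf '%s\n' "$fruits" | sort | uniq -u   # kiwi
```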
Use of the tr tool
- The commonly used options include the following
-c: use the complement of the first character set, so the operation applies to every character that is not in it
-d: delete all characters that belong to the first character set
-s: represent consecutively repeated characters as a single character
-t: first truncate the first character set to the length of the second, then perform the replacement
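A short sketch of each option on made-up strings. It assumes GNU coreutils tr: the -t option is a GNU feature and may be missing from other implementations. The last two commands contrast -t with the default behaviour, where GNU tr pads the second set by repeating its last character.

```shell
#!/bin/sh
export LC_ALL=C

echo 'hello world' | tr 'a-z' 'A-Z'        # HELLO WORLD (plain translation)
echo 'a1b2c3' | tr -d '0-9'                # abc (delete digits)
echo 'too    many   spaces' | tr -s ' '    # too many spaces (squeeze runs)

# -c with -d deletes everything NOT in the set, including the newline,
# so a separate echo adds the line break back.
echo 'tel: 555-0100' | tr -cd '0-9'; echo  # 5550100

# Default: SET2 is padded, so c also becomes y.
echo 'abc' | tr 'abc' 'xy'                 # xyy
# -t: SET1 is truncated to "ab", so c is left unchanged.
echo 'abc' | tr -t 'abc' 'xy'              # xyc
```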