Detailed explanation of shell scripts (7): regular expressions, sort, uniq, tr, cut
- One, sort command-sort
- Two, duplicate line removal command-uniq
- Three, character conversion command-tr
- Four, field extraction command-cut
- Five, regular expressions
- 1. Common metacharacters in basic regular expressions (supported by: egrep, awk, grep, sed)
- 2. Extended regular expression metacharacters (supported by: egrep, awk)
- 3. Example
- ①. Display the mobile phone numbers in the file that start with 13 or 15, then display the regional landline numbers
- ②. Display the email addresses whose user name starts with a letter, contains at most two "-" or "." symbols in the middle, does not end with one of these symbols, and is at least 6 characters long
One, sort command-sort
- Sorts the lines of a file, comparing them as text by default or according to other data types (for example, numerically)
1. Format
sort [options] [file...]
2. Common options
Options | Description |
---|---|
-f | Ignore case; without -f, uppercase letters sort before lowercase letters in the C locale |
-b | Ignore leading blanks (spaces and tabs) on each line |
-n | Sort numerically |
-r | Reverse the sort order |
-u | Output only one copy of identical lines (deduplicate), similar to uniq |
-t | Specify the field separator; without -t, fields are separated by runs of blanks (spaces or tabs), not by the tab alone |
-k | Specify the sort key, for example -k 2 sorts on the second field (to the end of the line) and -k 2,2 on the second field only |
-o <output file> | Write the sorted result to the specified file |
3. Example
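The options above can be illustrated with a small script. This is a sketch under assumptions: the sample data is hypothetical, and LC_ALL=C is set so the byte-order results are predictable (other locales may order mixed-case text differently). The comments list the output lines in order.

```shell
#!/bin/bash
# Use the C locale so that character ordering is predictable.
export LC_ALL=C

# Hypothetical sample file: one "name:score" record per line.
data=$(mktemp)
trap 'rm -f "$data"' EXIT
printf '%s\n' 'tom:90' 'amy:85' 'Bob:100' 'amy:85' > "$data"

# Plain sort: byte order, so "Bob" (uppercase) comes first.
by_text=$(sort "$data")                      # Bob:100 amy:85 amy:85 tom:90

# -f folds case, so "Bob" now sorts between "amy" and "tom".
by_case=$(sort -f "$data")                   # amy:85 amy:85 Bob:100 tom:90

# -t ':' -k 2 -n -r: split on ':', compare field 2 numerically, descending.
by_score=$(sort -t ':' -k 2 -n -r "$data")   # Bob:100 tom:90 amy:85 amy:85

# -u keeps a single copy of identical lines.
unique=$(sort -u "$data")                    # Bob:100 amy:85 tom:90

# -o writes the result to a file; the output file may be the input file itself.
sort -o "$data" -t ':' -k 2 -n "$data"
in_place=$(cat "$data")                      # amy:85 amy:85 tom:90 Bob:100

printf '%s\n' "$by_text" --- "$by_case" --- "$by_score" --- "$unique" --- "$in_place"
```

Note that -n matters for the score field: compared as text, "100" would sort before "85".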
Two, duplicate line removal command-uniq
- Reports or omits repeated lines in a file; because it only compares adjacent lines, it is usually combined with the sort command
1. Format
uniq [options] [input_file [output_file]]
2. Common options
Options | Description |
---|---|
-c | Merge repeated lines and prefix each line with the number of times it occurs |
-d | Show only repeated lines (one copy of each) |
-u | Show only lines that are not repeated |
3. Example
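A short sketch of why uniq is normally preceded by sort. The input is hypothetical; the two "b" lines are deliberately non-adjacent. The width of the count column printed by -c differs between implementations, so only the counts themselves should be relied on.

```shell
#!/bin/bash
export LC_ALL=C

# Hypothetical input; the two "b" lines are deliberately not adjacent.
log=$(mktemp)
trap 'rm -f "$log"' EXIT
printf '%s\n' a a b c b > "$log"

# uniq alone only merges adjacent duplicates: a b c b
adjacent_only=$(uniq "$log")

# Sorting first groups identical lines, so every duplicate is merged: a b c
deduped=$(sort "$log" | uniq)

# -c prefixes each line with its count (2 a, 2 b, 1 c).
counted=$(sort "$log" | uniq -c)

# -d prints one copy of each repeated line: a b
repeated=$(sort "$log" | uniq -d)

# -u prints only the lines that occur once: c
once=$(sort "$log" | uniq -u)

printf '%s\n' "$adjacent_only" --- "$deduped" --- "$counted" --- "$repeated" --- "$once"
```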
Three, character conversion command-tr
- Translates, squeezes (compresses) or deletes characters read from standard input and writes the result to standard output
1. Format
tr [options] character_set_1 [character_set_2]
2. Common options
Options | Description |
---|---|
-c | Use the complement of character set 1: the characters NOT in set 1 (including the newline \n) are the ones replaced with set 2 or deleted |
-d | Delete all characters belonging to character set 1 |
-s | Squeeze each run of a repeated character into a single occurrence; the characters checked are those of the last set given (set 2 when translating, otherwise set 1) |
-t | Truncate character set 1 to the length of character set 2 before translating (GNU tr) |
3. Parameters
- Character set 1: specifies the original set of characters to be translated or deleted. A translation also requires character set 2 as the target set; a deletion (-d) needs only character set 1.
- Character set 2: specifies the target set of characters that set 1 is translated into.
4. Example
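The following sketch exercises the options and parameters above on made-up strings. The last two lines rely on GNU tr behavior (-t, and padding a short set 2 with its last character); other implementations may differ.

```shell
#!/bin/bash
export LC_ALL=C

# Translate: map each lowercase letter to its uppercase counterpart.
upper=$(echo 'hello shell' | tr 'a-z' 'A-Z')          # HELLO SHELL

# -s: squeeze runs of spaces into a single space.
squeezed=$(echo 'a    b     c' | tr -s ' ')           # a b c

# -d: delete every character in set 1 (here, all digits).
no_digits=$(echo 'abc123def' | tr -d '0-9')           # abcdef

# -c with -d: delete everything NOT in set 1, including the trailing newline.
digits_only=$(echo 'tel: 010-8888' | tr -cd '0-9')    # 0108888

# Replace a separator with a newline: one field per line.
fields=$(echo 'root:x:0' | tr ':' '\n')               # root, x, 0

# GNU tr: a shorter set 2 is padded with its last character ("xy" -> "xyy").
padded=$(echo 'abc' | tr 'abc' 'xy')                  # xyy
# GNU tr: -t instead truncates set 1 to "ab", so "c" is left unchanged.
truncated=$(echo 'abc' | tr -t 'abc' 'xy')            # xyc

printf '%s\n' "$upper" "$squeezed" "$no_digits" "$digits_only" "$fields" "$padded" "$truncated"
```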
Four, field extraction command-cut
- The cut command extracts a section from each line of its input: selected bytes (-b), characters (-c) or delimiter-separated fields (-f), and writes them to standard output
1. Format
cut options [file...]
2. Common options
Options | Description |
---|---|
-b | Select by bytes: display only the bytes in the specified range of each line |
-c | Select by characters: display only the characters in the specified range of each line |
-d | Use a custom field delimiter; the default is the tab character |
-f | Display only the specified fields; usually used with -d |
-n | Used with -b: do not split multibyte characters (GNU cut accepts this option but ignores it) |
--complement | Complement the selection: display the bytes, characters or fields that were not selected |
--output-delimiter=STRING | Use STRING as the field separator in the output |
3. Example
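A sketch of the cut options on hypothetical /etc/passwd-style records (the user "alice" is made up). --complement and --output-delimiter are GNU cut options and may be missing from other implementations.

```shell
#!/bin/bash
export LC_ALL=C

# Hypothetical /etc/passwd-style records (fields separated by ':').
users_file=$(mktemp)
trap 'rm -f "$users_file"' EXIT
printf '%s\n' 'root:x:0:0:root:/root:/bin/bash' \
              'alice:x:1000:1000::/home/alice:/bin/sh' > "$users_file"

# -d ':' -f 1: the first field (user name): root, alice
names=$(cut -d ':' -f 1 "$users_file")

# -f 1,7: user name and login shell, still joined by ':'
name_shell=$(cut -d ':' -f 1,7 "$users_file")

# -c 1-3: the first three characters of each line: roo, ali
first3=$(cut -c 1-3 "$users_file")

# -b 2-4: bytes 2 to 4 (same as -c for ASCII text): ell
middle=$(echo 'hello' | cut -b 2-4)

# GNU cut: --complement prints every field except field 1.
rest=$(cut --complement -d ':' -f 1 "$users_file")

# GNU cut: --output-delimiter joins the selected fields with a space.
name_uid=$(cut -d ':' -f 1,3 --output-delimiter=' ' "$users_file")

printf '%s\n' "$names" --- "$name_shell" --- "$first3" --- "$middle" --- "$rest" --- "$name_uid"
```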
Five, regular expressions
- Regular expressions are often used in conditional statements to check whether a string matches a certain format
- A regular expression is made up of ordinary characters and metacharacters
- Ordinary characters include uppercase and lowercase letters, digits, punctuation marks and some other symbols
- Metacharacters are characters with a special meaning in a regular expression. They can specify how the leading character (the character before the metacharacter) may appear in the target text
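As a sketch of using a regular expression in a conditional statement: the helper names is_integer and is_integer_posix are hypothetical. The first uses bash's [[ string =~ regex ]] test, which matches extended regular expressions; the second hands the match to grep -E for shells without [[ ]].

```shell
#!/bin/bash
# Hypothetical helper: succeeds if $1 is an optional minus sign followed by digits.
# [[ string =~ regex ]] is a bash feature and uses extended regular expressions.
is_integer() {
  [[ $1 =~ ^-?[0-9]+$ ]]
}

# Portable alternative: let grep -E do the matching (-q: exit status only, no output).
is_integer_posix() {
  printf '%s\n' "$1" | grep -Eq '^-?[0-9]+$'
}

for s in 42 -7 4a2 ''; do
  if is_integer "$s"; then
    echo "'$s' is an integer"
  else
    echo "'$s' is not an integer"
  fi
done
```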
1. Common metacharacters in basic regular expressions (supported by: egrep, awk, grep, sed)
Metacharacter | Description |
---|---|
\ | Escape character: removes the special meaning of the character that follows it, for example: \$, \., \* |
^ | Matches the start of the line, for example: ^a, ^the, ^#; ^[a-z] matches a line that starts with a lowercase letter |
$ | Matches the end of the line, for example: word$; ^$ matches an empty line |
. | Matches any single character except the newline \n, for example: go.d, g...d |
* | Matches the preceding subexpression zero or more times, for example: goo*d, go.*d |
[list] | Matches any single character in the list, for example: go[ola]d, [abc], [a-z], [a-z0-9]; [0-9] matches any digit |
[^list] | Matches any single character that is not in the list, for example: [^0-9], [^A-Z0-9]; [^a-z] matches any character that is not a lowercase letter |
\{n\} | Matches the preceding subexpression exactly n times, for example: go\{2\}d; '[0-9]\{2\}' matches two digits |
\{n,\} | Matches the preceding subexpression at least n times, for example: go\{2,\}d; '[0-9]\{2,\}' matches two or more digits |
\{n,m\} | Matches the preceding subexpression n to m times, for example: go\{2,3\}d; '[0-9]\{2,3\}' matches two to three digits |
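Note: in basic regular expressions (grep, sed) the braces of an interval must be escaped as \{ and \}; egrep and awk use extended regular expressions, so {n}, {n,} and {n,m} are written without the "\"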
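A grep sketch of the basic metacharacters above, run against a hypothetical word list. Each pattern is anchored with ^ and $ so it must match the whole line; note the escaped braces, since plain grep uses basic regular expressions.

```shell
#!/bin/bash
export LC_ALL=C

# Hypothetical word list, one entry per line (one line is empty).
words=$(mktemp)
trap 'rm -f "$words"' EXIT
printf '%s\n' gd god good goood gold glad '' '# comment' 'The end' > "$words"

empty_lines=$(grep -c '^$' "$words")              # 1
comments=$(grep '^#' "$words")                    # # comment
zero_or_more=$(grep '^go*d$' "$words")            # gd god good goood
any_two=$(grep '^g..d$' "$words")                 # good gold glad
in_list=$(grep '^go[ol]d$' "$words")              # good gold
not_lower=$(grep '^[^a-z]' "$words")              # # comment, The end
exactly_two=$(grep '^go\{2\}d$' "$words")         # good
two_or_more=$(grep '^go\{2,\}d$' "$words")        # good goood
two_to_three=$(grep '^go\{2,3\}d$' "$words")      # good goood
# \. matches a literal dot instead of "any character".
literal_dot=$(printf 'a.b\naxb\n' | grep 'a\.b')  # a.b

printf '%s\n' "$empty_lines" "$comments" "$zero_or_more" "$any_two" "$in_list" \
              "$not_lower" "$exactly_two" "$two_or_more" "$two_to_three" "$literal_dot"
```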
2. Extended regular expression metacharacters (supported by: egrep, awk)
Metacharacter | Description |
---|---|
+ | Matches the preceding subexpression one or more times, for example: go+d matches at least one o, such as god, good, goood |
? | Matches the preceding subexpression zero or one time, for example: go?d matches gd or god |
() | Groups the enclosed string into a single unit, for example: g(oo)+d matches the unit oo one or more times, such as good, gooood |
\| | Matches any one of the alternatives (logical OR), for example: g(oo\|la)d matches good or glad |
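The extended metacharacters can be tried with grep -E (the modern spelling of egrep) and awk. This is a sketch on a hypothetical word list; the awk line assumes a POSIX-compliant awk, whose /.../ patterns are extended regular expressions.

```shell
#!/bin/bash
export LC_ALL=C

# Hypothetical word list.
words=$(mktemp)
trap 'rm -f "$words"' EXIT
printf '%s\n' gd god good goood gooood glad gold > "$words"

plus=$(grep -E '^go+d$' "$words")            # god good goood gooood
optional=$(grep -E '^go?d$' "$words")        # gd god
group=$(grep -E '^g(oo)+d$' "$words")        # good gooood (an even number of o)
either=$(grep -E '^g(oo|la)d$' "$words")     # good glad
# awk uses the same extended syntax inside /.../
either_awk=$(awk '/^g(oo|la)d$/' "$words")   # good glad
# In extended syntax the interval braces need no backslash.
interval=$(grep -E '^go{2,3}d$' "$words")    # good goood

printf '%s\n' "$plus" --- "$optional" --- "$group" --- "$either" --- "$either_awk" --- "$interval"
```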