Filtering ID numbers and a grep review

One, filtering out ID numbers

Let's start with an interview question.

Filter out the valid ID numbers from the file test.txt:

[root@localhost test.dir]# cat test.txt
Zhao 370831199405162458
Money 370831199305162kjl
Sun 37083119920516245X
Lee 37083110516245887k
Zhang 37083KKKKKKK990516

In Linux, before filtering text you must be clear about the features of the text you want to match. So the question is: what are the characteristics of an ID number?

  1. An ID number is 18 characters long
  2. Some ID numbers are all digits; others have 17 digits followed by a capital X as the last character

Good, then things are simple!

Let's handle the first requirement first: an ID number is 18 characters long. How do we express "18" in a regular expression? With an interval. In a basic regular expression the braces have to be escaped, as in \{18\}; with extended regular expressions we can write it plainly as {18}, which is what we use here. The second requirement is a little more trouble: the last character is either a digit or an uppercase X, which is an "or" relationship. How do extended regular expressions express "or"? With the "|" symbol. The first 17 digits are easy: a digit can be expressed as [0-9].
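As a small sketch of the interval syntax (assuming GNU grep; the temporary directory and the variable names bre and ere are made up for illustration), the same 18-digit count can be written with escaped braces in a basic regular expression and with plain braces under -E:

```shell
# Recreate the sample file in a temporary directory (hypothetical location).
dir=$(mktemp -d)
cat > "$dir/test.txt" <<'EOF'
Zhao 370831199405162458
Money 370831199305162kjl
Sun 37083119920516245X
Lee 37083110516245887k
Zhang 37083KKKKKKK990516
EOF

# Basic regular expression: the interval braces must be escaped.
bre=$(grep '[0-9]\{18\}' "$dir/test.txt")

# Extended regular expression (-E): the braces are written plainly.
ere=$(grep -E '[0-9]{18}' "$dir/test.txt")

echo "$bre"
echo "$ere"
rm -r "$dir"
```

Both commands should print only the Zhao line, since it is the only one with 18 consecutive digits.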

Note that either of the following two approaches works:

[root@localhost test.dir]# egrep '[0-9]{18}|[0-9]{17}X' test.txt
Zhao 370831199405162458
Sun 37083119920516245X

# Match lines containing either 18 digits, or 17 digits followed by an X; the all-digit form and the digits-plus-X form are handled as separate alternatives

[root@localhost test.dir]# egrep '[0-9]{17}[0-9X]{1}' test.txt
Zhao 370831199405162458
Sun 37083119920516245X

# Match lines where 17 digits are followed by either a digit or a capital X; the all-digit form and the digits-plus-X form are handled together

OK, now let's make it harder. Suppose the file becomes this:

[root@localhost test.dir]# cat test.txt
Zhao 370831199405162458
Money 370831199305162kjl
Sun 39083119920516245X
Chen 37083119920516245X377
Lee 37083110516245887k
Zhang 37083KKKKKKK990516
[root@localhost test.dir]# egrep '[0-9]{18}|[0-9]{17}X' test.txt
Zhao 370831199405162458
Sun 39083119920516245X
Chen 37083119920516245X377  # this line shows up, which is not what we want. Why not? Because the number on this line is clearly longer than the required 18 characters
[root@localhost test.dir]# egrep '[0-9]{17}[0-9X]{1}' test.txt
Zhao 370831199405162458
Sun 39083119920516245X
Chen 37083119920516245X377  # this line shows up again, which is not what we want
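One way to see why the Chen line slips through is to print only the matched portion of each line with -o. The sketch below assumes GNU grep and recreates the sample file in a temporary directory; the directory and the variable name matched are made up for illustration, and grep -E is used in place of egrep because newer GNU grep releases mark egrep as obsolescent.

```shell
# Recreate the harder sample file in a temporary directory (hypothetical location).
dir=$(mktemp -d)
cat > "$dir/test.txt" <<'EOF'
Zhao 370831199405162458
Money 370831199305162kjl
Sun 39083119920516245X
Chen 37083119920516245X377
Lee 37083110516245887k
Zhang 37083KKKKKKK990516
EOF

# -o prints only the part of each line that the pattern matched.
matched=$(grep -Eo '[0-9]{17}[0-9X]' "$dir/test.txt")

echo "$matched"
rm -r "$dir"
```

For the Chen line only the first 18 characters of the number are printed: the pattern is satisfied by that prefix, and nothing in it says that the number has to end there.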

What to do? How can we match only the numbers that are exactly 18 characters long? Like this:

[root@localhost test.dir]# egrep '[0-9]{18}|[0-9]{17}X' test.txt -w  # using egrep's -w option
Zhao 370831199405162458
Sun 39083119920516245X
[root@localhost test.dir]# egrep '[0-9]{17}[0-9X]{1}' test.txt -w
Zhao 370831199405162458
Sun 39083119920516245X

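What does -w mean? It anchors the match to a whole word: the matched text must not have a word character (a letter, digit, or underscore) immediately before or after it. Take a moment to understand this carefully!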
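To make the word anchoring more concrete, here is a sketch that compares -w with explicit word-boundary anchors. It assumes GNU grep, where \< and \> are extensions matching the start and end of a word; the temporary directory and the variable names with_w and with_anchors are made up for illustration. For this particular pattern the two forms should select the same lines, though they are not guaranteed to be interchangeable for every pattern.

```shell
# Recreate the harder sample file in a temporary directory (hypothetical location).
dir=$(mktemp -d)
cat > "$dir/test.txt" <<'EOF'
Zhao 370831199405162458
Money 370831199305162kjl
Sun 39083119920516245X
Chen 37083119920516245X377
Lee 37083110516245887k
Zhang 37083KKKKKKK990516
EOF

# -w: the match must not touch a word character on either side.
with_w=$(grep -Ew '[0-9]{18}|[0-9]{17}X' "$dir/test.txt")

# Explicit anchors: the parentheses make \< and \> apply to both alternatives.
with_anchors=$(grep -E '\<([0-9]{18}|[0-9]{17}X)\>' "$dir/test.txt")

echo "$with_w"
echo "$with_anchors"
rm -r "$dir"
```

The parentheses matter: without them the alternation would split the pattern into \<[0-9]{18} and [0-9]{17}X\>, so each anchor would guard only one side of one alternative.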

Two, a review of grep usage

grep is one of the "Three Musketeers" of text processing (grep, sed, and awk); its job is to filter text. It is best to wrap the pattern in single quotes when using grep, so that the shell passes it to grep unchanged.
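A minimal sketch of why single quotes are the safer habit: inside double quotes the shell expands variables before grep ever sees the pattern. The file quotes.txt, the variable name, and the sample lines are all made up for illustration; -F (a real grep option, not otherwise covered here) treats the pattern as a fixed string so that the $ is taken literally.

```shell
# Hypothetical two-line sample in a temporary directory.
dir=$(mktemp -d)
printf '%s\n' 'pattern $name' 'pattern Zhao' > "$dir/quotes.txt"
name=Zhao

# Single quotes: grep receives the literal text $name.
single=$(grep -F '$name' "$dir/quotes.txt")

# Double quotes: the shell expands $name to Zhao before grep runs.
double=$(grep -F "$name" "$dir/quotes.txt")

echo "$single"
echo "$double"
rm -r "$dir"
```

The two commands look almost identical but select different lines, which is exactly the kind of surprise single quotes avoid.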

Below are several commonly used options. We describe each one briefly and then give detailed examples.

  1. -n: show line numbers; these are the line numbers in the original text
  2. -v: invert the match, showing the lines that do not match
  3. -o: show only the matched text rather than the whole line; by default grep prints whole matching lines
  4. -E: make grep support extended regular expressions; the effect is the same as egrep
  5. -i: ignore case, i.e. case-insensitive matching
  6. -w: as we saw above, anchors the match to a whole word
  7. -A2: also show the 2 lines after each matching line; the number is customizable
  8. -B2: also show the 2 lines before each matching line; the number is customizable
  9. -C2: also show the 2 lines both before and after each matching line; the number is customizable

Here are some examples:

[root@localhost test.dir]# cat test.txt
Zhao 370831199405162458
Money 370831199305162kjl
Sun 39083119920516245X
Chen 37083119920516245X377
Lee 37083110516245887k
Zhang 37083KKKKKKK990516
[root@localhost test.dir]# grep -n '1993' test.txt  # display line numbers
2:Money 370831199305162kjl
[root@localhost test.dir]# grep -o '1993' test.txt  # show only the matched text, not the whole line
1993
[root@localhost test.dir]# grep -nv '1993' test.txt  # invert the match
1:Zhao 370831199405162458
3:Sun 39083119920516245X
4:Chen 37083119920516245X377
5:Lee 37083110516245887k
6:Zhang 37083KKKKKKK990516
[root@localhost test.dir]# grep -Ew '[0-9]{18}|[0-9]{17}X' test.txt  # -E and -w used together; if this is unclear, look back at the example above
Zhao 370831199405162458
Sun 39083119920516245X
[root@localhost test.dir]# grep -E '[k]{7}' test.txt  # lowercase k, so nothing is matched
[root@localhost test.dir]# grep -Ei '[k]{7}' test.txt  # -i ignores case, so this line is matched
Zhang 37083KKKKKKK990516
[root@localhost test.dir]# grep -EB2 '[K]{7}' test.txt  # also show the two lines above the matching line
Chen 37083119920516245X377
Lee 37083110516245887k
Zhang 37083KKKKKKK990516
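The session above does not exercise -A or -C, so here is a short sketch of both, assuming GNU grep. The temporary directory and the variable names after and around are made up for illustration.

```shell
# Recreate the sample file in a temporary directory (hypothetical location).
dir=$(mktemp -d)
cat > "$dir/test.txt" <<'EOF'
Zhao 370831199405162458
Money 370831199305162kjl
Sun 39083119920516245X
Chen 37083119920516245X377
Lee 37083110516245887k
Zhang 37083KKKKKKK990516
EOF

# -A2: the matching line plus the 2 lines after it.
after=$(grep -A2 '1993' "$dir/test.txt")

# -C1: the matching line plus 1 line before and 1 line after it.
around=$(grep -C1 'Chen' "$dir/test.txt")

echo "$after"
echo "$around"
rm -r "$dir"
```

The first command should print the Money, Sun, and Chen lines, and the second the Sun, Chen, and Lee lines. When several matches are far apart, GNU grep separates the context groups with a line containing "--".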

  

 

Origin www.cnblogs.com/yizhangheka/p/11705014.html