When constructing a regular expression pattern, we must consider not only the accuracy of the matching result, but also its execution efficiency.
Common scenarios
Scenario 1: North American telephone numbers. A number consists of a three-digit area code and a seven-digit number; the seven digits are split into a three-digit office code and a four-digit line number, separated by a hyphen. Each digit can be any digit, except that the first digit of the area code and of the office code cannot be 0 or 1.
Text: Le: 248-555-1234
Ben: (313) 555-1234
Dee: (810)555-1234
Regular expression: \(?[2-9]\d\d\)?[- ]?[2-9]\d\d-\d{4}
Analysis: The leading \(? escapes the ( so that it matches a literal opening parenthesis; the ? makes that parenthesis optional.
Because the first digit cannot be 0 or 1, it is matched with [2-9]; each \d that follows matches any digit.
The \)? that comes next makes the closing parenthesis optional as well.
[- ]? covers three cases: a space, a hyphen, or nothing at all. The remaining digits are matched in the same way, and {4} repeats the preceding \d four times.
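As a quick check of the analysis, here is a minimal Python sketch that runs the pattern against the three sample numbers. The variable names PHONE and samples are only illustrative, and fullmatch is used so that each sample must match in its entirety.

```python
import re

# Optional "(", area code, optional ")", optional space or hyphen,
# office code, hyphen, four-digit line number.
PHONE = re.compile(r"\(?[2-9]\d\d\)?[- ]?[2-9]\d\d-\d{4}")

samples = ["248-555-1234", "(313) 555-1234", "(810)555-1234"]

# True for every sample that the pattern matches from start to end.
results = [PHONE.fullmatch(s) is not None for s in samples]
print(results)  # [True, True, True]

# A leading 0 or 1 in the area code is rejected by [2-9].
print(PHONE.fullmatch("148-555-1234") is not None)  # False
```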
But in fact, the pattern above is not strict enough. For example, a string such as 888) 222-1234, which has only a closing parenthesis, is also matched successfully.
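So change it to: (\()?[2-9]\d\d(?(1)\)|-)[- ]?[2-9]\d\d-\d{4}
The idea is to use a conditional: if group 1 (the opening parenthesis) matched, a closing parenthesis is required; otherwise a hyphen is required. I do not know why this still gave me a problem. One possible cause is the regex engine: conditionals of the form (?(1)...|...) are not supported everywhere (JavaScript and Java, for example, do not support them).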
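To see the difference between the two patterns, the following sketch compares them in Python, whose re module does support the (?(1)...|...) conditional. The names LOOSE, STRICT and compare are made up for this example; it is an illustration, not a diagnosis of the problem mentioned above.

```python
import re

# First version: each parenthesis is optional on its own.
LOOSE = re.compile(r"\(?[2-9]\d\d\)?[- ]?[2-9]\d\d-\d{4}")

# Conditional version: ")" is required only if "(" was captured by group 1,
# otherwise a hyphen must follow the area code.
STRICT = re.compile(r"(\()?[2-9]\d\d(?(1)\)|-)[- ]?[2-9]\d\d-\d{4}")

def compare(s):
    """Return (loose_matches, strict_matches) for the string s."""
    return (LOOSE.search(s) is not None, STRICT.search(s) is not None)

for s in ["248-555-1234", "(313) 555-1234", "(810)555-1234", "888) 222-1234"]:
    print(s, compare(s))
# Only the last line differs: 888) 222-1234 gives (True, False).
```

One side effect of this design: because the "no" branch demands a hyphen, a number written as 248 555-1234 (space, no parentheses) is accepted by the first pattern but rejected by the conditional one.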
Chinese fixed-line telephone numbers are similar:
Rule: The number starts with a fixed 0, which indicates long distance, followed by an area code of 2, 3, or 4 digits, and then a 7- or 8-digit telephone number.
Regular expression: \(?0[1-9]\d{1,3}\)?[- ]?[2-9]\d{2,3}[- ]?\d{4}
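The same kind of check can be sketched for this pattern. The sample numbers below are invented purely for illustration, and CN_PHONE and is_cn_phone are hypothetical names.

```python
import re

# 0, then a 2-4 digit area code, optional parentheses and separators,
# then a 7- or 8-digit number whose first digit is 2-9.
CN_PHONE = re.compile(r"\(?0[1-9]\d{1,3}\)?[- ]?[2-9]\d{2,3}[- ]?\d{4}")

def is_cn_phone(s):
    """True if the whole string s matches the pattern."""
    return CN_PHONE.fullmatch(s) is not None

print(is_cn_phone("010-62345678"))      # True
print(is_cn_phone("(0755) 2345-6789"))  # True
print(is_cn_phone("0571 8765432"))      # True
print(is_cn_phone("10-62345678"))       # False: missing the leading 0
```

Like the first North American pattern, this one treats each parenthesis as independently optional, so a lone closing parenthesis would still be accepted.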