Regular Expression String Validation

Part 1: 5 kinds of English word verification

1. Validation of lowercase English words

  • We can use [a-z]+ to match one or more lowercase English letters; [a-z]{1,} works as well. But a word matched this way may not be complete: when ad is embedded in a longer run of characters, the pattern still matches just the ad inside it.
  • To solve this problem of not matching complete words, we can use \b[a-z]+\b, where the metacharacter \b marks the boundaries of a word.
  • \ba[a-z]*\b matches words of one or more letters that start with the lowercase letter a. (Note: the quantifier * means 0 or more.)
  • \b[a-c][a-z]*\b matches words of one or more letters that start with lowercase a, b or c.
  • \b[a-z]+(?:ing)\b matches words ending in ing.
  • \b[a-z]{6}\b matches lowercase words of exactly 6 letters.

     Note: in the second-to-last pattern, (?:ing) uses the grouping symbol (), but because of ?: the group is non-capturing and cannot be backreferenced. We only want to treat ing as a unit, not to refer back to it.
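
  As a rough sketch of how the patterns above behave, the following Python snippet runs several of them with the standard re module. The sample sentence and the variable names are made up for illustration.

```python
import re

text = "an apple and a cat were running along a bridge"

# Whole lowercase words, delimited by \b on both sides.
words = re.findall(r"\b[a-z]+\b", text)

# Words starting with a; a on its own also qualifies because * allows zero more letters.
a_words = re.findall(r"\ba[a-z]*\b", text)

# Words ending in ing; (?:ing) groups without capturing, so findall returns whole matches.
ing_words = re.findall(r"\b[a-z]+(?:ing)\b", text)

# Words of exactly six lowercase letters.
six_letter = re.findall(r"\b[a-z]{6}\b", text)

print(a_words, ing_words, six_letter)
```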

 

2. Verification of capitalized English words

  The usage is the same as for lowercase English words; we just need to replace [a-z] above with [A-Z].

 

3. Delimiter verification of English words

  In English text, individual English words are separated by separators. A separator is a symbol that separates two English words. These delimiters include English punctuation, whitespace, and more.

  Most separators are English punctuation marks, such as , (comma) . (dot) ? (question mark) : (colon) ; (semicolon) ' (single quotation mark) ! (exclamation mark) " (double quotation mark) - (hyphen) -- (dash) () (parentheses) [] (square brackets) {} (curly brackets) ... (ellipsis) ` (possessive symbol, also known as the backtick).

  A regular expression such as the following will validate these English punctuation delimiters:

[-,.?:;'"!`]|(-{2})|(\.{3})|(\(\))|(\[\])|({})

 

     Why not put all the English separators into one character class []? Because -- (dash), ... (ellipsis), () (parentheses), [] (brackets) and {} (braces) consist of more than one character, so inside a class they would either be ambiguous and not expressed correctly, or would need escaping with \.
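
  A minimal Python sketch of delimiter matching follows. It is a variation on the expression above, not the original itself: the multi-character alternatives are tried first so that -- and ... are not consumed one character at a time, and the capturing groups are dropped. The function name and the sample sentence are hypothetical.

```python
import re

# Multi-character delimiters come first, then the single-character class.
# (This reordering is an assumption of the sketch.)
DELIMITER = re.compile(r"""-{2}|\.{3}|\(\)|\[\]|\{\}|[-,.?:;'"!`]""")

def find_delimiters(text):
    """Return every delimiter found in text, in order of appearance."""
    return [m.group(0) for m in DELIMITER.finditer(text)]

found = find_delimiters("Wait... really? Yes -- well-known, isn't it!")
print(found)
```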

 

4. Negative Validation

  • \b[A-GI-Z]+\b matches any uppercase English word without the letter H.
  • \b[A-GI-OQ-Z]+\b matches any uppercase English word without the letters H and P.
  • \b[A-Z]*B[^P][A-Z]*\b matches uppercase English words in which a B is followed by a character other than P.
  • But why is DAFB not matched? Because [^P] must consume one character after B, so a word that ends in B cannot match.
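
  The DAFB case can be checked with a short Python sketch. The second pattern swaps [^P] for the negative lookahead (?!P), which consumes no character; this is an alternative technique introduced here for comparison, not part of the original list. Both patterns only require that some B is not followed by P.

```python
import re

# [^P] must consume a character, so B cannot be the last letter.
consuming = re.compile(r"\b[A-Z]*B[^P][A-Z]*\b")

# (?!P) only looks ahead without consuming, so a word may end in B.
lookahead = re.compile(r"\b[A-Z]*B(?!P)[A-Z]*\b")

for word in ["ABC", "ABPC", "DAFB"]:
    print(word,
          consuming.fullmatch(word) is not None,
          lookahead.fullmatch(word) is not None)
```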

 

5. English word verification with the same characteristics

  a. Use \b([A-Z])\1[A-Z]*\b to match English words that start with two identical letters, because \1 is a backreference to the first group.

  b. Use \b([A-Z])[A-Z]*\1+[A-Z]*\b to match English words in which the initial letter appears again later in the word.
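
  A small Python sketch of the two backreference patterns, using made-up test words:

```python
import re

# \1 repeats whatever the first group captured.
double_start = re.compile(r"\b([A-Z])\1[A-Z]*\b")
repeat_initial = re.compile(r"\b([A-Z])[A-Z]*\1+[A-Z]*\b")

def starts_doubled(word):
    """True if the word begins with the same letter twice."""
    return double_start.fullmatch(word) is not None

def initial_recurs(word):
    """True if the first letter occurs again somewhere later in the word."""
    return repeat_initial.fullmatch(word) is not None

print(starts_doubled("AARDVARK"), initial_recurs("RADAR"), initial_recurs("ABC"))
```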

Part 2: Validation of Non-Word Characters

  This part introduces the verification of non-word strings: English punctuation, Chinese punctuation, Chinese text, special characters and passwords.

1. Verification of English punctuation marks

  This is the expression we have already seen above:

[-,.?:;'"!`]|(-{2})|(\.{3})|(\(\))|(\[\])|({})

 

2. Chinese punctuation verification

  There are also many Chinese punctuation marks, and they closely resemble the English ones. For example: ,(comma) 。(period) ?(question mark) :(colon) ;(semicolon) ‘’(single quotation marks) !(exclamation mark) “”(double quotation marks) —(hyphen) ——(dash) ……(ellipsis) ()(parentheses) 【】(brackets) {}(braces) 、(enumeration comma) 《》(title marks) and so on.

  So we can verify it with the following regular expression:

[,。?:;“”‘’!—…、]|(—{2})|(())|(【】)|({})|(《》)

where (—{2}) is used to match the dash (——)

 

3. Chinese text verification

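  The verification of Chinese text is relatively simple: we can use \w directly, as long as the regex engine treats \w as Unicode-aware.

    

  Note that without the u (Unicode) option it will not match correctly. After I turned on unicode in http://www.regexpal.com/, it matched normally. This is engine-dependent: in JavaScript's built-in RegExp, for instance, \w stays limited to [A-Za-z0-9_] even with the u flag, so a range such as [\u4e00-\u9fa5] is needed there.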
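
  The effect of Unicode awareness can be sketched in Python, where \w in a str pattern is Unicode-aware by default and the re.ASCII flag switches that off. The explicit range covering the CJK Unified Ideographs block (U+4E00 to U+9FFF) is an alternative added here for comparison; the sample strings are made up.

```python
import re

unicode_word = re.compile(r"\w+")            # Unicode-aware by default for str patterns
ascii_word = re.compile(r"\w+", re.ASCII)    # \w restricted to [a-zA-Z0-9_]
han_only = re.compile(r"[\u4e00-\u9fff]+")   # CJK Unified Ideographs block only

sample = "正则表达式"

print(unicode_word.fullmatch(sample) is not None)
print(ascii_word.fullmatch(sample) is not None)
print(han_only.fullmatch(sample) is not None)
```

  Note that \w also accepts Latin letters, digits and the underscore, so the explicit range is the stricter check when only Chinese characters are allowed.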

 

 

 

4. Special character verification  

  We consider characters other than numbers and letters to be special characters, specifically, _ = \ [ ] ; ' , . / ~ ! @ # $ % ^ & * ( ) + | ? > < " : { } .

  Use the following regular expression to validate a special character:

 [、_=\\\[\];',./~!@#$%^&*()+|?><":{}]

  where \, [ and ] need to be escaped with \.

  Of course, to verify a string made up of at least one special character, simply append + to the right of the above regular expression.

 

5. Password verification

  When a user logs in to a website, they are generally required to enter a user name and password. At this point, we as front-end engineers need to validate the input. In general, the more complex the password, the better the user's information is protected.

  Password characters can include numbers, letters, and special characters. A password can therefore consist of only numbers, only letters, or only special characters, or it can combine two or all three of them. Obviously, the more types it contains, the more secure the password.

A Password authentication containing only numbers

  This form of authentication is obviously very simple and its security is very low. As follows:

\d+

  This expression verifies passwords made up of one or more digits. Passwords generally have specific length requirements, such as 6 to 12 digits, which we can verify in the following way.

\d{6,12}

  

B Password authentication containing only letters

  This is also simple and equally insecure. We can verify letter-only passwords limited to 6 to 12 characters with [a-zA-Z]{6,12}.

 

C Password authentication with only special characters

  It's equally simple, i.e.

 [、_=\\\[\];`',./~!@#$%^&*()+|?><":{}]{6,12}

  This regular expression validates passwords with 6 to 12 characters of pure special characters.
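
  The three single-type checks (A, B and C) can be sketched in Python as below. re.fullmatch is used so that the whole password must match, which the bare expressions do not enforce on their own. The letter pattern [a-zA-Z]{6,12} for case B and the helper name password_kind are assumptions of this sketch.

```python
import re

# Character class of special characters, copied from the expression above.
SPECIAL = r"""[、_=\\\[\];`',./~!@#$%^&*()+|?><":{}]"""

DIGITS_ONLY = re.compile(r"\d{6,12}")
LETTERS_ONLY = re.compile(r"[a-zA-Z]{6,12}")      # assumed pattern for case B
SPECIAL_ONLY = re.compile(SPECIAL + r"{6,12}")

def password_kind(password):
    """Return which single-type rule the password satisfies, or None."""
    if DIGITS_ONLY.fullmatch(password):
        return "digits"
    if LETTERS_ONLY.fullmatch(password):
        return "letters"
    if SPECIAL_ONLY.fullmatch(password):
        return "special"
    return None

print(password_kind("123456"), password_kind("abcdefgh"), password_kind("!@#$%^"))
```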

 

D Password authentication containing numbers and letters

  It seems simple here, but it is not, because we have to make sure the password contains both numbers and letters. My approach is as follows (if you know a more concise way, please share it):

[\da-zA-Z]*((\d+[a-zA-Z]+)|([a-zA-Z]+\d+))[\da-zA-Z]*

  The middle group (the first group) ((\d+[a-zA-Z]+)|([a-zA-Z]+\d+)) means that the password must contain either one or more numbers followed by one or more letters, or one or more letters followed by one or more numbers, as in 45dda or da564. A password such as 555 or daf will not match. But this is not enough: to also allow further letters and numbers on either side of 45dda, we add [\da-zA-Z]* on both sides, where * means 0 or more.
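
  A quick Python check of this expression, using re.fullmatch so that the entire password has to match. The function name is hypothetical; the sample passwords are the ones mentioned above plus one mixed example.

```python
import re

DIGITS_AND_LETTERS = re.compile(
    r"[\da-zA-Z]*((\d+[a-zA-Z]+)|([a-zA-Z]+\d+))[\da-zA-Z]*"
)

def has_digits_and_letters(password):
    """True if the password is made of digits and letters and contains both."""
    return DIGITS_AND_LETTERS.fullmatch(password) is not None

for candidate in ["45dda", "da564", "a1b2c3", "555", "daf"]:
    print(candidate, has_digits_and_letters(candidate))
```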

 

E Password authentication containing numbers and special characters

  On the basis of D, the regular expression here is easy to write, just replace the validation of letters with the validation of special characters, as follows:

[\d、_=\\\[\];`',./~!@#$%^&*()+|?><":{}]*((\d+[、_=\\\[\];`',./~!@#$%^&*()+|?><":{}]+)|([、_=\\\[\];`',./~!@#$%^&*()+|?><":{}]+\d+))[\d、_=\\\[\];`',./~!@#$%^&*()+|?><":{}]*

 

F Password authentication containing letters and special characters

  On the basis of E, we can directly replace the numbers with letters, as shown below:

[a-zA-Z_=\\\[\];`',./~!@#$%^&*()+|?><":{}]*(([a-zA-Z]+[_=\\\[\];`',./~!@#$%^&*()+|?><":{}]+)|([_=\\\[\];`',./~!@#$%^&*()+|?><":{}]+[a-zA-Z]+))[a-zA-Z_=\\\[\];`',./~!@#$%^&*()+|?><":{}]*

  Note that the letters in the middle group must be written as the character class [a-zA-Z]. Writing (a-zA-Z) instead creates a group that matches the literal text a-zA-Z, and the verification then fails.

 

G Password verification containing letters, numbers and special characters

  If all three are included, the password is at its most secure. So how do we implement such password verification?

  The idea is the same: letters, numbers, and special characters all go into the character classes on the left and right sides, and the middle group has to combine 6 alternatives with OR, one for each ordering of the three types.
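
  As a more compact alternative to writing out the six alternatives, the requirement can be sketched with three lookahead assertions, one per character type. This is a different technique from the one described above, shown only for comparison. The 6 to 12 length limit is borrowed from the earlier examples, and the names SPECIALS, STRONG and is_strong are hypothetical.

```python
import re

# Special characters from the earlier class, without the surrounding brackets.
SPECIALS = r"""、_=\\\[\];`',./~!@#$%^&*()+|?><":{}"""

STRONG = re.compile(
    r"(?=.*\d)"                        # at least one digit somewhere
    r"(?=.*[a-zA-Z])"                  # at least one letter somewhere
    r"(?=.*[" + SPECIALS + r"])"       # at least one special character somewhere
    r"[\da-zA-Z" + SPECIALS + r"]{6,12}"
)

def is_strong(password):
    """True if the password mixes digits, letters and special characters."""
    return STRONG.fullmatch(password) is not None

for candidate in ["abc123!@", "abc12345", "abc!@#$%", "a1!"]:
    print(candidate, is_strong(candidate))
```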

Part 3: File Name Validation

   Each file has its own name, which consists of two parts: the file name and the file extension. Based on this, we will introduce the verification of the specified file extension, the verification of the specified file name, the verification of the full name of the file containing the specified character string, and the verification of the full name of the file excluding blank characters at both ends.

1. Validation of the specified file extension 

  Suppose we validate the name of a file with the extension pdf. When validating, the file name is an arbitrary string of length at least 1. A pattern such as the following does this:

^.+\.pdf$

  where .+ matches a string of length at least 1 made up of non-newline characters, and \. escapes the dot.

2. Validation of the specified file name

  Suppose we validate a file named javascript. When validating, the file extension is a string of word characters of length at least 1. A pattern such as the following does this:

^javascript\.\w+$

      

 

3. Validation of the full name of the file containing the specified string

  This check is very common: the full name of a file must contain a specified string, for example the string java. A pattern such as the following does this:

^.*java.*\.\w+$
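
  The three checks in sections 1 to 3 can be sketched in Python as follows. The concrete patterns, including the ^ and $ anchors, are assumptions derived from the descriptions in the text, and the file names are invented.

```python
import re

PDF_FILE = re.compile(r"^.+\.pdf$")               # any name, extension pdf
NAMED_JAVASCRIPT = re.compile(r"^javascript\.\w+$")  # name javascript, any extension
CONTAINS_JAVA = re.compile(r"^.*java.*\.\w+$")    # java somewhere before the extension

def matches(pattern, filename):
    """True if filename matches the compiled pattern."""
    return pattern.match(filename) is not None

print(matches(PDF_FILE, "notes.pdf"))
print(matches(NAMED_JAVASCRIPT, "javascript.txt"))
print(matches(CONTAINS_JAVA, "myjavafile.doc"))
```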

  

 

4. Exclude file full name verification with blank characters at both ends

  Earlier we did not consider the existence of whitespace characters at both ends of the file name. However, whitespace characters are not allowed at both ends of the file name. 

   (1) To verify that a string does not start with a blank character, we can use (?!expression), the zero-width negative lookahead assertion. It asserts that expression cannot be matched starting at this position, so the verification is as follows:

^(?! ).+\.\w+$

   (2) To verify that the file name does not end with a blank character, we can use (?<!expression), the zero-width negative lookbehind assertion. It asserts that expression cannot be matched immediately before this position, so the verification is as follows:

^.+(?<! )\.\w+$

 

(3) Combining the above two cases, the verification that there is no space at both ends of the file name can be obtained as follows:

^(?! ).+(?<! )\.\w+$
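
  A short Python sketch of the three variants. The leading-space pattern is the combined expression with the lookbehind removed; the helper name is_clean_name and the file names are made up.

```python
import re

NO_LEADING_SPACE = re.compile(r"^(?! ).+\.\w+$")
NO_TRAILING_SPACE = re.compile(r"^.+(?<! )\.\w+$")
NO_EDGE_SPACES = re.compile(r"^(?! ).+(?<! )\.\w+$")

def is_clean_name(filename):
    """True if the name part has no space at either end."""
    return NO_EDGE_SPACES.match(filename) is not None

for name in ["report.pdf", "my report.pdf", " report.pdf", "report .pdf"]:
    print(repr(name), is_clean_name(name))
```

  Spaces inside the name, as in my report.pdf, are still accepted; only the two ends of the name part are checked.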

Part 4: Email, HTTP/FTP Address Verification

  These are common elements of the network, and email verification is especially common.

1. Email Verification

   Common e-mail services include 163, 126, QQ, Gmail, Yahoo, Hotmail, Sina, 139, TOM, 21CN, Sogou, 189, 188, Fortune Mail, Yeah, Sohu, Foxmail and so on, and their addresses look very similar, such as the QQ address [email protected] or the Gmail address [email protected]. These are free mailboxes provided by large companies. Companies can also have their own business email addresses, such as [email protected].

  To sum up, mailboxes are generally composed of a name, the character @ and a domain name.

  According to the characteristics of the mailbox, we can use the following regular expression to verify the email address:

^\w+@(\w+\.)+\w+

  Of course, the following method (idea) is also the same:

^\w+@\w+(\.\w+)+

  The expressions above are not quite right, because the name before @ may also contain a dot. The correction is:

^[\w.]+@(\w+\.)+\w+
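
  The difference between the first expression and the corrected one can be seen in a brief Python sketch. re.fullmatch is used so that trailing characters are rejected as well, which is an addition of this sketch; the addresses are invented and use the example.com domain.

```python
import re

NAME_WITHOUT_DOT = re.compile(r"^\w+@(\w+\.)+\w+")
NAME_WITH_DOT = re.compile(r"^[\w.]+@(\w+\.)+\w+")

def is_email(address):
    """True if the whole address matches the corrected expression."""
    return NAME_WITH_DOT.fullmatch(address) is not None

print(is_email("john.smith@example.com"))
print(NAME_WITHOUT_DOT.fullmatch("john.smith@example.com") is not None)
print(is_email("john@example"))
```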

2. HTTP, FTP address verification

   HTTP addresses are generally strings starting with "http://" or "https://", and they can contain the separators . / ? & % and =. Examples are http://www.cnblogs.com, https://www.baidu.com/s?wd=Google&rsv_spt=1 and http://cn.bing.com/search?q=javascript. The only difference between FTP and HTTP addresses is that an FTP address starts with ftp:// instead of http:// or https://.

      So it can be seen that the verification expression is as follows:

((http|ftp|https)://)(([a-zA-Z0-9\._-]+\.[a-zA-Z]{2,6})|([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}))(:[0-9]{1,4})*(/[a-zA-Z0-9\&%_\./-~-]*)?

       First, the string matched by the regular expression must start with http://, https:// or ftp://.

       Secondly, the regular expression can match a domain name or an IP address (e.g. http://www.baidu.com or http://192.168.1.1).

       Then, the regular expression can match a port number of one to four digits.

       Finally, the regular expression can match through to the end of the URL, that is, the path and query string (for example http://www.baidu.com/s?wd=a&rsv_spt=1&issp=1&rsv_bp=0&ie=utf-8&tn=baiduhome_pg&inputT=1236).
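
  The expression can be tried out with a short Python sketch. The pattern is copied unchanged from above and applied with re.fullmatch; the helper name is_url and the ftp example address are made up.

```python
import re

URL = re.compile(
    r"((http|ftp|https)://)"
    r"(([a-zA-Z0-9\._-]+\.[a-zA-Z]{2,6})"
    r"|([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}))"
    r"(:[0-9]{1,4})*"
    r"(/[a-zA-Z0-9\&%_\./-~-]*)?"
)

def is_url(address):
    """True if the whole address matches the expression."""
    return URL.fullmatch(address) is not None

for address in [
    "http://www.baidu.com",
    "http://192.168.1.1:8080/index",
    "https://www.baidu.com/s?wd=Google&rsv_spt=1",
    "ftp://example.com:21/pub",
    "www.baidu.com",
]:
    print(address, is_url(address))
```

  Because the port part is limited to four digits, a five-digit port such as 65535 would be rejected by this expression.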

Reprint address: http://www.cnblogs.com/zhuzhenwei918/p/6202932.html
