Can this regex really be used to validate passwords?

Problem background

Recently I helped a colleague with a regular expression problem that I found quite interesting, so I would like to share it with you.

My colleague was building a user registration feature. The client (Party A) had requirements for password complexity: the password must be 8 to 32 characters long and contain lowercase letters, uppercase letters, digits, and symbols. This is the original regex used to validate passwords:

(?=.*[0-9])(?=.*[A-Z])(?=.*[a-z])(?=.*[^a-zA-Z0-9]).{8,32}

Verifying the problem

Take a look at this regex. Do you think it meets the password requirements above? Take a moment to think about it.

The answer is: no

Let's run a simple test. I recommend actually trying it yourself in an online regex tester.

Enter the expression and test it against the following strings:

  • abc123A: No match, too short
  • abcd12345: No match, not complex enough (no uppercase letter and no symbol)
  • abcdA123-: Matches; both the length and complexity requirements are met
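The three tests above can be reproduced in a few lines of Python. This is a minimal sketch that assumes the online tester reports a match when the pattern matches anywhere in the input, which is what Python's `re.search` does; the name `ORIGINAL` is just a label chosen here.

```python
import re

# The original pattern from the article (note: no ^ or $ anchors).
ORIGINAL = r"(?=.*[0-9])(?=.*[A-Z])(?=.*[a-z])(?=.*[^a-zA-Z0-9]).{8,32}"

for text in ["abc123A", "abcd12345", "abcdA123-"]:
    # re.search looks for a match anywhere in the string, like most online testers.
    matched = re.search(ORIGINAL, text) is not None
    print(f"{text!r}: {'match' if matched else 'no match'}")
# 'abc123A': no match
# 'abcd12345': no match
# 'abcdA123-': match
```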

So is the regex fine after all? Let's test a few more strings:

  • abcdA123-我: This also matches, even though it contains a Chinese character
  • abc he dA ★123: This also matches, even though it contains spaces and a ★
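A short Python sketch (again assuming search semantics via `re.search`) shows why the string with spaces and ★ gets through: neither character is a letter or digit, so both satisfy the symbol lookahead `[^a-zA-Z0-9]`, and `.` accepts them as well.

```python
import re

ORIGINAL = r"(?=.*[0-9])(?=.*[A-Z])(?=.*[a-z])(?=.*[^a-zA-Z0-9]).{8,32}"

text = "abc he dA ★123"
m = re.search(ORIGINAL, text)
# Spaces and ★ count as "symbols" for [^a-zA-Z0-9] and are accepted by `.`.
print(m is not None)      # True
print(m.group() == text)  # True: the whole string was matched
```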

Improving the regex

This regex has two small problems:

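  • The symbol check is too broad: (?=.*[^a-zA-Z0-9]) treats every character that is not a letter or digit as a symbol, including spaces, ★ and Chinese characters
  • The input itself is not restricted: in .{8,32} the dot matches any character except a newline, and because the pattern has no ^ and $ anchors, matching any part of the string is enough, so strings longer than 32 characters also pass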
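The missing anchors affect the length limit too. In this Python sketch (the 40-character input is a hypothetical example), `re.search` finds a 32-character stretch at the start, so the over-long password as a whole is still reported as a match.

```python
import re

ORIGINAL = r"(?=.*[0-9])(?=.*[A-Z])(?=.*[a-z])(?=.*[^a-zA-Z0-9]).{8,32}"

too_long = "abcdA123-" + "x" * 31  # hypothetical 40-character password
m = re.search(ORIGINAL, too_long)
print(len(too_long))            # 40
print(m is not None, m.span())  # True (0, 32): only the first 32 characters are matched
```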

To fix both problems, we first have to enumerate the symbols that are allowed. Suppose only the two characters - and _ are allowed.

The improved regex is:

^(?=.*[0-9])(?=.*[A-Z])(?=.*[a-z])(?=.*[-_])[0-9a-zA-Z-_]{8,32}$
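A minimal Python sketch of a check built on the improved regex follows. `is_valid_password` is a hypothetical helper name. One Python-specific detail that is not from the original article: without `re.MULTILINE`, `$` also matches just before a trailing newline, so `re.fullmatch` is used to force the whole string to be checked.

```python
import re

# Improved pattern: anchored; only letters, digits, '-' and '_' are allowed.
IMPROVED = re.compile(r"^(?=.*[0-9])(?=.*[A-Z])(?=.*[a-z])(?=.*[-_])[0-9a-zA-Z-_]{8,32}$")

def is_valid_password(text: str) -> bool:
    # fullmatch requires the pattern to consume the entire string,
    # so a trailing "\n" cannot slip past the `$` anchor.
    return IMPROVED.fullmatch(text) is not None

cases = {
    "abcdA123-": True,
    "abcdA123_": True,
    "abc123A": False,                # too short
    "abcd12345": False,              # no uppercase letter, no symbol
    "abc he dA ★123": False,         # spaces and ★ are not allowed
    "abcdA123-" + "x" * 31: False,   # 40 characters, too long
    "abcdA123-\n": False,            # trailing newline
}
for text, expected in cases.items():
    assert is_valid_password(text) == expected, repr(text)
print("all cases OK")
```

By contrast, `IMPROVED.match("abcdA123-\n")` does return a match, because `$` is satisfied right before the final newline.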

Partly this happened because the regex was not tested enough, but the deeper reason is that we did not understand regular expressions well enough to spot the problem quickly. In the next section we will look at the trickier parts of this regex.
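Since insufficient testing was part of the problem, one way to catch this kind of mistake (a suggestion, not part of the original article) is to compare the regex with a plain, non-regex implementation of the same rules on many random strings. The helpers `rules_ok` and `mismatches` and the random character pool are assumptions chosen for illustration.

```python
import random
import re
import string

ORIGINAL = re.compile(r"(?=.*[0-9])(?=.*[A-Z])(?=.*[a-z])(?=.*[^a-zA-Z0-9]).{8,32}")
IMPROVED = re.compile(r"^(?=.*[0-9])(?=.*[A-Z])(?=.*[a-z])(?=.*[-_])[0-9a-zA-Z-_]{8,32}$")
ALLOWED = set(string.ascii_letters + string.digits + "-_")

def rules_ok(text: str) -> bool:
    """Reference check without regex: length, allowed characters, one of each class."""
    return (
        8 <= len(text) <= 32
        and all(c in ALLOWED for c in text)
        and any(c in string.digits for c in text)
        and any(c in string.ascii_uppercase for c in text)
        and any(c in string.ascii_lowercase for c in text)
        and any(c in "-_" for c in text)
    )

def mismatches(regex_ok, samples):
    """Return the samples where a regex-based check disagrees with rules_ok."""
    return [s for s in samples if regex_ok(s) != rules_ok(s)]

rng = random.Random(0)  # fixed seed so runs are reproducible
pool = string.ascii_letters + string.digits + "-_ !★\n"
samples = ["".join(rng.choice(pool) for _ in range(rng.randint(0, 40))) for _ in range(5000)]

bad_original = mismatches(lambda s: ORIGINAL.search(s) is not None, samples)
bad_improved = mismatches(lambda s: IMPROVED.fullmatch(s) is not None, samples)
print(len(bad_original), len(bad_improved))
```

The improved regex should agree with the reference check on every sample, while the original disagrees on strings that are too long or contain disallowed characters.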

Further reading

Some readers may wonder what the parenthesized groups at the front of the regex are for.

They rely on a basic regular expression concept: zero-width assertions.

A zero-width assertion matches a position rather than characters. It checks a condition at the current position without consuming anything, so the text it checks is not included in the match result; what it matches is just a position.

(?=...) is a zero-width assertion called a positive lookahead. It succeeds when the pattern inside the parentheses can match starting at the current position.
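A quick Python check makes the "zero width" visible: a lookahead on its own produces a match with a start and end index but no text.

```python
import re

m = re.search(r"(?=px)", "1px")
# The lookahead succeeds at index 1 (just before "px") but consumes nothing.
print(m.span(), repr(m.group()))  # (1, 1) ''
```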

For example, \d(?=px) matches the 1 in 1px but not the 2 in 2pm, because 2 is not followed by px. To include px in the match as well, you have to write \d(?=px)px. That behaves the same as \dpx: for a single condition there is no difference, but the difference matters when several conditions must hold at once.
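These four cases can be checked directly with Python's `re` module:

```python
import re

print(re.search(r"\d(?=px)", "1px").group())    # 1   (px is checked but not included)
print(re.search(r"\d(?=px)", "2pm"))            # None (2 is not followed by px)
print(re.search(r"\d(?=px)px", "1px").group())  # 1px
print(re.search(r"\dpx", "1px").group())        # 1px (same result for a single condition)
```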

Let's explain with an example

[a-z0-9]{4,10}

This regex says the text consists of 4 to 10 lowercase letters or digits, but it cannot require letters and digits to both be present: a string of only letters or only digits also matches.
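Checking whole strings with `re.fullmatch` (used here so the pattern must cover the entire input) shows that letter-only and digit-only strings pass just as easily as mixed ones:

```python
import re

PLAIN = r"[a-z0-9]{4,10}"

for text in ["abcd", "1234", "ab12"]:
    # All three print True: nothing forces a mix of letters and digits.
    print(text, re.fullmatch(PLAIN, text) is not None)
```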

A regex describes text in sequence, so 1a and a1 are different patterns. Describing a combination whose positions are not fixed is hard with ordinary regex syntax; it basically turns into enumerating every permutation.
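To make the permutation problem concrete, here is a sketch (not from the original article) that builds a lookahead-free pattern by listing every order in which the required character classes could appear. The helper `permutation_pattern` is hypothetical, and the length limit is checked with `len()` because folding it into this pattern would make it even more complex.

```python
import re
from itertools import permutations

def permutation_pattern(required, allowed):
    """Build allowed* r1 allowed* r2 ... allowed*, one branch per ordering of `required`."""
    filler = allowed + "*"
    branches = [filler + filler.join(order) + filler for order in permutations(required)]
    return "(?:" + "|".join(branches) + ")"

# Two classes (lowercase letter, digit) need 2 branches.
two = permutation_pattern(["[a-z]", "[0-9]"], "[a-z0-9]")
print(two)
# (?:[a-z0-9]*[a-z][a-z0-9]*[0-9][a-z0-9]*|[a-z0-9]*[0-9][a-z0-9]*[a-z][a-z0-9]*)

def mixed_4_to_10(text):
    # Length is checked outside the regex in this sketch.
    return 4 <= len(text) <= 10 and re.fullmatch(two, text) is not None

print(mixed_4_to_10("ab12"), mixed_4_to_10("1abc"), mixed_4_to_10("abcd"))  # True True False

# The four password classes already need 4! = 24 branches.
four = permutation_pattern(["[0-9]", "[A-Z]", "[a-z]", "[-_]"], "[0-9a-zA-Z_-]")
print(four.count("|") + 1)  # 24
```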

This is where positive lookahead helps. Each lookahead checks one condition against the rest of the string without consuming anything, so several lookaheads placed side by side all test from the same position.

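(?=.*[a-z])
means a lowercase English letter must appear somewhere after the current position. Its exact position does not matter; the .* expresses "anywhere".
Likewise,
(?=.*[0-9])
means a digit must appear somewhere after the current position, again at any position.

[a-z0-9]{4,10} restricts which characters may appear and how many.

(?=.*[a-z])(?=.*[0-9])[a-z0-9]{4,10}
Joined together, they mean 4 to 10 lowercase letters or digits, with at least one letter and at least one digit. (To check an entire string, anchor it with ^ and $, as in the improved password regex above.)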
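The combined pattern can be tried in Python as follows; `re.fullmatch` is used so that the whole string has to match.

```python
import re

MIXED = r"(?=.*[a-z])(?=.*[0-9])[a-z0-9]{4,10}"

# Both lookaheads are checked at the start; only [a-z0-9]{4,10} consumes characters.
for text in ["abc1", "1abc", "abcd", "1234", "a1"]:
    print(text, re.fullmatch(MIXED, text) is not None)
# abc1 True
# 1abc True
# abcd False
# 1234 False
# a1 False
```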


Source: blog.csdn.net/cowcomic/article/details/129646785