Ignorance is truly frightening - applied cryptography jokes: "MD5+Salt is insecure"

I saw an interesting article, so I reprinted it.
Original blog address: http://blog.sina.com.cn/s/blog_77e8d1350100wfc7.html

Recently there has been a lot of news about leaked user databases, and plenty of interesting things have come out of it. Let's skip the jokes about people who use simple passwords or one password everywhere, and talk about something meaningful. (This article is for people in the IT world. If you don't understand it, I will prepare a simple "how to identify websites with poor password security" for you in another article.)

After the leaks there was a great deal of wailing, and plenty of people were screaming, me included of course. But after grumbling for a while, I found that what bothered me was different from what bothered everyone else, so I calmed down and observed. It turns out that even most people in the IT industry know next to nothing about cryptography, and jokes of every kind follow one after another. Of course, few people get them.

Take for example the joke that MD5 is not safe.

In fact, I don't know how this got tied to the database leaks, but after a few days a bunch of people came out and said bitterly: MD5 is not safe, and some people still use it to hash passwords... (Well, maybe that is a bit deep for some readers; I'll fill in the background later.) They went on to say, "It's just ridiculous." Of course, that is not the exact wording, but it is pretty much the same.

The implication is that the speaker's knowledge tells him MD5 is not safe, and that the people who use it probably have no knowledge. In fact, the people who say this are just as ignorant. Why? First of all, we must figure out what this so-called "unsafe" actually refers to.

Funny point 1: MD5 has been cracked!

At the 2004 International Cryptology Conference (Crypto 2004), Wang Xiaoyun showed that MD5 collisions can be found, and since then MD5 has been considered unsafe. Yes, it is indeed unsafe, but what exactly does that mean? Most people probably don't understand it at all, and don't know in which scenarios this insecurity applies.

To explain this, we must first understand what MD5 is. MD5 is a hash function, with the following characteristics:
no matter how long or how random the input is, it is converted into a fixed-length hash value;
over a large number of different inputs, the resulting hash values are evenly distributed;
for a given input, the resulting hash value is always the same.

From these characteristics, people usually draw the following conclusions:
irreversible (how could a fixed-length value represent information of arbitrary length?);
hard to collide (if the valid range of the hash value is 0 to 9, then for a known plaintext you need to try 115 times on average to find another message with the same hash value, and the collision probability for any two random plaintexts is about 1/N, that is, 1/10. In practice, however, the range of hash values is 2 to the 64th power or more, that is, between 0 and 18,446,744,073,709,551,616 or even larger, which is an astronomical number);
can stand in for the original (since it is irreversible and hard to collide, you cannot guess the original information from the hash value, let alone forge another message whose hash value is exactly the same. So presenting a hash value can prove that you hold some valid information, such as a password).
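The three characteristics listed above are easy to check with Python's standard `hashlib` module. This is only a small illustration; the helper name `md5_hex` and the sample inputs are made up for the example.

```python
import hashlib


def md5_hex(message: bytes) -> str:
    """Return the MD5 digest of message as a hexadecimal string."""
    return hashlib.md5(message).hexdigest()


short_digest = md5_hex(b"a")
long_digest = md5_hex(b"a" * 1_000_000)

# Fixed length: 128 bits, i.e. 32 hex characters, whatever the input size.
print(len(short_digest), len(long_digest))

# Deterministic: the same input always gives the same hash value.
print(md5_hex(b"123456") == md5_hex(b"123456"))

# Changing one character of the input gives a different hash value.
print(md5_hex(b"123456"), md5_hex(b"123457"))
```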

Well, by now you have probably noticed that the "hard to collide" conclusion above looks wrong. Yes, the 2004 break proved that MD5 is not reliable against collisions, that is, another message with the same hash value can be found quickly by some method. For example:

It is known that the original message is aaaaaaaaaa and its hash value is 10;
by some method, a message aaaXaaaXaa can be found quickly whose hash value is also 10.
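To see why the size of the hash range matters, here is a toy sketch with a deliberately tiny hash: MD5 reduced modulo 10, so the range is 0 to 9. The function `toy_hash` and the candidate strings are hypothetical and exist only for this illustration; the search is plain trial and error, not the 2004 attack on MD5.

```python
import hashlib
from itertools import count


def toy_hash(message: str) -> int:
    """A toy hash with range 0..9: the MD5 digest reduced modulo 10."""
    digest = hashlib.md5(message.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % 10


def find_same_hash(original: str) -> tuple[str, int]:
    """Try candidate messages one by one until one matches the hash of original."""
    target = toy_hash(original)
    for attempts in count(1):
        candidate = f"candidate-{attempts}"
        if candidate != original and toy_hash(candidate) == target:
            return candidate, attempts


other, attempts = find_same_hash("aaaaaaaaaa")
print(other, attempts, toy_hash(other) == toy_hash("aaaaaaaaaa"))
```

With only 10 possible values a match usually turns up within a few tries. With a 128-bit range the same loop would not finish in any practical time, which is why a collision result for MD5 needed a cleverer method than trial and error.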

So what effect does this have? To answer that, we first need to say what hash functions are used for:
signature verification, which proves that a certain piece of information has not been modified;
password verification, which proves that you really do know a certain password;
others, such as the hashing step in hash tables (this scenario is related to a type of (D)DoS attack, but it has nothing to do with privacy or hijacking issues such as password security, so we will not discuss it).

Let's talk about signature verification first. Signature verification means taking a message A, computing H(A)=S, and recording both A and S. When you need to check whether A has been tampered with, you only need to compute H(A)=S' and see whether S' equals S. The real process is more complicated than this, and asymmetric encryption is required to make it secure. But in general, if I know A and S and can quickly compute an A' such that H(A')=S, then signature verification is defeated, because a tampered message can pass as untampered.
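The simplified check described above, leaving out the asymmetric encryption part, might look like the following sketch. The names `fingerprint` and `is_untampered` and the sample messages are assumptions made for illustration.

```python
import hashlib
import hmac


def fingerprint(message: bytes) -> str:
    """Compute S = H(A), using MD5 as H to match the text."""
    return hashlib.md5(message).hexdigest()


def is_untampered(message: bytes, recorded: str) -> bool:
    """Recompute S' = H(A) and compare it with the recorded S."""
    return hmac.compare_digest(fingerprint(message), recorded)


original = b"pay 100 yuan to Alice"
recorded_s = fingerprint(original)

print(is_untampered(original, recorded_s))                  # unchanged message
print(is_untampered(b"pay 900 yuan to Alice", recorded_s))  # modified message
```

A collision attack matters in this setting because an attacker who can produce a different A' with H(A')=S would pass the same check.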

Password verification means taking a password A and storing the result of the hash operation H(A)=S. After that, every time the user logs in and enters A', the server computes H(A')=S' and checks whether S' equals S. If it does, the user knows password A; otherwise the user does not. In this kind of application, if I already know the correct password A, do I need to spend half a day working out an A' such that H(A')=S? Totally unnecessary.
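As a rough sketch of this login check, here is the plain, unsalted scheme exactly as described at this point in the article. The function names `hash_password` and `check_login` and the sample passwords are hypothetical.

```python
import hashlib
import hmac


def hash_password(password: str) -> str:
    """S = H(A): the value the server stores instead of the password."""
    return hashlib.md5(password.encode("utf-8")).hexdigest()


def check_login(entered: str, stored_hash: str) -> bool:
    """Compute S' = H(A') for the entered password and compare it with S."""
    return hmac.compare_digest(hash_password(entered), stored_hash)


stored = hash_password("correct horse")  # computed once, when the password is set
print(check_login("correct horse", stored))
print(check_login("wrong guess", stored))
```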

In other words, the 2004 MD5 collision problem has nothing to do with password verification at all. Those who cite it to say that MD5 is insecure for passwords simply do not understand what the collision weakness of MD5 is about. The next time someone says this, you can laugh at them. Even if you don't follow what I am saying here, you only need to ask "what does collision mean, please enlighten me", and most of them will be struck dumb.

Funny point 2: There is already a large MD5 password (collision) library, with 7.8 trillion passwords!

Another talking point is how large the MD5 password libraries are, for example one that contains 7.8 trillion passwords. But... with an alphabet of 64 characters (upper- and lowercase English letters, digits, and 2 punctuation marks) and a length of 10 characters, do you know how many different passwords there are in total? The answer is 1,152,921,504,606,846,976, or about 1,152,921.5 trillion. A library of 7.8 trillion passwords covers only about 6.7 millionths of that.
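The arithmetic can be checked in a few lines. The constant names are made up for this sketch; the figures are the ones quoted in the paragraph above.

```python
ALPHABET_SIZE = 64                  # 26 upper + 26 lower + 10 digits + 2 punctuation marks
PASSWORD_LENGTH = 10
LIBRARY_SIZE = 7_800_000_000_000    # 7.8 trillion

total_passwords = ALPHABET_SIZE ** PASSWORD_LENGTH
coverage = LIBRARY_SIZE / total_passwords

print(f"{total_passwords:,}")                      # 1,152,921,504,606,846,976
print(f"{coverage * 1_000_000:.2f} millionths")    # a little under 7 millionths
```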

But why do everyone's passwords still leak? That's because:
people have poor memories, so they always choose passwords that are easy to remember, that is, weak passwords;
people have poor memories, so they always reuse a very limited number of passwords on an unlimited number of websites;
there are always plenty of people who have not read the books, and they handle the security part of the system badly, especially the password part.

Let's talk about weak passwords first. Because you tend to remember birthdays, names, and words, your password will usually be:
4 digits, a total of 10,000 different passwords;
6 digits, a total of 1 million different passwords;
up to 8 lowercase letters, and still some kind of pinyin or word, so probably no more than 10 million different passwords in total;
even 8 characters of lowercase letters plus digits gives only 2.8 trillion different combinations.

Therefore, a library of 7.8 trillion passwords is enough to cover the weak passwords of most users. However, the problem is not that simple. If passwords are stored and verified correctly, then even with a library of 78,000 trillion passwords, hackers will not go after your password: not because they can't, but because it isn't worth their while.
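The sizes of these weak-password spaces can be checked directly. The dictionary labels below are just names for this sketch.

```python
DIGITS = 10
LOWERCASE = 26
LIBRARY_SIZE = 7_800_000_000_000   # 7.8 trillion

spaces = {
    "4 digits": DIGITS ** 4,
    "6 digits": DIGITS ** 6,
    "8 lowercase letters or digits": (LOWERCASE + DIGITS) ** 8,
}

for label, size in spaces.items():
    print(f"{label}: {size:,}")

# Each of these spaces, and even their sum, fits inside the 7.8 trillion library.
print(sum(spaces.values()) <= LIBRARY_SIZE)
```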

Why? To see that, you have to understand how passwords get cracked. Suppose the plaintext is P, the key is k, the encryption process is E(P, k), the resulting ciphertext is C, and the decryption process is D(C, k)=P. Then there are roughly the following ways to crack it:
brute force exhaustion: the most stupid and slowest method, let P'=0...X and find E(P', k)=C;
algorithm analysis: study E() to find its weaknesses, then let P'=0...Y and find E(P', k)=C, with Y smaller than X;
ciphertext analysis: from C1, C2, ...Cn, find clues inside and directly find a substitute decryption function D'(), or directly recover part of the plaintext P' of C;
known plaintext attack: selectively supply plaintexts P1, P2, ...Pn, let the other side compute C1, C2, ...Cn with E(P, k), and through analysis find a k' such that D(C, k')=P;
birthday attack: selectively pick plaintexts P1, P2, ...Pn and try them directly against users U1, U2, ...Un; it so happens that some user Ux uses one of these plaintexts. This is one of the key points discussed later, and salting exists to solve this kind of problem;
eavesdropping: listen on the link and capture P directly when user U sends it, or capture E(P, k)=C and replay C over the same protocol next time to impersonate user U. The QQ account-stealing Trojans belong to this category;
taking the whole pot: through backdoors, vulnerabilities, and so on, grab all the data and programs directly, then carry out the analyses and attacks above. Most of the recent leaks, starting with CSDN, were like this;
insiders: find people, use bribes of various kinds, and directly obtain E(), D() and k, even all of the C and P, or have someone walk out with these things and sell them for money.

The attacks above are listed roughly in order of decreasing difficulty, time, and cost. Among them, taking the whole pot lets the attacker choose a simpler and faster method, and which one depends on the storage strategy and protocol used by the password custodian. If those are done well, the best the attacker can do is a known-plaintext attack or a birthday attack on individual users. If they are done badly, as in this round of leaks, the attacker gets the plaintext passwords directly.

Obviously, brute force would mean more than 1,152,921 trillion password attempts per user, which is extremely time-consuming and unrealistic. So in most cases the birthday attack is chosen, because most people pick passwords that are easy to remember, and those passwords are only a very small part of all possible passwords. That's why a library of 7.6 trillion passwords can turn most websites upside down. So how exactly is this attack carried out? Let me give an example:

For example, you may like to use 123456, which after MD5 hashing gives some hash value, say qwerty. So when we get hold of a website's database and find that quite a few users have qwerty in the password column, this tells us a few things:
the way the site stores passwords is poor; most likely it computes C=E(P) and stores C;
the passwords of these users are very likely the same.

Try logging in as one of these users with qwerty as the password. If the login fails, you can conclude that:
the site does not store passwords in plaintext;
the site uses MD5;
the user's password is 123456.

The only thing left to do is to compare each user's password column, one by one, against the 7.6 trillion password library.
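The comparison step can be sketched as a simple table lookup. What the article calls a birthday attack is more commonly known as a dictionary or lookup-table attack. The user names, the tiny password list, and the helper `md5_hex` are all hypothetical; real MD5 hex digests replace the stand-in value qwerty.

```python
import hashlib


def md5_hex(password: str) -> str:
    """Unsalted MD5 of a password, as a hex string."""
    return hashlib.md5(password.encode("utf-8")).hexdigest()


# Hypothetical leaked table: user name -> stored unsalted MD5 hash.
leaked_users = {
    "alice": md5_hex("123456"),
    "bob": md5_hex("123456"),
    "carol": md5_hex("Tr0ub4dor&3-long-and-unusual"),
}

# Precomputed library: hash -> plaintext, built once from a list of weak passwords.
weak_passwords = ["123456", "password", "qwerty", "111111"]
library = {md5_hex(p): p for p in weak_passwords}

# One lookup per user recovers every password that appears in the library.
cracked = {user: library[h] for user, h in leaked_users.items() if h in library}
print(cracked)
```

Note that alice and bob share the same stored hash, which is exactly the pattern the attacker looks for in the password column.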

Well, here comes the second "MD5 is insecure" joke: the cracking process above is basically the same for most hash functions. Take SHA1: from the same password samples you can also build a 7.6 trillion password library, and everything after that is the same as with MD5. So when every hash function is unsafe against this method, where does the claim that MD5 in particular is unsafe come from?

Summary: if someone tells you about a library of 7.6 trillion passwords, you can laugh at him with "do you know what a birthday attack or a collision attack is?"

Funny point 3: MD5 with salt is not safe!

Some people say that MD5+salt (commonly known as salting) is not safe because the MD5 operation is very fast. People who say this must not know what problem MD5+salt is meant to avoid, or why MD5+salt is safe, or even what the salt in MD5+salt is and how it is added.

To understand these questions, you first need two pieces of basic knowledge:
the theoretical security of cryptography rests on this: even if you know everything else, including the exact algorithms of E() and D(), the whole encryption and decryption protocol, the way the ciphertext is stored, and even all program source code and databases, as long as you do not know the key k you cannot obtain the plaintext P for the ciphertext C you want to crack; and even if you use any other plaintext P' to compute the corresponding ciphertext C', you still cannot work out the plaintext P for that C;
the practical security of cryptography rests on the cost of cracking far exceeding the benefit that can be gained.

For the hash function used to store user passwords, there is no such thing as k. Even if there were, it would be stored on the server and would be taken along with the whole pot. Therefore, to keep the theoretical security, the algorithm itself must make it impossible to obtain the plaintext P from the ciphertext C, and a hash algorithm does that by itself. The other thing to guarantee is that the password storage method and verification protocol, which can be publicly known, resist the various attacks including the birthday attack. Unfortunately, this cannot be fully guaranteed; the best that can be done is that, under such an attack, cracking a single user takes about as long as brute force. Here is an example on this issue:

It is said that any decent machine can compute the MD5 values of 7 million passwords per second. That is to say, hashing the 7.6 trillion password library takes about 1 million seconds, which is 11 and a half days. If you are targeted by the FBI, KGB, or NSA, they will certainly use machines 10,000 times more powerful against you. In other words, the time shrinks to about 2 minutes.
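These figures can be reproduced in a few lines. The exact quotient comes out slightly above the rounded 1 million seconds used in the text, so the day count is a little higher than 11 and a half. The constant names are made up for this sketch.

```python
LIBRARY_SIZE = 7_600_000_000_000   # 7.6 trillion passwords
HASHES_PER_SECOND = 7_000_000      # figure quoted above for one machine
SPEEDUP = 10_000                   # the "10,000 times more powerful" machines

seconds = LIBRARY_SIZE / HASHES_PER_SECOND
days = seconds / 86_400
fast_seconds = seconds / SPEEDUP

print(f"{seconds:,.0f} s, about {days:.1f} days")
print(f"{fast_seconds:.0f} s, about {fast_seconds / 60:.1f} minutes")
```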

Therefore, the real focus of password storage is practical security. Continuing the example above, suppose hacker A buys a machine for 5,000 yuan, and the machine lasts about 2 years. If he cracks your account in 11 and a half days, the cost is roughly 5,000 yuan / 2 years * 11 and a half days = 79 yuan. But if the profit he can get from your account and password is only 50 cents, hacker A loses money. Nobody does business at a loss, and even if he does, his loss is greater than yours. Therefore, the point of practical security is to raise the threshold for cracking, including:
increasing the time the algorithm takes (such as bcrypt, which claims to take at least 0.3 seconds per hash);
increasing the total time needed to crack everyone, and so on.

The former is workable, but it does not fundamentally solve the birthday attack problem. In other words, the time to generate a password library goes up from 11 and a half days on an ordinary machine to a year on a supercomputer cluster, but once the library exists, a birthday attack on a given password hash value takes only 10 seconds, and that is still a good deal. The cost of generating the library is a sunk cost and has nothing to do with the number of users. For example, use this library to check the passwords of Swiss bank accounts: as long as the passwords of a few Saudi princes are in it, you can buy hundreds of supercomputer clusters.
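The cost estimate earlier in this section (a 5,000 yuan machine, a 2-year lifetime, 11 and a half days per crack) and the sunk-cost point can be put into numbers. The 10,000-user figure in the last line and the names `cost_per_crack` and `cost_per_user` are assumptions for illustration, and "50 cents" is taken to mean 0.5 yuan.

```python
MACHINE_PRICE_YUAN = 5_000
MACHINE_LIFETIME_DAYS = 2 * 365
CRACK_TIME_DAYS = 11.5
ACCOUNT_VALUE_YUAN = 0.5   # assumed reading of "50 cents"

cost_per_crack = MACHINE_PRICE_YUAN / MACHINE_LIFETIME_DAYS * CRACK_TIME_DAYS
print(f"cost {cost_per_crack:.0f} yuan, profit {ACCOUNT_VALUE_YUAN - cost_per_crack:.1f} yuan")


def cost_per_user(library_cost: float, users: int) -> float:
    """A reusable library is a sunk cost, so its cost is spread over all users."""
    return library_cost / users


# Reusing one library against 10,000 users makes each account cheap to attack.
print(f"{cost_per_user(cost_per_crack, 10_000):.4f} yuan per user")
```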

The latter turns the sunk cost of generating a password library into a marginal cost. That is, if a password library has to be generated for each user, the cost of cracking everything rises sharply with the number of users. For example, if it takes 11 and a half days to generate one user's password library and there are 10,000 users in total, cracking all of them takes 315 years. Even if the time to generate a library is cut to 10 minutes, cracking everyone still takes 70 days. How is this done? By adding salt, which brings us to the joke "MD5+salt is not safe".
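The per-user arithmetic is a single multiplication; the sketch below reproduces the two figures. The function name `total_days` is made up for the example.

```python
USERS = 10_000


def total_days(days_per_user_library: float, users: int = USERS) -> float:
    """With per-user salts, one library has to be generated for every user."""
    return days_per_user_library * users


slow = total_days(11.5)             # 11.5 days per library
fast = total_days(10 / (60 * 24))   # 10 minutes per library, expressed in days

print(f"{slow / 365:.0f} years")    # about 315 years
print(f"{fast:.1f} days")           # a little under 70 days
```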

How is salt added? The method is as follows:
for each user U, generate a random value Salt that never changes afterwards, and make sure no two users have the same salt. When the user sets a password, take the plaintext password P and compute MD5(P+Salt)=C. When logging in, the user gives a plaintext password P', and the server computes MD5(P'+Salt)=C' and checks whether C' equals C. Let's see whether this does the trick:
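A minimal sketch of this scheme, assuming a 16-byte random salt stored next to the hash, could look like this. The function names are hypothetical. A random salt of this size makes duplicates between users practically impossible, but the code does not check for them.

```python
import hashlib
import hmac
import secrets


def new_salt() -> str:
    """Generate a random per-user salt (16 random bytes, hex encoded)."""
    return secrets.token_hex(16)


def salted_md5(password: str, salt: str) -> str:
    """C = MD5(P + Salt), as in the text."""
    return hashlib.md5((password + salt).encode("utf-8")).hexdigest()


def register(password: str) -> tuple[str, str]:
    """Return the (salt, hash) pair the server stores for a new user."""
    salt = new_salt()
    return salt, salted_md5(password, salt)


def verify(entered: str, salt: str, stored_hash: str) -> bool:
    """Compute C' = MD5(P' + Salt) and compare it with the stored C."""
    return hmac.compare_digest(salted_md5(entered, salt), stored_hash)


salt_a, hash_a = register("123456")
salt_b, hash_b = register("123456")
print(hash_a != hash_b)                   # same password, different stored values
print(verify("123456", salt_a, hash_a))
print(verify("654321", salt_a, hash_a))
```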

Suppose there are two users A and B whose passwords are both 123456, but whose salts are aaaa and bbbb respectively. Then MD5(123456+aaaa)=X8jv8o and MD5(123456+bbbb)=8go489, neither of which is the standard qwerty any more. At this point:
what the hacker gets is: user A's salt is aaaa and hash value is X8jv8o; user B's salt is bbbb and hash value is 8go489;
first, the standard password library is useless;
second, every user's hash value is different, so you can no longer spot the weak passwords by counting how many users share the same hash value;
third, from the salt aaaa and the hash value X8jv8o you cannot tell directly whether the password is 123456, abcdef, or something else, unlike plain MD5, where seeing qwerty tells you it is 123456;
so the hacker is left with two options:
brute-force each user; or
for each user's salt, such as aaaa, compute MD5(weak password plaintext + aaaa) over the weak-password plaintext library to get the hash values that correspond to the salt aaaa, then use that library for a birthday attack on user A. For user B, the library has to be regenerated with the new salt bbbb...

Look, this is what salt is for.
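The two-user example can be replayed with real MD5 values in place of the stand-ins X8jv8o and 8go489. The weak-password list and variable names are hypothetical; the salts aaaa and bbbb are the ones from the text.

```python
import hashlib


def salted_md5(password: str, salt: str) -> str:
    """MD5(P + Salt) as a hex string."""
    return hashlib.md5((password + salt).encode("utf-8")).hexdigest()


weak_passwords = ["123456", "password", "abcdef"]

# Hypothetical leaked rows: user -> (salt, stored hash).
leaked = {
    "A": ("aaaa", salted_md5("123456", "aaaa")),
    "B": ("bbbb", salted_md5("123456", "bbbb")),
}

# The standard (unsalted) library no longer matches anything.
standard_library = {hashlib.md5(p.encode("utf-8")).hexdigest(): p for p in weak_passwords}
standard_hits = [user for user, (_, h) in leaked.items() if h in standard_library]
print(standard_hits)

# The attacker has to rebuild the whole library once per salt.
libraries_built = 0
cracked = {}
for user, (salt, stored) in leaked.items():
    per_salt_library = {salted_md5(p, salt): p for p in weak_passwords}
    libraries_built += 1
    if stored in per_salt_library:
        cracked[user] = per_salt_library[stored]

print(cracked, libraries_built)
```

Weak passwords still fall, but only after one full library build per user, which is the marginal cost the article is talking about.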

Summary: if you want to laugh at this type of person, you can ask him "do you know the difference between sunk costs and marginal costs?" or "how do you run a birthday or collision attack against salted hashes?", even if you don't understand what is written above.

Well, in fact, there are some interesting branch jokes below this joke:

Funny point 3.1: Add fixed salt.

To be honest, at first I did not understand what "adding a fixed salt" meant; it only clicked when I contrasted it with giving each user a random salt. It means using the same salt for every user: user A uses the salt aaaa, user B also uses aaaa, ... and user ZZZ still uses aaaa. If you understood what I said above, you know that salt exists to prevent one password collision library from working on all users. This "fixed salt" violates that principle from the very beginning and is not salt at all. Therefore, there is no such thing as "fixed salt" in the books on applied cryptography.

The idea of a fixed salt probably comes from wanting to stop the existing plain-MD5 collision libraries from working against you, on the assumption that the salt will not leak. But the salt is not something only the user knows, and it is not the key k. If your security depends on nobody knowing your "fixed salt", it violates the principles of cryptography, and you can tell it is unsafe without even thinking. Moreover, if people can get hold of your password database, isn't getting the salt an easy task?

Then there is the claim "nobody knows what my algorithm is". Come on: someone who can break into your system and take away your entire database can surely take your program too. The only remaining question is how much your database is worth. If it is a bank database, a Taobao database, or even a social network database, the value in it (for phishing, for example) is enough to hire a "security expert" to review your code. Don't imagine this security expert as anyone too advanced; any technician with a little imagination will do, with an idea such as this:
I already have your program;
find the entry function Fuck() that generates the password hash value;
take a plaintext password library and keep changing the password on one user account, that is, call Fuck(P) for each P;
well, the password collision library comes out.
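The steps above can be sketched as follows. The function `site_hash` stands in for the site's password entry function (called Fuck() in the text), and the fixed salt, user names, and password list are hypothetical.

```python
import hashlib

FIXED_SALT = "aaaa"   # hypothetical site-wide salt, found in the stolen program


def site_hash(password: str) -> str:
    """Stand-in for the site's password hashing entry function."""
    return hashlib.md5((password + FIXED_SALT).encode("utf-8")).hexdigest()


weak_passwords = ["123456", "password", "abcdef"]

# One pass over the plaintext library yields a collision library for ALL users.
library = {site_hash(p): p for p in weak_passwords}

leaked = {
    "A": site_hash("123456"),
    "B": site_hash("abcdef"),
    "ZZZ": site_hash("123456"),
}
cracked = {user: library[h] for user, h in leaked.items() if h in library}
print(cracked)
```

Because the salt is the same for everyone, users A and ZZZ still end up with identical stored hashes, and the library only has to be built once.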
The idea of a fixed salt isn't even the funniest one. Here's another:

Funny point 3.2: MD5, SHA1, ... These public algorithms are not safe, even if salt is added. The only thing that is really safe is to write your own algorithm and then add salt.

I'm too lazy to say much. What does this statement achieve other than exposing your ignorance? It shows that you don't understand:
what MD5 is;
what salting does;
that one of the principles of cryptography is not to rely on your algorithm staying secret;
what kind of algorithm is secure, and what counts as secure.

Are there any more elegant jokes? There are:

Funny point 3.3: Use bcrypt, you can adjust the time required for the operation at will, which is ten million times slower than MD5, and can only calculate 3 passwords per second...

Well, that might work, because the cost really is high. However:
simply using this algorithm without adding salt only increases the sunk cost; the marginal cost does not change, and the birthday attack still exists;
at 3 passwords per second the cost of cracking goes up enormously, but so does the running cost of your own system. For example, if with MD5 one server handles user logins at a load of exactly 100%, then with this goddamn bcrypt algorithm you need goddamn tens of thousands of servers to handle the same logins.

In fact, it comes back to a simple question: how much is your database worth? If you're a bank, maybe it's worth it. But for most sites, doing so only drives up their own costs excessively, and adding salt to MD5 is enough to make most hackers lose interest completely.

There are some similar claims, too. For example: apply salted MD5 several times over. In fact, doing so merely adds some computational cost, which is no different from the bcrypt idea. (Of course, some people think that doing it several times is safe because the algorithm is non-standard and nobody knows it. For that, see funny point 3.1.)
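As a sketch of what "salted MD5 several times" might mean, the hypothetical function below simply feeds the digest back in with the salt for a given number of rounds. It is an illustration of the idea only, not a standard algorithm, and the name `stretched_md5` and the round counts are made up.

```python
import hashlib


def stretched_md5(password: str, salt: str, rounds: int) -> str:
    """Apply salted MD5 `rounds` times; each extra round costs one more MD5 call."""
    if rounds < 1:
        raise ValueError("rounds must be at least 1")
    salt_bytes = salt.encode("utf-8")
    digest = password.encode("utf-8")
    for _ in range(rounds):
        digest = hashlib.md5(digest + salt_bytes).digest()
    return digest.hex()


one_round = stretched_md5("123456", "aaaa", 1)       # same as MD5(P + Salt)
many_rounds = stretched_md5("123456", "aaaa", 1000)  # 1000 times the work per guess
print(one_round, many_rounds)
```

The defender pays the same extra rounds on every login as the attacker pays on every guess, which is the trade-off described above.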

In fact, as long as you add salt in the true sense, different and random for each user, the remaining tricks are basically either showy but useless moves or demonstrations of how to shoot yourself in the foot. Of course, some truly brilliant ideas cannot be ruled out, but most people's claims are not in that category.

Conclusion
If you have not read at least one book in a field, it is best not to speak casually about it, otherwise others can tell at a glance what kind of person you are;
MD5+salt is safe enough for most small and medium websites;
if you really can't trust MD5, use SHA1 or another hash. For password verification that is enough, unless you run a bank or Taobao;
identity verification is only one part of security. Think about how these databases were actually stolen, and think about all the QQ Trojans...
most of the comments you see on the Internet are false statements imagined by people who are completely ignorant of this field. Search for "MD5 is not safe", for example: apart from a few reports copied from reputable websites and the material on a handful of genuine security websites, nearly everything is one joke or another;
as for security, unless you are an expert, you are better off assuming it doesn't exist. As with food safety, most websites actually do a very poor job of network security. If you knew the real situation, you might not dare to use anything on the Internet.
