MD5 algorithm principle and common implementation

Definition

MD stands for Message-Digest, so the algorithms in the MD family are called message digest algorithms. The
widely known members of the family are MD2, MD4, and MD5, each designed to be stronger than its predecessor.
MD5 is the most commonly used of them. Strictly speaking it is a hash function rather than an encryption algorithm, because it has no key and no decryption step.

The MD5 algorithm maps input of any length to a 16-byte (128-bit) hash value, and the original input cannot be recovered from that value.
The 16 bytes are usually written as a hexadecimal string of length 32.
This is one of the most important features of MD5: the computation is one-way.

MD5 features

Irreversibility: the original text cannot be recovered from the digest.
Determinism: the same original text always produces the same MD5 digest.
Sensitivity to change (the avalanche effect): a slight change to the original text completely changes the digest.
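
These properties can be observed with the JDK's built-in MD5. The following is a minimal sketch; the class name Md5PropertiesDemo and the helper md5Hex are illustrative names, not something from the original article.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class Md5PropertiesDemo {
    // Returns the MD5 digest of the UTF-8 bytes of text as lowercase hex.
    static String md5Hex(String text) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(text.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is not available", e);
        }
    }

    public static void main(String[] args) {
        String first = md5Hex("hello");
        String again = md5Hex("hello");
        String changed = md5Hex("hellp"); // one character differs

        System.out.println(first.length());        // 32 hex characters = 16 bytes
        System.out.println(first.equals(again));   // same input, same digest
        System.out.println(first.equals(changed)); // small change, different digest
        System.out.println(first);
        System.out.println(changed);
    }
}
```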

Common application scenarios

1. Verify the integrity of the file

Suppose Zhang San sends a file to Li Si. How can Li Si confirm that the file arrived complete?
Before sending the file, Zhang San computes its MD5 digest and passes the digest to Li Si.
After receiving the file, Li Si computes the MD5 digest of the received copy. If it matches the digest supplied by Zhang San, the file is complete.
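
The receiver's side of this check might look like the sketch below. The class and method names (FileChecksumDemo, md5OfFile, isIntact) are hypothetical, and the file is streamed in 8 KB chunks only as a reasonable default so that large files do not have to fit in memory.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class FileChecksumDemo {
    // Streams the file through MD5 and returns the digest as lowercase hex.
    static String md5OfFile(Path file) throws IOException {
        MessageDigest md;
        try {
            md = MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is not available", e);
        }
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buffer = new byte[8192];
            int read;
            while ((read = in.read(buffer)) != -1) {
                md.update(buffer, 0, read);
            }
        }
        StringBuilder sb = new StringBuilder();
        for (byte b : md.digest()) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    // True when the received file hashes to the digest the sender published.
    static boolean isIntact(Path received, String expectedMd5Hex) throws IOException {
        return md5OfFile(received).equalsIgnoreCase(expectedMd5Hex);
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("md5-demo", ".txt");
        Files.write(file, "abc".getBytes(StandardCharsets.US_ASCII));
        String published = md5OfFile(file);          // what the sender would share
        System.out.println(isIntact(file, published)); // true for an unmodified file
        Files.delete(file);
    }
}
```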

2. Store user password

User passwords should not be stored in plain text in the database, because once the database is compromised, every user's password is exposed.
Instead, you can compute the MD5 digest of the password and store the digest in the database. When the
user logs in, compute the MD5 digest of the submitted password and compare it with the stored digest to decide whether the password entered on the front end is correct.
This is only the basic idea. Production systems generally add a salt to each password before hashing, and because user credentials are sensitive they use more expensive computation than a single MD5 pass.
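
A salted variant of this idea can be sketched as follows. This is an illustration only: the class SaltedPasswordDemo, the 16-byte salt size, and the "salt followed by password" layout are assumptions made for the example. MD5 is used because it is the subject of this article; production systems usually prefer a dedicated password hashing function such as PBKDF2 or bcrypt.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.security.SecureRandom;

class SaltedPasswordDemo {
    // Generates a fresh random salt; one salt is stored per user next to the hash.
    static byte[] newSalt() {
        byte[] salt = new byte[16];
        new SecureRandom().nextBytes(salt);
        return salt;
    }

    // Hashes salt || password (assumed layout for this sketch).
    static byte[] hash(byte[] salt, String password) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            md.update(salt);
            return md.digest(password.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is not available", e);
        }
    }

    // Recomputes the hash for a login attempt and compares it with the stored value.
    static boolean verify(byte[] salt, byte[] storedHash, String attempt) {
        return MessageDigest.isEqual(storedHash, hash(salt, attempt));
    }

    public static void main(String[] args) {
        byte[] salt = newSalt();
        byte[] stored = hash(salt, "s3cret");               // saved at registration
        System.out.println(verify(salt, stored, "s3cret")); // correct password
        System.out.println(verify(salt, stored, "guess"));  // wrong password
    }
}
```

Because each user gets a different salt, two users with the same password end up with different stored values.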

Principle

Overall, MD5 first defines four values, then uses them together with one block of the original message to compute four new values, then uses those with the next block to compute four new values again, and so on until every block has been processed. Finally, the last four values are concatenated to form the digest.
The three main steps are as follows:

1. Pad the message

Take the length of the original message in bits modulo 512 and pad until the result is 448. Padding is always applied, even when the length is already 448 modulo 512 (in that case a full 512 bits are added). The padding is a single 1 bit followed by 0 bits. The remaining 512 - 448 = 64 bits record the length of the original message in bits, low-order byte first.
The result is a padded message whose total length is an integer multiple of 512 bits.
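
The padding step can be sketched at byte granularity (64 bytes = 512 bits, 56 bytes = 448 bits). The class name Md5PaddingDemo and the method pad are illustrative; the sketch assumes the message is a whole number of bytes, which is the usual case in practice.

```java
class Md5PaddingDemo {
    // Returns message + 0x80 + zero bytes + 8-byte little-endian bit length,
    // so that the result is a multiple of 64 bytes (512 bits).
    static byte[] pad(byte[] message) {
        // Between 1 and 64 padding bytes, so that length + padding is 56 modulo 64.
        int padding = 64 - ((message.length + 8) % 64);
        byte[] out = new byte[message.length + padding + 8];
        System.arraycopy(message, 0, out, 0, message.length);

        // A 1 bit followed by seven 0 bits; the remaining padding bytes are already 0.
        out[message.length] = (byte) 0x80;

        // Original length in bits, stored low-order byte first in the last 8 bytes.
        long bitLength = (long) message.length * 8;
        for (int i = 0; i < 8; i++) {
            out[out.length - 8 + i] = (byte) (bitLength >>> (8 * i));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(pad(new byte[3]).length);  // 64: fits in one block
        System.out.println(pad(new byte[56]).length); // 128: needs a second block
    }
}
```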

2. Get the initial value

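The four initial values are fixed in advance by the MD5 algorithm. Each is 32 bits, for exactly 128 bits in total.
We name them A, B, C, and D. Written as byte sequences with the low-order byte first, they are:
A = 01 23 45 67
B = 89 AB CD EF
C = FE DC BA 98
D = 76 54 32 10
As 32-bit integers in code, this means A = 0x67452301, B = 0xEFCDAB89, C = 0x98BADCFE, and D = 0x10325476.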
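
MD5 treats every 32-bit word as little-endian, which is why the byte sequence 01 23 45 67 corresponds to the integer 0x67452301 in code. The following sketch (the class name Md5InitialStateDemo is made up for the example) reads the 16 initial bytes as four little-endian integers to show the correspondence.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class Md5InitialStateDemo {
    // Reads the 16 initial bytes as four little-endian 32-bit words A, B, C, D.
    static int[] initialState() {
        byte[] bytes = {
            0x01, 0x23, 0x45, 0x67,
            (byte) 0x89, (byte) 0xAB, (byte) 0xCD, (byte) 0xEF,
            (byte) 0xFE, (byte) 0xDC, (byte) 0xBA, (byte) 0x98,
            0x76, 0x54, 0x32, 0x10
        };
        ByteBuffer buf = ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN);
        return new int[] {buf.getInt(), buf.getInt(), buf.getInt(), buf.getInt()};
    }

    public static void main(String[] args) {
        for (int word : initialState()) {
            System.out.println(String.format("0x%08x", word));
        }
    }
}
```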

3. The actual calculation

The calculation runs in multiple cycles. Each cycle combines the current A, B, C, D with one 512-bit block of the padded message from step 1 and produces a new A, B, C, D. After the last block, the final A, B, C, D are concatenated, each written low-order byte first, to form the digest.
The loop consists of a main loop, with a sub-loop nested inside each main-loop iteration.
Number of main loops = length of the padded message in bits / 512.
Number of sub-loops per main loop = 64.

Let's see what is done in a single sub-loop.

[Figure: flow of a single MD5 sub-loop; the diagram is taken from another online source]

In the figure, A, B, C, and D are the four 32-bit state words. Each sub-loop turns the old A, B, C, D into a new A, B, C, D. The total number of cycles is determined by the length of the padded message.

Assume the padded message is M bits long.
Number of main loops = M / 512.
Each main loop contains 512 / 32 * 4 = 64 sub-loops.

The following explains the other elements in the figure one by one:

1. The green F
The green F in the figure represents a nonlinear function. MD5 defines four such functions:

F(X, Y, Z) =(X&Y) | ((~X) & Z)
G(X, Y, Z) =(X&Z) | (Y & (~Z))
H(X, Y, Z) =X^Y^Z
I(X, Y, Z)=Y^(X|(~Z))

Within the 64 sub-loops of each main loop, F, G, H, and I are used in turn: the first 16 sub-loops use F, the next 16 use G, the next 16 use H, and the last 16 use I.
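
In Java the four functions map directly onto int bitwise operators. The sketch below uses hypothetical names (Md5RoundFunctions, select); the step index is assumed to run from 0 to 63.

```java
class Md5RoundFunctions {
    static int f(int x, int y, int z) { return (x & y) | (~x & z); }
    static int g(int x, int y, int z) { return (x & z) | (y & ~z); }
    static int h(int x, int y, int z) { return x ^ y ^ z; }
    static int i(int x, int y, int z) { return y ^ (x | ~z); }

    // Picks the function for a sub-loop: steps 0-15 use F, 16-31 G, 32-47 H, 48-63 I.
    static int select(int step, int x, int y, int z) {
        switch (step / 16) {
            case 0:  return f(x, y, z);
            case 1:  return g(x, y, z);
            case 2:  return h(x, y, z);
            default: return i(x, y, z);
        }
    }

    public static void main(String[] args) {
        // F acts as a bitwise "if x then y else z": with x all ones it returns y.
        System.out.println(Integer.toHexString(f(0xFFFFFFFF, 0x12345678, 0)));
    }
}
```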

2. The red boxes
The red symbol in the figure that looks like the Chinese character 田 ("tian") represents addition, performed modulo 2^32.

3. Mi
Mi is a word of the padded message from step 1. After step 1, the message length is an integer multiple of 512 bits. Each 512-bit block is divided into 16 equal parts, named M0 ~ M15, each 32 bits long. Each group of 16 sub-loops uses every one of M0 ~ M15 exactly once, in an order that differs from group to group.
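
Splitting a block into M0 ~ M15 and choosing which word a given sub-loop uses can be sketched as follows. The names Md5BlockWords, words, and wordIndex are invented for the example; the index formulas are the usual compact way of writing the per-round word order, with the step index running from 0 to 63.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class Md5BlockWords {
    // Reads block number blockIndex (64 bytes) of the padded message
    // as sixteen little-endian 32-bit words M0..M15.
    static int[] words(byte[] padded, int blockIndex) {
        ByteBuffer buf = ByteBuffer.wrap(padded, blockIndex * 64, 64)
                .order(ByteOrder.LITTLE_ENDIAN);
        int[] m = new int[16];
        for (int j = 0; j < 16; j++) {
            m[j] = buf.getInt();
        }
        return m;
    }

    // Index j of the word Mj used by sub-loop number step (0..63).
    static int wordIndex(int step) {
        switch (step / 16) {
            case 0:  return step;                  // 0, 1, 2, ...
            case 1:  return (5 * step + 1) % 16;   // 1, 6, 11, 0, ...
            case 2:  return (3 * step + 5) % 16;   // 5, 8, 11, 14, ...
            default: return (7 * step) % 16;       // 0, 7, 14, 5, ...
        }
    }

    public static void main(String[] args) {
        for (int step = 16; step < 20; step++) {
            System.out.println("step " + step + " uses M" + wordIndex(step));
        }
    }
}
```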

4. Ki
Ki is a constant. Each of the 64 sub-loops uses a different constant. Counting from i = 1, the i-th constant is the integer part of 2^32 * abs(sin(i)), with i in radians.
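
The 64 constants do not have to be typed in by hand. RFC 1321 defines the i-th constant (counting from 1) as the integer part of 4294967296 * abs(sin(i)), with i in radians, so the table can be generated. A small sketch, with the class name Md5Constants chosen only for illustration:

```java
class Md5Constants {
    // Builds the table of 64 per-step constants from the sine function.
    static int[] table() {
        int[] k = new int[64];
        for (int i = 0; i < 64; i++) {
            // 4294967296.0 is 2^32; the long-to-int cast keeps the low 32 bits.
            k[i] = (int) (long) (Math.abs(Math.sin(i + 1)) * 4294967296.0);
        }
        return k;
    }

    public static void main(String[] args) {
        int[] k = table();
        System.out.println(String.format("0x%08x", k[0]));  // first constant
        System.out.println(String.format("0x%08x", k[63])); // last constant
    }
}
```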

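5. The yellow <<<
The yellow <<< in the figure is a left rotation. Written out, one sub-loop of the first round is:
FF(a, b, c, d, Mj, s, ti) means a = b + ((a + F(b, c, d) + Mj + ti) <<< s)
<<< s means rotate left by s bits (a circular shift). GG, HH, and II are defined the same way, with G, H, and I in place of F.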
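
A single FF step translates almost literally into Java, because int arithmetic already wraps modulo 2^32 and Integer.rotateLeft performs the circular shift. The class name Md5StepDemo is an assumption of this sketch.

```java
class Md5StepDemo {
    // One sub-loop of the first round: returns the new value of a.
    static int ff(int a, int b, int c, int d, int mj, int s, int ti) {
        int f = (b & c) | (~b & d);                          // F(b, c, d)
        return b + Integer.rotateLeft(a + f + mj + ti, s);   // int addition wraps mod 2^32
    }

    public static void main(String[] args) {
        // First step of the first block of the message "abc":
        // M0 holds the bytes 'a', 'b', 'c', 0x80 read as a little-endian word.
        int a = 0x67452301, b = 0xefcdab89, c = 0x98badcfe, d = 0x10325476;
        int m0 = 0x80636261;
        a = ff(a, b, c, d, m0, 7, 0xd76aa478);
        System.out.println(String.format("0x%08x", a));
    }
}
```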

Round 1
 a=FF(a,b,c,d,M0,7,0xd76aa478)
 d=FF(d,a,b,c,M1,12,0xe8c7b756)
 c=FF(c,d,a,b,M2,17,0x242070db)
 b=FF(b,c,d,a,M3,22,0xc1bdceee)
 a=FF(a,b,c,d,M4,7,0xf57c0faf)
 d=FF(d,a,b,c,M5,12,0x4787c62a)
 c=FF(c,d,a,b,M6,17,0xa8304613)
 b=FF(b,c,d,a,M7,22,0xfd469501)
 a=FF(a,b,c,d,M8,7,0x698098d8)
 d=FF(d,a,b,c,M9,12,0x8b44f7af)
 c=FF(c,d,a,b,M10,17,0xffff5bb1)
 b=FF(b,c,d,a,M11,22,0x895cd7be)
 a=FF(a,b,c,d,M12,7,0x6b901122)
 d=FF(d,a,b,c,M13,12,0xfd987193)
 c=FF(c,d,a,b,M14,17,0xa679438e)
 b=FF(b,c,d,a,M15,22,0x49b40821)
 
Round 2
 a=GG(a,b,c,d,M1,5,0xf61e2562)
 d=GG(d,a,b,c,M6,9,0xc040b340)
 c=GG(c,d,a,b,M11,14,0x265e5a51)
 b=GG(b,c,d,a,M0,20,0xe9b6c7aa)
 a=GG(a,b,c,d,M5,5,0xd62f105d)
 d=GG(d,a,b,c,M10,9,0x02441453)
 c=GG(c,d,a,b,M15,14,0xd8a1e681)
 b=GG(b,c,d,a,M4,20,0xe7d3fbc8)
 a=GG(a,b,c,d,M9,5,0x21e1cde6)
 d=GG(d,a,b,c,M14,9,0xc33707d6)
 c=GG(c,d,a,b,M3,14,0xf4d50d87)
 b=GG(b,c,d,a,M8,20,0x455a14ed)
 a=GG(a,b,c,d,M13,5,0xa9e3e905)
 d=GG(d,a,b,c,M2,9,0xfcefa3f8)
 c=GG(c,d,a,b,M7,14,0x676f02d9)
 b=GG(b,c,d,a,M12,20,0x8d2a4c8a)
 
Round 3
 a=HH(a,b,c,d,M5,4,0xfffa3942)
 d=HH(d,a,b,c,M8,11,0x8771f681)
 c=HH(c,d,a,b,M11,16,0x6d9d6122)
 b=HH(b,c,d,a,M14,23,0xfde5380c)
 a=HH(a,b,c,d,M1,4,0xa4beea44)
 d=HH(d,a,b,c,M4,11,0x4bdecfa9)
 c=HH(c,d,a,b,M7,16,0xf6bb4b60)
 b=HH(b,c,d,a,M10,23,0xbebfbc70)
 a=HH(a,b,c,d,M13,4,0x289b7ec6)
 d=HH(d,a,b,c,M0,11,0xeaa127fa)
 c=HH(c,d,a,b,M3,16,0xd4ef3085)
 b=HH(b,c,d,a,M6,23,0x04881d05)
 a=HH(a,b,c,d,M9,4,0xd9d4d039)
 d=HH(d,a,b,c,M12,11,0xe6db99e5)
 c=HH(c,d,a,b,M15,16,0x1fa27cf8)
 b=HH(b,c,d,a,M2,23,0xc4ac5665)
 
Round 4
 a=II(a,b,c,d,M0,6,0xf4292244)
 d=II(d,a,b,c,M7,10,0x432aff97)
 c=II(c,d,a,b,M14,15,0xab9423a7)
 b=II(b,c,d,a,M5,21,0xfc93a039)
 a=II(a,b,c,d,M12,6,0x655b59c3)
 d=II(d,a,b,c,M3,10,0x8f0ccc92)
 c=II(c,d,a,b,M10,15,0xffeff47d)
 b=II(b,c,d,a,M1,21,0x85845dd1)
 a=II(a,b,c,d,M8,6,0x6fa87e4f)
 d=II(d,a,b,c,M15,10,0xfe2ce6e0)
 c=II(c,d,a,b,M6,15,0xa3014314)
 b=II(b,c,d,a,M13,21,0x4e0811a1)
 a=II(a,b,c,d,M4,6,0xf7537e82)
 d=II(d,a,b,c,M11,10,0xbd3af235)
 c=II(c,d,a,b,M2,15,0x2ad7d2bb)
 b=II(b,c,d,a,M9,21,0xeb86d391)
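
Putting the three steps together, the whole algorithm fits in a short class. The following is a compact teaching sketch, not a replacement for the JDK's MessageDigest: the class name SimpleMd5 is made up, the 64 explicit calls are folded into one loop that rotates the roles of a, b, c, d, and the constants are computed from the sine function instead of being listed. After each block, the state the block started with is added to the result of the 64 sub-loops.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

class SimpleMd5 {
    // Rotation amounts: four per round, cycled four times within the round.
    private static final int[] S = {
        7, 12, 17, 22,   5, 9, 14, 20,   4, 11, 16, 23,   6, 10, 15, 21
    };

    static String md5Hex(byte[] message) {
        // Step 1: pad to a multiple of 64 bytes (0x80, zeros, 64-bit bit length).
        int padding = 64 - ((message.length + 8) % 64);
        ByteBuffer buf = ByteBuffer.allocate(message.length + padding + 8)
                .order(ByteOrder.LITTLE_ENDIAN);
        buf.put(message).put((byte) 0x80);
        buf.putLong(buf.capacity() - 8, (long) message.length * 8);

        // Step 2: initial values.
        int a0 = 0x67452301, b0 = 0xefcdab89, c0 = 0x98badcfe, d0 = 0x10325476;

        // Step 3: main loop over 512-bit blocks, 64 sub-loops each.
        int[] m = new int[16];
        for (int block = 0; block < buf.capacity() / 64; block++) {
            for (int j = 0; j < 16; j++) {
                m[j] = buf.getInt(block * 64 + j * 4);
            }
            int a = a0, b = b0, c = c0, d = d0;
            for (int i = 0; i < 64; i++) {
                int f, g;
                switch (i / 16) {
                    case 0:  f = (b & c) | (~b & d); g = i;                break;
                    case 1:  f = (b & d) | (c & ~d); g = (5 * i + 1) % 16; break;
                    case 2:  f = b ^ c ^ d;          g = (3 * i + 5) % 16; break;
                    default: f = c ^ (b | ~d);       g = (7 * i) % 16;     break;
                }
                int k = (int) (long) (Math.abs(Math.sin(i + 1)) * 4294967296.0);
                int rotated = Integer.rotateLeft(a + f + m[g] + k, S[(i / 16) * 4 + (i % 4)]);
                // Shift the roles of the four words for the next sub-loop.
                int oldD = d;
                d = c;
                c = b;
                b = b + rotated;
                a = oldD;
            }
            // Add this block's result to the running state.
            a0 += a; b0 += b; c0 += c; d0 += d;
        }

        // Output A, B, C, D low-order byte first, as 32 hex characters.
        ByteBuffer out = ByteBuffer.allocate(16).order(ByteOrder.LITTLE_ENDIAN);
        out.putInt(a0).putInt(b0).putInt(c0).putInt(d0);
        StringBuilder sb = new StringBuilder(32);
        for (byte x : out.array()) {
            sb.append(String.format("%02x", x));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(md5Hex("abc".getBytes(StandardCharsets.US_ASCII)));
        // prints 900150983cd24fb0d6963f7d28e17f72 (test vector from RFC 1321)
    }
}
```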

Why MD5 is irreversible

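MD5 is irreversible in principle for two reasons.
The first is that it discards information. Input of any length is reduced to 128 bits, so many different inputs share each digest, and the digest alone cannot tell you which one was used.
The second is the way each block is processed. The F, G, H, and I functions above are nonlinear, and after the 64 sub-loops the A, B, C, D that the block started with are added back into the result, so the steps cannot simply be run backwards.
Note that <<< by itself loses nothing, because it is a rotation rather than a plain shift. For example, rotating 10110011 left by three places gives 10011101, and rotating that right by three places restores 10110011. A plain left shift would give 10011000 and drop the top three bits for good.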
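
The behaviour of the bit operations on 10110011 can be checked directly. In this sketch (class and method names are illustrative) an 8-bit rotation is reversible, whereas a plain 8-bit left shift discards the top bits; the <<< used by MD5 is the rotating kind, applied to 32-bit words.

```java
class RotateVsShiftDemo {
    // Rotates the low 8 bits of x left by s places (0 < s < 8).
    static int rotl8(int x, int s) {
        return ((x << s) | (x >>> (8 - s))) & 0xFF;
    }

    // Rotates the low 8 bits of x right by s places (0 < s < 8).
    static int rotr8(int x, int s) {
        return ((x >>> s) | (x << (8 - s))) & 0xFF;
    }

    // Plain left shift within 8 bits: bits pushed past the top are lost.
    static int shl8(int x, int s) {
        return (x << s) & 0xFF;
    }

    public static void main(String[] args) {
        int x = 0b10110011;
        System.out.println(Integer.toBinaryString(rotl8(x, 3)));           // 10011101
        System.out.println(Integer.toBinaryString(rotr8(rotl8(x, 3), 3))); // 10110011
        System.out.println(Integer.toBinaryString(shl8(x, 3)));            // 10011000
    }
}
```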

Java implementation and use

import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class MD5Util {
    public static void main(String[] args) {
        System.out.println(encodeString("123"));
    }

    // Hashes the UTF-8 bytes of the text and returns the digest as 32 hex characters.
    public static String encodeString(String plainText) {
        return encodeBytes(plainText.getBytes(StandardCharsets.UTF_8));
    }

    public static String encodeBytes(byte[] bytes) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(bytes);

            StringBuilder buf = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                int value = b & 0xff; // treat the signed byte as 0..255
                if (value < 16) {
                    buf.append('0'); // keep two hex characters per byte
                }
                buf.append(Integer.toHexString(value));
            }
            return buf.toString();
        } catch (NoSuchAlgorithmException e) {
            // Not expected with a standard JDK; fail loudly instead of returning "".
            throw new IllegalStateException("MD5 is not available", e);
        }
    }
}

Origin blog.csdn.net/java_zhangshuai/article/details/105568316