Cryptography Series Five: MD5, SHA1 - One article to understand the hash function

1. Basic concepts

1.1 The concept of Hash function

A hash function (also called a digest function) is an irreversible mapping from a message space to an image space: it transforms an input of "arbitrary" length into a fixed-length output. It is a one-way cryptographic primitive, that is, there is only a forward ("encryption") computation and no inverse ("decryption") computation.

Because a hash function is one-way and has a fixed output length, it can be used to generate a "digital fingerprint" of a message, also known as a message digest (MD) or hash value. Hash functions are mainly used in message authentication, digital signatures, secure transmission and storage of passwords, and file integrity verification.

The generation of a hash value is expressed as $h = H(M)$, where

  • $M$ is a message of any length
  • $H$ is the hash function
  • $h$ is the fixed-length hash value
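As a small illustration of $h = H(M)$, the sketch below uses Python's standard `hashlib` module to hash inputs of very different lengths and report the digest length. The helper name `digest_bits` and the sample messages are assumptions made for this example only.

```python
import hashlib

def digest_bits(algorithm: str, message: bytes) -> int:
    """Return the length in bits of the digest of `message` under `algorithm`."""
    return hashlib.new(algorithm, message).digest_size * 8

# Inputs of very different lengths (illustrative values only).
messages = [b"", b"abc", b"x" * 1_000_000]

for name in ("md5", "sha1"):
    # Every message yields a digest of the same length for a given algorithm.
    print(name, {digest_bits(name, m) for m in messages})
```

Whatever the input length, each algorithm always returns a digest of one fixed size.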

1.2 Properties of the Hash function

  • (1) The input message has any finite length, and the output hash value has a fixed length.
  • (2) Easy to compute: for any given message $M$, it is easy to compute its hash value $h = H(M)$.
  • (3) One-way: also known as preimage resistance. For any given hash value $h$, finding a message $M$ with $H(M) = h$ is computationally infeasible.
  • (4) Weak collision resistance: also known as second preimage resistance. For any given message $M$, finding a message $M'$ with $M' \neq M$ and $H(M) = H(M')$ is computationally infeasible.
  • (5) Strong collision resistance: finding any pair $(M, M')$ with $M \neq M'$ and $H(M) = H(M')$ is computationally infeasible.

In addition, a hash function should exhibit the avalanche effect: when a single bit of the input message changes, about half of the bits of the output hash value change on average.
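The avalanche effect can be observed experimentally. The following sketch flips each bit of a short message in turn and averages how many of the 128 MD5 output bits change. The function names and the sample message are assumptions for illustration; the result is a measurement, not a proof.

```python
import hashlib

def hamming_distance(a: bytes, b: bytes) -> int:
    """Number of bit positions in which two equal-length byte strings differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

def flip_bit(message: bytes, index: int) -> bytes:
    """Return a copy of `message` with bit number `index` inverted."""
    data = bytearray(message)
    data[index // 8] ^= 1 << (index % 8)
    return bytes(data)

def average_avalanche(message: bytes, algorithm: str = "md5") -> float:
    """Average number of digest bits that change over all single-bit flips."""
    base = hashlib.new(algorithm, message).digest()
    total_bits = len(message) * 8
    changed = [
        hamming_distance(base, hashlib.new(algorithm, flip_bit(message, i)).digest())
        for i in range(total_bits)
    ]
    return sum(changed) / total_bits

# For a 128-bit digest the average is expected to be close to 64.
print(average_avalanche(b"hash functions"))
```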

1.3 Structure of Hash function

The general structure of a hash function is the iterated hash structure, proposed independently by Merkle and Damgård. The hash function divides the input message into $L$ fixed-length blocks of $b$ bits each. If the last block is shorter than $b$ bits, it is padded to $b$ bits, and the last block also contains the total length of the input message.

The hash algorithm repeatedly applies a compression function $f$, which is the core of the algorithm. It has two inputs: the $n$-bit output of the previous iteration, called the chaining variable (link variable), and a $b$-bit message block; it produces an $n$-bit output ($n < b$). The chaining variable used in the first iteration is also called the initial value (IV) and is specified by the algorithm; the output of the last iteration is the hash value.
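The iteration described above can be sketched in a few lines. This is only a toy: the compression function `toy_compress` is a stand-in built from truncated SHA-256, and the values of `BLOCK_BYTES`, `STATE_BYTES`, and `IV` are arbitrary assumptions, not parameters of any real algorithm.

```python
import hashlib

BLOCK_BYTES = 64         # b = 512 bits (assumed for this toy example)
STATE_BYTES = 16         # n = 128 bits (assumed for this toy example)
IV = bytes(STATE_BYTES)  # all-zero initial value, chosen arbitrarily

def toy_compress(chaining: bytes, block: bytes) -> bytes:
    """Stand-in compression function f: (n bits, b bits) -> n bits."""
    return hashlib.sha256(chaining + block).digest()[:STATE_BYTES]

def pad(message: bytes) -> bytes:
    """Pad with 0x80, zero bytes, and the 64-bit message length in bits."""
    length_bits = (len(message) * 8) % 2**64
    padded = message + b"\x80"
    padded += bytes((-len(padded) - 8) % BLOCK_BYTES)
    return padded + length_bits.to_bytes(8, "big")

def iterated_hash(message: bytes) -> bytes:
    """Feed the padded message through f one block at a time."""
    chaining = IV
    data = pad(message)
    for offset in range(0, len(data), BLOCK_BYTES):
        chaining = toy_compress(chaining, data[offset:offset + BLOCK_BYTES])
    return chaining  # the output of the last iteration is the hash value

print(iterated_hash(b"abc").hex())
```

MD5 and SHA1 follow the same outline, each with its own compression function, initial value, and length encoding.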

2. MD5 algorithm

The MD5 algorithm was designed by Rivest, a well-known cryptographer at the Massachusetts Institute of Technology, who described it in detail in RFC 1321, submitted to the IETF in 1992. MD5 was developed on the basis of MD2, MD3, and MD4. Because it adds "safety belts" to MD4, MD5 is also called "MD4 with safety belts".

2.1 Algorithm structure

The input of the MD5 algorithm is a message of fewer than $2^{64}$ bits. The input message is processed in 512-bit blocks, and the output is a 128-bit message digest.

[Figure: overall structure of the MD5 algorithm]

In the figure, the input message has length $N$, and $Y_i$ ($i = 0, 1, \ldots, L-1$) are the message blocks, where $L$ is the number of blocks after padding.

$IV$ is the initial chaining variable, consisting of four 32-bit registers A, B, C, and D.

$CV_i$ is the chaining variable: the output of one block-processing unit and the input of the next.

$CV_L$ is the output of the last unit, that is, the hash value of the message.

(1) Appending padding bits

Append a single "1" bit followed by as many "0" bits as needed so that the message length is congruent to 448 modulo 512. Then append the original length of the message as a 64-bit value, so that the total length is exactly a multiple of 512 bits, namely $512 \times L$ bits.
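A minimal sketch of this padding rule for byte-oriented messages follows. The function name `md5_pad` is an assumption; the little-endian encoding of the length follows RFC 1321 and is not stated in the text above.

```python
def md5_pad(message: bytes) -> bytes:
    """Pad `message` to a multiple of 512 bits (64 bytes)."""
    length_bits = (len(message) * 8) % 2**64   # original length as a 64-bit value
    padded = message + b"\x80"                 # a "1" bit followed by seven "0" bits
    padded += bytes((56 - len(padded)) % 64)   # "0" bits until length = 448 mod 512
    # MD5 stores the 64-bit length with the least significant byte first.
    return padded + length_bits.to_bytes(8, "little")

for size in (0, 55, 56, 64):
    print(size, len(md5_pad(b"a" * size)))
```

Padding is always added, so a 56-byte message (already 448 bits) grows to two full blocks.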

(2) Block processing (iterative compression)

The block processing (compression function) of MD5 consists of 4 rounds. The 512-bit message block $M_i$ is divided equally into 16 words of 32 bits each, which take part in the 16 steps of every round. The input of each step is the four 32-bit chaining variables and one 32-bit message word, and the output is a 32-bit value. After 4 rounds, 64 steps in total, the four resulting register values are added modulo $2^{32}$ to the corresponding input chaining variables; the result is the intermediate hash value for the current message block.

2.2 Compression function

Each step of the MD5 compression function first applies a nonlinear function to the last three variables of the vector $(A, B, C, D)$, then adds the result to the first variable $A$, the message word $M[j]$, and the constant $T[i]$. It then rotates the sum left by $s$ bits, adds the second variable $B$, and finally assigns the new value to the first variable of the vector.

The detailed process is as follows, where $M[j]$ is the $j$-th ($0 \le j \le 15$) 32-bit word of the message block $M_i$.

[Figure: one step of the MD5 compression function]

(1) Pseudo-random constants

The constants $T[i] = \lfloor 2^{32} \times |\sin(i)| \rfloor$ (with $i$ in radians, $1 \le i \le 64$) are used to eliminate regularity in the input data. For example:

$T[28] = \lfloor 4294967296 \times |\sin(28)| \rfloor = \lfloor 1163531501.0793967247 \rfloor = 1163531501$

Converting 1163531501 to hexadecimal gives 0x455A14ED.
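The whole table can be generated directly from the formula. In this sketch the function name `md5_constants` is an assumption, and the list is 0-based, so `T[27]` holds the value called $T[28]$ above.

```python
import math

def md5_constants() -> list[int]:
    """Return [T[1], ..., T[64]] with T[i] = floor(2**32 * |sin(i)|), i in radians."""
    return [int(abs(math.sin(i)) * 2**32) for i in range(1, 65)]

T = md5_constants()
print(T[27], hex(T[27]))  # T[28] in the 1-based numbering of the text
```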

(2) Circular left shift

$<<<s$ denotes a circular left shift (rotation) by $s$ bits. There are 16 shift constants in total:

Round 1: 7, 12, 17, 22
Round 2: 5, 9, 14, 20
Round 3: 4, 11, 16, 23
Round 4: 6, 10, 15, 21

(3) Nonlinear function

The 4 rounds of MD5 use 4 different nonlinear functions $F$, $G$, $H$, and $I$ (the 16 steps within a round all use the same function), defined as follows:

Round 1: $F(x,y,z) = (x \land y) \lor (\lnot x \land z)$
Round 2: $G(x,y,z) = (x \land z) \lor (y \land \lnot z)$
Round 3: $H(x,y,z) = x \oplus y \oplus z$
Round 4: $I(x,y,z) = y \oplus (x \lor \lnot z)$

where $x$, $y$, and $z$ are three 32-bit input variables and the output is a 32-bit variable; $\land$, $\lor$, $\lnot$, and $\oplus$ denote the bitwise logical operations AND, OR, NOT, and XOR respectively.
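These four functions translate directly into bitwise operators. The sketch below is one way to write them in Python, assuming all arguments are already 32-bit values; the mask `MASK32` is an assumption needed because Python integers are unbounded.

```python
MASK32 = 0xFFFFFFFF  # keeps results inside 32 bits

def F(x: int, y: int, z: int) -> int:
    """Bitwise select: take the bit of y where x is 1, else the bit of z."""
    return (x & y) | (~x & z)

def G(x: int, y: int, z: int) -> int:
    """Bitwise select controlled by z: the bit of x where z is 1, else of y."""
    return (x & z) | (y & ~z)

def H(x: int, y: int, z: int) -> int:
    """Bitwise parity of the three inputs."""
    return x ^ y ^ z

def I(x: int, y: int, z: int) -> int:
    """y XOR (x OR NOT z); NOT z is masked to 32 bits."""
    return y ^ (x | (~z & MASK32))

print(hex(F(0xFFFF0000, 0x12345678, 0x9ABCDEF0)))
```

Seen this way, $F$ and $G$ are bit-by-bit multiplexers, which is why each output bit depends on all three inputs.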

For example, in the first round, $FF(a,b,c,d,M[j],s,T[i])$ denotes the operation $a = b + ((a + F(b,c,d) + M[j] + T[i]) <<< s)$, where $0 \le j \le 15$ and $1 \le i \le 64$. The 16 steps of round 1 are as follows:

FF(A,B,C,D,M[0],7,T[1]) FF(D,A,B,C,M[1],12,T[2]) FF(C,D,A,B,M[2],17,T[3]) FF(B,C,D,A,M[3],22,T[4])
FF(A,B,C,D,M[4],7,T[5]) FF(D,A,B,C,M[5],12,T[6]) FF(C,D,A,B,M[6],17,T[7]) FF(B,C,D,A,M[7],22,T[8])
FF(A,B,C,D,M[8],7,T[9]) FF(D,A,B,C,M[9],12,T[10]) FF(C,D,A,B,M[10],17,T[11]) FF(B,C,D,A,M[11],22,T[12])
FF(A,B,C,D,M[12],7,T[13]) FF(D,A,B,C,M[13],12,T[14]) FF(C,D,A,B,M[14],17,T[15]) FF(B,C,D,A,M[15],22,T[16])

After the last step of round 4 is completed, the following additions are performed, where $AA$, $BB$, $CC$, and $DD$ are the values the registers held before the current block was processed:

$A \equiv (A + AA) \bmod 2^{32}$, $B \equiv (B + BB) \bmod 2^{32}$, $C \equiv (C + CC) \bmod 2^{32}$, $D \equiv (D + DD) \bmod 2^{32}$

The resulting values of $A$, $B$, $C$, $D$ serve as the initial values for the next iteration. After the last message block has been processed, $(A \| B \| C \| D)$ is the 128-bit hash value of the message.
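Putting padding, constants, shifts, and nonlinear functions together gives a compact MD5 sketch that can be compared with `hashlib`. Two things here are not given in the text above and are taken from RFC 1321: the initial register values, and the message word order for rounds 2 to 4 (the index expressions `(5*i + 1) % 16`, `(3*i + 5) % 16`, and `(7*i) % 16`). It is meant for study, not for production use.

```python
import hashlib
import math
import struct

MASK32 = 0xFFFFFFFF
T = [int(abs(math.sin(i)) * 2**32) for i in range(1, 65)]  # T[1..64], stored 0-based
SHIFTS = [7, 12, 17, 22] * 4 + [5, 9, 14, 20] * 4 + [4, 11, 16, 23] * 4 + [6, 10, 15, 21] * 4

def rotl32(x: int, s: int) -> int:
    """Circular left shift of a 32-bit value by s bits."""
    return ((x << s) | (x >> (32 - s))) & MASK32

def md5(message: bytes) -> bytes:
    # Initial chaining variable (register values from RFC 1321).
    a0, b0, c0, d0 = 0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476
    # Padding: 0x80, zeros up to 448 mod 512 bits, then the 64-bit length.
    length_bits = (len(message) * 8) % 2**64
    data = message + b"\x80"
    data += bytes((56 - len(data)) % 64) + struct.pack("<Q", length_bits)

    for offset in range(0, len(data), 64):
        M = struct.unpack("<16I", data[offset:offset + 64])  # 16 little-endian words
        a, b, c, d = a0, b0, c0, d0
        for i in range(64):
            if i < 16:      # round 1: F
                f, j = (b & c) | (~b & d), i
            elif i < 32:    # round 2: G
                f, j = (b & d) | (c & ~d), (5 * i + 1) % 16
            elif i < 48:    # round 3: H
                f, j = b ^ c ^ d, (3 * i + 5) % 16
            else:           # round 4: I
                f, j = c ^ (b | (~d & MASK32)), (7 * i) % 16
            new_b = (b + rotl32((a + f + M[j] + T[i]) & MASK32, SHIFTS[i])) & MASK32
            # Rotate the register roles: (A, B, C, D) <- (D, new value, B, C).
            a, b, c, d = d, new_b, b, c
        # Add the block's result to the incoming chaining variable modulo 2**32.
        a0, b0, c0, d0 = [(x + y) & MASK32 for x, y in zip((a0, b0, c0, d0), (a, b, c, d))]
    return struct.pack("<4I", a0, b0, c0, d0)  # A || B || C || D

print(md5(b"abc").hex() == hashlib.md5(b"abc").hexdigest())
```

Rotating the register roles after every step reproduces the pattern FF(A,B,C,D,...), FF(D,A,B,C,...), FF(C,D,A,B,...), FF(B,C,D,A,...) listed above without writing out all 64 calls.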

3. SHA1 algorithm

In 1993, the National Institute of Standards and Technology (NIST) published the Secure Hash Algorithm (SHA, now referred to as SHA0) standard. A revised version, called SHA-1, was published on April 17, 1995; it is the hash algorithm required by the Digital Signature Standard.

In 2002, NIST released FIPS 180-2 as a revision of FIPS 180-1. In addition to SHA1, this standard added three new hash algorithms, SHA256, SHA384, and SHA512, whose message digest lengths are 256, 384, and 512 bits respectively, in order to match the use of AES.

The properties of the four hash algorithms compare as follows (unit: bit, except for the number of steps):

Property                 SHA1        SHA256      SHA384       SHA512
Message digest length    160         256         384          512
Message length           $<2^{64}$   $<2^{64}$   $<2^{128}$   $<2^{128}$
Block length             512         512         1024         1024
Word length              32          32          64           64
Number of steps          80          64          80           80
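Two rows of this table, digest length and block length, can be read back from Python's `hashlib` objects through their `digest_size` and `block_size` attributes (both in bytes). The helper name `sizes_in_bits` is an assumption; word length and step count are not exposed by `hashlib`.

```python
import hashlib

def sizes_in_bits(name: str) -> tuple[int, int]:
    """Return (digest length, block length) in bits for a hashlib algorithm."""
    h = hashlib.new(name)
    return h.digest_size * 8, h.block_size * 8

for name in ("sha1", "sha256", "sha384", "sha512"):
    print(name, sizes_in_bits(name))
```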

3.1 Algorithm structure

The input of the SHA1 algorithm is a message of fewer than $2^{64}$ bits. The input message is processed in 512-bit blocks, and the output is a 160-bit message digest, so it is more resistant to exhaustive search than MD5.

The design of SHA-1 is based on MD4. It uses five 32-bit registers. The message blocking and padding methods are the same as in MD5. The main loop also has 4 rounds, but each round performs 20 steps. The nonlinear operations, shifts, and additions are similar to those of MD5, but the nonlinear functions, additive constants, and circular left shifts are designed somewhat differently.

(1) Appending padding bits

Append a single "1" bit followed by as many "0" bits as needed so that the message length is congruent to 448 modulo 512. Then append the original length of the message as a 64-bit value, so that the total length is exactly a multiple of 512 bits, namely $512 \times L$ bits.

(2) Block processing (iterative compression)

SHA1 processes the message in 512-bit blocks. The core of the algorithm is a module containing 4 rounds of 20 steps each. All rounds use the same step structure, but the steps of different rounds use different nonlinear functions (Ch, Parity, Maj, Parity).

The inputs of the steps differ: in addition to the registers $A$, $B$, $C$, $D$, and $E$, each step takes an additive constant $K_t$ and a word $W[t]$ derived from the message block, where $t$ ($0 \le t \le 79$) is the step number.

Each round takes the 512-bit block $Y_q$ currently being processed and the 160-bit buffer value $A$, $B$, $C$, $D$, $E$ as input and updates the contents of the buffer. The output of the last step is added modulo $2^{32}$ to the input $CV_q$ of the first round to produce $CV_{q+1}$. After all 512-bit blocks have been processed, the 160-bit hash value is output.

3.2 Compression function

Each step of the SHA1 compression function has the following form, where $t$ ($0 \le t \le 79$) is the step number and $f_t$ is the nonlinear function of the round containing step $t$. The five assignments are made simultaneously, so every right-hand side uses the register values from before the step:

$A = (ROTL^5(A) + f_t(B,C,D) + E + W_t + K_t) \bmod 2^{32}$
$B = A$
$C = ROTL^{30}(B) \bmod 2^{32}$
$D = C$
$E = D$

(1) Constants $K_t$

The 4 values of $K_t$ are the square roots of 2, 3, 5, and 10, each multiplied by $2^{30} = 1073741824$; the integer part of the result is written in hexadecimal.

Step $t$               Value of $K_t$
$0 \le t \le 19$       0x5A827999
$20 \le t \le 39$      0x6ED9EBA1
$40 \le t \le 59$      0x8F1BBCDC
$60 \le t \le 79$      0xCA62C1D6

Taking $K_t$ ($60 \le t \le 79$) as an example: $\lfloor \sqrt{10} \times 2^{30} \rfloor = 3395469782$.

Converting 3395469782 to hexadecimal gives 0xCA62C1D6.
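All four constants can be reproduced with exact integer arithmetic, since $\lfloor \sqrt{x} \times 2^{30} \rfloor = \lfloor \sqrt{x \times 2^{60}} \rfloor$. The function name `sha1_constant` is an assumption for this sketch; `math.isqrt` requires Python 3.8 or later.

```python
import math

def sha1_constant(x: int) -> int:
    """floor(sqrt(x) * 2**30), computed exactly as the integer square root of x * 2**60."""
    return math.isqrt(x << 60)

for x in (2, 3, 5, 10):
    print(x, sha1_constant(x), hex(sha1_constant(x)))
```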

(2) Circular left shift

$ROTL^n(x) = (x << n) \lor (x >> (32 - n))$ denotes a circular left shift of the 32-bit variable $x$ by $n$ bits.

(3) Generating the words $W_t$

The 32-bit words $W_t$ are derived from the 512-bit message block. In the first 16 steps, $W_t$ equals the corresponding word of the message block:

$W_t = M^{(i)}_t, \quad 0 \le t \le 15$

In the remaining 64 steps, $W_t$ is obtained by XORing four earlier values and rotating the result left by one bit:

$W_t = ROTL^1(W_{t-3} \oplus W_{t-8} \oplus W_{t-14} \oplus W_{t-16}), \quad 16 \le t \le 79$

These operations add redundancy and interdependence to the message words being compressed, so it becomes very difficult to find a different message block that yields the same compression result.
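The pieces of section 3 (padding, message schedule, step function, and constants) can be combined into a short SHA1 sketch and checked against `hashlib`. The initial register values and the definitions of Ch, Parity, and Maj are taken from the FIPS 180 standard, since the text above names them without defining them. The big-endian byte order is likewise from the standard. This is a study aid, not a production implementation.

```python
import hashlib
import struct

MASK32 = 0xFFFFFFFF

def rotl32(x: int, n: int) -> int:
    """Circular left shift of a 32-bit value by n bits."""
    return ((x << n) | (x >> (32 - n))) & MASK32

def sha1(message: bytes) -> bytes:
    # Initial chaining variable (register values from FIPS 180).
    h = [0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0]
    # Same padding rule as MD5, but the 64-bit length is stored big-endian.
    length_bits = (len(message) * 8) % 2**64
    data = message + b"\x80"
    data += bytes((56 - len(data)) % 64) + struct.pack(">Q", length_bits)

    for offset in range(0, len(data), 64):
        # Message schedule: 16 words from the block, then 64 derived words.
        w = list(struct.unpack(">16I", data[offset:offset + 64]))
        for t in range(16, 80):
            w.append(rotl32(w[t - 3] ^ w[t - 8] ^ w[t - 14] ^ w[t - 16], 1))

        a, b, c, d, e = h
        for t in range(80):
            if t < 20:      # Ch
                f, k = (b & c) | (~b & d), 0x5A827999
            elif t < 40:    # Parity
                f, k = b ^ c ^ d, 0x6ED9EBA1
            elif t < 60:    # Maj
                f, k = (b & c) | (b & d) | (c & d), 0x8F1BBCDC
            else:           # Parity
                f, k = b ^ c ^ d, 0xCA62C1D6
            temp = (rotl32(a, 5) + f + e + w[t] + k) & MASK32
            # Simultaneous update: all right-hand sides use the old values.
            a, b, c, d, e = temp, a, rotl32(b, 30), c, d
        # Add the block's result to the incoming chaining variable modulo 2**32.
        h = [(x + y) & MASK32 for x, y in zip(h, (a, b, c, d, e))]
    return struct.pack(">5I", *h)

print(sha1(b"abc").hex() == hashlib.sha1(b"abc").hexdigest())
```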

4. Hash function attack

The security of a hash function rests mainly on its one-wayness and on its ability to avoid collisions. Since hashing is a message-compressing transformation, when the message is much longer than the hash value, the hash value alone does not carry enough information to recover the message. Recovering a message from its hash value alone is therefore harder than a ciphertext-only attack on a block cipher with the same block length.

The main goal of an attack on a hash function is not to recover the original message but to forge an illegitimate message with the same hash value for deception, which requires that the hash function resist collision attacks.

For a hash function with a 128-bit output, the probability that a given message $M'$ satisfies $H(M) = H(M')$ is $2^{-128}$.

Thus the probability that $H(M) \neq H(M')$ is $1 - 2^{-128}$. If $k$ arbitrary messages are tried, the probability that none of them satisfies $H(M) = H(M')$ is $(1 - 2^{-128})^k$, and the probability that at least one $M'$ satisfies $H(M) = H(M')$ is $1 - (1 - 2^{-128})^k$.

By the binomial theorem, $1 - (1 - 2^{-128})^k \approx k \cdot 2^{-128}$ when $k \cdot 2^{-128}$ is small, so the attacker must try on the order of $2^{127}$ messages before the probability of a successful forgery approaches 0.5. Existing computing power is far from able to search a space of size $2^{127}$ exhaustively, so a hash function with a 128-bit output seems to be safe. In fact, however, the attacker can obtain collisions by other methods, such as the birthday attack.
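The figure of $2^{127}$ comes from a first-order approximation, and it can be checked numerically. The sketch below evaluates $1 - (1 - 2^{-128})^k$ with `math.log1p` and `math.expm1` to avoid rounding $1 - 2^{-128}$ to 1. The function names are assumptions for this example.

```python
import math

def success_probability(k: int, bits: int = 128) -> float:
    """1 - (1 - 2**-bits)**k, evaluated without catastrophic rounding."""
    return -math.expm1(k * math.log1p(-2.0 ** -bits))

def trials_for_half(bits: int = 128) -> float:
    """Real-valued k at which the success probability reaches 0.5."""
    return math.log(0.5) / math.log1p(-2.0 ** -bits)

print(success_probability(2**127))    # exact value at k = 2**127
print(math.log2(trials_for_half()))   # exponent of the k that gives 0.5
```

Under this calculation the probability at $k = 2^{127}$ is $1 - e^{-1/2}$, a little under 0.4, and 0.5 is reached at $k = 2^{128} \ln 2$, which lies between $2^{127}$ and $2^{128}$. The order of magnitude of the estimate is unchanged.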

Against generic attacks of this kind, a hash function with an output of 160 bits or more is still computationally infeasible to attack, so an output length of at least 160 bits is generally considered necessary for safety.

4.1 The birthday paradox

The birthday paradox problem: assume that all birthdays are equally likely and that a year has 365 days. If the probability that at least two of $k$ people share a birthday is to be greater than $1/2$, what is the smallest value of $k$?

Regard each person's birthday as a random variable on $[1, 365]$. The probability that the birthdays of $k$ people are all different is

$p_k = \frac{P_{365}^{k}}{365^k} = \frac{365 \times 364 \times \cdots \times (365-k+1)}{365^k}$

When $k = 23$, $p_k \approx 0.4927$, so the probability that at least two of 23 people share a birthday is $1 - p_k \approx 0.5073$.

When $k = 100$, $1 - p_k \approx 0.9999997$; that is, among 100 people it is almost certain that at least two share a birthday. This result runs counter to intuition, which is why it is called the birthday paradox.
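These numbers are easy to reproduce. The following sketch evaluates $p_k$ as a running product and searches for the smallest $k$; the function names are assumptions for this example.

```python
def no_shared_birthday(k: int, days: int = 365) -> float:
    """p_k: probability that k people all have different birthdays."""
    p = 1.0
    for i in range(k):
        p *= (days - i) / days
    return p

def smallest_group(threshold: float = 0.5, days: int = 365) -> int:
    """Smallest k for which P(at least one shared birthday) exceeds threshold."""
    k = 1
    while 1 - no_shared_birthday(k, days) <= threshold:
        k += 1
    return k

print(smallest_group())
print(1 - no_shared_birthday(23), 1 - no_shared_birthday(100))
```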

In fact, if one particular person is picked from the $k$ people, the probability that any given other person has the same birthday as that person is only $\frac{1}{365}$. But if we merely look for any two people with the same birthday (without fixing a specific date), the probability is much greater.

Finding collisions for a hash function with a 128-bit output is similar. The probability of finding another message with the same hash value as one particular message is very small, but it is much easier to find two messages with the same hash value among two sets of messages (that is, without fixing the hash value).

4.2 Set intersection problem

Consider two $k$-element sets $X = \{x_1, x_2, \ldots, x_k\}$ and $Y = \{y_1, y_2, \ldots, y_k\}$, where the $x_i$ and $y_j$ ($1 \le i, j \le k$) are uniformly distributed random variables on $\{1, 2, \ldots, n\}$.

For a given $x_i$, if $y_j = x_i$, we say that $y_j$ matches $x_i$. For fixed $i$ and $j$, the probability that $y_j$ matches $x_i$ is $\frac{1}{n}$.

The probability that $y_j \neq x_i$ is $1 - \frac{1}{n}$.

The probability that none of the $k$ random variables in $Y$ equals $x_i$ is $\left(1 - \frac{1}{n}\right)^k$.

If the $k$ random variables in each of $X$ and $Y$ are distinct from one another, the probability that there is no match between $X$ and $Y$ is $\left(1 - \frac{1}{n}\right)^{k^2}$.

Therefore, the probability that there is at least one match between $X$ and $Y$ is

$p = 1 - \left(1 - \frac{1}{n}\right)^{k^2}$

For $x \ge 0$ we have $1 - x \le e^{-x}$, with strict inequality when $x > 0$, so

$p = 1 - \left(1 - \frac{1}{n}\right)^{k^2} > 1 - \left(e^{-\frac{1}{n}}\right)^{k^2}$

To obtain $p > 0.5$, set $1 - \left(e^{-\frac{1}{n}}\right)^{k^2} = 0.5$, which gives

$k = \sqrt{n \ln 2} \approx 0.83\sqrt{n} \approx \sqrt{n}$
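The bound can be checked numerically with the formula for $p$ derived above. In this sketch the function names are assumptions, and `match_probability` evaluates the model formula, not a simulation.

```python
import math

def match_probability(n: int, k: int) -> float:
    """p = 1 - (1 - 1/n)**(k*k), evaluated stably for large n."""
    return -math.expm1(k * k * math.log1p(-1 / n))

def match_threshold(n: int) -> int:
    """Smallest integer k with k >= sqrt(n * ln 2)."""
    return math.ceil(math.sqrt(n * math.log(2)))

for n in (365, 2**32):
    k = match_threshold(n)
    print(n, k, match_probability(n, k))
```

For $n = 365$ the threshold is $k = 16$: two groups of 16 people are enough for a cross-group birthday match to be more likely than not.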

4.3 Birthday attack

Suppose the hash function $H$ has an output length of $m$ bits, so that there are $2^m$ possible outputs. Applying $H$ to $k$ random inputs produces the set $X$, and applying it to another $k$ random inputs produces the set $Y$.

According to the set intersection problem, when $k = 2^{m/2}$, the probability that $X$ and $Y$ have at least one match (that is, that the hash function produces a collision) is greater than 0.5. Therefore, $2^{m/2}$ determines the collision-resistance strength of a hash function $H$ with output length $m$.

Birthday attacks are also known as square root attacks. The principle is as follows:

  • The attacker first prepares a legitimate message and, by adding spaces or otherwise changing the wording or format while keeping the meaning unchanged, generates $2^{m/2}$ different variants of it. This is the group of legitimate messages.

  • The attacker then prepares the illegitimate message whose signature is to be forged and generates a group of variants of it in the same way.

  • The attacker computes the hash values of the messages in both groups.

  • The attacker looks for a pair of messages, one from each group, with the same hash value. If none is found, the number of variants in each group is increased until a pair is found.

According to the birthday paradox, the probability of success is very high, so the attacker can find an illegitimate message with the same hash value as a legitimate message, that is, a hash collision.
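The procedure can be tried on a deliberately weakened hash. The sketch below truncates SHA1 to $m = 24$ bits so that about $2^{12}$ variants per group suffice. The function names, the two sample messages, and the trailing-whitespace variation scheme are all assumptions invented for this demonstration.

```python
import hashlib

def weak_hash(message: bytes, bits: int) -> int:
    """Leading `bits` bits of SHA1: a deliberately short m-bit hash for the demo."""
    digest = hashlib.sha1(message).digest()
    return int.from_bytes(digest, "big") >> (160 - bits)

def variant(text: str, index: int) -> bytes:
    """Variant number `index` of `text`: same words, different trailing whitespace."""
    suffix = ""
    while index:
        suffix += " " if index & 1 else "\t"
        index >>= 1
    return (text + suffix).encode()

def birthday_attack(legit: str, fraud: str, bits: int) -> tuple[bytes, bytes]:
    """Grow both groups of variants until one pair shares a weak hash value."""
    legit_table: dict[int, bytes] = {}
    fraud_table: dict[int, bytes] = {}
    index = 0
    while True:
        good, bad = variant(legit, index), variant(fraud, index)
        legit_table[weak_hash(good, bits)] = good
        fraud_table[weak_hash(bad, bits)] = bad
        for value in (weak_hash(good, bits), weak_hash(bad, bits)):
            if value in legit_table and value in fraud_table:
                return legit_table[value], fraud_table[value]
        index += 1

good, bad = birthday_attack("Pay Bob 10 dollars", "Pay Eve 1000000 dollars", 24)
print(repr(good), repr(bad), weak_hash(good, 24) == weak_hash(bad, 24))
```

Each additional output bit multiplies the work by about $\sqrt{2}$, so the same loop against the full 160-bit output would need on the order of $2^{80}$ variants per group.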

At present, the most effective attack on hash functions is the modular differential method, also known as the "bit tracking method", first proposed by Wang Xiaoyun and others in their analysis of the MD4 family of hash functions. The modular difference is a new kind of difference defined by combining the integer modular difference with the XOR difference; compared with either difference alone, the combination expresses more information.

5. Message authentication

On the one hand, information security must ensure the confidential transmission of messages so as to resist passive attacks such as eavesdropping; on the other hand, it must also prevent active attacks on the system, such as forging or tampering with messages.

Authentication is the main defense against active attacks. It is divided into two types, entity authentication and message authentication:

  • Entity authentication: verifying the identity of an entity
  • Message authentication: verifying the authenticity of a message
    • Verifying the authenticity of the source of the information, generally known as source authentication
    • Verifying the integrity of the message, that is, verifying that the message has not been tampered with or forged during transmission and storage

5.1 Message Authentication Code

The basis of message authentication is the generation of a message authentication code (MAC), which is used to check whether the message has been maliciously modified.

An authentication code is different from the error detection codes used in communication:

  • Error detection codes are special codes used to detect errors in messages caused by defects in the communication channel
  • Authentication codes are used to prevent attackers from maliciously tampering with or forging messages

A message authentication code is a short fixed-length data block generated by an authentication function from the message and a key shared by both parties; it is appended to the message.

5.2 HMAC

The cipher block chaining (CBC) mode of symmetric block ciphers such as DES and AES has long been the most common way to construct a MAC, for example the CBC-MAC defined in FIPS PUB 113.

Because hash functions such as MD5 and SHA-1 run faster in software than symmetric block ciphers such as DES, many message authentication algorithms based on hash functions have been proposed. Among them, HMAC (RFC 2104) has been published as the FIPS 198 standard and is used in SSL for message authentication.

The HMAC structure is as follows:

[Figure: structure of HMAC]

where:

  • $K$ is the key. The key may be of any length; the recommended minimum length is $n$ bits, because a key shorter than $n$ bits significantly reduces the security of the function, while a key longer than $n$ bits does not significantly increase it
  • $M$ is the message input to HMAC
  • $L$ is the number of blocks in the message $M$
  • $Y_i$ is the $i$-th block of the message $M$
  • $b$ is the number of bits in each block
  • $n$ is the length of the hash value produced by the embedded hash function
  • $IV$ is the initial chaining variable
  • ipad is the byte 0x36 repeated $b/8$ times
  • opad is the byte 0x5C repeated $b/8$ times

HMAC can be described as: $HMAC(K, M) = H[(K^+ \oplus opad) \,\|\, H[(K^+ \oplus ipad) \,\|\, M]]$

The operation process is as follows:

  • (1) Pad the key $K$ with zeros on the right to produce a $b$-bit string $K^+$ (for example, if $K$ is 160 bits long and $b = 512$, 44 zero bytes 0x00 are appended).
  • (2) XOR $K^+$ bitwise with ipad to produce the $b$-bit block $S_1$.
  • (3) Append the message $M$ to $S_1$.
  • (4) Apply the hash function $H$ to the result of step (3) to produce a message digest.
  • (5) XOR $K^+$ bitwise with opad to produce the $b$-bit block $S_0$.
  • (6) Append the message digest produced in step (4) to $S_0$.
  • (7) Apply the hash function $H$ to the result of step (6) to produce a message digest, which is output as the final result.
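The seven steps map almost line for line onto code. The sketch below builds HMAC from `hashlib` and compares the result with Python's standard `hmac` module. The function name `hmac_digest` and the sample key and message are assumptions; the rule for keys longer than one block comes from RFC 2104 and is not part of the steps above.

```python
import hashlib
import hmac

def hmac_digest(key: bytes, message: bytes, algorithm: str = "sha1") -> bytes:
    block_bytes = hashlib.new(algorithm).block_size         # b / 8
    if len(key) > block_bytes:
        # RFC 2104: a key longer than one block is hashed first.
        key = hashlib.new(algorithm, key).digest()
    k_plus = key + bytes(block_bytes - len(key))             # step (1)
    s1 = bytes(x ^ 0x36 for x in k_plus)                     # step (2)
    inner = hashlib.new(algorithm, s1 + message).digest()    # steps (3) and (4)
    s0 = bytes(x ^ 0x5C for x in k_plus)                     # step (5)
    return hashlib.new(algorithm, s0 + inner).digest()       # steps (6) and (7)

key, message = b"shared secret key", b"message to authenticate"
print(hmac_digest(key, message) == hmac.new(key, message, "sha1").digest())
```

When verifying a received tag, `hmac.compare_digest` is preferable to `==` because it compares in constant time.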

A more efficient way to implement HMAC is shown in the figure below. Here $f(IV, (K^+ \oplus ipad))$ and $f(IV, (K^+ \oplus opad))$ are two precomputed values, where $f$ is the compression function of the hash function: its inputs are an $n$-bit chaining variable and a $b$-bit block, and its output is an $n$-bit chaining variable. These values need to be computed only at initialization or when the key changes, and they take the place of the initial value $IV$ of the hash function. This implementation is particularly worthwhile when most of the messages input to HMAC are short.

[Figure: efficient implementation of HMAC using precomputed values]
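`hashlib` does not expose the compression function $f$ directly, but the same saving can be approximated by keeping two hash objects that have already absorbed $K^+ \oplus ipad$ and $K^+ \oplus opad$ and copying them for each message. The class name `PrecomputedHmac` is an assumption for this sketch.

```python
import hashlib
import hmac

class PrecomputedHmac:
    """HMAC whose two key-dependent blocks are absorbed once per key."""

    def __init__(self, key: bytes, algorithm: str = "sha1"):
        block_bytes = hashlib.new(algorithm).block_size
        if len(key) > block_bytes:
            key = hashlib.new(algorithm, key).digest()  # RFC 2104 rule for long keys
        k_plus = key + bytes(block_bytes - len(key))
        # Hash objects that have already processed K+ xor ipad and K+ xor opad.
        self._inner = hashlib.new(algorithm, bytes(x ^ 0x36 for x in k_plus))
        self._outer = hashlib.new(algorithm, bytes(x ^ 0x5C for x in k_plus))

    def digest(self, message: bytes) -> bytes:
        inner = self._inner.copy()   # resume from the precomputed inner state
        inner.update(message)
        outer = self._outer.copy()   # resume from the precomputed outer state
        outer.update(inner.digest())
        return outer.digest()

mac = PrecomputedHmac(b"shared secret key")
for message in (b"short 1", b"short 2"):
    print(mac.digest(message) == hmac.new(b"shared secret key", message, "sha1").digest())
```

For a one-block message this saves two of the four compression-function calls that a straightforward HMAC would make.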

Origin blog.csdn.net/apr15/article/details/127501879