Symmetric Encryption Algorithms (1): Substitution Techniques (Caesar, Playfair, Hill, and Polyalphabetic Ciphers)


Symmetric encryption, also known as traditional encryption, single-key encryption, or private-key encryption, was the only type of encryption in use before the development of public-key encryption in the 1970s. It remains by far the more widely used of the two types.

The original message is called the plaintext, and the encrypted message is called the ciphertext. Converting plaintext to ciphertext is called encryption (or enciphering); recovering the plaintext from the ciphertext is called decryption (or deciphering). The many schemes used for encryption make up the field of cryptography, and the techniques for recovering a message without knowledge of the key or other enciphering details make up cryptanalysis. Together, cryptography and cryptanalysis form the field of cryptology.

Symmetric Cipher Model

A simplified encryption algorithm model is shown in the figure below:

[Figure: simplified model of symmetric encryption]

The model has five ingredients: the plaintext, the encryption algorithm, the secret key, the ciphertext, and the decryption algorithm. The key is a second input to the encryption algorithm; it is a value independent of both the plaintext and the algorithm, and different keys make the algorithm produce different ciphertexts for the same plaintext. Secure use of symmetric encryption has two requirements:

  • We need a strong encryption algorithm. At a minimum, an attacker who knows the algorithm and has access to one or more ciphertexts should be unable to decipher them or work out the key. The requirement is usually stated in a stronger form: even an attacker who holds a number of ciphertexts together with the plaintext that produced each one should be unable to decrypt further ciphertexts or discover the key.
  • The sender and receiver must obtain copies of the key in a secure manner and must keep the key secret. Anyone who discovers the key and knows the algorithm can read every message encrypted with that key.

We assume that it is impractical to recover a message from the ciphertext and knowledge of the algorithm alone. The algorithm can therefore be made public and only the key has to be kept secret, which lowers cost considerably and is one reason symmetric encryption is so widely used.

Let's take a closer look at the process of symmetric encryption in mathematical form.

[Figure: model of a symmetric cryptosystem]

Suppose a message source produces a plaintext message $X=[X_1, X_2, \dots, X_M]$, whose $M$ elements are letters of some finite alphabet. Traditionally the alphabet was the 26 capital letters; today the binary alphabet $\{0, 1\}$ is typically used. For encryption, a key $K=[K_1, K_2, \dots, K_J]$ is generated, either by one of the communicating parties or by a third party, and must be delivered to both ends securely. With the plaintext and the key as input, the encryption algorithm produces the ciphertext $Y=[Y_1, Y_2, \dots, Y_N]$, which can be written as

$$Y=E(K, X)$$

At the receiving end, the decryption algorithm restores the plaintext:

$$X=D(K, Y)$$

Cryptographic systems are characterized along three independent dimensions:

  • The type of operation used to convert plaintext to ciphertext. All encryption algorithms rest on two general principles: substitution, in which each element of the plaintext (bit, letter, group of bits or letters) is mapped to another element; and transposition, in which the elements of the plaintext are rearranged. The fundamental requirement is that no information be lost, i.e. that all operations are reversible. Often a mix of the two is used.
  • The number of keys used. If sender and receiver use the same key, the system is symmetric encryption; if they use different keys, it is asymmetric, or public-key, encryption.
  • The way in which the plaintext is processed. A block cipher splits the encoded plaintext into blocks of length n and, under control of the key, transforms each block into an output block of the same length. A stream cipher uses the key to generate a keystream Z = Z1 Z2 Z3 ... and uses it to encrypt the plaintext X = X1 X2 X3 ... element by element.

Typically, the goal of attacking an encryption system is to recover the key in use, rather than simply recovering the plaintext of a single ciphertext. There are two general approaches to attacking traditional encryption algorithms:

  • Cryptanalysis: Cryptanalytic attacks rely on the nature of the algorithm, plus perhaps some knowledge of the general characteristics of the plaintext, or even some sample plaintext-ciphertext pairs. This type of attack exploits the characteristics of the algorithm to deduce a specific plaintext or the key being used.
  • Brute-force attack: The attacker tries every possible key on a piece of ciphertext until an intelligible plaintext is obtained. On average, half of all possible keys must be tried before one succeeds.

The figure below summarizes some types of cryptanalytic attacks based on some known information:

[Figure: types of cryptanalytic attack, classified by the information known to the attacker]


An encryption scheme is unconditionally secure if the ciphertext it produces does not contain enough information to determine the corresponding plaintext uniquely, no matter how much ciphertext is available. In other words, however much time an attacker spends, the ciphertext cannot be deciphered, because the required information is simply not there. With the exception of a scheme known as the one-time pad, no encryption algorithm is unconditionally secure. All that users of an encryption algorithm can strive for, therefore, is an algorithm that meets one or both of the following criteria:

  • The cost of breaking the cipher exceeds the value of the encrypted information;
  • The time required to break the cipher exceeds the useful lifetime of the information.

An encryption scheme is said to be computationally secure if either of these two criteria is met. All cryptanalysis rests on the assumption that traces of the structure or patterns of the plaintext survive encryption and can be detected in the ciphertext.


Substitution Techniques

A substitution technique is one in which the letters of the plaintext are replaced by other letters, numbers, or symbols. If the plaintext is viewed as a sequence of bits, substitution replaces plaintext bit patterns with ciphertext bit patterns.

Caesar Cipher

The earliest known, and simplest, use of a substitution cipher was by Julius Caesar. The Caesar cipher replaces each letter of the alphabet with the letter standing three places further down the alphabet, wrapping around from z to a. For example:

plain:  meet me after the toga party
cipher: PHHW PH DIWHU WKH WRJD SDUWB

The complete set of correspondences is:

plain:  a b c d e f g h i j k l m n o p q r s t u v w x y z
cipher: D E F G H I J K L M N O P Q R S T U V W X Y Z A B C

We assign each letter a numerical equivalent:

a=0  b=1  c=2  d=3  e=4  f=5  g=6  h=7  i=8  j=9  k=10  l=11  m=12
n=13  o=14  p=15  q=16  r=17  s=18  t=19  u=20  v=21  w=22  x=23  y=24  z=25

Then the Caesar cipher can be expressed as follows:

For each plaintext letter $p$, substitute the ciphertext letter $C$:

$$C=E(3, p)=(p+3)\ {\rm mod}\ 26$$

The shift may be of any amount, so the general form with a shift of $k$ positions is

$$C=E(k, p)=(p+k)\ {\rm mod}\ 26 \quad 1\le k \leq 25$$

The corresponding decryption algorithm is

$$p=D(k, C)=(C-k)\ {\rm mod}\ 26 \quad 1 \le k \leq 25$$
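The two formulas above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names are made up for this example, letters are numbered a=0 through z=25, and any character that is not a letter is passed through unchanged.

```python
import string

ALPHABET = string.ascii_lowercase  # a=0, b=1, ..., z=25


def caesar_encrypt(plaintext, k):
    """Apply C = (p + k) mod 26 to each letter; other characters pass through."""
    out = []
    for ch in plaintext.lower():
        if ch in ALPHABET:
            p = ALPHABET.index(ch)
            out.append(ALPHABET[(p + k) % 26].upper())
        else:
            out.append(ch)
    return "".join(out)


def caesar_decrypt(ciphertext, k):
    """Apply p = (C - k) mod 26 to each letter; other characters pass through."""
    out = []
    for ch in ciphertext.lower():
        if ch in ALPHABET:
            c = ALPHABET.index(ch)
            out.append(ALPHABET[(c - k) % 26])
        else:
            out.append(ch)
    return "".join(out)


print(caesar_encrypt("meet me after the toga party", 3))  # PHHW PH DIWHU WKH WRJD SDUWB
print(caesar_decrypt("PHHW PH DIWHU WKH WRJD SDUWB", 3))  # meet me after the toga party
```

Python's `%` operator always returns a non-negative result for a positive modulus, so `(c - k) % 26` needs no extra correction when `c < k`.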

The Caesar cipher is very easy to break using the brute force method, because the encryption and decryption algorithms are known, and the total number of possible keys is only 25.
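A brute-force attack of this kind might look like the following sketch. The helper names are hypothetical; the code simply tries all 25 shifts and leaves it to the reader to spot the candidate that looks like readable text.

```python
def shift_back(ciphertext, k):
    """Undo a Caesar shift of k on uppercase letters; other characters pass through."""
    return "".join(
        chr((ord(ch) - ord("A") - k) % 26 + ord("a")) if "A" <= ch <= "Z" else ch
        for ch in ciphertext.upper()
    )


def brute_force(ciphertext):
    """Return every candidate plaintext, keyed by the shift that was tried."""
    return {k: shift_back(ciphertext, k) for k in range(1, 26)}


for k, candidate in brute_force("PHHW PH DIWHU WKH WRJD SDUWB").items():
    print(f"{k:2d}  {candidate}")
```

Only the line for k = 3 reads as English, which is what makes the attack practical: the key space is tiny and the correct plaintext is easy to recognize.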

Monoalphabetic Ciphers

With only 25 possible keys, the Caesar cipher is far from secure. Allowing an arbitrary substitution increases the key space dramatically. To see this, consider permutations. A permutation of a finite set is an ordered arrangement of all of its elements, each appearing exactly once. For example, if $S=\{a, b, c\}$, there are six permutations of $S$:

$$abc\ acb\ bac\ bca\ cab\ cba$$

In general, a set of $n$ elements has $n!$ permutations. If the cipher alphabet may be any permutation of the 26 letters of the English alphabet, there are $26!$ possible keys, more than $4 \times 10^{26}$, which puts a brute-force search out of reach. This scheme is known as a monoalphabetic substitution cipher, because a single cipher alphabet is used for the whole message. Cryptanalysis can nevertheless still break it.
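To get a feel for the size of this key space, the short calculation below compares it with the Caesar cipher. The trial rate of one billion keys per second is an arbitrary assumption chosen only for illustration.

```python
import math

caesar_keys = 25
mono_keys = math.factorial(26)  # number of permutations of 26 letters

print(f"Caesar keys:          {caesar_keys}")
print(f"Monoalphabetic keys:  {mono_keys:.3e}")
print(f"Equivalent key bits:  {math.log2(mono_keys):.1f}")

# ASSUMPTION: a hypothetical attacker testing 10**9 keys per second.
rate = 10**9
seconds_per_year = 365 * 24 * 3600
expected_years = mono_keys / 2 / rate / seconds_per_year  # half the keys on average
print(f"Expected search time: {expected_years:.2e} years")
```

The point of the exercise is that key-space size alone says nothing about resistance to cryptanalysis, as the frequency attack below shows.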

For example, we have the following ciphertext:

UZQSOVUOHXMOPVGPOZPEVSGZWSZOPFPESXUDBMETSXAIZ
VUEPHZHMDZSHZOWSFPAPPDTSVPQUZWYMXUZUHSX
EPYEPOPDZSZUFPOMBZWPFUPZHMDJUDTMOHMQ

If the ciphertext is long enough, we can calculate the frequency of each letter and compare it with the known frequency of English letters to obtain the corresponding relationship. This ciphertext is relatively short, but let's give it a try.

The relative frequency (%) of each letter in the ciphertext:

[Figure: relative letter frequencies in the ciphertext]

For comparison, the relative frequency of letters in English text, measured over large samples, is:

[Figure: relative frequency of letters in English text]

According to this rule, P, Z in the ciphertext may correspond to e, t in the plaintext, and S, U, O, M, H with higher frequency may correspond to {a, h, i, n, o, r, s}. A, B, G, Y, I, J with the lowest frequency may correspond to {b, j, k, q, v, x, z}. The next thing to do is to try and guess, to see if some of the above correspondences can help us get a reasonable plaintext.
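The ciphertext frequency count can be reproduced with a short script. This is a sketch using the three ciphertext lines quoted above, joined into one string; the function name is made up for this example.

```python
from collections import Counter

CIPHERTEXT = (
    "UZQSOVUOHXMOPVGPOZPEVSGZWSZOPFPESXUDBMETSXAIZ"
    "VUEPHZHMDZSHZOWSFPAPPDTSVPQUZWYMXUZUHSX"
    "EPYEPOPDZSZUFPOMBZWPFUPZHMDJUDTMOHMQ"
)


def letter_frequencies(text):
    """Return (letter, percent) pairs sorted by descending relative frequency."""
    counts = Counter(ch for ch in text.upper() if "A" <= ch <= "Z")
    total = sum(counts.values())
    return [(letter, 100 * n / total) for letter, n in counts.most_common()]


for letter, percent in letter_frequencies(CIPHERTEXT):
    print(f"{letter}  {percent:5.2f}")
```

P and Z come out on top, at roughly 13.3% and 11.7% of the 120 letters, which is what motivates matching them with e and t.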

A more powerful tool is to look at the frequency of two-letter combinations, known as digrams. The most common digram in English is th. In our ciphertext the digram ZW appears three times, so Z likely corresponds to t and W to h. Under our earlier hypothesis that P corresponds to e, the sequence ZWP in the ciphertext then translates to exactly "the"!

The first line of the ciphertext also contains the sequence ZWSZ. Under the mapping so far it reads th_t, so, consistent with the frequency analysis and guesses above, S should correspond to a.
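The digram count and the partial substitution described above can be checked mechanically. The sketch below uses hypothetical helper names and the four letter guesses made so far (Z to t, W to h, P to e, S to a); unknown letters are shown as underscores.

```python
from collections import Counter

CIPHERTEXT = (
    "UZQSOVUOHXMOPVGPOZPEVSGZWSZOPFPESXUDBMETSXAIZ"
    "VUEPHZHMDZSHZOWSFPAPPDTSVPQUZWYMXUZUHSX"
    "EPYEPOPDZSZUFPOMBZWPFUPZHMDJUDTMOHMQ"
)


def digram_counts(text):
    """Count every pair of adjacent letters in the text."""
    return Counter(text[i:i + 2] for i in range(len(text) - 1))


def partial_decrypt(text, mapping):
    """Substitute the letters guessed so far; show the rest as underscores."""
    return "".join(mapping.get(ch, "_") for ch in text)


guess = {"Z": "t", "W": "h", "P": "e", "S": "a"}

print(digram_counts(CIPHERTEXT)["ZW"])     # 3
print(partial_decrypt(CIPHERTEXT, guess))
```

Each new guess is added to `guess` and the output inspected again; wrong guesses show up quickly as impossible letter sequences.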

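So far, then, we have the following partial translation:

[Figure: ciphertext with the letters recovered so far written beneath it]

Continuing the analysis in the same way eventually yields the complete plaintext:

it was disclosed yesterday that several informal but
direct contacts have been made with political
representatives of the viet cong in moscow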

Monoalphabetic ciphers are easy to break because they preserve the frequency statistics of the original letters.

Two principal methods are used in substitution ciphers to reduce the extent to which the structure of the plaintext survives in the ciphertext: encrypting several plaintext letters at a time, and using more than one cipher alphabet. We briefly introduce both.

Playfair Cipher

The Playfair cipher substitutes two letters at a time (digrams). It is based on a 5×5 matrix of letters constructed from a keyword: write the letters of the keyword into the matrix from left to right and top to bottom, skipping repeated letters, then fill the remaining cells with the rest of the alphabet in order from a to z, with I and J counted as one letter. With monarchy as the keyword, the matrix is:

M    O    N    A    R
C    H    Y    B    D
E    F    G    I/J  K
L    P    Q    S    T
U    V    W    X    Z

The plaintext is split into pairs of letters, and each pair is encrypted according to the following rules:

  • If both letters of a pair are the same, insert a filler letter (such as x) after the first one and regroup. If a single letter is left over at the end, pad it with the filler as well. For example, balloon becomes ba lx lo on.
  • For each pair, locate the two letters in the matrix:
    • If both letters are in the same row, replace each with the letter to its right (wrapping from the far right to the far left). For example, ar becomes RM.
    • If both letters are in the same column, replace each with the letter below it (wrapping from the bottom to the top). For example, mu becomes CM.
    • Otherwise, replace each letter with the letter that lies in its own row and in the column occupied by the other letter. For example, hs becomes BP, and ea becomes IM (or JM).
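The rules above can be sketched as follows. This is a minimal illustration with made-up function names: j is folded into i, x is assumed as the filler letter, and the sketch does not treat the corner case of a doubled x specially.

```python
def build_matrix(keyword):
    """Return the 5x5 matrix as a flat list of 25 letters, row by row."""
    matrix = []
    for ch in keyword.lower() + "abcdefghiklmnopqrstuvwxyz":
        ch = "i" if ch == "j" else ch  # I and J share one cell
        if "a" <= ch <= "z" and ch not in matrix:
            matrix.append(ch)
    return matrix


def to_digrams(plaintext, filler="x"):
    """Split the plaintext into letter pairs, inserting the filler where needed."""
    letters = ["i" if ch == "j" else ch for ch in plaintext.lower() if "a" <= ch <= "z"]
    pairs = []
    i = 0
    while i < len(letters):
        first = letters[i]
        if i + 1 < len(letters) and letters[i + 1] != first:
            pairs.append(first + letters[i + 1])
            i += 2
        else:
            pairs.append(first + filler)  # doubled letter or trailing single letter
            i += 1
    return pairs


def encrypt_pair(matrix, pair):
    """Encrypt one letter pair with the row, column, and rectangle rules."""
    r1, c1 = divmod(matrix.index(pair[0]), 5)
    r2, c2 = divmod(matrix.index(pair[1]), 5)
    if r1 == r2:      # same row: take the letter to the right
        c1, c2 = (c1 + 1) % 5, (c2 + 1) % 5
    elif c1 == c2:    # same column: take the letter below
        r1, r2 = (r1 + 1) % 5, (r2 + 1) % 5
    else:             # rectangle: keep the row, take the other letter's column
        c1, c2 = c2, c1
    return (matrix[5 * r1 + c1] + matrix[5 * r2 + c2]).upper()


def playfair_encrypt(plaintext, keyword):
    matrix = build_matrix(keyword)
    return "".join(encrypt_pair(matrix, pair) for pair in to_digrams(plaintext))


matrix = build_matrix("monarchy")
print(to_digrams("balloon"))       # ['ba', 'lx', 'lo', 'on']
print(encrypt_pair(matrix, "ar"))  # RM
print(encrypt_pair(matrix, "hs"))  # BP
```

Storing the matrix as a flat list keeps the lookup simple: `divmod(index, 5)` gives the row and column of a letter directly.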

Hill Cipher

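The Hill cipher treats each letter as a number modulo 26: A=0, B=1, C=2, ..., Z=25. A block of n letters is viewed as an n-dimensional vector and multiplied by an n×n key matrix, with the result reduced modulo 26. For example, encrypting three letters at a time:

$$\begin{aligned} c_1 &= (k_{11}p_1 + k_{21}p_2 + k_{31}p_3)\ {\rm mod}\ 26 \\ c_2 &= (k_{12}p_1 + k_{22}p_2 + k_{32}p_3)\ {\rm mod}\ 26 \\ c_3 &= (k_{13}p_1 + k_{23}p_2 + k_{33}p_3)\ {\rm mod}\ 26 \end{aligned}$$

In vector form, with $\boldsymbol{P}$ and $\boldsymbol{C}$ as row vectors, this is

$$\boldsymbol{C}=\boldsymbol{P}\boldsymbol{K}\ {\rm mod}\ 26$$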

As a concrete example, take the plaintext paymoremoney and the key

[Figure: the 3×3 key matrix K]

Three plaintext letters are processed at a time. The first three, pay, correspond to the numbers 15, 0, 24, i.e. $\boldsymbol{P}=(15\ 0\ 24)$, which gives

$$\boldsymbol{C}=(15\ 0\ 24)\boldsymbol{K}\ {\rm mod}\ 26=(303\ 303\ 531)\ {\rm mod}\ 26=(17\ 17\ 11)={\rm RRL}$$

Decryption requires the inverse $\boldsymbol{K}^{-1}$ of the key matrix modulo 26:

[Figure: the inverse matrix K^{-1}]

The plaintext is then recovered as $\boldsymbol{P}=\boldsymbol{C}\boldsymbol{K}^{-1}\ {\rm mod}\ 26$.
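The encryption and decryption steps can be sketched in plain Python. Note the assumption: the key matrix and its inverse are not reproduced in the text here, so the values below are taken from the well-known textbook example for this plaintext; they do reproduce the (303 303 531) and RRL result shown above. The function names are made up for this illustration.

```python
# ASSUMPTION: key matrix and inverse (mod 26) from the standard textbook example.
K = [[17, 17, 5],
     [21, 18, 21],
     [2, 2, 19]]
K_INV = [[4, 9, 15],
         [15, 17, 6],
         [24, 0, 17]]


def row_times_matrix(row, matrix):
    """Compute the row vector times the matrix, reduced modulo 26."""
    n = len(matrix)
    return [sum(row[i] * matrix[i][j] for i in range(n)) % 26 for j in range(n)]


def hill_apply(text, matrix):
    """Apply C = P K mod 26 block by block (pass K_INV to decrypt)."""
    n = len(matrix)
    nums = [ord(ch) - ord("a") for ch in text.lower() if "a" <= ch <= "z"]
    if len(nums) % n:
        raise ValueError("text length must be a multiple of the block size")
    out = []
    for start in range(0, len(nums), n):
        out.extend(row_times_matrix(nums[start:start + n], matrix))
    return "".join(chr(value + ord("a")) for value in out)


ciphertext = hill_apply("paymoremoney", K).upper()
print(ciphertext)                     # RRLMWBKASPDH
print(hill_apply(ciphertext, K_INV))  # paymoremoney
```

The same function serves for both directions because encryption and decryption differ only in the matrix used. Computing the inverse matrix modulo 26 is left out of this sketch.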

Polyalphabetic Cipher

Another way to improve on simple monoalphabetic substitution is to switch between different monoalphabetic substitutions as one proceeds through the plaintext. The general name for this approach is polyalphabetic substitution cipher. All such techniques share the following features:

  • A set of related monoalphabetic substitution rules is used;
  • A key determines which particular rule is applied at each step.

Vigenere Cipher

The Vigenere cipher is one of the best known, and one of the simplest, polyalphabetic ciphers. Its set of related monoalphabetic substitution rules consists of the 26 Caesar ciphers with shifts of 0 through 25. Each cipher is denoted by a key letter, namely the ciphertext letter that substitutes for the plaintext letter a. Thus a Caesar cipher with a shift of 3 is denoted by the key value 3, i.e. the letter d. The Vigenere cipher can be expressed as follows:

Suppose we have a sequence of plaintext letters $P=p_0, p_1, \dots, p_{n-1}$ and a key consisting of the letters $K=k_0, k_1, \dots, k_{m-1}$, where typically $m<n$.

The ciphertext letters $C=C_0, C_1, \dots, C_{n-1}$ are calculated as follows:

$$\begin{aligned} C=E(K, P)= & (p_0+k_0)\ {\rm mod}\ 26, (p_1+k_1)\ {\rm mod}\ 26,\dots, (p_{m-1}+k_{m-1})\ {\rm mod}\ 26, \\ &(p_m+k_0)\ {\rm mod}\ 26, (p_{m+1}+k_1)\ {\rm mod}\ 26,\dots, (p_{2m-1}+k_{m-1})\ {\rm mod}\ 26, \dots \end{aligned}$$

That is, the first key letter is added to the first plaintext letter modulo 26, the second key letter to the second plaintext letter, and so on through the first $m$ letters of the plaintext. For the next $m$ letters the key letters are reused, starting again from $k_0$. This continues until the whole plaintext has been encrypted.

Written compactly, each ciphertext letter is

$$C_i=(p_i + k_{i\ {\rm mod}\ m})\ {\rm mod}\ 26$$

Here is a concrete example:

[Figure: example key, plaintext, and ciphertext]

Expressed numerically, the calculation is straightforward:

[Figure: the same example with letters converted to numbers]
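The compact formula translates almost directly into code. The sketch below is illustrative only. Since the example figure is not reproduced in the text, it assumes the textbook example with the keyword deceptive and the plaintext "we are discovered save yourself"; the function name and the `decrypt` flag are made up for this example.

```python
def vigenere(text, key, decrypt=False):
    """Apply C_i = (p_i + k_(i mod m)) mod 26, or subtract the key when decrypting."""
    shifts = [ord(ch) - ord("a") for ch in key.lower()]
    letters = [ch for ch in text.lower() if "a" <= ch <= "z"]
    sign = -1 if decrypt else 1
    return "".join(
        chr((ord(ch) - ord("a") + sign * shifts[i % len(shifts)]) % 26 + ord("a"))
        for i, ch in enumerate(letters)
    )


# ASSUMPTION: keyword and plaintext from the textbook example.
ciphertext = vigenere("we are discovered save yourself", "deceptive").upper()
print(ciphertext)                                       # ZICVTWQNGRZGVTWAVZHCQYGLMGJ
print(vigenere(ciphertext, "deceptive", decrypt=True))  # wearediscoveredsaveyourself
```

Spaces are dropped before encryption, so the recovered plaintext comes back as one run of letters.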

The strength of this cipher is that each plaintext letter can map to several different ciphertext letters, one for each distinct letter of the keyword, so the letter-frequency information is obscured. However, not all knowledge of the plaintext structure is lost. In the figure above, for example, the shaded three-letter ciphertext sequence occurs twice. A repeat like this typically arises when the same plaintext falls under the same key letters, so the distance between the two occurrences is a multiple of the key length, which here suggests a key length of 9 or 3.

Next, we introduce some methods of cracking Vigenere Cipher, which cover some routine operations of cryptanalysis.

First, suppose the attacker believes the ciphertext was produced by either monoalphabetic substitution or a Vigenere cipher. Monoalphabetic substitution can be checked with a simple test: the statistical properties of the ciphertext letters should match those of the plaintext language, with only the identities of the letters changed.

For a Vigenere cipher, the first step is to determine the key length. As in the example above, we look for repeated sequences of ciphertext letters. Such repeats arise when two identical plaintext sequences are separated by an integer multiple of the key length, although a repeat can of course also occur by chance. Given a sufficiently long ciphertext, enough repeats appear to yield useful information about the key length.
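The search for repeated sequences can be automated. The following sketch uses a made-up function name and, as an assumption, the ciphertext of the textbook Vigenere example (keyword deceptive, plaintext "we are discovered save yourself"). It lists every repeated three-letter sequence together with the distances between occurrences, then takes the greatest common divisor of those distances as a candidate key length.

```python
from collections import defaultdict
from functools import reduce
from math import gcd


def repeated_sequences(ciphertext, length=3):
    """Map each repeated substring to the distances between its occurrences."""
    positions = defaultdict(list)
    for i in range(len(ciphertext) - length + 1):
        positions[ciphertext[i:i + length]].append(i)
    return {
        seq: [b - a for a, b in zip(pos, pos[1:])]
        for seq, pos in positions.items()
        if len(pos) > 1
    }


# ASSUMPTION: ciphertext of the textbook example.
CIPHERTEXT = "ZICVTWQNGRZGVTWAVZHCQYGLMGJ"

repeats = repeated_sequences(CIPHERTEXT)
distances = [d for ds in repeats.values() for d in ds]
print(repeats)                 # {'VTW': [9]}
print(reduce(gcd, distances))  # 9
```

With a single repeat, the divisors of the distance (here 3 and 9) all remain candidates; longer ciphertexts give more repeats and narrow the choice. Accidental repeats can distort the result, so in practice the most frequent common factor is a safer guide than a strict greatest common divisor.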

Suppose we have deduced that the key length is $m$, so that the cipher in effect consists of $m$ monoalphabetic substitution ciphers. With the keyword DECEPTIVE, for example, the plaintext letters in positions 1, 10, 19, ... are all encrypted with the same key letter, i.e. with the same monoalphabetic cipher. Each of these $m$ ciphers can therefore be attacked separately using the frequency analysis described earlier.
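Splitting the ciphertext into these $m$ groups is a one-liner with slicing. In this sketch the function name is hypothetical, and the ciphertext is assumed to be that of the textbook example encrypted with the keyword DECEPTIVE (m = 9).

```python
def split_by_key_position(ciphertext, m):
    """Group the ciphertext letters that were encrypted with the same key letter."""
    return [ciphertext[i::m] for i in range(m)]


# ASSUMPTION: ciphertext of the textbook example, keyword length 9.
columns = split_by_key_position("ZICVTWQNGRZGVTWAVZHCQYGLMGJ", 9)
for position, column in enumerate(columns, start=1):
    print(position, column)
```

Each column is a plain Caesar cipher, so with enough text the most frequent letter in a column is a good first guess for the image of e. The sample here is far too short for that; it only shows the grouping.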

The periodic nature of the keyword can be eliminated by using a non-repeating key. Vigenere proposed what is called an autokey system, in which the keyword is followed by the plaintext itself to form a running key:

[Figure: autokey example showing key, plaintext, and ciphertext]

Even this scheme is not secure against statistical analysis. Because the key and the plaintext share the same letter-frequency distribution, statistical techniques still apply. For example, e enciphered by e can be expected to occur with a frequency of $0.127^2 \approx 0.016$.
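A minimal sketch of the autokey system follows. The function names are made up, and because the example figure is not reproduced in the text, the keyword deceptive and the plaintext "we are discovered save yourself" are assumed from the textbook example.

```python
def autokey_encrypt(plaintext, keyword):
    """Encrypt with the running key: keyword followed by the plaintext itself."""
    letters = [ch for ch in plaintext.lower() if "a" <= ch <= "z"]
    running_key = (list(keyword.lower()) + letters)[:len(letters)]
    return "".join(
        chr((ord(p) + ord(k) - 2 * ord("a")) % 26 + ord("A"))
        for p, k in zip(letters, running_key)
    )


def autokey_decrypt(ciphertext, keyword):
    """Decrypt letter by letter; each recovered letter extends the running key."""
    key = list(keyword.lower())
    plain = []
    for i, c in enumerate(ciphertext.lower()):
        p = chr((ord(c) - ord(key[i])) % 26 + ord("a"))
        plain.append(p)
        key.append(p)
    return "".join(plain)


# ASSUMPTION: keyword and plaintext from the textbook example.
ciphertext = autokey_encrypt("we are discovered save yourself", "deceptive")
print(ciphertext)                                # ZICVTWQNGKZEIIGASXSTSLVVWLA
print(autokey_decrypt(ciphertext, "deceptive"))  # wearediscoveredsaveyourself
```

The first nine ciphertext letters match the ordinary Vigenere result; after that the key is the plaintext itself, so the repeated sequence seen earlier disappears.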

Vernam Cipher

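The ultimate defense against such analysis is a key that is as long as the plaintext and has no statistical relationship to it. Vernam's system works on binary data rather than letters, and the ciphertext is generated by a bitwise XOR (exclusive-or) of the plaintext and the key:

[Figure: Vernam cipher]

$$c_i=p_i\oplus k_i$$

where $p_i$, $k_i$, and $c_i$ are the $i$-th bits of the plaintext, key, and ciphertext. Because XOR is its own inverse, decryption is the same operation: $p_i=c_i\oplus k_i$.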
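The XOR step can be sketched on bytes rather than single bits, which amounts to the same operation applied eight bits at a time. The function name and the fixed demonstration key are assumptions for illustration; a real key would have to be random and kept secret.

```python
def xor_bytes(data, key):
    """XOR each data byte with the key byte at the same position."""
    if len(key) < len(data):
        raise ValueError("key must be at least as long as the data")
    return bytes(d ^ k for d, k in zip(data, key))


plaintext = b"attack at dawn"
# ASSUMPTION: a fixed, predictable key used purely for demonstration.
key = bytes(range(1, len(plaintext) + 1))

ciphertext = xor_bytes(plaintext, key)
print(ciphertext.hex())
print(xor_bytes(ciphertext, key))  # b'attack at dawn'
```

Applying the same function twice with the same key returns the original data, which is why one routine serves for both encryption and decryption.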

The core of the Vernam cipher is how the key is generated. In essence, the Vernam cipher still uses a repeating key, although a very long one. That poses serious difficulties for cryptanalysis, but the cipher can still be broken given enough ciphertext, known or probable plaintext sequences, or both.

One-Time Pad

The one-time pad improves on the Vernam cipher by using a random key that is as long as the message and is used only once: every new message gets a fresh random key of matching length. Such a scheme is unbreakable, because the ciphertext carries no statistical relationship to the plaintext. In practice, however, the one-time pad faces two fundamental difficulties:

  • Generating large quantities of random keys. Any heavily used system might regularly need millions of random characters, and supplying truly random characters in that volume is a significant task;
  • Even more daunting is key distribution and protection. For every message to be sent, sender and receiver both need a key of equal length, which has to be delivered and kept secret.

Because of these difficulties, the one-time pad is of limited practical use. It is employed mainly on low-bandwidth channels that require very high security.
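The following sketch illustrates both the mechanism and the reason the scheme leaks nothing. It uses Python's standard `secrets` module as a stand-in source of random bytes, which is an assumption of this example (it is a cryptographic pseudo-random source, not a physical one), and the helper names and sample messages are hypothetical.

```python
import secrets


def otp_encrypt(message):
    """Encrypt with a fresh random pad of the same length; return (pad, ciphertext)."""
    pad = secrets.token_bytes(len(message))
    ciphertext = bytes(m ^ k for m, k in zip(message, pad))
    return pad, ciphertext


def otp_decrypt(ciphertext, pad):
    """Recover the message by XORing the ciphertext with the pad."""
    return bytes(c ^ k for c, k in zip(ciphertext, pad))


def pad_for(ciphertext, claimed_plaintext):
    """Return the pad under which the ciphertext decrypts to the claimed plaintext."""
    return bytes(c ^ p for c, p in zip(ciphertext, claimed_plaintext))


pad, ciphertext = otp_encrypt(b"attack at dawn")
print(otp_decrypt(ciphertext, pad))       # b'attack at dawn'

# Any other message of the same length is equally consistent with the ciphertext.
fake_pad = pad_for(ciphertext, b"retreat at six")
print(otp_decrypt(ciphertext, fake_pad))  # b'retreat at six'
```

Since every plaintext of the right length corresponds to some pad, an attacker who sees only the ciphertext has no basis for preferring one over another. That guarantee holds only if the pad is truly random, as long as the message, and never reused.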


References

Cryptography and Network Security: Principles and Practice, 7th Edition, ISBN 978-0-13-444428-4, by William Stallings, published by Pearson Education.

Origin blog.csdn.net/myDarling_/article/details/128246910