Applied Cryptography: Chapter 11, Mathematical Background

11.1 Information Theory

1) Entropy and uncertainty

Information theory defines the amount of information in a message as the minimum number of bits needed to encode all of its possible meanings. Take the day of the week as an example: there are seven possibilities in total (Monday, Tuesday, Wednesday, Thursday, Friday, Saturday, and Sunday), so a 3-bit binary number is enough to represent all seven cases.

If the message is a gender field, only one bit is needed: "0" and "1" can stand for "man" and "woman".
The 3 bits and 1 bit above are the amount of information in each message.

The amount of information in a message M is measured by its entropy, written H(M).
Using the examples above, the entropy of gender is 1 bit; the entropy of the day of the week is slightly less than 3 bits ($\log_2 7 \approx 2.81$).
Generally speaking, the entropy of a message, measured in bits, is $H(M) = \log_2 n$,
where n is the number of possible meanings (e.g. the number of days), assuming all of them are equally likely.
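
The formula can be checked with a few lines of Python. This is only a minimal sketch: the function names `entropy_bits` and `bits_needed` are made up for illustration, and it assumes every meaning is equally likely.

```python
import math

def entropy_bits(n: int) -> float:
    """Entropy in bits of a message with n equally likely meanings: log2(n)."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    return math.log2(n)

def bits_needed(n: int) -> int:
    """Whole number of bits needed to give each of n meanings its own code."""
    return math.ceil(entropy_bits(n))

print(entropy_bits(2))            # gender example: 1.0
print(round(entropy_bits(7), 2))  # day of the week: 2.81
print(bits_needed(7))             # rounded up to whole bits: 3
```

The entropy of the day of the week is a little under 3, and rounding up gives the 3 bits actually needed to store it.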

Entropy also measures the uncertainty of a message.
(Faced with a scrambled ciphertext, the first question a cryptanalyst asks is: what is the entropy of the plaintext? That entropy is exactly the uncertainty about the plaintext, i.e. the number of plaintext bits that must be recovered to learn it. For example, suppose the garbled string " ksd*w@#$ " is the ciphertext of a gender field. Its uncertainty is then 1 bit. The cryptanalyst does not know the plaintext, but once the uncertainty is known to be only 1 bit, there are just two candidates to try and the ciphertext is easily broken.)

2) Rate of a language

For a given language, the rate of the language is $r = H(M)/N$
(N is the length of the message).

The English rate is 1.0~1.5 bits/character.
If a language has L characters, its absolute rate is:
$R = \log_2 L$
For example, English has 26 letters, so its absolute rate is $\log_2 26$, approximately 4.7 bits/letter.
The redundancy of a language is generally denoted D and is defined as: $D = R - r$
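
As a rough numerical check of these definitions, here is a small Python sketch. The helper names `absolute_rate` and `redundancy` are hypothetical; the rates 1.0 and 1.5 bits/character are the two ends of the range quoted above for English.

```python
import math

def absolute_rate(alphabet_size: int) -> float:
    """Absolute rate R = log2(L) for an alphabet of L characters."""
    return math.log2(alphabet_size)

def redundancy(alphabet_size: int, rate: float) -> float:
    """Redundancy D = R - r, in bits per character."""
    return absolute_rate(alphabet_size) - rate

R = absolute_rate(26)
print(round(R, 2))  # about 4.7 bits/letter

# Redundancy of English at both ends of the quoted rate range.
for r in (1.0, 1.5):
    print(r, round(redundancy(26, r), 2))
```

With these inputs the redundancy of English comes out between roughly 3.2 and 3.7 bits per letter, which is most of the 4.7 bits each letter could carry.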

3) System security

The goal of a cryptanalyst is to determine the key K, the plaintext P, or both. Cryptanalysts are also very interested in probabilistic information about P, such as whether it is digital audio, German text, tabular data, and so on.
In practice, a cryptanalyst trying to determine K or P usually knows something probabilistic before starting. For example, other information may already suggest, with high probability, what type of text the plaintext is.
There are cryptosystems that achieve perfect secrecy: the ciphertext reveals no information about the plaintext. Shannon showed that perfect secrecy can be guaranteed only when the key is at least as long as the plaintext (key length >= message length).
This is also the idea behind the one-time pad. The more redundant the language, the easier the ciphertext is to analyze and break. For this reason, many real-world implementations compress the plaintext before encryption so that its redundancy becomes very small, and then encrypt and decrypt the compressed data with the key.
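
To make the "key at least as long as the plaintext" condition concrete, here is a minimal one-time pad sketch in Python. The function names are invented for this example, XOR is used as the combining operation, and the sketch assumes the key comes from a good random source and is never reused; it is an illustration, not a production scheme.

```python
import secrets
from typing import Tuple

def otp_encrypt(plaintext: bytes) -> Tuple[bytes, bytes]:
    """Generate a random key as long as the plaintext and XOR them together."""
    key = secrets.token_bytes(len(plaintext))
    ciphertext = bytes(p ^ k for p, k in zip(plaintext, key))
    return key, ciphertext

def otp_decrypt(ciphertext: bytes, key: bytes) -> bytes:
    """XOR with the same key recovers the plaintext."""
    if len(key) != len(ciphertext):
        raise ValueError("key and ciphertext must have the same length")
    return bytes(c ^ k for c, k in zip(ciphertext, key))

key, ciphertext = otp_encrypt(b"attack at dawn")
print(len(key) == len(ciphertext))   # the key is as long as the message
print(otp_decrypt(ciphertext, key))  # b'attack at dawn'
```

Because every key of that length is equally likely, every plaintext of that length is an equally plausible decryption of the same ciphertext.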

The entropy of a cryptosystem is a measure of the size of its key space. Here K is the number of possible keys and H(K) is the entropy.
It is calculated as follows: $H(K) = \log_2 K$

4) Unicity distance (unique solution distance)

The unicity distance U (the "unique solution distance") is the approximate amount of ciphertext at which the real information (entropy) in the corresponding plaintext plus the entropy of the encryption key equals the number of ciphertext bits used. Ciphertext longer than this distance is reasonably certain to have only one meaningful decryption. Ciphertext significantly shorter than this distance is likely to have several equally valid decryptions, so the attacker cannot tell which one is correct, and that is where the security comes from.
For most symmetric cryptosystems, the unicity distance is defined as the entropy of the cryptosystem divided by the redundancy of the language:
$U = H(K)/D$

I find the definition above hard to follow, but the exact meaning is not essential. The point to remember is that the unicity distance is inversely proportional to redundancy (redundancy in the literal sense; look it up if the term is unfamiliar). Once an attacker has at least that much ciphertext, there is in principle only one meaningful plaintext it can decrypt to. The longer the unicity distance, the better for the cryptosystem.
If the unicity distance is too small, the cryptosystem is insecure. A long unicity distance, however, does not by itself guarantee that the system is secure.
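
A worked example may help. The sketch below plugs numbers into H(K) = log2 K and U = H(K)/D. The specific inputs are assumptions chosen only for illustration: a 56-bit key space, a 26-letter alphabet, and a language rate of 1.3 bits/letter (a value inside the 1.0 to 1.5 range quoted earlier). The function names are hypothetical as well.

```python
import math

def key_entropy(num_keys: int) -> float:
    """H(K) = log2(K), where K is the number of possible keys."""
    return math.log2(num_keys)

def unicity_distance(num_keys: int, redundancy: float) -> float:
    """U = H(K) / D, in characters of ciphertext."""
    if redundancy <= 0:
        raise ValueError("redundancy must be positive")
    return key_entropy(num_keys) / redundancy

# Assumed inputs: 2**56 keys, 26 letters, rate r = 1.3 bits/letter.
D = math.log2(26) - 1.3           # redundancy D = R - r
U = unicity_distance(2**56, D)

print(key_entropy(2**56))  # 56.0 bits
print(round(U, 1))         # roughly 16.5 characters
```

Halving the redundancy (for example by compressing the plaintext first) would double U, which is the inverse proportionality in action.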

5) Confusion and Diffusion

Confusion: Confusion obscures the relationship between plaintext and ciphertext, which makes it very difficult to study the ciphertext for redundancy and statistical patterns. So how is confusion achieved? The easiest way is substitution, and the most typical substitution cipher is the Caesar cipher. The Caesar cipher dates back about two thousand years, and modern substitution ciphers are of course more complicated. (The Caesar cipher is too simple, so I won't comment on it.) The best-known of these is the German Enigma, a rotor cipher machine used by Germany during World War II.
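
For reference, substitution in its simplest form looks like the following Python sketch of a Caesar cipher. The function name `caesar` and the shift of 3 are choices made for this example; only the 26 unaccented Latin letters are shifted and everything else is passed through unchanged.

```python
def caesar(text: str, shift: int) -> str:
    """Replace each letter by the letter `shift` places further along the alphabet."""
    result = []
    for ch in text:
        if "a" <= ch <= "z":
            base = ord("a")
        elif "A" <= ch <= "Z":
            base = ord("A")
        else:
            result.append(ch)  # leave spaces, digits, punctuation alone
            continue
        result.append(chr((ord(ch) - base + shift) % 26 + base))
    return "".join(result)

ciphertext = caesar("attack", 3)
print(ciphertext)              # dwwdfn
print(caesar(ciphertext, -3))  # attack
```

Note that repeated plaintext letters still map to repeated ciphertext letters, so letter frequencies survive, which is why a single fixed substitution provides very weak confusion.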

Diffusion: Diffusion dissipates the redundancy of the plaintext by spreading it out over the ciphertext, so a cryptanalyst looking for that redundancy has a much harder time. The common way to achieve diffusion is transposition (also called permutation). A typical example is columnar transposition (search for "columnar transposition" to find detailed explanations in English).
The following picture (copied) explains this transposition in detail.
If the plaintext is "Which wristwatches are swiss wristwatches", the operation is as follows:
[image: columnar transposition example]
If the picture makes sense, good; if not, read the original text.
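
Since the picture is not available here, the following Python sketch shows one common form of columnar transposition. It is not necessarily the exact variant in the picture: the keyword "CIPHER" is a made-up example, spaces are dropped, and no padding is added. The text is written row by row under the keyword, and the columns are then read off in alphabetical order of the keyword letters.

```python
def column_order(key: str) -> list:
    """Column indices sorted by keyword letter (ties broken left to right)."""
    return sorted(range(len(key)), key=lambda i: (key[i], i))

def columnar_encrypt(plaintext: str, key: str) -> str:
    text = "".join(plaintext.split())  # drop whitespace
    ncols = len(key)
    # text[col::ncols] is the column under keyword letter number `col`.
    return "".join(text[col::ncols] for col in column_order(key))

def columnar_decrypt(ciphertext: str, key: str) -> str:
    ncols = len(key)
    nrows, extra = divmod(len(ciphertext), ncols)
    columns = [""] * ncols
    pos = 0
    for col in column_order(key):
        # The first `extra` columns are one letter longer than the rest.
        length = nrows + (1 if col < extra else 0)
        columns[col] = ciphertext[pos:pos + length]
        pos += length
    # Read the grid back row by row.
    return "".join(
        columns[c][r]
        for r in range(nrows + 1)
        for c in range(ncols)
        if r < len(columns[c])
    )

message = "Which wristwatches are swiss wristwatches"
ciphertext = columnar_encrypt(message, "CIPHER")
print(ciphertext)
print(columnar_decrypt(ciphertext, "CIPHER"))
```

The ciphertext contains exactly the same letters as the plaintext, only rearranged, which is why transposition on its own leaves single-letter frequencies intact.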

Stream ciphers rely on confusion alone; block ciphers use both confusion and diffusion. (Diffusion alone is easily cracked.)

11.2 Complexity Theory

1) Algorithm complexity

The complexity of an algorithm is determined by the computing power required to execute it. It is usually measured by two variables: time and space (time complexity and space complexity; friends who have studied data structures and algorithms will recognize this, since it is the same algorithmic complexity taught there).
So there is not much more to say about algorithm complexity. Here is a picture (screenshot):
[image: algorithmic complexity]
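
To get a feel for how differently the usual complexity classes grow, the Python sketch below simply evaluates each growth function for one input size. The function name `operation_counts` and the choice n = 64 are arbitrary; the numbers are abstract step counts, not measured running times.

```python
import math

def operation_counts(n: int) -> dict:
    """Abstract step counts for common complexity classes at input size n."""
    return {
        "constant O(1)": 1,
        "logarithmic O(log n)": math.log2(n),
        "linear O(n)": n,
        "quadratic O(n^2)": n ** 2,
        "cubic O(n^3)": n ** 3,
        "exponential O(2^n)": 2 ** n,
    }

for name, steps in operation_counts(64).items():
    print(f"{name:<22} {steps}")
```

Even at this small input size the exponential row dwarfs the polynomial ones, which is the gap that cryptography relies on.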

2) Problem complexity

Problems can be divided into tractable and intractable problems. Tractable problems can be solved in polynomial time, which usually means a reasonable amount of time for inputs of reasonable length. In layman's terms, an intractable problem is one that cannot be computed "quickly".
Problems that cannot be solved quickly fall into two categories: those that are truly unsolvable (undecidable: no algorithm can solve them at all), and those that are solvable but whose algorithmic complexity is too high.
Two important classes of solvable problems are the polynomial problems (class P) and the nondeterministic polynomial problems (class NP).
P consists of all problems that can be solved in polynomial time. NP consists of all problems that can be solved in polynomial time on a nondeterministic Turing machine. Every problem in P is therefore also in NP.
The connection between NP and cryptography is that many symmetric and public-key cryptographic algorithms can be broken in nondeterministic polynomial time.

11.3 Number Theory

1) Modular arithmetic

If Xiao Ming was supposed to get home at ten o'clock but is 15 hours late, what time does he get home?

$(10 + 15) / 12 = 2$ remainder $1$, that is, $(10 + 15) \bmod 12 = 1$
So he gets home at 1 o'clock (on a 12-hour clock).
This is modular arithmetic. The modulo operation simply finds the remainder: in a % b = c, c is the remainder of a / b, and % is the modulo operator.
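
The same calculation in Python, as a small sketch. The helper `clock_after` is a hypothetical name; it maps a remainder of 0 to 12 so that the result reads like a clock face.

```python
def clock_after(start_hour: int, delay_hours: int, modulus: int = 12) -> int:
    """Hour shown on a clock `delay_hours` after `start_hour`."""
    hour = (start_hour + delay_hours) % modulus
    return hour if hour != 0 else modulus  # a 12-hour clock shows 12, not 0

quotient, remainder = divmod(10 + 15, 12)
print(quotient, remainder)  # 2 1
print(clock_after(10, 15))  # 1
```

`divmod` returns the quotient and the remainder together; `%` returns only the remainder.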

2) Prime numbers

A natural number greater than 1 that has no factors other than 1 and itself is a prime number. For example: 2, 3, 5, 7, 11, 13, 17, 19, 23, 29, ...
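
A simple way to test the definition is trial division, sketched below in Python. It follows the usual convention that primes are greater than 1 (so 1 is not prime and 2 is the smallest prime). The function name `is_prime` is chosen for this example, and the method is only practical for small numbers.

```python
def is_prime(n: int) -> bool:
    """Trial division: n is prime if no d with 2 <= d <= sqrt(n) divides it."""
    if n < 2:
        return False
    if n < 4:
        return True  # 2 and 3
    if n % 2 == 0:
        return False
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True

print([n for n in range(30) if is_prime(n)])
# [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```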

3) Greatest common divisor

Let a1, a2, a3, ..., an be a set of positive integers. If d is a factor of every ai (1 <= i <= n), then d is a common factor of these numbers. The numbers may have many common factors, and the largest of them is called the greatest common divisor.
Example: for 12, 24, and 16, the greatest common divisor is 4.
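
One standard way to compute it is Euclid's algorithm, sketched here in Python. The names `gcd` and `gcd_many` are written out for illustration (Python's standard library also provides `math.gcd`); the sketch assumes positive integers as in the definition above.

```python
from functools import reduce

def gcd(a: int, b: int) -> int:
    """Euclid's algorithm: replace (a, b) by (b, a mod b) until b is 0."""
    while b:
        a, b = b, a % b
    return a

def gcd_many(*numbers: int) -> int:
    """Greatest common divisor of several numbers, folded pairwise."""
    return reduce(gcd, numbers)

print(gcd_many(12, 24, 16))  # 4
```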

4) Find the inverse of a number

What is an inverse? $x \cdot a \equiv 1 \pmod{n}$
Under multiplication modulo n, the inverse plays the role of the reciprocal. (≡ denotes congruence.)
Treat x as the variable; the inverse of x is then a, written $a \equiv x^{-1} \pmod{n}$.

For example, take 5x ≡ 1 (mod 14), that is, what is the inverse of 5 modulo 14? When x = 3, 5x = 15 and 15 % 14 = 1, so 3 is the inverse of 5 modulo 14.
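
One common way to find such an inverse is the extended Euclidean algorithm. The Python sketch below is a minimal version under that assumption; the function names are chosen for this example. An inverse exists only when a and n share no common factor other than 1, so the code raises an error otherwise.

```python
def extended_gcd(a: int, b: int):
    """Return (g, s, t) such that g = gcd(a, b) and s*a + t*b == g."""
    old_r, r = a, b
    old_s, s = 1, 0
    old_t, t = 0, 1
    while r:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_s, s = s, old_s - q * s
        old_t, t = t, old_t - q * t
    return old_r, old_s, old_t

def mod_inverse(a: int, n: int) -> int:
    """Return x in the range 0..n-1 with (a * x) % n == 1."""
    g, s, _ = extended_gcd(a % n, n)
    if g != 1:
        raise ValueError(f"{a} has no inverse modulo {n}")
    return s % n

print(mod_inverse(5, 14))  # 3
```

On Python 3.8 and later, the built-in `pow(5, -1, 14)` gives the same result.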

Finding the inverse is complicated enough that I could write a dedicated article on it. For now, take a look at this expert's article, which explains it carefully (click here).

The mathematics is too involved to cover fully here and would take a lot of space. I will write it up piece by piece when I have the chance.

Origin blog.csdn.net/wangzhiyu12/article/details/108755170