Project Euler 59: XOR decryption

Each character on a computer is assigned a unique code, and the generally accepted standard is ASCII (American Standard Code for Information Interchange). For example, uppercase A has ASCII code 65, the asterisk (*) has code 42, and lowercase k has code 107.

A modern encryption method is to take a text file, convert its bytes to ASCII codes, and then XOR each byte with a given value taken from a secret key. The advantage of the XOR function is that applying the same key to the ciphertext restores the plaintext; for example, \(65 \text{ XOR } 42 = 107\), and \(107 \text{ XOR } 42 = 65\).

If the key is as long as the plaintext and is completely random, the encryption is unbreakable. The user keeps the encrypted message and the key in different places; without both of them at the same time, it is impossible to decrypt the message.

Unfortunately, this method is impractical for most users, so a modified method is often used in which a password serves as the key. If the password is shorter than the message, which is usually the case, the key is repeated cyclically throughout the message. The scheme is a compromise: the password should be long enough to be secure, yet short enough to be memorable.

Your task has been made easier by a few hints: the key consists of three lowercase letters, the file p059_cipher.txt contains the encrypted ASCII codes, and the plaintext contains only common English words. Decrypt the message and find the sum of the ASCII codes of the original text.

Analysis: This is a cryptography problem, and it becomes much easier with some background in the subject. If you are interested, have a look at "The Code Book" by Simon Singh, an excellent popular-science book on cryptography. Singh is a British science writer and also the author of "Fermat's Last Theorem", about a riddle that confounded the world's greatest minds for 358 years. That book is just as gripping; few authors can make dry mathematics so lively and entertaining.

Back to the problem. The statement says the key consists of three lowercase English letters, and three lowercase letters can form only \(26^3 = 17576\) combinations, so brute force is feasible: try each of the 17,576 keys in turn, and the key that yields text that reads like normal English is the correct one. The sum of the ASCII codes of the plaintext decrypted with that key is the answer. There is clearly a better way, though.

In "The Code Book", the author describes a cipher that was very popular in ancient times, the substitution cipher: each letter of the plaintext is replaced by some other letter of the alphabet. For example, if C is replaced by E, A by K, and T by Z, the word CAT becomes EKZ. The substitution mapping is the key, and a recipient who holds it can work backwards from the ciphertext to the plaintext. This method was invented around Roman times and was considered secure for a long time. It remained unbroken until the period of the European Middle Ages, when Arab scholars found a way to break it: frequency analysis. In an alphabetic language such as English, the frequency of each letter is fairly stable. The most frequent letter in English is E, at about 12.7%; T comes next, at about 9.35%; the rarest is Z, at about 0.077%. So given enough ciphertext, we can count how often each symbol occurs, compare that distribution with the known distribution for English, and guess which ciphertext letter corresponds to which plaintext letter, which breaks the cipher.
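The brute-force idea mentioned above can be sketched as follows. This is only an illustration, not the solution used later in this post: the helper names (`xor_with_key`, `english_score`, `brute_force_key`) are hypothetical, and "looks like English" is approximated by a crude assumed rule that simply counts letters and spaces in the candidate plaintext.

```python
from itertools import cycle, product
from string import ascii_letters, ascii_lowercase

# Assumed scoring rule: a character "looks English" if it is a letter or a space.
ALLOWED = set(ascii_letters + " ")

def xor_with_key(codes, key):
    """XOR each integer code with the key's characters, repeated cyclically."""
    return [c ^ ord(k) for c, k in zip(codes, cycle(key))]

def english_score(codes):
    """Count how many decoded characters are letters or spaces."""
    return sum(chr(c) in ALLOWED for c in codes)

def brute_force_key(cipher):
    """Try all 26**3 = 17576 lowercase keys and return the best-scoring one."""
    candidates = ("".join(p) for p in product(ascii_lowercase, repeat=3))
    return max(candidates, key=lambda k: english_score(xor_with_key(cipher, k)))

# Small self-made example: encrypt a phrase with the key "god", then recover the key.
sample = "an elegant expression for the entire sum of this series"
cipher = xor_with_key([ord(ch) for ch in sample], "god")
print(brute_force_key(cipher))  # god
```

A wrong key letter turns every space in its group into a digit or punctuation mark, which lowers the score, so this crude rule is enough for the small example. On real text a stricter test (for instance, checking for common English words) may be safer.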

The XOR encryption in this problem is in fact very similar to a substitution cipher, so it too can be broken by frequency analysis. First, let's look at how XOR encryption works (readers unfamiliar with it can consult the wiki article on the topic). Suppose the plaintext is "hello world!" and we choose a word shorter than the plaintext as the password, say "god". Counting the space but not the quotation marks, the plaintext has length 12 and the password has length 3, so the password must be repeated four times to cover the plaintext. We XOR the ASCII code of the first letter "h" with that of "g" to get 15, then the code of "e" with that of "o" to get 10, then "l" with "d", then the next "l" with "g" again, and so on. The resulting ciphertext is [15,10,8,11,0,68,16,0,22,11,11,69]. To recover the plaintext, we XOR the ciphertext with the password in the same way: for example, 15 XOR 103 (the ASCII code of "g") gives 104, which is the letter "h".
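The "hello world!" example above can be reproduced with a few lines of Python. This is a minimal sketch; the function names `xor_encrypt` and `xor_decrypt` are made up for illustration.

```python
from itertools import cycle

def xor_encrypt(text, password):
    """XOR each character of text with the password, repeated cyclically."""
    return [ord(t) ^ ord(p) for t, p in zip(text, cycle(password))]

def xor_decrypt(codes, password):
    """Apply the same XOR again to turn the codes back into text."""
    return "".join(chr(c ^ ord(p)) for c, p in zip(codes, cycle(password)))

cipher = xor_encrypt("hello world!", "god")
print(cipher)                      # [15, 10, 8, 11, 0, 68, 16, 0, 22, 11, 11, 69]
print(xor_decrypt(cipher, "god"))  # hello world!
```

Encryption and decryption are the same operation because XOR with a fixed value is its own inverse.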

Now let's see how to break XOR encryption using frequency analysis. The problem tells us the password has length three, so the first and fourth characters of the plaintext are encrypted with the first letter of the password, the second and fifth with the second letter, the third and sixth with the third letter, and so on. We can therefore split the ciphertext into three groups, each encrypted with a single password letter. Running a frequency count on each group gives the most frequent value: in the first group it is 69, which appears 86 times; in the second it is 88, which appears 77 times; in the third it is 80, which appears 103 times. In ordinary English text the most frequent character is usually the space, so we guess that these three values are all encrypted spaces. (For exactly this reason, spaces are usually stripped from the plaintext before encryption, since otherwise the cipher is too easy to break; my experiments show that the problem setters, to keep things simple, did not strip them.) All that remains is to XOR the ASCII code of the space with each of these ciphertext values to obtain the key. The ASCII code of the space is 32: 32 XOR 69 = 101, the letter "e"; 32 XOR 88 = 120, the letter "x"; and 32 XOR 80 = 112, the letter "p". The password is therefore "exp". Decrypting the ciphertext with it gives the following text:
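The grouping and frequency count described above can be sketched on a small self-made ciphertext. The function name `recover_key` and the artificial sample text (letters separated by spaces, so that the space really is the most frequent character in every group) are assumptions for illustration only; the real input is the problem's cipher file.

```python
from collections import Counter
from itertools import cycle

def recover_key(cipher, key_len=3, most_common_plain=" "):
    """Guess the key, assuming the most frequent plaintext character is a space."""
    key = []
    for i in range(key_len):
        group = cipher[i::key_len]  # values encrypted with key letter i
        top_code, _count = Counter(group).most_common(1)[0]
        key.append(chr(top_code ^ ord(most_common_plain)))
    return "".join(key)

# Artificial plaintext: "a b c ... z", encrypted with the key "exp".
sample = " ".join("abcdefghijklmnopqrstuvwxyz")
cipher = [ord(ch) ^ ord(k) for ch, k in zip(sample, cycle("exp"))]
print(recover_key(cipher))  # exp
```

If the most frequent character of some group were not a space, the guess for that key letter would be wrong, so it is worth reading the decrypted text as a sanity check.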

An extract taken from the introduction of one of Euler's most celebrated papers, "De summis serierum reciprocarum" [On the sums of series of reciprocals]: I have recently found, quite unexpectedly, an elegant expression for the entire sum of this series 1 + 1/4 + 1/9 + 1/16 + etc., which depends on the quadrature of the circle, so that if the true sum of this series is obtained, from it at once the quadrature of the circle follows. Namely, I have found that the sum of this series is a sixth part of the square of the perimeter of the circle whose diameter is 1; or by putting the sum of this series equal to s, it has the ratio sqrt(6) multiplied by s to 1 of the perimeter to the diameter. I will soon show that the sum of this series to be approximately 1.644934066842264364; and from multiplying this number by six, and then taking the square root, the number 3.141592653589793238 is indeed produced, which expresses the perimeter of a circle whose diameter is 1. Following again the same steps by which I had arrived at this sum, I have discovered that the sum of the series 1 + 1/16 + 1/81 + 1/256 + 1/625 + etc. also depends on the quadrature of the circle. Namely, the sum of this multiplied by 90 gives the biquadrate (fourth power) of the circumference of the perimeter of a circle whose diameter is 1. And by similar reasoning I have likewise been able to determine the sums of the subsequent series in which the exponents are even numbers.

Interestingly, this passage comes from a famous paper by Euler, in which he solved the celebrated Basel problem, that is, he found the sum of the following infinite series:
\[\sum_{k=1}^\infty \frac{1}{k^2} = 1 + \frac{1}{4} + \frac{1}{9} + \frac{1}{16} + \cdots = \frac{\pi^2}{6}\]

This inspired Riemann's later study of the properties of the Riemann \(\zeta\) function. The nontrivial zeros of the \(\zeta\) function are deeply connected to the distribution of the prime numbers, and the Riemann hypothesis, which arose from that study, remains one of the most important unsolved conjectures in mathematics. Even more interesting, Project Euler has apparently quietly changed both the plaintext and the key of this problem: if you search for the problem online, you will find that earlier solvers obtained the password "god", with a plaintext taken from the first chapter of the Gospel of John in the Bible. Clearly, Euler's paper is a better fit for a site named Project Euler.

With the principle clear and the password recoverable, the code is fairly simple. XOR each value of the ciphertext with the corresponding password letter to get the ASCII codes of the plaintext, then sum them; that sum is the answer. The code is as follows:

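# time cost = 668 µs ± 3.63 µs

from collections import Counter
from itertools import cycle

def main():
    # Read the comma-separated ciphertext codes
    with open('data/ep59.txt') as f:
        cipher = list(map(int, f.read().split(',')))
    space_ascii = ord(' ')
    # In each of the three groups, assume the most frequent code is an encrypted space
    key = [Counter(cipher[i::3]).most_common(1)[0][0] ^ space_ascii for i in range(3)]
    # Repeat the key cyclically so no trailing value is dropped
    # when len(cipher) is not a multiple of 3
    res = sum(x ^ y for x, y in zip(cipher, cycle(key)))
    return res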


Origin www.cnblogs.com/metaquant/p/11862762.html