[Java] Cracking the hash-table-based random character replacement encryption algorithm with the naive Bayes algorithm

Don't read this: this article is wrong, and the result it obtains is not the correct one. The mistake is mine: the decryption approach here assumes that the hash table is already known. A real attack should rely only on statistical analysis of the ciphertext, so the number of ciphertext characters actually required to crack the cipher should be greater than the figure calculated in this article.

Foreword

This article shows how to train a model with the naive Bayes algorithm to crack the random character replacement encryption algorithm, and analyzes the security of the encryption algorithm using statistics gathered from the decrypted data.

Summary

This article studies the hash-table-based random character replacement encryption algorithm and its security. It uses the naive Bayes algorithm to crack the encryption algorithm, then analyzes and discusses the results. The results show that the algorithm is not secure enough and is easy to crack. The article also proposes measures to improve its security.

Keywords: hash table, random character replacement encryption, naive Bayesian algorithm, security

1. Code

The complete code is as follows:

import javax.swing.*;
import java.awt.*;
import java.awt.geom.AffineTransform;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.util.List;
import java.util.*;

public class Main {

    private static final int WIDTH = 600;
    private static final int HEIGHT = 400;
    private static final int BORDER_GAP = 30;
    private static final Color LINE_COLOR = Color.BLUE;
    private static final Stroke GRAPH_STROKE = new BasicStroke(2f);
    private static final int GRAPH_POINT_WIDTH = 6;

    public static void main(String[] args) {
        // Read the serialized hash table (plaintext character -> list of substitute characters)
        Map<Character, List<Character>> hashTable;
        try (ObjectInputStream ois = new ObjectInputStream(new FileInputStream("hashtable.ser"))) {
            hashTable = (Map<Character, List<Character>>) ois.readObject();
        } catch (IOException | ClassNotFoundException e) {
            e.printStackTrace();
            return;
        }
        NaiveBayes naiveBayes = null;
        ArrayList<Double> accuracyList = new ArrayList<>();
        ArrayList<Integer> lengthList = new ArrayList<>();
        for (int times = 0; times < 1000000; times += 1000) {
            lengthList.add(times);
            int length = 1;
            // Generate `times` random strings, each `length` characters long
            Random random = new Random();

            List<String> originalStrings = getOriginalStrings(times, length, random);

            List<String> encryptedStrings = getEncryptedStrings(hashTable, originalStrings, random);

            Map<String, String> dataSet = getStringStringMap(originalStrings, encryptedStrings);

            // Train a decryption model with the naive Bayes algorithm
            naiveBayes = new NaiveBayes();
            naiveBayes.train(dataSet);

            accuracyList.add(accuracyTest(hashTable, length, random, naiveBayes));
            if (accuracyList.get(accuracyList.size() - 1) > 0.99) {
                System.out.println("Accuracy (0~1): " + accuracyList.get(accuracyList.size() - 1));
                System.out.println("Intercepted ciphertext length: " + times + '\n');
                break;
            }
        }
        drawGraph(accuracyList, lengthList, 20000, 0.1);

        Scanner scanner = new Scanner(System.in);

        System.out.println("Enter the ciphertext to decrypt:");
        String encryptedString = scanner.nextLine();

        System.out.println("Decryption result:\n" + naiveBayes.predict(encryptedString));
    }

    private static List<String> getOriginalStrings(int times, int length, Random random) {
        List<String> originalStrings = new ArrayList<>();
        for (int i = 0; i < times; i++) {
            StringBuilder sb = new StringBuilder();
            for (int j = 0; j < length; j++) {
                // r in [0, 37): 0-25 are letters, 26-35 are the digits 0-9, 36 is the space
                int r = random.nextInt(37);
                if (r < 26) {
                    sb.append((char) ('a' + r));
                } else if (r < 36) {
                    sb.append((char) ('0' + r - 26));
                } else {
                    sb.append(' ');
                }
            }
            originalStrings.add(sb.toString());
        }
        return originalStrings;
    }

    private static List<String> getEncryptedStrings(Map<Character, List<Character>> hashTable, List<String> originalStrings, Random random) {
        // Encrypt the strings through the hash table
        List<String> encryptedStrings = new ArrayList<>();
        for (String originalString : originalStrings) {
            StringBuilder sb = new StringBuilder();
            for (char c : originalString.toCharArray()) {
                // Pick one of the substitutes of c at random
                sb.append(hashTable.get(c).get(random.nextInt(hashTable.get(c).size())));
            }
            encryptedStrings.add(sb.toString());
        }
        return encryptedStrings;
    }

    private static Map<String, String> getStringStringMap(List<String> originalStrings, List<String> encryptedStrings) {
        // Map each generated ciphertext to its original string
        // (identical ciphertext strings collapse into a single entry)
        Map<String, String> dataSet = new HashMap<>();
        for (int i = 0; i < originalStrings.size(); i++) {
            dataSet.put(encryptedStrings.get(i), originalStrings.get(i));
        }
        return dataSet;
    }

    private static double accuracyTest(Map<Character, List<Character>> hashTable, int length, Random random, NaiveBayes naiveBayes) {
        int timesTest = 100000;
        // Generate timesTest random strings, encrypt them with the hash table,
        // then decrypt them with the trained model
        List<String> testStrings = getOriginalStrings(timesTest, length, random);
        // Encrypt with the hash table
        List<String> testEncryptedStrings = getEncryptedStrings(hashTable, testStrings, new Random());
        // Decrypt with the trained model
        List<String> testDecryptedStrings = new ArrayList<>();
        for (String testEncryptedString : testEncryptedStrings) {
            testDecryptedStrings.add(naiveBayes.predict(testEncryptedString));
        }

        // Compare the decrypted results with the originals to measure the model's accuracy
        int correctCount = 0;
        for (int i = 0; i < testStrings.size(); i++) {
            if (testStrings.get(i).equals(testDecryptedStrings.get(i))) {
                correctCount++;
            }
        }
        return correctCount / (double) timesTest;
    }

    public static void drawGraph(ArrayList<Double> accuracyList, ArrayList<Integer> lengthList, int xUnitLength, double yUnitLength) {
        JFrame frame = new JFrame("Statistical test");
        frame.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
        frame.setSize(WIDTH, HEIGHT);

        JPanel panel = new JPanel() {
            @Override
            protected void paintComponent(Graphics g) {
                super.paintComponent(g);
                Graphics2D g2 = (Graphics2D) g;

                g2.setRenderingHint(RenderingHints.KEY_ANTIALIASING, RenderingHints.VALUE_ANTIALIAS_ON);
                setBackground(Color.WHITE);

                // Draw the coordinate axes
                g2.drawLine(BORDER_GAP, getHeight() - BORDER_GAP, BORDER_GAP, BORDER_GAP);
                g2.drawLine(BORDER_GAP, getHeight() - BORDER_GAP, getWidth() - BORDER_GAP, getHeight() - BORDER_GAP);

                // Draw the data points and the lines connecting them
                int pointX, pointY, prevPointX = 0, prevPointY = 0;
                for (int i = 0; i < accuracyList.size(); i++) {
                    pointX = (int) ((double) (getWidth() - 2 * BORDER_GAP) / (lengthList.size() - 1) * i + BORDER_GAP);
                    pointY = (int) ((getHeight() - 2 * BORDER_GAP) * (1 - accuracyList.get(i)) + BORDER_GAP);
                    g2.setColor(LINE_COLOR);
                    g2.setStroke(GRAPH_STROKE);
                    if (i > 0) {
                        g2.drawLine(prevPointX, prevPointY, pointX, pointY);
                    }
                    g2.fillOval(pointX - GRAPH_POINT_WIDTH / 2, pointY - GRAPH_POINT_WIDTH / 2, GRAPH_POINT_WIDTH, GRAPH_POINT_WIDTH);
                    prevPointX = pointX;
                    prevPointY = pointY;
                    // Draw the tick labels on the x-axis
                    if (lengthList.get(i) % xUnitLength == 0) {
                        g2.drawString(String.format("%,d", lengthList.get(i)), pointX, getHeight() - BORDER_GAP / 2);
                    }
                }

                // Draw the tick labels on the y-axis
                for (int i = 0; i <= 10; i++) {
                    int y = (int) ((getHeight() - 2 * BORDER_GAP) * (1 - i * 0.1) + BORDER_GAP);
                    g2.drawString(String.format("%.1f", i * yUnitLength), BORDER_GAP / 2, y);
                }

                // Label the x-axis with the ciphertext length
                g2.drawString("Ciphertext length", getWidth() / 2, getHeight() - BORDER_GAP / 4);

                // Label the y-axis with the accuracy, rotated by 90 degrees;
                // rotate on top of the current transform and restore it afterwards
                AffineTransform origTransform = g2.getTransform();
                g2.rotate(-Math.PI / 2);
                g2.drawString("Accuracy", -getHeight() / 2, BORDER_GAP / 2);
                g2.setTransform(origTransform);
            }

            @Override
            public Dimension getPreferredSize() {
                return new Dimension(WIDTH, HEIGHT);
            }
        };

        frame.add(panel);
        frame.setVisible(true);
    }

    private static class NaiveBayes {
        // featureCount.get(feature).get(label): how often ciphertext character `feature`
        // appeared at the same position as plaintext character `label`
        private Map<String, Map<Character, Integer>> featureCount;
        // labelCount.get(original): how often each original string appeared in training
        private Map<String, Integer> labelCount;

        public NaiveBayes() {
            featureCount = new HashMap<>();
            labelCount = new HashMap<>();
        }

        public void train(Map<String, String> dataSet) {
            for (Map.Entry<String, String> entry : dataSet.entrySet()) {
                String encryptedString = entry.getKey();
                String originalString = entry.getValue();
                for (int i = 0; i < encryptedString.length(); i++) {
                    String feature = encryptedString.substring(i, i + 1);
                    char label = originalString.charAt(i);
                    if (!featureCount.containsKey(feature)) {
                        featureCount.put(feature, new HashMap<>());
                    }
                    if (!featureCount.get(feature).containsKey(label)) {
                        featureCount.get(feature).put(label, 0);
                    }
                    featureCount.get(feature).put(label, featureCount.get(feature).get(label) + 1);
                }
                if (!labelCount.containsKey(originalString)) {
                    labelCount.put(originalString, 0);
                }
                labelCount.put(originalString, labelCount.get(originalString) + 1);
            }
        }

        public String predict(String encryptedString) {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < encryptedString.length(); i++) {
                String feature = encryptedString.substring(i, i + 1);
                char label = 'a';
                int maxCount = 0;
                // If the ciphertext character was never seen in training, output '*' for it
                if (!featureCount.containsKey(feature)) {
                    sb.append('*');
                    continue;
                }
                // Otherwise pick the plaintext character seen most often with this feature
                for (Map.Entry<Character, Integer> entry : featureCount.get(feature).entrySet()) {
                    if (entry.getValue() > maxCount) {
                        maxCount = entry.getValue();
                        label = entry.getKey();
                    }
                }
                sb.append(label);
            }
            return sb.toString();
        }
    }
}

2. Random character replacement encryption algorithm based on hash table

The encryption process of the hash table-based random character replacement encryption algorithm is as follows:

Take a character set of 37 characters: the 26 English letters, the 10 digits 0-9, and the space character;
Randomly assign Unicode characters to each character in the set as its substitutes (1771 characters are assigned), and store the assignments in a hash table;
To encrypt, replace each character of the plaintext with one of its substitutes, chosen at random from the hash table.
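
As a rough illustration of the three steps above, here is a minimal sketch. The class name `SubstitutionSketch`, the `buildTable` helper, and the substitute pool starting at `'\u4e00'` are assumptions made for this example; the real table generation is described in the earlier article and may differ. The demo uses only 50 substitutes per character to keep the output readable.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

class SubstitutionSketch {
    // Plaintext alphabet: 26 lowercase letters, 10 digits and the space character (37 in total)
    static final String ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789 ";

    // Hypothetical table builder: every plaintext character gets `perChar` distinct
    // substitutes taken from a shuffled pool of consecutive characters
    static Map<Character, List<Character>> buildTable(int perChar, char poolStart, Random random) {
        List<Character> pool = new ArrayList<>();
        for (int i = 0; i < ALPHABET.length() * perChar; i++) {
            pool.add((char) (poolStart + i));
        }
        Collections.shuffle(pool, random);
        Map<Character, List<Character>> table = new HashMap<>();
        for (int i = 0; i < ALPHABET.length(); i++) {
            table.put(ALPHABET.charAt(i), new ArrayList<>(pool.subList(i * perChar, (i + 1) * perChar)));
        }
        return table;
    }

    // Replace each plaintext character with a randomly chosen substitute from its list
    static String encrypt(Map<Character, List<Character>> table, String plain, Random random) {
        StringBuilder sb = new StringBuilder();
        for (char c : plain.toCharArray()) {
            List<Character> candidates = table.get(c);
            sb.append(candidates.get(random.nextInt(candidates.size())));
        }
        return sb.toString();
    }

    // Decrypt with the table known: invert the mapping and look each character up
    static String decrypt(Map<Character, List<Character>> table, String cipher) {
        Map<Character, Character> inverse = new HashMap<>();
        for (Map.Entry<Character, List<Character>> e : table.entrySet()) {
            for (char substitute : e.getValue()) {
                inverse.put(substitute, e.getKey());
            }
        }
        StringBuilder sb = new StringBuilder();
        for (char c : cipher.toCharArray()) {
            sb.append(inverse.get(c));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Random random = new Random(42);
        Map<Character, List<Character>> table = buildTable(50, '\u4e00', random);
        String cipher = encrypt(table, "hello world 2023", random);
        System.out.println(cipher);
        System.out.println(decrypt(table, cipher));
    }
}
```

Because a substitute is picked at random each time, encrypting the same plaintext twice usually gives different ciphertexts, while decryption with the table stays unambiguous as long as no substitute is shared between two plaintext characters.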

For more information about this encryption algorithm, please refer to one of my previous articles:

[Java] Random character replacement encryption algorithm based on hash table--by why not?

3. Training model based on Naive Bayesian algorithm

Naive Bayes classification is a classification method based on Bayes' theorem and on the assumption that the features are conditionally independent of each other given the class. This independence assumption is what keeps the calculation simple.

In this code, the naive Bayes algorithm is used to decrypt the encrypted strings. First, a random function generates a certain number of original strings, which are encrypted through the hash table to produce plaintext-ciphertext pairs. Then a decryption model is trained on these pairs with naive Bayes.

Model training process:

Traverse each pair of ciphertext and original text in the data set. Treat each character of the original text as a label and the ciphertext character at the same position as a feature, count how often each feature occurs with each label, and store the result in the feature counter.
For each label, count the number of occurrences in the training set and store it in the label counter.
The Bayesian formula gives the probability of each label for a feature. Because every feature is a single ciphertext character, this probability is proportional to the stored count, so the code compares the counts directly instead of storing probabilities.

After training, the model looks up each character of the input ciphertext as a feature and takes the label with the highest probability, that is, the highest count, as the decryption result.
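
To make the link between counts and probabilities concrete: with a single feature x and label y, Bayes' theorem gives P(y | x) = P(x | y) P(y) / P(x), and estimating each term from counts reduces this to count(x, y) / count(x). The sketch below is a simplified stand-in for the `NaiveBayes` class in section 1, not a copy of it; the class name `PosteriorSketch` and its method names are made up for this example.

```java
import java.util.HashMap;
import java.util.Map;

class PosteriorSketch {
    // pairCount.get(feature).get(label): how often ciphertext character `feature`
    // was observed together with plaintext character `label`
    private final Map<Character, Map<Character, Integer>> pairCount = new HashMap<>();

    void observe(char feature, char label) {
        pairCount.computeIfAbsent(feature, k -> new HashMap<>()).merge(label, 1, Integer::sum);
    }

    // P(label | feature), estimated as count(feature, label) / count(feature)
    double posterior(char feature, char label) {
        Map<Character, Integer> counts = pairCount.get(feature);
        if (counts == null) {
            return 0.0; // feature never seen in training
        }
        int total = 0;
        for (int c : counts.values()) {
            total += c;
        }
        return counts.getOrDefault(label, 0) / (double) total;
    }

    // Most probable label for a feature, or '*' when the feature is unknown
    char predict(char feature) {
        Map<Character, Integer> counts = pairCount.get(feature);
        if (counts == null) {
            return '*';
        }
        char best = '*';
        int bestCount = 0;
        for (Map.Entry<Character, Integer> e : counts.entrySet()) {
            if (e.getValue() > bestCount) {
                bestCount = e.getValue();
                best = e.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        PosteriorSketch model = new PosteriorSketch();
        model.observe('X', 'a');
        model.observe('X', 'a');
        model.observe('X', 'b');
        System.out.println(model.posterior('X', 'a')); // 2 of the 3 observations of 'X'
        System.out.println(model.predict('X'));
        System.out.println(model.predict('Q'));
    }
}
```

Since count(x) is the same for every candidate label of a given feature, picking the largest count and picking the largest posterior give the same answer.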

4. Naive Bayesian Algorithm Cracks Random Character Replacement Encryption Algorithm Based on Hash Table

Prepare the data set:
Randomly generate a certain number of strings, encrypt them by random character replacement through the hash table, and pair each resulting ciphertext with its original string to form the training data set.
Build a model:
Use the naive Bayes algorithm to build a training model that estimates, for each ciphertext character, how likely each plaintext character is to be its original.
Crack and calculate the accuracy:
Randomly generate a fresh, sufficiently large set of strings and encrypt them with the same hash-table-based random character replacement algorithm to obtain a test set. Use the trained model to crack the ciphertext of the test set, then compute the model's accuracy.

5. Security Analysis

The code cracks the cipher with the naive Bayes algorithm and plots the accuracy against the number of ciphertext characters, as shown below:

[Figure: accuracy versus number of ciphertext characters]

The curve shows that with about 60,000 intercepted ciphertext characters the model can crack 60-70% of the ciphertext, and with about 100,000 intercepted ciphertext characters it can crack about 80%. The security of the algorithm is therefore low.
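
These figures can be compared with a simple back-of-the-envelope estimate. The sketch below rests on assumptions that the article does not state outright: each of the 37 plaintext characters has 1771 substitutes of its own (37 × 1771 = 65,527 in total), plaintext characters and substitutes are chosen uniformly, no substitute is shared, and the model decrypts a character correctly exactly when its substitute already appeared in training. Under those assumptions, the expected accuracy after n known pairs is 1 - (1 - 1/65527)^n. The class and method names are made up for this example.

```java
class CoverageEstimate {
    // Expected fraction of fresh ciphertext characters already seen among `samples`
    // known pairs, when each of the plainChars * perChar substitutes is equally likely
    static double expectedCoverage(long samples, int plainChars, int perChar) {
        double distinct = (double) plainChars * perChar;
        return 1.0 - Math.pow(1.0 - 1.0 / distinct, samples);
    }

    public static void main(String[] args) {
        for (long n : new long[] {20_000, 60_000, 100_000, 300_000}) {
            System.out.printf("%,d pairs -> %.3f%n", n, expectedCoverage(n, 37, 1771));
        }
    }
}
```

Under these assumptions the formula gives roughly 0.60 at 60,000 pairs and roughly 0.78 at 100,000 pairs, which is in line with the curve described above.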

6. Feasible improvement methods

Here are two feasible ways to improve the encryption algorithm and strengthen its security:

Increase the number of characters in the character set: the original character set has 37 characters, and enlarging it makes the encryption more secure. However, this approach has to consider whether a given character is actually used before adding it to the character set. If a character never appears in the plaintext, there is no need to add it, so I think this is not a convenient way to strengthen the encryption.
Increase the number of characters allocated to each character: the original number of allocated characters is 1771, and allocating more characters to each character improves the security of the encryption. This increase has no upper limit: as long as the number of allocated characters is large enough, the security of the encryption method can keep improving.
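
A hedged way to quantify the second measure: assuming uniform choices and distinct substitutes for each of the 37 plaintext characters (assumptions of this example, not statements from the article), the share of substitutes already seen after n known pairs is about 1 - exp(-n / (37 × perChar)). Solving for n shows that the number of pairs an attacker needs grows only linearly with the number of substitutes per character. The class and method names below are made up for illustration.

```java
class RequiredPairs {
    // Number of known pairs needed so that the expected share of already-seen
    // substitutes reaches `target`, from 1 - exp(-n / (plainChars * perChar)) = target
    static long pairsNeeded(double target, int plainChars, int perChar) {
        double distinct = (double) plainChars * perChar;
        return (long) Math.ceil(-distinct * Math.log(1.0 - target));
    }

    public static void main(String[] args) {
        for (int perChar : new int[] {1771, 17_710, 177_100}) {
            System.out.printf("%,d substitutes per character -> %,d pairs for 99%%%n",
                    perChar, pairsNeeded(0.99, 37, perChar));
        }
    }
}
```

One practical limit is worth keeping in mind: a Java `char` has only 65,536 possible values and 37 × 1771 = 65,527, so going far beyond 1771 substitutes per character would require a wider symbol type than the `Character` used in the code above.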

7. Summary

This article studied the hash-table-based random character replacement encryption algorithm and its security, used the naive Bayes algorithm to crack the encryption algorithm, and analyzed and discussed the results. The results show that the encryption algorithm is not secure enough and is easy to crack.
