Some methods of preventing reverse engineering of Python source code

1. Introduction

There is no way to ensure absolute security of your code. Many methods make reverse engineering harder, but none of them can completely stop a determined attacker. Best practice is to combine several methods so that each one covers the weaknesses of the others. Some commonly used methods are introduced below.

2. Summary of common methods for preventing reverse engineering of Python source code

2.0 Suppose we have a precious piece of code as follows

Script name: edlen.py

import numpy as np

def edlen_equation(temperature, pressure, humidity, wavelength):
    temp_c = temperature + 273.15  # Convert Celsius to Kelvin (despite its name, temp_c holds kelvins)
    p = pressure / 100  # Convert pascals to hectopascals
    e = (humidity / 100) * 6.112 * np.exp((17.62 * temperature) / (243.12 + temperature))

    n_air = 1 + (6432.8 + 2949810 / (146 - 1 / wavelength**2) + 25540 / (41 - 1 / wavelength**2)) * 1e-8 * p / temp_c
    n_water_vapor = 1 + 1.022 * (13.14 - 3.52 * 1e3 * wavelength**2) * 1e-6 * e / temp_c
    n = n_air * n_water_vapor

    return n

if __name__ == "__main__":
    # Test
    temperature = 20  # Air temperature, in degrees Celsius
    pressure = 101325  # Air pressure, in pascals
    humidity = 50  # Relative humidity, in percent
    wavelength = 532e-9  # ATLAS laser wavelength, in meters (e.g. 532 nanometers)

    n = edlen_equation(temperature, pressure, humidity, wavelength)
    print("Refractive index of air: ", n)

2.1 Compile Python source code (.py file) to bytecode (.pyc file)

(1) Generate .pyc
Compiling Python code to bytecode does not make it safe, but it removes the plain-text source and so raises the bar slightly. The standard-library compileall module does this. In a terminal, run the following command from the directory that contains edlen.py:

python -m compileall edlen.py

The .pyc file is written to a __pycache__ folder next to the source file. Its name combines the module name, the interpreter tag, and the .pyc extension, for example edlen.cpython-36.pyc under CPython 3.6. PyCharm hides this folder in its project view, but it is visible in an ordinary file browser.
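
The same compilation can be driven from Python instead of the command line. The following is a minimal sketch using the standard-library py_compile and importlib.util modules; the helper name compile_to_cache and the stand-in source text are made up for illustration.

```python
import importlib.util
import py_compile
import tempfile
from pathlib import Path


def compile_to_cache(source_path):
    """Compile one .py file and return the path of the .pyc that was written."""
    # cache_from_source() reports where CPython caches the bytecode for a
    # given source file (normally a __pycache__ folder beside it).
    expected = importlib.util.cache_from_source(str(source_path))
    # With no cfile argument, py_compile writes to that same cache location.
    written = py_compile.compile(str(source_path), doraise=True)
    return Path(written), Path(expected)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Stand-in for the real edlen.py, so the sketch runs anywhere.
        src = Path(tmp) / "edlen.py"
        src.write_text("def edlen_equation(*args):\n    return 1.0\n")
        written, expected = compile_to_cache(src)
        print(written.parent.name)  # folder that holds the bytecode
        print(written.name)         # module name + interpreter tag + .pyc
```

The interpreter tag in the file name depends on the Python version that runs the script, so the printed name will differ from edlen.cpython-36.pyc on newer interpreters.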

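(2) Use .pyc:
When edlen.py is present, importing edlen makes Python load __pycache__/edlen.cpython-36.pyc as long as it is up to date with the source, and recompile it otherwise. A .pyc inside __pycache__ is ignored when the matching .py file is missing. To ship bytecode without the source, place the file next to the importing script under the name edlen.pyc, either by copying and renaming it or by compiling with python -m compileall -b edlen.py, which writes edlen.pyc beside the source. Such a file only loads on the same Python minor version that produced it (3.6 in this example).

We just compiled edlen.py into edlen.cpython-36.pyc in the __pycache__ folder. Now create a new Python script (say main.py) in the same directory and import the edlen module in it: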

from edlen import edlen_equation

# Test
temperature = 20  # Air temperature, in degrees Celsius
pressure = 101325  # Air pressure, in pascals
humidity = 50  # Relative humidity, in percent
wavelength = 532e-9  # ATLAS laser wavelength, in meters (e.g. 532 nanometers)

n = edlen_equation(temperature, pressure, humidity, wavelength)
print("Refractive index of air: ", n)
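
If the goal is to distribute the module without its source, the .pyc has to sit where the .py file would be, named simply after the module. The sketch below illustrates this with py_compile's cfile argument; the helper build_sourceless_module and the module name edlen_demo are hypothetical and exist only for this example.

```python
import importlib
import py_compile
import sys
import tempfile
from pathlib import Path


def build_sourceless_module(workdir, name, source):
    """Write <name>.pyc directly into workdir and delete the .py source."""
    src = Path(workdir) / f"{name}.py"
    src.write_text(source)
    # cfile= overrides the default __pycache__ location. "<name>.pyc" beside
    # the (soon absent) source is the layout CPython imports without source.
    pyc = Path(workdir) / f"{name}.pyc"
    py_compile.compile(str(src), cfile=str(pyc), doraise=True)
    src.unlink()
    return pyc


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        build_sourceless_module(tmp, "edlen_demo", "def answer():\n    return 42\n")
        sys.path.insert(0, tmp)
        try:
            importlib.invalidate_caches()
            module = importlib.import_module("edlen_demo")
            print(module.answer())
            print(module.__file__)  # path of the .pyc; no .py exists
        finally:
            sys.path.remove(tmp)
            sys.modules.pop("edlen_demo", None)
```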

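Note that .pyc files only save the compilation step at import time; they do not make the code itself run faster. They also do not effectively protect source code, because bytecode keeps function names, variable names, and constants, and it can be decompiled.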
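
To see how little a .pyc hides, the sketch below reads one back with the standard-library marshal module and lists what survives compilation. It assumes the CPython .pyc layout of a fixed-size header followed by a marshalled code object; the sample source and the helper names are made up for illustration and are much shorter than the real edlen.py.

```python
import marshal
import py_compile
import sys
import tempfile
import types
from pathlib import Path

# .pyc header: 16 bytes since CPython 3.7 (PEP 552), 12 bytes in 3.3 to 3.6.
HEADER_SIZE = 16 if sys.version_info >= (3, 7) else 12

# Hypothetical stand-in for a "precious" module.
SAMPLE_SOURCE = (
    "def edlen_equation(temperature, humidity):\n"
    "    temp_c = temperature + 273.15\n"
    "    return (humidity / 100) * 6.112 / temp_c\n"
)


def load_code_from_pyc(pyc_path):
    """Return the module-level code object stored in a .pyc file."""
    data = Path(pyc_path).read_bytes()
    return marshal.loads(data[HEADER_SIZE:])


def recover_functions(module_code):
    """Map each function name to its variable names and numeric constants."""
    recovered = {}
    for const in module_code.co_consts:
        if isinstance(const, types.CodeType):
            numbers = [c for c in const.co_consts if isinstance(c, (int, float))]
            recovered[const.co_name] = (const.co_varnames, numbers)
    return recovered


def demo():
    """Compile the sample to a .pyc, then recover details from the .pyc alone."""
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "edlen_sample.py"
        src.write_text(SAMPLE_SOURCE)
        pyc = py_compile.compile(
            str(src), cfile=str(Path(tmp) / "edlen_sample.pyc"), doraise=True
        )
        return recover_functions(load_code_from_pyc(pyc))


if __name__ == "__main__":
    for name, (variables, numbers) in demo().items():
        print(name, variables, numbers)
```

The function name, its argument and local variable names, and the numeric coefficients all come straight out of the bytecode file, which is why bytecode on its own should be treated as a convenience rather than a protection.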

2.2 Compiling Python code into a binary executable

Tools such as Nuitka and PyInstaller turn a Python program into a platform-specific executable. Nuitka translates the code to C and compiles it, whereas PyInstaller bundles the interpreter together with the program's bytecode, which can be extracted again. Either way, reverse engineering becomes harder but is still not completely prevented.

(1) Compile Python code into a .so file
Cython translates Python code into C code, which is then compiled into a shared library (a .so file on Linux, a .pyd file on Windows). First, install Cython by running the following command in your virtual environment:

pip install cython

Next, create a file called edlen.pyx and copy the contents of edlen.py into this file. A .pyx file is the file format Cython uses to compile Python code. Then, create a file called setup.py with the following content:

from setuptools import setup
from Cython.Build import cythonize

setup(
    ext_modules=cythonize("edlen.pyx", language_level=3),
)

Run the following command in the terminal to compile the edlen.pyx file:

python setup.py build_ext --inplace

This will generate a file named edlen.cpython-36m-x86_64-linux-gnu.so (named with Python version and OS platform). The path structure is as follows:

├── build
│   ├── lib.linux-x86_64-3.6
│   │   └── edlen.cpython-36m-x86_64-linux-gnu.so
│   └── temp.linux-x86_64-3.6
│       └── edlen.o
├── edlen.c
├── edlen.cpython-36m-x86_64-linux-gnu.so
├── edlen.py
├── edlen.pyx
├── main.py
├── __pycache__
└── setup.py

(2) Use this .so file

To use this .so file in another Python script, simply import it like a normal Python module:

from edlen import edlen_equation

# Test
temperature = 20  # Air temperature, in degrees Celsius
pressure = 101325  # Air pressure, in pascals
humidity = 50  # Relative humidity, in percent
wavelength = 532e-9  # ATLAS laser wavelength, in meters (e.g. 532 nanometers)

n = edlen_equation(temperature, pressure, humidity, wavelength)
print("Refractive index of air: ", n)
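
When edlen.py and the compiled .so sit in the same directory, CPython's default import machinery tries extension modules before source files, so the .so is the one that gets loaded. Before shipping, it is worth confirming which file an import will actually resolve to. The following sketch does that with importlib; the helper name module_kind is hypothetical.

```python
import importlib.machinery
import importlib.util


def module_kind(name):
    """Return (kind, origin) for an importable top-level module, or None."""
    spec = importlib.util.find_spec(name)  # locates the module without importing it
    if spec is None:
        return None
    origin = spec.origin or ""
    if origin.endswith(tuple(importlib.machinery.EXTENSION_SUFFIXES)):
        kind = "extension"   # compiled .so / .pyd
    elif origin.endswith(tuple(importlib.machinery.SOURCE_SUFFIXES)):
        kind = "source"      # plain .py
    elif origin.endswith(tuple(importlib.machinery.BYTECODE_SUFFIXES)):
        kind = "bytecode"    # sourceless .pyc
    else:
        kind = "other"       # built-in, frozen, or namespace package
    return kind, origin


if __name__ == "__main__":
    # Run this from the directory that holds the build output.
    print(module_kind("edlen"))
```

If this reports "source" for edlen, the compiled library is not the one being picked up, and the .py file is probably still the only candidate on the import path.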

2.3 Code obfuscation

Obfuscation rewrites Python source code so that it is harder to read and understand. Tools such as PyArmor do this. Please note that obfuscation only raises the cost of reverse engineering; it does not provide absolute security.

(1) Use PyArmor for code obfuscation
First, install PyArmor: pip install pyarmor
Then run the pyarmor command to obfuscate the code: pyarmor obfuscate edlen.py
(This is the PyArmor 7 command syntax; PyArmor 8 and later use pyarmor gen edlen.py instead.)
The obfuscated code is written to a new folder named dist. The edlen.py file in that folder is the obfuscated version, and the folder also contains the other files PyArmor needs at run time.

(2) Use the obfuscated code
To use the obfuscated edlen module, write a new Python script in the dist folder, such as main.py, and import the obfuscated edlen module in it:

from edlen import edlen_equation

# Test
temperature = 20  # Air temperature, in degrees Celsius
pressure = 101325  # Air pressure, in pascals
humidity = 50  # Relative humidity, in percent
wavelength = 532e-9  # ATLAS laser wavelength, in meters (e.g. 532 nanometers)

n = edlen_equation(temperature, pressure, humidity, wavelength)
print("Refractive index of air: ", n)

Note that the obfuscated code depends on the PyArmor runtime library. So when deploying obfuscated code, make sure to deploy the entire dist folder along with it. The method here only increases the difficulty of reverse engineering, and does not ensure that the code is completely safe. For enhanced security, a combination of methods can be used, such as compiling code to bytecode, binary executables, or C/C++ extension modules.

2.4 Implement core functions as C/C++ extension modules

Implement the key functions in your code, such as core algorithms, as C or C++ extension modules and compile them into shared library files (.so or .dll; CPython on Windows uses the .pyd extension for extension modules).

After compilation, a reverse engineer has to work with machine code and its disassembly, which is much more difficult than analyzing Python bytecode. Compiler optimizations obscure the original structure further.

2.5 License protection

Use a license management system to protect your software. That way, even if someone were able to reverse engineer your code, they'd need a valid license to use your software. This will help limit unauthorized use.
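
As a rough illustration of the idea, the sketch below issues and checks a signed license token using only the standard library. Everything in it is an assumption made for the example: the token format, the function names issue_license and verify_license, and the use of a shared HMAC secret. A real product would more likely use an asymmetric signature, because a secret embedded in shipped code can be extracted and used to mint new licenses.

```python
import base64
import hashlib
import hmac
from datetime import date


def issue_license(user, expires, secret):
    """Return a token of the form <payload>.<signature>, both base64 encoded."""
    payload = f"{user}|{expires.isoformat()}".encode("utf-8")
    signature = hmac.new(secret, payload, hashlib.sha256).digest()
    return (
        base64.urlsafe_b64encode(payload).decode("ascii")
        + "."
        + base64.urlsafe_b64encode(signature).decode("ascii")
    )


def verify_license(token, secret, today=None):
    """Return the licensed user name, or None if the token is invalid or expired."""
    today = today or date.today()
    try:
        payload_b64, signature_b64 = token.split(".")
        payload = base64.urlsafe_b64decode(payload_b64)
        signature = base64.urlsafe_b64decode(signature_b64)
        user, expires_text = payload.decode("utf-8").rsplit("|", 1)
        expires = date.fromisoformat(expires_text)
    except ValueError:
        return None  # malformed token
    expected = hmac.new(secret, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(signature, expected):
        return None  # payload was altered or signed with another secret
    if today > expires:
        return None  # license has expired
    return user


if __name__ == "__main__":
    secret = b"demo-secret-not-for-production"  # hypothetical key
    token = issue_license("alice", date(2030, 12, 31), secret)
    print(verify_license(token, secret, today=date(2030, 1, 1)))
    print(verify_license(token, secret, today=date(2031, 1, 1)))
```

A check like this only limits casual unauthorized use. Someone who has already reverse engineered the code can also remove the check, which is why licensing is best combined with the other methods above.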

3. Summary

3.1 Handwritten C/C++ extension module versus a .so compiled from a Python script: which is harder to reverse?

When you port Python code to a C/C++ extension module by hand, you rewrite it as lower-level C/C++ code, which is then compiled into a shared library (.so or .dll file). Because you control the C/C++ source and the compiler's optimization options, the resulting machine code can bear little resemblance to the original Python. The reverse engineer has to work from disassembly, which is much more difficult than analyzing Python bytecode.

When Cython compiles Python code into a .so file, the result is also machine code rather than Python bytecode: Cython translates the module to C (the edlen.c file in section 2.2) and a C compiler builds the shared library. The generated C, however, mirrors the Python code closely. It goes through the CPython C API for most operations and keeps function and attribute names as strings in the binary, so the original logic is usually easier to recover than from a handwritten extension. It is still considerably harder to analyze than a .pyc file.

In summary, manually implementing core functionality as C/C++ extension modules provides a higher degree of protection because it allows you to do more complex manipulation and optimization of your code. Note, however, that no code can ever be completely protected against reverse engineering, so it's best practice to employ multiple protections such as code obfuscation, encryption, and use of native extensions.

A particularly interesting reference

How to keep source code secret to prevent reverse engineering using python language?
