Introduction to Python control flow code obfuscation, making it more difficult for others to analyze your code logic and flow

 


Foreword


Author: Wang Ping


Let's talk about Python code obfuscation. Obfuscating Python code may seem pointless, but it is sometimes required when delivering outsourced projects.

The purpose of obfuscation is to make it harder for others to analyze the logic and flow of your code, so that the code looks messy and the logic chaotic, while the program still runs normally.

General obfuscation

The simplest way to obfuscate Python code is to obfuscate variable names, class names, strings and constants, for example by making names very long or nearly identical to one another.
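To make the renaming idea concrete, here is a minimal sketch using the standard ast module (Python 3.9+ for ast.unparse). It is not Intensio-Obfuscator's code; the names LocalRenamer and lookalike are hypothetical. It renames function parameters and local variables to long names built only from the letters O and l, and it assumes callers pass arguments positionally and that the code uses no global or nonlocal declarations or other scoping corner cases.

```python
import ast


def lookalike(n: int) -> str:
    # Hypothetical helper: the n-th name uses only "O" and "l", so the
    # generated names are long and nearly indistinguishable by eye.
    return "O" + format(n, "016b").translate(str.maketrans("01", "Ol"))


class LocalRenamer(ast.NodeTransformer):
    """Rename the parameters and local variables of every function."""

    def __init__(self):
        self.mapping = {}  # original name -> obfuscated name

    def _alias(self, name: str) -> str:
        if name not in self.mapping:
            self.mapping[name] = lookalike(len(self.mapping))
        return self.mapping[name]

    def visit_FunctionDef(self, node: ast.FunctionDef) -> ast.FunctionDef:
        args = node.args
        params = args.posonlyargs + args.args + args.kwonlyargs
        params += [a for a in (args.vararg, args.kwarg) if a is not None]
        # Names local to this function: its parameters plus every assigned name.
        local = {a.arg for a in params}
        local |= {n.id for n in ast.walk(node)
                  if isinstance(n, ast.Name) and isinstance(n.ctx, ast.Store)}
        # Rename each parameter and each use of a local name in the subtree.
        for sub in ast.walk(node):
            if isinstance(sub, ast.arg) and sub.arg in local:
                sub.arg = self._alias(sub.arg)
            elif isinstance(sub, ast.Name) and sub.id in local:
                sub.id = self._alias(sub.id)
        return node  # nested functions were already covered by ast.walk


source = """
def total_price(prices, tax):
    subtotal = 0
    for p in prices:
        subtotal += p
    return subtotal * (1 + tax)
"""

tree = LocalRenamer().visit(ast.parse(source))
print(ast.unparse(tree))
namespace = {}
exec(compile(tree, "<obfuscated>", "exec"), namespace)
print(namespace["total_price"]([10, 20], 0.5))
```

The function name itself is left alone because callers refer to it; only names that cannot be seen from outside the function are changed.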

There are many libraries for this, such as Intensio-Obfuscator, which offers a simple and a complex obfuscation mode. Let's first look at how its simple mode obfuscates Python code:

[Figure: original code (left) and the same code after Intensio-Obfuscator's simple mode (right)]

The left side is before obfuscation and the right side is after, but only the variable and method names have been obfuscated and lengthened.

This kind of simple obfuscation achieves little: strings and constants are still clear at a glance, static analysis still reveals the structure of the code, and its logic remains easy to follow.

A slightly more complex approach hides the key code and inserts meaningless code into it.

Let's look at Intensio-Obfuscator again, this time in its complex obfuscation mode:

 

[Figure: original code (left) and the output of Intensio-Obfuscator's complex mode (right)]

At first glance, the right side doesn't look like Python code at all. In fact, the string on the right is the Python code from the left, only written as Unicode escape codes. It can still run because Python has a built-in function, exec(), that executes a string as a program, like this:

>>> exec("print(1 + 1)")
2
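The exact encoding Intensio-Obfuscator produces is not shown in the text, so the following is only a sketch of the idea, assuming plain \u escapes. The helpers to_unicode_escapes and from_unicode_escapes are hypothetical names: the first turns every character of the source into an escape sequence, the second restores it with the standard unicode_escape codec, and exec() then runs the result.

```python
def to_unicode_escapes(source: str) -> str:
    # Replace every character with a \uXXXX (or \UXXXXXXXX) escape sequence.
    return "".join(
        f"\\U{ord(ch):08x}" if ord(ch) > 0xFFFF else f"\\u{ord(ch):04x}"
        for ch in source
    )


def from_unicode_escapes(escaped: str) -> str:
    # The escaped text is pure ASCII, so the unicode_escape codec restores it.
    return escaped.encode("ascii").decode("unicode_escape")


source = 'greeting = "héllo"\nprint(greeting)\n'
escaped = to_unicode_escapes(source)
print(escaped[:30], "...")          # unreadable at a glance
exec(from_unicode_escapes(escaped))  # but it still runs
```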

Let's decode this string back to UTF-8 text and look at its contents:

[Figure: the decoded string, with the inserted for and if statements marked in red]

As shown in the figure above, this mode does two things: it makes variable names longer, and it inserts interference code. Look at the parts marked in red: the original code had no for or if statements, but the obfuscated code does. At first this looks very hard to analyze statically. In reality, once the variables are renamed to short names, static analysis easily skips over the extra for and if statements.
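The following is a minimal sketch of inserting such interference code with the standard ast module. It is not the code Intensio-Obfuscator generates; JUNK and JunkInserter are hypothetical names. The inserted loop guards its body with an opaque predicate: for any integer i, i * i % 4 is 0 or 1, never 2, so the body never runs. It assumes the target functions do not use the name _junk_i and do not rely on their docstrings, since the junk is placed before them.

```python
import ast

# Hypothetical junk snippet. i * i % 4 is never 2 for an integer i,
# so the body of the `if` can never execute (an "opaque predicate").
JUNK = """
for _junk_i in range(3):
    if _junk_i * _junk_i % 4 == 2:
        _junk_i ^= 0x5A
"""


class JunkInserter(ast.NodeTransformer):
    """Prepend a freshly parsed copy of JUNK to the body of every function."""

    def visit_FunctionDef(self, node: ast.FunctionDef) -> ast.FunctionDef:
        self.generic_visit(node)  # handle nested functions first
        node.body = ast.parse(JUNK).body + node.body
        return node


source = """
def add(a, b):
    return a + b
"""

tree = ast.fix_missing_locations(JunkInserter().visit(ast.parse(source)))
print(ast.unparse(tree))
namespace = {}
exec(compile(tree, "<junk>", "exec"), namespace)
print(namespace["add"](2, 3))
```

As the paragraph above notes, junk of this kind is cheap for an analyst to recognize and skip, which is why it is usually combined with other techniques.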

To summarize the complex mode of Intensio-Obfuscator: first it makes variable and function names very long, then it inserts meaningless code, and finally it packs the source code into a string that is executed with exec.
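Intensio-Obfuscator's exact packing format is not reproduced here. As a hedged sketch of the last step only, this version compresses the source with zlib, encodes it with base64 (both standard library), and wraps it in a one-line exec loader; pack is a hypothetical helper name.

```python
import base64
import zlib


def pack(source: str) -> str:
    # Compress the source, base64-encode it, and wrap it in a one-line loader.
    blob = base64.b64encode(zlib.compress(source.encode("utf-8"))).decode("ascii")
    return (
        "import base64, zlib\n"
        f"exec(zlib.decompress(base64.b64decode({blob!r})).decode('utf-8'))\n"
    )


original = "def double(x):\n    return x * 2\n"
packed = pack(original)
print(packed)

namespace = {}
exec(packed, namespace)  # the loader defines double() inside namespace
print(namespace["double"](21))
```

Anyone can replace exec with print to recover the source, so this layer only hides the code from a casual look.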

Abstract syntax tree obfuscation

The obfuscation methods above are relatively simple and can be undone with static analysis. A more sophisticated technique is control flow obfuscation. Normally a program's execution flow is well organized; control flow obfuscation scrambles that flow.

For example, the code is filled with while, for and if statements and even lambda expressions, and assignments, additions and subtractions are turned into bitwise operations and so on. Static analysis then makes it hard to see the purpose and logic of the code.
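One common form of control flow obfuscation is control flow flattening. The article does not show its own technique, so the following is a hand-written sketch of flattening, not output from any tool mentioned here; original and flattened are hypothetical example functions. Each basic block of the original loop becomes one branch of a while/if dispatcher, and a state variable with arbitrary values decides which block runs next.

```python
def original(n: int) -> int:
    # Sum of the even numbers from 1 to n, written normally.
    total = 0
    i = 1
    while i <= n:
        if i % 2 == 0:
            total += i
        i += 1
    return total


def flattened(n: int) -> int:
    # Same logic, flattened: every basic block is one branch of a dispatcher,
    # and `state` picks the next block. The state values are arbitrary so
    # they do not reveal the original block order.
    total = i = 0
    state = 0x3C
    while True:
        if state == 0x3C:      # entry block
            total, i = 0, 1
            state = 0x11
        elif state == 0x11:    # loop test
            state = 0x7E if i <= n else 0x05
        elif state == 0x7E:    # loop body
            if i % 2 == 0:
                total += i
            state = 0x42
        elif state == 0x42:    # increment
            i += 1
            state = 0x11
        elif state == 0x05:    # exit block
            return total


print(original(10), flattened(10))
```

A real obfuscator would generate this from the AST and typically shuffle the branches and add junk states, but the dispatcher loop is the core idea.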

How is control flow obfuscation achieved? Through the abstract syntax tree (AST), which lets one program modify another. The AST tells you exactly what the program is doing, so you can modify the code very precisely.
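As a small illustration of such a precise AST rewrite, this sketch turns additions into bit operations, as described above. AddToBitwise is a hypothetical name. It relies on the identity a + b = (a ^ b) + 2(a & b), which holds for all Python ints, and on x - ~c - 1 = x + c, so no plus sign remains. It assumes both operands are side-effect-free ints: each operand is evaluated twice, and floats, strings or lists would break.

```python
import ast
import copy


class AddToBitwise(ast.NodeTransformer):
    """Rewrite every `a + b` as `(a ^ b) - ~((a & b) << 1) - 1`.

    ASSUMPTION: both operands are ints with no side effects.
    """

    def visit_BinOp(self, node: ast.BinOp) -> ast.AST:
        self.generic_visit(node)  # rewrite nested additions first
        if not isinstance(node.op, ast.Add):
            return node
        a, b = node.left, node.right
        xor = ast.BinOp(left=a, op=ast.BitXor(), right=b)
        carry = ast.BinOp(
            left=ast.BinOp(left=copy.deepcopy(a), op=ast.BitAnd(),
                           right=copy.deepcopy(b)),
            op=ast.LShift(),
            right=ast.Constant(1),
        )
        # x - ~c - 1 == x + c, because ~c == -c - 1 for Python ints.
        diff = ast.BinOp(left=xor, op=ast.Sub(),
                         right=ast.UnaryOp(op=ast.Invert(), operand=carry))
        result = ast.BinOp(left=diff, op=ast.Sub(), right=ast.Constant(1))
        return ast.copy_location(result, node)


source = "def f(a, b):\n    return a + b + 7\n"
tree = ast.fix_missing_locations(AddToBitwise().visit(ast.parse(source)))
print(ast.unparse(tree))
namespace = {}
exec(compile(tree, "<bitwise>", "exec"), namespace)
print(namespace["f"](3, 4), namespace["f"](-5, 2))
```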

Let's look at a simple example of obfuscating a program through the AST, again using the program above.

 

[Figure: original code (left) and the code after AST-based obfuscation (right)]

On the left is the code before obfuscation and on the right is the code after. Besides variable names, this example also obfuscates strings, constants and imports. Deobfuscating it is somewhat harder than the previous examples: you need dynamic debugging to find out what the program is doing.
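The article does not show how its constants and imports were obfuscated, so this is a minimal sketch of one possible approach; ConstImportObfuscator and the default key 0x5A5A are hypothetical. Each int constant n becomes (n ^ key) ^ key, and each plain "import x" becomes "x = __import__('x')". It skips bools and dotted imports, and it assumes the code has no match statements, whose patterns must stay literal constants.

```python
import ast


class ConstImportObfuscator(ast.NodeTransformer):
    """Hypothetical sketch: hide int constants and plain `import x` statements."""

    def __init__(self, key: int = 0x5A5A):
        self.key = key

    def visit_Constant(self, node: ast.Constant) -> ast.AST:
        # Only plain ints; bool is a subclass of int and must stay untouched.
        if type(node.value) is not int:
            return node
        # n becomes (n ^ key) ^ key, which evaluates back to n.
        masked = ast.BinOp(left=ast.Constant(node.value ^ self.key),
                           op=ast.BitXor(), right=ast.Constant(self.key))
        return ast.copy_location(masked, node)

    def visit_Import(self, node: ast.Import):
        # Dotted names (import os.path) bind differently; leave them alone.
        if any("." in alias.name for alias in node.names):
            return node
        assigns = []
        for alias in node.names:
            call = ast.Call(func=ast.Name(id="__import__", ctx=ast.Load()),
                            args=[ast.Constant(alias.name)], keywords=[])
            target = ast.Name(id=alias.asname or alias.name, ctx=ast.Store())
            assign = ast.Assign(targets=[target], value=call)
            assigns.append(ast.copy_location(assign, node))
        return assigns  # a list replaces one statement with several


source = """
import math
N = 10

def f(x):
    return math.floor(x) + N * 2
"""

tree = ast.fix_missing_locations(ConstImportObfuscator().visit(ast.parse(source)))
print(ast.unparse(tree))
namespace = {}
exec(compile(tree, "<consts>", "exec"), namespace)
print(namespace["f"](3.7))
```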

What is an abstract syntax tree

As the name suggests, an AST abstracts a program into a tree, with the statements in the code split into nodes of that tree. Python's built-in ast module does this. Using the same source code as above, let's see what it looks like after ast splits it into nodes:

 

[Figure: the source code to be parsed]

[Figure: the same code parsed into an abstract syntax tree and printed node by node]

The second picture shows the code from the first picture parsed into an abstract syntax tree and printed node by node.
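The screenshots are not reproduced here, so as a stand-in this uses a small made-up program (not the article's) with the standard ast module; indent= in ast.dump and ast.unparse need Python 3.9+. It parses source text into a tree, prints the tree's nodes, and turns the tree back into equivalent source.

```python
import ast

source = """
import math

scale = 2

def circle_area(r):
    return math.pi * r * r + scale
"""

tree = ast.parse(source)          # source text -> abstract syntax tree
print(ast.dump(tree, indent=2))   # nodes such as Import, Assign, FunctionDef, BinOp
print(ast.unparse(tree))          # tree -> equivalent source text again
```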

 

The red arrows mark an Import node, an Assign node, a function node, an addition node and so on. This tree fully represents the program above, so we can modify the program programmatically by visiting the tree.
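Visiting the tree can be sketched with ast.walk, which yields every node. The program below is a made-up example, not the one in the figures; it counts the node types mentioned above and then picks out the addition nodes specifically, which is the kind of lookup an obfuscator does before rewriting anything.

```python
import ast
from collections import Counter

source = """
import math

scale = 2

def circle_area(r):
    return math.pi * r * r + scale
"""

tree = ast.parse(source)
# ast.walk yields every node in the tree (in no particular guaranteed order).
counts = Counter(type(node).__name__ for node in ast.walk(tree))
for name in ("Import", "Assign", "FunctionDef", "BinOp"):
    print(name, counts[name])

# Additions specifically: BinOp nodes whose operator is ast.Add.
additions = [n for n in ast.walk(tree)
             if isinstance(n, ast.BinOp) and isinstance(n.op, ast.Add)]
print(ast.unparse(additions[0]))
```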

[Figure: a custom ast.NodeTransformer subclass]

Define a custom class that inherits from ast.NodeTransformer. For example, to visit strings, implement visit_Str (deprecated since Python 3.8, where string literals became ast.Constant nodes handled by visit_Constant); to visit imports, implement visit_Import, or visit_ImportFrom for "from ... import" statements. Inside these methods you can apply obfuscation algorithms (note that they may only obfuscate the code, never change its result). This allows fine-grained and more complex obfuscation.
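Here is a minimal sketch of such a transformer for strings, written with visit_Constant for current Python versions; StringObfuscator is a hypothetical name, and the hex encoding is just one simple choice. Every str literal becomes bytes.fromhex(...).decode('utf-8'), which evaluates to the same string. f-strings are skipped because their literal parts must remain plain constants; docstrings are also rewritten, so __doc__ is lost, which is an assumption acceptable only if nothing reads it.

```python
import ast


class StringObfuscator(ast.NodeTransformer):
    """Replace each str literal with bytes.fromhex(<hex>).decode('utf-8')."""

    def visit_JoinedStr(self, node: ast.JoinedStr) -> ast.JoinedStr:
        # f-string parts must stay plain Constants, so skip f-strings entirely.
        return node

    def visit_Constant(self, node: ast.Constant) -> ast.AST:
        if not isinstance(node.value, str):
            return node
        fromhex = ast.Attribute(value=ast.Name(id="bytes", ctx=ast.Load()),
                                attr="fromhex", ctx=ast.Load())
        raw = ast.Call(func=fromhex,
                       args=[ast.Constant(node.value.encode("utf-8").hex())],
                       keywords=[])
        decoded = ast.Call(func=ast.Attribute(value=raw, attr="decode", ctx=ast.Load()),
                           args=[ast.Constant("utf-8")], keywords=[])
        return ast.copy_location(decoded, node)


source = '''
def greet(name):
    return "Hello, " + name + f"! ({len(name)})"
'''

tree = ast.fix_missing_locations(StringObfuscator().visit(ast.parse(source)))
print(ast.unparse(tree))
namespace = {}
exec(compile(tree, "<strings>", "exec"), namespace)
print(namespace["greet"]("Ada"))
```

Nodes returned from a visit method are not visited again, so the new "utf-8" and hex constants are not themselves re-encoded.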

There is a third-party library, ASTObfuscate, that obfuscates code by manipulating the AST, but it does not obfuscate the program's logic flow. For more sophisticated control flow obfuscation, you have to implement the tree transformations yourself.

Of course, to make Python code truly hard to reverse, you would obfuscate the bytecode or compile the key code into a .so file, which is much harder to de-obfuscate. Bytecode consists of low-level virtual machine instructions and a .so file contains native machine code, so neither exposes readable source.
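As a hedged illustration of shipping bytecode instead of source, this sketch uses the standard compile, marshal and dis modules; the function double is a made-up example, and building .so files is not shown. Note that marshal output is tied to the Python version, and bytecode can still be disassembled (as dis shows) or decompiled, so this alone is not strong protection.

```python
import dis
import marshal

source = "def double(x):\n    return x * 2\n"

# Compile to a code object and serialize it, roughly what a .pyc file holds.
code = compile(source, "<hidden>", "exec")
payload = marshal.dumps(code)  # bytecode, names and constants, but no source text

# The receiving side loads and runs the bytecode without seeing the source.
namespace = {}
exec(marshal.loads(payload), namespace)
print(namespace["double"](21))

dis.dis(namespace["double"])  # the VM instructions that actually run
```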



Origin www.cnblogs.com/python0921/p/12694280.html