A Brief Look at Code Obfuscation in Android Development

With the rapid development of the mobile Internet, application security problems keep emerging, and more and more developers are moving their core code from the Java layer to the native layer to counter mature Java reverse-engineering tools. If native code is left completely unprotected, however, it is still relatively easy for a reverse engineer to recover its logic and then crack the application or tamper with it in other ways. Is there a good way to improve the security of native code? There is. This article introduces one effective defense against analysis of native-layer code: code obfuscation.

So, what is code obfuscation? The academic definition of code obfuscation is as follows:

  Code obfuscation refers to transforming the code of a computer program into a functionally equivalent form that is harder to read and understand. Functional equivalence means that the program behaves the same, or nearly the same, before and after the transformation. More precisely: let program P be transformed into P' by obfuscation. If P fails to terminate or terminates with an error, then P' also fails to terminate or terminates with an error; otherwise, P' must terminate and produce the same output as P. If these conditions do not hold, P' is not a valid obfuscation of P.

  At present, obfuscation techniques are generally classified according to Collberg's taxonomy, which divides them into four types: layout obfuscation, data obfuscation, control obfuscation, and preventive obfuscation.

1. Layout Obfuscation

  Layout obfuscation deletes or obfuscates auxiliary textual information in source code or intermediate code that is irrelevant to execution, making the code harder for attackers to read and understand. Comments and debugging information can be deleted outright, as can unused methods, classes, and other code or data structures. This makes the semantics of the code harder to recover, and it also reduces the size of the software and improves its loading and execution efficiency.

  The naming conventions and literal meanings of identifiers such as constant names, variable names, class names, and method names help attackers understand the code, so layout obfuscation also obfuscates these identifiers. Common methods include hash naming, identifier exchange, and overload induction. Hash naming simply replaces each identifier string with the hash value of that string, so that the name no longer says anything about the code. Identifier exchange first collects all identifier strings in the code and then randomly reassigns them to different identifiers; this method is hard for attackers to detect. Overload induction exploits the naming rules of high-level languages, for example that variables in different namespaces may share the same name, so that as many identifiers as possible use the same string, which makes the source code harder to follow.

  Layout obfuscation is the simplest obfuscation method and does not change the logic or execution flow of the software.
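  The hash-based naming idea described above can be sketched as follows. This is only an illustration written in Python for brevity; real tools rename symbols inside the compiler or on the binary. The function names, the `f_` prefix, and the choice of SHA-256 are assumptions made for this example, and the plain textual replacement would also touch strings and comments, which a real tool must avoid.

```python
import hashlib
import re


def hash_name(identifier: str, length: int = 8) -> str:
    """Derive a meaningless replacement name from the identifier's SHA-256 digest."""
    digest = hashlib.sha256(identifier.encode("utf-8")).hexdigest()
    # The "f_" prefix (an arbitrary choice) keeps the result a valid identifier.
    return "f_" + digest[:length]


def build_rename_table(identifiers):
    """Map every original identifier to its hashed name."""
    table = {name: hash_name(name) for name in identifiers}
    # Truncated digests could collide; refuse to continue rather than merge names.
    if len(set(table.values())) != len(table):
        raise ValueError("hash collision between identifiers")
    return table


def apply_renames(source: str, table: dict) -> str:
    """Replace whole-word occurrences of each identifier (naive textual sketch)."""
    if not table:
        return source
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, table)) + r")\b")
    return pattern.sub(lambda m: table[m.group(1)], source)


if __name__ == "__main__":
    table = build_rename_table(["checkLicense", "decryptKey"])
    print(apply_renames("int checkLicense(char *k) { return decryptKey(k); }", table))
```

  Because the mapping is deterministic, the same identifier receives the same replacement in every file, which keeps cross-file references consistent without a shared lookup table.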

2. Data Obfuscation

  Data obfuscation transforms the data used by a program rather than its code segments. Common data obfuscation methods include merging variables, splitting variables, array reorganization, and string encryption.

  Merging variables combines several variables into a single piece of data in which each original variable occupies its own region, much like one large data structure. Splitting a variable divides it into two variables, defines a mapping between the original value and the split parts, and converts every operation on the original variable into operations on the two new ones.
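  A minimal sketch of both ideas, written in Python for brevity and under assumed details: two 16-bit values are merged into one integer, and an integer is split into two shares whose sum is the real value. The names `pack`, `get_x`, `get_y`, and `SplitInt`, the 16-bit width, and the additive mapping are all hypothetical choices for illustration.

```python
import random


def pack(x: int, y: int) -> int:
    """Merge two 16-bit values into one integer: x in the high half, y in the low half."""
    return ((x & 0xFFFF) << 16) | (y & 0xFFFF)


def get_x(merged: int) -> int:
    """Recover the first variable from the merged value."""
    return (merged >> 16) & 0xFFFF


def get_y(merged: int) -> int:
    """Recover the second variable from the merged value."""
    return merged & 0xFFFF


class SplitInt:
    """An integer stored as two shares a and b, with value = a + b (the mapping)."""

    def __init__(self, value: int, rng=None):
        self._rng = rng if rng is not None else random.Random()
        self._a = self._rng.randint(-1000, 1000)
        self._b = value - self._a

    def add(self, n: int) -> None:
        # The single operation "v += n" becomes two operations on the shares.
        part = self._rng.randint(-1000, 1000)
        self._a += part
        self._b += n - part

    def value(self) -> int:
        return self._a + self._b


if __name__ == "__main__":
    merged = pack(7, 300)
    print(get_x(merged), get_y(merged))
    counter = SplitInt(42)
    counter.add(8)
    print(counter.value())
```

  Neither share of a `SplitInt` equals the real value on its own, so an analyst watching a single memory location does not see the variable directly.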

  Array reorganization comes in several forms: splitting, merging, folding, and flattening. Splitting divides an array into two or more arrays of the same dimension, and merging is the reverse; folding increases the dimension of an array, and flattening is the reverse.

  In ELF files, global variables and constant strings are stored in data sections (for example `.data` and `.rodata`), and disassembly tools can easily find the cross-references between strings and code. When cracking software, an attacker can often locate the key statements simply by following prompt strings. String encryption stores these revealing strings in encrypted form and decrypts them only when they are needed.
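  The index mappings behind two of these transformations can be sketched as follows, in Python for brevity. Folding turns a one-dimensional array into a two-dimensional one, and splitting distributes elements over two arrays by index parity. The function names and the even/odd split rule are assumptions chosen for illustration.

```python
def fold(arr, cols):
    """Fold a 1-D list into rows of `cols` elements (raises the dimension by one)."""
    if cols <= 0 or len(arr) % cols != 0:
        raise ValueError("length must be a positive multiple of cols")
    return [arr[r * cols:(r + 1) * cols] for r in range(len(arr) // cols)]


def folded_get(folded, i):
    """Read original element i through the folded layout."""
    cols = len(folded[0])
    return folded[i // cols][i % cols]


def split(arr):
    """Split one list into two lists of the same dimension by index parity."""
    return arr[0::2], arr[1::2]


def split_get(evens, odds, i):
    """Read original element i through the split layout."""
    return (odds if i % 2 else evens)[i // 2]


if __name__ == "__main__":
    data = [10, 20, 30, 40, 50, 60]
    folded = fold(data, 3)
    evens, odds = split(data)
    print(folded, folded_get(folded, 4))
    print(evens, odds, split_get(evens, odds, 4))
```

  In both cases every access `data[i]` in the original program is rewritten to go through the mapping function, so the original array never appears in memory as one contiguous block.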
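  String encryption can be illustrated with a repeating-key XOR, sketched here in Python for brevity. The key, the message, and the function names are hypothetical, and XOR is used only because it is short; a real protector would encrypt at build time and typically use a stronger scheme. Note that the key sits next to the ciphertext, so this raises the cost of analysis rather than providing cryptographic secrecy.

```python
def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR every byte with a repeating key; applying it twice restores the input."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


# Hypothetical key, chosen only for this example.
KEY = b"\x5a\xc3\x17"

# In a real tool this encryption happens at build time, so only the
# ciphertext is stored in the shipped binary.
ENC_MSG = xor_bytes(b"License check failed", KEY)


def get_message() -> str:
    """Decrypt the string only at the moment it is needed."""
    return xor_bytes(ENC_MSG, KEY).decode("utf-8")


if __name__ == "__main__":
    print(ENC_MSG)        # unreadable bytes, no searchable prompt text
    print(get_message())  # the original prompt, recovered at run time
```

  A string search over the stored data no longer finds the prompt text, so the attacker has to locate and understand the decryption routine first.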

3. Control Obfuscation

  Control obfuscation, also known as control-flow obfuscation, changes the execution flow of the program in order to disrupt the reverse engineer's line of reasoning and thereby protect the software. Commonly used techniques include inserting instructions, disguised conditional statements, and breakpoints. As an example of a disguised conditional statement, suppose the program executes A and then B in sequence. After obfuscation, a conditional test is added between A and B. The test evaluates to TRUE or FALSE after A has run, but B is executed whichever way it comes out.

  Many methods are used for control obfuscation, such as opaque predicates, inlining and outlining, and reordering.

  Opaque predicates rely on information asymmetry. The value of an opaque predicate is known to the obfuscator when it is inserted, but is difficult for a deobfuscator to infer, so adding it interferes with the disassembler's analysis of that value. Opaque predicates are typically used to guard dead or irrelevant (bogus) code, or are inserted into loops and branch statements, breaking up the apparent flow of program execution.

  Inlining embeds a small routine directly at every point where it is called. Outlining does the opposite: it extracts a stretch of code that forms no logical unit of its own into a separate routine that can be called multiple times.

  Reordering breaks the locality of the program. Programmers tend to place related code together, so changing the spatial structure of the program forces the cracker to jump back and forth while following the logic.
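  A predicate of this kind (usually called an opaque predicate) and the disguised conditional between two steps A and B can be sketched as follows, in Python for brevity. The predicate relies on the fact that the product of two consecutive integers is always even; `step_a`, `step_b`, and the bogus branch are hypothetical placeholders.

```python
def opaquely_true(x: int) -> bool:
    """Always True: x * (x + 1) is a product of consecutive integers, hence even."""
    return (x * (x + 1)) % 2 == 0


def step_a(v: int) -> int:
    return v + 1


def step_b(v: int) -> int:
    return v * 2


def original(v: int) -> int:
    """The unobfuscated program: A followed directly by B."""
    return step_b(step_a(v))


def obfuscated(v: int, x: int) -> int:
    """Same result as original(), but with a disguised conditional between A and B."""
    v = step_a(v)
    if opaquely_true(x):
        return step_b(v)
    # Bogus branch: never executed, present only to mislead static analysis.
    return step_b(v - 7) ^ 0x55


if __name__ == "__main__":
    print(original(20), obfuscated(20, x=12345))
```

  The obfuscator knows the branch outcome in advance, while a static analyzer that treats `x` as unknown must consider both paths. Practical tools use predicates that are much harder to simplify than this arithmetic identity.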

4. Preventive Obfuscation

  Preventive obfuscation is generally aimed at particular decompilers, and its purpose is to prevent decompilation by those tools. It is designed to exploit the weaknesses of a specific decompiler or deobfuscator. Because it is very effective only against the specific tool it targets, its design needs to take the characteristics of the various decompilers into account.

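  Security service vendors on the market, such as 腾讯御安全 (Tencent Yu Anquan), offer protection schemes that cover all four of the obfuscation categories described above. For layout obfuscation, they obfuscate function names in the native code layer and strip debugging information. For data obfuscation, they encrypt constant strings and obfuscate global variables. For control obfuscation, they act on the code flow with control-flow flattening, insertion of bogus branches, and equivalent code transformations. For preventive obfuscation, code that targets mainstream decompilers is added during the obfuscation process and can effectively resist their analysis. In addition, application developers can choose among different protection levels and obfuscation methods, so users can customize the obfuscation protection to their own needs.

  Beyond code obfuscation, the security protection scheme also provides code virtualization, anti-reverse-engineering, anti-debugging, and other application hardening solutions. Using several code protection schemes together can effectively improve the security of application code.

腾讯御安全 (Tencent Yu Anquan) Technical Team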

 
