A Look at Code Obfuscation in Android Development

With the rapid development of the mobile Internet, application security problems keep emerging, so more and more developers are moving core code from the Java layer to the native layer to counter mature Java reverse-engineering tools. If native code is left unprotected, however, it is still relatively easy for a reverse engineer to recover its logic and then crack the application or perform other operations. So is there a good way to improve the security of native code? There is. This article introduces an effective countermeasure against native-layer code analysis: code obfuscation.

So what is code obfuscation? The academic definition is as follows:

  Code obfuscation is the transformation of a program's code into a functionally equivalent form, where functionally equivalent means that the program behaves the same, or nearly the same, before and after the transformation. More precisely, suppose program P is transformed into P' by obfuscation. If P fails to terminate or terminates with an error, then P' likewise fails to terminate or terminates with an error; otherwise, P' must produce the same output as P. If these conditions do not hold, P' is not a valid obfuscation of P.

  At present, obfuscation techniques are generally classified according to Collberg's taxonomy, which distinguishes four types: layout obfuscation, data obfuscation, control obfuscation, and preventive obfuscation.

1. Layout Obfuscation

  Layout obfuscation deletes or obscures auxiliary text in the source code or intermediate code that is irrelevant to execution, making the code harder for attackers to read and understand. Comments and debugging information can be deleted outright, as can unused code and data structures such as unused methods and classes. This makes the semantics of the code harder to recover, and it also reduces the size of the software and improves the efficiency of loading and execution.

  The naming conventions and literal meanings of identifiers such as constant, variable, class, and method names help attackers understand the code, so layout obfuscation also obscures these identifiers. There are many ways to do this, such as hash function naming, identifier swapping, and overload induction. Hash function naming simply replaces each identifier string with the hash value of that string, so that the identifier no longer says anything about the code. Identifier swapping first collects all identifier strings in the code and then randomly reassigns them to different identifiers; this method is not easy for attackers to detect. Overload induction exploits features of the naming rules of high-level languages, for example that variables in different namespaces may share the same name, so that as many identifiers as possible use the same string. Layout obfuscation is the simplest form of obfuscation, and it does not change the program's logic or how it executes.
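  To make hash function naming concrete, here is a minimal sketch, written in Python for brevity. The helper names (hash_name, rename_identifiers), the sample function, and the choice of SHA-256 truncated to eight hex digits are all assumptions made for illustration. A real obfuscator would rename symbols through the compiler or a parser rather than by text substitution, which would also touch strings and comments.

```python
import hashlib
import re


def hash_name(identifier):
    """Map an identifier to an opaque name derived from its SHA-256 digest."""
    digest = hashlib.sha256(identifier.encode("utf-8")).hexdigest()
    # Prefix with an underscore so the result is always a valid identifier.
    return "_" + digest[:8]


def rename_identifiers(source, names):
    """Replace every whole-word occurrence of the listed names with its hashed form."""
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, names)) + r")\b")
    return pattern.sub(lambda match: hash_name(match.group(1)), source)


# Hypothetical code to protect: the names reveal exactly what it does.
ORIGINAL = """
def check_license(license_key):
    expected_key = 'ABC-123'
    return license_key == expected_key
"""

# Same logic, but the identifiers no longer carry any meaning.
OBFUSCATED = rename_identifiers(
    ORIGINAL, ["check_license", "license_key", "expected_key"]
)
```

  Executing OBFUSCATED defines a function that behaves exactly like check_license, but under a name that tells the reader nothing.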

2. Data Obfuscation

  Data obfuscation transforms the data in a program rather than the logic of its code segment. Common methods include merging variables, splitting variables, array reorganization, and string encryption.

  Merging variables combines several variables into a single datum in which each original variable occupies one region, similar to one large data structure. Splitting a variable divides it into two variables, defines a mapping between the original value and the split form, and converts each operation on the original variable into operations on the two new variables.
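  Both transformations can be sketched briefly, here in Python for brevity. The split mapping x = q * MODULUS + r, the constant 7, and the 16-bit packing layout are assumptions chosen for illustration; an obfuscator is free to use any reversible mapping.

```python
MODULUS = 7  # hypothetical split constant, chosen only for illustration


def split(x):
    """Split x into (q, r) such that x == q * MODULUS + r."""
    return divmod(x, MODULUS)


def join(q, r):
    """Recover the original value from its two parts."""
    return q * MODULUS + r


def split_add(parts, k):
    """Perform 'x += k' directly on the split representation."""
    q, r = parts
    carry, r = divmod(r + k, MODULUS)
    return (q + carry, r)


def merge(a, b):
    """Merge two 16-bit values into one 32-bit value, one region each."""
    return ((a & 0xFFFF) << 16) | (b & 0xFFFF)


def unmerge(packed):
    """Extract the two 16-bit values from the merged value."""
    return (packed >> 16) & 0xFFFF, packed & 0xFFFF
```

  For example, join(*split_add(split(40), 5)) yields 45 even though the value 40 never appears as a single variable while the addition is carried out.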

  Arrays can be reorganized in several ways: splitting, merging, folding, and flattening. Splitting divides an array into two or more arrays of the same dimension; merging is the reverse. Folding increases the number of dimensions of an array; flattening is the reverse.
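  A minimal sketch of splitting and folding, in Python for brevity, might look like the following. The even/odd split rule and the helper names are assumptions; the point is only that every access to index i of the original array is rewritten into an access on the reorganized form.

```python
def split_array(arr):
    """Split one array into two: elements at even and at odd indices."""
    return arr[0::2], arr[1::2]


def split_get(evens, odds, i):
    """Read original element i from the split representation."""
    return evens[i // 2] if i % 2 == 0 else odds[i // 2]


def fold(arr, cols):
    """Fold a one-dimensional array into rows of `cols` elements."""
    return [arr[start:start + cols] for start in range(0, len(arr), cols)]


def folded_get(grid, cols, i):
    """Read original element i from the folded (two-dimensional) array."""
    return grid[i // cols][i % cols]


def flatten(grid):
    """Flattening is the reverse of folding: back to one dimension."""
    return [value for row in grid for value in row]
```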

  In ELF files, global variables and constant strings are stored in data sections (for example .data and .rodata), and disassembly tools can easily find the references between strings and code. When cracking software, an attacker can often locate the key statements of the code by following prompt strings. String encryption stores these revealing strings in encrypted form and decrypts them only when they are needed.
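  The idea can be sketched as follows, in Python for brevity. The single-byte XOR key, the prompt text, and the function names are assumptions made for illustration; XOR with a fixed key is trivially reversible and stands in here for whatever cipher a real protection scheme uses.

```python
KEY = 0x5A  # hypothetical single-byte key, for illustration only


def encrypt(plain):
    """Build-time step: turn a readable string into opaque bytes."""
    return bytes(b ^ KEY for b in plain.encode("utf-8"))


def decrypt(blob):
    """Run-time step: recover the string just before it is used."""
    return bytes(b ^ KEY for b in blob).decode("utf-8")


# In a real build only these encrypted bytes would be stored in the binary,
# so a string search for the prompt finds nothing.
ENCRYPTED_PROMPT = encrypt("Invalid license key")


def show_error():
    """Decrypt the prompt only at the point of use."""
    return decrypt(ENCRYPTED_PROMPT)
```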

3. Control Obfuscation

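  Control obfuscation, also called flow obfuscation, changes the execution flow of a program in order to break the reverse engineer's line of analysis and thereby protect the software. Commonly used techniques include inserted instructions, disguised conditional statements, and breakpoints. A disguised conditional statement works as follows: where the program originally executes sequentially from A to B, the obfuscator inserts a conditional check between A and B, so that A yields TRUE or FALSE when it finishes, but B is executed whichever value it yields.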
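  A disguised conditional statement can be sketched like this, in Python for brevity. The functions step_a and step_b and the way A derives its TRUE or FALSE result are hypothetical stand-ins for two consecutive pieces of real program logic.

```python
def step_a(log):
    """Step A: does its work, then yields an arbitrary TRUE/FALSE value."""
    log.append("A")
    return len(log) % 2 == 0


def step_b(log):
    """Step B: must always run after A."""
    log.append("B")


def run_original():
    """Original flow: A followed directly by B."""
    log = []
    step_a(log)
    step_b(log)
    return log


def run_disguised():
    """Obfuscated flow: a conditional sits between A and B, yet both
    branches lead to B, so the observable behavior is unchanged."""
    log = []
    if step_a(log):
        step_b(log)
    else:
        step_b(log)
    return log
```

  To an analyst the branch looks like a real decision, but both paths perform the same work.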

  Other widely used control obfuscation methods include opaque predicates, inlining and outlining, and order breaking.

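  Opaque predicates exploit information asymmetry: when an opaque predicate is inserted, its value is known to the obfuscator but is very hard for a deobfuscator to deduce, so it interferes with the analysis of that value during disassembly. Opaque predicates are typically used to insert dead or irrelevant code (bogus code), or are placed in loops or branch statements to disrupt the program's execution flow.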
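  As a minimal sketch in Python, the predicate below relies on the fact that the product of two consecutive integers is always even, so it is always true even though that is not obvious at a glance. The function names and the bogus branch are assumptions made for illustration; production obfuscators use predicates that are far harder to resolve statically.

```python
def opaquely_true(x):
    """Always True: x * (x + 1) is the product of two consecutive
    integers, one of which must be even."""
    return (x * (x + 1)) % 2 == 0


def compute_discount(price, seed):
    """Apply a 10% discount, hidden behind an opaque predicate."""
    if opaquely_true(seed):
        # Real logic: this branch is always taken.
        return price * 90 // 100
    # Bogus code: never executed, present only to mislead analysis.
    return price * 17 + seed
```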

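  Inlining (in-line) embeds the body of a small routine at every point where it is called. Outlining (out-line) extracts a sequence of statements that have no logical connection to one another into a separate routine that can be called multiple times.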
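  The two transformations can be combined, as in the following Python sketch. The checksum helper and the way the statements are regrouped are hypothetical; the point is that the obfuscated version no longer has a function boundary that matches the programmer's original abstraction.

```python
def checksum(data):
    """Original small helper: sum of all bytes modulo 256."""
    return sum(data) % 256


def process_original(data):
    """Original code: one meaningful call, then unrelated header scaling."""
    total = checksum(data)
    scaled = data[0] * 3
    return total, scaled


def _fragment(data, total):
    """Outlined routine: the last step of the checksum and the unrelated
    header scaling are grouped although they share no logic."""
    return total % 256, data[0] * 3


def process_obfuscated(data):
    """Same behavior with the helper inlined and a fragment outlined."""
    total = 0
    for byte in data:  # inlined body of checksum()
        total += byte
    return _fragment(data, total)
```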

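Order breaking means destroying the locality of a program. Programmers tend to place related code together, so rearranging the spatial structure of the program forces the attacker to jump back and forth while following its logic.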

4. Preventive Obfuscation

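  Preventive obfuscation is generally designed against specific decompilers, with the aim of preventing the code from being decompiled by them. It is built specifically around the weaknesses of a particular decompiler or deobfuscator. Preventive obfuscation is very effective against the decompiler it targets, so in practice the design should take the characteristics of the various decompilers into account.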

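  Security service vendors on the market, such as Tencent Yu Anquan (腾讯御安全), offer protection schemes that cover all four categories of obfuscation described above. For layout obfuscation, the scheme obfuscates function names in native code and removes debugging information. For data obfuscation, it encrypts constant strings and obfuscates global variables. For control obfuscation, it provides control-flow flattening, insertion of bogus branches, and equivalent code transformations. For preventive obfuscation, it adds code during obfuscation that targets mainstream decompilers and effectively resists their analysis. In addition, application developers can choose among different protection levels and obfuscation methods, so users can customize the obfuscation to their own needs.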
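  Control-flow flattening, one of the techniques named above, can be illustrated generically with the Python sketch below. This is not the vendor's implementation, and the state numbering and the choice of Euclid's algorithm as the function to protect are assumptions made for illustration. The structured loop is replaced by a dispatcher that selects the next block through a state variable, which hides the original loop shape.

```python
def gcd_original(a, b):
    """Original structured code: Euclid's algorithm."""
    while b != 0:
        a, b = b, a % b
    return a


def gcd_flattened(a, b):
    """Flattened version: every basic block becomes a case in one
    dispatcher loop, and `state` decides which block runs next."""
    state = 0
    while True:
        if state == 0:    # block 0: loop condition
            state = 1 if b != 0 else 2
        elif state == 1:  # block 1: loop body
            a, b = b, a % b
            state = 0
        elif state == 2:  # block 2: exit
            return a
```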

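Besides code obfuscation, the protection scheme also provides code virtualization as well as anti-reverse-engineering, anti-debugging, and other application hardening measures. Combining several code protection techniques can effectively improve the security of application code.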

Tencent Yu Anquan (腾讯御安全) Technical Team
