Network Security | Getting Started with Penetration Testing, from Beginner to Proficient: Static Analysis Techniques Explained

Table of contents

Foreword

1. File type analysis

2. Disassembly engine

2.1、OllyDbg's ODDisasm

2.2、BeaEngine

2.3、Udis86

2.4、Capstone

2.5、AsmJit

2.6、Keystone

2.7 Summary


Foreword

Programs written in high-level languages come in two forms. The first kind is compiled into machine code that the CPU executes directly, for example programs built with Visual C++. Machine code and assembly language correspond almost one-to-one, so machine code can be converted back into assembly language. This process is called disassembly, and the tool that performs it is a disassembler. For example, on x86 the machine-code byte "EB" corresponds to the assembly instruction "jmp short xx". The second kind of program is interpreted as it runs. The languages used to write such programs are called interpreted languages, for example Visual Basic 5.0/6.0 and Java. The compiled output of these languages can often be restored to something close to the original high-level structure. This process is called decompilation, and the tool that performs it is a decompiler.
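
To make the "EB" example concrete, here is a minimal sketch of decoding that single instruction form by hand. The function name `decode_short_jmp` and the addresses used are assumptions for illustration only; a real disassembly engine covers the whole instruction set. The byte after "EB" is a signed 8-bit displacement measured from the end of the two-byte instruction.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>

// Decodes exactly one instruction form: EB rel8 (x86 "jmp short").
// 'address' is the address of the EB byte itself.
// Returns the assembly text, or an empty string if the bytes are not a
// complete short jump.
std::string decode_short_jmp(const std::uint8_t* code, std::size_t size,
                             std::uint32_t address) {
  assert(code != nullptr);
  if (size < 2 || code[0] != 0xEB) return std::string();

  // rel8 is signed and relative to the end of the instruction (address + 2).
  const std::int8_t rel8 = static_cast<std::int8_t>(code[1]);
  const std::uint32_t target =
      address + 2u +
      static_cast<std::uint32_t>(static_cast<std::int32_t>(rel8));

  char text[32];
  std::snprintf(text, sizeof text, "jmp short 0x%08X",
                static_cast<unsigned>(target));
  return std::string(text);
}
```

With the bytes `EB 05` located at address 0x00401000, the function returns `jmp short 0x00401007`; with `EB FE` it returns `jmp short 0x00401000`, a jump to itself.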

Static analysis means obtaining a program's assembly code or source code through disassembly and decompilation, and then reading that listing to follow the program's control flow and understand what each module does.

 

1. File type analysis

The first step in reverse engineering a program is to determine its type: which language it was written in, which compiler built it, and whether it has been processed by a packer or protector, so that the following steps can be targeted. This analysis relies on file analysis tools. Common ones include PEiD and Exeinfo PE, which can detect most compilers, viruses, and packers. This section uses PEiD as an example to briefly explain how they are used.
PEiD is a widely used file detection and analysis tool with a graphical interface. It detects most compilers, viruses, and packers. As shown in the figure below, the analyzed file was compiled with Microsoft Visual C++ 5.0/6.0. For files it cannot identify, it may report "PE Win GUI" ("Win GUI" is the general term for Windows graphical user interface programs). Checking the "Register Shell Extensions" option in the "Options" menu adds a PEiD entry to the right-click context menu.

Reminder: the 52pojie (吾爱破解) edition of PEiD is recommended here and can be used with confidence. Please download it from the official 52pojie website.

File analysis tools such as PEiD identify files by searching for signatures. Each development language has fixed startup code, which can be used to identify the language or compiler a program was built with. A program processed by a packer retains traces of that packer, which can be used to identify what software packed it.

The file shown below is not packed.


PEiD provides an extension file, userdb.txt, in which users can define their own signatures so that new file types can be recognized. Signatures can be created with the Add Signature plug-in and, if necessary, corrected with the help of a debugger such as OllyDbg.
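
The signature matching described above can be sketched as follows. This is a minimal illustration, not PEiD's actual implementation: it assumes signatures are written as space-separated hex bytes with `??` as a wildcard and compared against the bytes at the entry point. The names `SigByte`, `parse_signature`, and `matches_at_entry` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// One signature element: either a fixed byte or a wildcard ("??").
struct SigByte {
  bool wildcard;
  std::uint8_t value;
};

// Parses a space-separated hex pattern such as "55 8B EC ?? 68".
// std::stoul throws if a token is not valid hexadecimal.
std::vector<SigByte> parse_signature(const std::string& text) {
  std::vector<SigByte> sig;
  std::istringstream in(text);
  std::string token;
  while (in >> token) {
    if (token == "??") {
      sig.push_back({true, 0});
    } else {
      sig.push_back(
          {false, static_cast<std::uint8_t>(std::stoul(token, nullptr, 16))});
    }
  }
  return sig;
}

// Returns true if 'code' (the bytes found at the entry point) starts with
// the signature. Wildcard positions match any byte.
bool matches_at_entry(const std::vector<std::uint8_t>& code,
                      const std::vector<SigByte>& sig) {
  assert(!sig.empty());  // an empty signature would match every file
  if (code.size() < sig.size()) return false;
  for (std::size_t i = 0; i < sig.size(); ++i) {
    if (!sig[i].wildcard && code[i] != sig[i].value) return false;
  }
  return true;
}
```

For example, the pattern `55 8B EC 6A FF 68 ?? ?? ?? ?? 68` (push ebp; mov ebp, esp; push -1; push imm32; push ...) matches any entry code with that shape regardless of the 32-bit immediate. The wildcards are needed because such addresses differ from one program to the next.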


To deceive file identification tools such as PEiD, some packers strip part of their packing information and forge the startup code. For example, rewriting the entry code to resemble what Visual C++ 6.0 generates is enough to fool them. The results of file identification tools are therefore only a reference. Whether a file has really been packed can only be determined by tracing and analyzing the program code.

2. Disassembly engine

Assembly engines and disassembly engines are widely used in security and protection software such as OllyDbg, IDA, VMProtect, packers, and decompilers. A disassembly engine parses machine code into assembly instructions. Writing one requires a deep understanding of Intel's x86 machine instruction encoding. It is rarely necessary to write your own, however, because many open source and commercial engines are available. The mainstream open source x86-64 assembly and disassembly engines each have advantages in different scenarios, and the following sections compare the commonly used ones. The disassembly engines covered are ODDisasm, BeaEngine, Udis86, and Capstone; the assembly engines are ODAssembler, Keystone, and AsmJit.

2.1、OllyDbg's ODDisasm

OllyDbg's built-in disassembly engine, ODDisasm, has the advantage of also providing an assembly interface (that is, it parses text strings and encodes them into binary). This feature was once unique. The more recent debugger x64_dbg offers text-parsing functionality similar to OllyDbg's, with a more complete instruction set, fewer bugs, and support for the x64 platform.

ODDisasm has many disadvantages, for example:

  • The supported instruction set is incomplete. OllyDbg is no longer updated. Its support for the MMX instruction set is insufficient, and Intel and AMD have since revised their extended instruction set standards several times, so it cannot parse instruction sets such as SSE5, AVX, AES, and XOP.
  • The decoded structure is not detailed enough. For example, instruction prefixes are handled poorly, as can be seen in OllyDbg's disassembly window (except with movs, cmps, and similar instructions, a repcc prefix is displayed separately from the instruction it is combined with).
  • The author released the source code once and has not maintained the open source version since, so bugs in the disassembler are hard to get fixed in time.
  • Assembly and disassembly of 64-bit instructions are not supported.
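
To illustrate what more detailed prefix information could look like, here is a minimal sketch that separates legacy x86 prefix bytes from the opcode that follows. It is unrelated to ODDisasm's actual code: `PrefixInfo`, `is_legacy_prefix`, and `split_prefixes` are hypothetical names, and REX/VEX prefixes and the 15-byte instruction length limit are deliberately ignored.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Structured view of the prefix bytes in front of an opcode.
struct PrefixInfo {
  std::vector<std::uint8_t> prefixes;  // legacy prefix bytes, in order
  bool has_rep = false;                // F3: rep / repe
  bool has_repne = false;              // F2: repne
  std::size_t opcode_offset = 0;       // index of the first non-prefix byte
};

bool is_legacy_prefix(std::uint8_t b) {
  switch (b) {
    case 0xF0: case 0xF2: case 0xF3:             // lock, repne, rep/repe
    case 0x2E: case 0x36: case 0x3E: case 0x26:  // cs, ss, ds, es overrides
    case 0x64: case 0x65:                        // fs, gs overrides
    case 0x66: case 0x67:                        // operand-size, address-size
      return true;
    default:
      return false;
  }
}

// Consumes leading prefix bytes and records them as structured fields
// instead of printing them as separate pseudo-instructions.
PrefixInfo split_prefixes(const std::uint8_t* code, std::size_t size) {
  assert(code != nullptr);
  PrefixInfo info;
  std::size_t i = 0;
  while (i < size && is_legacy_prefix(code[i])) {
    if (code[i] == 0xF3) info.has_rep = true;
    if (code[i] == 0xF2) info.has_repne = true;
    info.prefixes.push_back(code[i]);
    ++i;
  }
  info.opcode_offset = i;
  return info;
}
```

For the bytes `66 F3 A5` (a `rep movs` with an operand-size override), the sketch reports two prefixes, sets `has_rep`, and points `opcode_offset` at the `A5` opcode, so a caller can treat the prefix and the instruction as a single unit.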

These shortcomings are understandable, however: the author developed ODDisasm for text-based assembly and disassembly, so it exposes no structures or interfaces for the decoded information. Overall, the ODDisasm disassembly engine is far behind the times.

2.2、BeaEngine

BeaEngine has no obvious shortcomings. The extended instruction sets it can parse include FPU, MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, VMX, CLMUL, AES, and MPX. BeaEngine also classifies instructions, which makes it easy to tell different kinds of instructions apart. Another feature is that it decodes the registers each instruction uses and modifies, including the flags register, down to the individual flag bits. This is very useful when building optimizers and obfuscators.

In addition to x86 instructions, BeaEngine also supports disassembly of x64 instructions. Its coding style is somewhat messy, with casts scattered across variables and several naming styles mixed together. If that does not bother you, BeaEngine still performs well.

2.3、Udis86

Udis86 is a popular disassembly engine that supports x86 extensions including MMX, FPU (x87), AMD 3DNow!, SSE, SSE2, SSE3, SSSE3, SSE4, INTEL-VMX, and SMX. Udis86 supports disassembly of both x86 and x64 instructions. Its code style is lean: functions are short, and the variable naming and interfaces are clean, simple, and flexible. If you need to maintain your own fork, you can become familiar with the entire code structure in tens of minutes.

The advantage of Udis86 is its flexible interface. You can call ud_decode to decode a single instruction and then call ud_translate_intel to convert the decoded structure into text, or you can call ud_disassemble to do both in one step. Each of these calls takes only one line of code.

This composable design makes Udis86 suitable for many scenarios, such as building a disassembler like IDA, or building instruction emulators, analyzers, optimizers, and obfuscators. It lets Udis86 remain highly adaptable without giving up performance. Among engines with comparable decoding detail and capability, Udis86 decodes the fastest.

2.4、Capstone

Capstone can be regarded as the culmination of all the disassembly engines. Because Capstone is ported from part of the MC component of the LLVM framework, the CPU architectures supported by LLVM are also supported by Capstone. These include ARM, ARM64 (ARMv8), M68K, MIPS, PowerPC, SPARC, SystemZ, TMS320C64X, XCORE, and x86 (including x86-64). Moreover, Capstone's support for x86 instruction set extensions is the most comprehensive and is unmatched by other engines. The supported extensions include AVX512CD, AVX512ER, AVX512F, AVX512PF, BMI, BMI2, FMA, FMA4, FSGSBASE, LZCNT, MMX, SGX, SHA, SLM, SSE, SSE2, SSE3, SSE4.1, SSE4.2, SSE4A, SSSE3, TBM, and XOP.
With mobile development booming, few disassembly libraries support ARM. If you want to develop compilers for both x86 and ARM, it is better to use a unified interface. On the x86-64 platform, Capstone completely surpasses BeaEngine in both decoding capability and instruction set support.

2.5、AsmJit

AsmJit is a complete JIT assembler and compiler written in C++. It can generate native assembly instructions for the x86 and x64 architectures, and it supports x86 and x64 instruction sets including MMX, SSEx, BMIx, ADX, TBM, XOP, AVXx, FMAx, and AVX512.


AsmJit differs from the open source libraries introduced above. It does not disassemble or analyze binary instructions the way BeaEngine, Udis86, and Capstone do; it is purely an assembler. It also assembles in a completely different way from OllyDbg's assembler and XEDParse, which are both text-based assemblers. A simple example, written against an older AsmJit API, is as follows.

#include <asmjit/asmjit.h>
#include <cstdio>  // printf

using namespace asmjit;

int main(int argc, char* argv[]) {
  // Create JitRuntime and X86 Compiler.
  JitRuntime runtime;
  X86Compiler c(&runtime);

  // Build function having two arguments and a return value of type 'int'.
  // First type in function builder describes the return value. kFuncConvHost
  // tells compiler to use a host calling convention.
  c.addFunc(kFuncConvHost, FuncBuilder2<int, int, int>());

  // Create 32-bit variables (virtual registers) and assign some names to
  // them. Using names is purely optional and only greatly helps while
  // debugging.
  X86GpVar a(c, kVarTypeInt32, "a");
  X86GpVar b(c, kVarTypeInt32, "b");

  // Tell asmjit to use these variables as function arguments.
  c.setArg(0, a);
  c.setArg(1, b);

  // a = a + b;
  c.add(a, b);

  // Tell asmjit to return 'a'.
  c.ret(a);

  // Finalize the current function.
  c.endFunc();

  // Now the Compiler contains the whole function, but the code is not yet
  // generated. To tell compiler to generate the function make() has to be
  // called.

  // Make uses the JitRuntime passed to Compiler constructor to allocate a
  // buffer for the function and make it executable.
  void* funcPtr = c.make();

  // In order to run 'funcPtr' it has to be casted to the desired type.
  // Typedef is a recommended and safe way to create a function-type.
  typedef int (*FuncType)(int, int);

  // Using asmjit_cast is purely optional, it's basically a C-style cast
  // that tries to make it visible that a function-type is returned.
  FuncType func = asmjit_cast<FuncType>(funcPtr);

  // Finally, run it and do something with the result...
  int x = func(1, 2);
  printf("x=%d\n", x); // Outputs "x=3".

  // The function will remain in memory after Compiler is destroyed, but
  // will be destroyed together with Runtime. This is just simple example
  // where we can just destroy both at the end of the scope and that's it.
  // However, it's a good practice to clean-up resources after they are
  // not needed and using runtime.release() is the preferred way to free
  // a function added to JitRuntime.
  runtime.release((void*)func);

  // Runtime and Compiler will be destroyed at the end of the scope.
  return 0;
}

2.6、Keystone

Keystone and Capstone belong to the same family of engines and are developed by the same maintainer. Capstone handles disassembly across platforms and instruction sets, and Keystone handles assembly across platforms and instruction sets. Like OllyDbg's assembler, Keystone supports only text-based assembly and does not support function-call-style assembly like AsmJit.

Keystone is also ported from part of the MC component of the LLVM framework, so the CPU architectures supported by LLVM are also supported by Keystone. These include ARM, ARM64 (AArch64/ARMv8), Hexagon, MIPS, PowerPC, SPARC, SystemZ, and x86 (including 16-bit, 32-bit, and 64-bit).

2.7 Summary

There are also niche disassembly engines such as XDE and LDsm. XDE's code is small and flexible, and many small programs use it. The length disassembly engine in Blackbone, named "Ldasm", is also worth mentioning. Strictly speaking it is not an engine, because it consists of a single function that only calculates the length of an instruction, but it is very useful when relocating instructions to make room for a hook's jump.
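
The role of a length disassembler in hooking can be sketched as follows. An inline hook that overwrites a function's start with a 5-byte `jmp rel32` must first copy whole instructions covering at least 5 bytes, and instruction lengths tell it where a safe cut point lies. This is a toy under stated assumptions, not Ldasm's code: `toy_insn_length` and `bytes_to_relocate` are hypothetical names, and the table knows only a handful of 32-bit opcodes, with no prefixes and no memory operands.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Returns the length of the instruction at 'code', or 0 if this toy table
// does not know the opcode or the buffer is truncated.
std::size_t toy_insn_length(const std::uint8_t* code, std::size_t size) {
  if (size == 0) return 0;
  const std::uint8_t op = code[0];
  std::size_t len = 0;
  if ((op >= 0x50 && op <= 0x57) || op == 0x90) {
    len = 1;  // push r32, nop
  } else if (op == 0x6A) {
    len = 2;  // push imm8
  } else if (op == 0x68 || (op >= 0xB8 && op <= 0xBF)) {
    len = 5;  // push imm32, mov r32, imm32
  } else if (op == 0x8B || op == 0x83) {
    // Only the register-direct ModRM form (mod == 11b) is handled here.
    if (size < 2 || (code[1] >> 6) != 3) return 0;
    len = (op == 0x8B) ? 2 : 3;  // mov r32, r32 / group-1 op r32, imm8
  }
  return (len != 0 && len <= size) ? len : 0;
}

// Returns how many bytes must be copied so that at least 'needed' bytes are
// covered without splitting an instruction, or 0 on failure.
std::size_t bytes_to_relocate(const std::uint8_t* code, std::size_t size,
                              std::size_t needed = 5) {
  assert(code != nullptr);
  std::size_t total = 0;
  while (total < needed) {
    const std::size_t len = toy_insn_length(code + total, size - total);
    if (len == 0) return 0;  // unknown opcode or truncated buffer
    total += len;
  }
  return total;
}
```

For the prologue `55 8B EC 83 EC 10` (push ebp; mov ebp, esp; sub esp, 0x10), the instruction lengths are 1, 2, and 3, so 6 bytes must be relocated: cutting at exactly 5 bytes would split the `sub` instruction.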

The following is a comparison of three commonly used disassembly engines: Udis86, BeaEngine, and Capstone.

  • Performance: Udis86 > BeaEngine > Capstone.
  • Decoding capability: Capstone > BeaEngine > Udis86 (Udis86 does not support register analysis; other decoding capabilities are similar).
  • Platform support: Capstone > Udis86; Udis86 = BeaEngine.
  • x86 extended instruction set: Capstone > Udis86; Udis86 = BeaEngine.

Origin blog.csdn.net/qq_22903531/article/details/131412809