Full-featured binary file analysis tool Radare2

Guide

Radare2 is an open source tool built for binary analysis. Plenty of (non-native) Linux tools exist for binary analysis, so why choose Radare2?

Why do I need another tool?

If existing native Linux tools can do similar things, you will naturally ask why you need another one. It is the same reason you use your phone as an alarm clock, notepad, camera, music player, and web browser, and occasionally to make and answer calls. In the past, separate devices handled these functions: a physical camera for pictures, a small notepad for notes, a bedside alarm clock for waking up, and so on. Having one device do multiple (but related) things is convenient, and the killer feature is the interoperability between those functions.

Similarly, even though many Linux tools have a specific purpose, it is very useful to bundle similar (and better) features in one tool. This is why I think Radare2 should be the tool of choice when you need to process binary files.

According to its GitHub profile, Radare2 (also known as r2) is a "reverse engineering framework and command-line tool set on Unix-like systems." The "2" in its name reflects that this version was rewritten from scratch to make it more modular.

Why choose Radare2?

Plenty of (non-native) Linux tools are available for binary analysis, so why choose Radare2? My reasons are simple.

First, it is an open source project with an active and healthy community. This matters if you are looking for new features or bug fixes.

Second, Radare2 can be used on the command line, and it also has a feature-rich graphical user interface (GUI) called Cutter for those who prefer a GUI. As a long-time Linux user, I am used to typing in a shell. Radare2's commands do have a learning curve, which I would compare to learning Vim: you learn the basics first and, once you have mastered them, move on to more advanced things. Soon it becomes muscle memory.

Third, Radare2 supports external tools well through plug-ins. For example, the recently open-sourced Ghidra binary analysis and reverse engineering tool is popular because its decompiler is a key element of reversing software. You can install and use the Ghidra decompiler directly from the Radare2 console, which gives you the best of both worlds.

Start using Radare2

To install Radare2, clone its repository and run the sys/user.sh script. If some prerequisite packages are missing from your system, you may need to install them first. Once the installation is complete, run the r2 -v command to check that Radare2 is installed correctly:

$ git clone https://github.com/radareorg/radare2.git
$ cd radare2
$ ./sys/user.sh
 
# version
 
$ r2 -v
radare2 4.6.0-git 25266 @ linux-x86-64 git.4.4.0-930-g48047b317
commit: 48047b3171e6ed0480a71a04c3693a0650d03543 build: 2020-11-17__09:31:03
$

Obtain binary test samples

Now that r2 is installed, you need a sample binary to try it out. You can use any system binary (ls, bash, etc.), but to keep this tutorial simple, compile the following C program:

$ cat adder.c
#include <stdio.h>
 
int adder(int num) {
    return num + 1;
}
 
int main() {
    int res, num1 = 100;
    res = adder(num1);
    printf("Number now is  : %d\n", res);
    return 0;
}
$ gcc adder.c -o adder
$ file adder
adder: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 3.2.0, BuildID[sha1]=9d4366f7160e1ffb46b14466e8e0d70f10de2240, not stripped
$ ./adder
Number now is  : 101

Load the binary file

To analyze a binary file, you must first load it in Radare2. Load it by passing the file name as a command-line argument to the r2 command. This drops you into a separate Radare2 console, which is different from your shell. To leave the console, type quit or exit, or press Ctrl+D:

$ r2 ./adder
 -- Learn pancake as if you were radare!
[0x004004b0]> quit
$

Analyze the binary

Before you explore the binary, you must let r2 analyze it for you. You can do this by running the aaa command in the r2 console:

$ r2 ./adder
 -- Learn pancake as if you were radare!
[0x004004b0]> aaa
[0x004004b0]>

This means that every time you pick a binary to analyze, you must type the extra aaa command after loading it. You can skip this step by starting r2 with the -A flag, which tells r2 to analyze the binary automatically:

$ r2 -A ./adder
[x] Analyze all flags starting with sym. and entry0 (aa)
[x] Analyze function calls (aac)
[x] Analyze len bytes of instructions for references (aar)
[x] Check for vtables
[x] Type matching analysis for all functions (aaft)
[x] Propagate noreturn information
[x] Use -AA or aaaa to perform additional experimental analysis.
 -- Already up-to-date.
[0x004004b0]>
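
The same analysis can also be run without entering the console. The sketch below assumes r2 is on your PATH and that ./adder exists in the current directory; join_cmds and r2_batch are hypothetical helper names, not part of Radare2. It relies on r2's -q (quiet, quit when done), -A (analyze), and -c (run commands) options, and on ; as the separator between r2 commands:

```shell
# join_cmds CMD...: join r2 commands with ';' so they fit in a single -c option.
join_cmds() {
    out=""
    for c in "$@"; do
        if [ -z "$out" ]; then out="$c"; else out="$out; $c"; fi
    done
    printf '%s\n' "$out"
}

# r2_batch FILE CMD...: hypothetical wrapper that analyzes FILE, runs the
# given commands, and exits instead of leaving you at the r2 prompt.
r2_batch() {
    file="$1"
    shift
    r2 -q -A -c "$(join_cmds "$@")" "$file"
}

# Only run the demo when both r2 and the sample binary are available.
if command -v r2 >/dev/null 2>&1 && [ -f ./adder ]; then
    r2_batch ./adder 'iI' 'afl'
fi
```

This form is handy in scripts, because the output of the r2 commands goes to standard output and can be piped into other tools.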

Get some basic information about the binary

Before you start analyzing a binary, you need some background information about it. Typically, this includes the file format (ELF, PE, etc.), the architecture it was built for (x86, AMD, ARM, etc.), and whether it is 32-bit or 64-bit. r2's handy iI command provides this information:

[0x004004b0]> iI
arch x86
baddr 0x400000
binsz 14724
bintype elf
bits 64
canary false
class ELF64
compiler GCC: (GNU) 8.3.1 20190507 (Red Hat 8.3.1-4)
crypto false
endian little
havecode true
intrp /lib64/ld-linux-x86-64.so.2
laddr 0x0
lang c
linenum true
lsyms true
machine AMD x86-64 architecture
maxopsz 16
minopsz 1
nx true
os linux
pcalign 0
pic false
relocs true
relro partial
rpath NONE
sanitiz false
static false
stripped false
subsys linux
va true
 
[0x004004b0]>
[0x004004b0]>
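
As a rough cross-check of the bintype, bits, and endian fields above, the identification bytes at the start of an ELF file can be read directly. This is only a sketch using od from coreutils; elf_ident is a hypothetical helper name, and the demo file is a hand-made six-byte header rather than a real binary:

```shell
# elf_ident FILE: report class and byte order from the ELF identification bytes.
elf_ident() {
    # od prints the first six bytes as hex pairs; word splitting puts them in $1..$6.
    set -- $(od -An -tx1 -N6 "$1")
    if [ "$1$2$3$4" != "7f454c46" ]; then   # magic: 0x7f 'E' 'L' 'F'
        echo "not ELF"
        return 1
    fi
    case "$5" in 01) bits=32 ;; 02) bits=64 ;; *) bits=unknown ;; esac
    case "$6" in 01) endian=little ;; 02) endian=big ;; *) endian=unknown ;; esac
    echo "elf bits=$bits endian=$endian"
}

# Demo on a crafted header: magic, 64-bit class (2), little-endian data (1).
demo=$(mktemp)
printf '\177ELF\002\001' > "$demo"
elf_ident "$demo"    # prints: elf bits=64 endian=little
rm -f "$demo"
```

Running elf_ident ./adder on the sample binary should agree with the bits 64 and endian little lines that iI reports.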

Import and export

Once you know what kind of file you are dealing with, you usually want to know which standard library functions the binary uses, which hints at what the program can do. In this tutorial's sample C program, the only library function is printf, which prints a message. You can see this by running the ii command, which lists everything the binary imports:

[0x004004b0]> ii
[Imports]
nth vaddr bind type lib name
―――――――――――――――――――――――――――――――――――――
1 0x00000000 WEAK NOTYPE _ITM_deregisterTMCloneTable
2 0x004004a0 GLOBAL FUNC printf
3 0x00000000 GLOBAL FUNC __libc_start_main
4 0x00000000 WEAK NOTYPE __gmon_start__
5 0x00000000 WEAK NOTYPE _ITM_registerTMCloneTable

The binary can also have its own symbols, functions, or data. These are usually listed under Exports. This test binary exports two functions of interest: main and adder. The remaining functions are added during compilation when the binary is built. The loader needs them to load the binary (don't worry about them too much for now):

[0x004004b0]>
[0x004004b0]> iE
[Exports]
 
nth paddr vaddr bind type size lib name
――――――――――――――――――――――――――――――――――――――――――――――――――――――
82 0x00000650 0x00400650 GLOBAL FUNC 5 __libc_csu_fini
85 ---------- 0x00601024 GLOBAL NOTYPE 0 _edata
86 0x00000658 0x00400658 GLOBAL FUNC 0 _fini
89 0x00001020 0x00601020 GLOBAL NOTYPE 0 __data_start
90 0x00000596 0x00400596 GLOBAL FUNC 15 adder
92 0x00000670 0x00400670 GLOBAL OBJ 0 __dso_handle
93 0x00000668 0x00400668 GLOBAL OBJ 4 _IO_stdin_used
94 0x000005e0 0x004005e0 GLOBAL FUNC 101 __libc_csu_init
95 ---------- 0x00601028 GLOBAL NOTYPE 0 _end
96 0x000004e0 0x004004e0 GLOBAL FUNC 5 _dl_relocate_static_pie
97 0x000004b0 0x004004b0 GLOBAL FUNC 47 _start
98 ---------- 0x00601024 GLOBAL NOTYPE 0 __bss_start
99 0x000005a5 0x004005a5 GLOBAL FUNC 55 main
100 ---------- 0x00601028 GLOBAL OBJ 0 __TMC_END__
102 0x00000468 0x00400468 GLOBAL FUNC 0 _init
 
[0x004004b0]>
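
The paddr and vaddr columns are related through the base address (baddr 0x400000 in the iI output). The arithmetic below is a small sketch, assuming the code segment of this sample maps file offset 0 at the base address, which is what the table suggests; to_vaddr is a hypothetical helper name:

```shell
# to_vaddr BASE PADDR: virtual address of a file offset in the first loaded segment.
to_vaddr() {
    printf '0x%08x\n' $(( $1 + $2 ))
}

to_vaddr 0x400000 0x596    # adder: prints 0x00400596
to_vaddr 0x400000 0x5a5    # main:  prints 0x004005a5

# Data symbols live in a different segment, so their offset is different:
printf '0x%x\n' $(( 0x601020 - 0x1020 ))    # __data_start: prints 0x600000
```

This is why adder and main line up with the base address, while __data_start (paddr 0x1020, vaddr 0x601020) does not.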

Hash information

How can you tell whether two binary files are the same? You can't just open a binary and read the source code inside. In most cases, a hash of the binary (MD5, SHA-1, SHA-256) is used to uniquely identify it. Use the it command to compute the binary's hashes:

[0x004004b0]> it
md5 7e6732f2b11dec4a0c7612852cede670
sha1 d5fa848c4b53021f6570dd9b18d115595a2290ae
sha256 13dd5a492219dac1443a816ef5f91db8d149e8edbf26f24539c220861769e1c2
[0x004004b0]>
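
Outside r2, the same comparison can be sketched with sha256sum from GNU coreutils (assumed to be installed). file_sha256 and same_content are hypothetical helper names. Note that equal hashes mean the files are byte-for-byte identical; a hash says nothing about how similar two different files are:

```shell
# file_sha256 FILE: print only the SHA-256 digest of FILE.
file_sha256() {
    sha256sum "$1" | cut -d' ' -f1
}

# same_content A B: succeed when both files have the same digest.
same_content() {
    [ "$(file_sha256 "$1")" = "$(file_sha256 "$2")" ]
}

# Demo with two temporary files instead of real binaries.
a=$(mktemp)
b=$(mktemp)
printf 'abc' > "$a"
printf 'abc' > "$b"
if same_content "$a" "$b"; then echo "identical"; else echo "different"; fi
printf 'abd' > "$b"    # change a single byte
if same_content "$a" "$b"; then echo "identical"; else echo "different"; fi
rm -f "$a" "$b"
```

The first comparison prints identical and the second prints different, since even a one-byte change produces a completely different digest.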

Functions

Code is grouped into functions. To list the functions in the binary, run the afl command. The list below includes the main and adder functions. Functions whose names start with sym.imp are usually imported from the standard library (here, glibc):

[0x004004b0]> afl
0x004004b0    1 46           entry0
0x004004f0    4 41   -> 34   sym.deregister_tm_clones
0x00400520    4 57   -> 51   sym.register_tm_clones
0x00400560    3 33   -> 32   sym.__do_global_dtors_aux
0x00400590    1 6            entry.init0
0x00400650    1 5            sym.__libc_csu_fini
0x00400658    1 13           sym._fini
0x00400596    1 15           sym.adder
0x004005e0    4 101          loc..annobin_elf_init.c
0x004004e0    1 5            loc..annobin_static_reloc.c
0x004005a5    1 55           main
0x004004a0    1 6            sym.imp.printf
0x00400468    3 27           sym._init
[0x004004b0]>

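Cross-references

To find out where a function is used, run the axt command followed by the function name. In this binary, adder is called from main, and main is referenced from entry0: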

[0x004004b0]> axt sym.adder
main 0x4005b9 [CALL] call sym.adder
[0x004004b0]>
[0x004004b0]> axt main
entry0 0x4004d1 [DATA] mov rdi, main
[0x004004b0]>

Find location

When working with text files, you move around by line and column numbers; in binary files, you use addresses instead. These are hexadecimal numbers prefixed with 0x. To print your current position in the binary, run the s command on its own. To move to a different location, run s followed by the address.

A function name is like a label that is represented internally by an address. If function names are present in the binary (that is, it is not stripped), you can pass a function name to the s command to jump to that function's address. Similarly, to jump to the beginning of the binary, enter s 0:

[0x004004b0]> s
0x4004b0
[0x004004b0]>
[0x004004b0]> s main
[0x004005a5]>
[0x004005a5]> s
0x4005a5
[0x004005a5]>
[0x004005a5]> s sym.adder
[0x00400596]>
[0x00400596]> s
0x400596
[0x00400596]>
[0x00400596]> s 0
[0x00000000]>
[0x00000000]> s
0x0
[0x00000000]>

Hexadecimal view

Raw binary data is usually meaningless to the eye. It helps to view it in hexadecimal alongside its ASCII representation, which is what the px command shows:

[0x004004b0]> s main
[0x004005a5]>
[0x004005a5]> px
- offset -   0 1  2 3  4 5  6 7  8 9  A B  C D  E F  0123456789ABCDEF
0x004005a5  5548 89e5 4883 ec10 c745 fc64 0000 008b  UH..H....E.d....
0x004005b5  45fc 89c7 e8d8 ffff ff89 45f8 8b45 f889  E.........E..E..
0x004005c5  c6bf 7806 4000 b800 0000 00e8 cbfe ffff  ..x.@...........
0x004005d5  b800 0000 00c9 c30f 1f40 00f3 0f1e fa41  .........@.....A
0x004005e5  5749 89d7 4156 4989 f641 5541 89fd 4154  WI..AVI..AUA..AT
0x004005f5  4c8d 2504 0820 0055 488d 2d04 0820 0053  L.%.. .UH.-.. .S
0x00400605  4c29 e548 83ec 08e8 57fe ffff 48c1 fd03  L).H....W...H...
0x00400615  741f 31db 0f1f 8000 0000 004c 89fa 4c89  t.1........L..L.
0x00400625  f644 89ef 41ff 14dc 4883 c301 4839 dd75  .D..A...H...H9.u
0x00400635  ea48 83c4 085b 5d41 5c41 5d41 5e41 5fc3  .H...[]A\A]A^A_.
0x00400645  9066 2e0f 1f84 0000 0000 00f3 0f1e fac3  .f..............
0x00400655  0000 00f3 0f1e fa48 83ec 0848 83c4 08c3  .......H...H....
0x00400665  0000 0001 0002 0000 0000 0000 0000 0000  ................
0x00400675  0000 004e 756d 6265 7220 6e6f 7720 6973  ...Number now is
0x00400685  2020 3a20 2564 0a00 0000 0001 1b03 3b44    : %d........;D
0x00400695  0000 0007 0000 0000 feff ff88 0000 0020  ...............
[0x004005a5]>
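
A similar view of individual bytes is possible with od from coreutils. This is a sketch, not a replacement for px; hexbytes is a hypothetical helper name, and the demo writes the first four bytes of main (55 48 89 e5) to a temporary file so that it runs without the sample binary:

```shell
# hexbytes FILE OFFSET COUNT: print COUNT bytes starting at file OFFSET as hex pairs.
hexbytes() {
    od -An -tx1 -j "$2" -N "$3" "$1" | tr -s ' \n' ' ' | sed 's/^ //; s/ $//'
}

# Demo: octal escapes for 0x55 0x48 0x89 0xe5, the first bytes of main in the dump above.
demo=$(mktemp)
printf '\125\110\211\345' > "$demo"
hexbytes "$demo" 0 4    # prints: 55 48 89 e5
hexbytes "$demo" 1 2    # prints: 48 89
rm -f "$demo"
```

On the real binary, hexbytes ./adder $((0x5a5)) 4 should print the same four bytes, because 0x5a5 is the file offset (paddr) of main in the exports table.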

Disassembly

With a compiled binary, you cannot view the source code. The compiler translates the source code into machine language instructions that the CPU can understand and execute; the result is a binary or executable file. However, you can look at the assembly instructions (mnemonics) to understand what the program is doing. For example, to see what the main function does, use s main to seek to the address of main, and then run the pdf command to print the disassembled function.

To understand assembly instructions, you need the architecture manual (here x86), its application binary interface (ABI, or calling convention), and a basic understanding of how the stack works:

[0x004004b0]> s main
[0x004005a5]>
[0x004005a5]> s
0x4005a5
[0x004005a5]>
[0x004005a5]> pdf
            ; DATA XREF from entry0 @ 0x4004d1
┌ 55: int main (int argc, char **argv, char **envp);
│           ; var int64_t var_8h @ rbp-0x8
│           ; var int64_t var_4h @ rbp-0x4
│           0x004005a5      55             push rbp
│           0x004005a6      4889e5         mov rbp, rsp
│           0x004005a9      4883ec10       sub rsp, 0x10
│           0x004005ad      c745fc640000.  mov dword [var_4h], 0x64    ; 'd' ; 100
│           0x004005b4      8b45fc         mov eax, dword [var_4h]
│           0x004005b7      89c7           mov edi, eax
│           0x004005b9      e8d8ffffff     call sym.adder
│           0x004005be      8945f8         mov dword [var_8h], eax
│           0x004005c1      8b45f8         mov eax, dword [var_8h]
│           0x004005c4      89c6           mov esi, eax
│           0x004005c6      bf78064000     mov edi, str.Number_now_is__:__d ; 0x400678 ; "Number now is  : %d\n" ; const char *format
│           0x004005cb      b800000000     mov eax, 0
│           0x004005d0      e8cbfeffff     call sym.imp.printf         ; int printf(const char *format)
│           0x004005d5      b800000000     mov eax, 0
│           0x004005da      c9             leave
└           0x004005db      c3             ret
[0x004005a5]>

Here is the disassembly of the adder function:

[0x004005a5]> s sym.adder
[0x00400596]>
[0x00400596]> s
0x400596
[0x00400596]>
[0x00400596]> pdf
            ; CALL XREF from main @ 0x4005b9
┌ 15: sym.adder (int64_t arg1);
│           ; var int64_t var_4h @ rbp-0x4
│           ; arg int64_t arg1 @ rdi
│           0x00400596      55             push rbp
│           0x00400597      4889e5         mov rbp, rsp
│           0x0040059a      897dfc         mov dword [var_4h], edi     ; arg1
│           0x0040059d      8b45fc         mov eax, dword [var_4h]
│           0x004005a0      83c001         add eax, 1
│           0x004005a3      5d             pop rbp
└           0x004005a4      c3             ret
[0x00400596]>
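
The call instructions in main encode their targets as a signed 32-bit offset relative to the next instruction, which is how r2 resolves e8d8ffffff at 0x4005b9 to sym.adder. The arithmetic can be checked with a short sketch; call_target is a hypothetical helper name, and the offset is passed as the little-endian operand bytes already reversed (d8ffffff becomes 0xffffffd8):

```shell
# call_target ADDR REL32: destination of a 5-byte x86 "call rel32" located at ADDR.
call_target() {
    next=$(( $1 + 5 ))          # address of the instruction after the call
    rel=$(( $2 ))
    if [ "$rel" -ge $(( 0x80000000 )) ]; then
        rel=$(( rel - 0x100000000 ))    # sign-extend the 32-bit offset
    fi
    printf '0x%08x\n' $(( next + rel ))
}

call_target 0x4005b9 0xffffffd8    # prints 0x00400596 (sym.adder)
call_target 0x4005d0 0xfffffecb    # prints 0x004004a0 (sym.imp.printf)
```

Both results match the addresses that afl lists for sym.adder and sym.imp.printf.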

Strings

Seeing which strings exist in a binary is a useful starting point for analysis. Strings are hard-coded into the binary and often provide hints that tell you where to focus. Run the iz command to list the strings in the binary. This test binary has only one hard-coded string:

[0x004004b0]> iz
[Strings]
nth paddr      vaddr      len size section type  string
―――――――――――――――――――――――――――――――――――――――――――――――――――――――
0   0x00000678 0x00400678 20  21   .rodata ascii Number now is  : %d\n
 
[0x004004b0]>
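
The idea behind string extraction can be sketched with standard tools: treat every non-printable byte as a separator and keep the runs that are long enough. list_strings is a hypothetical helper name. Unlike the listing above, it scans the whole file rather than only the data sections, so on a real binary it prints many more candidates:

```shell
# list_strings FILE [MINLEN]: print runs of printable ASCII of at least MINLEN bytes (default 4).
list_strings() {
    LC_ALL=C tr -c '[:print:]' '\n' < "$1" | LC_ALL=C grep -E ".{${2:-4},}"
}

# Demo on a small blob: binary junk, the format string, and two runs that are too short.
demo=$(mktemp)
printf '\001\002Number now is  : %%d\n\000ab\000\177ELF' > "$demo"
list_strings "$demo"    # prints: Number now is  : %d
rm -f "$demo"
```

The trailing newline of the format string is itself a non-printable byte, so it ends the run instead of appearing as \n in the output.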

Cross-reference string

As with functions, you can cross-reference strings, see where they are printed from, and understand the code surrounding them:

[0x004004b0]> ps @ 0x400678
Number now is  : %d
 
[0x004004b0]>
[0x004004b0]> axt 0x400678
main 0x4005c6 [DATA] mov edi, str.Number_now_is__:__d
[0x004004b0]>

Visual mode

When code is complex and calls many functions, it is easy to get lost. It helps to see graphically which functions are called and which paths are taken under which conditions. After seeking to the function of interest, use the VV command to explore r2's visual graph mode. For example, for the adder function:

[0x004004b0]> s sym.adder
[0x00400596]>
[0x00400596]> VV


Debugger

So far, you have been doing static analysis: looking at the contents of the binary without running it. Sometimes you need to execute the binary and inspect what is in memory at runtime. r2's built-in debugger lets you run a binary, set breakpoints, examine the values of variables, and dump the contents of registers.

Start the debugger with the -d flag, and add the -A flag to analyze the binary as it loads. You can set breakpoints at different places, such as functions or memory addresses, with the db <function-name> command. To view existing breakpoints, use the dbi command. Once the breakpoints are in place, use the dc command to start running the binary. The dbt command shows the backtrace, which lists the chain of function calls. Finally, the dr command dumps the contents of the registers:

$ r2 -d -A ./adder
Process with PID 17453 started...
= attach 17453 17453
bin.baddr 0x00400000
Using 0x400000
asm.bits 64
[x] Analyze all flags starting with sym. and entry0 (aa)
[x] Analyze function calls (aac)
[x] Analyze len bytes of instructions for references (aar)
[x] Check for vtables
[x] Type matching analysis for all functions (aaft)
[x] Propagate noreturn information
[x] Use -AA or aaaa to perform additional experimental analysis.
 -- git checkout hamster
[0x7f77b0a28030]>
[0x7f77b0a28030]> db main
[0x7f77b0a28030]>
[0x7f77b0a28030]> db sym.adder
[0x7f77b0a28030]>
[0x7f77b0a28030]> dbi
0 0x004005a5 E:1 T:0
1 0x00400596 E:1 T:0
[0x7f77b0a28030]>
[0x7f77b0a28030]> afl | grep main
0x004005a5    1 55           main
[0x7f77b0a28030]>
[0x7f77b0a28030]> afl | grep sym.adder
0x00400596    1 15           sym.adder
[0x7f77b0a28030]>
[0x7f77b0a28030]> dc
hit breakpoint at: 0x4005a5
[0x004005a5]>
[0x004005a5]> dbt
0  0x4005a5           sp: 0x0                 0    [main]  main sym.adder+15
1  0x7f77b0687873     sp: 0x7ffe35ff6858      0    [??]  section..gnu.build.attributes-1345820597
2  0x7f77b0a36e0a     sp: 0x7ffe35ff68e8      144  [??]  map.usr_lib64_ld_2.28.so.r_x+65034
[0x004005a5]> dc
hit breakpoint at: 0x400596
[0x00400596]> dbt
0  0x400596           sp: 0x0                 0    [sym.adder]  rip entry.init0+6
1  0x4005be           sp: 0x7ffe35ff6838      0    [main]  main+25
2  0x7f77b0687873     sp: 0x7ffe35ff6858      32   [??]  section..gnu.build.attributes-1345820597
3  0x7f77b0a36e0a     sp: 0x7ffe35ff68e8      144  [??]  map.usr_lib64_ld_2.28.so.r_x+65034
[0x00400596]>
[0x00400596]>
[0x00400596]> dr
rax = 0x00000064
rbx = 0x00000000
rcx = 0x7f77b0a21738
rdx = 0x7ffe35ff6948
r8 = 0x7f77b0a22da0
r9 = 0x7f77b0a22da0
r10 = 0x0000000f
r11 = 0x00000002
r12 = 0x004004b0
r13 = 0x7ffe35ff6930
r14 = 0x00000000
r15 = 0x00000000
rsi = 0x7ffe35ff6938
rdi = 0x00000064
rsp = 0x7ffe35ff6838
rbp = 0x7ffe35ff6850
rip = 0x00400596
rflags = 0x00000202
orax = 0xffffffffffffffff
[0x00400596]>
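
The register dump confirms what the disassembly suggested: at the breakpoint on adder, rdi holds the argument that main placed in edi. A small sketch for pulling one register out of saved dr output follows; reg_value is a hypothetical helper name, and the three lines of sample input are copied from the dump above:

```shell
# reg_value NAME: read "name = value" lines on stdin and print the value for NAME.
reg_value() {
    awk -v r="$1" '$1 == r { print $3 }'
}

dump='rax = 0x00000064
rdi = 0x00000064
rip = 0x00400596'

rdi=$(printf '%s\n' "$dump" | reg_value rdi)
printf 'rdi = %s = %d\n' "$rdi" "$rdi"    # prints: rdi = 0x00000064 = 100
```

The value 100 is num1 from the C source, and rip points at 0x00400596, the first instruction of adder.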

Decompiler

Being able to read assembly is a prerequisite for binary analysis. Assembly language is always tied to the architecture the binary was built for and is expected to run on. There is no 1:1 mapping between lines of source code and assembly instructions; usually, one line of C source produces several assembly instructions. Reading assembly line by line is therefore not always the best approach.

This is where decompilers come in. They try to reconstruct plausible source code from the assembly instructions. The result is never exactly the source code used to build the binary; it is an approximation derived from the assembly. Compiler optimizations also have to be taken into account: the compiler may emit different assembly to make the program faster or the binary smaller, which makes the decompiler's job harder. In addition, malware authors often deliberately obfuscate their code to deter analysts.

Radare2 provides decompilers through plug-ins. You can install any decompiler that Radare2 supports. Use the r2pm -l command to list the currently installed plug-ins, and the r2pm install command to install one, for example the r2dec decompiler:

$ r2pm  -l
$
$ r2pm install r2dec
Cloning into 'r2dec'...
remote: Enumerating objects: 100, done.
remote: Counting objects: 100% (100/100), done.
remote: Compressing objects: 100% (97/97), done.
remote: Total 100 (delta 18), reused 27 (delta 1), pack-reused 0
Receiving objects: 100% (100/100), 1.01 MiB | 1.31 MiB/s, done.
Resolving deltas: 100% (18/18), done.
Install Done For r2dec
gmake: Entering directory '/root/.local/share/radare2/r2pm/git/r2dec/p'
[CC] duktape/duktape.o
[CC] duktape/duk_console.o
[CC] core_pdd.o
[CC] core_pdd.so
gmake: Leaving directory '/root/.local/share/radare2/r2pm/git/r2dec/p'
$
$ r2pm  -l
r2dec
$

Decompiler view

To decompile a binary, load it in r2 with automatic analysis. In this example, use the s sym.adder command to seek to the adder function, and then use the pdda command to view the assembly and the decompiled source side by side. The decompiled source is often easier to read than the assembly:

$ r2 -A ./adder
[x] Analyze all flags starting with sym. and entry0 (aa)
[x] Analyze function calls (aac)
[x] Analyze len bytes of instructions for references (aar)
[x] Check for vtables
[x] Type matching analysis for all functions (aaft)
[x] Propagate noreturn information
[x] Use -AA or aaaa to perform additional experimental analysis.
 -- What do you want to debug today?
[0x004004b0]>
[0x004004b0]> s sym.adder
[0x00400596]>
[0x00400596]> s
0x400596
[0x00400596]>
[0x00400596]> pdda
    ; assembly                               | /* r2dec pseudo code output */
                                             | /* ./adder @ 0x400596 */
                                             | #include <stdint.h>
                                             |  
    ; (fcn) sym.adder ()                     | int32_t adder (int64_t arg1) {
                                             |     int64_t var_4h;
                                             |     rdi = arg1;
    0x00400596 push rbp                      |    
    0x00400597 mov rbp, rsp                  |    
    0x0040059a mov dword [rbp - 4], edi      |     *((rbp - 4)) = edi;
    0x0040059d mov eax, dword [rbp - 4]      |     eax = *((rbp - 4));
    0x004005a0 add eax, 1                    |     eax++;
    0x004005a3 pop rbp                       |    
    0x004005a4 ret                           |     return eax;
                                             | }
[0x00400596]>

Configuration settings

As you grow more comfortable with Radare2, you will want to change its configuration to suit the way you work. Use the e command to view r2's current configuration. To set a specific option, follow the e command with option = value:

[0x004005a5]> e | wc -l
593
[0x004005a5]> e | grep syntax
asm.syntax = intel
[0x004005a5]>
[0x004005a5]> e asm.syntax = att
[0x004005a5]>
[0x004005a5]> e | grep syntax
asm.syntax = att
[0x004005a5]>

To make configuration changes permanent, put them in a startup file named .radare2rc, which r2 reads when it starts. This file normally lives in your home directory; if it does not exist, you can create it. Some example configuration options:

$ cat ~/.radare2rc
e asm.syntax = att
e scr.utf8 = true
eco solarized
e cmd.stack = true
e stack.size = 256
$
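
If you manage the startup file from a script, it is worth avoiding duplicate entries. The sketch below appends a setting only when that exact line is not already present; add_rc_line is a hypothetical helper name, and a temporary file stands in for ~/.radare2rc so the demo does not touch your real configuration:

```shell
# add_rc_line FILE LINE: append LINE to FILE unless an identical line already exists.
add_rc_line() {
    # -x matches whole lines only, -F treats LINE as a fixed string, not a regex.
    if ! grep -qxF -- "$2" "$1" 2>/dev/null; then
        printf '%s\n' "$2" >> "$1"
    fi
}

rc=$(mktemp)    # stand-in for ~/.radare2rc
add_rc_line "$rc" 'e asm.syntax = att'
add_rc_line "$rc" 'e asm.syntax = att'    # second call is a no-op
add_rc_line "$rc" 'e scr.utf8 = true'
cat "$rc"       # prints the two distinct settings, one per line
rm -f "$rc"
```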

Explore more

You have now seen enough of Radare2's features to have a feel for the tool. Radare2 follows the Unix philosophy: even though you can do all of this from its main console, under the hood it relies on a set of separate binaries to do the work.

Explore the individual binaries listed below to see how they work. For example, the binary information shown in the console by the iI command can also be obtained with the rabin2 -I <binary> command:

$ cd bin/
$
$ ls
prefix  r2agent    r2pm  rabin2   radiff2  ragg2    rarun2   rasm2
r2      r2-indent  r2r   radare2  rafind2  rahash2  rasign2  rax2
$
