Guide | Radare2 is an open source tool built for binary analysis. Plenty of (non-native) Linux tools are available for binary analysis, so why choose Radare2? |
Why do I need another tool?
If the existing native Linux tools can do similar things, you will naturally ask why you need another tool. It is the same reason you use your mobile phone as an alarm clock, notepad, camera, music player, and web browser, and occasionally to make and answer calls. In the past, separate devices and tools handled these functions: a physical camera to take pictures, a small notepad to take notes, a bedside alarm clock to wake you up, and so on. Having one device do multiple (but related) things is convenient for users. In addition, the killer feature is the interoperability between the separate functions.
Similarly, even though many Linux tools have a specific purpose, it is very useful to have similar (and better) features bundled in one tool. This is why I think Radare2 should be the tool of choice when you need to work with binary files.
According to its GitHub profile, Radare2 (also known as r2) is a "UNIX-like reverse engineering framework and command-line toolset." The "2" in its name is there because this version was rewritten from scratch to make it more modular.
Why choose Radare2?
There are plenty of (non-native) Linux tools available for binary analysis, so why choose Radare2? My reasons are simple.
First of all, it is an open source project with an active and healthy community. This matters if you want a tool that keeps gaining new features and receives timely bug fixes.
Secondly, Radare2 can be used on the command line, and it also has a feature-rich graphical user interface (GUI) called Cutter for those who are more comfortable with GUIs. As a long-time Linux user, I am comfortable typing in the shell. Although there is a slight learning curve to getting familiar with Radare2's commands, I would compare it to learning Vim: you learn the basics first, and once you master them, you move on to more advanced things. Before long, it becomes muscle memory.
Third, Radare2 has good support for external tools through plug-ins. For example, the recently open sourced Ghidra binary analysis and reverse engineering tool is popular for its decompiler, which is a key element of reverse engineering software. You can install and use the Ghidra decompiler directly from the Radare2 console, which is amazing and gives you the best of both worlds.
Start using Radare2
To install Radare2, just clone its repository and run the sys/user.sh script. You may need to install some prerequisite packages if they are not already on your system. Once the installation is complete, run the r2 -v command to check that Radare2 was installed correctly:
$ git clone https://github.com/radareorg/radare2.git
$ cd radare2
$ ./sys/user.sh
# version
$ r2 -v
radare2 4.6.0-git 25266 @ linux-x86-64 git.4.4.0-930-g48047b317
commit: 48047b3171e6ed0480a71a04c3693a0650d03543 build: 2020-11-17__09:31:03
$
Obtain binary test samples
Now that r2 is installed, you need a sample binary program to try it out. You can use any system binary file (ls, bash, etc.), but to keep the content of this tutorial simple, please compile the following C program:
$ cat adder.c
#include <stdio.h>
int adder(int num) {
return num + 1;
}
int main() {
int res, num1 = 100;
res = adder(num1);
printf("Number now is : %d\n", res);
return 0;
}
$ gcc adder.c -o adder
$ file adder
adder: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 3.2.0, BuildID[sha1]=9d4366f7160e1ffb46b14466e8e0d70f10de2240, not stripped
$ ./adder
Number now is : 101
Load the binary file
To analyze a binary file, you must load it in Radare2. Load it by providing the file name as a command-line argument to the r2 command. You will land in a separate Radare2 console, which is different from your shell. To exit the console, type quit or exit, or press Ctrl+D:
$ r2 ./adder
-- Learn pancake as if you were radare!
[0x004004b0]> quit
$
Analyze the binary
Before you explore the binary, you must let r2 analyze it for you. You can do this by running the aaa command in the r2 console:
$ r2 ./adder
-- Learn pancake as if you were radare!
[0x004004b0]> aaa
[0x004004b0]>
This means that every time you pick a binary for analysis, you must type the additional aaa command after loading it. You can bypass this by calling r2 with the -A flag, which tells r2 to analyze the binary automatically:
$ r2 -A ./adder
[x] Analyze all flags starting with sym. and entry0 (aa)
[x] Analyze function calls (aac)
[x] Analyze len bytes of instructions for references (aar)
[x] Check for vtables
[x] Type matching analysis for all functions (aaft)
[x] Propagate noreturn information
[x] Use -AA or aaaa to perform additional experimental analysis.
-- Already up-to-date.
[0x004004b0]>
Get some basic information about the binary
Before you start analyzing a binary file, you need some background information. In many cases, this can be the format of the binary file (ELF, PE, etc.), the binary architecture (x86, AMD, ARM, etc.), and whether the binary is 32-bit or 64-bit. The convenient iI command of r2 can provide the required information:
[0x004004b0]> iI
arch x86
baddr 0x400000
binsz 14724
bintype elf
bits 64
canary false
class ELF64
compiler GCC: (GNU) 8.3.1 20190507 (Red Hat 8.3.1-4)
crypto false
endian little
havecode true
intrp /lib64/ld-linux-x86-64.so.2
laddr 0x0
lang c
linenum true
lsyms true
machine AMD x86-64 architecture
maxopsz 16
minopsz 1
nx true
os linux
pcalign 0
pic false
relocs true
relro partial
rpath NONE
sanitiz false
static false
stripped false
subsys linux
va true
[0x004004b0]>
[0x004004b0]>
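Several of the fields reported by iI (bintype, bits, endian, machine) come straight from the first bytes of the ELF header. The following is a minimal Python sketch of how such fields could be read, not a description of how r2 does it internally. The function names and the synthetic header built by build_sample_header are assumptions made for illustration; only a handful of machine codes from the ELF specification are included.

```python
import struct

# Lookup tables for a few common values from the ELF specification.
ELF_CLASSES = {1: 32, 2: 64}
ELF_ENDIAN = {1: "little", 2: "big"}
ELF_TYPES = {1: "REL", 2: "EXEC", 3: "DYN", 4: "CORE"}
ELF_MACHINES = {3: "x86", 40: "ARM", 62: "AMD x86-64", 183: "AArch64"}


def parse_elf_ident(header: bytes) -> dict:
    """Decode the first 20 bytes of an ELF file into a small summary."""
    if len(header) < 20 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF header")
    # Byte 4 is the class (32/64-bit), byte 5 the data encoding (endianness).
    byte_order = "<" if header[5] == 1 else ">"
    # e_type and e_machine are 16-bit fields at offsets 16 and 18.
    e_type, e_machine = struct.unpack_from(byte_order + "HH", header, 16)
    return {
        "bintype": "elf",
        "bits": ELF_CLASSES.get(header[4], "unknown"),
        "endian": ELF_ENDIAN.get(header[5], "unknown"),
        "type": ELF_TYPES.get(e_type, hex(e_type)),
        "machine": ELF_MACHINES.get(e_machine, hex(e_machine)),
    }


def build_sample_header() -> bytes:
    """Hypothetical helper: a synthetic header resembling the adder binary."""
    # Magic, ELF64, little-endian, version 1, SYSV ABI, then 8 padding bytes.
    ident = b"\x7fELF" + bytes([2, 1, 1, 0]) + bytes(8)
    return ident + struct.pack("<HH", 2, 62)  # ET_EXEC, EM_X86_64


if __name__ == "__main__":
    print(parse_elf_ident(build_sample_header()))
```

To try it on a real file, pass the first 20 bytes instead, for example open("adder", "rb").read(20).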
Imports and exports
Usually, once you know what kind of file you are dealing with, you want to know which standard library functions the binary uses, or get a sense of what the program might do. In the sample C program in this tutorial, the only library function is printf, which is used to print a message. You can see this by running the ii command, which shows all the symbols imported by the binary:
[0x004004b0]> ii
[Imports]
nth vaddr bind type lib name
―――――――――――――――――――――――――――――――――――――
1 0x00000000 WEAK NOTYPE _ITM_deregisterTMCloneTable
2 0x004004a0 GLOBAL FUNC printf
3 0x00000000 GLOBAL FUNC __libc_start_main
4 0x00000000 WEAK NOTYPE __gmon_start__
5 0x00000000 WEAK NOTYPE _ITM_registerTMCloneTable
The binary can also have its own symbols, functions, or data. These are usually listed under Exports. This test binary exports two functions of interest: main and adder. The remaining functions are added during the compilation phase when the binary is built. The loader needs them to load the binary (do not worry about them too much for now):
[0x004004b0]>
[0x004004b0]> iE
[Exports]
nth paddr vaddr bind type size lib name
――――――――――――――――――――――――――――――――――――――――――――――――――――――
82 0x00000650 0x00400650 GLOBAL FUNC 5 __libc_csu_fini
85 ---------- 0x00601024 GLOBAL NOTYPE 0 _edata
86 0x00000658 0x00400658 GLOBAL FUNC 0 _fini
89 0x00001020 0x00601020 GLOBAL NOTYPE 0 __data_start
90 0x00000596 0x00400596 GLOBAL FUNC 15 adder
92 0x00000670 0x00400670 GLOBAL OBJ 0 __dso_handle
93 0x00000668 0x00400668 GLOBAL OBJ 4 _IO_stdin_used
94 0x000005e0 0x004005e0 GLOBAL FUNC 101 __libc_csu_init
95 ---------- 0x00601028 GLOBAL NOTYPE 0 _end
96 0x000004e0 0x004004e0 GLOBAL FUNC 5 _dl_relocate_static_pie
97 0x000004b0 0x004004b0 GLOBAL FUNC 47 _start
98 ---------- 0x00601024 GLOBAL NOTYPE 0 __bss_start
99 0x000005a5 0x004005a5 GLOBAL FUNC 55 main
100 ---------- 0x00601028 GLOBAL OBJ 0 __TMC_END__
102 0x00000468 0x00400468 GLOBAL FUNC 0 _init
[0x004004b0]>
Hash information
How do you know whether two binaries are the same? You can't just open a binary and read the source code inside it. In most cases, a hash of the binary (md5sum, sha1, sha256) is used to uniquely identify it. You can find the binary's hashes with the it command:
[0x004004b0]> it
md5 7e6732f2b11dec4a0c7612852cede670
sha1 d5fa848c4b53021f6570dd9b18d115595a2290ae
sha256 13dd5a492219dac1443a816ef5f91db8d149e8edbf26f24539c220861769e1c2
[0x004004b0]>
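The same three digests can be computed outside r2 to cross-check a file. This is a small sketch using Python's standard hashlib module; the function name file_digests and the stand-in input b"abc" are illustrative assumptions, not part of Radare2.

```python
import hashlib


def file_digests(data: bytes) -> dict:
    """Return md5, sha1 and sha256 hex digests for a byte string."""
    return {name: hashlib.new(name, data).hexdigest()
            for name in ("md5", "sha1", "sha256")}


if __name__ == "__main__":
    # Stand-in bytes; for a real file use open("adder", "rb").read().
    for name, digest in file_digests(b"abc").items():
        print(name, digest)
```

Two files with identical contents always produce identical digests, so comparing these values is a quick way to tell whether two binaries are the same file.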
Functions
Code is grouped into functions; to list the functions present in a binary, run the afl command. The following list shows the main and adder functions. Usually, functions that start with sym.imp are imported from the standard library (glibc in this case):
[0x004004b0]> afl
0x004004b0 1 46 entry0
0x004004f0 4 41 -> 34 sym.deregister_tm_clones
0x00400520 4 57 -> 51 sym.register_tm_clones
0x00400560 3 33 -> 32 sym.__do_global_dtors_aux
0x00400590 1 6 entry.init0
0x00400650 1 5 sym.__libc_csu_fini
0x00400658 1 13 sym._fini
0x00400596 1 15 sym.adder
0x004005e0 4 101 loc..annobin_elf_init.c
0x004004e0 1 5 loc..annobin_static_reloc.c
0x004005a5 1 55 main
0x004004a0 1 6 sym.imp.printf
0x00400468 3 27 sym._init
[0x004004b0]>
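Cross-references
To find out where a function is used, run the axt command followed by the function name. Here, adder is called from main, and main itself is referenced from entry0, the entry point where execution starts: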
[0x004004b0]> axt sym.adder
main 0x4005b9 [CALL] call sym.adder
[0x004004b0]>
[0x004004b0]> axt main
entry0 0x4004d1 [DATA] mov rdi, main
[0x004004b0]>
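One way a tool could derive a [CALL] cross-reference like the one above is by decoding the relative offset stored in the call instruction. The sketch below is an assumption about the general technique, not r2's actual implementation; the function name call_target is hypothetical. The instruction bytes and addresses are taken from the disassembly of main shown later in this article.

```python
import struct


def call_target(address: int, raw: bytes) -> int:
    """Resolve the destination of an x86 near call (opcode 0xE8, rel32)."""
    if len(raw) != 5 or raw[0] != 0xE8:
        raise ValueError("expected a 5-byte E8 call instruction")
    (rel32,) = struct.unpack("<i", raw[1:])  # signed little-endian offset
    # The offset is relative to the address of the *next* instruction.
    return address + len(raw) + rel32


if __name__ == "__main__":
    # call at 0x4005b9 in main resolves to sym.adder
    print(hex(call_target(0x4005B9, bytes.fromhex("e8d8ffffff"))))
    # call at 0x4005d0 in main resolves to sym.imp.printf
    print(hex(call_target(0x4005D0, bytes.fromhex("e8cbfeffff"))))
```

The two results match the addresses that afl lists for sym.adder (0x00400596) and sym.imp.printf (0x004004a0).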
Find location
When working with text files, you often move around the file by referring to line numbers, or to row and column numbers; in a binary file, you use addresses instead. These are hexadecimal numbers prefixed with 0x. To find your current position in the binary, run the s command. To move to a different location, use the s command followed by the address.
A function name is like a label that is represented internally by an address. If the function names are present in the binary (that is, it is not stripped), you can use the s command followed by a function name to jump to that function's address. Similarly, if you want to jump to the start of the binary, type s 0:
[0x004004b0]> s
0x4004b0
[0x004004b0]>
[0x004004b0]> s main
[0x004005a5]>
[0x004005a5]> s
0x4005a5
[0x004005a5]>
[0x004005a5]> s sym.adder
[0x00400596]>
[0x00400596]> s
0x400596
[0x00400596]>
[0x00400596]> s 0
[0x00000000]>
[0x00000000]> s
0x0
[0x00000000]>
Hexadecimal view
Raw binary data is often hard to make sense of. It helps to view it in hexadecimal alongside its ASCII representation, which the px command does:
[0x004004b0]> s main
[0x004005a5]>
[0x004005a5]> px
- offset - 0 1 2 3 4 5 6 7 8 9 A B C D E F 0123456789ABCDEF
0x004005a5 5548 89e5 4883 ec10 c745 fc64 0000 008b UH..H....E.d....
0x004005b5 45fc 89c7 e8d8 ffff ff89 45f8 8b45 f889 E.........E..E..
0x004005c5 c6bf 7806 4000 b800 0000 00e8 cbfe ffff ..x.@...........
0x004005d5 b800 0000 00c9 c30f 1f40 00f3 0f1e fa41 .........@.....A
0x004005e5 5749 89d7 4156 4989 f641 5541 89fd 4154 WI..AVI..AUA..AT
0x004005f5 4c8d 2504 0820 0055 488d 2d04 0820 0053 L.%.. .UH.-.. .S
0x00400605 4c29 e548 83ec 08e8 57fe ffff 48c1 fd03 L).H....W...H...
0x00400615 741f 31db 0f1f 8000 0000 004c 89fa 4c89 t.1........L..L.
0x00400625 f644 89ef 41ff 14dc 4883 c301 4839 dd75 .D..A...H...H9.u
0x00400635 ea48 83c4 085b 5d41 5c41 5d41 5e41 5fc3 .H...[]A\A]A^A_.
0x00400645 9066 2e0f 1f84 0000 0000 00f3 0f1e fac3 .f..............
0x00400655 0000 00f3 0f1e fa48 83ec 0848 83c4 08c3 .......H...H....
0x00400665 0000 0001 0002 0000 0000 0000 0000 0000 ................
0x00400675 0000 004e 756d 6265 7220 6e6f 7720 6973 ...Number now is
0x00400685 2020 3a20 2564 0a00 0000 0001 1b03 3b44 : %d........;D
0x00400695 0000 0007 0000 0000 feff ff88 0000 0020 ...............
[0x004005a5]>
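The layout of this view is simple to reproduce: an address, the bytes in two-byte hexadecimal groups, and an ASCII column in which non-printable bytes appear as dots. The following Python sketch imitates that layout; the hexdump function and its parameters are illustrative assumptions, and the exact spacing of real px output may differ.

```python
def hexdump(data: bytes, base: int = 0, width: int = 16) -> str:
    """Format bytes as address, two-byte hex groups, and an ASCII column."""
    # Width of the hex column: two digits per byte plus one space per group gap.
    hex_width = 2 * width + (width + 1) // 2 - 1
    lines = []
    for start in range(0, len(data), width):
        chunk = data[start:start + width]
        digits = chunk.hex()
        # Group the hex digits four at a time (two bytes per group).
        groups = " ".join(digits[i:i + 4] for i in range(0, len(digits), 4))
        # Printable ASCII is shown as-is; everything else becomes a dot.
        text = "".join(chr(b) if 0x20 <= b < 0x7F else "." for b in chunk)
        lines.append(f"0x{base + start:08x}  {groups:<{hex_width}}  {text}")
    return "\n".join(lines)


if __name__ == "__main__":
    # First 16 bytes of main, copied from the px output above.
    sample = bytes.fromhex("554889e54883ec10c745fc640000008b")
    print(hexdump(sample, base=0x4005A5))
```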
Disassembly
If you are using a compiled binary file, you cannot view the source code. The compiler translates the source code into machine language instructions that the CPU can understand and execute; the result is a binary or executable file. However, you can look at the assembly instructions (mnemonics) to understand what the program is doing. For example, if you want to see what the main function is doing, you can use s main to find the address of the main function, and then run the pdf command to see the disassembled instructions.
To understand assembly instructions, you need to refer to the architecture manual (here x86), its application binary interface (ABI, or calling convention), and have a basic understanding of how the stack works:
[0x004004b0]> s main
[0x004005a5]>
[0x004005a5]> s
0x4005a5
[0x004005a5]>
[0x004005a5]> pdf
; DATA XREF from entry0 @ 0x4004d1
┌ 55: int main (int argc, char **argv, char **envp);
│ ; var int64_t var_8h @ rbp-0x8
│ ; var int64_t var_4h @ rbp-0x4
│ 0x004005a5 55 push rbp
│ 0x004005a6 4889e5 mov rbp, rsp
│ 0x004005a9 4883ec10 sub rsp, 0x10
│ 0x004005ad c745fc640000. mov dword [var_4h], 0x64 ; 'd' ; 100
│ 0x004005b4 8b45fc mov eax, dword [var_4h]
│ 0x004005b7 89c7 mov edi, eax
│ 0x004005b9 e8d8ffffff call sym.adder
│ 0x004005be 8945f8 mov dword [var_8h], eax
│ 0x004005c1 8b45f8 mov eax, dword [var_8h]
│ 0x004005c4 89c6 mov esi, eax
│ 0x004005c6 bf78064000 mov edi, str.Number_now_is__:__d ; 0x400678 ; "Number now is : %d\n" ; const char *format
│ 0x004005cb b800000000 mov eax, 0
│ 0x004005d0 e8cbfeffff call sym.imp.printf ; int printf(const char *format)
│ 0x004005d5 b800000000 mov eax, 0
│ 0x004005da c9 leave
└ 0x004005db c3 ret
[0x004005a5]>
Here is the disassembly of the adder function:
[0x004005a5]> s sym.adder
[0x00400596]>
[0x00400596]> s
0x400596
[0x00400596]>
[0x00400596]> pdf
; CALL XREF from main @ 0x4005b9
┌ 15: sym.adder (int64_t arg1);
│ ; var int64_t var_4h @ rbp-0x4
│ ; arg int64_t arg1 @ rdi
│ 0x00400596 55 push rbp
│ 0x00400597 4889e5 mov rbp, rsp
│ 0x0040059a 897dfc mov dword [var_4h], edi ; arg1
│ 0x0040059d 8b45fc mov eax, dword [var_4h]
│ 0x004005a0 83c001 add eax, 1
│ 0x004005a3 5d pop rbp
└ 0x004005a4 c3 ret
[0x00400596]>
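To get a feel for what a disassembler does, it can help to decode a single instruction by hand. The sketch below handles exactly one encoding, the mov dword [rbp+disp8], imm32 form (opcode C7 /0 with ModRM byte 0x45), which is the instruction at 0x004005ad in main. The function name is hypothetical, and a real disassembler covers vastly more cases than this.

```python
import struct


def decode_mov_rbp_imm32(raw: bytes) -> tuple:
    """Decode `mov dword [rbp+disp8], imm32` into (displacement, immediate)."""
    # 0xC7 is the opcode; ModRM 0x45 means: 8-bit displacement, base register rbp.
    if len(raw) != 7 or raw[0] != 0xC7 or raw[1] != 0x45:
        raise ValueError("unsupported encoding for this sketch")
    # disp8 is a signed byte; imm32 is a signed little-endian 32-bit value.
    disp8, imm32 = struct.unpack("<bi", raw[2:])
    return disp8, imm32


if __name__ == "__main__":
    # Bytes at 0x004005ad in main, copied from the hexadecimal view.
    disp, imm = decode_mov_rbp_imm32(bytes.fromhex("c745fc64000000"))
    print(f"mov dword [rbp{disp:+#x}], {imm:#x}")
```

The displacement of -4 corresponds to var_4h @ rbp-0x4 in the listing, and the immediate 0x64 is the value 100 assigned to num1 in the C source.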
Strings
Seeing which strings are present in a binary can be a starting point for binary analysis. Strings are hard-coded into the binary and often provide important hints that help you shift your focus to certain areas of the analysis. Run the iz command within the binary to list all the strings. This test binary has only one hard-coded string:
[0x004004b0]> iz
[Strings]
nth paddr vaddr len size section type string
―――――――――――――――――――――――――――――――――――――――――――――――――――――――
0 0x00000678 0x00400678 20 21 .rodata ascii Number now is : %d\n
[0x004004b0]>
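A common way to find strings is to scan for runs of printable ASCII bytes of some minimum length. The Python sketch below illustrates that idea; it is an assumption about the general technique rather than r2's exact algorithm, and the name find_strings and the default minimum length of 4 are choices made for this example. The input bytes are copied from the hexadecimal view earlier in this article.

```python
import re


def find_strings(data: bytes, base: int = 0, min_len: int = 4) -> list:
    """Return (address, text) pairs for runs of printable ASCII bytes."""
    # Tab, newline, and the range from space to tilde count as printable here.
    pattern = re.compile(rb"[\t\n\x20-\x7e]{%d,}" % min_len)
    return [(base + m.start(), m.group().decode("ascii"))
            for m in pattern.finditer(data)]


if __name__ == "__main__":
    # 32 bytes starting at 0x00400675, copied from the px output.
    rodata = bytes.fromhex(
        "0000004e756d626572206e6f77206973"
        "20203a2025640a00000000011b033b44"
    )
    for address, text in find_strings(rodata, base=0x400675):
        print(hex(address), len(text), repr(text))
```

On these bytes the scan finds a single string at 0x400678 with a length of 20, which agrees with the vaddr and len columns reported by iz.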
Cross-reference string
As with functions, you can cross-reference strings, see where they are printed from, and understand the code surrounding them:
[0x004004b0]> ps @ 0x400678
Number now is : %d
[0x004004b0]>
[0x004004b0]> axt 0x400678
main 0x4005c6 [DATA] mov edi, str.Number_now_is__:__d
[0x004004b0]>
Visual mode
When your code is complex and multiple functions are called, it is easy to get lost. It would be helpful to see which functions were called in a graphical or visual way, and which paths were taken according to certain conditions. After moving to the function of interest, you can use the VV command to explore the visualization mode of r2. For example, for the adder function:
[0x004004b0]> s sym.adder
[0x00400596]>
[0x00400596]> VV
Debugger
So far, you have been doing static analysis: you are just looking at things in the binary without running it. Sometimes you need to execute the binary and inspect various information in memory at runtime. r2's internal debugger lets you run a binary, set breakpoints, examine the values of variables, and dump the contents of registers.
Start the debugger with the -d flag, and add the -A flag to perform analysis as the binary loads. You can set breakpoints at various places, such as functions or memory addresses, with the db <function-name> command. To view existing breakpoints, use the dbi command. Once your breakpoints are in place, use the dc command to start running the binary. You can use the dbt command to view the stack, which shows the chain of function calls. Finally, you can use the dr command to dump the contents of the registers:
$ r2 -d -A ./adder
Process with PID 17453 started...
= attach 17453 17453
bin.baddr 0x00400000
Using 0x400000
asm.bits 64
[x] Analyze all flags starting with sym. and entry0 (aa)
[x] Analyze function calls (aac)
[x] Analyze len bytes of instructions for references (aar)
[x] Check for vtables
[x] Type matching analysis for all functions (aaft)
[x] Propagate noreturn information
[x] Use -AA or aaaa to perform additional experimental analysis.
-- git checkout hamster
[0x7f77b0a28030]>
[0x7f77b0a28030]> db main
[0x7f77b0a28030]>
[0x7f77b0a28030]> db sym.adder
[0x7f77b0a28030]>
[0x7f77b0a28030]> dbi
0 0x004005a5 E:1 T:0
1 0x00400596 E:1 T:0
[0x7f77b0a28030]>
[0x7f77b0a28030]> afl | grep main
0x004005a5 1 55 main
[0x7f77b0a28030]>
[0x7f77b0a28030]> afl | grep sym.adder
0x00400596 1 15 sym.adder
[0x7f77b0a28030]>
[0x7f77b0a28030]> dc
hit breakpoint at: 0x4005a5
[0x004005a5]>
[0x004005a5]> dbt
0 0x4005a5 sp: 0x0 0 [main] main sym.adder+15
1 0x7f77b0687873 sp: 0x7ffe35ff6858 0 [??] section..gnu.build.attributes-1345820597
2 0x7f77b0a36e0a sp: 0x7ffe35ff68e8 144 [??] map.usr_lib64_ld_2.28.so.r_x+65034
[0x004005a5]> dc
hit breakpoint at: 0x400596
[0x00400596]> dbt
0 0x400596 sp: 0x0 0 [sym.adder] rip entry.init0+6
1 0x4005be sp: 0x7ffe35ff6838 0 [main] main+25
2 0x7f77b0687873 sp: 0x7ffe35ff6858 32 [??] section..gnu.build.attributes-1345820597
3 0x7f77b0a36e0a sp: 0x7ffe35ff68e8 144 [??] map.usr_lib64_ld_2.28.so.r_x+65034
[0x00400596]>
[0x00400596]>
[0x00400596]> dr
rax = 0x00000064
rbx = 0x00000000
rcx = 0x7f77b0a21738
rdx = 0x7ffe35ff6948
r8 = 0x7f77b0a22da0
r9 = 0x7f77b0a22da0
r10 = 0x0000000f
r11 = 0x00000002
r12 = 0x004004b0
r13 = 0x7ffe35ff6930
r14 = 0x00000000
r15 = 0x00000000
rsi = 0x7ffe35ff6938
rdi = 0x00000064
rsp = 0x7ffe35ff6838
rbp = 0x7ffe35ff6850
rip = 0x00400596
rflags = 0x00000202
orax = 0xffffffffffffffff
[0x00400596]>
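Most values in the register dump are plain numbers or addresses (rdi holds 0x64, the argument 100 passed to adder), but rflags is a bit field. The following Python sketch shows one way to turn such a value into flag names. The table covers only the commonly inspected x86 flags, and the names FLAG_BITS and decode_rflags are assumptions made for this example.

```python
# Bit positions of commonly inspected x86 status and control flags.
FLAG_BITS = {"CF": 0, "PF": 2, "AF": 4, "ZF": 6, "SF": 7,
             "TF": 8, "IF": 9, "DF": 10, "OF": 11}


def decode_rflags(value: int) -> list:
    """List the names of the flags that are set in an rflags value."""
    return [name for name, bit in FLAG_BITS.items() if (value >> bit) & 1]


if __name__ == "__main__":
    # rflags value copied from the dr output above.
    print(decode_rflags(0x00000202))
```

For the value 0x202 shown above, only the interrupt flag (IF) is reported; bit 1 is also set, but it is a reserved bit that always reads as 1.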
Decompiler
Being able to understand assembly is a prerequisite for binary analysis. Assembly language is always tied to the architecture the binary is built for and is meant to run on. There is never a 1:1 mapping between a line of source code and assembly code; often, a single line of C source produces multiple lines of assembly. Reading assembly code line by line is therefore not the most efficient approach.
This is where decompilers come in. They try to reconstruct plausible source code from the assembly instructions. The result is never exactly the same as the source code used to create the binary; it is an approximation of the source based on the assembly. Also keep in mind that compiler optimizations, which generate different assembly code to make the binary faster, smaller, and so on, make the decompiler's job harder. In addition, malware authors often deliberately obfuscate code to deter malware analysts.
Radare2 provides decompilers through plug-ins. You can install any decompiler that Radare2 supports. Use the r2pm -l command to list the installed plug-ins, and use the r2pm install command to install one, for example the r2dec decompiler:
$ r2pm -l
$
$ r2pm install r2dec
Cloning into 'r2dec'...
remote: Enumerating objects: 100, done.
remote: Counting objects: 100% (100/100), done.
remote: Compressing objects: 100% (97/97), done.
remote: Total 100 (delta 18), reused 27 (delta 1), pack-reused 0
Receiving objects: 100% (100/100), 1.01 MiB | 1.31 MiB/s, done.
Resolving deltas: 100% (18/18), done.
Install Done For r2dec
gmake: Entering directory '/root/.local/share/radare2/r2pm/git/r2dec/p'
[CC] duktape/duktape.o
[CC] duktape/duk_console.o
[CC] core_pdd.o
[CC] core_pdd.so
gmake: Leaving directory '/root/.local/share/radare2/r2pm/git/r2dec/p'
$
$ r2pm -l
r2dec
$
Decompiler view
To decompile a binary, load it in r2 with automatic analysis. In this example, use the s sym.adder command to move to the adder function, then use the pdda command to view the assembly and the decompiled source code side by side. Reading the decompiled source is often easier than reading the assembly line by line:
$ r2 -A ./adder
[x] Analyze all flags starting with sym. and entry0 (aa)
[x] Analyze function calls (aac)
[x] Analyze len bytes of instructions for references (aar)
[x] Check for vtables
[x] Type matching analysis for all functions (aaft)
[x] Propagate noreturn information
[x] Use -AA or aaaa to perform additional experimental analysis.
-- What do you want to debug today?
[0x004004b0]>
[0x004004b0]> s sym.adder
[0x00400596]>
[0x00400596]> s
0x400596
[0x00400596]>
[0x00400596]> pdda
; assembly | /* r2dec pseudo code output */
| /* ./adder @ 0x400596 */
| #include <stdint.h>
|
; (fcn) sym.adder () | int32_t adder (int64_t arg1) {
| int64_t var_4h;
| rdi = arg1;
0x00400596 push rbp |
0x00400597 mov rbp, rsp |
0x0040059a mov dword [rbp - 4], edi | *((rbp - 4)) = edi;
0x0040059d mov eax, dword [rbp - 4] | eax = *((rbp - 4));
0x004005a0 add eax, 1 | eax++;
0x004005a3 pop rbp |
0x004005a4 ret | return eax;
| }
[0x00400596]>
Configuration settings
As you become more comfortable with Radare2, you will want to change its configuration to suit the way you work. You can view r2's current configuration with the e command. To set a specific option, append config = value to the e command:
[0x004005a5]> e | wc -l
593
[0x004005a5]> e | grep syntax
asm.syntax = intel
[0x004005a5]>
[0x004005a5]> e asm.syntax = att
[0x004005a5]>
[0x004005a5]> e | grep syntax
asm.syntax = att
[0x004005a5]>
To make configuration changes permanent, place them in a startup file named .radare2rc, which r2 reads when it starts. This file usually lives in your home directory; if it does not exist, you can create it. Some sample configuration options include:
$ cat ~/.radare2rc
e asm.syntax = att
e scr.utf8 = true
eco solarized
e cmd.stack = true
e stack.size = 256
$
Explore more
You have now seen enough of Radare2's features to find your way around the tool. Because Radare2 follows the Unix philosophy, even though you can do all sorts of things from its main console, it uses a separate set of binaries underneath to do its work.
Explore the standalone binaries listed below to see how they work. For example, the binary information shown in the console by the iI command can also be obtained with the rabin2 -I <binary> command:
$ cd bin/
$
$ ls
prefix r2agent r2pm rabin2 radiff2 ragg2 rarun2 rasm2
r2 r2-indent r2r radare2 rafind2 rahash2 rasign2 rax2
$