Original link: http://mcdermottcybersecurity.com/articles/windows-x64-shellcode

Windows x64 Shellcode

January 11, 2011

Introduction

Shellcode refers to a chunk of executable machine code (along with any associated data) which is executed after being injected into the memory of a process, usually by means of a buffer-overflow type of security vulnerability. The term comes from the fact that in early exploits against Unix platforms, an attacker would typically execute code that would start a command shell listening on a TCP/IP port, to which the attacker could then connect and have full access to the system. For the common web-browser and application exploits on Windows today, the “shellcode” is more likely to download and execute another program than spawn a command shell, but the term remains.

In general, shellcode can be thought of as any code that is capable of being executed from an arbitrary location in memory and without relying on services provided by the operating system loader as with traditional executables. Depending on the exploit, additional requirements for shellcode may include small size and avoiding certain byte patterns in the code. In any case, there are two tasks performed by the loader which shellcode must take care of itself:

Getting the addresses of data elements (such as strings referenced by the code)
Getting the addresses of system API functions used

This article describes a shellcode implementation of the x64 assembly program from my Windows Assembly Languages article (refer to that article for general x64 assembly programming issues such as calling conventions and stack usage). As you’ll see, the main program code doesn’t look much different. Task #1 above actually turns out to be a non-issue on x64 platforms due to a new feature called RIP-relative addressing. Task #2 is what comprises the bulk of the effort. In fact, the code for looking up API functions is significantly larger and more complex than the main program itself. The only other difference between the vanilla and shellcode versions of x64 hello world is that the shellcode does not use a .data section, instead placing the strings in the .code section after main. This is because “sections” are a feature of the executable file format, whereas shellcode needs to be just a single block of code and data.

RIP-Relative Addressing

RIP refers to the instruction pointer register on x64, and RIP-relative addressing means that references to memory addresses being read or written can be encoded as offsets from the currently-executing instruction. This is not a completely new concept, as jmp and call instructions have always supported relative targets on x86, but the ability to read and write memory using relative addressing is new with x64.

On x86, the labels referring to data variables would be replaced with actual hard-coded memory addresses when the program was assembled and linked, under the assumption that the program would be loaded at a specific base address. If at runtime the program needed to load at a different base address, the loader would perform relocation by updating all of those hard-coded addresses. Because shellcode needed to run from anywhere in memory, it needed to determine these addresses dynamically and typically used a trick where the call instruction would push the address just past itself onto the stack as the return address. This “return address” could then be popped off the stack to get a pointer to the string at runtime:

    call skip
    db 'Hello world', 0
skip:
    pop esi      ;esi now points to 'Hello world' string

On x64 we do not need this trick. RIP-relative addressing is not only supported but is in fact the default, so we can simply refer to strings using labels as with ordinary code and it Just Works.
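To make the mechanism concrete, here is a minimal Python sketch of how a RIP-relative operand resolves to an address. It assumes the 7-byte encoding of `lea rdx, [rip+disp32]` (bytes 48 8D 15 followed by a signed 32-bit displacement). The helper name `rip_relative_target` and both load addresses are made up for illustration and are not part of the article's code.

```python
import struct

LEA_RIP_LEN = 7  # REX.W prefix + opcode + ModRM + 4-byte displacement

def rip_relative_target(insn_addr, insn_bytes):
    """Resolve the address referenced by `lea reg, [rip+disp32]` (hypothetical helper)."""
    rex, opcode, modrm = insn_bytes[0], insn_bytes[1], insn_bytes[2]
    # ModRM with mod=00 and rm=101 selects RIP-relative addressing in 64-bit mode.
    if rex != 0x48 or opcode != 0x8D or (modrm & 0xC7) != 0x05:
        raise ValueError("not a RIP-relative lea with a plain REX.W prefix")
    (disp,) = struct.unpack_from("<i", insn_bytes, 3)
    # The displacement counts from the address of the *next* instruction.
    return insn_addr + LEA_RIP_LEN + disp

# lea rdx, [rip+40h]: the data lives 0x40 bytes past the end of the instruction.
code = bytes([0x48, 0x8D, 0x15]) + struct.pack("<i", 0x40)

for load_addr in (0x1000, 0x7FF600001000):  # two arbitrary load addresses
    target = rip_relative_target(load_addr, code)
    print(hex(load_addr), "->", hex(target), "distance", hex(target - load_addr))
```

The same instruction bytes reach the same relative distance at either load address, which is why no relocation or call/pop trick is needed.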

API Lookup Overview

Even the most trivial programs generally need to call various operating system API functions to perform some type of input/output (I/O) – displaying things to the user, accessing files, making network connections, etc. On Windows these API functions are implemented in various system DLLs, and in standard application development these API functions can simply be referred to by name. When the program is compiled and linked, the linker puts information in the resulting executable indicating which functions from which DLLs are required. When the program is run, the loader ensures that the necessary DLLs are loaded and that the addresses of the called functions are resolved.

Windows also provides another facility that can be used by applications to load additional DLLs and look up functions on demand: the LoadLibrary() and GetProcAddress() APIs in kernel32.dll. Not having the benefit of the loader, shellcode needs to use LoadLibrary() and GetProcAddress() for all API functions it uses. This unfortunately presents a Catch-22: How does the shellcode get the addresses of LoadLibrary() and GetProcAddress()?

It turns out that an equivalent to GetProcAddress() can be implemented by traversing the data structures of a loaded DLL in memory. Also, kernel32.dll is always loaded in the address space of every process on Windows, so LoadLibrary() can be found there and used to load other DLLs.

Developing shellcode using this technique requires a solid understanding of the Portable Executable (PE) file format used on Windows for EXE and DLL files, and the next section of this article assumes some familiarity. The following references and tools may be helpful:

Matt Pietrek’s An In-Depth Look into the Win32 Portable Executable File Format: part1 and part2. Note that this only covers 32-bit and not 64-bit PE files, but the differences are very minor – mostly just widening some memory address fields to 64 bits
The official Microsoft Portable Executable and Common Object File Format Specification
Daniel Pistelli’s CFF Explorer is a nice GUI tool for viewing and editing PE files, with 64-bit support
The dumpbin utility included with Visual C++ (including Express Edition) – the most useful switches for our purposes are /headers and /exports
Many of the PE data structures are documented in MSDN under ImageHlp Structures
Definitions of the data structures can be found in winnt.h in the Include directory of the Windows SDK
The dt command in WinDbg is able to display many of these structures

API Lookup Demo

This demonstration of how to find the address of a function in a loaded DLL can be followed by attaching WinDbg to any 64-bit process (I’m using notepad.exe). Note that the particular values seen here may be different on your system.

First we’ll get the address of the Thread Environment Block (TEB), sometimes also referred to as the Thread Information Block (TIB). The TEB contains a large number of fields pertaining to the current thread, and on x64 the fields can be accessed as offsets from the GS segment register during program execution (the FS register was used on x86). In WinDbg, the pseudo register $teb contains the address of the TEB.

0:001> r $teb
$teb=000007fffffdb000
0:001> dt _TEB @$teb
ntdll!_TEB
   +0x000 NtTib            : _NT_TIB
   +0x038 EnvironmentPointer : (null)
   +0x040 ClientId         : _CLIENT_ID
   +0x050 ActiveRpcHandle  : (null)
   +0x058 ThreadLocalStoragePointer : (null)
   +0x060 ProcessEnvironmentBlock : 0x000007ff`fffdd000 _PEB
   +0x068 LastErrorValue   : 0
   [...]

The only field from the TEB we are interested in is the pointer to the Process Environment Block (PEB). Note that WinDbg also has a $peb pseudo-register, but in the shellcode implementation we will have to use the GS register to go through the TEB first.

0:001> dt _PEB 7ff`fffdd000
ntdll!_PEB
   +0x000 InheritedAddressSpace : 0 ''
   +0x001 ReadImageFileExecOptions : 0 ''
   +0x002 BeingDebugged    : 0x1 ''
   +0x003 BitField         : 0x8 ''
   +0x003 ImageUsesLargePages : 0y0
   +0x003 IsProtectedProcess : 0y0
   +0x003 IsLegacyProcess  : 0y0
   +0x003 IsImageDynamicallyRelocated : 0y1
   +0x003 SkipPatchingUser32Forwarders : 0y0
   +0x003 SpareBits        : 0y000
   +0x008 Mutant           : 0xffffffff`ffffffff Void
   +0x010 ImageBaseAddress : 0x00000000`ff8b0000 Void
   +0x018 Ldr              : 0x00000000`779a3640 _PEB_LDR_DATA
   [...]

The PEB contains numerous fields with process-specific data and we are interested in the Ldr field at offset 0x18 which points to a structure of type PEB_LDR_DATA.

0:001> dt _PEB_LDR_DATA 779a3640
ntdll!_PEB_LDR_DATA
   +0x000 Length           : 0x58
   +0x004 Initialized      : 0x1 ''
   +0x008 SsHandle         : (null)
   +0x010 InLoadOrderModuleList: _LIST_ENTRY [ 0x00000000`00373040 - 0x39a3b0 ]
   +0x020 InMemoryOrderModuleList: _LIST_ENTRY [ 0x00000000`00373050 - 0x39a3c0 ]
   +0x030 InInitializationOrderModuleList: _LIST_ENTRY [ 0x00000000`00373150 - 0x39a3d0 ]
   +0x040 EntryInProgress  : (null)
   +0x048 ShutdownInProgress : 0 ''
   +0x050 ShutdownThreadId : (null)

The PEB_LDR_DATA structure contains three linked lists of loaded modules – InLoadOrderModuleList, InMemoryOrderModuleList, and InInitializationOrderModuleList. A module or image refers to any PE file in memory – the main program executable as well as any currently-loaded DLLs. All three lists contain the same elements just in a different order, with the one exception that InInitializationOrderModuleList only contains DLLs and excludes the main executable.

The elements of these lists are of type LDR_DATA_TABLE_ENTRY, though you can’t tell from the previous output because they are only shown as LIST_ENTRY which is the generic linked list header datatype used throughout Windows. A LIST_ENTRY simply consists of a forward and back pointer for creating circular, doubly-linked lists. The address of the _LIST_ENTRY within the _PEB_LDR_DATA structure represents the list head. When traversing the circular list, arriving back at the list head signals that the traversal is complete.
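The pointer chain and the stop-at-the-head rule can be sketched in a few lines of Python. This is only a model: process memory is represented as a dictionary mapping an address to the 64-bit value stored there, the function name `walk_in_load_order` is hypothetical, and the second list entry address is invented. The offsets and the other addresses come from the WinDbg output shown so far, plus the DllBase field at offset 0x30 of each entry.

```python
# Structure offsets on x64.
TEB_PEB = 0x60             # _TEB.ProcessEnvironmentBlock
PEB_LDR = 0x18             # _PEB.Ldr
LDR_IN_LOAD_ORDER = 0x10   # _PEB_LDR_DATA.InLoadOrderModuleList (the list head)
ENTRY_DLL_BASE = 0x30      # _LDR_DATA_TABLE_ENTRY.DllBase

def walk_in_load_order(mem, teb):
    """Return the DllBase of every module; `mem` maps address -> 64-bit value."""
    peb = mem[teb + TEB_PEB]
    ldr = mem[peb + PEB_LDR]
    head = ldr + LDR_IN_LOAD_ORDER
    bases = []
    entry = mem[head]              # head.Flink points at the first entry
    while entry != head:           # circular list: back at the head means done
        bases.append(mem[entry + ENTRY_DLL_BASE])
        entry = mem[entry]         # InLoadOrderLinks sits at offset 0, so Flink is at entry+0
    return bases

TEB, PEB, LDR = 0x7FFFFFDB000, 0x7FFFFFDD000, 0x779A3640
HEAD = LDR + LDR_IN_LOAD_ORDER
FIRST, SECOND = 0x373040, 0x373130   # SECOND is an invented address

mem = {
    TEB + TEB_PEB: PEB,
    PEB + PEB_LDR: LDR,
    HEAD: FIRST,
    FIRST: SECOND, FIRST + ENTRY_DLL_BASE: 0xFF8B0000,   # main executable
    SECOND: HEAD,  SECOND + ENTRY_DLL_BASE: 0x77650000,  # kernel32.dll
}
print([hex(base) for base in walk_in_load_order(mem, TEB)])
```

Because InLoadOrderLinks is the first field of LDR_DATA_TABLE_ENTRY, each Flink value is also the address of the entry itself, so no pointer adjustment is needed when following this particular list.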

0:001> dt _LIST_ENTRY
ntdll!_LIST_ENTRY
   +0x000 Flink            : Ptr64 _LIST_ENTRY
   +0x008 Blink            : Ptr64 _LIST_ENTRY

The !list command provides the ability to traverse these types of lists and execute a specific command for each element in the list (in this case displaying the element as an LDR_DATA_TABLE_ENTRY data structure). WinDbg commands can get nasty-looking sometimes but are quite powerful. Here we display the InLoadOrderModuleList with list head at offset 0x10 from the beginning of the PEB_LDR_DATA structure (very long output truncated to show just part of one element):

0:001> !list -t ntdll!_LIST_ENTRY.Flink -x "dt _LDR_DATA_TABLE_ENTRY @$extret" 779a3640+10
   [...]
ntdll!_LDR_DATA_TABLE_ENTRY
   +0x000 InLoadOrderLinks : _LIST_ENTRY [ 0x00000000`00333620 - 0x333130 ]
   +0x010 InMemoryOrderLinks : _LIST_ENTRY [ 0x00000000`00333630 - 0x333140 ]
   +0x020 InInitializationOrderLinks : _LIST_ENTRY [ 0x00000000`003344e0 - 0x333640 ]
   +0x030 DllBase          : 0x00000000`77650000 Void
   +0x038 EntryPoint       : 0x00000000`7766eff0 Void
   +0x040 SizeOfImage      : 0x11f000
   +0x048 FullDllName      : _UNICODE_STRING "C:\Windows\system32\kernel32.dll"
   +0x058 BaseDllName      : _UNICODE_STRING "kernel32.dll"
   +0x068 Flags            : 0x84004
   [...]

Interesting fields for us within an LDR_DATA_TABLE_ENTRY structure are DllBase at 0x30 and BaseDllName at 0x58. Note that BaseDllName is a UNICODE_STRING, which is an actual data structure and not simply a null-terminated Unicode string. The pointer to the actual string data (the Buffer field) is at offset 0x8 within that structure, for a total of 0x60 from the start of the LDR_DATA_TABLE_ENTRY.

0:001> dt _UNICODE_STRING
ntdll!_UNICODE_STRING
   +0x000 Length           : Uint2B
   +0x002 MaximumLength    : Uint2B
   +0x008 Buffer           : Ptr64 Uint2B
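As a rough illustration of this layout, the following Python sketch unpacks the 16 bytes of an x64 UNICODE_STRING and compares a UTF-16 module name against an uppercase ASCII name by looking only at the low byte of each character. Both function names and the buffer address are hypothetical. The comparison is a simplified model of the idea, not a transcription of any particular implementation, and it only folds the letters a to z.

```python
import struct

def parse_unicode_string(raw):
    """Unpack an x64 _UNICODE_STRING: two 16-bit lengths, 4 bytes of padding, 64-bit Buffer."""
    length, maximum_length, buffer_ptr = struct.unpack_from("<HH4xQ", raw, 0)
    return length, maximum_length, buffer_ptr

def matches_dll_name(wanted_upper, utf16_name):
    """Case-insensitive prefix match of an uppercase ASCII name against UTF-16LE data."""
    for i, wanted in enumerate(wanted_upper):
        if 2 * i >= len(utf16_name):
            return False
        ch = utf16_name[2 * i]        # low byte of the UTF-16LE code unit
        if 0x61 <= ch <= 0x7A:        # 'a'..'z' -> 'A'..'Z'
            ch -= 0x20
        if ch != wanted:
            return False
    return True                       # every byte of the wanted name matched

name = "kernel32.dll".encode("utf-16-le")
# Length and MaximumLength are byte counts; the buffer address is invented.
raw = struct.pack("<HH4xQ", len(name), len(name) + 2, 0x373000)

print(parse_unicode_string(raw))
print(matches_dll_name(b"KERNEL32.DLL", name))
```

Note that the match succeeds as soon as the wanted name is exhausted, so a shorter name such as `KERNEL32` would also match `kernel32.dll`.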

Armed with this knowledge, we now have the ability to obtain the base address of any DLL given its name. Once we have the base address we can traverse the DLL in memory to locate any function exported by the DLL. Also note that the return value of LoadLibrary() is in fact a DLL base address. The base address of a loaded DLL can also be obtained in WinDbg with the lm command. Let’s take a look at kernel32.dll:

0:001> lm m kernel32
start             end                 module name
00000000`77650000 00000000`7776f000   kernel32   (deferred)

An interesting feature of the PE file and loader is that the PE file format in memory is exactly the same as it is on disk, at least as far as the headers. It’s not exactly true that the entire file is read verbatim into memory, because each section is loaded at a certain byte alignment in memory (typically a multiple of 4096, the virtual memory page size) that may be different from where it falls in the file. Also, some sections (like a debug data section) may not be read into memory at all. However, when we look at the DLL base address in memory, we can expect to find what we see at the beginning of any PE file: a DOS “MZ” header. That’s an IMAGE_DOS_HEADER structure to be exact:

0:001> dt _IMAGE_DOS_HEADER 77650000
ntdll!_IMAGE_DOS_HEADER
   +0x000 e_magic          : 0x5a4d
   +0x002 e_cblp           : 0x90
   +0x004 e_cp             : 3
   +0x006 e_crlc           : 0
   +0x008 e_cparhdr        : 4
   +0x00a e_minalloc       : 0
   +0x00c e_maxalloc       : 0xffff
   +0x00e e_ss             : 0
   +0x010 e_sp             : 0xb8
   +0x012 e_csum           : 0
   +0x014 e_ip             : 0
   +0x016 e_cs             : 0
   +0x018 e_lfarlc         : 0x40
   +0x01a e_ovno           : 0
   +0x01c e_res            : [4] 0
   +0x024 e_oemid          : 0
   +0x026 e_oeminfo        : 0
   +0x028 e_res2           : [10] 0
   +0x03c e_lfanew         : 0n224

The e_lfanew field at 0x3c (displayed as a decimal number even though everything else is hex, most likely because the field is declared as a signed LONG) contains the byte offset to the NT header (IMAGE_NT_HEADERS64). Converting 224 to hex gives 0xe0, and adding that to the image base yields the NT header address 0x776500e0. We can use the -r option (recursive) to expand the embedded OptionalHeader field (which is a misnomer as it is required and always present):

0:001> dt -r _IMAGE_NT_HEADERS64 776500e0
ntdll!_IMAGE_NT_HEADERS64
   +0x000 Signature        : 0x4550
   +0x004 FileHeader       : _IMAGE_FILE_HEADER
      +0x000 Machine          : 0x8664
      +0x002 NumberOfSections : 6
      +0x004 TimeDateStamp    : 0x4a5bdfdf
      +0x008 PointerToSymbolTable : 0
      +0x00c NumberOfSymbols  : 0
      +0x010 SizeOfOptionalHeader : 0xf0
      +0x012 Characteristics  : 0x2022
   +0x018 OptionalHeader   : _IMAGE_OPTIONAL_HEADER64
      +0x000 Magic            : 0x20b
      +0x002 MajorLinkerVersion : 0x9 ''
      +0x003 MinorLinkerVersion : 0 ''
      [...]
      +0x068 LoaderFlags      : 0
      +0x06c NumberOfRvaAndSizes : 0x10
      +0x070 DataDirectory    : [16] _IMAGE_DATA_DIRECTORY
      [...]
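The hop from the DOS header to the NT headers can be sketched in Python as follows. The buffer here is synthetic (only the "MZ" magic, e_lfanew, and the "PE" signature are filled in), and the function name `nt_headers_offset` is an assumption made for this example.

```python
import struct

E_LFANEW_OFFSET = 0x3C

def nt_headers_offset(image):
    """Return e_lfanew after checking the MZ and PE signatures (hypothetical helper)."""
    if image[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    (e_lfanew,) = struct.unpack_from("<i", image, E_LFANEW_OFFSET)  # signed 32-bit
    if image[e_lfanew:e_lfanew + 4] != b"PE\0\0":   # 0x00004550 in little-endian
        raise ValueError("missing PE signature")
    return e_lfanew

# Build a synthetic image that mimics the values seen in the debugger.
image = bytearray(0x100)
image[0:2] = b"MZ"                                  # e_magic = 0x5a4d
struct.pack_into("<i", image, E_LFANEW_OFFSET, 224)
image[224:228] = b"PE\0\0"                          # Signature = 0x4550

offset = nt_headers_offset(bytes(image))
print(hex(offset), hex(0x77650000 + offset))
```

Checking both signatures before trusting e_lfanew is a cheap sanity test when the base address comes from an untrusted or hand-computed source.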

The DataDirectory field is located a total of 0x88 bytes from the NT headers (offset 0x70 from OptionalHeader which is 0x18 from the NT headers). This is an array of 16 elements corresponding to the various types of data in a PE file.

0:001> dt -a16c _IMAGE_DATA_DIRECTORY 776500e0+88
ntdll!_IMAGE_DATA_DIRECTORY
[0] @ 0000000077650168 +0x000 VirtualAddress 0xa0020  +0x004 Size 0xac33
[1] @ 0000000077650170 +0x000 VirtualAddress 0xf848c  +0x004 Size 0x1f4
[2] @ 0000000077650178 +0x000 VirtualAddress 0x116000  +0x004 Size 0x520
[3] @ 0000000077650180 +0x000 VirtualAddress 0x10c000  +0x004 Size 0x9810
[4] @ 0000000077650188 +0x000 VirtualAddress 0  +0x004 Size 0
[5] @ 0000000077650190 +0x000 VirtualAddress 0x117000  +0x004 Size 0x7a9c
[6] @ 0000000077650198 +0x000 VirtualAddress 0x9b7dc  +0x004 Size 0x38
[7] @ 00000000776501a0 +0x000 VirtualAddress 0  +0x004 Size 0
[8] @ 00000000776501a8 +0x000 VirtualAddress 0  +0x004 Size 0
[9] @ 00000000776501b0 +0x000 VirtualAddress 0  +0x004 Size 0
[10] @ 00000000776501b8 +0x000 VirtualAddress 0  +0x004 Size 0
[11] @ 00000000776501c0 +0x000 VirtualAddress 0x2d8  +0x004 Size 0x408
[12] @ 00000000776501c8 +0x000 VirtualAddress 0x9c000  +0x004 Size 0x1c70
[13] @ 00000000776501d0 +0x000 VirtualAddress 0  +0x004 Size 0
[14] @ 00000000776501d8 +0x000 VirtualAddress 0  +0x004 Size 0
[15] @ 00000000776501e0 +0x000 VirtualAddress 0  +0x004 Size 0

We are interested in the Export Directory which is the first one in the list having VirtualAddress 0xa0020 and Size 0xac33. See the MSDN documentation of the IMAGE_DATA_DIRECTORY structure for a reference on which type of data goes with each array element.

The VirtualAddress value, more precisely called a Relative Virtual Address (RVA), is an offset from the base load address of the module. RVAs are used extensively in PE files, including for the pointers to the function names and function addresses in the export table. To get the actual memory address pointed to by an RVA, simply add the base address of the module.
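The offset arithmetic and the RVA conversion can be checked with a short Python sketch. The helper names are invented for illustration. The constants are the 64-bit (PE32+) offsets discussed above, and the 8-byte entry is rebuilt from the values in the WinDbg output rather than read from a real process.

```python
import struct

OPTIONAL_HEADER = 0x18   # offset of OptionalHeader inside IMAGE_NT_HEADERS64
DATA_DIRECTORY = 0x70    # offset of DataDirectory inside IMAGE_OPTIONAL_HEADER64
ENTRY_SIZE = 8           # each IMAGE_DATA_DIRECTORY is two DWORDs

def data_directory_entry_address(nt_headers, index):
    """Address of DataDirectory[index] given the address of the NT headers."""
    return nt_headers + OPTIONAL_HEADER + DATA_DIRECTORY + ENTRY_SIZE * index

def rva_to_va(image_base, rva):
    """An RVA becomes a usable pointer once the module base is added."""
    return image_base + rva

IMAGE_BASE = 0x77650000
NT_HEADERS = 0x776500E0

# Entry 0 (the export directory) as it appears in memory: VirtualAddress, Size.
entry0 = struct.pack("<II", 0xA0020, 0xAC33)
export_rva, export_size = struct.unpack("<II", entry0)

print(hex(data_directory_entry_address(NT_HEADERS, 0)))
print(hex(rva_to_va(IMAGE_BASE, export_rva)), hex(export_size))
```

The first printed address should agree with the `[0] @` column of the debugger output, which is a quick way to confirm the 0x88 total offset.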

(For convenience, note that the !dh command can be used to automatically display much of the PE header information we’ve extracted manually so far.)

Given that the Export Directory begins at RVA 0xa0020, we add the base address 0x77650000 and should therefore expect to find an IMAGE_EXPORT_DIRECTORY structure at 0x776f0020. Unfortunately IMAGE_EXPORT_DIRECTORY is not understood by the dt command or documented in MSDN, so we will have to refer to the structure definition in winnt.h:

 
    typedef struct _IMAGE_EXPORT_DIRECTORY {
        DWORD   Characteristics;
        DWORD   TimeDateStamp;
        WORD    MajorVersion;
        WORD    MinorVersion;
        DWORD   Name;
        DWORD   Base;
        DWORD   NumberOfFunctions;
        DWORD   NumberOfNames;
        DWORD   AddressOfFunctions;     // RVA from base of image
        DWORD   AddressOfNames;         // RVA from base of image
        DWORD   AddressOfNameOrdinals;  // RVA from base of image
    } IMAGE_EXPORT_DIRECTORY, *PIMAGE_EXPORT_DIRECTORY;

The best we can do in WinDbg is display the structure as an array of DWORDs and count where things fall using the above structure as a reference.

0:001> dd 776f0020
00000000`776f0020  00000000 4a5bc32c 00000000 000a366c
00000000`776f0030  00000001 0000056a 0000056a 000a0048
00000000`776f0040  000a15f0 000a2b98 000aa10b 000aa12c
[...]
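Counting DWORDs by hand is error-prone, so here is a small Python sketch that decodes the first ten DWORDs of the dump above against the winnt.h layout. The `ExportDirectory` tuple type is an assumption made for this example. Note that MajorVersion and MinorVersion are two WORDs sharing the third DWORD.

```python
import struct
from collections import namedtuple

ExportDirectory = namedtuple(
    "ExportDirectory",
    "Characteristics TimeDateStamp MajorVersion MinorVersion Name Base "
    "NumberOfFunctions NumberOfNames AddressOfFunctions AddressOfNames "
    "AddressOfNameOrdinals",
)
EXPORT_DIRECTORY_FORMAT = "<IIHHIIIIIII"   # 40 bytes, little-endian, no padding

# First ten DWORDs printed by `dd 776f0020`.
dump = [
    0x00000000, 0x4A5BC32C, 0x00000000, 0x000A366C,
    0x00000001, 0x0000056A, 0x0000056A, 0x000A0048,
    0x000A15F0, 0x000A2B98,
]
raw = struct.pack("<10I", *dump)           # rebuild the bytes as they sit in memory
export_dir = ExportDirectory._make(struct.unpack(EXPORT_DIRECTORY_FORMAT, raw))

for field, value in export_dir._asdict().items():
    print(f"{field:22} {value:#x}")
```

The field offsets implied by this format string (NumberOfNames at 0x18, AddressOfFunctions at 0x1c, AddressOfNames at 0x20, AddressOfNameOrdinals at 0x24) are the ones any lookup code has to use.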

Beginning with the 8th DWORD within the structure we will find AddressOfFunctions (0xa0048), followed by AddressOfNames (0xa15f0) and AddressOfNameOrdinals (0xa2b98). These values are RVAs – when we add the DLL base address we will get the memory address of the array. When working with RVAs a lot it can be handy to stash the DLL base address in a pseudo-register because it will be used so frequently. Here is AddressOfNames:

0:001> r $t0=77650000
0:001> dd @$t0+a15f0
00000000`776f15f0  000a3679 000a3691 000a36a6 000a36b5
00000000`776f1600  000a36be 000a36c7 000a36d8 000a36e9
00000000`776f1610  000a370f 000a372e 000a374d 000a375a
[...]

This is an array of RVAs pointing to the function name strings (the size of the array is given by the NumberOfNames field in IMAGE_EXPORT_DIRECTORY). Take a look at the first one (adding DLL base address of course) and we see the name of a function exported from kernel32.dll.

0:001> da @$t0+a3679
00000000`776f3679  "AcquireSRWLockExclusive"

We can ultimately find the address of a function based on the array index of where the name is found in this array. The AddressOfNameOrdinals array is a parallel array to AddressOfNames, which contains the ordinal values associated with each name. An ordinal value is the index which is finally used to look up the function address in the AddressOfFunctions array. (DLLs have the option of exporting functions by ordinal only without even having a function name, and in fact the GetProcAddress() API can be called with a numeric ordinal instead of a string name).

More often than not, the value in each slot of the AddressOfNameOrdinals array has the same value as its array index but this is not guaranteed. Note that AddressOfNameOrdinals is an array of WORDs, not DWORDs. In this case it appears to follow the pattern of each element having the same value as its index.

0:001> dw @$t0+a2b98
00000000`776f2b98  0000 0001 0002 0003 0004 0005 0006 0007
00000000`776f2ba8  0008 0009 000a 000b 000c 000d 000e 000f
00000000`776f2bb8  0010 0011 0012 0013 0014 0015 0016 0017
[...]

Once we have the ordinal number of a function, the ordinal is used as an index into the AddressOfFunctions array:

0:001> dd @$t0+a0048
00000000`776f0048  000aa10b 000aa12c 000044b0 00066b20
00000000`776f0058  00066ac0 0006ad90 0006ae00 0004b7d0
00000000`776f0068  000956e0 0008fbb0 00048cc0 0004b800
[...]

The interpretation of the values in this array depends on whether the function is forwarded. Export Forwarding is a mechanism by which a DLL can declare that an exported function is actually implemented in a different DLL. If the function is not forwarded, the value is an RVA pointing to the actual function code. If the function is forwarded, the RVA points to an ASCII string giving the target DLL and function name. You can tell in advance if a function is forwarded based on the range of the RVA – the function is forwarded if the RVA falls within the export directory (as given by the VirtualAddress and Size in the IMAGE_DATA_DIRECTORY entry).
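The range test is simple enough to express directly. The Python sketch below applies it to the first four AddressOfFunctions values from the dump, using the export directory RVA and size from the data directory. The function name `is_forwarded` is hypothetical.

```python
EXPORT_DIR_RVA = 0xA0020    # DataDirectory[0].VirtualAddress
EXPORT_DIR_SIZE = 0xAC33    # DataDirectory[0].Size

def is_forwarded(function_rva, dir_rva=EXPORT_DIR_RVA, dir_size=EXPORT_DIR_SIZE):
    """True when the RVA points inside the export directory, i.e. at a forwarder string."""
    return dir_rva <= function_rva < dir_rva + dir_size

# First four entries of AddressOfFunctions from the debugger output.
for rva in (0xAA10B, 0xAA12C, 0x44B0, 0x66B20):
    kind = "forwarder string" if is_forwarded(rva) else "code"
    print(f"{rva:#08x} -> {kind}")
```

The upper bound is exclusive: the export directory here covers RVAs 0xa0020 up to but not including 0xaac53.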

You can practically see at a glance which RVAs above are in the vicinity of the export directory addresses we’ve been working with. The first element in the array corresponds to our old friend AcquireSRWLockExclusive which we can see is forwarded to another function in NTDLL:

0:001> da @$t0+aa10b
00000000`776fa10b  "NTDLL.RtlAcquireSRWLockExclusive"
00000000`776fa12b  ""

The third array element, on the other hand, is not forwarded and points directly to the executable code of ActivateActCtx:

0:001> u @$t0+44b0
kernel32!ActivateActCtx:
00000000`776544b0 4883ec28        sub     rsp,28h
00000000`776544b4 4883f9ff        cmp     rcx,0FFFFFFFFFFFFFFFFh
[...]

We now have all of the understanding we need to get the address of a function and it’s just a matter of implementing the above steps in code.
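Before turning to assembly, the steps above can be sketched as a compact Python model. It works on plain lists instead of process memory and searches forward rather than backward, so it illustrates the logic without transcribing any particular implementation. The function names are hypothetical, and the middle export name is an assumption because the debugger output only confirms the first and third entries.

```python
def find_export(names, name_ordinals, functions, wanted, dir_rva, dir_size):
    """Return (rva, forwarded) for `wanted`, or None if the name is not exported."""
    for index, name in enumerate(names):
        if name != wanted:
            continue
        ordinal = name_ordinals[index]    # parallel array of WORDs
        rva = functions[ordinal]          # ordinal indexes AddressOfFunctions
        forwarded = dir_rva <= rva < dir_rva + dir_size
        return rva, forwarded
    return None

def split_forwarder(forwarder):
    """'NTDLL.RtlFoo' -> ('NTDLL.DLL', 'RtlFoo'); the extension is appended for LoadLibrary."""
    dll, _, function = forwarder.partition(".")
    return dll + ".DLL", function

# Miniature export table built from the values seen in WinDbg.
names = ["AcquireSRWLockExclusive", "AcquireSRWLockShared", "ActivateActCtx"]  # 2nd is assumed
name_ordinals = [0, 1, 2]
functions = [0xAA10B, 0xAA12C, 0x44B0]
DIR_RVA, DIR_SIZE = 0xA0020, 0xAC33

print(find_export(names, name_ordinals, functions, "ActivateActCtx", DIR_RVA, DIR_SIZE))
print(find_export(names, name_ordinals, functions, "AcquireSRWLockExclusive", DIR_RVA, DIR_SIZE))
print(split_forwarder("NTDLL.RtlAcquireSRWLockExclusive"))
```

When the result is flagged as forwarded, the lookup has to be repeated against the target DLL with the target function name, loading that DLL first if necessary.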

The Code

Updated 11/10/2011 – thanks to Didier Stevens for pointing out a bug in the error handling.

 
;shell64.asm
;License: MIT (http://www.opensource.org/licenses/mit-license.php)

.code

;note: ExitProcess is forwarded
main proc
    sub rsp, 28h            ;reserve stack space for called functions
    and rsp, 0fffffffffffffff0h     ;make sure stack 16-byte aligned

    lea rdx, loadlib_func
    lea rcx, kernel32_dll
    call lookup_api         ;get address of LoadLibraryA
    mov r15, rax            ;save for later use with forwarded exports

    lea rcx, user32_dll
    call rax                ;load user32.dll

    lea rdx, msgbox_func
    lea rcx, user32_dll
    call lookup_api         ;get address of MessageBoxA

    xor r9, r9              ;MB_OK
    lea r8, title_str       ;caption
    lea rdx, hello_str      ;Hello world
    xor rcx, rcx            ;hWnd (NULL)
    call rax                ;display message box

    lea rdx, exitproc_func
    lea rcx, kernel32_dll
    call lookup_api         ;get address of ExitProcess

    xor rcx, rcx            ;exit code zero
    call rax                ;exit
main endp

kernel32_dll    db  'KERNEL32.DLL', 0
loadlib_func    db  'LoadLibraryA', 0
user32_dll      db  'USER32.DLL', 0
msgbox_func     db  'MessageBoxA', 0
hello_str       db  'Hello world', 0
title_str       db  'Message', 0
exitproc_func   db  'ExitProcess', 0

;look up address of function from DLL export table
;rcx=DLL name string, rdx=function name string
;DLL name must be in uppercase
;r15=address of LoadLibraryA (optional, needed if export is forwarded)
;returns address in rax
;returns 0 if DLL not loaded or exported function not found in DLL
lookup_api  proc
    sub rsp, 28h            ;set up stack frame in case we call loadlibrary

start:
    mov r8, gs:[60h]        ;peb
    mov r8, [r8+18h]        ;peb loader data
    lea r12, [r8+10h]       ;InLoadOrderModuleList (list head) - save for later
    mov r8, [r12]           ;follow _LIST_ENTRY->Flink to first item in list
    cld

for_each_dll:               ;r8 points to current _ldr_data_table_entry
    mov rdi, [r8+60h]       ;UNICODE_STRING at 58h, actual string buffer at 60h
    mov rsi, rcx            ;pointer to dll we're looking for

compare_dll:
    lodsb                   ;load character of our dll name string
    test al, al             ;check for null terminator
    jz found_dll            ;if at the end of our string and all matched so far, found it

    mov ah, [rdi]           ;get character of current dll
    cmp ah, 61h             ;lowercase 'a'
    jl uppercase
    sub ah, 20h             ;convert to uppercase

uppercase:
    cmp ah, al
    jne wrong_dll           ;found a character mismatch - try next dll

    inc rdi                 ;skip to next unicode character
    inc rdi
    jmp compare_dll         ;continue string comparison

wrong_dll:
    mov r8, [r8]            ;move to next _list_entry (following Flink pointer)
    cmp r8, r12             ;see if we're back at the list head (circular list)
    jne for_each_dll

    xor rax, rax            ;DLL not found
    jmp done

found_dll:
    mov rbx, [r8+30h]       ;get dll base addr - points to DOS "MZ" header

    mov r9d, [rbx+3ch]      ;get DOS header e_lfanew field for offset to "PE" header
    add r9, rbx             ;add to base - now r9 points to _image_nt_headers64
    add r9, 88h             ;18h to optional header + 70h to data directories
                            ;r9 now points to _image_data_directory[0] array entry
                            ;which is the export directory

    mov r13d, [r9]          ;get virtual address of export directory
    test r13, r13           ;if zero, module does not have export table
    jnz has_exports

    xor rax, rax            ;no exports - function will not be found in dll
    jmp done

has_exports:
    lea r8, [rbx+r13]       ;add dll base to get actual memory address
                            ;r8 points to _image_export_directory structure (see winnt.h)

    mov r14d, [r9+4]        ;get size of export directory
    add r14, r13            ;add base rva of export directory
                            ;r13 and r14 now contain range of export directory
                            ;will be used later to check if export is forwarded

    mov ecx, [r8+18h]       ;NumberOfNames
    mov r10d, [r8+20h]      ;AddressOfNames (array of RVAs)
    add r10, rbx            ;add dll base

    dec ecx                 ;point to last element in array (searching backwards)
for_each_func:
    lea r9, [r10 + 4*rcx]   ;get current index in names array

    mov edi, [r9]           ;get RVA of name
    add rdi, rbx            ;add base
    mov rsi, rdx            ;pointer to function we're looking for

compare_func:
    cmpsb
    jne wrong_func          ;function name doesn't match

    mov al, [rsi]           ;current character of our function
    test al, al             ;check for null terminator
    jz found_func           ;if at the end of our string and all matched so far, found it

    jmp compare_func        ;continue string comparison

wrong_func:
                            ;note: loop falls through once rcx reaches zero,
                            ;so the name at index 0 is never compared
    loop for_each_func      ;try next function in array

    xor rax, rax            ;function not found in export table
    jmp done

found_func:                 ;ecx is array index where function name found
                            ;r8 points to _image_export_directory structure
    mov r9d, [r8+24h]       ;AddressOfNameOrdinals (rva)
    add r9, rbx             ;add dll base address
    mov cx, [r9+2*rcx]      ;get ordinal value from array of words

    mov r9d, [r8+1ch]       ;AddressOfFunctions (rva)
    add r9, rbx             ;add dll base address
    mov eax, [r9+rcx*4]     ;Get RVA of function using index

    cmp rax, r13            ;see if func rva falls within range of export dir
    jl not_forwarded
    cmp rax, r14            ;if r13 <= func < r14 then forwarded
    jae not_forwarded

    ;forwarded function address points to a string of the form <DLL name>.<function>
    ;note: dll name will be in uppercase
    ;extract the DLL name and add ".DLL"

    lea rsi, [rax+rbx]      ;add base address to rva to get forwarded function name
    lea rdi, [rsp+30h]      ;using register storage space on stack as a work area
    mov r12, rdi            ;save pointer to beginning of string

copy_dll_name:
    movsb
    cmp byte ptr [rsi], 2eh     ;check for '.' (period) character
    jne copy_dll_name

    movsb                               ;also copy period
    mov dword ptr [rdi], 004c4c44h      ;add "DLL" extension and null terminator

    mov rcx, r12            ;r12 points to "<DLL name>.DLL" string on stack
    call r15                ;call LoadLibraryA with target dll

    mov rcx, r12            ;target dll name
    mov rdx, rsi            ;target function name
    jmp start               ;start over with new parameters

not_forwarded:
    add rax, rbx            ;add base addr to rva to get function address
done:
    add rsp, 28h            ;clean up stack
    ret

lookup_api endp

end

Building

In the past I had developed 32-bit shellcode using the free and open-source Netwide Assembler (NASM), but when going through the exercise of learning the 64-bit variety I figured I would try it out with the Microsoft Assembler (MASM) instead. One problem quickly became apparent: MASM offers no way (that I know of) to generate raw binary machine code as opposed to an .exe file! All is not lost, though: the code bytes can be extracted from the .exe file easily enough (but in the future I might go back to NASM).

First build a regular executable (note that no /defaultlib arguments are required – this code does not directly import any functions from DLLs because it looks them up itself):

ml64 shell64.asm /link /entry:main

Then use dumpbin to display the section headers, and take note of the virtual size and the file pointer to raw data for the .text section:

dumpbin /headers shell64.exe

SECTION HEADER #1
   .text name
     1B2 virtual size
    1000 virtual address (0000000140001000 to 00000001400011B1)
     200 size of raw data
     200 file pointer to raw data (00000200 to 000003FF)
   [...]

Converting these numbers to decimal, this means we need to extract 434 (0x1b2) bytes beginning at offset 512 (0x200) in the file. This can be done with a hex editor, or with the following command if you have a Windows version of dd lying around (I’m using Cygwin):

dd if=shell64.exe of=shell64.bin bs=1 count=434 skip=512
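
If you would rather not read the numbers off the dumpbin output by hand, the same two values can be pulled from the PE headers programmatically. The following is a minimal sketch, assuming the whole .exe has already been read into a buffer. The function name find_text_section and the helpers rd16/rd32 are hypothetical; the offsets come from the standard PE/COFF layout (e_lfanew at 0x3c, a 20-byte file header after the "PE" signature, and 40-byte section headers).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read little-endian integers from a byte buffer (no alignment assumptions). */
static uint16_t rd16(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t rd32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * Hypothetical helper: find the .text section in a PE file image.
 * On success returns 0 and stores "file pointer to raw data" in
 * *file_offset and "virtual size" in *virtual_size; returns -1 otherwise.
 */
int find_text_section(const unsigned char *img, size_t len,
                      uint32_t *file_offset, uint32_t *virtual_size)
{
    uint32_t pe;
    uint16_t nsections, opt_size, i;
    size_t table;

    if (len < 0x40 || img[0] != 'M' || img[1] != 'Z')
        return -1;
    pe = rd32(img + 0x3c);                        /* e_lfanew: offset of "PE\0\0" */
    if (pe > len || len - pe < 24 || memcmp(img + pe, "PE\0\0", 4) != 0)
        return -1;

    nsections = rd16(img + pe + 6);               /* NumberOfSections */
    opt_size  = rd16(img + pe + 20);              /* SizeOfOptionalHeader */
    table     = (size_t)pe + 24 + opt_size;       /* first section header */

    for (i = 0; i < nsections; i++) {
        size_t hdr = table + (size_t)i * 40;      /* each header is 40 bytes */
        if (hdr + 40 > len)
            return -1;
        if (memcmp(img + hdr, ".text\0\0\0", 8) == 0) {
            *virtual_size = rd32(img + hdr + 8);  /* VirtualSize */
            *file_offset  = rd32(img + hdr + 20); /* PointerToRawData */
            return 0;
        }
    }
    return -1;
}
```

For shell64.exe as built above, this should yield a file offset of 0x200 and a virtual size of 0x1b2, the same skip and count values passed to dd.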

Now we have a file shell64.bin containing our shellcode. I like to open it in IDA Pro the first time and make sure it looks right.

Testing

The following test program simply loads data from a file into memory and then transfers execution to it. It supports an optional argument -d which will insert a debugger breakpoint prior to calling the shellcode. All of the error-handling code is long and tedious, yes, but debugging shellcode can be difficult enough without having to worry about whether the test program is working correctly. There is also a free tool called testival available for testing shellcode, which supposedly has some nice features but I have not personally tried it.

Note the call to VirtualProtect() to enable execute permission on the allocated memory. This is necessary because the process heap memory is non-executable by default on 64-bit Windows. This is called Data Execution Prevention (DEP) and was designed specifically as a security measure. Without the VirtualProtect() call, the program will crash with an Access Violation on the first instruction of the shellcode (debugging note: the !vprot command in WinDbg can be used to display the memory permissions for a given address). Bypassing DEP involves a technique called Return-Oriented Programming (ROP) which is beyond the scope of this article (see mitigations section at the end).

Also note the use of a compiler intrinsic to insert the debugger breakpoint. Inline assembly language is not allowed by the x64 Visual C++ compiler, so we can no longer write __asm int 3 to trigger a debugger as in x86 and must use the __debugbreak() intrinsic instead (it produces the same int 3 opcode). Take a look through intrin.h; there are numerous such intrinsics available.

 
    //runbin.c
    #include <windows.h>
    #include <stdio.h>
    #include <string.h>
    #include <io.h>
    #include <stdlib.h>
    #include <malloc.h>
    #include <fcntl.h>
    #include <intrin.h>

    typedef void (*FUNCPTR)(void);

    int main(int argc, char **argv)
    {
        FUNCPTR func;
        void *buf;
        int fd, len;
        int debug;
        char *filename;
        DWORD oldProtect;

        //parse arguments: runbin [-d] <filename>
        if (argc == 3 && strlen(argv[1]) == 2 && strncmp(argv[1], "-d", 2) == 0) {
            debug = 1;
            filename = argv[2];
        } else if (argc == 2) {
            debug = 0;
            filename = argv[1];
        } else {
            fprintf(stderr, "usage: runbin [-d] <filename>\n");
            fprintf(stderr, "  -d    insert debugger breakpoint\n");
            return 1;
        }

        //open the shellcode file in binary mode and get its size
        fd = _open(filename, _O_RDONLY | _O_BINARY);
        if (-1 == fd) {
            perror("Error opening file");
            return 1;
        }

        len = _filelength(fd);
        if (-1 == len) {
            perror("Error getting file size");
            return 1;
        }

        buf = malloc(len);
        if (NULL == buf) {
            perror("Error allocating memory");
            return 1;
        }

        //heap memory is non-executable by default (DEP), so enable execute permission
        if (0 == VirtualProtect(buf, len, PAGE_EXECUTE_READWRITE, &oldProtect)) {
            fprintf(stderr, "Error setting memory executable: error code %lu\n",
                    (unsigned long)GetLastError());
            return 1;
        }

        if (len != _read(fd, buf, len)) {
            perror("Error reading from file");
            return 1;
        }
        _close(fd);

        //transfer execution to the loaded bytes, optionally breaking into the debugger first
        func = (FUNCPTR)buf;
        if (debug) {
            __debugbreak();
        }
        func();

        return 0;
    }

Build the test program with:

cl runbin.c

Then test the shellcode as follows:

runbin shell64.bin

If all goes well, the message box should appear.

If you want to step through it in a debugger, add the -d option:

runbin -d shell64.bin

For this to work, a Just-In-Time (JIT) debugger (also known as a postmortem debugger) must be configured on the system. To enable WinDbg as the JIT debugger, run windbg -I from the command line. For more information see Configuring Automatic Debugging.

Comments

This shellcode was written from scratch with the goal of making it easy to understand (as much as shellcode can be anyway) and to demonstrate how everything works. It is not the smallest or most optimized code possible. There are many other published shellcode examples out there, and the Metasploit source code is particularly worth a look (the path is /external/source/shellcode/windows/x64/src/).

Most shellcode does not handle forwarded exports as in this example, because it bloats and complicates the code and can be worked around by determining in advance if the function is forwarded and just writing your code to call the ultimate target instead. (The only catch is that whether an export is forwarded can change between operating system versions or even service packs, so supporting forwarded exports does in fact make the shellcode more portable.)
A common variation on the technique for locating a function is to iterate through the export table computing a “hash” of each function name, and then comparing it to a pre-computed hash value of the name of the function we’re interested in. This has the advantage of making the shellcode smaller, particularly if it uses many API functions with lengthy names, as the code only needs to contain short hash values rather than full strings like “ExitProcess”. The technique also serves to obscure which functions are being called and has even been used by stand-alone malicious executables for this purpose. Metasploit goes even further and computes a single hash that covers both the function name and DLL name.
It is also common practice to “encrypt” or “encode” the shellcode (typically with just a simple XOR type of algorithm rather than true strong encryption), for the purpose of obfuscation and/or avoiding particular byte values in the code (such as zeroes) that could prevent an exploit from working. The encrypted code is then prepended with a “decoder” stub that decrypts and executes the main code.
Most shellcode does not bother with the error handling I put in place to return zero if the DLL or function cannot be found, again because it makes the code larger and is not necessary once everything is tested.
The lookup_api function does not entirely behave itself according to the x64 calling conventions – in particular it does not bother to save and restore all of the registers that are deemed non-volatile. (A function is allowed to modify rax, rcx, rdx, r8, r9, r10, and r11, but should preserve the values of all others). It also makes an assumption that r15 will point to LoadLibraryA if needed for forwarded functions.
Metasploit and others use NASM instead of MASM as the assembler (probably a good call given the aforementioned limitation of MASM for outputting raw binary, also NASM is open source and runs on Linux and other platforms).
Metasploit uses decimal numbers for the various offsets into the data structures whereas I prefer hex (“You might be a geek if…”).
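
To make the hashing variation mentioned above more concrete, here is a minimal sketch in C. It uses a rotate-right-by-13 additive hash, which is one scheme commonly seen in published shellcode; treat it as an illustration only and not as a copy of Metasploit's routine, which also folds in the DLL name. The names name_hash and find_by_hash are hypothetical, and the array of names stands in for the export name table that real shellcode would walk.

```c
#include <stddef.h>
#include <stdint.h>

/* Rotate a 32-bit value right by n bits (0 < n < 32). */
static uint32_t ror32(uint32_t v, unsigned n)
{
    return (v >> n) | (v << (32 - n));
}

/* Illustrative hash: for each character, rotate right 13, then add it. */
uint32_t name_hash(const char *name)
{
    uint32_t h = 0;

    while (*name != '\0') {
        h = ror32(h, 13);
        h += (unsigned char)*name++;
    }
    return h;
}

/*
 * Hypothetical lookup: return the index of the first name whose hash
 * equals 'wanted', or -1 if none matches. The shellcode only has to
 * carry the 4-byte hash rather than the full function name.
 */
int find_by_hash(const char *const *names, size_t count, uint32_t wanted)
{
    size_t i;

    for (i = 0; i < count; i++) {
        if (name_hash(names[i]) == wanted)
            return (int)i;
    }
    return -1;
}
```

In practice the wanted value would be computed ahead of time and embedded as a constant, so a string like "ExitProcess" never appears in the shellcode. The trade-off is that two different names can in principle hash to the same value, so the chosen hash should be checked against the export lists of the target DLLs.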

Mitigations

Unfortunately for exploit developers and fortunately for PC users, the latest versions of Windows employ a variety of effective exploit mitigation technologies. None of these features truly eliminate vulnerabilities but they can make it significantly more difficult to execute arbitrary code via an exploit as opposed to simply crashing the program. For more information on many of these mitigations and techniques for bypassing them, the Corelan exploit writing tutorials are excellent (32-bit centric but still mostly applicable to x64).

Data Execution Prevention (DEP) – This was discussed earlier regarding the VirtualProtect() call in the test program. By default the stack and heap are configured to use non-executable memory pages which trigger an Access Violation if code attempts to execute there. DEP can be bypassed using Return-Oriented Programming (ROP), where snippets of existing executable code on the system are executed in sequence to accomplish a particular task.
Address Space Layout Randomization (ASLR) – Rather than loading DLLs and EXEs at constant base addresses, the operating system randomly varies the load address (at least across reboots, not necessarily between every invocation of a program). ASLR does not prevent shellcode from executing (this example code runs just fine with it), but it makes it more difficult to transfer execution to the shellcode in the first place. It also makes bypassing DEP using ROP much more difficult. There are several approaches to bypassing ASLR, including the use of a secondary information-disclosure vulnerability to obtain the base address of a module.
Stack cookies – Compiler-generated code is inserted before and after functions to detect if the return address on the stack has been overwritten, making it more difficult to exploit stack-based buffer overflow vulnerabilities.
Structured Exception Handler (SEH) overwrite protection – this is not applicable to x64 because exception handlers are not stored on the stack.
Export Address Table Filtering (EAF) – This is a new option released as part of the Enhanced Mitigation Experience Toolkit (EMET) in November 2010. It is designed to block shellcode from looking up API addresses by accessing DLL export tables, and works by setting a hardware breakpoint on memory access to certain data structures. Microsoft acknowledges that it can be easily bypassed but argues that it will break almost all shellcode currently in use today, and that EMET can be updated in response to new attack techniques at much more frequent intervals than new releases of Windows are possible. See this article on bypassing EAF for details.
