9.7. Relocation and Position Independent Code (PIC)

In this section we’ll investigate the difference between position-independent code (referred to as “PIC” from here on) and non-position-independent code, and how each affects relocation.

9.7.1. PIC vs. non-PIC

Much of the complexity in the ELF standard is due to the need to load shared libraries at different locations in a process’s address space. Objects built to be position-independent are specifically meant to be loaded anywhere in the address space. As discussed earlier, the code in ELF files contains relative references to data and relies on the PLT and GOT to resolve symbols at run time. Let’s take a look at position-independent code in more detail, though, because it is an important concept for ELF.

Consider the following source code:

#include <stdio.h>

extern "C" int otherFunction( int val )
{
    return 23 ;
}

int myGlobInt = 12;

int buzz( void )
{
   int intVal ;

   intVal = myGlobInt + otherFunction( 5 ) ;

   return intVal ;
}

int main( )
{
   printf( "buzz: %d\n", buzz() ) ;

   return 0 ;
}

Note that the source code calls buzz() directly, as an argument in the call to printf. How the function is called is not important; what matters is that it is called from within the main function.

This little code snippet contains a few functions and a global variable. Let’s see how the resulting ELF object file differs when it is compiled as PIC or non-PIC. The -fPIC switch tells the g++ compiler to generate position-independent code.

penguin> g++ -c pic.C -o pic_nopic.o
penguin> g++ -fPIC -c pic.C -o pic.o

The first thing worth noting is that the resulting file sizes are different:

penguin> ls -l pic*.o
-rw-r--r--    1 wilding  build       1016 Dec 28 15:14 pic.o
-rw-r--r--    1 wilding  build        924 Dec 28 15:09 pic_nopic.o

The position-independent object file is larger by 92 bytes. But what is different about the actual contents of the files? To find out, we need to look deeper. The first tool we’ll use is nm:

penguin> nm -S pic.o
         U _GLOBAL_OFFSET_TABLE_
0000000a 00000036 T _Z4buzzv
00000040 00000045 T main
00000000 00000004 D myGlobInt
00000000 0000000a T otherFunction
         U printf

penguin> nm -S pic_nopic.o
0000000a 00000021 T _Z4buzzv
0000002c 00000033 T main
00000000 00000004 D myGlobInt
00000000 0000000a T otherFunction
         U printf

The PIC version includes the global offset table as a required (undefined) symbol, whereas the non-PIC version does not. We know that the GOT is used to support relocation, so this makes sense. The other important difference is that the functions have different sizes: both buzz() and main are larger in the PIC object. Let’s take a look at the assembly instructions for function main using the objdump tool (only the output for the function main is shown here):

penguin> objdump -d pic.o
...
00000040 <main>:
  40:   55                     push   %ebp
  41:   89 e5                  mov    %esp,%ebp
  43:   53                     push   %ebx
  44:   83 ec 04               sub    $0x4,%esp
  47:   e8 00 00 00 00         call   4c <main+0xc>
  4c:   5b                     pop    %ebx
  4d:   81 c3 03 00 00 00      add    $0x3,%ebx
  53:   83 e4 f0               and    $0xfffffff0,%esp
  56:   b8 00 00 00 00         mov    $0x0,%eax
  5b:   29 c4                  sub    %eax,%esp
  5d:   83 ec 08               sub    $0x8,%esp
  60:   83 ec 08               sub    $0x8,%esp
  63:   e8 fc ff ff ff         call   64 <main+0x24>
  68:   83 c4 08               add    $0x8,%esp
  6b:   50                     push   %eax
  6c:   8d 83 00 00 00 00      lea    0x0(%ebx),%eax
  72:   50                     push   %eax
  73:   e8 fc ff ff ff         call   74 <main+0x34>
  78:   83 c4 10               add    $0x10,%esp
  7b:   b8 00 00 00 00         mov    $0x0,%eax
  80:   8b 5d fc               mov    0xfffffffc(%ebp),%ebx
  83:   c9                     leave
  84:   c3                     ret

penguin> objdump -d pic_nopic.o
...
0000002c <main>:
  2c:   55                     push   %ebp
  2d:   89 e5                  mov    %esp,%ebp
  2f:   83 ec 08               sub    $0x8,%esp
  32:   83 e4 f0               and    $0xfffffff0,%esp
  35:   b8 00 00 00 00         mov    $0x0,%eax
  3a:   29 c4                  sub    %eax,%esp
  3c:   83 ec 08               sub    $0x8,%esp
  3f:   83 ec 08               sub    $0x8,%esp
  42:   e8 fc ff ff ff         call   43 <main+0x17>
  47:   83 c4 08               add    $0x8,%esp
  4a:   50                     push   %eax
  4b:   68 00 00 00 00         push   $0x0
  50:   e8 fc ff ff ff         call   51 <main+0x25>
  55:   83 c4 10               add    $0x10,%esp
  58:   b8 00 00 00 00         mov    $0x0,%eax
  5d:   c9                     leave
  5e:   c3                     ret

The non-PIC version is certainly smaller, and there is a good reason for this. It is also interesting that neither version makes a direct call to the function buzz(). For that matter, there is no direct call to printf either: in both listings the call instructions carry placeholder operands. The secret here is relocation and how it works with PIC and non-PIC code.

The PIC code first needs to find the global offset table before it can call buzz() through the procedure linkage table: the call/pop/add sequence at 0x47 through 0x4d loads the address of the GOT into %ebx. This is necessary because buzz() could be anywhere in the address space. The non-PIC code, on the other hand, can assume that buzz() will eventually be at a predictable offset from any code that needs it. Well, sort of. There is an exception described later under “Relocation and Linking.” In any case, let’s see how relocation is affected by position-independent code.

9.7.2. Relocation and Position Independent Code

As discussed before, relocation is a mechanism used to change values in a shared library or executable when it is loaded into a process’s address space. Fixing the target addresses of the calls to printf() or buzz() at compile time would be premature because the compiler doesn’t know where these functions will be located at run time.

For simplicity, let’s look at the relocation information for the non-PIC version first:

penguin> readelf -r pic_nopic.o

Relocation section '.rel.text' at offset 0x378 contains 5 entries:
 Offset     Info    Type            Sym.Value  Sym. Name
00000016  00000702 R_386_PC32        00000000   otherFunction
0000001f  00000801 R_386_32          00000000   myGlobInt
00000043  00000902 R_386_PC32        0000000a   _Z4buzzv
0000004c  00000501 R_386_32          00000000   .rodata
00000051  00000b02 R_386_PC32        00000000   printf

The relocations solve the mystery of the missing calls to buzz() and printf() in the previous section on PIC vs. non-PIC. The relocation for buzz() instructs the linker to patch the 32-bit value at offset 0x43 in the .text section so that the call reaches the eventual location of the function buzz(). A quick look at the assembly language at 0x42 makes the purpose of this relocation even clearer:

42:    e8 fc ff ff ff          call    43 <main+0x17>

 

The call instruction currently targets a bogus address, 0x43, which is the location of its own operand; the operand is only a placeholder because it will be relocated later anyway. After the relocation, the 32-bit value at 0x43 will hold the PC-relative displacement to buzz, so the call instruction at 0x42 will be correct. The same mechanism is used for printf at offset 0x51.

Looking at the relocation information for the PIC object reveals some interesting differences:

penguin> readelf -r pic.o

Relocation section '.rel.text' at offset 0x3c8 contains 7 entries:
 Offset     Info    Type         Sym.Value  Sym. Name
0000001a  00000a0a R_386_GOTPC   00000000   _GLOBAL_OFFSET_TABLE_
00000020  00000803 R_386_GOT32   00000000   myGlobInt
0000002a  00000704 R_386_PLT32   00000000   otherFunction
0000004f  00000a0a R_386_GOTPC   00000000   _GLOBAL_OFFSET_TABLE_
00000064  00000904 R_386_PLT32   0000000a   _Z4buzzv
0000006e  00000509 R_386_GOTOFF  00000000   .rodata
00000074  00000c04 R_386_PLT32   00000000   printf

Notice that the relocation entries for the PIC and non-PIC object files have different types for the functions and variables. For function calls, the PIC version uses R_386_PLT32 where the non-PIC version uses R_386_PC32; for data, the PIC version uses R_386_GOT32 and R_386_GOTOFF where the non-PIC version uses R_386_32. PLT32 is a type of relocation used with the procedure linkage table. PC32 is a more primitive, purely PC-relative form of relocation.

There is a measurable performance cost when using position-independent code. A few years ago, a benchmark measured the impact at about 2 to 3%, although the actual percentage will depend on many factors (average size of functions, and so on). Regardless of the performance implications, position-independent code is required, effective, and widely used on Linux.

9.7.3. Relocation and Linking

As discussed earlier in the chapter, linking is the process of matching or binding undefined symbols to defined symbols of the same type and name. Linking can be done when a shared library or executable is created or at run time, although the mechanisms are very different. The relocation entries for object files are processed during the link phase, and relocation entries in executables and shared libraries are processed at run time.

When creating an executable or shared library, the linker (usually called “ld”) will try to resolve undefined function symbols using the defined function symbols found in the constituent object files. This is where the main symbol table is used. Static functions are referenced through relative addressing, as are global functions. The main difference is that static functions will not be included in the dynamic symbol table.

Let’s take a look at how the linker processes the relocation entries for pic.o. For quick reference, here are the relocation entries for pic.o from before:

penguin> readelf -r pic.o

Relocation section '.rel.text' at offset 0x3c8 contains 7 entries:
 Offset     Info    Type         Sym.Value  Sym. Name
0000001a  00000a0a R_386_GOTPC   00000000   _GLOBAL_OFFSET_TABLE_
00000020  00000803 R_386_GOT32   00000000   myGlobInt
0000002a  00000704 R_386_PLT32   00000000   otherFunction
0000004f  00000a0a R_386_GOTPC   00000000   _GLOBAL_OFFSET_TABLE_
00000064  00000904 R_386_PLT32   0000000a   _Z4buzzv
0000006e  00000509 R_386_GOTOFF  00000000   .rodata
00000074  00000c04 R_386_PLT32   00000000   printf

Each of the function symbols, including the ones that could be satisfied locally by the function symbols in pic.o, has a relocation entry. The global variable myGlobInt also has a relocation entry. Let’s see what happens when the linker links the object file pic.o and creates an executable.

Note: The linker ld is called by g++. It is usually not a good idea to link an executable by calling ld directly. We could get away with it here because the source code does not use any C++ features. We will use g++ here, as we have for the entire chapter, because it allows us to show how ELF handles basic C++ features.

penguin> g++ -o pic pic.o
penguin> readelf -r pic

Relocation section '.rel.dyn' at offset 0x28c contains 2 entries:
 Offset     Info    Type            Sym.Value  Sym. Name
080495d4  00000106 R_386_GLOB_DAT    080494bc   myGlobInt
080495d8  00000606 R_386_GLOB_DAT    00000000   __gmon_start__

Relocation section '.rel.plt' at offset 0x29c contains 2 entries:
 Offset     Info    Type            Sym.Value  Sym. Name
080495cc  00000207 R_386_JUMP_SLOT   080482d4   __libc_start_main
080495d0  00000307 R_386_JUMP_SLOT   080482e4   printf

There are a few differences. The pic.o object file had one relocation section called .rel.text, whereas the executable “pic” contains two relocation sections called .rel.dyn and .rel.plt. The relocation for function buzz() is also missing from the executable. This is because the linker satisfied the reference with the function buzz() defined in the object file.

To see what these relocation entries really do, we need to find out which sections they belong to:

penguin> readelf -S pic |egrep "got|plt"
  [ 9] .rel.plt   REL      0804829c 00029c 000010 08  A  4  b  4
  [11] .plt       PROGBITS 080482c4 0002c4 000030 04 AX  0  0  4
  [21] .got       PROGBITS 080495c0 0005c0 00001c 04 WA  0  0  4

The GLOB_DAT entries have offsets of 0x80495d4 and 0x80495d8, both of which are in the global offset table. The purpose of these entries is to have the run time linker store the address of each entry’s symbol in the corresponding slot of the global offset table. The executable code expects the address to be there once the program is loaded. The JUMP_SLOT relocation entries have offsets of 0x80495cc and 0x80495d0, and both of these are also in the GOT. These entries tell the run time linker which GOT slots belong to the PLT entries for the same symbols, so that it can fill them in. This is required for dynamic linking and, in particular, lazy binding. See section “.plt” for more information.

If we link this object file as a shared library, the relocation entries are very different:

penguin> g++ -shared pic.o -o libpic.so
penguin> readelf -r libpic.so

Relocation section '.rel.dyn' at offset 0x5d0 contains 7 entries:
 Offset     Info    Type            Sym.Value  Sym. Name
00001888  00000008 R_386_RELATIVE
0000188c  00000008 R_386_RELATIVE
000019a4  00000008 R_386_RELATIVE
000019a8  00001d06 R_386_GLOB_DAT    00001890   myGlobInt
000019ac  00002206 R_386_GLOB_DAT    00000000   __cxa_finalize
000019b0  00002706 R_386_GLOB_DAT    00000000   _Jv_RegisterClasses
000019b4  00002806 R_386_GLOB_DAT    00000000   __gmon_start__

Relocation section '.rel.plt' at offset 0x608 contains 5 entries:
 Offset     Info    Type            Sym.Value  Sym. Name
00001990  00001a07 R_386_JUMP_SLOT   0000079e   _Z4buzzv
00001994  00002007 R_386_JUMP_SLOT   00000000   printf
00001998  00002207 R_386_JUMP_SLOT   00000000   __cxa_finalize
0000199c  00002307 R_386_JUMP_SLOT   00000794   otherFunction
000019a0  00002707 R_386_JUMP_SLOT   00000000   _Jv_RegisterClasses

There are quite a few more relocation entries for the shared library than for the executable. They include one for the function buzz() because the reference to buzz() might not be satisfied by the buzz() contained in the shared library. See the section “Symbol Resolution” for more details on how to force a shared library to use the symbols that it contains (“symbolic linking”).

What if we try to create a shared library with the non-PIC object file?

penguin> g++ -shared pic_nopic.o -o libnopic.so
penguin> readelf -r libnopic.so

Relocation section '.rel.dyn' at offset 0x5d0 contains 11 entries:
 Offset     Info    Type            Sym.Value  Sym. Name
000007b0  00000008 R_386_RELATIVE
00001838  00000008 R_386_RELATIVE
0000183c  00000008 R_386_RELATIVE
00001950  00000008 R_386_RELATIVE
0000077a  00002302 R_386_PC32        00000764   otherFunction
00000783  00001d01 R_386_32          00001840   myGlobInt
000007a7  00001a02 R_386_PC32        0000076e   _Z4buzzv
000007b5  00002002 R_386_PC32        00000000   printf
00001954  00002206 R_386_GLOB_DAT    00000000   __cxa_finalize
00001958  00002706 R_386_GLOB_DAT    00000000   _Jv_RegisterClasses
0000195c  00002806 R_386_GLOB_DAT    00000000   __gmon_start__

Relocation section '.rel.plt' at offset 0x628 contains 2 entries:
 Offset     Info    Type            Sym.Value  Sym. Name
00001948  00002207 R_386_JUMP_SLOT   00000000   __cxa_finalize
0000194c  00002707 R_386_JUMP_SLOT   00000000   _Jv_RegisterClasses

The shared library is created, although the relocations are very different. The relocation type for the functions is PC32, and these relocations modify the executable code itself. How does this work, though? The text segment is normally loaded read-only, and yet these relocations apparently change some values in the text segment. For a better understanding of this special type of relocation, let’s build libfoo.so without using the -fPIC switch and then run the executable foo, seeing how the run time linker modifies the text segment of libfoo.so.

penguin> g++ -c foo.C
penguin> g++ -shared foo.o -o libfoo.so
penguin> g++ -o foo main.o -L. -Wl,-rpath,. -lfoo

Now let’s use strace to see how this works under the covers:

penguin> strace -o foo.st foo
This is a printf format string in baz
This is a printf format string in main

penguin> less foo.st
<...>
open("./libfoo.so", O_RDONLY)          = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\360\7\0"..., 1024) = 1024
fstat64(3, {st_mode=S_IFREG|0755, st_size=7113, ...}) = 0
getcwd("/home/wilding/src/Linuxbook/ELF", 128) = 32
mmap2(NULL, 7412, PROT_READ|PROT_EXEC, MAP_PRIVATE, 3, 0) = 0x40014000
mprotect(0x40015000, 3316, PROT_NONE)   = 0
mmap2(0x40015000, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED, 3, 0) = 0x40015000
close(3)                                = 0
<...>
mprotect(0x40014000, 4096, PROT_READ|PROT_WRITE) = 0
mprotect(0x40014000, 4096, PROT_READ|PROT_EXEC) = 0
<...>

The first part of the output shows where libfoo.so is loaded into the address space. It is loaded at address 0x40014000, as shown by the mmap2 system call. Later in the run, the run time linker uses mprotect to change the protection of part of the text segment to read/write and then back to read/exec. The relocations on the text segment take place between these two system calls. These two calls to mprotect are not needed for a library built from a position-independent object because its relocations act on the GOT, which is in the writable data segment.

Even though this special type of relocation works, it is not used much in practice. Non-PIC code is not meant to be position-independent, and forcing it to be part of a shared library is nonstandard and not recommended.

 
