CSAPP: Linking

Copyright statement: This article is reprinted by the blogger, following the CC 4.0 BY-SA copyright agreement, and the link to the source of the original text and this statement are attached.
Original link: https://blog.csdn.net/weixin_44813883/article/details/102856987

What is linking?

Linking is the process of collecting and combining various pieces of code and data into a single file that can be loaded (copied) into memory and executed. Linking can be performed at compile time, at load time, or at run time. On early computer systems linking was performed manually; on modern systems it is performed automatically by a program called a linker.


  • Preprocessing: handles the preprocessor directives, which begin with #.

    • Delete each #define and expand the macros it defines; process all conditional directives such as #if, #ifdef, #else, and #endif; insert the contents of header files at each #include, recursively if the headers include other headers; delete all comments; add line-number and file-name markers; keep all #pragma directives for the compiler.
    • The result is the preprocessed file (hello.i). It is still a readable text file but no longer contains any macro definitions.

  • Compilation: translates the preprocessed source into assembly code. The compiler performs lexical, syntactic, and semantic analysis on the preprocessed file and generates an assembly file (hello.s). The assembly file is still readable text, but the CPU cannot execute it directly.

  • Assembly:

    • Assembly language uses mnemonics for opcodes, symbols for locations, and names for registers.
    • An assembly file consists of assembly instructions; the assembler (as) converts it into a sequence of machine instructions.
    • Assembly instructions and machine instructions correspond to each other and are both machine-level instructions; programs made of them are called machine-level code.
    • The result of assembly is a relocatable object file (hello.o), which contains binary code that is not human-readable.

  • Linking: combines multiple relocatable object files into one executable object file.

    • Steps: determine which definition each symbol reference refers to (symbol resolution); merge the related .o files; determine the address of each symbol; fill the new addresses into the instructions.
    • Advantages: modularity (a program can be split into multiple source files) and efficiency (the files can be compiled separately).

The steps with gcc:
1. Preprocessing: hello.c becomes hello.i (gcc -E or cpp)
2. Compilation: hello.i becomes hello.s (gcc -Og -S)
3. Assembly: hello.s becomes hello.o (gcc -c or as)
4. Linking: hello.o plus the required static libraries becomes the executable hello (gcc or ld)

What is the essence of linking?

The essence of linking is to merge sections of the same type from the input object files. Symbol resolution must be completed before the sections are merged.

Symbol resolution
  • Symbols are divided into strong symbols and weak symbols
    • Strong symbols: functions and initialized global variables
    • Weak symbols: uninitialized global variables
    • A declaration such as extern int g; or a function prototype defines nothing; it only refers to an external symbol defined in another module
  • The Linux linker uses the following rules to handle symbols that are defined more than once:
    • Rule 1: Multiple strong symbols with the same name are not allowed;
    • Rule 2: If there is one strong symbol and multiple weak symbols with the same name, choose the strong symbol;
    • Rule 3: If there are multiple weak symbols with the same name, choose any one of them.
  • In short: a strong symbol may be defined only once; otherwise the link fails with an error.

With these rules in mind, let's see how they play out on Linux.

// mismatch-main.c
#include <stdio.h>

long int x;  /* Weak symbol */

int main(int argc, char *argv[]) {
    printf("%ld\n", x);
    return 0;
}

// File mismatch-variable.c
/* Global strong symbol */
double x = 3.14;
The first file, mismatch-main.c, declares an uninitialized long int x, which is a weak symbol. The second file, mismatch-variable.c, defines a strong symbol x of type double, initialized to 3.14.
On Linux, build the program with gcc -Wall -Og -o mismatch mismatch-main.c mismatch-variable.c (GCC 10 and later default to -fno-common and reject the duplicate definition of x, so add -fcommon there to reproduce this behavior).

This produces the executable object file mismatch. Running ./mismatch prints 4614253070214989087.
So why this result?
By Rule 2 the linker picks the strong symbol, so x refers to the 8 bytes that hold the double 3.14, stored according to the IEEE 754 standard. main reads those same 8 bytes as a long, so it prints the bit pattern of 3.14 (0x40091EB851EB851F) as an integer.
In the command gcc -Wall -Og -o mismatch mismatch-main.c mismatch-variable.c

  • -Wall enables most of gcc's commonly useful warnings
  • -Og enables optimizations that do not interfere with debugging, so the generated code stays close to the structure of the source
  • -o sets the output file name; without this option the executable is named a.out by default
  • The same form of command can also link .o files into an executable object file.

// global.h
extern int g;
int f();

// global-c1.c
#include "global.h"

int f() {
    return g + 1;
}

// global-c2.c
#include <stdio.h>
#include <stdlib.h>
#include "global.h"

int g = 0;

int main(int argc, char *argv[]) {
    if (argc >= 2) {
        g = atoi(argv[1]);
    }
    printf("g = %d.  f() = %d\n", g, f());
    return 0;
}

On Linux, enter gcc -Wall -Og -o global global-c1.c global-c2.c (global.h is pulled in by #include and does not need to appear on the command line), then enter ./global. With no arguments the program prints g = 0.  f() = 1.

Here g is defined exactly once, as an int initialized to 0 in global-c2.c (a strong symbol), and global.h only declares it with extern, so symbol resolution is unambiguous and the result is as expected.

Linking with static libraries

All compilation systems provide a mechanism to package related object modules into a single file called a static library, which can then be used as input to the linker.
Library modules exist so that commonly used functions such as printf, scanf, and sqrt do not have to be written by each programmer; they only need to be called from the library.
On Linux, static libraries are stored on disk in a special archive format. An archive is a collection of concatenated relocatable object files, with a header that describes the size and location of each member object file. Archive file names carry the suffix .a. Static libraries extend what the linker can do: it resolves otherwise undefined symbols by searching one or more library files for their definitions.

To create a static library we use the ar tool.
ar (the archive program) packages the specified .o files into a static library file.
Let's look at an example.

/* addvec.c */
/* $begin addvec */
int addcnt = 0;

void addvec(int *x, int *y,
            int *z, int n)
{
    int i;

    addcnt++;

    for (i = 0; i < n; i++)
        z[i] = x[i] + y[i];
}
/* $end addvec */

/* multvec.c */
/* $begin multvec */
int multcnt = 0;

void multvec(int *x, int *y,
             int *z, int n)
{
    int i;

    multcnt++;

    for (i = 0; i < n; i++)
        z[i] = x[i] * y[i];
}
/* $end multvec */

Package these two files into the static library libvector.a with

  • gcc -c addvec.c multvec.c
  • ar rcs libvector.a addvec.o multvec.o

/* main2.c */
/* $begin main2 */
#include <stdio.h>
#include "vector.h"

int x[2] = {1, 2};
int y[2] = {3, 4};
int z[2];

int main()
{
    addvec(x, y, z, 2);
    printf("z = [%d %d]\n", z[0], z[1]);
    return 0;
}
/* $end main2 */

main2.c includes the header file vector.h, which declares the prototypes of the routines in libvector.a.
Next, enter

  • gcc -c main2.c
  • gcc -static -o prog2c main2.o ./libvector.a
  • ./prog2c

The program prints z = [4 6].
During linking, the linker copies into the executable only the archive members that the program actually references: addvec.o is pulled from libvector.a, while multvec.o is left out because main2.c never calls multvec. printf is resolved the same way from the C library.

Relocation

After the linker completes symbol resolution, it can perform relocation. Relocation consists of two steps:

  • Relocating sections and symbol definitions: the linker merges all sections of the same type into a new aggregate section of the same type and assigns run-time addresses to the new sections and to each symbol defined in them.
  • Relocating symbol references within sections: the linker modifies every reference to a symbol in the code and data sections so that it points to the correct run-time address. To perform this step, the linker relies on data structures in the relocatable object modules called relocation entries.

Relocatable object files

  • .text: the machine code of the compiled program
  • .rodata: read-only data, such as the format strings in printf statements and the jump tables of switch statements
  • .data: initialized global and static C variables (local C variables live on the stack at run time and appear in no section)
  • .bss: uninitialized global and static C variables, plus any global or static variables initialized to 0
  • .symtab: the symbol table, which stores information about the functions and global variables defined and referenced in the program

Next we look at the symbols in the following code (symbols.c):

#include <stdio.h>

int time;

int foo(int a) {
    int b = a + 1;
    return b;
}

int main(int argc, char *argv[])
{
    printf("%d\n", foo(5));
    return 0;
}

Compile the file to symbols.o and enter readelf -s symbols.o in the Linux environment to list its symbol table. From the output:

  • foo and main are GLOBAL symbols of type FUNC. Both are functions, so they are strong symbols.
    The Size of 4 shown for foo is the length of its machine code in bytes, not the size of its int return type.
  • printf has type NOTYPE and GLOBAL binding. Its section index is UND (undefined), because it is referenced in this module but defined elsewhere.
  • time also occupies 4 bytes. It is an int variable (type OBJECT) with GLOBAL binding, and its section index is COM (COMMON), which means it is an uninitialized global variable that has not been allocated yet.
  • The table also contains a number of LOCAL symbols, which are visible only inside this module.

objdump -dx main.o

  • -d disassembles the code sections
  • -x displays all available header information, including the symbol table and relocation entries

The output shows that main.o is a relocatable file rather than an executable object file, so its start address is 0. In the Sections part, each section name has a corresponding row, in which File off gives the offset of the section within the ELF file, not a run-time address, and Size gives the size of the section. For example, the .data section is at offset 0x00000054 and has size 0x00000008; 0x54 + 0x8 = 0x5c, which is the offset of the .bss section.

objdump -dx -j .data main.o

  • -j name displays information only for the section named name

With this option only the .data section of main.o is shown, ending with the dump of the global array stored there. Because the array is initialized and the machine is little-endian, each value appears least-significant byte first: 00000000 <array>: 0: 01 00 00 00 02 00 00 00

I'm honestly not very good at this. The assignment took me a month to finish. When I first studied the material I really didn't understand it, and I had no idea how to run the teacher's code on Linux. Over the past two days I read many write-ups by other students, and after a lot of pondering and practice I finally finished. Writing all this down gave me a better understanding of the material; after all, I have been immersed in it for two days. In short, I still have a long way to go and need to keep working hard!


Much of the information in this article comes from "Computer Systems: A Programmer's Perspective" (known in Chinese as "In-depth Understanding of Computer Systems"). The first two pictures are from the courseware of teacher Yuan Chunfeng of Nanjing University. One picture is from the PPT given by the teacher. Part of the content references:
https://blog.csdn.net/ziyonghong/article/details/101560077
https://blog.csdn.net/angranxueer/article/details/102236976

Addendum
For the following hello.c file:

#include <stdio.h>

int main()
{
    printf("Hello World!");
    return 0;
}

  • readelf -S hello.o displays the section headers
  • readelf -h hello.o displays the ELF file header
  • readelf -s hello.o displays the symbol table
Origin blog.csdn.net/xiongdan626/article/details/103424797