Programmer's Self-cultivation Notes: Chapter 11

Chapter 11 Runtime Library
1. A typical program runs through the following steps:
After the operating system creates the process, it hands control to the program's entry, which is called the entry function or entry point. This entry is usually a function in the runtime library.
The entry function initializes the runtime library and the program's running environment, including the heap, I/O, threads, and global variable construction.
After initialization completes, the entry function calls main, which formally executes the main part of the program.
When main finishes, it returns to the entry function. The entry function performs cleanup, including global variable destruction, heap destruction, and closing I/O, and then makes a system call (_exit) to end the process.
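The four steps above can be sketched as a small model in C. This is only an illustration of the ordering, not how any real runtime library is written: model_entry, init_runtime, cleanup_runtime, user_main and the trace buffer are all made-up names, and a real entry function would end with the exit system call instead of returning.

```c
#include <string.h>

/* Hypothetical trace buffer so the order of the steps can be inspected. */
static char trace[64];

static void step(const char *name) {
    strcat(trace, name);
    strcat(trace, ";");
}

/* Stand-ins for the work a real runtime library does (made-up names). */
static void init_runtime(void)    { step("init"); }
static void cleanup_runtime(void) { step("cleanup"); }

/* Stand-in for the user's main function. */
static int user_main(int argc, char **argv) {
    (void)argc;
    (void)argv;
    step("main");
    return 7;
}

/* Model of an entry function: initialize, call main, clean up, then hand
   back the status that a real entry function would pass to _exit. */
int model_entry(int argc, char **argv) {
    trace[0] = '\0';
    init_runtime();
    int status = user_main(argc, argv);
    cleanup_runtime();
    return status;
}

/* Returns the recorded order of steps, e.g. for printing. */
const char *model_trace(void) { return trace; }
```

Calling model_entry and then printing model_trace() shows the initialization, main, and cleanup steps in that order.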

2. glibc entry function
The program entry point of glibc is _start (this entry is specified by the default linker script of ld, and you can set your own entry through the relevant options). _start is implemented in assembly and is platform dependent.
The following describes the i386 implementation of _start.
Under Linux, environment variables and program arguments are pushed onto the stack by the loader.
_start sets ebp to 0, retrieves argc and argv, and calls __libc_start_main with the following parameters:
main function address,
argc,
argv,
init (initialization work before main call),
fini (finishing work after main),
rtld_fini (finishing work related to dynamic loading),
stack_end (i.e. the bottom of the stack)
__libc_start_main sets the environment variable pointer and calls a series of functions, for example to check the operating system version. The one worth noting is __cxa_atexit, which registers a function to be called after main ends. The fini and rtld_fini parameters are registered this way, so both are called after main ends. Finally it executes:
result = main(argc,argv,__environ);
exit(result).
The exit function traverses the linked list (__exit_funcs) of functions registered through __cxa_atexit and atexit and calls each of them. Finally it calls _exit (implemented in assembly, platform dependent), which makes the exit system call. So whether the program calls exit explicitly or returns from main normally, it passes through the exit function.
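A minimal model of this register-then-traverse scheme is sketched below. It is an assumption-laden simplification, not glibc's code: a fixed array stands in for the __exit_funcs linked list, and model_atexit, model_run_exit_funcs and the two handlers are hypothetical names. The reverse calling order matches what the C standard requires of atexit.

```c
#include <stddef.h>

#define MAX_EXIT_FUNCS 32

typedef void (*exit_fn)(void);

/* Fixed array standing in for the list of registered exit functions. */
static exit_fn exit_funcs[MAX_EXIT_FUNCS];
static size_t exit_func_count;

/* Register a function to run at "exit". Returns 0 on success, -1 if full. */
int model_atexit(exit_fn fn) {
    if (exit_func_count == MAX_EXIT_FUNCS) {
        return -1;
    }
    exit_funcs[exit_func_count++] = fn;
    return 0;
}

/* Call the registered functions in reverse order of registration. */
void model_run_exit_funcs(void) {
    while (exit_func_count > 0) {
        exit_funcs[--exit_func_count]();
    }
}

/* Two sample handlers that record the order in which they run. */
char exit_order[8];
size_t exit_order_len;
void handler_a(void) { exit_order[exit_order_len++] = 'a'; }
void handler_b(void) { exit_order[exit_order_len++] = 'b'; }
```

Registering handler_a and then handler_b and calling model_run_exit_funcs runs handler_b first, which is why a function registered late (such as a destructor for an object constructed late) is called early.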
An hlt instruction follows both _start and _exit.
The hlt after _exit guards against the exit system call failing: if the call fails and the program does not stop, the hlt instruction forces it to stop.
The hlt after _start guards against exit never being called (for example, if the exit at the end of __libc_start_main were accidentally deleted).

3. Details of the runtime library under Windows
The default entry function of MSVC's CRT is named mainCRTStartup (crt0.c in VS 2003).
The process is as follows:
1. Initialize global variables related to the OS version
2. Initialize the heap.
3. Initialize I/O.
4. Get command line parameters and environment variables.
5. Initialize some data in the C library.
6. Call main and record the return value.
7. Check for errors and return the main return value.

4. MSVC I/O initialization
Linux and Windows both have a concept similar to FILE for file operations, called file descriptors and handles respectively.
In C, files are manipulated through the FILE structure, and the FILE structure must contain an fd.
MSVC entry function initialization includes heap and I/O initialization.
Under 32-bit compilation, MSVC heap initialization only calls the HeapCreate API.
I/O initialization is relatively complicated.
The open file table in user space is represented by ioinfo. It is a two-dimensional array simulated with an array of pointers (ioinfo __pioinfo[64][32]) and can hold 2048 handles in total. An array of pointers is used because it saves space: the 2048 ioinfo entries do not have to be allocated all at once.
The _file field of MSVC's FILE is the fd, which indexes the open file table.
First, one second-dimension block of the open file table is initialized.
Then the handles inherited from the parent process are obtained through the GetStartupInfo API and copied into the current open file table (further second-dimension blocks may be allocated along the way).
Then standard input and output are initialized.
In summary, MSVC I/O initialization does the following:
Create the open file table.
If handles are inherited from the parent process, copy them into the open file table.
Initialize standard input and output.
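The space-saving idea behind the 64 x 32 table can be sketched in portable C. This is a simplified model rather than MSVC's code: model_ioinfo, model_pioinfo, model_get_ioinfo and model_block_allocated are hypothetical names, and the real ioinfo structure has more fields.

```c
#include <stdlib.h>

#define IOINFO_ARRAYS     64  /* first dimension: number of blocks */
#define IOINFO_ARRAY_ELTS 32  /* second dimension: entries per block */

/* Hypothetical, reduced entry of the open file table. */
typedef struct {
    long os_handle;  /* stand-in for the operating system handle */
    int in_use;
} model_ioinfo;

/* Array of pointers: each block is allocated only when first needed. */
static model_ioinfo *model_pioinfo[IOINFO_ARRAYS];

/* Return the entry for fd, allocating its 32-entry block on first use.
   Returns NULL if fd is out of range or the allocation fails. */
model_ioinfo *model_get_ioinfo(int fd) {
    if (fd < 0 || fd >= IOINFO_ARRAYS * IOINFO_ARRAY_ELTS) {
        return NULL;
    }
    int block = fd / IOINFO_ARRAY_ELTS;
    int slot = fd % IOINFO_ARRAY_ELTS;
    if (model_pioinfo[block] == NULL) {
        model_pioinfo[block] = calloc(IOINFO_ARRAY_ELTS, sizeof(model_ioinfo));
        if (model_pioinfo[block] == NULL) {
            return NULL;
        }
    }
    return &model_pioinfo[block][slot];
}

/* Report whether a given block has been allocated yet. */
int model_block_allocated(int block) {
    return model_pioinfo[block] != NULL;
}
```

Looking up fd 33 allocates only block 1; block 0 and the other 62 blocks stay unallocated until some fd in their range is used.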

5. The C language runtime library roughly includes the following:
Startup and exit: the entry function and the other functions it depends on.
I/O: I/O initialization.
Heap: heap initialization.
Language implementation.
Debugging.
The C standard library contains functions for:
standard input and output, file operations, character operations, string operations, mathematical functions, resource management, format conversion, etc.
Variable-length argument lists are implemented on top of the C calling convention.
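Because the calling convention fixes where arguments are placed, the callee can walk them starting from the last named parameter. The stdarg.h macros hide that walking. A minimal sketch, where sum_ints is a hypothetical helper:

```c
#include <stdarg.h>

/* Sum `count` int arguments passed after the named parameter. */
int sum_ints(int count, ...) {
    va_list args;
    int total = 0;

    va_start(args, count);            /* start after the last named parameter */
    for (int i = 0; i < count; i++) {
        total += va_arg(args, int);   /* fetch the next argument as an int */
    }
    va_end(args);
    return total;
}
```

The callee has no way to discover how many arguments were passed, which is why the count (or a format string, as in printf) has to be supplied by the caller.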

6. The runtime library is platform dependent. To a certain extent, the C runtime library is an abstraction layer between C programs and the different operating system platforms.
glibc has evolved over time into the C standard library under Linux.
A glibc release consists mainly of two parts: the header files, such as stdio.h and stdlib.h, located in /usr/include; and the binary files of the library. The binary part is mainly the C standard library, which comes in a static and a dynamic version. The dynamic one is located at /lib/libc.so.6; the static one is located at /usr/lib/libc.a. In addition to the C standard library, glibc also provides several auxiliary runtime object files that help programs run. They are
/usr/lib/crt1.o /usr/lib/crti.o /usr/lib/crtn.o
crt1.o contains the program's entry function _start.
To support the construction and destruction of global objects in C++, the .init and .fini sections are introduced in each object file. The runtime library guarantees that the code in these two sections is executed before and after main respectively. When linking, the linker collects the .init and .fini sections of all input object files in order, merges them, and outputs them into the corresponding two sections of the output file. However, the instructions in the two output sections need some auxiliary code to get started, such as computing the GOT, so crti.o and crtn.o were introduced to help implement initialization.
The two sections of crti.o are guaranteed to come at the beginning of the corresponding sections of the output file, and those of crtn.o at the end. Therefore, the linker's input file order is generally:
ld crt1.o crti.o [user_object] [system_libraries] crtn.o
Since crt1.o does not contain .init and .fini sections, it does not affect the order of the two generated sections.
The constructors and destructors of C++ global objects are not placed directly in these two sections; instead, the calls that invoke all the constructors and destructors are placed there.
Besides global object construction and destruction, these two sections have other uses. For example, tools for monitoring program performance and for debugging often use them for initialization and de-initialization. We can also use "__attribute__((section(".init")))" to put a function into the .init section. However, an ordinary function placed there breaks the section's structure, because its return instruction causes _init to return early. The code must be written in assembly and must not contain a return instruction.
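For ordinary code that only needs to run before or after main, GCC and Clang offer a different and safer technique than writing into .init directly: the constructor and destructor function attributes. The sketch below uses that compiler extension, not the section attribute from the text; early_setup, late_teardown and the two flags are hypothetical names.

```c
/* Flags that record whether the hooks have run. */
int ran_before_main = 0;
int ran_after_main = 0;

/* GCC/Clang extension: run this function before main is entered. */
__attribute__((constructor))
void early_setup(void) {
    ran_before_main = 1;
}

/* GCC/Clang extension: run this function after main returns or exit is
   called, so its effect cannot be observed from inside main. */
__attribute__((destructor))
void late_teardown(void) {
    ran_after_main = 1;
}
```

Inside main, ran_before_main is already 1 while ran_after_main is still 0. Because these are complete functions called through a pointer table, their return instructions do not cut _init short.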

7. GCC platform-related files
Besides crt1.o, crti.o and crtn.o, linking hello.c in Chapter 4 also involves
crtbeginT.o,
libgcc.a,
libgcc_eh.a, and
crtend.o,
which are all located in the gcc installation directory
/usr/lib/gcc/i486-gnu/4.1.3/

crtbeginT.o and crtend.o are what really implement C++ global construction and destruction. Because glibc is only a C runtime library, it knows nothing about how C++ is implemented; gcc is the real implementer of C++. These two files cooperate with glibc to implement C++ global construction and destruction.
Since gcc supports many platforms, and some 32-bit platforms cannot operate on the 64-bit long long type directly, auxiliary routines are needed. libgcc.a contains such functions, as well as floating point routines. Its dynamically linked version is named libgcc.so.
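As an illustration of where such helper routines come in, consider 64-bit division. On 32-bit x86, GCC typically compiles the division below into a call to a libgcc helper (for example __udivdi3 for the unsigned case) rather than a single instruction; on a 64-bit target the same source usually becomes a plain divide. The function names here are hypothetical.

```c
#include <stdint.h>

/* 64-bit unsigned division; on some 32-bit targets this becomes a call
   into libgcc instead of one machine instruction. */
uint64_t divide_u64(uint64_t dividend, uint64_t divisor) {
    return dividend / divisor;
}

/* 64-bit unsigned remainder, handled the same way. */
uint64_t remainder_u64(uint64_t dividend, uint64_t divisor) {
    return dividend % divisor;
}
```

Linking such code with -nostdlib on a 32-bit target and without -lgcc is a common way to run into undefined references to these helpers.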
libgcc_eh.a contains the platform-dependent functions that support C++ exception handling.

8. The MSVC CRT looks more organized than glibc. MSVC provides multiple variants according to different attributes, such as static/dynamic, single-threaded/multi-threaded, debug/release, and pure C/C++ support. Some of these can be combined and some cannot.
The static versions of the CRT are located in the lib directory under the MSVC installation directory. Their naming rule is:
libc[p][mt][d].lib
The dynamically linked versions consist of a .lib for linking and a .dll for runtime. They are named like the static versions but include a version number.
If you do not specify a CRT when compiling and linking, libcmt.lib is selected by default.
MSVC provides the C++ standard library as additional libraries that contain only the C++ part. When your program includes a C++ standard library header, the MSVC compiler records the link information for the corresponding C++ library in the .drectve section of the object file.

9. Runtime library and multi-threading
Thread-related functionality is not part of the standard library. Mainstream CRTs nevertheless include multi-threading support in two respects: they provide an interface for thread operations, and the C runtime library itself must work correctly under multiple threads.
The following are problematic under multiple threads:
1) errno
2) printf/fprintf
3) malloc/free
To solve the problems of the C standard library under multi-threading, many compilers ship a multi-threaded version of the runtime library. Under MSVC, the /MT and /MTd options select the multi-threaded runtime library.

10. CRT improvements
Use TLS: errno is kept in TLS, so the address that errno resolves to is different in each thread.
Lock: functions such as malloc and printf take a lock.
Improve function calling conventions: provide a thread-safe version of strtok, strtok_s. In many cases, however, this approach is not feasible. A better approach is to leave the standard library prototypes unchanged and only improve their implementation.
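The point of the changed calling convention is that the tokenizer's position is kept by the caller instead of in hidden static state. The sketch below uses POSIX strtok_r, which is a different function from MSVC's strtok_s but has the same three-argument shape; count_fields is a hypothetical helper. Nested tokenization like this would not work with plain strtok, because the inner loop would overwrite the outer loop's saved position.

```c
#define _POSIX_C_SOURCE 200809L
#include <string.h>

/* Count the comma-separated fields in a string of ';'-separated records.
   The input buffer is modified in place, as with strtok. */
int count_fields(char *text) {
    int count = 0;
    char *record_state = NULL;   /* caller-owned state for the outer loop */

    for (char *record = strtok_r(text, ";", &record_state);
         record != NULL;
         record = strtok_r(NULL, ";", &record_state)) {
        char *field_state = NULL;  /* separate state for the inner loop */

        for (char *field = strtok_r(record, ",", &field_state);
             field != NULL;
             field = strtok_r(NULL, ",", &field_state)) {
            count++;
        }
    }
    return count;
}
```

The same property makes the function usable from several threads at once, as long as each thread tokenizes its own buffer with its own state pointer.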

11. TLS implementation
TLS variable declaration:
MSVC: __declspec(thread) int number;
GCC: __thread int number;
Under Windows, TLS variables are placed in the ".tls" section. Each time a new thread starts, a block of memory is allocated on the heap and the contents of the ".tls" section are copied into it.
For C++, however, copying is not enough: TLS objects must be constructed one by one when the thread starts and destructed one by one when it exits.
One of the 16 elements of the PE file's data directory (DATADIRECTORY) stores the address and length of the TLS table, which holds the addresses of the constructors and destructors of all TLS variables. Each time a thread starts or exits, Windows constructs or destroys the TLS variables according to the TLS table. The TLS table is usually located in the ".rdata" section.
So how does each thread access its TLS variables?
Every Windows thread has a TEB (thread environment block), which stores the thread's stack address, thread ID, and so on. One of its fields is the TLS array, at offset 0x2C in the TEB. The segment that the FS register points to in each thread is that thread's TEB, so a thread's TLS array can be accessed through FS:[0x2C].
The TLS array generally has 64 elements. The first element points to the thread's copy of the ".tls" section; adding the variable's offset within the ".tls" section gives the address of the TLS variable.
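The copy-per-thread scheme and the address computation (first TLS array element plus the variable's offset) can be modeled in portable C. Everything here is a hypothetical stand-in: model_teb keeps only the TLS array, tls_template plays the role of the ".tls" section, and no real threads or Windows structures are involved.

```c
#include <stdlib.h>
#include <string.h>

#define MODEL_TLS_SLOTS 64

/* Stand-in for the initial contents of the ".tls" section:
   two int variables, at offsets 0 and sizeof(int). */
static const int tls_template[2] = { 10, 20 };

/* Heavily reduced thread environment block: only the TLS array. */
typedef struct {
    void *tls_array[MODEL_TLS_SLOTS];
} model_teb;

/* On "thread start": copy the template to the heap and store the copy in
   the first TLS array element. Returns 0 on success, -1 on failure. */
int model_thread_start(model_teb *teb) {
    void *copy = malloc(sizeof tls_template);
    if (copy == NULL) {
        return -1;
    }
    memcpy(copy, tls_template, sizeof tls_template);
    for (int i = 0; i < MODEL_TLS_SLOTS; i++) {
        teb->tls_array[i] = NULL;
    }
    teb->tls_array[0] = copy;
    return 0;
}

/* Address of a TLS variable = first TLS array element + offset. */
int *model_tls_address(model_teb *teb, size_t offset) {
    return (int *)((char *)teb->tls_array[0] + offset);
}

/* On "thread exit": release the thread's private copy. */
void model_thread_exit(model_teb *teb) {
    free(teb->tls_array[0]);
    teb->tls_array[0] = NULL;
}
```

Two model_teb values started this way each get their own copy, so a write through one thread's address leaves the other thread's value at its initial contents.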
Explicit TLS means requesting, assigning, and destroying TLS variables through the Windows API. Because of its many limitations, it is not used much now.
Under Windows, it is best to start and end threads with _beginthread() and _endthread() provided by the MSVC CRT. Using the Windows API directly can leak memory: the _tiddata structure that CRT functions such as strtok(), or _beginthread() itself, allocate on the heap cannot be freed properly under static linking. (Under dynamic linking the problem does not occur, because the structure is released in the DllMain of each DLL, which must be called whenever a thread starts or exits.)

12. Construction and destruction of C++ global objects
Global construction and destruction in glibc
As mentioned earlier, the entry point of a glibc program is _start, and one of the parameters it passes to __libc_start_main is __libc_csu_init. __libc_csu_init calls the _init() function, which is the .init section of the executable file. Disassembling an executable shows that its .init section calls a function named __do_global_ctors_aux. That function does not belong to glibc; it comes from crtbegin.o, an object file provided by gcc. As mentioned earlier, some of the object files that the linker finally links come from gcc, and they contain functions closely tied to the language. Its source code shows that
it calls every function in an array of function pointers named __CTOR_LIST__. Evidently this array holds the constructor pointers of the global objects.
Looking back, when the compiler compiles each compilation unit (.cpp), it generates a special function whose job is to initialize the global objects of that file.
This function not only calls the initialization of each global object but also registers a special function, __tcf_1, with atexit.
After generating this function, the compiler places a pointer to it in the .ctors section of the object file.
When the linker links these .o files it merges sections of the same name, so the merged .ctors section holds the pointers to all global initialization functions. The .ctors sections of crtbegin.o and crtend.o are linked in at the very beginning and the very end. crtbegin.o stores a 4-byte value and defines the starting address of that value as the symbol __CTOR_LIST__; the linker is responsible for filling in the number of global constructors. crtend.o just stores a null and defines the symbol __CTOR_END__.
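The layout of the merged section (a count, then the pointers, then a null) and the loop that walks it can be modeled as follows. This is a sketch under assumptions, not gcc's source: the model_ names and sample constructors are hypothetical, the table is filled at run time rather than by a linker, and walking from the last entry to the first is an assumed order.

```c
#include <stddef.h>
#include <stdint.h>

typedef void (*ctor_fn)(void);

/* Sample "global initialization functions" that record their call order. */
char ctor_order[8];
size_t ctor_order_len;
void ctor_a(void) { ctor_order[ctor_order_len++] = 'a'; }
void ctor_b(void) { ctor_order[ctor_order_len++] = 'b'; }

/* Model of the merged section: [count][pointer ...][NULL]. */
static ctor_fn model_ctor_list[4];

/* Stand-in for the linker: fill in the count, the pointers, the null. */
void model_link_ctors(void) {
    /* The count is stored in a pointer-sized slot, as in the layout above;
       the integer-to-pointer cast is implementation-defined. */
    model_ctor_list[0] = (ctor_fn)(uintptr_t)2;
    model_ctor_list[1] = ctor_a;
    model_ctor_list[2] = ctor_b;
    model_ctor_list[3] = NULL;
}

/* Read the count from slot 0, then call every entry from last to first. */
void model_do_global_ctors(void) {
    uintptr_t count = (uintptr_t)model_ctor_list[0];
    for (uintptr_t i = count; i >= 1; i--) {
        model_ctor_list[i]();
    }
}
```

With the two sample entries, model_do_global_ctors calls ctor_b and then ctor_a.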
Destruction
In early versions, glibc handled destruction in almost the same way as the construction described above. However, that requires the order of global object construction and destruction to be exactly opposite, which adds work for the linker. Destruction was therefore changed to register a process-exit callback through __cxa_atexit, which the exit function then runs.
The mysterious function __tcf_1 mentioned earlier is the global destructor counterpart of the global constructor function. It calls the destructors in the reverse of the order in which the global constructor function calls the constructors.
Since global object construction and destruction are carried out by the runtime library, the "-nostartfiles" or "-nostdlib" options cannot be used when building a program that relies on them.
The assemblers and linkers of some platforms do not support the .init and .ctors mechanisms. To get code executed before main on those platforms, the collect2 program collects special symbols from all .o files at link time. These symbols mark functions that are global constructors or must run before main. collect2 stores the addresses of these symbols in an array, writes it to a temporary .c file, and compiles and links that file with the other .o files into the final output.
On these platforms, the gcc compiler generates a call to __main at the start of main, which is responsible for calling the functions collected by collect2. __main is part of an object file provided by gcc. When using -nostdlib, you may get an error that __main is undefined. In that case, link with -lgcc.

13. Global construction and destruction in the MSVC CRT
The MSVC entry function mainCRTStartup contains a call to _initterm(__xc_a, __xc_z). _initterm calls the functions stored between these two pointers, which serve as the left and right boundaries. It looks almost exactly like __do_global_ctors_aux.
typedef void (__cdecl *_PVFV)();
_CRTALLOC(".CRT$XCA") _PVFV __xc_a[] = {NULL};
…
#pragma section(".CRT$XCA",long,read)
The code above shows that it defines the XCA group of the .CRT section in the object file and stores the __xc_a function pointer array in that group. When compiling, each compilation unit generates a group named .CRT$XCU and adds its own global initialization function to it.
When linking, the sections are merged and the groups are arranged in alphabetical order. In the end these groups are usually placed in the .rdata section because they are read-only, and the two pointers passed to _initterm sit one at the beginning and one at the end. The mechanism is therefore not very different from glibc.
For destruction, the global constructor likewise registers a process-exit function with atexit, which is also similar to glibc.
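The boundary-pointer idea can be modeled in portable C. This is a sketch, not the CRT's source: model_initterm, model_table and the two sample functions are hypothetical, and an ordinary array stands in for the alphabetically sorted groups, with NULL entries playing the role of the start and end markers.

```c
#include <stddef.h>

typedef void (*init_fn)(void);

/* Sample initialization functions that record their call order. */
char init_order[8];
size_t init_order_len;
void init_first(void)  { init_order[init_order_len++] = '1'; }
void init_second(void) { init_order[init_order_len++] = '2'; }

/* Model of the merged groups: start marker, entries, end marker. */
init_fn model_table[] = { NULL, init_first, init_second, NULL };

/* Call every non-null function pointer in the range [begin, end). */
void model_initterm(init_fn *begin, init_fn *end) {
    for (init_fn *current = begin; current < end; current++) {
        if (*current != NULL) {
            (*current)();
        }
    }
}
```

Calling model_initterm(&model_table[0], &model_table[3]) runs init_first and then init_second; the null markers at the boundaries are skipped.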
14. File I/O
fread -> fread_s -> _fread_nolock_s -> _read -> ReadFile
Except for the last one, which is a Windows API, all of these belong to the MSVC runtime library.
fread: just calls fread_s
fread_s: adds buffer overflow protection and locking
_fread_nolock_s: reads in a loop, with buffering
_read: newline conversion
ReadFile: the Windows file reading API
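The newline conversion step can be illustrated with a small in-place filter. This is a simplified model of the idea, not the code of _read: crlf_to_lf is a hypothetical name, and cases such as a "\r\n" pair split across two reads are ignored.

```c
#include <stddef.h>

/* Convert every "\r\n" pair in buf[0..len) to "\n" in place and return
   the new length. A '\r' that is not followed by '\n' is kept. */
size_t crlf_to_lf(char *buf, size_t len) {
    size_t out = 0;

    for (size_t in = 0; in < len; in++) {
        if (buf[in] == '\r' && in + 1 < len && buf[in + 1] == '\n') {
            continue;  /* drop the '\r'; the '\n' is copied next iteration */
        }
        buf[out++] = buf[in];
    }
    return out;
}
```

This is why, for a file opened in text mode, the number of characters a read returns can be smaller than the number of bytes taken from the file.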

Origin blog.csdn.net/weixin_45719581/article/details/123207828