Linux C system programming (06): process management and the process environment

1 Process startup and exit

1.1 Process

Program start -> program loading & address allocation -> program exit

@ 1 Program start. For a binary file:

  1. If the file is in a directory listed in the PATH environment variable (for example /usr/bin or /bin), just type the name of the binary file.
  2. If it is not in a directory listed in PATH, give the program path together with the program name (for example, ./hello).
  3. To run a binary that is not in PATH by typing only its name, either add the directory that contains it to the PATH environment variable or move the binary into a directory that PATH already lists.

@ 2 Program loading, address allocation:

The simple process of loading is as follows:

  1. Read enough header information from the object file to find out how much address space is needed.
  2. Allocate the address space. If the object code format has separate segments, divide the address space into separate segments.
  3. Read the program into the segments of the address space.
  4. Fill the bss space at the end of the program with zeros (if the virtual memory system does not do this automatically).
  5. Create a stack segment (if the architecture requires one).
  6. Set up the run-time information, such as program arguments and environment variables.
  7. Start running the program: execution begins at the _start entry point, which calls main, and the program then executes sequentially.

@ 3 Program exit:

There are three ways to exit:

  1. The process exits voluntarily, either by calling the exit function or by returning from main with a return statement. On exit, the resources allocated to the process (address space, file descriptors and so on) have to be reclaimed, and the operating system cleans up each of them.
  2. The process receives a signal and exits. This is very common, and is often how a parent process terminates its child: the parent sends a termination signal to the child, and the child exits when it receives the signal.
  3. The process performs an operation that raises an exception and exits. In the two cases above the exit is expected by the program; here the program exits without having prepared for it. The operating system still reclaims the process's resources, but the program gets no chance to do its own cleanup. Such an exception is in fact also delivered to the process as a signal; the difference is that the signal is sent by the operating system itself rather than by another process.

1.2 Process termination processing function

Linux allows a process to call user-defined functions when it exits. These functions are called termination handlers. The standard guarantees that at least 32 termination handlers can be registered; the actual limit of an implementation can be queried with sysconf(_SC_ATEXIT_MAX). Under Linux, use the atexit function to register a termination handler. Prototype of the atexit function:

#include <stdlib.h>
int atexit(void (*function)(void));

See the Linux function reference manual for details. Notes:

  1. The function returns 0 on success and a non-zero value on failure. (Note that a failed atexit call does not necessarily return -1.)
  2. Termination handlers are called in the reverse order of their registration (the last one registered runs first, like a stack).
  3. A termination handler is simply an auxiliary operation performed when the process ends.

2 Linux process memory management

2.1 Big endian and little endian

Ordinary x86 PCs are little-endian, while some other architectures (for example SPARC, found in some servers) are big-endian. This difference in data storage does not come from the operating system: byte order is a property of the CPU architecture. When a program depends on byte order, first determine whether the machine is big-endian or little-endian, and then handle the data accordingly.

Little endian: high addresses store the high-order bytes, low addresses store the low-order bytes. Big endian: the opposite of little endian.

2.2 Code segment, data segment and bss segment

@ 1 Code segment: Write operations are generally not allowed; the segment is read-only. In most cases a program has no need to change its code segment. The one exception is upgrading the program: a server may need to replace part of its code without stopping. In the past this was usually done by writing directly into the code segment, which was very risky. Today the problem is generally solved with shared libraries.

@ 2 Data segment:

  1. Initialized data segment (.data): contains the global variables and static variables that are explicitly given initial values in the program.
  2. Block Started by Symbol segment (.bss): holds the global variables and static variables that have no explicit initial value.

@ 3 The contents of the bss segment are not part of the program file, that is, they are not stored in the binary file on disk. The file only records some information about the bss segment (the total size of the variables, attributes and so on), so that the segment can be set up in memory when the program runs. If a global or static variable is given an explicit initial value and that value is 0 / NULL, the compiler typically also places it in the bss segment rather than the data segment.

2.3 Stack and Heap

There are 3 places where local variables are stored:

  1. Data / bss segment: static local variables
  2. Registers: register variables
  3. Stack: ordinary automatic variables

A very common programming mistake is to return a pointer to a local variable as the return value of a function. The function only returns an address, and that address lies inside the function's own stack frame. Once the function has returned, the stack frame can be overwritten by the next function call, and the memory the returned pointer refers to is no longer valid.

The heap is the memory area from which an application allocates storage dynamically, most often through malloc. The stack and the heap are usually placed opposite each other in the address space, but the exact layout depends on the processor architecture, much like the difference between big endian and little endian.

2.4 Constant storage

A simple constant is stored in the code segment, because its length is fixed. This speeds up instruction fetching and improves the efficiency of the program. A complex constant such as a string, however, has no fixed length. If strings were stored in the code segment, the code segment would become very large and harder for the processor to read into its cache, which would hurt execution efficiency considerably. For this reason strings are stored in a separate segment.

2.5 Dynamic memory management

A simple malloc implementation can manage all allocated memory blocks with a mem_control_block structure. The structure is as follows:

    struct mem_control_block{
      int is_available;     // whether the block is available
      int size;             // size of the block
    };

With this structure, a simple malloc function can be implemented. The whole process is as follows:

  1. malloc first adds the size of one "memory control block" to the number of bytes requested by the user, which gives the actual number of bytes that must be allocated.
  2. It then walks through all the memory blocks in the heap in order. If a block is available and at least as large as the number of bytes actually required, the block is marked as unavailable and chosen; otherwise the next block is tried.
  3. If no block satisfies the condition, the sbrk function is called to obtain a new block of memory from the operating system, which grows the heap. If sbrk fails, no memory is available and malloc returns NULL.
  4. Finally, malloc skips past the "memory control block" at the start of the chosen block and returns the address that follows it. When the heap has grown, the recorded end address of the heap is updated as well.

Some notes about the free function: its main job is to mark the memory control block as available again, so that the next call to malloc can hand the block out as a free block. Therefore the contents of the block do not disappear the moment free is called. However, the block is no longer reserved for the caller, so how long the old contents remain intact is unpredictable, and they must not be used after free.


3 shell environment

Both the command line parameters and the environment variables are obtained from the parent process, but they reach the new process in different ways.
Command line parameters are passed to the new process as the parameters of the main function, while environment variables are available to the new process through a global variable.

3.1 Command line parameters and applications

argc: the number of command line parameters;
argv: an array of pointers, one pointing to each parameter;

argv[0] here is the name under which the program was invoked. It can include a path (for example ./hello or /usr/bin/hello), so obtaining the bare file name requires some string processing. argv[argc] is always NULL.

3.2 Environment variables

Each process has an environment variable table. Like the command line parameters, the environment variable table is an array of pointers. Declare extern char **environ; in the program, then read environ[i] in a loop until environ[i] == NULL to walk through the table. Each entry is a string of the form name=value.

Note: modifying environment variables inside a process only changes that process's own copy (and what its future child processes inherit); it has no effect on the parent or on other running processes.

The prototypes of setting, obtaining and deleting environment variables are as follows:

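#include <stdlib.h>
char *getenv(const char *name); // Get a variable: returns its value, or NULL if it is not set.
int putenv(char *string);       // Put a "name=value" string into the environment table, replacing any existing value.
int setenv(const char *name, const char *value, int overwrite); // Set a variable. If overwrite is 0, an existing value is kept; if it is non-zero, the value is replaced.
int unsetenv(const char *name); // Delete a variable: returns 0 on success, -1 on failure.
int clearenv(void);             // Clear the whole environment (glibc sets environ to NULL): returns 0 on success, non-zero on failure.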

See the Linux function reference manual for details. The functions above only affect the calling process and the child processes it creates afterwards; they have no effect on the parent process.

3.3 Get process end status

$? is a built-in variable of the Linux shell that holds the exit status of the most recently run program. There are 3 situations:

  1. The main function of the program ends, and the return value of main is saved in $?.
  2. The program calls the exit function while it is running, and the argument of exit is saved in $?.
  3. The program exits abnormally, and a value describing the abnormal termination is saved in $? (in bash, 128 plus the number of the signal that killed the process).

Notes:

  1. By convention, a non-zero value in $? (commonly 1) means that the program failed. So if nothing went wrong, do not return 1 (exit(1) or return 1); return 0 instead, to avoid unnecessary confusion.
  2. If the main function does not return an explicit value, the value in $? is still not random. Under C99 and later, reaching the end of main is equivalent to return 0; with older compilers the value is whatever was last left in the return value register.
  3. On the x86 architecture, a function's return value is passed in the eax register, so the value that main leaves in eax becomes the exit status that the shell shows in $?. Other architectures use different registers for this purpose. Only the low 8 bits of the value are visible in $?.

3.4 Debugging programs with errno

There are several ways to debug a program:

  • Use debugger
  • Use the output function directly in the program to output debugging information
  • Check the standard error output
  • Read the logs the program writes when something goes wrong

System calls under Linux can fail, and checking their return values alone is often not enough: developers usually need more detailed information. The C language provides a global variable, errno, for this purpose. To use it, include the header file <errno.h>. This variable makes up for the limited information carried by return values.
If errno is 0, no error has been recorded. When a call fails, errno holds the error number. Successful calls do not reset errno to 0, so it keeps the value left by an earlier failure; set it to 0 before the call if you intend to test it afterwards, and read it immediately after the call that failed.

3.5 Printing the cause of an error

errno is just an integer, so its meaning has to be looked up in a table. To find errors more easily, two functions convert an error number into a message: strerror and perror. Prototype of the strerror function:

#include <string.h>
char *strerror(int errnum);

See the Linux function reference manual for details. Prototype of the perror function:

#include <stdio.h>
void perror(const char *s);

See the Linux function reference manual for details.

Note: do not add '\n' to the string passed to perror; perror appends a colon, the error message and a newline itself. The advantage of perror is that one parameter fewer has to be passed (the error number); the disadvantage is that it writes directly to standard error, which is unbuffered, and that it depends on the global errno. It prints the cause of the error left by the most recent failed call before it.

4 Global jump

The goto statement can only jump within a single function; such a jump is local, and goto cannot jump from one function into another. A jump across functions needs a global (non-local) jump mechanism.

Under Linux, global jumps are made with the setjmp and longjmp functions. The idea is to set a jump point first and save the execution context of the current function at that point. When the program later performs the global jump, the saved context is restored, so execution continues from the jump point in the function that set it. The context is stored in a jmp_buf structure. Use the setjmp function to set a global jump point. The function prototype is as follows:

#include <setjmp.h>
int setjmp(jmp_buf env);

See the Linux function reference manual for details. Use the longjmp function to perform the global jump. The function prototype is as follows:

#include <setjmp.h>
void longjmp(jmp_buf env, int val);

See the Linux function reference manual for details. With global jumps, the structure of the program can be controlled more directly and the code becomes more compact. Using global jumps is a relatively advanced technique. Global jumps are provided by the C library through <setjmp.h>, whereas local jumps are implemented at the language level: goto is simply a keyword in C.

Origin blog.csdn.net/vviccc/article/details/105152197