[Reprint] On the Linux process model

Contents

  • Process basics

    • Basic concepts
    • Process descriptor
    • Process creation
    • Context switching
    • The init process
  • Processes in practice

    • Interprocess communication
    • Signal handling
    • Background processes and daemons
    • The nginx multi-process model
  • Common tools

    • ps: view basic process attributes
    • lsof: view files opened by processes
    • netstat: view network connections
    • strace: trace system calls

Process basics

Basic concepts

A process is one of the basic concepts of an operating system: it is the basic unit to which the operating system allocates resources, and it is also a unit of program execution. A program is a collection of code and data and is in itself a static concept; a process is an execution instance of a program, a dynamic concept.

So how does the Linux operating system describe a process?

Process descriptor

To manage processes, the kernel needs a clear description of each process and its attributes; that is the role of the process descriptor. In Linux the process descriptor is the task_struct structure.

task_struct is quite a complex data structure: it contains not only fields for the process's attributes but also pointers to other data structures. Its general layout:

  • state: the process's state
  • thread_info: basic information about the process
  • mm: pointer to the mm_struct memory-area descriptor
  • tty: pointer to the tty_struct descriptor of the associated terminal
  • fs: pointer to fs_struct, holding the current directory
  • files: pointer to files_struct, the open file descriptors
  • signal: pointer to signal_struct, describing the signals received
  • and much more.

In summary, the process descriptor fully preserves a process's attributes, status and behavior over its life cycle, all represented by the complex task_struct data structure.
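
For orientation, here is a heavily simplified sketch of that structure in C. The real definition lives in include/linux/sched.h, has hundreds of fields, and varies between kernel versions, so the types and layout below are only approximations of the fields named above:

/* Heavily simplified sketch of the kernel's task_struct;
 * field types approximated, most fields omitted. */
#include <sys/types.h>

struct task_struct {
    volatile long         state;       /* -1 unrunnable, 0 runnable, >0 stopped */
    struct thread_info   *thread_info; /* low-level per-task information */
    struct mm_struct     *mm;          /* memory areas (address space) */
    struct fs_struct     *fs;          /* current and root directory */
    struct files_struct  *files;       /* open file descriptors */
    struct signal_struct *signal;      /* received / shared signal state */
    struct tty_struct    *tty;         /* associated controlling terminal */
    pid_t                 pid;         /* process identifier */
    /* ... many more fields ... */
};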

Process creation

When Linux creates a process, it goes through the following steps:

    1. Initialize the process descriptor
    2. Allocate the required memory areas
    3. Set the process state, add it to the scheduling queue, and so on

To describe a process completely, the operating system designs very complex data structures and allocates a good deal of memory. But thanks to copy-on-write, these initialization operations do not significantly slow down process creation.

Copy-on-write: when a new (child) process is created, the Linux kernel does not immediately copy the parent's memory contents for the child; the copy happens only when the contents of the process address space change. Copy-on-write lets parent and child read the same physical pages; as soon as either one tries to change a page's contents, the kernel copies that page's contents to a new physical page and gives the new page to the writing process.

Linux has three system calls that can create a process: clone(), fork() and vfork().

  • clone(): the most basic process-creation system call; it lets you specify the child's basic attributes (identified by various flags), its stack, and so on.
  • fork(): implemented via clone(), with the stack argument pointing at the parent's stack, so parent and child share the same user-mode stack (safely, via copy-on-write). fork() logically requires a full copy of the parent's address space, but thanks to copy-on-write this is actually quite fast. A minimal example follows this list.
  • vfork(): also implemented on top of clone(), and historically an optimization of fork(): fork() needs to copy the parent's address space, yet the child typically calls execve() right after fork() to load another program, so before copy-on-write existed this unnecessary copy was rather expensive. vfork() therefore passes flags telling clone() to share the parent's virtual memory space, speeding up process creation.
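
As a minimal sketch, the C program below shows fork() and the copy-on-write behavior: both processes start from the same page holding counter, and the child's write gives it a private copy, so the parent still sees 0:

#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    int counter = 0;
    pid_t pid = fork();

    if (pid < 0) {            /* fork failed */
        perror("fork");
        exit(EXIT_FAILURE);
    } else if (pid == 0) {    /* child */
        counter++;            /* write triggers copy-on-write of this page */
        printf("child  pid=%d counter=%d\n", (int)getpid(), counter);
        exit(EXIT_SUCCESS);
    }
    /* parent */
    waitpid(pid, NULL, 0);    /* reap the child to avoid a zombie */
    printf("parent pid=%d counter=%d\n", (int)getpid(), counter); /* still 0 */
    return 0;
}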

Context switching

Concept: once a process has been created, the kernel must be able to suspend the process currently running on the CPU and switch another process onto the CPU. This is called process switching, task switching or context switching.

A context switch consists of a hardware context switch and a software context switch.

Hardware context switch: performed mainly by the far jmp assembly instruction, replacing one process-descriptor pointer with another and changing registers such as eip, cs and esp to redirect the program's flow of execution.

Software context switch:

    1. Switch the address space: switch the page global directory to install the new address space.
    2. Switch the kernel-mode stack.

Process switching happens in the schedule() function. The kernel provides a need_resched flag to indicate that rescheduling is needed; it sets this flag when a process is preempted or when a higher-priority process becomes runnable. So when does the kernel check the flag and rerun the scheduler? When switching from user mode to kernel mode, or when returning from an interrupt.

Executing a system call involves exactly such switches between user mode and kernel mode and returns from interrupts. In other words, every execution of a system call such as fork, read or write may trigger a new round of kernel scheduling.

The init process

Linux processes are organized in a tree structure, and each process has a unique process ID, the PID. PID 1 is usually the init process, which has three special characteristics compared with normal processes:

  • It has no default signal handling, so a signal sent to the init process is ignored unless init has explicitly registered a handler for it. If you are familiar with Docker you may have observed this: pressing Ctrl-C in a container often has no effect, because the processes in a container have their own PID namespace; the first process started there gets PID 1 and is not bothered by the kill signal.

  • If a process exits while it still has children, those children are called orphan processes. Orphaned children are re-parented to the init process, which then manages them, including reaping their exit status and removing them from the process table.

  • If the init process dies, all user processes exit as well.

Zombie processes are similar to orphan processes. One way to clean up zombie processes is to kill their parent process; the zombies then become orphans, are taken over by the init process, and get reaped.
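
A small sketch in C that creates a zombie on purpose, so you can watch it with ps; if you instead kill the parent during the sleep, the zombie is re-parented to init and reaped there:

#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    pid_t pid = fork();
    if (pid == 0) {
        exit(0);              /* child exits right away */
    }
    /* during the next 30 seconds, `ps -o pid,ppid,stat,comm`
     * shows the child in state Z (zombie) */
    printf("child %d is now a zombie; inspect it with ps\n", (int)pid);
    sleep(30);                /* parent deliberately delays reaping */
    waitpid(pid, NULL, 0);    /* reap: the zombie disappears */
    return 0;
}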

Processes in practice

Interprocess communication

We know that communication requires a medium between the communicating parties that can carry information: for communication between computers the medium may be twisted-pair wire, optical fiber or electromagnetic waves. What about inter-process communication: what is the medium there? In Linux, a medium that meets the requirement can be:

  • Memory media provided by the operating system, such as shared memory, pipes and semaphores.
  • File media provided by the file system, such as UNIX domain sockets, regular files, etc.
  • Network media provided by network devices, such as sockets.
  • And more.

Among the media provided by the operating system, the commonly used mechanisms are:

  • Semaphores
  • Anonymous pipes (parent and child only) and named pipes
  • SysV and POSIX
    • message queues
    • shared memory
  • and more

Their advantages and disadvantages in brief:

  • Semaphore: cannot carry complex messages; only usable for synchronization.
  • Anonymous pipe: limited capacity and slower; only parent and child processes can communicate.
  • Named pipe: any two processes can communicate, but it is slower.
  • Message queue: capacity is limited by the system; has FIFO queue semantics.
  • Shared memory: fast, with controllable capacity, but requires explicit synchronization.

Their usage is relatively simple; consult the documentation as needed. Shared memory is a relatively common choice in practice.
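
As an illustration, here is a minimal sketch of anonymous-pipe IPC between a parent and child process in C (the message text is arbitrary):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    int fds[2];
    char buf[64];

    if (pipe(fds) < 0) { perror("pipe"); exit(1); }

    if (fork() == 0) {                 /* child: reader */
        close(fds[1]);                 /* close unused write end */
        ssize_t n = read(fds[0], buf, sizeof(buf) - 1);
        if (n > 0) { buf[n] = '\0'; printf("child got: %s\n", buf); }
        close(fds[0]);
        exit(0);
    }
    /* parent: writer */
    close(fds[0]);                     /* close unused read end */
    const char *msg = "hello from parent";
    write(fds[1], msg, strlen(msg));
    close(fds[1]);
    wait(NULL);                        /* reap the child */
    return 0;
}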

Signal handling

Signals were first introduced in Unix systems. They are mainly used for inter-process communication, and a process can actively register a signal handler to detect or respond to an event occurring in the system. For example, when a process accesses an illegal address, it receives the SIGSEGV signal from the operating system; the default handling is for the process to exit and dump its stack, known as a core dump.

The main purposes of signals:

  • Let a process know that a particular event has occurred.
  • Force a process to handle that particular event.

The signals Linux currently supports, together with their default handling, can be found in the man pages (man 7 signal).

The more common signals are explained below:

  • SIGCHLD: when a process created by fork ends, it sends the SIGCHLD signal to its parent process.
  • SIGHUP: hangup signal, sent when the controlling terminal disconnects or the controlling process dies. For example, when a user exits the shell, all processes started by that shell receive this signal; the default action is to terminate the process.
  • SIGINT: when the user presses the Ctrl+C key combination, the terminal sends this signal to the process; the default action is to terminate the process.
  • SIGKILL: the signal sent by the common kill -9 command; it unconditionally terminates the process and cannot be caught or ignored.
  • SIGSEGV: the process accessed an illegal memory address; the default action is to terminate the process and produce a core dump.
  • SIGTERM: program termination signal; it can be blocked and ignored, and usually indicates a request for a normal exit.
  • SIGSTOP: stops process execution; it cannot be ignored, and the default action is to suspend the process.
  • SIGPIPE: raised when writing to a pipe or socket whose other end has been closed; the writer receives SIGPIPE and errno is set to EPIPE. In TCP communication, when one party closes the connection and the other sends data anyway, the TCP protocol answers with an RST packet; if the sender then transmits again, the system delivers SIGPIPE to the process, telling it that the connection is broken and no more data can be written.

In fact, in project development we deal with signal handling all the time. For example, a graceful-shutdown handler generally needs to catch signals such as SIGINT, SIGPIPE and SIGTERM, release resources at a suitable point and finish the remaining work, to keep an unexpected crash from causing problems.
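
A minimal sketch of such a handler in C, assuming a simple serve loop (the names g_stop and on_signal are illustrative):

#include <signal.h>
#include <stdio.h>
#include <unistd.h>

static volatile sig_atomic_t g_stop = 0;

static void on_signal(int signo) {
    (void)signo;
    g_stop = 1;                 /* async-signal-safe: just set a flag */
}

int main(void) {
    struct sigaction sa;
    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    sigaction(SIGINT,  &sa, NULL);
    sigaction(SIGTERM, &sa, NULL);
    signal(SIGPIPE, SIG_IGN);   /* broken writes now fail with EPIPE
                                   instead of killing the process */

    while (!g_stop) {
        /* ... serve requests ... */
        sleep(1);
    }
    /* release resources, flush logs, close connections here */
    printf("shutting down gracefully\n");
    return 0;
}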

Background processes and daemons

When working with a Linux system you often encounter background processes and daemons; here is a brief look at both.

  • Background process: normally a process runs in the foreground and occupies the current shell; until it ends, the user cannot do anything else through that shell. For processes that need no interaction, you can start them in the background by appending & to the command; while such a process runs, we can still run other commands in the shell. But when the shell exits, the background process exits with it.

  • Daemon: if a process should always run in the background and must not exit when the shell exits, you can turn it into a daemon. A daemon is a long-running background system process, such as mysqld, nginx and other common service processes.

So what is the difference between the two?

  • A daemon is completely detached from the terminal, while a background process is not: a background process can still write output to the terminal.
  • When the terminal is closed, a background process receives a signal and exits, but a daemon does not.

For example, you run a crawling task in the background with ./spider &, but before long the terminal is disconnected and the spider process aborts.

Before looking more closely at daemons, you also need to understand the concepts of sessions and process groups.

  • Process group: a set of interrelated processes, identified by a PGID, which is typically the PID of the process-group leader. Process groups exist to make it convenient to operate on several related processes at once, such as sending a signal to every process in the group.
  • Session: consists of several process groups, each of which belongs to exactly one session. A session corresponds to a controlling terminal, shared by all processes in the session; only the foreground process group can interact with the terminal.
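
A quick way to see these identifiers from code: every process has a PID, belongs to a process group (PGID) and to a session (SID), as this small C program shows:

#include <stdio.h>
#include <unistd.h>

int main(void) {
    printf("pid=%d pgid=%d sid=%d\n",
           (int)getpid(),      /* this process's ID */
           (int)getpgrp(),     /* its process group */
           (int)getsid(0));    /* getsid(0): its own session */
    return 0;
}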

So how do you implement a daemon?

  1. Run in the background: fork a child process A and exit the current process, keeping child A.
  2. Detach from the controlling terminal: call setsid() in child A to create a new session for it.
  3. Forbid A from re-acquiring a terminal: after creating the new session, A is the session leader and could still open a new controlling terminal. Fork a child process B from A and exit A; B is no longer a session leader and cannot open a new controlling terminal.
  4. Close open file descriptors, change the working directory, and so on.
  5. Handle the SIGCHLD signal: daemons are generally long-running, and when they spawn children they need to handle the SIGCHLD sent when a child exits; otherwise the children become zombie processes and occupy system resources.

In conclusion, a daemon is a long-running background process that is detached from the controlling terminal and unaffected by the user's terminal exiting. You can also use nohup to keep a command running in a daemon-like way: with nohup ./spider &, the spider process continues even after the terminal is disconnected. A minimal implementation sketch of the numbered steps above follows.
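
This is a sketch only, assuming a C program that does its work in a loop; real code would check every call for errors:

#include <fcntl.h>
#include <signal.h>
#include <stdlib.h>
#include <unistd.h>

static void daemonize(void) {
    if (fork() > 0) exit(0);    /* 1. run in background: parent exits */
    setsid();                   /* 2. new session, detach from terminal */
    signal(SIGHUP, SIG_IGN);
    if (fork() > 0) exit(0);    /* 3. no longer session leader: cannot
                                      re-acquire a controlling terminal */
    chdir("/");                 /* 4. don't pin the old working directory */
    for (int fd = 0; fd < 3; fd++)
        close(fd);              /*    close stdin/stdout/stderr ... */
    open("/dev/null", O_RDWR);  /*    ... and redirect them */
    dup(0);
    dup(0);
    signal(SIGCHLD, SIG_IGN);   /* 5. children are auto-reaped, no zombies */
}

int main(void) {
    daemonize();
    for (;;) sleep(60);         /* long-running work goes here */
}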

The nginx multi-process model

nginx is a high-performance web server; thanks to its excellent performance, mature community and good documentation, it is loved and supported by a great many developers. Its high performance is inseparable from its architecture.

nginx is a classic multi-process model: after startup it runs in the background as a daemon, and the background processes consist of one master process and several worker processes. The master process acts as the control process and has the following duties:

  • Receive external signals and carry out the corresponding instructions, including reloading configuration, sending instructions to the workers, graceful exit, and so on.
  • Maintain the state of the worker processes: when a worker exits, automatically start a new worker.

The master process handles the following signals:

  • TERM, INT: fast shutdown
  • QUIT: graceful shutdown
  • HUP: configuration changed: start new workers with the new configuration, then gracefully shut down the old workers.
  • USR1: reopen log files
  • USR2: upgrade the binary (nginx upgrade)
  • WINCH: gracefully shut down the worker processes

Each worker process also handles signals, including:

  • TERM, INT: fast shutdown
  • QUIT: graceful shutdown
  • USR1: reopen log files
  • WINCH: abnormal termination, used for debugging

Each worker process handles requests asynchronously and without blocking, which greatly improves the speed at which a worker can handle requests. To squeeze out as much performance as possible, nginx can set CPU affinity for each worker process, binding each worker to a specific CPU to reduce the overhead that context switches bring. Because of this core-binding pattern, the recommended number of worker processes is generally the number of CPU cores.

What does nginx gain from this master<->worker multi-process model?

  • Worker processes share hardly any resources while handling their own requests and almost never need locks, which removes the overhead that locking brings.
  • Workers do not affect one another when something goes wrong: if one process dies, the others keep working, which improves the stability of the service.
  • It exploits multi-core hardware as far as possible, maximizing the utilization of system resources.
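
A stripped-down sketch of this supervision pattern in C. nginx itself is far more elaborate; NWORKERS and the loop bodies here are placeholders, and only the restart-on-exit behavior is illustrated:

#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

#define NWORKERS 4

static void worker_loop(void) {
    for (;;) {
        /* accept + handle requests, normally asynchronous/non-blocking */
        sleep(1);
    }
}

static pid_t spawn_worker(void) {
    pid_t pid = fork();
    if (pid == 0) { worker_loop(); exit(0); }
    return pid;
}

int main(void) {
    for (int i = 0; i < NWORKERS; i++)
        spawn_worker();

    for (;;) {                      /* master: supervise the workers */
        pid_t dead = wait(NULL);    /* blocks until some worker exits */
        if (dead > 0) {
            fprintf(stderr, "worker %d died, restarting\n", (int)dead);
            spawn_worker();         /* keep the worker count stable */
        }
    }
}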

Common tools

Linux has many built-in tools for troubleshooting system problems and inspecting resource usage; here are a few simple process-related ones.

ps: view basic process attributes

Common parameters are as follows:

  • ps aux: view basic information about all processes
  • ps -p $pid: view the process with the given pid
  • ps -fp $pid: print fuller information about the process
  • Custom output fields: for example ps -C nginx -o pid,ppid,rsz,vsz,pcpu prints the nginx processes' pid, ppid, resident memory, virtual memory, and CPU usage.
  • ps axjf: view the process tree; pstree -p $pid is more intuitive
  • ps -T -p $pid or ps -Lf $pid: view the threads of a process
  • and more

Besides getting process information through ps, you can also view basic information about a process through the /proc file system:

  • /proc/$pid/cmdline: the process's command-line arguments
  • /proc/$pid/cwd: symlink to the current working directory
  • /proc/$pid/environ: the process's environment variables
  • /proc/$pid/exe: symlink to the executed binary.
  • /proc/$pid/fd: contains all open file descriptors.
  • /proc/$pid/maps: memory mappings, including the binary and library files.
  • /proc/$pid/mem: the process's memory
  • /proc/$pid/stat: process status
  • and more
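
Since these /proc entries are ordinary files, plain reads work. A small C sketch that dumps the reader's own status (substitute a PID for "self" as needed):

#include <stdio.h>

int main(void) {
    char line[256];
    FILE *f = fopen("/proc/self/status", "r");
    if (!f) { perror("fopen"); return 1; }
    while (fgets(line, sizeof(line), f))
        fputs(line, stdout);    /* Name, State, Pid, VmRSS, ... */
    fclose(f);
    return 0;
}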

lsof: view files opened by processes

There are two scenarios:

  • Scenario one: a file on the machine keeps growing and fills up the disk time and again. If you want to find the culprit process that is writing the file, what should you do?
  • Scenario two: you find the disk is almost full and delete some large files with rm -f, but disk usage does not drop noticeably. What should you do?

For these scenarios we can use the lsof command:

  • For scenario one: check which processes have the file open, find the culprit process, and deal with it.
  • For scenario two: if a file is still open in some process, rm -f does not really delete it; you also need to kill the process holding the file so that its file descriptor is closed, and only then can the space actually be reclaimed.

Common lsof usage:

  • View files opened by a specific user: lsof -u xxx
  • View files open on a specific port: lsof -i :8080
  • View files open on a specific range of ports: lsof -i :1-1024
  • View open files filtered by TCP or UDP: lsof -i udp
  • View files opened by a specific process: lsof -p $pid
  • List the PIDs of processes that have a specific file open: lsof -t $file_name
  • List files open under a specific directory: lsof +D $file_path
  • and more

netstat: View network connections

netstat is a useful tool for monitoring TCP/IP networks; it can display the routing table, active network connections, the status of network-interface devices, and other information. The type of information shown is determined by the first argument:

  • (none): by default, netstat displays the list of open sockets.
  • --route, -r: display the kernel routing tables; the output is the same as route -e.
  • --groups, -g: display multicast group membership information for IPv4 and IPv6.
  • --interfaces, -i: display a table of all network interfaces.
  • --statistics, -s: display summary statistics for each protocol.

Common usage scenarios:

  • Show numeric addresses only: netstat -n
  • Show TCP connections only: netstat -t
  • Show UDP connections only: netstat -u
  • Show listening sockets only: netstat -l
  • Show the process name and PID for each socket: netstat -p

strace: trace system calls

strace traces the system calls made and the signals received while a process executes. In Linux, a process cannot access hardware directly; when it needs to access a hardware device, it must switch to kernel mode and go through a system call.

strace can track the system calls a process makes, including their arguments, return values, and time consumed. In the output, each line shows the system call's name and arguments on the left, followed by the call's return value. Its options are as follows:

-c           count time, calls, and errors for each system call and report a summary.
-d           print strace debugging output to standard error.
-f           trace child processes created by fork.
-ff          with -o filename, write each process's trace to filename.pid, where pid is that process's ID.
-F           attempt to follow vfork calls (with -f alone, vfork is not followed).
-h           print a brief help message.
-i           print the instruction pointer at the time of each system call.
-q           suppress messages about attaching and detaching.
-r           print a relative timestamp for each system call.
-t           prefix each output line with the time of day.
-tt          like -t, but with microseconds.
-ttt         microsecond precision, with the time printed as seconds since the epoch.
-T           show the time spent in each system call.
-v           verbose: print all arguments of system calls in full. Calls involving environment variables, status, and input/output are abbreviated by default because they are used so frequently.
-V           print strace's version information.
-x           print non-ASCII strings in hexadecimal.
-xx          print all strings in hexadecimal.
-a column    set the output column for return values. Default is 40.
-e expr      a qualifying expression that controls what to trace. The format is:
             [!][qualifier=]value1[,value2]...
             where qualifier is one of trace, abbrev, verbose, raw, signal, read, write, and value is a symbol or number. The default qualifier is trace; an exclamation mark negates. For example, -eopen is equivalent to -e trace=open and traces only open calls, while -e trace=!open traces everything except open. There are two special values, all and none.
             Note that some shells use ! to run commands from the history, so it may need to be escaped with a backslash.
-e trace=set     trace only the specified system calls, e.g. -e trace=open,close,read,write traces only those four calls. Default is trace=all.
-e trace=file    trace only file-related system calls.
-e trace=process trace only process-management system calls.
-e trace=network trace all network-related system calls.
-e trace=signal  trace all signal-related system calls.
-e trace=ipc     trace all inter-process-communication system calls.
-e abbrev=set    set which system calls strace abbreviates in the output; abbrev=none is equivalent to -v. Default is abbrev=all.
-e raw=set       print the arguments of the specified system calls in raw, hexadecimal form.
-e signal=set    trace only the specified signals. Default is all; e.g. -e signal=!SIGIO (or signal=!io) means SIGIO is not traced.
-e read=set      dump the data read from the specified file descriptors, e.g. -e read=3,5.
-e write=set     dump the data written to the specified file descriptors.
-o filename      write the strace output to the file filename.
-p pid           attach to and trace the process with the specified PID.
-s strsize       maximum string size to print. Default is 32; file names are always printed in full.
-u username      run the traced command with username's UID and GID.

When a server suddenly becomes sluggish, you can use strace to see which system calls a specific process is executing:

strace -c -tt -o ./server.log -p 26844

Output is as follows:

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
100.00    0.170843        2512        68           epoll_wait
------ ----------- ----------- --------- --------- ----------------
100.00    0.170843                    68           total

Origin: www.cnblogs.com/jinanxiaolaohu/p/12551507.html