A detailed explanation of the strace command

Table of Contents

1. What is strace?

2. What can strace do?

3. How to use strace?

4. Troubleshooting cases with strace

4.1. Locating an abnormal process exit

4.2. Locating a shared memory error

4.3. Performance analysis

5. Summary


1. What is strace?

According to the strace website, strace is a diagnostic, debugging and instructional user-space tracer for Linux. We use it to monitor the interaction between user-space processes and the kernel: system calls, signal deliveries, changes of process state, and so on.

Under the hood, strace relies on the kernel's ptrace feature.

In day-to-day operations work, handling failures and diagnosing problems are core tasks and essential skills. As a dynamic tracing tool, strace helps operations engineers locate process and service failures efficiently. It works like a detective, using the clues left by system calls to reveal what is really behind an anomaly.

2. What can strace do?

Operations engineers are practical people, so let's start with an example.

We copied a software package called some_server from another machine and tried to start it as it was, without changing anything. It reported an error and would not start at all!

Start command:

./some_server ../conf/some_server.conf

Output:

FATAL: InitLogFile failed iRet: -1!
Init error: -1655

Why won't it start? Judging from the output, initializing the log file failed. What really happened? Let's look with strace.

strace -tt -f ./some_server ../conf/some_server.conf

In the line just before the InitLogFile failed message, we notice an open system call:

23:14:24.448034 open("/usr/local/apps/some_server/log//server_agent.log", O_RDWR|O_CREAT|O_APPEND|O_LARGEFILE, 0666) = -1 ENOENT (No such file or directory)

The program tries to open the file /usr/local/apps/some_server/log//server_agent.log for writing (creating it if it does not exist), but the call fails: the return value is -1 and the error number errno is ENOENT. Check the man page of the open system call:

man 2 open

Search it for the explanation of the ENOENT error:

ENOENT O_CREAT  is not set and the named file does not exist.  Or, a directory component in pathname does not exist or is a dangling symbolic link.

Now it is clearer. The open call in our example does specify O_CREAT, so errno is ENOENT because some directory component of the log path does not exist or is a dangling symbolic link. Let's check level by level which part of the path is missing:

ls -l /usr/local/apps/some_server/log
ls: cannot access /usr/local/apps/some_server/log: No such file or directory
ls -l /usr/local/apps/some_server
total 8
drwxr-xr-x 2 root users 4096 May 14 23:13 bin
drwxr-xr-x 2 root users 4096 May 14 22:48 conf

It turns out that the log subdirectory does not exist, while all the directories above it do. After creating the log subdirectory by hand, the service starts normally.
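The level-by-level check above can also be scripted. The following is a minimal sketch, assuming an absolute path; first_missing is a hypothetical helper name introduced here for illustration, not part of strace or any standard tool. It walks the path from the root and prints the first component that does not exist.

```shell
#!/bin/bash
# first_missing (hypothetical helper): print the first component of an
# absolute path that does not exist. Returns 1 if every component exists.
first_missing() {
    local rest=${1#/} current="" part
    while [ -n "$rest" ]; do
        part=${rest%%/*}                  # text before the next "/"
        case $rest in
            */*) rest=${rest#*/} ;;       # drop that text and the "/"
            *)   rest="" ;;               # last component reached
        esac
        [ -z "$part" ] && continue        # skip empty parts such as "//"
        current="$current/$part"
        if [ ! -e "$current" ]; then      # also true for a dangling symlink
            printf '%s\n' "$current"
            return 0
        fi
    done
    return 1
}
```

In the situation described above, first_missing /usr/local/apps/some_server/log//server_agent.log would print /usr/local/apps/some_server/log, the same answer the two ls commands gave.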

Looking back, what can strace do?

It opens up the black box of an application process and, through the clues left by its system calls, tells you roughly what the process is doing.

3. How to use strace?

Since strace traces the system calls and signals of user-space processes, let's first look at what system calls are before getting into strace itself.

About system calls:

According to Wikipedia, a system call is how a program running in user space requests a service from the operating system kernel that requires higher privileges to run.

System calls provide the interface between user programs and the operating system. An operating system divides execution into user space and kernel space:

  • The operating system kernel runs directly on the hardware and provides functions such as device management, memory management, and task scheduling.

  • User space gets its work done by requesting kernel services through APIs. The APIs that the kernel provides to user space are the system calls.

On Linux systems, application code normally uses system calls indirectly, through wrapper functions provided by the glibc library.

The Linux kernel currently has more than 300 system calls. A detailed list can be found in the syscalls man page (man 2 syscalls). They fall into several broad categories:

  • File and device access: open/close/read/write/chmod, etc.

  • Process management: fork/clone/execve/exit/getpid, etc.

  • Signals: signal/sigaction/kill, etc.

  • Memory management: brk/mmap/mlock, etc.

  • Inter-process communication (IPC): shmget/semget and the other calls for semaphores, shared memory, message queues, etc.

  • Network communication: socket/connect/sendto/sendmsg, etc.

  • Others

Familiarity with Linux system calls and system programming makes strace much easier to use. For troubleshooting in operations work, though, knowing the strace tool and looking up system calls in the manual is nearly always enough.

Readers who want to go deeper can read "Linux System Programming", "Advanced Programming in the UNIX Environment" and similar books.

Let's go back to the use of strace. strace has two operating modes.

The first is to start the process to be traced under strace. This is very simple: just put strace in front of the original command. For example, to trace the execution of the command "ls -lh /var/log/messages", we can run:

strace ls -lh /var/log/messages

The other mode attaches to a process that is already running, so we can see what it is doing without interrupting it. In this case, pass the -p pid option to strace.

For example, if there is a running some_server service, the first step is to check the pid:

pidof some_server                      
17553

With its pid, 17553, we can use strace to trace its execution:

strace -p 17553

When you have seen enough, press Ctrl+C to stop strace; the traced process keeps running.

strace has a number of options that adjust its behavior. Here we introduce a few of the more commonly used ones, and then show their practical effect through examples.

Common options for strace:

Consider this example command:

strace -tt -T -v -f -e trace=file -o /data/log/strace.log -s 1024 -p 23489

-tt  prefix each line of output with the time of day, with microsecond precision
-T   show the time spent in each system call
-v   print unabbreviated versions of environment variables, stat structures, etc. for the calls that use them
-f   trace the target process and all child processes it creates
-e   control which events are traced and how, for example by naming the system calls to trace
-o   write the strace output to the specified file instead of stderr
-s   when a system call argument is a string, print at most this many bytes of it; the default is 32
-p   the pid of the process to trace; to trace several pids at once, repeat the -p option
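Because -p can be repeated, tracing every process of a multi-process service means turning a list of pids into a list of -p arguments. A small sketch of that idea follows; pid_args is a hypothetical helper name introduced only for illustration.

```shell
#!/bin/bash
# pid_args (hypothetical helper): turn "17553 17554" into "-p 17553 -p 17554".
pid_args() {
    local out="" pid
    for pid in "$@"; do
        out="$out${out:+ }-p $pid"    # add a separating space after the first pid
    done
    printf '%s\n' "$out"
}

# Example use (not executed here; needs strace and a running some_server):
#   strace -tt -f $(pid_args $(pidof some_server))
```

The command substitution is left unquoted on purpose in the usage line, so that the shell splits the result back into separate -p and pid arguments.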

4. Troubleshooting cases with strace

4.1. Locating an abnormal process exit

Problem: A long-running script called run.sh on the machine dies after running for about one minute. We need to find out why.

Diagnosis: While the process is still running, get its pid with the ps command (assume it is 24298) and attach strace to it:

strace -o strace.log -tt -p 24298

Looking at strace.log, we see the following in the last 2 lines:

22:47:42.803937 wait4(-1,  <unfinished ...>
22:47:43.228422 +++ killed by SIGKILL +++

This shows that the process was killed with the KILL signal (SIGKILL), sent by some other process.

Further analysis showed that another service on the machine has a monitoring script that watches a process which is also called run.sh. When it finds more than 2 run.sh processes, it kills and restarts them. As a result, our run.sh script was killed by mistake.

When a process is killed by a signal, strace outputs killed by SIGX (where SIGX is the signal sent to the process). What does it output when the process exits by itself?

Here is a program called test_exit, with the following code:

#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv) {
    /* exit() is the glibc function, not a system call */
    exit(1);
}

Let's see what strace shows when it exits.

strace -tt -e trace=process -f ./test_exit

Explanation: -e trace=process means only trace system calls related to process management.

Output:

23:07:24.672849 execve("./test_exit", ["./test_exit"], [/* 35 vars */]) = 0
23:07:24.674665 arch_prctl(ARCH_SET_FS, 0x7f1c0eca7740) = 0
23:07:24.675108 exit_group(1)           = ?
23:07:24.675259 +++ exited with 1 +++

It can be seen that when the process exits by itself (calling the exit function or returning from the main function), the final call is the exit_group system call, and strace will output exited with X (X is the exit code).
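The two endings seen so far, killed by a signal and exited with a code, are easy to tell apart mechanically from the last line of a strace log. Here is a rough sketch, assuming a log written with -o and the "+++ ... +++" line format shown in the outputs above; exit_reason is a hypothetical helper name.

```shell
#!/bin/bash
# exit_reason (hypothetical helper): classify the last line of a strace log.
# Prints "signal SIGKILL", "exit 1", or "unknown".
exit_reason() {
    local last
    last=$(tail -n 1 "$1")
    case $last in
        *"+++ killed by "*)
            last=${last#*+++ killed by }     # drop everything before the signal name
            echo "signal ${last%% *}"        # keep the first word, e.g. SIGKILL
            ;;
        *"+++ exited with "*)
            last=${last#*+++ exited with }   # drop everything before the exit code
            echo "exit ${last%% *}"
            ;;
        *)
            echo "unknown"
            ;;
    esac
}
```

Run against the strace.log from the run.sh case, this would print "signal SIGKILL"; against a log of test_exit it would print "exit 1".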

Some readers may wonder: the code clearly calls exit, so why does strace show exit_group?

This is because the exit function here is not a system call but a function provided by the glibc library. A call to the exit function ends in an exit_group system call, which terminates all threads of the current process. The kernel also has a separate system call that the man page documents as _exit (note the leading underscore) and strace shows as exit; it terminates only the calling thread, and it is what eventually runs when a single thread exits.

4.2. Locating a shared memory error

An error was reported when a service started:

shmget 267264 30097568: Invalid argument
Can not get shm...exit!

The error message suggests that obtaining shared memory failed. Let's look at it through strace:

strace -tt -f -e trace=ipc ./a_mon_svr ../conf/a_mon_svr.conf

Output:

22:46:36.351798 shmget(0x5feb, 12000, 0666) = 0
22:46:36.351939 shmat(0, 0, 0)          = ?
Process 21406 attached
22:46:36.355439 shmget(0x41400, 30097568, 0666) = -1 EINVAL (Invalid argument)
shmget 267264 30097568: Invalid argument
Can not get shm...exit!

Here we use the -e trace=ipc option so that strace traces only the system calls related to inter-process communication.

The strace output shows that the shmget system call failed with errno EINVAL. As before, look up the shmget manual page and search for the explanation of the EINVAL error code:

EINVAL A new segment was to be created and size < SHMMIN or size > SHMMAX, or no new segment was to be created, a segment with given key existed, but size is greater than the size of that segment

In other words, shmget fails with EINVAL for one of the following reasons:

  • The shared memory segment to be created is smaller than SHMMIN (usually 1 byte)

  • The shared memory segment to be created is larger than SHMMAX (set by the kernel parameter kernel.shmmax)

  • A shared memory segment with the given key already exists, and the size passed to shmget is larger than the size of that segment

From the strace output, the key of the shared memory we want to attach to is 0x41400 and the requested size is 30097568 bytes, which clearly matches neither the first nor the second case. Only the third case remains. Use ipcs to check whether the sizes really do disagree:

ipcs -m | grep 41400
key        shmid      owner      perms      bytes      nattch     status    
0x00041400 1015822    root       666        30095516   1

As you can see, a segment with key 0x41400 already exists and its size is 30095516 bytes, smaller than the 30097568 bytes requested in our call, which is why the error occurred.
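The comparison just made by eye can be sketched as a small script. It assumes the ipcs -m column layout shown above (key in column 1, bytes in column 5); segment_bytes and size_fits are hypothetical helper names. Note that strace prints the key in hex (0x41400) while the program's own error message prints it in decimal (267264); they are the same number.

```shell
#!/bin/bash
# segment_bytes (hypothetical helper): read `ipcs -m` output on stdin and
# print the size in bytes of the segment with the given key.
segment_bytes() {
    local key
    key=$(printf '0x%08x' "$1")          # ipcs shows keys as 0x plus 8 hex digits
    # Appending "" forces a string comparison in awk.
    awk -v key="$key" '($1 "") == (key "") { print $5 }'
}

# size_fits (hypothetical helper): succeed if a request of $1 bytes is no
# larger than an existing segment of $2 bytes (the third EINVAL case).
size_fits() {
    [ "$1" -le "$2" ]
}

# Example use (not executed here):
#   existing=$(ipcs -m | segment_bytes 0x41400)
#   size_fits 30097568 "$existing" || echo "requested size exceeds existing segment"
```

With the numbers from this case, size_fits 30097568 30095516 fails, matching the EINVAL that shmget returned.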

In our case, the sizes differed because one of the programs was compiled as 32-bit and the other as 64-bit, and the code uses long, a type whose size depends on the target (4 bytes on 32-bit Linux, 8 bytes on 64-bit Linux).

Compiling both programs as 64-bit solves the problem.

The -e trace option of strace deserves a special mention here.

To trace one specific system call, -e trace=xxx is enough. But sometimes we need to trace a whole class of system calls, such as all calls that take a file name or all calls related to memory allocation.

Typing every system call name by hand makes it easy to miss one, so strace provides names for several commonly used groups of system calls.

-e trace=file     calls related to file access (those that take a file name as an argument)
-e trace=process  calls related to process management, such as fork/exec/exit_group
-e trace=network  calls related to network communication, such as socket/sendto/connect
-e trace=signal   calls related to sending and handling signals, such as kill/sigaction
-e trace=desc     calls related to file descriptors, such as write/read/select/epoll
-e trace=ipc      calls related to inter-process communication, such as shmget

In most cases these group names are all we need. When you do need to trace specific system calls, pay attention to how the C library implements them.

For example, we know the fork system call creates a process, but in glibc the fork function is actually implemented with the lower-level clone system call. With strace you therefore have to specify -e trace=clone; -e trace=fork matches nothing.

4.3. Performance analysis

Suppose we need to count the lines of code in the Linux 4.5.4 kernel (including assembly and C code). Here are two shell script implementations:

poor_script.sh:

#!/bin/bash
total_line=0
while read -r filename; do
   # one wc and one awk process are started for every file
   line=$(wc -l "$filename" | awk '{print $1}')
   (( total_line += line ))
done < <( find linux-4.5.4 -type f \( -iname '*.c' -o -iname '*.h' -o -iname '*.S' \) )
echo "total line: $total_line"

good_script.sh:

#!/bin/bash
# a fixed number of processes: find, wc and tail
find linux-4.5.4 -type f \( -iname '*.c' -o -iname '*.h' -o -iname '*.S' \) -print0 \
| wc -l --files0-from - | tail -n 1

The two scripts do the same job. We run each under strace with the -c option to summarize its system calls and the time spent in them, adding -f so that child processes are counted as well.

The two summaries show that good_script.sh takes only 2 seconds to produce the result, 19,613,114 lines. Most of its system calls are file operations (read/open/write/close), which is exactly what counting lines of code requires.

poor_script.sh took 539 seconds to complete the same task. Most of its call overhead is in process and memory management (wait4/mmap/getpid...).

The number of clone system calls in the two summaries tells the same story: good_script.sh starts only 3 processes, while poor_script.sh starts 126335 processes to complete the task!

Creating and destroying processes is expensive, so it is no surprise that the performance is poor.
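The effect is easy to reproduce on a small scale. The sketch below is not the original benchmark: count_per_file and count_once are hypothetical names, and it assumes find, xargs and wc with -print0 and -0 support and file names without newlines. Both functions count the lines in the .c files under a directory; the first starts a new wc process for every file, the second uses one pipeline regardless of how many files there are.

```shell
#!/bin/bash
# count_per_file: one command substitution (a fork) plus one wc per file,
# the same pattern that makes poor_script.sh slow.
count_per_file() {
    find "$1" -type f -name '*.c' | {
        total=0
        while IFS= read -r file; do
            n=$(wc -l < "$file")
            total=$(( total + n ))
        done
        echo "$total"
    }
}

# count_once: a single pipeline; the number of processes does not grow
# with the number of files (xargs may start a few cat processes when the
# file list is very long).
count_once() {
    find "$1" -type f -name '*.c' -print0 | xargs -0 cat | wc -l
}

# Example use (not executed here):
#   time count_per_file linux-4.5.4
#   time count_once linux-4.5.4
```

Both functions should print the same total; only the number of processes needed to get there differs, which is what the clone counts in the strace summaries reflect.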

5. Summary

When a process or service misbehaves, we can use strace to trace its system calls, "see what it is doing", and from there find the cause of the problem. The more familiar you are with common system calls, the better you can understand and use strace.

Of course, strace is not really omnipotent. When the target process is stuck in user space, making no system calls, strace shows no output.

In that case we need other tracing tools, such as gdb, perf or SystemTap.

Remarks:

1. perf: supported by the kernel

2. ftrace: supported by the kernel, programmable

3. SystemTap: powerful and supported on Red Hat systems; it can probe both user-space and kernel-space logic, so it has a wider range of uses

Origin blog.csdn.net/wangquan1992/article/details/108384648