Improving debugging methods and monitoring management

1. Monitoring management (monitoring process)
1. Memory detection

Check memory every 10 seconds. If available memory falls below 15%, release the system cache to make more memory available, and write a log entry.
Memory sizes can be read with sysinfo() or with the free command:

#include <sys/sysinfo.h>

struct sysinfo s_info;
sysinfo(&s_info);   /* totalram, freeram etc. are in units of s_info.mem_unit bytes */
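A minimal sketch of the periodic memory check, assuming the 15% threshold from the text is applied to sysinfo()'s freeram. Note that freeram counts only completely free RAM, not reclaimable page cache; on kernels 3.14 and later, MemAvailable in /proc/meminfo is a closer estimate of usable memory. The function names mem_free_percent and memory_check_once are hypothetical.

```c
#include <stdio.h>
#include <sys/sysinfo.h>

/* Percentage of RAM that sysinfo() reports as free; -1 on error. */
int mem_free_percent(void)
{
    struct sysinfo s_info;
    unsigned long long total, free_bytes;

    if (sysinfo(&s_info) != 0)
        return -1;
    /* All sizes are expressed in units of s_info.mem_unit bytes. */
    total = (unsigned long long)s_info.totalram * s_info.mem_unit;
    free_bytes = (unsigned long long)s_info.freeram * s_info.mem_unit;
    if (total == 0)
        return -1;
    return (int)(free_bytes * 100 / total);
}

/* One monitoring pass (hypothetical helper).
 * Returns 1 if free memory is below threshold_pct, 0 if not, -1 on error. */
int memory_check_once(int threshold_pct)
{
    int pct = mem_free_percent();

    if (pct < 0)
        return -1;
    if (pct < threshold_pct) {
        fprintf(stderr, "memory low: %d%% free\n", pct);
        return 1;
    }
    return 0;
}
```

A monitor thread could call memory_check_once(15) in a loop with sleep(10) and release the system cache whenever it returns 1.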

System cache release

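#include <stdlib.h>

int nret;
nret = system("sync;sync");                               /* flush dirty data first to avoid data loss */
nret = system("echo 3 > /proc/sys/vm/drop_caches");       /* drop page cache, dentries and inodes */
nret = system("echo 1 > /proc/sys/vm/overcommit_memory"); /* mode 1: always overcommit (separate setting, not a cache release) */
nret = system("free");                                    /* print memory usage after the release */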
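The same release can be done without spawning shells: call sync() and write "3" to /proc/sys/vm/drop_caches directly. This is a sketch assuming the monitor runs as root; write_proc_value and release_system_cache are hypothetical helper names. Dropping caches discards only clean, reclaimable page cache, dentries and inodes; it does not free memory that processes are using.

```c
#include <stdio.h>
#include <unistd.h>

/* Write a short string to a file such as a /proc/sys entry.
 * Returns 0 on success, -1 on error. */
int write_proc_value(const char *path, const char *value)
{
    FILE *fp = fopen(path, "w");

    if (fp == NULL)
        return -1;
    if (fputs(value, fp) == EOF) {
        fclose(fp);
        return -1;
    }
    /* For /proc files the write is usually performed at flush time,
     * so the result of fclose() must be checked as well. */
    return fclose(fp) == 0 ? 0 : -1;
}

/* sync() first so dirty pages are written back, then drop the
 * page cache, dentries and inodes ("3"). Requires root. */
int release_system_cache(void)
{
    sync();
    return write_proc_value("/proc/sys/vm/drop_caches", "3");
}
```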

2. CPU detection
Check CPU usage every 10 minutes; if it exceeds 80% three times in a row, write a log entry. CPU usage is calculated as follows.
First, at time t1, read the system-wide user, nice, system, idle, iowait, irq and softirq values from /proc/stat. Their sum is the total CPU time since boot (total1), and the idle value is the total CPU idle time since boot (idle1).
Second, at time t2, read the same values from /proc/stat in the same way to get the total CPU time since boot (total2) and the total CPU idle time since boot (idle2).
Finally, calculate the total CPU usage of the system between t2 and t1. That is:
CPU percentage between t1 and t2 = ((total2-total1)-(idle2-idle1))/(total2-total1)* 100%
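The calculation above can be sketched in C as follows. It assumes the first line of /proc/stat is the system-wide "cpu" line (which is the case on Linux) and uses only its first seven columns, as the text does. The names cpu_sample, parse_cpu_line, read_cpu_sample, cpu_usage_percent and cpu_alarm_update are hypothetical.

```c
#include <stdio.h>

/* Aggregate CPU counters from the first ("cpu") line of /proc/stat, in clock ticks. */
typedef struct {
    unsigned long long total; /* user+nice+system+idle+iowait+irq+softirq */
    unsigned long long idle;  /* idle column */
} cpu_sample;

/* Parse a "cpu ..." line; returns 0 on success, -1 if fewer than 7 numbers. */
int parse_cpu_line(const char *line, cpu_sample *out)
{
    unsigned long long user, nice, sys, idle, iowait, irq, softirq;

    if (sscanf(line, "cpu %llu %llu %llu %llu %llu %llu %llu",
               &user, &nice, &sys, &idle, &iowait, &irq, &softirq) != 7)
        return -1;
    out->total = user + nice + sys + idle + iowait + irq + softirq;
    out->idle = idle;
    return 0;
}

/* Take one sample at the current time from /proc/stat. */
int read_cpu_sample(cpu_sample *out)
{
    char line[256];
    FILE *fp = fopen("/proc/stat", "r");

    if (fp == NULL)
        return -1;
    if (fgets(line, sizeof line, fp) == NULL) {
        fclose(fp);
        return -1;
    }
    fclose(fp);
    return parse_cpu_line(line, out);
}

/* ((total2-total1)-(idle2-idle1)) / (total2-total1) * 100 */
double cpu_usage_percent(const cpu_sample *t1, const cpu_sample *t2)
{
    unsigned long long dtotal = t2->total - t1->total;
    unsigned long long didle = t2->idle - t1->idle;

    if (dtotal == 0)
        return 0.0;
    return (double)(dtotal - didle) * 100.0 / (double)dtotal;
}

/* Count consecutive checks above `limit`; returns 1 once `needed` are in a row. */
int cpu_alarm_update(int *streak, double usage, double limit, int needed)
{
    *streak = (usage > limit) ? *streak + 1 : 0;
    return *streak >= needed;
}
```

In a monitor, read_cpu_sample() would be called at t1 and t2, and cpu_alarm_update(&streak, usage, 80.0, 3) would decide when to write the log entry.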

3. Detect whether the specified thread exists

ps -T|grep thread_name|grep -v grep|grep app$|wc -l

Here thread_name is the name of the thread to look for and app is the process name; the command counts the threads named thread_name in the app process. A result of 1 means the thread exists; otherwise it does not.
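As an alternative to the ps pipeline (whose output format differs between BusyBox and procps), a monitor written in C can scan /proc/<pid>/task directly: each thread has a directory there whose comm file holds the thread name, truncated to 15 characters. This is a sketch; read_comm and count_threads_named are hypothetical names, and a name longer than 15 characters will never match.

```c
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Read a /proc "comm" file into buf and strip the trailing newline. */
int read_comm(const char *path, char *buf, size_t size)
{
    FILE *fp = fopen(path, "r");

    if (fp == NULL)
        return -1;
    if (fgets(buf, (int)size, fp) == NULL) {
        fclose(fp);
        return -1;
    }
    fclose(fp);
    buf[strcspn(buf, "\n")] = '\0';
    return 0;
}

/* Count threads of process `pid` whose name equals `name`.
 * Returns the count (0 = not found) or -1 if /proc/<pid>/task cannot be opened. */
int count_threads_named(pid_t pid, const char *name)
{
    char dir_path[64], comm_path[512], comm[32];
    struct dirent *ent;
    int count = 0;
    DIR *dir;

    snprintf(dir_path, sizeof dir_path, "/proc/%d/task", (int)pid);
    dir = opendir(dir_path);
    if (dir == NULL)
        return -1;
    while ((ent = readdir(dir)) != NULL) {
        if (ent->d_name[0] == '.')          /* skip "." and ".." */
            continue;
        snprintf(comm_path, sizeof comm_path, "%s/%s/comm", dir_path, ent->d_name);
        if (read_comm(comm_path, comm, sizeof comm) == 0 && strcmp(comm, name) == 0)
            count++;
    }
    closedir(dir);
    return count;
}
```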

4. Monitoring for stuck threads
Context switches of a process or thread are either voluntary or involuntary (forced). If neither count changes between two checks, the thread has not context-switched in that interval, which suggests it is stuck; this is how a stuck thread is detected.

In the command below, pid is the ID of the process to inspect. For every thread of the process, the command prints its name, its thread ID, its number of voluntary context switches, and its number of involuntary context switches.

for file in /proc/pid/task/* ;do oneline=`grep -w -E 'Name|Pid|voluntary_ctxt_switches|nonvoluntary_ctxt_switches' $file/status` ;echo $oneline|awk '{print $2,$4,$6,$8}';done

In addition, if most of a process's context switches are voluntary, its demand for CPU is low; if most are involuntary, CPU may be a bottleneck for it. Rule out processes that call sched_yield() frequently, since that also produces many involuntary switches.
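The stuck-thread check can also be done inside a C monitor by reading the voluntary_ctxt_switches and nonvoluntary_ctxt_switches lines of /proc/<pid>/task/<tid>/status at two moments and comparing them. This is a sketch; ctxt_counts, read_ctxt_counts and counts_unchanged are hypothetical names. Keep in mind that a thread legitimately sleeping on a condition variable for the whole interval also shows unchanged counters, so the interval should be one in which the thread is expected to run.

```c
#include <stdio.h>

typedef struct {
    unsigned long vol;     /* voluntary_ctxt_switches */
    unsigned long nonvol;  /* nonvoluntary_ctxt_switches */
} ctxt_counts;

/* Parse both counters from a status file such as
 * /proc/<pid>/task/<tid>/status. Returns 0 if both were found, else -1. */
int read_ctxt_counts(const char *status_path, ctxt_counts *out)
{
    char line[256];
    int found = 0;
    FILE *fp = fopen(status_path, "r");

    if (fp == NULL)
        return -1;
    while (fgets(line, sizeof line, fp) != NULL) {
        /* The literal prefix must match exactly, so the "nonvoluntary"
         * line is not mistaken for the "voluntary" one. */
        if (sscanf(line, "voluntary_ctxt_switches: %lu", &out->vol) == 1)
            found |= 1;
        else if (sscanf(line, "nonvoluntary_ctxt_switches: %lu", &out->nonvol) == 1)
            found |= 2;
    }
    fclose(fp);
    return found == 3 ? 0 : -1;
}

/* Neither counter moved between two checks: the thread may be stuck. */
int counts_unchanged(const ctxt_counts *prev, const ctxt_counts *cur)
{
    return prev->vol == cur->vol && prev->nonvol == cur->nonvol;
}
```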

5. Service-specific detection
For a service such as an IPC (IP camera), check whether the VI (video input) interrupts and the VENC (video encoder) frame rate are normal.

2. Improving debugging methods
1. Background
A complete project almost always contains some bugs. Some are hidden or occur with low probability and are hard to discover, so preventive measures are very important. Common bugs include intermittent crashes or segmentation faults, deadlocks, memory leaks, and file handle leaks.
(1) Investigating intermittent crashes and segmentation faults
Core dump analysis

Stack backtrace

Cross-stack backtrace

Debugging with GDB

The four methods above can be used to deal with segmentation faults. GDB is the best segfault debugging tool and prints complete information, but it suits only the debugging phase, not release builds. Stack backtrace works in both the debugging and release phases, but it prints less information than GDB, cannot print line numbers, and cannot be used when memory and flash are very small. Cross-stack backtrace is essentially the same as stack backtrace; its advantage is that it also works on devices with little flash and memory, but it likewise cannot print line numbers.
(2) Prevention of memory leaks and file handle leaks
During development, malloc and free allocate and release heap memory, open/close or fopen/fclose open and close files, socket and close create and close sockets, and so on.
If these calls are not used in pairs, resources leak, which can eventually make the device reboot or crash.
Secondary encapsulation (wrapping these calls) can be used to catch such leaks. Taking malloc and free as an example, the wrapper records how many times malloc and free are called at each location to determine whether they are paired. If they are not, the records help locate where the unpaired call is, which pinpoints the memory leak.
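A minimal sketch of such a wrapper. Rather than counting calls per location, this closely related variant records each live pointer together with the file and line that allocated it, so anything still recorded at the end is an unpaired malloc. MALLOC, FREE, dbg_malloc, dbg_free and dbg_report_leaks are hypothetical names; the table is fixed-size and not thread-safe, so a real wrapper would need a mutex.

```c
#include <stdio.h>
#include <stdlib.h>

#define MAX_TRACKED 1024   /* hypothetical table size */

typedef struct {
    void *ptr;          /* NULL marks a free slot */
    const char *file;   /* allocation site */
    int line;
} alloc_record;

static alloc_record g_allocs[MAX_TRACKED];

void *dbg_malloc(size_t size, const char *file, int line)
{
    void *p = malloc(size);

    if (p == NULL)
        return NULL;
    for (int i = 0; i < MAX_TRACKED; i++) {
        if (g_allocs[i].ptr == NULL) {
            g_allocs[i].ptr = p;
            g_allocs[i].file = file;
            g_allocs[i].line = line;
            return p;
        }
    }
    fprintf(stderr, "dbg_malloc: table full, %s:%d not tracked\n", file, line);
    return p;
}

void dbg_free(void *p, const char *file, int line)
{
    if (p == NULL)
        return;
    for (int i = 0; i < MAX_TRACKED; i++) {
        if (g_allocs[i].ptr == p) {
            g_allocs[i].ptr = NULL;
            free(p);
            return;
        }
    }
    /* Untracked: allocated while the table was full, or a bad pointer. */
    fprintf(stderr, "dbg_free: %p at %s:%d was not tracked\n", p, file, line);
    free(p);
}

/* Print every allocation that has not been freed; returns how many remain. */
int dbg_report_leaks(void)
{
    int n = 0;

    for (int i = 0; i < MAX_TRACKED; i++) {
        if (g_allocs[i].ptr != NULL) {
            fprintf(stderr, "leak: %p allocated at %s:%d\n",
                    g_allocs[i].ptr, g_allocs[i].file, g_allocs[i].line);
            n++;
        }
    }
    return n;
}

#define MALLOC(size) dbg_malloc((size), __FILE__, __LINE__)
#define FREE(p)      dbg_free((p), __FILE__, __LINE__)
```

The same pattern extends to file handles by wrapping open/close or fopen/fclose and recording descriptors or FILE pointers instead of heap pointers.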

(3) Troubleshooting deadlocks
A deadlocked process or thread gets stuck. As described earlier, monitoring for stuck threads shows which thread is stuck. To find where it is stuck, attach GDB to the running process, switch to that thread, and print its stack.

3. Standardized management of debugging information
Debugging information generally comes in two kinds: controllable and uncontrollable. Uncontrollable output, such as a plain printf that always prints, is generally used in the debugging phase. Controllable output is generally used in the maintenance phase: a switch decides whether the information is printed, so that not everything is printed at once, which would make the output hard to read and the key information hard to find.
1. Debug printing should be convenient and detailed
A wrapper around printf can prepend the file name, function name and line number to every message, which makes debugging and locating problems easier.

/* GCC named variadic macro; the trailing backslash continues the definition */
#define DH(fmt, args...) \
    printf("%s-%s-%d:" fmt, __FILE__, __FUNCTION__, __LINE__, ## args)

2. Control debug output per module
Each module's debugging output is controlled independently, without affecting other modules. There are two kinds of control: a global switch, which controls printing for all modules, and per-module switches, one for each module's debugging output.
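A sketch of the two levels of control, assuming that the global switch, when on, enables output for every module, and that otherwise each module's own switch decides. The module IDs, switch variables, helper functions and MOD_DBG macro are all hypothetical; in practice the switches might be toggled from a configuration file or a debug command. MOD_DBG relies on the GCC/Clang ##__VA_ARGS__ extension, much as the DH macro relies on ## args.

```c
#include <stdio.h>

/* Hypothetical module IDs for illustration. */
enum { MOD_VIDEO, MOD_NET, MOD_STORAGE, MOD_COUNT };

static int g_debug_all = 0;               /* global switch: on = every module prints */
static int g_debug_mod[MOD_COUNT] = {0};  /* per-module switches */

int debug_enabled(int mod)
{
    if (mod < 0 || mod >= MOD_COUNT)
        return 0;
    return g_debug_all || g_debug_mod[mod];
}

void debug_set_all(int on)
{
    g_debug_all = on;
}

void debug_set_module(int mod, int on)
{
    if (mod >= 0 && mod < MOD_COUNT)
        g_debug_mod[mod] = on;
}

/* Print with module ID, file, function and line, only if the module is enabled. */
#define MOD_DBG(mod, fmt, ...)                                              \
    do {                                                                    \
        if (debug_enabled(mod))                                             \
            printf("[%d] %s-%s-%d:" fmt, (mod), __FILE__, __func__,         \
                   __LINE__, ##__VA_ARGS__);                                \
    } while (0)
```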

3. Important debugging information, such as restart logs, error logs and notice logs, should be saved to persistent storage.
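A minimal sketch of saving such logs: append a timestamped line to a log file, then call fflush() and fsync() so the entry reaches the disk or flash even if the device reboots shortly afterwards. log_to_file is a hypothetical name, and the log path is left to the caller; on flash storage, frequent fsync() calls cost write cycles, so it may be reserved for important entries.

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

/* Append one timestamped line "YYYY-MM-DD HH:MM:SS [level] message".
 * Returns 0 on success, -1 on error. */
int log_to_file(const char *path, const char *level, const char *fmt, ...)
{
    char stamp[32];
    time_t now = time(NULL);
    struct tm tm_now;
    size_t len = strlen(fmt);
    va_list ap;
    FILE *fp = fopen(path, "a");

    if (fp == NULL)
        return -1;
    localtime_r(&now, &tm_now);
    strftime(stamp, sizeof stamp, "%Y-%m-%d %H:%M:%S", &tm_now);
    fprintf(fp, "%s [%s] ", stamp, level);
    va_start(ap, fmt);
    vfprintf(fp, fmt, ap);
    va_end(ap);
    if (len == 0 || fmt[len - 1] != '\n')   /* one entry per line */
        fputc('\n', fp);
    fflush(fp);           /* stdio buffer -> kernel */
    fsync(fileno(fp));    /* kernel cache -> storage */
    return fclose(fp) == 0 ? 0 : -1;
}
```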


Source: blog.csdn.net/weixin_40732273/article/details/109261376