Watchdog implementation on Linux system

To meet "high availability" requirements, engineers designed the "watchdog".

A watchdog can be implemented either as a hardware circuit or as a software timer. In both cases it automatically restarts the system when the system fails.
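The software-timer idea can be illustrated with a small in-process sketch in Python. It is not the kernel's implementation. The class name SoftwareWatchdog and the on_expire callback are hypothetical, and a real watchdog would restart the system rather than call an arbitrary function.

```python
import threading
from typing import Callable, Optional


class SoftwareWatchdog:
    """Hypothetical software watchdog: runs on_expire if kick() is not
    called within `timeout` seconds (a stand-in for restarting the system)."""

    def __init__(self, timeout: float, on_expire: Callable[[], None]) -> None:
        self.timeout = timeout
        self.on_expire = on_expire
        self._lock = threading.Lock()
        self._timer: Optional[threading.Timer] = None

    def _arm(self) -> None:
        # Start a fresh one-shot countdown of `timeout` seconds.
        self._timer = threading.Timer(self.timeout, self.on_expire)
        self._timer.daemon = True
        self._timer.start()

    def start(self) -> None:
        with self._lock:
            self._arm()

    def kick(self) -> None:
        # "I am alive": cancel the pending countdown and restart it.
        with self._lock:
            if self._timer is not None:
                self._timer.cancel()
            self._arm()

    def stop(self) -> None:
        # Cancel any pending countdown without firing it.
        with self._lock:
            if self._timer is not None:
                self._timer.cancel()
                self._timer = None
```

As long as kick() is called more often than every `timeout` seconds, on_expire never runs. Once the kicks stop, it fires exactly once.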

Hardware
Searching for "watchdog card" turns up plenty of relevant information. Common watchdog cards are small and use a PCI or USB interface.

Software
Many programs can act as a software watchdog.

Linux ships with its own watchdog implementation for monitoring the system. It consists of a kernel watchdog module and a user space watchdog program.

The kernel watchdog module communicates with user space through the character device /dev/watchdog. Once a user space program opens /dev/watchdog, a 1-minute timer starts in the kernel. From then on, the program must write data to the device at least once a minute, and each write resets the timer. If the program does not write within 1 minute, the timer expires and the system reboots.
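A minimal user space feeder for the protocol described above might look like the following Python sketch. It assumes a loaded watchdog driver (for example the kernel's softdog module) and root privileges. The names keepalive, run_feeder and TIMEOUT_SECONDS are illustrative. Feeding every quarter of the timeout is an arbitrary safety margin, not a kernel requirement.

```python
import os
import time

WATCHDOG_DEVICE = "/dev/watchdog"
TIMEOUT_SECONDS = 60  # the 1-minute timer described in the text


def keepalive(fd: int) -> None:
    # Any write counts as "still alive" and restarts the kernel countdown.
    os.write(fd, b"\0")


def run_feeder(path: str = WATCHDOG_DEVICE,
               interval: float = TIMEOUT_SECONDS / 4) -> None:
    # Opening the device arms the kernel timer (needs root and a driver).
    fd = os.open(path, os.O_WRONLY)
    # Feed well inside the timeout. If this loop ever hangs, the writes
    # stop, the timer expires, and the kernel reboots the machine.
    while True:
        keepalive(fd)
        time.sleep(interval)
```

run_feeder() never returns by design. A process that opens the device takes on the job of proving, every few seconds, that user space is still healthy.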

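A user space program can stop the kernel timer by closing /dev/watchdog, with two caveats. First, drivers that support "magic close" only stop the timer if the character 'V' is written just before the device is closed. Second, a kernel built with CONFIG_WATCHDOG_NOWAYOUT never stops the timer once it has started.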
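The following Python sketch shows a clean shutdown under the kernel's "magic close" convention: write the character 'V', then close the descriptor. The helper name stop_watchdog is hypothetical. No user space action can disarm a kernel built with CONFIG_WATCHDOG_NOWAYOUT.

```python
import os

MAGIC_CLOSE = b"V"  # magic character from the kernel watchdog API


def stop_watchdog(fd: int) -> None:
    """Hypothetical helper: disarm the watchdog before exiting.

    Drivers with magic close keep counting (and eventually reboot)
    if the descriptor is closed without 'V' written just before.
    """
    os.write(fd, MAGIC_CLOSE)
    os.close(fd)
```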

User space watchdog daemon:
User space also has a daemon called watchdog, which periodically checks the system for conditions such as:

* Is the process table full?
* Is there enough free memory?
* Are some files accessible?
* Have some files changed within a given interval?
* Is the average work load too high?
* Has a file table overflow occurred?
* Is a process still running? The process is specified by a pid file.
* Do some IP addresses respond to ping?
* Do network interfaces receive traffic?
* Is the temperature too high? (Temperature data not always available.)
* Execute a user-defined command to do arbitrary tests.
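A few of these checks can be approximated in Python as shown below. This is a hypothetical sketch, not code from the watchdog daemon. The function names and the load threshold parameter are assumptions. The pid-file check relies on the POSIX convention that sending signal 0 only tests whether a process exists.

```python
import os


def load_ok(max_load_1min: float) -> bool:
    # "Is the average work load too high?" (1-minute load average)
    return os.getloadavg()[0] <= max_load_1min


def file_accessible(path: str) -> bool:
    # "Are some files accessible?"
    return os.access(path, os.R_OK)


def pidfile_alive(pidfile: str) -> bool:
    # "Is a process still running? The process is specified by a pid file."
    try:
        with open(pidfile) as f:
            pid = int(f.read().strip())
    except (OSError, ValueError):
        return False
    if pid <= 0:
        return False  # 0 and negatives address process groups, not a pid
    try:
        os.kill(pid, 0)  # signal 0: existence check, nothing is delivered
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # process exists but belongs to another user
    return True
```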

If a test fails, the daemon triggers a soft reboot (equivalent to running the shutdown command). The daemon can also feed the kernel watchdog through /dev/watchdog, so if the daemon itself hangs, the kernel timer expires and reboots the system.
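Combining the checks listed above with the kernel device gives the following control flow for one pass. The Python sketch below is an assumption about how such a loop could be structured, not the daemon's actual logic. watchdog_cycle and soft_reboot are hypothetical names, and soft_reboot simply runs `shutdown -r now`.

```python
import os
import subprocess
from typing import Callable, Iterable


def soft_reboot() -> None:
    # Equivalent of running the shutdown command by hand.
    subprocess.run(["shutdown", "-r", "now"], check=False)


def watchdog_cycle(checks: Iterable[Callable[[], bool]], fd: int,
                   on_failure: Callable[[], None] = soft_reboot) -> bool:
    """One pass of a hypothetical watchdog-daemon loop.

    Feeds the kernel watchdog through `fd` only if every check passes.
    On failure it requests a soft reboot and returns False. In this
    sketch the caller should then stop feeding, so that if the orderly
    shutdown hangs, the kernel timer still reboots the system.
    """
    if all(check() for check in checks):
        os.write(fd, b"\0")
        return True
    on_failure()
    return False
```

Injecting on_failure keeps the reboot policy separate from the checking logic, which also makes the loop testable without restarting anything.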

The main difference between the kernel-level watchdog and the user space watchdog is robustness. The kernel-level watchdog is more resistant to interference and runs more reliably, because its timer keeps working even when user space programs hang or misbehave.


