Troubleshooting in Linux system operation and maintenance

1. Ideas
1. Requirements for handling problems
2. General ideas

2. Specific problems
1. Network problems
(1) Network failure
(2) The network is very slow
2. Hardware problems
3. Operating system problems
(1) The system cannot be started normally
(2) The system runs slowly or crashes
4. Service or program problems
5. Other



1. Ideas
1. Requirements for handling problems: think clearly, define the problem clearly, solve it quickly, and keep accumulating experience over time until you have built your own set of "reflexes" for solving problems.

2. General ideas:
(1) Pay attention to error messages: they point to the general direction of the problem.
(2) Check the log files: an error message is sometimes only a symptom of the problem. To understand the problem in more detail, you usually need to check the relevant logs, such as the system log files (under /var/log) and the application's own log files.
(3) Analyze and locate the problem: combine the error messages and log files with the surrounding environment (code, system disks, system memory, the running state of each process, etc.) and with your own and others' experience to pinpoint the problem.
(4) Solve the problem: once the problem has been located, it can usually be solved.
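As a rough illustration of steps (1) and (2), the following Python sketch scans a log file for lines that look like errors. The keyword list, the function names and the default path /var/log/messages (used on Red Hat style systems; Debian style systems use /var/log/syslog) are assumptions for illustration, not a standard tool.

```python
import re
from typing import Iterable, List, Tuple

# Hypothetical keyword list: tune it for the system or application being debugged.
ERROR_PATTERN = re.compile(r"\b(error|failed|failure|critical|panic|denied)\b", re.IGNORECASE)


def find_error_lines(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line_number, text) for every line that matches ERROR_PATTERN."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if ERROR_PATTERN.search(line):
            hits.append((number, line.rstrip("\n")))
    return hits


def scan_log_file(path: str = "/var/log/messages") -> List[Tuple[int, str]]:
    """Read a log file (undecodable bytes replaced) and return its error-looking lines."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return find_error_lines(handle)
```

For example, `scan_log_file("/var/log/messages")` returns the matching lines with their line numbers, which gives a starting point for step (3). Reading system logs usually requires root.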

System log
http://c.biancheng.net/cpp/html/2783.html
https://www.cnblogs.com/yingsong/p/6022181.html


2. Specific problems
1. Network problems
First determine what kind of network problem it is: is the network down, or is it just slow?

  1) If the network is down, you need to locate the specific fault. Generally, rule out the impossible causes one by one until you reach the root cause. You usually need to check:

    whether the physical link is connected

    whether the corresponding network card is enabled

    whether the local network is reachable

    whether DNS is failing

    whether there is a route to the target host

    whether the remote port is open
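Some of these checks can be scripted. The sketch below assumes a Linux host and uses hypothetical function names: it reads the kernel's operational state of an interface from /sys/class/net/<iface>/operstate (card enabled and link up), asks the resolver for an address (DNS), and attempts a TCP connection (remote port open). Link and routing checks are usually still done with tools such as ip, ping and traceroute.

```python
import socket
from pathlib import Path
from typing import Optional


def interface_state(iface: str, sys_root: str = "/sys/class/net") -> Optional[str]:
    """Return the kernel's operstate for an interface ("up", "down", ...), or None if absent."""
    try:
        return (Path(sys_root) / iface / "operstate").read_text().strip()
    except OSError:
        return None


def resolves(hostname: str) -> bool:
    """DNS check: True if the resolver returns at least one address for hostname."""
    try:
        return len(socket.getaddrinfo(hostname, None)) > 0
    except socket.gaierror:
        return False


def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Remote port check: True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unreachable, ...
        return False
```

Typical calls would be `interface_state("eth0")`, `resolves("example.com")` and `port_open("example.com", 443)`; the interface and host names are only illustrative.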

  2) If the network is slow, the following checks generally help locate the source of the problem:

    whether DNS is the source of the problem

    which nodes along the route are the bottlenecks

    how the bandwidth is being used
Source: http://www.cnblogs.com/Security-Darren/p/4700387.html
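For the DNS and bandwidth checks, a minimal sketch is shown below. It assumes the Linux /proc/net/dev layout, where the received byte count is the first field after the interface name and the transmitted byte count is the ninth, takes two snapshots, and divides the difference by the interval; the DNS timer shows whether name resolution alone is slow. Function names are hypothetical, counter wrap-around is ignored, and tools such as sar do the same job more thoroughly.

```python
import socket
import time
from typing import Dict, Tuple

Counters = Dict[str, Tuple[int, int]]


def parse_net_dev(text: str) -> Counters:
    """Parse /proc/net/dev text into {interface: (rx_bytes, tx_bytes)}."""
    counters: Counters = {}
    for line in text.splitlines()[2:]:  # the first two lines are column headers
        if ":" not in line:
            continue
        name, data = line.split(":", 1)
        fields = data.split()
        counters[name.strip()] = (int(fields[0]), int(fields[8]))
    return counters


def byte_rates(before: Counters, after: Counters, seconds: float) -> Dict[str, Tuple[float, float]]:
    """Received/transmitted bytes per second between two snapshots."""
    rates = {}
    for name, (rx_after, tx_after) in after.items():
        if name in before:
            rx_before, tx_before = before[name]
            rates[name] = ((rx_after - rx_before) / seconds, (tx_after - tx_before) / seconds)
    return rates


def sample_rates(interval: float = 1.0) -> Dict[str, Tuple[float, float]]:
    """Read /proc/net/dev twice, about interval seconds apart, and return per-interface rates."""
    with open("/proc/net/dev") as handle:
        first = parse_net_dev(handle.read())
    time.sleep(interval)
    with open("/proc/net/dev") as handle:
        second = parse_net_dev(handle.read())
    return byte_rates(first, second, interval)


def dns_lookup_seconds(hostname: str) -> float:
    """Time a single name lookup; a large value points at DNS as the bottleneck."""
    start = time.perf_counter()
    socket.getaddrinfo(hostname, None)
    return time.perf_counter() - start
```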

2. Hardware problems
Check /var/log/dmesg, or run the dmesg command, to look for hardware errors.
http://www.askmaclean.com/archives/%E5%9C%A8linux%E4%B8%8A%E5%88%86%E6%9E%90%E7%A1%AC%E4%BB%B6%E6%A3%80%E6%B5%8B%E6%97%A5%E5%BF%97.html
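To narrow the kernel ring buffer down to likely hardware messages, a small filter can help. This is a sketch: it assumes the default dmesg timestamp prefix "[  seconds.microseconds]", and the keyword list (I/O errors, machine-check events, ATA errors, NIC link drops, temperature warnings) is an illustrative assumption rather than a complete catalogue. It can be fed, for example, the output of `subprocess.run(["dmesg"], capture_output=True, text=True).stdout` (reading dmesg may require root).

```python
import re
from typing import List, Optional, Tuple

# Default dmesg line: "[    12.345678] message"
DMESG_LINE = re.compile(r"^\[\s*(\d+\.\d+)\]\s*(.*)$")

# Illustrative hardware-related keywords (an assumption); extend for your hardware.
HARDWARE_HINTS = re.compile(
    r"I/O error|Hardware Error|Machine check|\bata\d+|Link is Down|temperature",
    re.IGNORECASE,
)


def hardware_messages(dmesg_text: str) -> List[Tuple[Optional[float], str]]:
    """Return (timestamp_seconds_or_None, message) for lines matching HARDWARE_HINTS."""
    hits = []
    for line in dmesg_text.splitlines():
        match = DMESG_LINE.match(line)
        if match:
            stamp, message = float(match.group(1)), match.group(2)
        else:  # lines without the usual prefix are kept whole
            stamp, message = None, line
        if HARDWARE_HINTS.search(message):
            hits.append((stamp, message))
    return hits
```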

3. Operating system problems
1) The system cannot be started normally:
(1) The file system is damaged, for example files on the Linux root partition are corrupted (usually caused by a sudden power failure or an improper shutdown)
(2) System configuration files are wrong, for example /etc/inittab or /etc/fstab is misconfigured or missing, causing system errors (usually a human configuration mistake)
inittab file:
https://www.linux178.com/linux/inittab.html
http://blog.51cto.com/leejia/788895
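On SysV-init systems such as CentOS 6, each /etc/inittab entry has the form id:runlevels:action:process, and a classic misconfiguration is setting the initdefault runlevel to 0 (halt) or 6 (reboot), which makes the machine shut down or reboot as soon as it finishes booting. The hypothetical checker below looks only for that mistake; systemd-based distributions no longer use inittab to choose the default target.

```python
from typing import Optional


def default_runlevel(inittab_text: str) -> Optional[str]:
    """Return the runlevel field of the initdefault entry, or None if there is none."""
    for raw in inittab_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split(":", 3)  # id:runlevels:action:process
        if len(fields) == 4 and fields[2] == "initdefault":
            return fields[1]
    return None


def check_inittab(inittab_text: str) -> str:
    """Report a missing initdefault entry or a default runlevel of 0 or 6."""
    level = default_runlevel(inittab_text)
    if level is None:
        return "no initdefault entry found"
    if level in ("0", "6"):
        return f"initdefault is {level}: the system will halt or reboot right after booting"
    return "ok"
```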

fstab file:
http://blog.itpub.net/26723566/viewspace-753700/
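An /etc/fstab line has six whitespace-separated fields (device, mount point, file system type, options, dump, fsck pass), and the last two may be omitted. The sketch below flags a few obvious mistakes under those assumptions; it is an illustration with hypothetical function names, not a full validator (newer util-linux versions provide `findmnt --verify` for a more thorough check).

```python
from typing import List


def check_fstab(text: str) -> List[str]:
    """Return human-readable problems found in /etc/fstab text (empty list if none)."""
    problems = []
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments are allowed
        fields = line.split()
        if not 4 <= len(fields) <= 6:
            problems.append(f"line {number}: expected 4-6 fields, found {len(fields)}")
            continue
        mount_point, fs_type = fields[1], fields[2]
        # Swap entries use "none" or "swap" instead of a directory.
        if fs_type != "swap" and not mount_point.startswith("/"):
            problems.append(f"line {number}: mount point {mount_point!r} is not an absolute path")
        # The dump and fsck-pass fields, when present, must be numbers.
        for name, value in zip(("dump", "pass"), fields[4:]):
            if not value.isdigit():
                problems.append(f"line {number}: {name} field {value!r} is not a number")
    return problems


def check_fstab_file(path: str = "/etc/fstab") -> List[str]:
    """Run check_fstab on a file on disk."""
    with open(path) as handle:
        return check_fstab(handle.read())
```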

(3) Kernel files are missing or corrupted (caused by a kernel upgrade or a kernel bug)
https://www.jianshu.com/p/e1f550ba164d
Kernel upgrade: http://seanlook.com/2014/10/24/upgrade-centos6_kernel-to-3.10.x/
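After a kernel upgrade, a quick sanity check is whether /boot still holds the kernel image and initramfs for the version you expect to boot. This sketch assumes the common naming vmlinuz-<version> plus either initramfs-<version>.img (Red Hat style) or initrd.img-<version> (Debian style); the function names are hypothetical.

```python
import os
import platform
from typing import Iterable, List, Tuple


def installed_kernels(boot_files: Iterable[str]) -> List[str]:
    """Kernel versions that have a vmlinuz-<version> image among the /boot file names."""
    prefix = "vmlinuz-"
    return sorted(name[len(prefix):] for name in boot_files if name.startswith(prefix))


def kernel_files_present(version: str, boot_files: Iterable[str]) -> Tuple[bool, bool]:
    """Return (image_present, initramfs_present) for one kernel version."""
    files = set(boot_files)
    has_image = f"vmlinuz-{version}" in files
    # Red Hat style and Debian style initramfs names respectively.
    has_initramfs = f"initramfs-{version}.img" in files or f"initrd.img-{version}" in files
    return has_image, has_initramfs


def check_running_kernel(boot_dir: str = "/boot") -> Tuple[bool, bool]:
    """Check /boot for the files of the currently running kernel (platform.release())."""
    return kernel_files_present(platform.release(), os.listdir(boot_dir))
```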
(4) The boot loader has a problem, for example GRUB is missing or damaged (caused by a mistaken manual edit of its configuration or by a file system failure)

The boot loader is a small program that runs before the operating system kernel. It initializes the hardware devices and builds a map of the memory space, bringing the system's software and hardware environment into a suitable state so that everything is ready for the final handoff to the operating system kernel. There are several boot loaders; GRUB and LILO are common ones. GRUB reads its configuration file (usually menu.lst or grub.conf for GRUB Legacy, grub.cfg for GRUB 2) into memory and boots the chosen operating system according to that configuration.
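When GRUB boots the wrong system or offers no usable entries, it helps to list what the configuration actually contains. The sketch assumes GRUB 2 entries start with `menuentry 'Title'` and GRUB Legacy entries with a `title` line; it is a rough text scan with hypothetical function names, not a parser of the full GRUB configuration language.

```python
import re
from typing import List

# GRUB 2: menuentry 'Title' ... {   (single or double quotes)
MENUENTRY = re.compile(r"""^\s*menuentry\s+(['"])(.+?)\1""")


def grub2_entries(cfg_text: str) -> List[str]:
    """Titles of menuentry blocks in a grub.cfg, including ones nested in submenus."""
    titles = []
    for line in cfg_text.splitlines():
        match = MENUENTRY.match(line)
        if match:
            titles.append(match.group(2))
    return titles


def grub_legacy_entries(menu_lst_text: str) -> List[str]:
    """Titles from 'title ...' lines in a GRUB Legacy menu.lst or grub.conf."""
    titles = []
    for line in menu_lst_text.splitlines():
        parts = line.split(None, 1)
        if len(parts) == 2 and parts[0] == "title":
            titles.append(parts[1].strip())
    return titles
```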

Supplement:

The overall system boot process:



Refer to https://blog.csdn.net/zhaodedong/article/details/47711499
http://www.runoob.com/linux/linux-system-boot.html

Solutions for a Linux system that cannot boot: http://www.voidcn.com/article/p-wnhepalc-gs.html (and the related articles listed there)

4 Best Linux Bootloaders https://linux.cn/article-7788-1.html#3_515

2) The system runs slowly or crashes:
The system usually runs very slowly or crashes because CPU, memory, or I/O usage is high.
https://www.linuxidc.com/Linux/2011-10/44274.htm
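The CPU and memory side of this can be read straight from /proc. This sketch, with hypothetical function names, parses /proc/meminfo and /proc/loadavg; when MemAvailable is missing (older kernels may not provide it), it falls back to the rough estimate MemFree + Buffers + Cached. A 1-minute load average that stays above the CPU count suggests CPU saturation or processes stuck waiting on I/O; per-disk I/O can then be checked with iostat or /proc/diskstats.

```python
import os
from typing import Dict, Tuple


def parse_meminfo(text: str) -> Dict[str, int]:
    """Parse /proc/meminfo into {field: value}; most values are in kB."""
    values = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            values[key.strip()] = int(parts[0])
    return values


def memory_used_percent(meminfo: Dict[str, int]) -> float:
    """Percentage of memory that is not available to new workloads."""
    total = meminfo["MemTotal"]
    available = meminfo.get("MemAvailable")
    if available is None:  # rough fallback for kernels without MemAvailable
        available = meminfo.get("MemFree", 0) + meminfo.get("Buffers", 0) + meminfo.get("Cached", 0)
    return 100.0 * (total - available) / total


def load_per_cpu(loadavg_text: str, cpu_count: int) -> float:
    """1-minute load average divided by the number of CPUs."""
    return float(loadavg_text.split()[0]) / cpu_count


def snapshot() -> Tuple[float, float]:
    """Return (memory used %, 1-minute load per CPU) for this machine."""
    with open("/proc/meminfo") as handle:
        used = memory_used_percent(parse_meminfo(handle.read()))
    with open("/proc/loadavg") as handle:
        load = load_per_cpu(handle.read(), os.cpu_count() or 1)
    return used, load
```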

Common causes of a Linux system crash are:
system hardware problems (SCSI card, motherboard, RAID card, HBA card, network card, hard disk, etc.)
peripheral hardware problems (network, etc.)
software problems (system or application software)
driver bugs (look for a newer driver)
kernel bugs (search the LKML, or replace the kernel and try again)
system settings (restore the default settings, turn off the firewall, etc.)

Supplement:
Linux performance optimization: CPU, memory, and I/O https://blog.csdn.net/ZYC88888/article/details/79027944
Linux performance tuning guide https://legacy.gitbook.com/book/lihz1990/transoflptg/details (includes a PDF)
Linux performance and tuning guide: process management http://blog.jobbole.com/105135/

4. Service or program problems
Common Redis client exceptions https://blog.csdn.net/li396864285/article/details/76951278
Common nginx errors and solutions http://blog.51cto.com/riverxyz/1961151
Common nginx exceptions https://www.jianshu.com/p/e72f2ea12eae
Summary of common causes of 502 errors in nginx and other websites http://www.21yunwei.com/archives/3724
Common MySQL problems such as insufficient connections, deadlocks, and slow SQL statements
https://mp.weixin.qq.com/s/rvfRzGe2GB1OkQ_zQVQElA
http://www.ttlsa.com/mysql/mysql-common-error-analysis-and-solution-methods/
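For 502 errors, a first step is often to measure how many responses actually failed. Assuming the default nginx `combined` access-log format, where the status code directly follows the quoted request line, this sketch counts status codes; the regular expression and function names are illustrative assumptions.

```python
import re
from collections import Counter
from typing import Iterable

# In the combined format the status code follows the quoted request, e.g. "GET / HTTP/1.1" 502
STATUS = re.compile(r'"[^"]*"\s+(\d{3})\s')


def status_counts(lines: Iterable[str]) -> Counter:
    """Count HTTP status codes in access-log lines; unparseable lines are skipped."""
    counts: Counter = Counter()
    for line in lines:
        match = STATUS.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def error_share(counts: Counter, code: str = "502") -> float:
    """Fraction of counted responses that returned the given status code."""
    total = sum(counts.values())
    return counts[code] / total if total else 0.0
```

A high 502 share usually means nginx could not get a valid response from the upstream (application server, PHP-FPM, etc.), so the next step is that upstream's own logs.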

5. Other
A summary of 33 techniques for troubleshooting common Linux operation and maintenance problems https://mp.weixin.qq.com/s/hLaVQC3FPChGoEnrBtUJ3Q
6 typical Linux operation and maintenance problems, with experts' analysis and solutions https://mp.weixin.qq.com/s/4oZqkcs8LQ-_X6SmsRe-yw
Becoming a Linux operation and maintenance engineer from scratch: 23 details you must master https://mp.weixin.qq.com/s/24lNkVbO419G6gX52Xr7bQ
