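Linux runs on all kinds of servers and on much of today's infrastructure. For anyone who maintains or uses a Linux system, being able to find the cause of a failure quickly is essential.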
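1. Check hardware information
Start with the hardware. Only after a hardware fault has been ruled out does it make sense to look for errors in running programs.
Commands such as lsblk and lscpu print hardware information. Here we use lsblk as the example: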
lmh@ubuntu:~$ lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
fd0 2:0 1 4K 0 disk
loop0 7:0 0 44.9M 1 loop /snap/gtk-common-themes/1440
loop1 7:1 0 14.8M 1 loop /snap/gnome-characters/399
loop2 7:2 0 91.3M 1 loop /snap/core/8592
loop3 7:3 0 54.7M 1 loop /snap/core18/1668
loop4 7:4 0 3.7M 1 loop /snap/gnome-system-monitor/127
loop5 7:5 0 4.2M 1 loop /snap/gnome-calculator/544
loop6 7:6 0 91.4M 1 loop /snap/core/8689
loop7 7:7 0 14.8M 1 loop /snap/gnome-characters/296
loop8 7:8 0 3.7M 1 loop /snap/gnome-system-monitor/100
loop9 7:9 0 1008K 1 loop /snap/gnome-logs/61
loop10 7:10 0 160.2M 1 loop /snap/gnome-3-28-1804/116
loop11 7:11 0 42.8M 1 loop /snap/gtk-common-themes/1313
loop12 7:12 0 956K 1 loop /snap/gnome-logs/81
loop13 7:13 0 149.9M 1 loop /snap/gnome-3-28-1804/67
loop14 7:14 0 54.4M 1 loop /snap/core18/1066
loop15 7:15 0 4M 1 loop /snap/gnome-calculator/406
sda 8:0 0 70G 0 disk
└─sda1 8:1 0 70G 0 part /
sr0 11:0 1 2G 0 rom /media/lmh/Ubuntu 18.04.3 LTS amd641
sr1 11:1 1 2G 0 rom /media/lmh/Ubuntu 18.04.3 LTS amd64
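The output lists every block device with its size, type, and mount point, so an obvious hardware problem, such as a disk or partition missing from the list, usually shows up at this stage.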
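On a system with many snap packages, the loop devices crowd out the physical disks in the listing above. The following is a minimal sketch of one way to hide them. The function name drop_loop_devices is hypothetical, and the sketch assumes the default lsblk column order shown above, where TYPE is the sixth column.

```shell
# Hypothetical helper: read lsblk-style output on stdin and drop loop devices.
# Assumes the default column order (NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT),
# so the device type is the 6th whitespace-separated field.
drop_loop_devices() {
    # Keep the header line and every row whose TYPE is not "loop".
    awk '$6 != "loop"'
}

# Usage: lsblk | drop_loop_devices
```

Applied to the listing above, this would leave only the header and the fd0, sda, sda1, sr0, and sr1 rows.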
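2. Find errors and warnings in the logs
A running Linux system keeps logs of its activity, and we can analyze them to find the cause of an error. Running dmesg | more pages through the kernel log, where errors and warnings are recorded alongside ordinary boot messages: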
lmh@ubuntu:~$ dmesg | more
[ 0.000000] Linux version 5.3.0-40-generic (buildd@lcy01-amd64-024) (gcc version 7.4.0 (Ubuntu 7.4.0-1ubuntu1~18.04.1)) #32~18.04.1-Ubuntu SMP Mon Feb 3 14:05:59 UTC 2020 (Ubuntu 5.3.0-40.32~18.04.1-generic 5.3.18)
[ 0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-5.3.0-40-generic root=UUID=e7ca2622-528b-400f-9b21-ac56ff834cd2 ro find_preseed=/preseed.cfg auto noprompt priority=critical locale=en_US quiet
[ 0.000000] KERNEL supported cpus:
[ 0.000000] Intel GenuineIntel
[ 0.000000] AMD AuthenticAMD
[ 0.000000] Hygon HygonGenuine
[ 0.000000] Centaur CentaurHauls
[ 0.000000] zhaoxin Shanghai
[ 0.000000] Disabled fast string operations
[ 0.000000] x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point registers'
[ 0.000000] x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers'
[ 0.000000] x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers'
[ 0.000000] x86/fpu: xstate_offset[2]: 576, xstate_sizes[2]: 256
[ 0.000000] x86/fpu: Enabled xstate features 0x7, context size is 832 bytes, using 'standard' format.
[ 0.000000] BIOS-provided physical RAM map:
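Most of the kernel log is routine boot output like the lines above, so it helps to narrow it down. The sketch below filters log text for a few common keywords. The function name filter_kernel_problems and the keyword list are assumptions chosen for illustration, not a complete set of failure patterns. Where the installed dmesg (from util-linux) supports it, dmesg --level=err,warn is an alternative that filters by log level instead of by keyword.

```shell
# Hypothetical helper: read kernel log text on stdin and keep only lines
# that mention an error, a failure, or a warning (case-insensitive).
# The keyword list is an illustrative assumption and may need extending.
filter_kernel_problems() {
    # grep exits non-zero when nothing matches; "|| true" keeps the
    # function from reporting a failure when the log is clean.
    grep -iE 'error|fail|warn' || true
}

# Usage: dmesg | filter_kernel_problems | more
```

A keyword match is only a starting point: some matching lines are harmless, and some real problems are logged without any of these words.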
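3. Check whether the network is working
Linux is a network-centric system, so checking whether its network connection works is another key step. Tools such as ip addr, dig, and ping help here. We use ping localhost as the example; note that this only exercises the local loopback interface, not the external network: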
lmh@ubuntu:~$ ping localhost
PING localhost (127.0.0.1) 56(84) bytes of data.
64 bytes from localhost (127.0.0.1): icmp_seq=1 ttl=64 time=0.033 ms
64 bytes from localhost (127.0.0.1): icmp_seq=2 ttl=64 time=0.029 ms
64 bytes from localhost (127.0.0.1): icmp_seq=3 ttl=64 time=0.037 ms
64 bytes from localhost (127.0.0.1): icmp_seq=4 ttl=64 time=0.035 ms
64 bytes from localhost (127.0.0.1): icmp_seq=5 ttl=64 time=0.032 ms
64 bytes from localhost (127.0.0.1): icmp_seq=6 ttl=64 time=0.035 ms
64 bytes from localhost (127.0.0.1): icmp_seq=7 ttl=64 time=0.038 ms
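Without a packet count, ping keeps running until it is interrupted. For a scripted check, one option is to send a fixed number of packets with ping -c and read the packet-loss figure from the summary. The sketch below assumes the summary line format printed by the common iputils ping ("... N% packet loss ..."); the function name packet_loss_percent is hypothetical.

```shell
# Hypothetical helper: read ping output on stdin and print the packet-loss
# percentage taken from the summary line.
# Assumes a summary of the form "... received, 0% packet loss, time ...".
packet_loss_percent() {
    # Print only the number that directly precedes "% packet loss".
    sed -n 's/.* \([0-9.]*\)% packet loss.*/\1/p'
}

# Usage: ping -c 4 localhost | packet_loss_percent
```

A result of 0 means every reply came back. To test beyond the loopback interface, the same check can be pointed at a gateway or a remote host instead of localhost.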