Linux system troubleshooting ideas and common cases

1. Linux system log categories
Kernel and system log:
This log data is managed by the syslog system service. The settings in its main configuration file, /etc/syslog.conf, determine where kernel messages and the messages of the various system programs are recorded. (On CentOS 6 and 7 the service is rsyslog and the file is /etc/rsyslog.conf.)

User log:
This log data records Linux user logins and logouts, including the user name, login terminal, login time, source host, and the processes in use.

Program log:
Some applications manage their own log files (instead of handing them to the syslog service) to record the events that occur while the program runs.

2. Interpreting log files under Linux
By default, the log files of the Linux system itself and of most server programs are placed in the /var/log directory.
/var/log/messages: The public log file. It records Linux kernel messages and the public log information of various applications, including startup messages, I/O errors, network errors, and program failures.
For applications or services that do not use a separate log file, the related event records can generally be found in this file.
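
As a rough sketch of how the event types listed above can be pulled out of the public log, the functions below print and count lines that match a few keywords. The function names `scan_log` and `count_log_hits` and the keyword list are illustrative assumptions, not a fixed convention; reading /var/log/messages normally requires root.

```shell
# scan_log FILE: print lines that mention common failure keywords.
# The keyword list is an assumption; adjust it to your system.
scan_log() {
    grep -iE 'error|fail|panic' "$1"
}

# count_log_hits FILE: print how many lines match the same keywords.
count_log_hits() {
    grep -ciE 'error|fail|panic' "$1"
}

# Usage (as root):
#   scan_log /var/log/messages | tail -n 20
#   count_log_hits /var/log/messages
```

Matching is case-insensitive, so "FAILED" and "I/O error" are both caught; expect some false positives from harmless messages that happen to contain these words.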

/var/log/cron: Records event messages generated by crond scheduled tasks.

/var/log/dmesg: Contains the kernel ring buffer messages captured at boot. When the system starts, a lot of hardware-related information is displayed on the screen; this file holds that information from the most recent startup.
The dmesg command shows the current contents of the kernel ring buffer, including the hardware-related messages from startup.

/var/log/maillog: Records mail activity entering or leaving the system.

/var/log/boot.log: Records the messages of services started during system boot.

/var/log/secure: Records events from user logins and authentication, including remote logins.

/var/log/wtmp: Records every login and logout on the system. View it with the last command.
/var/log/btmp: Records failed login attempts. View it with the lastb command.
/var/log/lastlog: Records the most recent login of each user. View it with the lastlog command.

#export LANG=en_US (Switch the shell session's language to English.)

#lastlog (View the most recent login of each user)
#locale (View the current locale settings, including the character encoding)

2. Forgotten Linux root password: symptoms and solutions

Forgetting the root password is a very common problem, and it is simple to solve under Linux: restart the system and boot into single-user mode (init 1). Single-user mode does not ask for a login password, so you get a root shell directly and can change the root password there.

The way to enter single-user mode differs between CentOS 6.x and CentOS 7.x.

CentOS 6.x, entering single-user mode:
(1) When the boot countdown screen appears, immediately press any key (for example an arrow key) to enter the GRUB menu.
(2) Press the "e" key to edit the selected entry.
(3) Select the line that begins with "kernel..." and press "e" again. The line can now be edited.
(4) Type a space followed by "single" at the end of the line and press Enter to return to the previous screen.
(5) With the "kernel..." line still selected, press the "b" key. The system boots into single-user mode automatically.
(6) Change the password: #passwd root (changes the root user's password). Enter the new password as prompted, then reboot to complete the change.
#more /etc/grub.conf (the GRUB boot configuration on CentOS 6.9; a symlink to /boot/grub/grub.conf)

CentOS 7.x, entering single-user mode to change the password:
(1) On the boot menu, select the "CentOS Linux..." entry and press the "e" key to edit it.
(2) Move the cursor to the line that begins with "linux16 /... root=UUID=...", go to the end of that line (after "...UTF-8"), and add a space followed by "init=/bin/sh".
(3) Press Ctrl+x to continue booting. The system drops into a shell.
(4) Make the root partition writable: # mount -o remount,rw / (remounts the root partition read-write).
(5) Change the password: # passwd root (changes the root user's password).
(6) Check the SELinux configuration: # more /etc/selinux/config. If SELinux is enabled, create the relabel flag file in the root directory: # touch /.autorelabel. Alternatively, set SELINUX to disabled in /etc/selinux/config.
(7) The password change is now complete. Continue booting with # exec /sbin/init (in this mode on CentOS 7, use this command instead of reboot).
#more /etc/grub2.cfg (the GRUB 2 boot configuration on CentOS 7; a symlink to /boot/grub2/grub.cfg)
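
Whether the relabel flag file is needed depends on the SELINUX= value in /etc/selinux/config. The following is a minimal sketch of that check; the function names `selinux_mode` and `needs_relabel` are hypothetical helpers introduced here, and the optional file argument exists only so the logic can be tried on a copy of the config.

```shell
# selinux_mode [FILE]: print the SELINUX= value from the SELinux config file.
# The pattern does not match the SELINUXTYPE= line, which sets the policy type.
selinux_mode() {
    sed -n 's/^SELINUX=\([a-z]*\).*/\1/p' "${1:-/etc/selinux/config}"
}

# needs_relabel [FILE]: succeed unless SELinux is set to "disabled".
needs_relabel() {
    [ "$(selinux_mode "$1")" != "disabled" ]
}

# Usage in the init=/bin/sh shell, after running passwd:
#   needs_relabel && touch /.autorelabel
```

If the config file cannot be read, `needs_relabel` errs on the side of relabeling, which costs a slower first boot but avoids a system that rejects the new password.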

3. Cases where the system fails to start
1. A damaged root file system prevents the system from starting.
This usually happens when an unexpected power failure or an improper shutdown leaves the file system structure inconsistent. When the system starts, the screen shows:
checking root filesystem
/dev/sdb5 contains a file system with errors, check forced
/dev/sdb5: UNEXPECTED INCONSISTENCY;RUN fsck MANUALLY
....
press enter for maintenance
(or type Control-D to continue):
give root password for maintenance

This error shows that the root partition's file system is damaged and could not be repaired automatically at boot, so the system drops to an interactive prompt and asks the user to repair it.

Solution: Enter the root password to enter repair mode. In repair mode you can run the fsck command, for example:
#fsck.ext4 -y /dev/sdb5 (if the file system is ext4, use fsck.ext4 to repair it)

If the problem is in the boot partition, unmount the partition first: # umount /boot
Then repair it: # fsck /dev/sda1 (/dev/sda1 is the device that holds the boot partition)
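
After a repair it helps to know what fsck actually did. Its exit status is a sum of bit flags documented in the fsck(8) manual page, and the sketch below decodes them into text. The function name `describe_fsck_status` is a hypothetical helper, not part of fsck.

```shell
# describe_fsck_status N: decode the bit flags in an fsck exit status.
describe_fsck_status() {
    status=$1
    if [ "$status" -eq 0 ]; then
        echo "no errors"
        return 0
    fi
    # Each pair is "bit:meaning", as listed in fsck(8).
    for pair in "1:errors corrected" "2:system should be rebooted" \
                "4:errors left uncorrected" "8:operational error" \
                "16:usage or syntax error" "32:cancelled by user request" \
                "128:shared-library error"; do
        bit=${pair%%:*}
        if [ $((status & bit)) -ne 0 ]; then
            echo "${pair#*:}"
        fi
    done
}

# Usage in repair mode:
#   fsck.ext4 -y /dev/sdb5; describe_fsck_status $?
```

A status with bit 4 set means the file system still has errors, so it should not be remounted read-write until a further fsck run succeeds.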

2. A missing /etc/fstab file prevents the system from starting.
The /etc/fstab file stores information about the system's file systems. When Linux starts, it reads this file and automatically mounts each partition. If the file is misconfigured or lost, the system cannot start. The typical symptom appears when the partitions are being mounted: the screen shows "starting system logger" and the boot process then hangs.

Solution:
Use Linux rescue mode to log in to the system to obtain partition and mount point information, and reconstruct the /etc/fstab file.
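
When the file is rebuilt by hand, each entry needs a device, a mount point, a file system type, and mount options, optionally followed by the dump and fsck-pass numbers (for example "/dev/sda1 /boot ext4 defaults 1 2", an illustrative entry rather than one taken from a real system). The sketch below flags lines that cannot be valid entries; `check_fstab` is a hypothetical helper and it checks only the field count, not whether the devices exist.

```shell
# check_fstab [FILE]: report entries that cannot be valid fstab lines.
# Exits with status 1 if any malformed line is found.
check_fstab() {
    awk '
        /^[ \t]*#/ { next }          # skip comment lines
        NF == 0    { next }          # skip blank lines
        NF < 4 || NF > 6 {           # fields 5 and 6 are optional
            printf "line %d: expected 4-6 fields, found %d\n", NR, NF
            bad = 1
        }
        END { exit bad + 0 }
    ' "${1:-/etc/fstab}"
}

# Usage in rescue mode, after listing the partitions with blkid:
#   check_fstab /mnt/sysimage/etc/fstab
```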

A general approach when a Linux system cannot start:
1. Enter single-user mode or rescue mode, repair the partition errors or back up the data, and then repair or reinstall the system.
To enter rescue mode: boot from the installation ISO or a USB drive and select Troubleshooting. On the next screen, select "Rescue a CentOS system".
When the options appear, choose according to the situation; the first option is Continue, so enter 1. You then get a shell where you can run commands: #df -h. The installed system is mounted under /mnt/sysimage, so enter #cd /mnt/sysimage/etc and edit the file there:
#vi /mnt/sysimage/etc/fstab
If the fstab file is missing, the system cannot start normally.

4. "Read-only file system" errors and solutions

Symptom: java.lang.RuntimeException:Cannot make directory:file:/www/data/html/2021-01-24

Idea: The server disk may be faulty (the disk is full or cannot be written to).

Cause: A problem with the disk partition left the file system structure inconsistent, so the system disabled writes to the file system to protect it. The file system structure needs to be repaired:

#umount /www/data (first unmount the problematic partition)
#fsck -y /dev/sda7 (repair the device that corresponds to the problematic partition)
#mount /dev/sda7 /www/data (after the repair completes, remount the partition)
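
Before unmounting anything, it is worth confirming which file systems are currently read-only. A file system that was switched to read-only after errors shows the "ro" option in /proc/mounts. The sketch below lists such mount points; `list_ro_mounts` is a hypothetical helper, and the optional file argument exists only so the parsing can be tried on a saved copy.

```shell
# list_ro_mounts [FILE]: print mount points whose options include "ro".
# Field 2 of /proc/mounts is the mount point, field 4 the option list.
list_ro_mounts() {
    awk '{
        n = split($4, opts, ",")
        for (i = 1; i <= n; i++)
            if (opts[i] == "ro") { print $2; break }
    }' "${1:-/proc/mounts}"
}

# Usage:
#   list_ro_mounts
```

The options are compared as whole words, so an option such as errors=remount-ro on a writable file system is not reported. Some pseudo file systems are legitimately mounted read-only and will also appear in the output.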

5. Problems caused by switching users with the su command

Symptom: su:warning:cannot change directory to /home/oracle: Permission denied

Possible causes to check:
a permission problem on the user's home directory /home/oracle
an execute-permission problem on the su program
a permission problem on the shared libraries the program depends on
a problem caused by SELinux
a system or disk-space problem

Cause: In this case the permissions on the root directory (/) were wrong. Correcting them solves the problem: #chmod 555 /

#ldd /bin/ls (View the shared libraries that the ls command depends on. ldd works the same way for any other program.)
#more /etc/selinux/config (Check the SELinux configuration file.)
#stat / (Check the permissions of a directory or file)
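
To change into /home/oracle, the target user needs execute permission on every directory along the path, including /. A quick way to narrow down the cause is to print the mode of each of them in one pass. This is a minimal sketch that assumes GNU stat (the -c option), as shipped with CentOS; `path_modes` is a hypothetical helper name.

```shell
# path_modes PATH: print "MODE OWNER PATH" for PATH and every parent up to /.
path_modes() {
    p=$1
    while :; do
        stat -c '%a %U %n' "$p"
        case "$p" in
            /|.) break ;;            # stop at the root (or at "." for relative paths)
        esac
        p=$(dirname "$p")
    done
}

# Usage:
#   path_modes /home/oracle
```

Any directory in the output whose mode lacks the execute bit for the relevant user is a candidate for the "Permission denied" message.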

6. "Too many open files" errors and solutions
Symptom: java.io.IOException: Too many open files
Idea: This case involves the ulimit command under Linux. ulimit limits the resources that processes may use and supports many types of limits.
#ulimit -a
-a: Display all current resource limits.
-H: Set the hard limit. Once set, a non-root user cannot raise it.
-S: Set the soft limit. It can be raised after being set, but not above the hard limit.
-c: The maximum size of core files, in blocks.
-f: The maximum size of files a process can create, in blocks.
-d: The maximum size of a process's data segment, in kbytes.
-m: The maximum resident memory size, in kbytes.
-n: The maximum number of open file descriptors.
-s: The maximum stack size, in kbytes.
-p: The pipe buffer size, in 512-byte blocks.
-u: The maximum number of processes available to the user.
-v: The maximum virtual memory available to a process, in kbytes.
-t: The maximum CPU time, in seconds.
-l: The maximum size of memory that can be locked, in kbytes.
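
To see how close a process is to its -n limit, its open descriptors can be counted under /proc and compared with the soft limit recorded for it. The following is a rough sketch that assumes a Linux /proc file system; `fd_usage` is a hypothetical helper, and inspecting another user's process requires root.

```shell
# fd_usage [PID]: print the open descriptor count and soft limit of a process.
# Defaults to the current shell when no PID is given.
fd_usage() {
    pid=${1:-$$}
    open=$(ls "/proc/$pid/fd" | wc -l | tr -d ' ')
    # In /proc/PID/limits the soft limit is the 4th field of "Max open files".
    soft=$(awk '/^Max open files/ { print $4 }' "/proc/$pid/limits")
    echo "pid=$pid open=$open soft_limit=$soft"
}

# Usage:
#   fd_usage            # the current shell
#   ulimit -Sn          # soft limit of the current shell
#   ulimit -Hn          # hard limit of the current shell
```

If the open count is near the soft limit, raising the limit treats the symptom; it is also worth checking whether the application is leaking descriptors.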

#ulimit -n 655360 (Set the maximum number of open file descriptors to 655360.) This change is temporary and applies only to the current shell session.
To make it permanent, add the setting to /etc/security/limits.conf, which holds the system-level resource limits.

#vi /etc/security/limits.conf

CentOS 7 also has the file /etc/security/limits.d/20-nproc.conf, which supplements /etc/security/limits.conf.
The files under limits.d are read after limits.conf, so when both set the same item, the setting in the later file takes effect.
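
Entries in these files take the form "domain type item value", so a permanent version of the 655360 setting would be two lines such as "* soft nofile 655360" and "* hard nofile 655360" (the "*" domain, meaning all users, is an assumption here; a specific user or @group can be named instead). Because later files win, it helps to list every active nofile entry in the order the files are read. A minimal sketch, with `nofile_entries` as a hypothetical helper:

```shell
# nofile_entries FILE...: list active "nofile" lines, in the order given.
nofile_entries() {
    awk '
        /^[ \t]*#/ { next }          # skip comment lines
        NF >= 4 && $3 == "nofile" { print FILENAME ": " $0 }
    ' "$@"
}

# Usage:
#   nofile_entries /etc/security/limits.conf /etc/security/limits.d/*.conf
```

The last matching line printed for a given domain and type is the one that takes effect. The new limit applies to sessions started after the change, so log in again before checking it with ulimit -n.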

Origin blog.51cto.com/12772149/2604584