Operation and maintenance interview questions with answers

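1. How do you make the access log of a node behind an Nginx proxy record the client's IP instead of the proxy's IP?

Add one line to the nginx proxy configuration: proxy_set_header X-Real-IP $remote_addr; The backend node can then log this header (available in nginx as $http_x_real_ip) in place of $remote_addr in its log_format.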

2. The message "kernel: nf_conntrack: table full, dropping packet" appears in /var/log/messages. What causes it, and how do you solve it?

This message comes from the connection tracking used by iptables: the connection tracking table is full, so the kernel has started dropping packets. In this case the cause was traced to the website having switched its memcached connections to short-lived connections. Because iptables keeps a tracking entry for every connection, opening and closing connections too frequently fills the tracking table, and packets are dropped.

Solution:

First, change the memcached connections back to long-lived (persistent) connections, then tune nf_conntrack. The main options are:

1. Turn off the firewall

2. Increase the size of the connection tracking table and adjust the related system parameters

3. Use the raw table so that the traffic is not marked for tracking

4. Unload the connection tracking module
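
As a rough illustration of options 2 and 3, the sketch below defines a small helper that reports how full the tracking table is, with the matching tuning commands shown as comments. The helper name conntrack_usage_pct, the value 262144, and port 11211 (memcached's default) are assumptions made for this example, not values from the original incident.

```shell
#!/bin/sh
# Sketch only: conntrack_usage_pct is a hypothetical helper name.

# conntrack_usage_pct COUNT MAX -> integer percentage of the table in use
conntrack_usage_pct() {
    count=$1
    max=$2
    echo $(( count * 100 / max ))
}

# Possible usage on a live host (the /proc files exist only while the
# nf_conntrack module is loaded):
#   count=$(cat /proc/sys/net/netfilter/nf_conntrack_count)
#   max=$(cat /proc/sys/net/netfilter/nf_conntrack_max)
#   echo "conntrack table is $(conntrack_usage_pct "$count" "$max")% full"
#
# Option 2, raise the limit (262144 is an illustrative value only):
#   sysctl -w net.netfilter.nf_conntrack_max=262144
#
# Option 3, skip tracking for memcached traffic on the memcached server
# (older iptables versions spell the target "-j NOTRACK"):
#   iptables -t raw -A PREROUTING -p tcp --dport 11211 -j CT --notrack
#   iptables -t raw -A OUTPUT -p tcp --sport 11211 -j CT --notrack
```

Untracked packets no longer match state-based rules, so the filter table needs an explicit accept rule for that traffic.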

3. In an nginx + PHP environment on Linux, the PHP-FPM processes are found to be using a lot of CPU. What are the possible causes, and how do you solve it?

1. Process tracking

top    # find the PID of the process with high CPU usage

strace -p PID    # trace the system calls of that process

ls -l /proc/PID/fd    # check which files the process has open

This usually points to suspicious PHP code that needs to be fixed, for example a file_get_contents call with no timeout set.

2. Memory allocation

If process tracing does not reveal the problem, look for the cause at the system level: is memory running short? A relatively clean PHP-CGI process is said to use about 20M-30M of memory, depending on how many PHP modules are loaded.

View the memory usage of the PHP-CGI process through the pmap command

pmap $(pgrep php-cgi |head -1)

According to the output result, combined with the memory size of the system, configure the number of PHP-CGI processes (max_children).
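
The sizing rule above can be sketched as a small calculation. The function name estimate_max_children and the figures in the usage comment are assumptions for illustration; the per-process figure should come from the pmap output on the real server.

```shell
#!/bin/sh
# Sketch only: estimate_max_children is a hypothetical helper name.

# estimate_max_children AVAILABLE_MB PER_PROCESS_MB
# Prints how many PHP-CGI processes fit into the memory set aside for PHP,
# rounded down to a whole number.
estimate_max_children() {
    avail_mb=$1
    per_proc_mb=$2
    echo $(( avail_mb / per_proc_mb ))
}

# Example with illustrative numbers: 2048 MB set aside for PHP and about
# 30 MB per process.
#   estimate_max_children 2048 30
```

It is usually wise to leave headroom below this figure for the operating system and other services on the host.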

3. Monitoring

Finally, you can also ensure the normal operation of the service through monitoring and automatic recovery scripts. Here are some scripts I used:

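Kill any php-cgi process whose memory usage reaches 1%:

#!/bin/sh
# Collect the PIDs of php-cgi processes using at least 1% of memory
# (%MEM is the fourth column of ps aux)
PIDS=$(ps aux | grep php-cgi | grep -v grep | awk '{if ($4 >= 1) print $2}')
for PID in $PIDS
do
    # Log the time and the PID, then kill the process
    date +%F....%T >> /data/logs/phpkill.log
    echo "$PID" >> /data/logs/phpkill.log
    kill -9 "$PID"
done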

Check that php-fpm is listening, and start it if it is not:

#!/bin/bash
# grep -q exits non-zero when no php-cgi listener is found
if ! netstat -tnlp | grep -q "php-cgi"; then
    /usr/local/webserver/php/sbin/php-fpm start
    echo "$(date +%F....%T) System memory OOM. Kill php-cgi. php-fpm service start." >> /data/logs/php_monitor.log
fi

Check PHP execution over HTTP:

#!/bin/bash
# Take the status code from the HTTP response status line
status=$(curl -s --head "http://127.0.0.1:8080/chk.php" | awk '/HTTP/ {print $2}')
# Restart php-fpm unless the check page answers 200 or 304
if [ "$status" != "200" ] && [ "$status" != "304" ]; then
    /usr/local/webserver/php/sbin/php-fpm restart
    echo "$(date +%F....%T) php-fpm service restart" >> /data/logs/php_monitor.log
fi

4. With one master and multiple slaves, the master goes down. How do you switch to a slave, and what do you do with the other slaves?

1. Make sure all relay logs have been applied: on each slave, execute stop slave io_thread; and then show processlist; repeatedly until you see "Has read all relay log", which means the slave has finished applying its updates.

2. Log in to every slave, check the master.info file, and choose the slave with the largest pos as the new master.

3. Log in to the chosen slave (192.168.1.102 in this example) and execute stop slave;. Go to the database data directory and delete the master.info and relay-log.info files. Edit my.cnf: enable log-bin and, if log-slave-updates and read-only are set, comment them out. Restart MySQL so that the my.cnf changes take effect, then execute reset master.

4. On the new master, create the replication user and grant it to the slaves, the same way as when replication was first set up.

5. Log in to each of the other slaves and execute stop slave; to stop replication.

6. Point each of those slaves at the new master with change master to, the same way as when replication was first set up.

7. Execute start slave;

8. Modify some data on the new master to test whether the slaves replicate the change.
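
Step 2 can be sketched as a small comparison. The helper below assumes the binlog file name and position have already been collected from each slave into lines of the form "host binlog_file pos", separated by single spaces. The function name pick_new_master and the addresses 192.168.1.103 and 192.168.1.104 are hypothetical.

```shell
#!/bin/sh
# Sketch only: pick_new_master is a hypothetical helper name.

# Reads lines of "host binlog_file pos" on stdin and prints the host that has
# read furthest: highest binlog file name first, then highest position.
# Assumes binlog file names share the same zero-padded numbering.
pick_new_master() {
    LC_ALL=C sort -t ' ' -k2,2 -k3,3n | tail -n 1 | awk '{print $1}'
}

# Example with illustrative values:
#   printf '%s\n' \
#       '192.168.1.102 mysql-bin.000014 500' \
#       '192.168.1.103 mysql-bin.000014 120' \
#       '192.168.1.104 mysql-bin.000013 900' | pick_new_master
```

In this example 192.168.1.102 is chosen, because it is on the newest binlog file and has the largest position within it.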

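5. A mistaken drop statement has destroyed data. Describe your recovery approach and the practical steps.

Approach:

Method 1:

1. Use the firewall to block the web servers and other applications from writing to the master, or lock the tables, so that the database stops changing. Then check the full backup and the binlog files.

2. Restore the full backup.

3. Merge all the binlogs, convert them to SQL statements, remove the drop statement, and replay them to recover the data:

mysqlbinlog -d databasename mysql-bin.000014 > bin.sql

mysql -uroot -p123456 databasename < bin.sql

(Back up the data first, and do not damage the original data.)

4. Follow-up: because no data could be written during the incident, nothing further needs to be recovered.

5. If the mistaken statement was an update, access also needs to be stopped.

Method 2:

1. Use this method if the master continues to receive writes.

2. Stop one slave, then flush the binlog on the master.

3. Convert mysql-bin.000014 to bin.sql (with the drop statement removed).

4. Restore the full backup SQL and the incremental bin.sql from before the mistaken operation onto the slave.

5. Stop the master, then parse the binlog written after the flush into SQL and restore it onto the slave.

6. Switch over so that the slave provides the service.

Method 2 may cause primary key conflicts and other problems, which can be resolved by modifying the ids or by introducing a delay. Prefer method 1 (stopping writes) whenever possible.

In day-to-day work, pay attention to database permission management and process management to prevent such incidents in the first place.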
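
Removing the drop statement from the converted binlog can be sketched as a simple text filter. The function name strip_drop_statements is hypothetical. The filter assumes each drop table or drop database statement sits on a single line of the mysqlbinlog output, and it removes every such line, so the result should be reviewed before it is replayed.

```shell
#!/bin/sh
# Sketch only: strip_drop_statements is a hypothetical helper name.

# Reads SQL text on stdin and prints it without lines that begin with
# DROP TABLE or DROP DATABASE (case-insensitive).
strip_drop_statements() {
    grep -viE '^[[:space:]]*drop[[:space:]]+(table|database)'
}

# Possible usage with the commands from the steps above:
#   mysqlbinlog -d databasename mysql-bin.000014 | strip_drop_statements > bin.sql
#   mysql -uroot -p123456 databasename < bin.sql
```

An alternative is to replay the binlog in two ranges around the mistaken event using the --stop-position and --start-position options of mysqlbinlog.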

6. Give a practical example from production in which the website was slow to open because the database was slow.

Case 1: the database load was high because of slow queries, and the fix was to add a composite (joint) index.

Case 2: the database load was high with slow queries. Analyzing the web logs may show that a crawler is responsible, in which case its IP is blocked.
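
The web log analysis in the second case can be sketched as follows. The function name top_client_ips is hypothetical, and the sketch assumes the client IP is the first field of each access log line, as in the default nginx log format.

```shell
#!/bin/sh
# Sketch only: top_client_ips is a hypothetical helper name.

# top_client_ips [N]
# Reads access log lines on stdin and prints the N busiest client IPs
# (default 10) as "ip request_count", busiest first.
top_client_ips() {
    n=${1:-10}
    awk '{print $1}' | sort | uniq -c | sort -rn | head -n "$n" | awk '{print $2, $1}'
}

# Possible usage (the log path is an assumption):
#   top_client_ips 20 < /var/log/nginx/access.log
#
# Once an IP is confirmed to be an abusive crawler, it can be blocked with:
#   iptables -I INPUT -s <ip> -j DROP
```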

7. The database was killed forcibly with kill -9 and now fails to start. Provide troubleshooting methods or experience.

8. Bandwidth usage in the IDC machine room suddenly rose from 100M to 400M. Analyze the problem and solve it.

a. A genuine DDoS attack (I have encountered this several times, though serious impact is rare, and there have been cases of extortion by hackers).
b. An internal server is infected and sends out a large amount of traffic (Old Boy has received more than 5 alarms for this problem).
c. Website elements (such as pictures) are hotlinked and promoted on a portal page, producing a large amount of traffic (more than 3 alarms received).
d. A partner company is crawling data, for example through an API data interface provided to the partner (friends at partner companies will understand this).
e. After a CDN service is purchased, the CDN pulls from the origin site (this has also happened many times).

Origin blog.csdn.net/qq_39418469/article/details/114777112