Notes for Linux operation and maintenance engineers

1. Online operation specification

1. Test before use ---> When learning Linux, everything from the basics to services to clusters is done in virtual machines, where a snapshot is always available. When I first got operating authority on a real server, my leader gave me the root password. I had only been able to use PuTTY and wanted to use Xshell, so I quietly logged in to the server and tried to switch it to Xshell + key login. I did not test the change and did not keep a spare SSH connection open, so after restarting the sshd service I was locked out of the server. Fortunately I had backed up the sshd_config file beforehand, so I asked the staff in the computer room to cp it back.

2. The second example is about file synchronization. Everyone knows that rsync syncs very quickly, but it also deletes files much faster than rm -rf. rsync has an option (--delete behaves this way) that makes one directory match another. If the first directory is empty, the result can be imagined: the directory that held the data is emptied. At the beginning, through misoperation and lack of testing, I wrote the directories backward. The key point is that there was no backup, so production data was deleted. ----- the importance of backup
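The empty-source mistake described above can be caught before any sync command is run. The following is a minimal sketch in Python, not part of the original notes: the function name safe_to_mirror is hypothetical, and it only checks that the source directory exists and is non-empty. It does not call rsync itself.

```python
import os


def safe_to_mirror(src: str) -> bool:
    """Return True only if src is an existing, non-empty directory.

    Hypothetical pre-flight check: mirroring an empty source with a
    deleting sync would wipe the destination.
    """
    if not os.path.isdir(src):
        return False
    # os.scandir yields one entry per file or subdirectory in src.
    with os.scandir(src) as entries:
        return any(True for _ in entries)
```

A wrapper script could refuse to continue whenever this returns False, which would force a second look at the argument order.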

3. Confirm again and again before pressing Enter ---> Take the rm -rf /var kind of mistake: for people with quick hands, or when the network is slow and the terminal lags, the probability of it happening is quite high. By the time you notice, the command has already finished and you go numb. Operation and maintenance accidents are caused by carelessness!
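One way to make "confirm before Enter" mechanical is to check the target against a list of directories that must never be removed. This is only an illustrative sketch: the PROTECTED list and the is_protected name are assumptions, not something from the original notes, and the check expects an absolute POSIX path.

```python
import posixpath

# Illustrative list only; extend it for your own environment.
PROTECTED = {"/", "/bin", "/boot", "/etc", "/home", "/lib", "/usr", "/var"}


def is_protected(path: str) -> bool:
    """Return True if an absolute path resolves to a protected directory."""
    if not path.startswith("/"):
        raise ValueError("expected an absolute path")
    # normpath collapses "/./", "/../" and trailing slashes: "/var/" -> "/var".
    return posixpath.normpath(path) in PROTECTED
```

A delete wrapper could call is_protected first and stop when it returns True, so that a stray space or slash does not reach rm.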

4. Do not let multiple people operate at once ---> Operation and maintenance management is quite chaotic: operations staff who left several hires ago still have the server root passwords. Usually, when we receive a task, we do a simple check first, and if we cannot solve it, we ask others for help. But when several people work on the same problem at once, the result is not ideal! For example: several people debug one server ---> each of them modifies the server configuration file.

5. Back up first, then operate ---> Make it a habit: before modifying any data, back it up first, for example a .conf configuration file. In addition, when modifying a configuration file, it is recommended to comment out the original option, then copy it and modify the copy.
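The backup habit can be reduced to a single call made before every edit. A minimal sketch, assuming a timestamped ".bak" suffix (the naming scheme and the backup_file name are my own choices, not the author's):

```python
import shutil
import time


def backup_file(path: str) -> str:
    """Copy path to path.bak.<timestamp> and return the backup's name."""
    stamp = time.strftime("%Y%m%d-%H%M%S")
    backup_path = f"{path}.bak.{stamp}"
    # copy2 also preserves modification time and permission bits.
    shutil.copy2(path, backup_path)
    return backup_path
```

Two backups taken within the same second would share a name, so a real script might add a counter or refuse to overwrite.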

2. Data

5. Use rm -rf with caution ---> There are many examples on the Internet: various rm -rf /, various deletions of the primary database, and various other operation and maintenance accidents.

6. Backup is greater than everything ---> Back up data on a schedule.

7. Stability is greater than everything ---> This applies not just to data but to the availability of the entire server environment. Aim for the most stable, not the fastest, so do not use new software on a server before it has been tested. Otherwise the service hangs in production --> "restart and it will be fine", or change to x!

8. Confidentiality is greater than everything ---> Keep data confidential, and beware of the various router backdoors.

3. Security

9. SSH

1. Change the default port (of course, if a professional wants to hack you, a scan will still find it)

2. Prohibit root login

3. Use ordinary users + key authentication + sudo rules + IP address restrictions + user restrictions

4. Use brute-force protection software similar to DenyHosts (block the source directly after more than a few failed attempts)

   Review which users in /etc/passwd are able to log in.
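The "block after a few failed attempts" idea in item 4 comes down to counting failures per source address. The sketch below is an illustration in Python and not how any particular tool is implemented. It assumes the usual OpenSSH "Failed password ... from ADDRESS port N" message format, which should be checked against your own /var/log/secure; the names FAILED and offenders and the threshold are hypothetical.

```python
import re
from collections import Counter

# Assumed message format, e.g.:
#   Failed password for invalid user admin from 203.0.113.7 port 4242 ssh2
FAILED = re.compile(
    r"Failed password for (?:invalid user )?\S+ from (\S+) port \d+"
)


def offenders(lines, threshold=3):
    """Return the set of source addresses with at least threshold failures."""
    counts = Counter()
    for line in lines:
        match = FAILED.search(line)
        if match:
            counts[match.group(1)] += 1
    return {ip for ip, n in counts.items() if n >= threshold}
```

The returned addresses are what a blocking tool would then add to its deny list.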

10. Firewall

The firewall must be enabled in the production environment and must follow the principle of least privilege: drop everything by default, then open only the service ports that are required.

11. Fine permissions and control granularity 

    Services that can be started by an ordinary user should never run as root. Keep the permissions of each service to the minimum, and keep the control granularity fine.

12. Intrusion detection and log monitoring

   Use third-party software to monitor changes to key system files and service configuration files at all times, such as /etc/passwd, /etc/my.cnf, /etc/httpd/conf/httpd.conf, etc.
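Change monitoring of this kind usually rests on comparing file digests against a recorded baseline. The following Python sketch shows only that core idea and is not a replacement for a dedicated tool. The function names are hypothetical, and a file that has disappeared raises FileNotFoundError rather than being reported.

```python
import hashlib


def sha256_of(path: str) -> str:
    """Return the hex SHA-256 digest of the file at path."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large files do not have to fit in memory.
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def changed_files(baseline: dict) -> list:
    """Return the paths whose current digest differs from the baseline."""
    return [path for path, old in baseline.items() if sha256_of(path) != old]
```

The baseline would be built once from a known-good state, for example {path: sha256_of(path)} for each watched file, and stored somewhere an intruder cannot rewrite it.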

Use a centralized log monitoring system to watch for alarms and errors in logs such as /var/log/secure and /var/log/messages, FTP upload and download records, etc.

In addition, for port scanning you can also use third-party software that adds the scanner directly to hosts.deny as soon as a scan is detected. This information is very helpful for troubleshooting after the system has been intruded. -----> Improve system security.

4. Daily monitoring

13. System operation monitoring ---> Large companies generally have dedicated operation and maintenance staff monitoring 24 hours a day. System operation monitoring generally covers hardware utilization, commonly memory, hard disk, CPU, and network card, and on the OS side login monitoring and monitoring of key system files.

Regular monitoring helps predict the probability of hardware failure and provides useful data for tuning.
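As a small illustration of the hard disk part of this monitoring, the sketch below flags mount points whose usage has reached a threshold. It is a simplified example only: the disk_alerts name and the 90% default are assumptions, and a real setup would feed such values into a monitoring system rather than poll by hand.

```python
import shutil


def disk_alerts(mount_points, max_used_fraction=0.9):
    """Return (mount point, used fraction) pairs at or above the threshold."""
    alerts = []
    for mount in mount_points:
        usage = shutil.disk_usage(mount)  # named tuple: total, used, free
        if usage.total == 0:
            continue  # skip pseudo-filesystems that report no capacity
        used_fraction = usage.used / usage.total
        if used_fraction >= max_used_fraction:
            alerts.append((mount, used_fraction))
    return alerts
```

Recording the used fraction over time, not just the alert, is what makes the trend useful for capacity planning.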

14. Service operation monitoring 

Service monitoring generally refers to the various applications, such as web, DB, LVS, etc., for each of which a set of indicators is monitored. When a performance bottleneck appears in the system, it can then be analyzed and resolved quickly.

15. Log monitoring

The log monitoring here is similar to the security log monitoring above, but it generally covers the alarm and error information of hardware, the OS, and applications.

Monitoring does not seem to do much while the system is running stably, but once a problem occurs and you have no monitoring, you cannot locate the problem and are left in a very passive position.

5. Performance tuning

16. In-depth understanding of the operating mechanism

Before optimizing a piece of software, you need a deep understanding of how it works. Take nginx and apache: everyone says that nginx is fast, so you must find out why nginx is fast, what principle it uses, and how it processes requests better than apache. If necessary, you must be able to read and understand the source code; otherwise every document that treats parameters as the object of tuning is nonsense.

17. Tuning framework and sequence

   Once you are familiar with the underlying operating mechanism, you need a tuning framework and a tuning order. For example, when the database hits a bottleneck, many people go straight to changing the database configuration file. My suggestion is to analyze the bottleneck first, check the logs, and write down the tuning direction before you start. Begin with the hardware and the operating system: current database servers are released only after all kinds of testing and are suitable for all operating systems, so the database itself should not be where you start.

18. Only adjust one parameter at a time

    Everyone knows this one: adjust only one parameter at a time. If you change too many at once, you will get lost.
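The one-parameter rule can be enforced by generating every trial configuration from the same baseline. A minimal sketch, in which the parameter names in the usage note and the one_at_a_time helper are placeholders and not real settings of any particular software:

```python
def one_at_a_time(baseline: dict, candidates: dict):
    """Yield (name, value, config) with exactly one parameter changed."""
    for name, values in candidates.items():
        for value in values:
            if value == baseline.get(name):
                continue  # identical to the baseline, nothing to test
            trial = dict(baseline)  # copy so the baseline is never mutated
            trial[name] = value
            yield name, value, trial
```

With a baseline of {"workers": 4, "cache_mb": 64} and candidates {"workers": [4, 8], "cache_mb": [128]}, this yields two trials, each differing from the baseline in a single key, so any change in the measured result can be attributed to that key.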

19. Benchmarking

    To judge whether tuning has helped, and to test the stability and performance of a new software version, benchmarking is necessary, and benchmarking involves many factors.

  Whether a test is close to the real needs of the business depends on the experience of the tester. My teacher once said that there is no one-size-fits-all parameter: any parameter change and any tuning must fit the business scenario. So stop Googling for tuning recipes; they have no long-term effect on your own improvement or on the business environment.
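At its simplest, a benchmark is the same workload timed several times before and after a change. The sketch below only shows that repeat-and-summarize skeleton. The benchmark function is a hypothetical helper, and the workload passed to it has to be chosen to resemble the real business load, which is the hard part the text describes.

```python
import statistics
import time


def benchmark(func, repeats=5):
    """Run func repeats times and return the median wall-clock seconds."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        samples.append(time.perf_counter() - start)
    # The median is less sensitive to a single slow outlier than the mean.
    return statistics.median(samples)
```

Comparing the medians of the baseline and of one changed parameter, on the same machine and under the same load, gives a first indication of whether the change helped.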

6. Operation and maintenance mindset

20. Control your mind

   Many rm -rf /data incidents happen in the last few minutes before leaving work. Try to avoid working on key data environments when you are upset, and the more stressed you are, the calmer you need to be. Most people have had the experience of running rm -rf /data/mysql and only afterwards finding that there is no backup. For MySQL, after the physical files are deleted, some tables still exist in memory, so disconnect the business traffic but do not shut down the MySQL database, which is very helpful for recovery. Then use dd to copy the hard disk before attempting the restore. Of course, most of the time you can only turn to a data recovery company.

   Just imagine: the data is deleted, you try all sorts of operations, shut down the database, and then attempt a repair. Not only may the files be overwritten, but the tables in memory can no longer be found.

21. Take responsibility for your data

   The production environment is not child's play, and neither is the database. You must be responsible for the data; the consequences of not backing up are serious.

22. Go back to the source

    Treat the root cause, not just the symptom. Possible reasons for inexplicable damage to database tables: first, a MyISAM bug; second, a MySQL bug; third, MySQL being killed in the middle of a write. In the end it turned out that memory was insufficient, which caused the OOM killer to kill the mysqld process, and there was no swap partition, even though background monitoring showed memory as sufficient. Upgrading the physical memory finally solved it.

23. Test and production environments

    Before any important operation, be sure to check which machine you are on, and try to avoid keeping many windows open.
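A script that performs an important operation can do this check itself before touching anything. A minimal sketch, assuming the operator states the expected hostname explicitly (the confirm_host name and the example hostnames in the usage note are hypothetical):

```python
import socket
from typing import Optional


def confirm_host(expected: str, actual: Optional[str] = None) -> bool:
    """Return True only if the current hostname equals the expected one."""
    if actual is None:
        actual = socket.gethostname()
    return actual == expected
```

For instance, a cleanup script meant for a test machine could call confirm_host("db-test-01") and exit immediately when it returns False, which protects against typing into the wrong window.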

      

     


Origin blog.csdn.net/qq_45635347/article/details/131583868