It is said that Murphy's Law of Operations and Maintenance is what keeps millions of operation and maintenance people in their jobs!

Preface

This is Murphy's Law of Operations and Maintenance. Read it every day and reflect on your own work.

  1. Nothing is as simple as it seems
  2. Everything takes longer than you expect
  3. Anything that can go wrong will go wrong
  4. If you worry that something will happen, it becomes more likely to happen
  5. If it works the first time, you've obviously done something wrong
  6. When everything is going in one direction, it's best to take a hard look in the opposite direction
  7. Problems that disappear on their own will come back on their own
  8. If everyone has similar ideas, it is obvious that no one is thinking seriously
  9. A good start may not lead to good results; a bad start may lead to worse results
  10. You must always assume that your assumptions are invalid
  11. Intelligence cannot be acquired through education

We will not go into specific technologies here. Instead, we will discuss how to reduce human-caused accidents, avoid unknown risks, and develop practical processes. Leaders often say at work that "operation and maintenance is no small matter": a small operational error can cause huge losses. What operation and maintenance personnel must do is be careful, and then more careful still.

For operation and maintenance personnel, reputation is the foundation of your standing in the company. It is genuinely hard for operation and maintenance to show achievements. Faced with sudden failures, support requests from every department, and large server bills, finding highlights in your work is not easy.

Run your own name as a brand. Operation and maintenance personnel run into countless problems in daily work and have to communicate with many departments, so how you present yourself in the company matters a great deal. Only with a good reputation and visible value will you hold a secure position in the company and have the capital to advance.

So technical specialization matters, but communication matters just as much. Sometimes we solve a problem but communicate poorly, and the work is never recognized as a result. Sometimes we cannot solve a problem, but because we communicate well, others still recognize the effort. Every commitment needs a result and every conversation needs a follow-up. In short, finish what you start.

Operation and maintenance goals: security, stability, efficiency, and savings

Security: the company's operation and maintenance should put security first. Security vulnerabilities and information leaks affect the company's future development and can even decide whether it survives. Many Internet companies have suffered information leaks that did them great harm, and the cost of recovering from the resulting negative impact is huge. So security is the top priority.

Stability: keeping the business running stably, on the premise of security, is something operation and maintenance personnel must take seriously. System stability directly affects the user experience. Its importance is self-evident, so I will not go into detail here.

Efficiency: use all resources efficiently and effectively to get the most value out of them.

Savings: hardware costs make up a large share of a company's expenditure, so how to save on hardware is something we should think about. We may not bring in revenue, but we can save money.

Process management

Process is a must in our work. There are many processes, but only a few are strictly followed; I believe everyone will smile knowingly at that. Many processes are only used to settle accounts after the fact: when you make a mistake, your leader pulls out the process and criticizes you. The leaders are not to blame for this, because many processes are drafted by ourselves. So we should take more care when drafting a process, consider whether it is feasible, and make sure the leaders accept it.

So what makes a good process? Here is a short story. A famous architect designed Disneyland. After three years of careful construction, the park was about to open to the public, but the final plan for the paths connecting the attractions had still not been decided.

The architect asked the construction department to sow grass seed across the park grounds and open the park early, letting visitors walk on the grass wherever they pleased. During the half year of early opening, visitors trampled many paths into the grass. These paths were wide in some places and narrow in others, elegant and natural. The architect then had walkways laid along the trodden tracks, and in the end the design won a world award. A good process works the same way: it follows the routes people actually take.

Daily operations

In operation and maintenance, routine maintenance operations on servers are very frequent, so it is necessary to keep good operation records. Repetitive tasks should be turned into templates and procedural tasks should be automated, which greatly reduces the probability of errors.

For special operations, write down the operation steps beforehand, the more detailed the better. Do not log in to the server and act on whatever idea comes to mind. Having a clear purpose and rehearsing the steps in your head greatly reduces the chance of mistakes. After the operation is complete, be sure to record the results as screenshots.
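As a rough illustration of writing the steps down first and recording each result afterwards, here is a minimal Python sketch. The `OperationRecord` and `Step` classes, the host name, and the step texts are all hypothetical, and plain-text results stand in for the screenshots mentioned above.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Step:
    description: str          # what will be done, written before the operation
    result: str = "pending"   # filled in after the step has run
    finished_at: str = ""     # UTC timestamp of completion


@dataclass
class OperationRecord:
    """Hypothetical record of one planned maintenance operation."""
    purpose: str
    steps: list[Step] = field(default_factory=list)

    def plan(self, description: str) -> None:
        # Steps are added up front, so the whole plan can be reviewed first.
        self.steps.append(Step(description))

    def complete(self, index: int, result: str) -> None:
        # Refuse to record a step while an earlier one is still pending,
        # so steps cannot be skipped silently.
        if any(s.result == "pending" for s in self.steps[:index]):
            raise ValueError("an earlier step has no recorded result yet")
        step = self.steps[index]
        step.result = result
        step.finished_at = datetime.now(timezone.utc).isoformat(timespec="seconds")

    def report(self) -> str:
        lines = [f"Purpose: {self.purpose}"]
        for number, step in enumerate(self.steps, start=1):
            lines.append(f"{number}. {step.description} -> {step.result}")
        return "\n".join(lines)


record = OperationRecord("Rotate logs on web01")  # hypothetical host name
record.plan("Check free disk space")
record.plan("Compress logs older than 7 days")
record.complete(0, "ok: 42% used")
print(record.report())
```

The report can be pasted into the operation log as-is. Because the plan exists before anything runs, a colleague can review it first.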

Monitoring and alarms

We will not compare monitoring tools here. They all work in similar ways, and the alarm channels are nothing more than text messages, emails, and other common methods. But in my work there are thousands of servers and many types of alarms, and dozens or hundreds of alarms may arrive at once, so it is easy for operation and maintenance personnel to overlook some of them. We therefore need to post-process the alarm information: merge alarms of the same type and classify them by urgency.
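The merging and classification described above could look roughly like the following Python sketch. The `Alarm` fields, the urgency levels, and the sample hosts are assumptions made for illustration and do not come from any particular monitoring tool.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical urgency levels, most urgent first.
URGENCY_ORDER = ["critical", "warning", "info"]


@dataclass(frozen=True)
class Alarm:
    host: str
    kind: str      # e.g. "disk_full", "high_load"
    urgency: str   # one of URGENCY_ORDER


def merge_alarms(alarms):
    """Group alarms by (urgency, kind) and list each affected host once."""
    groups = defaultdict(set)
    for alarm in alarms:
        groups[(alarm.urgency, alarm.kind)].add(alarm.host)
    # Sort by urgency first, then by alarm type for a stable order.
    ordered = sorted(groups, key=lambda k: (URGENCY_ORDER.index(k[0]), k[1]))
    return [(urgency, kind, sorted(groups[(urgency, kind)])) for urgency, kind in ordered]


raw = [
    Alarm("web01", "disk_full", "critical"),
    Alarm("web02", "disk_full", "critical"),
    Alarm("db01", "high_load", "warning"),
    Alarm("web01", "disk_full", "critical"),  # duplicate, merged away
]
for urgency, kind, hosts in merge_alarms(raw):
    print(f"[{urgency}] {kind}: {len(hosts)} host(s): {', '.join(hosts)}")
```

Four raw alarms become two messages, with the most urgent first. A real deployment would probably also merge within a time window, which this sketch leaves out.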

I also think alarms should be delivered in a way that people cannot avoid noticing, such as a large-screen display, a loudspeaker, or a message to the operation and maintenance WeChat group. This greatly reduces the number of alarms that are missed or neglected, and it prompts personnel to take the next action according to the alarm level.

Troubleshooting

For operation and maintenance, handling faults is routine. How quickly and how well faults are handled is an important indicator of operation and maintenance capability. The more experience you have, the faster and more accurately you will handle them. Experience here also includes skill with search engines.

In my opinion, intuition is also very important. It may not matter much for faults with obvious error messages, but with vague log messages its value becomes apparent: intuition lets you cut through the fog and find the fastest route to a solution. How do you improve your intuition? Intuition comes from experience, and experience comes from continuous self-study and experimentation. Do not run away from problems. You cannot escape them, so face them head on and gain experience.

One more thing: the email reply after a problem is solved. Since we want to run ourselves as a brand, what we hand over should be a product. What is a good product? It should be complete and polished, and it should leave people reassured. The reply email should therefore cover the following points: the result, the cause of the problem, the steps taken to solve it, problems that may come up in the future, and suggestions.

Technology reduces man-made accidents

People always make mistakes. The best way for operation and maintenance personnel to reduce the chance of mistakes is to solve the problem with technology, for example by replacing command-line operations with selection from predefined operations and by adding an approval process. This requires us to build out the automated operation and maintenance platform, so that personnel no longer need to log in to servers to perform operations, and every step is audited, fault-tolerant, and recorded. This greatly reduces man-made accidents.
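To make the idea of selection, approval, and auditing more concrete, here is a minimal Python sketch. The operation catalogue, the function name `request_operation`, and the user and host names are all hypothetical. The sketch only decides whether a request is allowed and records it; actually running the operation is left out.

```python
from datetime import datetime, timezone

# Hypothetical catalogue: operators pick an entry instead of typing a command.
OPERATIONS = {
    "restart_nginx": {"needs_approval": False},
    "drop_old_partitions": {"needs_approval": True},
}

audit_log = []  # in a real platform this would be durable storage


def request_operation(operator, name, host, approved_by=None):
    """Validate a selected operation, enforce approval, and record the attempt."""
    if name not in OPERATIONS:
        status = "rejected: unknown operation"
    elif OPERATIONS[name]["needs_approval"] and approved_by is None:
        status = "rejected: approval required"
    elif approved_by == operator:
        status = "rejected: self-approval not allowed"
    else:
        status = "accepted"
    # Every request is audited, whether or not it is allowed to run.
    audit_log.append({
        "time": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        "operator": operator,
        "operation": name,
        "host": host,
        "approved_by": approved_by,
        "status": status,
    })
    return status


print(request_operation("alice", "restart_nginx", "web01"))
print(request_operation("alice", "drop_old_partitions", "db01"))
print(request_operation("alice", "drop_old_partitions", "db01", approved_by="bob"))
print(request_operation("alice", "rm -rf /", "web01"))
```

Free-form commands never reach a server because only catalogue entries are accepted. Rejected requests still leave an audit entry, which is often the most useful part of the record.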

-END-


Origin blog.csdn.net/xiqng17111342931/article/details/133895991