Four Disciplines and Nine Points of Attention for the Operation and Maintenance DBA

Abstract: Friends joke that operation and maintenance is a job where you carry your head on your belt, and some even say it is a job where your head hangs on someone else's belt. Nobody notices the hard work! A colleague in testing said, "Bystanders rarely see the effort behind operation and maintenance; it is when something goes wrong that our professionalism really shows."

Friends joke that operation and maintenance is a job where you carry your head on your belt, and some go further and say it is a job where your head hangs on someone else's belt. Nobody notices the hard work, but whenever there is blame to take, you are the one who takes it!



A colleague in testing said, "Bystanders rarely see the effort behind operation and maintenance; it is when something goes wrong that our professionalism really shows." Well, kid, you just haven't fallen into a pit yet.



So the best approach is to keep blame-worthy incidents from happening in the first place.



Still, when the blame does arrive, someone has to carry it, whether you work in operation and maintenance, product, testing, or development. Sooner or later everyone pays their dues, right?



Today we will talk about how the operation and maintenance DBA can avoid the blame.



The situation of the operation and maintenance DBA is bad, but however bad it is, it is no worse than what the Red Army once faced. The Red Army relied on its Three Rules of Discipline and Eight Points for Attention to get through its difficulties. If operation and maintenance DBAs follow their own rules just as earnestly, they too can get past the problem of taking the blame.



Four disciplines of the operation and maintenance DBA


1. Follow orders in all actions



Regardless of whether you are a team or a gang, the requirements are the same. Follow the command in all actions! Whose command? Follow the command of the operation and maintenance manager, the operation and maintenance director, the CTO, and the CEO.



When Mozi led his school, he had 180 followers, well trained and of one mind, who would not turn back even in the face of death. A team like that meets the basic requirements for doing operation and maintenance.



In an operation and maintenance team, the biggest taboo is someone with only half-baked skills who looks down on the experience of predecessors and is impetuous. The team leader must correct such people promptly, or even remove them; otherwise they will be your biggest source of blame. I have been burned several times because there were people like this on the team and I was not decisive enough when I wanted to act. The result was bowing my head before customers and leaders and taking criticism that was bound to come. As the saying goes, one mouse dropping spoils a whole pot of soup.



Therefore, when selecting operation and maintenance members, it is necessary to choose young people who are down-to-earth, alert, motivated, and have strong communication skills.



2. Two red lines must not be crossed



The so-called red lines are inviolable rules. The first discipline, following orders, is actually flexible: you may need to ask for instructions and report back. This second one is rigid, like a high-voltage line: touch it and you are finished.



For all changes: every change must have a plan, every plan must be reviewed before it is implemented, every implementation must strictly follow the plan, and major changes must be double-checked by a second person.



This rule exists to prevent operator error, that is, human-caused failures, which have always accounted for a high share of all failures.
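
The change rule above can be turned into a simple pre-flight check. The following is a minimal sketch, not a real change-management system: the `ChangeRequest` record, its field names, and the `blocking_issues` function are all hypothetical. It only covers what can be checked before execution (plan, review, verifier); whether the implementation actually follows the plan still has to be confirmed by people.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ChangeRequest:
    """Hypothetical change record; the field names are illustrative only."""
    title: str
    has_plan: bool
    plan_reviewed: bool
    is_major: bool = False
    verified_by: Optional[str] = None  # second person who double-checks a major change


def blocking_issues(change: ChangeRequest) -> List[str]:
    """Return the reasons this change must not be executed yet (empty list = go)."""
    issues = []
    if not change.has_plan:
        issues.append("no written plan")
    elif not change.plan_reviewed:
        issues.append("plan not reviewed")
    if change.is_major and not change.verified_by:
        issues.append("major change has no verifier")
    return issues


if __name__ == "__main__":
    upgrade = ChangeRequest("database upgrade", has_plan=True,
                            plan_reviewed=True, is_major=True)
    print(blocking_issues(upgrade))
```

An empty list means the change has cleared the checks that can be automated; anything else is a reason to stop before touching production.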



Every failure that affects the business, whether it is a hardware failure, a software failure, or a human-caused failure, must be reported to the department manager as soon as possible.



This rule exists to avoid tunnel vision. Technicians love to get stuck in a dead end: once they see a fault, they dig in and cannot pull themselves out, which delays the response and wastes the chance to restore the business quickly.



3. Do capacity planning before holidays



I remember a team outing one year. During the gathering, a DBA said sleepily that he had been woken by an alert at 3 a.m. That was not all: while he was playing an escape room game, he received another alarm call from the machine room, a critical alert that the utilization of a business tablespace had exceeded 85%. Frustrating, isn't it?



If you want to enjoy a holiday or a trip in peace, then besides making backups, the most important thing is capacity planning. Go through the basics at least once, such as tablespaces, file system space, and historical alarms, so that the system can safely hold out until you are back from vacation.



For some special e-commerce systems, holidays may be the peak period, so it is not only about space, but also about performance forecasting and solution plans.
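
The pre-holiday space check can be sketched as a simple projection. This is a minimal illustration under stated assumptions: growth is treated as linear, the 85% figure is taken from the alert in the story above, and the function names and sample numbers are hypothetical. Real growth around holidays may be anything but linear, so treat the result as a first screen, not a guarantee.

```python
import math


def days_until_threshold(used_gb, total_gb, daily_growth_gb, threshold_pct=85):
    """Days until usage reaches threshold_pct of total, assuming linear growth."""
    limit_gb = total_gb * threshold_pct / 100
    if used_gb >= limit_gb:
        return 0                      # already at or over the alert threshold
    if daily_growth_gb <= 0:
        return math.inf               # not growing, so the threshold is never reached
    return math.ceil((limit_gb - used_gb) / daily_growth_gb)


def safe_for_holiday(used_gb, total_gb, daily_growth_gb, holiday_days, threshold_pct=85):
    """True if the threshold would not be reached before the holiday ends."""
    return days_until_threshold(used_gb, total_gb, daily_growth_gb, threshold_pct) > holiday_days


if __name__ == "__main__":
    # Hypothetical tablespace: 800 GB used of 1000 GB, growing 5 GB per day.
    print(days_until_threshold(800, 1000, 5))
    print(safe_for_holiday(800, 1000, 5, holiday_days=7))
```

Any tablespace or file system that fails this check is a candidate for extension before you leave, not after the 3 a.m. call.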



4. Do backup and restore drills every year



Backups must be done; restores matter even more. If you are a manager, never just assume that your DBA has taken care of this for you.



If that sounds surprising, here is anonymized data from a real case:



[Image: 20170223095357656.jpg]



If the company lacks the necessary backup equipment or software, the DBA has a duty to press management to purchase the hardware and software needed for recovery drills, because when an accident happens, the DBA's direct manager usually cannot shoulder the responsibility. After all, if you cannot protect the data, how can users trust your company, whether it is a central enterprise or any other state-owned enterprise?



Nine points of attention for the operation and maintenance DBA


The disciplines are hard rules (Rules), and the points of attention are guiding principles (Guidance).



People in operation and maintenance cannot keep saying "we didn't expect this, oh, we didn't expect that." This work is like climbing snowy mountains and crossing the grasslands: one careless step and you sink. There is no time left for idle excuses.



1. Be in awe of the production environment.



You may not have heard of "one tnsping that brought down six P595s." You may not have heard of "one cp command that made a business system unusable for 30 minutes." You may not have heard of "one index build that made all underwriting services unavailable." You may not have heard of "I meant to shut down my own virtual machine and ended up shutting down the production database"...



There are many things you have not heard of, and even more things you have not done, because you are still young.



But be sure to be in awe of the production environment.



No command should be run just because you found it in a web search. Understand its side effects as fully as you can: what is the worst thing that could happen if this command is run? If you do not understand, ask humbly. There are plenty of experts in the DBAplus community; if you feel too embarrassed, offer a generous red envelope and then ask.



2. Stay reachable 24 hours a day



There is no such thing as a complete vacation in operation and maintenance. Do not think you can switch your phone off just because you are on vacation; if you do, the day you are shut out yourself is not far off. Indeed, some companies list this as one of their disciplines.



I have run into this situation: a DBA went on leave, and it so happened that he was the only person who knew the password for one environment, and that environment then developed a problem. Can you imagine how anxious everyone was? That DBA was off the project soon after returning from vacation.



3. Talk often with the people who use the application



A DBA who doesn't understand business at all is not a qualified architect.



If you want to understand business, applications, and services, you must chat, eat, and smoke with the people who use the application. You should always respect others, and if they are willing to tell you, you will become more and more familiar with the business. Over time, you can adopt a more appropriate architectural solution for driving your business.



4. Don't make ordinary changes during working hours.



What is ordinary change? That's a change you could have made a day earlier.



Examples include extending a tablespace, granting additional user permissions, or creating an index: none of these is a change forced by an emergency failure.



Plan changes well in advance, and try to schedule every important change in a window that is exempt from assessment, that is, a period when an interruption does not count against you.
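
The "no ordinary changes during working hours" rule can be expressed as a small guard. This is only a sketch under assumptions the article does not state: working hours are taken to be 09:00 to 18:00, Monday to Friday, and the function names are hypothetical. Adjust the window to your own business hours and assessment periods.

```python
from datetime import datetime, time

# Assumed working hours; the article does not define them.
WORK_START = time(9, 0)
WORK_END = time(18, 0)


def in_working_hours(when: datetime) -> bool:
    """True on Monday to Friday between WORK_START (inclusive) and WORK_END (exclusive)."""
    return when.weekday() < 5 and WORK_START <= when.time() < WORK_END


def may_run_now(when: datetime, is_emergency: bool) -> bool:
    """Ordinary changes wait for off-hours; only emergency changes may run at any time."""
    return is_emergency or not in_working_hours(when)


if __name__ == "__main__":
    thursday_morning = datetime(2017, 2, 23, 10, 0)
    print(may_run_now(thursday_morning, is_emergency=False))
    print(may_run_now(thursday_morning, is_emergency=True))
```

A check like this fits naturally at the top of any change script, so that an ordinary change started at the wrong time stops itself.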



5. Inspect the database regularly



If the database has not failed, that does not mean the DBA is doing a good job; it only means the failure has not happened yet. It is not that the reckoning will never come; the time simply has not arrived.



So define inspection rules, inspect the database regularly, and fix what you find. Any fix that involves other cooperating parties must be sent by email with the relevant people copied, and then confirmed by phone.



6. Minimize permissions for database deployment



Installing only the components you need and granting only the permissions that are necessary is an effective way to avoid pitfalls before they appear. Many data recovery and operational problems could have been headed off at the permissions level, which saves a great deal of trouble later.
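
One way to keep permissions minimal is to compare what an account has been granted with what it actually needs. The sketch below assumes you have already exported both lists by some means of your own; the function names and the privilege strings in the example are hypothetical placeholders, not output from any particular database.

```python
def excess_privileges(granted, required):
    """Privileges that were granted but are not needed; candidates for revocation."""
    return sorted(set(granted) - set(required))


def missing_privileges(granted, required):
    """Privileges that are needed but were never granted."""
    return sorted(set(required) - set(granted))


if __name__ == "__main__":
    # Hypothetical application account.
    granted = {"SELECT", "INSERT", "UPDATE", "DROP ANY TABLE"}
    required = {"SELECT", "INSERT", "UPDATE"}
    print(excess_privileges(granted, required))
    print(missing_privileges(granted, required))
```

Running a comparison like this as part of the regular inspection makes permission creep visible before it turns into a recovery job.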



7. Every safeguard must be verified to remain workable



If a high availability system is deployed, run a failover test before going live.



If a disaster recovery system is deployed, run regular disaster recovery drills.



If an emergency system is deployed, run regular emergency drills.



If you take database backups, run regular database recovery tests.



Easy to say, hard to do. 90% of systems in the country do not do this, which is why you so often hear of recoveries that went wrong, especially with storage-based disaster recovery or with OGG used as the emergency system. It is not that the technology itself is bad; it is that the management is.
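
Keeping track of when each safeguard was last proven to work does not need a big tool. The following is a minimal sketch with hypothetical names and dates: it assumes you record the date of the last successful drill for each safeguard, uses `None` for one that has never been tested, and takes a one-year limit as the default, in line with the yearly restore drill discipline above.

```python
from datetime import date, timedelta


def overdue_safeguards(last_verified, today, max_age_days=365):
    """Names of safeguards never verified, or last verified more than max_age_days ago."""
    cutoff = today - timedelta(days=max_age_days)
    return sorted(
        name for name, when in last_verified.items()
        if when is None or when < cutoff
    )


if __name__ == "__main__":
    # Hypothetical drill log.
    drills = {
        "backup restore test": date(2016, 12, 1),
        "disaster recovery drill": date(2015, 6, 1),
        "HA failover test": None,  # deployed but never tested
    }
    print(overdue_safeguards(drills, today=date(2017, 2, 23)))
```

Anything this list returns is a safeguard you are trusting without evidence.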



8. Make every effort to implement automated operation and maintenance



By this point you may have been grumbling to yourself: what era is this, and we are still doing things the old-fashioned way?



In fact, whether or not you have already started automated operation and maintenance, each of the preceding items is worth doing well and will benefit you.



Even so, automated operation and maintenance is a road that operation and maintenance DBAs cannot avoid. It is like traveling from Kunming to Shanghai: at first you could only rely on caravans, then expressways were gradually connected, and now there is the Shanghai-Kunming high-speed railway.



So how do you do automated operation and maintenance? Reinventing the wheel entirely on your own is clearly not very reliable. Unless you are BAT or JD Xinmei, the best way is to find an automated operation and maintenance platform built by a professional operation and maintenance company. Mule or horse, take it out for a trial run, and you will find one you like.



9. Start with communication, gain through sharing



People who have been lecturers will agree that the speaker actually gains more from a talk than the "students" who listen to it. Internet companies do this very well. BAT and the newer giants alike have set up technical academies, often led by well-known figures in the industry, to organize technology sharing inside the company.



For a DBA at a traditional enterprise, the company often has no such academy, but there are many platforms on the Internet, such as the DBAplus community, and other communities offer similar opportunities.



Why is it that newcomers who have worked in our team for one year can have the capabilities of DBAs who have worked in other companies for four or five years? In addition to the complex hardware environment, monthly sharing also contributes.



There is no end to operation and maintenance, and there is no end to precautions. If you have better suggestions, you might as well talk about it.

The original release time is: 2017-02-23




This article is from the Yunqi community partner DBAplus


Origin http://43.154.161.224:23101/article/api/json?id=326228111&siteId=291194637