1:00 am, a sudden fatal production incident, solved with manual multithreading!

A reader asked me: what kind of ability do you think a programmer needs in order to count as a great programmer?

I replied: a programmer who has the ability to solve problems.

The answer may seem a bit abstract. No matter; after reading the story below, you will gradually see what I mean.

First, the ability to solve problems

Many years ago, when I was still a rookie, my manager often told me that when solving a problem I should not limit myself to the technology itself, and he gave me a vivid example.

Two programmers had once been arguing for a long time about how to determine whether the network connection between two servers was working. A tester sitting nearby said: wouldn't a ping tell you? So they implemented a ping in Java code to solve the problem.
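
The story does not show the code those two wrote, so what follows is only a minimal sketch of what a "ping" in Java might look like, using the standard InetAddress.isReachable call. The class name PingCheck and the method name are made up for illustration.

```java
import java.io.IOException;
import java.net.InetAddress;

class PingCheck {
    /**
     * Returns true if the host answers within timeoutMillis.
     * InetAddress.isReachable typically sends an ICMP echo request when the
     * JVM has the privilege to do so, and otherwise tries a TCP connection
     * to port 7 (echo) on the target host.
     */
    static boolean isReachable(String host, int timeoutMillis) {
        try {
            return InetAddress.getByName(host).isReachable(timeoutMillis);
        } catch (IOException e) {
            // Unknown host or network error: report as unreachable.
            return false;
        }
    }

    public static void main(String[] args) {
        String host = args.length > 0 ? args[0] : "127.0.0.1";
        System.out.println(host + " reachable: " + isReachable(host, 2000));
    }
}
```

Note that a false result does not prove the other server is down, since firewalls often drop both ICMP and port 7. That may be one reason more elegant approaches exist.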

Years later, although I know there are more elegant ways to solve this problem, I still think that tester was very smart. We later worked together for about a year, and she really was highly capable; her role was the equivalent of product manager plus tester at a small company.

One thing needs explaining: problem-solving ability and technical ability are two different things. I have seen many programmers who know source code inside out yet still have no idea what to do when a production problem occurs.

A production problem is the truest test of a programmer's level. A programmer who, under high intensity and high pressure, keeps their actions from going out of shape and can calmly think, analyze, and solve the problem has reached a level that in ancient times would have earned the rank of general.

I have always admired programmers who can solve problems quickly, and I am happy to be the first to investigate and analyze whenever a production problem appears. To put it bluntly, good programmers are forged by solving problems, especially those who step forward when something goes wrong in production.

Second, a late-night technical story

01. Old platforms and new platforms

The company has an old system and a new system.

The old system had been in use for many years and had long since exceeded the limit of what it could support. When it first went live in 2013, daily trading volume was estimated at one or two million; it now actually handles 4 billion.

From 2013 to 2017 the technical team put in a great deal of effort. The old system uses an Oracle database, and to support the peak trading volume we applied read/write splitting, all kinds of database and table sharding, and the most powerful hardware available. We split the system and refactored and optimized it many times, yet it still could not keep up with the company's growing trading volume.

To be honest, it was no small feat for the team to have carried an old system this far. The original architecture was unsound, and patching can never fix a fundamental problem. Building a new platform became something the company had to do in order to keep growing. The new platform was designed to support billions in daily trading volume and, most importantly, to scale by billions more later on.

After a hard-fought effort by the team, the new platform finally went live. But going live was less than half of the job; the data migration that followed was the most important part.

The old system had run for several years on a traditional vertical architecture (for the evolution of architectures, see the article "What Spring Cloud does, from the perspective of architecture evolution"), with all kinds of business logic, policies, promotions, and risk control tangled together.

The new platform uses a microservice architecture, with the microservices alone numbering in the hundreds, and MySQL with high availability as the database. The two systems are a generation apart in architectural design. The new platform was designed to be compatible with some features of the old system and carries some redundancy for that purpose, but the two systems are simply not products of the same era.

The requirement was that migrating from the old platform to the new one must not affect normal trading. By analogy, it is like changing the wheels of a car while it is running on the highway, without the passengers feeling a thing.

So we developed a migration system. The original plan was to migrate batch by batch: first cut one or two billion in trading volume over to the new platform, then watch the results and proceed at a steady pace. But a suddenly announced policy (promotion) disrupted that rhythm.

02. Changes brought by a new policy

Third-party payment companies often introduce new policies (promotions) in response to the market. Some policies are relatively simple, but most are complex and often require a large amount of development.

By then the new platform had been handling switched-over traffic for a while and we had gradually built up confidence in it, so it was decided to implement this new policy on the new platform. The plan was that, on the night the policy took effect, all business remaining on the old platform would be migrated to the new one.

Once the plan was settled, each department went to work. The operations center issued an external notice that we would make a big move on New Year's Day and what changes to expect; the marketing center contacted agents for training in batches; the business department drafted official company documents and issued them to the branches.

Operations coordinated with customer service in advance and prepared contingency plans for the problems they might face. The policy change and its start date were announced through the official accounts, the company website, the app, and email. The product center sorted out the requirements for rolling out the policy, and the R&D center worked out the implementation plan.

The most important thing by far was to make sure that, on New Year's night, the remaining few million merchants could be migrated smoothly to the new platform in one go.

03. Migration at midnight

The migration program had already been run many times, so we were fairly comfortable with it. Even so, I confirmed repeatedly with the colleague chiefly responsible for the migration: testing in the development environment had to be completed two weeks in advance, and in the UAT environment both development and testing had to finish their verification one week before the migration.

Three days before the migration, I sought out the programmer responsible for it to check on progress and asked whether a simulated test had been run in production. After confirming that there were no problems, and going by the lead's estimate that the migration would take three to four hours, we set the start for 1:00 am, expecting to finish between 4:00 and 5:00.

The day before the actual migration, we held a coordination meeting with all departments, where we discussed the situations that might arise and which staff each department needed to keep on site. After the meeting I felt pretty good. All that remained was to wait for the quiet of the night. This was going to be a battle!

About two dozen people stayed that night: developers and testers, plus some colleagues from other departments. Before midnight everyone was laughing and playing games while waiting for the quiet hour of 1:00 to start the migration, and because it was New Year's Day the office felt festive.

Time flew. At 1:00 am Beijing time, with a few stars outside the window, the atmosphere in the office turned tense.

A dozen colleagues gathered around the programmer chiefly responsible for the migration, who was clearly feeling enormous pressure (ha, I suspect anyone in that seat would). Still, he proceeded deftly, just as in the many earlier tests, verified the data several times, and clicked the Migrate button.

First we migrated a single agent in production to check that the data was correct, and once it had run, the relevant people began verifying. Operations staff reviewed the logs, developers confirmed that the relevant nodes were healthy, database engineers checked the migrated data, and testers verified queries on the operations platform and ran test swipes on POS terminals. Everything was normal!

Two agents went through without problems, so we prepared to go all in and push the remaining millions of merchants and thousands of agents across in a single run. The programmer responsible configured the IDs of all the agents into the program, clicked Execute, and followed the production logs for a while. Everything was normal.

A few people stayed to monitor the data while the rest relaxed and waited for the follow-up work after the migration. I went back to my desk and lit a cigarette, thinking that tonight was going fairly smoothly.

04. The incident

It was the small hours, when people get sleepy. As I lit my third cigarette, the programmer responsible for the migration came hurrying over to me.

"Brother Qiang, we have a problem!"

My heart jumped. I took a hard drag, stubbed out the cigarette, and asked quickly: "What problem?"

It turned out that the programmer had kept tracking progress after starting the migration and found that, in more than half an hour, it had moved only a small fraction of the several million merchants on the old platform. At that speed, the full run would take several days.
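
The reasoning here is a simple extrapolation from observed throughput. As a rough sketch, it could be written as follows. The figures in main are hypothetical and chosen only to illustrate the arithmetic; they are not the real numbers from that night, and the class name MigrationEta is made up.

```java
class MigrationEta {
    /** Estimated hours left, assuming the observed throughput stays constant. */
    static double remainingHours(long migrated, long total, double elapsedMinutes) {
        if (migrated <= 0 || elapsedMinutes <= 0) {
            throw new IllegalArgumentException("need some observed progress first");
        }
        double perMinute = migrated / elapsedMinutes;
        return (total - migrated) / perMinute / 60.0;
    }

    public static void main(String[] args) {
        // Hypothetical figures, for illustration only.
        long migrated = 10_000;
        long total = 3_000_000;
        double elapsedMinutes = 30;
        System.out.println("hours remaining: "
                + remainingHours(migrated, total, elapsedMinutes));
    }
}
```

With these made-up inputs the estimate comes to roughly 150 hours, or about six days, which is the kind of answer that makes a 4:00 am deadline look hopeless. A constant rate is itself an assumption; as the analysis below shows, agents differ greatly in size.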

This was serious!

If this was not sorted out before 8:00 am, it would be a major incident, plain and simple.

Leaving aside how to deal with data split between the old and new platforms, if the company postponed the policy, notifying millions of merchants and thousands of agents in such a short time would be an impossible amount of work.

Imagine the next day: the 400 customer service hotline ringing off the hook, operations staff explaining themselves hoarse, and agents possibly claiming compensation for losses caused by the company's delayed policy ...

If we could not solve the problem within an hour, we would have to report immediately to the deputy general manager, and then, I estimated, the company's entire management would be up that night meeting to discuss how to handle the aftermath.

The serious consequences of a failed migration flashed through my mind, but right then I had to push all those thoughts down and first analyze where exactly the problem lay and whether there was any fallback or remedy.

Analyze the reasons:

After querying the logs and checking the basic data, we identified the cause: when testing in production, the developer had used small and medium-sized agents, overlooking the huge differences between agents of different sizes. The single largest core agent alone may account for 5%-6% of the platform's total trading volume.

So a time estimate based on small and medium-sized agents was bound to be inaccurate. But with things as they were, there was no point assigning blame. The key question was how to fix it quickly, so everyone started proposing ways to make the migration faster.

Remedy:

One idea was to synchronize only the core data, enough to protect the next day's trading, and process the rest afterwards. Another was to handle everything by manually exporting and importing tables; the database engineers nearly fainted in tears when they heard that one, since there were more than a thousand tables with complex relationships. There were various other proposals too.

While everyone was urgently discussing optimizations, we discovered that the main flow of the migration program did not use multiple threads.

The migration program provides a web page. For each migration, the developer fills in the IDs of the agents to be migrated; the backend receives the parameters from the page and loops over the agents, migrating them one at a time.

The merchants under each agent were migrated with multiple threads, but the main entry point that walks through the agents was not multithreaded. So we wondered whether migrating the agents with multiple threads as well would speed things up.
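
To make the difference concrete, here is a rough sketch of the two shapes: a sequential loop over agents, and a variant that keeps several agents in flight at once. This is not the actual migration code; AgentMigration, migrateAgent, and the parallelism parameter are hypothetical names used only for illustration.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Consumer;

class AgentMigration {
    /** The shape described above: agents are handled one after another. */
    static void migrateSequentially(List<String> agentIds, Consumer<String> migrateAgent) {
        for (String agentId : agentIds) {
            // Inside this call the agent's merchants may be migrated in parallel,
            // but the next agent does not start until this one has finished.
            migrateAgent.accept(agentId);
        }
    }

    /** A possible variant: up to `parallelism` agents are migrated at once. */
    static void migrateInParallel(List<String> agentIds, Consumer<String> migrateAgent,
                                  int parallelism) {
        ExecutorService pool = Executors.newFixedThreadPool(parallelism);
        try {
            CompletableFuture<?>[] tasks = agentIds.stream()
                    .map(id -> CompletableFuture.runAsync(() -> migrateAgent.accept(id), pool))
                    .toArray(CompletableFuture[]::new);
            // Wait for every agent; join() rethrows the first failure, if any.
            CompletableFuture.allOf(tasks).join();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> agents = List.of("A1", "A2", "A3", "A4", "A5");
        migrateInParallel(agents,
                id -> System.out.println(Thread.currentThread().getName() + " migrates " + id),
                3);
    }
}
```

The bounded pool matters: each agent migration may itself fan out to more threads, so unlimited parallelism at the outer level could overload the machine.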

05. Manual multithreading saves the day

After discussion, we felt that migrating agents with multiple threads should be a good option. But writing the code on the spot and running it directly in production, untested, would be quite risky.

So was there a way to migrate agents concurrently without changing the program? There was!

As we all know, in the Web applications we usually build, each request from the front end is handed to a Servlet on the back end to produce the response, and each request is in fact handled on its own thread. So would opening several pages and submitting migration requests at the same time not achieve the effect of migrating agents with multiple threads?
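
The idea can be demonstrated on a small scale. The sketch below uses the JDK's built-in HttpServer as a stand-in for Tomcat (an assumption made to keep the example self-contained) and sends several requests at once, the way several open pages would. The /migrate path and the agent parameter are hypothetical. Each handler waits until all requests are in flight, so the demo only completes quickly if the requests really run on separate threads.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.List;
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class ConcurrentPagesDemo {
    /** Fires `pages` simultaneous requests and returns the names of the handler threads. */
    static Set<String> handlerThreadsFor(int pages) {
        Set<String> handlerThreads = ConcurrentHashMap.newKeySet();
        CountDownLatch allInFlight = new CountDownLatch(pages);
        ExecutorService requestThreads = Executors.newFixedThreadPool(pages);
        HttpServer server;
        try {
            // Port 0 lets the OS pick a free port on the loopback interface.
            server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        } catch (IOException e) {
            requestThreads.shutdown();
            throw new UncheckedIOException(e);
        }
        server.setExecutor(requestThreads); // plays the role of the container's request threads
        server.createContext("/migrate", exchange -> {
            handlerThreads.add(Thread.currentThread().getName());
            allInFlight.countDown();
            try {
                // Block until every request has arrived, so the requests must overlap.
                allInFlight.await(5, TimeUnit.SECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            byte[] body = "ok".getBytes();
            exchange.sendResponseHeaders(200, body.length);
            exchange.getResponseBody().write(body);
            exchange.close();
        });
        server.start();
        try {
            HttpClient client = HttpClient.newBuilder()
                    .version(HttpClient.Version.HTTP_1_1).build();
            String base = "http://127.0.0.1:" + server.getAddress().getPort()
                    + "/migrate?agent=A";
            List<CompletableFuture<HttpResponse<String>>> calls = IntStream.range(0, pages)
                    .mapToObj(i -> HttpRequest.newBuilder(URI.create(base + i)).build())
                    .map(request -> client.sendAsync(request, HttpResponse.BodyHandlers.ofString()))
                    .collect(Collectors.toList());
            calls.forEach(CompletableFuture::join);
        } finally {
            server.stop(0);
            requestThreads.shutdown();
        }
        return handlerThreads;
    }

    public static void main(String[] args) {
        System.out.println("handler threads: " + handlerThreadsFor(4));
    }
}
```

Nothing in the handler code is multithreaded; the parallelism comes entirely from the server dispatching each request to its own thread. That is the effect we were counting on.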

We got straight to it. We stopped the running migration program and picked a dozen agents for a multithreaded test, opening four pages at once and entering a different agent in each. The test migrations all came out normal.

We then increased the test volume, entering dozens of agents across different pages and clicking Migrate. During the second round of concurrent migrations, errors suddenly began to appear from time to time.

We stopped the migration program and looked for the cause, and found that the errors were caused by shared data.

As we know, a Servlet is not thread-safe: one instance serves many request threads, so under concurrent access any globally shared variable becomes a thread-safety problem.

This was easy to fix by switching to ThreadLocal. ThreadLocal gives every thread that uses the variable its own independent copy, so each thread can change its copy without affecting the copies of other threads.
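
A minimal sketch of the pattern follows. The article does not say which variable was shared, so the "current agent ID" held here is a hypothetical stand-in, as are the class and method names.

```java
import java.util.concurrent.CountDownLatch;

class MigrationContext {
    // One independent slot per thread. A plain static String here would be
    // shared by all request threads and could be overwritten mid-migration.
    private static final ThreadLocal<String> CURRENT_AGENT = new ThreadLocal<>();

    static void begin(String agentId) { CURRENT_AGENT.set(agentId); }
    static String currentAgent() { return CURRENT_AGENT.get(); }
    static void end() { CURRENT_AGENT.remove(); } // avoids stale values on pooled threads

    /** Runs two overlapping "requests"; each must read back its own agent ID. */
    static boolean demo() {
        CountDownLatch bothSet = new CountDownLatch(2);
        boolean[] ok = new boolean[2];
        Thread[] threads = new Thread[2];
        for (int i = 0; i < 2; i++) {
            final int n = i;
            threads[i] = new Thread(() -> {
                String mine = "agent-" + n;
                begin(mine);
                try {
                    bothSet.countDown();
                    bothSet.await(); // by now both threads have written their value
                    ok[n] = mine.equals(currentAgent());
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    end();
                }
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return ok[0] && ok[1];
    }

    public static void main(String[] args) {
        System.out.println("each thread saw its own value: " + demo());
    }
}
```

Calling remove() in a finally block is worth the extra line in a servlet container, because request threads are pooled and a leftover value would otherwise leak into the next request served by the same thread.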

With this fixed, we went back to running multiple pages. But with more than six parallel threads on one Tomcat, the machine load became rather high, because each of those threads in turn calls another thread pool to run the merchant migration logic.

So we immediately had operations find ten servers in the production environment and deploy the main migration program on all ten. To prevent problems from a developer's hand slipping, I had operations grant the access to me.

On my computer (I use multiple screens) I opened the migration page of each of the ten servers. I split all the agents to be migrated into groups of fifteen and entered one group per page, starting the migration on each server in turn.
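
That night the grouping was done by hand, but the bookkeeping can be sketched in a few lines. The group size of fifteen and the ten servers come from the story; the round-robin assignment is my reading of "in turn", and the class and method names are made up.

```java
import java.util.ArrayList;
import java.util.List;

class AgentBatcher {
    /** Splits ids into consecutive groups of at most groupSize. */
    static List<List<String>> groups(List<String> ids, int groupSize) {
        List<List<String>> out = new ArrayList<>();
        for (int i = 0; i < ids.size(); i += groupSize) {
            int end = Math.min(i + groupSize, ids.size());
            out.add(new ArrayList<>(ids.subList(i, end)));
        }
        return out;
    }

    /** Round-robin: group k is submitted to server k % serverCount. */
    static List<List<List<String>>> assign(List<List<String>> groups, int serverCount) {
        List<List<List<String>>> perServer = new ArrayList<>();
        for (int s = 0; s < serverCount; s++) {
            perServer.add(new ArrayList<>());
        }
        for (int k = 0; k < groups.size(); k++) {
            perServer.get(k % serverCount).add(groups.get(k));
        }
        return perServer;
    }

    public static void main(String[] args) {
        List<String> ids = new ArrayList<>();
        for (int i = 0; i < 310; i++) {
            ids.add("A" + i); // hypothetical agent IDs
        }
        List<List<List<String>>> plan = assign(groups(ids, 15), 10);
        for (int s = 0; s < plan.size(); s++) {
            System.out.println("server " + s + ": " + plan.get(s).size() + " group(s)");
        }
    }
}
```

One pass over the ten servers therefore puts up to 150 agents in flight, which keeps each Tomcat at a single submitted page per round.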

After I had gone round six times, the database engineers observed that the migration had sped up markedly. In all I spent two hours submitting every agent through the pages.

By about 4:00 my part was basically done, and the rest was a matter of letting the programs run. By 5:00 most of the merchant data had been migrated and only two servers were still running. By 6:00 the migration program had finished on all ten servers.

After arranging for all the related data to be checked item by item, we breathed a long sigh of relief.

At 7:00 am we went downstairs for breakfast together. Everyone was saying that last night had nearly felt like the end. We joked about how the boss would have felt if we had phoned him at 2 or 3 in the morning.

At the time I thought that, with an accident this big, the boss firing all of us would be the least of it; how to clean up was what we cared about most. A lost job can be replaced, but either way it would take these same people to sort things out.

After trading opened at 9:00, a few minor problems appeared one after another, but they were limited in scope, did not affect transactions, and were under control overall. Almost everyone who had worked on the migration overnight held on until the afternoon, and only when no major problems had appeared did we go home to sleep.

Looking back, everyone felt like a survivor.

Third, reviewing the incident

In the retrospective meeting afterwards we identified many oversights, but those are not the focus of this article. Let us return to the opening question: what makes a great programmer?

As we saw, this problem was not particularly complex and the technology needed to handle it was fairly simple; what mattered most was solving the most urgent problem in time. So no technology is inherently superior or inferior. The point of learning technology is to solve problems, so do not be superstitious about technology: whatever fits is best.

Technical people should learn to enjoy pressure, because pressure is what drives you and makes you grow, and the earlier you meet it the faster you grow. In a high-intensity, high-pressure environment, even a very simple action can go out of shape, which may lead to a larger secondary accident.

Keep a steady mind and analyze calmly under high intensity and high pressure; only when you calm down can you really find and solve the problem. When problems arise, many technical people look busy, but they are in fact operating blindly without thinking.

Calm down, analyze the whole chain carefully, and think about where the problem might be. Then check logs or run the relevant commands, investigate step by step, and verify where the root of the problem lies. Only when you have truly found the source can you deal with it confidently.

The programmers who stayed for the migration that night were all from our core group, yet who was more capable and who was less so was easy to see that evening. Good programmers are like gold: at the critical moment they shine.

Many people instinctively step back when a problem appears, while some like to charge forward. However impressive your source-code study and however well written your slides, what a company needs is someone who can step up and solve the problem when it arises.

Programmers who can step up at critical moments generally move into management quite easily later on. Trust between people is built up through working together, and when choosing whom to promote, a leader's main consideration is whether things can be handed to you with confidence.

So when we study technology day to day, we should not go astray. Source code and design patterns are worth studying, but we should think about how to apply what we learn and focus more on practical knowledge. These are the things that can save your (working) life at a critical moment.

Fourth, how to be a competent programmer

So how can a programmer cultivate problem-solving ability? Practice! Practice! Practice! Ordinary study is only input; without practice, those skills are soon lost.

How to practice? Do projects. If your company's projects do not use a given technology, write and debug something yourself in your spare time. When colleagues run into problems, help solve them, and when the company has a problem, volunteer to help. Solving all kinds of problems is the fastest way to build ability.

After practicing, it is best to review and summarize, and to record the summary as a log or blog post. These records become your knowledge base: when you hit a similar problem later, you can look it up and solve it, steadily enriching your problem-solving experience.

Finally, may you become a true technical expert!


Origin www.cnblogs.com/ityouknow/p/11301036.html