Java Fresh-Produce E-Commerce Platform - Spring Cloud Microservice Architecture in Practice: Parameter Tuning for High Concurrency

I. Foreword

I believe many of my friends who build fresh-produce e-commerce platforms in Java use the Spring Cloud framework at their own companies to build a microservice architecture. After all, it is a very popular technology right now.

In a traditional IT system with only a small number of users, Spring Cloud is unlikely to expose any problems.

But in an Internet company's system with a large user base and tens of thousands of concurrent requests per second at peak, there are some issues with Spring Cloud that need attention.

II. Setting the Scene: The First Symptoms

A friend's company runs an Internet business. It set up a small R&D team and chose the Spring Cloud stack from the start to build a microservice system.

After working overtime day and night for a while, the team finally finished the core business system. Routine QA testing found no major problems, performance felt fine, and everything looked perfect.

The system then went live. At first the user base was small: a few hundred thousand registered users and a few thousand daily active users.

New data went into the database tables every day, and over time, without anyone noticing, individual tables grew to millions of rows.

Even then nothing seemed badly wrong, except that users complained that some operations would stall for a few seconds before the page loaded.

Why?

The core reason is that individual tables had grown large, to several million rows.

Some services ran fairly complex SQL with many multi-table joins.

On top of that, the indexes were poorly designed, or they were designed but defeated by SQL statements hundreds of lines long. With SQL that complex, a single statement taking several seconds is to be expected.

Anyone who knows a little about microservice frameworks knows that a service invocation stack such as Feign + Ribbon has a timeout for interface calls, and that there are parameters for setting it.

If an interface call does not return within that time, the caller gets a timeout exception and the user's page fails to load.
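The timeout behavior described above can be illustrated with a minimal, self-contained sketch. It does not use Feign or Ribbon; the class name `CallTimeoutSketch` and the `"TIMEOUT"` marker are hypothetical and only stand in for the timeout exception a real client would raise.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

final class CallTimeoutSketch {

    /**
     * Runs the (simulated) remote call on a worker thread and waits at most
     * timeoutMillis for it. Returns "TIMEOUT" if the call is too slow.
     */
    static String callWithTimeout(Supplier<String> remoteCall, long timeoutMillis) {
        ExecutorService worker = Executors.newSingleThreadExecutor();
        try {
            Callable<String> task = remoteCall::get;
            Future<String> result = worker.submit(task);
            return result.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // A real client would surface a timeout exception to the caller here.
            return "TIMEOUT";
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            // Interrupts the call if it is still running, then frees the thread.
            worker.shutdownNow();
        }
    }

    public static void main(String[] args) {
        // Simulated slow SQL: the "remote" side needs 3 seconds, the caller waits 1.
        String page = callWithTimeout(() -> {
            try {
                Thread.sleep(3000);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return "page data";
        }, 1000);
        System.out.println(page);
    }
}
```

A call that needs several seconds against a one-second limit never delivers its result to the caller, which is what the users in this story saw as a page that would not load.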

 

III. A Quick Fix That Does More Harm Than Good

When this kind of thing happens, with a big pile of ugly SQL that even its own author cannot read a month later, 80% of engineers are unwilling to spend time rewriting and optimizing it.

First, the labor cost of changing it is too high. Second, who wants to take on the responsibility?

The system runs fine, it is just a bit slow. If you go ahead with a refactoring and the core business flow breaks, what then?

So the team's first reaction was: increase the timeout! The interface can be slow, as long as it does not time out!

Let the interface take a few seconds to return, and the user's page will still load. No refactoring needed. Easy and painless!

How do you increase it? Very simple, the following parameters show how:

 

If you read the previous article, you know that Spring Cloud generally executes interface calls on a Hystrix thread pool.

So the timeout is generally set in two places: the Ribbon timeout used by Feign, and the Hystrix timeout. The Hystrix timeout should generally be greater than the Ribbon one.
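The relationship between the two timeouts can be checked with simple arithmetic. The sketch below is not the original configuration. It assumes the settings commonly used with Spring Cloud Netflix (`ribbon.ConnectTimeout`, `ribbon.ReadTimeout`, `ribbon.MaxAutoRetries`, `ribbon.MaxAutoRetriesNextServer`, and `hystrix.command.default.execution.isolation.thread.timeoutInMilliseconds`) and applies a common rule of thumb: the Hystrix timeout should exceed the worst-case time Ribbon can spend on one logical call, retries included. All numeric values are illustrative.

```java
final class TimeoutBudget {

    /** Worst-case time (ms) one logical call can spend inside Ribbon, retries included. */
    static long worstCaseRibbonMillis(long connectTimeoutMs, long readTimeoutMs,
                                      int maxAutoRetries, int maxAutoRetriesNextServer) {
        // Each server is tried (maxAutoRetries + 1) times, on up to
        // (maxAutoRetriesNextServer + 1) servers.
        long attempts = (long) (maxAutoRetries + 1) * (maxAutoRetriesNextServer + 1);
        return (connectTimeoutMs + readTimeoutMs) * attempts;
    }

    /** True when the Hystrix timeout leaves room for every Ribbon attempt to finish. */
    static boolean hystrixCoversRibbon(long hystrixTimeoutMs,
                                       long connectTimeoutMs, long readTimeoutMs,
                                       int maxAutoRetries, int maxAutoRetriesNextServer) {
        return hystrixTimeoutMs > worstCaseRibbonMillis(
                connectTimeoutMs, readTimeoutMs, maxAutoRetries, maxAutoRetriesNextServer);
    }

    public static void main(String[] args) {
        // Illustrative "just raise it" settings: 1 s connect, 5 s read, no retries.
        long ribbon = worstCaseRibbonMillis(1000, 5000, 0, 0);
        System.out.println("Ribbon worst case (ms): " + ribbon);
        System.out.println("Hystrix 7000 ms is enough: "
                + hystrixCoversRibbon(7000, 1000, 5000, 0, 0));
    }
}
```

If the Hystrix timeout is the smaller of the two, Hystrix gives up on the call before Ribbon has finished waiting, so the larger Ribbon value never takes effect.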

Readers who know Spring Cloud well may laugh at these settings, but I have seen plenty of people less fluent in Spring Cloud who really did this.

 

Well, life goes on...

After this parameter "optimization", the results looked acceptable: some pages still felt slow, but at least they loaded within a few seconds.

At this point, with a few thousand daily active users, there was hardly any concurrency: at peak, no more than ten or twenty concurrent requests per second.

 

IV. The Problem Erupts

As time went on, the company's business grew rapidly...

With the system maturing and the pilot with tens of thousands of users going well, the boss promptly secured a financing round of tens of millions.

The company was in high spirits! It set up an operations team and a field promotion team to promote the product across the country.

In short, three words: Push! Push! Push!

And the push worked. In the back-office system, the R&D staff watched the user numbers climb in a straight line.

Registered users grew several-fold, passing the million mark. Daily active users also multiplied, and at peak actually reached the millions!

 

A happy problem...

Why say that? Because once the user numbers went up, something unfortunate happened.

Peak concurrent requests per second approached ten thousand, and the R&D team did not dare take it lightly. First came a frantic round of scaling out the services: one instance became two, two became four.

Then the database was moved to a master-slave architecture with read/write splitting, since a single database server could not carry that many requests. Several slave databases were added to absorb the large volume of read requests, and with that the system more or less held up.

Just as everyone was about to sit down with a cup of tea and relax, something worse happened.

The team found that at peak times a certain page function would often hang completely and stop responding to any request. Every user refreshing that page got no response at all.

Why? The reason is simple. Take one instance of service A: its thread pool dedicated to calling service B has a few dozen threads in total, and every call to service B holds a thread for 5 seconds.

What happens when hundreds of requests per second hit that service instance? Every thread in that pool is tied up at once, and the instance can no longer respond to any request.
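The arithmetic behind this failure can be sketched as follows. The article only says "dozens" of threads and "hundreds" of requests per second, so the concrete figures used here (40 threads, 300 requests per second) are assumptions chosen for illustration; the 5-second hold time is the one given above.

```java
final class PoolSaturation {

    /** Highest request rate a pool can sustain when each call holds a thread for latencyMillis. */
    static double maxThroughputPerSecond(int threads, long latencyMillis) {
        return threads * 1000.0 / latencyMillis;
    }

    /**
     * Seconds until every thread is busy, starting from an idle pool.
     * Returns infinity when the pool can keep up with the arrival rate.
     */
    static double secondsUntilExhausted(int threads, double arrivalsPerSecond, long latencyMillis) {
        double inFlightAtSteadyState = arrivalsPerSecond * latencyMillis / 1000.0;
        if (inFlightAtSteadyState <= threads) {
            return Double.POSITIVE_INFINITY;
        }
        // The pool fills before the first call completes, so no thread is freed in time.
        return threads / arrivalsPerSecond;
    }

    public static void main(String[] args) {
        int threads = 40;            // assumed: "dozens" of threads
        double arrivals = 300.0;     // assumed: "hundreds" of requests per second

        System.out.println("Capacity at 5 s per call:   "
                + maxThroughputPerSecond(threads, 5000) + " req/s");
        System.out.println("Capacity at 50 ms per call: "
                + maxThroughputPerSecond(threads, 50) + " req/s");
        System.out.println("Seconds until the pool is full at 5 s per call: "
                + secondsUntilExhausted(threads, arrivals, 5000));
    }
}
```

Under these assumptions the pool can sustain only 8 requests per second when each call takes 5 seconds, so a few hundred requests per second fill it in a fraction of a second. At 50 ms per call the same 40 threads can sustain 800 requests per second.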

The figure below shows how helpless the service is in this situation.

 

What to do at this point? The team could only fall back on the programmer's oldest trick: restart the machine!

Whenever the page would not load, they restarted the machine, which amounts to briefly reinitializing its resources.

After running for a while it would hang again, and they would restart again. It was maddening: the user experience was terrible, and the boss was furious!

Aside:

This is not really a hard problem in itself, but without real experience of running Spring Cloud under high concurrency, it is easy to end up with strange problems, as this team did.

This company clearly should have optimized the performance of the service interface, but it simply cranked up the timeout instead. As a result, under high concurrency the calls to the service hung, the system's core pages would not load, and the user experience suffered. Who is to blame for that?

V. Tracing the Root Cause and Fixing It Properly

With no way out, the team had to look for outside help. What follows is the complete process by which they were guided through optimizing the system.

 

Step 1

The key is to optimize the performance of the core service B. In an Internet company, core business logic that serves high-concurrency consumer-facing (C-end) requests should not use SQL statements hundreds of lines long with multi-table joins. Once a single table holds millions of rows, such a statement takes several seconds to run.

The better approach is to have the database perform only simple single-table queries and updates, and to move the complex business logic, such as associating records and doing calculations, into the Java application.
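As a minimal sketch of this idea, the example below joins the results of two simple single-table queries in application code. The `User` and `Order` types and their fields are hypothetical; in a real service the two lists would come from two separate single-table queries, each able to use an index on its own table.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class InMemoryJoin {

    /** Row from a hypothetical single-table query on users. */
    static final class User {
        final long id;
        final String name;
        User(long id, String name) { this.id = id; this.name = name; }
    }

    /** Row from a hypothetical single-table query on orders. */
    static final class Order {
        final long id;
        final long userId;
        final int amountCents;
        Order(long id, long userId, int amountCents) {
            this.id = id;
            this.userId = userId;
            this.amountCents = amountCents;
        }
    }

    /** Associates each order with its user's name in Java instead of with a SQL JOIN. */
    static List<String> describeOrders(List<Order> orders, List<User> users) {
        // Index the users by primary key once, so every lookup is a hash lookup.
        Map<Long, String> nameById = new HashMap<>();
        for (User u : users) {
            nameById.put(u.id, u.name);
        }
        List<String> result = new ArrayList<>();
        for (Order o : orders) {
            String name = nameById.getOrDefault(o.userId, "unknown");
            result.add(o.id + ":" + name + ":" + o.amountCents);
        }
        return result;
    }
}
```

The trade-off is more code in the service in exchange for queries that stay simple and predictable as the tables grow.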

After this step, the response time of core service B was reduced to tens of milliseconds. From several seconds to tens of milliseconds: satisfying, isn't it?

Step 2

Next come the timeouts, that is, the Ribbon and Hystrix timeout settings mentioned above.

My advice: do not set timeouts of several seconds or tens of seconds just because the system performs poorly and you cannot be bothered to fix the interface. A timeout within one second is the more common and reasonable choice.

Why?

Because an interface should ideally respond within 200 ms, or within a few hundred milliseconds for slower interfaces.

If an interface takes more than one second to respond, consider caching, indexing, NoSQL, and any other technique you can think of to improve its performance.

Otherwise, if you casually set the timeout to a few seconds or even tens of seconds, what happens when a downstream service occasionally responds a bit slowly? Every thread in your thread pool gets stuck at once!

For production best practices on Hystrix thread pools and timeouts, see the next article: "How to guarantee 99.99% availability of a microservice architecture during the Double 11 shopping festival"

 

With these two steps done, system performance is essentially back to normal: core service B responds quickly, the timeout is within one second, and the Hystrix thread pool no longer jams.

Step 3

We are not done yet. If you think the first two steps are enough, you still lack experience.

Suppose you set the timeout to one second. What if an occasional network jitter makes one interface call take 1.5 seconds? This happens often: an interface call occasionally times out because of a network problem.

Therefore, along with the timeout, you should usually configure a reasonable retry policy, as follows:

 

With retry configured, when Spring Cloud makes a service call through Feign + Ribbon and a request to one machine times out or fails, it automatically retries on that machine, and if that still fails, it retries on another machine.

That way, occasional timeouts caused by the network are absorbed by automatic retries.
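The retry order described above (same machine first, then another machine) can be sketched in plain Java. This is not Ribbon's implementation. The parameter names mirror Ribbon's `MaxAutoRetries` and `MaxAutoRetriesNextServer` settings, which is an assumption about what the original listing configured, and the fixed server list stands in for the load balancer's choice of the next server.

```java
import java.util.List;
import java.util.function.Function;

final class RetrySketch {

    /**
     * Tries each server up to (maxAutoRetries + 1) times and moves on to at most
     * maxAutoRetriesNextServer further servers. Returns the first successful result.
     */
    static <T> T callWithRetry(List<String> servers,
                               int maxAutoRetries,
                               int maxAutoRetriesNextServer,
                               Function<String, T> call) {
        RuntimeException lastFailure = null;
        int serversToTry = Math.min(servers.size(), maxAutoRetriesNextServer + 1);
        for (int s = 0; s < serversToTry; s++) {
            String server = servers.get(s);
            for (int attempt = 0; attempt <= maxAutoRetries; attempt++) {
                try {
                    return call.apply(server);
                } catch (RuntimeException e) {
                    // Timeout or failure: remember it and try again.
                    lastFailure = e;
                }
            }
        }
        if (lastFailure != null) {
            throw lastFailure;
        }
        throw new IllegalStateException("no servers to call");
    }
}
```

Note that every retry repeats the wait, so the worst-case latency of one logical call grows with the number of attempts. This is the reason the Hystrix timeout has to cover all Ribbon attempts together.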

 

Step 4

We are still not done. If you configure retries and leave it at that, you are being irresponsible toward everyone else!

Once your architecture involves retries, the interfaces must have an idempotency guarantee.

Otherwise, imagine that you retry a call to an interface several times and the other side inserts duplicate rows. What then?

Guaranteeing idempotency is not complicated in itself. Depending on the business, the common approaches are:

Create a unique index in the database, so that an insert that conflicts with the unique index does not insert duplicate data.

Or store a unique id value in Redis and check Redis before each insert. If the value already exists, do not insert the duplicate data.

There are other similar schemes. In short, make sure that when an interface is called multiple times, it does not insert duplicate data.
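Both approaches rely on the same primitive: an atomic "insert only if absent" on a unique key. The sketch below uses an in-memory set as a stand-in for the unique index or the Redis key. The class name, the `requestId` parameter, and the in-memory row list are hypothetical; a real service would rely on the database constraint or on an atomic Redis operation such as `SET` with the `NX` option.

```java
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

final class IdempotentInsert {

    // Stand-in for a unique index or a Redis key space; add() is atomic.
    private final Set<String> seenRequestIds = ConcurrentHashMap.newKeySet();

    // Stand-in for the business table.
    private final List<String> rows = new CopyOnWriteArrayList<>();

    /**
     * Inserts the row only the first time a given requestId is seen.
     * Returns false for a duplicate, which a retried call would produce.
     */
    boolean insertOnce(String requestId, String row) {
        if (!seenRequestIds.add(requestId)) {
            return false; // duplicate request: skip the insert
        }
        rows.add(row);
        return true;
    }

    int rowCount() {
        return rows.size();
    }
}
```

For this to work, the caller has to send the same unique id on every retry of the same logical request, so that the second and third attempts are recognized as duplicates.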

 

Origin www.cnblogs.com/jurendage/p/11406867.html