[Repost] How big should a database connection pool really be?

Original Post: https://blog.csdn.net/weixin_35794878/article/details/90342263

 

Table of Contents

  • I. Foreword

  • II. Getting to the point

  • III. Suppose your service handles 10,000 concurrent requests

  • IV. Why does this happen?

  • V. Other factors to consider

  • VI. The connection-count formula

  • VII. Conclusion: what you need is a small connection pool, and a queue of threads waiting for connections

  • VIII. Additional points to note

I. Foreword

Almost every project we build has to talk to a database. So how large should the database connection pool be?

Some veteran developers might tell you: it doesn't matter, just set it larger, say 200. The database will perform better and throughput will be higher!

You might nod along. But is that really true? After reading this article, your assumptions may well be overturned!

 

II. Getting to the Point

To put it bluntly, when it comes to sizing a database connection pool, nearly every developer has stumbled at some point. In practice, most programmers rely on gut feeling: set it to 100? After mulling it over for a while, that feels about right, doesn't it?

III. Suppose Your Service Handles 10,000 Concurrent Requests

Imagine you run a website. The load is nowhere near Facebook's level, but it still sees around 10,000 concurrent requests, which is roughly 20,000 TPS.

So here's the question: how large should this site's database connection pool be?

Actually, the question itself is wrong. The right question to ask is:

"How small should this site's database connection pool be?"

PS: Here is a short video released by the Oracle performance team: http://www.dailymotion.com/video/x2s8uec (note: it may not be accessible from everywhere).

 

In the video, the Oracle database is stress-tested with 9,600 simulated concurrent threads, sleeping 550 ms between every two database operations. Note that at the start of the video, the connection pool size is set to 2048.

Let's look at the appalling test results with a connection pool size of 2048:

Each request waited 33 ms in the connection pool queue before obtaining a connection, SQL execution itself took 77 ms, and CPU usage hovered around 95%.

Next, the pool size is cut to 1024, with all other test parameters unchanged. What happens?

"Here, the time spent waiting for a connection is essentially unchanged, but SQL execution time drops!"

Oh, that's an improvement!

Next, they shrink it much further, reducing the pool size to 96, with concurrency and all other parameters unchanged. Let's see what happens:

Each request waits an average of 1 ms in the pool queue, and SQL execution takes 2 ms.

Wait, what? What's going on here?

Nothing else was tuned. Merely shrinking the database connection pool brought the average response time down from roughly 100 ms to 3 ms, and throughput shot up accordingly!

Impressive, isn't it?

 

IV. Why Does This Happen?

Think about it: why can Nginx, with only 4 internal threads, vastly outperform Apache HTTPD with 100 processes? Trace the cause back to computer science fundamentals and the answer becomes obvious.

Remember, even a computer with a single-core CPU can "simultaneously" run hundreds of threads. But we all know this is just a trick the operating system plays on us with rapid time-slice switching.

One CPU core can execute only one thread at a time; then the operating system performs a context switch and the core runs another thread's code, over and over, creating the illusion that all processes are running simultaneously.

In fact, on a single-core machine, executing A and B sequentially is always faster than executing A and B "simultaneously" through time-slice switching; anyone who has taken an operating systems course should know exactly why. Once the number of threads exceeds the number of CPU cores, adding more threads only makes the system slower, not faster, because of the extra performance cost of context switching.

At this point, it should be dawning on you...

 

V. Other Factors to Consider

The previous section covers the main reason, but it isn't quite that simple; we also need to consider other factors.

When we look for performance bottlenecks in a database, they roughly fall into three categories:

  • CPU

  • Disk I/O

  • Network I/O

Perhaps you'll ask: what about memory? Memory certainly matters, but compared with disk I/O and network I/O it is relatively insignificant, so it's left out here.

If we ignore disk I/O and network I/O, the conclusion is simple: on an 8-core server, setting the number of database connections/threads to 8 gives optimal performance, and adding more connections only degrades performance due to context switching.

As we all know, a database usually lives on disk, and a traditional disk is a storage device made of spinning metal platters and read/write heads mounted on a stepper motor. A head can be in only one position at a time; when it needs to read or write somewhere else, it must "seek" to the new position, which costs seek time. On top of that comes rotational latency: the head must wait for the target sector of the platter to "spin into place" before it can read or write. Caching helps, of course, but the principle still holds.

During this ("I/O wait") time, the thread is blocked, waiting and doing no real work. The operating system can then use that idle CPU core to serve other threads.

So we can summarize: when your workload is I/O-bound, you can set the number of threads/connections somewhat higher than the number of CPU cores, which lets more work complete in the same amount of time and improves throughput.
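This effect is easy to demonstrate. Below is a small Python sketch, purely illustrative, which fakes a blocking database call with `time.sleep`; it shows that for I/O-bound tasks, a thread count above the core count finishes the same work sooner:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def io_bound_task():
    """Simulate a blocking database call (waiting on disk or network I/O)."""
    time.sleep(0.05)

N_TASKS = 8

# Run the tasks one after another on a single thread.
start = time.perf_counter()
for _ in range(N_TASKS):
    io_bound_task()
sequential = time.perf_counter() - start

# Run the same tasks on a pool of 8 threads: while one thread blocks on
# I/O, the CPU is free to start the others.
start = time.perf_counter()
with ThreadPoolExecutor(max_workers=N_TASKS) as pool:
    list(pool.map(lambda _: io_bound_task(), range(N_TASKS)))
threaded = time.perf_counter() - start

print(f"sequential: {sequential:.2f}s, threaded: {threaded:.2f}s")
```

On a real workload the gain tapers off as soon as threads start competing for actual CPU time, which is exactly why the pool should stay small.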

So the question comes back:

Higher by how much?

It depends on the disk. If you are using an SSD, there is no seeking and no platter rotation. But hold on! Don't jump to the conclusion: "SSDs are faster, so we can have more threads!"

The conclusion is exactly the opposite! No seeking and no rotation means less time spent blocked, so fewer threads (closer to the number of CPU cores) will deliver higher performance. Only when blocking is heavy do more threads yield better performance.

We've covered disk I/O above; now let's talk about network I/O!

Network I/O is actually very similar. Reading and writing data over an Ethernet interface also causes blocking: a 10G link blocks for less time than a 1G link, and a 1G link blocks less than a 100M link. Network I/O is usually the third factor we consider, yet some people leave its impact out of their performance calculations entirely.

 

The figure above shows PostgreSQL benchmark results: TPS growth starts to flatten at around 50 connections. Looking back at the Oracle performance-test video above, the testers dropped the connection count from 2048 to 96, but even 96 is arguably too high, unless your server has 16 or 32 CPU cores.

VI. The Connection-Count Formula

The formula below comes from PostgreSQL, but the underlying principle is the same and applies to the vast majority of database products on the market. You should simulate your expected load, use the formula to set a reasonable initial value, and then fine-tune through real testing to find the most suitable pool size.

connections = ((2 * core count) + effective spindle count)

The core count should not include Hyper-Threading logical cores, even if Hyper-Threading is enabled. If the hot data set is fully cached, the effective spindle count is 0; as the cache hit rate falls, the effective spindle count gradually approaches the actual number of disks. Also note that how well this formula works for SSDs is unknown.

So by this formula, if your server has a quad-core i7 CPU, the connection pool size should be ((4 * 2) + 1) = 9.

Round it up and call it 10. Are you skeptical? Isn't 10 far too small?
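The formula is trivial to encode. A small helper (the function name is my own) makes the worked example concrete:

```python
def pool_size(cpu_cores: int, effective_spindles: int = 1) -> int:
    """PostgreSQL's suggested starting point for connection pool size.

    cpu_cores: physical cores only (exclude Hyper-Threading).
    effective_spindles: 0 if the hot data set is fully cached; approaches
    the real disk count as the cache hit rate drops.
    """
    return (2 * cpu_cores) + effective_spindles

# The article's example: a quad-core i7 with one effective spindle.
print(pool_size(4, 1))  # 9
```

Remember this is only a starting value to refine under a realistic load test, not a final answer.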

 

If you're not convinced, run the test yourself. A pool that size can easily support 3,000 users running simple queries concurrently at 6,000 TPS. You can also push the pool size past 10; you'll then watch response times start to climb and TPS start to fall.

VII. Conclusion: What You Need Is a Small Connection Pool, and a Queue of Threads Waiting for Connections

Suppose you have 10,000 concurrent requests and you set the connection pool size to 10,000. You wouldn't dare, would you?

What about 1,000? Still too many. 100? Still too many.

All you need is a database connection pool of size 10, with the remaining business threads waiting in a queue.

The pool size should be set to: the number of query tasks the database can effectively execute simultaneously (which is usually no more than 2 * CPU cores).

You've probably seen low-traffic web applications whose connection pool is set to 100 or 200 just to handle a dozen or so concurrent requests. Don't over-provision your database connection pool.
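The "small pool plus waiting queue" idea can be sketched in a few lines. This is a minimal illustration with dummy connection objects, not production code; real pools add timeouts, health checks, and metrics on top of the same core mechanism:

```python
import queue
import threading

class TinyPool:
    """A minimal fixed-size connection pool sketch (illustrative only).

    Holds `size` connection objects; a thread that finds the pool empty
    simply blocks in the queue until another thread returns a connection.
    """
    def __init__(self, size, connect):
        self._q = queue.Queue()
        for _ in range(size):
            self._q.put(connect())

    def acquire(self):
        # Blocks the calling thread until a connection is free.
        return self._q.get()

    def release(self, conn):
        self._q.put(conn)

# Demo with dummy "connections": 10 slots serving 50 worker threads.
pool = TinyPool(10, connect=lambda: object())
results = []

def worker():
    conn = pool.acquire()   # waits here if all 10 connections are busy
    results.append(conn)    # ... run the query with `conn` here ...
    pool.release(conn)

threads = [threading.Thread(target=worker) for _ in range(50)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(results))  # 50 tasks served through only 10 connections
```

The point is that the excess threads wait cheaply in the queue instead of piling extra connections onto the database.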

VIII. Additional Points to Note

In practice, connection pool sizing still depends on your actual business scenario.

For example, if your system mixes long-running transactions with short ones, the formula above becomes hard to apply. The right approach is to create two connection pools: one serving long transactions and one serving "real-time" queries, i.e. short transactions.

Another case: suppose a system runs a task queue and the business only allows a fixed number of tasks to execute simultaneously. Then you should make the number of concurrent tasks fit the number of pool connections, rather than sizing the pool to fit the number of concurrent tasks.
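One way to "make the tasks fit the pool" is to gate task execution with a semaphore sized to the pool. A hedged Python sketch (`POOL_SIZE` and the counters are illustrative; the real work would acquire a connection inside the gated section):

```python
import threading

POOL_SIZE = 10  # fixed by the connection pool

# Cap concurrent tasks at the pool size, instead of growing the pool
# to match however many tasks arrive.
task_slots = threading.BoundedSemaphore(POOL_SIZE)

peak = 0
running = 0
lock = threading.Lock()

def run_task():
    global peak, running
    with task_slots:  # at most POOL_SIZE tasks inside at once
        with lock:
            running += 1
            peak = max(peak, running)
        # ... acquire a connection and do the work here ...
        with lock:
            running -= 1

threads = [threading.Thread(target=run_task) for _ in range(100)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(peak <= POOL_SIZE)  # True: concurrency never exceeded the pool
```

This keeps the pool small and stable while the queue of tasks, not the database, absorbs the burst.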


Origin: www.cnblogs.com/refuge/p/11249499.html