9. Performance and Scalability

The primary purpose of using threads is to improve a program's performance. Threads allow a program to make fuller use of the processing power available on the system, improving resource utilization. They also allow a program to begin handling new tasks while existing tasks are still running, improving the responsiveness of the system.

1. Thinking About Performance

Improving performance means doing more work with fewer resources. The meaning of "resource" is broad: for a given operation, performance is often limited by the scarcity of some particular resource, such as CPU cycles, memory, network bandwidth, I/O bandwidth, database requests, or disk space. When an operation's performance is limited by a particular resource, we say the operation is bound by that resource: CPU-bound, database-bound, and so on.
Although the goal of using multiple threads is to improve overall performance, using threads always introduces some performance costs compared with a single-threaded approach. These costs include the overhead of coordinating between threads (locking, signaling, memory synchronization), increased context switching, thread creation and teardown, and scheduling overhead. When threads are used excessively, these costs can exceed the gains in throughput, responsiveness, or computing capacity. On the other hand, a poorly designed concurrent application can perform even worse than a serial program implementing the same function.
To achieve better performance through concurrency, we need to do two things: use the existing processing resources more effectively, and enable the program to exploit new processing resources as they become available. From a performance-monitoring perspective, this means keeping the CPUs as busy as possible. (Of course, this does not mean burning CPU cycles on useless computation; the cycles should go toward useful work.) If a program is compute-bound, its performance can be improved by adding processors, but if the program cannot keep the existing processors busy, adding more will not help. By decomposing the application into multiple threads, so that each processor has some work to do, all the CPUs can be kept busy.
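
As a minimal sketch of this decomposition (the class name and chunking scheme are illustrative, not from the book): a summation is split into one task per available processor and submitted to a thread pool, so every CPU has a piece of the work.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Splits a summation into one task per available processor so every CPU has work.
public class ParallelSum {
    public static long sum(long[] data) throws Exception {
        int nThreads = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(nThreads);
        try {
            int chunk = (data.length + nThreads - 1) / nThreads;
            List<Future<Long>> parts = new ArrayList<>();
            for (int i = 0; i < data.length; i += chunk) {
                final int from = i;
                final int to = Math.min(i + chunk, data.length);
                parts.add(pool.submit(() -> {       // each task sums one chunk
                    long s = 0;
                    for (int j = from; j < to; j++) s += data[j];
                    return s;
                }));
            }
            long total = 0;
            for (Future<Long> f : parts) total += f.get(); // combine partial sums
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        long[] data = new long[1_000_000];
        for (int i = 0; i < data.length; i++) data[i] = i + 1;
        System.out.println(sum(data)); // 500000500000
    }
}
```

Whether this actually runs faster than a serial loop depends on the cost of the per-element work relative to the task-submission overhead, which is exactly the tradeoff discussed below.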

  • Performance versus scalability

An application's performance can be measured along many dimensions, such as service time, latency, throughput, efficiency, scalability, and capacity. Some of these metrics (service time, wait time) measure how "fast" a given unit of work can be processed; others (capacity, throughput) measure "how much" work can be done with a given quantity of computing resources.
Scalability describes the ability to improve throughput or capacity correspondingly when additional computing resources (such as CPUs, memory, storage capacity, or I/O bandwidth) are added.

  • Evaluating performance tradeoffs

Nearly all engineering decisions involve some form of tradeoff. When building a bridge, using thicker steel improves the bridge's load capacity and safety, but it also increases construction costs. Although decisions in software engineering rarely involve money or personal safety so directly, we often lack the information needed to make the right tradeoffs. For example, "quicksort" is highly efficient on large data sets, but for small data sets "bubble sort" is actually more efficient. To implement an efficient sort routine, you need to know the size of the data sets it will process, as well as the metrics being optimized: average time, worst-case time, predictability. However, the developer writing a sort routine for a library usually cannot know this in advance. This is one reason most optimizations are premature: they are often undertaken before a clear set of requirements is available.
Avoid premature optimization. First make it right, then make it fast, if it is not already fast enough.

2. Amdahl's Law

Some problems can be solved faster when more resources are available: more workers bring in the harvest sooner. Other tasks are fundamentally serial: no number of additional workers will make the crops grow any faster. If one of our primary reasons for using threads is to harness the power of multiple processors, the problem must admit a sensible parallel decomposition, and the program must actually exploit this potential parallelism.
Most concurrent programs have a lot in common with farming: they consist of a mix of parallelizable and serial portions of work. Amdahl's law describes the maximum theoretical speedup a program can achieve as computing resources are added; this value depends on the proportion of parallelizable and serial components in the program. If F is the fraction of the computation that must be executed serially, then according to Amdahl's law, on a machine with N processors the speedup is at most:

speedup <= 1 / (F + (1 - F) / N)

As N approaches infinity, the maximum speedup approaches 1/F. So if 50% of a program's computation must execute serially, the maximum speedup is only 2 (regardless of how many threads are available); if 10% must execute serially, the maximum speedup approaches 10. Amdahl's law also quantifies the efficiency cost of serialization. On a system with 10 processors, a program that is 10% serial can achieve a speedup of at most 5.3 (at 53% utilization); with 100 processors, the speedup can reach at most 9.2 (at 9% utilization). Even with infinitely many CPUs, the speedup can never reach 10.
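
The figures above follow directly from the formula; a tiny calculator (class and method names are my own) reproduces them:

```java
// Evaluates Amdahl's law: speedup <= 1 / (F + (1 - F) / N)
public class Amdahl {
    // serialFraction is F, processors is N
    public static double speedup(double serialFraction, double processors) {
        return 1.0 / (serialFraction + (1.0 - serialFraction) / processors);
    }

    public static void main(String[] args) {
        System.out.printf("F=0.50, N->inf : %.1f%n", speedup(0.50, Double.POSITIVE_INFINITY)); // 2.0
        System.out.printf("F=0.10, N=10   : %.1f%n", speedup(0.10, 10));   // ~5.3
        System.out.printf("F=0.10, N=100  : %.1f%n", speedup(0.10, 100));  // ~9.2
    }
}
```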

3. Costs Introduced by Threads

A single-threaded program incurs neither scheduling nor synchronization overhead, and needs no locks to preserve the consistency of data structures. Scheduling and coordinating multiple threads always carries some performance cost: for threads introduced to improve performance to pay off, the parallel performance gains must exceed the costs imposed by concurrency.

  • Context switching

If the main thread is the only runnable thread, it is almost never scheduled out. If, on the other hand, there are more runnable threads than CPUs, the operating system will eventually preempt a running thread so that another thread can use the CPU. This causes a context switch: the execution context of the currently running thread is saved, and the execution context of the newly scheduled thread is restored as the current context.

  • Memory synchronization

The performance cost of synchronization has several sources. The visibility guarantees provided by synchronized and volatile may be implemented using special instructions called memory barriers (memory fences). Memory barriers can flush or invalidate caches, flush hardware write buffers, and stall execution pipelines. They may also have indirect performance consequences because they inhibit certain compiler optimizations: most operations cannot be reordered across a memory barrier.
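
To make the cost concrete (the class below is my own illustration, not from the text): every call to the synchronized increment pays for lock acquisition and its associated memory synchronization, which is what makes the total correct across threads.

```java
// Two threads hammer a shared counter; the synchronized increment supplies the
// mutual exclusion and memory synchronization needed for a correct total.
public class SyncCounter {
    private int count; // guarded by "this"

    public synchronized void increment() { count++; }
    public synchronized int get() { return count; }

    public static int run(int threads, int perThread) throws InterruptedException {
        SyncCounter c = new SyncCounter();
        Thread[] ts = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            ts[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) c.increment();
            });
            ts[i].start();
        }
        for (Thread t : ts) t.join();
        return c.get();
    }

    public static void main(String[] args) throws InterruptedException {
        // Always 400000; with an unsynchronized count++, updates could be lost.
        System.out.println(run(4, 100_000));
    }
}
```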

  • Blocking

When a thread cannot acquire a lock, or blocks waiting on a condition or an I/O operation, it must be suspended. This process involves two additional context switches, plus all the attendant operating-system and cache activity: the blocked thread is swapped out before its time slice is used up, and is swapped back in later when the lock or other resource becomes available. (Blocking caused by lock contention also imposes a cost on the thread holding the lock: when it releases the lock, it must ask the operating system to resume the blocked thread.)
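
A small demonstration of this suspend-and-resume cycle (the class and its event log are my own scaffolding): one thread holds the lock while a second blocks on it, and the waiter can only run after the holder releases and the OS wakes it.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantLock;

// One thread holds a lock while a second blocks on it; the blocked thread is
// suspended and resumed only after the holder releases the lock.
public class BlockingDemo {
    public static boolean demo() throws InterruptedException {
        ReentrantLock lock = new ReentrantLock();
        CountDownLatch held = new CountDownLatch(1);
        StringBuilder log = new StringBuilder();

        Thread holder = new Thread(() -> {
            lock.lock();
            try {
                held.countDown();     // signal that the lock is now taken
                Thread.sleep(50);     // simulate work while holding the lock
                log.append("holder-done;");
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                lock.unlock();        // the OS now wakes the blocked waiter
            }
        });

        Thread waiter = new Thread(() -> {
            try {
                held.await();         // ensure the holder owns the lock first
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            lock.lock();              // blocks: suspended, then resumed later
            try {
                log.append("waiter-ran;");
            } finally {
                lock.unlock();
            }
        });

        holder.start();
        waiter.start();
        holder.join();
        waiter.join();
        return log.toString().equals("holder-done;waiter-ran;");
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(demo()); // true
    }
}
```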

4. Reducing Lock Contention

In concurrent programs, the principal threat to scalability is the exclusive resource lock.
There are three ways to reduce lock contention:

  • Reduce the duration for which locks are held.
  • Reduce the frequency with which locks are requested.
  • Replace exclusive locks with coordination mechanisms that permit greater concurrency.
  • Narrowing lock scope
  • Reducing lock granularity
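
A sketch of the first technique, narrowing lock scope (the class is my own illustration): instead of synchronizing an entire method, only the access to shared state is guarded, so the lock is held as briefly as possible.

```java
import java.util.HashMap;
import java.util.Map;

// Narrowing lock scope: only the map access is synchronized, not the whole
// method, so string handling happens without holding the lock.
public class LocationTracker {
    private final Map<String, String> locations = new HashMap<>(); // guarded by "this"

    public void setLocation(String user, String place) {
        String key = "users." + user + ".location"; // local work: no lock needed
        synchronized (this) {                        // lock held only for the map write
            locations.put(key, place);
        }
    }

    public boolean userIsIn(String user, String place) {
        String key = "users." + user + ".location";
        String location;
        synchronized (this) {                        // lock held only for the map read
            location = locations.get(key);
        }
        return location != null && location.equals(place); // local work again
    }

    public static void main(String[] args) {
        LocationTracker t = new LocationTracker();
        t.setLocation("alice", "office");
        System.out.println(t.userIsIn("alice", "office")); // true
    }
}
```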

Another way to reduce the fraction of time a lock is held is to have threads request it less often (thereby reducing the likelihood of contention). This can be achieved through lock splitting and lock striping: using multiple independent locks to guard independent state variables that would previously have been guarded by a single lock. These techniques reduce the granularity at which locking occurs and can deliver greater scalability; however, the more locks in use, the greater the risk of deadlock.
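
Lock splitting can be sketched as follows (a minimal illustration in the spirit of the technique; the class and fields are assumptions): because the two sets are independent, each gets its own lock, so a thread updating users never contends with one updating queries.

```java
import java.util.HashSet;
import java.util.Set;

// Lock splitting: two independent sets are guarded by two independent locks
// instead of one object-wide lock, so unrelated operations never contend.
public class ServerStatus {
    private final Set<String> users = new HashSet<>();   // guarded by "users"
    private final Set<String> queries = new HashSet<>(); // guarded by "queries"

    public void addUser(String u) {
        synchronized (users) { users.add(u); }
    }
    public void addQuery(String q) {
        synchronized (queries) { queries.add(q); }
    }
    public boolean hasUser(String u) {
        synchronized (users) { return users.contains(u); }
    }
    public boolean hasQuery(String q) {
        synchronized (queries) { return queries.contains(q); }
    }

    public static void main(String[] args) {
        ServerStatus s = new ServerStatus();
        s.addUser("alice");
        s.addQuery("SELECT 1");
        System.out.println(s.hasUser("alice") && s.hasQuery("SELECT 1")); // true
    }
}
```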

  • Lock striping

  • Avoiding hot fields
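
The two points above can be sketched together (my own simplified illustration, in the spirit of striped hash maps): the buckets of a fixed-size hash map are guarded by a small pool of stripe locks, so threads touching different stripes never contend. Note also that this design has no shared size counter; maintaining one under a single lock would create exactly the kind of hot field the second point warns about.

```java
// Lock striping: map buckets are guarded by N_LOCKS stripe locks, so threads
// operating on different stripes proceed in parallel.
public class StripedMap {
    private static final int N_LOCKS = 16;
    private final Node[] buckets;
    private final Object[] locks;

    private static class Node {
        final Object key;
        Object value;
        Node next;
        Node(Object key, Object value, Node next) {
            this.key = key; this.value = value; this.next = next;
        }
    }

    public StripedMap(int numBuckets) {
        buckets = new Node[numBuckets];
        locks = new Object[N_LOCKS];
        for (int i = 0; i < N_LOCKS; i++) locks[i] = new Object();
    }

    private int hash(Object key) {
        return Math.abs(key.hashCode() % buckets.length);
    }

    public Object get(Object key) {
        int h = hash(key);
        synchronized (locks[h % N_LOCKS]) {  // lock only this bucket's stripe
            for (Node n = buckets[h]; n != null; n = n.next)
                if (n.key.equals(key)) return n.value;
        }
        return null;
    }

    public void put(Object key, Object value) {
        int h = hash(key);
        synchronized (locks[h % N_LOCKS]) {
            for (Node n = buckets[h]; n != null; n = n.next)
                if (n.key.equals(key)) { n.value = value; return; }
            buckets[h] = new Node(key, value, buckets[h]);
        }
    }

    public static void main(String[] args) {
        StripedMap m = new StripedMap(1024);
        m.put("a", 1);
        m.put("b", 2);
        System.out.println(m.get("a")); // 1
    }
}
```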


Origin blog.csdn.net/qq_27870421/article/details/90583338