Luban College Java Architect Growth Path: Java Amdahl's Law

Amdahl's law can be used to estimate how much faster a computation can run when part of it is executed in parallel. The law is named after Gene Amdahl, who presented it in 1967. Most developers who work with parallel or concurrent systems have an intuitive feeling that concurrency or parallelism may bring a speedup, even if they don't know Amdahl's law. In any case, it is useful to understand it.

I will first explain Amdahl's law arithmetically, and then illustrate it with diagrams.

Amdahl's law definition

A program (or an algorithm) can be divided into the following two parts according to whether it can be parallelized:

The part that can be parallelized

The part that cannot be parallelized

Suppose a program processes files on disk. A small part of this program is used to scan paths and create file directories in memory. After doing this, each file is handed over to a separate thread for processing. The scanning path and the creation of the file directory cannot be parallelized, but the process of processing files can.
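A minimal Java sketch of this pattern (the directory to scan and the per-file work are placeholders I made up; the path scan stays serial while each file is handed to a thread pool):

```java
import java.io.IOException;
import java.nio.file.*;
import java.util.concurrent.*;
import java.util.stream.Stream;

public class FileProcessor {
    public static void main(String[] args) throws IOException, InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(4);

        // Serial part: scan the path and collect the files.
        try (Stream<Path> paths = Files.walk(Paths.get("."))) {
            paths.filter(Files::isRegularFile)
                 // Parallel part: each file is handed to a pool thread.
                 .forEach(file -> pool.submit(() -> processFile(file)));
        }

        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
    }

    // Placeholder for whatever per-file work the program actually does.
    static void processFile(Path file) {
        System.out.println(Thread.currentThread().getName() + " -> " + file);
    }
}
```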

We denote the total time of serial (non-parallel) execution of the program as T. The time T includes both the part that cannot be parallelized and the part that can. We mark the part that cannot be parallelized as B. The part that can be parallelized is then T - B. The following list summarizes these definitions:

T = total time of serial execution

B = total time of the non-parallelizable part

T - B = total time of the parallelizable part

From the above it follows that:

T = B + (T - B)

At first this may look a little strange: the parallelizable part of the program does not get its own symbol in the formula. However, since the parallelizable part can be expressed through the total time T and the non-parallelizable part B, the formula is actually conceptually simpler, because it contains one variable fewer.

T - B is the part that can be parallelized, and executing it in parallel increases the running speed of the program. How much the speed increases depends on how many threads or CPUs execute it. We denote the number of threads or CPUs by N. The fastest time in which the parallelizable part can be executed is given by:

(T – B ) / N

Or, written another way:

(1 / N) * (T – B)

The latter form is the one used in the Wikipedia article.

According to Amdahl's law, when the parallel part of a program is executed using N threads or CPUs, the total execution time is:

T(N) = B + ( T – B ) / N

T(N) refers to the total execution time when the parallelism factor is N. Therefore, T(1) is the total execution time of the program when the parallelism factor is 1. Using T(1) instead of T, Amdahl's law looks like this:

T(N) = B + (T(1) – B) / N
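Expressed as code, the formula is just one line. A minimal Java sketch (the class and parameter names are my own):

```java
public class Amdahl {

    /**
     * Amdahl's law: T(N) = B + (T(1) - B) / N.
     *
     * @param t1 total execution time with one thread, T(1)
     * @param b  time spent in the non-parallelizable part, B
     * @param n  parallelism factor (number of threads or CPUs)
     * @return estimated total execution time T(N)
     */
    static double totalTime(double t1, double b, int n) {
        return b + (t1 - b) / n;
    }
}
```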

Both expressions mean the same thing.

A calculation example

In order to better understand Amdahl's law, let us look at a calculation example. Set the total execution time of the program to 1. The non-parallelizable part accounts for 40% of the program, so B = 0.4 of the total time 1. The parallelizable part is then 1 - 0.4 = 0.6.

In the case of a parallelism factor of 2, the execution time of the program will be:

T(2) = 0.4 + ( 1 - 0.4 ) / 2

= 0.4 + 0.6 / 2

= 0.4 + 0.3

= 0.7

In the case of a parallelism factor of 5, the execution time of the program will be:

T(5) = 0.4 + ( 1 - 0.4 ) / 5

= 0.4 + 0.6 / 5

= 0.4 + 0.12

= 0.52
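The two cases above can be checked with a few lines of Java (a throwaway sketch):

```java
public class AmdahlExample {
    public static void main(String[] args) {
        double t1 = 1.0; // total serial execution time T(1)
        double b  = 0.4; // non-parallelizable part B

        // T(N) = B + (T(1) - B) / N
        double t2 = b + (t1 - b) / 2;
        double t5 = b + (t1 - b) / 5;

        System.out.printf("T(2) = %.2f%n", t2); // T(2) = 0.70
        System.out.printf("T(5) = %.2f%n", t5); // T(5) = 0.52
    }
}
```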

Diagram of Amdahl's Law

To understand Amdahl's law even better, I will try to show graphically how it comes about.

First, a program can be divided into two parts: a non-parallelizable part B and a parallelizable part 1 - B. As shown below:

[Figure: the total time T(1) shown as a bar, split into the non-parallelizable part B and the parallelizable part 1 - B]

The full bar at the top, with its dividing line, represents the total time T(1).

Below you can see the execution time with a parallelism factor of 2:

[Figure: execution time with a parallelism factor of 2]

When the parallelism factor is 3:

[Figure: execution time with a parallelism factor of 3]

Optimization

Amdahl's law shows that the parallelizable part of a program can run faster by adding more hardware (more threads or CPUs). For the non-parallelizable part, the speed can only be increased by optimizing the code. Therefore, you can also improve the running speed of your program by optimizing the non-parallelizable part. You may change the algorithm a little, and if possible, move some of the work into the parallelizable part.

Optimize serial components

If you optimize the serial part of a program, you can also use Amdahl's law to calculate the program's execution time after the optimization. If the non-parallelizable part is sped up by a factor O, then Amdahl's law looks like this:

T(O, N) = B / O + (1 - B / O) / N

Remember, the non-parallelizable part of the program now takes B / O time (with the total time T(1) normalized to 1), so the parallelizable part takes 1 - B / O time.
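Under the same T(1) = 1 assumption, the extended formula can be sketched in Java like this (names are my own):

```java
public class AmdahlOptimized {

    /**
     * T(O, N) = B / O + (1 - B / O) / N, with the total time T(1) normalized to 1.
     *
     * @param b non-parallelizable fraction of the original program
     * @param o optimization factor applied to the serial part
     * @param n parallelism factor
     * @return estimated total execution time T(O, N)
     */
    static double totalTime(double b, double o, int n) {
        double serial = b / o;            // the optimized serial part
        return serial + (1 - serial) / n; // plus the parallelized remainder
    }
}
```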

If B is 0.4, O is 2, and N is 5, the calculation looks like this:

T(2,5) = 0.4 / 2 + (1 - 0.4 / 2) / 5

= 0.2 + (1 - 0.4 / 2) / 5

= 0.2 + (1 - 0.2) / 5

= 0.2 + 0.8 / 5

= 0.2 + 0.16

= 0.36

Execution time vs. speedup

So far, we have only used Amdahl's law to calculate the execution time of a program or algorithm after optimization or parallelization. We can also use it to calculate the speedup, that is, how much faster the optimized or parallelized program or algorithm is than the original version.

If the execution time of the old version of the program or algorithm is T, then the speedup is:

Speedup = T / T(O, N)

To simplify the calculation, we often set T to 1, so that the speedup is measured relative to the original execution time. The formula then becomes:

Speedup = 1 / T(O,N)

Substituting Amdahl's law for T(O, N), we get the following formula:

Speedup = 1 / ( B / O + (1 - B / O) / N)
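As a quick Java sketch (again assuming T(1) = 1; the names are my own):

```java
public class AmdahlSpeedup {

    /** Speedup = 1 / (B / O + (1 - B / O) / N), with T(1) = 1. */
    static double speedup(double b, double o, int n) {
        double serial = b / o;
        return 1.0 / (serial + (1 - serial) / n);
    }
}
```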

If B = 0.4, O = 2, N = 5, the calculation becomes as follows:

Speedup = 1 / ( 0.4 / 2 + (1 - 0.4 / 2) / 5)

= 1 / ( 0.2 + (1 - 0.4 / 2) / 5)

= 1 / ( 0.2 + (1 - 0.2) / 5 )

= 1 / ( 0.2 + 0.8 / 5 )

= 1 / ( 0.2 + 0.16 )

= 1 / 0.36

= 2.77777 ...

The calculation above shows that if you optimize the non-parallelizable part by a factor of 2 and parallelize the parallelizable part with a factor of 5, the optimized version of the program or algorithm runs at most about 2.78 times faster than the original version.
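The same formula also shows why adding ever more CPUs flattens out: with B = 0.4 and no serial optimization (O = 1), the speedup can never exceed 1 / B = 2.5 no matter how large N gets. A small sketch:

```java
public class AmdahlLimit {
    public static void main(String[] args) {
        double b = 0.4; // non-parallelizable fraction, with T(1) = 1
        for (int n : new int[] {1, 2, 5, 10, 100, 1000}) {
            double speedup = 1.0 / (b + (1 - b) / n);
            System.out.printf("N = %4d -> speedup = %.2f%n", n, speedup);
        }
        // As N grows, the speedup approaches 1 / b = 2.5.
    }
}
```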

Measurement, not just calculation

Although Amdahl's law lets you calculate the theoretical speedup of parallelizing an algorithm, don't rely too much on such calculations. In real scenarios, many other factors come into play when you optimize or parallelize an algorithm.

Memory speed, CPU caches, disks, network cards, etc. may all be limiting factors. If the new version of an algorithm is parallelized but causes much more CPU cache waste, you will no longer get an N-times speedup from N CPUs. The same is true if your memory bus, disk, network card, or network connection is already under high load.

My suggestion is to use Amdahl's law as a guide when optimizing a program, not as a measure of the actual speedup the optimization brings. Remember, sometimes a highly serial algorithm beats a parallelized one, because the serial version has no coordination and management overhead (context switching), and a single CPU may work more consistently with the underlying hardware (CPU pipelines, CPU caches, etc.).


Origin blog.51cto.com/14993817/2553764