Parallel and Distributed Computing Chapter 9 Algorithm Design

9.1 Design process

9.1.1 PCAM design process

Four stages of the parallel algorithm design process:
• Partitioning breaks large tasks into small tasks, trying to expose concurrency;
• Communication determines the data exchange between tasks and checks whether the partitioning is reasonable;
• Agglomeration exploits the locality of tasks to reduce communication costs or improve performance, combining small tasks into larger ones when necessary;
• Mapping assigns each task to a processor for execution and monitors execution performance to drive the next round of iterative optimization.


9.1.2 Partitioning

Task partitioning
• Partitioning divides the original computational problem into small computational tasks, to fully expose the parallelism inherent in the algorithm;
• Perform data decomposition (domain decomposition) first, then computation decomposition (functional decomposition);
• The key point is to avoid duplicated data and duplicated computation, so that the data sets and the computation sets of different tasks are disjoint;
• The number of processors and the architecture of the target machine are ignored in the partitioning phase.

Domain decomposition

The object of division is data, which can be the algorithm's input data, intermediate data, or output data;

• (1) Divide the largest data set first, decomposing it into roughly equal small pieces (see the sketch below);
• (2) Consider the computation associated with the data: different stages of the computation may operate on different data structures, or may require decomposing the same data structure in different ways;
• (3) If a task requires data held by other tasks, inter-task communication is introduced;
• (4) The main principle of division is spatiotemporal locality; pay attention to the granularity of the spatiotemporal window.
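As a minimal illustration of domain decomposition (a sketch, not an example from the original notes; the function name and the contiguous-block rule are choices made here), the helper below splits an input array into roughly equal, disjoint blocks, one per task:

```python
def decompose_domain(data, num_tasks):
    """Split `data` into `num_tasks` roughly equal contiguous blocks.

    Each block is one task's subdomain; the blocks are disjoint and
    together cover the whole input, as the partitioning phase requires.
    """
    n = len(data)
    base, extra = divmod(n, num_tasks)  # first `extra` blocks get one more element
    blocks, start = [], 0
    for i in range(num_tasks):
        size = base + (1 if i < extra else 0)
        blocks.append(data[start:start + size])
        start += size
    return blocks

print(decompose_domain(list(range(10)), 3))  # [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9]]
```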

Functional decomposition

• The object of division is computation: the whole computational process is divided into different tasks;
• After division, if the data required by different tasks do not overlap, the decomposition is successful;

9.1.3 Communication

Task communication
• Communication is the data transmission between tasks required to achieve parallel computing;
• The tasks produced by partitioning generally cannot execute completely independently; they need to exchange data, which gives rise to communication;
• Communication is usually a flow of data from a "producer" to a "consumer"; production and consumption are both operations with a temporal ordering, so communication operations cannot be identified through domain decomposition, only through functional decomposition (see the sketch below);
• Tasks execute concurrently, but communication limits this concurrency;
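A minimal producer/consumer sketch (illustrative only; the tasks, the queue, and the None sentinel are conventions chosen here, not part of the original notes), using Python's multiprocessing queue as the inter-task channel:

```python
from multiprocessing import Process, Queue

def producer(q):
    # The producer task computes values and sends them downstream.
    for i in range(5):
        q.put(i * i)
    q.put(None)  # sentinel: no more data

def consumer(q):
    # The consumer blocks until data arrives, so the communication
    # imposes a temporal ordering between the two tasks.
    total = 0
    while (item := q.get()) is not None:
        total += item
    print("consumer received sum:", total)  # 30

if __name__ == "__main__":
    q = Queue()
    p, c = Process(target=producer, args=(q,)), Process(target=consumer, args=(q,))
    p.start(); c.start()
    p.join(); c.join()
```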

Communication patterns
• Local/global communication (spatial locality)
• Structured/unstructured communication (topology)
• Static/dynamic communication (identity and role)
• Synchronous/asynchronous communication (blocking or not)

Communication Criteria
• Do all tasks perform approximately the same amount of communication?
• Are global communications transformed into local communications as much as possible?
• Can each communication operation be executed in parallel?
• Are communication operations kept at an appropriate distance from synchronization points, to facilitate asynchronous parallel execution?

9.1.4 Agglomeration

Combining tasks
• Agglomeration is a step from the abstract toward the concrete, so that the combined tasks can execute effectively on some class of parallel machine;
• Merging small tasks reduces the number of tasks; if the number of tasks ends up exactly equal to the number of processors, the mapping step is completed as well;
• Increasing task granularity and repeating computations can reduce communication costs;
• Maintain flexibility for mapping and scaling, keep software engineering costs down, and pay attention to load balancing.

Granularity Control
• A large number of fine-grained tasks may not yield an effective parallel algorithm; it may instead increase communication and task-scheduling costs.
• Surface-volume effect: a task's communication requirement is proportional to the surface area of the data subdomain it operates on, while its computational requirement is proportional to the subdomain's volume (= subdomain surface area × depth of the computation);
• With the volume held constant, increasing the depth of the computation (also called repeated or redundant computation) shrinks the subdomain's surface area and therefore the communication volume;

T_total = T_computation + T_communication
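A small worked example of the surface-volume effect (a sketch under assumed settings: a 2D grid stencil where each point is updated once and each boundary point is exchanged once; none of this is from the original notes):

```python
def comp_to_comm_ratio(b):
    """For a b-by-b subdomain of a 2D grid computation:
    computation ~ volume b**2, communication ~ boundary surface 4*b."""
    computation = b * b    # one update per grid point ("volume")
    communication = 4 * b  # one halo exchange per boundary point ("surface")
    return computation / communication

for b in (4, 16, 64):
    print(f"block side {b:3d}: comp/comm ratio = {comp_to_comm_ratio(b):.1f}")
# Coarser blocks raise the ratio (1.0, 4.0, 16.0): granularity pays off.
```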

Combination Criteria
• Does increasing granularity through composition reduce communication costs?
• Are the benefits of redundant computation weighed against the communication costs it saves?
• Are flexibility and scalability maintained after combining?
• Does the number of combined tasks scale with the problem size?
• Do the combined tasks have similar computation and communication costs?
• Does the combination reduce the opportunities for parallel execution?

9.1.5 Mapping

Task mapping
• The goal of mapping: assign each task to a specific processor so as to reduce the execution time of the algorithm; mapping is essentially a trade-off, and finding an optimal mapping is an NP-complete problem;
• When the number of tasks exceeds the number of processors, domain decomposition introduces load-balancing problems and functional decomposition introduces task-scheduling problems;
• Place concurrent tasks on different processors; place frequently communicating tasks on the same processor.

Load balancing algorithms

  • Load balancing classification
    • Static: determined in advance;
    • Probabilistic: randomly determined;
    • Dynamic: dynamically determined;
  • Based on domain decomposition:
    • Recursive bisection: recursively pick one dimension of the multi-dimensional domain and bisect along it (see the sketch after this list);
    • Local algorithm: compare load with neighbors to decide whether to migrate tasks to a neighbor;
    • Probabilistic method: tasks are assigned to processors at random, using a random number generator satisfying the desired probability distribution;
    • Cyclic (round-robin) mapping: the equal-probability special case of the probabilistic method;
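A minimal sketch of recursive bisection (illustrative only: it assumes 2D points, a power-of-two number of parts, and median splits; the function name is invented here):

```python
def recursive_bisection(points, num_parts):
    """Recursively bisect 2D points into `num_parts` equally loaded groups.

    At each level, split along the dimension with the largest extent,
    at the median, so both halves carry half the load.
    Assumes `num_parts` is a power of two.
    """
    if num_parts == 1:
        return [points]
    # Pick the dimension (0 = x, 1 = y) with the widest spread.
    dim = max((0, 1), key=lambda d: max(p[d] for p in points) - min(p[d] for p in points))
    pts = sorted(points, key=lambda p: p[dim])
    mid = len(pts) // 2
    return (recursive_bisection(pts[:mid], num_parts // 2)
            + recursive_bisection(pts[mid:], num_parts // 2))

pts = [(x, y) for x in range(4) for y in range(4)]  # a 4x4 grid of points
for i, part in enumerate(recursive_bisection(pts, 4)):
    print(f"partition {i}: {part}")  # four groups of four points each
```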

9.2 Design methods

9.2.1 Partitioning techniques

• 1.1 Uniform division method
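The figure that illustrated uniform division is missing. As a hedged sketch based on the standard reading of this technique (n elements divided evenly into p groups of about n/p each, as in the first phase of PSRS-style parallel sorting; the code is an illustration, not the notes' own example):

```python
from concurrent.futures import ProcessPoolExecutor

def uniform_division_sort(data, p):
    """Uniform division: split n elements into p pieces of ~n/p each,
    then sort every piece in parallel."""
    size = (len(data) + p - 1) // p  # ceil(n / p)
    pieces = [data[i:i + size] for i in range(0, len(data), size)]
    with ProcessPoolExecutor(max_workers=p) as pool:
        return list(pool.map(sorted, pieces))

if __name__ == "__main__":
    print(uniform_division_sort([5, 3, 8, 1, 9, 2, 7, 4, 6], 3))
    # [[3, 5, 8], [1, 2, 9], [4, 6, 7]]
```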

• 1.2 Square root division
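The square-root division figure is also missing. Under the usual textbook reading (split a sequence of length n into segments of length about √n, giving about √n groups to process concurrently, as in Valiant's merging), a minimal sketch, again an illustration rather than the notes' own example:

```python
import math

def sqrt_division(data):
    """Square-root division: split n elements into segments of ~sqrt(n)."""
    step = max(1, math.isqrt(len(data)))
    return [data[i:i + step] for i in range(0, len(data), step)]

print(sqrt_division(list(range(16))))  # four segments of length 4
```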

• 1.3 Functional division

9.2.2 Divide and Conquer

The divide-and-conquer strategy is a general problem-solving method: decompose the original large problem into several subproblems with the same characteristics and solve them separately. If a subproblem is still too large, apply the divide-and-conquer strategy again, until the subproblems are easy to solve. The subproblems obtained are usually of the same type as the original problem, so they can be solved naturally with a recursive procedure.

Divide and Conquer vs. Division
• Different emphases: partitioning is oriented to the needs or the process of solving the problem; divide and conquer is oriented to the simplicity and regularity of the solution.
• Different difficulties: the difficulty of partitioning lies in choosing the dividing points; the difficulty of divide and conquer lies in synchronization and communication between subproblems and in merging the (recursive) results.
• Different subproblem sizes: partitioning follows the needs of the solution, so the pieces are not necessarily equal; divide and conquer generally splits the problem into k equal parts of 1/k each.

Steps of parallel divide-and-conquer method
• (1) Divide the input into several subproblems of equal size;
• (2) Solve these subproblems recursively and in parallel;
• (3) Merge the subproblem solutions in parallel until the solution of the original problem is obtained (a sketch follows this list).
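As an illustration of these three steps (a sketch, not the notes' own example; the run size, the process pool, and the pairwise merge order are choices made here), a parallel mergesort that divides the input into p runs, sorts them concurrently, and merges the sorted runs pairwise:

```python
from concurrent.futures import ProcessPoolExecutor

def merge(left, right):
    """Sequentially merge two sorted lists (the 'combine' step)."""
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            out.append(left[i]); i += 1
        else:
            out.append(right[j]); j += 1
    return out + left[i:] + right[j:]

def parallel_mergesort(data, p=4):
    size = (len(data) + p - 1) // p
    runs = [data[i:i + size] for i in range(0, len(data), size)]  # (1) divide
    with ProcessPoolExecutor(max_workers=p) as pool:
        runs = list(pool.map(sorted, runs))                       # (2) solve in parallel
        while len(runs) > 1:                                      # (3) merge pairwise
            pairs = [(runs[i], runs[i + 1]) for i in range(0, len(runs) - 1, 2)]
            merged = list(pool.map(merge, *zip(*pairs)))
            runs = merged + (runs[-1:] if len(runs) % 2 else [])
    return runs[0] if runs else []

if __name__ == "__main__":
    print(parallel_mergesort([9, 1, 8, 2, 7, 3, 6, 4, 5, 0], p=4))
```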

9.2.3 Balanced tree technique

The balanced tree method uses the input elements as leaf nodes of a balanced binary tree, with the internal nodes as processing nodes. It proceeds level by level, from leaves to root or from root to leaves, and all nodes at the same depth of the tree are computed in parallel. Balanced binary trees generalize to balanced multiway trees.
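A sequential sketch of a balanced-tree reduction (illustrative; summation as the node operation is an assumption here): combining adjacent pairs level by level mirrors the leaf-to-root traversal, and every pair at a given depth is independent, so a parallel machine could evaluate each level in one step.

```python
def tree_reduce(values, op=lambda a, b: a + b):
    """Balanced-tree reduction: leaves are the inputs; at each level of the
    tree, adjacent pairs are combined. All pairs at one level are
    independent, giving O(log n) parallel steps instead of n - 1."""
    level = list(values)
    while len(level) > 1:
        # Each combination below could run on its own processor.
        next_level = [op(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:                # odd element is carried up unchanged
            next_level.append(level[-1])
        level = next_level
    return level[0]

print(tree_reduce([1, 2, 3, 4, 5, 6, 7, 8]))  # 36, in ceil(log2(8)) = 3 levels
```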

9.2.4 Doubling technique

The doubling technique is also called pointer jumping. At each (recursive) step, the distance between the data items being processed doubles, so after k steps, computations over all data within distance 2^k are complete. It is particularly well suited to data structures such as linked lists and directed (rooted) trees.
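A sketch of pointer jumping applied to list ranking (illustrative; the array representation and the termination test are choices made here): each round, every node adds its successor's rank to its own and then jumps its pointer over that successor, doubling the distance covered.

```python
def list_ranking(next_node):
    """Compute each node's distance to the tail of a linked list.

    next_node[i] is the successor of node i; the tail points to itself.
    All updates within a round are independent, so a PRAM could perform
    them in parallel; ceil(log2(n)) rounds suffice.
    """
    n = len(next_node)
    nxt = list(next_node)
    rank = [0 if nxt[i] == i else 1 for i in range(n)]
    while True:
        # Read old values first, mimicking a synchronous parallel step.
        new_rank = [rank[i] + rank[nxt[i]] for i in range(n)]
        new_nxt = [nxt[nxt[i]] for i in range(n)]
        if new_nxt == nxt:  # no pointer moved: every node reaches the tail
            return new_rank
        rank, nxt = new_rank, new_nxt

# List 0 -> 1 -> 2 -> 3 -> 4 (node 4 is the tail).
print(list_ranking([1, 2, 3, 4, 4]))  # [4, 3, 2, 1, 0]
```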


9.2.5 Pipelining technique

Pipelining divides a computing task T into n consecutive subtasks T_1, ..., T_n, exploiting temporal overlap and spatial parallelism: the output of T_k serves as the input of T_{k+1}, and as soon as T_k finishes an item, T_{k+1} can start on it immediately, with all stages working at the same rate.
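A minimal two-stage pipeline sketch (the stage functions, queues, and None sentinel are invented for illustration), using threads and queues so stage T_2 can process item i while T_1 is already working on item i+1:

```python
from queue import Queue
from threading import Thread

def stage(fn, inbox, outbox):
    """One pipeline stage: consume from `inbox`, apply `fn`, pass results
    to `outbox`. A None sentinel shuts the stage down and is forwarded."""
    while (item := inbox.get()) is not None:
        outbox.put(fn(item))
    outbox.put(None)

if __name__ == "__main__":
    q1, q2, q3 = Queue(), Queue(), Queue()
    Thread(target=stage, args=(lambda x: x * x, q1, q2)).start()  # stage T_1
    Thread(target=stage, args=(lambda x: x + 1, q2, q3)).start()  # stage T_2
    for x in range(5):
        q1.put(x)
    q1.put(None)
    while (result := q3.get()) is not None:
        print(result)  # 1, 2, 5, 10, 17  (x*x + 1 for x = 0..4)
```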


9.2.6 Symmetry breaking technique

• Symmetry breaking deliberately breaks the symmetry among otherwise identical tasks or elements; it is often used in graph-theoretic problems and randomized algorithms.
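A toy randomized sketch of the idea (my own illustration, not from the notes): several identical tasks use independent coin flips to break the tie among themselves, electing a single leader in an expected O(log n) rounds.

```python
import random

def break_symmetry(num_tasks, seed=None):
    """Identical tasks repeatedly flip coins; a task survives a round only
    if its flip comes up heads. If every candidate drops out, the round
    is replayed, so exactly one task eventually remains."""
    rng = random.Random(seed)
    candidates = list(range(num_tasks))
    rounds = 0
    while len(candidates) > 1:
        rounds += 1
        winners = [t for t in candidates if rng.random() < 0.5]
        if winners:               # at least one heads: survivors continue
            candidates = winners  # otherwise replay with the same set
    return candidates[0], rounds

leader, rounds = break_symmetry(8, seed=42)
print(f"leader task {leader} elected after {rounds} rounds")
```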

Origin: blog.csdn.net/weixin_61197809/article/details/134529792