Knowledge Point 29: Parallel Computing

Foreword

Time complexity is the standard measure of an algorithm's efficiency, but time complexity cannot be directly equated with performance. In real software development, even when the time complexity cannot be reduced any further, the code's efficiency can still be improved by other means. After all, in actual projects, even a seemingly small performance gain of 10% or 20% is very impressive.

The whole point of optimizing an algorithm is to make the code execute more efficiently. So when the algorithm itself can no longer be improved, how do we raise efficiency further? For this problem there is a simple but very useful optimization method: parallel computing.

Parallel Computing

Parallel computing refers to using multiple computing resources simultaneously to solve a computational problem, and it is an effective means of improving the computing speed and processing power of a computer system. Its basic idea is to have multiple processors cooperate to solve the same problem: the problem is decomposed into several parts, and each part is computed in parallel by its own processor. Simply put, parallel computing is computing performed on a parallel computer. Parallel computing problems generally fall into three categories: (1) compute-intensive, such as large-scale scientific and engineering computation; (2) data-intensive, such as digital libraries, data warehouses, data mining, and visualization; (3) network-intensive, such as remote diagnostics and collaborative computing.

Problems that benefit from parallel computing typically show the following characteristics: (1) the computation can be decomposed into subtasks that can be solved at the same time; (2) at any given moment, multiple subtasks can be executed simultaneously by different execution units; (3) the time to solve the problem with multiple computing resources is shorter than with a single computing resource.

Below, we work through a few examples of how to transform an algorithm with the idea of parallel computing.

Parallel Sorting

Suppose we need to sort 8GB of data, and the machine's memory can hold all of it at once. For sorting, the three most commonly used O(nlogn) algorithms are merge sort, quicksort, and heap sort. In theory, this sorting problem is already hard to optimize at the algorithm level. However, using the idea of parallel processing, we can easily speed up the sorting of this 8GB of data many times over. There are two concrete ways to do it.

  1. The first is parallel processing based on merge sort. We divide the 8GB of data into 16 small datasets, each containing 500MB of data. We use 16 threads to sort these 16 datasets of 500MB in parallel. After the 16 small datasets have been sorted, we merge the 16 sorted datasets together.
  2. The second is parallel processing based on quicksort. We first scan the data once to find the range it falls in, then divide that range, from small to large, into 16 small intervals, and partition the 8GB of data into the corresponding intervals by value. Then we start 16 threads to sort these 16 partitions in parallel. Once all 16 threads have finished, the data is fully sorted.

Comparing the two approaches, both apply the idea of divide and conquer: partition the data, then process the partitions in parallel. The difference is that the first approach partitions the data arbitrarily and merges the results afterward, while the second partitions the data into intervals by value, so the sorted partitions need no further merging, only concatenation. This is exactly the same as the difference between merge sort and quicksort.
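Here is a minimal sketch of both approaches in Python. The chunk count `k = 16` mirrors the example above, and `ProcessPoolExecutor` is used so the parallel sorts run on separate cores rather than being serialized by CPython's GIL; both choices are illustrative, not part of the original text.

```python
import heapq
from concurrent.futures import ProcessPoolExecutor

def parallel_merge_sort(data, k=16):
    """Approach 1: split arbitrarily, sort the pieces in parallel, then merge."""
    size = (len(data) + k - 1) // k
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    with ProcessPoolExecutor(max_workers=k) as pool:
        sorted_chunks = list(pool.map(sorted, chunks))
    return list(heapq.merge(*sorted_chunks))  # k-way merge of the sorted runs

def parallel_bucket_sort(data, k=16):
    """Approach 2: partition by value into k intervals, sort each in parallel;
    concatenating the sorted buckets needs no merge step."""
    lo, hi = min(data), max(data)
    width = (hi - lo) / k or 1  # avoid a zero width when all values are equal
    buckets = [[] for _ in range(k)]
    for x in data:
        buckets[min(int((x - lo) / width), k - 1)].append(x)
    with ProcessPoolExecutor(max_workers=k) as pool:
        sorted_buckets = pool.map(sorted, buckets)
    return [x for bucket in sorted_buckets for x in bucket]
```

Note that on platforms that spawn worker processes (Windows, recent macOS) these calls must sit under an `if __name__ == "__main__":` guard, and the second approach only divides the work evenly when the data is roughly uniform over its range.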

One more point to note: if the data to be sorted is not 8GB but 1TB, then the crux of the problem is no longer the efficiency of the algorithm but the efficiency of reading the data. 1TB of data certainly cannot be read into memory at once, so sorting it involves frequent reading and writing of data on disk. How to reduce disk IO operations, that is, how to minimize the total amount of data read from and written to disk, becomes the focus of optimization.
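To make that shift concrete, here is a minimal external merge sort sketch for a line-oriented text file (the file paths, the `chunk_lines` parameter, and the line-based record format are all illustrative assumptions, not from the original text). Each input line is read and written only a small constant number of times, and it is exactly that IO volume, not the in-memory comparisons, that dominates the cost.

```python
import heapq
import tempfile
from itertools import islice

def external_sort(input_path, output_path, chunk_lines=1_000_000):
    runs = []
    # Pass 1: read one memory-sized chunk at a time, sort it, spill it to disk.
    with open(input_path) as src:
        while True:
            chunk = list(islice(src, chunk_lines))
            if not chunk:
                break
            chunk.sort()
            run = tempfile.TemporaryFile("w+")  # one sorted run on disk
            run.writelines(chunk)
            run.seek(0)
            runs.append(run)
    # Pass 2: stream a k-way merge over the sorted runs; heapq.merge keeps
    # only one line per run in memory, so RAM stays bounded and disk IO
    # dominates the running time.
    with open(output_path, "w") as dst:
        dst.writelines(heapq.merge(*runs))
    for run in runs:
        run.close()
```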

Parallel Lookup

We know that the hash table is a data structure very well suited to fast lookups. If we build an index over dynamic data, with new data being added all the time, the hash table's load factor keeps growing. To keep the hash table's performance from degrading, we have to expand it dynamically. Dynamically expanding a very large hash table is time-consuming on one hand and memory-hungry on the other. For example, suppose we expand a 2GB hash table to 1.5 times its size, that is, 3GB. At that point the hash table actually stores less than 2GB of data, so memory utilization is only about 60%, and at least 1GB of memory sits idle.

In fact, we can apply the idea of partitioning plus parallel computing: first divide the data randomly into k parts (for example, k = 16), so that each part holds only 1/k of the original data, and then build a small hash table over each of the k parts. This makes the hash tables much cheaper to maintain. When the load factor of one small hash table grows too large, we can expand that small table alone, while the other hash tables remain untouched. Continuing the earlier example, suppose we have 2GB of data spread across 16 hash tables, so each hash table holds about 125MB. When one of them needs to expand, we only need an extra 125MB * 0.5 = 62.5MB of memory (again assuming expansion to 1.5 times the original size). So whether measured by expansion cost or by memory utilization, this scheme of many small hash tables is more efficient than one large hash table. When we want to look up some data, we use 16 threads to search the 16 hash tables in parallel. This lookup performance is no worse than searching a single large hash table, and may even be better. In addition, when new data is inserted, we can choose to put it into the hash table with the smallest load factor, which also helps reduce hash collisions.
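A minimal sketch of this scheme, with Python dicts standing in for the small hash tables (the class name and methods are illustrative). One caveat: in CPython the GIL serializes the probing threads, so the parallel lookup only yields real speedup with processes or a GIL-free runtime; the sketch shows the structure rather than a benchmark.

```python
from concurrent.futures import ThreadPoolExecutor

class ShardedHashTable:
    def __init__(self, k=16):
        self.shards = [dict() for _ in range(k)]  # k small tables, each resized independently
        self.pool = ThreadPoolExecutor(max_workers=k)

    def put(self, key, value):
        # Route new data to the least-loaded shard; this keeps load factors
        # even and, as noted above, helps reduce hash collisions.
        min(self.shards, key=len)[key] = value

    def get(self, key, default=None):
        # Because inserts are routed by load, any shard may hold the key,
        # so probe all k shards concurrently and take the first hit.
        for hit in self.pool.map(lambda shard: shard.get(key), self.shards):
            if hit is not None:
                return hit
        return default
```

The design trade-off is worth noting: routing inserts by `hash(key) % k` instead would make each lookup touch exactly one shard, but would give up the free load balancing that the least-loaded-shard policy provides.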

Parallel String Matching

Finding a keyword in a text is a function we can implement with a string matching algorithm such as BF, RK, BM, or KMP. When searching a text that is not very long, any of these algorithms runs very efficiently. But if the text is extremely large, the processing time can become very long. Is there any way to speed up the matching?

Again we can apply partitioning plus parallel computing: split the large text into k small texts. Assuming k is 16, we start 16 threads and search for the keyword in the 16 small texts in parallel, so the performance of the whole search improves roughly 16-fold. A 16-fold improvement may not sound like much in theory, but in real software development it is clearly a very significant optimization. There is one detail to handle, though: if an occurrence of the keyword in the large text happens to straddle a split point, it gets cut in two and lands in two different small texts, so even though the large text contains the keyword, it is found in none of the 16 small texts. Fortunately this case is not hard to fix; we only need some special treatment at the boundaries. Assuming the keyword has length m, we take the last m characters of each small text and the first m characters of the following small text and concatenate them into a string of length 2m. We then search for the keyword once more in each of these length-2m strings, which plugs the hole.
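Here is a minimal sketch in Python, assuming each chunk is at least m characters long; the helper name `_chunk_contains` is illustrative, and Python's `in` operator stands in for whichever matching algorithm (BF, RK, BM, KMP) is actually used.

```python
from concurrent.futures import ProcessPoolExecutor
from functools import partial

def _chunk_contains(chunk, keyword):
    # Stand-in for any exact string matching algorithm (BF / RK / BM / KMP).
    return keyword in chunk

def parallel_contains(text, keyword, k=16):
    m, n = len(keyword), len(text)
    size = (n + k - 1) // k
    chunks = [text[i:i + size] for i in range(0, n, size)]
    # Search the k small texts in parallel.
    with ProcessPoolExecutor(max_workers=k) as pool:
        if any(pool.map(partial(_chunk_contains, keyword=keyword), chunks)):
            return True
    # Patch the seams: an occurrence split across a boundary can only show up
    # in the 2m-character window formed by the last m characters of one chunk
    # plus the first m characters of the next.
    return any(keyword in chunks[i][-m:] + chunks[i + 1][:m]
               for i in range(len(chunks) - 1))
```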

Parallel Search

There are many search algorithms on graphs, such as depth-first search, breadth-first search, Dijkstra's shortest path algorithm, and the A* heuristic search algorithm. Breadth-first search, in particular, can be transformed into a parallel algorithm.

Breadth-first search (BFS) is a strategy that searches layer by layer. Based on the current layer of vertices, we can start multiple threads to search the next layer of vertices in parallel. In terms of code, the original BFS implementation uses a single queue to record vertices that have been visited but not yet expanded; to cope with multithreaded concurrency, the parallelized version uses two queues to carry out the vertex-expansion work. Call the two queues A and B. Multiple threads process the vertices in queue A in parallel, storing the newly expanded vertices in queue B. Once every vertex in queue A has been expanded and A is empty, we expand the vertices in queue B in parallel and store the resulting vertices back into queue A. Cycling between the two queues in this way yields a parallel breadth-first search.
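A minimal sketch of the two-queue scheme, assuming the graph is an adjacency dict mapping each vertex to its neighbor list. The parallel step only fetches neighbor lists; deduplication against the visited set is done on the coordinating thread to avoid races, and as before CPython's GIL means threads illustrate the structure more than they demonstrate the speedup.

```python
from concurrent.futures import ThreadPoolExecutor

def parallel_bfs(graph, start):
    visited = {start}
    current = [start]  # queue A: the layer currently being expanded
    order = [start]
    with ThreadPoolExecutor() as pool:
        while current:
            # Expand every vertex of the current layer in parallel.
            neighbor_lists = pool.map(lambda v: graph.get(v, []), current)
            nxt = []       # queue B: collects the next layer
            for neighbors in neighbor_lists:
                for w in neighbors:
                    if w not in visited:  # dedup serially to avoid data races
                        visited.add(w)
                        nxt.append(w)
            order.extend(nxt)
            current = nxt  # the two queues swap roles, layer after layer
    return order

# For example, parallel_bfs({"a": ["b", "c"], "b": ["d"], "c": [], "d": []}, "a")
# visits the vertices layer by layer: ["a", "b", "c", "d"].
```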

Summary

Parallel computing is an engineering-level implementation idea. Although it has little to do with the algorithms themselves, in real software development it can quietly and markedly improve a program's efficiency, making it a very useful performance optimization tool. In particular, once the size of the data to be processed reaches a certain point and we can no longer improve efficiency by optimizing the algorithm, we have to work on the implementation instead, using more hardware resources to speed up execution. That is why the idea of parallel processing is widely used in ultra-large-scale data processing; MapReduce, for instance, is essentially a parallel computing framework.

References

"Data Structures and Algorithms beauty"
King of contention
before Google engineer

Parallel Computing and Parallel Algorithms: https://blog.csdn.net/lulu950817/article/details/80686126
The Zhihu topic on parallel computing: https://www.zhihu.com/topic/19582194/top-answers
