Parallel Computing: Supercomputers

Video source: National Tsing Hua University (Hsinchu), Parallel Computing and Parallel Programming course

 

Concept: a supercomputer is a computer with a much higher level of computing power than a conventional computer. The evaluation criterion is FLOPS, the number of floating-point operations it can perform per second.
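As a rough illustration of the FLOPS metric, the theoretical peak of a machine is often estimated as nodes x cores per node x clock rate x floating-point operations per cycle. The sketch below assumes that simple formula; the function name `peak_flops` and the machine parameters are hypothetical, chosen only to keep the arithmetic easy to follow.

```python
def peak_flops(nodes, cores_per_node, clock_hz, flops_per_cycle):
    """Theoretical peak, assuming every core retires flops_per_cycle operations each cycle."""
    return nodes * cores_per_node * clock_hz * flops_per_cycle


# Hypothetical machine (not a real system): 1000 nodes, 64 cores each,
# 2 GHz clock, 16 floating-point operations per core per cycle.
rate = peak_flops(nodes=1000, cores_per_node=64, clock_hz=2.0e9, flops_per_cycle=16)
print(f"{rate:.3e} FLOPS = {rate / 1e15:.3f} PFLOPS")
```

Measured performance is normally lower than this peak, since real programs rarely keep every core busy on every cycle.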

Why it is fast: (1) the latest hardware technologies, (2) optimized software libraries, (3) a custom-configured system, (4) a large investment of resources and money.

Communication mainly uses InfiniBand. Computing power target: reach 1 EFLOPS (10^18 floating-point operations per second), i.e. exascale computing.

CPU limit: exchanging data between the different parts of the system causes stalls and delays, which limits performance.

 

GPU architecture (performs vector computation):

 

Each processor has its own memory, and memory can only be shared between different processors through the global memory. However, there is a large speed gap between these two kinds of memory.

CPU architecture (performs scalar, element-by-element computation):

  

 

TPU (performs matrix computation)

 

Network technology for parallel computing: the performance of parallel computing is mostly bottlenecked by network communication. At the same time, synchronization is very difficult.

 

Network topology:

 

 

 

Factors to take into account: scale, performance, flexibility, cost.

Physical layer: the network devices (cable, switch, adapter). Bandwidth: bits transmitted per second. Latency: the time required to pack, unpack, and transmit a message. Scalability: determined by the interface adapters and the switches.
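To see how bandwidth and latency combine, a common first-order estimate is: transfer time = latency + message size / bandwidth. The sketch below assumes that simple model; the function name `transfer_time` and the link figures (1 microsecond, 100 Gbit/s) are hypothetical values picked for illustration, not measurements.

```python
def transfer_time(message_bytes, latency_s, bandwidth_bits_per_s):
    """Estimated delivery time of one message: fixed latency plus time on the wire."""
    return latency_s + (message_bytes * 8) / bandwidth_bits_per_s


# Hypothetical link: 1 microsecond latency, 100 Gbit/s bandwidth.
LATENCY = 1e-6
BANDWIDTH = 100e9

for size in (64, 1_000_000):
    t = transfer_time(size, LATENCY, BANDWIDTH)
    share = LATENCY / t  # fraction of the total time spent on latency
    print(f"{size:>9} bytes: {t * 1e6:8.3f} us, latency share {share:.0%}")
```

Under these assumptions small messages are dominated by latency and large messages by bandwidth, which is why both numbers matter when choosing a network.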

Software layer (network topology): the network diameter, i.e. the distance between the two farthest nodes (worst case); the stability of the system when a node is cut; and the fan-in and fan-out of each node.
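These topology metrics can be computed directly from a graph of the network. The following is a minimal sketch, assuming an undirected topology stored as an adjacency list; the 8-node ring and the helper names (`ring`, `hops_from`, `diameter`, `survives_node_cut`) are made up for this example.

```python
from collections import deque


def ring(n):
    """Adjacency list of an n-node ring (each node linked to its two neighbours)."""
    return {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}


def hops_from(graph, source):
    """Breadth-first search: hop count from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


def diameter(graph):
    """Largest shortest-path distance between any two nodes (the worst case)."""
    return max(max(hops_from(graph, s).values()) for s in graph)


def survives_node_cut(graph, removed):
    """True if the remaining nodes can still all reach each other."""
    rest = {n: [m for m in nbrs if m != removed]
            for n, nbrs in graph.items() if n != removed}
    start = next(iter(rest))
    return len(hops_from(rest, start)) == len(rest)


g = ring(8)
print("diameter:", diameter(g))
print("fan-out of node 0:", len(g[0]))
print("connected after removing node 3:", survives_node_cut(g, 3))
```

Because the links are undirected here, fan-in and fan-out are the same number; a directed topology would need separate counts.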

Application layer: the MPI communication model and protocol.

           

 

                        

 

Because a hypercube costs too much, the mesh is still the most widely used form at present.
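The cost difference can be made concrete with the standard closed-form counts for the two topologies. This sketch compares a d-dimensional hypercube with a k x k mesh (no wrap-around links) at the same node count; the function names and the chosen sizes are illustrative assumptions.

```python
def hypercube(d):
    """d-dimensional hypercube: 2**d nodes, each linked to d neighbours."""
    nodes = 2 ** d
    return {"nodes": nodes, "links": d * nodes // 2,
            "max_ports_per_node": d, "diameter": d}


def mesh_2d(k):
    """k x k mesh without wrap-around links; interior nodes have 4 neighbours."""
    return {"nodes": k * k, "links": 2 * k * (k - 1),
            "max_ports_per_node": 4, "diameter": 2 * (k - 1)}


# Same node counts for both topologies: 64 nodes, then 1024 nodes.
for d, k in ((6, 8), (10, 32)):
    print("hypercube:", hypercube(d))
    print("2D mesh:  ", mesh_2d(k))
```

The hypercube has the shorter diameter, but its link count and the number of ports per node keep growing with the machine size, while the mesh stays at four ports per node.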

 

 

Dragonfly topology: nodes are densely connected locally, like the dense veins of a dragonfly's wings, which gives good local performance, and each local group connects to the rest of the network through a single global link.
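A rough way to see the trade-off is to count links. The sketch below assumes a simplified dragonfly: every router is directly linked to every other router in its group, and each pair of groups shares exactly one global link. This is an illustrative model, not the full published design, and the names `dragonfly_links` and `full_mesh_links` are hypothetical.

```python
def dragonfly_links(groups, routers_per_group):
    """Link counts for the simplified model: all-to-all inside a group,
    one global link between each pair of groups."""
    local = groups * routers_per_group * (routers_per_group - 1) // 2
    global_links = groups * (groups - 1) // 2
    return local, global_links


def full_mesh_links(n):
    """Links needed to connect n routers directly to each other."""
    return n * (n - 1) // 2


GROUPS, ROUTERS = 9, 8  # hypothetical sizes
local, global_links = dragonfly_links(GROUPS, ROUTERS)
print("routers:", GROUPS * ROUTERS)
print("local links:", local, "global links:", global_links)
print("links for a full all-to-all network:", full_mesh_links(GROUPS * ROUTERS))
```

In this model any two routers are at most three hops apart (local, global, local), at a small fraction of the links an all-to-all network would need.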

InfiniBand: currently the best high-throughput, low-latency data transmission technology.

  

 

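Why InfiniBand is faster than Ethernet: the two protocols are very different. When Ethernet transmits data, it puts the data directly onto the channel. There is only one channel, so collisions occur easily, and after a collision the data is discarded, which wastes resources. As collisions increase, performance drops. InfiniBand first sets aside a space and then transmits the data, so no collisions occur and efficiency is higher. It can also bypass the CPU/OS and write the data to the remote side in one step, which saves a lot of intermediate copying.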
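The effect of collisions on a single shared channel can be illustrated with a toy model. The sketch below assumes time is divided into slots and each sender transmits in a slot independently with probability p; a slot is useful only when exactly one sender transmits. This is a deliberately simplified random-access model, not the actual Ethernet or InfiniBand protocol, and the value of `P_SEND` is hypothetical.

```python
def shared_channel_success(senders, p):
    """Probability that a slot carries exactly one frame (no collision, not idle)."""
    return senders * p * (1 - p) ** (senders - 1)


P_SEND = 0.2  # hypothetical per-slot transmit probability for each sender

for n in (1, 5, 10, 20, 40):
    useful = shared_channel_success(n, P_SEND)
    print(f"{n:>2} senders: useful slots {useful:.3f}")
```

In this model the useful share of the channel peaks at about 1/p senders and then falls as more senders collide. A scheme that reserves space before sending does not lose slots to collisions in this way.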

A comparison of the two is shown in the figure below.

                      

 

 

 

 

 

 

 

 

 


Origin www.cnblogs.com/fourmi/p/11922666.html