Introduction to Parallel and Distributed Computing (6) Introduction to MPI

Section 6 Introduction to MPI

Gripe: Teacher Luo's teaching logic is really confusing. Reading his PPT takes twice the time and yields half the knowledge of the textbook (the textbook's total volume is very large, but its organization makes far better use of the reader's effort).

6.1 Why MPI?

(Because of homework)

6.1.1 MPI vs. OpenMP

The goals of MPI are high performance, large scale, and portability.

The OpenMP introduced earlier is mainly a shared-memory parallel system, suitable for developing parallel programs on a multi-core CPU. MPI, in contrast, is a tool for networking and coordinating multiple hosts (of course, it can also be used, relatively inefficiently, for multi-core computation on a single host).

Although writing MPI code is comparatively troublesome and its parallel efficiency is comparatively low because of the cost of coordinating inter-process communication, it can coordinate parallel computation across hosts, so its parallel scale is extremely extensible: from a PC all the way up to a supercomputer.

All in all, OpenMP is designed for multi-threaded computing on a single host (it works only on a single host), while MPI is designed for multi-host collaboration.

6.1.2 Why you should understand MPI

MPI is the most important message-passing library standard. Although some more modern libraries may be more convenient, learning MPI still gives you a better understanding of collective communication (the concept of group communication).

6.2 MPI basic explanation

Let's start with a Hello World:

#include <mpi.h>
#include <stdio.h>
int main(int argc, char *argv[])
{
    int npes, myrank;
    MPI_Init(&argc, &argv);                 // initialize MPI
    MPI_Comm_size(MPI_COMM_WORLD, &npes);   // npes gets the total number of processes in the communicator
    MPI_Comm_rank(MPI_COMM_WORLD, &myrank); // myrank gets this process's rank
    printf("From process %d out of %d, Hello World!\n", myrank, npes);
    MPI_Finalize();                         // shut down MPI
    return 0;
}

Unlike OpenMP, MPI does not automatically decompose the work in a code block. It is a tool with which the programmer directs the different processes after splitting the task manually. Every process executes the same code; it is the process rank that determines which concrete piece of work each process performs.

  • For example, when using the sieve method to find primes, an array of length 1000 is divided evenly into 10 segments, and the k-th process handles the k-th segment, i.e., the indices from (k-1)*100 + 1 to k*100 (a sketch of this decomposition follows below).
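A minimal sketch of this decomposition (my illustration, not from the original notes), assuming the program is launched with exactly 10 processes:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int npes, myrank;
    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &npes);   // assumed to be 10 here
    MPI_Comm_rank(MPI_COMM_WORLD, &myrank); // ranks 0..9
    int k = myrank + 1;            // 1-based process index, as in the formula
    int first = (k - 1) * 100 + 1; // first index owned by process k
    int last = k * 100;            // last index owned by process k
    printf("Process %d sieves indices %d..%d\n", myrank, first, last);
    MPI_Finalize();
    return 0;
}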

6.2.1 Header file

#include <mpi.h>

6.2.2 Basic library functions

int MPI_Init(int *argc, char ***argv);       // initialize the MPI environment; must precede all other MPI calls
int MPI_Finalize();                          // terminate the MPI environment (performs various clean-up tasks)
int MPI_Comm_size(MPI_Comm comm, int *size); // store in *size the total number of processes in the communicator
int MPI_Comm_rank(MPI_Comm comm, int *rank); // store in *rank the rank of the calling process

(The so-called communication domain, or communicator, is a set of processes that can send messages to one another)

6.2.3 Compile

mpicc -o name example.c
mpic++ -o name example.cpp

Compiles the code in example.c (or example.cpp) and writes the resulting executable to the file name

6.2.4 Execution

Specify the number of processes

mpirun -np number name

Launches number processes, all running name

Specify the hosts

Consider using the --host and --hostfile options when processes need to be placed on specific hosts, as in the sketch below
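A hedged example, assuming Open MPI's mpirun (flag spellings vary between MPI implementations); node1, node2, and hosts.txt are placeholder host names and a placeholder host-list file:

mpirun --host node1,node2 -np 4 name
mpirun --hostfile hosts.txt -np 8 name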

Multiple Program Multiple Data (MPMD)

mpirun -np 1 master : -np 7 slave

In the example above, the command launches eight processes in total for two completely different executables; all of these processes live in the same communication domain (that is, they can communicate with each other through MPI)

6.2.5 Send & Receive

Non-buffered blocking

For message transmission, the simplest (and safest) mode of sending and receiving is that the sender and receiver only proceed after confirming each other's status.

The sender issues a send request, and the receiver replies with a message granting that request. Only once the two sides have made contact is the data actually transmitted. In other words, whichever side, sender or receiver, starts first must wait for the other.

The term "non-buffered blocking" means:

  • After the sender starts, it must wait for the receiver's permission to send; only after obtaining permission and completing the send does the call return. The process stalls at the send from the moment it starts until it returns.
  • After the receiver starts, it must wait for the sender's send request; only after completing the receive does the call return. The process stalls at the receive from the moment it starts until it returns.

Disadvantage: a lot of idle-waiting overhead is incurred (unless the two sides happen to start at exactly the same time)

Buffered blocking

There is another send/receive mode: the sender copies its data directly into a buffer of the receiving process, and the receiver then copies the data out of that buffer.

Buffering essentially trades idle-waiting overhead for buffer-copying overhead.

  • When the receiving end has no dedicated communication hardware, the sender must interrupt the receiving end's process in order to copy the data into the receiving process's buffer
  • The receiving end must still wait until the sender has actually sent something before it can continue

Non-blocking

In non-blocking mode, regardless of whether it is safe to do so, the sender continues executing subsequent statements as soon as its request is posted, and the receiver likewise continues after posting its permission. The send and receive are actually carried out once the request and the permission match.

  • For the unbuffered non-blocking mode, the potential dangers are two-fold: subsequent code on the sending side may modify the data before it is actually sent, so that the transmitted data is wrong; subsequent code on the receiving side may access the data before it has actually been received.

Using non-blocking mode therefore requires the programmer to understand the program's behavior thoroughly.

Message Passing Libraries generally provide both blocking and non-blocking primitives

MPI's send mode

int MPI_Send(
    void *message,         // starting address of the data to send
    int count,             // number of data items to send
    MPI_Datatype datatype, // type of the data to send
    int dest,              // rank of the receiving process
    int tag,               // message tag (can indicate the purpose)
    MPI_Comm comm          // communicator
);

MPI_Send follows the blocking mode: it returns only once the send buffer can be safely reused (for example, when the data has been delivered to the receive buffer or to an intermediate buffer). Generally speaking, the system copies the message into a system buffer.

Some other send variants worth knowing (understand what they do; look into the details when you actually need them):

  • MPI_Send: the standard blocking send described above
  • MPI_Bsend: buffered send; returns immediately
  • MPI_Ssend: synchronous send; returns only after matching with a receiver
  • MPI_Rsend: ready send; use it only when the matching receive is known to be already posted; in this special case the sender can give the system information that reduces some overhead
  • MPI_Isend: non-blocking send; returns immediately, but the send buffer cannot be reused right away
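A minimal non-blocking sketch (my illustration, not from the original notes): rank 0 posts an MPI_Isend to rank 1, which posts an MPI_Irecv; both call MPI_Wait before touching their buffers again, which is exactly the danger described earlier. Run with at least two processes.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int myrank, data = 42;
    MPI_Request req;
    MPI_Status status;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
    if (myrank == 0) {
        // post the send and return immediately; 'data' must not be
        // modified until MPI_Wait reports completion
        MPI_Isend(&data, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, &req);
        // ... other computation could overlap with the transfer here ...
        MPI_Wait(&req, &status); // after this, the send buffer may be reused
    } else if (myrank == 1) {
        int recv_buf;
        // post the receive and return immediately; recv_buf must not be
        // read until MPI_Wait reports completion
        MPI_Irecv(&recv_buf, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &req);
        MPI_Wait(&req, &status);
        printf("Process 1 received %d\n", recv_buf);
    }
    MPI_Finalize();
    return 0;
}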

MPI's receive mode

int MPI_Recv(
    void *message,         // starting address where the data will be stored
    int count,             // maximum number of items that can be received
    MPI_Datatype datatype, // type of the data to receive
    int source,            // rank of the sending process
    int tag,               // message tag (can indicate the purpose)
    MPI_Comm comm,         // communicator
    MPI_Status *status     // a record of type MPI_Status; after Recv completes, status can be queried for information about the call
);

In particular, the following fields can be read from the status:

  • status->MPI_SOURCE: the rank of the sending process
  • status->MPI_TAG: the tag of the received message
  • status->MPI_ERROR: the error code

Why is the status needed?

  • The source in MPI_Recv is often set to the constant MPI_ANY_SOURCE to accept a message from any sender
  • The tag in MPI_Recv is often set to the constant MPI_ANY_TAG to accept a message with any tag

When the source and tag are left unrestricted in this way, the status record must be examined to determine the actual sender and tag of a particular message.

In addition, there is a helper function

int MPI_Get_count(MPI_Status *status, MPI_Datatype datatype, int *count);

This helper returns the actual number of data items received (MPI_Recv only specifies the maximum capacity of the receive buffer, so determining how many items actually arrived is a separate problem, which this function solves).
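A small sketch tying these pieces together (my illustration, not from the original notes): every non-zero process sends a few integers to process 0, which receives them with MPI_ANY_SOURCE and MPI_ANY_TAG and then uses the status record and MPI_Get_count to recover the sender, the tag, and the actual item count.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int npes, myrank;
    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &npes);
    MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
    if (myrank == 0) {
        int buf[100], count;
        MPI_Status status;
        for (int i = 1; i < npes; i++) {
            // accept a message of unknown origin, tag, and length
            MPI_Recv(buf, 100, MPI_INT, MPI_ANY_SOURCE, MPI_ANY_TAG,
                     MPI_COMM_WORLD, &status);
            MPI_Get_count(&status, MPI_INT, &count); // actual number received
            printf("Got %d ints from process %d (tag %d)\n",
                   count, status.MPI_SOURCE, status.MPI_TAG);
        }
    } else {
        int data[3] = {10, 20, 30};
        // each non-zero process sends 3 ints, using its own rank as the tag
        MPI_Send(data, 3, MPI_INT, 0, myrank, MPI_COMM_WORLD);
    }
    MPI_Finalize();
    return 0;
}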


Source: blog.csdn.net/Kaiser_syndrom/article/details/105361466