MPI Parallel Optimization of the Gaussian Naive Bayes Algorithm - Inferring Gender from Height, Weight, and Lung Capacity (Machine Learning)

Table of contents

MPI Parallel Design

Using MPI_Scatterv to scatter chunks of data of different sizes

Introducing the MPI_Scatterv function

MPI_Scatterv parameter selection

Generate Sendcounts array

Generate Displs array

MPI functions cooperate to complete parallel computing

MPI parallel program

Result


This project consists of the following four parts; this section is Part 3:

1. Inferring gender from height and weight

2. Inferring gender from height, weight, and vital capacity

3. MPI optimization

4. OpenMP optimization

MPI Parallel Design

The data file is read serially; the remaining for-loop work is parallelized with MPI. The specific steps are as follows (a standalone sketch of steps 7-9 appears after this list):
1. Process 0 reads the data file and fills the dataSet array.
2. The MPI_Scatterv() function splits the dataSet array into data blocks of different lengths and scatters one block to each process.
3. Each process computes partial sums, and the results are gathered on process 0 via the MPI_Gather() function and accumulated to obtain the means (μ).
4. Process 0 broadcasts the means to all processes through the MPI_Bcast() function.
5. Each process computes its part of the standard deviation, the partial sum Sigma = Σ(xᵢ − μ)².
6. Process 0 accumulates the gathered partial sums and finishes the remaining computation, obtaining the standard deviation SD = sqrt(Sigma / sexNum).
7. Build the Gaussian probability density function f(x) = [1 / sqrt(2·π·σ²)] · exp(−(x−μ)² / (2σ²)).
8. Obtain P(height=170|female), P(height=170|male), P(weight=60|female), P(weight=60|male), and the other per-feature probabilities.
9. Multiply the corresponding probabilities for each class, and take the class with the maximum value as the prediction result.
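
To make steps 7-9 concrete, here is a minimal standalone sketch of the decision rule. The means and standard deviations below are made-up placeholder values; the actual program computes them from the data via MPI:

#include <stdio.h>
#include <math.h>

#define PI 3.1415926535898

//Gaussian probability density f(x) = [1/sqrt(2*PI*sigma^2)] * exp(-(x-mu)^2/(2*sigma^2))
double gaussianPdf(double x, double mu, double sigma)
{
	double var = sigma * sigma;
	return (1.0 / sqrt(2.0 * PI * var)) * exp(-(x - mu) * (x - mu) / (2.0 * var));
}

int main(void)
{
	//made-up sample statistics, for illustration only
	double maleP   = 0.5 * gaussianPdf(170, 175.0, 6.0)   //P(height=170|male)
	                     * gaussianPdf(60,  70.0,  8.0);  //P(weight=60|male)
	double femaleP = 0.5 * gaussianPdf(170, 162.0, 5.5)   //P(height=170|female)
	                     * gaussianPdf(60,  55.0,  7.0);  //P(weight=60|female)
	printf("prediction: %s\n", maleP > femaleP ? "male" : "female");
	return 0;
}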


Using MPI_Scatterv to scatter chunks of data of different sizes

Introducing the MPI_Scatterv function

This experiment uses MPI_Scatterv to scatter the data blocks to the processes. This not only avoids the large memory overhead of broadcasting the entire data set, it also allows data blocks of different sizes to be sent and received. The signature of MPI_Scatterv (see the MPICH documentation, mpich_MPI_Scatterv) is:

int MPI_Scatterv(const void *sendbuf, const int *sendcounts, const int *displs,

                 MPI_Datatype sendtype, void *recvbuf, int recvcount,

                 MPI_Datatype recvtype, int root, MPI_Comm comm)

The function's parameters:

sendbuf: pointer to the array being scattered (significant only at the root process).

sendcounts: integer array; element i is the number of elements to send to process i.

displs: integer array; element i is the offset into sendbuf at which process i's data starts.

sendtype: MPI datatype of the data being sent.

recvbuf: receive buffer that holds this process's block of the data.

recvcount: integer, matching this process's entry in sendcounts; the number of elements received.

recvtype: MPI datatype of the data being received.

root: integer, the rank of the process that scatters the data, usually 0.

comm: communicator, usually MPI_COMM_WORLD.
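
For orientation, here is a minimal self-contained MPI_Scatterv example, independent of this project; it assumes exactly three processes and hard-codes illustrative counts and offsets (compile with mpicc, run with mpiexec -n 3):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
	MPI_Init(&argc, &argv);
	int my_rank, comm_sz;
	MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
	MPI_Comm_size(MPI_COMM_WORLD, &comm_sz);
	if(comm_sz != 3){ MPI_Abort(MPI_COMM_WORLD, 1); } //sketch assumes 3 processes

	float data[6] = {1, 2, 3, 4, 5, 6};   //only significant on the root
	int sendcounts[3] = {1, 2, 3};        //uneven block sizes
	int displs[3]     = {0, 1, 3};        //offsets into data
	float recv[3];                        //large enough for the biggest block

	MPI_Scatterv(data, sendcounts, displs, MPI_FLOAT,
	             recv, sendcounts[my_rank], MPI_FLOAT, 0, MPI_COMM_WORLD);
	printf("rank %d received %d element(s), first = %.0f\n",
	       my_rank, sendcounts[my_rank], recv[0]);

	MPI_Finalize();
	return 0;
}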

MPI_Scatterv parameter selection

Generate Sendcounts array

The Sendcounts array holds the number of elements each process receives; its length equals the number of processes. Since MPI_Scatterv() takes a plain array pointer, the elements must be contiguous in memory, so the array can either be declared directly or allocated with the malloc() function.
The amount of data assigned to each process may differ. To simplify building the Displs array later, this project has every process receive dataLen/comm_sz rows, with the last process additionally receiving the remaining dataLen%comm_sz rows (each row holds EIGEN_NUM values).
The code is as follows:

int *Sendcounts; //number of elements distributed to each process
Sendcounts = (int *) malloc(comm_sz * sizeof(int)); //allocate memory
for(i=0;i<comm_sz;i++){
    if(i==comm_sz-1){
        Sendcounts[i]=(int) (dataLen/comm_sz+(dataLen%comm_sz))*EIGEN_NUM;
    }
    else{
        Sendcounts[i]=(int) (dataLen/comm_sz)*EIGEN_NUM;
    }
}
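
For example, with dataLen = 10 rows, comm_sz = 3, and EIGEN_NUM = 4, the first two processes each receive (10/3)*4 = 12 floats, and the last process receives (10/3 + 10%3)*4 = 16 floats, so Sendcounts = {12, 12, 16}.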

Generate Displs array

The Displs array holds the memory offset at which each process's data starts; its length equals the number of processes. As with Sendcounts, MPI_Scatterv() takes a plain array pointer, so the elements must be contiguous in memory, and the array can either be declared directly or allocated with the malloc() function.
All blocks are taken from the same array. The first process receives the data at dataSet[0] through dataSet[Sendcounts[0]-1], and the second process receives the data at dataSet[Sendcounts[0]] through dataSet[Sendcounts[0]+Sendcounts[1]-1]. Thus the first process's starting offset is 0, the second's is Sendcounts[0], and in general displs[n] = displs[n-1] + Sendcounts[n-1]. Following this rule, the memory-offset array can be generated with the following code:

int *displs;     //memory offsets relative to dataSet
    displs = (int *) malloc(comm_sz * sizeof(int)); //allocate memory
    displs[0]=0;
    for(i=1;i<comm_sz;i++)
    {
        displs[i]=displs[i-1]+Sendcounts[i-1];
        //printf("displs[%d]=%d\n",i,displs[i]);
        //printf("memory offset distributed to process %d: %d\n",i,displs[i]);
    }
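
Continuing the example above (Sendcounts = {12, 12, 16}), this loop yields displs = {0, 12, 24}: process 0's block starts at dataSet[0], process 1's at dataSet[12], and process 2's at dataSet[24].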

MPI functions cooperate to complete parallel computing

After the MPI_Scatterv() function splits the array into per-process sub-arrays named receiveBuf, each process sums its receiveBuf in a for loop and stores the results in sendSum (one entry per sex/feature combination, six in total). The MPI_Gather() function collects the sendSum arrays of all processes on process 0 as reciveSum (of length comm_sz*6); process 0 accumulates them and divides by the corresponding class counts to obtain the means (μ).
Process 0 then broadcasts the means to all processes through the MPI_Bcast() function; each process computes its partial sum of squared deviations and returns it to process 0 through MPI_Gather(). Accumulating the returned data is equivalent to the serial computation Sigma = Σ(xᵢ − μ)², followed by
standardDeviation = sqrt(Sigma/sexNum).
The remaining steps are similar to Section 2.2.1 and are not elaborated here. This completes the MPI computation.
To use MPI as much as possible, the subsequent model evaluation is also parallelized: the MPI_Scatterv() function distributes the validation data set to all processes, each process counts its correct and incorrect predictions, and the counts are returned to process 0, which computes the accuracy. Incidentally, the sum-then-broadcast pattern above could also be expressed with a single collective call; see the sketch below.
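
The Gather-accumulate-Bcast pattern could also be written with a single MPI_Allreduce call, which sums the partial results across all processes and leaves the totals on every rank. A minimal sketch with dummy partial sums (an alternative, not what the program below does):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
	MPI_Init(&argc, &argv);
	int my_rank;
	MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);

	double sendSum[6] = {1, 1, 1, 1, 1, 1}; //dummy per-process partial sums
	double totalSum[6];                     //global sums, available on every rank
	MPI_Allreduce(sendSum, totalSum, 6, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

	//every rank now holds the totals, so no separate MPI_Bcast of the means is needed
	if(my_rank==0){ printf("totalSum[0] = %f\n", totalSum[0]); }

	MPI_Finalize();
	return 0;
}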

MPI parallel program

#include <iostream>
#include <vector>
#include <cstdlib>
#include <time.h>
#include <mpi.h>
#include <cassert>
#include <cstring>
#include <cmath>

#define PI 3.1415926535898

//length of a single input line
#define MAX_LINE 20
//length of the data set (counting from 1)
#define DATA_LEN 11000000
//number of values per row (sex ID + 3 features)
#define EIGEN_NUM 4

//float dataSet[DATA_LEN * EIGEN_NUM];	//the data set
float *dataSet = (float *) malloc(sizeof(float)*DATA_LEN*EIGEN_NUM);

int dataLen;//number of rows in the data set
double maleNum=0;//total number of males
double femaleNum=0;//total number of females

int main(int argc, char** argv) {


	int i=0;
	int j=0;

	int my_rank;       //rank of the current process
    int comm_sz;       //number of processes

    MPI_Init(&argc, &argv); 
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
    MPI_Comm_size(MPI_COMM_WORLD, &comm_sz);

	double start, end,readTime;
	MPI_Barrier(MPI_COMM_WORLD); /* IMPORTANT */
	start = MPI_Wtime();

/************************ process 0 reads the file ************************/
	if(my_rank==0)
	{
		char buf[MAX_LINE];		//line buffer
		FILE *fp;				//file pointer
		int len;				//number of characters in the line

		//open the data file
		const char* fileLocation="E:\\test\\addVitalCapacityData.csv";
		fp = fopen(fileLocation,"r");
		if(fp  == NULL)
		{
			perror("fp  == NULL");
			exit (1) ;
		}

		//read line by line and write into the array
		char *token;
		const char s[2] = ",";
		while(fgets(buf,MAX_LINE,fp) != NULL && i< DATA_LEN)
		{
			len = strlen(buf);
			//strip the trailing newline
			buf[len-1] = '\0';
			//split the string on commas
			token = strtok(buf, s);
			//continue splitting the remaining tokens
			j = 0;
			while( token != NULL ) 
			{
				dataSet[i*EIGEN_NUM + j]=atof(token);
				token = strtok(NULL, s);
				j = j+1;
			 }
			i = i + 1;
		}
		dataLen=i;
		printf("%d行4列的数据读取完毕\n",dataLen);
		fclose(fp);

		//count the numbers of males and females
		for(i=0;i<dataLen;i++){
			if(dataSet[i*4]==1){maleNum=maleNum+1;}
			if(dataSet[i*4]==2){femaleNum=femaleNum+1;}
		}

		readTime = MPI_Wtime();
		//printf("Read data time = %f", readTime-start);
	}


	MPI_Bcast(&dataLen,1,MPI_INT,0,MPI_COMM_WORLD);


/************************ parallel computation ************************/

/*********** Gaussian distribution ***********/

	/* scatter the array to the processes */
	int *Sendcounts; //number of elements distributed to each process
	Sendcounts = (int *) malloc(comm_sz * sizeof(int));//allocate memory

	for(i=0;i<comm_sz;i++)
	{
		if(i==comm_sz-1){Sendcounts[i]=(int) (dataLen/comm_sz+(dataLen%comm_sz))*EIGEN_NUM;}
		else{Sendcounts[i]=(int) (dataLen/comm_sz)*EIGEN_NUM;}
		//printf("进程%d分发到的数据长度:%d\n",i,Sendcounts[i]);
	}

	int receiveDataNum; //number of elements received by this process
	receiveDataNum=Sendcounts[my_rank];

	int *displs;	 //memory offsets relative to dataSet
	displs = (int *) malloc(comm_sz * sizeof(int)); //allocate memory
	displs[0]=0;
	for(i=1;i<comm_sz;i++)
	{
		displs[i]=displs[i-1]+Sendcounts[i-1];
		//printf("displs[i]=%d",displs[i]);
		//printf("分发给进程%d的内存偏移量:%d\n",i,displs[i]);
	}


	//buffer that holds the received block
	float *receiveBuf = (float*) malloc((receiveDataNum) * sizeof(float));

	//printf("my_rank=%d,Sendcounts=%d,displs=%d,receiveDataNum=%d \n",my_rank,Sendcounts[my_rank],displs[my_rank],receiveDataNum);
	MPI_Scatterv(dataSet,Sendcounts,displs,MPI_FLOAT,receiveBuf,receiveDataNum,MPI_FLOAT,0,MPI_COMM_WORLD);
	
/**** sums ****/
	const char *maenInf[6]={"maleLength","maleWeight","maleVC","femaleLength","femaleWeight","femaleVC"};
	//declaration of the summing function
	double getSum(float *data,int datalen,int sex,int column);
	//male height, weight, vital capacity
	double maleLength=getSum(receiveBuf,receiveDataNum,1,1);
	double maleWeight=getSum(receiveBuf,receiveDataNum,1,2);
	double maleVC=getSum(receiveBuf,receiveDataNum,1,3);
	//female height, weight, vital capacity
	double femaleLength=getSum(receiveBuf,receiveDataNum,2,1);
	double femaleWeight=getSum(receiveBuf,receiveDataNum,2,2);
	double femaleVC=getSum(receiveBuf,receiveDataNum,2,3);

	//printf("p-maleLength=%f p-maleWeight=%f p-maleVC=%f\n",maleLength,maleWeight,maleVC);
	//printf("p-femaleLength=%f p-femaleWeight=%f p-femaleVC=%f\n\n",femaleLength,femaleWeight,femaleVC);

	double sendSum[]={maleLength,maleWeight,maleVC,femaleLength,femaleWeight,femaleVC};//partial sums computed by this process
	double *reciveSum=(double*) malloc((6*comm_sz) * sizeof(double)); //array gathered on process 0

/**** means ****/
	double mean[6]={0,0,0,0,0,0};
	if(my_rank==0){
		MPI_Gather(sendSum,6,MPI_DOUBLE,reciveSum,6,MPI_DOUBLE,0,MPI_COMM_WORLD);

		for(i=0;i<comm_sz;i++){
			for(j=0;j<6;j++){
				mean[j]=mean[j]+reciveSum[i*6+j];
			}
		}
		for(i=0;i<6;i++){
			if(i<3){mean[i]=mean[i]/maleNum;}
			if(i>=3){mean[i]=mean[i]/femaleNum;}
		}
		//print the final means
		for(i=0;i<6;i++){
			//printf("mean-%s=%.16f\n",maenInf[i],mean[i]);
			if(i==5){printf("\n");}
		}
	}
	else{
		MPI_Gather(sendSum,6,MPI_DOUBLE,reciveSum,6,MPI_DOUBLE,0,MPI_COMM_WORLD);
	}

	//broadcast the means to all processes
	MPI_Bcast(mean,6,MPI_DOUBLE,0,MPI_COMM_WORLD);

	//print the means received by each process
	/*for(i=0;i<6;i++)
	{
		printf("my_rank=%d mean[%d]=%f\n",my_rank,i,mean[i]);
	}*/

/**** standard deviations ****/
	double sendSigma[6]={0,0,0,0,0,0}; //per-process partial accumulation
	double *reciveSigma=(double*) malloc((6*comm_sz) * sizeof(double)); //array gathered on process 0
	//declaration of the squared-deviation accumulator
	double getSigma(float *data,int datalen,double mean,int sex,int column);
	i=0;
	for(int s=1;s<=2;s++){
		for(int j=1;j<=3;j++){
			sendSigma[i]=getSigma(receiveBuf,receiveDataNum,mean[i],s,j);
			//printf("sendSigma[%d]=%f\n",i,sendSigma[i]);
			i=i+1;
		}
	}
	
	double standardDeviation[6];	//standard deviations
	if(my_rank==0)
	{
		MPI_Gather(sendSigma,6,MPI_DOUBLE,reciveSigma,6,MPI_DOUBLE,0,MPI_COMM_WORLD);
		double Sigma[6]={0,0,0,0,0,0};	//accumulated sums
		for(i=0;i<comm_sz;i++){
			for(j=0;j<6;j++){
				Sigma[j]=Sigma[j]+reciveSigma[i*6+j];
			}
		}
		double sexNum;
		for(i=0;i<6;i++){
			if(i<3)
			{sexNum=maleNum;}
			if(i>=3)
			{sexNum=femaleNum;}
			standardDeviation[i]=sqrt(Sigma[i]/sexNum);
			//printf("Sigma[%d]=%f maleNum=%f",i,Sigma[i],sexNum);
			//printf("第%d个标准差=%f\n",i,standardDeviation[i]);
		}
	}
	else{
		MPI_Gather(sendSigma,6,MPI_DOUBLE,reciveSigma,6,MPI_DOUBLE,0,MPI_COMM_WORLD);
	}

	MPI_Bcast(&standardDeviation,6,MPI_DOUBLE,0,MPI_COMM_WORLD);

	//print the standard deviations received by each process
	/*for(i=0;i<6;i++)
	{
		printf("my_rank=%d standardDeviation[%d]=%f\n",my_rank,i,standardDeviation[i]);
		if(i==5){printf("\n");}
	}*/


/*********** Naive Bayes & accuracy test ***********/
	//the data set includes vital capacity (VC); evaluate the model's accuracy
	float preSexID;
	float right=0;
	float error=0;
	//declaration of the sex-ID prediction function
	int sexIDResult(float height,float weight,float VC,double *mean,double *standardDeviation);
	for(int i=0;i<receiveDataNum/EIGEN_NUM;i++){
		preSexID=sexIDResult(receiveBuf[i*EIGEN_NUM+1],receiveBuf[i*EIGEN_NUM+2],receiveBuf[i*EIGEN_NUM+3],mean,standardDeviation);
		if(receiveBuf[i*EIGEN_NUM]==preSexID){right=right+1;}
		else{
			//printf("预测ID:%.0f  实际ID:%.0f \n",preSexID,receiveBuf[i*EIGEN_NUM]);
			//printf("性别:%.0f,身高:%.2f,体重:%.2f,肺活量:%.0f \n",receiveBuf[i*EIGEN_NUM],receiveBuf[i*EIGEN_NUM+1],receiveBuf[i*EIGEN_NUM+2],receiveBuf[i*EIGEN_NUM+3]);
			error=error+1;}
	}
	//printf("Right:%f\nError:%f\n",right,error);

	float sendRuslt[2]={right,error};
	/*for(i=0;i<comm_sz*2;i++)
		{
			printf("sendRuslt[%d]=%f\n",i,sendRuslt[i]);
		}*/
	float *reciveRuslt=(float*) malloc((2*comm_sz) * sizeof(float)); //array gathered on process 0
	if (my_rank==0)
	{
		MPI_Gather(sendRuslt,2,MPI_FLOAT,reciveRuslt,2,MPI_FLOAT,0,MPI_COMM_WORLD);

		float lastResult[2]={0,0};
		for(i=0;i<comm_sz;i++){
			lastResult[0]=lastResult[0]+reciveRuslt[2*i];
			lastResult[1]=lastResult[1]+reciveRuslt[2*i+1];
		}
		double accuracy  = lastResult[0]/(lastResult[0]+lastResult[1]);
		printf("Accuracy:%f\n",accuracy);
			
	}
	else{
		MPI_Gather(sendRuslt,2,MPI_FLOAT,reciveRuslt,2,MPI_FLOAT,0,MPI_COMM_WORLD);
	}

	MPI_Barrier(MPI_COMM_WORLD); /* IMPORTANT */
	end = MPI_Wtime();

	MPI_Finalize();

	if (my_rank == 0) { /* use time on master node */
		printf("Read data time = %f\n", readTime-start);
		printf("Calculate time = %f\n",end-readTime);
		printf("Run time = %f\n", end-start);
	}
}




/***************** functions *****************/

/*********** Gaussian distribution functions ***********/
//sum of one feature column for one sex
double getSum(float *data,int recDatalen,int sex,int column)
{
	double Sum=0;
	for(int i=0;i<(recDatalen/EIGEN_NUM);i++)
	{
		if(data[i*EIGEN_NUM]==sex){
			Sum=Sum+data[i*EIGEN_NUM+column];
		}
	}
	return Sum;
}

//accumulate pow((data[i]-mean),2)
double getSigma(float *data,int recDatalen,double mean,int sex,int column){
	double Sigma=0;
	for(int i=0;i<(recDatalen/EIGEN_NUM);i++){
		if(data[i*EIGEN_NUM]==sex){
			Sigma=Sigma+pow(data[i*EIGEN_NUM+column]-mean , 2 );
			//printf("sex=%d data[i]=%f mean=%f \n",sex,data[i*EIGEN_NUM+column],mean);
		}
	}
	return Sigma;
}

/*********** Naive Bayes functions ***********/

//compute the probability p(feature column = x | sex)
double getProbability(double x,int column,int sex,double mean,double standardDeviation)
{
	double Probability;	//computed probability density
	double u = mean;
	double p = standardDeviation;

	//Gaussian probability density; x: feature value, u: sample mean, p: standard deviation (squared below)
	p=pow(p,2);
	Probability = (1 / sqrt(2*PI*p)) * exp( -pow((x-u),2) / (2*p) );


	//printf("p(%s=%lf|性别=%s)=%.16lf\n",basicInfo[column],x,gender,Probability);

	return Probability;
}


//return the predicted sex ID
int sexIDResult(float height,float weight,float VC,double *mean,double *standardDeviation)
{
	double maleP;//probability of male
	double femaleP;//probability of female
	double a=0.5; //prior: 50% male, 50% female

	maleP = a * getProbability(height,1,1,mean[0],standardDeviation[0]) * getProbability(weight,2,1,mean[1],standardDeviation[1]) 
		* getProbability(VC,3,1,mean[2],standardDeviation[2]);

	femaleP = a * getProbability(height,1,2,mean[3],standardDeviation[3]) * getProbability(weight,2,2,mean[4],standardDeviation[4]) 
		* getProbability(VC,3,2,mean[5],standardDeviation[5]);

	if(maleP > femaleP){return 1;}
	if(maleP < femaleP){return 2;}
	return 0;
}
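
The program above can be compiled and run in the usual MPI way; a typical invocation, assuming MPICH or Open MPI is installed (the source file name here is arbitrary):

mpicxx gnb_mpi.cpp -o gnb_mpi
mpiexec -n 4 ./gnb_mpi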

Result

The local run results (figure omitted) show the calculation time dropping from 6.3 seconds to 2.5 seconds, a speedup of roughly 2.5x.

The server run results (figure omitted) show the calculation time dropping from 7.8 seconds to 3.9 seconds, a speedup of roughly 2x.

Code: https://download.csdn.net/download/admiz/16162449

Origin: blog.csdn.net/admiz/article/details/109828277