Parallelized Large Matrix Multiplication Based on MapReduce

Parallelized large matrix multiplication is one of the earliest basic algorithms implemented on the MapReduce programming model. Google originally proposed it to handle the large number of matrix multiplications involved in computing PageRank. In this article we will look at how parallelized large matrix multiplication is implemented with MapReduce.

We assume there are two matrices M and N, where the number of columns of M equals the number of rows of N, and we denote their product by P = M · N. Here Mij represents the element in the i-th row and j-th column of matrix M, and Njk represents the element in the j-th row and k-th column of matrix N. The elements of matrix P can then be obtained by the following formula:

Pik = Σj ( Mij × Njk ),  where j runs over 1, 2, ..., the number of columns of M

That is, Pik is obtained by multiplying the elements of the i-th row of M with the corresponding elements of the k-th column of N and adding the products. From the formula we can see that the pair (i, k) determines the position of Pik, so we can use (i, k) as the key of the Reduce output and Pik as its value. To compute Pik we must know Mij and Njk. For Mij, the attributes we need are: the matrix it belongs to (M), its row number i, its column number j, and the value of Mij itself. For Njk, the attributes we need are: the matrix it belongs to (N), its row number j, its column number k, and the value of Njk itself. These attributes are assembled by the Mapper class.
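
For instance, if M has two columns (and N therefore has two rows), the element in the first row and first column of P expands to:

P11 = M11 × N11 + M12 × N21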

Map function: for each element Mij of matrix M, generate a series of key-value pairs <(i,k),(M,j,Mij)>, where k = 1, 2, ..., number of columns of N. For each element Njk of matrix N, generate a series of key-value pairs <(i,k),(N,j,Njk)>, where i = 1, 2, ..., number of rows of M.
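
As an illustration (assuming N has three columns), the single element M12 is emitted three times, once for every column of the result: <(1,1),(M,2,M12)>, <(1,2),(M,2,M12)> and <(1,3),(M,2,M12)>.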

Reduce function: for each key (i, k), the associated values are of the form (M, j, Mij) and (N, j, Njk). Multiply the Mij and Njk that share the same j value, then add the results over the different values of j to obtain Pik.
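
Continuing the illustration (M with two columns, N with three), the key (1,3) collects the values (M,1,M11), (M,2,M12), (N,1,N13) and (N,2,N23); the reducer pairs them by j and computes P13 = M11 × N13 + M12 × N23.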

Let us walk through a concrete example.

[Image: example matrices M and N]

We store the M matrix in the M.txt file, one element per line, in the format "row,column<tab>value": the element's row number and column number separated by a comma, followed by a tab and then the element's value. The content of M.txt is as follows.

[Image: contents of M.txt]

We store the N matrix in N.txt in the same format. The content of N.txt is as follows.

[Image: contents of N.txt]

Map function output: after the map function runs, a series of key-value pairs of the form <(i,k),(M,j,Mij)> and <(i,k),(N,j,Njk)> is generated, as follows.

[Image: key-value pairs produced by the map function]
Reduce function output: for each key (i, k), the values sharing the same j are multiplied and the results are then added. The process is as follows.

[Image: computation performed by the reduce function]

So the final product matrix is P = [2,5,11].

The MapReduce program for parallelized large matrix multiplication consists of three classes: the driver MatrixMain, the mapper MatrixMapper and the reducer MatrixReducer. The code is as follows:

package Matrix;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

/**
 *parameters : rowM,columnM,columnN,InputPaths,OutputPath
 * @author liuchen
 *
 */

public class MatrixMain {
	public static void main(String[] args)throws Exception {
		//create job = map + reduce
		Configuration conf = new Configuration();
		
		//Set globally shared parameters: the matrix dimensions
		conf.set("rowM", args[0]);
		conf.set("columnM", args[1]);
		conf.set("columnN", args[2]);
		
		//create Job
		Job job = Job.getInstance(conf);
		
		//the class used to locate the job's jar
		job.setJarByClass(MatrixMain.class);
		
		//the mapper of job
		job.setMapperClass(MatrixMapper.class);
		job.setMapOutputKeyClass(Text.class);
		job.setMapOutputValueClass(Text.class);
		
		//the reducer of job
		job.setReducerClass(MatrixReducer.class);
		job.setOutputKeyClass(Text.class);
		job.setOutputValueClass(Text.class);
		
		//input and output
		TextInputFormat.setInputPaths(job, new Path(args[3]));
		TextOutputFormat.setOutputPath(job, new Path(args[4]));
		
		//submit job
		job.waitForCompletion(true);
	}

}
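
Note that the three matrix dimensions are handed to every map and reduce task through the job Configuration: each mapper needs columnN and rowM to know how many key-value pairs to emit for an element, and each reducer needs columnM to know how many products to accumulate for one key.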

package Matrix;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

/**
 * Matrix multiplication Mapper
 * @author liuchen
 *
 */

public class MatrixMapper extends Mapper<Object, Text , Text , Text> {
	
	private static int columnN = 0;
	private static int rowM = 0;
	
	private Text map_key = new Text();
	private Text map_value = new Text();
	
	/**
	 *   Before executing the map function, get the necessary parameters
	 */
	protected void setup(Context context)throws IOException, InterruptedException {
		Configuration conf = context.getConfiguration();
		columnN = Integer.parseInt(conf.get("columnN"));
	    rowM = Integer.parseInt(conf.get("rowM"));	
	}

	protected void map(Object key, Text value,Context context)throws IOException, InterruptedException {
		//Through filename differentiation matrix
		FileSplit fileSplit = (FileSplit)context.getInputSplit();
		String fileName = fileSplit.getPath().getName();
		if(fileName.contains("M")){    //M Matrix
			//Each line has the form "i,j\tMij"
			String[] arr1 = value.toString().split(",");
			int i = Integer.parseInt(arr1[0]);            //row number of the element
			String[] arr2 = arr1[1].split("\t");
			int j = Integer.parseInt(arr2[0]);            //column number of the element
			int Mij = Integer.parseInt(arr2[1]);          //value of the element
			//Mij contributes to every Pik of row i, so emit it once for each column k of N
			for(int k = 1;k <= columnN;k++){
				map_key.set(i + "," + k);
				map_value.set("M," + j + "," + Mij);
				context.write(map_key, map_value);
			}
			
		}
		else if (fileName.contains("N")){   //N Matrix
			//Each line has the form "j,k\tNjk"
			String[] arr1 = value.toString().split(",");
			int j = Integer.parseInt(arr1[0]);            //row number of the element
			String[] arr2 = arr1[1].split("\t");
			int k = Integer.parseInt(arr2[0]);            //column number of the element
			int Njk = Integer.parseInt(arr2[1]);          //value of the element

			//Njk contributes to every Pik of column k, so emit it once for each row i of M
			for(int i = 1;i <= rowM;i++){
				map_key.set(i + "," + k);
				map_value.set("N," + j + "," + Njk);
				context.write(map_key, map_value);
			}
		}
	}
}

package Matrix;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class MatrixReducer extends Reducer<Text, Text, Text, Text>{
	private static int columnM = 0;
	
	protected void setup(Context context)throws IOException, InterruptedException {
		Configuration conf = context.getConfiguration();
		columnM = Integer.parseInt(conf.get("columnM"));
	}
	
	protected void reduce(Text key, Iterable<Text> values,Context context)throws IOException, InterruptedException {
		int[] M = new int[columnM + 1];   //M[j] holds Mij from row i; index 0 is unused, j runs from 1 to columnM
		int[] N = new int[columnM + 1];   //N[j] holds Njk from column k
		int sum = 0;
		//Separate the values coming from matrix M and matrix N, indexed by their j value
		for(Text value : values){
			String[] arr1 = value.toString().split(",");
			if(arr1[0].contains("M")){
				M[Integer.parseInt(arr1[1])] = Integer.parseInt(arr1[2]);
			}
			else if (arr1[0].contains("N")){
				N[Integer.parseInt(arr1[1])] = Integer.parseInt(arr1[2]);
			}
		}
		
		//Pik = sum over j of Mij * Njk
		for(int j = 1;j <= columnM;j++){
			sum += M[j] * N[j];
		}
		context.write(key, new Text(Integer.toString(sum)));
	}
}
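
The driver expects five arguments in the order listed in the comment above MatrixMain: rowM, columnM, columnN, the input path containing M.txt and N.txt, and the output path. A hypothetical invocation, assuming the three classes are packaged into a jar named matrix.jar, would look like:

hadoop jar matrix.jar Matrix.MatrixMain <rowM> <columnM> <columnN> <input path> <output path>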

For more practical content, follow the WeChat official account: Dream Chasing Programmer.



