Reading from multiple input paths in Hadoop

1. Multiple input paths

1) Call FileInputFormat.addInputPath several times, once for each path

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Arguments: <input path 1> <input path 2> <output path>
String in0 = args[0];
String in1 = args[1];
String out = args[2];

// Each call adds one more path to the job's list of input paths.
FileInputFormat.addInputPath(job, new Path(in0));
FileInputFormat.addInputPath(job, new Path(in1));
FileOutputFormat.setOutputPath(job, new Path(out));

2) Call FileInputFormat.addInputPaths once, passing all the paths as a single comma-separated string

FileInputFormat.addInputPaths(job, "hdfs://localhost:9000/cs/path1,hdfs://localhost:9000/cs/path2");
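When the paths come from the command line rather than a hard-coded literal, the comma-separated string has to be assembled first. Below is a minimal sketch of one way to do that in plain Java. The class name InputPathList and its join helper are hypothetical names made up for this example, not part of Hadoop. The comma check reflects my understanding that addInputPaths splits its argument on commas, so treat it as an assumption; note that it would also reject glob patterns such as {a,b}.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not a Hadoop class): builds the argument for addInputPaths.
final class InputPathList {

    // Joins the given paths with commas. A path that itself contains a comma is
    // rejected, on the assumption that it would otherwise be read as two paths.
    static String join(List<String> paths) {
        for (String p : paths) {
            if (p.contains(",")) {
                throw new IllegalArgumentException("path contains a comma: " + p);
            }
        }
        return String.join(",", paths);
    }

    public static void main(String[] args) {
        if (args.length < 2) {
            System.err.println("usage: <input path>... <output path>");
            return;
        }
        // Every argument except the last is an input path; the last is the output path.
        List<String> inputs = Arrays.asList(args).subList(0, args.length - 1);
        String commaSeparated = join(inputs);
        System.out.println(commaSeparated);
        // In a real driver this string would be passed on:
        // FileInputFormat.addInputPaths(job, commaSeparated);
    }
}
```

If any path may legitimately contain a comma, calling FileInputFormat.addInputPath once per path, as in 1), avoids the question entirely.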

2. Multiple input types

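MultipleInputs can load input files from different paths, and each path can be given its own Mapper (and its own InputFormat, which is the third argument).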

MultipleInputs.addInputPath(job, new Path("hdfs://localhost:9000/cs/path1"), TextInputFormat.class,MultiTypeFileInput1Mapper.class);

MultipleInputs.addInputPath(job, new Path("hdfs://localhost:9000/cs/path3"), TextInputFormat.class,MultiTypeFileInput3Mapper.class);
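To make the per-path binding concrete without needing a cluster, here is a small plain-Java model of the idea. It is only an illustration and does not use the Hadoop API: the class PerPathMapperModel, the run and demo methods, the sample lines, and the "from-path1" / "from-path3" tags are all made up for this sketch. The two lambdas stand in for MultiTypeFileInput1Mapper and MultiTypeFileInput3Mapper.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Plain-Java model of "one mapper per input path"; not Hadoop code.
final class PerPathMapperModel {

    // Runs each path's lines through the mapper bound to that path and
    // collects all mapper outputs into one merged list.
    static List<String> run(Map<String, Function<String, String>> mapperByPath,
                            Map<String, List<String>> linesByPath) {
        List<String> merged = new ArrayList<>();
        for (Map.Entry<String, Function<String, String>> binding : mapperByPath.entrySet()) {
            List<String> lines =
                linesByPath.getOrDefault(binding.getKey(), Collections.<String>emptyList());
            for (String line : lines) {
                merged.add(binding.getValue().apply(line));
            }
        }
        return merged;
    }

    // Sample data: two paths, each with its own stand-in mapper that tags its records.
    static List<String> demo() {
        Map<String, Function<String, String>> mapperByPath = new LinkedHashMap<>();
        mapperByPath.put("/cs/path1", line -> "from-path1\t" + line);
        mapperByPath.put("/cs/path3", line -> "from-path3\t" + line);

        Map<String, List<String>> linesByPath = new LinkedHashMap<>();
        linesByPath.put("/cs/path1", Arrays.asList("alpha"));
        linesByPath.put("/cs/path3", Arrays.asList("beta"));

        return run(mapperByPath, linesByPath);
    }

    public static void main(String[] args) {
        demo().forEach(System.out::println);
    }
}
```

In the real job the same constraint applies as in the model: the mappers may parse their inputs differently, but they need to emit the same output key and value types so that a single reducer can consume the merged stream.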

Reposted from: https://zhidao.baidu.com/question/2138952993402851188.html

Reposted from blog.csdn.net/A_stranger/article/details/84849458