Developing Spark 2.x applications with IntelliJ IDEA

I. First, the Word Count program (Java version):

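package com.mm;

import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.regex.Pattern;

import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.function.FlatMapFunction;
import org.apache.spark.api.java.function.Function2;
import org.apache.spark.api.java.function.PairFunction;
import org.apache.spark.sql.SparkSession;

import scala.Tuple2;

public final class JavaWordCount {
    private static final Pattern SPACE = Pattern.compile(" ");

    public static void main(String[] args) throws Exception {

        // Input path: the first command-line argument, or /test.txt if none is given.
        // A path without a scheme is resolved against the default file system (HDFS if so configured).
        String filePath = args.length > 0 ? args[0] : "/test.txt";

        // No master URL is set here; spark-submit supplies it with --master.
        SparkSession spark = SparkSession
                .builder()
                .appName("JavaWordCount")
                .getOrCreate();

        // Each element of the RDD is one line of the input file.
        JavaRDD<String> lines = spark.read().textFile(filePath).javaRDD();

        // Split every line on single spaces into words.
        JavaRDD<String> words = lines.flatMap(new FlatMapFunction<String, String>() {
            @Override
            public Iterator<String> call(String s) {
                return Arrays.asList(SPACE.split(s)).iterator();
            }
        });

        // Pair every word with an initial count of 1.
        JavaPairRDD<String, Integer> ones = words.mapToPair(
                new PairFunction<String, String, Integer>() {
                    @Override
                    public Tuple2<String, Integer> call(String s) {
                        return new Tuple2<>(s, 1);
                    }
                });

        // Add up the counts for identical words.
        JavaPairRDD<String, Integer> counts = ones.reduceByKey(
                new Function2<Integer, Integer, Integer>() {
                    @Override
                    public Integer call(Integer i1, Integer i2) {
                        return i1 + i2;
                    }
                });

        // collect() brings every result back to the driver, so it only suits small outputs.
        List<Tuple2<String, Integer>> output = counts.collect();

        // Write the result to /testResult; the job fails if that directory already exists.
        counts.saveAsTextFile("/testResult");

        for (Tuple2<?, ?> tuple : output) {
            System.out.println(tuple._1() + ": " + tuple._2());
        }

        spark.stop();
    }
}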
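The Spark job above is a split/map/reduce pipeline. As a rough illustration of what it computes (not of how Spark distributes the work), here is a plain-Java sketch of the same three steps using java.util.stream on an in-memory list of lines. The class name LocalWordCount is hypothetical, and no Spark dependency is needed.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public final class LocalWordCount {
    // Same splitting rule as JavaWordCount: split on a single space.
    private static final Pattern SPACE = Pattern.compile(" ");

    // Mirrors flatMap -> mapToPair(word, 1) -> reduceByKey(i1 + i2) on local data.
    public static Map<String, Integer> count(List<String> lines) {
        return lines.stream()
                .flatMap(line -> Arrays.stream(SPACE.split(line))) // flatMap: lines -> words
                .collect(Collectors.toMap(
                        word -> word,      // key: the word itself
                        word -> 1,         // value: an initial count of 1
                        Integer::sum,      // merge duplicates, like reduceByKey
                        TreeMap::new));    // sorted keys, only for readable output
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = count(Arrays.asList("hello spark", "hello world"));
        counts.forEach((word, n) -> System.out.println(word + ": " + n));
    }
}
```

Running main prints hello: 2, spark: 1 and world: 1 on separate lines. The TreeMap only sorts the keys; the order of Spark's collected result is not guaranteed. Because both versions split on a single space, a run of several spaces yields empty-string "words" that are counted as well.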

II. Create the JAR to be deployed

    1. Choose File >> Project Structure >> Artifacts >> + (plus sign) >> JAR >> From modules with dependencies

        Select the Main Class and click OK. This opens the configuration page for the JAR, as shown in the figure:


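    In the Output Layout tab, remove all the "Extracted ..." JAR entries so that only the "wordCount" compile output remains. The Spark libraries are already provided on the cluster by spark-submit, so they do not need to be packaged.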

     Click Apply and then OK to save.

     Click Build >> Build Artifacts... >> Build.

     The JAR file should then appear in the artifact's output directory.
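After building, it can be worth confirming that the Extracted dependency JARs really were left out. The sketch below uses only the standard java.util.jar API to read the manifest's Main-Class and to count any bundled classes under org/apache/spark/. The class name JarCheck and the default path wordCount.jar are assumptions for illustration.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.jar.Attributes;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;
import java.util.jar.Manifest;

public final class JarCheck {
    // Returns the Main-Class attribute from the JAR manifest, or null if there is none.
    public static String mainClass(String jarPath) throws IOException {
        try (JarFile jar = new JarFile(jarPath)) {
            Manifest mf = jar.getManifest();
            return mf == null ? null : mf.getMainAttributes().getValue(Attributes.Name.MAIN_CLASS);
        }
    }

    // Returns every .class entry whose path starts with the given prefix, e.g. "org/apache/spark/".
    public static List<String> classesUnder(String jarPath, String prefix) throws IOException {
        List<String> hits = new ArrayList<>();
        try (JarFile jar = new JarFile(jarPath)) {
            for (JarEntry e : Collections.list(jar.entries())) {
                if (e.getName().startsWith(prefix) && e.getName().endsWith(".class")) {
                    hits.add(e.getName());
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical default path; pass the real JAR location as the first argument.
        String jarPath = args.length > 0 ? args[0] : "wordCount.jar";
        System.out.println("Main-Class: " + mainClass(jarPath));
        System.out.println("Bundled Spark classes: " + classesUnder(jarPath, "org/apache/spark/").size());
    }
}
```

A count of 0 bundled Spark classes indicates that only the compile output was packaged.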

III. Copy the JAR to the Spark server and run a test

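From the bin directory of the Spark installation, submit the job. The --class value must be the fully qualified name of the main class, including its package:

./spark-submit --class com.mm.JavaWordCount --master spark://localhost:7077 /usr/spark/spark-2.0.0-bin-hadoop2.6/wordCount.jar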
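The job also writes its result with saveAsTextFile("/testResult"). Spark stores this as a directory of part-NNNNN files plus a _SUCCESS marker, with one line per pair produced by Tuple2.toString(), e.g. (spark,3). The sketch below merges those files back into a sorted map. The class name ResultReader is hypothetical, and it assumes the directory has already been copied to the local file system (for example with hdfs dfs -get /testResult if the default file system is HDFS).

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;
import java.util.TreeMap;

public final class ResultReader {
    // Parses one Tuple2 line such as "(spark,3)".
    // lastIndexOf(',') is used because a word split on spaces may itself contain commas.
    static void parseLine(String line, Map<String, Integer> into) {
        String body = line.substring(1, line.length() - 1); // strip the surrounding parentheses
        int comma = body.lastIndexOf(',');
        into.merge(body.substring(0, comma), Integer.parseInt(body.substring(comma + 1)), Integer::sum);
    }

    // Reads every part-* file in the output directory; the _SUCCESS marker does not match the glob.
    public static Map<String, Integer> read(Path outputDir) throws IOException {
        Map<String, Integer> counts = new TreeMap<>();
        try (DirectoryStream<Path> parts = Files.newDirectoryStream(outputDir, "part-*")) {
            for (Path part : parts) {
                for (String line : Files.readAllLines(part, StandardCharsets.UTF_8)) {
                    if (!line.isEmpty()) {
                        parseLine(line, counts);
                    }
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical local copy of /testResult; pass another directory as the first argument.
        Path dir = Paths.get(args.length > 0 ? args[0] : "testResult");
        read(dir).forEach((word, n) -> System.out.println(word + ": " + n));
    }
}
```

Its output uses the same "word: count" format as the loop in JavaWordCount, so the two can be compared directly.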

Reposted from liumangafei.iteye.com/blog/2324853