- Ways to build an RDD
(1) Generate from a folder: provide the path of a folder (or file) that has been uploaded to HDFS
SparkSession spark = SparkSession
    .builder()
    .appName("JavaHdfsLR")
    .master("local") // note: better not to hard-code the master here; this field decides which mode the Spark cluster starts in
    .getOrCreate();
JavaRDD<String> rdd = spark.read().textFile("filePath").javaRDD();
(2) Generate from a List
JavaSparkContext jsc = new JavaSparkContext(spark.sparkContext());
JavaRDD<String> lines = jsc.parallelize(new ArrayList<String>());
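Putting the fragments above together, a minimal runnable sketch of building an RDD from a local List looks like this (the class name `RddFromList` and the sample data are my own, for illustration; `local[*]` runs Spark inside the current JVM):

```java
import java.util.Arrays;
import java.util.List;

import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.SparkSession;

public class RddFromList {
    public static void main(String[] args) {
        // local[*] starts an embedded Spark, convenient for experiments
        SparkSession spark = SparkSession
                .builder()
                .appName("RddFromList")
                .master("local[*]")
                .getOrCreate();
        JavaSparkContext jsc = new JavaSparkContext(spark.sparkContext());

        // parallelize distributes an in-memory collection as an RDD
        List<String> data = Arrays.asList("a", "b", "c");
        JavaRDD<String> lines = jsc.parallelize(data);

        System.out.println(lines.count()); // 3

        spark.stop();
    }
}
```

Unlike reading from HDFS, `parallelize` is mainly useful for tests and quick experiments, since all the data must already fit in the driver's memory.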
- JavaRDD->JavaRDD
For one-to-one transformations use map: JavaRDD.map(new Function<T, R>()) takes an element of type T as input and returns an element of type R; the conversion is done inside the function.
There is also flatMap(FlatMapFunction<T, U>), which maps one input element to zero or more output elements.
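The difference between the two can be sketched as follows (class name `MapVsFlatMap` and the sample strings are my own; lambdas stand in for explicit `Function` / `FlatMapFunction` instances, which Spark's Java API accepts):

```java
import java.util.Arrays;

import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.SparkSession;

public class MapVsFlatMap {
    public static void main(String[] args) {
        SparkSession spark = SparkSession
                .builder()
                .appName("MapVsFlatMap")
                .master("local[*]")
                .getOrCreate();
        JavaSparkContext jsc = new JavaSparkContext(spark.sparkContext());

        JavaRDD<String> lines = jsc.parallelize(Arrays.asList("hello world", "hi"));

        // map: one input element -> exactly one output element
        JavaRDD<Integer> lengths = lines.map(s -> s.length());
        System.out.println(lengths.collect()); // [11, 2]

        // flatMap: one input element -> zero or more output elements
        // (the FlatMapFunction must return an Iterator in Spark 2.x+)
        JavaRDD<String> words = lines.flatMap(s -> Arrays.asList(s.split(" ")).iterator());
        System.out.println(words.collect()); // [hello, world, hi]

        spark.stop();
    }
}
```

So map preserves the element count, while flatMap can expand or shrink it, which is why flatMap is the usual choice for splitting lines into words.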