1. cd $SPARK_HOME/bin
2. MASTER=spark://master.hadoop.zjportdns.gov.cn ./spark-shell
Then you can run Scala code interactively in the shell:
scala> val a = sc.parallelize(1 to 9, 3)
a: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[0] at parallelize at <console>:24

scala> val b = a.map(x => x*2)
b: org.apache.spark.rdd.RDD[Int] = MapPartitionsRDD[1] at map at <console>:26

scala> a.collect
res0: Array[Int] = Array(1, 2, 3, 4, 5, 6, 7, 8, 9)

scala> b.collect
res1: Array[Int] = Array(2, 4, 6, 8, 10, 12, 14, 16, 18)
3. You can also analyze files stored in HDFS from the same shell, and from there perform big data analysis on the cluster.
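As a minimal sketch of the HDFS step above, the snippet below counts words in a text file using the same `sc` (SparkContext) that spark-shell provides. The HDFS path is a hypothetical placeholder; substitute a file that actually exists on your cluster.

```scala
// Hypothetical input path -- replace with a real file on your HDFS.
val lines = sc.textFile("hdfs://master.hadoop.zjportdns.gov.cn/data/input.txt")

// Classic word count: split each line on whitespace,
// emit (word, 1) pairs, then sum the counts per word.
val counts = lines
  .flatMap(line => line.split("\\s+"))
  .filter(_.nonEmpty)
  .map(word => (word, 1))
  .reduceByKey(_ + _)

// Bring a small sample back to the driver and print it.
counts.take(10).foreach(println)
```

`textFile`, `flatMap`, `map`, and `reduceByKey` are all standard RDD operations; only the input path is an assumption here. For large results, prefer `saveAsTextFile` over `collect` so the output stays on HDFS instead of being pulled into the driver.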