Big Data Study Notes (7) - Running the Spark Shell

   After starting Hadoop and Spark, you can launch the Spark shell, an interactive environment in which you can run Scala code.

1. cd $SPARK_HOME/bin
2. MASTER=spark://master.hadoop.zjportdns.gov.cn ./spark-shell
(Note that the MASTER environment variable must be uppercase; equivalently, you can pass --master spark://master.hadoop.zjportdns.gov.cn as a flag to spark-shell.)
You can then run Scala code at the prompt:
   scala> val a = sc.parallelize(1 to 9, 3)
   a: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[0] at parallelize at <console>:24
   scala> val b = a.map(x => x*2)
   b: org.apache.spark.rdd.RDD[Int] = MapPartitionsRDD[1] at map at <console>:26
   scala> a.collect
   res0: Array[Int] = Array(1, 2, 3, 4, 5, 6, 7, 8, 9)
   scala> b.collect
   res1: Array[Int] = Array(2, 4, 6, 8, 10, 12, 14, 16, 18)


3. You can also analyze files stored in HDFS.
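For example, a minimal word count over an HDFS file might look like the following sketch (the file path /user/hadoop/input.txt is an assumption for illustration; substitute a file that actually exists on your cluster):

   scala> val lines = sc.textFile("hdfs://master.hadoop.zjportdns.gov.cn/user/hadoop/input.txt")
   scala> val counts = lines.flatMap(_.split("\\s+")).map(word => (word, 1)).reduceByKey(_ + _)
   scala> counts.take(10).foreach(println)

As with map in the earlier example, flatMap and reduceByKey are lazy transformations; nothing runs on the cluster until an action such as take or collect is called.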


and from there you can happily carry on with your big data analysis.
