Spark SQL Notes (3): Setting Up the Spark Environment

1 Local mode

In local mode Spark runs everything in a single JVM, so no cluster configuration is needed; just launch spark-shell or spark-submit directly.
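For example, a minimal local-mode launch (local[2] means two worker threads; adjust as needed):

[hadoop@node1 spark-2.1.3-bin-2.6.0-cdh5.7.0]$ ./bin/spark-shell --master local[2]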

2 Standalone mode

Standalone mode uses a master/worker architecture, similar to Hadoop/HDFS (NameNode/DataNode). All configuration is done under:

/home/hadoop/apps/spark-2.1.3-bin-2.6.0-cdh5.7.0/conf

2.1 spark-env.sh

Copy conf/spark-env.sh.template to spark-env.sh and add:


# Host the master daemon binds to; workers and clients connect via spark://node1:7077
SPARK_MASTER_HOST=node1
# Total cores Spark applications may use on this worker
SPARK_WORKER_CORES=1
# Total memory Spark applications may use on this worker
SPARK_WORKER_MEMORY=1g
# Number of Worker processes to run per node
SPARK_WORKER_INSTANCES=1
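These limits are per worker: node1 will advertise 1 core and 1 GB of memory to the master, which matches the "Registering worker ... with 1 cores, 1024.0 MB RAM" line in the master log below.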

2.2 Start the cluster

[hadoop@node1 ~]$ /home/hadoop/apps/spark-2.1.3-bin-2.6.0-cdh5.7.0/sbin/start-all.sh
starting org.apache.spark.deploy.master.Master, logging to /home/hadoop/apps/spark-2.1.3-bin-2.6.0-cdh5.7.0/logs/spark-hadoop-org.apache.spark.deploy.master.Master-1-node1.out
localhost: starting org.apache.spark.deploy.worker.Worker, logging to /home/hadoop/apps/spark-2.1.3-bin-2.6.0-cdh5.7.0/logs/spark-hadoop-org.apache.spark.deploy.worker.Worker-1-node1.out
[hadoop@node1 ~]$ jps
1442 Master
1596 Jps
1534 Worker
[hadoop@node1 ~]$ 
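start-all.sh launches a Master on the local machine and a Worker on every host listed in conf/slaves; since that file defaults to localhost (note the "localhost: starting ... Worker" line above), both daemons run on node1 here.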

[hadoop@node1 ~]$ cat /home/hadoop/apps/spark-2.1.3-bin-2.6.0-cdh5.7.0/logs/spark-hadoop-org.apache.spark.deploy.master.Master-1-node1.out
Spark Command: /usr/apps/jdk1.8.0_181-amd64/bin/java -cp /home/hadoop/apps/spark-2.1.3-bin-2.6.0-cdh5.7.0/conf/:/home/hadoop/apps/spark-2.1.3-bin-2.6.0-cdh5.7.0/jars/* -Xmx1g org.apache.spark.deploy.master.Master --host node1 --port 7077 --webui-port 8080
========================================
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
18/11/01 09:09:30 INFO Master: Started daemon with process name: 1442@node1
18/11/01 09:09:30 INFO SignalUtils: Registered signal handler for TERM
18/11/01 09:09:30 INFO SignalUtils: Registered signal handler for HUP
18/11/01 09:09:30 INFO SignalUtils: Registered signal handler for INT
18/11/01 09:09:31 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
18/11/01 09:09:31 INFO SecurityManager: Changing view acls to: hadoop
18/11/01 09:09:31 INFO SecurityManager: Changing modify acls to: hadoop
18/11/01 09:09:31 INFO SecurityManager: Changing view acls groups to: 
18/11/01 09:09:31 INFO SecurityManager: Changing modify acls groups to: 
18/11/01 09:09:31 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(hadoop); groups with view permissions: Set(); users  with modify permissions: Set(hadoop); groups with modify permissions: Set()
18/11/01 09:09:31 INFO Utils: Successfully started service 'sparkMaster' on port 7077.
18/11/01 09:09:32 INFO Master: Starting Spark master at spark://node1:7077
18/11/01 09:09:32 INFO Master: Running Spark version 2.1.3
18/11/01 09:09:32 INFO Utils: Successfully started service 'MasterUI' on port 8080.
18/11/01 09:09:32 INFO MasterWebUI: Bound MasterWebUI to 0.0.0.0, and started at http://192.168.30.131:8080
18/11/01 09:09:32 INFO Utils: Successfully started service on port 6066.
18/11/01 09:09:32 INFO StandaloneRestServer: Started REST server for submitting applications on port 6066
18/11/01 09:09:33 INFO Master: I have been elected leader! New state: ALIVE
18/11/01 09:09:36 INFO Master: Registering worker 192.168.30.131:32865 with 1 cores, 1024.0 MB RAM

Open http://node1:8080/ in a browser to view the master web UI; it also displays the master URL (spark://node1:7077) to use when submitting applications.

3 WordCount

3.1 Start the Spark shell

[hadoop@node1 spark-2.1.3-bin-2.6.0-cdh5.7.0]$ ./bin/spark-shell --master spark://node1:7077
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
18/11/01 09:36:38 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
18/11/01 09:36:38 WARN SparkConf: 
SPARK_WORKER_INSTANCES was detected (set to '1').
This is deprecated in Spark 1.0+.

Please instead use:
 - ./spark-submit with --num-executors to specify the number of executors
 - Or set SPARK_EXECUTOR_INSTANCES
 - spark.executor.instances to configure the number of instances in the spark config.
        
Spark context Web UI available at http://192.168.30.131:4040
Spark context available as 'sc' (master = spark://node1:7077, app id = app-20181101093639-0000).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.1.3
      /_/
         
Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_181)
Type in expressions to have them evaluated.
Type :help for more information.

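The example assumes a words.txt file has already been uploaded to the HDFS root, e.g. with hdfs dfs -put words.txt hdfs://node1:8020/.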

scala> val file = sc.textFile("hdfs://node1:8020/words.txt")
file: org.apache.spark.rdd.RDD[String] = hdfs://node1:8020/words.txt MapPartitionsRDD[5] at textFile at <console>:24

scala> val res = file.flatMap(_.split(" ")).map((_,1)).reduceByKey(_+_)
res: org.apache.spark.rdd.RDD[(String, Int)] = ShuffledRDD[8] at reduceByKey at <console>:26

scala> res.collect
res0: Array[(String, Int)] = Array((tom,3), (hello,2), (world,1), (jack,2), (mary,2))
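Reading the pipeline: textFile loads the file as an RDD of lines, flatMap splits each line into words, map pairs each word with 1, and reduceByKey sums the counts per key. As a sketch of a natural next step (the output path below is hypothetical), the counts can be sorted in descending order and written back to HDFS:

scala> res.sortBy(_._2, ascending = false).saveAsTextFile("hdfs://node1:8020/wordcount-out")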
