This chapter covers how to interact with Spark from the command line using the Spark shell.

1 Running Spark jobs

1.1 Running your first Spark program

/opt/module/spark/bin/spark-submit \
--class org.apache.spark.examples.SparkPi \
--master spark://linux102:7077 \
--executor-memory 1G \
--total-executor-cores 2 \
/opt/module/spark/examples/jars/spark-examples_2.11-2.1.1.jar \
100

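Parameter notes:

--master spark://linux102:7077 specifies the address of the Master.

--executor-memory 1G gives each executor 1 GB of memory.

--total-executor-cores 2 sets the total number of CPU cores used across all executors to 2 (a cluster-wide total, not a per-executor setting).

This example estimates π using the Monte Carlo method.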
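
To make the idea concrete, here is a minimal sketch of a Monte Carlo estimate of π in plain Scala, without Spark. It is not the SparkPi source: the object name MonteCarloPi, the method estimate and the fixed seed are illustrative assumptions. Random points are drawn in the square [-1, 1] x [-1, 1], and the fraction that lands inside the unit circle approaches π/4.

```scala
import scala.util.Random

// Illustrative sketch only: the names and the seed parameter are assumptions,
// not taken from the SparkPi example shipped with Spark.
object MonteCarloPi {
  // Estimates pi from `samples` random points in the square [-1, 1] x [-1, 1].
  def estimate(samples: Int, seed: Long = 42L): Double = {
    require(samples > 0, "samples must be positive")
    val rng = new Random(seed)
    // Count the points that land inside the unit circle.
    val inside = (1 to samples).count { _ =>
      val x = rng.nextDouble() * 2 - 1
      val y = rng.nextDouble() * 2 - 1
      x * x + y * y <= 1
    }
    // The area ratio circle / square is pi / 4, so scale the hit ratio by 4.
    4.0 * inside / samples
  }
}
```

Calling MonteCarloPi.estimate(1000000) should give a value close to 3.14, and more samples tighten the estimate. In a Spark job the same sampling would be split across partitions and the per-partition hit counts summed.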

1.2 Using spark-shell

1) Cluster mode

/opt/module/spark/bin/spark-shell \
--master spark://linux102:7077 \
--executor-memory 2g \
--total-executor-cores 2

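Note:
If you start the Spark shell without specifying a master address, it still starts and runs programs normally, but it is actually running in Spark's local mode. That mode starts only a single process on the local machine and does not connect to the cluster.

The Spark shell already creates a SparkContext instance and exposes it as the object sc. User code that needs a SparkContext can use sc directly.

2) Test

Here I use a word count as the simplest possible test.

Create hello.txt: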

hello world
linux linux
hadoop
spark
hello world
linux linux
hadoop
spark
hello world
linux linux
hadoop
spark

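Upload it to the Linux cluster (the code below reads it from the HDFS root directory).

Write the code in Scala (display the result only)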

sc.textFile("hdfs://linux102:9001/hello.txt").flatMap(_.split(" ")).map((_,1)).reduceByKey(_+_).collect
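
To show what each step of that one-liner does, here is a rough sketch of the same pipeline on plain Scala collections, with no Spark or HDFS involved. The object LocalWordCount and its members are illustrative assumptions; groupBy followed by a sum stands in for reduceByKey, which is not available on ordinary collections.

```scala
// Illustrative sketch only: plain Scala collections stand in for the RDD,
// and the object and member names are assumptions.
object LocalWordCount {
  // The contents of hello.txt: the same four lines repeated three times.
  val sample: Seq[String] =
    Seq.fill(3)(Seq("hello world", "linux linux", "hadoop", "spark")).flatten

  def count(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(_.split(" ")) // split every line into words
      .map((_, 1))           // pair each word with a count of 1
      .groupBy(_._1)         // collect the pairs that share a word
      .map { case (word, pairs) => word -> pairs.map(_._2).sum } // add up the 1s per word
}
```

For the sample file, LocalWordCount.count(LocalWordCount.sample) yields hello 3, world 3, linux 6, hadoop 3 and spark 3.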

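Result: collect returns the (word, count) pairs to the shell as an array, in no guaranteed order.

Write the code in Scala (save the result to HDFS)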

sc.textFile("hdfs://linux102:9001/hello.txt").flatMap(_.split(" ")).
  map((_, 1)).reduceByKey(_ + _).saveAsTextFile("hdfs://linux102:9001/out")
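
saveAsTextFile writes one line per element using the element's toString, so each (word, count) tuple ends up as a line such as (hello,3). The sketch below illustrates that format in plain Scala and shows one way to parse such a line back into a pair. The object SavedLineFormat and its methods are illustrative assumptions, not part of Spark.

```scala
// Illustrative sketch only: the names are assumptions, and no Spark or HDFS
// access happens here. It only mimics the text format of the saved tuples.
object SavedLineFormat {
  // One output line per element, produced by the tuple's toString.
  def toLines(counts: Seq[(String, Int)]): Seq[String] =
    counts.map(_.toString)

  // Parses a line such as "(hello,3)" back into a (word, count) pair.
  // The count is taken from after the last comma, so words may contain commas.
  def parse(line: String): (String, Int) = {
    val body = line.stripPrefix("(").stripSuffix(")")
    val idx = body.lastIndexOf(',')
    (body.substring(0, idx), body.substring(idx + 1).toInt)
  }
}
```

A parser like this could be useful if the saved files are later read back as text, since the lines come back as strings rather than tuples.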

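View the result

1 Output folder: saveAsTextFile creates the directory /out on HDFS, with the data in part-* files.

2 Result: each line of a part file holds one (word,count) pair.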

Spark from Beginner to Mastery (Beginner): Section 3, Spark shell