Spark's three operating modes: Local mode

1. Mode overview

Local mode runs Spark on a single machine and is usually used for local practice and testing. The Master can be set in the following ways:

(1) local: all computation runs in a single thread, with no parallelism; this mode is usually used for running small test code or practicing on the local machine;

(2) local[K]: runs the computation with K worker threads. For example, local[4] runs 4 worker threads. Ideally K matches the number of cores of your CPU, so as to make full use of its computing power;

(3) local[*]: automatically sets the number of threads to the number of cores of the CPU.
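A master URL can also be set programmatically when building the SparkContext. The following is a minimal sketch (the class name LocalModeDemo is made up for illustration; it assumes Spark 2.1.x on the classpath):

import org.apache.spark.{SparkConf, SparkContext}

object LocalModeDemo {
  def main(args: Array[String]): Unit = {
    // "local[2]" runs two worker threads inside this JVM;
    // "local" would use one thread, "local[*]" one thread per CPU core.
    val conf = new SparkConf().setAppName("LocalModeDemo").setMaster("local[2]")
    val sc = new SparkContext(conf)

    // Quick sanity check: sum the numbers 1..100 in parallel (prints 5050.0).
    println(sc.parallelize(1 to 100).sum())

    sc.stop()
  }
}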

2. Installation and use

(1) Upload and extract the Spark installation package

[atguigu@hadoop102 software]$ tar -zxvf spark-2.1.1-bin-hadoop2.7.tgz -C /opt/module/

[atguigu@hadoop102 module]$ mv spark-2.1.1-bin-hadoop2.7 spark

(2) Run the official Pi example

[atguigu@hadoop102 spark]$ bin/spark-submit \
--class org.apache.spark.examples.SparkPi \
--executor-memory 1G \
--total-executor-cores 2 \
./examples/jars/spark-examples_2.11-2.1.1.jar \
100

Basic syntax:

bin/spark-submit \
--class <main-class> \
--master <master-url> \
--deploy-mode <deploy-mode> \
--conf <key>=<value> \
... # other options
<application-jar> \
[application-arguments]

Parameter Description:

--master: specifies the master URL; the default is local[*]
--class: the entry point (main class) of your application (e.g. org.apache.spark.examples.SparkPi)
--deploy-mode: whether to launch the driver on a worker node (cluster) or locally as an external client (client). Default: client
--conf: an arbitrary Spark configuration property in key=value format. If the value contains spaces, wrap it in quotes: "key=value"
application-jar: the packaged application jar, including dependencies. The URL must be globally visible inside the cluster, e.g. an hdfs:// path on shared storage; for a file:// path, the same jar must exist at that path on every node
application-arguments: the arguments passed to the main() method
--executor-memory 1G: gives each executor 1 GB of memory
--total-executor-cores 2: sets the total number of CPU cores used by all executors to 2
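To see how --conf values and application-arguments reach your code, here is a minimal sketch (the class name ConfEcho and the key spark.myapp.greeting are made up for illustration):

import org.apache.spark.SparkContext

object ConfEcho {
  def main(args: Array[String]): Unit = {
    // The master and app name are supplied by spark-submit, not hard-coded here.
    val sc = new SparkContext()
    // A value submitted with --conf spark.myapp.greeting=hello (hypothetical key)
    // can be read from SparkConf; the second argument is the fallback default.
    val greeting = sc.getConf.get("spark.myapp.greeting", "default")
    println(s"greeting = $greeting, args = ${args.mkString(", ")}")
    sc.stop()
  }
}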

(3) Display of results
The example estimates the value of Pi using a Monte Carlo method.
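For reference, the core of the Monte Carlo idea can be sketched in a few lines of spark-shell code (a simplified illustration, not the source of the bundled SparkPi example):

// Sample random points in the square [-1,1] x [-1,1]; the fraction that
// falls inside the unit circle approximates Pi/4.
val n = 1000000
val inside = sc.parallelize(1 to n).map { _ =>
  val x = math.random * 2 - 1
  val y = math.random * 2 - 1
  if (x * x + y * y <= 1) 1 else 0
}.reduce(_ + _)
println(s"Pi is roughly ${4.0 * inside / n}")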
(4) Prepare input files

[atguigu@hadoop102 spark]$ mkdir input

Create two files, 1.txt and 2.txt, under input, each with the following content:

hello atguigu
hello spark

(5) Start spark-shell

[atguigu@hadoop102 spark]$ bin/spark-shell
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
18/09/29 08:50:52 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
18/09/29 08:50:58 WARN ObjectStore: Failed to get database global_temp, returning NoSuchObjectException
Spark context Web UI available at http://192.168.9.102:4040
Spark context available as 'sc' (master = local[*], app id = local-1538182253312).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/ '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.1.1
      /_/

Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_144)
Type in expressions to have them evaluated.
Type :help for more information.
scala>

Open another terminal window:

[atguigu@hadoop102 spark]$ jps
3627 SparkSubmit
4047 Jps

You can visit hadoop102:4040 in a browser to view the running program.
(6) Run the WordCount program

scala> sc.textFile("input").flatMap(_.split(" ")).map((_,1)).reduceByKey(_+_).collect
res0: Array[(String, Int)] = Array((hello,4), (atguigu,2), (spark,2))
scala>

You can visit hadoop102:4040 again to view this job's execution.

3. Submission process

The important roles in the submission process:

(1) Driver
The Spark driver is the process that runs the main method of the application. It executes the user code that creates the SparkContext, creates RDDs, and performs RDD transformations and actions. If you use the spark-shell, a driver program is launched automatically in the background when the shell starts: it is the SparkContext object preloaded as sc. When the driver program terminates, the Spark application ends. The driver is mainly responsible for:

1) Converting the user program into tasks (see the sketch after this list)
2) Tracking the running status of the Executors
3) Scheduling tasks onto Executor nodes
4) Displaying the application's running status in the UI
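A small spark-shell sketch of point 1): transformations are lazy, so the driver only turns the program into tasks when an action is invoked (the resulting job shows up in the 4040 Web UI):

val data = sc.parallelize(1 to 10)
val doubled = data.map(_ * 2) // no job yet: only the RDD lineage is recorded
doubled.count()               // action: the driver schedules tasks on executors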

(2) Executor
A Spark Executor is a worker process responsible for running the tasks in Spark jobs; the tasks are independent of each other. Executors are started when the Spark application starts and exist for the entire lifetime of the application. If an Executor node fails or crashes, the Spark application can continue to execute: tasks on the failed node are rescheduled onto other Executor nodes. Executors are mainly responsible for:

1) Running the tasks that make up the Spark application and returning results to the driver process;
2) Providing in-memory storage, through their own Block Manager, for RDDs that the user program asks to cache (see the sketch after this list). Because RDDs are cached inside the Executor process, tasks can make full use of the cached data to speed up computation at runtime.
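A small spark-shell sketch of point 2): cache() asks the executors' Block Managers to keep the RDD's partitions in memory after the first computation:

val words = sc.textFile("input").flatMap(_.split(" "))
words.cache()            // mark for caching; nothing is computed yet
println(words.count())   // first action computes the RDD and caches it
println(words.count())   // second action reuses the cached blocks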

4. Data flow

textFile("input"): Read the input folder data of the local file;

flatMap(_.split(" ")): Flattening operation, mapping a line of data into words according to the space separator;

map((_,1)): Operate on each element and map words to tuples;

reduceByKey(_+_): Aggregate and add values ​​according to key;

collect: Collect the data to the Driver side for display.

WordCount case study:
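As a recap, the one-line WordCount above can be unpacked into named steps; this is a sketch meant to be run in spark-shell (the variable names are only for illustration):

val lines  = sc.textFile("input")         // RDD[String]: one element per line
val words  = lines.flatMap(_.split(" "))  // RDD[String]: one element per word
val pairs  = words.map((_, 1))            // RDD[(String, Int)]: (word, 1) pairs
val counts = pairs.reduceByKey(_ + _)     // RDD[(String, Int)]: counts summed per key
counts.collect().foreach(println)         // bring the results to the Driver and print them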


Origin: blog.csdn.net/weixin_43520450/article/details/108579683