Silicon Valley Big Data Spark (latest 2019 edition)

Chapter One:

 

Four. Local mode

local: run with a single thread

local[k]: run with k threads

local[*]: run with as many threads as there are CPU cores
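
As an illustration (not part of the original notes), the same master setting can also be passed programmatically when building a SparkConf; the application name below is just a placeholder:

import org.apache.spark.{SparkConf, SparkContext}

object LocalModeExample {
  def main(args: Array[String]): Unit = {
    // "local[*]" runs Spark locally with one thread per CPU core;
    // "local[2]" would use 2 threads and "local" a single thread.
    val conf = new SparkConf()
      .setAppName("LocalModeExample") // placeholder name, not from the notes
      .setMaster("local[*]")
    val sc = new SparkContext(conf)

    // Sanity check: sum the numbers 1..100 in parallel (should print 5050.0).
    println(sc.parallelize(1 to 100).sum())

    sc.stop()
  }
}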

 

Five. Using Spark

1. bin/spark-submit is used to submit a job

Its parameters are as follows:

--master: the master URL to connect to; the default is local
--class: the entry class of your application (e.g. org.apache.spark.examples.SparkPi)
--deploy-mode: whether to launch the driver on a worker node (cluster) or locally as an external client (client) (default: client)
--conf: an arbitrary Spark configuration property in key=value format; if the value contains spaces, wrap the whole thing in quotes: "key=value"
application-jar: the path to the packaged application JAR, including dependencies. The URL must be globally visible inside the cluster, e.g. an hdfs:// path on shared storage, or a file:// path where the same JAR exists on every node
application-arguments: arguments passed to the main() method of the main class
--executor-memory 1G: give each executor 1G of memory
--total-executor-cores 2: use 2 CPU cores in total across all executors

For example, run:

bin/spark-submit \
--class org.apache.spark.examples.SparkPi \
--executor-memory 1G \
--total-executor-cores 2 \
./examples/jars/spark-examples_2.11-2.1.1.jar \
100

 

2. bin/spark-shell enters the interactive command-line environment; by default it creates several useful objects for you, such as sc (the SparkContext)

The jps Java command shows the running JVM processes

The spark-shell prompt prints the web UI URL, e.g. hadoop102:4040, where you can watch the running state of the program on the Spark Jobs page

yarn application -list shows the application IDs
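
For instance, a quick check that sc is ready can be typed directly at the spark-shell prompt (a minimal sketch, not from the original notes):

// sc is already created by spark-shell, no setup needed
val nums = sc.parallelize(1 to 100)
nums.count()   // 100
nums.sum()     // 5050.0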

 

Six. WordCount program

1. Load the text file

2. Flat-map each line into words

3. Group by word

4. Aggregate the counts

5. Print the result (see the sketch below)
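
A minimal Scala sketch of these five steps, as it could be typed into spark-shell (the input path is an assumption, not from the original notes):

// 1. Load: read the text file (placeholder path)
val lines = sc.textFile("hdfs://hadoop102:9000/input/words.txt")
// 2. Flat-map: split every line into words
val words = lines.flatMap(_.split(" "))
// 3. Group: group the occurrences by word
val grouped = words.groupBy(word => word)
// 4. Aggregate: count the occurrences in each group
val wordCounts = grouped.map { case (word, occurrences) => (word, occurrences.size) }
// 5. Print: bring the result back to the driver and print it
wordCounts.collect().foreach(println)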

 
