Chapter One:
Four. Local mode
Runs the whole application on the local machine only.
local[k] means run locally with k worker threads.
local[*] means run locally with as many worker threads as the machine has cores.
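As a minimal configuration sketch (the app name "demo" is an arbitrary placeholder), the master string above is what you would pass to SparkConf when creating a context in local mode:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Run locally with 2 worker threads; use "local[*]" for all cores.
val conf = new SparkConf().setAppName("demo").setMaster("local[2]")
val sc   = new SparkContext(conf)
```

The same master value can instead be supplied on the command line with --master local[2], which overrides the code.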
Five. Using Spark
1. bin/spark-submit can be used to submit a job.
Its parameters are as follows:
--master: the master URL to connect to (default: local)
--class: the entry class of your application (e.g. org.apache.spark.examples.SparkPi)
--deploy-mode: whether to deploy your driver on the worker nodes (cluster) or locally as an external client (client) (default: client)
--conf: an arbitrary Spark configuration property, in key=value format; if the value contains spaces, wrap the whole thing in quotes: "key=value"
application-jar: path to a bundled jar containing your application and all its dependencies; the URL must be globally visible inside the cluster, e.g. an hdfs:// path on shared storage, or a file:// path that exists on every node
application-arguments: arguments passed to the main() method of your main class
--executor-memory 1G: give each executor 1 GB of available memory
--total-executor-cores 2: use 2 CPU cores in total across all executors
For example, run:
bin/spark-submit \
--class org.apache.spark.examples.SparkPi \
--executor-memory 1G \
--total-executor-cores 2 \
./examples/jars/spark-examples_2.11-2.1.1.jar \
100
2. bin/spark-shell opens an interactive command-line environment in which many useful objects are created for you by default, such as the SparkContext sc.
Use the jps command to view the running Java processes.
The spark-shell startup output prints a web UI URL, e.g. hadoop102:4040; open it in a browser to see the running state of the program on the Spark Jobs page.
yarn application -list shows the application ids of jobs running on YARN.
Six. WordCount program
1. Load the input data
2. flatMap: split each line into words
3. Group by word
4. Aggregate: sum the counts per word
5. Print the result
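The five steps above can be sketched with plain Scala collections, which share the flatMap/map API with Spark RDDs; in spark-shell the same chain of calls would start from sc.textFile("input") instead of the hard-coded sample lines used here.

```scala
// A minimal WordCount sketch using Scala collections in place of RDDs.
object WordCount {
  def wordCount(lines: Seq[String]): Map[String, Int] = {
    val words  = lines.flatMap(_.split(" "))   // 2. flatMap: lines -> words
    val pairs  = words.map(w => (w, 1))        // pair each word with a count of 1
    val groups = pairs.groupBy(_._1)           // 3. group by word
    groups.map { case (w, ps) => (w, ps.map(_._2).sum) } // 4. aggregate: sum counts
  }

  def main(args: Array[String]): Unit = {
    val lines = Seq("hello spark", "hello world") // 1. load (stand-in for a file)
    wordCount(lines).foreach(println)             // 5. print
  }
}
```

In Spark itself, steps 3 and 4 are usually fused into a single reduceByKey(_ + _) call, which aggregates within each partition before shuffling.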