Spark development environment setup
1. Install Spark
2. Word frequency statistics case
3. Scala development environment configuration
1. Install Spark
1.1 Download and unzip
Official download address: http://spark.apache.org/downloads.html. Select the Spark version and the corresponding Hadoop version before downloading:
Unzip the installation package:
# tar -zxvf spark-2.2.3-bin-hadoop2.6.tgz
1.2 Configure environment variables
# vim /etc/profile
Add environment variables:
export SPARK_HOME=/usr/app/spark-2.2.3-bin-hadoop2.6
export PATH=${SPARK_HOME}/bin:$PATH
Make the configured environment variables take effect immediately:
# source /etc/profile
1.3 Local mode
Local mode is the simplest way to run Spark: a single node with multiple threads, no deployment required, ready out of the box, and well suited to everyday development and testing.
# start spark-shell
spark-shell --master local[2]
- local: start only one worker thread;
- local[k]: start k worker threads;
- local[*]: start as many worker threads as there are CPU cores.
After entering spark-shell, a SparkContext has already been created automatically, which is equivalent to executing the following Scala code:
val conf = new SparkConf().setAppName("Spark shell").setMaster("local[2]")
val sc = new SparkContext(conf)
2. Word frequency statistics case
After the installation is complete, you can try a simple word-frequency example to get a feel for Spark. Prepare a sample file for the word count, wc.txt, with the following content:
hadoop,spark,hadoop
spark,flink,flink,spark
hadoop,hadoop
Execute the following Scala statements in the interactive shell:
val file = spark.sparkContext.textFile("file:///usr/app/wc.txt")
val wordCounts = file.flatMap(line => line.split(",")).map(word => (word, 1)).reduceByKey(_ + _)
wordCounts.collect
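To see what these RDD transformations do, the same pipeline can be sketched on plain Scala collections, with no Spark installation required (this is an illustrative analogy, not Spark code: `groupBy` plus a sum plays the role of `reduceByKey`):

```scala
// Word count over plain Scala collections, mirroring the RDD pipeline above.
// The lines below reproduce the contents of wc.txt.
val lines = Seq("hadoop,spark,hadoop", "spark,flink,flink,spark", "hadoop,hadoop")

val wordCounts = lines
  .flatMap(line => line.split(","))   // split each line into words
  .map(word => (word, 1))             // pair each word with a count of 1
  .groupBy(_._1)                      // like reduceByKey: group pairs by word...
  .map { case (word, pairs) => (word, pairs.map(_._2).sum) } // ...and sum the counts

// wordCounts is Map(hadoop -> 4, spark -> 3, flink -> 2)
```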
The execution proceeds as follows; you can see that the word-frequency results have been output:
You can also monitor the job's execution through the Web UI, which listens on port 4040:
3. Scala development environment configuration
Spark itself is developed in Scala and provides Scala, Java, and Python APIs. To develop in Scala, you first need to set up a Scala development environment.
3.1 Prerequisites
Scala runs on the JVM, so the corresponding version of the JDK must be installed on your machine. The latest Scala 2.12.x requires JDK 1.8+.
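A quick way to confirm which JDK your Scala tools will run on is to read the standard JVM system property (a minimal sketch; `java.version` is a standard JVM property):

```scala
// Print the version of the JVM that Scala is running on.
// Scala 2.12.x requires this to be 1.8 or higher.
val javaVersion = System.getProperty("java.version")
println(s"Running on JDK $javaVersion")
```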
3.2 Install the Scala plugin
IDEA does not support Scala development by default; support is added through a plugin. Open IDEA, go to File => Settings => Plugins, and search for the Scala plugin (as shown below). After finding the plugin, install it and restart IDEA for the installation to take effect.
3.3 Create Scala Project
In IDEA, click File => New => Project, then select Scala => IDEA project:
3.4 Download Scala SDK
1. Method One
At this point the Scala SDK field is empty. Click Create => Download, select the version you want, and click the OK button to download it. When the download completes, click Finish to enter the project.
2. Method two
The first method is the one used in the official Scala installation guide, but the download is usually slow, and it does not provide the Scala command-line tools. I therefore personally recommend downloading the installation package from the official website instead. Download address: https://www.scala-lang.org/download/
My system is Windows. After downloading the msi installer, just keep clicking Next to install; the environment variables are configured automatically when the installation completes. Because the environment variables are already set, IDEA will automatically pick up the corresponding version of the SDK.
3.5 Create Hello World
Right-click the src directory of the project and choose New => Scala Class to create Hello.scala. Enter the code, and click the run button when you are done. If it runs successfully, the environment is set up correctly.
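A minimal Hello.scala for this check might look like the following (a sketch; the object and message names are arbitrary):

```scala
// Hello.scala: minimal program to verify the Scala setup in IDEA.
object Hello {
  val message = "Hello World"

  def main(args: Array[String]): Unit = {
    println(message)
  }
}
```

Running it should print "Hello World" in the IDEA console.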
3.6 Switch Scala version
In day-to-day development, switching the version of related software (e.g. Spark) may require switching the Scala version as well. You can do this on the Global Libraries tab of Project Structure.
3.7 Possible problems
Sometimes, after reopening the project in IDEA, the New Scala Class option no longer appears in the right-click menu, or Scala syntax hints stop working while you type. In that case, you can delete the configured SDK under Global Libraries and add it again:
In addition, running a Spark project in local mode in IDEA does not require Spark or Hadoop to be installed on the machine.