Written for Busy People Series: Spark Development Environment Setup

Spark development environment setup

1. Install Spark

2. Word frequency statistics case

3. Scala development environment configuration

1. Install Spark

1.1 Download and unzip

The official download address is http://spark.apache.org/downloads.html. Select the Spark version and the corresponding Hadoop version before downloading.
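
If you prefer the command line, the package can also be downloaded directly; the URL below assumes the Spark 2.2.3 / Hadoop 2.6 build used in this article and points at the Apache release archive:

# wget https://archive.apache.org/dist/spark/spark-2.2.3/spark-2.2.3-bin-hadoop2.6.tgz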


Unzip the installation package:

# tar -zxvf spark-2.2.3-bin-hadoop2.6.tgz

1.2 Configure environment variables

# vim /etc/profile

Add environment variables:

export SPARK_HOME=/usr/app/spark-2.2.3-bin-hadoop2.6
export PATH=${SPARK_HOME}/bin:$PATH

Make the configured environment variables take effect immediately:

# source /etc/profile
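
A quick way to verify that the variables are in effect is to print the Spark version, which also confirms that the spark-shell script is on the PATH:

# spark-shell --version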

1.3 Local mode

Local mode is the simplest way to run Spark. It runs as a single node with multiple threads, requires no deployment, works out of the box, and is suitable for day-to-day testing and development.

# Start spark-shell
spark-shell --master local[2]
  • local: start only one worker thread;
  • local[k]: start k worker threads;
  • local[*]: start as many worker threads as there are CPU cores.


After entering spark-shell, a SparkContext has already been created automatically (available as sc; in Spark 2.x a SparkSession is also available as spark), which is equivalent to executing the following Scala code:

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf().setAppName("Spark shell").setMaster("local[2]")
val sc = new SparkContext(conf)
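
You can confirm this inside spark-shell by evaluating the two variables; the object addresses shown below are illustrative:

scala> sc
res0: org.apache.spark.SparkContext = org.apache.spark.SparkContext@6e1d7548

scala> spark
res1: org.apache.spark.sql.SparkSession = org.apache.spark.sql.SparkSession@3a1d2bc5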

2. Word frequency statistics case

After the installation is complete, you can run a simple word-count example to get a feel for Spark. Prepare a sample file wc.txt for the word count, with the following content:

hadoop,spark,hadoop
spark,flink,flink,spark
hadoop,hadoop

Execute the following Scala statements in the spark-shell interactive command line:

val file = spark.sparkContext.textFile("file:///usr/app/wc.txt")
val wordCounts = file.flatMap(line => line.split(",")).map(word => (word, 1)).reduceByKey(_ + _)
wordCounts.collect

After the statements run, you can see that the word-frequency results have been output.

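For reference, with the wc.txt above the counts work out to hadoop 4, spark 3, and flink 2, so the collected result looks like the following (element order may vary between runs):

res0: Array[(String, Int)] = Array((spark,3), (hadoop,4), (flink,2))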

At the same time, you can view the job's execution through the Web UI, which listens on port 4040.


3. Scala development environment configuration

Spark is developed in the Scala language and provides APIs for Scala, Java, and Python. To develop in Scala, you need to set up a Scala development environment.

3.1 Prerequisites

Scala runs on the JDK, so the corresponding JDK version must be installed on your machine. The latest Scala 2.12.x requires JDK 1.8+.
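
You can check which JDK version is installed with:

# java -version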

3.2 Install the Scala plugin

IDEA does not support Scala development out of the box; support is added through a plugin. Open IDEA and go to File => Settings => Plugins, search for the Scala plugin (as shown below), install it, and restart IDEA for the installation to take effect.


3.3 Create Scala Project

In IDEA, click File => New => Project, then select Scala => IDEA to create a Scala project:


3.4 Download Scala SDK

1. Method One

At this point the Scala SDK field is empty. Click Create => Download, select the version you want, and click OK to download it. When the download finishes, click Finish to enter the project.


2. Method Two

Method one is the approach used in the official Scala installation guide, but the download is usually slow, and this kind of installation does not provide the Scala command-line tools. I therefore personally recommend downloading the installation package from the official website and installing it instead. Download address: https://www.scala-lang.org/download/

My system is Windows; after downloading the msi installer, I just kept clicking Next to install it. Once installation completes, the environment variables are configured automatically.


Since the environment variables were configured automatically during installation, IDEA will automatically select the corresponding version of the SDK.

3.5 Create Hello World

Right-click the src directory in the project and choose New => Scala Class to create Hello.scala. Enter the code below, then click the run button. If it runs successfully, the environment is set up correctly.
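
The code from the original screenshot is not reproduced here; a minimal Hello.scala that serves the same purpose is:

object Hello {
  def main(args: Array[String]): Unit = {
    // Print a line to verify that the Scala environment works
    println("Hello, Scala!")
  }
}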

3.6 Switch Scala version

In day-to-day development, a change in the version of related software (e.g. Spark) may require switching the Scala version. You can do this on the Global Libraries tab of Project Structure.


3.7 Possible problems

Sometimes after reopening a project in IDEA, the option to create a new Scala file no longer appears in the right-click menu, or there are no Scala syntax hints while writing code. In that case you can delete the SDK configured under Global Libraries and add it again:

In addition, running a Spark project in local mode in IDEA does not require Spark or Hadoop to be installed on the machine.
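
As a sketch of what such a project looks like, the following self-contained application runs in local mode directly from IDEA, assuming only that a matching spark-core dependency (e.g. org.apache.spark:spark-core_2.11:2.2.3) is on the project classpath:

import org.apache.spark.{SparkConf, SparkContext}

object LocalWordCount {
  def main(args: Array[String]): Unit = {
    // local[2] runs Spark inside this JVM with two worker threads,
    // so no separate Spark or Hadoop installation is required
    val conf = new SparkConf().setAppName("LocalWordCount").setMaster("local[2]")
    val sc = new SparkContext(conf)

    val counts = sc.parallelize(Seq("hadoop,spark,hadoop", "spark,flink,flink,spark"))
      .flatMap(_.split(","))
      .map(word => (word, 1))
      .reduceByKey(_ + _)

    counts.collect().foreach(println)
    sc.stop()
  }
}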

