Configuring Spark and Scala on Windows 10

In this post I set up a Spark runtime environment for learning the command line; at the end, a few lines of simple code are run to verify that it works.

 

0. JDK, Scala and Spark versions

 

Use the versions shown on the official website. One thing worth emphasizing: so far Spark does not support JDK 11, only JDK 8 (jdk1.8). With the wrong JDK version, code whose classes and functions are perfectly normal, with no mistakes at all, will still report errors when you run it. The versions I downloaded are Spark 2.4.3 and Scala 2.11.12, as shown above, with Java 1.8.
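Once spark-shell is running (step 4 below), you can confirm which versions are actually in use. A minimal check, using only standard Scala and Spark APIs:

// print the Spark, Scala and Java versions from inside spark-shell
spark.version
scala.util.Properties.versionString
System.getProperty("java.version")   // should start with 1.8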

 

 

1. Installation environment

On 64-bit Windows 10, I have already installed the JDK and Scala and set the JAVA_HOME, SCALA_HOME and PATH environment variables. Entering scala -version and java -version in cmd now prints the corresponding version of each.

2. Install Spark

Download the matching version of the archive from the official website, http://spark.apache.org/downloads.html, extract it to a local directory, and set the environment variable.

Download: (screenshot)

Decompression: (screenshot)

Set the environment variable:

Set SPARK_HOME to the directory you extracted Spark into, and add %SPARK_HOME%\bin to PATH.
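Once spark-shell starts (step 4), a quick way to confirm the variable is visible is to read it from the Scala side; sys.env is the standard Scala API for environment variables:

// SPARK_HOME as seen from inside spark-shell
sys.env.get("SPARK_HOME")   // should be Some(<your Spark directory>)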

 

3. Configure Hadoop

Likewise, download the archive, extract it, and add the environment variables. Note that the Hadoop version must correspond to your Spark build; download it from the official website, http://hadoop.apache.org/releases.html

Environment variables:

Set HADOOP_HOME to the directory you extracted Hadoop into, and add %HADOOP_HOME%\bin to PATH.
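From spark-shell you can likewise check that HADOOP_HOME is visible and print the Hadoop version on Spark's classpath. This is a sketch, assuming the Spark download bundles the Hadoop client libraries (which provide org.apache.hadoop.util.VersionInfo):

// HADOOP_HOME as seen by the shell, and the Hadoop version Spark was built with
sys.env.get("HADOOP_HOME")
org.apache.hadoop.util.VersionInfo.getVersion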

 

4. Test

After that, you can enter the interactive Spark command line with spark-shell and run some simple test code, such as:

Exercise 1:

// create an RDD by parallelizing a local collection
val rdd1 = sc.parallelize(List(5, 6, 4, 7, 3, 8, 2, 9, 1, 10))

// multiply every element of rdd1 by 2, then sort in ascending order
val rdd2 = rdd1.map(_ * 2).sortBy(x => x, true)

// keep only the elements greater than or equal to 10
val rdd3 = rdd2.filter(_ >= 10)

// return the elements to the driver as an array
rdd3.collect
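With the input above, rdd3.collect should return Array(10, 12, 14, 16, 18, 20). As a further quick check, here is a small word count; a sketch that only uses the sc handle spark-shell already provides:

// split two short strings into words, count each word, and collect the result
val counts = sc.parallelize(List("spark scala", "spark hadoop"))
  .flatMap(_.split(" "))
  .map(word => (word, 1))
  .reduceByKey(_ + _)
counts.collect   // e.g. Array((scala,1), (hadoop,1), (spark,2)), order may vary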

 

Reference:

https://blog.csdn.net/songhaifengshuaige/article/details/79480491
