Building a Spark runtime environment with Eclipse (stand-alone)

Being used to Eclipse, I learned Spark by following this article to build a stand-alone environment. Once this single-machine environment is up, I believe building a cluster environment afterwards will be easy too...

  • Download Scala locally

  • Download Spark locally

  • Download the Scala plugin for Eclipse locally

There are plenty of online tutorials covering the steps above.


Building process

My build process differs a bit from the article linked above; after all, that one is getting old...

  • Step 1: After setting up the environment as above, create a Scala project.

  • Step 2: In this project, create a Scala Object; in Scala, execution is based on an Object, which works roughly like a method in Python. Scala's syntax always looks strange until you get to know it... (See the minimal sketch after this list.)

  • Step 3: Import the relevant Spark packages (as shown); to avoid any problems, I simply imported them all.
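To make Step 2 concrete, here is a minimal sketch of a runnable Scala Object (the name Hello is just a placeholder): an object with a main method is Scala's entry point, playing roughly the role a top-level function plays in a Python script.

// A minimal Scala Object: `object` plus a `main` method gives a runnable entry point.
object Hello {
  def main(args: Array[String]): Unit = {
    println("Hello from a Scala Object")
  }
}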


If a small red cross appears on the project after it is created, right-click Scala Library Container, select Properties, and choose an earlier Scala version (as shown); that resolves the small cross. If one version does not work, try another. (My default was 2.12; switching to 2.11 made the cross disappear. Spark 2.2.0 is built and distributed against Scala 2.11 by default.)
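If you are unsure which Scala version the project actually compiles and runs against, a quick check is to print the runtime's version string (scala.util.Properties.versionString is a standard-library call; the object name below is just a placeholder). Paired with Spark 2.2.0, it should report a 2.11.x version.

// Prints the Scala version the program is running on, e.g. "version 2.11.12".
object ScalaVersionCheck {
  def main(args: Array[String]): Unit = {
    println(scala.util.Properties.versionString)
  }
}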


Test

  • Building the environment is that simple; now let's test it. Enter the following code in the project just created, paying attention to the notes that follow.
      
      
      
      
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext

object First_Spark {
  def main(args: Array[String]): Unit = {
    // Run locally, with the app name shown in the Spark UI.
    val conf = new SparkConf()
    conf.setAppName("First Spark")
    conf.setMaster("local")
    val sc = new SparkContext(conf)
    // The following is the location of the file
    val lines = sc.textFile("/Users/junjieliu/Downloads/README.md", 1)
    // Split each line into words, pair each word with 1, then sum the counts per word.
    val words = lines.flatMap { line => line.split(" ") }
    val pairs = words.map { word => (word, 1) }
    val wordcount = pairs.reduceByKey(_ + _)
    wordcount.foreach(pair => println(pair._1 + ":" + pair._2))
    sc.stop()
  }
}

  • To run, click Run (as shown).


  • Test results:

Doesn't that look a lot simpler than Hadoop MapReduce? Haha.

  • Supplement:

When compiling Spark code in Eclipse, you should add a call such as println at the end to make sure the result is actually output. Why emphasize this? Because when running code in a terminal, results are usually printed without any explicit println call. Keep this in mind; otherwise it is easy to run into trouble when running code in Eclipse, and I believe most people would be as clueless about it as I once was. Now you have a lead...
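A minimal sketch of the difference (the object name PrintDemo and the tiny in-memory dataset are placeholders for illustration): in the spark-shell REPL, the value of the last expression is echoed automatically, but in a compiled program run from Eclipse nothing appears unless you print it yourself.

import org.apache.spark.SparkConf
import org.apache.spark.SparkContext

object PrintDemo {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("PrintDemo").setMaster("local"))
    val counts = sc.parallelize(Seq("a", "b", "a")).map(word => (word, 1)).reduceByKey(_ + _)

    counts.collect()                  // In spark-shell the REPL would echo this result;
                                      // in a compiled program it is silently discarded.
    counts.collect().foreach(println) // In Eclipse you must print explicitly to see output.
    sc.stop()
  }
}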

  • As shown:


These are really things you have to discover through your own practice; I only noticed this problem after consulting some references... There are not many tutorials online about compiling Spark with Eclipse the way I do here; from what I have seen, most people use Maven to set up a Spark build environment.

The most important point is that when compiling code in Eclipse, it is hard to tell whether the output is correct... Even though ordinary compilation gives some error hints, it still feels a little unfriendly to beginners...

One more note: when running in Eclipse, a progress indicator appears for a while in the lower right corner; attentive readers may notice it, meaning the (correct) result takes a little time to appear. So in Eclipse you may need to run the program more than once (things only warm up after the first run, and generally running it twice gets you the correct result). If the result is still wrong, the biggest problem is probably in the code...
