Chapter 4: Developing Applications in the IDEA Environment
The spark shell is used mainly for testing and validating programs. In a production environment, the program is usually written and compiled in an IDE, packaged into a jar, and then submitted to the cluster. The most common approach is to create a Maven project and use Maven to manage the jar dependencies.
4.1 Writing the WordCount Program in IDEA
1) Create a Maven project named WordCount and add the dependencies
<dependencies>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.11</artifactId>
        <version>2.1.1</version>
    </dependency>
</dependencies>
<build>
    <finalName>WordCount</finalName>
    <plugins>
        <plugin>
            <groupId>net.alchim31.maven</groupId>
            <artifactId>scala-maven-plugin</artifactId>
            <version>3.2.2</version>
            <executions>
                <execution>
                    <goals>
                        <goal>compile</goal>
                        <goal>testCompile</goal>
                    </goals>
                </execution>
            </executions>
        </plugin>
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-assembly-plugin</artifactId>
            <version>3.0.0</version>
            <configuration>
                <archive>
                    <manifest>
                        <!-- Modify this to the fully qualified main class, e.g. com.atguigu.WordCount -->
                        <mainClass>WordCount</mainClass>
                    </manifest>
                </archive>
                <descriptorRefs>
                    <descriptorRef>jar-with-dependencies</descriptorRef>
                </descriptorRefs>
            </configuration>
            <executions>
                <execution>
                    <id>make-assembly</id>
                    <phase>package</phase>
                    <goals>
                        <goal>single</goal>
                    </goals>
                </execution>
            </executions>
        </plugin>
    </plugins>
</build>
2) Write the code
package com.atguigu

import org.apache.spark.{SparkConf, SparkContext}

object WordCount {
  def main(args: Array[String]): Unit = {
    // Create SparkConf and set the app name
    val conf = new SparkConf().setAppName("WC")
    // Create SparkContext; this object is the entry point for submitting a Spark app
    val sc = new SparkContext(conf)
    // Use sc to create the RDD and execute the corresponding transformations and actions
    sc.textFile(args(0))
      .flatMap(_.split(" "))
      .map((_, 1))
      .reduceByKey(_ + _, 1)
      .sortBy(_._2, false)
      .saveAsTextFile(args(1))
    sc.stop()
  }
}
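Before packaging, the pipeline logic can be sanity-checked on a small in-memory dataset instead of a file. The following is a minimal sketch (not part of the original steps): the object name, sample lines, and local[*] master are illustrative only.

import org.apache.spark.{SparkConf, SparkContext}

// Minimal sanity-check sketch: runs the same word-count pipeline
// on an in-memory collection in local mode, no input file needed.
object WordCountSanityCheck {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("WC-test").setMaster("local[*]"))
    val lines = sc.parallelize(Seq("hello spark", "hello scala", "hello spark"))
    lines.flatMap(_.split(" "))
      .map((_, 1))
      .reduceByKey(_ + _)
      .sortBy(_._2, ascending = false)
      .collect()                      // small result, safe to bring to the Driver
      .foreach(println)               // expected: (hello,3), (spark,2), (scala,1)
    sc.stop()
  }
}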
3) Package the jar and test it on the cluster
bin/spark-submit \
--class com.atguigu.WordCount \
--master spark://hadoop102:7077 \
WordCount.jar \
/word.txt \
/out
4.2 Local Debugging
Debugging a Spark program locally requires local submit mode, that is, running the program entirely in the local environment, with both the Master and the Worker running locally. Breakpoints can then be added directly and hit at runtime. Proceed as follows:
When creating the SparkConf, set an additional property to indicate local execution:
val conf = new SparkConf().setAppName("WC").setMaster("local[*]")
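Putting this together, a complete locally debuggable variant of WordCount might look like the following sketch; the object name and the input path data/word.txt are placeholders, and collect() brings the results back to the Driver so a breakpoint can inspect them.

package com.atguigu

import org.apache.spark.{SparkConf, SparkContext}

// Local-debugging sketch: with local[*], Master and Worker run inside the
// IDE process, so ordinary IDEA breakpoints work anywhere in this code.
object WordCountLocal {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("WC").setMaster("local[*]")
    val sc = new SparkContext(conf)
    val counts = sc.textFile("data/word.txt")   // placeholder path to a local text file
      .flatMap(_.split(" "))
      .map((_, 1))
      .reduceByKey(_ + _)
      .sortBy(_._2, ascending = false)
      .collect()                                // a breakpoint here can inspect counts
    counts.foreach(println)
    sc.stop()
  }
}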
If the local operating system is Windows and the program uses Hadoop-related functionality, such as writing files to HDFS, the following kind of exception is encountered (typically a java.io.IOException complaining that winutils.exe cannot be located in the Hadoop binaries):
This problem is not caused by an error in the program, but by the use of Hadoop-related services. The solution is to extract the bundled hadoop-common-bin-2.7.3-x64.zip to any directory.
Then, in IDEA's Run Configuration, add a HADOOP_HOME environment variable pointing to that directory.
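As an alternative to the run-configuration variable (a common workaround, not from the original text), the hadoop.home.dir system property can be set in code before the SparkContext is created; the path below is a hypothetical extraction directory.

// Hypothetical path: wherever hadoop-common-bin-2.7.3-x64.zip was extracted.
// Must run before the SparkContext (and any HDFS access) is created.
System.setProperty("hadoop.home.dir", "C:\\hadoop-common-bin")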
4.3 Remote Debugging
Remote debugging through IDEA mainly means using IDEA as the Driver to submit the application. The configuration process is as follows:
Modify the SparkConf: add the jar package that will eventually be run and the address of the Driver program, and set the Master address for submission:
val conf = new SparkConf().setAppName("WC")
.setMaster("spark://hadoop102:7077")
.setJars(List("E:\\SparkIDEA\\spark_test\\target\\WordCount.jar"))
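For reference, a complete remote-debugging entry point assembled from these settings might look like the sketch below. The object name and HDFS URL are assumptions for illustration (adjust to the actual NameNode address), and collect() pulls the results back so a Driver-side breakpoint in IDEA can inspect them.

package com.atguigu

import org.apache.spark.{SparkConf, SparkContext}

// Remote-debugging sketch: IDEA acts as the Driver; the jar listed in setJars
// is shipped to the executors, so rebuild it (mvn package) after every code change.
object WordCountRemote {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("WC")
      .setMaster("spark://hadoop102:7077")
      .setJars(List("E:\\SparkIDEA\\spark_test\\target\\WordCount.jar"))
    val sc = new SparkContext(conf)
    sc.textFile("hdfs://hadoop102:9000/word.txt")  // assumed NameNode address, for illustration
      .flatMap(_.split(" "))
      .map((_, 1))
      .reduceByKey(_ + _)
      .sortBy(_._2, ascending = false)
      .collect()                                   // a breakpoint here runs on the Driver (IDEA)
      .foreach(println)
    sc.stop()
  }
}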
Then add breakpoints and debug directly.