4 Developing Applications in the IDEA Environment


The Spark shell is mostly used for testing and validating programs. In a production environment, the program is usually compiled in an IDE, packaged into a JAR, and then submitted to the cluster. The most common approach is to create a Maven project and use Maven to manage the JAR dependencies.

4.1 Writing the WordCount Program in IDEA

1) Create a Maven project named WordCount and add the dependencies:

<dependencies>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.11</artifactId>
        <version>2.1.1</version>
    </dependency>
</dependencies>
<build>
        <finalName>WordCount</finalName>
        <plugins>
            <plugin>
                <groupId>net.alchim31.maven</groupId>
                <artifactId>scala-maven-plugin</artifactId>
                <version>3.2.2</version>
                <executions>
                    <execution>
                        <goals>
                            <goal>compile</goal>
                            <goal>testCompile</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-assembly-plugin</artifactId>
                <version>3.0.0</version>
                <configuration>
                    <archive>
                        <manifest>
<mainClass>com.atguigu.WordCount</mainClass><!-- change to your own main class -->
                        </manifest>
                    </archive>
                    <descriptorRefs>
                        <descriptorRef>jar-with-dependencies</descriptorRef>
                    </descriptorRefs>
                </configuration>
                <executions>
                    <execution>
                        <id>make-assembly</id>
                        <phase>package</phase>
                        <goals>
                            <goal>single</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>
        </plugins>
</build>
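
With this POM in place, the JAR can be built from the project root on the command line (assuming Maven is installed and on the PATH). Because finalName is set to WordCount, the plain JAR is produced as target/WordCount.jar, and the assembly plugin additionally produces target/WordCount-jar-with-dependencies.jar:

mvn clean package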

2) Write the code:

package com.atguigu

import org.apache.spark.{SparkConf, SparkContext}

object WordCount {

  def main(args: Array[String]): Unit = {

    // create SparkConf and set the App name
    val conf = new SparkConf().setAppName("WC")

    // create SparkContext; this object is the entry point for submitting a Spark App
    val sc = new SparkContext(conf)

    // use sc to create RDDs and execute the corresponding transformations and actions
    sc.textFile(args(0)).flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _, 1).sortBy(_._2, false).saveAsTextFile(args(1))

    sc.stop()
  }
}
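
For readability, the single chained call above can also be written as named steps; the following sketch is equivalent:

    // equivalent to the chained call above, one step per line
    val lines  = sc.textFile(args(0))                   // RDD of input lines
    val words  = lines.flatMap(_.split(" "))            // split each line into words
    val pairs  = words.map((_, 1))                      // pair each word with a count of 1
    val counts = pairs.reduceByKey(_ + _, 1)            // sum counts per word into one partition
    val sorted = counts.sortBy(_._2, ascending = false) // order by count, descending
    sorted.saveAsTextFile(args(1))                      // write results to the output path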

3) Package the program and test it on the cluster:

bin/spark-submit \
--class com.atguigu.WordCount \
--master spark://hadoop102:7077 \
WordCount.jar \
/word.txt \
/out
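
After the job finishes, the result can be inspected (assuming /out is a path on HDFS):

hadoop fs -cat /out/part-*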

4.2 Local Debugging

Debugging a Spark program locally requires the local submit mode: the whole program runs on the local machine, with both Master and Worker local. Breakpoints can then be added and hit directly at runtime.

 

When creating the SparkConf, set an additional property to indicate local execution:

val conf = new SparkConf().setAppName("WC").setMaster("local[*]")
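
As a complete example, the following sketch is a self-contained local-mode variant that can be launched and stepped through directly in IDEA; the input path in/word.txt is a placeholder for any local text file:

package com.atguigu

import org.apache.spark.{SparkConf, SparkContext}

object WordCountLocal {
  def main(args: Array[String]): Unit = {
    // local[*] runs Master and Worker inside the IDE process, using all cores
    val conf = new SparkConf().setAppName("WC").setMaster("local[*]")
    val sc = new SparkContext(conf)
    // set a breakpoint on the next line and step through in the debugger
    val counts = sc.textFile("in/word.txt")
      .flatMap(_.split(" "))
      .map((_, 1))
      .reduceByKey(_ + _)
    counts.collect().foreach(println) // bring results to the driver and print
    sc.stop()
  }
}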

 

If the local operating system is Windows and the program uses Hadoop-related functionality, such as writing files to HDFS, an exception is thrown complaining that the winutils executable cannot be located.

This is not an error in the program; it happens because Hadoop-related services are used. The solution is to extract the bundled hadoop-common-bin-2.7.3-x64.zip archive to any directory.

Then, in IDEA's Run Configuration, add a HADOOP_HOME environment variable pointing to that directory.
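
Alternatively, the same thing can be done in code before the SparkContext is created; this is a common workaround, and the path below is a placeholder for wherever the archive was extracted:

// hypothetical extraction directory; adjust to your own path
System.setProperty("hadoop.home.dir", "E:\\hadoop-common-bin-2.7.3-x64")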

4.3 Remote Debugging

In remote debugging through IDEA, IDEA mainly acts as the Driver to submit the application. The configuration process is as follows:

Modify the SparkConf: add the JAR package that will eventually be run and the address of the Driver program, and set the Master submission address:

val conf = new SparkConf().setAppName("WC")
  .setMaster("spark://hadoop102:7077")
  .setJars(List("E:\\SparkIDEA\\spark_test\\target\\WordCount.jar"))

Then add breakpoints and debug directly.
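
Putting the pieces together, a complete remote-debugging entry point might look like the sketch below; the master URL and JAR path are taken from above, and the HDFS input path hdfs://hadoop102:9000/word.txt is an assumed placeholder:

package com.atguigu

import org.apache.spark.{SparkConf, SparkContext}

object WordCountRemote {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("WC")
      .setMaster("spark://hadoop102:7077") // submit to the standalone Master
      .setJars(List("E:\\SparkIDEA\\spark_test\\target\\WordCount.jar")) // JAR shipped to the executors
    val sc = new SparkContext(conf)
    // breakpoints here run in IDEA, which acts as the Driver;
    // the tasks themselves execute on the cluster
    sc.textFile("hdfs://hadoop102:9000/word.txt") // assumed HDFS path
      .flatMap(_.split(" "))
      .map((_, 1))
      .reduceByKey(_ + _)
      .collect()
      .foreach(println)
    sc.stop()
  }
}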

 
