Spark: Hands-On Case

Hands-on case

  The Spark Shell is only used for testing and validating our programs. In a production environment, the program is usually written in an IDE, packaged into a jar, and then submitted to the cluster. The most common approach is to create a Maven project and use Maven to manage the dependent jar packages.
 
 

1 Write the WordCount program

 

1) Create a Maven project named WordCount and add the dependencies
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>com.atlxl</groupId>
    <artifactId>spark01</artifactId>
    <version>1.0-SNAPSHOT</version>
    <dependencies>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_2.11</artifactId>
            <version>2.1.1</version>
        </dependency>
    </dependencies>
    <build>
        <plugins>
            <plugin>
                <groupId>net.alchim31.maven</groupId>
                <artifactId>scala-maven-plugin</artifactId>
                <version>3.2.2</version>
                <executions>
                    <execution>
                        <goals>
                            <goal>compile</goal>
                            <goal>testCompile</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-assembly-plugin</artifactId>
                <version>3.0.0</version>
                <configuration>
                    <archive>
                        <manifest>
                            <mainClass>WordCount</mainClass>
                        </manifest>
                    </archive>
                    <descriptorRefs>
                        <descriptorRef>jar-with-dependencies</descriptorRef>
                    </descriptorRefs>
                </configuration>
                <executions>
                    <execution>
                        <id>make-assembly</id>
                        <phase>package</phase>
                        <goals>
                            <goal>single</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    </build>

</project>

 

 

2) Write the code
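A minimal sketch of the WordCount program, consistent with the <mainClass>WordCount</mainClass> entry in the pom above (place it under src/main/scala; reading the input and output paths from the command-line arguments is an assumption):

import org.apache.spark.{SparkConf, SparkContext}

object WordCount {
  def main(args: Array[String]): Unit = {
    // 1. Create the SparkConf and SparkContext
    val conf = new SparkConf().setAppName("WC")
    val sc = new SparkContext(conf)

    // 2. Read the input file, split each line into words,
    //    map every word to (word, 1) and sum the counts per word
    sc.textFile(args(0))
      .flatMap(_.split(" "))
      .map((_, 1))
      .reduceByKey(_ + _)
      .saveAsTextFile(args(1))

    // 3. Stop the context
    sc.stop()
  }
}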
 
 
 
 
3) Package the project with the packaging plugins
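With the scala-maven-plugin and maven-assembly-plugin configured in the pom above, packaging is a standard Maven build (assuming Maven is installed and on the PATH):

mvn clean package

The assembled jar should then appear as target/spark01-1.0-SNAPSHOT-jar-with-dependencies.jar, containing the compiled classes together with all declared dependencies.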
 
 
 
 
4) Submit the packaged jar to the cluster for testing
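A typical submission to a standalone cluster might look like the following; the master URL (spark://hadoop102:7077) and the HDFS input/output paths are assumptions and must be adapted to the actual environment:

bin/spark-submit \
  --class WordCount \
  --master spark://hadoop102:7077 \
  --executor-memory 1G \
  --total-executor-cores 2 \
  spark01-1.0-SNAPSHOT-jar-with-dependencies.jar \
  hdfs://hadoop102:9000/input hdfs://hadoop102:9000/output

The word counts can then be inspected in the output directory, for example with hdfs dfs -cat /output/part-00000.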
 
 
 
 
 
 

2 Local debugging

  Debugging a Spark program locally requires the local submission mode, i.e. the local machine is used as the runtime environment, and both Master and Worker run on the local machine. Breakpoints can then be added directly and hit at runtime. For example, set an extra property when creating the SparkConf to indicate local execution:
val conf = new SparkConf().setAppName("WC").setMaster("local[*]")
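For example, a self-contained local variant of the driver that can be launched and debugged directly from the IDE might look like this (the local input file in/words.txt is an assumption):

import org.apache.spark.{SparkConf, SparkContext}

object WordCountLocal {
  def main(args: Array[String]): Unit = {
    // local[*]: Master and Worker are both the local machine, using all available cores
    val conf = new SparkConf().setAppName("WC").setMaster("local[*]")
    val sc = new SparkContext(conf)

    // Breakpoints can be placed on any of these lines and hit at runtime
    val counts = sc.textFile("in/words.txt") // assumed local input file
      .flatMap(_.split(" "))
      .map((_, 1))
      .reduceByKey(_ + _)
      .collect()

    counts.foreach(println)
    sc.stop()
  }
}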
If the local operating system is Windows and the program uses Hadoop-related functionality, such as writing files to HDFS, the following exception will be encountered:
 
 
  This problem is not caused by an error in the program, but by its use of Hadoop-related services. The solution is to extract the attached hadoop-common-bin-2.7.3-x64.zip to any directory.
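One common way to make the extracted directory visible to the program is to point hadoop.home.dir at it before the SparkContext is created, or to set the HADOOP_HOME environment variable to the same directory; the path below is only an example and must match wherever the zip was actually unpacked (its bin folder should contain winutils.exe):

// Assumed extraction directory; adjust to the actual location
System.setProperty("hadoop.home.dir", "D:\\hadoop-common-bin-2.7.3-x64")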

Origin www.cnblogs.com/LXL616/p/11139436.html