First, create a new Maven project in Eclipse Java EE. The specific options are as follows:
Click Finish to create the project, then change the default JDK from 1.5 to 1.8.
Then edit pom.xml to add the spark-core dependency:
<!-- https://mvnrepository.com/artifact/org.apache.spark/spark-core -->
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.11</artifactId>
    <version>2.2.1</version>
</dependency>
Then copy the sample program's source code from the book. The book targets Spark 1.2 while my environment runs Spark 2.2.1, so the code needs a small change to match the newer Spark API: in Spark 2.x, FlatMapFunction.call must return an Iterator rather than an Iterable.
JavaRDD<String> words = input.flatMap(
    new FlatMapFunction<String, String>() {
        public Iterator<String> call(String x) {
            return Arrays.asList(x.split(" ")).iterator();
        }
    });
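The tokenization inside call is plain Java and can be checked on its own, without Spark. A minimal standalone sketch (the sample input string is illustrative):

```java
import java.util.Arrays;
import java.util.Iterator;

public class SplitDemo {
    public static void main(String[] args) {
        // Same logic as the flatMap body: split a line on spaces and
        // expose the tokens as an Iterator, the return type Spark 2.x expects.
        Iterator<String> words = Arrays.asList("hello spark world".split(" ")).iterator();
        while (words.hasNext()) {
            System.out.println(words.next());
        }
    }
}
```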
Then run Maven install, go to the directory E:\developtools\eclipse-jee-neon-3-win32\workspace\learning-spark-mini-example\target, find learning-spark-mini-example-0.0.1-SNAPSHOT.jar, and upload it to a directory on the Linux machine running Spark 2.2.1.
Then execute the following command on Linux:
[root@hserver1 ~]# spark-submit \
> --class com.oreilly.learningsparkexamples.mini.java.WordCount \
> learning-spark-mini-example-0.0.1-SNAPSHOT.jar \
> /opt/spark-2.2.1-bin-hadoop2.7/README.md wordcounts
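The counting that the submitted job performs can be sketched in plain Java without Spark: flatMap splits each line into words, and the subsequent pairing and reduction amount to summing occurrences per word. A rough model (the sample lines stand in for the contents of README.md):

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class WordCountSketch {
    public static void main(String[] args) {
        // Hypothetical stand-in for the lines of the input file.
        List<String> lines = Arrays.asList("to be or", "not to be");

        // Split each line into words, then sum a count per distinct word.
        Map<String, Integer> counts = new HashMap<>();
        for (String line : lines) {
            for (String word : line.split(" ")) {
                counts.merge(word, 1, Integer::sum);
            }
        }
        System.out.println(counts); // e.g. counts "to" and "be" twice, "or" and "not" once
    }
}
```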