1. Description
Chapter 2 of the book contains an example; after building it, run:
${SPARK_HOME}/bin/spark-submit --class com.oreilly.learningsparkexamples.mini.java.WordCount ./target/learning-spark-mini-example-0.0.1.jar ./README.md ./wordcounts
If the Spark version you use differs from the one used in the book, various problems can arise. For example, the book uses 1.2.0, while I use the latest release, 2.3.0.
2. Problems and solutions
1. When building and running for the first time, an error similar to the following appears:
ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
java.lang.AbstractMethodError: com.oreilly.learningsparkexamples.mini.java.WordCount$1.call(Ljava/lang/Object;)Ljava/util/Iterator;
....
The first step is to fix the version dependency mismatch:
(1) Determine the Scala and Spark versions from the name of the spark-core jar under the following path:
${SPARK_HOME}/jars/spark-core_x.xx-y.y.y.jar
Here x.xx is the Scala version and y.y.y is the Spark version; for example, spark-core_2.11-2.3.0.jar means Scala 2.11 and Spark 2.3.0.
(2) In pom.xml in the mini-complete-example directory, replace the original artifact and version with the ones you just checked:
<dependency> <!-- Spark dependency -->
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-core_x.xx</artifactId>
  <version>y.y.y</version>
  <scope>provided</scope>
</dependency>
Recompile.
2. The second build will likely hit an error like the following:
error: <anonymous class> is not abstract and does not override abstract method call(String) in FlatMapFunction ...
Locate the offending line:
JavaRDD<String> words = input.flatMap(
    new FlatMapFunction<String, String>() {
      public Iterable<String> call(String x) {
        return Arrays.asList(x.split(" "));
      }
    });
I re-checked the book's use of the FlatMapFunction<T, R> interface and found no mistake there, so the cause had to be the version difference. The latest version of the API shows that the return type of the method to be implemented has changed:
java.util.Iterator<R> call(T t)
It now returns an Iterator<R> instead of an Iterable<R>, so the fix is:
(1) Import the Iterator package:
import java.util.Iterator;
(2) Change the offending line to:
JavaRDD<String> words = input.flatMap(
    new FlatMapFunction<String, String>() {
      @Override
      public Iterator<String> call(String x) {
        return Arrays.asList(x.split(" ")).iterator();
      }
    });
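The change itself can be verified outside Spark, since the body of the new call is plain Java. A minimal sketch (class and method names here are illustrative, not from the book's code):

```java
import java.util.Arrays;
import java.util.Iterator;

public class IteratorFixDemo {
    // Same logic as the new call(String) body: split a line into words
    // and return them as an Iterator rather than an Iterable.
    static Iterator<String> call(String x) {
        return Arrays.asList(x.split(" ")).iterator();
    }

    public static void main(String[] args) {
        Iterator<String> words = call("quick brown fox");
        while (words.hasNext()) {
            System.out.println(words.next());
        }
    }
}
```

Note that Arrays.asList(...) still produces an Iterable; calling .iterator() on it is all that is needed to satisfy the new signature.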
Recompile, package:
mvn compile && mvn package
Then run it again; the problem is solved.
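As a sanity check on what the job should produce, the flatMap + mapToPair + reduceByKey pipeline can be mimicked in plain Java without Spark (this is a local sketch for checking expected counts, not the book's code):

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class LocalWordCount {
    // Mimics the Spark pipeline: split each line into words,
    // then sum a count of 1 per occurrence of each word.
    static Map<String, Integer> count(Iterable<String> lines) {
        Map<String, Integer> counts = new HashMap<>();
        for (String line : lines) {
            for (String word : line.split(" ")) {
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Prints the word counts for two sample lines.
        System.out.println(count(Arrays.asList("to be or", "not to be")));
    }
}
```

Comparing this against the contents of the output directory is a quick way to confirm the Spark job ran correctly.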
3. Reference
1. Apache Spark: ERROR Executor -> Iterator
2. Spark API
(End)