The first Spark program


1. Install the Scala plug-in in Eclipse:


3. In the Eclipse installation directory there is a dropins directory; create a new scala directory inside it and copy all of the decompressed plug-in files into that scala directory.


4. Restart Eclipse; you should now be able to create a new Scala project.


With that, the plug-in installation is complete.

2. Create a new Scala project, then create a new Scala object. The code is as follows:
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext

object SimpleApp {
  def main(args: Array[String]): Unit = {
    // expect exactly two arguments: the input file path and the output directory
    if (args.length != 2) {
      println("error : too few arguments")
      sys.exit()
    }
    val conf = new SparkConf().setAppName("Simple Application")
    val filePath = args(0)
    val sc = new SparkContext(conf)
    // read the input file as an RDD of lines (with a minimum of 2 partitions) and cache it
    val file = sc.textFile(filePath, 2).cache()
    // word count: split each line into words, pair each word with 1, then sum the counts per word
    val counts = file.flatMap { line => line.split(" ") }.map { word => (word, 1) }.reduceByKey(_ + _)
    // write the (word, count) pairs to the output directory
    counts.saveAsTextFile(args(1))
    sc.stop()
  }
}
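
To try the word-count pipeline before wiring up input and output files, the following is a minimal, self-contained sketch; the object name, the local master setting, and the sample sentences are illustrative and not part of the original program:

import org.apache.spark.{SparkConf, SparkContext}

object WordCountLocalSketch {
  def main(args: Array[String]): Unit = {
    // run locally in a single thread; no cluster or input file is needed
    val conf = new SparkConf().setAppName("WordCountLocalSketch").setMaster("local")
    val sc = new SparkContext(conf)
    // a small in-memory dataset takes the place of sc.textFile(...)
    val lines = sc.parallelize(Seq("to be or not to be", "that is the question"))
    val counts = lines.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _)
    // prints pairs such as (be,2), (to,2), (question,1)
    counts.collect().foreach(println)
    sc.stop()
  }
}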


3. Import the jar packages
1. Create a new user library.
2. From the jars folder in the Spark installation directory, add all of the jar packages to the user library.
3. Use "Configure Build Path..." to add the user library to the project.


4. Run the program
Configure the run parameters as shown in the figure below, then click "Run As" -> "Scala Application" to run the code.
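For example, since SimpleApp expects an input file and an output directory, the "Program arguments" field of the run configuration holds two paths; the values below are purely illustrative:
C:/data/words.txt C:/data/wordcount-output
Note that the output directory must not already exist, otherwise saveAsTextFile will fail.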


5. Errors that may be encountered
1. [java.lang.UnsupportedClassVersionError]
  Cause: version mismatch between the JDK the program runs on and the JDK version the dependencies require.
  Solution: select a JDK that matches Spark.
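  A quick way to confirm which JVM the program actually runs under is to print the java.version system property at startup (this check is not part of the original program):
  println(System.getProperty("java.version"))  // e.g. 1.8.0_202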

2. [A master URL must be set in your configuration]
  The message means that the master the program should run against cannot be found, so a master URL has to be configured.
  The master URL passed to Spark can be one of the following:
  local              local, single-threaded
  local[K]           local, multi-threaded (uses K cores)
  local[*]           local, multi-threaded (uses all available cores)
  spark://HOST:PORT  connect to the specified Spark standalone cluster master; the port must be specified
  mesos://HOST:PORT  connect to the specified Mesos cluster; the port must be specified
  yarn-client        connect to a YARN cluster in client mode; HADOOP_CONF_DIR must be configured
  yarn-cluster       connect to a YARN cluster in cluster mode; HADOOP_CONF_DIR must be configured
 
  Here we add the JVM startup parameter "-Dspark.master=local" to make the program run locally in a single thread, and run it again, as shown below:
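  In the Eclipse run configuration this goes into the "VM arguments" field on the "Arguments" tab, for example:
  -Dspark.master=local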


Or specify it in the code:
val conf = new SparkConf().setAppName("Simple Application").setMaster("local")


3. [(null) entry in command string: null chmod 0700]
  Cause: Spark is designed and developed on Linux, so running it in a standalone Windows environment (without the support of a Hadoop cluster) runs into the winutils problem. To solve it, winutils.exe needs to be installed.
  Solution:
  1) Create a new directory on the C drive: C:/hadoop/bin
  2) Copy winutils.exe and libwinutils.lib to C:/hadoop/bin
  3) Configure the environment variable HADOOP_HOME=C:\hadoop, or specify hadoop.home.dir directly in the code by adding the following line to SimpleApp:
System.setProperty("hadoop.home.dir", "C:\\hadoop")
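
A minimal sketch of where the line fits, assuming the rest of SimpleApp stays as shown above; the important part is that the property is set before the SparkContext is created:

    // set hadoop.home.dir before any Spark/Hadoop code runs
    System.setProperty("hadoop.home.dir", "C:\\hadoop")
    val conf = new SparkConf().setAppName("Simple Application")
    val sc = new SparkContext(conf)
    // ... the rest of main is unchanged ...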



winutils.exe and libwinutils.lib download address: http://pan.baidu.com/s/1pLNVdc3
