Spark Study Notes (2): Initialization

Copyright notice: This is an original article by the author and may not be reproduced without permission. https://blog.csdn.net/d413122031/article/details/82631341

Spark Initialization

  • SparkConf

    • setMaster() :cluster master URL

          spark://host:port   Connect to a Standalone cluster; the default port is 7077.
          mesos://host:port   Connect to a Mesos cluster; the default port is 5050.
          yarn                Connect to a YARN cluster; requires HADOOP_CONF_DIR to be set.
          local               Run in local mode with a single core.
          local[N]            Run in local mode with N cores.
          local[*]            Run in local mode and use as many cores as the machine has.
    • spark.cassandra.connection.host :Cassandra host address (used by the Spark Cassandra Connector)

    • setAppName() :application name (displayed in the Spark web UI)
    • setSparkHome() :path to the Spark installation on worker nodes
    • setExecutorEnv() :environment variables to pass to executors

          >>> conf.setExecutorEnv("VAR1", "value1")
          <pyspark.conf.SparkConf object at ...>
          >>> conf.setExecutorEnv(pairs = [("VAR3", "value3"), ("VAR4", "value4")])
          <pyspark.conf.SparkConf object at ...>
          >>> conf.get("spark.executorEnv.VAR1")
          u'value1'
          >>> print(conf.toDebugString())
          spark.executorEnv.VAR1=value1
          spark.executorEnv.VAR3=value3
          spark.executorEnv.VAR4=value4
    • spark.executor.memory : memory per executor

    • spark.driver.memory : memory for the driver
    • spark.cores.max : maximum total number of cores to allocate (Standalone and Mesos modes)
  • spark-submit

    • --master Indicates the cluster manager to connect to. The accepted values are the same master URLs
      listed under setMaster() above.
    • --deploy-mode Whether to launch the driver program locally ("client") or on one of the worker machines inside the
      cluster ("cluster"). In client mode, spark-submit runs your driver on the same machine where
      spark-submit itself is invoked. In cluster mode, the driver is shipped to execute on a
      worker node in the cluster. The default is client mode.
    • --class The "main" class of your application if you're running a Java or Scala program.
    • --name A human-readable name for your application. This will be displayed in Spark's web UI.
    • --jars A list of JAR files to upload and place on the classpath of your application. If your application depends
      on a small number of third-party JARs, you can add them here.
    • --files A list of files to be placed in the working directory of your application. This can be used for data files
      that you want to distribute to each node.
    • --py-files A list of files to be added to the PYTHONPATH of your application. This can contain .py, .egg, or .zip
      files.
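Combining the flags above, a typical submission of a Python application might look like this (the script name, file names, and master address are illustrative):

```shell
spark-submit \
  --master spark://host:7077 \
  --deploy-mode cluster \
  --name "demo-app" \
  --jars dep1.jar,dep2.jar \
  --py-files helpers.zip \
  --files lookup.csv \
  my_script.py
```

Note that --class is omitted here because it only applies to Java and Scala programs.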

