Spark (3): Installation and Configuration

Contents:

  • Spark cluster installation
  • Parameter configuration
  • Test verification

Spark cluster installation:


  • Select "Add Service" on the Ambari Services page, as shown in the figure:
  • Select the Spark service in the pop-up dialog, as shown in the figure:

  • Click "Next" and assign the host nodes. Since the Hadoop and HBase clusters were installed in the previous stage, assign the Spark History Server node following the wizard.
  • Assign the clients, as shown below:
  • Start the installation; on success the status looks as follows:

Parameter configuration:


  • After the installation completes, restart HDFS and YARN.
  • Check the Spark service: the Spark Thrift Server fails to start, with the following log:
    16/08/30 14:13:25 INFO Client: Verifying our application has not requested more than the maximum memory capability of the cluster (512 MB per container)
    16/08/30 14:13:25 ERROR SparkContext: Error initializing SparkContext.
    java.lang.IllegalArgumentException: Required executor memory (1024+384 MB) is above the max threshold (512 MB) of this cluster! Please check the values of 'yarn.scheduler.maximum-allocation-mb' and/or 'yarn.nodemanager.resource.memory-mb'.
        at org.apache.spark.deploy.yarn.Client.verifyClusterResources(Client.scala:284)
        at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:140)
        at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:56)
        at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:144)
        at org.apache.spark.SparkContext.<init>(SparkContext.scala:530)
        at org.apache.spark.sql.hive.thriftserver.SparkSQLEnv$.init(SparkSQLEnv.scala:56)
        at org.apache.spark.sql.hive.thriftserver.HiveThriftServer2$.main(HiveThriftServer2.scala:76)
        at org.apache.spark.sql.hive.thriftserver.HiveThriftServer2.main(HiveThriftServer2.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
  • Solution: increase the YARN memory parameters yarn.nodemanager.resource.memory-mb and yarn.scheduler.maximum-allocation-mb.

  • yarn.nodemanager.resource.memory-mb

    The total amount of physical memory YARN may use on this node; the default is 8192 MB. Note: the local hdp2-3 node has only 4 GB of memory, and this value was set to 512 MB here; raise it as shown below.

  • yarn.scheduler.maximum-allocation-mb

    The maximum amount of physical memory a single container can request; the default is 8192 MB.

  • Save the configuration and restart the services that depend on it. Once everything is running normally, it looks like the figure below:
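The failed check in the log above can be reproduced with a little arithmetic. A minimal sketch, assuming the documented Spark-on-YARN default overhead of max(384 MB, 10% of executor memory): YARN rejects the request when executor memory plus overhead exceeds yarn.scheduler.maximum-allocation-mb.

```python
def required_container_mb(executor_mb, min_overhead_mb=384, overhead_fraction=0.10):
    """Memory YARN must grant for one Spark executor container:
    executor memory plus the per-executor overhead, where the overhead
    defaults to max(384 MB, 10% of executor memory)."""
    overhead = max(min_overhead_mb, int(executor_mb * overhead_fraction))
    return executor_mb + overhead

executor_mb = 1024        # executor memory requested in the log above
max_allocation_mb = 512   # yarn.scheduler.maximum-allocation-mb before the fix

needed = required_container_mb(executor_mb)
print(needed)                          # 1024 + 384 = 1408 MB
print(needed <= max_allocation_mb)     # the check that failed
```

With the original 512 MB ceiling the request (1408 MB) is rejected, which is exactly the "Required executor memory (1024+384 MB) is above the max threshold (512 MB)" error; raising the two YARN parameters above 1408 MB clears it.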

Test Verification:


  • On any machine with the Spark client installed (hdp4 here), change to the bin directory under the Spark installation directory.
  • Command: ./spark-sql
  • SQL command: show databases; as shown below
  • View the History Server UI, as shown below:
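The same check can be scripted instead of typed interactively. A small sketch, assuming the usual HDP client path /usr/hdp/current/spark-client (adjust for your layout): spark-sql, like the Hive CLI it mirrors, accepts -e "<statement>" to run one statement and exit.

```python
import os

def spark_sql_cmd(spark_home, statement):
    """Build the argv for a non-interactive spark-sql run."""
    return [os.path.join(spark_home, "bin", "spark-sql"), "-e", statement]

cmd = spark_sql_cmd("/usr/hdp/current/spark-client", "show databases;")
print(" ".join(cmd))
# On a node with the Spark client installed, run it with:
# import subprocess; subprocess.run(cmd, check=True)
```

The actual execution is left commented out so the sketch stays runnable on any machine; only the command construction is shown.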
