Contents:
- Spark cluster installation
- Parameter configuration
- Test verification
Spark cluster installation:
- Select "Add Service" on the Ambari services interface, as shown in the figure:
- Select the Spark service in the pop-up dialog, as shown in the figure:
- Click "Next" and assign the host nodes. Since we installed the Hadoop and HBase clusters in the previous stage, assign the Spark History Server according to the wizard.
- Assign the clients, as shown below:
- Start the installation; on success, it finishes in the following state:
Parameter configuration:
- After the installation is complete, restart HDFS and YARN.
- Checking the Spark service shows that the Spark Thrift Server did not start normally; the log is as follows:
16/08/30 14:13:25 INFO Client: Verifying our application has not requested more than the maximum memory capability of the cluster (512 MB per container)
16/08/30 14:13:25 ERROR SparkContext: Error initializing SparkContext.
java.lang.IllegalArgumentException: Required executor memory (1024+384 MB) is above the max threshold (512 MB) of this cluster! Please check the values of 'yarn.scheduler.maximum-allocation-mb' and/or 'yarn.nodemanager.resource.memory-mb'.
    at org.apache.spark.deploy.yarn.Client.verifyClusterResources(Client.scala:284)
    at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:140)
    at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:56)
    at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:144)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:530)
    at org.apache.spark.sql.hive.thriftserver.SparkSQLEnv$.init(SparkSQLEnv.scala:56)
    at org.apache.spark.sql.hive.thriftserver.HiveThriftServer2$.main(HiveThriftServer2.scala:76)
    at org.apache.spark.sql.hive.thriftserver.HiveThriftServer2.main(HiveThriftServer2.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Solution: adjust the related YARN parameters, namely yarn.nodemanager.resource.memory-mb and yarn.scheduler.maximum-allocation-mb.
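The failure is simple arithmetic: Spark asks YARN for the executor memory plus an overhead (1024 + 384 MB in the log), and YARN rejects any container request larger than yarn.scheduler.maximum-allocation-mb (512 MB here). A minimal sketch of that check, using the numbers from the log above:

```shell
# Reproduce the resource check from the log above.
# Figures are taken from this walkthrough's error message.
executor_mb=1024          # Spark executor memory request
overhead_mb=384           # YARN memory overhead added by Spark
max_allocation_mb=512     # yarn.scheduler.maximum-allocation-mb

required_mb=$((executor_mb + overhead_mb))
if [ "$required_mb" -gt "$max_allocation_mb" ]; then
  echo "Required executor memory (${required_mb} MB) is above the max threshold (${max_allocation_mb} MB)"
else
  echo "Executor memory fits within the maximum allocation"
fi
```

So any fix must raise the maximum allocation to at least 1408 MB (and the node's total YARN memory along with it).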
yarn.nodemanager.resource.memory-mb
The total amount of physical memory that YARN may use on this node; the YARN default is 8192 MB. Note that my local hdp2-3 node has 4 GB of memory and the value here was set to 512 MB, so I increased it as shown below.
yarn.scheduler.maximum-allocation-mb
The maximum amount of physical memory a single task can request; the default is 8192 MB.
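In Ambari these two settings are edited under the YARN service's Configs tab, but they correspond to the following entries in yarn-site.xml. The values below are examples only, since the walkthrough does not state the exact figures chosen; size them to your own node's memory:

```xml
<!-- Example values only (assumption): size to your node's RAM. -->
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>4096</value> <!-- total RAM YARN may use on this node -->
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>2048</value> <!-- must cover the 1024+384 MB Spark requests -->
</property>
```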
Save the configuration and restart the services that depend on it. Once everything is running normally, it looks like the following figure:
Test verification:
- On any machine with the Spark client installed (hdp4 in this case), change to the bin directory of the Spark installation.
- Command: ./spark-sql
- SQL command: show databases; as shown below:
- View the history, as follows:
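The steps above can be sketched as a single non-interactive invocation. The install path below is an assumption (a common HDP layout); adjust SPARK_HOME to your cluster:

```shell
# Assumed location (hypothetical for your cluster): common HDP client path.
SPARK_HOME=${SPARK_HOME:-/usr/hdp/current/spark-client}
SPARK_SQL="$SPARK_HOME/bin/spark-sql"

if [ -x "$SPARK_SQL" ]; then
  # -e runs a single query and exits; equivalent to typing
  # "show databases;" inside the interactive ./spark-sql shell.
  "$SPARK_SQL" -e "show databases;"
else
  echo "spark-sql not found at $SPARK_SQL; set SPARK_HOME to your Spark install"
fi
```

Running this from any host with the Spark client avoids keeping an interactive session open just to verify the service.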