1, Environment Description
operating system | CentOS Linux release 7.4.1708 (Core) |
---|---|
Ambari |
2.6.x |
HDP |
2.6.3.0 |
Spark |
2.x |
Phoenix |
4.10.0-HBase-1.2 |
2, condition
- HBase installation is complete
- Phoenix has been enabled, as shown Ambari interface as follows:
- Spark 2 installation is complete
3, Spark2 and Phoenix Integration
Phoenix official website integration Tutorial: http://phoenix.apache.org/phoenix_spark.html
step:
- Ambari Spark2 into the configuration interface
- Find
自定义 spark2-defaults
and add the following configuration items:
spark.driver.extraClassPath=/usr/hdp/current/phoenix-client/phoenix-4.10.0-HBase-1.2-client.jar
spark.executor.extraClassPath=/usr/hdp/current/phoenix-client/phoenix-4.10.0-HBase-1.2-client.jar
4, Yarn HA problem
If you configure Yarn HA, you need to modify Yarn HA configuration, otherwise spark-submit
submit the task will be reported the following error:
Exception in thread "main" java.lang.IllegalAccessError: tried to access method org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider.getProxyInternal()Ljava/lang/Object; from class org.apache.hadoop.yarn.client.RequestHedgingRMFailoverProxyProvider
at org.apache.hadoop.yarn.client.RequestHedgingRMFailoverProxyProvider.init(RequestHedgingRMFailoverProxyProvider.java:75)
at org.apache.hadoop.yarn.client.RMProxy.createRMFailoverProxyProvider(RMProxy.java:163)
at org.apache.hadoop.yarn.client.RMProxy.createRMProxy(RMProxy.java:94)
at org.apache.hadoop.yarn.client.ClientRMProxy.createRMProxy(ClientRMProxy.java:72)
at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.serviceStart(YarnClientImpl.java:187)
at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:153)
at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:56)
at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:173)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:509)
at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2516)
at org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:922)
at org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:914)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:914)
at cn.spark.sxt.SparkOnPhoenix$.main(SparkOnPhoenix.scala:13)
at cn.spark.sxt.SparkOnPhoenix.main(SparkOnPhoenix.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.i
修改Yarn HA配置:
Will 原来的配置
:
yarn.client.failover-proxy-provider=org.apache.hadoop.yarn.client.RequestHedgingRMFailoverProxyProvider
Read 现在的配置
:
yarn.client.failover-proxy-provider=org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider
If no Yarn HA, this step is not required configuration
---