Spark (4): Spark SQL reads HBase

  • Spark SQL access HBase configuration
  • Test verification

Spark SQL access HBase configuration:


  • Copy the HBase-related jar packages to the $SPARK_HOME/lib directory on the Spark node. The list is as follows:
    guava-14.0.1.jar
    htrace-core-3.1.0-incubating.jar
    hbase-common-1.1.2.2.4.2.0-258.jar
    hbase-common-1.1.2.2.4.2.0-258-tests.jar
    hbase-client-1.1.2.2.4.2.0-258.jar
    hbase-server-1.1.2.2.4.2.0-258.jar
    hbase-protocol-1.1.2.2.4.2.0-258.jar
    hive-hbase-handler-1.2.1000.2.4.2.0-258.jar
  • In Ambari, edit $SPARK_HOME/conf/spark-env.sh on the Spark node and add the jar packages above to SPARK_CLASSPATH, as shown below.
  • The configuration item is listed below. Note that there must be no spaces or line breaks between the jar paths:
    export SPARK_CLASSPATH=/usr/hdp/2.4.2.0-258/spark/lib/guava-11.0.2.jar:/usr/hdp/2.4.2.0-258/spark/lib/hbase-client-1.1.2.2.4.2.0-258.jar:/usr/hdp/2.4.2.0-258/spark/lib/hbase-common-1.1.2.2.4.2.0-258.jar:/usr/hdp/2.4.2.0-258/spark/lib/hbase-protocol-1.1.2.2.4.2.0-258.jar:/usr/hdp/2.4.2.0-258/spark/lib/hbase-server-1.1.2.2.4.2.0-258.jar:/usr/hdp/2.4.2.0-258/spark/lib/hive-hbase-handler-1.2.1000.2.4.2.0-258.jar:/usr/hdp/2.4.2.0-258/spark/lib/htrace-core-3.1.0-incubating.jar:/usr/hdp/2.4.2.0-258/spark/lib/protobuf-java-2.5.0.jar:${SPARK_CLASSPATH}
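Because stray spaces or line breaks between the jar paths break the classpath, it can be safer to build the string programmatically. A minimal sketch, assuming the HDP 2.4.2.0-258 lib directory used above (adjust SPARK_LIB and the jar versions to match your installation):

```shell
# Build SPARK_CLASSPATH by joining jar names with ':' only -- no spaces, no newlines.
SPARK_LIB=/usr/hdp/2.4.2.0-258/spark/lib

JARS="guava-14.0.1.jar
hbase-client-1.1.2.2.4.2.0-258.jar
hbase-common-1.1.2.2.4.2.0-258.jar
hbase-protocol-1.1.2.2.4.2.0-258.jar
hbase-server-1.1.2.2.4.2.0-258.jar
hive-hbase-handler-1.2.1000.2.4.2.0-258.jar
htrace-core-3.1.0-incubating.jar
protobuf-java-2.5.0.jar"

CP=""
for jar in $JARS; do
  CP="${CP:+$CP:}$SPARK_LIB/$jar"   # prepend ':' only when CP is non-empty
done
export SPARK_CLASSPATH="$CP${SPARK_CLASSPATH:+:$SPARK_CLASSPATH}"
echo "$SPARK_CLASSPATH"
```

The same loop can be pasted into spark-env.sh in place of the long one-line export.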
  • Copy hbase-site.xml to ${HADOOP_CONF_DIR}. Since spark-env.sh sets the Hadoop configuration directory ${HADOOP_CONF_DIR}, hbase-site.xml will be loaded from there. The file mainly contains the following parameters:
<property>
  <name>hbase.zookeeper.quorum</name>
  <value>r,hdp2,hdp3</value>
  <description>ZooKeeper nodes used by HBase</description>
</property>
<property>
  <name>hbase.client.scanner.caching</name>
  <value>100</value>
  <description>HBase client scanner caching; very helpful for query performance</description>
</property>
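The copy step itself can be scripted. A sketch, using a small helper; the /etc/hbase/conf and /etc/hadoop/conf defaults mentioned in the comments are assumptions from a typical HDP layout, and the demonstration runs in a temp sandbox rather than touching real config directories:

```shell
# install_hbase_site copies hbase-site.xml from an HBase conf dir into the
# Hadoop conf dir so that Spark picks it up via ${HADOOP_CONF_DIR}.
install_hbase_site() {
  src_dir=$1
  dest_dir=$2
  [ -f "$src_dir/hbase-site.xml" ] || { echo "missing $src_dir/hbase-site.xml" >&2; return 1; }
  cp "$src_dir/hbase-site.xml" "$dest_dir/"
}

# Demonstration in a temp sandbox; on a real node this would be
# install_hbase_site /etc/hbase/conf /etc/hadoop/conf (paths assumed).
tmp=$(mktemp -d)
mkdir -p "$tmp/hbase-conf" "$tmp/hadoop-conf"
printf '<configuration/>\n' > "$tmp/hbase-conf/hbase-site.xml"
install_hbase_site "$tmp/hbase-conf" "$tmp/hadoop-conf"
ls "$tmp/hadoop-conf"
```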
  • Restart the affected component services in Ambari so the modified configuration takes effect.

 

Test Verification:


  • Verify on any Spark client node:
  • Command:  cd /usr/hdp/2.4.2.0-258/spark/bin   (the Spark installation directory)
  • Command:  ./spark-sql
  • Execute:  select * from stocksinfo;    (stocksinfo is a Hive external table associated with HBase)
  • If the query returns the rows of the table, Spark SQL can read the HBase-backed data.
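For reference, a Hive external table mapped onto HBase, such as stocksinfo above, is typically created with a DDL of the following shape. This is a sketch only: the article does not give the real schema, so the column names, the column family "info", and the underlying HBase table name are hypothetical.

```sql
-- Hypothetical schema: only the table name stocksinfo comes from the article.
CREATE EXTERNAL TABLE stocksinfo (
  rowkey STRING,
  name   STRING,
  price  DOUBLE
)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES (
  "hbase.columns.mapping" = ":key,info:name,info:price"
)
TBLPROPERTIES ("hbase.table.name" = "stocksinfo");
```

The hive-hbase-handler jar added to SPARK_CLASSPATH earlier is what provides HBaseStorageHandler at query time.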
