Writing Kudu data to HBase with PySpark

kudu2hbaseOnpyspark

Configure PySpark to write data from Kudu into HBase:

Run the job with spark-submit:

spark-submit --packages com.hortonworks:shc-core:1.1.1-2.1-s_2.11,com.XXX.bigdata:label-hive-utils:1.0 --conf "spark.jars.ivySettings=/usr/hive-client/conf/ivysettings.xml" --conf "spark.jars.excludes=org.apache.hadoop:hadoop-*" --jars hdfs:///team/lib/shc-core-1.1.2-2.2-s_2.11-SNAPSHOT-shaded.jar,hdfs:///team/lib/kudu-spark-1.0-SNAPSHOT.jar ./hz_hbase.py

Points to note:

Writing a time value of the form 2018-09-18 17:01:00.0 fails with the following error:

PrimitiveType coder: unsupported data type 2018-09-18 17:01:00.0

Solution:

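Use substr to keep only the first 19 characters of a time value such as 2018-09-18 17:01:00.0, for example substr('2018-09-18 17:01:00.0', 0, 19), which yields 2018-09-18 17:01:00.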
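The same truncation can be sketched in plain Python, outside Spark SQL. The helper name `truncate_timestamp` is hypothetical and not part of the original script; it assumes the value is already a string that starts with a `YYYY-MM-DD HH:MM:SS` timestamp.

```python
from datetime import datetime

TIMESTAMP_LENGTH = 19  # len("YYYY-MM-DD HH:MM:SS")


def truncate_timestamp(value):
    """Keep the first 19 characters of a timestamp string (hypothetical helper)."""
    truncated = value[:TIMESTAMP_LENGTH]
    # Parsing raises ValueError if the result is not a second-precision timestamp.
    datetime.strptime(truncated, "%Y-%m-%d %H:%M:%S")
    return truncated


print(truncate_timestamp("2018-09-18 17:01:00.0"))  # prints 2018-09-18 17:01:00
```

Parsing the truncated value is optional; it is included here only so that malformed input fails early rather than at write time.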

The catalog must be passed as a str; otherwise str-related errors are raised.
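One possible way to guarantee a str is to describe the catalog as a Python dict and serialize it with `json.dumps`. This is a sketch of an alternative, not the approach the original script takes (it uses a triple-quoted string); the variable name `catalog_dict` is an assumption, and the table and column values mirror those in hz_hbase.py.

```python
import json

# Catalog described as a dict; the values mirror the ones used in hz_hbase.py.
catalog_dict = {
    "table": {"namespace": "default", "name": "dna_result"},
    "rowkey": "key",
    "columns": {
        "col0": {"cf": "rowkey", "col": "key", "type": "binary"},
        "col1": {"cf": "hz", "col": "result", "type": "binary"},
        "col2": {"cf": "hz", "col": "modifytime", "type": "string"},
    },
}

# json.dumps always returns a str, which is the type the catalog option needs.
catalog = json.dumps(catalog_dict)
print(type(catalog).__name__)
```

The resulting string can then be passed as `df.write.options(catalog=catalog)` in the same way as the hand-written one.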

The relevant code is as follows:

# -*- coding: utf-8 -*-
""" 
@author:zangjuanling 
@file: hz_hbase.py 
@time: 2018/09/17 
"""
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("hz_hbase").enableHiveSupport().getOrCreate()

if __name__ == '__main__':
    # Load the Kudu table and expose it to Spark SQL as the temporary view "temp".
    spark.read.format('org.apache.kudu.spark.kudu') \
        .option('kudu.master', "hadoop-kudu-1:7051,hadoop-kudu-2:7051,hadoop-kudu-3:7051") \
        .option('kudu.table', "DXC_KUDU.XXX") \
        .load().createOrReplaceTempView("temp")
    # Keep only the first 19 characters of modifytime, then rename the columns
    # to col0..col2 so that they match the keys of the catalog below.
    df = spark.sql("select fk_customer,result,substr(modifytime,0,19) as modifytime from temp").toDF("col0", "col1", "col2")
    # The catalog must be a str; split() plus "".join() strips all whitespace from the JSON text.
    catalog = "".join("""{
        "table":{"namespace": "default", "name": "dna_result"},
        "rowkey": "key",
        "columns": {
            "col0": {"cf": "rowkey", "col": "key", "type": "binary"},
            "col1": {"cf": "hz", "col": "result", "type": "binary"},
            "col2": {"cf": "hz", "col": "modifytime", "type": "string"}
        }
    }""".split())
    # Append the rows to HBase through the SHC data source.
    df.write.options(catalog=catalog) \
        .mode('append') \
        .format("org.apache.spark.sql.execution.datasources.hbase") \
        .option("zookeeper.znode.parent", "/hbase-unsecure") \
        .option("hbase.zookeeper.quorum", "hbase-master-1,hbase-master-2,hbase-master-3") \
        .option("hbase.zookeeper.property.clientPort", "2181") \
        .option("newTable", "30") \
        .save()
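
The script renames the selected columns to col0, col1 and col2, the same names the catalog uses as keys under "columns". As a minimal sketch, the mapping could be checked before writing with a small helper; `unmapped_columns` is a hypothetical name and not part of the original script, and in practice the column list would come from `df.columns`.

```python
import json


def unmapped_columns(catalog, df_columns):
    """Return the column names that have no entry in the catalog (hypothetical helper)."""
    mapped = set(json.loads(catalog)["columns"])
    return [name for name in df_columns if name not in mapped]


# Same catalog content as in hz_hbase.py, serialized to a str.
catalog = json.dumps({
    "table": {"namespace": "default", "name": "dna_result"},
    "rowkey": "key",
    "columns": {
        "col0": {"cf": "rowkey", "col": "key", "type": "binary"},
        "col1": {"cf": "hz", "col": "result", "type": "binary"},
        "col2": {"cf": "hz", "col": "modifytime", "type": "string"},
    },
})

print(unmapped_columns(catalog, ["col0", "col1", "col2"]))        # prints []
print(unmapped_columns(catalog, ["col0", "col1", "modifytime"]))  # prints ['modifytime']
```

A non-empty result means a column was selected without being renamed to a name the catalog knows about.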

Reposted from blog.csdn.net/qq_37050993/article/details/82761252