Syncing Hive data to Elasticsearch


Background: clean and concatenate the customers' detailed addresses, call the external Baidu Maps API to resolve each address into longitude/latitude, then join the result with business data into a wide table and sync it to Elasticsearch.
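The geocoding step itself is not shown in this post; per cleaned address it amounts to one REST call. A minimal sketch, assuming the Baidu geocoding v3 endpoint (BAIDU_AK stands in for a real API key, and the jq filter follows the documented response shape):

# Hypothetical example: resolve one cleaned address into lng/lat.
curl -sG "https://api.map.baidu.com/geocoding/v3/" \
  --data-urlencode "address=广东省深圳市南山区科技园" \
  --data-urlencode "output=json" \
  --data-urlencode "ak=${BAIDU_AK}" \
  | jq '.result.location'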

#!/bin/sh -x
# Source shared parameters to pick up upstream database credentials
source P_PARAMETER_SHARE.sh
export HADOOP_USER_NAME=uintegrate
# With no argument, default to the previous day; with one argument, it is both
# the start and end date; with two arguments, they are the start and end dates.
if [ $# -eq 0 ];
then
p_in_time_str=$(date +'%Y-%m-%d')  # today; the "1 day ago" shift below yields yesterday
p_in_time_end=$p_in_time_str

elif [ $# -eq 1 ];
then
p_in_time_str=$1
p_in_time_end=$1
elif [ $# -eq 2 ];
then
p_in_time_str=$1
p_in_time_end=$2
else
p_in_time_str=$1
p_in_time_end=$2
fi
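# Example invocations (the script name sync_hive_to_es.sh is made up here):
#   sh sync_hive_to_es.sh                         -> defaults to the previous day
#   sh sync_hive_to_es.sh 2019-04-10              -> start = end = 2019-04-10
#   sh sync_hive_to_es.sh 2019-04-01 2019-04-10   -> start 2019-04-01, end 2019-04-10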

# Define the time variables (the vi prefix marks an integer-typed value):
# vi_stat_st is the start timestamp, vi_stat_ed the end timestamp,
# vi_stat the start date.

vi_stat_st=$(date -d "$p_in_time_str 1 day ago" +'%Y-%m-%d 00:00:00')
vi_stat_ed=$(date -d "$p_in_time_end" +'%Y-%m-%d 00:00:00')
vi_stat=$(date -d "$p_in_time_str 1 day ago" +'%Y%m%d')

vi_stat_stn=$(date -d "$p_in_time_str 3 day ago" +'%Y%m%d')
vi_stat_edn=$(date -d "$p_in_time_end" +'%Y%m%d')
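# For example, with p_in_time_str=2019-04-10 and p_in_time_end=2019-04-10:
#   vi_stat_st  -> 2019-04-09 00:00:00
#   vi_stat_ed  -> 2019-04-10 00:00:00
#   vi_stat     -> 20190409
#   vi_stat_stn -> 20190407
#   vi_stat_edn -> 20190410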

# Create the Hive-to-ES external table
beeline -u jdbc:hive2://${HIVE_SERVER} -n ${HADOOP_USER_NAME} -e "
add jar /var/lib/hadoop-hdfs/elasticsearch-hadoop-hive-5.2.1.jar;
DROP TABLE IF EXISTS t_dw.map_iu_merge;
CREATE EXTERNAL TABLE IF NOT EXISTS t_dw.map_iu_merge(
userid string,
cp string ,
email string ,
ssoid string ,
huan_id string ,
weixin_unionid string ,
table_source string ,
id string ,
statdate string ,
name string ,
sex string ,
age string ,
birth string ,
constellation string ,
interests string,
education string ,
province string ,
city string ,
region string ,
town string ,
address string ,
loaddate string ,
detail_address string,
lng double,
lat double
) STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES (
'es.resource' = 'user_scan/user_scan',
'es.index.auto.create' = 'true',
'es.mapping.id' = 'id',
'es.input.json' = 'no',
'es.nodes' = '10.68.20.46',
'es.port' = '9200');

-- es.resource: index/type to write to
-- es.index.auto.create: when true, the index is created automatically if it does not exist
-- es.mapping.id: document field to use as the ES _id; without it, ES generates an id for each inserted document
-- es.input.json: whether the input data is passed in as JSON
-- es.nodes / es.port: ES cluster address and port
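-- Merging small input splits below keeps the number of map tasks, and hence
-- the number of concurrent ES bulk writers, low during the insert.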

set mapred.max.split.size=1000000000;
set mapred.min.split.size.per.node=1000000000;
set mapred.min.split.size.per.rack=1000000000;
-- load the data into ES
insert overwrite table t_dw.map_iu_merge
select i.userid,i.cp,i.email,i.ssoid,i.huan_id,i.weixin_unionid,i.table_source,i.id,i.statdate,i.name,i.sex,i.age,i.birth,
i.constellation,i.interests,i.education,i.province,i.city,i.region,i.town,i.address,i.loaddate,
j.detail_address,cast(j.lng as DOUBLE) lng ,cast(j.lat as DOUBLE) lat FROM t_dw.map_jwd j JOIN t_dw.iu_merge i on i.id=j.id limit 50000000 ;"
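Once the insert finishes, the load can be spot-checked against the cluster with standard Elasticsearch REST calls, using the node address from the table definition:

curl -s 'http://10.68.20.46:9200/user_scan/_count?pretty'     # document count
curl -s 'http://10.68.20.46:9200/user_scan/_mapping?pretty'   # inspect the auto-created mapping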
