Hive data warehouse in practice: financial business case 1

Business name: Top 10 customers by monthly transaction volume in each business department (branch) of a financial company

Technical Description:

1. Data access

The data comes from business tables in the counter system and from tables in the Lexus marketing integration system

System database 1:

hive_s5_szdb_sz_asset_client
hive_s5_szdb_sz_his_his_deliver
hive_s5_szdb_sz_his_his_fundjour
hive_s5_szdb_sz_his_his_assetdebit

System database 2:
hive_s8__t_gxgl_gxmx_query
hive_s8__t_ehr_jjr_jbxx

2. Writing the business SQL in the Hive data warehouse

Development is based on the star schema: a fact table joined to its related dimension tables

Business preparation:

1. Monthly transaction volume = income = commission + amount incurred + financing interest payable

-- Commission for the current period: fare0
with a as (select branch_no,client_id from hive.hive_s5_szdb_sz_asset_client where hive_p_date=20190531 and branch_no in (1,2))
,b as (select branch_no,client_id,fund_account,client_name,sum(fare0) value from hive.hive_s5_szdb_sz_his_his_deliver where hive_p_date between 20190501 and 20190531 and branch_no in (1,2) and asset_prop = 7 group by branch_no,client_id,fund_account,client_name order by sum(fare0) desc)
-- Amount incurred in the current period: occur_balance (negative by default, so the sign is flipped)
,c as (select branch_no,client_id,fund_account,max(client_name) client_name,sum(-occur_balance) value from hive.hive_s5_szdb_sz_his_his_fundjour where hive_p_date between 20190501 and 20190531 and branch_no in (1,2) and business_flag in (2715,2718,2721,2724,2732,2735,2738,2741,2744,2769,2776,2778,2779,2794,2795) and asset_prop = 7 group by branch_no,client_id,fund_account order by sum(-occur_balance) desc)
-- Financing interest payable (fin_pre_interest) = interest receivable at the end of the current period - interest receivable at the end of the previous period
,d as (select a.branch_no,a.client_id,a.client_name,a.fund_account,(value1-value2) value from
 (select branch_no,client_id,fund_account,max(client_name) client_name,sum(fin_pre_interest) value1 from hive.hive_s5_szdb_sz_his_his_assetdebit where hive_p_date = 20190531 and branch_no in (1,2) group by branch_no,client_id,fund_account order by branch_no,client_id)a
left join (select branch_no,client_id,fund_account,max(client_name) client_name,sum(fin_pre_interest) value2 from hive.hive_s5_szdb_sz_his_his_assetdebit where hive_p_date = 20181231 and branch_no in (1,2) group by branch_no,client_id,fund_account order by branch_no,client_id)b on a.branch_no=b.branch_no and a.client_id=b.client_id order by branch_no,(value1-value2) desc)
--,c as (select branch_no,client_id,fund_account,max(client_name)client_name,sum(fin_pre_interest)value1 from hive.hive_s5_szdb_sz_his_his_assetdebit where hive_p_date between 20190501 and 20190531 and branch_no in (1,2) group by branch_no,client_id,fund_account order by branch_no,client_id,fund_account)
-- Customer service relationship (the literal '服务关系' means "service relationship")
,e as (select a.ryid,a.khid,a.gxlxbh,b.xm from (
select * from (select ryid,khid,case when gxlxbh=2 then '服务关系' else '@' end gxlxbh from hive.hive_s8__t_gxgl_gxmx_query where hive_p_date=20190507) t where gxlxbh !='@')a
join (select ryid,xm from hive.hive_s8__t_ehr_jjr_jbxx where hive_p_date=20190507)b on a.ryid=b.ryid)
/* Inspect the intermediate result: select a.branch_no,a.client_id,
nvl(nvl(b.client_name,c.client_name),d.client_name)client_name,
nvl(nvl(b.fund_account,c.fund_account),d.fund_account)fund_account,
(nvl(b.value,0)+nvl(c.value,0)+nvl(d.value,0))value1,xm,gxlxbh
from a left join b on a.branch_no = b.branch_no and a.client_id=b.client_id
left join c on a.branch_no=c.branch_no and a.client_id=c.client_id
left join d on a.branch_no=d.branch_no and a.client_id=d.client_id  
left join e on a.client_id=e.khid 
order by value1 desc;*/
-- At present this covers credit-account business only (adjust if that changes later)
,f as (select a.branch_no,a.client_id,
nvl(nvl(b.client_name,c.client_name),d.client_name)client_name,
nvl(nvl(b.fund_account,c.fund_account),d.fund_account)fund_account,
(nvl(b.value,0)+nvl(c.value,0)+nvl(d.value,0))value1,xm,gxlxbh
from a left join b on a.branch_no = b.branch_no and a.client_id=b.client_id
left join c on a.branch_no=c.branch_no and a.client_id=c.client_id
left join d on a.branch_no=d.branch_no and a.client_id=d.client_id  
left join e on a.client_id=e.khid
order by value1 desc)
-- Deduplicate: for each business department, output the 10 customers with the highest transaction volume
,g as (select *,row_number() over(partition by branch_no order by value1 desc)rid from f where value1>0)
select branch_no,client_id,client_name,fund_account,value1,rid,201904,xm,gxlxbh from g where rid<=10;
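
The ranking logic of the query above can be mirrored in a few lines of Python, which may help when checking a handful of results by hand. This is only a sketch under assumed inputs: the function name top_n_per_branch, the dictionary layout, and the sample figures are invented for illustration. Unlike the SQL, it does not start from the client table, it simply takes every client that appears in any of the three components.

```python
from collections import defaultdict


def top_n_per_branch(commission, incurred, interest, n=10):
    """Sum the three income components per (branch_no, client_id) and
    return the n largest positive totals for each branch."""
    totals = defaultdict(float)
    # A missing component counts as 0, like nvl(x, 0) in the SQL.
    for part in (commission, incurred, interest):
        for key, value in part.items():
            totals[key] += value

    by_branch = defaultdict(list)
    for (branch_no, client_id), value in totals.items():
        if value > 0:  # same filter as "where value1>0"
            by_branch[branch_no].append((client_id, value))

    result = {}
    for branch_no, rows in by_branch.items():
        rows.sort(key=lambda row: row[1], reverse=True)
        # The position in the sorted list plays the role of row_number().
        result[branch_no] = [
            (rid, client_id, value)
            for rid, (client_id, value) in enumerate(rows[:n], start=1)
        ]
    return result


# Hypothetical sample data keyed by (branch_no, client_id).
commission = {(1, "c1"): 100.0, (1, "c2"): 50.0, (2, "c3"): 10.0}
incurred = {(1, "c1"): 20.0, (2, "c3"): 5.0}
interest = {(1, "c2"): 80.0, (2, "c4"): -3.0}

ranked = top_n_per_branch(commission, incurred, interest, n=1)
print(ranked)
```

With n=1 the sketch keeps one customer per branch; the production query uses rid<=10.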

3. Package the SQL into a .py script

1) Run the script as: python xxx.py date

2) Once the script is configured on the big data management platform, the platform by default schedules the task for the day before the run date
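
A minimal sketch of such a wrapper script might look like the following. It assumes the date argument is given as yyyymmdd and that the query covers the calendar month containing that date. The template is a shortened stand-in for the full business SQL, and the names SQL_TEMPLATE, month_bounds, and build_sql are hypothetical. How the SQL is actually submitted to Hive depends on the platform and is left out.

```python
import calendar
import sys
from datetime import date, datetime, timedelta

# Shortened stand-in for the business SQL; only the date range is parameterized.
SQL_TEMPLATE = (
    "select branch_no,client_id,sum(fare0) value "
    "from hive.hive_s5_szdb_sz_his_his_deliver "
    "where hive_p_date between {start} and {end} "
    "group by branch_no,client_id"
)


def month_bounds(run_date):
    """Return (first_day, last_day) of run_date's month as yyyymmdd integers."""
    last_day = calendar.monthrange(run_date.year, run_date.month)[1]
    prefix = run_date.year * 10000 + run_date.month * 100
    return prefix + 1, prefix + last_day


def build_sql(date_arg=None, today=None):
    """Fill the template for the month of date_arg (yyyymmdd).
    Without an argument, fall back to the day before today."""
    if date_arg:
        run_date = datetime.strptime(date_arg, "%Y%m%d").date()
    else:
        run_date = (today or date.today()) - timedelta(days=1)
    start, end = month_bounds(run_date)
    return SQL_TEMPLATE.format(start=start, end=end)


if __name__ == "__main__":
    # Usage: python xxx.py 20190531
    arg = sys.argv[1] if len(sys.argv) > 1 else None
    print(build_sql(arg))
```

Falling back to yesterday when no date is passed matches the platform's default of scheduling the previous day's task.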

4. Export data to the backend database of the reporting platform

1) Display the result data of the business SQL on the report platform in customized forms (tables, key-value pairs, ring charts, radar charts, ...), including large-screen display

2) Related operations of the reporting platform

3) Permission configuration: the report is visible only to designated users, and the corresponding report permissions are granted to the department that requested it

5. Task Management

1) Configure the dependency view of the scripts. Note: each step is a task with its own task id

Data access -> indicator calculation -> statistical indicators -> export of the indicators to the related tables in xxxreport, the backend database of the reporting platform

2) Set the run-date parameters of the task, including whether it should run only on trading days

3) If the run has produced an error log by the next day, use the log to find and analyze the specific cause
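
The dependency chain described in this section can be sketched with the standard-library graphlib module (Python 3.9+). The task ids below are made-up labels for the four steps, and the platform's own dependency view is not modeled here. The sketch only shows how a run order follows from the dependencies, and which tasks are still runnable when one step fails.

```python
from graphlib import TopologicalSorter

# Hypothetical task ids; each maps to the set of tasks it depends on.
dependencies = {
    "access": set(),
    "indicator_calculation": {"access"},
    "statistical_indicators": {"indicator_calculation"},
    "export_to_xxxreport": {"statistical_indicators"},
}


def run_order(deps):
    """Return task ids in an order that respects every dependency."""
    return list(TopologicalSorter(deps).static_order())


def runnable_after_failure(deps, failed):
    """Return the tasks that can still run when `failed` did not succeed."""
    blocked = {failed}
    for task in run_order(deps):
        # A task is blocked as soon as one of its upstream tasks is blocked.
        if deps[task] & blocked:
            blocked.add(task)
    return [task for task in run_order(deps) if task not in blocked]


print(run_order(dependencies))
print(runnable_after_failure(dependencies, "indicator_calculation"))
```

When reading an error log the next day, the same reasoning applies in reverse: a failure in an upstream task explains why every downstream task did not produce data.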

6. Scheduling configuration

1) Task time setting

The task is triggered at 3:30 every day, and the run log is printed to the run screen

7. Data verification

1) Every day, the dashboard of the big data management platform shows all failed tasks in the access, calculation, statistics, and export queues

2) From the platform monitoring dashboard you can drill into each failed task and find the cause in its log. If no failed tasks are listed, all tasks ran successfully.
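
If task run records can be exported from the platform, a check along these lines could reproduce the dashboard's per-queue failure list. The record layout (task_id, queue, status) and the queue labels are assumptions made for illustration, not the platform's actual format.

```python
# Queue labels assumed to match the four stages named in this section.
QUEUES = ("access", "calculation", "statistics", "export")


def failed_by_queue(task_runs):
    """Group the ids of failed tasks by queue, keeping only queues with failures."""
    failed = {}
    for run in task_runs:
        if run["status"] == "failed":
            failed.setdefault(run["queue"], []).append(run["task_id"])
    return {queue: failed[queue] for queue in QUEUES if queue in failed}


# Hypothetical run records for one day.
runs = [
    {"task_id": 101, "queue": "access", "status": "success"},
    {"task_id": 102, "queue": "calculation", "status": "failed"},
    {"task_id": 103, "queue": "statistics", "status": "success"},
    {"task_id": 104, "queue": "export", "status": "failed"},
    {"task_id": 105, "queue": "export", "status": "failed"},
]

summary = failed_by_queue(runs)
print(summary)
```

An empty result corresponds to the case where the dashboard lists nothing and every task ran successfully.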

 


Origin blog.csdn.net/ALIVEE/article/details/89978177