Big Data Data Warehouse Project--Zhixing Education_Access Consulting Topics_Full Process

4.6 Full process

OLTP raw data (MySQL) -> data acquisition (ODS) -> cleaning and conversion (DWD) -> statistical analysis (DWS) -> export to OLAP (MySQL), as shown in the figure:
[Figure: full-process data flow diagram]

4.6.1 Data Collection

4.6.1.1 web_chat_ems table
4.6.1.1.1 SQL:

select id,
       create_date_time,
       session_id,
       sid,
       create_time,
       seo_source,
       seo_keywords,
       ip,
       area,
       country,
       province,
       city,
       origin_channel,
       user         as user_match,
       manual_time,
       begin_time,
       end_time,
       last_customer_msg_time_stamp,
       last_agent_msg_time_stamp,
       reply_msg_count,
       msg_count,
       browser_name,
       os_info,
       "2019-07-01" as starts_time
from web_chat_ems_2019_07;

4.6.1.1.2 Sqoop:

sqoop import \
--connect jdbc:mysql://192.168.52.150:3306/nev \
--username root \
--password 123456 \
--query 'select id, create_date_time, session_id, sid, create_time, seo_source, seo_keywords, ip, area, country, province, city, origin_channel, user as user_match, manual_time, begin_time, end_time, last_customer_msg_time_stamp, last_agent_msg_time_stamp, reply_msg_count, msg_count, browser_name, os_info, "2019-07-01" as starts_time from web_chat_ems_2019_07 where $CONDITIONS' \
--hcatalog-database itcast_ods_test \
--hcatalog-table web_chat_ems \
-m 100 \
--split-by id


bin/sqoop import \
--connect jdbc:mysql://192.168.10.10:3306/nev \
--username root \
--query 'select id, create_date_time, session_id, sid, create_time, seo_source, seo_keywords, ip, area, country, province, city, origin_channel, user as user_match, manual_time, begin_time, end_time, last_customer_msg_time_stamp, last_agent_msg_time_stamp, reply_msg_count, msg_count, browser_name, os_info, "2019-07-01" as starts_time from web_chat_ems_2019_07 where $CONDITIONS' \
--hcatalog-database itcast_ods_test \
--hcatalog-table web_chat_ems \
-m 100 \
--split-by id

-m 100 runs the import with 100 parallel map tasks;
the --split-by parameter names the column whose value range is divided among those tasks.
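
How the rows end up distributed can be previewed in MySQL before running the import. As a rough sketch: by default Sqoop issues a bounding query for the minimum and maximum of the --split-by column and cuts that range into roughly equal slices, one per map task. The query below approximates that bounding query. It assumes id is an integer column; the aliases and the approx_ids_per_task column are illustrative only.

```sql
-- Approximation of the bounding query Sqoop issues for --split-by id.
-- min_id and max_id bound the range that is divided among the map tasks.
select min(id)                       as min_id,
       max(id)                       as max_id,
       count(*)                      as row_count,
       (max(id) - min(id) + 1) / 100 as approx_ids_per_task  -- 100 = value of -m
from web_chat_ems_2019_07;
```

If the id values are unevenly distributed across the range, some tasks receive far more rows than others, so a split column with evenly spread values is the better choice.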

4.6.1.2 web_chat_text_ems table
4.6.1.2.1 SQL

select id,
       referrer,
       from_url,
       landing_page_url,
       url_title,
       platform_description,
       other_params,
       history,
       "2019-07-01" as start_time
from web_chat_text_ems_2019_07;

4.6.1.2.2 Sqoop

sqoop import \
--connect jdbc:mysql://192.168.52.150:3306/nev \
--username root \
--password 123456 \
--query 'select id,referrer,from_url,landing_page_url,url_title,platform_description,other_params,history, "2019-07-01" as start_time from web_chat_text_ems_2019_07 where $CONDITIONS' \
--hcatalog-database itcast_ods \
--hcatalog-table web_chat_text_ems \
-m 100 \
--split-by id

bin/sqoop import \
--connect jdbc:mysql://192.168.10.10:3306/nev \
--username root \
--query 'select id,referrer,from_url,landing_page_url,url_title,platform_description,other_params,history, "2019-07-01" as start_time from web_chat_text_ems_2019_07 where $CONDITIONS' \
--hcatalog-database itcast_ods \
--hcatalog-table web_chat_text_ems \
-m 100 \
--split-by id

4.6.2.4 Code

-- dynamic partition settings
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
-- Hive compression
set hive.exec.compress.intermediate=true;
set hive.exec.compress.output=true;
-- make compression take effect on write
set hive.exec.orc.compression.strategy=COMPRESSION;


insert into table itcast_dwd.visit_consult_dwd partition (yearinfo, monthinfo, dayinfo)
select
    wce.session_id,
    wce.sid,
    unix_timestamp(wce.create_time, 'yyyy-MM-dd HH:mm:ss.SSS') as create_time,
    wce.seo_source,
    wce.ip,
    wce.area,
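    cast(if(wce.msg_count is null, 0, wce.msg_count) as int) as msg_count,
    -- NOTE: the DWS queries below read origin_channel from this table, but it is not
    -- selected here; if the DWD table defines that column, add wce.origin_channel at
    -- the position matching the table definition.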
    wcte.referrer,
    wcte.from_url,
    wcte.landing_page_url,
    wcte.url_title,
    wcte.platform_description,
    wcte.other_params,
    wcte.history,
    substr(wce.create_time, 12, 2) as hourinfo,
    quarter(wce.create_time) as quarterinfo,
    substr(wce.create_time, 1, 4) as yearinfo,
    substr(wce.create_time, 6, 2) as monthinfo,
    substr(wce.create_time, 9, 2) as dayinfo
from itcast_ods.web_chat_ems wce inner join itcast_ods.web_chat_text_ems wcte
on wce.id = wcte.id;
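
The time fields in the insert above are cut out of create_time by character position. As a minimal sketch, assuming create_time is a string in the form yyyy-MM-dd HH:mm:ss.SSS (the literal below is a made-up sample value), the expressions can be checked in Hive on their own:

```sql
-- substr is 1-indexed: substr(str, start, length)
select
    substr('2019-07-01 13:05:09.123', 1, 4)  as yearinfo,    -- '2019'
    substr('2019-07-01 13:05:09.123', 6, 2)  as monthinfo,   -- '07'
    substr('2019-07-01 13:05:09.123', 9, 2)  as dayinfo,     -- '01'
    substr('2019-07-01 13:05:09.123', 12, 2) as hourinfo,    -- '13'
    quarter('2019-07-01 13:05:09.123')       as quarterinfo; -- 3
```

Because the values are substrings, month, day and hour keep their leading zeros ('07', not '7'), which is what the partition values and time_str in the later queries are built from.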

4.6.3 Statistical analysis

4.6.3.1 Analysis
The DWD layer is followed by the DWM middle layer and the DWS business layer. Looking back at the modeling and analysis stage, we already have the dimensions related to the indicators: year, quarter, month, day, hour, region, source channel, and page. They fall into two categories:
Time dimension: year, quarter, month, day, hour
Business attribute dimension: region, source channel, page, total traffic.
In the DWS layer, count + distinct is used to compute the indicators for each dimension, and the results form a wide table.
Null value processing
The dimension association key in the fact table must not be null. Where the associated dimension information is unknown, use a surrogate key (-1) in place of the null value.
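
The null-value rule can be sketched in HiveQL as follows. This is an illustration only, not part of the original scripts: it assumes the ODS table itcast_ods.web_chat_ems loaded earlier and uses coalesce to substitute '-1' when a dimension value is null.

```sql
-- Sketch only: replace null dimension values with the surrogate key '-1'.
select
    coalesce(wce.area, '-1')           as area,
    coalesce(wce.seo_source, '-1')     as seo_source,
    coalesce(wce.origin_channel, '-1') as origin_channel
from itcast_ods.web_chat_ems wce
limit 10;
```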
4.6.3.2 Code
Our dimensions are divided into two categories: the time dimension and the product attribute dimension. In the DWS layer we can produce one wide table that holds the data for all dimensions, for use by the APP layer and OLAP applications.
4.6.3.2.1 Region grouping
When calculating the region dimension, set the product attribute type groupType to 1 (region) and set the other product attributes (search source, source channel, session source page) to -1. This makes the rows easier for the team to understand, lowers the error rate for yourself and the team, and reduces communication cost.
In the INSERT SQL, add aliases to the queried fields wherever possible, especially for tables with many fields, so that they are easy to identify.
Hour dimension:

-- partitioning
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
set hive.exec.max.dynamic.partitions.pernode=10000;
set hive.exec.max.dynamic.partitions=100000;
set hive.exec.max.created.files=150000;
-- Hive compression
set hive.exec.compress.intermediate=true;
set hive.exec.compress.output=true;
-- make compression take effect on write
set hive.exec.orc.compression.strategy=COMPRESSION;

insert into itcast_dws.visit_dws partition (yearinfo, monthinfo, dayinfo)
select
    count(distinct sid)        as sid_total,
    count(distinct session_id) as session_total,
    count(distinct ip)         as ip_total,
    area,
    '-1' as seo_source,
    '-1' as origin_channel,
    hourinfo,
    quarterinfo,
    concat(yearinfo,'-',monthinfo,'-',dayinfo,' ',hourinfo) as time_str,
    '-1' as from_url,
    '1' as grouptype,
    '1' as time_type,
    yearinfo, monthinfo, dayinfo
from itcast_dwd.visit_consult_dwd
group by area, yearinfo, quarterinfo, monthinfo, dayinfo, hourinfo;

Day dimension:

insert into itcast_dws.visit_dws partition (yearinfo, monthinfo, dayinfo)
select 
    count(distinct sid) as sid_total,
    count(distinct session_id) as session_total,
    count(distinct ip) as ip_total,
    area,
    '-1' as seo_source,
    '-1' as origin_channel,
    '-1' as hourinfo,
    quarterinfo,
    concat(yearinfo,'-',monthinfo,'-',dayinfo) as time_str,
    '-1' as from_url,
    '1' as grouptype,
    '2' as time_type,
    yearinfo, monthinfo, dayinfo
from itcast_dwd.visit_consult_dwd 
group by area, yearinfo, quarterinfo, monthinfo, dayinfo;

Month dimension:

insert into itcast_dws.visit_dws partition (yearinfo, monthinfo, dayinfo)
select 
    count(distinct sid) as sid_total,
    count(distinct session_id) as session_total,
    count(distinct ip) as ip_total,
    area,
    '-1' as seo_source,
    '-1' as origin_channel,
    '-1' as hourinfo,
    quarterinfo,
    concat(yearinfo,'-',monthinfo) as time_str,
    '-1' as from_url,
    '1' as grouptype,
    '3' as time_type,
    yearinfo, monthinfo,
    '-1' as dayinfo
from itcast_dwd.visit_consult_dwd 
group by area, yearinfo, quarterinfo, monthinfo;

Quarterly dimension:

insert into itcast_dws.visit_dws partition (yearinfo, monthinfo, dayinfo)
select 
    count(distinct sid) as sid_total,
    count(distinct session_id) as session_total,
    count(distinct ip) as ip_total,
    area,
    '-1' as seo_source,
    '-1' as origin_channel,
    '-1' as hourinfo,
    quarterinfo,
    concat(yearinfo,'-Q',quarterinfo) as time_str,
    '-1' as from_url,
    '1' as grouptype,
    '4' as time_type,
    yearinfo,
    '-1' as monthinfo,
    '-1' as dayinfo
from itcast_dwd.visit_consult_dwd 
group by area, yearinfo, quarterinfo;

Year dimension:

INSERT  INTO TABLE itcast_dws.visit_dws PARTITION (yearinfo,monthinfo,dayinfo)
select 
   COUNT(DISTINCT wce.sid) as sid_total,
   COUNT(DISTINCT wce.session_id) as sessionid_total,
   COUNT(DISTINCT wce.ip) as ip_total,
   wce.area as area,
   '-1' as seo_source,
   '-1' as origin_channel,
   '-1' as hourinfo,
   '-1' as quarterinfo,
   wce.yearinfo as time_str,
   '-1' as from_url,
   '1' as groupType,
   '5' as time_type,
   wce.yearinfo as yearinfo,
   '-1' as monthinfo,
   '-1' as dayinfo
from itcast_dwd.visit_consult_dwd wce
group by wce.area,wce.yearinfo;
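
Once the region rows are loaded, a consumer picks one slice of the wide table by filtering on grouptype and time_type. The query below is a sketch of such a read. It assumes the Hive table's column names match the aliases used in the inserts above (sid_total, ip_total, area, time_str, grouptype, time_type); the table definition is not shown in this section, so treat the names as assumptions.

```sql
-- Daily visits per region for July 2019, read from the wide table.
select time_str, area, sid_total, ip_total
from itcast_dws.visit_dws
where grouptype = '1'      -- region grouping
  and time_type = '2'      -- aggregated by day
  and yearinfo = '2019'
  and monthinfo = '07'
order by time_str, area;
```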

4.6.3.2.2 Search source grouping

Hour dimension:

insert into itcast_dws.visit_dws partition (yearinfo, monthinfo, dayinfo)
select 
    count(distinct sid) as sid_total,
    count(distinct session_id) as session_total,
    count(distinct ip) as ip_total,
    '-1' as area,
    seo_source,
    '-1' as origin_channel,
    hourinfo,
    quarterinfo,
    concat(yearinfo,'-',monthinfo,'-',dayinfo,' ',hourinfo) as time_str,
    '-1' as from_url,
    '2' as grouptype,
    '1' as time_type,
    yearinfo, monthinfo, dayinfo
from itcast_dwd.visit_consult_dwd 
group by seo_source, yearinfo, quarterinfo, monthinfo, dayinfo, hourinfo;

Day dimension:

insert into itcast_dws.visit_dws partition (yearinfo, monthinfo, dayinfo)
select
    count(distinct sid) as sid_total,
    count(distinct session_id) as session_total,
    count(distinct ip) as ip_total,
    '-1' as area,
    seo_source,
    '-1' as origin_channel,
    '-1' as hourinfo,
    quarterinfo,
    concat(yearinfo,'-',monthinfo,'-',dayinfo) as time_str,
    '-1' as from_url,
    '2' as grouptype,
    '2' as time_type,
    yearinfo, monthinfo, dayinfo
from itcast_dwd.visit_consult_dwd
group by seo_source, yearinfo, quarterinfo, monthinfo, dayinfo;

Month dimension:

insert into itcast_dws.visit_dws partition (yearinfo, monthinfo, dayinfo)
select
    count(distinct sid) as sid_total,
    count(distinct session_id) as session_total,
    count(distinct ip) as ip_total,
    '-1' as area,
    seo_source,
    '-1' as origin_channel,
    '-1' as hourinfo,
    quarterinfo,
    concat(yearinfo,'-',monthinfo) as time_str,
    '-1' as from_url,
    '2' as grouptype,
    '3' as time_type,
    yearinfo, monthinfo,
    '-1' as dayinfo
from itcast_dwd.visit_consult_dwd
group by seo_source, yearinfo, quarterinfo, monthinfo;

Quarterly dimension:

insert into itcast_dws.visit_dws partition (yearinfo, monthinfo, dayinfo)
select
    count(distinct sid) as sid_total,
    count(distinct session_id) as session_total,
    count(distinct ip) as ip_total,
    '-1' as area,
    seo_source,
    '-1' as origin_channel,
    '-1' as hourinfo,
    quarterinfo,
    concat(yearinfo,'-Q',quarterinfo) as time_str,
    '-1' as from_url,
    '2' as grouptype,
    '4' as time_type,
    yearinfo,
    '-1' as monthinfo,
    '-1' as dayinfo
from itcast_dwd.visit_consult_dwd
group by seo_source, yearinfo, quarterinfo;

Year dimension:

INSERT  INTO TABLE itcast_dws.visit_dws PARTITION (yearinfo,monthinfo,dayinfo)
select
   COUNT(DISTINCT wce.sid) as sid_total,
   COUNT(DISTINCT wce.session_id) as sessionid_total,
   COUNT(DISTINCT wce.ip) as ip_total,
   '-1' as  area,
   seo_source,
   '-1' as origin_channel,
   '-1' as hourinfo,
   '-1' as quarterinfo,
   wce.yearinfo as time_str,
   '-1' as from_url,
   '2' as groupType,
   '5' as time_type,
   wce.yearinfo as yearinfo,
   '-1' as monthinfo,
   '-1' as dayinfo
from itcast_dwd.visit_consult_dwd wce
group by wce.seo_source,wce.yearinfo;

4.6.3.2.3 Source channel grouping

Hour dimension:

insert into itcast_dws.visit_dws partition (yearinfo, monthinfo, dayinfo)
select
    count(distinct sid) as sid_total,
    count(distinct session_id) as session_total,
    count(distinct ip) as ip_total,
    '-1' as area,
    '-1' as seo_source,
    origin_channel,
    hourinfo,
    quarterinfo,
    concat(yearinfo,'-',monthinfo,'-',dayinfo,' ',hourinfo) as time_str,
    '-1' as from_url,
    '3' as grouptype,
    '1' as time_type,
    yearinfo, monthinfo, dayinfo
from itcast_dwd.visit_consult_dwd
group by origin_channel, yearinfo, quarterinfo, monthinfo, dayinfo, hourinfo;

Day dimension:

insert into itcast_dws.visit_dws partition (yearinfo, monthinfo, dayinfo)
select
    count(distinct sid) as sid_total,
    count(distinct session_id) as session_total,
    count(distinct ip) as ip_total,
    '-1' as area,
    '-1' as seo_source,
    origin_channel,
    '-1' as hourinfo,
    quarterinfo,
    concat(yearinfo,'-',monthinfo,'-',dayinfo) as time_str,
    '-1' as from_url,
    '3' as grouptype,
    '2' as time_type,
    yearinfo, monthinfo, dayinfo
from itcast_dwd.visit_consult_dwd
group by origin_channel, yearinfo, quarterinfo, monthinfo, dayinfo;

Month dimension:

insert into itcast_dws.visit_dws partition (yearinfo, monthinfo, dayinfo)
select
    count(distinct sid) as sid_total,
    count(distinct session_id) as session_total,
    count(distinct ip) as ip_total,
    '-1' as area,
    '-1' as seo_source,
    origin_channel,
    '-1' as hourinfo,
    quarterinfo,
    concat(yearinfo,'-',monthinfo) as time_str,
    '-1' as from_url,
    '3' as grouptype,
    '3' as time_type,
    yearinfo, monthinfo,
    '-1' as dayinfo
from itcast_dwd.visit_consult_dwd
group by origin_channel, yearinfo, quarterinfo, monthinfo;

Quarterly dimension:

insert into itcast_dws.visit_dws partition (yearinfo, monthinfo, dayinfo)
select
    count(distinct sid) as sid_total,
    count(distinct session_id) as session_total,
    count(distinct ip) as ip_total,
    '-1' as area,
    '-1' as seo_source,
    origin_channel,
    '-1' as hourinfo,
    quarterinfo,
    concat(yearinfo,'-Q',quarterinfo) as time_str,
    '-1' as from_url,
    '3' as grouptype,
    '4' as time_type,
    yearinfo,
    '-1' as monthinfo,
    '-1' as dayinfo
from itcast_dwd.visit_consult_dwd
group by origin_channel, yearinfo, quarterinfo;

Year dimension:

INSERT  INTO TABLE itcast_dws.visit_dws PARTITION (yearinfo,monthinfo,dayinfo)
select
   COUNT(DISTINCT wce.sid) as sid_total,
   COUNT(DISTINCT wce.session_id) as sessionid_total,
   COUNT(DISTINCT wce.ip) as ip_total,
   '-1' as  area,
   '-1' as seo_source,
   origin_channel,
   '-1' as hourinfo,
   '-1' as quarterinfo,
   wce.yearinfo as time_str,
   '-1' as from_url,
   '3' as groupType,
   '5' as time_type,
   wce.yearinfo as yearinfo,
   '-1' as monthinfo,
   '-1' as dayinfo
from itcast_dwd.visit_consult_dwd wce
group by wce.origin_channel,wce.yearinfo;

4.6.3.2.4 Session source page grouping

Hour dimension:

insert into itcast_dws.visit_dws partition (yearinfo, monthinfo, dayinfo)
select 
    count(distinct sid) as sid_total,
    count(distinct session_id) as session_total,
    count(distinct ip) as ip_total,
    '-1' as area,
    '-1' as seo_source,
    '-1' as origin_channel,
    hourinfo,
    quarterinfo,
    concat(yearinfo,'-',monthinfo,'-',dayinfo,' ',hourinfo) as time_str,
    from_url,
    '4' as grouptype,
    '1' as time_type,
    yearinfo, monthinfo, dayinfo
from itcast_dwd.visit_consult_dwd 
group by from_url, yearinfo, quarterinfo, monthinfo, dayinfo, hourinfo;

Day dimension:

insert into itcast_dws.visit_dws partition (yearinfo, monthinfo, dayinfo)
select
    count(distinct sid) as sid_total,
    count(distinct session_id) as session_total,
    count(distinct ip) as ip_total,
    '-1' as area,
    '-1' as seo_source,
    '-1' as origin_channel,
    '-1' as hourinfo,
    quarterinfo,
    concat(yearinfo,'-',monthinfo,'-',dayinfo) as time_str,
    from_url,
    '4' as grouptype,
    '2' as time_type,
    yearinfo, monthinfo, dayinfo
from itcast_dwd.visit_consult_dwd
group by from_url, yearinfo, quarterinfo, monthinfo, dayinfo;

Month dimension:

insert into itcast_dws.visit_dws partition (yearinfo, monthinfo, dayinfo)
select
    count(distinct sid) as sid_total,
    count(distinct session_id) as session_total,
    count(distinct ip) as ip_total,
    '-1' as area,
    '-1' as seo_source,
    '-1' as origin_channel,
    '-1' as hourinfo,
    quarterinfo,
    concat(yearinfo,'-',monthinfo) as time_str,
    from_url,
    '4' as grouptype,
    '3' as time_type,
    yearinfo, monthinfo,
    '-1' as dayinfo
from itcast_dwd.visit_consult_dwd
group by from_url, yearinfo, quarterinfo, monthinfo;

Quarterly dimension:

insert into itcast_dws.visit_dws partition (yearinfo, monthinfo, dayinfo)
select
    count(distinct sid) as sid_total,
    count(distinct session_id) as session_total,
    count(distinct ip) as ip_total,
    '-1' as area,
    '-1' as seo_source,
    '-1' as origin_channel,
    '-1' as hourinfo,
    quarterinfo,
    concat(yearinfo,'-Q',quarterinfo) as time_str,
    from_url,
    '4' as grouptype,
    '4' as time_type,
    yearinfo,
    '-1' as monthinfo,
    '-1' as dayinfo
from itcast_dwd.visit_consult_dwd
group by from_url, yearinfo, quarterinfo;

Year dimension:

INSERT  INTO TABLE itcast_dws.visit_dws PARTITION (yearinfo,monthinfo,dayinfo)
select
   COUNT(DISTINCT wce.sid) as sid_total,
   COUNT(DISTINCT wce.session_id) as sessionid_total,
   COUNT(DISTINCT wce.ip) as ip_total,
   '-1' as  area,
   '-1' as seo_source,
   '-1' as origin_channel,
   '-1' as hourinfo,
   '-1' as quarterinfo,
   wce.yearinfo as time_str,
   from_url,
   '4' as groupType,
   '5' as time_type,
   wce.yearinfo as yearinfo,
   '-1' as monthinfo,
   '-1' as dayinfo
from itcast_dwd.visit_consult_dwd wce
group by wce.from_url,wce.yearinfo;

4.6.3.2.5 Total Visits

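Hour dimension (base data for hour intervals):
OLAP applications can obtain the figures for an interval of several hours by summing the hourly rows. Note that the totals are distinct counts, so the sum equals the true interval value only when no sid, session_id or ip appears in more than one hour; otherwise the sum overcounts.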

insert into itcast_dws.visit_dws partition (yearinfo, monthinfo, dayinfo)
select 
    count(distinct sid) as sid_total,
    count(distinct session_id) as session_total,
    count(distinct ip) as ip_total,
    '-1' as area,
    '-1' as seo_source,
    '-1' as origin_channel,
    hourinfo,
    quarterinfo,
    concat(yearinfo,'-',monthinfo,'-',dayinfo,' ',hourinfo) as time_str,
    '-1' as from_url,
    '5' as grouptype,
    '1' as time_type,
    yearinfo, monthinfo, dayinfo
from itcast_dwd.visit_consult_dwd 
group by yearinfo, quarterinfo, monthinfo, dayinfo, hourinfo;
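
The effect of summing hourly distinct counts can be seen on a tiny made-up data set. The sketch below is written in standard SQL with inline sample rows (the sids u1 and u2 are hypothetical), so it does not touch the project tables; it should run in Hive or in MySQL 8.

```sql
-- Inline sample data: visitor u1 appears in hour 09 and again in hour 10.
with visits as (
    select 'u1' as sid, '09' as hourinfo
    union all
    select 'u1' as sid, '10' as hourinfo
    union all
    select 'u2' as sid, '10' as hourinfo
)
select 'sum of hourly distinct counts' as method, sum(sid_total) as visitors
from (
    select hourinfo, count(distinct sid) as sid_total
    from visits
    group by hourinfo
) hourly
union all
select 'distinct count over 09-10' as method, count(distinct sid) as visitors
from visits;
```

The first row reports 3 (1 for hour 09 plus 2 for hour 10) and the second reports 2, because u1 is counted once per hour in the summed figure. Whether that difference matters depends on how the report defines an interval total.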

Day dimension:

insert into itcast_dws.visit_dws partition (yearinfo, monthinfo, dayinfo)
select
    count(distinct sid) as sid_total,
    count(distinct session_id) as session_total,
    count(distinct ip) as ip_total,
    '-1' as area,
    '-1' as seo_source,
    '-1' as origin_channel,
    '-1' as hourinfo,
    quarterinfo,
    concat(yearinfo,'-',monthinfo,'-',dayinfo) as time_str,
    '-1' as from_url,
    '5' as grouptype,
    '2' as time_type,
    yearinfo, monthinfo, dayinfo
from itcast_dwd.visit_consult_dwd
group by yearinfo, quarterinfo, monthinfo, dayinfo;

Month dimension:

insert into itcast_dws.visit_dws partition (yearinfo, monthinfo, dayinfo)
select
    count(distinct sid) as sid_total,
    count(distinct session_id) as session_total,
    count(distinct ip) as ip_total,
    '-1' as area,
    '-1' as seo_source,
    '-1' as origin_channel,
    '-1' as hourinfo,
    quarterinfo,
    concat(yearinfo,'-',monthinfo) as time_str,
    '-1' as from_url,
    '5' as grouptype,
    '3' as time_type,
    yearinfo, monthinfo,
    '-1' as dayinfo
from itcast_dwd.visit_consult_dwd
group by yearinfo, quarterinfo, monthinfo;

Quarterly dimension:

insert into itcast_dws.visit_dws partition (yearinfo, monthinfo, dayinfo)
select
    count(distinct sid) as sid_total,
    count(distinct session_id) as session_total,
    count(distinct ip) as ip_total,
    '-1' as area,
    '-1' as seo_source,
    '-1' as origin_channel,
    '-1' as hourinfo,
    quarterinfo,
    concat(yearinfo,'-Q',quarterinfo) as time_str,
    '-1' as from_url,
    '5' as grouptype,
    '4' as time_type,
    yearinfo,
    '-1' as monthinfo,
    '-1' as dayinfo
from itcast_dwd.visit_consult_dwd
group by yearinfo, quarterinfo;

Year dimension:

INSERT  INTO TABLE itcast_dws.visit_dws PARTITION (yearinfo,monthinfo,dayinfo)
select
   COUNT(DISTINCT wce.sid) as sid_total,
   COUNT(DISTINCT wce.session_id) as sessionid_total,
   COUNT(DISTINCT wce.ip) as ip_total,
   '-1' as  area,
   '-1' as seo_source,
   '-1' as origin_channel,
   '-1' as hourinfo,
   '-1' as quarterinfo,
   wce.yearinfo as time_str,
   '-1' as from_url,
   '5' as groupType,
   '5' as time_type,
   wce.yearinfo as yearinfo,
   '-1' as monthinfo,
   '-1' as dayinfo
from itcast_dwd.visit_consult_dwd wce
group by wce.yearinfo;

4.6.4 Export data

4.6.4.1 Create mysql table

create database scrm_bi default character set utf8mb4 collate utf8mb4_general_ci;

-- create the table inside scrm_bi, the database the sqoop export connects to
use scrm_bi;

CREATE TABLE `itcast_visit` (
  sid_total int(11) COMMENT 'distinct count by sid',
  sessionid_total int(11) COMMENT 'distinct count by sessionid',
  ip_total int(11) COMMENT 'distinct count by IP',
  area varchar(32) COMMENT 'region information',
  seo_source varchar(32) COMMENT 'search source',
  origin_channel varchar(32) COMMENT 'source channel',
  hourinfo varchar(32) COMMENT 'hour',
  quarterinfo varchar(32) COMMENT 'quarter',
  time_str varchar(32) COMMENT 'time detail',
  from_url varchar(32) comment 'session source page',
  groupType varchar(32) COMMENT 'product attribute type: 1. region; 2. search source; 3. source channel; 4. session source page; 5. total visits',
  time_type varchar(32) COMMENT 'time aggregation type: 1. by hour; 2. by day; 3. by month; 4. by quarter; 5. by year',
  yearinfo varchar(32) COMMENT 'year',
  monthinfo varchar(32) COMMENT 'month',
  dayinfo varchar(32) COMMENT 'day'
);

4.6.4.2 Execute sqoop export script

sqoop export \
--connect "jdbc:mysql://192.168.52.150:3306/scrm_bi?useUnicode=true&characterEncoding=utf-8" \
--username root \
--password '123456' \
--table itcast_visit \
--hcatalog-database itcast_dws \
--hcatalog-table visit_dws \
-m 100
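
After the export finishes, a quick sanity check in MySQL is to count the exported rows per grouping. This is a sketch rather than part of the original procedure; it uses only the itcast_visit columns defined in 4.6.4.1.

```sql
-- Run in MySQL against scrm_bi after the export.
-- Each groupType (1-5) is expected to appear with each time_type (1-5),
-- one combination per insert statement in 4.6.3.2.
select groupType, time_type, count(*) as row_count
from itcast_visit
group by groupType, time_type
order by groupType, time_type;
```

A combination that is missing from the result points at an insert that did not run or at rows that were rejected during the export.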

Origin blog.csdn.net/xianyu120/article/details/111686465