Data center case of Hudi data lake technology


1 Case Architecture


This case integrates Flink SQL with Hudi. Business data in a MySQL database is captured and written to Hudi tables in real time. Presto is used for offline query analysis and Flink SQL for streaming queries. The resulting reports are stored in the MySQL database and visualized with FineBI.


1. MySQL database
Stores the education customer business data and the offline and real-time analysis report results, and is the data source for the FineBI visualization tool.

2. Flink SQL engine
Uses CDC in Flink SQL to capture MySQL table data into Hudi tables in real time. The Flink SQL connectors for Hudi and MySQL are also used to store and query data.

3. Apache Hudi: data lake framework
The education business data is ultimately stored in Hudi tables (underlying storage: the HDFS distributed file system), which manage the data files in a unified way and can later be integrated with Spark and Hive for business metric analysis.

4. Presto analysis engine
An open source distributed SQL query engine from Facebook, suited to interactive analytical queries on data volumes from gigabytes to petabytes.
In this case Presto loads data directly from the Hudi tables, relying on the Hive MetaStore to manage metadata. Presto can also integrate multiple data sources, which makes interactive data processing easier.

2 Business Data

The business data in this case comes from real customer activity (consultations, visits, registrations, browsing, and so on) and is stored in the MySQL database oldlu_nev. Five business tables are used, described in sections 2.1 to 2.5.

Start the MySQL database, log in from the command line, create the database, then create the tables, and finally import the data.

[root@node1 ~]# mysql -uroot -p123456

CREATE DATABASE IF NOT EXISTS oldlu_nev;
USE oldlu_nev;

2.1 Customer Information Table

The customer information table is customer. Its CREATE TABLE DDL statement:

CREATE TABLE IF NOT EXISTS oldlu_nev.customer (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `customer_relationship_id` int(11) DEFAULT NULL COMMENT 'Current intent id',
  `create_date_time` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT 'Creation time',
  `update_date_time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP COMMENT 'Last update time',
  `deleted` bit(1) NOT NULL DEFAULT b'0' COMMENT 'Whether deleted (disabled)',
  `name` varchar(128) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL DEFAULT '' COMMENT 'Name',
  `idcard` varchar(24) CHARACTER SET utf8 COLLATE utf8_bin DEFAULT '' COMMENT 'ID card number',
  `birth_year` int(5) DEFAULT NULL COMMENT 'Birth year',
  `gender` varchar(8) CHARACTER SET utf8 COLLATE utf8_bin DEFAULT 'MAN' COMMENT 'Gender',
  `phone` varchar(24) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL DEFAULT '' COMMENT 'Mobile phone number',
  `wechat` varchar(32) CHARACTER SET utf8 COLLATE utf8_bin DEFAULT '' COMMENT 'WeChat',
  `qq` varchar(32) CHARACTER SET utf8 COLLATE utf8_bin DEFAULT '' COMMENT 'QQ number',
  `email` varchar(56) CHARACTER SET utf8 COLLATE utf8_bin DEFAULT '' COMMENT 'Email',
  `area` varchar(128) COLLATE utf8mb4_unicode_ci DEFAULT '' COMMENT 'Region',
  `leave_school_date` date DEFAULT NULL COMMENT 'School leaving date',
  `graduation_date` date DEFAULT NULL COMMENT 'Graduation date',
  `bxg_student_id` varchar(64) COLLATE utf8mb4_unicode_ci DEFAULT NULL COMMENT 'Boxuegu student ID; may be unlinked or absent',
  `creator` int(11) DEFAULT NULL COMMENT 'Creator ID',
  `origin_type` varchar(32) CHARACTER SET utf8 COLLATE utf8_bin DEFAULT NULL COMMENT 'Data source',
  `origin_channel` varchar(32) CHARACTER SET utf8 COLLATE utf8_bin DEFAULT NULL COMMENT 'Source channel',
  `tenant` int(11) NOT NULL DEFAULT '0',
  `md_id` int(11) DEFAULT '0' COMMENT 'Middle platform id',
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;

Import the customer information data into the table in advance with the source command:

mysql> source /root/1-customer.sql ;

2.2 Customer Intent Table

The customer intent table is customer_relationship. Its CREATE TABLE DDL statement:

CREATE TABLE IF NOT EXISTS oldlu_nev.customer_relationship(
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `create_date_time` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP,
  `update_date_time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP COMMENT 'Last update time',
  `deleted` bit(1) NOT NULL DEFAULT b'0' COMMENT 'Whether deleted (disabled)',
  `customer_id` int(11) NOT NULL DEFAULT '0' COMMENT 'Owning customer id',
  `first_id` int(11) DEFAULT NULL COMMENT 'First customer relationship id',
  `belonger` int(11) DEFAULT NULL COMMENT 'Owner',
  `belonger_name` varchar(10) DEFAULT NULL COMMENT 'Owner name',
  `initial_belonger` int(11) DEFAULT NULL COMMENT 'Initial owner',
  `distribution_handler` int(11) DEFAULT NULL COMMENT 'Assignment handler',
  `business_scrm_department_id` int(11) DEFAULT '0' COMMENT 'Owning department',
  `last_visit_time` datetime DEFAULT NULL COMMENT 'Last follow-up visit time',
  `next_visit_time` datetime DEFAULT NULL COMMENT 'Next follow-up visit time',
  `origin_type` varchar(32) CHARACTER SET utf8 COLLATE utf8_bin DEFAULT NULL COMMENT 'Data source',
  `oldlu_school_id` int(11) DEFAULT NULL COMMENT 'Campus id',
  `oldlu_subject_id` int(11) DEFAULT NULL COMMENT 'Subject id',
  `intention_study_type` varchar(32) CHARACTER SET utf8 COLLATE utf8_bin DEFAULT NULL COMMENT 'Intended study mode',
  `anticipat_signup_date` date DEFAULT NULL COMMENT 'Expected sign-up date',
  `level` varchar(8) DEFAULT NULL COMMENT 'Customer level',
  `creator` int(11) DEFAULT NULL COMMENT 'Creator',
  `current_creator` int(11) DEFAULT NULL COMMENT 'Current creator: initially the creator; when reclaimed from the public pool, the person who reclaimed it',
  `creator_name` varchar(32) DEFAULT '' COMMENT 'Creator name',
  `origin_channel` varchar(32) CHARACTER SET utf8 COLLATE utf8_bin DEFAULT NULL COMMENT 'Source channel',
  `comment` varchar(255) CHARACTER SET utf8 COLLATE utf8_bin DEFAULT '' COMMENT 'Remarks',
  `first_customer_clue_id` int(11) DEFAULT '0' COMMENT 'First clue id',
  `last_customer_clue_id` int(11) DEFAULT '0' COMMENT 'Last clue id',
  `process_state` varchar(32) DEFAULT NULL COMMENT 'Processing state',
  `process_time` datetime DEFAULT NULL COMMENT 'Processing state change time',
  `payment_state` varchar(32) DEFAULT NULL COMMENT 'Payment state',
  `payment_time` datetime DEFAULT NULL COMMENT 'Payment state change time',
  `signup_state` varchar(32) CHARACTER SET utf8 COLLATE utf8_bin DEFAULT NULL COMMENT 'Sign-up state',
  `signup_time` datetime DEFAULT NULL COMMENT 'Sign-up time',
  `notice_state` varchar(32) DEFAULT NULL COMMENT 'Notification state',
  `notice_time` datetime DEFAULT NULL COMMENT 'Notification state change time',
  `lock_state` bit(1) DEFAULT b'0' COMMENT 'Lock state',
  `lock_time` datetime DEFAULT NULL COMMENT 'Lock state change time',
  `oldlu_clazz_id` int(11) DEFAULT NULL COMMENT 'EMS class id',
  `oldlu_clazz_time` datetime DEFAULT NULL COMMENT 'Class enrollment time',
  `payment_url` varchar(1024) DEFAULT '' COMMENT 'Payment link',
  `payment_url_time` datetime DEFAULT NULL COMMENT 'Payment link generation time',
  `ems_student_id` int(11) DEFAULT NULL COMMENT 'EMS student id',
  `delete_reason` varchar(64) DEFAULT NULL COMMENT 'Deletion reason',
  `deleter` int(11) DEFAULT NULL COMMENT 'Deleted by',
  `deleter_name` varchar(32) DEFAULT NULL COMMENT 'Name of deleter',
  `delete_time` datetime DEFAULT NULL COMMENT 'Deletion time',
  `course_id` int(11) DEFAULT NULL COMMENT 'Course ID',
  `course_name` varchar(64) DEFAULT NULL COMMENT 'Course name',
  `delete_comment` varchar(255) DEFAULT '' COMMENT 'Deletion reason details',
  `close_state` varchar(32) DEFAULT NULL COMMENT 'Close state',
  `close_time` datetime DEFAULT NULL COMMENT 'Close state change time',
  `appeal_id` int(11) DEFAULT NULL COMMENT 'Appeal id',
  `tenant` int(11) NOT NULL DEFAULT '0' COMMENT 'Tenant',
  `total_fee` decimal(19,0) DEFAULT NULL COMMENT 'Total sign-up fee',
  `belonged` int(11) DEFAULT NULL COMMENT 'Short-cycle owner',
  `belonged_time` datetime DEFAULT NULL COMMENT 'Ownership time',
  `belonger_time` datetime DEFAULT NULL COMMENT 'Ownership time',
  `transfer` int(11) DEFAULT NULL COMMENT 'Transferred by',
  `transfer_time` datetime DEFAULT NULL COMMENT 'Transfer time',
  `follow_type` int(4) DEFAULT '0' COMMENT 'Assignment type: 0-automatic assignment, 1-manual assignment, 2-automatic transfer, 3-manual single transfer, 4-manual batch transfer, 5-claimed from public pool',
  `transfer_bxg_oa_account` varchar(64) DEFAULT NULL COMMENT 'OA account of the Boxuegu owner transferred to',
  `transfer_bxg_belonger_name` varchar(64) DEFAULT NULL COMMENT 'OA name of the Boxuegu owner transferred to',
  PRIMARY KEY (`id`)
) ENGINE=InnoDB  DEFAULT CHARSET=utf8;

Import the customer intent data into the table in advance with the source command:

mysql> source /root/2-customer_relationship.sql ;

2.3 Customer Clue Table

The customer clue table is customer_clue. Its CREATE TABLE DDL statement:

CREATE TABLE IF NOT EXISTS oldlu_nev.customer_clue(
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `create_date_time` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT 'Creation time',
  `update_date_time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP COMMENT 'Last update time',
  `deleted` bit(1) NOT NULL DEFAULT b'0' COMMENT 'Whether deleted (disabled)',
  `customer_id` int(11) DEFAULT NULL COMMENT 'Customer id',
  `customer_relationship_id` int(11) DEFAULT NULL COMMENT 'Customer relationship id',
  `session_id` varchar(48) COLLATE utf8_bin DEFAULT '' COMMENT 'Qimo session id',
  `sid` varchar(48) COLLATE utf8_bin DEFAULT '' COMMENT 'Visitor id',
  `status` varchar(16) COLLATE utf8_bin DEFAULT '' COMMENT 'Status (undeal: pending claim, deal: claimed, finish: closed, changePeer: transferred)',
  `user` varchar(16) COLLATE utf8_bin DEFAULT '' COMMENT 'Assigned agent',
  `create_time` datetime DEFAULT NULL COMMENT 'Qimo creation time',
  `platform` varchar(16) COLLATE utf8_bin DEFAULT '' COMMENT 'Platform source (pc: website consultation | wap: WAP consultation | sdk: app consultation | weixin: WeChat consultation)',
  `s_name` varchar(32) COLLATE utf8_bin DEFAULT '' COMMENT 'User name',
  `seo_source` varchar(255) COLLATE utf8_bin DEFAULT '' COMMENT 'Search source',
  `seo_keywords` varchar(255) COLLATE utf8_bin DEFAULT '' COMMENT 'Keywords',
  `ip` varchar(48) COLLATE utf8_bin DEFAULT '' COMMENT 'IP address',
  `referrer` text COLLATE utf8_bin COMMENT 'Referrer page',
  `from_url` text COLLATE utf8_bin COMMENT 'Session source page',
  `landing_page_url` text COLLATE utf8_bin COMMENT 'Visitor landing page',
  `url_title` varchar(1024) COLLATE utf8_bin DEFAULT '' COMMENT 'Consultation page title',
  `to_peer` varchar(255) COLLATE utf8_bin DEFAULT '' COMMENT 'Skill group',
  `manual_time` datetime DEFAULT NULL COMMENT 'Manual service start time',
  `begin_time` datetime DEFAULT NULL COMMENT 'Agent claim time',
  `reply_msg_count` int(11) DEFAULT '0' COMMENT 'Number of agent reply messages',
  `total_msg_count` int(11) DEFAULT '0' COMMENT 'Total number of messages',
  `msg_count` int(11) DEFAULT '0' COMMENT 'Number of messages sent by customer',
  `comment` varchar(1024) COLLATE utf8_bin DEFAULT '' COMMENT 'Remarks',
  `finish_reason` varchar(255) COLLATE utf8_bin DEFAULT '' COMMENT 'End type',
  `finish_user` varchar(32) COLLATE utf8_bin DEFAULT '' COMMENT 'Agent who ended the session',
  `end_time` datetime DEFAULT NULL COMMENT 'Session end time',
  `platform_description` varchar(255) COLLATE utf8_bin DEFAULT '' COMMENT 'Customer platform information',
  `browser_name` varchar(255) COLLATE utf8_bin DEFAULT '' COMMENT 'Browser name',
  `os_info` varchar(255) COLLATE utf8_bin DEFAULT '' COMMENT 'Operating system name',
  `area` varchar(255) COLLATE utf8_bin DEFAULT NULL COMMENT 'Region',
  `country` varchar(16) COLLATE utf8_bin DEFAULT '' COMMENT 'Country',
  `province` varchar(16) COLLATE utf8_bin DEFAULT '' COMMENT 'Province',
  `city` varchar(255) COLLATE utf8_bin DEFAULT '' COMMENT 'City',
  `creator` int(11) DEFAULT '0' COMMENT 'Creator',
  `name` varchar(64) COLLATE utf8_bin DEFAULT '' COMMENT 'Customer name',
  `idcard` varchar(24) COLLATE utf8_bin DEFAULT '' COMMENT 'ID card number',
  `phone` varchar(24) COLLATE utf8_bin DEFAULT '' COMMENT 'Mobile phone number',
  `oldlu_school_id` int(11) DEFAULT NULL COMMENT 'Campus id',
  `oldlu_school` varchar(128) COLLATE utf8_bin DEFAULT '' COMMENT 'Campus',
  `oldlu_subject_id` int(11) DEFAULT NULL COMMENT 'Subject id',
  `oldlu_subject` varchar(128) COLLATE utf8_bin DEFAULT '' COMMENT 'Subject',
  `wechat` varchar(32) COLLATE utf8_bin DEFAULT '' COMMENT 'WeChat',
  `qq` varchar(32) COLLATE utf8_bin DEFAULT '' COMMENT 'QQ number',
  `email` varchar(56) COLLATE utf8_bin DEFAULT '' COMMENT 'Email',
  `gender` varchar(8) COLLATE utf8_bin DEFAULT 'MAN' COMMENT 'Gender',
  `level` varchar(8) COLLATE utf8_bin DEFAULT NULL COMMENT 'Customer level',
  `origin_type` varchar(32) COLLATE utf8_bin DEFAULT '' COMMENT 'Data source channel',
  `information_way` varchar(32) COLLATE utf8_bin DEFAULT NULL COMMENT 'Consultation method',
  `working_years` date DEFAULT NULL COMMENT 'Work start date',
  `technical_directions` varchar(255) COLLATE utf8_bin DEFAULT '' COMMENT 'Technical direction',
  `customer_state` varchar(32) COLLATE utf8_bin DEFAULT '' COMMENT 'Current customer state',
  `valid` bit(1) DEFAULT b'0' COMMENT 'Whether the clue is a valid online-consultation clue',
  `anticipat_signup_date` date DEFAULT NULL COMMENT 'Expected sign-up date',
  `clue_state` varchar(32) COLLATE utf8_bin DEFAULT 'NOT_SUBMIT' COMMENT 'Clue state',
  `scrm_department_id` int(11) DEFAULT NULL COMMENT 'SCRM internal department id',
  `superior_url` text COLLATE utf8_bin COMMENT 'Referrer page URL obtained via Zhuge',
  `superior_source` varchar(1024) COLLATE utf8_bin DEFAULT NULL COMMENT 'Referrer page URL title obtained via Zhuge',
  `landing_url` text COLLATE utf8_bin COMMENT 'Landing page URL obtained via Zhuge',
  `landing_source` varchar(1024) COLLATE utf8_bin DEFAULT NULL COMMENT 'Landing page URL source obtained via Zhuge',
  `info_url` text COLLATE utf8_bin COMMENT 'Inquiry page URL obtained via Zhuge',
  `info_source` varchar(255) COLLATE utf8_bin DEFAULT NULL COMMENT 'Inquiry page URL title obtained via Zhuge',
  `origin_channel` varchar(32) COLLATE utf8_bin DEFAULT '' COMMENT 'Advertising channel',
  `course_id` int(32) DEFAULT NULL,
  `course_name` varchar(255) COLLATE utf8_bin DEFAULT NULL,
  `zhuge_session_id` varchar(500) COLLATE utf8_bin DEFAULT NULL,
  `is_repeat` int(4) NOT NULL DEFAULT '0' COMMENT 'Whether the clue is a duplicate (by phone number) 0: normal 1: duplicate',
  `tenant` int(11) NOT NULL DEFAULT '0' COMMENT 'Tenant id',
  `activity_id` varchar(16) COLLATE utf8_bin DEFAULT NULL COMMENT 'Activity id',
  `activity_name` varchar(64) COLLATE utf8_bin DEFAULT NULL COMMENT 'Activity name',
  `follow_type` int(4) DEFAULT '0' COMMENT 'Assignment type: 0-automatic assignment, 1-manual assignment, 2-automatic transfer, 3-manual single transfer, 4-manual batch transfer, 5-claimed from public pool',
  `shunt_mode_id` int(11) DEFAULT NULL COMMENT 'Matched skill group id',
  `shunt_employee_group_id` int(11) DEFAULT NULL COMMENT 'Routing employee group',
  PRIMARY KEY (`id`)
) ENGINE=InnoDB  DEFAULT CHARSET=utf8 COLLATE=utf8_bin;

Import the customer clue data into the table in advance with the source command:

mysql> source /root/3-customer_clue.sql;

2.4 Clue Appeal Table

The clue appeal table is customer_appeal. Its CREATE TABLE DDL statement:

CREATE TABLE IF NOT EXISTS oldlu_nev.customer_appeal
(
  id int auto_increment primary key COMMENT 'Primary key',
  customer_relationship_first_id int not NULL COMMENT 'First customer relationship id',
  employee_id int NULL COMMENT 'Appellant',
  employee_name varchar(64) NULL COMMENT 'Appellant name',
  employee_department_id int NULL COMMENT 'Appellant department',
  employee_tdepart_id int NULL COMMENT 'Department the appellant belongs to',
  appeal_status int(1) not NULL COMMENT 'Appeal status, 0: pending audit 1: invalid 2: valid',
  audit_id int NULL COMMENT 'Auditor id',
  audit_name varchar(255) NULL COMMENT 'Auditor name',
  audit_department_id int NULL COMMENT 'Auditor department',
  audit_department_name varchar(255) NULL COMMENT 'Auditor department name',
  audit_date_time datetime NULL COMMENT 'Audit time',
  create_date_time datetime DEFAULT CURRENT_TIMESTAMP NULL COMMENT 'Creation time (appeal time)',
  update_date_time timestamp DEFAULT CURRENT_TIMESTAMP NULL ON UPDATE CURRENT_TIMESTAMP COMMENT 'Update time',
  deleted bit DEFAULT b'0'  not NULL COMMENT 'Deletion flag',
  tenant int DEFAULT 0 not NULL
)ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;

Import the clue appeal data into the table in advance with the source command:

mysql> source /root/4-customer_appeal.sql ;

2.5 Customer Visit Consultation Record Table

The customer visit consultation record table is web_chat_ems. Its CREATE TABLE DDL statement:

create table IF NOT EXISTS oldlu_nev.web_chat_ems(
  id int auto_increment primary key comment 'Primary key' ,
  create_date_time timestamp null comment 'Record creation time',
  session_id varchar(48) default '' not null comment 'Qimo sessionId',
  sid varchar(48) collate utf8_bin  default '' not null comment 'Visitor id',
  create_time datetime null comment 'Session creation time',
  seo_source varchar(255) collate utf8_bin default '' null comment 'Search source',
  seo_keywords varchar(512) collate utf8_bin default '' null comment 'Keywords',
  ip varchar(48) collate utf8_bin  default '' null comment 'IP address',
  area varchar(255) collate utf8_bin default '' null comment 'Region',
  country varchar(16) collate utf8_bin  default '' null comment 'Country',
  province varchar(16) collate utf8_bin  default '' null comment 'Province',
  city varchar(255) collate utf8_bin default '' null comment 'City',
  origin_channel varchar(32) collate utf8_bin  default '' null comment 'Advertising channel',
  user varchar(255) collate utf8_bin default '' null comment 'Assigned agent',
  manual_time datetime null comment 'Manual service start time',
  begin_time datetime null comment 'Agent claim time',
  end_time datetime null comment 'Session end time',
  last_customer_msg_time_stamp datetime null comment 'Time of the last customer message',
  last_agent_msg_time_stamp datetime null comment 'Time of the last agent reply',
  reply_msg_count int(12) default 0  null comment 'Number of agent reply messages',
  msg_count int(12) default 0  null comment 'Number of messages sent by customer',
  browser_name varchar(255) collate utf8_bin default '' null comment 'Browser name',
  os_info varchar(255) collate utf8_bin default '' null comment 'Operating system name'
);

Import the visit consultation records into the table in advance with the source command:

mysql> source /root/5-web_chat_ems.sql;
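
Once the five imports have finished, a quick sanity check can confirm that every table exists and holds rows. This is a minimal sketch to run in the same mysql session. The column aliases are illustrative only, and the counts depend on the contents of the .sql files, so no expected numbers are given.

```sql
-- List the tables created in section 2.
USE oldlu_nev;
SHOW TABLES;

-- Count the rows loaded into each business table.
SELECT COUNT(*) AS customer_rows              FROM customer;
SELECT COUNT(*) AS customer_relationship_rows FROM customer_relationship;
SELECT COUNT(*) AS customer_clue_rows         FROM customer_clue;
SELECT COUNT(*) AS customer_appeal_rows       FROM customer_appeal;
SELECT COUNT(*) AS web_chat_ems_rows          FROM web_chat_ems;
```

A count of zero for any table suggests that the corresponding source command did not run or pointed at the wrong file.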

3 Flink CDC real-time data collection

Flink 1.11 introduced Flink SQL CDC, which makes it easy to capture RDBMS table data in real time and write it to storage systems such as Hudi tables. The MySQL CDC connector reads both snapshot data and incremental data from a MySQL database.


3.1 Enable MySQL binlog

MySQL CDC requires the MySQL binlog to be enabled first, followed by a restart of the MySQL service.
The first step is to enable the MySQL binlog:

[root@node1 ~]# vim /etc/my.cnf

Add the following under [mysqld]:

server-id=2
log-bin=mysql-bin
binlog_format=row
expire_logs_days=15
binlog_row_image=full


The second step is to restart the MySQL server:

service mysqld restart

Log in with the MySQL client and check that the settings have taken effect.
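
One way to check this is to query the server variables that correspond to the my.cnf entries above. This is a minimal sketch. The values in the comments are what the configuration in this section should produce, not captured output.

```sql
-- Run in the MySQL client after the restart.
SHOW VARIABLES LIKE 'log_bin';           -- should report ON
SHOW VARIABLES LIKE 'binlog_format';     -- should report ROW
SHOW VARIABLES LIKE 'binlog_row_image';  -- should report FULL
SHOW VARIABLES LIKE 'server_id';         -- should report 2
```

If log_bin still reports OFF, the settings were probably not placed under the [mysqld] section or the service was not restarted.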


The third step is to download the Flink CDC MySQL jar package.
This case uses Flink 1.12.2, for which the supported Flink CDC version is 1.3.0. Add the Maven dependency:

<!-- https://mvnrepository.com/artifact/com.alibaba.ververica/flink-connector-mysql-cdc -->
<dependency>
    <groupId>com.alibaba.ververica</groupId>
    <artifactId>flink-connector-mysql-cdc</artifactId>
    <version>1.3.0</version>
</dependency>

If you use the Flink SQL Client, put the jar package in the $FLINK_HOME/lib directory.

3.2 Environment preparation

For real-time data collection you can either write a Java program or run the DDL statements directly.
Method 1: Start the Flink SQL Client, write and execute the DDL statements, and submit the Flink job to the Standalone cluster.
– Start the HDFS service

hadoop-daemon.sh start namenode
hadoop-daemon.sh start datanode

– Start the Flink Standalone cluster

export HADOOP_CLASSPATH=$(/export/server/hadoop/bin/hadoop classpath)
/export/server/flink/bin/start-cluster.sh

– Start SQL Client

/export/server/flink/bin/sql-client.sh embedded -j /export/server/flink/lib/hudi-flink-bundle_2.12-0.9.0.jar shell

– Set properties

set execution.result-mode=tableau;
set execution.checkpointing.interval=3sec;
set execution.runtime-mode=streaming;

Method 2: Use IDEA to create a Maven project, add the related dependencies, and write a program that executes the DDL statements.
Add the following to pom.xml:

<repositories>
    <repository>
        <id>nexus-aliyun</id>
        <name>Nexus aliyun</name>
        <url>http://maven.aliyun.com/nexus/content/groups/public</url>
    </repository>
    <repository>
        <id>central_maven</id>
        <name>central maven</name>
        <url>https://repo1.maven.org/maven2</url>
    </repository>
    <repository>
        <id>cloudera</id>
        <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
    </repository>
    <repository>
        <id>apache.snapshots</id>
        <name>Apache Development Snapshot Repository</name>
        <url>https://repository.apache.org/content/repositories/snapshots/</url>
        <releases>
            <enabled>false</enabled>
        </releases>
        <snapshots>
            <enabled>true</enabled>
        </snapshots>
    </repository>
</repositories>

<properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    <maven.compiler.source>${java.version}</maven.compiler.source>
    <maven.compiler.target>${java.version}</maven.compiler.target>
    <java.version>1.8</java.version>
    <scala.binary.version>2.12</scala.binary.version>
    <flink.version>1.12.2</flink.version>
    <hadoop.version>2.7.3</hadoop.version>
    <mysql.version>8.0.16</mysql.version>
</properties>

<dependencies>
    <!-- Flink Client -->
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-clients_${scala.binary.version}</artifactId>
        <version>${flink.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-java</artifactId>
        <version>${flink.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-streaming-java_${scala.binary.version}</artifactId>
        <version>${flink.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-runtime-web_${scala.binary.version}</artifactId>
        <version>${flink.version}</version>
    </dependency>

    <!-- Flink Table API & SQL -->
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-table-common</artifactId>
        <version>${flink.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-table-planner-blink_${scala.binary.version}</artifactId>
        <version>${flink.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-table-api-java-bridge_${scala.binary.version}</artifactId>
        <version>${flink.version}</version>
    </dependency>

    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-connector-kafka_${scala.binary.version}</artifactId>
        <version>${flink.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-json</artifactId>
        <version>${flink.version}</version>
    </dependency>

    <dependency>
        <groupId>org.apache.hudi</groupId>
        <artifactId>hudi-flink-bundle_${scala.binary.version}</artifactId>
        <version>0.9.0</version>
    </dependency>

    <dependency>
        <groupId>com.alibaba.ververica</groupId>
        <artifactId>flink-connector-mysql-cdc</artifactId>
        <version>1.3.0</version>
    </dependency>

    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-shaded-hadoop-2-uber</artifactId>
        <version>2.7.5-10.0</version>
    </dependency>

    <!-- MySQL-->
    <dependency>
        <groupId>mysql</groupId>
        <artifactId>mysql-connector-java</artifactId>
        <version>${mysql.version}</version>
    </dependency>

    <!-- slf4j and log4j -->
    <dependency>
        <groupId>org.slf4j</groupId>
        <artifactId>slf4j-log4j12</artifactId>
        <version>1.7.7</version>
        <scope>runtime</scope>
    </dependency>
    <dependency>
        <groupId>log4j</groupId>
        <artifactId>log4j</artifactId>
        <version>1.2.17</version>
        <scope>runtime</scope>
    </dependency>

</dependencies>

<build>
    <sourceDirectory>src/main/java</sourceDirectory>
    <testSourceDirectory>src/test/java</testSourceDirectory>
    <plugins>
        <!-- Compiler plugin -->
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-compiler-plugin</artifactId>
            <version>3.5.1</version>
            <configuration>
                <source>1.8</source>
                <target>1.8</target>
                <!--<encoding>${project.build.sourceEncoding}</encoding>-->
            </configuration>
        </plugin>
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-surefire-plugin</artifactId>
            <version>2.18.1</version>
            <configuration>
                <useFile>false</useFile>
                <disableXmlReport>true</disableXmlReport>
                <includes>
                    <include>**/*Test.*</include>
                    <include>**/*Suite.*</include>
                </includes>
            </configuration>
        </plugin>
        <!-- Jar packaging plugin (bundles all dependencies) -->
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-shade-plugin</artifactId>
            <version>2.3</version>
            <executions>
                <execution>
                    <phase>package</phase>
                    <goals>
                        <goal>shade</goal>
                    </goals>
                    <configuration>
                        <filters>
                            <filter>
                                <artifact>*:*</artifact>
                                <excludes>
                                    <exclude>META-INF/*.SF</exclude>
                                    <exclude>META-INF/*.DSA</exclude>
                                    <exclude>META-INF/*.RSA</exclude>
                                </excludes>
                            </filter>
                        </filters>
                        <transformers>
                            <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer"/>
                        </transformers>
                    </configuration>
                </execution>
            </executions>
        </plugin>
    </plugins>
</build>

The program that implements real-time data capture and synchronization has three main parts: the input table (InputTable), the output table (OutputTable), and the INSERT...SELECT statement that queries the input table and writes to the output table.

To keep the focus on the results, this case starts the Flink SQL Client and writes and executes the DDL and DML statements directly.

3.3 Real-time data collection

Collecting data in real time with Flink CDC requires creating two tables, an input table and an output table, and then writing an INSERT...SELECT statement.

Next, the five business tables in the MySQL database are captured and synchronized in real time to Hudi tables (stored on the HDFS file system).

3.3.1 Customer Information Table

Synchronize the customer information table [customer] to a Hudi table by writing and executing the DDL and DML statements according to the steps above.
The first step is to create the input table (InputTable):

create table tbl_customer_mysql (
  id STRING PRIMARY KEY NOT ENFORCED,
  customer_relationship_id STRING,
  create_date_time STRING,
  update_date_time STRING,
  deleted STRING,
  name STRING,
  idcard STRING,
  birth_year STRING,
  gender STRING,
  phone STRING,
  wechat STRING,
  qq STRING,
  email STRING,
  area STRING,
  leave_school_date STRING,
  graduation_date STRING,
  bxg_student_id STRING,
  creator STRING,
  origin_type STRING,
  origin_channel STRING,
  tenant STRING,
  md_id STRING
)WITH (
  'connector' = 'mysql-cdc',
  'hostname' = 'node1.oldlu.cn',
  'port' = '3306',
  'username' = 'root',
  'password' = '123456',
  'server-time-zone' = 'Asia/Shanghai',
  'debezium.snapshot.mode' = 'initial',
  'database-name' = 'oldlu_nev',
  'table-name' = 'customer'
);
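
The input table above connects to MySQL as root. As an alternative, a dedicated account can be used instead. This is a sketch under assumptions: the user name flink_cdc and its password are hypothetical and do not come from the original case. The privilege list is the one the Debezium-based MySQL CDC connector needs to read snapshots and the binlog.

```sql
-- Run in the MySQL client as an administrator.
-- 'flink_cdc' and 'change-me' are hypothetical placeholders.
CREATE USER 'flink_cdc'@'%' IDENTIFIED BY 'change-me';

-- Privileges for snapshot reads (SELECT, RELOAD, SHOW DATABASES)
-- and for binlog reads (REPLICATION SLAVE, REPLICATION CLIENT).
GRANT SELECT, RELOAD, SHOW DATABASES, REPLICATION SLAVE, REPLICATION CLIENT
  ON *.* TO 'flink_cdc'@'%';

FLUSH PRIVILEGES;
```

If such an account is created, the 'username' and 'password' options in the WITH clause of each mysql-cdc table would be changed to match it.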

The second step is to create the output table (OutputTable):

CREATE TABLE edu_customer_hudi(
  id STRING PRIMARY KEY NOT ENFORCED,
  customer_relationship_id STRING,
  create_date_time STRING,
  update_date_time STRING,
  deleted STRING,
  name STRING,
  idcard STRING,
  birth_year STRING,
  gender STRING,
  phone STRING,
  wechat STRING,
  qq STRING,
  email STRING,
  area STRING,
  leave_school_date STRING,
  graduation_date STRING,
  bxg_student_id STRING,
  creator STRING,
  origin_type STRING,
  origin_channel STRING,
  tenant STRING,
  md_id STRING,
  part STRING
)
PARTITIONED BY (part)
WITH(
  'connector'='hudi',
  'path'= 'hdfs://node1.oldlu.cn:8020/ehualu/hudi-warehouse/edu_customer_hudi', 
  'table.type'= 'MERGE_ON_READ',
  'hoodie.datasource.write.recordkey.field'= 'id', 
  'write.precombine.field'= 'create_date_time',
  'write.tasks'= '1',
  'read.tasks'= '1',
  'write.rate.limit'= '2000', 
  'compaction.tasks'= '1', 
  'compaction.async.enabled'= 'true',
  'compaction.trigger.strategy'= 'num_commits',
  'compaction.delta_commits'= '1',
  'changelog.enabled'= 'true'
);

The third step is to run the INSERT...SELECT statement:

insert into edu_customer_hudi 
select *, CAST(CURRENT_DATE AS STRING) AS part from tbl_customer_mysql;

Executing the statement generates a Flink job and submits it to the Standalone cluster. The job first synchronizes the historical data in the table to the Hudi table, and then synchronizes incremental changes in real time.
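
Both phases can be observed with a pair of statements. This is a minimal sketch, assuming the job is running and at least one checkpoint has completed so that the Hudi commit is visible. The id value 1 and the name 'test-name' are hypothetical and should be replaced with values that suit your data.

```sql
-- In the Flink SQL Client: read one synchronized row from the Hudi table.
-- The id column is declared as STRING in the Hudi table, hence the quotes.
SELECT id, name, phone, part FROM edu_customer_hudi WHERE id = '1';

-- In the MySQL client: change the same row to produce a binlog event.
UPDATE oldlu_nev.customer SET name = 'test-name' WHERE id = 1;

-- Back in the Flink SQL Client, after the next checkpoint:
-- run the SELECT again and compare the name column.
SELECT id, name, phone, part FROM edu_customer_hudi WHERE id = '1';
```

The first query exercises the snapshot phase, and the second SELECT shows whether the incremental change has arrived.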

3.3.2 Customer Intent Table

Synchronize the customer intent table [customer_relationship] to a Hudi table by writing and executing the DDL and DML statements according to the steps above.
The first step is to create the input table (InputTable):

create table tbl_customer_relationship_mysql (
  id string PRIMARY KEY NOT ENFORCED,
  create_date_time string,
  update_date_time string,
  deleted string,
  customer_id string,
  first_id string,
  belonger string,
  belonger_name string,
  initial_belonger string,
  distribution_handler string,
  business_scrm_department_id string,
  last_visit_time string,
  next_visit_time string,
  origin_type string,
  oldlu_school_id string,
  oldlu_subject_id string,
  intention_study_type string,
  anticipat_signup_date string,
  `level` string,
  creator string,
  current_creator string,
  creator_name string,
  origin_channel string,
  `comment` string,
  first_customer_clue_id string,
  last_customer_clue_id string,
  process_state string,
  process_time string,
  payment_state string,
  payment_time string,
  signup_state string,
  signup_time string,
  notice_state string,
  notice_time string,
  lock_state string,
  lock_time string,
  oldlu_clazz_id string,
  oldlu_clazz_time string,
  payment_url string,
  payment_url_time string,
  ems_student_id string,
  delete_reason string,
  deleter string,
  deleter_name string,
  delete_time string,
  course_id string,
  course_name string,
  delete_comment string,
  close_state string,
  close_time string,
  appeal_id string,
  tenant string,
  total_fee string,
  belonged string,
  belonged_time string,
  belonger_time string,
  transfer string,
  transfer_time string,
  follow_type string,
  transfer_bxg_oa_account string,
  transfer_bxg_belonger_name string
)WITH(
  'connector' = 'mysql-cdc',
  'hostname' = 'node1.oldlu.cn',
  'port' = '3306',
  'username' = 'root',
  'password' = '123456',
  'server-time-zone' = 'Asia/Shanghai',
  'debezium.snapshot.mode' = 'initial',
  'database-name' = 'oldlu_nev',
  'table-name' = 'customer_relationship'
);

The second step is to create the output table (OutputTable):

create table edu_customer_relationship_hudi(
  id string PRIMARY KEY NOT ENFORCED,
  create_date_time string,
  update_date_time string,
  deleted string,
  customer_id string,
  first_id string,
  belonger string,
  belonger_name string,
  initial_belonger string,
  distribution_handler string,
  business_scrm_department_id string,
  last_visit_time string,
  next_visit_time string,
  origin_type string,
  oldlu_school_id string,
  oldlu_subject_id string,
  intention_study_type string,
  anticipat_signup_date string,
  `level` string,
  creator string,
  current_creator string,
  creator_name string,
  origin_channel string,
  `comment` string,
  first_customer_clue_id string,
  last_customer_clue_id string,
  process_state string,
  process_time string,
  payment_state string,
  payment_time string,
  signup_state string,
  signup_time string,
  notice_state string,
  notice_time string,
  lock_state string,
  lock_time string,
  oldlu_clazz_id string,
  oldlu_clazz_time string,
  payment_url string,
  payment_url_time string,
  ems_student_id string,
  delete_reason string,
  deleter string,
  deleter_name string,
  delete_time string,
  course_id string,
  course_name string,
  delete_comment string,
  close_state string,
  close_time string,
  appeal_id string,
  tenant string,
  total_fee string,
  belonged string,
  belonged_time string,
  belonger_time string,
  transfer string,
  transfer_time string,
  follow_type string,
  transfer_bxg_oa_account string,
  transfer_bxg_belonger_name string,
  part STRING
)
PARTITIONED BY (part)
WITH(
  'connector'='hudi',
  'path'= 'hdfs://node1.oldlu.cn:8020/ehualu/hudi-warehouse/edu_customer_relationship_hudi', 
  'table.type'= 'MERGE_ON_READ',
  'hoodie.datasource.write.recordkey.field'= 'id', 
  'write.precombine.field'= 'create_date_time',
  'write.tasks'= '1',
  'write.rate.limit'= '2000', 
  'compaction.tasks'= '1', 
  'compaction.async.enabled'= 'true',
  'compaction.trigger.strategy'= 'num_commits',
  'compaction.delta_commits'= '1',
  'changelog.enabled'= 'true'
);

Step 3: write the insert query statement

insert into edu_customer_relationship_hudi 
select *, CAST(CURRENT_DATE AS STRING) AS part from tbl_customer_relationship_mysql;

Check the HDFS file system to confirm that the full data set has been synchronized to the Hudi table directory.

3.3.3 Customer lead form

Synchronize the customer clue table [customer_clue] to a Hudi table by writing and executing DDL and DML statements, following the same steps as above.
Step 1: create the input table (InputTable)

create table tbl_customer_clue_mysql (
  id string PRIMARY KEY NOT ENFORCED,
  create_date_time string,
  update_date_time string,
  deleted string,
  customer_id string,
  customer_relationship_id string,
  session_id string,
  sid string,
  status string,
  `user` string,
  create_time string,
  platform string,
  s_name string,
  seo_source string,
  seo_keywords string,
  ip string,
  referrer string,
  from_url string,
  landing_page_url string,
  url_title string,
  to_peer string,
  manual_time string,
  begin_time string,
  reply_msg_count string,
  total_msg_count string,
  msg_count string,
  `comment` string,
  finish_reason string,
  finish_user string,
  end_time string,
  platform_description string,
  browser_name string,
  os_info string,
  area string,
  country string,
  province string,
  city string,
  creator string,
  name string,
  idcard string,
  phone string,
  oldlu_school_id string,
  oldlu_school string,
  oldlu_subject_id string,
  oldlu_subject string,
  wechat string,
  qq string,
  email string,
  gender string,
  `level` string,
  origin_type string,
  information_way string,
  working_years string,
  technical_directions string,
  customer_state string,
  valid string,
  anticipat_signup_date string,
  clue_state string,
  scrm_department_id string,
  superior_url string,
  superior_source string,
  landing_url string,
  landing_source string,
  info_url string,
  info_source string,
  origin_channel string,
  course_id string,
  course_name string,
  zhuge_session_id string,
  is_repeat string,
  tenant string,
  activity_id string,
  activity_name string,
  follow_type string,
  shunt_mode_id string,
  shunt_employee_group_id string
)WITH(
  'connector' = 'mysql-cdc',
  'hostname' = 'node1.oldlu.cn',
  'port' = '3306',
  'username' = 'root',
  'password' = '123456',
  'server-time-zone' = 'Asia/Shanghai',
  'debezium.snapshot.mode' = 'initial',
  'database-name' = 'oldlu_nev',
  'table-name' = 'customer_clue'
);

Step 2: create the output table (OutputTable)

create table edu_customer_clue_hudi (
  id string PRIMARY KEY NOT ENFORCED,
  create_date_time string,
  update_date_time string,
  deleted string,
  customer_id string,
  customer_relationship_id string,
  session_id string,
  sid string,
  status string,
  `user` string,
  create_time string,
  platform string,
  s_name string,
  seo_source string,
  seo_keywords string,
  ip string,
  referrer string,
  from_url string,
  landing_page_url string,
  url_title string,
  to_peer string,
  manual_time string,
  begin_time string,
  reply_msg_count string,
  total_msg_count string,
  msg_count string,
  `comment` string,
  finish_reason string,
  finish_user string,
  end_time string,
  platform_description string,
  browser_name string,
  os_info string,
  area string,
  country string,
  province string,
  city string,
  creator string,
  name string,
  idcard string,
  phone string,
  oldlu_school_id string,
  oldlu_school string,
  oldlu_subject_id string,
  oldlu_subject string,
  wechat string,
  qq string,
  email string,
  gender string,
  `level` string,
  origin_type string,
  information_way string,
  working_years string,
  technical_directions string,
  customer_state string,
  valid string,
  anticipat_signup_date string,
  clue_state string,
  scrm_department_id string,
  superior_url string,
  superior_source string,
  landing_url string,
  landing_source string,
  info_url string,
  info_source string,
  origin_channel string,
  course_id string,
  course_name string,
  zhuge_session_id string,
  is_repeat string,
  tenant string,
  activity_id string,
  activity_name string,
  follow_type string,
  shunt_mode_id string,
  shunt_employee_group_id string,
  part STRING
)
PARTITIONED BY (part)
WITH(
  'connector'='hudi',
  'path'= 'hdfs://node1.oldlu.cn:8020/ehualu/hudi-warehouse/edu_customer_clue_hudi', 
  'table.type'= 'MERGE_ON_READ',
  'hoodie.datasource.write.recordkey.field'= 'id', 
  'write.precombine.field'= 'create_date_time',
  'write.tasks'= '1',
  'write.rate.limit'= '2000', 
  'compaction.tasks'= '1', 
  'compaction.async.enabled'= 'true',
  'compaction.trigger.strategy'= 'num_commits',
  'compaction.delta_commits'= '1',
  'changelog.enabled'= 'true'
);

Step 3: write the insert query statement

insert into edu_customer_clue_hudi 
select *, CAST(CURRENT_DATE AS STRING) AS part from tbl_customer_clue_mysql;

Check the HDFS file system to confirm that the full data set has been synchronized to the Hudi table directory.

3.3.4 Customer Complaint Form

Synchronize the customer appeal table [customer_appeal] to a Hudi table by writing and executing DDL and DML statements, following the same steps as above.
Step 1: create the input table (InputTable)

create table tbl_customer_appeal_mysql (
  id string PRIMARY KEY NOT ENFORCED,
  customer_relationship_first_id string,
  employee_id string,
  employee_name string,
  employee_department_id string,
  employee_tdepart_id string,
  appeal_status string,
  audit_id string,
  audit_name string,
  audit_department_id string,
  audit_department_name string,
  audit_date_time string,
  create_date_time string,
  update_date_time string,
  deleted string,
  tenant string
)WITH (
  'connector' = 'mysql-cdc',
  'hostname' = 'node1.oldlu.cn',
  'port' = '3306',
  'username' = 'root',
  'password' = '123456',
  'server-time-zone' = 'Asia/Shanghai',
  'debezium.snapshot.mode' = 'initial',
  'database-name' = 'oldlu_nev',
  'table-name' = 'customer_appeal'
);

Step 2: create the output table (OutputTable)

create table edu_customer_appeal_hudi (
  id string PRIMARY KEY NOT ENFORCED,
  customer_relationship_first_id STRING,
  employee_id STRING,
  employee_name STRING,
  employee_department_id STRING,
  employee_tdepart_id STRING,
  appeal_status STRING,
  audit_id STRING,
  audit_name STRING,
  audit_department_id STRING,
  audit_department_name STRING,
  audit_date_time STRING,
  create_date_time STRING,
  update_date_time STRING,
  deleted STRING,
  tenant STRING,
  part STRING
)
PARTITIONED BY (part)
WITH(
  'connector'='hudi',
  'path'= 'hdfs://node1.oldlu.cn:8020/ehualu/hudi-warehouse/edu_customer_appeal_hudi', 
  'table.type'= 'MERGE_ON_READ',
  'hoodie.datasource.write.recordkey.field'= 'id', 
  'write.precombine.field'= 'create_date_time',
  'write.tasks'= '1',
  'write.rate.limit'= '2000', 
  'compaction.tasks'= '1', 
  'compaction.async.enabled'= 'true',
  'compaction.trigger.strategy'= 'num_commits',
  'compaction.delta_commits'= '1',
  'changelog.enabled'= 'true'
);

Step 3: write the insert query statement

insert into edu_customer_appeal_hudi 
select *, CAST(CURRENT_DATE AS STRING) AS part from tbl_customer_appeal_mysql;

Check the HDFS file system to confirm that the full data set has been synchronized to the Hudi table directory.

3.3.5 Customer Visit Consultation Record Form

Synchronize the customer visit consultation record table [web_chat_ems] to a Hudi table by writing and executing DDL and DML statements, following the same steps as above.
Step 1: create the input table (InputTable)

create table tbl_web_chat_ems_mysql (
  id string PRIMARY KEY NOT ENFORCED,
  create_date_time string,
  session_id string,
  sid string,
  create_time string,
  seo_source string,
  seo_keywords string,
  ip string,
  area string,
  country string,
  province string,
  city string,
  origin_channel string,
  `user` string,
  manual_time string,
  begin_time string,
  end_time string,
  last_customer_msg_time_stamp string,
  last_agent_msg_time_stamp string,
  reply_msg_count string,
  msg_count string,
  browser_name string,
  os_info string
)WITH(
  'connector' = 'mysql-cdc',
  'hostname' = 'node1.oldlu.cn',
  'port' = '3306',
  'username' = 'root',
  'password' = '123456',
  'server-time-zone' = 'Asia/Shanghai',
  'debezium.snapshot.mode' = 'initial',
  'database-name' = 'oldlu_nev',
  'table-name' = 'web_chat_ems'
);

Step 2: create the output table (OutputTable)

create table edu_web_chat_ems_hudi (
  id string PRIMARY KEY NOT ENFORCED,
  create_date_time string,
  session_id string,
  sid string,
  create_time string,
  seo_source string,
  seo_keywords string,
  ip string,
  area string,
  country string,
  province string,
  city string,
  origin_channel string,
  `user` string,
  manual_time string,
  begin_time string,
  end_time string,
  last_customer_msg_time_stamp string,
  last_agent_msg_time_stamp string,
  reply_msg_count string,
  msg_count string,
  browser_name string,
  os_info string,
  part STRING
)
PARTITIONED BY (part)
WITH(
  'connector'='hudi',
  'path'= 'hdfs://node1.oldlu.cn:8020/ehualu/hudi-warehouse/edu_web_chat_ems_hudi', 
  'table.type'= 'MERGE_ON_READ',
  'hoodie.datasource.write.recordkey.field'= 'id', 
  'write.precombine.field'= 'create_date_time',
  'write.tasks'= '1',
  'write.rate.limit'= '2000', 
  'compaction.tasks'= '1', 
  'compaction.async.enabled'= 'true',
  'compaction.trigger.strategy'= 'num_commits',
  'compaction.delta_commits'= '1',
  'changelog.enabled'= 'true'
);

Step 3: write the insert query statement

insert into edu_web_chat_ems_hudi 
select *, CAST(CURRENT_DATE AS STRING) AS part from tbl_web_chat_ems_mysql;

Check the HDFS file system to confirm that the full data set has been synchronized to the Hudi table directory.

All five tables have now been collected and synchronized to Hudi tables. The five Flink jobs are still running on the Standalone cluster, so any new business data generated in the source tables is also captured in real time and stored in the Hudi tables.

4 Presto Ad Hoc Analysis

Use Presto to analyze the Hudi table data and store the results directly in MySQL database tables.

First, create tables in Hive and associate Hudi tables
Second, integrate Presto with Hive and load Hive table data
Third, integrate Presto with MySQL to read or save data

4.1 What is Presto

Presto is an open-source OLAP query engine from Facebook, built on an MPP architecture. It is a distributed SQL execution engine that can query large data sets across different data sources. It is suited to interactive analysis and queries on data volumes from gigabytes to petabytes.
1. Clear architecture: Presto is a system that runs independently and does not depend on any external system. For example, Presto monitors the cluster itself and can schedule work based on that monitoring information.
2. Simple data structure: columnar storage with logical rows; most data can easily be converted into the structure Presto requires.
3. Rich plug-in interfaces: Presto connects to external storage systems and supports adding custom functions.

Presto adopts a typical master-slave model, which consists of a Coordinator node, a Discovery Server node, and multiple Worker nodes. The Discovery Server is usually embedded in the Coordinator node.

1. Coordinator (master): responsible for metadata management, worker management, and query parsing and scheduling
2. Worker: responsible for computation and for reading and writing data
3. Discovery Server: handles node heartbeats; it is usually embedded in the Coordinator node but can also be deployed separately. In what follows, the Discovery Server and the Coordinator share one machine by default.
Presto data model: a three-level structure of catalog, schema, and table

1. catalog corresponds to a certain type of data source, such as hive data, or mysql data
2. schema corresponds to the database in mysql
3. table corresponds to the table in mysql
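
The three-level model means every table can be addressed as catalog.schema.table, which is how the later queries refer to names such as hive.edu_hudi.tbl_customer_relationship. As a minimal sketch in Python (the helper parse_qualified_name and the QualifiedName type are hypothetical, written only for illustration, and are not part of Presto), a fully qualified name splits as follows:

```python
from typing import NamedTuple


class QualifiedName(NamedTuple):
    """The three levels of a Presto table name."""
    catalog: str  # data source type, e.g. hive or mysql
    schema: str   # corresponds to a database
    table: str    # corresponds to a table


def parse_qualified_name(name: str) -> QualifiedName:
    """Split 'catalog.schema.table' into its three parts.

    Illustrative only: quoted identifiers that contain dots are not handled.
    """
    parts = name.split(".")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected catalog.schema.table, got: {name!r}")
    return QualifiedName(*parts)


if __name__ == "__main__":
    print(parse_qualified_name("hive.edu_hudi.tbl_customer_relationship"))
    print(parse_qualified_name("mysql.oldlu_rpt.stu_apply"))
```

Here the catalog names hive and mysql correspond to the hive.properties and mysql.properties files under etc/catalog.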

4.2 Presto installation and deployment

Presto is installed as a single-node deployment on server node1.oldlu.cn (IP address 192.168.88.100).
1. Install JDK 8 and check the Java version

java -version


2. Upload and decompress the Presto installation package
Create the installation directory

mkdir -p /export/server

Install the file upload tool lrzsz with yum

yum install -y lrzsz

Upload the installation package to the /export/server directory of node1

presto-server-0.245.1.tar.gz

Decompress the package and create a symbolic link named presto

tar -xzvf /export/server/presto-server-0.245.1.tar.gz -C /export/server
ln -s /export/server/presto-server-0.245.1 /export/server/presto

Create configuration file storage directory

mkdir -p /export/server/presto/etc

3. Configure presto

etc/config.properties
vim /export/server/presto/etc/config.properties
Contents:
coordinator=true
node-scheduler.include-coordinator=true
http-server.http.port=8090
query.max-memory=6GB
query.max-memory-per-node=2GB
query.max-total-memory-per-node=2GB
discovery-server.enabled=true
discovery.uri=http://192.168.88.100:8090

etc/jvm.config

vim /export/server/presto/etc/jvm.config
Contents:
-server
-Xmx3G
-XX:+UseG1GC
-XX:G1HeapRegionSize=32M
-XX:+UseGCOverheadLimit
-XX:+ExplicitGCInvokesConcurrent
-XX:+HeapDumpOnOutOfMemoryError
-XX:+ExitOnOutOfMemoryError

etc/node.properties

vim /export/server/presto/etc/node.properties
Contents:
node.environment=hudipresto
node.id=presto-node1
node.data-dir=/export/server/presto/data

etc/catalog/hive.properties
mkdir -p /export/server/presto/etc/catalog
vim /export/server/presto/etc/catalog/hive.properties
Contents:
connector.name=hive-hadoop2
hive.metastore.uri=thrift://192.168.88.100:9083
hive.parquet.use-column-names=true
hive.config.resources=/export/server/presto/etc/catalog/core-site.xml,/export/server/presto/etc/catalog/hdfs-site.xml

etc/catalog/mysql.properties
vim /export/server/presto/etc/catalog/mysql.properties
Contents:
connector.name=mysql
connection-url=jdbc:mysql://node1.oldlu.cn:3306
connection-user=root
connection-password=123456

4. Start the service
Enter the Presto installation directory and execute the script in $PRESTO_HOME/bin

/export/server/presto/bin/launcher start

Use jps to check that the process exists (process name: PrestoServer).
The web UI is also available at:

http://192.168.88.100:8090/ui/


Presto CLI command-line client
Download the CLI client:

presto-cli-0.245.1-executable.jar

Upload presto-cli-0.245.1-executable.jar to /export/server/presto/bin, then rename it and make it executable:

mv presto-cli-0.245.1-executable.jar presto
chmod +x presto

Start the CLI client

/export/server/presto/bin/presto --server 192.168.88.100:8090


4.3 Hive create table

In order for Presto to analyze the data in the Hudi table, the Hudi table needs to be mapped to the Hive table. Next, create five educational customer business data tables in Hive, and map them to the Hudi table.

Start the HDFS service, HiveMetaStore and HiveServer service, and run the Beeline command line:

# Start the HDFS services
hadoop-daemon.sh start namenode 
hadoop-daemon.sh start datanode

# Start the Hive services
/export/server/hive/bin/start-metastore.sh 
/export/server/hive/bin/start-hiveserver2.sh

# Start the Beeline client
/export/server/hive/bin/beeline -u jdbc:hive2://node1.oldlu.cn:10000 -n root -p 123456

Set Hive local mode for easy testing:

-- Enable Hive local mode
set hive.exec.mode.local.auto=true;
set hive.exec.mode.local.auto.tasks.max=10;
set hive.exec.mode.local.auto.inputbytes.max=50000000;

4.3.1 Create database

-- Create the database
CREATE DATABASE IF NOT EXISTS edu_hudi ;
-- Use the database
USE edu_hudi ;

4.3.2 Customer Information Form

Write a DDL statement to create a table:

CREATE EXTERNAL TABLE edu_hudi.tbl_customer(
  id string,
  customer_relationship_id string,
  create_date_time string,
  update_date_time string,
  deleted string,
  name string,
  idcard string,
  birth_year string,
  gender string,
  phone string,
  wechat string,
  qq string,
  email string,
  area string,
  leave_school_date string,
  graduation_date string,
  bxg_student_id string,
  creator string,
  origin_type string,
  origin_channel string,
  tenant string,
  md_id string
)PARTITIONED BY (day_str string)
ROW FORMAT SERDE 
  'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' 
STORED AS INPUTFORMAT 
  'org.apache.hudi.hadoop.HoodieParquetInputFormat' 
OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION 
  '/ehualu/hudi-warehouse/edu_customer_hudi' ;
Because this is a partitioned table, add the partition:
ALTER TABLE edu_hudi.tbl_customer ADD IF NOT EXISTS PARTITION(day_str='2022-09-23') 
location '/ehualu/hudi-warehouse/edu_customer_hudi/2022-09-23' ;

4.3.3 Customer Intent Form

Write a DDL statement to create a table:

CREATE EXTERNAL TABLE edu_hudi.tbl_customer_relationship(
  id string,
  create_date_time string,
  update_date_time string,
  deleted string,
  customer_id string,
  first_id string,
  belonger string,
  belonger_name string,
  initial_belonger string,
  distribution_handler string,
  business_scrm_department_id string,
  last_visit_time string,
  next_visit_time string,
  origin_type string,
  oldlu_school_id string,
  oldlu_subject_id string,
  intention_study_type string,
  anticipat_signup_date string,
  `level` string,
  creator string,
  current_creator string,
  creator_name string,
  origin_channel string,
  `comment` string,
  first_customer_clue_id string,
  last_customer_clue_id string,
  process_state string,
  process_time string,
  payment_state string,
  payment_time string,
  signup_state string,
  signup_time string,
  notice_state string,
  notice_time string,
  lock_state string,
  lock_time string,
  oldlu_clazz_id string,
  oldlu_clazz_time string,
  payment_url string,
  payment_url_time string,
  ems_student_id string,
  delete_reason string,
  deleter string,
  deleter_name string,
  delete_time string,
  course_id string,
  course_name string,
  delete_comment string,
  close_state string,
  close_time string,
  appeal_id string,
  tenant string,
  total_fee string,
  belonged string,
  belonged_time string,
  belonger_time string,
  transfer string,
  transfer_time string,
  follow_type string,
  transfer_bxg_oa_account string,
  transfer_bxg_belonger_name string
)PARTITIONED BY (day_str string)
ROW FORMAT SERDE 
  'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' 
STORED AS INPUTFORMAT 
  'org.apache.hudi.hadoop.HoodieParquetInputFormat' 
OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION 
  '/ehualu/hudi-warehouse/edu_customer_relationship_hudi' ;
Because this is a partitioned table, add the partition:
ALTER TABLE edu_hudi.tbl_customer_relationship ADD IF NOT EXISTS PARTITION(day_str='2022-09-23') 
location '/ehualu/hudi-warehouse/edu_customer_relationship_hudi/2022-09-23' ;

4.3.4 Customer lead form

Write a DDL statement to create a table:

CREATE EXTERNAL TABLE edu_hudi.tbl_customer_clue(
  id string,
  create_date_time string,
  update_date_time string,
  deleted string,
  customer_id string,
  customer_relationship_id string,
  session_id string,
  sid string,
  status string,
  `user` string,
  create_time string,
  platform string,
  s_name string,
  seo_source string,
  seo_keywords string,
  ip string,
  referrer string,
  from_url string,
  landing_page_url string,
  url_title string,
  to_peer string,
  manual_time string,
  begin_time string,
  reply_msg_count string,
  total_msg_count string,
  msg_count string,
  `comment` string,
  finish_reason string,
  finish_user string,
  end_time string,
  platform_description string,
  browser_name string,
  os_info string,
  area string,
  country string,
  province string,
  city string,
  creator string,
  name string,
  idcard string,
  phone string,
  oldlu_school_id string,
  oldlu_school string,
  oldlu_subject_id string,
  oldlu_subject string,
  wechat string,
  qq string,
  email string,
  gender string,
  `level` string,
  origin_type string,
  information_way string,
  working_years string,
  technical_directions string,
  customer_state string,
  valid string,
  anticipat_signup_date string,
  clue_state string,
  scrm_department_id string,
  superior_url string,
  superior_source string,
  landing_url string,
  landing_source string,
  info_url string,
  info_source string,
  origin_channel string,
  course_id string,
  course_name string,
  zhuge_session_id string,
  is_repeat string,
  tenant string,
  activity_id string,
  activity_name string,
  follow_type string,
  shunt_mode_id string,
  shunt_employee_group_id string
)
PARTITIONED BY (day_str string)
ROW FORMAT SERDE 
  'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' 
STORED AS INPUTFORMAT 
  'org.apache.hudi.hadoop.HoodieParquetInputFormat' 
OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION 
  '/ehualu/hudi-warehouse/edu_customer_clue_hudi' ;
Because this is a partitioned table, add the partition:
ALTER TABLE edu_hudi.tbl_customer_clue ADD IF NOT EXISTS PARTITION(day_str='2022-09-23') 
location '/ehualu/hudi-warehouse/edu_customer_clue_hudi/2022-09-23' ;

4.3.5 Customer Complaint Form

Write a DDL statement to create a table:

CREATE EXTERNAL TABLE edu_hudi.tbl_customer_appeal(
  id string,
  customer_relationship_first_id STRING,
  employee_id STRING,
  employee_name STRING,
  employee_department_id STRING,
  employee_tdepart_id STRING,
  appeal_status STRING,
  audit_id STRING,
  audit_name STRING,
  audit_department_id STRING,
  audit_department_name STRING,
  audit_date_time STRING,
  create_date_time STRING,
  update_date_time STRING,
  deleted STRING,
  tenant STRING
)
PARTITIONED BY (day_str string)
ROW FORMAT SERDE 
  'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' 
STORED AS INPUTFORMAT 
  'org.apache.hudi.hadoop.HoodieParquetInputFormat' 
OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION 
  '/ehualu/hudi-warehouse/edu_customer_appeal_hudi' ;
Because this is a partitioned table, add the partition:
ALTER TABLE edu_hudi.tbl_customer_appeal ADD IF NOT EXISTS PARTITION(day_str='2022-09-23') 
location '/ehualu/hudi-warehouse/edu_customer_appeal_hudi/2022-09-23' ;

4.3.6 Customer Visit Consultation Record Form

Write a DDL statement to create a table:

CREATE EXTERNAL TABLE edu_hudi.tbl_web_chat_ems (
  id string,
  create_date_time string,
  session_id string,
  sid string,
  create_time string,
  seo_source string,
  seo_keywords string,
  ip string,
  area string,
  country string,
  province string,
  city string,
  origin_channel string,
  `user` string,
  manual_time string,
  begin_time string,
  end_time string,
  last_customer_msg_time_stamp string,
  last_agent_msg_time_stamp string,
  reply_msg_count string,
  msg_count string,
  browser_name string,
  os_info string
)
PARTITIONED BY (day_str string)
ROW FORMAT SERDE 
  'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' 
STORED AS INPUTFORMAT 
  'org.apache.hudi.hadoop.HoodieParquetInputFormat' 
OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION 
  '/ehualu/hudi-warehouse/edu_web_chat_ems_hudi' ;
Because this is a partitioned table, add the partition:
ALTER TABLE edu_hudi.tbl_web_chat_ems ADD IF NOT EXISTS PARTITION(day_str='2022-09-23') 
location '/ehualu/hudi-warehouse/edu_web_chat_ems_hudi/2022-09-23' ;
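
The five ALTER TABLE statements above differ only in the table name and the date. Because the Flink jobs write each day's data under a directory named after CURRENT_DATE, a partition presumably has to be registered for every new day. A minimal sketch in Python that generates these statements (the function name and the table list are illustrative assumptions; the tbl_<name> and edu_<name>_hudi naming pattern is taken from the DDL above):

```python
from datetime import date

# Warehouse root used by the Hudi tables in this case
WAREHOUSE = "/ehualu/hudi-warehouse"

# Business table names; the Hive table is tbl_<name>, the Hudi directory edu_<name>_hudi
TABLES = [
    "customer",
    "customer_relationship",
    "customer_clue",
    "customer_appeal",
    "web_chat_ems",
]


def add_partition_sql(name: str, day: str) -> str:
    """Build the Hive statement that registers one day partition."""
    date.fromisoformat(day)  # raises ValueError unless day is yyyy-MM-dd
    return (
        f"ALTER TABLE edu_hudi.tbl_{name} "
        f"ADD IF NOT EXISTS PARTITION(day_str='{day}') "
        f"location '{WAREHOUSE}/edu_{name}_hudi/{day}' ;"
    )


if __name__ == "__main__":
    for table_name in TABLES:
        print(add_partition_sql(table_name, "2022-09-23"))
```

The printed statements can be pasted into Beeline; validating the date first avoids registering a partition whose directory name cannot match what the Flink job writes.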

4.4 Analysis of offline indicators

To analyze Hudi table data with Presto, place the integration jar hudi-presto-bundle-0.9.0.jar in the Presto plugin directory /export/server/presto/plugin/hive-hadoop2.

Start the Presto CLI client and list the databases created in Hive (for example, with SHOW SCHEMAS FROM hive).

Switch to the edu_hudi database and list its tables (for example, with USE hive.edu_hudi followed by SHOW TABLES).

Next, according to the business indicator requirements, use Presto to analyze the Hudi table data, and directly save the indicators in the MySQL database.

First, create a database in the MySQL database to store the analysis indicator table:

-- Create the database
CREATE DATABASE `oldlu_rpt` /*!40100 DEFAULT CHARACTER SET utf8 */;

4.4.1 Daily registration volume

Analyze the customer intention table to compute the daily customer registration volume: first create the MySQL table, then write the SQL, and finally save the results.
MySQL table: oldlu_rpt.stu_apply

CREATE TABLE  IF NOT EXISTS `oldlu_rpt`.`stu_apply` (
  `report_date` longtext,
  `report_total` bigint(20) NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

Indicator SQL statement:

WITH tmp AS (
  SELECT 
    format_datetime(from_unixtime(cast(payment_time as bigint) / 1000),'yyyy-MM-dd')AS day_value, customer_id 
  FROM hive.edu_hudi.tbl_customer_relationship 
  WHERE 
    day_str = '2022-09-23' AND payment_time IS NOT NULL AND payment_state = 'PAID' AND deleted = 'false'
)
SELECT day_value, COUNT(customer_id) AS total FROM tmp GROUP BY day_value ;

The analysis results are saved in the MySQL table:

INSERT INTO mysql.oldlu_rpt.stu_apply (report_date, report_total) 
SELECT day_value, total FROM (
  SELECT day_value, COUNT(customer_id) AS total FROM (
    SELECT 
      format_datetime(from_unixtime(cast(payment_time as bigint) / 1000), 'yyyy-MM-dd')AS day_value, customer_id 
    FROM hive.edu_hudi.tbl_customer_relationship 
    WHERE day_str = '2022-09-23' AND payment_time IS NOT NULL AND payment_state = 'PAID' AND deleted = 'false'
  ) GROUP BY day_value
) ;
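
The indicator query treats payment_time as epoch milliseconds stored in a string: it casts the value to bigint, divides by 1000 to get seconds, and formats the result as yyyy-MM-dd. The same logic can be sketched in Python to sanity-check results on a few rows. The sample rows, the function names, and the fixed UTC+8 offset are assumptions for illustration only (the offset mirrors the Asia/Shanghai setting of the CDC source; Presto itself formats in its session time zone):

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

# Assumption: dates are formatted in UTC+8 (Asia/Shanghai has no daylight saving time)
UTC_PLUS_8 = timezone(timedelta(hours=8))


def day_value(epoch_millis: str, tz: timezone = UTC_PLUS_8) -> str:
    """Mirror format_datetime(from_unixtime(cast(x as bigint) / 1000), 'yyyy-MM-dd')."""
    seconds = int(epoch_millis) // 1000  # bigint / 1000 is integer division in Presto
    return datetime.fromtimestamp(seconds, tz).strftime("%Y-%m-%d")


def daily_paid_counts(rows):
    """Count paid, non-deleted rows per day, like the GROUP BY day_value query."""
    counts = Counter()
    for row in rows:
        if (
            row["payment_time"] is not None
            and row["payment_state"] == "PAID"
            and row["deleted"] == "false"
            and row["customer_id"] is not None  # COUNT(customer_id) skips NULLs
        ):
            counts[day_value(row["payment_time"])] += 1
    return dict(counts)


# Hypothetical sample rows, not taken from the real business data
SAMPLE_ROWS = [
    {"customer_id": "c1", "payment_time": "1663862400000", "payment_state": "PAID", "deleted": "false"},
    {"customer_id": "c2", "payment_time": "1663898400000", "payment_state": "PAID", "deleted": "false"},
    {"customer_id": "c3", "payment_time": None, "payment_state": "UNPAID", "deleted": "false"},
    {"customer_id": "c4", "payment_time": "1663862399000", "payment_state": "PAID", "deleted": "true"},
]

if __name__ == "__main__":
    print(daily_paid_counts(SAMPLE_ROWS))
```

The time zone matters at day boundaries: the same millisecond value can fall on different calendar days depending on the zone used for formatting.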

View the data in the MySQL table to verify the saved results.

4.4.2 Daily visits

Analyze the customer visit consultation record table to compute daily customer visits: first create the MySQL table, then write the SQL, and finally save the results.
MySQL table: oldlu_rpt.web_pv

CREATE TABLE  IF NOT EXISTS `oldlu_rpt`.`web_pv` (
  `report_date` longtext,
  `report_total` bigint(20) NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

Indicator SQL statement:
WITH tmp AS (
  SELECT 
    id, format_datetime(from_unixtime(cast(create_time as bigint) / 1000), 'yyyy-MM-dd')AS day_value
  FROM hive.edu_hudi.tbl_web_chat_ems 
  WHERE day_str = '2022-09-23' 
)
SELECT day_value, COUNT(id) AS total FROM tmp GROUP BY day_value ;

The analysis results are saved in the MySQL table:

INSERT INTO mysql.oldlu_rpt.web_pv (report_date, report_total) 
SELECT day_value, COUNT(id) AS total FROM (
  SELECT 
    id, format_datetime(from_unixtime(cast(create_time as bigint) / 1000), 'yyyy-MM-dd') AS day_value
  FROM hive.edu_hudi.tbl_web_chat_ems 
  WHERE day_str = '2022-09-23' 
) GROUP BY day_value ;

View the data in the MySQL table to verify the saved results.

4.4.3 Daily Intents

Analyze the customer intention table to compute the number of daily customer intentions: first create the MySQL table, then write the SQL, and finally save the results.
MySQL table: oldlu_rpt.stu_intention

CREATE TABLE  IF NOT EXISTS `oldlu_rpt`.`stu_intention` (
  `report_date` longtext,
  `report_total` bigint(20) NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

Indicator SQL statement:

WITH tmp AS (
  SELECT 
    id, format_datetime(from_unixtime(cast(create_date_time as bigint) / 1000), 'yyyy-MM-dd')AS day_value
  FROM hive.edu_hudi.tbl_customer_relationship 
  WHERE day_str = '2022-09-23' AND create_date_time IS NOT NULL AND deleted = 'false'
)
SELECT day_value, COUNT(id) AS total FROM tmp GROUP BY day_value ;

The analysis results are saved in the MySQL table:

INSERT INTO mysql.oldlu_rpt.stu_intention (report_date, report_total) 
SELECT day_value, COUNT(id) AS total FROM (
  SELECT 
    id, format_datetime(from_unixtime(cast(create_date_time as bigint) / 1000), 'yyyy-MM-dd')AS day_value
  FROM hive.edu_hudi.tbl_customer_relationship 
  WHERE day_str = '2022-09-23' AND create_date_time IS NOT NULL AND deleted = 'false'
) GROUP BY day_value ;

View the data in the MySQL table to verify the saved results.

4.4.4 Daily lead volume

Analyze the customer clue table to compute the daily customer lead volume: first create the MySQL table, then write the SQL, and finally save the results.
MySQL table: oldlu_rpt.stu_clue

CREATE TABLE IF NOT EXISTS `oldlu_rpt`.`stu_clue` (
  `report_date` longtext,
  `report_total` bigint(20) NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

Indicator SQL statement:

WITH tmp AS (
  SELECT 
    id, format_datetime(from_unixtime(cast(create_date_time as bigint) / 1000), 'yyyy-MM-dd')AS day_value
  FROM hive.edu_hudi.tbl_customer_clue 
  WHERE day_str = '2022-09-23' AND clue_state IS NOT NULL AND deleted = 'false'
)
SELECT day_value, COUNT(id) AS total FROM tmp GROUP BY day_value ;

The analysis results are saved in the MySQL table:

INSERT INTO mysql.oldlu_rpt.stu_clue (report_date, report_total) 
SELECT day_value, COUNT(id) AS total FROM (
  SELECT 
    id, format_datetime(from_unixtime(cast(create_date_time as bigint) / 1000), 'yyyy-MM-dd')AS day_value
  FROM hive.edu_hudi.tbl_customer_clue 
  WHERE day_str = '2022-09-23' AND clue_state IS NOT NULL AND deleted = 'false'
) GROUP BY day_value ;

View the data in the MySQL table to verify the saved results.

5 Flink SQL streaming analysis

Use Flink SQL to stream-query today's data in the Hudi tables and compute today's real-time counterparts of the offline indicators; the results are then displayed in real time on a FineBI large-screen dashboard.

Using the Flink SQL connectors for Hudi and MySQL, write the streaming query analysis in SQL, and execute the DDL and SELECT statements on the SQL Client command line.

5.1 Business requirements


There are a total of 5 indicators, involving 3 business tables: customer access record table, customer lead table and customer intention table, and the real-time data of each indicator is stored in a table in the MySQL database.

Computing each real-time indicator takes three steps:
Step 1: create an input table that stream-loads the Hudi table data;
Step 2: create an output table that saves the data to the MySQL table in real time;
Step 3: write the SQL statement for the business requirement, which queries the input table and inserts the result into the output table.

5.2 Create MySQL table

Each real-time indicator is stored in its own table in the MySQL database. First, create five tables for the five indicators; the table names differ, but the fields are identical. The DDL statements are as follows.

Indicator 1: Today's visits

CREATE TABLE `oldlu_rpt`.`realtime_web_pv` (
  `report_date` varchar(255) NOT NULL,
  `report_total` bigint(20) NOT NULL,
  PRIMARY KEY (`report_date`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

Indicator 2: Today's consultation volume

CREATE TABLE `oldlu_rpt`.`realtime_stu_consult` (
  `report_date` varchar(255) NOT NULL,
  `report_total` bigint(20) NOT NULL,
  PRIMARY KEY (`report_date`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

Indicator 3: Number of Intentions Today

CREATE TABLE `oldlu_rpt`.`realtime_stu_intention` (
  `report_date` varchar(255) NOT NULL,
  `report_total` bigint(20) NOT NULL,
  PRIMARY KEY (`report_date`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

Indicator 4: Today's applicants

CREATE TABLE `oldlu_rpt`.`realtime_stu_apply` (
  `report_date` varchar(255) NOT NULL,
  `report_total` bigint(20) NOT NULL,
  PRIMARY KEY (`report_date`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

Indicator 5: Today's effective leads

CREATE TABLE `oldlu_rpt`.`realtime_stu_clue` (
  `report_date` varchar(255) NOT NULL,
  `report_total` bigint(20) NOT NULL,
  PRIMARY KEY (`report_date`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
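
Since the five tables share one structure, the remaining four could also be cloned from the first one instead of repeating the column list. The statements below are a minimal sketch for the MySQL command line, assuming the oldlu_rpt database and the realtime_web_pv table already exist; they are an optional shortcut and a sanity check, not part of the original procedure.

```sql
-- Optional shortcut: copy the structure (columns, primary key, engine) of the first table
CREATE TABLE IF NOT EXISTS `oldlu_rpt`.`realtime_stu_consult` LIKE `oldlu_rpt`.`realtime_web_pv`;

-- List the report tables to confirm that all five exist
SHOW TABLES FROM `oldlu_rpt` LIKE 'realtime%';

-- Inspect the definition of one of them
SHOW CREATE TABLE `oldlu_rpt`.`realtime_web_pv`;
```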

5.3 Real-time indicator analysis

1. Today's visits and today's consultations: stream-load the data of table edu_web_chat_ems_hudi.

2. Today's intentions and today's applicants: stream-load the data of table edu_customer_relationship_hudi.

3. Today's effective leads: stream-load the data of table edu_customer_clue_hudi.

Start the HDFS service and the Flink Standalone cluster, launch the SQL Client, and set the properties:

# Start the HDFS service
hadoop-daemon.sh start namenode
hadoop-daemon.sh start datanode

# Start the Flink Standalone cluster
export HADOOP_CLASSPATH=`/export/server/hadoop/bin/hadoop classpath`
/export/server/flink/bin/start-cluster.sh

# Start the SQL Client
/export/server/flink/bin/sql-client.sh embedded \
-j /export/server/flink/lib/hudi-flink-bundle_2.12-0.9.0.jar shell

-- Set properties (inside the SQL Client)
set execution.result-mode=tableau;
set execution.checkpointing.interval=3sec;
-- Streaming mode
SET execution.runtime-mode = streaming;

5.3.1 Today's visits

First, create the input table, which stream-loads the Hudi table data:

CREATE TABLE edu_web_chat_ems_hudi (
  id string PRIMARY KEY NOT ENFORCED,
  create_date_time string,
  session_id string,
  sid string,
  create_time string,
  seo_source string,
  seo_keywords string,
  ip string,
  area string,
  country string,
  province string,
  city string,
  origin_channel string,
  `user` string,
  manual_time string,
  begin_time string,
  end_time string,
  last_customer_msg_time_stamp string,
  last_agent_msg_time_stamp string,
  reply_msg_count string,
  msg_count string,
  browser_name string,
  os_info string,
  part STRING
)
PARTITIONED BY (part)
WITH(
  'connector'='hudi',
  'path'= 'hdfs://node1.oldlu.cn:8020/ehualu/hudi-warehouse/edu_web_chat_ems_hudi', 
  'table.type'= 'MERGE_ON_READ',
  'hoodie.datasource.write.recordkey.field'= 'id', 
  'write.precombine.field'= 'create_date_time',
  'read.streaming.enabled' = 'true',
  'read.streaming.check-interval' = '5',
  'read.tasks' = '1'
);

Compute the statistics and store the result in a view:

CREATE VIEW IF NOT EXISTS view_tmp_web_pv AS
SELECT day_value, COUNT(id) AS total FROM (
  SELECT
    FROM_UNIXTIME(CAST(create_time AS BIGINT) / 1000, 'yyyy-MM-dd') AS day_value, id
  FROM edu_web_chat_ems_hudi
  WHERE part = CAST(CURRENT_DATE AS STRING)
) GROUP BY  day_value;
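
Before wiring the view to MySQL, it can help to preview it in the SQL Client. The queries below are a small sketch under the settings used above (tableau result mode, streaming runtime mode); the epoch value in the second query is an arbitrary illustration, not data from the case.

```sql
-- Preview the running count; in streaming mode the query keeps running until it is cancelled,
-- and each change to the count is typically printed as a new changelog row
SELECT day_value, total FROM view_tmp_web_pv;

-- Check the millisecond-to-day conversion on a literal:
-- dividing by 1000 turns milliseconds into the seconds that FROM_UNIXTIME expects
SELECT FROM_UNIXTIME(1682640000000 / 1000, 'yyyy-MM-dd') AS day_value;
```

The day printed by the second query depends on the session time zone, so records created close to midnight may fall on a different day than expected.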

Save the result to the MySQL database:
-- SQL Connector MySQL

CREATE TABLE realtime_web_pv_mysql (
  report_date STRING,
  report_total BIGINT, 
  PRIMARY KEY (report_date) NOT ENFORCED
) WITH (
  'connector' = 'jdbc',
  'url' = 'jdbc:mysql://node1.oldlu.cn:3306/oldlu_rpt',
  'driver' = 'com.mysql.cj.jdbc.Driver',
  'username' = 'root',
  'password' = '123456',
  'table-name' = 'realtime_web_pv'
);

-- INSERT INTO: write the result to MySQL

INSERT INTO  realtime_web_pv_mysql SELECT day_value, total FROM view_tmp_web_pv;
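
Because realtime_web_pv_mysql declares a primary key, the Flink JDBC connector writes in upsert mode: each new count for the same report_date overwrites the existing row instead of appending another one. For MySQL this corresponds to an INSERT ... ON DUPLICATE KEY UPDATE statement. The sketch below shows the shape of such a statement for the MySQL command line; the date and count are made-up values for illustration only, and running the first statement would write that made-up row into the report table.

```sql
-- Shape of the upsert applied to the report table (illustrative values)
INSERT INTO `oldlu_rpt`.`realtime_web_pv` (`report_date`, `report_total`)
VALUES ('2023-04-28', 100)
ON DUPLICATE KEY UPDATE `report_total` = VALUES(`report_total`);

-- Check the current value while the Flink job is running
SELECT report_date, report_total FROM `oldlu_rpt`.`realtime_web_pv`;
```

With this behaviour the table holds at most one row per day, which is what a dashboard reading a single "today" value needs.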

5.3.2 Today's consultation volume

Today's visits and today's consultations both query the same Hudi table, edu_web_chat_ems_hudi, which is already being stream-loaded incrementally, so the input table does not need to be created again here.
Compute the statistics and store the result in a view:

CREATE VIEW IF NOT EXISTS view_tmp_stu_consult AS
SELECT day_value, COUNT(id) AS total FROM (
  SELECT
    FROM_UNIXTIME(CAST(create_time AS BIGINT) / 1000, 'yyyy-MM-dd') AS day_value, id
  FROM edu_web_chat_ems_hudi
  WHERE part = CAST(CURRENT_DATE AS STRING) AND msg_count > 0
) GROUP BY  day_value;
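
The input table declares msg_count as string, so the filter msg_count > 0 depends on how the engine compares a string with a number. A variant with an explicit cast makes the intent clearer. This is only a sketch: the view name view_tmp_stu_consult_cast is hypothetical, and how CAST treats a non-numeric string can differ between Flink versions, so check it against the version in use.

```sql
-- Same statistic as above, with msg_count converted to a number before the comparison
CREATE VIEW IF NOT EXISTS view_tmp_stu_consult_cast AS
SELECT day_value, COUNT(id) AS total FROM (
  SELECT
    FROM_UNIXTIME(CAST(create_time AS BIGINT) / 1000, 'yyyy-MM-dd') AS day_value, id
  FROM edu_web_chat_ems_hudi
  WHERE part = CAST(CURRENT_DATE AS STRING) AND CAST(msg_count AS INT) > 0
) GROUP BY day_value;
```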

Save the result to the MySQL database:
-- SQL Connector MySQL

CREATE TABLE realtime_stu_consult_mysql (
  report_date STRING,
  report_total BIGINT, 
  PRIMARY KEY (report_date) NOT ENFORCED
) WITH (
  'connector' = 'jdbc',
  'url' = 'jdbc:mysql://node1.oldlu.cn:3306/oldlu_rpt',
  'driver' = 'com.mysql.cj.jdbc.Driver',
  'username' = 'root',
  'password' = '123456',
  'table-name' = 'realtime_stu_consult'
);

-- INSERT INTO: write the result to MySQL

INSERT INTO  realtime_stu_consult_mysql SELECT day_value, total FROM view_tmp_stu_consult;

5.3.3 Today's Intentions

First, create the input table, which stream-loads the Hudi table data:

create table edu_customer_relationship_hudi(
  id string PRIMARY KEY NOT ENFORCED,
  create_date_time string,
  update_date_time string,
  deleted string,
  customer_id string,
  first_id string,
  belonger string,
  belonger_name string,
  initial_belonger string,
  distribution_handler string,
  business_scrm_department_id string,
  last_visit_time string,
  next_visit_time string,
  origin_type string,
  oldlu_school_id string,
  oldlu_subject_id string,
  intention_study_type string,
  anticipat_signup_date string,
  `level` string,
  creator string,
  current_creator string,
  creator_name string,
  origin_channel string,
  `comment` string,
  first_customer_clue_id string,
  last_customer_clue_id string,
  process_state string,
  process_time string,
  payment_state string,
  payment_time string,
  signup_state string,
  signup_time string,
  notice_state string,
  notice_time string,
  lock_state string,
  lock_time string,
  oldlu_clazz_id string,
  oldlu_clazz_time string,
  payment_url string,
  payment_url_time string,
  ems_student_id string,
  delete_reason string,
  deleter string,
  deleter_name string,
  delete_time string,
  course_id string,
  course_name string,
  delete_comment string,
  close_state string,
  close_time string,
  appeal_id string,
  tenant string,
  total_fee string,
  belonged string,
  belonged_time string,
  belonger_time string,
  transfer string,
  transfer_time string,
  follow_type string,
  transfer_bxg_oa_account string,
  transfer_bxg_belonger_name string,
  part STRING
)
PARTITIONED BY (part)
WITH(
  'connector'='hudi',
  'path'= 'hdfs://node1.oldlu.cn:8020/hudi-warehouse/edu_customer_relationship_hudi', 
  'table.type'= 'MERGE_ON_READ',
  'hoodie.datasource.write.recordkey.field'= 'id', 
  'write.precombine.field'= 'create_date_time',
  'read.streaming.enabled' = 'true',
  'read.streaming.check-interval' = '5',    
  'read.tasks' = '1'
);

Compute the statistics and store the result in a view:

CREATE VIEW IF NOT EXISTS view_tmp_stu_intention AS
SELECT day_value, COUNT(id) AS total FROM (
  SELECT
    FROM_UNIXTIME(CAST(create_date_time AS BIGINT) / 1000, 'yyyy-MM-dd') AS day_value, id
  FROM edu_customer_relationship_hudi
  WHERE part = CAST(CURRENT_DATE AS STRING) AND create_date_time IS NOT NULL AND deleted = 'false'
) GROUP BY  day_value;

Save the result to the MySQL database:
-- SQL Connector MySQL

CREATE TABLE realtime_stu_intention_mysql (
  report_date STRING,
  report_total BIGINT, 
  PRIMARY KEY (report_date) NOT ENFORCED
) WITH (
  'connector' = 'jdbc',
  'url' = 'jdbc:mysql://node1.oldlu.cn:3306/oldlu_rpt',
  'driver' = 'com.mysql.cj.jdbc.Driver',
  'username' = 'root',
  'password' = '123456',
  'table-name' = 'realtime_stu_intention'
);

-- INSERT INTO: write the result to MySQL

INSERT INTO  realtime_stu_intention_mysql SELECT day_value, total 
FROM view_tmp_stu_intention;

5.3.4 Today's Enrollment Number

Today's intentions and today's applicants both query the same Hudi table, edu_customer_relationship_hudi, which is already being stream-loaded incrementally, so the input table does not need to be created again here.
Compute the statistics and store the result in a view:

CREATE VIEW IF NOT EXISTS view_tmp_stu_apply AS
SELECT day_value, COUNT(id) AS total FROM (
  SELECT
    FROM_UNIXTIME(CAST(payment_time AS BIGINT) / 1000, 'yyyy-MM-dd') AS day_value, id
  FROM edu_customer_relationship_hudi
  WHERE part = CAST(CURRENT_DATE AS STRING) AND payment_time IS NOT NULL 
AND payment_state = 'PAID' AND deleted = 'false'
) GROUP BY  day_value;

Save the result to the MySQL database:
-- SQL Connector MySQL

CREATE TABLE realtime_stu_apply_mysql (
  report_date STRING,
  report_total BIGINT, 
  PRIMARY KEY (report_date) NOT ENFORCED
) WITH (
  'connector' = 'jdbc',
  'url' = 'jdbc:mysql://node1.oldlu.cn:3306/oldlu_rpt',
  'driver' = 'com.mysql.cj.jdbc.Driver',
  'username' = 'root',
  'password' = '123456',
  'table-name' = 'realtime_stu_apply'
);

-- INSERT INTO: write the result to MySQL

INSERT INTO  realtime_stu_apply_mysql SELECT day_value, total FROM view_tmp_stu_apply;

5.3.5 Today's effective leads

First, create the input table, which stream-loads the Hudi table data:

create table edu_customer_clue_hudi(
  id string PRIMARY KEY NOT ENFORCED,
  create_date_time string,
  update_date_time string,
  deleted string,
  customer_id string,
  customer_relationship_id string,
  session_id string,
  sid string,
  status string,
  `user` string,
  create_time string,
  platform string,
  s_name string,
  seo_source string,
  seo_keywords string,
  ip string,
  referrer string,
  from_url string,
  landing_page_url string,
  url_title string,
  to_peer string,
  manual_time string,
  begin_time string,
  reply_msg_count string,
  total_msg_count string,
  msg_count string,
  `comment` string,
  finish_reason string,
  finish_user string,
  end_time string,
  platform_description string,
  browser_name string,
  os_info string,
  area string,
  country string,
  province string,
  city string,
  creator string,
  name string,
  idcard string,
  phone string,
  oldlu_school_id string,
  oldlu_school string,
  oldlu_subject_id string,
  oldlu_subject string,
  wechat string,
  qq string,
  email string,
  gender string,
  `level` string,
  origin_type string,
  information_way string,
  working_years string,
  technical_directions string,
  customer_state string,
  valid string,
  anticipat_signup_date string,
  clue_state string,
  scrm_department_id string,
  superior_url string,
  superior_source string,
  landing_url string,
  landing_source string,
  info_url string,
  info_source string,
  origin_channel string,
  course_id string,
  course_name string,
  zhuge_session_id string,
  is_repeat string,
  tenant string,
  activity_id string,
  activity_name string,
  follow_type string,
  shunt_mode_id string,
  shunt_employee_group_id string,
  part STRING
)
PARTITIONED BY (part)
WITH(
  'connector'='hudi',
  'path'= 'hdfs://node1.oldlu.cn:8020/hudi-warehouse/edu_customer_clue_hudi', 
  'table.type'= 'MERGE_ON_READ',
  'hoodie.datasource.write.recordkey.field'= 'id', 
  'write.precombine.field'= 'create_date_time',
  'read.streaming.enabled' = 'true',
  'read.streaming.check-interval' = '5',    
  'read.tasks' = '1'
);

Compute the statistics and store the result in a view:

CREATE VIEW IF NOT EXISTS view_tmp_stu_clue AS
SELECT day_value, COUNT(id) AS total FROM (
  SELECT
    FROM_UNIXTIME(CAST(create_date_time AS BIGINT) / 1000, 'yyyy-MM-dd') AS day_value, id
  FROM edu_customer_clue_hudi
  WHERE part = CAST(CURRENT_DATE AS STRING) AND clue_state IS NOT NULL AND deleted = 'false'
) GROUP BY  day_value;

Save the result to the MySQL database:
-- SQL Connector MySQL

CREATE TABLE realtime_stu_clue_mysql (
  report_date STRING,
  report_total BIGINT, 
  PRIMARY KEY (report_date) NOT ENFORCED
) WITH (
  'connector' = 'jdbc',
  'url' = 'jdbc:mysql://node1.oldlu.cn:3306/oldlu_rpt',
  'driver' = 'com.mysql.cj.jdbc.Driver',
  'username' = 'root',
  'password' = '123456',
  'table-name' = 'realtime_stu_clue'
);

-- INSERT INTO: write the result to MySQL

INSERT INTO  realtime_stu_clue_mysql SELECT day_value, total FROM view_tmp_stu_clue;
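
Once all five INSERT jobs are running, the report tables can be checked together from the MySQL command line. The query below is a sketch of such a check; the indicator labels in the first column are illustrative names, not values stored anywhere in the case.

```sql
-- One row per indicator and day, read from the five report tables
SELECT 'visits' AS indicator, report_date, report_total FROM `oldlu_rpt`.`realtime_web_pv`
UNION ALL
SELECT 'consultations', report_date, report_total FROM `oldlu_rpt`.`realtime_stu_consult`
UNION ALL
SELECT 'intentions', report_date, report_total FROM `oldlu_rpt`.`realtime_stu_intention`
UNION ALL
SELECT 'applicants', report_date, report_total FROM `oldlu_rpt`.`realtime_stu_apply`
UNION ALL
SELECT 'effective leads', report_date, report_total FROM `oldlu_rpt`.`realtime_stu_clue`;
```

Rerunning the query while new data arrives in the Hudi tables should show the totals changing in place, since each table keeps one row per report_date.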

6 FineBI report visualization

Use FineBI to connect to the MySQL database, load the business indicator report data, and display it with different charts.

Origin blog.csdn.net/ZGL_cyy/article/details/130370560