Article directory
- One DWS layer-keyword subject table (FlinkSQL)
- Two data visualization interface
- Three Sugar data big screen
- Four total turnover interface
One DWS layer-keyword subject table (FlinkSQL)
1 Filter data
// TODO 4 Filter the records that represent search behavior out of the dynamic table
Table fullwordTable = tableEnv.sqlQuery("select " +
" page['item'] fullword,rowtime " +
" from " +
" page_view " +
" where " +
" page['page_id']='good_list' and page['item'] is not null");
2 Split using UDTF
(1) Split results
Search content: 荣耀Play6T Pro 天玑810
fullword | rowtime |
---|---|
荣耀Play6T Pro 天玑810 | 20221215 |
Result after splitting: 荣耀 Play6T Pro 天玑 810
keyword | rowtime |
---|---|
荣耀 | 20221215 |
Play6T | 20221215 |
Pro | 20221215 |
天玑 | 20221215 |
810 | 20221215 |
(2) Join table function (UDTF)
Joins a table with the results of a table function. Each row of the left (outer) table is joined with every row produced by the corresponding call of the table function.
A user-defined table function (UDTF) must be registered before it is used. See the UDF documentation for details on how to define and register UDFs.
(3) Code
// TODO 2 Register the custom UDTF
tableEnv.createTemporarySystemFunction("ik_analyze", KeywordUDTF.class);
// TODO 5 Use the custom UDTF to split the search keywords
Table keywordTable = tableEnv.sqlQuery("SELECT rowtime, keyword FROM "+ fullwordTable +", LATERAL TABLE(ik_analyze(fullword)) AS T(keyword)");
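To make the row expansion concrete, the following is a minimal plain-Java sketch with no Flink or IK dependency. The class name `KeywordSplitSketch` and its regex tokenizer are assumptions made for illustration only. The article's `KeywordUDTF` is not shown here and, judging by the function name `ik_analyze`, relies on the IK analyzer, whose dictionary-based segmentation will generally differ from a regex. The sketch only shows how one (fullword, rowtime) row becomes several (keyword, rowtime) rows, which is what the LATERAL TABLE join produces.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for the article's KeywordUDTF: no Flink, no IK analyzer.
final class KeywordSplitSketch {

    // Assumption: a token is a run of Han characters or a run of ASCII letters/digits.
    // A real analyzer segments with a dictionary and may produce different tokens.
    private static final Pattern TOKEN = Pattern.compile("\\p{IsHan}+|[A-Za-z0-9]+");

    // Splits one search phrase into keywords, preserving their order.
    static List<String> split(String fullword) {
        List<String> keywords = new ArrayList<>();
        Matcher m = TOKEN.matcher(fullword);
        while (m.find()) {
            keywords.add(m.group());
        }
        return keywords;
    }

    // Mimics the lateral join: each input row yields one output row per keyword,
    // and every output row carries the rowtime of the row it came from.
    static List<String[]> lateralJoin(String fullword, String rowtime) {
        List<String[]> rows = new ArrayList<>();
        for (String keyword : split(fullword)) {
            rows.add(new String[] {keyword, rowtime});
        }
        return rows;
    }
}
```

Calling `lateralJoin` with the search phrase from the split example and the rowtime `20221215` yields five rows, one per keyword, all sharing that rowtime.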
3 Grouping, windowing, aggregation calculation
// TODO 6 Group, window, and aggregate
Table resTable = tableEnv.sqlQuery("select " +
" DATE_FORMAT(TUMBLE_START(rowtime, INTERVAL '10' SECOND),'yyyy-MM-dd HH:mm:ss') as stt, " +
" DATE_FORMAT(TUMBLE_END(rowtime, INTERVAL '10' SECOND),'yyyy-MM-dd HH:mm:ss') as edt, " +
" keyword, " +
" count(*) ct," +
" '"+ GmallConstant.KEYWORD_SEARCH +"' source," +
" UNIX_TIMESTAMP() * 1000 as ts" +
" from " +
" "+ keywordTable +" " +
" group by " +
" TUMBLE(rowtime, INTERVAL '10' SECOND),keyword ");
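The query above groups rows by a 10-second tumbling window and by keyword. As a rough mental model, the plain-Java sketch below shows how an event time maps to its window and how counts accumulate per (window, keyword) pair. The class `TumbleCountSketch` is hypothetical and is not Flink code. It assumes non-negative epoch-millisecond timestamps and a zero window offset, and it ignores watermarks and late data entirely.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical plain-Java model of a 10-second tumbling window count; not Flink code.
final class TumbleCountSketch {

    static final long WINDOW_MS = 10_000L; // corresponds to INTERVAL '10' SECOND

    // Start of the tumbling window that contains the event time (epoch millis, non-negative).
    static long windowStart(long eventTimeMs) {
        return eventTimeMs - (eventTimeMs % WINDOW_MS);
    }

    // The window covers [start, start + size), so the end is exclusive.
    static long windowEnd(long eventTimeMs) {
        return windowStart(eventTimeMs) + WINDOW_MS;
    }

    // Counts events per (windowStart, keyword), similar in spirit to
    // GROUP BY TUMBLE(rowtime, INTERVAL '10' SECOND), keyword.
    static Map<String, Long> count(long[] eventTimesMs, String[] keywords) {
        Map<String, Long> counts = new TreeMap<>();
        for (int i = 0; i < eventTimesMs.length; i++) {
            String key = windowStart(eventTimesMs[i]) + "|" + keywords[i];
            counts.merge(key, 1L, Long::sum);
        }
        return counts;
    }
}
```

Because windows do not overlap, every event contributes to exactly one (stt, edt, keyword) row of the result.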
4 Convert to stream and write to ClickHouse
(1) Create a keyword statistics table in ClickHouse
create table keyword_stats_2022 (
stt DateTime,
edt DateTime,
keyword String,
source String,
ct UInt64,
ts UInt64
) engine = ReplacingMergeTree(ts)
partition by toYYYYMMDD(stt)
order by (stt, edt, keyword, source);
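The table uses ReplacingMergeTree with `ts` as the version column, so rows that share the same sorting key (stt, edt, keyword, source) are collapsed during background merges, keeping the row with the highest `ts`. The sketch below models that rule in plain Java. The class `ReplacingMergeSketch` and its `Row` type are hypothetical, and the sorting key is reduced to a single string for brevity. Note that ClickHouse merges at an unspecified time, so a query may still see duplicates before a merge has happened.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model of ReplacingMergeTree(ts) deduplication; not ClickHouse code.
final class ReplacingMergeSketch {

    // One row, reduced to the parts that matter for deduplication.
    static final class Row {
        final String sortingKey; // stands for (stt, edt, keyword, source)
        final long ct;
        final long ts;           // version column

        Row(String sortingKey, long ct, long ts) {
            this.sortingKey = sortingKey;
            this.ct = ct;
            this.ts = ts;
        }
    }

    // Keeps one row per sorting key: the one with the highest version.
    // On equal versions the later row wins, hence the >= comparison.
    static Map<String, Row> merge(Row... rowsInInsertOrder) {
        Map<String, Row> kept = new LinkedHashMap<>();
        for (Row row : rowsInInsertOrder) {
            Row current = kept.get(row.sortingKey);
            if (current == null || row.ts >= current.ts) {
                kept.put(row.sortingKey, row);
            }
        }
        return kept;
    }
}
```

Under this model, writing the same window result twice (for example after a job restart) eventually leaves a single row per window and keyword rather than a double count.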
(2) Encapsulate the KeywordStats entity class
package com.hzy.gmall.realtime.beans;
import lombok.AllArgsConstructor;
import lombok.Data;
import lombok.NoArgsConstructor;
/**
* Desc: Keyword statistics entity class
*/
@Data
@AllArgsConstructor
@NoArgsConstructor
public class KeywordStats {
private String keyword;
private Long ct;
private String source;
private String stt;
private String edt;
private Long ts;
}
(3) Convert the stream in the main program and write to ClickHouse
// TODO 7 Convert the table to a stream
DataStream<KeywordStats> keywordStatsDS = tableEnv.toAppendStream(resTable, KeywordStats.class);
keywordStatsDS.print(">>>");
// TODO 8 Write the stream data to ClickHouse
keywordStatsDS.addSink(ClickhouseUtil.getJdbcSink(
// The column order must match the order of the properties in the entity class
"insert into keyword_stats_2022(keyword,ct,source,stt,edt,ts) values(?,?,?,?,?,?) "
));
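The comment in the sink code says the column order of the INSERT must match the property order of the entity class. `ClickhouseUtil.getJdbcSink` is not shown in this article, so how it binds the `?` placeholders is not known here. As a sketch of one alternative that makes the ordering explicit instead of implicit, the hypothetical `ColumnBindingSketch` below lists one extractor per column, in INSERT order, and produces the parameter values from a trimmed-down bean. All names in it are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch: explicit column-to-field binding for a positional INSERT.
final class ColumnBindingSketch {

    // Trimmed-down stand-in for an entity such as KeywordStats.
    static final class Stats {
        final String keyword;
        final long ct;
        final String source;

        Stats(String keyword, long ct, String source) {
            this.keyword = keyword;
            this.ct = ct;
            this.source = source;
        }
    }

    // One extractor per placeholder, in the order of: insert into t(keyword, ct, source).
    static final List<Function<Stats, Object>> COLUMNS = new ArrayList<>();
    static {
        COLUMNS.add(s -> s.keyword);
        COLUMNS.add(s -> s.ct);
        COLUMNS.add(s -> s.source);
    }

    // Produces the values to bind to placeholders 1..n, in column order.
    static List<Object> params(Stats stats) {
        List<Object> values = new ArrayList<>();
        for (Function<Stats, Object> column : COLUMNS) {
            values.add(column.apply(stats));
        }
        return values;
    }
}
```

With this layout, reordering fields in the bean cannot silently shift values into the wrong columns, at the cost of maintaining the extractor list by hand.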
5 Overall test
- Start ZK, Kafka, logger.sh, ClickHouse
- Run BaseLogApp
- Run KeywordStatsApp
- Run the jar package in the rt_applog directory
- View console output
- View the keyword_stats_2022 table data in ClickHouse
Two data visualization interface
1 Design ideas
In the previous sections the data was processed layer by layer, and the lightly aggregated results were saved to ClickHouse so that they can serve real-time query, statistics, and analysis. Such statistics are usually presented in two forms: a BI tool for professional data analysts, and a more intuitive big data screen for non-specialists.
The rest of this article develops the interfaces for Baidu's Sugar big-screen service.
2 Requirements analysis
(1) Target rendering
(2) Analysis of the visual big screen
Each component on the big screen needs its own interface. The figure involves 8 components in total.
Component name | Component type | Query metrics | Source table |
---|---|---|---|
Total transaction amount | digital flop | Total order amount | product_stats |
Province and city heat map | heat map | Order amount grouped by province and city | province_stats |
Traffic by time period | line chart | UV, PV, and new users by time period | visitor_stats |
Brand TopN | horizontal bar chart | Order amount grouped by brand | product_stats |
Category distribution | pie chart | Order amount grouped by category | product_stats |
Hot-word cloud | word cloud | Count grouped by keyword | keyword_stats |
Traffic table | pivot table | UV, PV, bounce rate, average visit duration, and average number of pages visited, each split by new and old users | visitor_stats |
Popular products | carousel table | Order amount grouped by SPU | product_stats |
(3) Interface execution process
The DWS-layer calculations have already been implemented and their results written to ClickHouse. The next step is to provide data interfaces that query ClickHouse on behalf of the big-screen visualization service. This involves two main tasks:
- Configure the big-screen visualization service.
- Write the data query interfaces that the big screen will call.
Three Sugar data big screen
1 Product introduction
Sugar is an agile BI and data visualization platform launched by Baidu Cloud. It aims to cover BI analysis and the visualization of data in reports and big screens, and to free up the development effort that data visualization systems otherwise require.
2 Access
https://cloud.baidu.com/product/sugar.html
3 Create a large data screen
- After clicking [Use Now], log in with your Baidu account
- First create an organization
- During creation, select the product [big-screen early adopter version]; the first use comes with a one-month trial period
- After creating the organization, select [Enter Organization]
- Then enter the default [first space]
- In the space, select [New] after [Large screen to be created]
- Choose a big-screen template
- You can choose an empty template or modify an existing one
- Here, select a blank template and specify the name of the big screen
- Enter the editing window of the big screen
Four total turnover interface
1 Sugar component: digital card flop
(1) Add components
At the top of the big-screen editor, select [Indicators] → [Digital Flop]
(3) Query the data format required by the component
Under data binding, select [Static JSON] to see the JSON format that the component requires
(4) Interface access path and return format
- Access path: /api/sugar/gmv
- return format
{
"status": 0,
"data": 1201012.694507823
}
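The return format above is small enough to build by hand. The sketch below, with the hypothetical class `GmvResponseSketch`, shows one way to produce it from a `BigDecimal`. Two choices in it are assumptions rather than anything the article states: a missing value is reported as 0, and `toPlainString` is used so that the number is written as plain digits even when the `BigDecimal` would otherwise print in scientific notation.

```java
import java.math.BigDecimal;

// Hypothetical helper that renders the {"status": 0, "data": <number>} response body.
final class GmvResponseSketch {

    static String toJson(BigDecimal gmv) {
        // Assumption: treat a missing aggregate as zero.
        BigDecimal value = (gmv == null) ? BigDecimal.ZERO : gmv;
        // toPlainString never uses an exponent, unlike BigDecimal.toString.
        return "{\"status\": 0,\"data\": " + value.toPlainString() + "}";
    }
}
```

For larger responses a JSON library would be the more robust choice; string concatenation is only reasonable here because the payload is a single number.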
(5) Execute SQL
a product_stats_2022 table data
b product_stats_2022 table structure
c SQL statement
toYYYYMMDD: converts a Date or DateTime to a UInt32 number containing the year, month, and day (YYYY * 10000 + MM * 100 + DD).
select sum(order_amount) order_amount from product_stats_2022 where toYYYYMMDD(stt)=20221215;
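The arithmetic behind `toYYYYMMDD` can be checked on the client side as well, which is useful when building the `date` value to compare against. The sketch below reproduces the formula YYYY * 10000 + MM * 100 + DD in plain Java. The class name `ToYyyyMmDdSketch` is hypothetical, and the sketch assumes the `LocalDate` is already expressed in the time zone that ClickHouse uses for the `stt` column.

```java
import java.time.LocalDate;

// Hypothetical client-side equivalent of the toYYYYMMDD arithmetic.
final class ToYyyyMmDdSketch {

    // YYYY * 10000 + MM * 100 + DD, e.g. year 2022, month 12, day 15.
    static int toYYYYMMDD(LocalDate date) {
        return date.getYear() * 10000 + date.getMonthValue() * 100 + date.getDayOfMonth();
    }
}
```

For 15 December 2022 the formula gives 2022 * 10000 + 12 * 100 + 15 = 20221215, the value used in the query above.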
2 Data Interface Realization
(1) Create a data interface module
a Create a new SpringBoot module gmall2022-publisher under the gmall2022-parent project
You can skip selecting dependencies at creation time and add them all in pom.xml afterwards
b Add the required dependencies in the pom.xml file
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 https://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<parent>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-parent</artifactId>
<version>2.7.6</version>
<relativePath/> <!-- lookup parent from repository -->
</parent>
<groupId>com.hzy.gmall.publisher</groupId>
<artifactId>gmall2022-publisher</artifactId>
<version>0.0.1-SNAPSHOT</version>
<name>gmall2022-publisher</name>
<description>Demo project for Spring Boot</description>
<properties>
<java.version>1.8</java.version>
</properties>
<dependencies>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-jdbc</artifactId>
</dependency>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-web</artifactId>
</dependency>
<dependency>
<groupId>org.mybatis.spring.boot</groupId>
<artifactId>mybatis-spring-boot-starter</artifactId>
<version>2.1.3</version>
</dependency>
<dependency>
<groupId>org.projectlombok</groupId>
<artifactId>lombok</artifactId>
<optional>true</optional>
</dependency>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-test</artifactId>
<scope>test</scope>
<exclusions>
<exclusion>
<groupId>org.junit.vintage</groupId>
<artifactId>junit-vintage-engine</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>org.apache.commons</groupId>
<artifactId>commons-lang3</artifactId>
<version>3.11</version>
</dependency>
<dependency>
<groupId>ru.yandex.clickhouse</groupId>
<artifactId>clickhouse-jdbc</artifactId>
<version>0.1.55</version>
</dependency>
</dependencies>
<build>
<plugins>
<plugin>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-maven-plugin</artifactId>
</plugin>
</plugins>
</build>
</project>
(2) Code hierarchy
a code structure
Layer | Class | Responsibility |
---|---|---|
controller (control layer) | SugarController | Transaction amount query interface and handling of the returned data |
service (service layer) | ProductStatsService, ProductStatsServiceImpl | Queries product statistics |
mapper (data mapping layer) | ProductStatsMapper | Holds the SQL that queries the product statistics table |
b Modify the Spring Boot core configuration file application.properties
server.port=8070
# Configure the ClickHouse driver and URL
spring.datasource.driver-class-name=ru.yandex.clickhouse.ClickHouseDriver
spring.datasource.url=jdbc:clickhouse://hadoop101:8123/default
c create package structure
(3) Layered code implementation
a Mapper layer: create ProductStatsMapper interface
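package com.hzy.gmall.publisher.mapper;
import org.apache.ibatis.annotations.Select;
import java.math.BigDecimal;
/**
* Mapper interface for product statistics
*/
public interface ProductStatsMapper {
// Get the total transaction amount of products for a given day
@Select("select sum(order_amount) order_amount from product_stats_2022 where toYYYYMMDD(stt)=#{date}")
BigDecimal selectGMV(Integer date);
}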
b Add @MapperScan annotations in Application
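package com.hzy.gmall.publisher;
import org.mybatis.spring.annotation.MapperScan;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
@SpringBootApplication
@MapperScan(basePackages = "com.hzy.gmall.publisher.mapper")
public class Gmall2022PublisherApplication {
public static void main(String[] args) {
SpringApplication.run(Gmall2022PublisherApplication.class, args);
}
}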
c Service layer: create ProductStatsService interface
package com.hzy.gmall.publisher.service;
import java.math.BigDecimal;
/**
* Service interface for product statistics
*/
public interface ProductStatsService {
// Get the total transaction amount for a given day
BigDecimal getGMV(Integer date);
}
d Service layer: create a ProductStatsServiceImpl implementation class
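package com.hzy.gmall.publisher.service.impl;
import com.hzy.gmall.publisher.mapper.ProductStatsMapper;
import com.hzy.gmall.publisher.service.ProductStatsService;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;
import java.math.BigDecimal;
/**
* Implementation of the product statistics service interface
*/
@Service
public class ProductStatsServiceImpl implements ProductStatsService{
@Autowired
private ProductStatsMapper productStatsMapper;
@Override
public BigDecimal getGMV(Integer date) {
return productStatsMapper.selectGMV(date);
}
}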
e Controller layer: Create SugarController class
This class receives user requests and returns responses. The response format differs depending on the Sugar component being served.
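package com.hzy.gmall.publisher.controller;
import com.hzy.gmall.publisher.service.ProductStatsService;
import org.apache.commons.lang3.time.DateFormatUtils;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;
import java.math.BigDecimal;
import java.util.Date;
/**
* Control layer for the big-screen display
*/
@RestController
@RequestMapping("/api/sugar")
public class SugarController {
@Autowired
private ProductStatsService productStatsService;
@RequestMapping("/gmv")
public String getGMV(@RequestParam(value = "date",defaultValue = "0") Integer date){
// A date of 0 means no date was passed, so fall back to today
if (date == 0){
date = now();
}
// Call the service to get the total transaction amount
BigDecimal gmv = productStatsService.getGMV(date);
String json = "{\"status\": 0,\"data\": "+gmv+"}";
return json;
}
// Get the current date as a yyyyMMdd integer
private Integer now() {
String yyyyMMdd = DateFormatUtils.format(new Date(), "yyyyMMdd");
return Integer.valueOf(yyyyMMdd);
}
}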
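The controller's date fallback can be exercised without starting Spring or connecting to ClickHouse if the lookup and the clock are passed in. The sketch below is a hypothetical, framework-free rendering of that idea and not the article's controller. `GmvHandlerSketch`, the `Function` standing in for `ProductStatsService.getGMV`, and the injected `Clock` are all assumptions introduced for illustration.

```java
import java.math.BigDecimal;
import java.time.Clock;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.function.Function;

// Hypothetical framework-free version of the "date == 0 means today" rule.
final class GmvHandlerSketch {

    private final Function<Integer, BigDecimal> gmvLookup; // stands in for the service layer
    private final Clock clock;                             // injectable so tests can fix "today"

    GmvHandlerSketch(Function<Integer, BigDecimal> gmvLookup, Clock clock) {
        this.gmvLookup = gmvLookup;
        this.clock = clock;
    }

    // Returns the given date, or today's date as a yyyyMMdd integer when date is 0.
    int resolveDate(int date) {
        if (date != 0) {
            return date;
        }
        // BASIC_ISO_DATE formats a LocalDate as yyyyMMdd.
        String today = LocalDate.now(clock).format(DateTimeFormatter.BASIC_ISO_DATE);
        return Integer.parseInt(today);
    }

    // Resolves the date and delegates to the lookup, as the controller delegates to the service.
    BigDecimal handle(int date) {
        return gmvLookup.apply(resolveDate(date));
    }
}
```

In a test, the lookup can be a lambda returning canned values and the clock can be `Clock.fixed(...)`, which makes the default-date path deterministic.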
(4) Test the local interface
a Start the SpringBoot application
Access the test interface with a browser
b output result
Output one:
Output result two: