[Real-time data warehouse] DWS layer: keyword topic table (FlinkSQL), data visualization interface, Sugar big-screen dashboard, and total transaction amount interface

One DWS layer-keyword subject table (FlinkSQL)

1 Filter data

// TODO 4: Filter the records that represent search behavior out of the dynamic table
Table fullwordTable = tableEnv.sqlQuery("select " +
        "  page['item'] fullword,rowtime " +
        " from " +
        "  page_view " +
        " where " +
        "  page['page_id']='good_list' and page['item'] is not null");

2 Split using UDTF

(1) Split results

Search content: 荣耀Play6T Pro 天玑810
fullword                   rowtime
荣耀Play6T Pro 天玑810      20221215

Result after splitting: 荣耀  Play6T  Pro  天玑  810
keyword     rowtime
荣耀        20221215
Play6T      20221215
Pro         20221215
天玑        20221215
810         20221215

(2) Join table function (UDTF)

A lateral join pairs a table with the result of a table function. Each row of the left (outer) table is joined with every row that the table function produces for that row.

User-defined table functions (UDTFs) must be registered before they are used. See the Flink UDF documentation for details on defining and registering UDFs.

(3) Code

// TODO 2: Register the custom UDTF
tableEnv.createTemporarySystemFunction("ik_analyze", KeywordUDTF.class);

// TODO 5: Split the search keywords with the custom UDTF
Table keywordTable = tableEnv.sqlQuery("SELECT rowtime, keyword FROM "+ fullwordTable +", LATERAL TABLE(ik_analyze(fullword)) AS T(keyword)");
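
To make the join semantics concrete, here is a minimal plain-Java sketch of what the comma join with LATERAL TABLE does for a single input row. It does not use Flink or the IK analyzer: the class LateralJoinSketch, the KeywordRow type, and the whitespace tokenizer are all hypothetical stand-ins chosen for illustration, so the split differs from what ik_analyze would produce.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

class LateralJoinSketch {

    /** Output row: one keyword plus the rowtime of the input row it came from. */
    static final class KeywordRow {
        final String keyword;
        final long rowtime;

        KeywordRow(String keyword, long rowtime) {
            this.keyword = keyword;
            this.rowtime = rowtime;
        }
    }

    /**
     * Emulates "FROM t, LATERAL TABLE(f(fullword)) AS T(keyword)" for one input row:
     * the row is paired with every value the table function emits for it.
     * If the function emits nothing, the input row produces no output.
     */
    static List<KeywordRow> lateralJoin(String fullword, long rowtime,
                                        Function<String, List<String>> tableFunction) {
        List<KeywordRow> out = new ArrayList<>();
        for (String keyword : tableFunction.apply(fullword)) {
            out.add(new KeywordRow(keyword, rowtime));
        }
        return out;
    }

    /** Stand-in tokenizer (assumption): splits on whitespace instead of using the IK analyzer. */
    static List<String> splitOnWhitespace(String text) {
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return new ArrayList<>();
        }
        return Arrays.asList(trimmed.split("\\s+"));
    }

    public static void main(String[] args) {
        List<KeywordRow> rows =
                lateralJoin("Play6T Pro 810", 20221215L, LateralJoinSketch::splitOnWhitespace);
        for (KeywordRow row : rows) {
            System.out.println(row.keyword + "\t" + row.rowtime);
        }
    }
}
```

Every output row carries the rowtime of its source row, which is what lets the later window aggregation group the split keywords by event time.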

3 Grouping, windowing, aggregation calculation

// TODO 6: Group, window, and aggregate
Table resTable = tableEnv.sqlQuery("select " +
        "  DATE_FORMAT(TUMBLE_START(rowtime, INTERVAL '10' SECOND),'yyyy-MM-dd HH:mm:ss') as stt, " +
        "  DATE_FORMAT(TUMBLE_END(rowtime, INTERVAL '10' SECOND),'yyyy-MM-dd HH:mm:ss') as edt, " +
        "  keyword, " +
        "  count(*) ct," +
        "  '"+ GmallConstant.KEYWORD_SEARCH +"' source," +
        "  UNIX_TIMESTAMP() * 1000 as ts" +
        " from " +
        "  "+ keywordTable +" " +
        " group by " +
        "  TUMBLE(rowtime, INTERVAL '10' SECOND),keyword ");

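As a rough illustration of the grouping above, the following plain-Java sketch shows how an event time maps to a 10-second tumbling window and how a count per window and keyword accumulates. It is only a sketch under simplifying assumptions: timestamps are non-negative epoch milliseconds, there are no watermarks or late data, and the class TumblingWindowSketch and its method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class TumblingWindowSketch {

    // Corresponds to INTERVAL '10' SECOND in the query
    static final long WINDOW_SIZE_MS = 10_000L;

    /** Start (inclusive) of the tumbling window containing the given event time. */
    static long windowStart(long eventTimeMs) {
        return eventTimeMs - (eventTimeMs % WINDOW_SIZE_MS);
    }

    /** End (exclusive) of the tumbling window containing the given event time. */
    static long windowEnd(long eventTimeMs) {
        return windowStart(eventTimeMs) + WINDOW_SIZE_MS;
    }

    /**
     * Counts rows per (window start, keyword), similar in spirit to
     * "count(*) ... group by TUMBLE(rowtime, INTERVAL '10' SECOND), keyword".
     * The map key has the form "windowStart|keyword".
     */
    static Map<String, Long> countPerWindow(long[] eventTimesMs, String[] keywords) {
        Map<String, Long> counts = new LinkedHashMap<>();
        for (int i = 0; i < keywords.length; i++) {
            String key = windowStart(eventTimesMs[i]) + "|" + keywords[i];
            counts.merge(key, 1L, Long::sum);
        }
        return counts;
    }
}
```

Because the windows do not overlap, each row falls into exactly one window, so each (stt, edt, keyword) group is emitted once when its window closes.
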
4 Convert to stream and write to ClickHouse

(1) Create a keyword statistics table in ClickHouse

create table keyword_stats_2022 (
    stt DateTime,
    edt DateTime,
    keyword String ,
    source String ,
    ct UInt64 ,
    ts UInt64
)engine =ReplacingMergeTree( ts)
        partition by  toYYYYMMDD(stt)
        order by  ( stt,edt,keyword,source );
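
ReplacingMergeTree(ts) treats rows that share the same sorting key (stt, edt, keyword, source) as duplicates and, when ClickHouse merges data parts in the background, keeps only the row with the highest ts. The plain-Java sketch below imitates that rule in memory to show why ts is useful as a version column. The class ReplacingMergeSketch and its Row type are hypothetical, and real merges run asynchronously, so duplicates can still be visible to queries until a merge has happened.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ReplacingMergeSketch {

    /** One row of the keyword statistics table. */
    static final class Row {
        final String stt;
        final String edt;
        final String keyword;
        final String source;
        final long ct;
        final long ts;

        Row(String stt, String edt, String keyword, String source, long ct, long ts) {
            this.stt = stt;
            this.edt = edt;
            this.keyword = keyword;
            this.source = source;
            this.ct = ct;
            this.ts = ts;
        }

        /** Mirrors "order by (stt, edt, keyword, source)". */
        String sortingKey() {
            return stt + "|" + edt + "|" + keyword + "|" + source;
        }
    }

    /** Keeps, for each sorting key, the row with the highest version (ts); ties go to the later row. */
    static List<Row> merge(List<Row> rows) {
        Map<String, Row> latest = new LinkedHashMap<>();
        for (Row row : rows) {
            Row kept = latest.get(row.sortingKey());
            if (kept == null || row.ts >= kept.ts) {
                latest.put(row.sortingKey(), row);
            }
        }
        return new ArrayList<>(latest.values());
    }
}
```

If a window result is written twice, for example after a job restart, the copy with the larger ts is the one that survives the merge.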

(2) Encapsulate the KeywordStats entity class

package com.hzy.gmall.realtime.beans;

import lombok.AllArgsConstructor;
import lombok.Data;
import lombok.NoArgsConstructor;

/**
 * Desc: Keyword statistics entity class
 */
@Data
@AllArgsConstructor
@NoArgsConstructor
public class KeywordStats {
    private String keyword;
    private Long ct;
    private String source;
    private String stt;
    private String edt;
    private Long ts;
}

(3) Convert the stream in the main program and write to ClickHouse

// TODO 7: Convert the table to a stream
DataStream<KeywordStats> keywordStatsDS = tableEnv.toAppendStream(resTable, KeywordStats.class);
keywordStatsDS.print(">>>");

// TODO 8: Write the stream data to ClickHouse
keywordStatsDS.addSink(ClickhouseUtil.getJdbcSink(
        // The column order must match the property order in the entity class
        "insert into keyword_stats_2022(keyword,ct,source,stt,edt,ts) values(?,?,?,?,?,?) "
));

5 Overall test

  • Start ZK, Kafka, logger.sh, ClickHouse
  • Run BaseLogApp
  • Run KeywordStatsApp
  • Run the jar package in the rt_applog directory
  • View console output
  • View the keyword_stats_2022 table data in ClickHouse

Two data visualization interface

1 Design ideas

The earlier layers processed the data step by step and stored the lightly aggregated results in ClickHouse, so that they can serve real-time queries, statistics, and analysis. Such statistics are usually presented in two forms: BI tools for professional data analysts, and a more intuitive big-screen dashboard for non-specialists.

The rest of this article develops the interfaces for Baidu's Sugar big-screen service.

2 Demand sorting

(1) Target rendering

[Image: rendering of the finished big screen]

(2) Analysis of the visualization big screen

Each component on the big screen needs its own interface. The figure contains 8 components in total.

Component name | Component type | Query metrics | Data table
Total transaction amount | Digital flop | Total order amount | product_stats
Province and city heat map | Heat map | Order amount grouped by province and city | province_stats
Time-sliced traffic | Line chart | UV, PV, and new users per time slot | visitor_stats
Brand TopN | Horizontal bar chart | Order amount grouped by brand | product_stats
Category distribution | Pie chart | Order amount grouped by category | product_stats
Hot word cloud | Word cloud | Count grouped by keyword | keyword_stats
Traffic table | Pivot table | UV, PV, bounce rate, average visit time, and average pages visited, each split by new and old users | visitor_stats
Popular products | Carousel table | Order amount grouped by SPU | product_stats

(3) Interface execution process

The DWS layer calculations are already implemented and their results written to ClickHouse. The next step is to provide data interfaces through which the big-screen visualization service queries ClickHouse. This involves two main tasks:

  • Configure the visualized big screen service.
  • Write a data query interface for access by the large visual screen.

Three Sugar data big screen

1 Product introduction

Sugar is an agile BI and data visualization platform from Baidu Cloud. It aims to solve BI analysis and data visualization for reports and big screens, and to reduce the development effort that data visualization systems require.

2 Access

https://cloud.baidu.com/product/sugar.html

3 Create a large data screen

  • After clicking [Use Now], log in to your Baidu account

  • First, create an organization

  • Select the product [big-screen early adopter version] during creation, and there is a one-month trial period for the first use

  • After creating a new organization, select [Enter Organization]

  • Then enter the default [first space]

  • In the space, select [New] after [Large screen to be created]

  • Choose a big screen template

  • You can choose an empty template, or modify it according to an existing template

  • Here select a blank template and specify the name of the large screen

  • Enter the editing window of the big screen

Four total turnover interface

1 Sugar component: digital card flop

(1) Add components

In the toolbar at the top of the big-screen editor, select [Indicators] → [Digital Flop]

(3) Query the data format required by the component

Under data binding, select [Static JSON] to see the JSON format the component expects

(4) Interface access path and return format

  • Access path: /api/sugar/gmv
  • Return format:
{
  "status": 0,
  "data": 1201012.694507823
}
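
As a small illustration of the return format above, the following sketch builds the response body from a BigDecimal amount. The class GmvResponseSketch and the method gmvResponse are hypothetical names, and falling back to 0 for a missing amount is an assumption made for this sketch, not something the format itself prescribes.

```java
import java.math.BigDecimal;

class GmvResponseSketch {

    /**
     * Builds the JSON body for the digital flop component.
     * A missing amount is reported as 0 (an assumption for this sketch),
     * and toPlainString() avoids scientific notation in the output.
     */
    static String gmvResponse(BigDecimal gmv) {
        BigDecimal value = (gmv == null) ? BigDecimal.ZERO : gmv;
        return "{\"status\": 0, \"data\": " + value.toPlainString() + "}";
    }

    public static void main(String[] args) {
        System.out.println(gmvResponse(new BigDecimal("1201012.69")));
    }
}
```

Keeping the formatting in one helper makes it easy to check the exact string the component will receive without starting the web application.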

(5) Execute SQL

a product_stats_2022 table data

[Image: sample rows of the product_stats_2022 table]

b product_stats_2022 table structure

[Image: structure of the product_stats_2022 table]

c SQL statement

toYYYYMMDD: converts a Date or DateTime to a UInt32 number containing the year, month, and day (YYYY * 10000 + MM * 100 + DD).

select sum(order_amount) order_amount from product_stats_2022 where toYYYYMMDD(stt)=20221215;
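
The arithmetic behind toYYYYMMDD can be reproduced in a few lines of Java, which is handy for checking which integer a given date must be compared against. This is only a sketch of the formula quoted above; the class name ToYyyyMmDdSketch is hypothetical and nothing here calls ClickHouse.

```java
import java.time.LocalDate;

class ToYyyyMmDdSketch {

    /** Computes YYYY * 10000 + MM * 100 + DD for the given date. */
    static int toYYYYMMDD(LocalDate date) {
        return date.getYear() * 10000 + date.getMonthValue() * 100 + date.getDayOfMonth();
    }

    public static void main(String[] args) {
        // 2022 * 10000 + 12 * 100 + 15
        System.out.println(toYYYYMMDD(LocalDate.of(2022, 12, 15)));
    }
}
```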

2 Data Interface Realization

(1) Create a data interface module

a Create a new SpringBoot module gmall2022-publisher under the gmall2022-parent project

You can skip selecting dependencies in the creation wizard and add them all in pom.xml afterwards

b Add the required dependencies in the pom.xml file

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 https://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <parent>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-parent</artifactId>
        <version>2.7.6</version>
        <relativePath/> <!-- lookup parent from repository -->
    </parent>
    <groupId>com.hzy.gmall.publisher</groupId>
    <artifactId>gmall2022-publisher</artifactId>
    <version>0.0.1-SNAPSHOT</version>
    <name>gmall2022-publisher</name>
    <description>Demo project for Spring Boot</description>
    <properties>
        <java.version>1.8</java.version>
    </properties>

    <dependencies>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-jdbc</artifactId>
        </dependency>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-web</artifactId>
        </dependency>
        <dependency>
            <groupId>org.mybatis.spring.boot</groupId>
            <artifactId>mybatis-spring-boot-starter</artifactId>
            <version>2.1.3</version>
        </dependency>

        <dependency>
            <groupId>org.projectlombok</groupId>
            <artifactId>lombok</artifactId>
            <optional>true</optional>
        </dependency>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-test</artifactId>
            <scope>test</scope>
            <exclusions>
                <exclusion>
                    <groupId>org.junit.vintage</groupId>
                    <artifactId>junit-vintage-engine</artifactId>
                </exclusion>
            </exclusions>
        </dependency>

        <dependency>
            <groupId>org.apache.commons</groupId>
            <artifactId>commons-lang3</artifactId>
            <version>3.11</version>
        </dependency>

        <dependency>
            <groupId>ru.yandex.clickhouse</groupId>
            <artifactId>clickhouse-jdbc</artifactId>
            <version>0.1.55</version>
        </dependency>
    </dependencies>


    <build>
        <plugins>
            <plugin>
                <groupId>org.springframework.boot</groupId>
                <artifactId>spring-boot-maven-plugin</artifactId>
            </plugin>
        </plugins>
    </build>

</project>

(2) Code hierarchy

a code structure

Layer | Class | Responsibility
controller (control layer) | SugarController | Exposes the transaction amount query interface and handles request and response parameters
service (service layer) | ProductStatsService, ProductStatsServiceImpl | Queries product statistics
mapper (data mapping layer) | ProductStatsMapper | Holds the SQL that queries the product statistics table

b Modify the Springboot core configuration file application.properties

server.port=8070
# Configure the ClickHouse driver and URL
spring.datasource.driver-class-name=ru.yandex.clickhouse.ClickHouseDriver
spring.datasource.url=jdbc:clickhouse://hadoop101:8123/default

c create package structure

[Image: package structure with the controller, service, service.impl, and mapper packages]

(3) Layered code implementation

a Mapper layer: create ProductStatsMapper interface

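package com.hzy.gmall.publisher.mapper;

import org.apache.ibatis.annotations.Select;

import java.math.BigDecimal;

/**
 * Product statistics mapper interface
 */
public interface ProductStatsMapper {
    // Get the total transaction amount of products for a given day (date as yyyyMMdd)
    @Select("select sum(order_amount) order_amount from product_stats_2022 where toYYYYMMDD(stt)=#{date}")
    BigDecimal selectGMV(Integer date);
}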

b Add @MapperScan annotations in Application

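package com.hzy.gmall.publisher;

import org.mybatis.spring.annotation.MapperScan;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;

@SpringBootApplication
@MapperScan(basePackages = "com.hzy.gmall.publisher.mapper")
public class Gmall2022PublisherApplication {

    public static void main(String[] args) {
        SpringApplication.run(Gmall2022PublisherApplication.class, args);
    }

}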

c Service layer: create ProductStatsService interface

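package com.hzy.gmall.publisher.service;

import java.math.BigDecimal;

/**
 * Product statistics service interface
 */
public interface ProductStatsService {
    // Get the total transaction amount for a given day (date as yyyyMMdd)
    BigDecimal getGMV(Integer date);
}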

d Service layer: create a ProductStatsServiceImpl implementation class

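package com.hzy.gmall.publisher.service.impl;

import com.hzy.gmall.publisher.mapper.ProductStatsMapper;
import com.hzy.gmall.publisher.service.ProductStatsService;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;

import java.math.BigDecimal;

/**
 * Implementation of the product statistics service interface
 */
@Service
public class ProductStatsServiceImpl implements ProductStatsService {

    @Autowired
    private ProductStatsMapper productStatsMapper;

    @Override
    public BigDecimal getGMV(Integer date) {
        return productStatsMapper.selectGMV(date);
    }
}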

e Controller layer: Create SugarController class

This class receives user requests and returns responses. Each Sugar component expects its own response format, so each endpoint returns the format its component requires.

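package com.hzy.gmall.publisher.controller;

import com.hzy.gmall.publisher.service.ProductStatsService;
import org.apache.commons.lang3.time.DateFormatUtils;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;

import java.math.BigDecimal;
import java.util.Date;

/**
 * Big-screen display controller
 */
@RestController
@RequestMapping("/api/sugar")
public class SugarController {

    @Autowired
    private ProductStatsService productStatsService;

    @RequestMapping("/gmv")
    public String getGMV(@RequestParam(value = "date", defaultValue = "0") Integer date) {
        // No date supplied: fall back to today
        if (date == 0) {
            date = now();
        }

        // Call the service to get the total transaction amount
        BigDecimal gmv = productStatsService.getGMV(date);

        String json = "{\"status\": 0,\"data\": " + gmv + "}";

        return json;
    }

    // Get the current date as a yyyyMMdd integer
    private Integer now() {
        String yyyyMMdd = DateFormatUtils.format(new Date(), "yyyyMMdd");
        return Integer.valueOf(yyyyMMdd);
    }
}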
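
The date defaulting in the controller can also be written as a small pure function, which makes the fallback easy to test because "today" is passed in rather than read from the system clock. The sketch below uses java.time instead of DateFormatUtils; the class DateParamSketch and the method resolveDate are hypothetical names, not part of the project.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

class DateParamSketch {

    /**
     * Returns the requested yyyyMMdd date, or the given "today"
     * when the request carried the default value 0.
     */
    static int resolveDate(int requested, LocalDate today) {
        if (requested != 0) {
            return requested;
        }
        // BASIC_ISO_DATE renders a LocalDate as yyyyMMdd
        return Integer.parseInt(today.format(DateTimeFormatter.BASIC_ISO_DATE));
    }

    public static void main(String[] args) {
        System.out.println(resolveDate(0, LocalDate.now()));
    }
}
```

A controller could call resolveDate(date, LocalDate.now()) in place of the inline check, keeping the behavior the same while leaving the logic testable with a fixed date.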

(4) Test the local interface

a Start the SpringBoot application

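Access the test interface with a browser, for example http://localhost:8070/api/sugar/gmv?date=20221215 when running locally (port 8070 as configured in application.properties)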

b Output results

Output result one:

[Image: browser screenshot of the response]

Output result two:

[Image: browser screenshot of the response]

Origin blog.csdn.net/weixin_43923463/article/details/128430503