[Big Data Learning Chapter 12] Spark Project in Practice: Data Visualization

Learning Objectives

Understand the data visualization system architecture

Master integrating Phoenix with HBase

Become familiar with establishing table mappings between Phoenix and HBase

Understand how to create a Spring Boot project

Master creating entity classes in a Java Web project

Master creating database access interfaces in a Java Web project

Master creating controller classes in a Java Web project

Become familiar with creating HTML pages in a Java Web project

Become familiar with running a Spring Boot project

Overview

        Data visualization is the technique of representing data or information as graphical objects in order to convey it. The goal is to communicate information clearly and effectively so that users can easily grasp the complex relationships within the data. Through the visual objects in a chart, users can see the results of data analysis at a glance, which makes it easier to understand business trends or discover new business patterns. Data visualization is an important step in data analysis. This chapter explains in detail how to build a data visualization system to display the analysis results of this project.

1. System overview

1.1 Technology selection

        Spring Boot is designed to simplify the initial setup and development of Spring applications, doing away with tedious manual configuration so that a Java Web project can be built quickly.

        MyBatis is an open-source data persistence layer framework. It encapsulates JDBC operations for database access internally and supports ordinary SQL queries, stored procedures, and advanced mappings.

        ECharts is an open-source visualization library implemented in JavaScript. It runs smoothly on PCs and mobile devices and provides a rich variety of visualization types.

Learn more: Phoenix query engine and MyBatis

        MyBatis is a data persistence layer framework that works through SQL queries, but HBase, the database used in this project, supports neither JDBC access nor SQL. This means the data visualization system we are building cannot use the MyBatis framework to access HBase directly. The Apache Phoenix query engine closes this gap: it enables HBase to be accessed through JDBC by translating SQL queries into the corresponding HBase operations.
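        For example, once Phoenix is integrated (see Section 2.2), the result tables can be queried from Java with plain JDBC. The following is a minimal sketch, assuming the ZooKeeper quorum spark01,spark02,spark03:2181 and the mapping table top10 used later in this chapter:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class PhoenixJdbcDemo {
    public static void main(String[] args) throws Exception {
        // Phoenix JDBC URL format: jdbc:phoenix:<ZooKeeper quorum>:<port>
        try (Connection conn = DriverManager.getConnection(
                 "jdbc:phoenix:spark01,spark02,spark03:2181");
             Statement stmt = conn.createStatement();
             // Double-quoted identifiers keep the lowercase names used in HBase
             ResultSet rs = stmt.executeQuery(
                 "select \"category_id\",\"viewcount\" from \"top10\"")) {
            while (rs.next()) {
                System.out.println(rs.getString(1) + " -> " + rs.getString(2));
            }
        }
    }
}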

1.2 System Architecture

Offline data visualization display

Real-time data visualization display

2. Data table design and implementation

2.1 Data Table Introduction

Top10 popular categories analysis results table user_session_top10

| Field Name | Data Type | Description |
| --- | --- | --- |
| ROW | varchar | Primary key (corresponds to the RowKey of the HBase table) |
| cartcount | varchar | Total number of times items in the category were added to the shopping cart |
| category_id | varchar | Category ID |
| purchasecount | varchar | Total number of times items in the category were purchased |
| viewcount | varchar | Total number of times items in the category were viewed |

Top3 popular products in each region analysis results table user_session_top3

| Field Name | Data Type | Description |
| --- | --- | --- |
| ROW | varchar | Primary key (corresponds to the RowKey of the HBase table) |
| product_id | varchar | Product ID |
| viewcount | varchar | Total number of times the product was viewed |
| area | varchar | Region name |

Page single-hop conversion rate statistics table conversion

| Field Name | Data Type | Description |
| --- | --- | --- |
| ROW | varchar | Primary key (corresponds to the RowKey of the HBase table) |
| convert_page | varchar | Conversion page (page slice) |
| convert_rage | varchar | Conversion rate |

User ad click stream real-time statistics table adstream

| Field Name | Data Type | Description |
| --- | --- | --- |
| ROW | varchar | Primary key (corresponds to the RowKey of the HBase table) |
| city | varchar | City name |
| ad_count | varchar | Number of ad clicks |
| ad_id | varchar | Ad ID |

2.2 Phoenix integrates HBase

 Install Phoenix on the virtual machine Spark01 and integrate it with HBase.

STEP 01

Download the Phoenix installation package:

        Visit the Phoenix official website to download the Phoenix installation package apache-phoenix-4.14.1-HBase-1.2-bin.tar.gz for the Linux operating system.

STEP 02

Upload the Phoenix installation package:

        Use the SecureCRT remote connection tool to connect to the virtual machine Spark01, then run the "rz" command in /export/software/, the directory for storing installation packages, to upload the Phoenix installation package.

STEP 03
Install Phoenix:

         Install Phoenix by extracting the archive into /export/servers/, the directory where applications are installed.

tar -zxvf /export/software/apache-phoenix-4.14.1-HBase-1.2-bin.tar.gz -C /export/servers/

STEP 04
Phoenix integrates HBase (copy jar package):

        Enter the Phoenix installation directory, and copy phoenix-core-4.14.1-HBase-1.2.jar and phoenix-4.14.1-HBase-1.2-client.jar to the lib directory of the HBase installation directory.

$ cd /export/servers/apache-phoenix-4.14.1-HBase-1.2-bin/

$ cp {phoenix-core-4.14.1-HBase-1.2.jar,phoenix-4.14.1-HBase-1.2-client.jar} /export/servers/hbase-1.2.1/lib/

STEP 05

Phoenix integrates HBase (shut down the HBase cluster):

        Run the stop-hbase.sh command to shut down the HBase cluster.

STEP 06

Phoenix integrates HBase (modify the HBase configuration file):

        Enter the conf directory under the HBase installation directory, execute the "vi hbase-site.xml" command to edit the hbase-site.xml file, and add the namespace mapping configuration.

<property>
    <name>phoenix.schema.isNamespaceMappingEnabled</name>
    <value>true</value>
</property>
<property>
    <name>phoenix.schema.mapSystemTablesToNamespace</name>
    <value>true</value>
</property>

STEP 07
Phoenix integrates HBase (distribute the files):

        Distribute the HBase installation directory to the other two virtual machines Spark02 and Spark03 in the cluster.

scp -r /export/servers/hbase-1.2.1/ root@spark02:/export/servers/
scp -r /export/servers/hbase-1.2.1/ root@spark03:/export/servers/

STEP 08

Phoenix integrates HBase (copy the HBase configuration file):

        Enter the conf directory under the HBase installation directory, and copy the hbase-site.xml file to the bin directory under the Phoenix installation directory.

cp hbase-site.xml /export/servers/apache-phoenix-4.14.1-HBase-1.2-bin/bin/

STEP 09

Phoenix integrates HBase (start HBase cluster):

        Execute the "start-hbase.sh" command to start the HBase cluster.

        Before starting the HBase cluster, make sure the Hadoop and ZooKeeper clusters are running normally and that the clocks of all servers in the cluster are consistent. If they are not, execute the "systemctl restart chronyd" command on each server to restart the chronyd service and synchronize the time.

2.3 Establish Phoenix and HBase table mapping

 

        Operate Phoenix to establish the mappings between Phoenix and the HBase tables.

        Phoenix offers three ways of working: the command-line interface, JDBC, and SQuirreL. The command-line interface is sqlline, the interactive tool that Phoenix provides by default; JDBC is the standard application programming interface through which Java client programs access databases; SQuirreL is a Phoenix client tool that provides a visual operation window.

 

 Connect to Phoenix:

        The bin directory under the Phoenix installation directory contains a Python script, sqlline.py, which starts sqlline. When starting sqlline, you need to supply the ZooKeeper cluster addresses and port number in order to connect to Phoenix.

# Enter the Phoenix installation directory
$ cd /export/servers/apache-phoenix-4.14.1-HBase-1.2-bin
# Start sqlline
$ bin/sqlline.py spark01,spark02,spark03:2181

 
View Phoenix tables and views:

         Execute the "!table" command in sqlline to view Phoenix tables and views.

 

 

 Create table mapping (top10):

        Create table top10 in Phoenix through the CREATE statement to establish a mapping with table top10 in the HBase database.

> create table "top10"

> (

> "ROW" varchar primary key,

> "top10_category"."cartcount" varchar,

> "top10_category"."category_id" varchar ,

> "top10_category"."purchasecount" varchar ,

> "top10_category"."viewcount" varchar

> ) column_encoded_bytes=0;

Create table mapping (top3):

        Create table top3 in Phoenix through the CREATE statement to establish a mapping with table top3 in the HBase database.

> create table "top3"

> (

> "ROW" varchar primary key,

> "top3_area_product"."product_id" varchar,

> "top3_area_product"."viewcount" varchar,

> "top3_area_product"."area" varchar

> ) column_encoded_bytes=0;

Create table mapping (conversion):

        Use the CREATE statement to create the table conversion in Phoenix to establish a mapping with the table conversion in the HBase database. 

> create table "conversion"

> (

> "ROW" varchar primary key,

> "page_conversion"."convert_page" varchar,

> "page_conversion"."convert_rage" varchar

> ) column_encoded_bytes=0;

 Create table mapping (adstream):        

         Create the table adstream in Phoenix through the CREATE statement to establish a mapping with the table adstream in the HBase database.

> create table "adstream"

> (

> "ROW" varchar primary key,

> "area_ads_count"."city" varchar,

> "area_ads_count"."ad_count" varchar,

> "area_ads_count"."ad_id" varchar

> ) column_encoded_bytes=0; 

        Phoenix identifiers are case-sensitive; an unquoted identifier is folded to uppercase by default (for example, unquoted top10 becomes TOP10). Therefore, when executing the table-creation commands in Phoenix, the table name, column family names, and column names must be wrapped in double quotes so that they match the lowercase names in the HBase tables.

        If a table is deleted in Phoenix, the mapped table in HBase is deleted along with it, resulting in data loss. If the mapping created in Phoenix is used only for query operations, it is therefore recommended to establish the mapping by creating a view instead. A view mapping is created the same way as a table mapping; the view adstream is used as an example below.

> create view "adstream"

> ( > "ROW" varchar primary key,

> "area_ads_count"."city" varchar,

> "area_ads_count"."ad_count" varchar,

> "area_ads_count"."ad_id" varchar

> );

        If you want to delete a table in Phoenix without the mapped table in HBase being deleted along with it and losing data, create a snapshot of the mapped table in HBase before performing the delete operation in Phoenix.

disable 'mapping table'                      # disable the table
snapshot 'mapping table', 'snapshot name'    # create the snapshot

        After deleting the table in Phoenix, use the snapshot to restore the mapped table in HBase.

list_snapshots                                    # list all snapshots
clone_snapshot 'snapshot name', 'mapping table'   # clone the snapshot into a new table

3. Create a Spring Boot project

Create and configure the Spring Boot project in the IntelliJ IDEA development tool, laying the groundwork for the data visualization system.

Step 1: Create a project

        Open the IntelliJ IDEA development tool, use Spring Initializr to initialize the Spring Boot project, and build the Spring Boot project structure.

Select the JDK version to use

Step 2: Configure project information

Configure basic project information on the Project Metadata interface.

Group: the unique identifier of the project's organization

Artifact: the unique identifier of the project

JDK version

 Step 3: Configure project dependencies

Configure project dependencies on the "Dependencies" interface.

Choose the Spring Boot version to use

Add Spring Web dependency

Step 4: Configure project name and directory 

Configure the project name

Configure the project directory

Step 5: Initialize the project

Step 6: Directory structure after initialization 

The Spring Boot project will generate the project startup class by default

Static resource folder (static)

Template page folder (templates)

Project global configuration file (application.properties)

The Spring Boot project will generate project test classes by default

Step 7: Adjust the project directory structure

        To make the different types of functionality in the project easy to distinguish, adjust the default directory structure: under the package "cn.itcast.sparkweb", create the package entity for storing entity classes, the package dao for storing data access interfaces, and the package controller for storing controller classes.
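        For reference, the startup class generated by Spring Initializr sits above these packages so that component scanning finds them. A minimal sketch, assuming the generated class is named SparkwebApplication (the name is inferred from the artifact sparkweb and is an assumption here); the @MapperScan annotation from mybatis-spring-boot is optional when every DAO interface already carries @Mapper:

package cn.itcast.sparkweb;

import org.mybatis.spring.annotation.MapperScan;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;

@SpringBootApplication
@MapperScan("cn.itcast.sparkweb.dao") // optional if each DAO interface is annotated with @Mapper
public class SparkwebApplication {
    public static void main(String[] args) {
        SpringApplication.run(SparkwebApplication.class, args);
    }
}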

 Step 8: Configure project dependencies

        The dependencies required by this project include Thymeleaf, Tomcat, Phoenix, MyBatis, and Joda-Time. Thymeleaf is a template engine for Java Web application development; Tomcat is the Web container that runs the Java Web application; the Phoenix dependency lets the project operate Phoenix through its Java API; the MyBatis dependency lets the project use the MyBatis framework; Joda-Time is a Java date and time library.
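        As a rough guide, these dependencies correspond to Maven coordinates along the following lines. This is a hedged sketch, not the book's pom.xml: the versions shown are illustrative and must match the installed Phoenix/HBase, and the Web starter (which brings embedded Tomcat) was already added when the project was initialized with the Spring Web dependency.

<dependencies>
    <!-- Thymeleaf template engine -->
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-thymeleaf</artifactId>
    </dependency>
    <!-- MyBatis integration for Spring Boot -->
    <dependency>
        <groupId>org.mybatis.spring.boot</groupId>
        <artifactId>mybatis-spring-boot-starter</artifactId>
        <version>2.1.4</version>
    </dependency>
    <!-- Phoenix client, matching the installed Phoenix/HBase versions -->
    <dependency>
        <groupId>org.apache.phoenix</groupId>
        <artifactId>phoenix-core</artifactId>
        <version>4.14.1-HBase-1.2</version>
    </dependency>
    <!-- Joda-Time date/time library -->
    <dependency>
        <groupId>joda-time</groupId>
        <artifactId>joda-time</artifactId>
        <version>2.10</version>
    </dependency>
</dependencies>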

 Step 9: Configure the project global configuration file

Configure the global configuration file application.properties in the resources directory of the project, and add the following configuration content:

# Set the JDBC driver used to connect to Phoenix
spring.datasource.driver-class-name=org.apache.phoenix.jdbc.PhoenixDriver
# Set the Phoenix connection address and port number
spring.datasource.url=jdbc:phoenix:192.168.121.132,192.168.121.133,192.168.121.134:2181
# Set the Thymeleaf template prefix (path)
spring.thymeleaf.prefix=classpath:/templates/
# Set the Thymeleaf template suffix
spring.thymeleaf.suffix=.html

4. Implement the Top10 popular categories data visualization

4.1 Create entity class Top10Entity

        To make it easy to pass the Top10 popular categories analysis results around, create the entity class Top10Entity in the project's entity package to hold the data of table top10 in Phoenix.

public class Top10Entity {
    private String cartcount;
    private String category_id;
    private String purchasecount;
    private String viewcount;

    // getter/setter methods for the fields
    ...
}

4.2 Create database access interface Top10Dao

        Create a database access interface Top10Dao in the dao package of the project to read the data of table top10 in Phoenix.

import cn.itcast.sparkweb.entity.Top10Entity;
import org.apache.ibatis.annotations.Mapper;
import org.apache.ibatis.annotations.Select;
import java.util.List;

@Mapper
public interface Top10Dao {
    @Select("select \"cartcount\",\"category_id\",\"purchasecount\",\"viewcount\" from \"top10\"")
    List<Top10Entity> getTop10();
}

4.3 Create a controller class Top10Controller

        Create the controller class Top10Controller in the project's controller package; it calls the method getTop10() of the interface Top10Dao to read the data of table top10 and passes the data to the HTML page through the Model object.

@Controller
public class Top10Controller {
    @Autowired
    private Top10Dao top10Dao;

    @RequestMapping(value = "/top10", produces = "text/html;charset=utf-8")
    public String top10(Model model) {
        List<Top10Entity> top10 = top10Dao.getTop10();
        model.addAttribute("top10", top10);
        return "top10";
    }
}

        If adding the @Autowired annotation to the interface field in Top10Controller causes the error "Could not autowire. No beans of 'Top10Dao' type found.", the message comes from IntelliJ IDEA's built-in inspection tool and does not prevent the program from compiling or starting; it can be eliminated as shown in the figure (for example, by lowering the inspection's severity).
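        One common workaround, sketched below under the assumption that adding a Spring annotation to the mapper is acceptable (this is an illustration, not necessarily the fix the figure shows): annotating the DAO interface as a bean lets the IDE resolve the injection, while MyBatis itself only needs @Mapper.

import cn.itcast.sparkweb.entity.Top10Entity;
import org.apache.ibatis.annotations.Mapper;
import org.apache.ibatis.annotations.Select;
import org.springframework.stereotype.Repository;
import java.util.List;

@Mapper
@Repository // hint for the IDE's inspection; has no effect on MyBatis
public interface Top10Dao {
    @Select("select \"cartcount\",\"category_id\",\"purchasecount\",\"viewcount\" from \"top10\"")
    List<Top10Entity> getTop10();
}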

4.4 Create HTML file top10.html 

        Create the HTML file top10.html in the project's templates directory. In it, jQuery obtains the Top10 popular categories data that the Model object passes to the HTML page, and the data is filled into an ECharts bar chart template to visualize the Top10 popular categories.

<!DOCTYPE html>
<html lang="en" xmlns:th="http://www.thymeleaf.org">
<head>
    <meta charset="UTF-8">
    <title>top10</title>
    <script src="https://apps.bdimg.com/libs/jquery/2.1.4/jquery.min.js"></script>
    <script src="https://cdn.staticfile.org/echarts/4.3.0/echarts.min.js"></script>
</head>
<body> …… </body>
</html>
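        The page body is elided above ("……"). As an illustration only, a minimal body could look like the following sketch; it assumes the model attribute top10 from Section 4.3, uses Thymeleaf JavaScript inlining rather than whatever DOM technique the book's page uses, and its chart options are placeholders, not the original page:

<body>
<div id="chart" style="width: 900px; height: 500px;"></div>
<script th:inline="javascript">
    // Data passed by the controller via model.addAttribute("top10", top10)
    var top10 = /*[[${top10}]]*/ [];
    var chart = echarts.init(document.getElementById('chart'));
    chart.setOption({
        title: { text: 'Top10 popular categories' },
        tooltip: {},
        xAxis: { data: top10.map(function (r) { return r.category_id; }) },
        yAxis: {},
        series: [{ name: 'viewcount', type: 'bar',
                   data: top10.map(function (r) { return Number(r.viewcount); }) }]
    });
</script>
</body>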

4.5 Run the project to visualize the Top10 popular categories

        To avoid problems with JDBC operating Phoenix, before running the project you need to create an hbase-site.xml file in the project's resources directory and add to it the configuration that enables namespace mapping and supports secondary indexes.

<?xml version="1.0"?>

<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>

<property>

    <name>phoenix.schema.isNamespaceMappingEnabled</name>

    <value>true</value>

    <description>Enable Namespace</description>

</property>

<property>

    <name>hbase.regionserver.wal.codec</name>

<value>org.apache.hadoop.hbase.regionserver.wal.IndexedWALEditCodec</value>

    <description>Support secondary index</description>

</property>

</configuration>

Click the [Start] button in IntelliJ IDEA to run the project.

You can view the startup information of the project in the IntelliJ IDEA console.

 Enter "http://localhost:8080/top10" in the browser to view the display effect of Top10 data visualization of popular categories.

        Note that running the project on Windows also requires a local Hadoop installation, because the Phoenix client depends on Hadoop libraries. First, install Hadoop on the Windows operating system by extracting it; then add Hadoop to the system environment variables, as shown in the figure.

        Edit the hosts mapping file in the C:\Windows\System32\drivers\etc directory of the Windows operating system and add the following content.

192.168.121.132 spark01
192.168.121.133 spark02
192.168.121.134 spark03

5. Implement the data visualization of the Top3 popular products in each region

5.1 Create entity class Top3Entity

        To make it easy to pass the Top3 popular products in each region analysis results around, create the entity class Top3Entity in the project's entity package to hold the data of table top3 in Phoenix.

public class Top3Entity {
    private String product_id;
    private String viewcount;
    private String area;

    // getter/setter methods for the fields
    ...
}

5.2 Create database access interface Top3Dao

        Create a database access interface Top3Dao in the dao package of the project to read the data of table top3 in Phoenix.

import cn.itcast.sparkweb.entity.Top3Entity;
import org.apache.ibatis.annotations.Mapper;
import org.apache.ibatis.annotations.Select;
import java.util.List;

@Mapper
public interface Top3Dao {
    @Select("select \"product_id\",\"viewcount\",\"area\" from \"top3\"")
    List<Top3Entity> getTop3();
}

5.3 Create a controller class Top3Controller

        Create the controller class Top3Controller in the project's controller package; it calls the method getTop3() of the interface Top3Dao to read the data of table top3 and passes the data to the HTML page through the Model object.

@Controller
public class Top3Controller {
    @Autowired
    private Top3Dao top3Dao;

    @RequestMapping(value = "/top3", produces = "text/html;charset=utf-8")
    public String top3(Model model) {
        List<Top3Entity> top3 = top3Dao.getTop3();
        model.addAttribute("top3", top3);
        return "top3";
    }
}

5.4 Create HTML file top3.html

        Create the HTML file top3.html under the templates directory of the project. In it, jQuery obtains the Top3 popular products in each region data that the Model object passes to the HTML page, and the data is filled into an ECharts bar chart template to visualize the Top3 popular products in each region.

<!DOCTYPE html>
<html lang="en" xmlns:th="http://www.thymeleaf.org">
<head>
    <meta charset="UTF-8">
    <title>top3</title>
    <script src="https://apps.bdimg.com/libs/jquery/2.1.4/jquery.min.js"></script>
    <script src="https://cdn.staticfile.org/echarts/4.3.0/echarts.min.js"></script>
</head>
<body>
……
</body>
</html>

5.5 Run the project to visualize the Top3 popular products in each region

        Click the [Start] button in IntelliJ IDEA to run the project. After the project starts successfully, enter "http://localhost:8080/top3" in the browser to view the Top3 popular products in each region data visualization.

6. Implement the page single-hop conversion rate data visualization

6.1 Create entity class ConversionEntity

        To make it easy to pass the page single-hop conversion rate data around, create the entity class ConversionEntity in the project's entity package to hold the data of table conversion in Phoenix.

public class ConversionEntity {
    private String convert_page;
    private String convert_rage;

    // getter/setter methods for the fields
    ...
}

6.2 Create database access interface ConversionDao

        Create a database access interface ConversionDao in the dao package of the project to read the data of the conversion table in Phoenix.

import cn.itcast.sparkweb.entity.ConversionEntity;
import org.apache.ibatis.annotations.Mapper;
import org.apache.ibatis.annotations.Select;
import java.util.List;

@Mapper
public interface ConversionDao {
    @Select("select \"convert_page\",\"convert_rage\" from \"conversion\"")
    List<ConversionEntity> getConversion();
}

6.3 Create a controller class ConversionController

        Create the controller class ConversionController in the project's controller package; it calls the method getConversion() of the interface ConversionDao to read the data of table conversion and passes the data to the HTML page through the Model object.

@Controller
public class ConversionController {
    @Autowired
    private ConversionDao conversionDao;

    @RequestMapping(value = "/conversion", produces = "text/html;charset=utf-8")
    public String conversion(Model model) {
        List<ConversionEntity> conversion = conversionDao.getConversion();
        model.addAttribute("conversion", conversion);
        return "conversion";
    }
}

6.4 Create HTML file conversion.html

        Create the HTML file conversion.html under the templates directory of the project. In it, jQuery obtains the page single-hop conversion rate data that the Model object passes to the HTML page, and the data is filled into an ECharts bar chart template to visualize the page single-hop conversion rate.

<!DOCTYPE html>
<html lang="en" xmlns:th="http://www.thymeleaf.org">
<head>
    <meta charset="UTF-8">
    <title>conversion</title>
    <script src="https://apps.bdimg.com/libs/jquery/2.1.4/jquery.min.js"></script>
    <script src="https://cdn.staticfile.org/echarts/4.3.0/echarts.min.js"></script>
</head>
<body>
……
</body>
</html>

6.5 Run the project to visualize the page single-hop conversion rate

        Click the [Start] button in IntelliJ IDEA to run the project. After the project starts successfully, enter "http://localhost:8080/conversion" in the browser to view the page single-hop conversion rate data visualization.

7. Implement the real-time ad click stream statistics visualization

7.1 Create entity class AdsEntity

        To make it easy to pass the real-time ad click stream statistics around, create the entity class AdsEntity in the project's entity package to hold the data of table adstream in Phoenix.

public class AdsEntity {
    private String city;
    private String ad_count;
    private String ad_id;

    // getter/setter methods for the fields
    ...
}

7.2 Create database access interface AdsDao

        Create a database access interface AdsDao in the dao package of the project to read the data of the table adstream in Phoenix.

import cn.itcast.sparkweb.entity.AdsEntity;
import org.apache.ibatis.annotations.Mapper;
import org.apache.ibatis.annotations.Select;
import java.util.List;

@Mapper
public interface AdsDao {
    @Select("select \"city\",\"ad_count\",\"ad_id\" from \"adstream\"")
    List<AdsEntity> ads();
}

7.3 Create the controller class AdsController

        Create the controller class AdsController in the project's controller package; its method adsData() calls the method ads() of the interface AdsDao to read the data of table adstream and returns that data to the page as the method's return value.

@Controller
public class AdsController {
    @Autowired
    private AdsDao adsDao;

    @RequestMapping(value = "/adsdata", method = RequestMethod.POST)
    @ResponseBody
    public List<AdsEntity> adsData() {
        List<AdsEntity> ads = adsDao.ads();
        return ads;
    }
}

7.4 Create HTML file ads.html

        Create the HTML file ads.html under the templates directory of the project. In it, jQuery's Ajax requests the real-time ad click stream statistics returned by the adsData() method of the controller class AdsController, and the returned data is filled into an ECharts bar chart template in real time to visualize the real-time ad click stream statistics.

<!DOCTYPE html>
<html lang="en" xmlns:th="http://www.thymeleaf.org">
<head>
    <meta charset="UTF-8">
    <title>ads</title>
    <script src="https://apps.bdimg.com/libs/jquery/2.1.4/jquery.min.js"></script>
    <script src="https://cdn.staticfile.org/echarts/4.3.0/echarts.min.js"></script>
</head>
<body>
……
</body>
</html>
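        The page body is elided above ("……"). As an illustration of the Ajax polling described, a minimal body could look like the following sketch; the POST endpoint /adsdata comes from Section 7.3, while the five-second refresh interval and the chart options are assumptions, not the original page:

<body>
<div id="chart" style="width: 900px; height: 500px;"></div>
<script>
    var chart = echarts.init(document.getElementById('chart'));
    function refresh() {
        // Request the latest statistics from the /adsdata endpoint in AdsController
        $.ajax({
            url: '/adsdata',
            type: 'POST',
            dataType: 'json',
            success: function (ads) {
                chart.setOption({
                    title: { text: 'Real-time ad click stream statistics' },
                    xAxis: { type: 'category',
                             data: ads.map(function (r) { return r.city + '-' + r.ad_id; }) },
                    yAxis: { type: 'value' },
                    series: [{ type: 'bar',
                               data: ads.map(function (r) { return Number(r.ad_count); }) }]
                });
            }
        });
    }
    refresh();
    setInterval(refresh, 5000); // poll every 5 seconds (illustrative)
</script>
</body>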

7.5 Run the project to visualize the real-time ad click stream statistics

        Click the [Start] button on the main interface of the sparkweb project to run it. After the project starts successfully, enter "http://localhost:8080/ads" in the browser to view the real-time ad click stream statistics visualization.

Summary

        This chapter explained how to build a visual display of the data. First, it introduced the visualization technologies and the system architecture, giving readers a preliminary understanding of data visualization. Next, the data in HBase was mapped into Phoenix by integrating Phoenix with HBase, so that the analysis results could be obtained by connecting to Phoenix through JDBC. Then, it explained how to create and configure a Spring Boot project. Finally, the related classes, interfaces, and HTML pages were written in the Spring Boot project to visualize the Top10 popular categories, the Top3 popular products in each region, the page single-hop conversion rates, and the real-time ad click stream statistics. Through studying this chapter, readers should master the use of Phoenix and how to implement data visualization with a Spring Boot project.
