Learning Objectives
Understand the architecture of a data visualization system
Master integrating Phoenix with HBase
Become familiar with establishing table mappings between Phoenix and HBase
Understand how to create a Spring Boot project
Master creating entity classes in a Java Web project
Master creating database access interfaces in a Java Web project
Master creating controller classes in a Java Web project
Become familiar with creating HTML pages in a Java Web project
Become familiar with running a Spring Boot project
Overview
Data visualization is the technique of representing data or information as visual objects in graphics in order to convey it. The goal is to communicate information clearly and effectively so that users can easily grasp the complex relationships in the data. Through the visual objects in a chart, users can see analysis results at a glance, making it easier to understand business trends or discover new business patterns. Data visualization is an important step in data analysis. This chapter explains in detail how to build a data visualization system to display the analysis results of this project.
1. System overview
1.1 Technology selection
Spring Boot is designed to simplify the initial setup and development of Spring applications, eliminating tedious manual configuration so that a Java Web project can be built quickly.
MyBatis is an open-source data persistence layer framework. It encapsulates JDBC database access internally and supports ordinary SQL queries, stored procedures, and advanced mappings.
ECharts is an open-source visualization library implemented in JavaScript. It runs smoothly on PCs and mobile devices and provides a wide variety of rich visualization types.
Learn more: Phoenix query engine and MyBatis
MyBatis is a data persistence layer framework that supports SQL queries, but HBase, the database used in this project, supports neither JDBC access nor SQL queries, so the data visualization system we are building cannot use MyBatis to access HBase directly. The Apache Phoenix query engine solves this: it enables HBase to be accessed through JDBC and translates SQL queries into the corresponding HBase operations.
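To illustrate the JDBC access path that Phoenix opens up, the sketch below (a hypothetical helper, not part of the project code) assembles a Phoenix JDBC URL from the ZooKeeper quorum; with the Phoenix client jar on the classpath, a client passes this URL to DriverManager.getConnection().

```java
// A minimal sketch (hypothetical helper, not part of the project code):
// assemble the Phoenix JDBC URL from the ZooKeeper quorum.
public class PhoenixUrl {

    // Phoenix JDBC URLs have the form jdbc:phoenix:<zk1,zk2,...>:<port>
    static String build(String[] zkHosts, int zkPort) {
        return "jdbc:phoenix:" + String.join(",", zkHosts) + ":" + zkPort;
    }

    public static void main(String[] args) {
        String url = build(new String[]{"spark01", "spark02", "spark03"}, 2181);
        System.out.println(url); // jdbc:phoenix:spark01,spark02,spark03:2181

        // With the Phoenix client jar on the classpath, a query would run as:
        // try (java.sql.Connection conn = java.sql.DriverManager.getConnection(url);
        //      java.sql.Statement stmt = conn.createStatement();
        //      java.sql.ResultSet rs = stmt.executeQuery("select * from \"top10\"")) {
        //     while (rs.next()) { /* read columns */ }
        // }
    }
}
```

This is the same ZooKeeper address format used later when starting sqlline and when configuring spring.datasource.url.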
1.2 System Architecture
Offline data visualization display
Real-time data visualization display
2. Data table design and implementation
2.1 Data Table Introduction
Top10 popular categories analysis results table user_session_top10

| Field Name | Data Type | Description |
| --- | --- | --- |
| ROW | varchar | Primary key (corresponds to the RowKey of the HBase table) |
| cartcount | varchar | Total number of times items in the category were added to the shopping cart |
| category_id | varchar | Category ID |
| purchasecount | varchar | Total number of times items in the category were purchased |
| viewcount | varchar | Total number of times items in the category were viewed |
Top3 popular products in each region analysis results table user_session_top3

| Field Name | Data Type | Description |
| --- | --- | --- |
| ROW | varchar | Primary key (corresponds to the RowKey of the HBase table) |
| product_id | varchar | Product ID |
| viewcount | varchar | Total number of times the product was viewed |
| area | varchar | Region name |
Page single-hop conversion rate statistics table conversion

| Field Name | Data Type | Description |
| --- | --- | --- |
| ROW | varchar | Primary key (corresponds to the RowKey of the HBase table) |
| convert_page | varchar | Converted page (page slice) |
| convert_rage | varchar | Conversion rate |
User advertisement click stream real-time statistics table adstream

| Field Name | Data Type | Description |
| --- | --- | --- |
| ROW | varchar | Primary key (corresponds to the RowKey of the HBase table) |
| city | varchar | City name |
| ad_count | varchar | Number of ad clicks |
| ad_id | varchar | Advertisement ID |
2.2 Phoenix integrates HBase
Install Phoenix and integrate HBase in the virtual machine Spark01.
STEP 01
Download the Phoenix installation package:
Visit the Phoenix official website to download the Phoenix installation package apache-phoenix-4.14.1-HBase-1.2-bin.tar.gz for the Linux operating system.
STEP 02
Upload the Phoenix installation package:
Use the SecureCRT remote connection tool to connect to the virtual machine Spark01, and execute the "rz" command in the directory /export/software/ where the application installation package is stored to upload the Phoenix installation package.
STEP 03
Install Phoenix:
Install Phoenix by decompressing the installation package into /export/servers/, the directory where applications are stored.
tar -zxvf /export/software/apache-phoenix-4.14.1-HBase-1.2-bin.tar.gz -C /export/servers/
STEP 04
Phoenix integrates HBase (copy jar package):
Enter the Phoenix installation directory, and copy phoenix-core-4.14.1-HBase-1.2.jar and phoenix-4.14.1-HBase-1.2-client.jar to the lib directory of the HBase installation directory.
$ cd /export/servers/apache-phoenix-4.14.1-HBase-1.2-bin/
$ cp {phoenix-core-4.14.1-HBase-1.2.jar,phoenix-4.14.1-HBase-1.2-client.jar} /export/servers/hbase-1.2.1/lib/
STEP 05
Phoenix integrates HBase (shut down the HBase cluster):
Run the stop-hbase.sh command to shut down the HBase cluster.
STEP 06
Phoenix integrates HBase (modify the HBase configuration file):
Enter the conf directory under the HBase installation directory, execute the "vi hbase-site.xml" command to edit the hbase-site.xml file, and add the namespace mapping configuration.
<property>
<name>phoenix.schema.isNamespaceMappingEnabled</name>
<value>true</value>
</property>
<property>
<name>phoenix.schema.mapSystemTablesToNamespace</name>
<value>true</value>
</property>
STEP 07
Phoenix integrates HBase (distribute files):
Distribute the HBase installation directory to the other two virtual machines Spark02 and Spark03 in the cluster.
scp -r /export/servers/hbase-1.2.1/ root@spark02:/export/servers/
scp -r /export/servers/hbase-1.2.1/ root@spark03:/export/servers/
STEP 08
Phoenix integrates HBase (copy the HBase configuration file):
Enter the conf directory under the HBase installation directory, and copy the hbase-site.xml file to the bin directory under the Phoenix installation directory.
cp hbase-site.xml /export/servers/apache-phoenix-4.14.1-HBase-1.2-bin/bin/
STEP 09
Phoenix integrates HBase (start HBase cluster):
Execute the "start-hbase.sh" command to start the HBase cluster.
Before starting the HBase cluster, make sure the Hadoop and ZooKeeper clusters are running normally and that the clocks of all servers in the cluster are synchronized. If they are not, execute the "systemctl restart chronyd" command on each server to restart the chronyd service and resynchronize the time.
2.3 Establish Phoenix and HBase table mapping
Operate Phoenix and establish the mapping between Phoenix and HBase table.
Phoenix provides three ways to operate it: the command line interface, JDBC, and SQuirreL. The command line interface is sqlline, the interactive tool Phoenix provides by default; JDBC is the standard Java application programming interface through which client programs access databases; SQuirreL is a Phoenix client tool that provides a graphical operation window.
Connect to Phoenix:
The bin directory of the Phoenix installation directory contains the Python script file sqlline.py, which is used to start sqlline. When starting sqlline, you need to pass the ZooKeeper cluster address and port number in order to connect to Phoenix.
# Enter the Phoenix installation directory
$ cd /export/servers/apache-phoenix-4.14.1-HBase-1.2-bin
# Start sqlline
$ bin/sqlline.py spark01,spark02,spark03:2181
View Phoenix tables and views:
Execute the "!table" command in sqlline to view Phoenix tables and views.
Create table mapping (top10):
Create table top10 in Phoenix through the CREATE statement to establish a mapping with table top10 in the HBase database.
> create table "top10"
> (
> "ROW" varchar primary key,
> "top10_category"."cartcount" varchar,
> "top10_category"."category_id" varchar ,
> "top10_category"."purchasecount" varchar ,
> "top10_category"."viewcount" varchar
> ) column_encoded_bytes=0;
Create table mapping (top3):
Create table top3 in Phoenix through the CREATE statement to establish a mapping with table top3 in the HBase database.
> create table "top3"
> (
> "ROW" varchar primary key,
> "top3_area_product"."product_id" varchar,
> "top3_area_product"."viewcount" varchar,
> "top3_area_product"."area" varchar
> ) column_encoded_bytes=0;
Create table mapping (conversion):
Use the CREATE statement to create the table conversion in Phoenix to establish a mapping with the table conversion in the HBase database.
> create table "conversion"
> (
> "ROW" varchar primary key,
> "page_conversion"."convert_page" varchar,
> "page_conversion"."convert_rage" varchar
> ) column_encoded_bytes=0;
Create table mapping (adstream):
Create the table adstream in Phoenix through the CREATE statement to establish a mapping with the table adstream in the HBase database.
> create table "adstream"
> (
> "ROW" varchar primary key,
> "area_ads_count"."city" varchar,
> "area_ads_count"."ad_count" varchar,
> "area_ads_count"."ad_id" varchar
> ) column_encoded_bytes=0;
Phoenix is case-sensitive only for quoted identifiers; unquoted identifiers are converted to uppercase by default. Therefore, when executing a CREATE statement in Phoenix, wrap the table name, column family names, and column names in double quotes to preserve their lowercase spelling.
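To see why the quotes matter, the sketch below (a hypothetical helper written purely for illustration) mimics the normalization rule just described: an unquoted identifier is folded to uppercase, while a double-quoted identifier keeps its case.

```java
// Illustration only: a hypothetical helper that mimics Phoenix's
// identifier normalization rule described above.
public class PhoenixIdentifiers {

    // Unquoted identifiers are folded to uppercase;
    // double-quoted identifiers keep their case (quotes stripped).
    static String normalize(String identifier) {
        if (identifier.length() >= 2
                && identifier.startsWith("\"") && identifier.endsWith("\"")) {
            return identifier.substring(1, identifier.length() - 1);
        }
        return identifier.toUpperCase();
    }

    public static void main(String[] args) {
        System.out.println(normalize("top10"));     // TOP10 -- would not match the lowercase HBase table
        System.out.println(normalize("\"top10\"")); // top10 -- matches the lowercase HBase table
    }
}
```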
If a table is deleted in Phoenix, the mapped table in HBase is deleted along with it, causing data loss. If the mapping created in Phoenix is only used for queries, it is recommended to establish the mapping by creating a view instead of a table; the syntax is the same as for a table mapping. The view mapping for adstream is shown here as an example.
> create view "adstream"
> (
> "ROW" varchar primary key,
> "area_ads_count"."city" varchar,
> "area_ads_count"."ad_count" varchar,
> "area_ads_count"."ad_id" varchar
> );
If you want to delete a table in Phoenix without also losing the mapped table in HBase, create a snapshot of the mapped table in HBase before performing the delete operation in Phoenix.
disable 'mapping table'                     # disable the table
snapshot 'mapping table', 'snapshot name'   # create the snapshot
After deleting the table in Phoenix, use the created snapshot to restore the mapping table in HBase.
list_snapshots #Query all snapshots
clone_snapshot 'snapshot name', 'mapping table' #clone the snapshot to a new table
3. Create a Spring Boot project
Create and configure the Spring Boot project with the IntelliJ IDEA development tool, laying the foundation for the data visualization system.
Step 1: Create a project
Open the IntelliJ IDEA development tool, use Spring Initializr to initialize the Spring Boot project, and build the Spring Boot project structure.
Select the JDK version to use
Step 2: Configure project information
Configure basic project information on the Project Metadata interface.
Unique identifier of the project organization
Unique identifier of the project
JDK version
Step 3: Configure project dependencies
Configure project dependencies on the "Dependencies" interface.
Select the Spring Boot version to use
Add Spring Web dependency
Step 4: Configure project name and directory
Configure the project name
Configure the project directory
Step 5: Initialize the project
Step 6: Directory structure after initialization
The Spring Boot project will generate the project startup class by default
Static resource folder (static)
Template page folder (templates)
Project global configuration file (application.properties)
The Spring Boot project will generate project test classes by default
Step 7: Adjust the project directory structure
To make it easier to distinguish the different kinds of functionality in the project, adjust the default directory structure: under the package "cn.itcast.sparkweb", create the package entity for entity classes, the package dao for data access interfaces, and the package controller for controller classes.
Step 8: Configure project dependencies
The dependencies required for this project include Thymeleaf, Tomcat, Phoenix, MyBatis, and Joda-Time. Thymeleaf is a template engine for Java Web application development; Tomcat is the Web container that runs the Java Web application; the Phoenix dependency lets the project operate Phoenix through the Java API; the MyBatis dependency lets the project use the MyBatis framework; Joda-Time is a Java date and time library.
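As a sketch, the dependency list might look like the following pom.xml fragment. The artifact IDs and version numbers are assumptions for illustration and must match your environment; in particular, the Phoenix version must match the HBase version (here 4.14.1-HBase-1.2).

```xml
<!-- Illustrative only: artifact IDs and versions are assumptions -->
<dependencies>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-thymeleaf</artifactId>
    </dependency>
    <dependency>
        <groupId>org.apache.phoenix</groupId>
        <artifactId>phoenix-core</artifactId>
        <version>4.14.1-HBase-1.2</version>
    </dependency>
    <dependency>
        <groupId>org.mybatis.spring.boot</groupId>
        <artifactId>mybatis-spring-boot-starter</artifactId>
        <version>1.3.2</version>
    </dependency>
    <dependency>
        <groupId>joda-time</groupId>
        <artifactId>joda-time</artifactId>
        <version>2.10</version>
    </dependency>
</dependencies>
```

Tomcat is not listed separately because spring-boot-starter-web (added when the project was initialized with the Spring Web dependency) embeds it by default.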
Step 9: Configure the project global configuration file
Configure the global configuration file application.properties in the resources directory of the project, and add the following configuration content:
# Set the JDBC driver used to connect to Phoenix
spring.datasource.driver-class-name=org.apache.phoenix.jdbc.PhoenixDriver
# Set the Phoenix connection address and port number
spring.datasource.url=jdbc:phoenix:192.168.121.132,192.168.121.133,192.168.121.134:2181
# Set the Thymeleaf template path
spring.thymeleaf.prefix=classpath:/templates/
# Set the Thymeleaf template suffix
spring.thymeleaf.suffix=.html
4. Implement data visualization for the Top10 popular categories
4.1 Create entity class Top10Entity
To facilitate transferring the Top10 popular categories analysis result data, create the entity class Top10Entity in the entity package of the project to store the data of table top10 in Phoenix.
public class Top10Entity {
private String cartcount;
private String category_id;
private String purchasecount;
private String viewcount;
//Implement the getter/setter method of the property
...
}
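The elided getter/setter methods follow the standard JavaBean pattern that MyBatis relies on when populating result objects. As a sketch for one of the fields (the others are analogous):

```java
// Sketch of the JavaBean accessor pattern for one field of Top10Entity;
// the remaining fields (category_id, purchasecount, viewcount) are analogous.
public class Top10Entity {
    private String cartcount;
    // ... other fields omitted in this sketch

    public String getCartcount() {
        return cartcount;
    }

    public void setCartcount(String cartcount) {
        this.cartcount = cartcount;
    }

    public static void main(String[] args) {
        Top10Entity e = new Top10Entity();
        e.setCartcount("42");
        System.out.println(e.getCartcount()); // 42
    }
}
```

In IntelliJ IDEA these accessors can be generated with Code -> Generate -> Getter and Setter.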
4.2 Create database access interface Top10Dao
Create a database access interface Top10Dao in the dao package of the project to read the data of table top10 in Phoenix.
import cn.itcast.sparkweb.entity.Top10Entity;
import org.apache.ibatis.annotations.Mapper;
import org.apache.ibatis.annotations.Select;
import java.util.List;
@Mapper
public interface Top10Dao {
@Select("select \"cartcount\",\"category_id\",\"purchasecount\",\"viewcount\" from \"top10\"")
List<Top10Entity> getTop10();
}
4.3 Create a controller class Top10Controller
Create the controller class Top10Controller in the controller package of the project. It calls the getTop10() method of the Top10Dao interface to read the data of table top10 and passes the data to the HTML page through the Model object.
@Controller
public class Top10Controller {
@Autowired
private Top10Dao top10Dao;
@RequestMapping(value = "/top10",produces = "text/html;charset=utf-8")
public String top10(Model model) {
List<Top10Entity> top10 = top10Dao.getTop10();
model.addAttribute("top10",top10);
return "top10";
}
}
If adding the @Autowired annotation to the interface field in Top10Controller makes the IDE report the error "Could not autowire. No beans of 'Top10Dao' type found.", this is caused by IntelliJ IDEA's built-in inspection tool and does not affect starting or compiling the program. You can suppress the warning by referring to the figure.
4.4 Create HTML file top10.html
Create the HTML file top10.html in the templates directory of the project. In the file, use jQuery to obtain the Top10 popular categories data that the Model object passes to the HTML page, and fill the data into an ECharts bar chart template to visualize the Top10 popular categories.
<!DOCTYPE html>
<html lang="en" xmlns:th="http://www.thymeleaf.org">
<head>
<meta charset="UTF-8">
<title>top10</title>
<script src="https://apps.bdimg.com/libs/jquery/2.1.4/jquery.min.js"></script>
<script src="https://cdn.staticfile.org/echarts/4.3.0/echarts.min.js"></script>
</head>
<body> …… </body>
</html>
4.5 Run the project to view the Top10 popular categories data visualization
To avoid problems with JDBC operating Phoenix, before running the project create an hbase-site.xml file in the resources directory of the project and add the configuration that enables namespace mapping and secondary index support.
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>phoenix.schema.isNamespaceMappingEnabled</name>
<value>true</value>
<description>Enable Namespace</description>
</property>
<property>
<name>hbase.regionserver.wal.codec</name>
<value>org.apache.hadoop.hbase.regionserver.wal.IndexedWALEditCodec</value>
<description>Support secondary index</description>
</property>
</configuration>
Click the [Start] button in IntelliJ IDEA to run the project.
You can view the startup information of the project in the IntelliJ IDEA console.
Enter "http://localhost:8080/top10" in the browser to view the display effect of Top10 data visualization of popular categories.
First, install Hadoop in the Windows operating system by decompressing the installation package; then add Hadoop to the system environment variables, as shown in the figure.
Edit the hosts mapping file in the C:\Windows\System32\drivers\etc directory of the Windows operating system, and add the following content.
192.168.121.132 spark01
192.168.121.133 spark02
192.168.121.134 spark03
5. Implement data visualization for the Top3 popular products in each region
5.1 Create entity class Top3Entity
To facilitate transferring the Top3 popular products per region analysis result data, create the entity class Top3Entity in the entity package of the project to store the data of table top3 in Phoenix.
public class Top3Entity {
private String product_id;
private String viewcount;
private String area;
// Implement getter and setter methods for properties
...
}
5.2 Create database access interface Top3Dao
Create a database access interface Top3Dao in the dao package of the project to read the data of table top3 in Phoenix.
import cn.itcast.sparkweb.entity.Top3Entity;
import org.apache.ibatis.annotations.Mapper;
import org.apache.ibatis.annotations.Select;
import java.util.List;
@Mapper
public interface Top3Dao {
@Select("select \"product_id\",\"viewcount\",\"area\" from \"top3\"")
List<Top3Entity> getTop3();
}
5.3 Create a controller class Top3Controller
Create the controller class Top3Controller in the controller package of the project. It calls the getTop3() method of the Top3Dao interface to read the data of table top3 and passes the data to the HTML page through the Model object.
@Controller
public class Top3Controller {
@Autowired
private Top3Dao top3Dao;
@RequestMapping(value = "/top3",produces = "text/html;charset=utf-8")
public String top3(Model model) {
List<Top3Entity> top3 = top3Dao.getTop3();
model.addAttribute("top3",top3);
return "top3";
}
}
5.4 Create HTML file top3.html
Create the HTML file top3.html in the templates directory of the project. In the file, use jQuery to obtain the Top3 popular products per region data that the Model object passes to the HTML page, and fill the data into an ECharts bar chart template to visualize the Top3 popular products in each region.
<!DOCTYPE html>
<html lang="en" xmlns:th="http://www.thymeleaf.org">
<head>
<meta charset="UTF-8">
<title>top3</title>
<script src="https://apps.bdimg.com/libs/jquery/2.1.4/jquery.min.js"></script>
<script src="https://cdn.staticfile.org/echarts/4.3.0/echarts.min.js"></script>
</head>
<body>
……
</body>
</html>
5.5 Run the project to view the Top3 popular products per region data visualization
Click the [Start] button in IntelliJ IDEA to run the project. After the project is successfully started, enter "http://localhost:8080/top3" in the browser to view the display effect of the Top3 data visualization of popular products in each region.
6. Implement data visualization for the page single-hop conversion rate
6.1 Create entity class ConversionEntity
In order to facilitate the transfer of page single-hop conversion rate data, an entity class ConversionEntity is created in the entity package of the project to store the data of the conversion table in Phoenix.
public class ConversionEntity {
private String convert_page;
private String convert_rage;
// Implement getter and setter methods for properties
...
}
6.2 Create database access interface ConversionDao
Create a database access interface ConversionDao in the dao package of the project to read the data of the conversion table in Phoenix.
import cn.itcast.sparkweb.entity.ConversionEntity;
import org.apache.ibatis.annotations.Mapper;
import org.apache.ibatis.annotations.Select;
import java.util.List;
@Mapper
public interface ConversionDao {
@Select("select \"convert_page\",\"convert_rage\" from \"conversion\"")
List<ConversionEntity> getConversion();
}
6.3 Create a controller class ConversionController
Create the controller class ConversionController in the controller package of the project. It calls the getConversion() method of the ConversionDao interface to read the data of table conversion and passes the data to the HTML page through the Model object.
@Controller
public class ConversionController {
@Autowired
private ConversionDao conversionDao;
@RequestMapping(value = "/conversion",produces = "text/html;charset=utf-8")
public String conversion(Model model){
List<ConversionEntity> conversion = conversionDao.getConversion();
model.addAttribute("conversion",conversion);
return "conversion";
}
}
6.4 Create HTML file conversion.html
Create the HTML file conversion.html in the templates directory of the project. In the file, use jQuery to obtain the page single-hop conversion rate data that the Model object passes to the HTML page, and fill the data into an ECharts bar chart template to visualize the page single-hop conversion rate.
<!DOCTYPE html>
<html lang="en" xmlns:th="http://www.thymeleaf.org">
<head>
<meta charset="UTF-8">
<title>conversion</title>
<script src="https://apps.bdimg.com/libs/jquery/2.1.4/jquery.min.js"></script>
<script src="https://cdn.staticfile.org/echarts/4.3.0/echarts.min.js"></script>
</head>
<body>
……
</body>
</html>
6.5 Run the project to view the page single-hop conversion rate data visualization
Click the [Start] button in IntelliJ IDEA to run the project. After the project is successfully started, enter "http://localhost:8080/conversion" in the browser to view the visual display effect of the single-hop conversion rate data on the page.
7. Implement real-time visualization of advertisement click stream statistics
7.1 Create entity class AdsEntity
In order to facilitate the transmission of the real-time statistical result data of the advertisement click stream, the entity class AdsEntity is created in the entity package of the project to store the data of the table adstream in Phoenix.
public class AdsEntity {
private String city;
private String ad_count;
private String ad_id;
// Implement getter and setter methods for properties
...
}
7.2 Create database access interface AdsDao
Create a database access interface AdsDao in the dao package of the project to read the data of the table adstream in Phoenix.
import cn.itcast.sparkweb.entity.AdsEntity;
import org.apache.ibatis.annotations.Mapper;
import org.apache.ibatis.annotations.Select;
import java.util.List;
@Mapper
public interface AdsDao {
@Select("select \"city\",\"ad_count\",\"ad_id\" from \"adstream\"")
List<AdsEntity> ads();
}
7.3 Create the controller class AdsController
Create the controller class AdsController in the controller package of the project. Its adsData() method calls the ads() method of the AdsDao interface to read the data of table adstream and returns the data to the HTML page as the method's return value.
@Controller
public class AdsController {
@Autowired
private AdsDao adsDao;
@RequestMapping(value = "/adsdata",method = RequestMethod.POST)
@ResponseBody
public List<AdsEntity> adsData(){
List<AdsEntity> ads = adsDao.ads();
return ads;
}
}
7.4 Create HTML file ads.html
Create the HTML file ads.html in the templates directory of the project. In the file, use jQuery's Ajax to request the real-time advertisement click stream statistics returned by the adsData() method of the controller class AdsController, and fill the data into an ECharts bar chart template in real time to visualize the advertisement click stream statistics.
<!DOCTYPE html>
<html lang="en" xmlns:th="http://www.thymeleaf.org">
<head>
<meta charset="UTF-8">
<title>ads</title>
<script src="https://apps.bdimg.com/libs/jquery/2.1.4/jquery.min.js"></script>
<script src="https://cdn.staticfile.org/echarts/4.3.0/echarts.min.js"></script>
</head>
<body>
……
</body>
</html>
7.5 Run the project to view the real-time advertisement click stream statistics visualization
Click the [Start] button in IntelliJ IDEA to run the project sparkweb. After the project starts successfully, enter "http://localhost:8080/ads" in the browser to view the real-time advertisement click stream statistics visualization.
Summary
This chapter mainly explains how to implement the visual display of data. First, it describes visualization technology and the system architecture to give readers a preliminary understanding of data visualization. Next, it integrates Phoenix with HBase to map the data in HBase into Phoenix, and obtains the analysis results by connecting to Phoenix through JDBC. It then explains how to create and configure a Spring Boot project. Finally, it writes the related classes, interfaces, and HTML pages in the Spring Boot project to visualize the Top3 popular products in each region, the Top10 popular categories, the page single-hop conversion rate, and the real-time advertisement click stream statistics. Through this chapter, readers should master the use of Phoenix and how to implement data visualization with a Spring Boot project.