Data Warehouse Visualization Tool: Superset

Chapter 1 Getting Started with Superset

1.1 Superset overview

Apache Superset is a modern data exploration and visualization platform that is both powerful and easy to use. It connects to a wide range of data sources, including many modern big data analysis engines, offers a rich set of chart types, and supports custom dashboards.

1.2 Environment description

The server operating system used in this course is CentOS 7, and the data source connected to Superset is the MySQL database.

Chapter 2 Superset Installation

Superset official website address: http://superset.apache.org/

2.1 Install the Python environment

Superset is a web application written in Python and requires Python 3.7 or later.

2.1.1 Install Miniconda

Conda is an open source package and environment manager. It can install different versions of Python, along with software packages and their dependencies, on the same machine, and it can switch between Python environments. Anaconda bundles Conda, Python, and a large set of preinstalled packages such as numpy and pandas. Miniconda includes only Conda and Python.

We don't need that many packages here, so we choose Miniconda.

1 ) Download Miniconda (Python3 version)

Download address: https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh

2 ) Install Miniconda

(1) Execute the following command to install, and follow the prompts until the installation is complete.

[wyr@hadoop102 lib]$ bash Miniconda3-latest-Linux-x86_64.sh

(2) During installation, the installer prompts for an installation path. Accept the default or enter a different path.

(3) When the installer reports that it has finished, the installation is complete.

3 ) Reload the shell configuration file so that the new environment variables take effect

[wyr@hadoop102 lib]$ source ~/.bashrc

4 ) Disable automatic activation of the base environment

After Miniconda is installed, its default base environment is activated every time a terminal is opened. The following command disables this automatic activation.

[wyr@hadoop102 lib]$ conda config --set auto_activate_base false

2.1.2 Entering and exiting the environment

Description: common commands for conda environment management:

  • Create an environment: conda create -n env_name
  • List all environments: conda info --envs
  • Remove an environment: conda remove -n env_name --all

Activate the base environment:

[wyr@hadoop102 ~]$ conda activate base

Exit the current environment:

(base) [wyr@hadoop102 ~]$ conda deactivate

2.1.4 Creating an environment with a different Python version

The latest version of Superset requires Python 3.9 or above, and conda's built-in base environment does not meet this requirement, so create a dedicated environment for Superset:

[wyr@hadoop102 ~]$ conda create -n superset python=3.9
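Once the new environment has been activated, a quick check along the following lines can confirm that its interpreter is new enough. This is a minimal sketch: the 3.9 minimum is taken from the text above, and the helper name meets_requirement is made up for illustration.

```python
import sys

# Minimum version taken from the text above (assumption: adjust it if your
# Superset release documents a different requirement).
REQUIRED = (3, 9)


def meets_requirement(version_info=sys.version_info, required=REQUIRED):
    """Return True if the (major, minor) version is at least `required`."""
    return tuple(version_info[:2]) >= tuple(required)


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    status = "OK" if meets_requirement() else "too old"
    print(f"Python {current}: {status}")
```

Save it under any file name and run it with the python executable of the superset environment.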

2.2 Superset deployment

2.2.1 Install dependencies

Enter the superset environment:

[wyr@hadoop102 ~]$ conda activate superset

Before installing Superset, the following dependencies need to be installed.

(superset) [wyr@hadoop102 ~]$ sudo yum install -y gcc gcc-c++ libffi-devel python-devel python-pip python-wheel python-setuptools openssl-devel cyrus-sasl-devel openldap-devel

2.2.2 Install Superset

1 ) Install (update) setuptools and pip

(superset) [wyr@hadoop102 ~]$ pip install --upgrade setuptools pip -i https://pypi.douban.com/simple/

Description: pip is the package management tool for Python, comparable to yum on CentOS.

2 ) Install Superset

(superset) [wyr@hadoop102 ~]$ pip install apache-superset==1.4.2 -i https://pypi.douban.com/simple/

Explanation: The -i option specifies the package index (mirror) to download from. Here a mirror hosted in China is used.

Note: If a network error prevents the download, try a different mirror. The --trusted-host option takes a host name, not a URL.

(superset) [wyr@hadoop102 ~]$ pip install apache-superset --trusted-host repo.huaweicloud.com -i https://repo.huaweicloud.com/repository/pypi/simple

3 ) Initialize the Superset database

Run the following command to downgrade the MarkupSafe dependency to version 2.0.1.

(superset) [wyr@hadoop102 ~]$ pip install --force-reinstall MarkupSafe==2.0.1
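To confirm that the pin took effect before continuing, the installed version can be read back from the package metadata. The following is only a sketch using the standard library's importlib.metadata (available from Python 3.8); the helper name installed_version is hypothetical.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # The tutorial pins MarkupSafe to 2.0.1, so that is the value to look for.
    print("MarkupSafe:", installed_version("MarkupSafe"))
```

Run it inside the superset environment; a result other than 2.0.1 suggests that the reinstall did not apply to this environment.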

Initialize the database:

(superset) [wyr@hadoop102 ~]$ export FLASK_APP=superset

(superset) [wyr@hadoop102 ~]$ superset db upgrade

4 ) Create an admin user

(superset) [wyr@hadoop102 ~]$ superset fab create-admin

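Description: Flask is a Python web framework, and Superset is built on it. The fab subcommand refers to Flask-AppBuilder, the Flask extension that Superset uses for user and permission management.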

5 ) Superset initialization

(superset) [wyr@hadoop102 ~]$ superset init

2.2.3 Start Superset

1 ) Install gunicorn

(superset) [wyr@hadoop102 ~]$ pip install gunicorn -i https://pypi.douban.com/simple/

Description: gunicorn is a Python web server, comparable to Tomcat in the Java world.

2 ) Start Superset

(superset) [wyr@hadoop102 ~]$ gunicorn --workers 5 --timeout 120 --bind hadoop102:8787 "superset.app:create_app()"

Description:

  • workers : The number of worker processes
  • timeout : The worker timeout in seconds; a worker that stays silent longer than this is killed and restarted
  • bind : The address and port to bind to, which is also the address used to access Superset
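The same options can also be kept in a gunicorn configuration file, which is itself a Python module. The sketch below mirrors the values of the command above; it assumes gunicorn 20.1 or later (for the wsgi_app setting) and a file named gunicorn.conf.py in the directory where gunicorn is started, neither of which the original setup specifies.

```python
# gunicorn.conf.py (assumed file name and location; gunicorn looks for
# ./gunicorn.conf.py by default, or for the path passed with -c)

# Application factory, same as the last argument of the command line above.
# The wsgi_app setting is an assumption that requires gunicorn >= 20.1.
wsgi_app = "superset.app:create_app()"

# Number of worker processes.
workers = 5

# Seconds a worker may stay silent before it is killed and restarted.
timeout = 120

# Address and port to listen on; this is the address used to reach Superset.
bind = "hadoop102:8787"
```

With this file in place, starting the server would reduce to running gunicorn -c gunicorn.conf.py from the same directory.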

3 ) Log in to Superset

Visit http://hadoop102:8787 and log in with the administrator account created in step 4 of Section 2.2.2.

Chapter 3 Using Superset

3.1 Connect to MySQL data source

3.1.1 Install dependencies

(superset) [wyr@hadoop102 ~]$ conda install mysqlclient

Note: Each type of data source requires its own driver. The official documentation lists the supported drivers:

https://superset.apache.org/docs/databases/installing-database-drivers

After installing a driver, restart Superset so that the new driver is loaded.

3.1.2 Data source configuration

1 ) Database configuration

Step1 : Click Data/Databases.

Step2 : Click + DATABASE.

Step3 : Fill in the database name and the SQLAlchemy URI.

Note: The SQLAlchemy URI format is mysql://username:password@hostname:port/database_name.

Fill in here:

mysql://root:000000@hadoop102:3306/gmall_report?charset=utf8
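If the user name or password contains characters such as @ or /, they need to be percent-encoded before being placed in the URI. The sketch below shows one way to assemble the string with the standard library; the function name mysql_uri and its parameters are made up for illustration and are not part of Superset.

```python
from urllib.parse import quote_plus


def mysql_uri(user, password, host, port, database, charset="utf8"):
    """Build a mysql:// SQLAlchemy URI, percent-encoding the credentials."""
    user_q = quote_plus(user)
    password_q = quote_plus(password)
    return f"mysql://{user_q}:{password_q}@{host}:{port}/{database}?charset={charset}"


if __name__ == "__main__":
    # Values from this tutorial.
    print(mysql_uri("root", "000000", "hadoop102", 3306, "gmall_report"))
    # → mysql://root:000000@hadoop102:3306/gmall_report?charset=utf8

    # A made-up password with special characters, to show the encoding.
    print(mysql_uri("root", "p@ss/word", "hadoop102", 3306, "gmall_report"))
    # → mysql://root:p%40ss%2Fword@hadoop102:3306/gmall_report?charset=utf8
```

The resulting string is what goes into the SQLAlchemy URI field.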

Step4 : Click Test Connection. The prompt "Connection looks good!" indicates that the connection succeeded.

Step5 : Click ADD.

2 ) Table configuration

Step1 : Click Data/Datasets

Step2 : Click + DATASET

Step3 : Configure Table

3.2 Make a Dashboard

3.2.1 Create a blank dashboard

1 ) Click Dashboards/+ DASHBOARD

2 ) Name and save

3.2.2 Create a chart

1 ) Click Charts/+CHART

2 ) Select the data source and chart type

3 ) Select any chart type

4 ) Create the chart

5 ) Follow the instructions to configure the chart

6 ) Click "Run Query"

7 ) If the configuration is correct, the chart is displayed

8 ) Name the chart and save it to the dashboard

3.2.3 Edit Dashboard

1 ) Open the dashboard and click the edit button

2 ) Adjust the chart size and chart layout

3 ) Click the drop-down arrow to set the dashboard's automatic refresh interval

Chapter 4 Superset in Action

4.1 Making a map

4.1.1 Configure the table

4.1.2 Configure the chart

4.2 Making a pie chart

4.2.1 Configure the table

4.2.2 Configure the chart

Origin blog.csdn.net/u013250861/article/details/130072603