Getting Started with Apache Superset, a Data Exploration and Visualization Platform, with a Practical Example

1. Superset background

1.1. Overview of Superset

Apache Superset is a modern data exploration and visualization platform. It connects to many data sources, including modern big data analysis engines, offers a wide range of chart types, and supports custom dashboards.

1.2. Environment description

The server operating system used in this case is CentOS 7, and the data source connected to Superset is the MySQL database.

2. Superset installation

Superset official website address: http://superset.apache.org/

2.1. Install the Python environment

Superset is a web application written in Python. This guide runs it in a Python 3.7 environment.

2.1.1. Install Miniconda

Conda is an open-source package and environment manager. It can install packages and their dependencies for different Python versions on the same machine and switch between Python environments. Anaconda bundles Conda, Python, and a large set of preinstalled packages such as numpy and pandas. Miniconda bundles only Conda and Python. Superset does not need the extra packages, so this guide uses Miniconda.

2.1.1.1. Download Miniconda (Python3 version)

download link:

2.1.1.2. Install Miniconda

  1. Run the following command and follow the prompts until the installation is complete.
[song@hadoop102 lib]$ bash Miniconda3-latest-Linux-x86_64.sh
  2. When the installer asks for the installation location, you can specify the installation path.
  3. When the installer reports that the installation has finished, the installation is complete.

2.1.1.3. Load the environment variable configuration file to make it take effect

[song@hadoop102 lib]$ source ~/.bashrc

2.1.1.4. Disable automatic activation of the base environment

After Miniconda is installed, its default base environment is activated every time a terminal is opened. The following command disables this automatic activation.

[song@hadoop102 lib]$ conda config --set auto_activate_base false

2.1.2. Create a Python 3.7 environment

2.1.2.1. Configure a conda mirror in China

[song@hadoop102 ~]$ conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free
[song@hadoop102 ~]$ conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
[song@hadoop102 ~]$ conda config --set show_channel_urls yes

2.1.2.2. Create a Python 3.7 environment

[song@hadoop102 ~]$ conda create --name superset python=3.7

Note: Common conda environment management commands:
  • Create an environment: conda create -n env_name
  • List all environments: conda info --envs
  • Delete an environment: conda remove -n env_name --all
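
Before creating the environment, it can be useful to check whether one with the same name already exists. The sketch below is one possible way to do that. The helper name env_exists is hypothetical, and the sketch assumes the usual conda info --envs layout in which the environment name is the first column.

```shell
#!/bin/bash
# env_exists NAME: succeed if NAME appears as the first column of
# `conda info --envs`-style output read from standard input.
# (Hypothetical helper, not part of conda itself.)
env_exists() {
    awk -v name="$1" '$1 == name { found = 1 } END { exit !found }'
}

# Example use (requires conda on PATH):
#   conda info --envs | env_exists superset || conda create --name superset python=3.7
```

Reading the listing from standard input keeps the helper independent of where conda is installed.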

2.1.2.3. Activate the superset environment

[song@hadoop102 ~]$ conda activate superset

2.1.2.4. Exit the superset environment

(superset) [song@hadoop102 ~]$ conda deactivate

2.1.2.5. Run the python command to check the Python version

With the superset environment activated (conda activate superset), running python should start a Python 3.7.x interpreter.
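
The version can also be checked non-interactively in a script. This is a minimal sketch, assuming python --version prints a string such as "Python 3.7.16"; the helper name is_python37 is hypothetical.

```shell
#!/bin/bash
# is_python37 VERSION_STRING: succeed if the string reports a 3.7.x release.
# (Hypothetical helper; the argument is expected to be the output of
# `python --version`.)
is_python37() {
    case "$1" in
        "Python 3.7."*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example use inside the activated superset environment:
#   is_python37 "$(python --version 2>&1)" && echo "Python 3.7 is active"
```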

2.2. Superset deployment

2.2.1. Install dependencies

Before installing Superset, the following dependencies need to be installed.

(superset) [song@hadoop102 ~]$ sudo yum install -y gcc gcc-c++ libffi-devel python-devel python-pip python-wheel python-setuptools openssl-devel cyrus-sasl-devel openldap-devel

2.2.2. Install Superset

2.2.2.1. Install (update) setuptools and pip

(superset) [song@hadoop102 ~]$ pip install --upgrade setuptools pip -i https://pypi.douban.com/simple/

Note: pip is Python's package manager, comparable to yum on CentOS.

2.2.2.2. Install Superset

(superset) [song@hadoop102 ~]$ pip install apache-superset -i https://pypi.douban.com/simple/

Note: The -i option specifies the package index to use; here a mirror in China is selected.
Note: If a network error prevents the download, try a different mirror. The --trusted-host option takes a host name, not a URL.

(superset) [song@hadoop102 ~]$ pip install apache-superset --trusted-host repo.huaweicloud.com -i https://repo.huaweicloud.com/repository/pypi/simple
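
Switching mirrors by hand can be scripted. The following is a sketch only: install_with_fallback and the PIP_CMD override variable are hypothetical names introduced for illustration, and the mirror URLs in the usage comment are the two used in this guide.

```shell
#!/bin/bash
# install_with_fallback PACKAGE MIRROR...: try each mirror in turn until
# one `pip install` succeeds. PIP_CMD is a hypothetical override hook that
# defaults to the real pip. (Hypothetical helper.)
install_with_fallback() {
    local package="$1" mirror
    shift
    for mirror in "$@"; do
        echo "Trying $mirror"
        if "${PIP_CMD:-pip}" install "$package" -i "$mirror"; then
            return 0
        fi
    done
    echo "All mirrors failed for $package" >&2
    return 1
}

# Example use inside the activated superset environment:
#   install_with_fallback apache-superset \
#       https://pypi.douban.com/simple/ \
#       https://repo.huaweicloud.com/repository/pypi/simple
```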

2.2.2.3. Initialize the Superset database

(superset) [song@hadoop102 ~]$ superset db upgrade

If database initialization fails with an error involving markupsafe, run the following command to roll MarkupSafe back to version 2.0.1, then run superset db upgrade again.

(superset) [song@hadoop102 ~]$ pip install --force-reinstall MarkupSafe==2.0.1

2.2.2.4. Create an admin user

(superset) [song@hadoop102 ~]$ export FLASK_APP=superset
(superset) [song@hadoop102 ~]$ superset fab create-admin

Note: Flask is the Python web framework that Superset is built on. This command prompts for the administrator's username and password.

2.2.2.5. Initialize Superset

(superset) [song@hadoop102 ~]$ superset init

2.2.3. Start Superset

2.2.3.1. Install gunicorn

(superset) [song@hadoop102 ~]$ pip install gunicorn -i https://pypi.douban.com/simple/

Gunicorn is a Python WSGI HTTP server, comparable to Tomcat in the Java ecosystem.

2.2.3.2. Start Superset

  1. Make sure the current conda environment is superset.
  2. Start Superset.
(superset) [song@hadoop102 ~]$ gunicorn --workers 5 --timeout 120 --bind hadoop102:8787 "superset.app:create_app()" --daemon
  3. Log in to Superset.
    Open http://hadoop102:8787 in a browser and log in with the administrator account created earlier.
  4. Stop Superset.

Stop the gunicorn process.

(superset) [song@hadoop102 ~]$ ps -ef | awk '/superset/ && !/awk/{print $2}' | xargs kill -9

Exit the superset environment.

(superset) [song@hadoop102 ~]$ conda deactivate
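
The kill -9 pipeline in the stop step terminates every process whose command line matches superset. A gentler alternative, offered here as an assumption rather than the method used in this guide, is to start gunicorn with its --pid option (for example --pid /tmp/superset.pid, a hypothetical path) and send SIGTERM to that single process. The helper name stop_by_pidfile is hypothetical.

```shell
#!/bin/bash
# stop_by_pidfile FILE: send SIGTERM to the process whose PID is stored in
# FILE, then remove the file. Returns 1 if the file is missing or the
# process is no longer alive. (Hypothetical helper.)
stop_by_pidfile() {
    local pidfile="$1" pid
    if [ ! -f "$pidfile" ]; then
        echo "No PID file at $pidfile" >&2
        return 1
    fi
    pid=$(cat "$pidfile")
    if kill -0 "$pid" 2>/dev/null; then
        kill "$pid"        # SIGTERM asks gunicorn to shut down gracefully
        rm -f "$pidfile"
        return 0
    fi
    echo "Process $pid is not running" >&2
    rm -f "$pidfile"
    return 1
}

# Example use, assuming gunicorn was started with --pid /tmp/superset.pid:
#   stop_by_pidfile /tmp/superset.pid
```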

2.2.3.3. Write Superset start and stop scripts

  1. Create superset.sh with the following content
#!/bin/bash

# Return 0 if a gunicorn process is running, 1 otherwise.
superset_status(){
    result=$(ps -ef | awk '/gunicorn/ && !/awk/{print $2}' | wc -l)
    if [[ $result -gt 0 ]]; then
        return 0
    else
        return 1
    fi
}

superset_start(){
    # Load ~/.bashrc so that "conda activate" is available inside the script.
    source ~/.bashrc
    if superset_status; then
        echo "Superset is already running"
    else
        conda activate superset
        gunicorn --workers 5 --timeout 120 --bind hadoop102:8787 --daemon 'superset.app:create_app()'
    fi
}

superset_stop(){
    if superset_status; then
        ps -ef | awk '/gunicorn/ && !/awk/{print $2}' | xargs kill -9
    else
        echo "Superset is not running"
    fi
}

case $1 in
    start )
        echo "Starting Superset"
        superset_start
    ;;
    stop )
        echo "Stopping Superset"
        superset_stop
    ;;
    restart )
        echo "Restarting Superset"
        superset_stop
        superset_start
    ;;
    status )
        if superset_status; then
            echo "Superset is running"
        else
            echo "Superset is not running"
        fi
    ;;
    * )
        echo "Usage: $0 {start|stop|restart|status}"
    ;;
esac
  2. Grant execute permission
chmod +x superset.sh

3. Using Superset

3.1. Connect to MySQL data source

3.1.1. Install dependencies

(superset) [song@hadoop102 ~]$ conda install mysqlclient

Note: Each type of data source requires its own driver. The official documentation lists them:

https://superset.apache.org/docs/databases/installing-database-drivers

3.1.2. Restart Superset

(superset) [song@hadoop102 ~]$ superset.sh restart
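
After a restart, gunicorn needs a moment before it answers requests. The sketch below polls the server until it responds. It assumes curl is installed and that the installed Superset version exposes the /health endpoint; wait_for_superset is a hypothetical helper name.

```shell
#!/bin/bash
# wait_for_superset URL [ATTEMPTS] [DELAY]: poll URL with curl until it
# answers with a successful HTTP status, trying ATTEMPTS times (default 10)
# and sleeping DELAY seconds (default 3) between tries.
# (Hypothetical helper; requires curl.)
wait_for_superset() {
    local url="$1" attempts="${2:-10}" delay="${3:-3}" i=1
    while [ "$i" -le "$attempts" ]; do
        if curl --silent --fail --max-time 5 --output /dev/null "$url"; then
            return 0
        fi
        if [ "$i" -lt "$attempts" ]; then
            sleep "$delay"
        fi
        i=$((i + 1))
    done
    return 1
}

# Example use after `superset.sh restart`:
#   wait_for_superset http://hadoop102:8787/health && echo "Superset is up"
```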

3.1.3. Data source configuration

3.1.3.1. Database configuration

  1. Click Data/Databases.
  2. Click + DATABASE.
  3. Fill in the Database name and the SQLAlchemy URI.

Note: The SQLAlchemy URI has the form mysql://username:password@hostname:port/database_name.
In this example, fill in:
mysql://root:000000@hadoop102:3306/gmall_report?charset=utf8

  4. Click Test Connection.
  5. Click Save.
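
The URI format described in the steps above can be assembled from its parts in a script. This is a minimal sketch; mysql_uri is a hypothetical helper name, and it does no percent-encoding, so it only suits values without characters such as @, : or /.

```shell
#!/bin/bash
# mysql_uri USER PASSWORD HOST PORT DATABASE: print a SQLAlchemy URI for
# MySQL in the form used in this guide. (Hypothetical helper; values are
# inserted as-is, without percent-encoding.)
mysql_uri() {
    printf 'mysql://%s:%s@%s:%s/%s?charset=utf8\n' "$1" "$2" "$3" "$4" "$5"
}

mysql_uri root 000000 hadoop102 3306 gmall_report
# prints: mysql://root:000000@hadoop102:3306/gmall_report?charset=utf8
```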

3.1.3.2. Table configuration

  1. Click Data/Datasets.
  2. Click + DATASET and select the database and table to register.

3.2. Create a dashboard

3.2.1. Create a blank dashboard

  1. Click Dashboards/+ DASHBOARD.
  2. Name the dashboard and save it.

3.2.2. Create a chart

  1. Click Charts/+ CHART.
  2. Choose the chart type and create the chart.
  3. Configure the chart as prompted.
  4. Click "Run Query".
  5. If the configuration is correct, the chart is displayed.
  6. Name the chart and save it to the dashboard.

3.2.3. Edit dashboard

  1. Open the dashboard and click the edit button.
  2. Adjust the chart size and layout.
  3. Click the arrow on the dashboard toolbar to set the dashboard's automatic refresh interval.

Origin blog.csdn.net/prefect_start/article/details/129406881