1. Superset background
1.1. Overview of Superset
Apache Superset is a modern, easy-to-use data exploration and visualization platform. It can connect to many data sources, including a wide range of modern big data analysis engines, offers a rich set of chart types, and supports custom dashboards.
1.2. Environment description
The server in this guide runs CentOS 7, and Superset connects to a MySQL database as its data source.
2. Superset installation
Official Superset website: http://superset.apache.org/
2.1. Install the Python environment
Superset is a web application written in Python, and this guide runs it in a Python 3.7 environment.
2.1.1. Install Miniconda
Conda is an open-source package and environment manager. It can install packages and their dependencies for different Python versions on the same machine and switch between separate Python environments. Anaconda bundles Conda, Python, and many preinstalled packages such as numpy and pandas, while Miniconda includes only Conda and Python. Since those extra packages are not needed here, this guide uses Miniconda.
2.1.1.1. Download Miniconda (Python3 version)
2.1.1.2. Install Miniconda
- Run the following command to start the installer, and follow the prompts until the installation completes.
[song@hadoop102 lib]$ bash Miniconda3-latest-Linux-x86_64.sh
- During installation, the installer asks for an installation location; you can accept the default or enter your own path.
- The installation is finished when the installer prints its completion message.
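The interactive prompts above can be skipped for scripted setups. Below is a minimal sketch, not part of the original steps: the Miniconda installer's -b flag runs it in batch mode and -p sets the install prefix. Because batch mode leaves ~/.bashrc untouched, conda init bash is run afterwards. The helper name install_miniconda and the default prefix $HOME/miniconda3 are assumptions for illustration.

```shell
# Hypothetical helper: install Miniconda without interactive prompts.
# -b (batch mode) and -p (install prefix) are options of the Miniconda installer.
install_miniconda() {
    local installer=$1                       # e.g. Miniconda3-latest-Linux-x86_64.sh
    local prefix=${2:-"$HOME/miniconda3"}    # assumed default install location
    if [ -x "$prefix/bin/conda" ]; then
        echo "conda already installed in $prefix"
        return 0
    fi
    bash "$installer" -b -p "$prefix" || return 1
    # Batch mode does not edit ~/.bashrc, so register conda with bash explicitly
    "$prefix/bin/conda" init bash
}

# Usage:
#   install_miniconda Miniconda3-latest-Linux-x86_64.sh
#   source ~/.bashrc
```

Running the helper a second time is harmless: it detects the existing conda binary and returns without reinstalling.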
2.1.1.3. Reload the shell configuration file so the changes take effect
[song@hadoop102 lib]$ source ~/.bashrc
2.1.1.4. Disable automatic activation of the base environment
After Miniconda is installed, its default base environment is activated every time a terminal is opened. The following command disables this automatic activation.
[song@hadoop102 lib]$ conda config --set auto_activate_base false
2.1.2. Create a Python 3.7 environment
2.1.2.1. Configure conda mirrors hosted in China
[song@hadoop102 ~]$ conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free
[song@hadoop102 ~]$ conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
[song@hadoop102 ~]$ conda config --set show_channel_urls yes
2.1.2.2. Create a Python 3.7 environment
[song@hadoop102 ~]$ conda create --name superset python=3.7
Common conda environment management commands:
- Create an environment: conda create -n env_name
- List all environments: conda info --envs
- Delete an environment: conda remove -n env_name --all
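The commands above can be combined so that re-running a setup script does not try to recreate an existing environment. A minimal sketch follows; env_listed and ensure_env are hypothetical helper names, and the parsing assumes the usual conda info --envs layout of one environment per line, with the name in the first column and comment lines starting with #.

```shell
# Hypothetical helper: succeed if "conda info --envs" output (read from stdin)
# lists an environment with the given name. Comment lines start with "#".
env_listed() {
    awk -v name="$1" '!/^#/ && $1 == name { found = 1 } END { exit !found }'
}

# Hypothetical helper: create the environment only if it is not listed yet.
ensure_env() {
    local name=$1 pyver=$2
    if conda info --envs | env_listed "$name"; then
        echo "environment $name already exists"
    else
        conda create -y --name "$name" "python=$pyver"
    fi
}

# Usage:
#   ensure_env superset 3.7
```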
2.1.2.3. Activate the superset environment
[song@hadoop102 ~]$ conda activate superset
2.1.2.4. Exit the superset environment
(superset) [song@hadoop102 ~]$ conda deactivate
2.1.2.5. Run python to check the Python version
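A small check can confirm that the activated environment really provides Python 3.7. The sketch below is an illustration with hypothetical helper names; it relies only on python --version, whose output has the form Python X.Y.Z.

```shell
# Hypothetical helper: turn "Python 3.7.16" into "3.7".
python_minor() {
    echo "$1" | awk '{ split($2, v, "."); print v[1] "." v[2] }'
}

# Hypothetical helper: compare the active interpreter with the expected X.Y version.
check_python() {
    local expected=$1 actual
    # 2>&1 also captures older interpreters that printed the version to stderr
    actual=$(python_minor "$(python --version 2>&1)")
    if [ "$actual" = "$expected" ]; then
        echo "OK: Python $actual"
    else
        echo "expected Python $expected but found $actual" >&2
        return 1
    fi
}

# Usage, after "conda activate superset":
#   check_python 3.7
```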
2.2. Superset deployment
2.2.1. Install dependencies
Before installing Superset, the following dependencies need to be installed.
(superset) [song@hadoop102 ~]$ sudo yum install -y gcc gcc-c++ libffi-devel python-devel python-pip python-wheel python-setuptools openssl-devel cyrus-sasl-devel openldap-devel
2.2.2. Install Superset
2.2.2.1. Install (update) setuptools and pip
(superset) [song@hadoop102 ~]$ pip install --upgrade setuptools pip -i https://pypi.douban.com/simple/
Note: pip is Python's package manager, comparable to yum on CentOS.
2.2.2.2. Install Superset
(superset) [song@hadoop102 ~]$ pip install apache-superset -i https://pypi.douban.com/simple/
Note: The -i option specifies the package index to download from; a mirror in China is used here.
If a network error prevents the download, try a different mirror, for example:
(superset) [song@hadoop102 ~]$ pip install apache-superset --trusted-host repo.huaweicloud.com -i https://repo.huaweicloud.com/repository/pypi/simple
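Rather than passing -i on every command, the mirror can be saved in pip's configuration file. A minimal sketch, assuming pip's per-user configuration file at $HOME/.config/pip/pip.conf (pip also reads the older ~/.pip/pip.conf); write_pip_conf is a hypothetical helper, while index-url and trusted-host are standard keys of pip's [global] section. Note that it overwrites any existing file at that path.

```shell
# Hypothetical helper: save a PyPI mirror in pip's per-user configuration file.
# Overwrites the target file.
write_pip_conf() {
    local index_url=$1
    local conf=${2:-"$HOME/.config/pip/pip.conf"}   # assumed per-user location
    local host
    # trusted-host expects a bare host name: strip the scheme and the path
    host=$(echo "$index_url" | sed -e 's|^[a-z]*://||' -e 's|/.*$||')
    mkdir -p "$(dirname "$conf")"
    cat > "$conf" <<EOF
[global]
index-url = $index_url
trusted-host = $host
EOF
}

# Usage:
#   write_pip_conf https://repo.huaweicloud.com/repository/pypi/simple
#   pip install apache-superset    # now uses the configured mirror
```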
2.2.2.3. Initialize the Superset database
(superset) [song@hadoop102 ~]$ superset db upgrade
If database initialization fails with an error caused by an incompatible MarkupSafe version, run the following command to downgrade MarkupSafe to 2.0.1.
(superset) [song@hadoop102 ~]$ pip install --force-reinstall MarkupSafe==2.0.1
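To confirm that the downgrade took effect, the Version: line of pip show can be read. A minimal sketch; installed_version is a hypothetical helper that expects the key: value lines pip show prints.

```shell
# Hypothetical helper: print the value of the "Version:" line
# from "pip show" output read on stdin.
installed_version() {
    awk -F': ' '$1 == "Version" { print $2; exit }'
}

# Usage inside the superset environment (should print 2.0.1 after the downgrade):
#   pip show MarkupSafe | installed_version
```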
Create an admin user
(superset) [song@hadoop102 ~]$ export FLASK_APP=superset
(superset) [song@hadoop102 ~]$ superset fab create-admin
Note: Flask is a Python web framework that Superset is built on. This command prompts for the administrator's username, name, email, and password.
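For scripted installs, the prompts can be answered on the command line: fab create-admin (from Flask-AppBuilder) accepts --username, --firstname, --lastname, --email and --password. A minimal sketch; create_superset_admin, the SUPERSET_BIN override (handy for a dry run with echo), and the first/last name values are assumptions for illustration. Passing a password as an argument exposes it in shell history and ps output, so the interactive prompt remains the safer choice on shared machines.

```shell
# Hypothetical wrapper around "superset fab create-admin" that avoids the prompts.
# SUPERSET_BIN defaults to "superset"; set it to "echo" for a dry run.
create_superset_admin() {
    local username=$1 password=$2 email=$3
    FLASK_APP=superset "${SUPERSET_BIN:-superset}" fab create-admin \
        --username "$username" \
        --firstname Superset \
        --lastname Admin \
        --email "$email" \
        --password "$password"
}

# Usage inside the superset environment:
#   create_superset_admin admin 'change-me' admin@example.com
```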
Superset initialization
(superset) [song@hadoop102 ~]$ superset init
2.2.3. Start Superset
2.2.3.1. Install gunicorn
(superset) [song@hadoop102 ~]$ pip install gunicorn -i https://pypi.douban.com/simple/
gunicorn is a Python WSGI HTTP server, comparable to Tomcat for Java.
2.2.3.2. Start Superset
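The start command in the next step runs gunicorn with --workers 5 (worker processes), --timeout 120 (a worker silent for more than 120 seconds is killed and restarted), --bind hadoop102:8787 (listen address) and --daemon (detach from the terminal). The gunicorn documentation suggests (2 x number of CPU cores) + 1 workers as a starting point, and 5 matches a 2-core machine under that rule. A minimal sketch of that rule of thumb; recommended_workers is a hypothetical helper and nproc comes from GNU coreutils.

```shell
# Hypothetical helper: derive a gunicorn worker count from the CPU core count,
# following the (2 x cores) + 1 rule of thumb from the gunicorn docs.
recommended_workers() {
    local cores=${1:-$(nproc)}    # defaults to the cores reported by nproc
    echo $((2 * cores + 1))
}

# Usage (inside the superset environment):
#   gunicorn --workers "$(recommended_workers)" --timeout 120 \
#       --bind hadoop102:8787 --daemon "superset.app:create_app()"
```

The formula is only a starting point; Superset's actual load (dashboard refreshes, long queries) may call for tuning it up or down.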
- Make sure the current conda environment is superset.
- Start Superset:
(superset) [song@hadoop102 ~]$ gunicorn --workers 5 --timeout 120 --bind hadoop102:8787 "superset.app:create_app()" --daemon
- Log in to Superset
Visit http://hadoop102:8787 and log in with the administrator account created earlier.
- Stop Superset
Stop the gunicorn processes:
(superset) [song@hadoop102 ~]$ ps -ef | awk '/superset/ && !/awk/{print $2}' | xargs kill -9
Then exit the superset environment:
(superset) [song@hadoop102 ~]$ conda deactivate
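The stop command above force-kills (kill -9) every process whose command line contains superset. A gentler alternative is sketched below, under the assumption that gunicorn is started with its --pid option so that the master's PID is written to a file (the path /tmp/superset.pid is hypothetical). Sending TERM to the master asks gunicorn for a graceful shutdown, letting workers finish their current requests. stop_by_pidfile is a hypothetical helper.

```shell
# Hypothetical helper: gracefully stop the process whose PID is stored in a pid file.
# Assumes Superset was started with gunicorn's --pid option, for example:
#   gunicorn --pid /tmp/superset.pid --workers 5 --timeout 120 \
#       --bind hadoop102:8787 --daemon "superset.app:create_app()"
stop_by_pidfile() {
    local pidfile=$1 pid
    if [ ! -f "$pidfile" ]; then
        echo "no pid file at $pidfile"
        return 0
    fi
    pid=$(cat "$pidfile")
    if kill -0 "$pid" 2>/dev/null; then
        # TERM asks the gunicorn master for a graceful shutdown;
        # gunicorn removes its pid file when it exits
        kill -TERM "$pid"
    else
        echo "process $pid is not running; removing stale pid file"
        rm -f "$pidfile"
    fi
}

# Usage:
#   stop_by_pidfile /tmp/superset.pid
```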
2.2.3.3. Write a Superset start/stop script
- Create superset.sh in a directory on your PATH (for example ~/bin) with the following content:
#!/bin/bash
# Start/stop script for Superset served by gunicorn in the "superset" conda environment
# Return 0 if any gunicorn process is running, 1 otherwise
superset_status(){
    result=$(ps -ef | awk '/gunicorn/ && !/awk/{print $2}' | wc -l)
    if [[ $result -eq 0 ]]; then
        return 1
    else
        return 0
    fi
}
superset_start(){
    # Load conda's shell integration so that "conda activate" works in a script
    source ~/.bashrc
    if superset_status; then
        echo "superset is already running"
    else
        conda activate superset
        gunicorn --workers 5 --timeout 120 --bind hadoop102:8787 --daemon 'superset.app:create_app()'
    fi
}
superset_stop(){
    if superset_status; then
        # Force-kill all gunicorn processes (master and workers)
        ps -ef | awk '/gunicorn/ && !/awk/{print $2}' | xargs kill -9
    else
        echo "superset is not running"
    fi
}
case $1 in
start )
    echo "Starting Superset"
    superset_start
    ;;
stop )
    echo "Stopping Superset"
    superset_stop
    ;;
restart )
    echo "Restarting Superset"
    superset_stop
    superset_start
    ;;
status )
    if superset_status; then
        echo "superset is running"
    else
        echo "superset is not running"
    fi
    ;;
* )
    echo "Usage: $0 {start|stop|restart|status}"
    exit 1
    ;;
esac
- Make the script executable:
chmod +x superset.sh
3. Using Superset
3.1. Connect to MySQL data source
3.1.1. Install dependencies
(superset) [song@hadoop102 ~]$ conda install mysqlclient
Note: Each type of data source requires its own driver package. See the official documentation:
https://superset.apache.org/docs/databases/installing-database-drivers
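The driver package depends on the data source. A small lookup sketch for three sources covered by the official page above (MySQL, PostgreSQL, and Hive); driver_for is a hypothetical helper, and the page itself remains the authoritative list.

```shell
# Hypothetical helper: print the Python driver package for a data source.
# Only a few sources are mapped; anything else is reported as unknown.
driver_for() {
    case $1 in
        mysql)      echo mysqlclient ;;
        postgresql) echo psycopg2 ;;
        hive)       echo pyhive ;;
        *)          echo "unknown data source: $1" >&2; return 1 ;;
    esac
}

# Usage inside the superset environment, e.g.:
#   pip install "$(driver_for postgresql)"
```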
3.1.2. Restart Superset
(superset) [song@hadoop102 ~]$ superset.sh restart
3.1.3. Data source configuration
3.1.3.1. Database configuration
- Click Data/Databases.
- Click + DATABASE.
- Fill in Database and SQLAlchemy URI.
Note: An SQLAlchemy URI for MySQL has the form mysql://username:password@hostname:port/database_name.
Fill in here:
mysql://root:000000@hadoop102:3306/gmall_report?charset=utf8
- Test the connection.
- Click Save.
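If the password contains characters such as @, : or /, it must be percent-encoded, otherwise the URI is parsed incorrectly. A minimal sketch of building the MySQL URI with encoding; urlencode and mysql_uri are hypothetical helpers, ASCII input is assumed, and ?charset=utf8 follows the example above.

```shell
# Hypothetical helper: percent-encode a string (ASCII input assumed),
# keeping only RFC 3986 unreserved characters as-is.
urlencode() {
    local s=$1 out="" c
    while [ -n "$s" ]; do
        c=${s%"${s#?}"}    # first character of s
        s=${s#?}           # rest of s
        case $c in
            [A-Za-z0-9._~-]) out="$out$c" ;;
            *) out="$out$(printf '%%%02X' "'$c")" ;;
        esac
    done
    printf '%s\n' "$out"
}

# Hypothetical helper: build a SQLAlchemy URI for MySQL.
# Arguments: user password host port database
mysql_uri() {
    printf 'mysql://%s:%s@%s:%s/%s?charset=utf8\n' \
        "$(urlencode "$1")" "$(urlencode "$2")" "$3" "$4" "$5"
}

# Usage:
#   mysql_uri root 'p@ss:w/rd' hadoop102 3306 gmall_report
```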
3.1.3.2. Table configuration
- Click Data/Datasets.
3.2. Create a dashboard
3.2.1. Create a blank dashboard
- Click Dashboards/+DASHBOARDS.
- Name the dashboard and save it.
3.2.2. Create a chart
- Click Charts/+CHART.
- Choose a chart type and create the chart.
- Configure the chart.
- Click "Run Query".
- If the configuration is correct, the chart is displayed.
- Name the chart and save it to the dashboard.
3.2.3. Edit dashboard
- Open the dashboard and click the edit button.
- Adjust the chart sizes and layout.
- Click the arrow button to set the dashboard's automatic refresh interval.