[Big Data] Visual Dashboard - Installation and Use of Superset

Preface: The blogger is a "little mountain pig" who moved into the training business after years of hands-on development work. The nickname comes from "Peng Peng" (Pumbaa) in the cartoon "The Lion King", and he tries to treat the people and things around him with an optimistic, positive attitude. My technical path has taken me from Java full-stack engineering to big data development and data mining, and I have achieved some results along the way. I would like to share what I have learned, and I hope it helps you on your own learning journey. I also hope to build a complete technical library this way. Exceptions, errors, and precautions related to the technical points of each article are listed at the end, and contributions of material in any form are welcome.

  • If you find any mistakes in the article, please point them out so they can be corrected promptly.
  • If you have questions you would like to discuss, please contact me: [email protected].
  • The style of the articles varies from column to column, and each article is self-contained. Please point out any shortcomings.


Keywords in this article: superset, visualization, Ubuntu, installation

1. Introduction to Superset

Apache Superset is a modern, enterprise-grade data exploration and visualization platform designed to help data engineers and scientists create and share various types of data insights on a web interface.

1. Function of the software

Under the hood, Apache Superset is a Flask application. Its core functions include data visualization, dashboard building, data slicing and dicing, and SQL Lab. In Superset's architecture, Flask handles routing, view functions, and template rendering, while SQLAlchemy provides an abstraction layer for accessing different databases.
Apache Superset supports many data sources: it can connect to any SQL-speaking database or data engine (such as MySQL, Postgres, BigQuery, Redshift), as well as big data components such as Hive, Presto, and Druid, as long as the corresponding driver packages are installed.

2. Software features

  • A rich library of visualization components, with many chart types to meet different data display needs
  • SQL Lab for running SQL queries directly, quickly and conveniently
  • A responsive, mobile-friendly design
  • Powerful permission management that gives fine-grained control over each user's access to data

2. Superset installation

1. Pre-environment

Superset requires Python 3.6 or later, and using a virtual environment is recommended. The official installation guide is at: https://superset.apache.org/docs/installation/installing-superset-from-scratch/

  • Virtualenv installation: pip install virtualenv
  • Create a virtual environment: python3 -m venv superset
  • Activate the virtual environment: . superset/bin/activate
  • Pre-environment installation

Before starting the installation, make sure the following packages are installed in the system and in the Python virtual environment:

sudo apt-get update
sudo apt-get install build-essential
sudo apt-get install python3-dev

pip install wheel
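The virtual-environment steps above can also be wrapped in a small helper that checks the Python version first. This is a minimal bash sketch: the function names py_at_least and setup_venv are hypothetical, and the 3.6 minimum is simply the one stated above, so check the official page for the version your Superset release actually needs. Because it activates the environment, the file containing it has to be sourced (for example `source setup_venv.sh`, a hypothetical file name), not executed.

```shell
#!/usr/bin/env bash
# Returns 0 if python3 is at least version $1.$2
py_at_least() {
  python3 -c "import sys; sys.exit(0 if sys.version_info[:2] >= ($1, $2) else 1)"
}

# Creates (if missing) and activates a virtual environment in directory $1 (default: superset)
# Assumption: 3.6 is the minimum stated in this article; newer Superset releases may need more
setup_venv() {
  local dir="${1:-superset}"
  if ! py_at_least 3 6; then
    echo "python3 is older than 3.6" >&2
    return 1
  fi
  if [ ! -f "$dir/bin/activate" ]; then
    python3 -m venv "$dir" || return 1
  fi
  # Note the space after the dot: this sources the script into the current shell
  . "$dir/bin/activate"
}
```

After `setup_venv superset`, the pip commands above run inside the activated environment.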

2. Installation configuration

  • Install superset: pip install apache-superset

  • Initial setup
# Recommended: add this to your environment variable configuration file (e.g. ~/.bashrc)
export FLASK_APP=superset
superset fab create-admin


A warning appears at this point. It can be resolved with the following steps:

touch superset_config.py

# Recommended: add this to your environment variable configuration file (e.g. ~/.bashrc)
export SUPERSET_CONFIG_PATH=/home/hadoop/superset/superset_config.py
superset fab create-admin

Set SUPERSET_CONFIG_PATH to the path of the config file you just created, then run the command again.
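The warning itself is not reproduced here. If it is the common one about Superset's default SECRET_KEY (an assumption), an empty superset_config.py is not enough: the official documentation asks for a unique SECRET_KEY in that file and suggests generating one with `openssl rand -base64 42`. A minimal bash sketch that writes the key once and exports the path; write_superset_config is a hypothetical helper, and the /dev/urandom fallback is only there for machines without openssl.

```shell
#!/usr/bin/env bash
# $1 = path of superset_config.py (default: the path used in this article)
write_superset_config() {
  local cfg="${1:-/home/hadoop/superset/superset_config.py}"
  mkdir -p "$(dirname "$cfg")"
  touch "$cfg"
  # Only add a key if none is defined yet, so re-running keeps the existing key
  if ! grep -q '^SECRET_KEY' "$cfg"; then
    local key
    if command -v openssl >/dev/null 2>&1; then
      key="$(openssl rand -base64 42)"
    else
      key="$(head -c 42 /dev/urandom | base64 | tr -d '\n')"
    fi
    # Base64 output never contains a single quote, so the Python string literal stays valid
    printf "SECRET_KEY = '%s'\n" "$key" >> "$cfg"
  fi
  export SUPERSET_CONFIG_PATH="$cfg"
}
```

Keep the key stable once Superset has stored data: changing it later makes previously encrypted values (such as saved database passwords) unreadable.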

  • Error resolution

While running the command, I ran into the following error:

It is caused by an incompatible version of sqlparse: 0.4.4 is installed by default. You can confirm the installed version with:

pip show sqlparse

In that case, downgrade it to 0.4.3, the lowest sqlparse version allowed by the Superset release I installed. If you are using a different release, adjust accordingly:

pip uninstall sqlparse
pip install sqlparse==0.4.3
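The check and downgrade above can be made conditional, so that re-running the setup does not touch a working environment. A minimal bash sketch: parse_pip_version and fix_sqlparse are hypothetical helper names, and the only version it reacts to is the 0.4.4 described above.

```shell
#!/usr/bin/env bash
# Extract the "Version:" field from `pip show` output read on stdin
parse_pip_version() {
  awk '/^Version:/ {print $2}'
}

# Downgrade sqlparse to 0.4.3 only if the problematic 0.4.4 is installed
fix_sqlparse() {
  local v
  v="$(pip show sqlparse 2>/dev/null | parse_pip_version)"
  if [ "$v" = "0.4.4" ]; then
    # pip replaces the installed version directly, so a separate uninstall is optional
    pip install "sqlparse==0.4.3"
  else
    echo "sqlparse ${v:-is not installed}; no change made"
  fi
}
```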

3. Startup and access

Once the steps above are complete, you can import some sample data and then start Superset. The metadata database must be initialized before anything else.

  • Import sample data
# Initialize the metadata database
superset db upgrade
# Load the example data (this takes a long time)
superset load_examples
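For reference, the official installation page linked above runs the initialization in this order: superset db upgrade, superset fab create-admin, superset load_examples, and finally superset init, which creates the default roles and permissions and is not among the commands shown here. The bash sketch below simply runs that sequence; init_superset, SUPERSET_INIT_STEPS and the DRY_RUN switch are hypothetical names, and with DRY_RUN=1 it only prints the commands.

```shell
#!/usr/bin/env bash
# Initialization steps in the order given by the official installation guide
SUPERSET_INIT_STEPS=(
  "superset db upgrade"
  "superset fab create-admin"
  "superset load_examples"
  "superset init"
)

init_superset() {
  local step
  for step in "${SUPERSET_INIT_STEPS[@]}"; do
    if [ -n "${DRY_RUN:-}" ]; then
      echo "$step"                 # only show what would run
    else
      echo ">>> $step" >&2
      $step || return 1            # unquoted on purpose: each step is a plain command line
    fi
  done
}
```

Run `DRY_RUN=1 init_superset` first to review the list, then `init_superset` (create-admin still prompts for the account details).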
  • superset build

First, download the Superset source code, which contains the front-end project, and make sure Node.js is installed. The version used here requires node 16.9.1 or above (16.x) and npm 7.5.4+ or 8.1.2+; the example below installs Node 16.x.

git clone https://github.com/apache/superset.git


Important: make sure the Node.js major version matches the required one; otherwise you will have to work through various build problems on your own. An RpcIpcMessagePortClosedError during the build is usually caused by insufficient memory; try increasing the available memory.

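# Install the tools needed for the build
sudo apt install curl
curl -fsSL https://deb.nodesource.com/setup_16.x | sudo -E bash -
sudo apt-get install -y nodejs
# Stay within the npm range required above; current npm@latest releases no longer support Node 16
sudo npm install -g npm@8
sudo npm install -g node-gyp
# On ARM, chromium-browser has to be installed manually
sudo apt install chromium-browser
# Enter the front-end project inside the cloned repository
cd superset/superset-frontend
# Install the front-end dependencies
npm ci
# Pre-emptively fix some problems that otherwise show up during the build
npx update-browserslist-db@latest
# This step takes a long time and needs at least 4 GB of available memory
npm run build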
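Because the build fails in confusing ways when the Node.js major version or the available memory is wrong, a preflight check before `npm run build` can save time. A minimal bash sketch, assuming Linux (it reads MemAvailable from /proc/meminfo) and the requirements stated above (Node.js 16.x, at least 4 GB free); the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Turn a `node -v` string such as v16.20.2 into its major version (16)
node_major() {
  local v="${1#v}"
  echo "${v%%.*}"
}

# Available memory in MB, from /proc/meminfo (Linux only)
mem_available_mb() {
  awk '/^MemAvailable:/ {print int($2 / 1024)}' /proc/meminfo
}

preflight_frontend_build() {
  local major mem
  if ! command -v node >/dev/null 2>&1; then
    echo "node is not installed" >&2
    return 1
  fi
  major="$(node_major "$(node -v)")"
  if [ "$major" != "16" ]; then
    echo "expected Node.js 16.x, found major version $major" >&2
    return 1
  fi
  mem="$(mem_available_mb)"
  if [ "${mem:-0}" -lt 4096 ]; then
    echo "only ${mem:-0} MB available; npm run build may fail" >&2
    return 1
  fi
  echo "ok: node $major, ${mem} MB available"
}
```

Usage: `preflight_frontend_build && npm run build`.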
  • superset start
# Switch to the parent directory of superset-frontend
cd ..
superset run -p 8088 --with-threads --reload --debugger

  • superset access

After startup, open port 8088 in a browser (for example http://localhost:8088) and log in with the admin account and password created during initialization:

The previously imported examples are displayed:
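Instead of refreshing the browser while the server starts, you can poll Superset's /health endpoint, which answers with the text OK once the web server is up. A minimal bash sketch, assuming curl is installed and Superset listens on localhost:8088 as above; wait_for_superset is a hypothetical name.

```shell
#!/usr/bin/env bash
# Poll the health endpoint until it answers "OK" or the timeout expires
# $1 = base URL (default http://localhost:8088), $2 = timeout in seconds (default 60)
wait_for_superset() {
  local base="${1:-http://localhost:8088}" timeout="${2:-60}" i=0
  while [ "$i" -lt "$timeout" ]; do
    if [ "$(curl -s --max-time 2 "$base/health")" = "OK" ]; then
      echo "Superset is up at $base"
      return 0
    fi
    sleep 1
    i=$((i + 1))
  done
  echo "Superset did not respond within ${timeout}s" >&2
  return 1
}
```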

3. Data source configuration

Several ways of adding a data source are available from the upper-right corner of the interface; this article covers connecting to a database.

1. PostgreSQL

  • Dependency installation

To connect to PostgreSQL, install the driver before starting the project. With the superset virtual environment activated, run:

pip install psycopg2-binary
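Superset only offers a database type once its driver can be imported, so it can help to check this before restarting. A minimal bash sketch: the package names are the ones installed in this article, the module names are their import names (psycopg2-binary provides psycopg2, mysqlclient provides MySQLdb), and driver_for, can_import and ensure_driver are hypothetical helpers. For Hive it only covers pyhive; the extra thrift and SASL packages from the Hive section are still needed.

```shell
#!/usr/bin/env bash
# Print "<pip package> <import module>" for a database type, or fail if unknown
driver_for() {
  case "$1" in
    postgresql) echo "psycopg2-binary psycopg2" ;;
    mysql)      echo "mysqlclient MySQLdb" ;;
    hive)       echo "pyhive pyhive" ;;
    *)          return 1 ;;
  esac
}

# Returns 0 if the active Python environment can import module $1
can_import() {
  python3 -c "import $1" 2>/dev/null
}

# Install the driver for database type $1 unless it is already importable
ensure_driver() {
  local spec pkg mod
  spec="$(driver_for "$1")" || { echo "unknown database type: $1" >&2; return 1; }
  read -r pkg mod <<<"$spec"
  if can_import "$mod"; then
    echo "$pkg already installed"
  else
    pip install "$pkg"
  fi
}
```

Run it with the superset virtual environment active, e.g. `ensure_driver postgresql`, then restart Superset.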
  • connection configuration

In the configuration dialog, PostgreSQL and SQLite can be selected directly by default:

Select PostgreSQL to open its configuration form:

After the connection succeeds, you can go straight on to create a DATASET. Alternatively, after clicking FINISH, click the ➕ in the upper-right corner again; Create dataset then appears under the Data menu.

2. MySQL

Once one database connection exists, further data sources can be added as follows:

The button for adding another DATABASE then appears in the interface, as shown in the figure:

  • Dependency installation

To connect to MySQL, install the driver before starting the project. With the superset virtual environment activated, run:

# pkg-config is needed to build recent mysqlclient releases
sudo apt-get install libmysqlclient-dev pkg-config
pip install mysqlclient
  • connection configuration

In the configuration dialog, select Other and configure the connection directly with a connection string.

The connection string uses the SQLAlchemy URI format: mysql://username:password@hostname:port/database
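A password containing characters such as @, : or / breaks this URI unless it is percent-encoded. A minimal bash sketch for building the string: urlencode and sqlalchemy_uri are hypothetical helpers, only ASCII input is handled, and the host and database name are inserted unchanged. The same function covers the postgresql:// and hive:// formats used in this article.

```shell
#!/usr/bin/env bash
# Percent-encode an ASCII string; RFC 3986 unreserved characters are kept as-is
urlencode() {
  local s="$1" out="" c i
  for (( i = 0; i < ${#s}; i++ )); do
    c="${s:i:1}"
    case "$c" in
      [[:alnum:].~_-]) out+="$c" ;;
      *) printf -v c '%%%02X' "'$c"; out+="$c" ;;
    esac
  done
  printf '%s\n' "$out"
}

# Build dialect://user:password@host:port/database
# $1 dialect (mysql, postgresql, hive, ...), $2 user, $3 password, $4 host, $5 port, $6 database
sqlalchemy_uri() {
  printf '%s://%s:%s@%s:%s/%s\n' "$1" "$(urlencode "$2")" "$(urlencode "$3")" "$4" "$5" "$6"
}
```

For example, `sqlalchemy_uri mysql root 'p@ss' localhost 3306 superset_demo` prints `mysql://root:p%40ss@localhost:3306/superset_demo`, which can be pasted into the SQLALCHEMY URI field.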

3. Hive

  • Dependency installation

To connect to Hive, install the required packages before starting the project. With the superset virtual environment activated, run:

pip install PyMySQL
pip install pyhive
pip install thrift
sudo apt-get install python3-dev libsasl2-dev
pip install sasl
pip install thrift_sasl

Before connecting, make sure the Hive services (including HiveServer2) are running. For the setup steps, see: Hive 3.x Installation and Deployment - Ubuntu
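A quick way to confirm HiveServer2 is listening before filling in the connection form is to open a TCP connection to its Thrift port, which defaults to 10000. A minimal bash sketch using bash's /dev/tcp pseudo-device and coreutils' timeout; port_open and check_hiveserver2 are hypothetical names, and localhost:10000 is an assumed default.

```shell
#!/usr/bin/env bash
# Returns 0 if a TCP connection to host $1, port $2 succeeds within 3 seconds
port_open() {
  timeout 3 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

# $1 = host (default localhost), $2 = port (default 10000, HiveServer2's default Thrift port)
check_hiveserver2() {
  local host="${1:-localhost}" port="${2:-10000}"
  if port_open "$host" "$port"; then
    echo "HiveServer2 is reachable at $host:$port"
  else
    echo "nothing is listening on $host:$port; start HiveServer2 first" >&2
    return 1
  fi
}
```

An open port only shows that something is listening; the connection test in Superset still verifies the credentials.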

  • connection configuration

In the configuration dialog, select Other and configure the connection directly with a connection string.

The connection string uses the SQLAlchemy URI format: hive://username:password@hostname:port/database

After the connection test passes, clicking the CONNECT button may show an error saying the connection failed. In my tests this had no actual effect: the connection had already been created. Simply close the pop-up window and refresh the page, and everything works normally afterwards.

4. Other instructions

As you install more drivers into the superset virtual environment and create connections of the corresponding types, more options appear in the interface. Once the set of data sources you need is fairly stable, you can run the superset process in the background and concentrate on the visualization work:

# Run after changing into the corresponding directory
nohup superset run -p 8088 --with-threads --reload --debugger &
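To check or stop the background process later without looking up its PID by hand, the nohup command can be wrapped in small helpers. A minimal bash sketch, assuming pgrep and pkill (procps) are available; SUPERSET_HOME, SUPERSET_CMD and the function names are hypothetical, and output goes to a log file instead of the default nohup.out.

```shell
#!/usr/bin/env bash
# Hypothetical defaults; adjust the directory and command to your installation
SUPERSET_HOME="${SUPERSET_HOME:-$HOME/superset}"
SUPERSET_CMD="${SUPERSET_CMD:-superset run -p 8088 --with-threads --reload --debugger}"

# Succeeds if a process whose command line contains $SUPERSET_CMD is running
status_bg() {
  pgrep -f "$SUPERSET_CMD" >/dev/null
}

start_bg() {
  if status_bg; then
    echo "already running"
    return 0
  fi
  mkdir -p "$SUPERSET_HOME"
  # Unquoted on purpose so the command string splits into arguments
  nohup $SUPERSET_CMD >"$SUPERSET_HOME/superset.log" 2>&1 &
  echo "started, log: $SUPERSET_HOME/superset.log"
}

# Matching on the full command line should also catch the child process
# that --reload starts with the same arguments
stop_bg() {
  if pkill -f "$SUPERSET_CMD"; then
    echo "stopped"
  else
    echo "not running"
  fi
}
```

Usage: `start_bg`, then `tail -f ~/superset/superset.log` to follow startup, and `stop_bg` when finished.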

Origin blog.csdn.net/u012039040/article/details/131256295