Open Source Solutions for Data Visualization: Superset vs Redash vs Metabase

Humans are visual animals, and charts are essential for telling a story with data. If you often see a data analyst colleague run a query in a SQL client, copy and paste the result into Excel, and build a chart there, your company probably lacks a reliable data visualization platform. Data visualization is a core function of Business Intelligence (BI), and there are many mature commercial solutions, such as the veterans Tableau and Qlik, the newer Looker, and China's FineBI. However, for many small companies the license fees for these products are a significant expense, and buying one can feel like using a sledgehammer to crack a nut. So, now that open source software has matured so much, what reliable options are there for data visualization? Today I will introduce three well-known projects: Superset, Redash, and Metabase. I have used the first two in production and will focus on them in this article. I have only tried Metabase briefly, but I think it is a very thoughtful project, so I will also share my thoughts on it.

Choosing a tool whose functionality fits my needs is definitely the first priority, so let's start with the functional requirements. Our data warehouse is Amazon Redshift (if you haven't heard of Redshift, think of it as PostgreSQL optimized for big data), so most practical use cases involve visualizing the results of a SQL query. The chart types we need are the common ones: line charts, column charts, pie charts, and so on. Once the charts exist, the next step is to lay out related charts together on a report page (a dashboard). From a data security perspective, I don't want every employee to have free access to every dashboard, so each dashboard needs its own access level. In addition, I look at whether the tool has a REST API and whether reports can be created and managed through it, which we will cover in a future article.

In addition to meeting functional requirements, ease of use and documentation are also very important when judging a tool. Who doesn't want a product that's easy to use and well documented?

Let's look at how these three open source projects actually fare in terms of functionality, ease of use, and documentation.

Superset

Superset Demo

Superset was originally open sourced by Airbnb's data team and has since entered the Apache Incubator; it is regarded as a star open source project. To be honest, the two golden badges of Airbnb and Apache were a big part of what drew me in. Currently most of our company's reports live on Superset: 50 dashboards containing nearly 900 charts. Before Superset we used Looker (a very good commercial BI tool, but too expensive). We migrated all our Looker dashboards to Superset a year and a half ago, and the whole process went very smoothly. We have now been using it for more than a year. Although I am unsatisfied with many small details, overall Superset has met the company's current needs for data visualization and business reporting well.

After you connect a database to Superset, you define each table you want to use. In Superset, a table definition includes not only its columns but also its metrics. A metric is a statistic computed over a column, such as the sum, average, maximum, or minimum of its values. Confused? Remember that BI tools are usually used for business analysis. Imagine an e-commerce database: although each order's transaction amount is stored in a table, business analysis rarely cares about any single transaction. What matters is more likely the total transaction amount over a period, or the average transaction amount. When you chart a month of transactions, you don't plot every transaction; instead, each day's total transaction value becomes one bar on the chart. That is why Superset introduces the concept of metrics.
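To make the idea of a metric concrete, here is a small Python sketch that rolls individual order amounts up into daily totals and averages, the kind of numbers a metric such as `SUM(amount)` or `AVG(amount)` produces. The order rows and field names are invented for illustration; Superset itself computes metrics as SQL against the connected database, not in Python.

```python
from collections import defaultdict
from datetime import date

# Invented order rows: (order_date, amount). In practice these would be rows
# of an orders table in the warehouse.
orders = [
    (date(2018, 5, 1), 120.0),
    (date(2018, 5, 1), 80.0),
    (date(2018, 5, 2), 200.0),
    (date(2018, 5, 2), 100.0),
]

def daily_metrics(rows):
    """Aggregate per-order amounts into per-day metrics (sum and average).

    Roughly equivalent SQL:
    SELECT order_date, SUM(amount), AVG(amount) FROM orders GROUP BY order_date
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for day, amount in rows:
        totals[day] += amount
        counts[day] += 1
    return {
        day: {"sum_amount": totals[day], "avg_amount": totals[day] / counts[day]}
        for day in sorted(totals)
    }

for day, metrics in daily_metrics(orders).items():
    print(day, metrics)
# 2018-05-01 {'sum_amount': 200.0, 'avg_amount': 100.0}
# 2018-05-02 {'sum_amount': 300.0, 'avg_amount': 150.0}
```

Each output row is one bar on a daily chart; the individual orders never appear on it.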

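For data analysts, building complex queries in Superset can take some getting used to, because they don't write SQL directly; instead they build charts by choosing a metric (Metric), grouping (Group), and filter conditions (Filter). Another difficulty is that Superset tables don't support joins: if a chart needs data from several tables, the only way is to create a view. Superset added the SQL Lab feature after version 0.11, which can generate charts directly from SQL query results. Unfortunately, because this feature is at odds with Superset's core design, its implementation is rather poor and of little practical value.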
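Roughly speaking, a chart defined by a metric, group-by columns, and filters corresponds to a single aggregate SQL query. The sketch below assembles such a query string in Python. The function `build_chart_sql`, the view name `orders_with_users`, and the columns are hypothetical; this illustrates the idea and is not Superset's internal query builder. Because Superset tables cannot be joined, the `table` argument stands for a single table or a view that already does the join.

```python
def build_chart_sql(table, metric, groupby, filters):
    """Turn a (metric, group-by, filters) chart definition into one aggregate query.

    Hypothetical helper for illustration. Table and column names are assumed
    to be trusted identifiers; string filter values get single quotes doubled,
    which is not a substitute for proper parameter binding.
    """
    select_list = ", ".join(groupby + [f"{metric} AS metric"])
    sql = f"SELECT {select_list} FROM {table}"
    conditions = []
    for column, op, value in filters:
        if isinstance(value, str):
            literal = "'" + value.replace("'", "''") + "'"
        else:
            literal = str(value)
        conditions.append(f"{column} {op} {literal}")
    if conditions:
        sql += " WHERE " + " AND ".join(conditions)
    if groupby:
        sql += " GROUP BY " + ", ".join(groupby)
    return sql

# A multi-table chart would point at a view (hypothetical name below)
# that already joins orders to users.
print(build_chart_sql(
    table="orders_with_users",
    metric="SUM(amount)",
    groupby=["country"],
    filters=[("status", "=", "paid")],
))
# SELECT country, SUM(amount) AS metric FROM orders_with_users WHERE status = 'paid' GROUP BY country
```

An analyst who would normally write a join inline has to move it into a view first, which is the friction described above.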

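Objectively speaking, Superset's own concepts of tables and metrics are logically sound, and they are necessary for unifying heterogeneous data sources. In practice, though, they still feel somewhat cumbersome and not very direct.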

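Superset excels at visualization: it is not only a leader among open source tools but also leaves many commercial BI tools behind. Version 0.20 supports as many as 36 chart types, and when choosing a chart type you can see a thumbnail of each one, as in the screenshot below.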
Superset Chart Types

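Another highlight of Superset is the ability to look at data across multiple time dimensions, since many questions in business analysis are closely tied to time. Superset has four chart types designed specifically for time series. When using them, you designate one field as the time dimension, and you can then perform a rich set of operations along it:

  • View the metrics you care about at different time granularities (hour/day/week/month/quarter/year)
  • Apply a rolling average to a time series, e.g. a 7-day moving average of a metric
  • Shift a time series and compare it with the original, e.g. plot this week's sales against the same period last week on one chart
  • Show not a metric's absolute value but its rate of growth over time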

All of these are very practical features in data analysis.
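The four time-series operations listed above can be sketched on a plain list of daily values. The Python below uses only the standard library and invented sales numbers; the function names are hypothetical, and this is not how Superset implements these operations, only an illustration of what each one computes.

```python
from datetime import date, timedelta

def resample_weekly(days, values):
    """Time grain: roll daily values up to weekly sums keyed by each week's Monday."""
    weekly = {}
    for day, value in zip(days, values):
        monday = day - timedelta(days=day.weekday())
        weekly[monday] = weekly.get(monday, 0) + value
    return weekly

def rolling_average(values, window):
    """Trailing moving average; the first window-1 positions have no value (None)."""
    result = []
    for i in range(len(values)):
        if i + 1 < window:
            result.append(None)
        else:
            result.append(sum(values[i + 1 - window : i + 1]) / window)
    return result

def time_shift(values, periods):
    """Shift a series so that position t holds the value from t - periods."""
    periods = min(periods, len(values))
    if periods == 0:
        return list(values)
    return [None] * periods + list(values[:-periods])

def growth_rate(values, periods=1):
    """Relative change against `periods` steps earlier: (x[t] - x[t-p]) / x[t-p]."""
    previous = time_shift(values, periods)
    return [
        None if prev is None or prev == 0 else (cur - prev) / prev
        for cur, prev in zip(values, previous)
    ]

# Two weeks of invented daily sales, starting on Monday 2018-05-07.
days = [date(2018, 5, 7) + timedelta(days=i) for i in range(14)]
sales = [10, 12, 11, 13, 15, 20, 18, 12, 14, 13, 15, 17, 22, 21]

print(resample_weekly(days, sales))     # weekly totals: 99 and 114
print(rolling_average(sales, 7)[-1])    # 7-day average on the last day
this_vs_last_week = list(zip(sales[7:], time_shift(sales, 7)[7:]))
print(this_vs_last_week)                # (this week, same weekday last week)
print(growth_rate(sales, 7)[7])         # week-over-week growth on day 8
```

Shifting by 7 days is what makes "this week vs. the same period last week" possible; the growth rate is then just the relative difference between the two aligned series.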

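Having covered the strengths, now for Superset's pain points. The biggest one is that managing charts and dashboards becomes inconvenient once there are many of them. This would be easy to fix by adding groups or folders to chart and dashboard management, but no such feature has appeared so far. Right now the company's 900-plus charts sit in one big list, and although Superset supports Search, Filter, and Favorite, finding things is still too cumbersome.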

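Superset's documentation is also rather poor. The installation and quick start guides are quite complete, but documentation of specific features is seriously lacking. Even where a feature is documented, the structure is confusing, so most features have to be figured out by trial and error. Fortunately, the tool itself is not hard to use, so exploring its features on your own is not too difficult.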

Redash

Redash Demo

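If Superset sets out to build a BI platform, Redash's goal is more narrowly to visualize query results well. Redash supports many kinds of data sources: besides the most common SQL databases, it also supports MongoDB, Elasticsearch, Google Spreadsheets, and even a JSON file. Redash's official documentation lists all the data sources it supports.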

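Unlike Superset, it does not require you to define tables and metrics before creating a chart; it visualizes the result of a SQL query directly, which makes it very easy to get started with. Put another way, Redash implements only the equivalent of Superset's SQL Lab, but takes that feature as far as it can go.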

Redash has two very practical features: Query Snippets and Query Parameters.

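Query Snippets neatly solve the problem of reusing query fragments. Data reports often require very complex SQL, and some fragments of those statements are bound to be reusable across multiple queries. In Redash we can define such fragments as snippets and then reuse them easily.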
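Conceptually, a snippet is a named piece of SQL that gets pasted into larger queries. The Python below does not reproduce how Redash inserts snippets; it only imitates the underlying idea with a dictionary of named fragments. The snippet names, the `{snippet:name}` placeholder syntax, and the `expand_snippets` helper are all made up for this illustration.

```python
import re

# Hypothetical snippet library: name -> reusable SQL fragment.
SNIPPETS = {
    "active_users": "SELECT user_id FROM events WHERE event_date >= CURRENT_DATE - 30",
    "paid_orders": "status = 'paid' AND amount > 0",
}

# Invented placeholder syntax for this sketch: {snippet:name}
PLACEHOLDER = re.compile(r"\{snippet:(\w+)\}")

def expand_snippets(sql, snippets=SNIPPETS):
    """Replace every {snippet:name} placeholder with the named fragment."""
    def substitute(match):
        name = match.group(1)
        if name not in snippets:
            raise KeyError(f"unknown snippet: {name}")
        return snippets[name]
    return PLACEHOLDER.sub(substitute, sql)

query = """
SELECT COUNT(*) FROM orders
WHERE {snippet:paid_orders}
  AND user_id IN ({snippet:active_users})
"""
print(expand_snippets(query))
```

Editing a fragment in one place then changes every query that uses it, which is the maintenance benefit snippets are meant to bring.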

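Query Parameters let you add customizable parameters to a query, making a chart more flexible. Take an app's daily active users metric: sometimes I want to split it by iOS/Android, sometimes by region, and sometimes by new vs. returning users. On a Superset dashboard I would need three charts. In Redash I can make the query's GROUP BY a parameter, so a single chart handles all three. When using it, operations staff pick how to split the data from a dropdown above the chart, which is very intuitive.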
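In Redash, query parameters are written inside double curly braces, such as `{{ dimension }}`, and a parameter can be shown as a dropdown list. The Python sketch below imitates what happens when the GROUP BY column itself is a parameter: the chosen value is substituted into the SQL text before the query runs. The table, the columns, and the `render_query` helper are assumptions for illustration, not Redash code. The whitelist stands in for the fixed set of options a dropdown offers, and it matters because the value is spliced directly into the SQL.

```python
import re

# Hypothetical DAU query; {{ dimension }} is the parameter that picks the split.
DAU_QUERY = """
SELECT event_date, {{ dimension }} AS segment, COUNT(DISTINCT user_id) AS dau
FROM app_events
GROUP BY event_date, {{ dimension }}
ORDER BY event_date
"""

# Options a dropdown above the chart might offer: label -> column name.
DIMENSION_OPTIONS = {
    "Platform": "platform",      # iOS / Android
    "Region": "region",
    "User type": "is_new_user",  # new vs. returning users
}

PARAM = re.compile(r"\{\{\s*(\w+)\s*\}\}")

def render_query(template, params, allowed):
    """Fill {{ name }} placeholders, accepting only values listed in allowed[name].

    The value is spliced into the SQL text as an identifier, so anything
    outside the whitelist is rejected instead of being passed through.
    """
    def substitute(match):
        name = match.group(1)
        value = params[name]
        if value not in allowed.get(name, set()):
            raise ValueError(f"{value!r} is not an allowed value for {name!r}")
        return value
    return PARAM.sub(substitute, template)

allowed = {"dimension": set(DIMENSION_OPTIONS.values())}
# An operator chooses "Region" in the dropdown.
print(render_query(DAU_QUERY, {"dimension": DIMENSION_OPTIONS["Region"]}, allowed))
```

One query and one chart now cover all three splits; only the dropdown value changes.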

Parameterized Query

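Redash dashboards can be grouped by name: a dashboard's name can start with a prefix ending in a colon, and dashboards with the same prefix are automatically placed in one group. For example, the dashboards "Growth: Daily" and "Growth: Weekly" both end up in the "Growth" group.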
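The naming rule is simple enough to express as code. This Python sketch groups dashboard names by the text before the first colon; it mirrors the behavior described above but is not Redash's own implementation, and collecting names without a colon under `None` is an assumption of the sketch.

```python
from collections import defaultdict

def group_dashboards(names):
    """Group dashboard names by the prefix before the first colon.

    Names without a colon are collected under None (an assumption of this sketch).
    """
    groups = defaultdict(list)
    for name in names:
        prefix, sep, _ = name.partition(":")
        key = prefix.strip() if sep else None
        groups[key].append(name)
    return dict(groups)

print(group_dashboards(["Growth: Daily", "Growth: Weekly", "Revenue: Monthly", "Overview"]))
# {'Growth': ['Growth: Daily', 'Growth: Weekly'], 'Revenue': ['Revenue: Monthly'], None: ['Overview']}
```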

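Compared with Superset, Redash's documentation is better: besides a quick start tutorial, every feature module is documented, and the documentation is clearly organized.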

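Of course, Redash has its shortcomings too. It offers considerably fewer visualization types than Superset (though in practice they are enough). Also, because it purely visualizes query results, it lacks Superset's aggregation and comparison operations along the time dimension.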

Metabase

Metabase Demo

I have not used Metabase in a production environment, only tried it on my own laptop, so I can only share my first impressions.

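From the moment I started using it, I found the interface beautiful; it has clearly been fine-tuned by a UI designer. By contrast, Superset and Redash are obviously the work of programmers doubling as designers.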

After using it for a while, I feel that although Metabase and Superset both aim to be complete BI platforms, their philosophies differ. Metabase puts great emphasis on the experience of non-technical users (e.g. product managers, marketing and operations staff), giving them the freedom to explore data and answer their own questions. In Superset or Redash, non-technical users can mostly only view pre-built dashboards, and without knowing SQL or the database structure it is hard for them to explore on their own. I really like Metabase's philosophy, which is closer to a full-fledged commercial product. Of course, turning this vision into reality is very challenging, and for now I don't know whether Metabase holds up as well as hoped in a complex, real-world business environment.

It is also worth mentioning that Metabase has the best and most complete documentation of the three projects, with very rich content.

If I get the chance, I would love to explore this product in more depth.

Summary

This article briefly introduced three open source data visualization tools: Superset, Redash, and Metabase. Each has its own strengths, and I don't think any one of them is the best overall. For companies just starting to build a BI platform, I believe any of them can meet most reporting and business analysis needs.

 
