Superset vs Redash vs Metabase

Attention and activity

Looking at a project's stars on GitHub is the fastest way to judge its maturity. But beyond the number of stars, what else matters on the project's GitHub page? I suggest taking a look at the project's Insights. First, let's look at Superset's activity over the last month.

This picture tells us the following:

  • The project has had 53 commits in the last month, indicating that it is still under active development. The figure shows 210,000 lines of code added in the last month, but that is mainly because a huge geographic data file was committed. Excluding this file, the actual number of new lines is about 2,000.
  • Judging from the newly opened and closed Issues and PRs, the community is very active, and the developers are actively solving problems. By these short-term indicators, the project's activity is healthy and the developers are engaging with the community.

Next, let's look at the project's long-term metrics: commit history and distribution of contributors.

The project's development has been steady: the number of commits over the last year is about the same as the year before. But one potential risk for Superset is that almost all commits in the past six months have come from the same developer, which is not a good sign for a project of this scale. First, as attention grows, one person's output may not keep up with the community's needs, and the project's iteration speed will suffer. Second, the project becomes heavily exposed to the main developer's personal circumstances, such as waning enthusiasm for the project or being unable to keep investing time because of work. The ideal situation is to have two or more main developers.
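The single-developer risk can be put into a number by measuring what share of recent commits comes from the most active author. The following is a minimal sketch of that idea; the author names and counts are invented sample data, not the real figures for any of the three projects.

```python
from collections import Counter

def top_contributor_share(commit_authors):
    """Return (author, share) for the author with the most commits."""
    if not commit_authors:
        raise ValueError("need at least one commit")
    counts = Counter(commit_authors)
    author, commits = counts.most_common(1)[0]
    return author, commits / len(commit_authors)

# Invented sample: one author name per commit in the period being examined.
authors = ["dev_a"] * 45 + ["dev_b"] * 3 + ["dev_c"] * 2

author, share = top_contributor_share(authors)
print(f"{author}: {share:.0%} of commits")  # dev_a: 90% of commits
```

For a real repository, the list of author names could be taken from the output of `git log --since="6 months ago" --format=%an`, one name per line.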

We can analyze Redash and Metabase in the same way. The overall situation of the three projects is shown in the table below.

This shows that although Superset is far ahead of the other two projects in GitHub stars, it lags behind in iteration speed and number of developers.

Redash had only one main developer for its first three years, but a second main developer joined after October 2017. Overall, Superset and Redash are still one-person shows, and only Metabase has a team behind it. In terms of product completeness and update speed, Metabase is also the best of the three.

Technology Architecture

The technical architecture discussed here covers the development language, the core framework, and the third-party components used. At present, when I evaluate the technical architecture of an open source project, I mainly examine the following aspects:

  1. Is your team familiar with the technology stack the project uses?
  2. Will the current technical architecture hinder the future development of the project?
  3. Is the project difficult to deploy in a production environment?
  4. Does it provide a complete external interface?

The importance of 1), 2), and 3) is obvious, so here is some extra explanation for 4). The external interface here generally means a RESTful API. Why is this important? First, if a project has a good RESTful API, it is easy to integrate with other systems, and you can do a lot of secondary development without changing the source code. Second, although working in the user interface is intuitive, when a lot of repetitive work is required it is more efficient to write a script that calls the API. For example, many of a company's reports are similar, and creating them through the API avoids repetitive clicking in the interface and reduces errors.
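To make the point about scripting concrete, here is a rough sketch that generates several near-identical report definitions from one template and wraps each in an HTTP POST request. The base URL, the `/api/reports` path, the payload fields, and the bearer token header are all hypothetical placeholders, not the API of any of the three projects.

```python
import json
import urllib.request

# Hypothetical values for illustration only; substitute the real base URL,
# endpoint, and payload schema of the tool being scripted.
BASE_URL = "https://bi.example.com"
REPORT_ENDPOINT = "/api/reports"

def build_report_payload(region):
    """Fill one report definition from a shared template."""
    return {
        "name": f"Weekly sales - {region}",
        "query": "SELECT week, SUM(amount) FROM sales "
                 "WHERE region = :region GROUP BY week",
        "parameters": {"region": region},
        "chart_type": "line",
    }

def build_create_request(payload, token):
    """Wrap a payload in a POST request; nothing is sent until urlopen is called."""
    return urllib.request.Request(
        BASE_URL + REPORT_ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )

regions = ["north", "south", "east", "west"]
requests_to_send = [
    build_create_request(build_report_payload(region), token="YOUR_API_TOKEN")
    for region in regions
]

for req in requests_to_send:
    # To actually create the reports, call urllib.request.urlopen(req) here.
    print(req.get_method(), req.full_url, json.loads(req.data)["name"])
```

Four reports that differ only in one parameter come out of a single template, which is the kind of repetition that is tedious and error-prone to do by hand in the interface.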

Superset's technical architecture

Next, we will make a technical analysis of Superset from the above four aspects.

The backend of Superset is developed in Python, and the main open source components used include:

- Flask App Builder (FAB for short)
- SQLAlchemy

My current team's main language is Python, so that's a plus. SQLAlchemy is a very mature ORM, and there is nothing wrong with it. The problem is FAB. Note: don't confuse this open source component with Flask. FAB is an application development framework built on top of Flask. It can automatically generate a front-end interface for create, read, update, and delete operations from the database table structure, similar in function to Django Admin.

FAB may have saved some front-end development time in Superset's early stages, but in the medium and long term it seriously limits the flexibility of the Superset interface. In the last article, I complained about the inconvenient management of Dashboards in Superset and the complicated permission system, both of which are actually constrained by FAB. In addition, FAB itself is in a semi-dead state. From the records on GitHub, it has not been updated since 2016.

In my opinion, the Superset developers did not think it through when they chose FAB as the core framework. When choosing a framework, I feel you should pick a set of handy tools, not a semi-finished product. Good tools improve your development efficiency from start to finish. A semi-finished product lets you build an MVP quickly in the early stage, but it will definitely stand in your way in the long run. FAB falls into the latter category. If you are building a simple management system or a prototype, FAB is a good choice. But Superset aims to be an excellent open source business analytics platform, and for that FAB is destined to be a stumbling block.

On the front end, Superset uses FAB to generate most of the management interface, while the parts that require high interactivity, such as charts and the SQL editor, are implemented with React + Redux. This mixed mode makes the front-end code a bit messy, and in the final analysis FAB is again the root cause.

Deploying Superset is very simple. The web server is a standard WSGI application, and the storage layer supports any SQL database that SQLAlchemy supports, so both high availability and horizontal scaling are easy to set up.

As for the API, FAB natively supports a RESTful API and can perform CRUD operations on most objects. However, the authentication method is not flexible enough: it only works through cookies, which is unfriendly to scripts and server-side calls. So the first extension we made to Superset was to add API Token authentication.
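The idea behind token authentication can be sketched in a few lines. This is not the extension described above, only an illustration of the technique: the `Token` header scheme, the `API_TOKENS` table, and the function names are assumptions made for the example.

```python
import hmac

# Hypothetical token table: token string -> user name. A real deployment
# would keep hashed tokens in the database rather than plain text in code.
API_TOKENS = {"s3cr3t-token-for-etl": "etl-bot"}

def user_from_token(headers):
    """Return the user for a valid 'Authorization: Token <value>' header, else None."""
    scheme, _, presented = headers.get("Authorization", "").partition(" ")
    if scheme != "Token" or not presented:
        return None
    for token, user in API_TOKENS.items():
        # compare_digest avoids leaking the match position through timing.
        if hmac.compare_digest(token.encode(), presented.encode()):
            return user
    return None

def authenticate(headers, session_user=None):
    """Prefer an existing cookie session; otherwise fall back to the API token."""
    return session_user or user_from_token(headers)

print(authenticate({"Authorization": "Token s3cr3t-token-for-etl"}))  # etl-bot
print(authenticate({"Authorization": "Token wrong"}))                 # None
print(authenticate({}, session_user="alice"))                         # alice
```

Because the token travels in a request header, a script or another server can call the API without first going through a browser login to obtain a cookie.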

Redash's technical architecture

The server side of Redash is also written in Python. The web framework is based on Flask and makes full use of Flask's plugin ecosystem, mainly using the following components:

- API Framework: Flask-RESTful
- Database: Flask-SQLAlchemy
- Authentication: Flask-Login

Personally, I think Redash's choices are more sensible than Superset's. It uses typical tool-type components, which will not limit the project's future development. The three open source components listed above are all very mature and widely used in the Python community.

The front end of Redash is a pure single-page application, implemented with AngularJS (1.5), with a clear structure and clean code. But as we all know, Angular made huge architectural changes from v2 onward, which leaves AngularJS v1 in an awkward position. This is similar to the current situation with Python 2: there will be no problem in the short term, but it is a hidden danger in the long run.

In terms of deployment, Redash relies on Redis in addition to the SQL database, but Redis is only used to hold query locks (to prevent multiple identical queries from running at the same time) and does not require persistence. In general, deploying Redash is also relatively simple. In addition, Redash provides ready-made AWS images and a docker-compose configuration for the development environment, which is very considerate for both operators and developers.
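A query lock of this kind can be sketched as "set a key only if it does not exist yet", which is what Redis offers through `SET` with the `NX` option. The code below is an illustration of the technique and not Redash's actual implementation: the key format and function names are invented, and an in-memory class stands in for Redis so the example runs on its own.

```python
import hashlib

class InMemoryLockStore:
    """Stand-in for Redis; set_if_absent mimics the semantics of SET key value NX."""

    def __init__(self):
        self._data = {}

    def set_if_absent(self, key, value):
        if key in self._data:
            return False
        self._data[key] = value
        return True

    def delete(self, key):
        self._data.pop(key, None)

def lock_key(data_source_id, query_text):
    """Identical query text on the same data source maps to the same key."""
    digest = hashlib.sha256(query_text.encode("utf-8")).hexdigest()
    return f"query_lock:{data_source_id}:{digest}"

def try_start_query(store, data_source_id, query_text, job_id):
    """Return True if this caller acquired the lock and should run the query."""
    return store.set_if_absent(lock_key(data_source_id, query_text), job_id)

def finish_query(store, data_source_id, query_text):
    """Release the lock once the query has completed."""
    store.delete(lock_key(data_source_id, query_text))

store = InMemoryLockStore()
print(try_start_query(store, 1, "SELECT 1", "job-1"))  # True: first caller runs it
print(try_start_query(store, 1, "SELECT 1", "job-2"))  # False: same query in flight
finish_query(store, 1, "SELECT 1")
print(try_start_query(store, 1, "SELECT 1", "job-3"))  # True: lock was released
```

Since a lock only matters while a query is running, losing the lock data on a restart costs nothing, which is why persistence is not needed.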

Redash also provides a complete RESTful API. Its front-end single-page application talks to the back end through this same API, so in theory anything done in the front-end interface can also be done through the API. The API natively supports API Token authentication.

Overall, Redash does a better job than Superset in terms of technical architecture.

Metabase's technical architecture

The backend of Metabase is written in Clojure, and the frontend is a single page application written in React + Redux.

Since I know almost nothing about Clojure, it's hard for me to comment on the backend architecture. React + Redux is one of the most popular front-end stacks. Metabase's front end is very well decomposed into modules, so I give Metabase full marks for front-end architecture.

Metabase is the only one of the three projects that provides complete API documentation. With rich APIs and documentation, developers can do a lot of secondary development even if they do not know Clojure at all.

In terms of deployment, Metabase provides a Jar file, a Mac application, and Docker images that let users quickly try the project locally. For production, it provides thorough documentation on deploying to AWS, Heroku, and Kubernetes, which is very thoughtful.

Size and quality of source code

Below are the lines of source and test code for the three projects as of January 21, 2017.

In terms of source code size, Metabase is significantly larger than the other two projects, which suggests that it has richer functionality. On the other hand, a huge code base makes reading the source and doing secondary development more difficult. Superset is slightly larger than Redash, which matches the positioning of the two projects.

The quality of source code can be analyzed both quantitatively and qualitatively. The ratio of test code lines to functional code lines is an important quantitative indicator, and Metabase is far ahead of the other two projects in this regard. Superset and Redash also do well and are basically at the same level. Quantitative analysis can also look at unit test coverage and the results of static code analysis tools, but I won't go into details here.

Qualitative analysis means reading the source code and making a subjective assessment of its logical organization and readability. Here my subjective evaluation is that Metabase has the highest code quality: the overall structure is clear, the module boundaries are reasonable, and the code is clean to an almost obsessive degree, which makes it a pleasure to read. Redash's code structure is also very clean and ranks second, and Superset is slightly behind, in last place. This result is basically consistent with the conclusion of the quantitative analysis.
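The ratio indicator is simple arithmetic: lines of test code divided by lines of functional code. A small sketch follows; the project names and line counts are placeholders invented for the example, not the measured values for Superset, Redash, or Metabase.

```python
def ratio_of_tests_to_code(source_lines, test_lines):
    """Lines of test code per line of functional code."""
    if source_lines <= 0:
        raise ValueError("source_lines must be positive")
    return test_lines / source_lines

# Placeholder figures: (functional code lines, test code lines).
projects = {
    "project_a": (40_000, 10_000),
    "project_b": (25_000, 5_000),
    "project_c": (80_000, 40_000),
}

ranked = sorted(
    projects.items(),
    key=lambda item: ratio_of_tests_to_code(*item[1]),
    reverse=True,
)

for name, (source_lines, test_lines) in ranked:
    ratio = ratio_of_tests_to_code(source_lines, test_lines)
    print(f"{name}: {ratio:.2f}")
```

A higher ratio means more test code per line of functional code. It says nothing about how good the tests are, which is why coverage and static analysis are useful complements.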

Summary

This article uses the comparison of Superset, Redash, and Metabase as an example to introduce some principles for selecting open source projects. At the beginning of this article, I mentioned that if I had to choose a data visualization tool again, I would choose Redash. This is mainly because Redash has a reasonable technical architecture, and Python also happens to be the language the team is most familiar with.

Of course, I have to mention that the more I learn about Metabase, the more I feel that there must be a very good team behind it. Inside and out, the product exudes a sense of quality and a pursuit of excellence. I hope it will gain more attention and greater success!
