From 0 to 1: introducing dataService, an open source big data service platform

1. Background & current situation

I have worked in big data for many years. In both large and small companies, statistical data often has to be queried and displayed, for example on large dashboard screens, in reports, or as a data source for online services. Each such need means writing a set of interface services in code and going through the full develop-test-release process. Development efficiency is low: building a single service takes roughly 0.5 to 1 day.

In fact, many large vendors have already built such platforms and even sell them as separate products (for example, the data service in Alibaba DataWorks and NetEase Mammoth EasyDS), but the prices are not low. So I decided to develop such a platform based on my own work experience. Because my previous open source data comparison platform was named dataCompare, this platform is named dataService, that is, data service.

It mainly solves the following problems:

(1) Developing an API interface service requires the full develop-test-release process, which usually takes 0.5 to 1 day

(2) Different storage engines are chosen to meet different data volume requirements (such as MySQL, Oracle, HBase, Doris, etc.), so the code written for each storage engine is inconsistent

(3) Interface services are not standardized: different developers build interfaces in inconsistent ways

(4) Data and interfaces cannot be reused. Different businesses pick the same data table and each build their own interface service, which leads to redundant interfaces and data.

(5) It is not clear which applications access which data, so when a data processing task is suspended it is not clear which businesses will be affected

2. Goal

To solve the above problems, I developed an open source big data service platform, dataService, with the following goals:


(1) API services can be developed, tested and released by writing SQL (low code) or by point-and-click configuration in the UI. Storage-specific adaptation code is no longer needed, and development efficiency increases by at least 50%.

(2) Interface standardization: interfaces are standardized through the data service platform, so that the differing habits of individual developers no longer produce inconsistent standards

(3) Build an API market so that interfaces can be reused, avoiding data and interface redundancy

(4) Connect the full link from data processing to data services through data lineage and interface lineage
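To make goal (4) concrete, here is a minimal Python sketch of how lineage could answer the question "which businesses are affected if this table stops updating?". It is only an illustration under my own assumptions: the `Lineage` class, its method names and the sample API and application names are hypothetical and are not taken from the dataService code.

```python
from collections import defaultdict


class Lineage:
    """Hypothetical in-memory lineage store: table -> APIs -> calling apps."""

    def __init__(self):
        self._apis_by_table = defaultdict(set)
        self._apps_by_api = defaultdict(set)

    def record_api(self, api: str, tables: list[str]) -> None:
        """Register which tables an API service reads from."""
        for table in tables:
            self._apis_by_table[table].add(api)

    def record_call(self, app: str, api: str) -> None:
        """Register that an application calls an API service."""
        self._apps_by_api[api].add(app)

    def impacted(self, table: str) -> dict[str, list[str]]:
        """Return the APIs reading `table`, each with its calling apps."""
        return {
            api: sorted(self._apps_by_api.get(api, ()))
            for api in sorted(self._apis_by_table.get(table, ()))
        }


lineage = Lineage()
lineage.record_api("user_stats_api", ["dws_user_stats"])
lineage.record_call("big_screen", "user_stats_api")
lineage.record_call("daily_report", "user_stats_api")
affected = lineage.impacted("dws_user_stats")
```

With the sample data above, `affected` lists `user_stats_api` together with the two applications that call it, which is exactly the information missing in problem (5).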

3. Introduction to the core functions of the system

Currently dataService has completed the following functions:

(1) API services can be developed and tested through simple configuration or low-code methods such as writing SQL

(2) Data sources such as MySQL, Doris, and Hive are currently supported
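Supporting several engines largely comes down to turning a data source type into the right connection string. The following is a minimal Python sketch of that idea, not dataService's actual code: the `DataSource` class, its fields and `build_jdbc_url` are hypothetical names, and the sketch assumes connections are made over JDBC. Doris gets a MySQL-style URL because it is compatible with the MySQL protocol.

```python
from dataclasses import dataclass


# Hypothetical descriptor for a registered data source; the field names are
# illustrative and not taken from the dataService code base.
@dataclass(frozen=True)
class DataSource:
    kind: str       # "mysql", "doris" or "hive"
    host: str
    port: int
    database: str


# JDBC URL template per data source type (assumption: JDBC is used).
URL_TEMPLATES = {
    "mysql": "jdbc:mysql://{host}:{port}/{database}",
    "doris": "jdbc:mysql://{host}:{port}/{database}",  # MySQL protocol
    "hive": "jdbc:hive2://{host}:{port}/{database}",
}


def build_jdbc_url(ds: DataSource) -> str:
    """Return the JDBC URL for a data source; raise for unknown types."""
    try:
        template = URL_TEMPLATES[ds.kind.lower()]
    except KeyError:
        raise ValueError(f"unsupported data source type: {ds.kind}") from None
    return template.format(host=ds.host, port=ds.port, database=ds.database)


url = build_jdbc_url(DataSource("hive", "hive-host", 10000, "dw"))
```

Keeping the templates in one table means adding a new storage type is a one-line change instead of new adaptation code in every service.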

The overall process is as follows:

(1) Process 1: Create a new data source - create an API service

(2) Process 2: Create a data source connection according to the data source type

(3) Process 3: Create a new API service - configure SQL - test the API service - go online. An example SQL configuration:

SELECT
  name,
  addr as address,
  sum(num) as total_num
FROM
  table_name
WHERE
  user_id = ${uid}
GROUP BY
  name,
  addr;
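Notes on the SQL configuration:
(1) The fields in the SELECT clause are the API's return parameters
(2) If a field alias is defined, the return parameter takes the alias as its name
(3) SQL functions are supported
(4) Parameters in the WHERE clause are the API's request parameters, written as ${parameter_name}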
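The rules above can be sketched in a few lines of Python. This is my own simplified illustration, not the parser dataService actually uses: the function names are hypothetical, the SELECT-list handling only covers simple cases (no subqueries, no commas inside string literals), and `%s` is assumed as the placeholder style because common Python MySQL drivers accept it.

```python
import re

PARAM = re.compile(r"\$\{(\w+)\}")


def request_params(sql: str) -> list[str]:
    """Names of ${...} placeholders, in first-seen order, without duplicates."""
    return list(dict.fromkeys(PARAM.findall(sql)))


def response_fields(sql: str) -> list[str]:
    """Return parameter names: the alias if present, else the expression."""
    match = re.search(r"select\s+(.*?)\s+from\s", sql, re.I | re.S)
    if match is None:
        raise ValueError("not a SELECT ... FROM statement")
    fields, depth, current = [], 0, ""
    for ch in match.group(1):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == "," and depth == 0:   # split only on top-level commas
            fields.append(current)
            current = ""
        else:
            current += ch
    fields.append(current)
    # The last token is the alias ("expr as alias") or the bare field itself.
    return [field.split()[-1] for field in fields]


def bind(sql: str, values: dict) -> tuple[str, list]:
    """Swap each ${name} for a driver placeholder and collect the arguments."""
    args = []

    def repl(m):
        name = m.group(1)
        if name not in values:
            raise KeyError(f"missing request parameter: {name}")
        args.append(values[name])
        return "%s"

    return PARAM.sub(repl, sql), args


sql = """SELECT name, addr as address, sum(num) as total_num
FROM table_name WHERE user_id = ${uid}"""
params = request_params(sql)
fields = response_fields(sql)
bound_sql, bound_args = bind(sql, {"uid": 42})
```

Passing the values separately from the SQL text, rather than pasting them into the string, lets the database driver handle escaping and avoids SQL injection through request parameters.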

(4) Process 4: Publish the API service to the API market so that business teams can call it

4. System architecture design

The data service platform unifies data services, which makes data service governance and metric unification easier. It improves business development efficiency and lets teams respond to business changes faster. The platform is divided into the following three layers:

(1) Data application access layer: mainly for access by external applications, including HTTP service, RPC service and Client service

(2) Data service parsing layer: accesses the various data stores mainly through SQL and generates the corresponding data services. Core functions: SQL parsing, SQL verification, SQL routing, data query

(3) Data storage layer: mainly data storage management. MySQL, Redis, Doris, Hive, etc. are well supported and can be exposed as API services
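The verification, routing and query steps of the parsing layer can be sketched as follows. This is a simplified Python illustration under my own assumptions, not the real implementation: `validate`, `Router` and `sqlite_executor` are hypothetical names, the check is a naive string test rather than a real SQL parser (a semicolon inside a string literal would be rejected), and an in-memory SQLite database stands in for the storage layer so that the example runs on its own.

```python
import sqlite3
from typing import Callable

Executor = Callable[[str, list], list[dict]]


def validate(sql: str) -> str:
    """SQL verification: accept one read-only SELECT statement only."""
    stmt = sql.strip().rstrip(";").strip()
    if ";" in stmt:
        raise ValueError("only one statement is allowed")
    if not stmt.lower().startswith("select"):
        raise ValueError("only SELECT statements are allowed")
    return stmt


class Router:
    """SQL routing: send a verified statement to the named data source."""

    def __init__(self):
        self._executors: dict[str, Executor] = {}

    def register(self, source: str, executor: Executor) -> None:
        self._executors[source] = executor

    def query(self, source: str, sql: str, args: list) -> list[dict]:
        stmt = validate(sql)
        if source not in self._executors:
            raise KeyError(f"unknown data source: {source}")
        return self._executors[source](stmt, args)


def sqlite_executor(conn: sqlite3.Connection) -> Executor:
    """Data query: run the statement and return rows as dictionaries."""
    def run(sql: str, args: list) -> list[dict]:
        cur = conn.execute(sql, args)
        columns = [d[0] for d in cur.description]
        return [dict(zip(columns, row)) for row in cur.fetchall()]
    return run


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (name TEXT, num INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [("a", 1), ("a", 2), ("b", 5)])

router = Router()
router.register("demo", sqlite_executor(conn))
rows = router.query(
    "demo",
    "SELECT name, sum(num) AS total_num FROM t WHERE name = ? GROUP BY name;",
    ["a"],
)
```

The access layer (HTTP, RPC or client) would only need to call `router.query` and serialize the returned rows, so each new storage engine is just one more registered executor.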

5. System function demonstration

[Screenshots: home page; add data source; data source management; new API service; API service test]

6. Subsequent planning

(1) Automatic data push and retrieval for online API services running in pull mode

(2) API service rate limiting and monitoring

(3) Service lineage detection and service orchestration
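Rate limiting is not implemented yet, so the following is only a sketch of one common technique that could be used for item (2), a token bucket, written in Python. The `TokenBucket` class and its parameters are hypothetical and say nothing about how dataService will eventually do it.

```python
import time


class TokenBucket:
    """Allow bursts of up to `capacity` calls, refilled at `rate` per second."""

    def __init__(self, rate: float, capacity: int, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self._clock = clock          # injectable so the limiter can be tested
        self._tokens = float(capacity)
        self._last = clock()

    def allow(self) -> bool:
        """Return True and consume a token if the call may proceed."""
        now = self._clock()
        elapsed = now - self._last
        self._last = now
        # Refill for the elapsed time, never beyond the bucket capacity.
        self._tokens = min(self.capacity, self._tokens + elapsed * self.rate)
        if self._tokens >= 1:
            self._tokens -= 1
            return True
        return False


# One bucket per API service: 100 calls per second, bursts of up to 200.
limiter = TokenBucket(rate=100, capacity=200)
first_call_allowed = limiter.allow()
```

A request handler would call `allow()` before running the query and reject the call (for example with HTTP 429) when it returns False. Counting the accepted and rejected calls per API would also give a starting point for the monitoring half of item (2).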

7. The core code is open source

https://github.com/zhugezifang/dataService

https://gitee.com/ZhuGeZiFang/dataService

Origin blog.csdn.net/weixin_43291055/article/details/128782380