DataSphere Studio released: a one-stop portal for data application development

DataSphere Studio (DSS) is a one-stop data application development and management portal developed in-house by WeBank. Built on a pluggable integration framework and the Linkis computation middleware, it can easily integrate a variety of upper-layer Web systems, making data development simple and easy to use.

Open-source repositories:

GitHub: https://github.com/WeBankFinTech/DataSphereStudi

Gitee: https://gitee.com/WeBank/DataSphereStudio

01

What is DSS?

DataSphere Studio (DSS) is positioned as a data application development portal that covers the entire data application development process in a closed loop. Under a unified UI, it offers a workflow-style, graphical drag-and-drop development experience that meets the needs of every stage of data application development: data import, desensitization and cleaning, mining and analysis, quality inspection, visualization, scheduled execution, and data output.

Through its pluggable integration framework, DSS lets users easily customize it and quickly integrate various Web systems, so that a single page can meet all the needs of business users.

Where necessary, users can quickly and easily replace the functional components that DSS has already integrated, or add new ones.

By building on the connection, reuse, and simplification capabilities of the Linkis computation middleware, DSS inherently offers finance-grade execution and scheduling capabilities, including high concurrency, high availability, multi-tenant isolation, and resource management and control.

02

Why do you need DSS?

With the widespread adoption of big data technology, developing data applications today goes far beyond processing data and producing a few reports.

How to make data and business interact quickly, and how to turn data into reports quickly and efficiently to support business decisions, has become a core demand of almost every company.

The reality, however, is that business users facing a multitude of feature-rich data applications are often at a loss and do not know which to choose.

 

The following six pain points are thorny problems that almost every enterprise faces:

  1. There are many data applications but no unified entry point, so the user experience feels fragmented.

  2. A single business process involves several cooperating systems, so users must switch between systems frequently to get their work done.

  3. The boundaries between many data applications are unclear. Overlapping functionality wastes considerable manpower and makes collaboration between systems difficult, and users must spend extra time on repeated research and comparison before settling on a solution.

  4. Data dependencies across departments and business lines rely on verbal agreements about when data will be ready. If upstream data is delayed, downstream jobs suffer a chain reaction that can end in a data disaster.

  5. Sharing data and information between systems requires pairwise adapters, which makes calls complex and coupling high.

  6. There is no unified integration framework, so every integration between systems requires custom adaptation work.

03

The core concepts of DSS

DSS puts forward five core concepts that focus on solving the six pain points described above.

 

1. One-stop

One-stop is DSS's first exploratory step toward encouraging business users to take an active part in data development.

By providing a one-stop interface for developing and managing data applications, DSS spares users from asking around to find out whether a tool exists for their needs: all data development can be completed with the components available in DSS.

DSS is highly integrated. The latest open-source release already integrates the following systems:

  1. Data development and exploration: Scriptis

  2. Data visualization: Visualis (a secondary development based on CreditEase's Davinci)

  3. Data quality: Qualitis

  4. Scheduling: Azkaban

DSS's pluggable framework design lets users quickly replace any of the Web systems that DSS has integrated, for example replacing Scriptis with Zeppelin, or Azkaban with DolphinScheduler.

As a one-stop entry point for data application development, DSS helps users form a good habit: turn to DSS first when a development need arises, and explore other functional components only when DSS cannot meet it.

2. Fully connected

On the DSS workflow drag-and-drop editing page, every data application that DSS has integrated appears as a workflow node. Each node corresponds to one system function, which keeps functional boundaries clear, so users no longer have to choose between overlapping tools.

DSS workflow nodes can embed the front-end interface of the integrated data application, so users can edit and modify all business functions within a single workflow page.

DSS workflows let users connect multiple business functions together from a business perspective. Workflows support both real-time execution and scheduled execution, so the entire data application development process can be completed with simple drag and drop.
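
One way to picture such a workflow is as a directed acyclic graph in which each node names one integrated system function and runs only after its upstream nodes. The sketch below is a minimal illustration under that assumption; `Node`, `execution_order`, and the node type strings are hypothetical names chosen for this example, not DSS APIs.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter  # standard library, Python 3.9+


@dataclass
class Node:
    """One workflow node; node_type is a hypothetical label for the system behind it."""
    name: str
    node_type: str
    upstream: list = field(default_factory=list)


def execution_order(nodes):
    """Return node names in an order that respects upstream dependencies."""
    graph = {node.name: set(node.upstream) for node in nodes}
    return list(TopologicalSorter(graph).static_order())


# A linear chain, so the execution order is unambiguous.
workflow = [
    Node("import_data", "import"),
    Node("clean", "scriptis.sql", upstream=["import_data"]),
    Node("quality_check", "qualitis.check", upstream=["clean"]),
    Node("dashboard", "visualis.dashboard", upstream=["quality_check"]),
]

order = execution_order(workflow)
```

A real-time run would execute the nodes in this order immediately, while a scheduled run would hand the same graph to a scheduler.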

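At WeBank, DSS workflows have shortened the iteration cycle of business data applications from one week to one day, a 600% improvement in efficiency.

DSS workflows let users implement business logic simply and quickly, while also helping them understand the business better.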

 

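3. Pluggable

Pluggability is the most distinctive feature of DSS as a data application integration framework.

DSS works like a socket: the pluggable design is almost non-intrusive to existing external systems, which can be integrated quickly with only simple adaptation.

Through its pluggable integration framework, DSS lets users easily customize it and quickly integrate various Web systems, so that a single unified page can meet all of a user's business needs.

Pluggability lets the functional components of WeDataSphere stay independent of one another with clear system boundaries, while still fitting together organically to form WeDataSphere's one-stop, fully connected big data experience.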
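
The socket analogy can be sketched as a small registry in which each capability slot holds whichever system is currently plugged in. This is only an illustration of the idea, not how DSS is implemented; `SlotRegistry` and the slot names are hypothetical.

```python
class SlotRegistry:
    """Maps a capability slot (hypothetical names) to the system plugged into it."""

    def __init__(self):
        self._slots = {}

    def plug(self, slot, system):
        """Plug a system into a slot and return whatever it replaced, if anything."""
        previous = self._slots.get(slot)
        self._slots[slot] = system
        return previous

    def get(self, slot):
        """Return the system currently plugged into the slot."""
        return self._slots[slot]


registry = SlotRegistry()
registry.plug("script_editor", "Scriptis")
registry.plug("scheduler", "Azkaban")

# Swapping the scheduler leaves every other slot untouched.
replaced = registry.plug("scheduler", "DolphinScheduler")
```

Because callers ask the registry for a slot rather than a specific system, replacing Azkaban with DolphinScheduler does not affect the script editor slot.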

 

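4. Context

What is context?

Context is all the information needed to keep an operation going. For example, if you are reading three books at once, the page you have reached in each book is the context for continuing that book.

DSS context solves the problem of sharing data and information between nodes of a DSS workflow that span multiple systems.

For example, when system B needs data produced by system A, the usual approaches are:

  1. System B calls a data access interface developed by system A.

  2. System B reads data that system A has written to some shared storage.

With WorkflowContext, implemented on top of the Linkis computation middleware, DSS lets an integrated external system act as a sharing node or a reading node and share node information and node data with other external system nodes. External systems no longer need pairwise adapters, which reduces the complexity and coupling of calls between systems.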
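
The sharing-node and reading-node roles can be illustrated with a small in-memory store keyed by workflow. This is a sketch of the concept only: `WorkflowContextStore`, its methods, and the table name are hypothetical and do not reflect the actual WorkflowContext API in Linkis.

```python
class WorkflowContextStore:
    """In-memory stand-in for a shared workflow context (hypothetical API)."""

    def __init__(self):
        self._entries = {}

    def share(self, workflow_id, node, key, value):
        """Called by a sharing node to publish a value for the whole workflow."""
        self._entries[(workflow_id, key)] = (node, value)

    def read(self, workflow_id, key):
        """Called by a reading node; raises KeyError if nothing was shared."""
        _node, value = self._entries[(workflow_id, key)]
        return value


context = WorkflowContextStore()

# System A's node publishes where it wrote its result.
context.share("wf-1", "system_a_node", "result_table", "tmp.daily_orders")

# System B's node reads it without calling system A directly.
table = context.read("wf-1", "result_table")
```

Both systems depend only on the shared store, so adding a third system requires no new adapter between any pair of them.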

With DSS context, WeBank's WeDataSphere has achieved thorough decoupling, and the complexity of each functional component has dropped by at least 30%.

 

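5. Signal-based

Data dependencies across departments and business lines have long been recognized across the industry as a chronic, hard problem.

For example, department B's data mart depends on part of the data in department A's DWD (Data Warehouse Detail) layer.

How can we make sure that department B starts processing only after department A has finished?

The usual approach is for both sides to agree on a time window by which department A guarantees that its data is ready.

The idle gap in between not only greatly reduces the timeliness of data processing; if department A's processing is delayed, the downstream side faces a disaster.

As a data application development portal, DSS proposes a signal-based solution to data dependencies.

A data application system integrated into DSS only needs a signal node placed in front of it to coordinate execution across data dependencies that span businesses and systems.

At WeBank, DSS signals have made cross-system data dependencies simple, clear, and efficient, speeding up business data output by 30% on average and cutting the data delay rate by 90%.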
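
The difference from a fixed time window can be sketched as follows: the downstream job checks for an explicit "ready" signal from upstream and runs only once that signal exists. This is a minimal in-memory illustration; `SignalBus`, `run_when_ready`, the topic name, and the batch date are hypothetical and not part of DSS.

```python
class SignalBus:
    """In-memory record of which (topic, batch_date) signals have been sent."""

    def __init__(self):
        self._signals = set()

    def send(self, topic, batch_date):
        self._signals.add((topic, batch_date))

    def received(self, topic, batch_date):
        return (topic, batch_date) in self._signals


def run_when_ready(bus, topic, batch_date, job):
    """Run job only if the upstream signal has arrived; otherwise report waiting."""
    if not bus.received(topic, batch_date):
        return "waiting"
    job()
    return "done"


bus = SignalBus()
outputs = []


def build_mart():
    outputs.append("dept_b_mart")


# Department A has not finished yet, so department B does not start.
first = run_when_ready(bus, "dept_a.dwd_ready", "2020-01-01", build_mart)

# Department A finishes and sends its signal; department B can now run.
bus.send("dept_a.dwd_ready", "2020-01-01")
second = run_when_ready(bus, "dept_a.dwd_ready", "2020-01-01", build_mart)
```

The downstream job never runs on stale data, and it starts as soon as the signal arrives rather than at a conservatively late agreed time.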

04

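The core design of DSS

AppJoint is the socket of DSS's pluggable architecture and the cornerstone on which DSS builds its one-stop, fully connected, pluggable, and context capabilities.

AppJoint is the core concept that allows DSS to integrate upper-layer Web systems simply and quickly.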

 

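What is AppJoint?

AppJoint (application joint) is built on the Linkis computation middleware and defines a unified, standardized set of front-end and back-end integration specifications, allowing external data application systems to connect to DSS simply and quickly.

AppJoint's four specifications make integrating data application systems into DSS clear and convenient.

The Security and Project specifications are the core abstractions behind the one-stop experience.

  1. The Security specification resolves cross-domain login between the front ends and back ends of DSS and external systems.

  2. The Project specification aligns the organizational structure and permission model of DSS and external systems; it is the common standard for collaborative development in DSS.

The NodeService and NodeExecution specifications are the cornerstones of full connectivity.

  1. The NodeService specification links DSS workflow nodes with external systems.

  2. The NodeExecution specification handles task execution and interaction between DSS workflow nodes and external systems.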
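
To show how the four specifications might divide responsibilities, the sketch below models each as an abstract interface and implements all four in one toy adapter. Only the specification names come from the text above; every method name, signature, and return value is an assumption made for illustration and does not mirror the real AppJoint interfaces.

```python
from abc import ABC, abstractmethod


class SecuritySpec(ABC):
    @abstractmethod
    def login(self, user: str) -> str:
        """Return a session token so the user need not log in twice."""


class ProjectSpec(ABC):
    @abstractmethod
    def create_project(self, token: str, name: str) -> str:
        """Create a matching project in the external system; return its id."""


class NodeServiceSpec(ABC):
    @abstractmethod
    def create_node(self, project_id: str, node_name: str) -> str:
        """Create the external object behind a workflow node; return its id."""


class NodeExecutionSpec(ABC):
    @abstractmethod
    def execute(self, node_id: str) -> str:
        """Run the node's task in the external system; return a status."""


class DemoAppJoint(SecuritySpec, ProjectSpec, NodeServiceSpec, NodeExecutionSpec):
    """Toy adapter for an imaginary external system (all behavior is made up)."""

    def __init__(self):
        self._nodes = set()

    def login(self, user):
        return f"token-{user}"

    def create_project(self, token, name):
        if not token.startswith("token-"):
            raise PermissionError("not logged in")
        return f"project-{name}"

    def create_node(self, project_id, node_name):
        node_id = f"{project_id}/{node_name}"
        self._nodes.add(node_id)
        return node_id

    def execute(self, node_id):
        return "succeeded" if node_id in self._nodes else "failed"


joint = DemoAppJoint()
token = joint.login("alice")
project_id = joint.create_project(token, "demo")
node_id = joint.create_node(project_id, "report")
status = joint.execute(node_id)
```

Splitting the contract this way means a system that only needs single sign-on and project mapping could, in principle, implement the first two interfaces alone.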

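AppJoint also brings in the Linkis computation middleware, so integrated external data application systems quickly gain Linkis capabilities such as concurrency throttling and unified user resources.

In addition, the Linkis-based WorkflowContext allows context information to be shared across systems at node level, putting an end to application silos.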

05

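Data application components integrated in DSS

By implementing multiple AppJoints, DSS has already integrated a rich variety of upper-layer Web application systems that cover most data development needs.

Where needed, users can also easily integrate new Web application systems to replace parts of, or enrich, the DSS data application development process.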

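1. Data development: Scriptis

What is Scriptis?

Scriptis is a Web-based data analysis tool for writing SQL, Pyspark, HiveQL, and other scripts online and submitting them to Linkis for execution. It also supports enterprise features such as UDFs, functions, resource management and control, and intelligent diagnosis.

The Scriptis AppJoint brings Scriptis's data development capabilities into DSS and allows the various Scriptis script types to take part in application development as DSS workflow nodes.

The supported script node types currently include HiveSQL, SparkSQL, Pyspark, and Scala.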

 

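2. Data visualization: Visualis

What is Visualis?

Visualis is a BI tool for data visualization, built as a customized secondary development of Davinci, an open-source component from CreditEase.

The Visualis AppJoint brings Visualis's data visualization capabilities into DSS and allows big-screen displays and dashboards to serve as DSS workflow nodes linked to upstream data marts.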

 

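3. DSS scheduling capability: Azkaban

Users usually want many of their data applications to run on a periodic schedule.

The open-source scheduling systems currently available integrate poorly with other upper-layer data application systems and are hard to connect with them.

By implementing the Azkaban AppJoint, DSS lets users publish a finished workflow to Azkaban for scheduled execution with a single click.

DSS also defines a standard, general-purpose specification for parsing and publishing Linkis workflows, so other scheduling systems can be connected to DSS at low cost.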
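
The publishing specification itself is not described in this article. As a rough illustration of what publishing a workflow to a scheduler can involve, the sketch below renders workflow nodes as job property text using the `type`, `command`, and `dependencies` keys of Azkaban's classic `.job` file format. The `to_job_files` helper, the node names, and the `echo` commands are hypothetical placeholders, not what DSS actually generates.

```python
def to_job_files(nodes):
    """Render {node_name: [upstream names]} as {filename: .job property text}."""
    files = {}
    for name, upstream in nodes.items():
        lines = ["type=command", f"command=echo run {name}"]
        if upstream:
            # Azkaban expresses ordering through a comma-separated dependency list.
            lines.append("dependencies=" + ",".join(upstream))
        files[f"{name}.job"] = "\n".join(lines) + "\n"
    return files


job_files = to_job_files({"extract": [], "report": ["extract"]})
```

A different scheduler would need only a different rendering function over the same node and dependency data, which is the kind of low-cost hookup a common publishing specification aims for.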

 

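4. Data quality: Qualitis

The Qualitis AppJoint brings data quality verification into DSS, integrating the data quality system into DSS workflow development to check data for completeness, correctness, and similar properties.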

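5. Data sending: Sender

The Sender AppJoint brings data sending capabilities into DSS. It currently supports the SendEmail node type, and the result sets of all other nodes can be sent by email.

For example, a SendEmail node can send a Display big-screen view directly as an email.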

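6. Data signals: Signal

The Signal AppJoint strengthens both the decoupling of, and the links between, businesses and processes.

DataChecker: a node that checks whether a database, table, or partition exists.

EventSender: a node that sends messages across workflows and projects.

EventReceiver: a node that receives messages across workflows and projects.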
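
The kind of check a DataChecker node performs can be sketched against an in-memory catalog. The `missing_partitions` helper, the catalog layout, and the table and partition names are hypothetical; a real check would query the metastore instead.

```python
def missing_partitions(catalog, required):
    """Return the (table, partition) pairs in `required` that the catalog lacks."""
    return [
        (table, partition)
        for table, partition in required
        if partition not in catalog.get(table, set())
    ]


# Hypothetical catalog: table name -> set of existing partitions.
catalog = {"dwd.orders": {"ds=2020-01-01", "ds=2020-01-02"}}

required = [
    ("dwd.orders", "ds=2020-01-02"),
    ("dwd.users", "ds=2020-01-02"),
]
missing = missing_partitions(catalog, required)
```

A workflow would let downstream nodes proceed only when the returned list is empty.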

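7. Function nodes

Empty nodes and sub-workflow nodes.

8. Node extension

Where needed, users can quickly and easily replace the functional components that DSS has integrated, or add new ones.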

06

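Summary

As a data application development portal, DSS aims to provide a common standard for integrating and developing data applications, giving business users both the ability and the opportunity to take part in data application development.

For reasons of space, this article does not go into the architecture and implementation of DSS in detail.

We look forward to more contributions from the community to help the DSS + Linkis ecosystem grow.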


Origin www.oschina.net/news/111781/dataspherestudio-released