The metadata-driven architecture design of the official data space


The Taobao open platform is an important channel through which Alibaba connects with the external ecosystem. Through open product technology, it delivers a series of foundational services of the Alibaba economy (its "water, electricity, and coal") to merchants, developers, community media, and other partners, promoting customization, innovation, and evolution across the industry and ultimately contributing to a new ecosystem of commercial civilization.

Open business scenarios track changes in internal business and therefore change frequently at the data level. Traditional databases cannot meet the cost and ease-of-use requirements of these rapidly mutating ecosystem scenarios. The data space is our answer: a product that, while supporting rapid business growth, can store massive amounts of data, automatically scale a single table, extend fields without limit, and keep query efficiency no lower than a MySQL database. The challenge is to let different users customize their data models on demand on top of a unified data architecture, to guarantee that schema extensions and changes never affect the availability of their own or other tenants' business functions, and to fold the resulting data and capabilities back into the platform itself. To this end, we built an official elastic storage space that keeps data safe and controllable while supporting the standardized, open integration of more business scenarios.

This article is the second in this series. See the previous article:

Part 1: Evolution of Open Gateway Architecture


Multi-tenant data model 

Let's stand on the shoulders of giants and look at the common multi-tenant storage models in the industry.


The data space adopts a wide-table model similar to Salesforce's and is offered to open businesses as SaaS. From the platform's perspective, the metadata store holds the data of all tenants. From a tenant's perspective, each tenant/organization can only see and define its own version of metadata and data, isolated by its tenant OrgID, and can only perform actions authorized for that OrgID, so each tenant effectively gets its own version of the SaaS solution.

▐ Metadata-driven multi-tenant data architecture

The data space isolates each tenant's own version of metadata and data along the business scenario & tenant dimensions. The overall architecture is divided into the following five logical layers.

  1. Underlying data storage layer: stores tenants' underlying data together with the object and field definitions that describe it, and supports multiple storage types

  2. Storage engine routing layer: routes execution to different storage products according to business scenario and tenant customization

  3. Metadata layer: the metadata model consists mainly of objects, fields, and underlying data tables. The metadata execution engine is the core capability that maps the business logic model onto the unified metadata storage model; it covers what we usually call ORM, but is considerably more complex than that

  4. Platform interface layer: provides two calling methods, the data space SDK and the standard MySQL protocol. The SDK covers operations such as object model creation and simplified SQL execution; the standard MySQL protocol provides external database services through a self-built MySQL proxy layer

  5. Business scenario layer: the characteristic business scenarios served externally by the data space


▐ Metadata storage model

The entire data space storage uses a business-agnostic wide-table design with very good scalability. The underlying data is stored in dynamic columns, accompanied by metadata description information. The data model consists of the following parts:

  1. Object metadata table: stores users' custom object definitions

  2. Field metadata table: the definitions of the fields under each object, including field type, the elastic column the field maps to in real storage, whether an index is needed, field validation rules, and so on

  3. Data table: stores users' real business data in flexible, extensible elastic columns, mapped to real storage through the object and field metadata definitions

Unlike Salesforce, we do not maintain an index table, because index maintenance is very complicated. ADB has a full-column index, so no index needs to be built; for relational databases, skipping the shared wide-table mode and building indexes independently via DDL is more flexible.


Based on this design, the data space achieves flexible, imperceptible DDL for columns. To add a data column, we only need to pick an unused elastic column of the same value type in the Data table and bind it to the logical table's field model; no explicit DDL is required.
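The binding step can be sketched as follows (a minimal illustration under assumed names, not the platform's actual code): adding a field simply maps the logical name to the next free elastic column of the matching value type, so no physical DDL ever runs.

```java
import java.util.*;

// Sketch of metadata-driven, DDL-free field creation. Elastic columns are
// pre-provisioned per value type in the shared Data table; "adding a field"
// is just a metadata write that binds a logical name to one of them.
public class FieldAllocator {
    // logical field name -> physical elastic column, per object
    private final Map<String, Map<String, String>> fieldMeta = new HashMap<>();
    // next free elastic-column index per (object, value type)
    private final Map<String, Integer> nextIndex = new HashMap<>();

    /** Bind a logical field to the next free elastic column of its type. */
    public String addField(String object, String field, String valueType) {
        String key = object + ":" + valueType;
        int idx = nextIndex.merge(key, 1, Integer::sum) - 1;
        String column = valueType.toLowerCase() + "_value_" + idx; // e.g. string_value_0
        fieldMeta.computeIfAbsent(object, o -> new HashMap<>()).put(field, column);
        return column;
    }

    /** Resolve a logical field to its physical column (used by the execution engine). */
    public String resolve(String object, String field) {
        return fieldMeta.getOrDefault(object, Map.of()).get(field);
    }
}
```

Adding a hypothetical `buyer_remark` field to an `order` object is then just a metadata write; at query time the engine resolves it to, say, `string_value_0`.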

▐ Metadata Execution Engine

The metadata execution engine consists of five parts: metadata mapping, statement construction, SQL assembly, statement execution, and reverse metadata mapping. A user's business SQL is mapped by the engine into business-agnostic SQL over the metadata storage model and routed to the corresponding logical tenant database for execution; after execution succeeds, the results are mapped back through the metadata into the user's real object data. Through this engine, the unified multi-tenant metadata storage model can carry each tenant's customized data requirements.


A simple example: a tenant's query against a logical object is rewritten onto the shared wide table's elastic columns, executed, and the result columns are mapped back to the logical field names.
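The reverse (anti-metadata) mapping step can be sketched like this (all class and column names here are assumptions, not the platform's code): a physical wide-table row keyed by elastic columns is translated back into the tenant's logical field names.

```java
import java.util.*;

// Sketch: map a physical wide-table result row back to logical field names.
// The columnToField binding would come from the field metadata table.
public class ResultMapper {
    private final Map<String, String> columnToField;

    public ResultMapper(Map<String, String> columnToField) {
        this.columnToField = columnToField;
    }

    /** Rename elastic columns to logical fields; drop unmapped bookkeeping columns. */
    public Map<String, Object> toLogical(Map<String, Object> physicalRow) {
        Map<String, Object> logical = new LinkedHashMap<>();
        physicalRow.forEach((col, val) -> {
            String field = columnToField.get(col);
            if (field != null) logical.put(field, val);
        });
        return logical;
    }
}
```

If the metadata binds `string_value_0` to `buyer_remark`, a stored row comes back to the tenant under its logical names, with internal bookkeeping columns such as `org_id` dropped.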

Generic storage protocol support

To accommodate users' existing habits, the data space needs to adapt to the standard MySQL protocol in addition to providing its own SDK.

External interfaces:

Data Space SDK
  • Description: a simplified description of MySQL statements
  • Advantages: data access can be implemented in code, fitting Java programming habits
  • Shortcomings:
    1. Poor scalability
    2. Complex SQL is difficult to describe
    3. Type conversion is complicated
    4. Weak generality; hard to connect with industry query engines

MySQL protocol
  • Description: a MySQL proxy that adapts the standard MySQL protocol externally
  • Advantages: strong generality; governance and control can be applied at the database proxy
  • Shortcomings: relatively high technical cost; the platform must build its own proxy layer and guarantee its stability

▐ MySQL Proxy

Even with the abstract multi-tenant metadata model in place, the technical cost and difficulty of developing the protocol layer remain high. The data space uses the open-source middleware Sharding-Proxy as its MySQL proxy to provide database services externally, hiding the underlying metadata storage model so that users need not perceive the underlying data storage. Sharding-Proxy is positioned as a transparent database proxy that encapsulates the database binary protocol in order to support heterogeneous languages. Let's first look at its original architecture diagram.


The Frontend layer implements encoding and decoding of the MySQL protocol. After the original Sharding-Proxy decodes a MySQL command, it calls Sharding-Core to perform core functions such as SQL parsing, routing, rewriting, and result merging, and interacts with the real database in the Backend layer.

Since the data space has its own metadata model, which decouples the user's logical model from the underlying physical model, Sharding-Proxy needs to be transformed:

  1. Frontend layer: MySQL protocol layer encoding/decoding remains unchanged

  2. Backend layer: no longer interacts directly with a real database; data is routed by the storage route to the corresponding physical library

  3. Core-module layer: after obtaining the decoded SQL command, it parses the syntax tree, performs metadata mapping, and calls the execution engine


The core of the transformation lies in the Core Module layer. In Sharding-Proxy's threading model, the Frontend layer uses IO multiplexing to handle client requests, the backend uses the HikariCP connection pool to query the database synchronously, and a User Executor Group executes MySQL commands.


The execution process of the core layer (Core-module) is delegated to CommandExecuteEngine, as follows:

[Figure: CommandExecuteEngine execution flow]

Sharding-Proxy's process for obtaining the SQL command executor remains unchanged; we mainly modify Statement parsing and the SQL parsing and execution engine.

▐ SQL parsing and execution engine

In the data space, we divide the overall structure of SQL execution into the following parts:

  1. Parser. ShardingSphere's self-developed ANTLR-based parser proved very time-consuming in testing; after the data space switched to Druid parsing, performance improved about fivefold

  2. Extractor. Analyzes the AST nodes in depth and encapsulates them into the simplified data space SDK form

  3. Builder. Constructs the field definition information required by a query request, including basic parameters such as field name, table name, field type, and field length, all obtained from metadata

  4. Executor. Hands the request to the data space metadata execution engine for execution


▐ Engine upgrade

In the parsing and execution process above, the SQL protocol layer must parse SQL into the protocol format specified by the data space before execution, which raises the following problems:

  1. Poor scalability. Each newly adapted SQL statement requires modifying the extractor and the underlying metadata execution engine, a relatively large transformation cost

  2. Complex SQL is hard to describe. The data space's protocol format expresses simple SQL easily, but complex SQL such as subqueries and multi-table queries is much harder to describe

  3. Type conversion is complicated. Parameters must be converted on the way in and results on the way out, a considerable burden; for the full standard query syntax supported by MySQL, the conversion logic has to be very comprehensive

Based on these considerations, and to ensure the stability of the proxy layer, the SQL parsing and execution engine was upgraded as a whole. The parsing and building stages remain unchanged, but a rewriting engine is added, and the rewritten SQL statement is handed to the execution engine.


Under the metadata storage model, SQL rewriting must rewrite the user's real fields onto the elastic columns of the underlying real storage. The rewriting rules are as follows.


  • identifier rewriting

Since the underlying real storage of metadata is elastic columns, the table names and field names on the business side are logical concepts, so the identifiers to rewrite include both. Table name rewriting parses the AST, obtains the logical table name, looks up the metadata definition, and rewrites it to the real table. Field name rewriting parses the abstract syntax tree and rewrites all field columns to the real columns.

  • Condition supplementation

Under the multi-tenant data architecture, data between tenants is logically isolated by business type & tenant ID. The one supplementation that is always needed is that every statement must have the tenant query conditions added (tenant + business type + table model).
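The two rules can be sketched at the string level (a hedged illustration only; the real engine rewrites the AST, and all identifiers, column names, and the predicate shape here are assumptions):

```java
import java.util.*;

// Sketch: rewrite logical identifiers to physical ones and append the
// mandatory tenant isolation predicate. A real proxy rewrites the AST,
// not raw text; this only illustrates the two rules. Assumes WHERE is
// uppercase, as in the examples below.
public class SqlRewriter {
    public static String rewrite(String sql,
                                 Map<String, String> identifierMap,
                                 String orgId, String bizType) {
        String out = sql;
        // Rule 1: identifier rewriting (logical table/field -> physical name)
        for (Map.Entry<String, String> e : identifierMap.entrySet()) {
            out = out.replaceAll("\\b" + e.getKey() + "\\b", e.getValue());
        }
        // Rule 2: supplement the tenant isolation conditions
        String predicate = "org_id = '" + orgId + "' AND biz_type = '" + bizType + "'";
        return out.contains("WHERE")
                ? out.replaceFirst("WHERE", "WHERE " + predicate + " AND")
                : out + " WHERE " + predicate;
    }
}
```

A query such as `SELECT buyer_remark FROM t_order WHERE id = 1` would come out as a query over the physical wide table with the tenant predicate prepended to its WHERE clause.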

▐ Standard protocol layer adaptation

Some protocol-layer SQL needs individual adaptation, including:

  1. the information_schema database

  2. MySQL system variables

  3. set/commit statements, etc.

For example, for the database, table, and field metadata in the information_schema database, we query our own metadata definitions and encapsulate the result packet to return; the details are omitted here.
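As a hedged illustration of this interception (class names and the metadata shape are assumptions, not the platform's code), an `information_schema.columns` query can be detected and answered from the field metadata table instead of a physical catalog:

```java
import java.util.*;

// Sketch: answer information_schema.columns queries from field metadata
// rather than a physical catalog, so tenants see their logical schema.
public class InfoSchemaHandler {
    // logical table -> (logical field -> MySQL type shown to the client)
    private final Map<String, Map<String, String>> fieldMeta;

    public InfoSchemaHandler(Map<String, Map<String, String>> fieldMeta) {
        this.fieldMeta = fieldMeta;
    }

    /** Detect queries that must be intercepted rather than executed. */
    public boolean matches(String sql) {
        return sql.toLowerCase().contains("information_schema.columns");
    }

    /** Build synthetic COLUMN_NAME/DATA_TYPE rows for one logical table. */
    public List<String[]> columnsOf(String table) {
        List<String[]> rows = new ArrayList<>();
        fieldMeta.getOrDefault(table, Map.of())
                 .forEach((field, type) -> rows.add(new String[]{field, type}));
        return rows;
    }
}
```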



Overall Technical Architecture

At this point, the basic form and framework of the official data space are in place. With metadata management at the core, it provides tenant model management, data services, data analysis, and other capabilities upward, and exposes these capabilities through the MySQL Proxy and the data space SDK.

The overall technical picture is as follows:

[Figure: overall technical architecture]

Application Scenarios

The official data space is applied in many business scenarios; currently the low-code platform and the push service use it. Next, let's look at how the recently popular low-code building and the push service make use of it.


   Scenario 1: low-code building

With the low-code & zero-code craze, non-developers can build applications by freely assembling and dragging components, which greatly reduces merchants' daily operating costs. On the building platform, smart forms carry merchants' data storage, management, and analysis needs. The storage requirements in low-code scenarios mainly show up in the following aspects.

  1. Multi-tenant data model: supports user-defined data model building, with models and data isolated by tenant

  2. Operation-and-maintenance-free: non-developer users have no database experience or operation and maintenance capability, so stability and controllability must be guaranteed under all kinds of data and traffic patterns

  3. Generic storage protocol: must support the industry's common database protocols so that products can be built by reusing in-house or open-source reporting tools


Compared with traditional database storage products, the advantages of the data space as the storage target for smart-form worksheets mainly include the following points.

  1. Imperceptible DDL: decoupled from the physical model, a business model change only modifies metadata-layer data, without changing the physical structure of the database; that is, there is no runtime DDL

  2. Elasticity: the underlying data is stored in ADB, a distributed database whose storage-compute separation is naturally elastic; data is spread across shards that expand automatically, effectively solving data growth in the shared-table mode

  3. Report building and query analysis: the data space supports the standard MySQL protocol, so the DeepInsight designer can be used to build reports, with data and permission control configured so that each merchant accesses only data within its own permissions

  4. Low operation and maintenance costs: the platform hosts most operation and maintenance work, and users need not be aware of database instances


   Scenario 2: push cloud storage

Over the years of big promotions, push service performance has been difficult to control among mid- and lower-tier service providers. On one hand, a single database often carries many sellers with heavy traffic, and the concentrated traffic is hard to bear, causing push packets to pile up and pushes to be delayed. On the other hand, RDS stability cannot be guaranteed: the big-promotion assurance and operation and maintenance practices are hard to extend there, and it is not uncommon for RDS quality to fluctuate or stay low during big promotions, again delaying pushes.

Push cloud storage is the official cloud storage space the data space provides for the push service. In this mode, the database service previously controlled by the service provider is hosted on the platform side, and the platform provides ecosystem-level stability and performance guarantees for it. Compared with traditional RDS access, it offers better flexibility and business stability.

  • Link change

The push service can choose to push order data to the shared push cloud storage built by the platform. The service provider reads the order data from cloud storage and performs packaging, delivery, and other processing, while the platform ensures real-time pushes.

Push service over RDS vs. push cloud storage:

  1. Data ownership. With RDS, the data belongs to the service provider, who therefore holds the platform's core data. In cloud storage mode, the data belongs to the platform and the service provider has read-only access, while the Proxy applies unified control and governance, making the opening more secure.

  2. Underlying storage. RDS only, versus RDS/PolarDB/ADB and others: the bottom layer of the data space can support multiple storage products.

  3. Custom fields. Not supported on RDS. With cloud storage, field changes involve no DDL, so custom fields can be opened at any time and used for order filtering. Previously, the business fields of the push table structure all sat inside the JSON string returned by the API, which the service provider had to parse after receiving an order; in cloud storage mode, the service provider can customize the table structure.

  4. SQL visibility. With RDS, the platform cannot see the SQL in the service provider's job link; with cloud storage, the platform can view the service provider's SQL templates, helping analyze capability bottlenecks in the ecosystem performance link.

  5. Assurance. RDS is guaranteed by the service provider; cloud storage is guaranteed by the platform: the Proxy performs data control and governance, ensures the push write rate, and manages misbehaving service providers and slow SQL.

  6. Operation and maintenance. RDS means high operation and maintenance costs for the service provider; with cloud storage, the platform hosts most operation and maintenance work and integrates deeply with the push service.

  • Network Architecture

The internally deployed data space supports low-code building. For the push service, the data space deploys a separate set of services in the production network (the network zone where second-party business components or applications are deployed) under Jushita.


Cloud storage-Core mainly manages instances, accounts, permissions, connection strings, and table structures, and calls the data space service through the cloud channel. PrivateLink connects the tenant VPC with the production network, and the push client deployed in Jushita and the service provider's order-transfer service access the push cloud storage service through the allocated logical database instance connection string.


This solution is feasible at the network level, but because of the multiple network hops, the protocol side loses the original connection address, so recovering the logical instance information is a difficulty. The current solution is to enable the Proxy Protocol configuration on the SLB's TCP layer-4 listener, which carries the client's source address to the backend server; Netty on the Proxy side parses the HAProxy protocol, extracts the endpoint information for user identification, and then decodes the MySQL protocol.
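The recovery of the client endpoint can be illustrated with a minimal PROXY protocol v1 parser (the real implementation uses Netty's HAProxy codec; this hand-rolled sketch only shows what the text header carries):

```java
// Sketch: parse a PROXY protocol v1 text header to recover the original
// client endpoint that the SLB sends ahead of the MySQL byte stream.
// v1 header shape: "PROXY TCP4 <srcIp> <dstIp> <srcPort> <dstPort>\r\n"
public class ProxyV1Header {
    public final String srcIp;
    public final int srcPort;

    private ProxyV1Header(String srcIp, int srcPort) {
        this.srcIp = srcIp;
        this.srcPort = srcPort;
    }

    public static ProxyV1Header parse(String line) {
        String[] parts = line.trim().split(" ");
        if (parts.length != 6 || !parts[0].equals("PROXY"))
            throw new IllegalArgumentException("not a PROXY v1 header: " + line);
        return new ProxyV1Header(parts[2], Integer.parseInt(parts[4]));
    }
}
```

The extracted source address is what the proxy then uses to identify the caller before MySQL protocol decoding begins.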

  • Flow control

To guarantee the quality of service providers' online push-write services, the platform must govern service provider SQL to prevent the database from being overwhelmed, and must apply flow control on the Proxy side to limit connections to the underlying database. Since multiple cloud storage instances connect to one database instance in shared-storage mode, a distributed concurrency flow control solution is required. Although flow-control quotas could be allocated evenly to each application server, reducing the problem to single-machine flow control, that approach handles uneven traffic, machine downtime, and temporary scaling poorly.

The core algorithm of flow control in a distributed environment is the same as for a single machine; the difference is that a synchronization mechanism is needed to guarantee a global quota. Push cloud storage implements it with ZooKeeper + Redis.

The detailed process is as follows:

  1. When a server starts, it registers its IP in ZooKeeper and watches nodes coming online and going offline

  2. When a MySQL connection is established and passes authentication, the cloud storage instance is added under that IP's node in Redis; when the connection is destroyed, the instance is removed from the IP node

  3. Before executing the real business SQL, try to acquire a token, and return the token after the SQL finishes

  4. When a server goes offline, clear the instance list under its IP node

Acquiring a token essentially records, in Redis, the number of connections the current machine holds for the cloud storage instance. The detailed logic is as follows.

// Token acquisition
lock
    int count = count(hvals instanceId)    // total connections for this instance across all IPs
    if (count + 1 > limit)
        ERROR(RATE_LIMIT)
    else
        hincrby instanceId IP 1
unlock


// Token release
lock
    // read first, to guard against a Redis restart having cleared the hash
    int value = hget instanceId IP
    if (value == nil)
        return
    else if (value == 0)
        return
    else
        hincrby instanceId IP -1
unlock
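The same acquire/release accounting can be mirrored in a small in-process sketch (a single-JVM stand-in for the Redis hash and lock; the real implementation shares this state via Redis, and class names here are assumptions):

```java
import java.util.*;

// Sketch: per-instance connection quota, mirroring the Redis hash
// "hincrby instanceId <ip> +/-1" guarded by a lock. A synchronized map
// stands in for Redis within one JVM.
public class ConnectionQuota {
    private final int limit;
    // instanceId -> (server IP -> connection count), i.e. the Redis hash
    private final Map<String, Map<String, Integer>> counts = new HashMap<>();

    public ConnectionQuota(int limit) { this.limit = limit; }

    /** Acquire a token; false means RATE_LIMIT. */
    public synchronized boolean acquire(String instanceId, String ip) {
        Map<String, Integer> byIp = counts.computeIfAbsent(instanceId, k -> new HashMap<>());
        int total = byIp.values().stream().mapToInt(Integer::intValue).sum();
        if (total + 1 > limit) return false;   // count(hvals instanceId) + 1 > limit
        byIp.merge(ip, 1, Integer::sum);       // hincrby instanceId IP 1
        return true;
    }

    /** Release a token; guards against double release, as the hget check does. */
    public synchronized void release(String instanceId, String ip) {
        Map<String, Integer> byIp = counts.get(instanceId);
        if (byIp == null) return;
        Integer value = byIp.get(ip);
        if (value == null || value == 0) return;  // value is nil or already 0
        byIp.merge(ip, -1, Integer::sum);         // hincrby instanceId IP -1
    }
}
```

The quota is global per instance, not per IP, which is exactly why the counts of all IPs are summed before admitting a new connection.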


Summary

The preliminary technical construction of the data space is now complete. As official storage, it is already used by the low-code platform and the push service, which marks a first milestone. Many technical details are involved, and of course the applicable scenarios and capabilities of the data space are not yet complete: fully adopting a self-developed query language like Salesforce's SOQL is unrealistic, while adapting the complete MySQL protocol is a large workload, so the basic capabilities still need improvement. Future plans and open problems are:

  1. Storage elasticity. A multi-tenant database shares computing and storage resources across all tenants; how to scale elastically without affecting tenants is a problem we still need to solve.

  2. Protocol adaptation. The metadata storage model brings data-model flexibility, but opening it to merchants and ecosystem partners without friction is an urgent problem. Currently the data space offers a simplified SDK to second parties and, for third parties, adapts the MySQL protocol through a self-built proxy, so customers connect to the platform database without being aware of it. The protocol adaptation workload is huge, and full conformance to the MySQL standard protocol will take a long time; the advantage of a self-built Proxy, however, is that SQL stays completely under our control, so unsupported SQL, or SQL the platform does not want to execute, can be intercepted.

  3. Business stability governance. The ultimate goal of the data space is to serve all kinds of upper-layer business, and customizing different rules for different business types is another difficulty. For example: for push services, open custom fields based on the imperceptible DDL design; for non-standard SQL usage, apply isolation, rate limiting, concurrency control, and similar measures; reads can also be scheduled by query SQL priority. Another major benefit of controlling SQL on the platform side is that it can feed data into full-link ecosystem performance monitoring for brand merchants and help them locate performance problems in the link.


Team introduction

The Big Taobao technology open platform is an important channel through which Alibaba connects with the external ecosystem. Through open product technology, it delivers a series of foundational services of the Alibaba economy (its "water, electricity, and coal") to merchants, developers, community media, and other partners, promoting customization, innovation, and evolution across the industry and ultimately contributing to a new ecosystem of commercial civilization.

We are a technical team with strong technical capability and a proud history and tradition. In the Double Eleven battles over the years, the team has performed excellently: it carries millions of business processes per second, and 90% of orders are pushed to merchants' ERP systems in real time through the order push service to complete e-commerce operations. The ERP-WMS scenario opened through Qimen has become the standard of the warehousing industry. With the continuous exploration and rapid development of new retail business, we are eager for experts from all walks of life to join us and take part in technically challenging work such as core system architecture design, performance tuning, and open model innovation.


Origin blog.csdn.net/Taobaojishu/article/details/131335848