Data Security Platform Construction Practice

Background

In the era of big data, data has become a company's core competitive asset. We previously described the construction and practice of the Meituan Hotel and Travel Origin Data Governance Platform, which supports the company's decision-making and business development through a range of data analysis and mining methods.

Recently, frequent data security incidents in the industry have caused irreparable losses to the companies involved and sounded the alarm for companies with weak data security awareness. Data analysis, data service, and data governance products hold the company's most concentrated internal data, so controlling permissions for these products has become the most important task in building data security.

From a control perspective, permission control can be divided into function-level control and data-level control. Most early data security products used a traditional permission model that could implement function-level control but not data-level control. Because data products have higher security requirements, we need a platform that meets the data security needs of all these products at once.

To this end, the Meituan user platform application R&D team designed a permission model that can express and control a variety of complex relationships, and built three subsystems (approval, permission, and audit) that cover the stages before, during, and after an access event. Together they form a complete closed loop for data security and meet its various requirements.

Figure 1 Permission background

Permissions for functional application products are generally expressed as "has permission or not", while permissions for data products involve more complex relationships. For example, a report in a data product must express not only whether the user can access the report, but also which dimensions, indicators, and dimension value ranges in the report the user can access. It must also record which database tables and models those dimensions and indicators come from, and whether the user has permission to access them and to create reports from them.

Permission model

Traditional permission models include ACL (Access Control List) and RBAC (Role-Based Access Control). These models suit permission control for application-type products, but data-type products have higher information security requirements and more complex relationships between resources, which the traditional models struggle to express clearly. We therefore extended the RBAC model and designed a new permission model.

Figure 2 Traditional permission model

As shown in Figure 2, the traditional permission model:

In the ACL model, users are associated directly with permissions: the list maintains the relationship between users and resources to control access.
In the RBAC model, roles are associated with permissions, and users obtain permissions by being assigned the corresponding roles.

Why design a new permissions model?

The ACL model establishes relationships directly between users and resources and has no concept of roles. When many users need the same batch of resource permissions, each user must be authorized separately, so authorization becomes very cumbersome and the model is a poor fit.
The RBAC model introduces roles, which hold the relationships with resources. When many users need the same batch of resource permissions, it is enough to create a role and grant it permission to use those resources; a user who joins the role obtains all of its permissions. This solves the problem of cumbersome authorization.

However, both the ACL model and the RBAC model have the following problems:

Relationships between data product resources are complex and cannot be expressed well. For example, a report contains multiple tabs, a tab contains multiple components, and a component contains multiple dimensions and indicators, while those dimensions and indicators come from different data models, database tables, and so on. Because the resources are related, when an administrator grants a user all or part of the permissions for a report, the sub-resources under the report must receive the corresponding permissions at the same time.
The RBAC model defines no relationship between roles. For example, suppose an employee sits at East China/Sales District 1/Sales Group in the organizational structure and holds the Sales Group role. With no relationship between roles, an employee who needs the permissions of the East China role must be added to that role explicitly. If roles had a parent-child relationship, this problem would be solved naturally.

How the new permission model solves these problems:

Resources in the resource model have parent-child relationships; they can be nested to multiple levels and are displayed as a tree. For example, a report is a parent resource, and its tabs, components, dimensions, and indicators are all sub-resources. The relationship between a report and its sub-resources is therefore clearly visible during authorization, and the various permission control requirements can be met during both authorization and authentication.
Roles have parent-child relationships. For example, in the organizational structure East China/Sales District 1/Sales Group, the three roles East China, Sales District 1, and Sales Group form a parent-child chain. An employee in the Sales Group department has all the permissions of East China, Sales District 1, and Sales Group. Permissions that do not conflict are merged directly; when they conflict, the "proximity principle" applies and the role closest to the employee overrides the others.
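The role hierarchy and the proximity principle can be illustrated with a minimal sketch. The `Role` class, the `effective_permissions` function, and the resource identifiers below are hypothetical names chosen for illustration; the article does not describe the platform's actual data structures.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Role:
    name: str
    parent: Optional["Role"] = None
    # Hypothetical shape: resource id -> granted permission value.
    grants: Dict[str, str] = field(default_factory=dict)


def effective_permissions(role: Role) -> Dict[str, str]:
    """Merge grants along the ancestor chain; the nearest role wins on conflict."""
    chain = []
    node = role
    while node is not None:
        chain.append(node)
        node = node.parent
    merged: Dict[str, str] = {}
    for ancestor in reversed(chain):    # start from the farthest ancestor
        merged.update(ancestor.grants)  # closer roles overwrite earlier values
    return merged


east_china = Role("East China", grants={"report:sales": "read", "report:region": "read"})
district_1 = Role("Sales District 1", parent=east_china, grants={"report:district": "read"})
sales_group = Role("Sales Group", parent=district_1, grants={"report:sales": "read-write"})

perms = effective_permissions(sales_group)
```

Here the Sales Group role inherits the grants of both ancestors, and its own value for `report:sales` overrides the one defined on East China.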

Figure 3 New permission model

As shown in Figure 3, the new permission model has three parts: the user center, the resource center, and the authority center.

User Center: User Management, Role Management

Roles are divided into three types: personal, organization, and custom. A user can hold multiple roles at once: every user has a personal role by default, and can also hold an organization role from the company's organizational structure and a custom role from a custom organization.
Roles can be nested to multiple levels, which allows permissions to be inherited between roles.
User and department information is updated in real time through Mafka (a distributed message middleware solution developed by Meituan based on Kafka) and is also synchronized by a daily scheduled ETL job, so that permissions stay current when staff join or transfer.

Figure 4 User Center

Resource Center: Resource Management

Resource types are customizable: on top of the common resource types, each system can register its own, so that the different resources of every system are managed and controlled in one place.
Resources can be nested to multiple levels and are displayed as a tree, which makes unified authorization and authentication convenient. When a report is authorized, resources under it such as dimensions and indicators can be authorized together.
Resources can be bundled into packages to simplify authorization.
Each resource has a security level and a person in charge, and can be configured with different approval templates to support self-service permission applications.

Figure 5 Resource Center

Authority Center: Policies expressing the relationship between roles and resources

Scope policy: For example, the platform dimension in a report has the dimension values Meituan and Dianping. During authorization, a user can be granted some or all of these values as required; during authentication, the rules are parsed to determine which values of the dimension the user may access.
Expression policy: When a report is authorized to a user with the expression "limit 10", the user can retrieve only the first 10 records, on top of the user's other permissions for the report.
Automatic merging of permissions: When a user holds multiple roles, the roles' permissions on the same resource are merged automatically according to rules. When the rules are parsed, permission data is combined if there is no conflict; if there is a conflict, the value is chosen by priority.
Blacklist and whitelist: Access to a resource can be fully opened to, or fully denied for, a specific person according to specific rules. The blacklist and whitelist policies have the highest priority, and the blacklist takes precedence over the whitelist.

Figure 6 Authority Center
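As a rough illustration of how these policies might combine at authentication time, the sketch below evaluates the blacklist first, then the whitelist, and finally the merged scope of the user's roles. The function names, the tuple-based lists, and the sample data are assumptions made for this example, not the platform's actual API.

```python
from typing import Dict, Iterable, Set, Tuple

Scopes = Dict[str, Set[str]]  # dimension name -> permitted dimension values


def merge_scopes(role_scopes: Iterable[Scopes]) -> Scopes:
    """Union the dimension values granted by each role (the no-conflict case)."""
    merged: Scopes = {}
    for scopes in role_scopes:
        for dimension, values in scopes.items():
            merged.setdefault(dimension, set()).update(values)
    return merged


def can_access(user: str, resource: str, dimension: str, value: str,
               role_scopes: Iterable[Scopes],
               blacklist: Set[Tuple[str, str]],
               whitelist: Set[Tuple[str, str]]) -> bool:
    if (user, resource) in blacklist:   # blacklist outranks everything else
        return False
    if (user, resource) in whitelist:   # whitelist outranks ordinary role grants
        return True
    return value in merge_scopes(role_scopes).get(dimension, set())


# Hypothetical data: two roles, each granting one value of the platform dimension.
ROLE_SCOPES = [{"platform": {"Meituan"}}, {"platform": {"Dianping"}}]
ALICE_REPORT = ("alice", "report:1")
```

With this data, a user holding both roles can read both platform values, and an entry in the blacklist denies access even when the same entry is also in the whitelist.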

Challenges

In the process of building a data security platform, the main challenges are as follows:

As more business lines are supported, the general platform cannot meet every business line's customization needs, so the system must remain flexible and extensible.
The platform must be general enough to meet most data security requirements.
The system handles high-QPS access, so it must be highly available.

Solutions

Provide flexible, pluggable Plugin services that meet each business line's custom permission control requirements on top of the general permissions.
Provide a general data security platform that covers the basic functions of permission, approval, and audit.
Use a microservice architecture, separate core services from non-core services, and use data caching and degradation to achieve high availability.

Solution

Figure 7 Overall architecture of General Order

As shown in Figure 7, General Order is divided into three parts: the data content permission platform, the approval flow platform, and the audit log platform. It offers the following:

Flexible, pluggable Plugin services that support custom development on top of the general services.
Basic services that meet the common data security requirements.
A management workbench that gives administrators pages for managing and configuring data and rules.

Specific plan

Plugin service layer: keeping the system flexible and extensible

Beyond the general permissions, each business line inevitably has its own custom permission control requirements, so we designed the permission Plugin module.

The general services provide user management, resource management, authentication, and authorization, and a Plugin calls these basic services to implement its special permission controls. Each Plugin manages its own application and data separately and calls the general services through RPC, which keeps it flexible and pluggable. Going forward, the Plugin module will let each connected application do its own custom development.

Figure 8 Plugin service

As shown in Figure 8, the general permission service is separated from the Plugin services, and multiple Plugin services can be plugged in flexibly:

The general services cover users, resources, authentication, and authorization; most systems can meet their permission control requirements with the general services alone.
Plugin services are built on the SDK provided by the general services, and each Plugin service is deployed independently so that systems do not affect one another.

The result is layered permission control, split into a core data layer (user, resource, and permission data) and an application layer. Core data is managed by the general services, which keeps permission data under unified management and control. The application layer is accessed through Plugin services; a Plugin reads and writes permission data through the SDK exposed by the general service layer to implement custom controls. Application-layer data is stored separately, and its control rules can be customized. Calls between interfaces are authenticated with BA authentication to secure service-to-service calls.
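The split between the core data layer and a Plugin's own application-layer rules might look roughly like the following sketch. Both classes and the `is_manager` flag are hypothetical; in the real system the Plugin would reach the general service through RPC and the SDK rather than an in-process call.

```python
from typing import Iterable, Set, Tuple


class GeneralPermissionService:
    """Stand-in for the general service that owns the core permission data."""

    def __init__(self, grants: Iterable[Tuple[str, str]]):
        self._grants: Set[Tuple[str, str]] = set(grants)

    def has_permission(self, user: str, resource: str) -> bool:
        return (user, resource) in self._grants


class BusinessLinePlugin:
    """Hypothetical Plugin: adds one business-line rule on top of the general check."""

    def __init__(self, core: GeneralPermissionService, restricted: Iterable[str]):
        self._core = core                   # reached via RPC/SDK in a real deployment
        self._restricted = set(restricted)  # application-layer data kept by the Plugin

    def check(self, user: str, resource: str, is_manager: bool = False) -> bool:
        if not self._core.has_permission(user, resource):
            return False
        # Custom rule (an assumption): restricted resources are for managers only.
        return resource not in self._restricted or is_manager


core = GeneralPermissionService({("alice", "report:1"), ("alice", "report:2")})
plugin = BusinessLinePlugin(core, restricted={"report:2"})
```

The Plugin never writes the core grants directly; it only narrows the general decision with rules and data of its own.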

Basic service layer: keeping the system general-purpose

General permission system architecture

The system uses a microservice architecture and is divided into an access layer, a service layer, a database layer, and an external service layer. It includes the following core services:

User service: synchronizes user and department information and manages roles.
Resource service: handles resource registration, scheduled resource synchronization, resource security levels and administrators, and resource packages.
Authorization service: handles self-service permission applications and administrator grants.
Authentication service: provides authentication SDKs for callers.

Figure 9 Permission system architecture

As shown in Figure 9:

Access layer: All external systems call the services through a unified SDK.
Service layer: Built as microservices that expose services to one another.
Database layer: Uses caching and data degradation appropriately to keep services highly available.
External service layer: Integrates the company's public services to keep the system running stably.

Approval System Architecture

The approval system provides general approval services with multi-level approval templates. A caller selects a template to start an approval process; the approval system parses its rules against the start parameters and automatically selects the matching approval flow. Access is simplified to support one-click integration.

Figure 10 Approval system architecture

As shown in Figure 10, the approval system streamlines integration, provides general approval services, and reduces the development cost of connecting a system:

Previously, developing an approval function took six steps: drawing the flow chart, configuring approval groups and members, configuring notification messages, configuring event mappings, starting the approval flow, and developing a callback interface to change the status.
We wrapped the platform's approval service and provide general approval templates. To connect to the approval system, a caller now only needs to select a template to start the approval flow and provide a callback interface, which covers most approval needs.

A general rule parsing engine dynamically resolves approvers, approval conditions, and approval notifications according to rules. It flexibly supports common functions such as automatic approval, multi-person multi-level approval, and scheduled reminders.

The approval system integrates with the permission and audit systems to keep its own data secure:

It connects to the permission system for administrator permission control.
It connects to the audit system, where operation data is written for later auditing.
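A toy version of rule-based approver resolution is sketched below. The `ApprovalRule` structure, the `security_level` parameter, and the approver names are all assumptions for illustration; the article does not specify the rule format used by the engine.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class ApprovalRule:
    name: str
    matches: Callable[[Dict], bool]  # condition evaluated on the start parameters
    approvers: List[str]             # an empty list means automatic approval


# Hypothetical rules keyed on the security level of the requested resource.
RULES = [
    ApprovalRule("auto", lambda p: p["security_level"] <= 1, []),
    ApprovalRule("owner-only", lambda p: p["security_level"] == 2, ["resource_owner"]),
    ApprovalRule("multi-level", lambda p: p["security_level"] >= 3,
                 ["resource_owner", "department_head"]),
]


def resolve_approval_flow(params: Dict) -> ApprovalRule:
    """Return the first rule whose condition matches the start parameters."""
    for rule in RULES:
        if rule.matches(params):
            return rule
    raise LookupError("no approval rule matches the given parameters")
```

A caller would pass the parameters it used to start the approval, for example `resolve_approval_flow({"security_level": 3})`, and walk through the returned approvers in order.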

Audit System Architecture

The audit system provides general data auditing services: clients report logs through instrumentation, and audit logs are stored in Elasticsearch by type. It connects to Ruyi Visual Report to generate audit reports and to the permission system to control access to the data.

Figure 11 Audit system architecture

As shown in Figure 11, the audit data model layer expands automatically:

Each application corresponds to an appkey, and an index is created automatically for each appkey according to a template and the date.
Each kind of audit log corresponds to a type in the Elasticsearch index; the type is created automatically when a new kind of operation log is added.
The fields of an audit log correspond to the fields of the type, and new fields are added automatically.
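The per-appkey, per-date index layout could be derived as in the sketch below. The `audit-<appkey>-<date>` naming pattern and both function names are assumptions; the article only states that indexes are created from a template and the date. The appkey is lowercased because Elasticsearch index names must be lowercase.

```python
from datetime import date
from typing import Any, Dict, Tuple


def audit_index_name(appkey: str, day: date, prefix: str = "audit") -> str:
    """Build a per-application, per-day index name (hypothetical naming pattern)."""
    return f"{prefix}-{appkey.lower()}-{day:%Y.%m.%d}"


def route_audit_log(appkey: str, log_type: str, fields: Dict[str, Any],
                    day: date) -> Tuple[str, str, Dict[str, Any]]:
    """Decide where one audit record goes: (index, type, document)."""
    return audit_index_name(appkey, day), log_type, dict(fields)


index, log_type, document = route_audit_log(
    "Report-App", "export", {"user": "alice", "rows": 120}, date(2018, 5, 1)
)
```

In this layout a new appkey or a new day simply yields a new index name, and new fields in `document` are left to the index's dynamic mapping.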

Ensure high availability of the system

Microservice Architecture Service Separation

As the system gains more modules and functions, a monolithic architecture no longer suits agile development. The larger the application, the slower it starts, and a failure in any module makes the whole system unavailable.

To keep the service highly available and scalable, we split the modules into microservices and separated core services from non-core services.

Figure 12 Microservice architecture

As shown in Figure 12:

The front-end access layer is reached over HTTP; BA authentication verifies that each request is legitimate, and Nginx provides load balancing.
The management console achieves unified management by calling the services in the service layer.
The service layer abstracts each module of the system as a microservice. Each microservice is deployed independently and can be scaled on demand according to its load.
The client layer exposes a unified Pigeon (Meituan's internal distributed RPC framework) interface; callers add it as a POM dependency and invoke the services in the service layer.

Permission inheritance

Because resources can be nested to multiple levels, the permission model supports permission inheritance. When inheritance is enabled during authorization, the user has all permissions on the resource and on every resource below it by default. Only the relationship between the ancestor resource and the user needs to be stored, which significantly reduces the size of the permission matrix.

Figure 13 Permission inheritance
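Inheritance of this kind can be checked by walking up the resource tree until a stored grant is found. The sketch below is a minimal illustration; the `PARENT` map, the `INHERITED_GRANTS` set, and the resource paths are hypothetical stand-ins for the stored data.

```python
from typing import Dict, Optional, Set, Tuple

# Hypothetical resource tree: child resource -> parent resource (None for a root).
PARENT: Dict[str, Optional[str]] = {
    "report": None,
    "report/tab1": "report",
    "report/tab1/chart": "report/tab1",
    "report/tab1/chart/gmv": "report/tab1/chart",
}

# Only the (user, ancestor resource) pairs are stored, not one row per descendant.
INHERITED_GRANTS: Set[Tuple[str, str]] = {
    ("alice", "report"),
    ("bob", "report/tab1/chart"),
}


def is_authorized(user: str, resource: str) -> bool:
    """Walk from the resource up to the root looking for an inherited grant."""
    node: Optional[str] = resource
    while node is not None:
        if (user, node) in INHERITED_GRANTS:
            return True
        node = PARENT.get(node)
    return False
```

Two stored rows cover every node in the example tree for alice and the chart subtree for bob, at the cost of a short upward walk during authentication.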

Permission data storage

The more systems are connected, the more resources and users there are, and the longer the system runs, the faster the permission data grows. The question is how to keep the interfaces fast and highly available as the data grows.

Permission backup and restore

Drawing on HBase version numbers and MySQL's Binlog, MySQL stores only each user's latest permission data at authorization time, while historical permission data and operation records are stored in Elasticsearch by version number. User authentication only needs to query the permission data in MySQL, which keeps the authentication interface efficient.

Figure 14 Permission backup and recovery

As shown in Figure 14:

During authorization, permission data is managed by version number. The version number is incremented by 1 after each operation, and only the latest permission data is stored in MySQL and Redis.
Historical permission data is stored in Elasticsearch by version number. Viewing historical operation records or restoring permission data is done by tracing back through the version numbers.
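The versioning scheme can be sketched with two in-memory dictionaries standing in for the current store (MySQL/Redis) and the history store (Elasticsearch). The class and method names are hypothetical, and treating a restore as a new version that copies an old snapshot is an assumption of this sketch rather than something the article states.

```python
import copy
from typing import Dict, Tuple


class VersionedPermissionStore:
    def __init__(self) -> None:
        # Stand-in for MySQL/Redis: user -> (latest version, permissions).
        self.current: Dict[str, Tuple[int, dict]] = {}
        # Stand-in for Elasticsearch: (user, version) -> permissions snapshot.
        self.history: Dict[Tuple[str, int], dict] = {}

    def authorize(self, user: str, permissions: dict) -> int:
        """Store the new permissions as the latest version and archive a snapshot."""
        version = self.current.get(user, (0, {}))[0] + 1
        self.current[user] = (version, copy.deepcopy(permissions))
        self.history[(user, version)] = copy.deepcopy(permissions)
        return version

    def restore(self, user: str, version: int) -> int:
        """Assumption: restoring re-applies an old snapshot as a new version."""
        return self.authorize(user, self.history[(user, version)])


store = VersionedPermissionStore()
v1 = store.authorize("alice", {"report:1": "read"})
v2 = store.authorize("alice", {"report:1": "read-write"})
v3 = store.restore("alice", v1)
```

Authentication reads only `current`, so its cost does not grow with the length of the history.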

Permission expiration cleanup

A scheduled Crane job scans for permission data that is about to expire, according to the configured notification rules, and sends messages reminding users to renew their permissions.
The job also scans for expired permission data, removes it from MySQL and Redis, and dumps it to Elasticsearch for later permission audits.
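The two scans can be expressed as a single pass over the grants, as in the sketch below. The grant dictionaries, the `scan_grants` function, and the seven-day notification window are assumptions for illustration; the real job runs on Crane and works against MySQL, Redis, and Elasticsearch.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Tuple

Grant = Dict[str, object]


def scan_grants(grants: List[Grant], now: datetime,
                notify_within: timedelta = timedelta(days=7)
                ) -> Tuple[List[Grant], List[Grant], List[Grant]]:
    """Return (keep, notify, archive) for one scheduled scan."""
    keep, notify, archive = [], [], []
    for grant in grants:
        if grant["expires_at"] <= now:
            archive.append(grant)   # expired: remove from live stores, archive it
            continue
        keep.append(grant)          # still valid
        if grant["expires_at"] - now <= notify_within:
            notify.append(grant)    # expiring soon: remind the user to renew
    return keep, notify, archive


NOW = datetime(2018, 6, 1)
GRANTS = [
    {"user": "alice", "resource": "report:1", "expires_at": datetime(2018, 5, 31)},
    {"user": "bob", "resource": "report:1", "expires_at": datetime(2018, 6, 3)},
    {"user": "carol", "resource": "report:2", "expires_at": datetime(2018, 12, 31)},
]
keep, notify, archive = scan_grants(GRANTS, NOW)
```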

Data read and write separation, caching, backup and service downgrade

Each service stores its data in separate MySQL databases and uses Zebra (Meituan's database access layer middleware) for read-write separation. Data caching and backup are used appropriately, and service downgrade is supported to keep services highly available.

Figure 15 Data read-write separation, cache, backup, and service downgrade

As shown in Figure 15:

Each service stores its data in separate MySQL databases; core services are separated from non-core services, and both services and databases can be scaled elastically on demand.
Hot data such as roles and resources is cached in Redis; when the Redis cache is unavailable, queries automatically fall through to MySQL.
Inactive data such as operation records and historical data is stored in Elasticsearch for auditing and data recovery.
When a service is unavailable, circuit breaking and downgrade protect the availability of core services.
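The cache fallback for hot data might follow the pattern below: try the cache, fill it on a miss, and go straight to the database when the cache is down. `DictCache`, `CacheUnavailable`, and `load_role` are hypothetical stand-ins; a plain dictionary plays the role of MySQL and no real Redis client is used.

```python
from typing import Dict, Optional


class CacheUnavailable(Exception):
    """Raised by the cache stand-in when the cache cannot be reached."""


class DictCache:
    """In-memory stand-in for Redis with a switch to simulate an outage."""

    def __init__(self, available: bool = True) -> None:
        self.available = available
        self.data: Dict[str, dict] = {}

    def get(self, key: str) -> Optional[dict]:
        if not self.available:
            raise CacheUnavailable(key)
        return self.data.get(key)

    def set(self, key: str, value: dict) -> None:
        if not self.available:
            raise CacheUnavailable(key)
        self.data[key] = value


def load_role(role_id: str, cache: DictCache, database: Dict[str, dict]) -> dict:
    try:
        cached = cache.get(role_id)
        if cached is not None:
            return cached
        value = database[role_id]   # cache miss: read the database, fill the cache
        cache.set(role_id, value)
        return value
    except CacheUnavailable:
        return database[role_id]    # cache outage: serve directly from the database


DATABASE = {"r1": {"name": "Sales Group"}}
healthy, broken = DictCache(), DictCache(available=False)
from_healthy = load_role("r1", healthy, DATABASE)
from_broken = load_role("r1", broken, DATABASE)
```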

Reasonable use of message queues, task scheduling, thread pools, and distributed locks

Message queues, task scheduling, and thread pools are used for asynchronous processing, peak shaving, and decoupling, which reduces service response time and improves the user experience. Distributed locks are used to keep data consistent.

Figure 16 Improve service response speed

As shown in Figure 16:

User requests are handled through a message queue: the operation returns success immediately, and a background consumer processes the MQ message asynchronously and updates the status. The page polls the status to display the final result, or the result is pushed through an Elephant (Meituan's internal communication tool) message.
Tasks that need scheduled synchronization are run on the Crane distributed task scheduling platform.
A thread pool handles approval result callbacks and retries of failed callbacks, which reduces the overhead of creating and destroying threads.
Distributed locks ensure that, for a given operation, a method is executed by only one thread on one machine at a time, which avoids data inconsistency caused by repeated user submissions or repeated processing on multiple machines.
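The locking idea can be shown with an in-memory stand-in that mimics set-if-absent with an expiry, the pattern commonly implemented with the Redis command `SET key value NX PX ttl`. The `InMemoryLockStore` class and `submit_once` helper are hypothetical, and the stand-in is not thread-safe; a real deployment would rely on an atomic operation in the shared store.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class InMemoryLockStore:
    """Single-process stand-in for a shared lock store with expiring locks."""

    def __init__(self) -> None:
        self._locks: Dict[str, Tuple[str, float]] = {}  # key -> (owner, expiry)

    def acquire(self, key: str, owner: str, ttl_seconds: float,
                now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        holder = self._locks.get(key)
        if holder is not None and holder[1] > now:
            return False                        # someone else still holds the lock
        self._locks[key] = (owner, now + ttl_seconds)
        return True

    def release(self, key: str, owner: str) -> None:
        holder = self._locks.get(key)
        if holder is not None and holder[0] == owner:
            del self._locks[key]                # only the owner may release


def submit_once(store: InMemoryLockStore, request_id: str, worker: str,
                handler: Callable[[], str]) -> str:
    """Run the handler only if no other worker is processing the same request."""
    key = f"approval:{request_id}"
    if not store.acquire(key, worker, ttl_seconds=30):
        return "duplicate"
    try:
        return handler()
    finally:
        store.release(key, worker)
```

The expiry keeps a crashed worker from holding the lock forever, and the owner check on release prevents one worker from freeing a lock held by another.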

Outlook

A general-purpose data security platform cannot meet every customization need of every business line. The current architecture supports multiple pluggable Plugin services, which implement custom permission control on top of the general services. In the future, General Order will provide Plugin development specifications for permission, approval, and audit, so that connected systems can do custom development on the existing foundation.

Figure 17 Overall Architecture and Outlook

As shown in Figure 17:

In the future, a unified Plugin development specification will be published, and each connected system will be able to do custom development as a Plugin service on top of the platform's basic services to meet its own special permission control requirements. This enables centralized management and control of data product permissions and ensures data security.
The rules in General Order will be separated from the existing services and abstracted into a general rule engine service, making rules flexible and configurable.

About the Author

Yishan, a technical expert at Meituan Dianping, is currently the chairman of TechClub-Java Club. He graduated from Wuhan University in 2006 and has worked at IBM, UFIDA, Fengxing, and Alibaba. He joined Meituan in 2014 and has long worked on BI tools, data security, and data quality.
Zhonghua, a data system R&D engineer at Meituan Dianping, joined the Meituan Dianping Data Center in 2017 and has long worked on BI tools and data security.