Practical guide | CloudQuery's data protection capabilities: dynamic data desensitization

In the course of enterprise digital transformation, and especially with the rapid development of information and communication technologies such as Internet Plus, cloud computing, and big data, massive amounts of data are stored and processed across various information systems. Much of this data is valuable and sensitive, which means the risk of data leakage is also increasing.

Data breaches can be caused by a variety of factors, including malicious attacks, hacking, employee error, and lost or stolen equipment. Once data is leaked, personal privacy is exposed, which may lead to identity theft, financial fraud, and misuse of personal information. In addition, some industries and regulations require organizations to take measures to protect sensitive data: financial institutions need to protect customers' personally identifiable information, and medical institutions need to protect patients' medical records.

To address these challenges, data desensitization technology emerged. Data desensitization is a data protection method that protects the security and privacy of data by modifying, transforming or hiding sensitive data. Desensitized data still retains the structure and format of its original data, but does not contain sensitive information that directly identifies individuals.

Data desensitization has also become an essential practice for most enterprises in data management and control.

CloudQuery’s data masking capabilities

CloudQuery currently supports two data masking methods: static masking and dynamic masking.

Static desensitization processes data before it is stored or transmitted, following a "desensitize first, distribute later" approach. Sensitive information is usually desensitized during the data collection and storage stages so that the data is safe in storage and in transit. The goal of static desensitization is to protect personal privacy by processing sensitive information irreversibly, so that the original data cannot be restored.

Dynamic desensitization processes sensitive information on the fly while the data is being used. Unlike static desensitization, it masks sensitive information only at the moment it is accessed, and the stored data otherwise remains in plaintext. Its core idea is to adjust the level and method of desensitization flexibly during data use, according to actual needs and usage scenarios, so as to balance stronger privacy protection against data availability.

Dynamic and static data desensitization suit different scenarios, and neither is inherently better than the other; what matters is choosing the mode that fits the usage scenario. Currently, CloudQuery's static desensitization function is available only to Enterprise Edition customers. This article focuses on dynamic desensitization, a capability shared across editions.

CloudQuery dynamic desensitization solution

The current mainstream technical approaches to dynamic desensitization fall into two routes, "result set parsing" and "statement rewriting":

  • Result set parsing:
    The statement sent to the database is not rewritten. The table structure must be known in advance; after the database returns the result, the table structure is used to decide which data in the result set needs to be desensitized, and the result data is rewritten row by row.

  • Statement rewriting:
    The statement that queries sensitive fields is rewritten: each sensitive field (table column) involved in the query is wrapped in an outer function, so that the database returns a result set that contains no sensitive data when it runs the query.
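The two routes above can be sketched with Python's built-in sqlite3 module. This is only an illustration under stated assumptions, not CloudQuery's implementation: the table contents, the `SENSITIVE_COLUMNS` configuration, and the `mask_keep_first3` function are all hypothetical.

```python
import sqlite3

SENSITIVE_COLUMNS = {"phone"}  # hypothetical masking configuration


def mask_keep_first3(value):
    """Keep the first three characters and replace the rest with '*'."""
    if value is None:
        return None
    text = str(value)
    return text[:3] + "*" * max(len(text) - 3, 0)


def make_connection():
    """Create a demo database and register the masking function in it."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table1 (name TEXT, phone TEXT)")
    conn.execute("INSERT INTO table1 VALUES ('alice', '13812345678')")
    # Registering the function lets rewritten SQL call it inside the database.
    conn.create_function("mask_keep_first3", 1, mask_keep_first3)
    return conn


def query_with_result_set_parsing(conn, sql):
    """Route 1: run the SQL unchanged, then mask matching columns row by row."""
    cursor = conn.execute(sql)
    columns = [d[0] for d in cursor.description]
    masked_rows = []
    for row in cursor.fetchall():
        masked_rows.append(tuple(
            mask_keep_first3(v) if c in SENSITIVE_COLUMNS else v
            for c, v in zip(columns, row)
        ))
    return masked_rows


def query_with_statement_rewriting(conn, column, table):
    """Route 2: wrap the sensitive column in a function before execution.

    The identifiers are interpolated directly, so this sketch assumes they
    come from trusted configuration rather than user input.
    """
    expr = f"mask_keep_first3({column})" if column in SENSITIVE_COLUMNS else column
    sql = f"SELECT {expr} AS {column} FROM {table}"
    return conn.execute(sql).fetchall()
```

In route 1 the plaintext value leaves the database and is masked afterwards; in route 2 the database itself returns only the masked value.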

Both methods have their own pros and cons. "Result set parsing" offers more flexibility and data availability, but it adds performance overhead when processing large volumes of data. "Statement rewriting" is more efficient: because sensitive fields are rewritten in the query statement itself, sensitive data is never transmitted to the application layer or other downstream stages, which gives higher data security. Statement rewriting is also highly portable. On the other hand, because it modifies the query statement, it inevitably restricts some complex queries and is less customizable.

In view of these differences, CloudQuery's implementation of dynamic desensitization combines the advantages of SQL statement rewriting and result set rewriting, striking a balance between performance and applicability. Depending on the scenario, different SQL statements trigger different desensitization methods, fully covering real-time data desensitization requirements in both operations and maintenance scenarios and business scenarios.

For example:

    select * from table1;

Because the statement queries all columns, no pre-desensitization is performed. Instead, after execution completes, the column names in the result set are compared with the configured column names, and matching columns are desensitized through result set parsing.

Now consider the following:

    select a from table1;

Because the statement explicitly queries column a, a is rewritten and the statement becomes:

    select func(a) from table1;

The rewritten statement is then executed, so only pre-desensitization is used and the result set does not have to be traversed.

In practice, you can choose between these query forms according to data volume and performance requirements.
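The choice between the two methods can be sketched as a small planning function. This is a minimal illustration, not CloudQuery's parser: it assumes a toy regular expression that only understands simple single-table SELECT statements, a hypothetical `SENSITIVE_COLUMNS` configuration, and the placeholder function name `func` from the example above.

```python
import re

SENSITIVE_COLUMNS = {"a"}  # hypothetical: columns configured for desensitization
MASK_FUNCTION = "func"     # placeholder name taken from the example above

# Toy pattern: "select <columns> from <table>" with an optional semicolon.
_SIMPLE_SELECT = re.compile(
    r"^\s*select\s+(?P<cols>.+?)\s+from\s+(?P<table>\w+)\s*;?\s*$",
    re.IGNORECASE | re.DOTALL,
)


def plan_masking(sql):
    """Return (strategy, sql_to_run) for a statement.

    strategy is "rewrite", "result_set" or "none".
    """
    match = _SIMPLE_SELECT.match(sql)
    if match is None:
        # Anything this toy parser cannot understand falls back to
        # checking the result set after execution.
        return "result_set", sql
    cols = [c.strip() for c in match.group("cols").split(",")]
    if "*" in cols:
        # Columns are unknown before execution, so mask the result set.
        return "result_set", sql
    if not any(c in SENSITIVE_COLUMNS for c in cols):
        return "none", sql
    rewritten = ", ".join(
        f"{MASK_FUNCTION}({c})" if c in SENSITIVE_COLUMNS else c
        for c in cols
    )
    return "rewrite", f"select {rewritten} from {match.group('table')};"
```

For instance, `plan_masking("select a from table1;")` yields the rewritten statement, while `plan_masking("select * from table1;")` leaves the statement untouched and defers to result set parsing.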

How do you use CloudQuery's dynamic desensitization function?

The dynamic desensitization function of CloudQuery Community Edition v2.0.0 applies differentiated desensitization to the data returned by the database, according to user level and data level, without changing the data in the production database. This ensures that users in different roles get differentiated access to sensitive data in the database. Supported desensitization algorithms include truncation, encryption, hiding, and replacement.
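As a rough illustration of level-based, differentiated desensitization, the sketch below implements three of the algorithm types named above (truncation, hiding, replacement) and applies them according to a user level. Everything here is an assumption made for the example: the `RULES` table, the column names, and the comparison "a user sees plaintext when their level is at least the data level" are hypothetical and are not taken from CloudQuery. Encryption is left out because it would need a key-management setup.

```python
def truncate(value, keep=3):
    """Truncation: keep only the first `keep` characters."""
    return value[:keep]


def hide(value, mask_char="*"):
    """Hiding: replace every character with a mask character."""
    return mask_char * len(value)


def replace(value, replacement="[REDACTED]"):
    """Replacement: substitute a fixed placeholder for the whole value."""
    return replacement


# Hypothetical rule table: column -> (data level, desensitization algorithm)
RULES = {
    "phone": (2, truncate),
    "id_number": (3, hide),
    "address": (3, replace),
}


def apply_rules(row, user_level):
    """Desensitize a row (dict of strings) for a user of the given level.

    Assumption: a column stays in plaintext only when the user's level is
    at least the column's data level.
    """
    masked = {}
    for column, value in row.items():
        rule = RULES.get(column)
        if rule is not None and user_level < rule[0]:
            masked[column] = rule[1](value)
        else:
            masked[column] = value
    return masked
```

The stored row is never modified; each call returns a new dictionary, which mirrors the idea that the production data itself stays unchanged.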

Click the "Data Protection Management" menu button on the main page of CloudQuery to enter the data protection setting page.


Here we configure a truncation algorithm on the AAA field that retains the first three digits.

You can also choose whether to enable the rule here or at the parent node of the table.

The results before and after rewriting are shown below:

(Before rewriting)

(After rewriting)

CloudQuery dynamic masking outlook

In later versions, CloudQuery will gradually introduce more functions to round out its dynamic data desensitization capabilities, such as:

  • Custom result set data parsing rules: specify a regular expression under a schema, for example one that matches mobile phone numbers or ID numbers; if the query result contains data that matches the format, it is desensitized according to the corresponding rule. Custom rules can also be added to the built-in desensitization rules, which catches sensitive data that the configured desensitization algorithms missed and further ensures data security.

  • Level-based desensitization: levels can be set on both fields and users. Each user can query only the data at their corresponding level, and data that does not meet the requirement is desensitized, providing more personalized and precise privacy protection.

  • Data scanning: automatically identify sensitive data items and where they are located. After scanning, data can be classified into sensitivity levels or categories so that the corresponding desensitization rules can be applied to each category. Data scanning can also analyze the correlations between sensitive data items to keep the data consistent and intact during desensitization. Understanding these relationships helps ensure that the desensitized data remains usable.

  • Field desensitization algorithm recommendation: recommend field desensitization algorithms suited to different application scenarios and requirements, combining multiple algorithms and strategies to handle different types of sensitive data flexibly, safely, and efficiently.
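The regular-expression-based parsing rule from the first item could look roughly like the sketch below. This is not a CloudQuery feature description: the `PATTERN_RULES` table and `mask_middle` function are hypothetical, and the phone pattern assumes 11-digit mainland China mobile numbers.

```python
import re


def mask_middle(match):
    """Keep the first three and last four characters of the matched text."""
    text = match.group(0)
    return text[:3] + "*" * (len(text) - 7) + text[-4:]


# Hypothetical pattern rules: name -> (compiled regex, replacement function).
# Assumption: mobile numbers are 11 digits, start with 1, and are not
# embedded in a longer run of digits.
PATTERN_RULES = {
    "mobile_phone": (re.compile(r"(?<!\d)1[3-9]\d{9}(?!\d)"), mask_middle),
}


def mask_by_pattern(value):
    """Desensitize any substring of a string value that matches a rule."""
    if not isinstance(value, str):
        return value
    for pattern, masker in PATTERN_RULES.values():
        value = pattern.sub(masker, value)
    return value


def mask_rows(rows):
    """Apply pattern-based desensitization to every cell of a result set."""
    return [tuple(mask_by_pattern(v) for v in row) for row in rows]
```

Because the rule looks at values rather than column names, it can catch sensitive data in columns that have no desensitization rule configured, at the cost of scanning every cell.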

Dynamic data masking is an important part of CloudQuery's data protection management capabilities. By desensitizing sensitive data in real time, the visibility of sensitive information is reduced during data use and sharing, thereby reducing the risk of data leakage and misuse.

In the future, CloudQuery will not only improve the dynamic desensitization function but also continue to add data protection capabilities. Capabilities such as support for Chinese national cryptographic (SM) algorithms, audit logs, and data backup are already implemented in CloudQuery Enterprise Edition. By using these data protection measures together, enterprises can build a comprehensive data security and privacy protection system that ensures data security, integrity, and availability.


Origin blog.csdn.net/weixin_46201409/article/details/131070713