Data desensitization (reproduced from Network Security Academy)

C.1 Overview

When financial institutions carry out financial data security protection work, protecting sensitive information is a particularly important part. Financial institutions are numerous and of many types. As China's informatization and digitalization continue to accelerate, the forms and contents of financial products and services are becoming increasingly diverse, and in the course of business development and daily operation financial institutions have accumulated a large amount of data. Most of this data is directly related to the property and data security of financial consumers, and even to national economic development and social stability, so it is highly sensitive. The protection of sensitive information has therefore become the primary problem to be solved in the secure application of financial data. Financial sensitive information usually includes sensitive information stipulated by the state, sensitive business data, and sensitive personal financial information. In practice, an appropriate data desensitization method should be selected according to factors such as the actual business scenario and the data security level, so as to prevent the leakage of sensitive information.

C.2 Definition of data desensitization

Data desensitization refers to the process of eliminating, by certain methods, the sensitivity that data carries in its original environment when sensitive data is exchanged from that environment to a target environment, while retaining the data characteristics or content required by the business in the target environment. Commonly used data desensitization techniques are shown in Table C.1. The data desensitization described in this appendix is mainly aimed at personal financial information and important financial data in the financial industry. Desensitization of personal financial information is a common means of privacy protection in the financial field: financial institutions use data desensitization technology to eliminate the sensitivity of personal financial information, effectively guaranteeing its security in processes such as enterprise data analysis, regulatory collaboration, and open testing.

C.3 Basic principles of data desensitization

Data desensitization must ensure that the sensitivity of the data is eliminated while balancing, as far as possible, factors such as the cost of desensitization and the business needs of the user. To keep the process and cost of data desensitization controllable and to ensure that the results are correct and meet business needs, the following principles should be followed when implementing data desensitization (a brief illustrative sketch follows the list):

a) Effectiveness: the data desensitization process must be effective. After the original data is desensitized, the sensitive information contained in the original data has been eliminated and cannot be obtained from the processed data, preventing the use of non-sensitive data to infer, reconstruct, or restore the sensitive original data.

b) Efficiency: the data desensitization process must be efficient. By using computer programs to make desensitization automated and repeatable, the strength and cost of desensitization are balanced without affecting its effectiveness, keeping the desensitization work within a certain time and economic cost.

c) Reproducibility: when the same original data is processed with the same algorithm and parameters, the desensitized data is consistent, except for random algorithms.

d) Relevance: for structured and semi-structured data, a field in a data table may have a corresponding relationship with another field in the same table. If the desensitization algorithm destroys this relationship, the field loses its usage value. Data relevance is usually high in cases where reference quantities are required for data statistics.

e) Configurability: the data desensitization process must be configurable. Because security requirements differ between scenarios, the processing methods and the fields to be desensitized also differ, so they need to be configured according to the input conditions to generate different desensitization results, allowing different desensitized data to be provided for different needs according to factors such as the data usage scenario.
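As a small illustration of principles c) and e), a keyed deterministic masking function always returns the same output for the same input and configuration, while the key and output length act as configuration parameters. This is a minimal sketch written for this appendix; the use of HMAC-SHA-256 and the parameter names are assumptions, not requirements of the text above.

```python
import hashlib
import hmac

def deterministic_mask(value: str, key: bytes, length: int = 12) -> str:
    """Keyed, repeatable masking: the same value with the same key and
    parameters always yields the same result (reproducibility), while the
    key and output length are configurable (configurability)."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()[:length]

key = b"project-specific-key"                  # assumed configuration input
print(deterministic_mask("12300010001", key))  # same output on every run
print(deterministic_mask("12300010001", key))  # reproducibility holds
```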

C.4 Data desensitization techniques

C.4.1 Generalization

Generalization refers to replacing the original data with more general values while retaining its local characteristics. Specific techniques include, but are not limited to, the following (a sketch of these techniques is given after the examples below):

a) Truncation: directly discard information that is not needed by the business and retain only part of the key information. Truncated data often cannot preserve the original business attributes well, so when truncating data, choose the number of digits to truncate according to the characteristics of the data.

Example: 1) Truncate the mobile number 12300010001 to 1230001.

2) Truncate the ID number 123184198501184115 to 198501184115.

b) Offset rounding: offset the data upward or downward at a certain granularity, which hides the original attributes of the data while preserving certain distribution characteristics. Offset rounding mainly guarantees the security of the original data by discarding some precision; it can preserve the distribution density of the data's business characteristics to a certain extent and is suitable for rough statistical analysis scenarios.

Example: 1) The time 20200322 18:08:19 is rounded down at a granularity of 10 seconds to obtain 20200322 18:08:10.

2) The amount 5123.62 yuan is rounded at a granularity of 100 yuan to obtain 5100 yuan.

c) Regularization: map the data into several predefined buckets according to its value. Although regularization preserves a certain business meaning, it largely loses the original precision of the data. A suitable generalized format can be chosen according to actual business needs.

Example: 1) Divide customer assets into three levels, high, medium, and low, according to their size, and replace the customer asset data with these three levels.

2) Divide the business expenses incurred by customers into three levels, high, medium, and low, according to the amount, and replace the customer business expense data with these three levels.
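The following sketch illustrates the three generalization techniques above in Python. The field formats, time format, and bucket thresholds are assumptions made for the illustration only.

```python
from datetime import datetime

def truncate(value: str, keep: int) -> str:
    """Truncation: keep only the first `keep` characters of the value."""
    return value[:keep]

def round_down_seconds(ts: str, granularity: int = 10) -> str:
    """Offset rounding: round a timestamp down at a granularity in seconds."""
    dt = datetime.strptime(ts, "%Y%m%d %H:%M:%S")
    dt = dt.replace(second=dt.second - dt.second % granularity)
    return dt.strftime("%Y%m%d %H:%M:%S")

def round_amount(amount: float, granularity: int = 100) -> int:
    """Offset rounding: round an amount to a multiple of `granularity`."""
    return int(round(amount / granularity) * granularity)

def regularize(asset: float, low: float = 1e5, high: float = 1e6) -> str:
    """Regularization: map a numeric value into predefined low/medium/high
    buckets (the thresholds here are arbitrary examples)."""
    if asset < low:
        return "low"
    if asset < high:
        return "medium"
    return "high"

print(truncate("12300010001", 7))               # 1230001
print(round_down_seconds("20200322 18:08:19"))  # 20200322 18:08:10
print(round_amount(5123.62))                    # 5100
print(regularize(250000.0))                     # medium
```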

C.4.2 Suppression

Suppression refers to converting the value of the original data by hiding part of the information in the data; it is also known as hiding technology. A sketch of the masking described below is given after the examples.

a) Mask shielding: retain part of the information and uniformly replace part of the content of the sensitive data with common characters (such as "X" or "*"), so that only part of the sensitive content remains visible while the data is still easy for the information holder to recognize.

Example: 1) Mask the mobile phone number 12300010001 to obtain 123****0001.

2) Mask the ID number 123184198501184115 to get 123184000000004115.

b) Display masking: when personal financial information is displayed through interfaces such as computer screens and client application software, an information mask is used to shield or truncate the sensitive content before it is shown.

Example: Mask the bank card number 1234701202106563320 to obtain 1234***********3320.
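As a rough illustration of mask shielding (the retained digit counts follow the examples above; this is not a normative implementation), a simple helper might look like this:

```python
def mask_middle(value: str, keep_head: int, keep_tail: int, fill: str = "*") -> str:
    """Mask shielding: replace everything between the first `keep_head` and the
    last `keep_tail` characters with `fill`, preserving the original length."""
    if keep_head + keep_tail >= len(value):
        return value  # nothing left to mask
    hidden = len(value) - keep_head - keep_tail
    return value[:keep_head] + fill * hidden + value[-keep_tail:]

print(mask_middle("12300010001", 3, 4))          # 123****0001
print(mask_middle("1234701202106563320", 4, 4))  # 1234***********3320
```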

C.4.3 Disruption

Disruption refers to disturbing the original data, for example by adding noise, so as to distort and change it. Specific techniques include, but are not limited to, the following (a sketch of several of these techniques is given after the list):

a) Rearrangement: rearrange the original data according to specific rules; for cross-row data, random swapping is used to break the original association between values and records.

1) Rearrange by shuffling the positions of the data according to a certain order.

2) Rearrangement can preserve some business information over a considerable range, such as the valid data range and statistical characteristics, so that the desensitized data looks closer to the original data, at the cost of a certain degree of security. Rearrangement is generally used in scenarios where a large data set needs to retain specific characteristics of the data to be desensitized. For small data sets, the data produced by rearrangement may be restored through other information, so special care should be taken when using it.

b) Encryption: process the data to be desensitized with conventional encryption algorithms, such as symmetric or asymmetric encryption algorithms, so that external users see only meaningless ciphertext. The original data is available to authorized parties who hold the key.

1) Use a symmetric or asymmetric encryption algorithm to encrypt the data before storage.

2) The security of encryption depends on the encryption algorithm used, which is generally chosen according to the actual situation. The disadvantage of this method is that encryption itself requires a certain amount of computation and incurs considerable resource overhead for large data sets. The format of the encrypted data usually differs greatly from the original data format, so its "authenticity" is poor.

c) Replacement: Replace the original data according to specific rules. Common replacement methods include constant replacement, table lookup replacement, and parameterized replacement.

1) Constant replacement: All sensitive data is replaced with a unique constant value, which is irreversible.

2) Table lookup replacement: select data from the intermediate table randomly or according to a specific algorithm for replacement.

3) Parameterized replacement: Sensitive data is used as input to form new replacement data through specific functions.

d) Hashing: Take the hash value of the original data and use the hash value to replace the original data.

1) Use a hash function to compute a hash value of information such as the customer password, and replace the original data with the hash value.

2) To ensure the security of the hash, avoid hash functions with weak security such as MD5 and SHA-1. When the original values come from a limited space, adding a random factor (salt) is usually used in actual application scenarios to improve security. Hash functions are often used in scenarios where sensitive information such as passwords is stored.

e) Rewriting: regenerate the data with reference to the characteristics of the original data. Rewriting is similar to overall replacement, but replaced data usually has a mapping relationship with the original data, whereas data produced by rewriting generally has no mapping relationship with the original data.

f) Fixed offset: increase the data value by a fixed offset n to hide its numerical characteristics.

g) Partial obfuscation: keep the first n characters unchanged and obfuscate the rest.

h) Unique value mapping: Map data into a unique value, allow the original value to be retrieved according to the mapped value, and support correct aggregation or connection operations.

i) Homogenization: for numerical sensitive data, change the original values while ensuring that the total or average of the desensitized data set is the same as that of the original data set. This method is usually used for cost tables, salary tables, and similar cases.
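The sketch below illustrates a few of the disruption techniques listed above: table-lookup replacement (c), salted hashing with SHA-256 rather than a weak hash function (d), fixed offset (f), and homogenization (i). The lookup table, field values, and parameters are assumptions for the illustration; it is not a reference implementation.

```python
import hashlib
import random
import secrets
from typing import Optional

# c) Table-lookup replacement: pick a substitute from an intermediate table.
SURNAME_TABLE = ["Zhang", "Wang", "Li", "Zhao", "Chen"]   # assumed lookup table

def lookup_replace(_value: str) -> str:
    return random.choice(SURNAME_TABLE)

# d) Salted hashing: add a random factor and use SHA-256 instead of MD5/SHA-1.
def salted_hash(value: str, salt: Optional[bytes] = None) -> str:
    salt = salt if salt is not None else secrets.token_bytes(16)
    return hashlib.sha256(salt + value.encode("utf-8")).hexdigest()

# f) Fixed offset: shift every value by a constant n to hide numerical features.
def fixed_offset(values: list[float], n: float = 1000.0) -> list[float]:
    return [v + n for v in values]

# i) Homogenization: change individual values but keep the total and average.
def homogenize(values: list[float]) -> list[float]:
    return [sum(values) / len(values)] * len(values)

salaries = [8000.0, 12000.0, 20000.0]
print(lookup_replace("Zhang San"))
print(salted_hash("customer-password"))
print(fixed_offset(salaries))   # [9000.0, 13000.0, 21000.0]
print(homogenize(salaries))     # total and average unchanged
```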

C.4.4 Lossy

Lossy techniques protect an entire sensitive data set by discarding part of the data. They are applicable to scenarios where the data in a data set, taken together, constitutes sensitive information. For financial back-office systems that do not provide open query capabilities, lossy techniques can achieve the effect of limiting batch queries. Specific lossy techniques include, but are not limited to, the following (a sketch is given after the list):

a) Limit the number of rows: return only a certain number of rows from the available data set. This is mostly used in back-office systems that do not provide open query capabilities and strictly limit batch queries.

b) Limit the number of columns: return only certain columns from the available data set. This can be applied to basic personnel information queries, where the returned data set is limited or prohibited from including certain sensitive columns.
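A minimal sketch of the two lossy techniques, assuming the data set is a list of dictionaries; the field names and records are illustrative only.

```python
from typing import Any

def limit_rows(dataset: list[dict[str, Any]], max_rows: int) -> list[dict[str, Any]]:
    """Limit the number of rows: return at most max_rows records."""
    return dataset[:max_rows]

def limit_columns(dataset: list[dict[str, Any]],
                  sensitive: set[str]) -> list[dict[str, Any]]:
    """Limit the number of columns: drop the named sensitive columns."""
    return [{k: v for k, v in row.items() if k not in sensitive} for row in dataset]

records = [
    {"name": "Zhang San", "id_number": "123184198501184115", "branch": "Beijing"},
    {"name": "Li Si", "id_number": "XXXXXXXXXXXXXXXXXX", "branch": "Shanghai"},
]
print(limit_rows(records, 1))
print(limit_columns(records, {"id_number"}))
```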

C.5 Classification of data desensitization applications

C.5.1 Overview

Data desensitization can be divided into dynamic data desensitization and static data desensitization according to its real-time nature and application scenario. Static data desensitization is generally used for non-production environments: sensitive data is extracted from the production environment and desensitized before being used in non-production environments such as training, analysis, testing, and development. Dynamic data desensitization is generally used in production environments: sensitive data is desensitized in real time at the moment it is accessed, for example by applications.

C.5.2 Static data desensitization

Static data desensitization performs, in a single pass and according to the desensitization rules, the transformation and conversion of large batches of data, using a processing method similar to ETL. A schematic diagram of static desensitization is shown in Figure C.1. Static desensitization is usually carried out in the production environment before the sensitive data is delivered to development, testing, or external environments. While reducing the sensitivity of the data, it preserves as much as possible of the minable value of the original data set, such as its intrinsic relevance. A minimal sketch of this batch style of processing follows.
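The sketch below shows the batch, ETL-like processing described above: a fixed set of rules is applied to every record extracted from the source before it is loaded into the non-production target. The rule format and field names are assumptions made for the illustration.

```python
from typing import Any, Callable

# A desensitization policy maps field names to masking functions (assumed format).
MaskRule = Callable[[Any], Any]

def static_mask(records: list[dict[str, Any]],
                rules: dict[str, MaskRule]) -> list[dict[str, Any]]:
    """One-pass batch transformation of extracted records, ETL-style."""
    return [{k: (rules[k](v) if k in rules else v) for k, v in row.items()}
            for row in records]

rules = {
    "mobile": lambda v: v[:3] + "****" + v[-4:],   # mask shielding
    "salary": lambda v: round(v / 100) * 100,      # offset rounding
}
source = [{"name": "Zhang San", "mobile": "12300010001", "salary": 5123.62}]
print(static_mask(source, rules))
# [{'name': 'Zhang San', 'mobile': '123****0001', 'salary': 5100}]
```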

Main features of static data desensitization:

a) Adaptability, that is, sensitive data in any format can be desensitized.

b) Consistency, that is, the original data field format and attributes are retained after desensitization.

c) Reusability, that is, data desensitization rules and standards can be reused, and different business needs can be met by customizing data privacy policies.

C.5.3 Dynamic data desensitization

Dynamic data desensitization uses middleware technology, similar to a network proxy, to process the data accessed by external applications in real time according to the desensitization rules and to return the desensitized results. A diagram of dynamic desensitization is shown in Figure C.2. Dynamic desensitization is usually used in scenarios where data is provided to external query services. While reducing the sensitivity of the data, it minimizes the delay with which the requesting side obtains the desensitized result, and data generated in real time can likewise be desensitized immediately. The main features of dynamic data desensitization are as follows (a sketch of a proxy-style wrapper is given after this list):

a) Real-time, that is, sensitive data accessed by users can be dynamically desensitized, encrypted, or flagged in real time.

b) Multi-platform, that is, access restrictions between platforms, different applications, or application environments are implemented through defined data desensitization policies.

c) Availability, that is, the integrity of the desensitized data can be guaranteed, meeting the data needs of business systems.
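As a rough illustration of the proxy-style behaviour described above, the wrapper below intercepts the results of a data-access function and masks each row before returning it to the caller. The query function, field names, and masking policy are hypothetical.

```python
from typing import Any, Callable, Iterable, Iterator

def dynamic_mask(query: Callable[..., Iterable[dict[str, Any]]],
                 rules: dict[str, Callable[[Any], Any]]
                 ) -> Callable[..., Iterator[dict[str, Any]]]:
    """Wrap a data-access function so its results are desensitized on the fly."""
    def proxy(*args: Any, **kwargs: Any) -> Iterator[dict[str, Any]]:
        for row in query(*args, **kwargs):        # real-time interception
            yield {k: rules[k](v) if k in rules else v for k, v in row.items()}
    return proxy

# Hypothetical production query function.
def query_accounts(branch: str) -> Iterable[dict[str, Any]]:
    return [{"account": "1234701202106563320", "branch": branch}]

masked_query = dynamic_mask(query_accounts,
                            {"account": lambda v: v[:4] + "*" * 11 + v[-4:]})
for row in masked_query("Beijing"):
    print(row)   # {'account': '1234***********3320', 'branch': 'Beijing'}
```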

C.6 Data desensitization application scenarios

The application scenarios of data desensitization are mainly divided into technical scenarios and business scenarios. Technical scenarios mainly include development and testing, data analysis, data-related scientific research, production, data exchange, and operation and maintenance. Business scenarios include, but are not limited to, credit risk assessment, fraud identification, precision marketing, and consumer credit. Common data desensitization application scenarios are shown in Table C.2.

C.7 Reference desensitization methods for privacy data

C.7.1 Desensitization of contact names

Examples of desensitization methods for contact names are given in Table C.3.

C.7.2 Desensitization of enterprise account name

See Table C.4 for an example of the desensitization method of the enterprise account name.

C.7.3 Desensitization of ID number

See Table C.5 for examples of desensitization methods for ID card numbers.

C.7.4 Desensitization of passport numbers

Examples of desensitization methods for passport numbers are given in Table C.6.

C.7.5 Desensitization of addresses

Examples of desensitization methods for addresses are given in Table C.7.

C.7.6 Desensitization of license plate numbers

Examples of desensitization methods for license plate numbers are given in Table C.8.

C.7.7 Desensitization of contact numbers (landlines)

Examples of desensitization methods for contact numbers (landlines) are given in Table C.9.

C.7.8 Desensitization of contact numbers (mobile numbers)

See Table C.10 for examples of desensitization methods for contact numbers (mobile numbers).

C.7.9 Desensitization of date and time

Examples of desensitization methods for dates and times are given in Table C.11.

C.7.10 Desensitization of email addresses

Examples of desensitization methods for e-mail addresses are given in Table C.12.

C.7.11 Desensitization of passwords

Examples of desensitization methods for passwords are given in Table C.13.

C.7.12 Desensitization of financial account numbers

Examples of desensitization methods for financial account numbers are listed in Table C.14.

C.7.13 Desensitization of bank card numbers

Examples of desensitization methods for bank card numbers are given in Table C.15.

C.7.14 Desensitization of passbook account numbers

Examples of desensitization methods for passbook account numbers are shown in Table C.16.

C.7.15 Desensitization of VAT numbers

Examples of desensitization methods for VAT numbers are given in Table C.17.

C.7.16 Desensitization of VAT account numbers

Examples of desensitization methods for VAT account numbers are shown in Table C.18.
Author of the original text: Network Security Information
Reposted from the link: https://www.wangan.com/docs/7842
 
