[Project data optimization 1] Desensitization of sensitive data

Foreword: With the spread of informatization and digital transformation, enterprises have a growing need for secondary use and mining of data. To balance data utilization against privacy protection (the protection of sensitive data), data desensitization, a mature and flexible data security technology, has become a necessary measure for the vast majority of enterprises in their data security governance.

1. Data desensitization with simple logic

In front-end development, some sensitive information, such as ID numbers, mobile phone numbers, and user names, should not be shown in full to other people.
The usual approach is to replace the middle characters with "*", which is known as desensitization (masking).

1.1 Mobile phone number desensitization

Mobile phone numbers in China have 11 digits. The first 3 digits are the network identification code, which indicates the carrier (China Unicom, China Mobile, or China Telecom); digits 4 to 7 are the area code; digits 8 to 11 are the subscriber number, which is assigned randomly (see Figure 2). From a mobile phone number it is therefore possible to infer where the number is registered and which carrier the user chose. The first digit is always 1 and each of the remaining 10 digits can range from 0 to 9, so at most 10^10 = 10 billion numbers can exist.

phone.replace(/^(.{3})(?:\d+)(.{4})$/, "$1****$2")
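The one-liner above can be wrapped in a small reusable function. The following is a minimal sketch: the name `maskPhone` and the choice to return non-matching input unchanged are assumptions made for illustration, not part of the original project.

```javascript
// Hypothetical helper: mask the middle four digits of an 11-digit mobile number.
function maskPhone(phone) {
  // Coerce to a string so that numeric or missing input does not throw
  const value = String(phone ?? '')
  // Only mask values that look like an 11-digit number starting with 1;
  // anything else is returned as-is (an assumption, adjust to your own rules)
  if (!/^1\d{10}$/.test(value)) return value
  return value.replace(/^(\d{3})\d{4}(\d{4})$/, '$1****$2')
}

console.log(maskPhone('13812345678')) // 138****5678
console.log(maskPhone(13812345678))   // 138****5678
console.log(maskPhone('12345'))       // 12345
```

Validating the shape first avoids half-masked output when the field contains something other than a mobile number, such as a landline or an empty string.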

1.2 Desensitization of ID number

The Chinese national ID number has 18 characters and carries a good deal of information about where and when its holder was registered, as shown in Figure 1. It consists of a 17-digit body code followed by a check code. The first 6 digits are the address code: digits 1-2 represent the province (or autonomous region, municipality directly under the central government, or special administrative region), digits 3-4 represent the city (or the summary code for the districts and counties of a prefecture, autonomous prefecture, league, or municipality directly under the central government), and digits 5-6 represent the county (or city district, county-level city, or banner). Digits 7-14 are the date of birth, made up of a 4-digit year followed by a 4-digit month and day. Digits 15-17 are the sequence code, in which an odd 17th digit is assigned to males and an even one to females. The 18th character is the check code (a digit or the letter X), which is computed from the first 17 digits by a fixed formula.
As the structure above shows, a citizen's ID number reveals the holder's place of birth, date of birth, and gender. The number of people sharing an address code can be obtained from public regional statistics and depends on the population density of the region; in sparsely populated remote areas, the address code alone can be highly identifying. For the date of birth, a simple estimate that assumes a uniform distribution over an age range of 0 to 130 gives about 30,000 people born on the same day (1.4 billion / (130 × 365) ≈ 29,500). Exposing only the 8 birth-date digits is therefore weakly identifying. Combining them with the address code, however, narrows those 30,000 candidates much further, and the sequence code (digits 15-17, with 1,000 possible values) then separates the remaining candidates, so the full number identifies exactly one person.
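To make the structure concrete, here is a minimal sketch that splits an ID number into the fields described above and recomputes the check code. The weights and the check-character table are those defined by GB 11643-1999 (the ISO 7064 MOD 11-2 scheme); the function names `checkDigit` and `parseIdCard` are hypothetical, and the sample number is synthetic.

```javascript
// Weights and check characters defined by GB 11643-1999 (ISO 7064 MOD 11-2)
const WEIGHTS = [7, 9, 10, 5, 8, 4, 2, 1, 6, 3, 7, 9, 10, 5, 8, 4, 2]
const CHECK_CHARS = '10X98765432'

// Hypothetical helper: compute the 18th character from the first 17 digits
function checkDigit(body17) {
  const sum = [...body17].reduce((acc, ch, i) => acc + Number(ch) * WEIGHTS[i], 0)
  return CHECK_CHARS[sum % 11]
}

// Hypothetical helper: split an 18-character ID number into its fields
function parseIdCard(id) {
  // 17 digits followed by a digit or an uppercase X; anything else is rejected
  if (!/^\d{17}[\dX]$/.test(id)) return null
  return {
    addressCode: id.slice(0, 6),
    birthDate: `${id.slice(6, 10)}-${id.slice(10, 12)}-${id.slice(12, 14)}`,
    sequenceCode: id.slice(14, 17),
    // An odd 17th digit means male, an even one female
    gender: Number(id[16]) % 2 === 1 ? 'male' : 'female',
    checkValid: checkDigit(id.slice(0, 17)) === id[17]
  }
}

// Synthetic sample number, for illustration only
console.log(parseIdCard('11010519491231002X'))
```

Everything this function returns is readable from the unmasked number, which is why the ID number needs to be masked before display.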

idCard.replace(/^(.{6})(?:\d+)(.{2})$/, "$1**********$2") 
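As a sketch, the same rule can be written as a function that checks the input first. The name `maskIdCard` and the pass-through behavior for malformed input are assumptions for illustration.

```javascript
// Hypothetical helper: keep the 6-digit address code and the last 2 characters,
// and replace the 10 characters in between (birth date and part of the sequence code).
function maskIdCard(idCard) {
  const value = String(idCard ?? '')
  // The last character may be a digit or "X"; other input is returned unchanged (an assumption)
  if (!/^\d{17}[\dXx]$/.test(value)) return value
  return value.replace(/^(\d{6})\d{10}(\d[\dXx])$/, '$1**********$2')
}

// Synthetic sample number, for illustration only
console.log(maskIdCard('11010519491231002X')) // 110105**********2X
```

Note that this rule still exposes the address code and the 17th digit, which encodes gender. Whether that is acceptable depends on the business requirements.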

1.3 Code Demonstration

columns: [
  { dataIndex: 'shipperName', title: '企业名称', width: 230 }, // company name
  { dataIndex: 'serialNumber', title: '承运单号', width: 260 }, // waybill number
  { dataIndex: 'contractTypeName', title: '订单类型', width: 100, align: 'center' }, // order type
  { dataIndex: 'driverName', title: '司机姓名', width: 100, align: 'center' }, // driver name
  {
    dataIndex: 'driverPhone',
    title: '司机手机号', // driver mobile number
    width: 130,
    align: 'center',
    // Mask the middle four digits; leave empty or non-string values untouched
    customRender: (text) => {
      return typeof text === 'string' ? text.replace(/^(.{3})(?:\d+)(.{4})$/, '$1****$2') : text
    }
  },
  { dataIndex: 'plateNumber', title: '车牌号', width: 110, align: 'center' }, // license plate number
  { dataIndex: 'contractTime', title: '接单时间', width: 170, align: 'center' } // order acceptance time
]
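When several columns need masking, the render logic can be factored out of the column list. The sketch below is one possible way to do that; `maskedColumn` and `maskPhone` are hypothetical names, and it assumes that `customRender` receives the cell text as its first argument, as in the column definition above.

```javascript
// Hypothetical mask function, using the same rule as the phone example
const maskPhone = (text) => String(text).replace(/^(\d{3})\d{4}(\d{4})$/, '$1****$2')

// Hypothetical helper: attach a masking customRender to a column definition.
// Assumes customRender is called with the cell text as its first argument.
function maskedColumn(column, maskFn) {
  return {
    ...column,
    // Pass empty cells through so the mask function never sees null or ''
    customRender: (text) => (text == null || text === '' ? text : maskFn(text))
  }
}

const columns = [
  { dataIndex: 'driverName', title: '司机姓名', width: 100, align: 'center' },
  maskedColumn({ dataIndex: 'driverPhone', title: '司机手机号', width: 130, align: 'center' }, maskPhone)
]

console.log(columns[1].customRender('13812345678')) // 138****5678
console.log(columns[1].customRender(null))          // null
```

Keeping the mask functions separate from the table configuration also makes them easy to unit test.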

2. Data desensitization with more complex logic

2.1 Username Desensitization

Unlike a mobile phone number or an ID number, a user name has no fixed length. Within the allowed range it may be two, three, four, or more characters long, so the masking rule has to branch on the length.

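const nameMask = (name) => {
  // Two characters: keep the first, mask the second
  if (name.length === 2) {
    return name.substring(0, 1) + '*'
  }
  // Three characters: keep the first and last, mask the middle one
  if (name.length === 3) {
    return name.substring(0, 1) + '*' + name.substring(name.length - 1)
  }
  // Four or more characters: keep the first and last, replace everything in between with "**"
  if (name.length >= 4) {
    return name.replace(/^(.{1})(?:.{1,})(.{1})$/, '$1**$2')
  }
  // Shorter input has no rule defined here and yields an empty string
  return ''
}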
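One caveat with the function above: `name.length` counts UTF-16 code units, so a rare character outside the Basic Multilingual Plane (for example "𠮷") counts as two and would be cut in half by `substring`. The following sketch counts code points instead by using `Array.from`; the name `nameMaskByCodePoint` is hypothetical, and the masking rules are otherwise the same.

```javascript
// Hypothetical variant of nameMask that counts code points rather than UTF-16 code units
function nameMaskByCodePoint(name) {
  // Array.from iterates a string by code point, so surrogate pairs stay together
  const chars = Array.from(String(name ?? ''))
  if (chars.length < 2) return '' // same as nameMask: no rule for very short input
  if (chars.length === 2) return chars[0] + '*'
  if (chars.length === 3) return chars[0] + '*' + chars[2]
  return chars[0] + '**' + chars[chars.length - 1]
}

console.log(nameMaskByCodePoint('张三'))     // 张*
console.log(nameMaskByCodePoint('王小明'))   // 王*明
console.log(nameMaskByCodePoint('欧阳修文')) // 欧**文
console.log(nameMaskByCodePoint('𠮷田'))     // 𠮷*
```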

Note: Please adjust the specific data processing rules according to business needs.

3. Explanation

3.1 () in regular expressions

Parentheses () in a regular expression create a capturing group: the text matched by each group is stored in the match result. In the replacement string, $ followed by a number from 1 to 9 refers to the group with that index, so $1...$9 stand for the text captured by the corresponding group. Note that group numbering starts at 1, not 0. A group written as (?:...) is non-capturing and receives no number, which is why the \d+ part of the expressions above does not become $2.
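The short sketch below shows how the groups are numbered, using the phone expression from section 1.1. The sample number is made up, and the named-group variant at the end is an alternative syntax that the original code does not use.

```javascript
const phone = '13812345678'

// Two capturing groups: $1 is the first three characters, $2 the last four.
// (?:\d+) is a non-capturing group, so it gets no number.
const masked = phone.replace(/^(.{3})(?:\d+)(.{4})$/, '$1****$2')
console.log(masked) // 138****5678

// The same groups are available from match(): index 0 holds the whole match,
// and the captured groups start at index 1.
const m = phone.match(/^(.{3})(?:\d+)(.{4})$/)
console.log(m[0]) // 13812345678
console.log(m[1]) // 138
console.log(m[2]) // 5678

// Named groups are an alternative that avoids counting parentheses
const named = phone.replace(/^(?<head>\d{3})\d+(?<tail>\d{4})$/, '$<head>****$<tail>')
console.log(named) // 138****5678
```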

Origin blog.csdn.net/qq_39335404/article/details/129477701