The 2023 Privacy Computing and Artificial Intelligence Summit Was Successfully Held: Record of Databao's Speech (Part 1)

On April 8, 2023, the 2023 Privacy Computing and Artificial Intelligence Summit was held in Shenzhen. The conference was co-sponsored by East China Jiangsu Big Data Trading Center and Hot Information. At the meeting, Ms. Zhan Zhen, director of Databao, gave an opening speech.

Databao signed a strategic agreement with Open Islands, and Xiao Bin, the rotating CEO of Databao, took the stage together with Open Islands to light the kickoff ball.

Under the theme "Focus on Privacy Computing, Empowering the Future of Artificial Intelligence", the summit focused on the development of the Internet ecosystem, with Internet technology as the core and user experience as the guide, aiming at a fair, open, and secure network system and ecosystem. It set out to create a platform for privacy computing applications and ecosystem exchange, and to help launch a new global digital economic system. Xiao Bin, the rotating CEO of Databao, was invited to give a speech titled "Privacy Computing Application Scenarios and Databao Practice Exploration". The following is a transcript of Xiao Bin's speech (Part 1).


Distinguished guests, colleagues, good morning! Today, on behalf of Databao, I would like to give you a brief report.

The report is roughly divided into several parts. The first is Databao's main implementation scenarios in privacy computing, the second is Databao's practical exploration, the third is Databao's view of the current state of privacy computing, the fourth is Databao's exploration of solutions, and finally I will introduce Databao itself.

The first is Databao's main implementation scenarios in privacy computing. Whether driven by policy dividends or by objective market needs, privacy computing has met a significant development opportunity. Since 2019, interest in privacy computing has grown year by year, and in 2021 it entered a period of relatively rapid development. It now has clear application potential in many areas. Databao has its own exploration and practice in this direction, covering finance, insurance, and government affairs, and of course the medical field mentioned by the previous guest, where there are also many such applications.

Judging from this large pie chart, demand for privacy computing, and its actual implementation in the market, is relatively strong on the financial side.

Let's focus on a few scenarios. The first is finance and insurance. What are the main problems that data solves there? One direction is risk control: whether it is data-driven risk control, insurance anti-fraud, or the evaluation of information across more dimensions, the goal is to reduce risk. Another is marketing: whether acquiring new customers or re-engaging existing ones, a company needs a large amount of external data to combine with its own. This requires a compliant, legal, safe, and controllable system, so that external data, once brought in, is applied only in compliant scenarios. There is also the regulatory direction. Whether the task is statistical or query-based, it addresses the problem of financial supervision. Whether it is the China Banking Regulatory Commission or the China Securities Regulatory Commission, each regulator holds a great deal of multi-dimensional information about the entities it oversees and needs to supervise it, covering both their customers and their actual transaction scenarios. This also relies on fusing data from all parties, and fusing data requires a secure and compliant solution. Privacy computing can play a huge role here. This is the judgment Databao has reached through practice.

Another direction is government affairs. There are many policy dividends here, and policy has shifted noticeably as marketization has advanced. Early on, data was recognized as a valuable asset; put simply, a gold mine. In the first stage, the first things to solve were data integration, breaking down data silos, aggregation, and preparing for more application scenarios. During aggregation, alongside the rapid development of Internet companies, some improper uses of data appeared, and many problems arose in data applications. The country then promulgated several laws in succession. These are the "three laws" often talked about in our industry, covering data security, network security, and personal privacy, and their promulgation seemed to place a "curse" on our industry.

The market-oriented circulation of data elements must first solve the problem of how to comply with the "three laws". What orientation did the market take? Solve compliance and security first, and then take a step forward under compliant and secure conditions. At the current stage, whether in the "Twenty Data Measures" or in the recently released national development plan, the orientation has changed somewhat. The expectation now is that data elements maximize their value and create more market-oriented applications, and that compliance is guaranteed on the premise of creating value. In other words, the order has changed. Originally we were expected to mine a certain amount of market value while staying in a safe state; now the wording is "guaranteeing data security on the premise of amplifying market value further". I will expand on these two changes later.

In government affairs, the task is not only to share basic government data but also to open up more public data, so that government-related data in particular can circulate in the market. How to operate this requires solving data security, compliance, and regulatory issues, including how the government maintains supervision and keeps information controllable.

Databao has also made some practical explorations in this area. Let me give an example first. Databao operates state-owned data resources as an agent. What is the logic of this agent operation? It is to help the data of national ministries, commissions, and state-owned enterprises reach the market and find more market-oriented room for growth. Databao has operated the data of many ministries and commissions and has done some exploration of its own. For example, in the insurance scenario, we rely on big data about vehicle dynamics, which moves beyond the original model of pricing risk purely on static data. We explored this actively and added many dynamic risk factors for vehicles. These are closely related to the vehicle's model and attributes, and also to its mileage, driver fatigue, driving conditions, and the frequency and duration of its transport trips. This case is in the small truck segment, which is relatively large in scale. Our data played a major role once introduced, and this has been verified in enterprise practice. We introduced dynamic vehicle factors and built a dynamic model, and the model combining dynamic and static factors also performs better in practice.

We started from the small truck model and gradually extended to the large truck model, including a model for fleet risk assessment. Now we are also exploring models for private car insurance pricing and anti-fraud. Later, in connection with our data, I will talk about how we introduce data security and compliance measures, including privacy computing technologies, to support joint modeling and market-oriented application scenarios.

This is what I just mentioned. We introduced more data sources when modeling auto insurance. First, we need auto insurance claims data, as well as static vehicle data and dynamic traffic-related data. With such diverse dynamic data, each data supplier has a strong demand to protect its own data. How, then, do we protect every party's data sources and still carry out data mining while safeguarding every party's data interests? Databao has explored a method based on federated learning and applied it to this problem. Databao contributes the static and dynamic traffic data. I will not expand on the specific technical solution, because the previous speakers have already covered it at length. In this way, we handle the scenario of introducing multi-party data to improve the model.
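As a rough illustration of the federated learning idea described here, the sketch below trains a tiny linear model across three parties with federated averaging. Only model weights leave each party; the raw records never do. The talk does not say which federated algorithm Databao uses, so the algorithm choice, the function names, and the synthetic data (generated from y = 2x + 1) are all assumptions made for illustration.

```python
def local_update(weights, data, lr=0.1, epochs=10):
    """Gradient descent on one party's private data; raw data never leaves this function."""
    w, b = weights
    n = len(data)
    for _ in range(epochs):
        # Compute both gradients before updating so the step is simultaneous.
        grad_w = sum((w * x + b - y) * x for x, y in data) / n
        grad_b = sum((w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def federated_average(updates, sizes):
    """Coordinator step: average party weights, weighted by each party's sample count."""
    total = sum(sizes)
    w = sum(u[0] * s for u, s in zip(updates, sizes)) / total
    b = sum(u[1] * s for u, s in zip(updates, sizes)) / total
    return w, b


def train(parties, rounds=200):
    """Alternate local training and averaging for a fixed number of rounds."""
    weights = (0.0, 0.0)
    sizes = [len(d) for d in parties]
    for _ in range(rounds):
        updates = [local_update(weights, d) for d in parties]
        weights = federated_average(updates, sizes)
    return weights


def make_party(xs):
    """Hypothetical private dataset: synthetic points on the line y = 2x + 1."""
    return [(x, 2.0 * x + 1.0) for x in xs]


# Three hypothetical data holders (for example claims, static, and dynamic data).
parties = [
    make_party([-1.0, -0.5, 0.0]),
    make_party([0.5, 1.0]),
    make_party([-0.8, 0.2, 0.9]),
]
w, b = train(parties)
print(f"learned w={w:.4f}, b={b:.4f}")
```

In a real deployment the weight updates themselves would usually be protected as well, for example with secure aggregation or encryption, since plain weights can still leak information about the underlying data.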

We also explored another scenario. Using fully homomorphic encryption, we modeled all data in the ciphertext state and achieved phased results. Comparing the model built in the ciphertext state with the model built on full plaintext, the maximum deviation does not exceed 7%. From a purely technical standpoint, that is a fairly good result. Let me add one more point here, to set up what I will discuss later. Our company has many actuaries and algorithm experts, and when the two groups debate each other, some interesting things happen. The algorithm team believes that with fully homomorphic encryption, or even semi-homomorphic encryption, which we have also tried, the results show a relatively low degree of deviation. But from the actuarial and market point of view, this is unacceptable. Why? Because in some scenarios, such as auto insurance, there is a threshold for final profitability, including the profitability of the first insurance policy. Once the deviation exceeds 3 to 5%, the business is already in negative profit. In other words, a model with a deviation of about 7% looks good in theory but falls short in actual commercial use. We are therefore trying federated learning as well as semi-homomorphic encryption. These are the directions we are exploring, and the goal is to ultimately deliver real commercial value.
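To make "modeling in the ciphertext state" more concrete, here is a toy sketch of additively ("semi") homomorphic encryption using the textbook Paillier scheme. It scores a linear model on encrypted features and then measures the relative deviation from the plaintext score. This is not Databao's implementation: the primes are far too small for real use, and the feature values, weights, and fixed-point scale are hypothetical. A linear score is exact up to rounding, so the deviation here is essentially zero; in practice, deviations like the 7% mentioned above would come from the approximations needed for more complex models.

```python
import math
import secrets

# Toy primes (the Mersenne primes 2**31 - 1 and 2**61 - 1); far too small for real use.
P = 2**31 - 1
Q = 2**61 - 1
SCALE = 1000  # fixed-point scale, because Paillier works on integers


def generate_keypair(p, q):
    """Return (n, (lam, mu)) for Paillier with generator g = n + 1."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # this simple inverse is valid because g = n + 1
    return n, (lam, mu)


def encrypt(n, m):
    """Encrypt an integer 0 <= m < n with fresh randomness."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2


def decrypt(n, priv, c):
    lam, mu = priv
    u = pow(c, lam, n * n)
    return ((u - 1) // n * mu) % n


def to_fixed(x):
    return round(x * SCALE)


def encrypted_score(n, enc_features, weights):
    """Weighted sum on ciphertexts: multiplying ciphertexts adds plaintexts,
    and raising a ciphertext to k multiplies its plaintext by k.
    Weights are assumed non-negative in this sketch."""
    n2 = n * n
    acc = encrypt(n, 0)
    for c, w in zip(enc_features, weights):
        acc = (acc * pow(c, to_fixed(w), n2)) % n2
    return acc


n, priv = generate_keypair(P, Q)
features = [1.25, 0.4, 3.1]   # hypothetical risk factors held by a data provider
weights = [0.3, 1.7, 0.25]    # hypothetical model weights held by the modeler

enc_features = [encrypt(n, to_fixed(f)) for f in features]
cipher_score = decrypt(n, priv, encrypted_score(n, enc_features, weights)) / SCALE**2
plain_score = sum(f * w for f, w in zip(features, weights))
deviation = abs(cipher_score - plain_score) / abs(plain_score)
print(f"ciphertext score={cipher_score}, relative deviation={deviation:.2e}")
```

A check such as `deviation < 0.03` against the lower edge of the 3 to 5% range mentioned in the talk is one simple way to express the actuaries' acceptance criterion in code.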

This is a data verification scenario. Whether verifying individuals or enterprises in various social scenarios, for example the basic information of vehicles, the basic information of an enterprise's internal vehicles, or the various conditions of an enterprise's transport capacity, there are a large number of verification scenarios. What we use here is what Databao calls a security domain product, which is somewhat similar to a TEE. It is not, however, a pure hardware solution. It is based on commercialization strategies and our supporting cryptographic mechanism, which together address the application scenarios of the security domain.

Let me expand a little here. The verifying party supplies the input parameters and encrypts them. They are transmitted in the ciphertext state, and every party in the middle, including Databao, sees only ciphertext; there is no data storage or reprocessing along the way. When the request reaches the data source for verification, the data source also performs the matching against the input in the ciphertext state. The intermediate steps involve Databao and the enterprises that actually use the service, whether financial companies, Internet companies, or even the government, and for them the data passes through transparently and without being perceived. To complete the verification process for the final application or end customer, we rely on the approach of the security domain product.
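The talk does not describe the cryptographic mechanism behind this flow, so the following is only a simplified stand-in: the verifying party blinds its input with a keyed hash (HMAC-SHA256), the intermediary relays the token without storing or decoding it, and the data source matches against tokens it computed the same way. The shared key, the record fields, and all function names are hypothetical, and a production system would more likely use a stronger protocol such as private set intersection.

```python
import hashlib
import hmac


def blind(plate, owner, key):
    """Turn the input parameters into an opaque token using a keyed hash."""
    message = f"{plate.strip().upper()}|{owner.strip()}".encode("utf-8")
    return hmac.new(key, message, hashlib.sha256).hexdigest()


class DataSource:
    """Holds only blinded tokens of its records and matches without seeing plaintext input."""

    def __init__(self, records, key):
        self._tokens = {blind(plate, owner, key) for plate, owner in records}

    def matches(self, token):
        return token in self._tokens


def relay(token, source):
    """Intermediary: forwards the token unchanged, with no storage or reprocessing."""
    return source.matches(token)


# Hypothetical key agreed between the verifying party and the data source only.
shared_key = b"demo-key-known-only-to-both-ends"
source = DataSource([("ABC123", "Zhang San"), ("XYZ789", "Li Si")], shared_key)

good_token = blind("abc123", "Zhang San", shared_key)
bad_token = blind("ABC123", "Wang Wu", shared_key)
print(relay(good_token, source), relay(bad_token, source))
```

Because the intermediary never holds the key, it cannot recompute tokens for guessed inputs, which is the property this sketch is meant to show.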

This one is about public security, and it is similar to the scenario just now. The querying party performs public and private key encryption, including on the unique ID among the input parameters that makes information matching possible. Distribution is carried out in the ciphertext state. Some sharding mechanisms are introduced here because data access includes one-to-many and many-to-one scenarios. In the many-to-one scenario, we split the data into shards and encrypt them. This also addresses the point raised by the three previous speakers, that specific privacy computing techniques can support large-scale application scenarios. No data is leaked, and it is impossible to perceive which record I am looking for, while the final conclusion is guaranteed to stay consistent.
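The property that "it is impossible to perceive which one I am looking for" is what private information retrieval (PIR) provides. The talk does not name the protocol Databao uses, so the sketch below substitutes the classic two-server XOR scheme purely as an illustration: the query is split into two random-looking bit masks, each server answers its own mask, and only the client, by combining both answers, recovers the record. The record contents and function names are hypothetical, and the scheme assumes the two servers do not collude.

```python
import secrets


def make_queries(index, db_size):
    """Split the wanted index into two masks; each mask alone is uniformly random."""
    mask_a = [secrets.randbelow(2) for _ in range(db_size)]
    mask_b = list(mask_a)
    mask_b[index] ^= 1  # the masks differ only at the wanted position
    return mask_a, mask_b


def xor_bytes(left, right):
    return bytes(x ^ y for x, y in zip(left, right))


def answer(database, mask):
    """Server side: XOR together every record selected by the mask."""
    acc = bytes(len(database[0]))
    for record, bit in zip(database, mask):
        if bit:
            acc = xor_bytes(acc, record)
    return acc


def private_lookup(database, index):
    """Client side: query two non-colluding servers holding the same database."""
    mask_a, mask_b = make_queries(index, len(database))
    reply_a = answer(database, mask_a)  # would be computed by server A
    reply_b = answer(database, mask_b)  # would be computed by server B
    return xor_bytes(reply_a, reply_b)  # all records except the wanted one cancel


# Hypothetical fixed-length records.
database = [b"record-A", b"record-B", b"record-C", b"record-D"]
print(private_lookup(database, 2))
```

All records must have the same length for the XOR to line up; real systems pad or shard records to a fixed size for this reason.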

(end of part one)

Origin blog.csdn.net/Anita_zhang/article/details/130430651