Why has engineering remained a barrier that privacy computing "cannot break through" for so long?

Talk is cheap. Show me the code!

This is a widely circulated saying in the programming world. Empty talk is useless, and only code reveals the truth. What matters is delivering something real and practical.

The saying also seems to capture how the mood in privacy computing has changed. Since 2020, when data was officially recognized as a factor of production, privacy computing has enjoyed a strong tailwind. The hype has drawn in many players and newcomers. Gartner listed privacy computing among its top strategic technology trends for two consecutive years, and for a while the field was surrounded by a bright halo.

There are high hopes, but worries lie hidden behind the excitement. The development of privacy computing does not seem to have lived up to expectations.

Gao Shengtan (a pseudonym), a business leader at a large state-owned financial institution, told the Computing Power Think Tank: "We do have a demand for privacy computing products, and we intend to prepare a procurement. However, the test results have been unsatisfactory. Many privacy computing products lack practical engineering capabilities such as personalized modeling."

This view is not isolated. Yan Shu, deputy director of the Big Data Department at the Cloud Computing and Big Data Research Institute of the China Academy of Information and Communications Technology, said: "At present, privacy computing technologies and solutions are not mature enough. Challenges remain in security, performance, and data interoperability, and scenario implementation and engineering are the 'big difficulties'."

A few days ago, the Computing Power Think Tank held a Privacy Computing All-in-One Machine Salon. There, Zhou Yongming, product director of the China Unicom Big Data Financial Industry Center, said something similar: "Over the past two years, all the progress we have seen in privacy computing has been single-point, tentative, and experimental. It is time to truly reach commercial grade and scale up. That is what I most want to see, and it is the direction the industry needs to work harder toward."

Clearly, after two years, the market has become prudent and restrained, and buyers of privacy computing technology are no longer easily swayed by hype. Privacy computing has reached a critical point: can it move from an innovative, experimental technology to mass production and commercial use?

1. Dark Clouds Over Privacy Computing

"To get past this critical point, there is one hurdle that cannot be avoided: industrial-grade engineering implementation. It is the shortest stave of the barrel, and it limits the whole," a senior industry insider said frankly.

In the Computing Power Think Tank's interviews, "engineering implementation capability" came up again and again. The "2022 China Privacy Computing Technology and Market Development Research Report", released this year by CB Insights China, also pointed out that the engineering capabilities of privacy computing companies will become a focus of the industry.

What is engineering capability? The term may sound familiar, but the industry has yet to settle on a clear definition.

"'Engineering capability' is only the surface of the concept. Underneath, it covers many aspects and dimensions. **I think privacy computing engineering capability is the ability to take privacy computing products from theory and prototype to real deployment on the customer side, where they generate business value.** Over the past few years, many companies in the industry have accumulated a lot of theory, product prototypes, and open-source and standardization work. But I think actual business value on the customer side only began to appear last year.

This ability becomes especially important at this year's juncture. It is an important test for privacy computing companies whether they can properly support their products for customers, including system delivery, operation and maintenance, and upgrades," said Qin Chenggang, Director of Trusted Native Technology at Ant Group, in an interview.

Gao Shengtan also offered a plain-language explanation: engineering capability means whether a privacy computing product can go directly into a bank's production environment. It should include at least three aspects:

- Compatibility. If hardware is involved, can the hardware solution work with the financial institution's existing software and hardware?
- Practicality and stability. Is the software stable and reliable, and can it support large-scale data throughput? This has not yet been verified in large-scale commercial use.
- Data sources. Can the compliant data sources already connected meet the business needs of financial institutions? Under the impact of the Personal Information Protection Law, the third-party data market is being restructured, and it is still rare for compliant data sources to deliver data value through privacy computing.

Unfortunately, the industry's privacy computing engineering capabilities are still generally immature. Privacy computing has a long history, reaching back to the secret sharing schemes proposed independently by Shamir and Blakley in 1979. "Engineering capability" has likewise hung like a dark cloud over the edifice of privacy computing for a long time, and it has not yet dissipated.
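For readers unfamiliar with the secret sharing technique mentioned above, below is a minimal sketch of Shamir's (t, n) threshold scheme in Python 3.8+. It is an illustration under stated assumptions, not any vendor's implementation. The prime modulus, the function names, and the parameters are all hypothetical choices made for this example.

The scheme works as follows:

- The secret becomes the constant term of a random polynomial of degree t - 1 over a prime field.
- Each share is one point on that polynomial.
- Any t shares recover the secret by Lagrange interpolation at x = 0. Fewer than t shares reveal nothing about it.

```python
import random

# Hypothetical field modulus chosen for illustration: 2**127 - 1 is a Mersenne prime.
PRIME = 2**127 - 1


def split_secret(secret, threshold, num_shares, prime=PRIME):
    """Split `secret` into `num_shares` points on a random degree-(threshold-1) polynomial."""
    if not 0 <= secret < prime:
        raise ValueError("secret must lie in the field [0, prime)")
    if not 1 <= threshold <= num_shares:
        raise ValueError("need 1 <= threshold <= num_shares")
    rng = random.SystemRandom()
    # f(0) = secret; the other coefficients are uniformly random field elements.
    coeffs = [secret] + [rng.randrange(prime) for _ in range(threshold - 1)]
    shares = []
    for x in range(1, num_shares + 1):
        y = 0
        for c in reversed(coeffs):  # Horner's rule evaluates f(x) mod prime
            y = (y * x + c) % prime
        shares.append((x, y))
    return shares


def recover_secret(shares, prime=PRIME):
    """Recover f(0) by Lagrange interpolation from at least `threshold` distinct shares."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % prime        # product of (0 - x_j)
                den = (den * (xi - xj)) % prime  # product of (x_i - x_j)
        secret = (secret + yi * num * pow(den, -1, prime)) % prime
    return secret


if __name__ == "__main__":
    shares = split_secret(123456789, threshold=3, num_shares=5)
    print(recover_secret(shares[:3]))   # any 3 of the 5 shares suffice: prints 123456789
    print(recover_secret(shares[1:4]))  # a different subset gives the same result
```

Each share lives in the field, so parties computing on shared data must exchange field elements. This is one reason the MPC network-transmission bottleneck discussed below grows with data volume.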

2. The "Missing" Engineering Capability

Why is engineering capability "missing"?

From Gao Shengtan's point of view, the first reason is the level of productization. Based on current observation, productization across the privacy computing industry is uneven and still in its infancy. In financial business scenarios, this shows up in two main ways.

First, products are not easy to use. Take data cleaning: most products do not support personalized cleaning, or lack the function altogether. They offer a fully automated, one-click modeling process. Feed in 3,000 variable labels, and the built-in rules leave fewer than ten variables in the model, which makes the model essentially unusable.

Second, there is product stability: does a privacy computing platform remain available for production when facing hundreds of millions of samples or even larger data volumes? On the technical side, bottlenecks in computing power and network transmission are foreseeable for both MPC (secure multi-party computation) and FL (federated learning). At this stage, privacy computing is mainly applied within a single institution or between two or three parties. The volume of data processed is small, so the problem is not yet obvious. In the future, however, demand for multi-party data exchange will arrive, and 5G and the Internet of Things will drive explosive growth in data volume. Unless the problems of computing power and communication are solved, large-scale application of privacy computing will be out of the question.

Many privacy computing vendors have come to understand this deeply. Qin Chenggang, Director of Trusted Native Technology at Ant Group, also said frankly: "After entering the field of privacy computing, we found that every kind of cryptography today faces the same challenge. These cryptographic operations are very slow on general-purpose processors, whether they are zero-knowledge proofs, garbled circuits, or homomorphic encryption. When we researched homomorphic encryption earlier, we got a basic sense of this. In the worst case, homomorphic encryption is about 100,000 times slower than plaintext computation. What does 100,000 times mean? It is like taking the latest Intel Ice Lake processor back to the 8086 era decades ago."
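To show concretely where this overhead comes from, here is a toy sketch of the Paillier cryptosystem, an additively homomorphic scheme, in Python 3.9+ (for `math.lcm`). Paillier is our own choice of example and is not named in the article. The primes 2**31 - 1 and 2**61 - 1 are insecure placeholders chosen for illustration; real keys use randomly generated primes of roughly 1024 bits each.

In plaintext, adding two numbers is a single CPU instruction. Under encryption, it becomes a multiplication of two numbers modulo n², and every encryption needs two modular exponentiations over numbers of that size. The sketch shows the source of the slowdown. It does not reproduce the 100,000-fold figure quoted above.

```python
import math
import random

# Insecure toy primes chosen for illustration only.
P, Q = 2**31 - 1, 2**61 - 1
_RNG = random.SystemRandom()


def keygen(p=P, q=Q):
    """Return the public modulus n and the private key (lambda, mu), with g fixed to n + 1."""
    n = p * q
    lam = math.lcm(p - 1, q - 1)  # Carmichael's function of n
    mu = pow(lam, -1, n)          # with g = n + 1, mu is lambda^-1 mod n
    return n, (lam, mu)


def encrypt(n, m):
    """Encrypt 0 <= m < n as c = g^m * r^n mod n^2, with random r coprime to n."""
    n2 = n * n
    while True:
        r = _RNG.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2


def decrypt(n, sk, c):
    """Recover m = L(c^lambda mod n^2) * mu mod n, where L(x) = (x - 1) // n."""
    lam, mu = sk
    x = pow(c, lam, n * n)
    return ((x - 1) // n) * mu % n


def add_encrypted(n, c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts (mod n)."""
    return (c1 * c2) % (n * n)


if __name__ == "__main__":
    n, sk = keygen()
    c_sum = add_encrypted(n, encrypt(n, 20), encrypt(n, 22))
    print(decrypt(n, sk, c_sum))  # prints 42
```

Note that encryption is randomized, so encrypting the same value twice gives different ciphertexts. This property is needed for security, and it is another reason ciphertexts are far larger than the plaintexts they protect.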

Immature software is certainly an objective shortcoming, but the limitations of privacy computing vendors themselves are also to blame.

In all fairness, many privacy computing vendors today lack a global perspective. Gao Shengtan put it bluntly: "What the customer wants is a dish. Providing just the plate is not enough; the whole dish has to be cooked and served on the table." A To B business has to provide "capability Lego", that is, a complete data solution rather than a single-point product.

One example is whether the product can connect to usable data sources. Most financial scenarios require external data sources, especially data modeling, where the demand for data is even greater. Providing a privacy computing product alone is therefore often not enough.

Another example is whether compliance is designed into the data solution:

- Is the data authorization chain complete?
- Are customers adequately notified?
- Is data storage managed across its entire life cycle, as the Personal Information Protection Law requires?

These questions involve adjustments to the product's front-end interface. They also involve legal compliance clauses that must be implemented when interacting with data sources.

Second, many privacy computing companies simply lack an understanding of scenarios; it is not in their "DNA". At this stage, most players entering the privacy computing track are start-ups with technical backgrounds, including some that pivoted from AI or blockchain. Few are rooted in the front line of business scenarios. An insufficient understanding of scenarios leads to products that lack practicality; the two problems are directly linked.

Another pain point that must be mentioned is integration and interoperability between technical routes. Each technical route of privacy computing has its own self-consistent security logic. For end customers, however, trust in a technology cannot rest on technical argument alone. Privacy computing vendors also need to push forward on several fronts:

- standardizing the technology;
- achieving independent, controllable, domestic technology stacks;
- establishing a standard system recognized by authorities and regulators.

Of course, this burden cannot fall on privacy computing vendors alone. It requires joint governance and promotion by multiple parties across the industry.

In addition, a lack of enthusiasm among data sources constrains industry adoption to some extent. Without incentives, data sources are unwilling to spend the time and effort to provide test data to the demand side, and they are even less willing to share data. These intertwined problems create today's engineering predicament. Until they are resolved, the industry as a whole lacks the prerequisites for true production use and engineering at scale.

So how can the "engineering difficulty" be solved?

In its research, the Computing Power Think Tank found that a widely agreed direction for solving the problem is emerging.

3. Combining Software and Hardware May Break the "Engineering Difficulty"

The engineering problem is really about helping customers optimize performance and cost as far as possible. Wang Shengli, president of Huakong Qingjiao, summed it up in a few words at the Computing Power Think Tank's privacy computing salon: "The cost is too high. Performance must be improved significantly through engineering, to strike a balance between price-performance and computing cost."

How can that balance be struck? The industry has begun to turn its attention to combining software and hardware.

In fact, a few years ago, software solutions such as cryptography seemed to carry more weight in the industry. Cryptography was once regarded as the "fundamentalism" of privacy computing. One could even vaguely make out an implicit hierarchy that ranked software development above hardware. Now the trend has changed. After several years of practical testing, software alone has proven not to be the optimal solution for computing performance.

"We believe that, in the future, privacy computing will depend on hardware for trusted security, trusted execution environments (TEE), and computing power acceleration. Combining software and hardware will certainly be an important technical field of trusted privacy computing, and a mainstream form of the industry in the data-intensive era. We started our work in this area very early, and today our judgment has proven to be basically correct," Qin Chenggang said.

In September 2021, Ant Group was the first to release a privacy computing all-in-one machine combining software and hardware. It recently took the lead in establishing the world's first international standard project for privacy computing all-in-one machines. Both developments bear out this judgment.

Ant Group is not the only company to recognize the need for hardware. A group of start-ups have successively entered the combined software-hardware track, each trying to make a breakthrough at a single point:

- Clustar focuses on high-performance computing power and has launched a privacy computing accelerator card and a software-hardware all-in-one machine.
- Rongshu Lianzhi is targeting chip research and development.
- Data Science and Technology has launched a privacy computing solution combining software and hardware.

According to Data Science and Technology, its hardware is pluggable, so a general-purpose server can be flexibly turned into a dedicated privacy computing server. This improves the utilization of server resources and computing power and further reduces computing resource overhead.

In its continued exploration of combining software and hardware, the privacy computing industry has found a common entry point: the all-in-one machine. The all-in-one machine is not a new concept, and it comes in two types.

The first is the engineered system. It is software-led and combined with hardware acceleration. Engineering work maximizes the software's strengths without excessive, irreplaceable dependence on particular hardware, in line with the basic IT principles of openness and compatibility.

The second is what we call fusion. It physically combines various pieces of hardware, or integrates hardware through some kind of resource management software.

The engineered system is rooted in software and does not rely on proprietary hardware. It is open, easy to use and maintain, and easy to deploy widely. Fusion, by contrast, relies on some kind of proprietary hardware and has a higher barrier to use, which creates certain technical barriers.

PEC (Privacy Enhancement Computation) chose the engineered-system delivery form to strike a commercial balance between system performance and total cost of ownership. This choice also makes later maintenance and management much more convenient for customers. Without optimized performance and cost, and without compatible, open IT technologies, it would be difficult to put such a system to use in business scenarios.

Gao Shengtan's judgment is that adding hardware solutions does only good, and no harm, to the industry's long-term development.

"Pure software competition easily leads to a red ocean and a race to the bottom. Last year there was a vicious case of source code being sold for 100,000 yuan. If things go on like that, how can this industry survive? Adding hardware helps solidify some profit margins, which matters for the long-term development of the privacy computing industry.

In addition, from Party A's perspective, we also want to see hardware-based implementation plans. As I said earlier, Party A faces decision-making risk and migration costs during bidding. With a hardware base, there is at least a tangible deliverable. With a pure software solution, we cannot keep maintaining it if the vendor goes bankrupt, because we do not understand the underlying cryptographic computations. With a solution built on a general-purpose hardware base, the migration cost of bringing in another vendor is relatively small.

For these reasons, the industry is heading toward all-in-one machines, a solution that many vendors unanimously recognize."

Liu Yao, CEO of Impulse Online, also believes that "at this point in time, the all-in-one machine combining software and hardware is the key to large-scale commercialization of the privacy computing industry.

First, there is the bottleneck of adapting to business systems. Privacy computing products, whether software algorithms or hardware, cannot stand as new isolated islands within the customer's existing systems. Migrating and connecting to the customer's existing modeling platform and data platform is also very important.

Second, what privacy computing vendors most need to solve now is the diversity of underlying hardware across different environments. Combining software and hardware addresses the two major problems of adaptability and compatibility. Upward, it seamlessly adapts to various business systems and data governance processes. Downward, it connects a variety of chips and accelerator cards, forming an integrated software-hardware capability."

Liu Yao said that connecting different hardware ecosystems also takes a great deal of work. On the hardware side, privacy computing vendors need to pay more attention to adapting to the Xinchuang (IT application innovation) environment. Only by working on both software and hardware, and continuously tackling key problems, can all-in-one products be deployed with one click for all kinds of customers, broadening and deepening their application.

But combining software and hardware is not yet a smooth road.

"It should be noted that combining software and hardware is not the same thing as the all-in-one machine. The all-in-one machine is one carrier of that combination. It can effectively ease some difficulties in the engineering implementation and delivery of privacy computing. However, as data scale and business complexity keep growing, the all-in-one machine needs further improvement in scalability and must expand into richer product forms," Qin Chenggang emphasized.

"Combining software and hardware is time-consuming and resource-intensive, in terms of both cycle time and cost. For example, we invested a great deal of manpower early on in our self-controllable TEE, and it took about two years for the final product to take shape. Moreover, delivering hardware differs from delivering software. Hardware delivery involves a very long supply chain, which must be managed well at the same time. This directly creates barriers across the whole hardware field, and some small and medium-sized businesses may be shut out.

In fact, I don't think every privacy computing company needs to build hardware. Some companies are suited to software and cryptographic algorithms, and others to hardware. If everyone finds their own position and a good division of labor forms, the whole industry will develop faster and more efficiently," Qin Chenggang continued.

Looking at the history of the Internet, any technology that goes deep enough eventually enters the hardware field. However, the international situation is increasingly complex, and global competition and confrontation are delicately balanced. Against this background, "de-IOE" (removing IBM minicomputers, Oracle databases, and EMC storage devices) has become a strategic theme. In 2020, China began comprehensively promoting the Xinchuang industry, which involves domestic substitution of chips and servers.

When combining software and hardware for privacy computing, vendors also need to plan ahead and secure a fallback in "domestic, independent and controllable" chips. This is the hardest part of improving computing power. It will also be an opportunity for breakthroughs in the second half of privacy computing's development.


Source: blog.csdn.net/weixin_45413034/article/details/124095935