Roundup! Achievements of domestic privacy computing scholars at USENIX Security 2023

USENIX Security is one of the four top-tier international academic conferences in network security and privacy, and a Class A conference recommended by the CCF (China Computer Federation).

Each year the USENIX Security Symposium brings together researchers, practitioners, system administrators, system programmers, and others interested in the latest advances in computer systems, network security, and privacy.

Recently, at the 2023 USENIX Security Symposium, a total of 51 papers related to privacy computing were published, covering federated learning, homomorphic encryption, secure multi-party computation, and other areas of privacy computing. As a top conference in the field, USENIX Security keeps its acceptance rate below 20% year after year.

At this year's conference, domestic scholars contributed many excellent results. Let's take a brief look at a few popular areas.

1. Distribution of achievements

| Area | Accepted papers | Selected results with domestic researchers |
| --- | --- | --- |
| Differential Privacy (DP) | 3 | Fine-grained Poisoning Attack to Local Differential Privacy Protocols for Mean and Variance Estimation (first author: Xidian University) |
| Trusted Execution Environment (TEE) | 4 | CIPHERH: Automated Detection of Ciphertext Side-channel Vulnerabilities in Cryptographic Implementations (first author: Southern University of Science and Technology; other members from Hong Kong University of Science and Technology and Ant Group); Controlled Data Races in Enclaves: Attacks and Detection (third author: Southern University of Science and Technology) |
| Federated Learning (FL) | 3 | Gradient Obfuscation Gives a False Sense of Security in Federated Learning (second author: Zhejiang University) |
| Zero-Knowledge Proof (ZK) | 2 | TAP: Transparent and Privacy-Preserving Data Services (third author: Southwest University) |
| Homomorphic Encryption (HE) | 3 | Squirrel: A Scalable Secure Two-Party Computation Framework for Training Gradient Boosting Decision Tree (all authors from Alibaba Group and Ant Group) |
| Private Set Union (PSU) | 3 | Linear Private Set Union from Multi-Query Reverse Private Membership Test (all authors from the State Key Laboratory of Information Security, Institute of Information Engineering, Chinese Academy of Sciences; University of Chinese Academy of Sciences; Shandong University; State Key Laboratory of Cryptology; and Alibaba Group) |
| Private Data Analysis | 8 | Lalaine: Measuring and Characterizing Non-Compliance of Apple Privacy Labels (fourth author: Alibaba Group Orion Lab) |
| AI Privacy Enhancement | 3 | V-CLOAK: Intelligibility-, Naturalness- & Timbre-Preserving Real-Time Voice Anonymization (members from Zhejiang University and Wuhan University) |

2. Inventory of popular fields

Note: due to limited space, this article lists only some of the outstanding achievements of domestic scholars.

2.1 Differential Privacy (DP)

Although Local Differential Privacy (LDP) protects individual users' data from inference by an untrusted data collector, recent studies have shown that attackers can launch data poisoning attacks from the user side, injecting crafted fake data into the LDP protocol to distort the collector's final estimates as much as possible.

In this work, the research team advances this line of work by proposing a new fine-grained attack that allows an attacker to fine-tune and simultaneously manipulate mean and variance estimates, a popular analysis task in many real-world applications (and thus a threat to a large amount of real application data).

To achieve this, the attack exploits the characteristics of LDP to inject fake data directly into the output domain of the local LDP instance, and is therefore called an Output Poisoning Attack (OPA).
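To make the output-versus-input poisoning distinction concrete, below is a minimal sketch, not the paper's exact attack: it assumes a classic one-bit LDP mean estimator (in the style of Duchi et al.), and the parameters and attacker strategy are illustrative only.

```python
import numpy as np

def ldp_perturb(x, eps, rng):
    """One-bit LDP mechanism for mean estimation over x in [-1, 1]
    (Duchi et al.-style); each report is an unbiased estimate of x."""
    c = (np.exp(eps) + 1) / (np.exp(eps) - 1)
    p = 0.5 + x / (2 * c)                    # Pr[report +1]
    b = np.where(rng.random(x.shape) < p, 1.0, -1.0)
    return b * c                             # output domain is {-c, +c}

rng = np.random.default_rng(0)
eps = 1.0
n, m = 10_000, 500                           # honest users, fake users
x = rng.uniform(-1, 1, n)
honest = ldp_perturb(x, eps, rng)
c = (np.exp(eps) + 1) / (np.exp(eps) - 1)

# Baseline input poisoning: fake users feed x = 1 through the randomizer.
ipa = ldp_perturb(np.ones(m), eps, rng)
# Output poisoning (OPA): fake users bypass the randomizer and submit
# the extreme point of the *output* domain directly.
opa = np.full(m, c)

print("true mean        :", x.mean())
print("honest estimate  :", honest.mean())
print("input poisoning  :", np.concatenate([honest, ipa]).mean())
print("output poisoning :", np.concatenate([honest, opa]).mean())
```

In this toy setting each fake output sits at +(e^eps + 1)/(e^eps - 1), which lies outside the legal input range [-1, 1], so the same number of fake users drags the estimate further than the input-poisoning baseline can.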

The researchers observe that a small privacy loss enhances the security of LDP, a security-privacy consistency that contradicts the known security-privacy trade-off from previous work (a new conclusion that goes beyond prior research).

They further investigate this consistency and reveal a more complete picture of the threat landscape of data poisoning attacks against LDP, thoroughly evaluating the newly proposed attack against a baseline attack that naively feeds false inputs to the LDP protocol.

Experimental results show that OPA outperforms the baseline on three real-world datasets. The researchers also propose a new defense that recovers accurate results from the polluted data collection and offer insights into secure LDP design (providing an effective defense alongside the attack).

2.2 Trusted Execution Environment (TEE)

The ciphertext side channel is a new type of side channel that exploits the deterministic memory encryption of a Trusted Execution Environment (TEE). It enables an adversary who can logically or physically read the ciphertext of encrypted memory to break cryptographic implementations protected by the TEE with high fidelity.
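The root cause is determinism. The following toy model is an illustrative assumption rather than AMD's actual memory encryption mode, but it makes the leak visible: if the same (address, plaintext) pair always produces the same ciphertext, then ciphertext equality reveals plaintext equality, enabling a dictionary attack on secret-dependent writes.

```python
import hashlib

def encrypt_block(addr: int, plaintext: bytes) -> bytes:
    # Stand-in for a tweaked block cipher: deterministic in
    # (address, plaintext), with no freshness counter.
    return hashlib.sha256(addr.to_bytes(8, "little") + plaintext).digest()[:16]

SECRET_ADDR = 0x1000

def victim_step(bit: int) -> bytes:
    """A secret-dependent write, as in a square-and-multiply ladder:
    the value stored at SECRET_ADDR depends on one key bit."""
    value = (b"\x01" if bit else b"\x00").ljust(16, b"\x00")
    return encrypt_block(SECRET_ADDR, value)

# Adversary model: can read ciphertexts of victim memory (e.g., a
# malicious hypervisor under SEV-SNP). It first learns the ciphertexts
# for known plaintexts, then matches later observations against them.
dictionary = {victim_step(0): 0, victim_step(1): 1}

secret_key_bits = [1, 0, 1, 1, 0, 0, 1, 0]
recovered = [dictionary[victim_step(b)] for b in secret_key_bits]
print("recovered bits:", recovered)  # equals secret_key_bits
```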

Previous research concluded that ciphertext side channels are not only effective against AMD SEV-SNP, where the vulnerability was first discovered, but also pose a serious threat to all TEEs with deterministic memory encryption.

In this paper, the researchers propose CIPHERH, a practical framework for automatically analyzing cryptographic software and detecting program points vulnerable to ciphertext side-channel attacks (an automated detection framework for this class of vulnerability).

CIPHERH is designed for practical hybrid analysis of production cryptographic software: fast dynamic taint analysis tracks the use of secrets throughout the program, and static symbolic execution is then run on each "tainted" function, using symbolic constraints to infer ciphertext side-channel vulnerabilities.
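The sketch below mimics those two phases on a hypothetical three-address toy IR; everything here (the IR, the propagation rule, the final check) is a simplified stand-in, since the real CIPHERH analyzes production binaries and uses symbolic execution with a constraint solver rather than the pattern check shown.

```python
# Toy two-phase analysis in the spirit of CIPHERH (illustrative only).
SECRET_SOURCES = {"key"}

program = [
    ("mul", "t1", "key", "msg"),    # t1 = key * msg   (uses a secret)
    ("add", "t2", "msg", "msg"),    # t2 = msg + msg   (public)
    ("store", "mem[0x40]", "t1"),   # secret-dependent value hits memory
    ("store", "mem[0x48]", "t2"),   # public value hits memory
]

# Phase 1: dynamic taint analysis -- propagate taint from secret inputs.
tainted = set(SECRET_SOURCES)
for op, dst, *srcs in program:
    if any(s in tainted for s in srcs):
        tainted.add(dst)

# Phase 2 (stand-in): flag memory writes of tainted values. Real CIPHERH
# instead symbolically executes each tainted function and asks a solver
# whether distinct secrets can yield distinct ciphertexts at one address.
for op, dst, *srcs in program:
    if op == "store" and any(s in tainted for s in srcs):
        print("potential ciphertext side channel at", dst)
```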

Empirical evaluation uncovered more than 200 vulnerable program points in state-of-the-art RSA and ECDSA/ECDH implementations from OpenSSL, MbedTLS, and WolfSSL. Representative cases were reported to the developers and have been confirmed or patched (a large number of exploitable risks found in real products).

2.3 Federated Learning (FL)

Federated learning has been proposed as a privacy-preserving machine learning framework that enables multiple clients to collaborate without sharing raw data. However, this design does not by itself guarantee client-side privacy.

Previous work has shown that the gradient-sharing strategy in federated learning can be vulnerable to data reconstruction attacks. In practice, however, clients may not send raw gradients, whether because of the high communication cost or to meet privacy-enhancement requirements.

Empirical studies have suggested that gradient obfuscation, whether intentional (gradient noise injection) or unintentional (gradient compression), can provide more privacy protection against reconstruction attacks.

In this work, the researchers propose a novel reconstruction attack framework for the image classification task in federated learning. They show that commonly used gradient post-processing procedures, such as gradient quantization, gradient sparsification, and gradient perturbation, can give a false sense of security (i.e., these existing protective measures fail).
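As a minimal illustration of why post-processed gradients can still leak (a toy, not the paper's semantic-level attack): for a single linear layer with softmax cross-entropy, the weight gradient is the outer product of the per-class error and the input, so one gradient row reveals the input exactly, and even sign-compressed (signSGD-style) gradients preserve its sign pattern.

```python
import numpy as np

rng = np.random.default_rng(1)

# One client, one linear layer W (classes x features), softmax CE loss.
num_classes, d = 10, 64
W = rng.normal(size=(num_classes, d))
x = rng.normal(size=d)               # the client's private sample
y = 3                                # its private label

logits = W @ x
p = np.exp(logits - logits.max()); p /= p.sum()
err = p.copy(); err[y] -= 1.0        # dL/dlogits
grad_W = np.outer(err, x)            # dL/dW: every row is err_i * x

# Raw gradients: any row with err_i != 0 is proportional to x,
# so x is recovered exactly up to a known scale.
x_rec = grad_W[y] / err[y]
print("max error, raw gradients:", np.abs(x_rec - x).max())

# "Obfuscated" gradients: sign compression. The sign pattern of x
# still leaks, because sign(err_i * x) = sign(err_i) * sign(x).
g_sign = np.sign(grad_W)
x_sign_rec = g_sign[y] * np.sign(err[y])
print("sign agreement with x   :", (x_sign_rec == np.sign(x)).mean())
```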

Contrary to previous studies, they argue that privacy enhancement should not be treated as a by-product of gradient compression. The researchers further devise a new method to reconstruct images at the semantic level under the proposed framework, quantify semantic privacy leakage, and compare it with traditional image similarity scores; the comparison challenges the image data leakage evaluation schemes in the literature. The findings highlight the importance of revisiting and redesigning client-side data privacy protection mechanisms in existing federated learning algorithms.

2.4 Zero Knowledge Proof (ZK)

Today, users expect greater security from the services that process their data. Beyond the traditional requirements of data privacy and integrity, they also expect transparency: the service's processing of their data should be verifiable by users and trusted auditors. The researchers' goal is to build a multi-user system that provides data privacy, integrity, and transparency for a large number of operations while achieving realistic performance (emphasizing the user-transparency property of privacy computing).

To do so, the researchers first identified the limitations of existing approaches based on authenticated data structures, which fall into two categories:

  1. Systems that hide per-user data from other users but support only a limited range of verifiable operations (e.g., CONIKS, Merkle2, and proofs of liabilities)
  2. Systems that support a wide range of verifiable operations but make all data publicly visible (e.g., IntegriDB and FalconDB)

The researchers then propose TAP to address these limitations. A key component of TAP is a novel tree data structure that supports efficient result verification and relies on independent audits, which use zero-knowledge range proofs to show that the tree was built correctly without revealing user data.
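For rough intuition about such a tree, here is a sketch of a Merkle sum tree under simplifying assumptions: values and aggregates are left in the clear where TAP would store hiding commitments and attach zero-knowledge range proofs. The point is that aggregates are bound into each node's hash, so verifying the root digest also verifies every partial aggregate used to answer a query.

```python
import hashlib
from dataclasses import dataclass

@dataclass
class Node:
    digest: bytes
    total: float   # aggregate of the subtree (here: a plain sum)
    count: int

def H(*parts: bytes) -> bytes:
    h = hashlib.sha256()
    for p in parts:
        h.update(p)
    return h.digest()

def leaf(value: float) -> Node:
    # TAP would store a hiding commitment to `value` here and attach a
    # zero-knowledge range proof; this sketch keeps it in the clear.
    return Node(H(b"leaf", str(value).encode()), value, 1)

def parent(l: Node, r: Node) -> Node:
    # Aggregates are bound into the hash, so a verifier recomputing the
    # root digest also verifies every partial sum along the path.
    enc = f"{l.total + r.total}|{l.count + r.count}".encode()
    return Node(H(b"node", l.digest, r.digest, enc),
                l.total + r.total, l.count + r.count)

def build(values):
    # Assumes a power-of-two number of leaves, for brevity.
    level = [leaf(v) for v in values]
    while len(level) > 1:
        level = [parent(level[i], level[i + 1])
                 for i in range(0, len(level), 2)]
    return level[0]

root = build([3.0, 1.0, 4.0, 1.5])
print("root:", root.digest.hex()[:16], "sum:", root.total, "n:", root.count)
```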

TAP supports a wide range of verifiable operations, including quantiles and sample standard deviations. The researchers conducted a comprehensive evaluation of TAP against two state-of-the-art baselines (IntegriDB and Merkle2), showing that the system is practical at scale (the proposed scheme is feasible compared with the state of the art).

2.5 Homomorphic Encryption (HE)

Gradient boosted decision trees (GBDT) and their variants are widely used in industry due to their strong interpretability. Secure multi-party computation allows multiple data owners to jointly compute a function while keeping their inputs private.

In this work, the research team proposes Squirrel, a two-party GBDT training framework over vertically partitioned datasets, where the two data owners hold different features of the same data samples. Squirrel is secure against semi-honest adversaries and reveals no sensitive intermediate information during training.

Squirrel also scales to datasets with millions of samples, even over a Wide Area Network (WAN).

Squirrel achieves its high performance through several novel co-designs of the GBDT algorithm and advanced cryptography:

  1. A new, efficient mechanism that uses oblivious transfer to hide the sample distribution on each node (a new efficient node-distribution-hiding mechanism)

  2. A highly optimized method for gradient aggregation using lattice-based homomorphic encryption (HE), three orders of magnitude faster than existing homomorphic computation methods (see the sketch after this list)

  3. A new protocol for evaluating the sigmoid function on secret-shared values, improving on the two existing methods by a factor of 19-200x.
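The sketch below illustrates only the data flow of the second ingredient: one party homomorphically aggregates encrypted gradients belonging to a histogram bucket without decrypting anything. It substitutes a textbook Paillier cryptosystem with toy, insecure parameters for Squirrel's lattice-based HE, so neither the performance nor the security of the real system is represented.

```python
# Toy additively homomorphic aggregation (textbook Paillier, demo only).
from math import gcd
import random

p, q = 1_000_003, 1_000_033          # demo primes; far too small for security
n, n2 = p * q, (p * q) ** 2
lam = (p - 1) * (q - 1) // gcd(p - 1, q - 1)   # lcm(p-1, q-1)
mu = pow(lam, -1, n)                 # valid because we fix g = n + 1

def enc(m: int, r: int) -> int:
    # Enc(m) = (n+1)^m * r^n mod n^2
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2

def dec(c: int) -> int:
    x = pow(c, lam, n2)
    return (((x - 1) // n) * mu) % n

def he_add(c1: int, c2: int) -> int:
    # Ciphertext product = plaintext sum (the additive homomorphism).
    return (c1 * c2) % n2

# One party holds encrypted per-sample gradients; the other aggregates
# the samples falling into one histogram bucket without seeing values.
random.seed(0)
grads = [7, 11, 2, 40]               # fixed-point-encoded gradients
cts = [enc(g, random.randrange(1, n)) for g in grads]

acc = cts[0]
for c in cts[1:]:
    acc = he_add(acc, c)
print("decrypted bucket sum:", dec(acc), "expected:", sum(grads))
```

The real design batches many bucket sums into SIMD slots of a lattice ciphertext, which this scalar toy does not capture.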

Combining all these improvements, Squirrel takes less than 6 seconds per tree on a dataset with 50k samples, more than 28x faster than Pivot (VLDB 2020). The researchers also show that Squirrel scales to datasets with over a million samples, at roughly 90 seconds per tree over a WAN (a single decision tree on a million-sample dataset in under 2 minutes).

2.6 Private Set Union (PSU)

  • Paper: Linear Private Set Union from Multi-Query Reverse Private Membership Test
  • Domestic researchers: all authors are from the State Key Laboratory of Information Security, Institute of Information Engineering, Chinese Academy of Sciences; University of Chinese Academy of Sciences; Shandong University; State Key Laboratory of Cryptology; and Alibaba Group

The Private Set Union (PSU) protocol enables two parties (each holding a set) to compute the union of their sets without revealing any other information to either party.

So far, there have been two known approaches to constructing PSU protocols:

  • The first relies mainly on additively homomorphic encryption (AHE) and is generally inefficient, since it requires a non-constant number of homomorphic computations per item.
  • The second, recently proposed by Kolesnikov et al. (ASIACRYPT 2019), is based mainly on oblivious transfer and symmetric-key operations.

The second approach offers good practical performance and is orders of magnitude faster than the first. However, neither approach is optimal, because neither achieves computation and communication complexity of O(n), where n is the set size. Constructing an optimal PSU protocol therefore remained an open problem (existing solutions incur overhead that grows super-linearly with the set size, falling short of practical needs).

In this work, the research team resolves this open problem by proposing a general framework for PSU built from oblivious transfer and a newly introduced protocol called the Multi-Query Reverse Private Membership Test (mq-RPMT). The researchers give two generic constructions of mq-RPMT:

  • The first is based on symmetric-key encryption and general 2PC techniques.
  • The second is based on re-randomizable public-key encryption.

Both constructions yield a PSU protocol with linear computation and communication complexity (a linear-complexity design that breaks through the previous complexity barrier).
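To show how the framework composes, here is a plaintext simulation in which each building block is replaced by its ideal behavior; no cryptography is performed, and the function names and interfaces are illustrative rather than the paper's API. The structure also makes the linear complexity visible: one RPMT bit and one oblivious transfer per sender item.

```python
# Plaintext simulation of the PSU-from-mq-RPMT framework (insecure:
# the ideal functionalities below just compute their outputs directly).
from typing import Optional

def mq_rpmt(sender_items: list, receiver_set: set) -> list:
    """Ideal mq-RPMT: the receiver learns, for each sender item, one
    membership bit -- but not the item itself."""
    return [x in receiver_set for x in sender_items]

def ot_transfer(msg_if_0, msg_if_1, choice: int):
    """Ideal 1-out-of-2 oblivious transfer."""
    return msg_if_1 if choice else msg_if_0

def psu(sender_items: list, receiver_set: set) -> set:
    bits = mq_rpmt(sender_items, receiver_set)
    out = set(receiver_set)
    for x, in_y in zip(sender_items, bits):
        # When x is NOT in the receiver's set, the OT delivers the item;
        # otherwise the receiver learns nothing beyond the union.
        got: Optional[object] = ot_transfer(x, None, int(in_y))
        if got is not None:
            out.add(got)
    return out

X = [1, 2, 3, 4]
Y = {3, 4, 5, 6}
print(psu(X, Y))   # {1, 2, 3, 4, 5, 6}
```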

The research team implemented both PSU protocols and compared them with state-of-the-art PSU schemes. Experiments show that their PKE-based protocol has the lowest communication cost of all schemes, reducing communication by a factor of 3.7-14.8x depending on the set size. Depending on the network environment, their PSU schemes run 1.2-12x faster than the state of the art (communication cut by up to ~14.8x and running time by up to 12x).

3. Conclusion

Congratulations to the above domestic research teams for making their mark at this top venue for privacy computing. Their results also show that privacy computing technology is gradually moving from theory to practical application. Making full use of privacy computing can meet privacy protection needs while following the trend of data element circulation, supporting the healthy development of the digital economy. China's academia and industry will continue to cooperate closely to build better, high-quality privacy computing products, realize the secure circulation of data that is "usable but not visible, usable but not stored, controllable and measurable", and help build a healthy digital China.

PrimiHub is an open-source privacy computing platform built by a team of cryptography experts. We focus on sharing technologies and content in privacy computing fields such as data security, cryptography, federated learning, and homomorphic encryption.
