"Building web applications on top of encrypted data using Mylar" paper notes

1. Overview of the paper

Web applications rely on servers to store and process confidential information. The paper introduces Mylar, a platform for building web applications that protects data confidentiality against attackers who gain full access to the server. Mylar stores sensitive data encrypted on the server and decrypts it only in the user's browser.

Mylar mainly solves three problems:

① Mylar allows the server to perform keyword searches over encrypted documents, even if the documents are encrypted with different keys.
② Mylar allows users to share keys and encrypted data securely in the presence of active adversaries.
③ Mylar ensures that the client-side application code is authentic, even if the server is malicious.

Results from a Mylar prototype built on the Meteor framework are promising: porting 6 applications required changing only 36 lines of code on average, and the performance overhead was modest, amounting to a 17% throughput loss and a 50 ms latency increase for sending a message in a chat application.

2. Paper Background

Protecting confidential data in a web application normally requires users to trust the server to keep that data from unauthorized disclosure. This trust is risky, because there are many ways confidential data can leak from a server. For example, attackers could exploit bugs in server software to break in, curious administrators could snoop on data on the server, or server operators could be legally compelled to disclose data.

One approach that seems to work is to give each user their own encryption key, encrypt the user's data with that key in the web browser, and store only the encrypted data on the server. This model ensures that attackers cannot read confidential data on the server, because they lack the corresponding decryption key. Some privacy-conscious web applications have already adopted this model, but it has three significant flaws:

① Security: a compromised server can serve malicious client code to the browser and extract the user's key and data. Because a web application consists of many files, such as HTML pages, JavaScript code, and CSS style sheets, and HTML pages are usually generated dynamically, it is difficult to ensure that the server has not tampered with the application's code.

② Functionality: this approach does not support data sharing between users. One possible fix is to encrypt each shared document with a separate key and distribute that key, through the server, to every user who shares the document. However, key distribution through the server is not secure either, since a compromised server could hand out arbitrary keys and trick users into using the wrong ones.

③ Efficiency: this approach requires all application logic to run in the user's web browser, because only the browser can decrypt the user's data. This is often impractical; a keyword search, for instance, would require downloading all documents to the browser.

The paper therefore introduces Mylar, a new platform for building web applications that stores only encrypted data on the server. The authors also show that Mylar is practical for many classes of applications that need to protect confidential data from a compromised server. Mylar builds on the shift in web application frameworks toward implementing logic in client-side JavaScript code and sending data, rather than HTML, over the network. It addresses the three flaws above as follows:

Data sharing: To prevent the server from cheating during key distribution, Mylar forms certificate paths that attest to public keys, and lets the application specify which certificate paths are trusted in each use context. Combined with a user interface that displays the relevant certificate components to the user, this technique ensures that even a compromised server cannot trick the application into using the wrong key.

Computation over encrypted data: Mylar provides the first encryption scheme that can efficiently perform keyword searches over data encrypted with different keys. The client provides an encrypted word to the server, and the server can return all documents containing that word without learning the word or the content of the documents.

Authenticating application code: With Mylar, code running in the web browser can access the user's decrypted data and keys, but the code itself comes from an untrusted server. To ensure that this code has not been tampered with, Mylar checks that it is properly signed by the website owner. This check is possible because application code and data are separated in Mylar, so the code is static. Mylar uses two origins to simplify code verification for web applications. The primary origin hosts only the application's top-level HTML page, whose signature is verified using the public key in the server's X.509 certificate. All other files come from a secondary origin, so that if an adversarial server loads one of them as a top-level page, it has no access to the primary origin. Mylar verifies the hashes of these files against the expected hashes contained in the top-level page.
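The hash check on secondary-origin files can be sketched in a few lines. This is a minimal illustration only, not Mylar's browser extension: the function names, the use of SHA-256, and the dictionary layout are assumptions, and the signature check on the top-level page itself is left out.

```python
import hashlib


def sha256_hex(data):
    """Hex-encoded SHA-256 digest of a file body (hash choice is an assumption)."""
    return hashlib.sha256(data).hexdigest()


def verify_resources(expected, fetched):
    """Return the paths whose fetched content does not match the expected hash.

    expected: path -> hex digest, as listed in the (already verified) top-level page
    fetched:  path -> bytes actually served by the secondary origin
    """
    bad = []
    for path, digest in expected.items():
        body = fetched.get(path)
        # A missing file is treated the same as a tampered one.
        if body is None or sha256_hex(body) != digest:
            bad.append(path)
    return bad
```

A client would refuse to run the application if `verify_resources` returns a non-empty list.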

3. Mylar architecture

There are three roles in Mylar: User, Site Owner and Server Operator. Mylar's goal is to help website owners protect users' confidential data in the face of malicious or compromised server operators.

The structure of Mylar is shown in the figure. Mylar consists of the following four components:
[Figure: Mylar architecture]

**Browser extension:** It verifies that the client-side code of the web application loaded from the server has not been tampered with.

**Client-side library:** It intercepts data sent to and from the server, encrypting outgoing data and decrypting incoming data. Each user has a public/private key pair. The client-side library stores the user's private key on the server, encrypted with the user's password. When a user logs in, the client-side library fetches and decrypts the user's private key. For shared data, Mylar's client creates separate keys, which are also stored on the server in encrypted form.

**Server-side library:** It performs calculations on encrypted data on the server.

**Identity provider (IDP):** For some applications, Mylar requires a trusted identity provider service (IDP) to verify that a given public key belongs to a specific username. If the application has no trusted method to authenticate the user who created the account, and the application allows the user to choose who to share data with, the application needs an IDP. (Note: The IDP does not store per-app state, and Mylar only contacts the IDP the first time a user creates an account in the app. After that, the app server stores the certificate from the IDP.)

Mylar is deployed on a web application as follows:
First, developers use Mylar's authentication library for user login and account creation. If the application allows the user to choose which other users to share data with, the developer should also specify the URL and public key of the trusted IDP.

Second, the developer specifies which data in the application should be encrypted and who should have access to it. Principals correspond to public/private key pairs and represent application-level access control entities such as users, groups, or shared documents. In the authors' prototype, all data is stored in MongoDB collections, and developers annotate each collection with the set of fields that contain confidential data and the name of the principal that should have access to that data (that is, which key should be used).

Third, the developer specifies which principals in the application can access which other principals. For example, if Alice wants to invite Bob to a secret chat, the application must call the Mylar client to grant Bob's principal access to the chat room principal.

Fourth, developers change their server-side code to call the Mylar server-side library when performing keyword searches. The prototype's client-side library provides functions for common operations, such as keyword search over a specific field in a MongoDB collection.

Finally, as part of installing the web application, the site owner generates a public/private key pair and uses Mylar's bundled tools to sign the application's files with the private key. The web application must be hosted over HTTPS, and the site owner's public key must be embedded in the web server's X.509 certificate. This ensures that even if the server is compromised, Mylar's browser extension knows the site owner's public key and refuses to load client code that has been tampered with.

Mylar works in three steps:
① First, it verifies the application code running in the browser, so that the client code can safely access keys and plaintext data.
② The client code then encrypts the data marked as sensitive before sending it to the server. Since users need to share data, Mylar provides a mechanism to securely share and look up keys between users.
③ Finally, to support server-side processing, Mylar introduces a new encryption scheme that can perform keyword searches over documents encrypted with many different keys without revealing the content of the encrypted documents or the word being searched for.
The following sections expand on these steps.

4. Sharing data between users

In Mylar's threat model, the application cannot trust the server to enforce sharing policies, because the server is assumed to be potentially compromised. Therefore, applications must encrypt shared data with a key that only the correct users can access.

Each principal has a name chosen by the application, a public key to encrypt that principal's data, and a private key to decrypt that principal's data.

In addition to allowing applications to create principals and use the principal's key to encrypt and decrypt data, Mylar provides two key operations for applications to manage principals:

① Find a principal so that the application can decrypt data with the corresponding private key. The goal is to ensure that only authorized users can obtain that private key.
② Find a principal so that the application can use the corresponding public key to encrypt data or share data with other users. The goal is to ensure that a malicious server cannot trick Mylar into returning the wrong public key, which could cause the application to share confidential data with an adversary.

Mylar achieves these goals cryptographically by forming two graphs over the principals:
① An access graph, which uses key chains to distribute the private keys of shared principals to users.
② A certification graph, which uses certificate chains to attest to the mapping between a principal's name and its public key.

Access graph

If principal A has access to principal B's private key, we say that A has access to B. Access is transitive: if B also has access to C, then A can obtain C's private key as well. To express an application's policy in the access graph, the application must create the appropriate has-access-to relationships between principals. Applications can also create intermediate principals, for example to represent a group of users who should all have access to the same private keys.

Here Mylar uses key chaining, as in CryptDB.
[Figure: example access graph for a chat application]
For example, in the diagram above, the private key for the "party" chat room is encrypted under Alice's public key, and separately encrypted under Bob's public key. The server stores these encrypted keys. Through the API in Figure 2, Mylar can obtain the keys a user needs for different operations.
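The transitive has-access-to relation amounts to reachability in a directed graph. The sketch below is an assumption-laden illustration, not Mylar's API: it models only which wrapped keys exist (no actual encryption), and the function name, edge layout and the `group:staff` principal are hypothetical. It finds the chain of wrapped keys a client would have to unwrap in order.

```python
from collections import deque


def key_chain(edges, start, target):
    """Breadth-first search for a chain of wrapped keys from start to target.

    edges[a] lists the principals whose private key is stored on the server
    encrypted under a's public key. Returns the list of principals to unwrap
    in order, or None if start has no access to target.
    """
    parent = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == target:
            chain = []
            while node is not None:
                chain.append(node)
                node = parent[node]
            return chain[::-1]
        for nxt in edges.get(node, ()):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None


# Hypothetical access graph: Alice and Bob hold the "party" room key directly,
# Carol reaches the "work" room through an intermediate group principal.
ACCESS = {
    "user:alice": ["chatroom:party"],
    "user:bob": ["chatroom:party"],
    "user:carol": ["group:staff"],
    "group:staff": ["chatroom:work"],
}
```

In a real deployment each step of the chain would be a decryption with the private key obtained in the previous step, starting from the user's own password-protected key.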

Certification graph

Mylar applications must look up a principal's public key when sharing data, for two main purposes: to encrypt data with that key, or to give some principal access to that key. In both cases, if a compromised server tricks the client application into using the adversary's public key, the adversary gains access to confidential data. For example, in the chat application, suppose Bob wants to send a confidential message to the "work" chat room. If the server supplies the attacker's public key in place of the chat room principal's key, and the client application uses it, the attacker will be able to decrypt the message. Preventing this type of attack is difficult because all wrapped keys are stored on the server, which could be malicious.

To prevent this attack, Mylar relies on the certification graph, which allows one principal to vouch for another principal's name and public key. The application creates a certificate chain for each principal, rooted at an authority principal. For example, in the chat application, the "chatroom:work" principal could be signed with the key of the "user:boss" principal that created the chat room. Using the certification graph, an application looks up a principal's public key by specifying the name of the principal it wants together with the certificate chain it expects that key to be certified along.
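A certificate chain check of this kind can be sketched as follows. Everything here is illustrative and assumed: the function names, the use of an IDP key as the root, and above all the textbook RSA signatures built from small Mersenne primes, which are insecure and stand in for whatever signature scheme Mylar actually uses.

```python
import hashlib

E = 65537  # public exponent for the toy RSA keys


def make_keypair(p, q):
    """Textbook RSA from two primes (toy only): returns (public, private)."""
    n = p * q
    d = pow(E, -1, (p - 1) * (q - 1))
    return (n, E), (n, d)


def digest(name, pub, n):
    """Hash a (name, public key) binding to an integer below n."""
    data = f"{name}|{pub[0]}|{pub[1]}".encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % n


def certify(signer_priv, name, pub):
    """Signer vouches that `name` owns public key `pub`."""
    n, d = signer_priv
    return pow(digest(name, pub, n), d, n)


def check(signer_pub, name, pub, sig):
    n, e = signer_pub
    return pow(sig, e, n) == digest(name, pub, n)


def lookup(root_pub, expected_names, chain):
    """Walk a chain of (name, public_key, signature) entries from the root.

    Each entry must be signed by the previous principal (the first by the
    root) and the names must match the path the application asked for.
    Returns the final public key, or None if anything fails to verify.
    """
    if [name for name, _, _ in chain] != list(expected_names):
        return None
    signer = root_pub
    for name, pub, sig in chain:
        if not check(signer, name, pub, sig):
            return None
        signer = pub
    return signer
```

With this structure, a server that swaps in its own key for "chatroom:work" cannot also produce a valid signature from "user:boss", so the lookup fails instead of returning the attacker's key.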

5. Data Integrity

For data integrity, all encrypted data is authenticated with a MAC. Mylar does not guarantee the freshness of data or the correctness of query results. Developers can group related fields into an authentication set, which is authenticated as a unit. For example, in a chat room application, each message has several fields, including the message body and a (client-generated) timestamp. By putting these two fields in the same authentication set, the developer ensures that an attacker cannot splice the body of one message together with the timestamp of another. An attacker can roll back an entire authentication set to an earlier version without being detected, but cannot roll back only a subset of it.
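The effect of an authentication set can be illustrated with a single MAC computed over all fields in the set. This is a sketch under assumptions, not Mylar's implementation: HMAC-SHA256, the JSON serialization, and the function names are all chosen here for illustration, and plaintext values are used for readability.

```python
import hashlib
import hmac
import json


def mac_auth_set(key, fields):
    """One MAC over every field in the authentication set, taken together."""
    payload = json.dumps(fields, sort_keys=True).encode()
    return hmac.new(key, payload, hashlib.sha256).digest()


def verify_auth_set(key, fields, tag):
    """Constant-time comparison of the recomputed MAC against the stored tag."""
    return hmac.compare_digest(mac_auth_set(key, fields), tag)
```

Because the tag covers the body and the timestamp jointly, replacing either field with a value from another message invalidates the tag. Replacing all fields and the tag with an older, consistent version still verifies, which matches the rollback limitation described above.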

6. Computation of encrypted data

Web applications often have many users, resulting in data encrypted with many different keys. Existing efficient encryption schemes for computing on encrypted data, such as keyword search, assume that all data is encrypted with a single key. Using such a scheme in Mylar would require the client to repeat the computation once per key, which is inefficient. Mylar introduces a multi-key search scheme to solve this problem.

Suppose a user wants to search for a word in a set of documents on the server, where each document is encrypted with a different key. The client only needs to provide the server with a single search token for that word. The server then returns every encrypted document that contains the word, provided the user has access to that document's key.

The pseudocode of Mylar's multi-key search is as follows:
[Figure: pseudocode of the Mylar multi-key search scheme]

One efficiency problem with this algorithm is that the server must scan every word of every document to identify a match. This can be slow for large documents, and it is unavoidable as long as each word is encrypted with its own randomness r.
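Since the pseudocode figure is not reproduced here, the following sketch reconstructs the algebra of the scheme from the operation names used in this section (keygen, enc, token, delta, adjust). It is a toy model under loud assumptions: the pairing group is replaced by arithmetic on exponents modulo a prime, which preserves the identities but provides no security at all, and `match`, `hash_to_group`, `h2` and the use of SHA-256 are illustrative choices.

```python
import hashlib
import secrets

# Toy stand-in for a pairing group: every group element is represented by its
# exponent modulo Q, so the pairing e(g1^a, g2^b) = gT^(a*b) becomes a*b mod Q.
# This keeps the algebra of the scheme but offers NO security.
Q = 2**127 - 1


def hash_to_group(word):
    """H(w): hash a word to a (toy) group element."""
    return int.from_bytes(hashlib.sha256(word.encode()).digest(), "big") % Q


def h2(r, element):
    """H2(r, x): hash the randomness r together with a (toy) group element."""
    return hashlib.sha256(r + element.to_bytes(16, "big")).digest()


def keygen():
    return secrets.randbelow(Q - 1) + 1  # non-zero secret exponent


def enc(k, word):
    """Encrypt one word under key k with fresh randomness r."""
    r = secrets.token_bytes(16)
    return r, h2(r, hash_to_group(word) * k % Q)


def token(k, word):
    """Search token for `word` under the searcher's own key."""
    return hash_to_group(word) * k % Q


def delta(k_from, k_to):
    """Value that lets the server move tokens from k_from to k_to."""
    return k_to * pow(k_from, -1, Q) % Q


def adjust(tk, d):
    """Server side: re-key a token without learning the word."""
    return tk * d % Q


def match(atk, ciphertext):
    """Server side: test one encrypted word against an adjusted token."""
    r, h = ciphertext
    return h2(r, atk) == h
```

Because each call to `enc` draws a fresh r, the server has to call `match` on every encrypted word in turn, which is the linear scan described above.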

To be able to construct efficient indexes on words in searchable documents, Mylar supports an indexable version of this multi-key search scheme. The specific idea is to eliminate randomness without compromising security. Intuitively, randomness is needed to hide whether two words encrypted under the same key are equal. But for words in a document, Mylar can remove duplicate words when the document is encrypted, so there is no need for randomness for each word in the same document.

So, to encrypt a document consisting of words w1, ..., wn, the client removes duplicates, chooses a random value r, and then uses the same r when encrypting each word with enc().

When searching for the word w in a document, the server adjusts the token as before and obtains atk. It then computes v ← combine(r, atk) = <r, H2(r, atk)> using the document's randomness r. If one of the words in the document is w, its encryption will be equal to v, because both use the same randomness r. Thus, the server can perform a direct equality check on encrypted words. This means it can build an index (e.g., a hash table) over the encrypted words in a document, then use that index and v to determine whether there is a match in constant time, without scanning the document.
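The indexable variant can be sketched in the same toy model, where group elements are replaced by their exponents modulo a prime. As before, this only illustrates the data flow and is not secure; `encrypt_document`, `server_search`, `hash_to_group` and the use of SHA-256 for H2 are assumptions made for the example.

```python
import hashlib
import secrets

Q = 2**127 - 1  # toy group order; elements are modelled by their exponents


def hash_to_group(word):
    return int.from_bytes(hashlib.sha256(word.encode()).digest(), "big") % Q


def combine(r, element):
    """combine(r, x) = <r, H2(r, x)>, with SHA-256 standing in for H2."""
    return r, hashlib.sha256(r + element.to_bytes(16, "big")).digest()


def token(k, word):
    return hash_to_group(word) * k % Q


def delta(k_from, k_to):
    return k_to * pow(k_from, -1, Q) % Q


def encrypt_document(k, words):
    """Deduplicate the words, then encrypt all of them under one random r.

    The resulting set of encrypted words doubles as the server's index.
    """
    r = secrets.token_bytes(16)
    index = {combine(r, hash_to_group(w) * k % Q) for w in set(words)}
    return r, index


def server_search(tk, d, document):
    """Adjust the token, combine it with the document's r, and probe the index."""
    r, index = document
    atk = tk * d % Q
    return combine(r, atk) in index
```

The membership test on the set replaces the per-word scan, so the cost per document no longer depends on the document's length.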

One limitation is that the server must maintain a separate index for each key, rather than a single index over all documents.

The overall flow of the scheme is as follows:
① When a principal P is created, Mylar uses keygen to generate its key k_P. Whenever P receives access to some new principal A, Mylar includes k_A in the wrapped key for P.
② The first time a user with access to P is online, the Mylar client in the user's browser retrieves k_A from the wrapped key, computes δ(k_P→k_A) ← delta(k_P, k_A), and stores it on the server. For a given pair of principals, this delta computation happens only once.
③ To encrypt a document for principal A, the user's browser encrypts each word w in the document separately using enc(k_A, w). Since the multi-key search scheme does not support decryption, Mylar encrypts all searchable documents twice: once with the multi-key search scheme for searching, and once with a traditional encryption scheme such as AES for decryption.
④ To search for a word w as principal P, the user's client computes a token tk using token(k_P, w) and sends it to the server. To search the data encrypted for principal A, the server fetches δ(k_P→k_A) and uses adjust(tk, δ(k_P→k_A)) to adjust the token from k_P to k_A, obtaining the adjusted token atk_A. Then, for each document encrypted under k_A with randomness r, the server computes v ← combine(r, atk_A) and uses the index to check whether v exists in the document. The server repeats the same process for every other principal that P has access to.

Origin blog.csdn.net/qq_45764888/article/details/130186027