This article describes the cryptographic subsystem of the Linux kernel (the Crypto Subsystem).

PREFACE

Cryptographic algorithm implementations on Linux can be divided into two layers:

  • User space layer implementation
  • Kernel space layer implementation

In user space, using cryptographic algorithms only requires installing and running a package such as OpenSSL. However, performing encryption and decryption purely in software in user space is time-consuming and CPU-intensive, which makes it a poor fit for embedded devices. We can therefore improve encryption and decryption performance, and reduce the CPU load, with help from kernel space and hardware.

This article mainly introduces how cryptographic algorithms are implemented in Linux kernel space. It does not cover pure user space applications, nor does it discuss individual encryption and decryption algorithms in detail, because plenty of good material on them is already available online.

When a Linux user space application uses kernel space cryptographic algorithms, the overall flow can be divided into the following three main parts:

1. Kernel space (Kernel Space Cryptographic Implementation)

Kernel space implementations of cryptographic algorithms are mainly divided into software and hardware operations:

  • Software calculation
    The CPU performs the cryptographic computation. No additional hardware is required, but it consumes a lot of CPU time. The Linux kernel source code is located under the crypto directory.
  • Hardware acceleration (hardware component)
    Dedicated hardware performs the cryptographic operations (offloading), which saves CPU time but requires additional hardware.

SoC Component – Many ARM SoC manufacturers build hardware encryption and decryption components into their SoCs. The corresponding Linux kernel source code is mostly located under drivers/crypto. The driver design must comply with the Linux crypto framework and cannot deviate from it.

TPM – a hardware security chip designed specifically to protect keys and cryptographic operations. The Linux kernel source code is located under drivers/char/tpm.

In addition, Intel provides dedicated CPU instructions, Intel® AES-NI [9], which can also be regarded as a kind of hardware acceleration.

2. Crypto API – User space interface

Its main function is to provide an interface through which user space can access kernel space. Currently, the mainstream ones are cryptodev and AF_ALG:

  • CRYPTODEV [12]
    • Not part of the mainline Linux kernel; the kernel module must be downloaded, compiled, and loaded separately
    • Uses the ioctl interface
    • Ported from the OpenBSD Cryptographic Framework
    • OpenSSL has supported cryptodev since its early days
  • AF_ALG
    • Included in the Linux kernel since 2.6.38; the source code is located in crypto/af_alg.c
    • Uses a socket interface (the AF_ALG address family)
    • Supported by OpenSSL since v1.1.0 (note: OpenSSL v1.1.0 also added the ChaCha20 and Poly1305 algorithms and removed SSLv2)

The cryptodev website states that cryptodev performs better than AF_ALG, but according to the experiments in [17], the performance difference is small.

Personally, I think newly developed programs should consider AF_ALG. After all, AF_ALG is in the mainline kernel, so its stability, compatibility, and maintainability should be better.
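As a rough illustration of the AF_ALG socket interface, the following user space sketch asks the kernel to compute a SHA-256 digest. The function name afalg_sha256 and the choice of the "sha256" hash are assumptions made for this example. It also assumes the running kernel was built with AF_ALG hash support, and it simply returns -1 when the interface is not available.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>
#include <linux/if_alg.h>

#ifndef AF_ALG
#define AF_ALG 38
#endif

#define SHA256_DIGEST_LEN 32

/*
 * Hash `len` bytes of `data` with the kernel's "sha256" implementation.
 * Returns 0 on success, -1 if AF_ALG is unavailable or any call fails.
 * (afalg_sha256 is a name invented for this sketch.)
 */
int afalg_sha256(const void *data, size_t len,
                 unsigned char out[SHA256_DIGEST_LEN])
{
    struct sockaddr_alg sa = {
        .salg_family = AF_ALG,
        .salg_type   = "hash",    /* class of algorithm */
        .salg_name   = "sha256",  /* algorithm name looked up in the kernel */
    };
    int tfmfd, opfd, ret = -1;

    assert(data != NULL && out != NULL);

    /* The first socket selects the algorithm. */
    tfmfd = socket(AF_ALG, SOCK_SEQPACKET, 0);
    if (tfmfd < 0)
        return -1;
    if (bind(tfmfd, (struct sockaddr *)&sa, sizeof(sa)) < 0)
        goto out_tfm;

    /* accept() returns the socket used for the actual operation. */
    opfd = accept(tfmfd, NULL, NULL);
    if (opfd < 0)
        goto out_tfm;

    /* Writing feeds the message, reading returns the digest. */
    if (write(opfd, data, len) != (ssize_t)len)
        goto out_op;
    if (read(opfd, out, SHA256_DIGEST_LEN) != SHA256_DIGEST_LEN)
        goto out_op;
    ret = 0;

out_op:
    close(opfd);
out_tfm:
    close(tfmfd);
    return ret;
}
```

A caller passes a buffer and its length, checks the return value, and then reads the 32 digest bytes from out. No extra kernel module is needed, which is the practical advantage of AF_ALG mentioned above.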

3. User space cryptography libraries [7]

The following are the more common user space cryptography libraries [19]:

  • OpenSSL
  • wolfSSL
  • GnuTLS

I personally recommend OpenSSL. Besides being long established and widely used, OpenSSL is also funded by the Core Infrastructure Initiative under the Linux Foundation.

OpenSSL provides AF_ALG and cryptodev engines, and the Crypto API can be accessed through these engines. Note, however, that the OpenSSL package in Debian disables the AF_ALG and cryptodev options by default, so running it directly uses the user space implementation of the cryptographic algorithms. To use the kernel space implementation, you need to download the source code, configure it, and recompile it.

  • Steps to enable the OpenSSL AF_ALG engine
    • Modify debian/rules and add enable-afalgeng at the end of CONFARGS
  • Steps to enable the OpenSSL cryptodev engine
    • 1. After downloading cryptodev, copy crypto/cryptodev.h [21] to OpenSSL/crypto
    • 2. Modify debian/rules and add -DHAVE_CRYPTODEV -DUSE_CRYPTODEV_DIGESTS to the front of CONFARGS

After OpenSSL has been recompiled, the kernel space cryptographic algorithms can be accessed.

PART ONE–Crypto Subsystem of Linux Kernel

This part introduces how a crypto (cryptography) request is sent from the application layer to the Linux kernel through a system call, and how the crypto subsystem forwards it to a hardware crypto engine.

Overview

The crypto subsystem is the subsystem responsible for processing crypto requests in Linux. Besides the flow control mechanism, another important feature is an abstraction layer for algorithm implementations, so that manufacturers can customize the implementation according to their needs.

A common example is a manufacturer adding a hardware algorithm engine to the hardware architecture to speed up specific algorithms, and integrating the driver for that engine into Linux through the crypto subsystem so that other kernel modules or the application layer can use it.

Chip manufacturers generally take this approach, at least those I have worked with, because computing everything in software with the OpenSSL library affects the performance of the whole product, so hardware is generally used in place of the software implementation. I have also written before about hardware IP, that is, the IP block used for the computation.

The following sections use the OpenSSL library as an example of how a crypto request travels to the kernel crypto subsystem.

cryptodev Engine

In Linux, the first approach that comes to mind for communication between the application layer and a hardware device is to have the application open a character/block device that represents the hardware, and interact with the hardware through read and write operations on it.

Cryptodev-linux plays this role. It acts as a middle layer: it receives crypto requests from the application layer and then calls the crypto API of the Linux kernel crypto subsystem to forward them to a specific hardware algorithm engine.

Cryptodev-linux is a kernel module of the miscellaneous device type. Its default device path is /dev/crypto, and it uses the ioctl file operation cryptodev_ioctl to receive data passed from the application side.

// https://github.com/cryptodev-linux/cryptodev-linux/blob/master/ioctl.c

static const struct file_operations cryptodev_fops = {
	.owner = THIS_MODULE,
	.open = cryptodev_open,
	.release = cryptodev_release,
	.unlocked_ioctl = cryptodev_ioctl,
#ifdef CONFIG_COMPAT
	.compat_ioctl = cryptodev_compat_ioctl,
#endif /* CONFIG_COMPAT */
	.poll = cryptodev_poll,
};

static struct miscdevice cryptodev = {
	.minor = MISC_DYNAMIC_MINOR,
	.name = "crypto",
	.fops = &cryptodev_fops,
	.mode = S_IRUSR|S_IWUSR|S_IRGRP|S_IWGRP|S_IROTH|S_IWOTH,
};

static int __init
cryptodev_register(void)
{
	int rc;

	rc = misc_register(&cryptodev);
	if (unlikely(rc)) {
		pr_err(PFX "registration of /dev/crypto failed\n");
		return rc;
	}

	return 0;
}

The application side uses the struct crypt_op or struct crypt_auth_op defined in cryptodev.h to form a specified crypto request, and calls the ioctl system call to send the request to Cryptodev-linux.

// https://github.com/cryptodev-linux/cryptodev-linux/blob/master/crypto/cryptodev.h

struct crypt_auth_op {
	__u32	ses;		/* session identifier */
	__u16	op;		/* COP_ENCRYPT or COP_DECRYPT */
	__u16	flags;		/* see COP_FLAG_AEAD_* */
	__u32	len;		/* length of source data */
	__u32	auth_len;	/* length of auth data */
	__u8	__user *auth_src;	/* authenticated-only data */

	/* The current implementation is more efficient if data are
	 * encrypted in-place (src==dst). */
	__u8	__user *src;	/* data to be encrypted and authenticated */
	__u8	__user *dst;	/* pointer to output data. Must have
				 * space for tag. For TLS this should be at least
				 * len + tag_size + block_size for padding */

	__u8	__user *tag;	/* where the tag will be copied to. TLS mode
				 * doesn't use that as tag is copied to dst.
				 * SRTP mode copies tag there. */
	__u32	tag_len;	/* the length of the tag. Use zero for digest size or max tag. */

	/* initialization vector for encryption operations */
	__u8	__user *iv;
	__u32	iv_len;
};

Sample code for Cryptodev-linux ioctl:

// setup data for your crypto request
cryp.ses = ctx->sess.ses;
cryp.iv = (void*)iv;
cryp.op = COP_DECRYPT;
cryp.auth_len = auth_size;
cryp.auth_src = (void*)auth;
cryp.len = size;
cryp.src = (void*)ciphertext;
cryp.dst = ciphertext;
cryp.flags = COP_FLAG_AEAD_TLS_TYPE;

// call ioctl to pass a crypto request to `/dev/crypto`
if (ioctl(ctx->cfd, CIOCAUTHCRYPT, &cryp)) {
	perror("ioctl(CIOCAUTHCRYPT)");
	return -1;
}

In addition, Cryptodev-linux provides a session mechanism. Each crypto request corresponds to a session, and the session manages the state of the current crypto request.

For example, if the current session is in the initialized state, the crypto request may perform an encrypt operation. This ensures that crypto requests follow the correct sequence of operations.

Linux Kernel Crypto Subsystem

A crypto request is sent to the kernel crypto subsystem through the kernel crypto API. The rest of this part walks through a brief crypto API call flow.

Transformation Object & Transformation Implementation

First of all, the crypto subsystem has two important elements:

  • transformation object
  • transformation implementation

The transformation object is abbreviated as tfm in the API and is also known as a cipher handle.

The transformation implementation is the underlying implementation behind the transformation object, also known as the crypto algo. In the previous example, it is the algorithm implementation of the crypto engine.

The main reason for distinguishing between object and implementation is that multiple objects may use the same implementation.

For example, users A and B both use the hmac-sha256 algorithm, so two transformation objects A and B are created, each containing the key owned by A or B respectively. However, both objects may use the same transformation implementation and therefore call the same crypto engine to perform the computation.

TFM: The transformation object (TFM) is an instance of a transformation implementation. There can be multiple transformation objects associated with a single transformation implementation. Each of those transformation objects is held by a crypto API consumer or another transformation. https://www.kernel.org/doc/html/latest/crypto/intro.html

struct crypto_tfm {
	u32 crt_flags;
	int node;
	void (*exit)(struct crypto_tfm *tfm);
	struct crypto_alg *__crt_alg; // crypto algorithm or transformation implementation
	void *__crt_ctx[] CRYPTO_MINALIGN_ATTR;
};

  • When a crypto request comes in, a suitable crypto algorithm is first looked up in the list of registered crypto algorithms according to the algorithm name specified in the request, and a new transformation object is created.

  • The transformation object is then attached to the cipher requests used by the crypto subsystem. Cipher requests may share the same transformation object. For example, the transformation object of hmac-sha256 contains a transformation implementation and a key, and this transformation object can be used to hash the messages of multiple cipher requests (different plaintexts processed with the same key).

  • After the cipher request has been set up, the transformation implementation of the transformation object is called to actually perform the algorithm operation.
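The relationship between transformation objects and a shared transformation implementation can be modelled in plain user space C. This is only a sketch of the idea, not kernel code: toy_alg, toy_tfm, and the keyed checksum are hypothetical names invented here, and the checksum (FNV-1a over key and message) is not a real MAC.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a transformation implementation (crypto algo). */
struct toy_alg {
    const char *name;
    uint32_t (*digest)(const uint8_t *key, size_t keylen,
                       const uint8_t *msg, size_t len);
    unsigned int users;   /* number of objects that reference this implementation */
};

/* Hypothetical stand-in for a transformation object (tfm). */
struct toy_tfm {
    struct toy_alg *alg;  /* shared implementation */
    uint8_t key[16];      /* per-object data, here the key */
    size_t keylen;
};

/* Toy keyed checksum: FNV-1a over the key, then the message. Not a real MAC. */
static uint32_t toy_keyed_digest(const uint8_t *key, size_t keylen,
                                 const uint8_t *msg, size_t len)
{
    uint32_t h = 2166136261u;
    size_t i;

    for (i = 0; i < keylen; i++)
        h = (h ^ key[i]) * 16777619u;
    for (i = 0; i < len; i++)
        h = (h ^ msg[i]) * 16777619u;
    return h;
}

static struct toy_alg toy_hmac_alg = { "toy-keyed-hash", toy_keyed_digest, 0 };

/* Create an object bound to an implementation and give it its own key. */
static void toy_tfm_init(struct toy_tfm *tfm, struct toy_alg *alg,
                         const void *key, size_t keylen)
{
    assert(keylen <= sizeof(tfm->key));
    tfm->alg = alg;
    tfm->keylen = keylen;
    memcpy(tfm->key, key, keylen);
    alg->users++;
}

/* Each request goes through the object to the shared implementation. */
static uint32_t toy_tfm_digest(const struct toy_tfm *tfm,
                               const void *msg, size_t len)
{
    return tfm->alg->digest(tfm->key, tfm->keylen, msg, len);
}
```

Two objects created for users A and B point at the same toy_hmac_alg but carry different keys, so the same message produces different results for each of them while only one implementation exists.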

This raises a question: when multiple requests arrive within a short time, how should they be processed in order?

For this, the crypto subsystem provides a convenient struct crypto_engine. The crypto engine provides a queue management mechanism so that multiple requests can be forwarded to the corresponding hardware crypto engine one after another.

Of course, if we have additional requirements, we can also implement other mechanisms to manage them, not necessarily using a crypto engine.

struct crypto_engine {
	char			name[ENGINE_NAME_LEN];
	bool			idling;
	bool			busy;
	bool			running;

	bool			retry_support;

	struct list_head	list;
	spinlock_t		queue_lock;
	struct crypto_queue	queue;
	struct device		*dev;

	bool			rt;

	// implement these three functions to trigger your hardware crypto engine
	int (*prepare_crypt_hardware)(struct crypto_engine *engine);
	int (*unprepare_crypt_hardware)(struct crypto_engine *engine);
	int (*do_batch_requests)(struct crypto_engine *engine);

	struct kthread_worker           *kworker;
	struct kthread_work             pump_requests;

	void				*priv_data;
	struct crypto_async_request	*cur_req;
};

Register a Crypto Algorithm (Transformation Implementation)

From the crypto API flow introduced above, it follows that the most important step in adding a transformation implementation to the crypto subsystem is registering it in the crypto algorithm list.

The crypto API provides registration functions for this. Taking stm32-cryp as an example:

struct skcipher_alg {
	int (*setkey)(struct crypto_skcipher *tfm, const u8 *key,
	              unsigned int keylen);
	int (*encrypt)(struct skcipher_request *req);
	int (*decrypt)(struct skcipher_request *req);
	int (*init)(struct crypto_skcipher *tfm);
	void (*exit)(struct crypto_skcipher *tfm);

	unsigned int min_keysize;
	unsigned int max_keysize;
	unsigned int ivsize;
	unsigned int chunksize;
	unsigned int walksize;

	struct crypto_alg base;
};

static struct skcipher_alg crypto_algs[] = {
{
	.base.cra_name		= "ecb(aes)",
	.base.cra_driver_name	= "stm32-ecb-aes",
	.base.cra_priority	= 200,
	.base.cra_flags		= CRYPTO_ALG_ASYNC,
	.base.cra_blocksize	= AES_BLOCK_SIZE,
	.base.cra_ctxsize	= sizeof(struct stm32_cryp_ctx),
	.base.cra_alignmask	= 0xf,
	.base.cra_module	= THIS_MODULE,

	.init			= stm32_cryp_init_tfm,
	.min_keysize		= AES_MIN_KEY_SIZE,
	.max_keysize		= AES_MAX_KEY_SIZE,
	.setkey			= stm32_cryp_aes_setkey,
	.encrypt		= stm32_cryp_aes_ecb_encrypt,
	.decrypt		= stm32_cryp_aes_ecb_decrypt,
},
};

After the transformation implementation containing the algorithm implementation has been set up as above, call the registration API:

ret = crypto_register_skciphers(crypto_algs, ARRAY_SIZE(crypto_algs));
if (ret) {
	dev_err(dev, "Could not register algs\n");
	goto err_algs;
}

Registration is now complete.

In addition, among the structure members shown in the code, cra_priority represents the priority of each transformation implementation. For example, if AES-ECB has two registered transformation implementations, one in software and one in hardware, the one with the higher priority is used first.

cra_priority

Priority of this transformation implementation. In case multiple transformations with same cra_name are available to the Crypto API, the kernel will use the one with highest cra_priority.
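The selection rule quoted above can be sketched as a small user space lookup. This is a simplified model, not the kernel's lookup code: toy_alg, toy_algs, and toy_lookup are hypothetical names, and the software entry "sw-ecb-aes" with priority 100 is made up for the example (only "stm32-ecb-aes" with priority 200 comes from the driver shown earlier).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, reduced view of a registered transformation implementation. */
struct toy_alg {
    const char *cra_name;         /* generic algorithm name */
    const char *cra_driver_name;  /* unique name of this implementation */
    int cra_priority;
};

/* Example registrations: one software and one hardware implementation. */
static const struct toy_alg toy_algs[] = {
    { "ecb(aes)", "sw-ecb-aes",    100 },  /* assumed software fallback */
    { "ecb(aes)", "stm32-ecb-aes", 200 },  /* hardware driver from the article */
    { "cbc(aes)", "sw-cbc-aes",    100 },  /* assumed, unrelated algorithm */
};

/* Return the implementation of `name` with the highest priority, or NULL. */
static const struct toy_alg *toy_lookup(const struct toy_alg *algs, size_t n,
                                        const char *name)
{
    const struct toy_alg *best = NULL;
    size_t i;

    assert(name != NULL);
    for (i = 0; i < n; i++) {
        if (strcmp(algs[i].cra_name, name) != 0)
            continue;
        if (!best || algs[i].cra_priority > best->cra_priority)
            best = &algs[i];
    }
    return best;
}
```

Looking up "ecb(aes)" in this table yields the "stm32-ecb-aes" entry, which matches the behaviour described for the hardware driver: it wins because its priority is higher than the software entry's.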

PART TWO–Crypto Subsystem of Linux Kernel - Asynchronous & Synchronous

In the crypto subsystem, the crypto API offers two mechanisms: asynchronous and synchronous.

The earliest version of the crypto API offered only the synchronous API. However, as the amount of data to be processed grows, computation and data transfer times may grow considerably, and a synchronous call can leave the caller waiting for a long time. The asynchronous crypto API was therefore introduced later, so that users can choose the mechanism that suits their usage scenario.

The asynchronous and synchronous crypto APIs are named differently: asynchronous APIs add the letter a to the prefix, whereas synchronous ones add s. Take hash as an example:

// asynchronous API
int crypto_ahash_digest(struct ahash_request *req);

// synchronous API
int crypto_shash_digest(struct shash_desc *desc, const u8 *data,
                        unsigned int len, u8 *out);

In addition to naming, the required parameters are different due to the different processing flow of the two mechanisms.

The following also uses the hash crypto algorithm as an example to illustrate the differences and usage scenarios between synchronous and asynchronous crypto APIs.

Synchronous hash API

API document: https://docs.kernel.org/crypto/api-digest.html#synchronous-message-digest-api


The synchronous hash API has an important parameter, struct shash_desc *desc. It is a state handle that stores the state values needed during the operation.

For example, in the API call flow, crypto_shash_update() can be called multiple times, allowing the user to feed in several chunks of the message. When the crypto engine finishes processing one chunk, it may have intermediate state that needs to be saved, and these state values are stored in the state handle.

struct shash_desc {
	struct crypto_shash *tfm;

	// store required state for crypto engine
	void *__ctx[] __aligned(ARCH_SLAB_MINALIGN);
};

Therefore, before calling the API, the user must allocate enough memory for the crypto engine to store this state. The transformation implementation specifies how much state storage the crypto engine needs, and the user only has to call a specific API to obtain that size.

unsigned int size;
struct crypto_shash *hash; // transformation object, also called cipher handle
struct shash_desc *desc; // state handle

hash = crypto_alloc_shash(name, 0, 0); // create a transformation object

// get a required desc size for crypto engine via `crypto_shash_descsize` API
size = sizeof(struct shash_desc) + crypto_shash_descsize(hash);
desc = kmalloc(size, GFP_KERNEL);
desc->tfm = hash; // bind the state handle to the transformation object

After creating the shash_desc, call the initialization API.

This API mainly calls the init function of the transformation implementation so that the corresponding crypto engine can initialize or reset itself in preparation for the upcoming computation.

int rc;
rc = crypto_shash_init(desc);
// error handling

After the initialization is complete, you can call the update API to perform hash operations on the specified message.

rc = crypto_shash_update(desc, message, message_len);
// error handling

Finally, call final to get the hash result.

u8 result[DIGEST_SIZE];
rc = crypto_shash_final(desc, result);
// error handling

Basically, the synchronous API is used much like a typical application-side crypto library: call the APIs for each step in order and handle errors in the returned results.
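The init, update, final sequence and the separately sized state handle can be modelled in user space as follows. This is a sketch under stated assumptions, not the kernel API: toy_shash, toy_desc, and the toy_* functions are hypothetical names, and the "algorithm" is a 32-bit FNV-1a checksum chosen only because its state fits in one integer.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical transformation object: knows how much state the algorithm needs. */
struct toy_shash {
    size_t descsize;
};

/* Same layout idea as struct shash_desc: a header followed by algorithm state. */
struct toy_desc {
    struct toy_shash *tfm;
    unsigned char ctx[];   /* intermediate state lives here */
};

/* Allocate header plus the state size requested by the implementation. */
static struct toy_desc *toy_desc_alloc(struct toy_shash *tfm)
{
    struct toy_desc *desc = malloc(sizeof(*desc) + tfm->descsize);

    if (desc)
        desc->tfm = tfm;
    return desc;
}

/* init: reset the state to the FNV-1a offset basis. */
static void toy_init(struct toy_desc *desc)
{
    uint32_t h = 2166136261u;

    assert(desc->tfm->descsize >= sizeof(h));
    memcpy(desc->ctx, &h, sizeof(h));
}

/* update: fold one more chunk into the saved state. */
static void toy_update(struct toy_desc *desc, const void *data, size_t len)
{
    const unsigned char *p = data;
    uint32_t h;

    memcpy(&h, desc->ctx, sizeof(h));
    while (len--)
        h = (h ^ *p++) * 16777619u;
    memcpy(desc->ctx, &h, sizeof(h));
}

/* final: read out the result. */
static uint32_t toy_final(struct toy_desc *desc)
{
    uint32_t h;

    memcpy(&h, desc->ctx, sizeof(h));
    return h;
}
```

Because the intermediate state is kept in the handle between calls, feeding "hello " and "world" in two update calls gives the same result as feeding "hello world" in one call.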

Although the synchronous crypto API is intuitive to use, it does not suit every scenario. Besides the blocking behaviour mentioned at the beginning, another possible problem is that the synchronous crypto API is awkward to use when the data to be processed lies in non-contiguous memory segments.

In the example above, the input parameter message of crypto_shash_update is a buffer of contiguous memory. If the data is spread over several pieces, crypto_shash_update must be called multiple times to pass in all of it.

Asynchronous hash API

The asynchronous crypto API provides an asynchronous mechanism and introduces struct scatterlist to address the problems mentioned above.

struct ahash_request {
	struct crypto_async_request base;

	unsigned int nbytes;
	struct scatterlist *src;
	u8 *result;

	/* This field may only be used by the ahash API code. */
	void *priv;

	void *__ctx[] CRYPTO_MINALIGN_ATTR;
};

int crypto_ahash_digest(struct ahash_request *req);

As the API shows, the parameters are consolidated into a single struct ahash_request. The ahash_request structure contains an important member, struct scatterlist *src. A scatterlist entry describes one contiguous section of physical memory, and entries can be chained, which means multiple physical memory sections can be linked into a list.
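The benefit of describing the input as a list of segments can be shown with a user space analogue. The sketch below uses the POSIX struct iovec as a stand-in for struct scatterlist, which is an assumption of this example (the two are different structures), and toy_digest_segments with its FNV-1a checksum is a hypothetical helper, not a kernel function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/uio.h>

/* Fold `len` bytes into a running 32-bit FNV-1a checksum. */
static uint32_t fnv1a_update(uint32_t h, const void *data, size_t len)
{
    const unsigned char *p = data;

    while (len--)
        h = (h ^ *p++) * 16777619u;
    return h;
}

/*
 * Digest one logical message that is stored in `nseg` separate buffers.
 * The caller makes a single call; walking the segments is the job of
 * the implementation.
 */
static uint32_t toy_digest_segments(const struct iovec *seg, size_t nseg)
{
    uint32_t h = 2166136261u;
    size_t i;

    assert(seg != NULL || nseg == 0);
    for (i = 0; i < nseg; i++)
        h = fnv1a_update(h, seg[i].iov_base, seg[i].iov_len);
    return h;
}
```

With three segments holding "scat", "ter", and "list", a single call returns the same value as digesting the contiguous string "scatterlist", so the caller no longer needs one update call per memory segment.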

typedef void (*crypto_completion_t)(struct crypto_async_request *req, int err);

struct crypto_async_request {
	struct list_head list;
	crypto_completion_t complete;
	void *data;
	struct crypto_tfm *tfm;

	u32 flags;
};

In addition, struct crypto_async_request contains a callback function crypto_completion_t. After the operation is completed, the callback will be used to notify the user of the completed process.

Because the mechanism is asynchronous, the behaviour and flow of the crypto engine when processing a request differ considerably from the synchronous mechanism. A common implementation adds a request queue to manage multiple requests. When the user calls the update API to send a request, the request is added to the queue, and a status code indicating that processing is in progress (-EINPROGRESS) is returned immediately.

The following is a simple asynchronous hash API usage example:

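const u32 result_len = 16;
struct crypto_ahash *tfm;
struct ahash_request *req;
u8 *result;
int err;

result = kmalloc(result_len, GFP_NOFS);

// alg_name is the algorithm name chosen by the caller;
// type 0 and mask 0 accept both synchronous and asynchronous implementations
tfm = crypto_alloc_ahash(alg_name, 0, 0);
req = ahash_request_alloc(tfm, GFP_NOFS);
// set callback function
ahash_request_set_callback(req, CRYPTO_TFM_REQ_MAY_SLEEP, callback_fn, NULL);
// set input data: sc is a scatterlist describing 32 bytes of input,
// result receives the digest once final has completed
ahash_request_set_crypt(req, sc, result, 32);

err = crypto_ahash_init(req);

err = crypto_ahash_update(req);
if (err == -EINPROGRESS)
{
	// the request was queued; wait until callback_fn is invoked before continuing
}

err = crypto_ahash_final(req);
if (err == -EINPROGRESS)
{
	// the digest in result is valid only after callback_fn has been invoked
}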

Other notes:
Although it is called the asynchronous hash API, the corresponding crypto engine implementation does not necessarily process requests asynchronously; the actual flow depends on each manufacturer's implementation.

If the user calls the asynchronous hash API but the underlying transformation implementation is actually synchronous, the crypto subsystem performs the necessary conversion itself, so the call still works normally.
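From the caller's point of view, the two cases above differ only in the return code: a synchronous implementation returns its result directly, while an asynchronous one returns -EINPROGRESS and reports the result through the callback. The following single-threaded user space model sketches how one code path can handle both. All names (toy_req, toy_wait, toy_submit_sync, toy_submit_async, toy_hw_finish, toy_wait_req) are hypothetical, and the loop in toy_wait_req stands in for sleeping until the callback runs.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct toy_req;
typedef void (*toy_completion_t)(struct toy_req *req, int err);

struct toy_req {
    toy_completion_t complete;  /* callback supplied by the caller */
    void *data;                 /* caller context handed back to the callback */
    int pending;                /* model only: 1 while the "hardware" owns the request */
};

/* Caller-side record of whether and how the request finished. */
struct toy_wait {
    int done;
    int err;
};

/* Completion callback: store the result for the waiting caller. */
static void toy_req_done(struct toy_req *req, int err)
{
    struct toy_wait *w = req->data;

    assert(w != NULL);
    w->err = err;
    w->done = 1;
}

/* Synchronous-style implementation: finishes inside the call and returns 0. */
static int toy_submit_sync(struct toy_req *req)
{
    (void)req;
    return 0;
}

/* Asynchronous-style implementation: only accepts the work. */
static int toy_submit_async(struct toy_req *req)
{
    req->pending = 1;
    return -EINPROGRESS;
}

/* Model of the hardware finishing later, for example from an interrupt. */
static void toy_hw_finish(struct toy_req *req)
{
    if (req->pending) {
        req->pending = 0;
        req->complete(req, 0);
    }
}

/* One caller-side code path for both kinds of implementation. */
static int toy_wait_req(int err, struct toy_req *req, struct toy_wait *w)
{
    if (err != -EINPROGRESS)
        return err;             /* finished (or failed) synchronously */
    while (!w->done)
        toy_hw_finish(req);     /* stands in for sleeping until the callback */
    return w->err;
}
```

In the synchronous case the callback is never invoked and the return value is used as is; in the asynchronous case the caller's result comes from the callback.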

Conclusion

This part briefly introduced the asynchronous and synchronous crypto APIs and how they communicate with the crypto engine. On the surface the asynchronous mechanism seems more flexible, but for manufacturers the choice of which mechanism to implement may be constrained by the hardware or by other implementation details, so it is still necessary to weigh several factors to decide which approach is better.


PART THREE–Crypto Subsystem of Linux Kernel - Asynchronous Request Handling Mechanism

Because the crypto subsystem expects that multiple crypto requests may be sent to the same crypto engine at the same time, the crypto engine driver must implement a mechanism to cope with this situation.

Building on the asynchronous crypto API flow described in the previous part, a common implementation uses a crypto queue together with a worker: an additional kernel thread communicates with the crypto engine and processes the crypto requests in FIFO order. This part focuses on that design and explains the whole flow and its details.

Overview


Conditions

  • The hardware crypto engine can only process one request at a time. After the request has been programmed into its registers, the crypto engine is started to perform the calculation, and the next request can only be loaded after the calculation has finished.
  • When the result has been calculated, the crypto engine raises a status interrupt to signal that the calculation is complete.
  • Multiple requests may be sent to the hardware crypto engine at the same time.
  • A complete crypto request flow consists of three API calls: Init→Update→Final. After the Final result has been returned, the crypto request is released and no longer used.

Idea

  • Create a global crypto request list and place incoming requests in the list in order.
  • Create a worker (kernel thread) and a corresponding work queue to communicate with the hardware crypto engine. Besides taking requests from the crypto request list for processing, the worker's tasks may also include crypto engine initialization and resource release.
  • Register an interrupt handler. When the status interrupt is raised, call the user-defined completion callback to finish the current step. If the step being executed is the last final API call and the request holds custom resources that need to be released, release them after the callback has been called.

Version

Linux kernel version: v5.17.3

Crypto Queue

The Linux kernel implements a general-purpose crypto queue structure and the corresponding operation API:

struct crypto_queue {
	struct list_head list;
	struct list_head *backlog;

	unsigned int qlen;
	unsigned int max_qlen;
};

void crypto_init_queue(struct crypto_queue *queue, unsigned int max_qlen);

int crypto_enqueue_request(struct crypto_queue *queue, struct crypto_async_request *request);
void crypto_enqueue_request_head(struct crypto_queue *queue, struct crypto_async_request *request);

struct crypto_async_request *crypto_dequeue_request(struct crypto_queue *queue);

static inline unsigned int crypto_queue_len(struct crypto_queue *queue);

In most cases, we can use this structure directly to implement the crypto request list. However, in the scenario above the request list may be modified by multiple requests at the same time, so it must be protected by a lock.

struct cherie_crypto_engine {
	struct device *dev;

	struct crypto_queue queue;
	struct kthread_worker *kworker;
	struct kthread_work do_requests;
	spinlock_t queue_lock;

	bool initialize; // set once the hardware has been initialized
	struct crypto_async_request *current_req;
};

static int cherie_request_enqueue(struct ahash_request *req)
{
	int ret;
	unsigned long flags;
	struct cherie_crypto_engine *engine = get_engine();

	spin_lock_irqsave(&engine->queue_lock, flags);
	ret = crypto_enqueue_request(&engine->queue, &req->base);
	spin_unlock_irqrestore(&engine->queue_lock, flags);
	return ret;
}
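The behaviour of a bounded FIFO request queue can be tried out in user space with a small model. This is a simplified sketch, not the kernel's struct crypto_queue: toy_request, toy_queue, and the toy_* functions are hypothetical, the backlog list is left out, and returning -ENOSPC for a full queue is an assumption of this model. Locking is omitted because the model is single-threaded.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical request with an intrusive link, like a list_head member. */
struct toy_request {
    struct toy_request *next;
    int id;
};

struct toy_queue {
    struct toy_request *head, *tail;
    unsigned int qlen;
    unsigned int max_qlen;
};

static void toy_init_queue(struct toy_queue *q, unsigned int max_qlen)
{
    q->head = q->tail = NULL;
    q->qlen = 0;
    q->max_qlen = max_qlen;
}

/* Append a request; -EINPROGRESS means "queued, result comes later". */
static int toy_enqueue(struct toy_queue *q, struct toy_request *r)
{
    assert(r != NULL);
    if (q->qlen >= q->max_qlen)
        return -ENOSPC;   /* model assumption: reject when the queue is full */

    r->next = NULL;
    if (q->tail)
        q->tail->next = r;
    else
        q->head = r;
    q->tail = r;
    q->qlen++;
    return -EINPROGRESS;
}

/* Remove and return the oldest request, or NULL if the queue is empty. */
static struct toy_request *toy_dequeue(struct toy_queue *q)
{
    struct toy_request *r = q->head;

    if (!r)
        return NULL;
    q->head = r->next;
    if (!q->head)
        q->tail = NULL;
    q->qlen--;
    return r;
}
```

In a driver, both operations would run with the queue lock held, as in cherie_request_enqueue above, because submitters and the worker can touch the list concurrently.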

Worker & Worker Queue

The worker is the only kernel thread that may operate the crypto engine, which ensures that the crypto engine executes only one task at a time. Here, too, we use the worker API provided by the Linux kernel:

struct kthread_worker *kthread_create_worker(unsigned int flags, const char namefmt[], ...);
bool kthread_queue_work(struct kthread_worker *worker, struct kthread_work *work);

As for the work, it may contain several items:

  • Crypto engine initialization.
  • Take out the request from the crypto request list, and perform read and write operations on the registers related to the crypto engine according to the information of the request.
  • The release of crypto engine resources. (For example, when there is currently no request to be processed, related resources can be released first.)

static void cherie_work(struct kthread_work *work)
{
	unsigned long flags;
	struct cherie_request_state *state;
	struct ahash_request *req;
	struct crypto_async_request *async_req;
	struct cherie_crypto_engine *engine = get_engine();

	spin_lock_irqsave(&engine->queue_lock, flags);

	if (!engine->initialize)
	{
		// do initialization
	}

	// we can't fetch the next request if the current request isn't done.
	if (engine->current_req)
	{
		spin_unlock_irqrestore(&engine->queue_lock, flags);
		return;
	}

	async_req = crypto_dequeue_request(&engine->queue);
	// remember the request in flight so the interrupt handler can complete it
	engine->current_req = async_req;
	spin_unlock_irqrestore(&engine->queue_lock, flags);

	if (!async_req)
		return;

	req = ahash_request_cast(async_req);
	state = ahash_request_ctx(req);

	switch (state->algo_op)
	{
	case ALGO_UPDATE:
		cherie_do_request_update(req);
		break;
	case ALGO_FINAL:
		cherie_do_request_final(req);
		break;
	default:
		break;
	}
}

Status Interrupt Handling

Because of the asynchronous request mechanism, after the crypto engine finishes its calculation and raises the status interrupt, the user-defined completion callback is called from the bottom half to end the API call for this stage.

static irqreturn_t cherie_crypto_engine_irq_thread_fn(int irq, void *arg)
{
	unsigned long flags;
	struct crypto_async_request *req;
	struct cherie_crypto_engine *engine = get_engine();

	// detach the finished request while holding the lock
	spin_lock_irqsave(&engine->queue_lock, flags);
	req = engine->current_req;
	engine->current_req = NULL;
	spin_unlock_irqrestore(&engine->queue_lock, flags);

	// call the callback outside the lock: it may submit the next request,
	// which takes queue_lock again
	if (req)
		req->complete(req, 0);

	// add a work to process the next request
	kthread_queue_work(engine->kworker, &engine->do_requests);

	return IRQ_HANDLED;
}

static int cherie_crypto_engine_probe(struct platform_device *pdev)
{
	int irq, ret;
	struct device *dev = &pdev->dev;
	struct cherie_crypto_engine *engine = get_engine();

	irq = platform_get_irq(pdev, 0);
	if (irq < 0)
		return irq;

	// cherie_crypto_engine_irq_handler is the top half (not shown here)
	ret = devm_request_threaded_irq(dev, irq, cherie_crypto_engine_irq_handler,
					cherie_crypto_engine_irq_thread_fn, IRQF_TRIGGER_HIGH | IRQF_ONESHOT,
					dev_name(dev), engine);
	return ret;
}

Implement Crypto API

Finally, implement the Crypto API, which is the transformation implementation mentioned in the overview.

Generally speaking, an asynchronous request returns one of three kinds of status code:

  • 0 means success,
  • -EINPROGRESS means the request is still being processed,
  • any other value is an error code.

If we return -EINPROGRESS when implementing the API, we must call the user program's callback function later in the flow; otherwise the user program may wait forever for the callback.

For example, in the update API function, when the request is added to the queue, the -EINPROGRESS status is returned to the user program:

static int cherie_crypto_engine_update(struct ahash_request *req)
{
	int ret;
	struct cherie_crypto_engine *engine = get_engine();
	struct cherie_request_state *state = ahash_request_ctx(req);

	state->algo_op = ALGO_UPDATE;
	ret = cherie_request_enqueue(req);
	kthread_queue_work(engine->kworker, &engine->do_requests);
	return ret; // ret is -EINPROGRESS if the request was added to queue.
}

It is then necessary to ensure that the request's complete callback (req->complete) is called once the worker's task has been carried out, so that the user program knows it can continue with the next API call.
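The interplay between submission, worker, and status interrupt can be traced with a single-threaded user space model. It is only a sketch of the design described in this part: toy_engine, toy_submit, toy_worker, and toy_irq are hypothetical names, the queue is a small ring buffer instead of struct crypto_queue, locking is omitted, and the "hardware" finishes a request only when toy_irq is called.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define TOY_QLEN 4

struct toy_req {
    int id;
    void (*complete)(struct toy_req *req, int err);
};

struct toy_engine {
    struct toy_req *queue[TOY_QLEN]; /* FIFO ring buffer of waiting requests */
    unsigned int head, count;
    struct toy_req *current_req;     /* request the "hardware" is working on */
};

/* Completion log so the order of callbacks can be inspected. */
static int toy_done[16];
static unsigned int toy_done_count;

static void toy_complete(struct toy_req *req, int err)
{
    assert(err == 0);
    if (toy_done_count < 16)
        toy_done[toy_done_count++] = req->id;
}

/* Worker: hand the next request to the hardware, but only if it is idle. */
static void toy_worker(struct toy_engine *e)
{
    if (e->current_req || e->count == 0)
        return;
    e->current_req = e->queue[e->head];
    e->head = (e->head + 1) % TOY_QLEN;
    e->count--;
    /* a real driver would program the hardware registers here */
}

/* update()/final() entry point: queue the request and kick the worker. */
static int toy_submit(struct toy_engine *e, struct toy_req *req)
{
    if (e->count == TOY_QLEN)
        return -ENOSPC;              /* model assumption for a full queue */
    e->queue[(e->head + e->count) % TOY_QLEN] = req;
    e->count++;
    toy_worker(e);
    return -EINPROGRESS;
}

/* Status interrupt: complete the current request, then pump the queue. */
static void toy_irq(struct toy_engine *e)
{
    struct toy_req *req = e->current_req;

    e->current_req = NULL;
    if (req)
        req->complete(req, 0);
    toy_worker(e);
}
```

Submitting three requests starts only the first one; each call to toy_irq then completes the request in flight and starts the next, so the callbacks fire in submission order and every -EINPROGRESS is eventually followed by exactly one callback.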

Conclusion

This part mainly explained how, within the crypto subsystem of the Linux kernel, a crypto queue can be used to handle multiple simultaneous requests to a hardware crypto engine, and how a worker can be used to implement the asynchronous request flow.

One point to keep in mind is that, because the worker is the only thread that may interact with the crypto engine, the tasks the worker handles and their order must be designed around the crypto API flow and the capabilities of the crypto engine itself.

Of course, if you would rather avoid this effort, the abstraction layer in crypto/engine.h is available in Linux kernel v4 and later versions. It makes it easier for vendors to integrate a crypto engine into the Linux kernel.

References


+ [1]: https://en.wikipedia.org/wiki/Advanced_Encryption_Standard

+ [2]: https://en.wikipedia.org/wiki/RSA_(cryptosystem)

+ [3]: https://en.wikipedia.org/wiki/Curve25519

+ [4]: https://en.wikipedia.org/wiki/SHA-2

+ [5]: https://en.wikipedia.org/wiki/SHA-3

+ [6]: https://www.kernel.org/doc/Documentation/crypto/asymmetric-keys.txt

+ [7]: https://en.wikipedia.org/wiki/Comparison_of_cryptography_libraries

+ [8]: https://www.coreinfrastructure.org/grants

+ [9]: https://en.wikipedia.org/wiki/AES_instruction_set

+ [10]: https://en.wikipedia.org/wiki/Hardware_security_module

+ [11]: https://szlin.me/2017/01/07/%E5%88%9D%E6%8E%A2-tpm-2-0/

+ [12]: http://cryptodev-linux.org/
+ [13]: https://www.kernel.org/doc/Documentation/crypto/userspace-if.rst

+ [14]: https://lwn.net/Articles/410763/

+ [15]: https://www.openssl.org/news/openssl-1.1.0-notes.html

+ [16]: https://events.linuxfoundation.org/sites/events/files/slides/lcj-2014-crypto-user.pdf

+ [17]: http://events.linuxfoundation.org/sites/events/files/slides/2017-02%20-%20ELC%20-%20Hudson%20-%20Linux%20Cryptographic%20Acceleration%20on%20an%20MX6.pdf

+ [18]: https://www.slideshare.net/nij05/slideshare-linux-crypto-60753522

+ [19]: https://en.wikipedia.org/wiki/Comparison_of_cryptography_libraries

+ [20]: https://patchwork.kernel.org/patch/9192881/

+ [21]: https://github.com/cryptodev-linux/cryptodev-linux/blob/master/crypto/cryptodev.h

+ [22]: https://www.slideshare.net/nij05/slideshare-linux-crypto-60753522

+ [23]: Linux Kernel Crypto API, https://docs.kernel.org/crypto/index.html

+ [24]: Kernel Crypto API Interface Specification, https://www.kernel.org/doc/html/latest/crypto/intro.html

+ [25]: An overview of the crypto subsystem - The Linux Foundation, http://events17.linuxfoundation.org/sites/events/files/slides/brezillon-crypto-framework_0.pdf

+ [26]: SZ Lin with Cybersecurity & Embedded Linux, https://szlin.me/2017/04/05/linux-kernel-%E5%AF%86%E7%A2%BC%E5%AD%B8%E6%BC%94%E7%AE%97%E6%B3%95%E5%AF%A6%E4%BD%9C%E6%B5%81%E7%A8%8B/


Origin blog.csdn.net/weixin_45264425/article/details/132484700