Some resource limitations of the Rust asynchronous library tokio

Project address: https://github.com/netwarps/rust-ipfs

Preface

In Rust, async-std and tokio are the two most widely used asynchronous runtime libraries, and each has its own advantages. rust-ipfs is the Rust implementation of ipfs; it uses tokio as its runtime, and its underlying network library is rust-libp2p. To try replacing the underlying rust-libp2p with libp2p-rs, we forked the original repository and ported it, and that work is now complete. This post shares a hang we ran into during the migration.

Problem Description

First, I set up a go-ipfs daemon and obtained its multiaddress with the ipfs id command. I then ran the simple rust-ipfs example program to make sure it connected to go-ipfs successfully. In ipfs, a dag node carries at most 174 blocks, which amounts to 43.5 MiB. I stored a file of about 77 MiB through go-ipfs. When I fetched it through rust-ipfs, I received at most 128 blocks, after which the test code stopped responding.
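The figures above can be checked with a little arithmetic. The sketch below assumes a chunk size of 256 KiB, which is what 43.5 MiB divided by 174 implies; the post does not state it directly, and the function names are made up for illustration.

```rust
/// Maximum number of blocks one dag node can reference, as stated in the text.
const LINKS_PER_NODE: u64 = 174;
/// Assumed chunk size of 256 KiB (implied by 43.5 MiB / 174, not stated in the text).
const CHUNK_SIZE: u64 = 256 * 1024;

/// Bytes reachable from one node whose links all point at leaf chunks.
fn max_bytes_per_node() -> u64 {
    LINKS_PER_NODE * CHUNK_SIZE
}

/// Number of leaf chunks needed for a file of `file_size` bytes, rounded up.
fn leaf_chunks(file_size: u64) -> u64 {
    (file_size + CHUNK_SIZE - 1) / CHUNK_SIZE
}
```

Under that assumption a file of exactly 77 MiB splits into 308 leaf chunks, which is more than one node can reference and well beyond the 128 blocks that were actually received.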

Search time is too long?

The logic that fetches a block by cid has a timeout: Bitswap returns an error if nothing comes back within 30 s. So at first I simply assumed the search was slow and set the problem aside. But after about ten minutes the console still had not printed any BitswapError, which showed that things were not that simple.

blockstore hangs

By adding logs layer by layer and narrowing things down, we finally located the problem in the call to the get_block() method.

[Figure: source code of the get_block() method]

get_block() first looks for the block matching the cid in the local blockstore and, if it is not found there, requests it through bitswap. In the test, the blockstore was a HashMap wrapped in tokio::sync::Mutex. The hang occurred at the step that reads the block from the HashMap, which is line 383 in the figure.
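The lookup order described here can be sketched as follows. This is not the rust-ipfs code: the Node type, the Cid and Block aliases and the fetch_remote callback are hypothetical, and a synchronous std::sync::Mutex stands in for the tokio mutex so that the example has no dependencies.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

// Hypothetical stand-ins for the real cid and block types.
type Cid = String;
type Block = Vec<u8>;

/// Hypothetical node holding an in-memory blockstore behind a mutex.
struct Node {
    blockstore: Mutex<HashMap<Cid, Block>>,
}

impl Node {
    /// Looks in the local blockstore first; only on a miss does it call
    /// `fetch_remote`, which stands in for the bitswap request.
    fn get_block<F>(&self, cid: &str, fetch_remote: F) -> Option<Block>
    where
        F: FnOnce(&str) -> Option<Block>,
    {
        // The lock guard is dropped at the end of this statement.
        let local = self.blockstore.lock().unwrap().get(cid).cloned();
        local.or_else(|| fetch_remote(cid))
    }
}
```

Every call takes the lock once, so fetching a file block by block means one lock acquisition per block. That detail matters for the budget discussion later on.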

tokio resource limit

An article from the tokio project helped us find the solution to this problem.

Because tokio does not schedule preemptively, a task that keeps running can prevent other tasks from being scheduled and starve them. Some languages can interrupt execution by injecting yield points, but Rust's generators do not seem to provide a similar facility.

To address this, tokio introduced the concept of a budget in version 0.2.14. It can be thought of as a quota that every tokio resource is aware of. The default budget is 128, a value the tokio developers settled on after testing. Each asynchronous operation decrements the budget. When it reaches 0, the task yields back to the scheduler, and the budget is reset the next time the task is scheduled.
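As a rough mental model of this quota, the toy function below counts how often a task would have to yield while performing a given number of budgeted operations. It is a simulation written for this post, not tokio's implementation, and the names are invented.

```rust
/// Default budget named in the text.
const INITIAL_BUDGET: u32 = 128;

/// Toy model of one task that wants to perform `ops` budgeted operations.
/// Returns how many times the task had to yield back to the scheduler.
fn yields_needed(ops: u32) -> u32 {
    let mut budget = INITIAL_BUDGET;
    let mut yields = 0;
    let mut done = 0;
    while done < ops {
        if budget == 0 {
            // Budget exhausted: the task goes back to the scheduler,
            // which refills the budget before running it again.
            yields += 1;
            budget = INITIAL_BUDGET;
            continue;
        }
        // Each completed operation costs one unit of budget.
        budget -= 1;
        done += 1;
    }
    yields
}
```

In this model 128 operations finish without yielding, the 129th forces one yield, and 308 operations need two. The important part is the refill: without it the task would never get past operation 128.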

In tokio's block_on, the budget check is performed:
[Figures: tokio source code for block_on and coop::budget()]

The coop::budget() call shown in the figure initializes the budget to 128 and then polls the future passed in, which here is the Acquire future inside Mutex. Acquire implements Future and checks the budget first when it is polled. If budget remains, or if the budget is unconstrained, the check returns Ready and the rest of the operation proceeds; otherwise it returns Pending.
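The check on poll can be imitated with the standard library alone. The sketch below is a simplified model and not tokio's source: BudgetedOp, NoopWake, ready_polls and the thread-local BUDGET are invented names, with None standing for an unconstrained budget.

```rust
use std::cell::Cell;
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

thread_local! {
    /// Budget of the task being polled on this thread; None means unconstrained.
    static BUDGET: Cell<Option<u32>> = Cell::new(None);
}

/// Stand-in for a budget-aware future such as the Mutex's Acquire.
struct BudgetedOp;

impl Future for BudgetedOp {
    type Output = ();

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        match BUDGET.with(|b| b.get()) {
            // Unconstrained: always allowed to proceed.
            None => Poll::Ready(()),
            // Exhausted: ask to be polled again later and yield.
            Some(0) => {
                cx.waker().wake_by_ref();
                Poll::Pending
            }
            // Budget left: spend one unit and proceed.
            Some(n) => {
                BUDGET.with(|b| b.set(Some(n - 1)));
                Poll::Ready(())
            }
        }
    }
}

/// Waker that does nothing; enough for polling by hand.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Sets the budget, polls a fresh BudgetedOp `attempts` times and
/// returns how many of those polls came back Ready.
fn ready_polls(budget: Option<u32>, attempts: u32) -> u32 {
    BUDGET.with(|b| b.set(budget));
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut ready = 0;
    for _ in 0..attempts {
        let mut op = BudgetedOp;
        if Pin::new(&mut op).poll(&mut cx).is_ready() {
            ready += 1;
        }
    }
    ready
}
```

With a budget of 128 and nobody refilling it, only the first 128 of 200 polls are Ready; with an unconstrained budget all 200 are.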

There is one more thing to note. The budget is reset only on tokio's worker threads. Executors from other libraries do not know the budget exists, so they never reset it. For example, if block_on from futures is called inside tokio's executor, the code running inside that block_on stalls after a certain number of operations, which produces the hang.

Our code happened to run into exactly this situation:

First, the main function uses #[tokio::main], which automatically creates a tokio executor:
[Figure: main function annotated with #[tokio::main]]

Second, the test code calls block_on from futures::executor, so the code inside it is driven only by the futures executor. Each local blockstore lookup in get_block() consumes budget, and if the work does not finish within 128 operations, the budget runs out and is never reset, which causes the hang:
[Figure: test code calling block_on from futures::executor]
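The effect of driving budgeted futures with an executor that never refills the budget can be reproduced in a small model. Everything below is a simulation under stated assumptions rather than tokio or futures code: Acquire, fetch_blocks, run and simulate are hypothetical names, the budget lives in a plain thread-local, and the poll loop is capped so that the stalled case terminates.

```rust
use std::cell::Cell;
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

thread_local! {
    /// Remaining budget on this thread (simplified: always constrained).
    static BUDGET: Cell<u32> = Cell::new(0);
}

/// Stand-in for a lock acquire that costs one unit of budget.
struct Acquire;

impl Future for Acquire {
    type Output = ();

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        let left = BUDGET.with(|b| b.get());
        if left == 0 {
            // Out of budget: request another poll and yield.
            cx.waker().wake_by_ref();
            return Poll::Pending;
        }
        BUDGET.with(|b| b.set(left - 1));
        Poll::Ready(())
    }
}

/// Hypothetical stand-in for repeated get_block() calls:
/// one budgeted acquire per block, progress counted in `fetched`.
async fn fetch_blocks(n: u32, fetched: &Cell<u32>) {
    for _ in 0..n {
        Acquire.await;
        fetched.set(fetched.get() + 1);
    }
}

struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Polls `fut` at most `max_polls` times and reports whether it completed.
/// `reset_budget = true` refills the budget before every poll, like a
/// budget-aware scheduler; `false` models a foreign block_on that inherits
/// a budget of 128 and never refills it.
fn run<F: Future<Output = ()>>(fut: F, reset_budget: bool, max_polls: u32) -> bool {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(fut);
    BUDGET.with(|b| b.set(128));
    for _ in 0..max_polls {
        if reset_budget {
            BUDGET.with(|b| b.set(128));
        }
        if fut.as_mut().poll(&mut cx).is_ready() {
            return true;
        }
    }
    false
}

/// Returns (completed, number of blocks fetched).
fn simulate(blocks: u32, reset_budget: bool) -> (bool, u32) {
    let fetched = Cell::new(0);
    let done = run(fetch_blocks(blocks, &fetched), reset_budget, 1_000);
    (done, fetched.get())
}
```

In this model, simulate(308, false) stops at 128 fetched blocks and never completes, which mirrors the symptom described earlier, while simulate(308, true) finishes all 308. Small workloads such as simulate(100, false) complete either way, which is why the problem only shows up once more than 128 operations are needed.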

Conclusion

In summary, the fix is to avoid calling another library's executor::block_on() inside tokio's executor.


Netwarps is made up of a senior cloud computing and distributed technology development team in China. The team has extensive experience in the financial, power, communications and Internet industries. Netwarps currently has R&D centers in Shenzhen and Beijing, with a team of 30+, most of whom are engineers with more than ten years of development experience, drawn from fields such as the Internet, finance, cloud computing, blockchain, and scientific research institutions.
Netwarps focuses on the development and application of secure storage technology products. Its main products include a decentralized file system (DFS) and a decentralized computing platform (DCP), and it is committed to providing distributed storage and distributed computing platforms based on decentralized network technology. These platforms offer high availability, low power consumption and low network requirements, and are suitable for scenarios such as the Internet of Things and the Industrial Internet.

Origin blog.51cto.com/14915984/2665535