On applying reactive programming on the server: database operation optimization, from 20 seconds to 0.5 seconds

Reactive programming is widely used in client-side programming, but its applications on the server are mentioned far less often. This article shows how to use reactive programming on the server side to improve the performance of database operations.

Conclusion first

By combining System.Reactive with TaskCompletionSource, scattered single-row insert requests can be merged into one batch insert request. This improves database insert performance while preserving correctness: each caller's Task still completes only after its item has actually been written.

If you already know how to do this, you can skip the rest of the article.

Precondition

Assume we have the following repository interface, which represents a database insert operation.

```csharp
using System.Threading.Tasks;

namespace Newbe.RxWorld.DatabaseRepository
{
    public interface IDatabaseRepository
    {
        /// <summary>
        /// Insert one item and return total count of data in database
        /// </summary>
        /// <param name="item">The item to insert.</param>
        /// <returns>The total number of rows in the database after the insert.</returns>
        Task<int> InsertData(int item);
    }
}
```

Next, let's compare the performance of several implementations, all of which keep this interface signature unchanged.

Basic version

The first is the basic version, which inserts each item with a single, conventional INSERT statement. The examples use SQLite as the demo database, so readers can easily experiment on their own.

```csharp
using System.Threading.Tasks;

namespace Newbe.RxWorld.DatabaseRepository.Impl
{
    public class NormalDatabaseRepository : IDatabaseRepository
    {
        private readonly IDatabase _database;

        public NormalDatabaseRepository(
            IDatabase database)
        {
            _database = database;
        }

        public Task<int> InsertData(int item)
        {
            // One INSERT statement per call
            return _database.InsertOne(item);
        }
    }
}
```

Nothing special here: _database.InsertOne(item) simply executes one INSERT statement.

The basic version performs well when fewer than about 20 inserts happen at the same time. But as the volume grows, for example to 10,000 concurrent inserts, it takes about 20 seconds, which leaves a lot of room for optimization.
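The article never shows IDatabase itself. For readers who want to experiment without SQLite, here is a minimal in-memory stand-in. The member names InsertOne and InsertMany come from the calls in the repositories; their exact signatures (returning Task<int> with the total row count, and taking IEnumerable<int> for the batch) are assumptions inferred from that usage, and InMemoryDatabase and StatementCount are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Assumed shape of IDatabase, inferred from how the repositories call it.
public interface IDatabase
{
    // Inserts one row with a single INSERT; returns the total row count afterwards.
    Task<int> InsertOne(int item);

    // Inserts all rows with one batch INSERT; returns the total row count afterwards.
    Task<int> InsertMany(IEnumerable<int> items);
}

// Hypothetical in-memory stand-in for SQLite, for experiments only.
public class InMemoryDatabase : IDatabase
{
    private readonly object _gate = new object();
    private readonly List<int> _rows = new List<int>();
    private int _statementCount;

    // How many INSERT statements have been "executed" so far.
    public int StatementCount
    {
        get { lock (_gate) { return _statementCount; } }
    }

    public async Task<int> InsertOne(int item)
    {
        await Task.Delay(1); // simulate one round trip to the database
        lock (_gate)
        {
            _statementCount++;
            _rows.Add(item);
            return _rows.Count;
        }
    }

    public async Task<int> InsertMany(IEnumerable<int> items)
    {
        var batch = items.ToList();
        await Task.Delay(1); // one round trip for the whole batch
        lock (_gate)
        {
            _statementCount++;
            _rows.AddRange(batch);
            return _rows.Count;
        }
    }
}
```

If this stand-in is placed alongside the repositories, StatementCount makes the difference visible: the basic version issues one statement per InsertData call, while the batch versions below issue one per batch.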

TaskCompletionSource

TaskCompletionSource is a type in the TPL (Task Parallel Library) that produces a Task whose completion you control manually. Readers who are not familiar with TaskCompletionSource can learn from this example code.

Here is also a brief explanation of what the object does, so that you can keep reading.

For readers who know JavaScript, TaskCompletionSource is roughly equivalent to a Promise object, or to $.Deferred in jQuery.

If neither of those rings a bell, here is a real-life analogy I came up with while eating Mala Tang.


Eating Mala Tang, explained technically: before eating Mala Tang, you pick your ingredients into a bowl; that is constructing the parameters. Taking the bowl to the checkout is calling the method. After the cashier finishes the checkout, you receive a pickup pager that will ring later; that is the Task return value. You take the pager, find a seat, and play on your phone while waiting for the meal; that is awaiting the Task, while the CPU turns to other work. When the pager rings, you fetch the meal: the Task has completed, the await returns, and execution continues with the next line of code.

So where does TaskCompletionSource come in?

Going back to the example: we only pick up the meal when the pager rings. So when does the pager ring? Naturally, when a staff member presses a switch on the counter to trigger it.

This switch on the counter is, technically speaking, the TaskCompletionSource.

Just as the counter switch controls the pager's bell, a TaskCompletionSource is an object that controls the state of a Task.
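The analogy maps onto code quite directly. In this minimal sketch, OrderMeal plays the cashier and hands back the pager (a Task), and a background task plays the staff member who presses the counter switch by calling SetResult. The names MalaTangCounter, OrderMeal and EatAsync are purely illustrative.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class MalaTangCounter
{
    // The cashier: takes the order and hands back a "pager" (the Task).
    // The TaskCompletionSource is the counter switch that makes the pager ring.
    public static Task<string> OrderMeal(string dishes)
    {
        var counterSwitch = new TaskCompletionSource<string>(
            TaskCreationOptions.RunContinuationsAsynchronously);

        // The kitchen cooks on another thread, then the staff press the switch.
        Task.Run(() =>
        {
            Thread.Sleep(100); // cooking
            counterSwitch.SetResult($"bowl of {dishes}");
        });

        // The pager only completes when SetResult is called on the switch.
        return counterSwitch.Task;
    }

    public static async Task<string> EatAsync()
    {
        var pager = OrderMeal("tofu and noodles");
        Console.WriteLine("Found a seat, playing on the phone...");
        // While awaiting, the thread is free to do other work.
        var meal = await pager;
        return meal;
    }
}
```

The key point is that the Task handed to the caller has no work of its own behind it: it stays pending until some other code decides to complete it.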

Solutions

With this understanding of TaskCompletionSource, we can solve the problem from the beginning of the article. The idea is as follows:

When InsertData is called, create a tuple of a TaskCompletionSource and the item, and put it into a queue. For convenience, we call this tuple a BatchItem.

Return the Task of that BatchItem's TaskCompletionSource.

The code that calls InsertData awaits the returned Task, so as long as nobody completes the TaskCompletionSource, the caller keeps waiting.

Meanwhile, another thread periodically drains the BatchItem queue, writes the collected items with a single batch insert, and then completes each BatchItem's TaskCompletionSource with the result.

This turns many single inserts into one batch insert.

I may not have explained this perfectly, but all of the following versions are built on this idea; reading the text alongside the code should make it clear.
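Before adding a background thread or timers, the idea can be shown in its smallest form. This is a sketch, not one of the article's versions: ManualBatcher and Flush are hypothetical names, and the batch writer is passed in as a delegate instead of an IDatabase. InsertData only enqueues a BatchItem and returns its Task; nothing completes until Flush drains the queue, writes all items with one call, and completes every TaskCompletionSource.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical minimal batcher: the caller decides when to flush.
public class ManualBatcher
{
    // Each entry is a BatchItem: the item plus the "switch" for the caller's Task.
    private readonly ConcurrentQueue<(int Item, TaskCompletionSource<int> Tcs)> _queue =
        new ConcurrentQueue<(int Item, TaskCompletionSource<int> Tcs)>();

    // Writes a whole batch in one call and returns the total row count (assumed contract).
    private readonly Func<IReadOnlyList<int>, Task<int>> _insertMany;

    public ManualBatcher(Func<IReadOnlyList<int>, Task<int>> insertMany)
    {
        _insertMany = insertMany;
    }

    // Create the BatchItem, enqueue it, and return its still-pending Task.
    public Task<int> InsertData(int item)
    {
        var tcs = new TaskCompletionSource<int>(TaskCreationOptions.RunContinuationsAsynchronously);
        _queue.Enqueue((item, tcs));
        return tcs.Task;
    }

    // Drain the queue, insert once, then complete (or fault) every waiting Task.
    public async Task Flush()
    {
        var batch = new List<(int Item, TaskCompletionSource<int> Tcs)>();
        while (_queue.TryDequeue(out var next))
        {
            batch.Add(next);
        }
        if (batch.Count == 0) return;

        try
        {
            var total = await _insertMany(batch.Select(b => b.Item).ToList());
            foreach (var done in batch) done.Tcs.SetResult(total);
        }
        catch (Exception e)
        {
            foreach (var failed in batch) failed.Tcs.SetException(e);
        }
    }
}
```

The versions below replace the manual Flush call with a long-running consumer thread, and later with an Rx pipeline, but the BatchItem bookkeeping stays the same.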

ConcurrentQueue version

Based on this idea, we first use ConcurrentQueue as the BatchItem queue. The code is as follows (it is fairly long, but don't worry, simpler versions follow):

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;
using Xunit.Abstractions;

namespace Newbe.RxWorld.DatabaseRepository.Impl
{
    public class ConcurrentQueueDatabaseRepository : IDatabaseRepository
    {
        private readonly ITestOutputHelper _testOutputHelper;
        private readonly IDatabase _database;
        private readonly ConcurrentQueue<BatchItem> _queue;

        // ReSharper disable once PrivateFieldCanBeConvertedToLocalVariable
        private readonly Task _batchInsertDataTask;

        public ConcurrentQueueDatabaseRepository(
            ITestOutputHelper testOutputHelper,
            IDatabase database)
        {
            _testOutputHelper = testOutputHelper;
            _database = database;
            _queue = new ConcurrentQueue<BatchItem>();
            // Start a long-running Task that consumes the BatchItems in the queue
            _batchInsertDataTask = Task.Factory.StartNew(RunBatchInsert, TaskCreationOptions.LongRunning);
        }

        public Task<int> InsertData(int item)
        {
            // Create a BatchItem, put it into the queue, and return its Task
            var taskCompletionSource = new TaskCompletionSource<int>();
            _queue.Enqueue(new BatchItem
            {
                Item = item,
                TaskCompletionSource = taskCompletionSource
            });
            return taskCompletionSource.Task;
        }

        // Keep taking BatchItems from the queue, insert them into the database batch by batch,
        // and update the state of each TaskCompletionSource
        private void RunBatchInsert()
        {
            foreach (var batchItems in GetBatches())
            {
                try
                {
                    BatchInsertData(batchItems).Wait();
                }
                catch (Exception e)
                {
                    _testOutputHelper.WriteLine($"there is an error : {e}");
                }
            }

            IEnumerable<IList<BatchItem>> GetBatches()
            {
                var sleepTime = TimeSpan.FromMilliseconds(50);
                while (true)
                {
                    const int maxCount = 100;
                    // GetWaitingItems is lazy, so Take dequeues at most maxCount items
                    var oneBatchItems = GetWaitingItems()
                        .Take(maxCount)
                        .ToList();
                    if (oneBatchItems.Any())
                    {
                        yield return oneBatchItems;
                    }
                    else
                    {
                        // The queue is empty: wait a little before polling again
                        Thread.Sleep(sleepTime);
                    }
                }

                IEnumerable<BatchItem> GetWaitingItems()
                {
                    while (_queue.TryDequeue(out var item))
                    {
                        yield return item;
                    }
                }
            }
        }

        private async Task BatchInsertData(IEnumerable<BatchItem> items)
        {
            var batchItems = items as BatchItem[] ?? items.ToArray();
            try
            {
                // Call the database's batch insert operation
                var totalCount = await _database.InsertMany(batchItems.Select(x => x.Item));
                foreach (var batchItem in batchItems)
                {
                    batchItem.TaskCompletionSource.SetResult(totalCount);
                }
            }
            catch (Exception e)
            {
                foreach (var batchItem in batchItems)
                {
                    batchItem.TaskCompletionSource.SetException(e);
                }
                throw;
            }
        }

        private struct BatchItem
        {
            public TaskCompletionSource<int> TaskCompletionSource { get; set; }
            public int Item { get; set; }
        }
    }
}
```

The code above makes heavy use of local functions and IEnumerable iterators (yield return). Readers who are unfamiliar with them can click here to learn more.
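GetBatches depends on one subtle property of iterators: because GetWaitingItems is lazy, Take(maxCount) dequeues at most maxCount items and leaves the rest in the queue for the next batch. This small check reproduces just that part; LazyBatchDemo and TakeOneBatch are illustrative names.

```csharp
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;

public static class LazyBatchDemo
{
    // Builds one batch of at most maxCount items and reports how many stay queued.
    public static (int BatchSize, int Remaining) TakeOneBatch(int queued, int maxCount)
    {
        var queue = new ConcurrentQueue<int>(Enumerable.Range(0, queued));

        // Local iterator function: each MoveNext dequeues exactly one item.
        IEnumerable<int> GetWaitingItems()
        {
            while (queue.TryDequeue(out var item))
            {
                yield return item;
            }
        }

        // Take stops pulling once it has maxCount items, so nothing extra is dequeued.
        var batch = GetWaitingItems().Take(maxCount).ToList();
        return (batch.Count, queue.Count);
    }
}
```

With 250 queued items and maxCount = 100, one call yields a batch of 100 and leaves 150 in the queue. If GetWaitingItems built a full list instead of yielding, the whole queue would be drained into a single oversized batch.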

The main feature begins!

Next, we use System.Reactive to rewrite the more complex ConcurrentQueue version above, as follows:

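```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reactive;
using System.Reactive.Linq;
using System.Reactive.Subjects;
using System.Threading.Tasks;
using Xunit.Abstractions;

namespace Newbe.RxWorld.DatabaseRepository.Impl
{
    public class AutoBatchDatabaseRepository : IDatabaseRepository
    {
        private readonly ITestOutputHelper _testOutputHelper;
        private readonly IDatabase _database;
        private readonly ISubject<BatchItem> _subject;

        public AutoBatchDatabaseRepository(
            ITestOutputHelper testOutputHelper,
            IDatabase database)
        {
            _testOutputHelper = testOutputHelper;
            _database = database;
            // InsertData is called from many threads at once; Synchronize serializes
            // the OnNext calls, as the Rx observer contract requires
            _subject = Subject.Synchronize(new Subject<BatchItem>());
            // Group requests: one batch every 50 ms or every 100 items, whichever comes first
            _subject.Buffer(TimeSpan.FromMilliseconds(50), 100)
                // Buffer also emits empty lists for time windows without requests
                .Where(x => x.Count > 0)
                // Write each batch to the database with one batch insert; Concat runs batches one after another
                .Select(list => Observable.FromAsync(() => BatchInsertData(list))
                    // Log and swallow the error so one failed batch does not terminate the pipeline
                    .Catch<Unit, Exception>(e =>
                    {
                        _testOutputHelper.WriteLine($"there is an error : {e}");
                        return Observable.Empty<Unit>();
                    }))
                .Concat()
                .Subscribe();
        }

        // Unchanged compared with the previous version
        public Task<int> InsertData(int item)
        {
            var taskCompletionSource = new TaskCompletionSource<int>();
            _subject.OnNext(new BatchItem
            {
                Item = item,
                TaskCompletionSource = taskCompletionSource
            });
            return taskCompletionSource.Task;
        }

        // This part is also exactly the same as before
        private async Task BatchInsertData(IEnumerable<BatchItem> items)
        {
            var batchItems = items as BatchItem[] ?? items.ToArray();
            try
            {
                var totalCount = await _database.InsertMany(batchItems.Select(x => x.Item));
                foreach (var batchItem in batchItems)
                {
                    batchItem.TaskCompletionSource.SetResult(totalCount);
                }
            }
            catch (Exception e)
            {
                foreach (var batchItem in batchItems)
                {
                    batchItem.TaskCompletionSource.SetException(e);
                }
                throw;
            }
        }

        private struct BatchItem
        {
            public TaskCompletionSource<int> TaskCompletionSource { get; set; }
            public int Item { get; set; }
        }
    }
}
```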

The code is about 50 lines shorter, mainly because the powerful Buffer operator in System.Reactive replaces the complex batching logic of the ConcurrentQueue version: it closes a batch as soon as either the time window elapses or the item count is reached.
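To see what Buffer(TimeSpan, int) does on its own, this sketch pushes a burst of items through the same Buffer and Where steps and reports the batch sizes. It assumes the System.Reactive NuGet package is referenced; BufferDemo and BatchSizes are illustrative names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reactive.Linq;

public static class BufferDemo
{
    // Feeds `count` items through Buffer(50 ms, 100) + Where, as in
    // AutoBatchDatabaseRepository, and returns the size of each batch.
    public static IList<int> BatchSizes(int count)
    {
        return Observable.Range(1, count)
            .Buffer(TimeSpan.FromMilliseconds(50), 100) // close a batch at 100 items or after 50 ms
            .Where(batch => batch.Count > 0)            // drop empty time windows
            .Select(batch => batch.Count)
            .ToList()   // collect all batch sizes once the source completes
            .Wait();    // blocking is fine in a demo, not in server code
    }
}
```

For a fast burst of 250 items the result is typically 100, 100 and 50, but the exact split depends on timing: if the 50 ms timer fires mid-burst, a smaller batch appears. Every item still lands in exactly one batch of at most 100.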

Teacher, can it get any better?

We can "slightly" refactor the code to separate Buffer and the related batching logic from the business logic of "database insertion". That gives us an even simpler version:

```csharp
using System;
using System.Threading.Tasks;

namespace Newbe.RxWorld.DatabaseRepository.Impl
{
    public class FinalDatabaseRepository : IDatabaseRepository
    {
        private readonly IBatchOperator<int, int> _batchOperator;

        public FinalDatabaseRepository(
            IDatabase database)
        {
            var options = new BatchOperatorOptions<int, int>
            {
                BufferTime = TimeSpan.FromMilliseconds(50),
                BufferCount = 100,
                DoManyFunc = database.InsertMany,
            };
            _batchOperator = new BatchOperator<int, int>(options);
        }

        public Task<int> InsertData(int item)
        {
            return _batchOperator.CreateTask(item);
        }
    }
}
```

The code for IBatchOperator and the related types is available in the code repository and is not shown here.
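For readers who do not want to open the repository, here is one way such a helper could look. This is a hypothetical reconstruction, not the repository's code: only the member names (BufferTime, BufferCount, DoManyFunc, CreateTask) come from FinalDatabaseRepository, and the DoManyFunc type Func<IEnumerable<TInput>, Task<TOutput>> is an assumption that matches `database.InsertMany`. It reuses the Buffer pipeline from AutoBatchDatabaseRepository and assumes the System.Reactive package.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reactive.Linq;
using System.Reactive.Subjects;
using System.Threading.Tasks;

// Hypothetical sketch; member names inferred from FinalDatabaseRepository.
public class BatchOperatorOptions<TInput, TOutput>
{
    public TimeSpan BufferTime { get; set; }
    public int BufferCount { get; set; }
    // Runs one batch operation; its single result is shared by every item in the batch.
    public Func<IEnumerable<TInput>, Task<TOutput>> DoManyFunc { get; set; }
}

public interface IBatchOperator<TInput, TOutput>
{
    Task<TOutput> CreateTask(TInput input);
}

public class BatchOperator<TInput, TOutput> : IBatchOperator<TInput, TOutput>
{
    private readonly BatchOperatorOptions<TInput, TOutput> _options;
    private readonly ISubject<(TInput Input, TaskCompletionSource<TOutput> Tcs)> _subject;

    public BatchOperator(BatchOperatorOptions<TInput, TOutput> options)
    {
        _options = options;
        // Serialize concurrent OnNext calls coming from many callers.
        _subject = Subject.Synchronize(new Subject<(TInput Input, TaskCompletionSource<TOutput> Tcs)>());
        _subject.Buffer(options.BufferTime, options.BufferCount)
            .Where(batch => batch.Count > 0)
            .Select(batch => Observable.FromAsync(() => RunBatch(batch)))
            .Concat() // one batch at a time
            .Subscribe();
    }

    public Task<TOutput> CreateTask(TInput input)
    {
        var tcs = new TaskCompletionSource<TOutput>(TaskCreationOptions.RunContinuationsAsynchronously);
        _subject.OnNext((input, tcs));
        return tcs.Task;
    }

    // Completes every waiting Task with the batch result, or faults them all.
    // Exceptions are not rethrown, so one failed batch does not stop the pipeline.
    private async Task RunBatch(IList<(TInput Input, TaskCompletionSource<TOutput> Tcs)> batch)
    {
        try
        {
            var result = await _options.DoManyFunc(batch.Select(x => x.Input));
            foreach (var x in batch) x.Tcs.SetResult(result);
        }
        catch (Exception e)
        {
            foreach (var x in batch) x.Tcs.SetException(e);
        }
    }
}
```

Keeping the batching mechanics generic is what lets FinalDatabaseRepository shrink to a few lines: the repository only says what one batch operation is, and the operator decides when to run it.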

Performance Testing

The measurements can be summarized as follows:

When 10 items are inserted concurrently, there is not much difference between the original version and the batch version. The batch version can even be slower at such small volumes, since each request may wait up to 50 milliseconds for its batch to close.

However, when 10,000 items are inserted concurrently, the original version can take about 20 seconds, while the batch version takes only about 0.5 seconds.
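To reproduce this kind of comparison, one option is to start all inserts at once and time Task.WhenAll with a Stopwatch. This is a hypothetical harness, not the article's test code, and InsertBenchmark and MeasureAsync are illustrative names. The 20 s and 0.5 s figures above come from the author's SQLite setup; results on other machines and databases will differ.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Threading.Tasks;

public static class InsertBenchmark
{
    // Starts `count` inserts at once (like many concurrent requests),
    // waits for all of them, and returns the elapsed wall-clock time.
    public static async Task<TimeSpan> MeasureAsync(Func<int, Task<int>> insertData, int count)
    {
        var stopwatch = Stopwatch.StartNew();
        var tasks = Enumerable.Range(0, count).Select(insertData).ToArray();
        await Task.WhenAll(tasks);
        stopwatch.Stop();
        return stopwatch.Elapsed;
    }
}
```

Usage with any of the repositories above would look like `await InsertBenchmark.MeasureAsync(repository.InsertData, 10_000)`. Starting every call before awaiting any of them matters: awaiting inside a loop would serialize the inserts, and the batch versions would have nothing to batch.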

All sample code can be found in the code repository. If cloning from GitHub is difficult, you can also click here to clone from Gitee.

Last but most important!

The author is currently building a server-side development framework based on reactive programming, the Actor model and event sourcing, in the hope of helping developers build "distributed", "horizontally scalable" and "highly testable" application systems: Newbe.Claptrap