DolphinDB API Performance Benchmark Report

1. Overview

DolphinDB is a high-performance distributed time-series database. It is a columnar relational database written in C++, with a built-in parallel and distributed computing framework, and can be used to process both real-time data and massive historical data.

In addition to its own scripting language, DolphinDB provides APIs for C++, Java, C#, Python, R, and other programming languages, so that developers can use DolphinDB in a variety of development environments.

This article tests the performance of the interaction between each API (C++, Java, C#, Python, R) and DolphinDB in the following scenarios:

  • A single user uploads data to an in-memory table
  • Multiple users concurrently upload data to a distributed (DFS) database
  • Multiple users concurrently download data from DolphinDB to the client
  • Multiple users concurrently send computing tasks (calculating the minute-level K-line of a given stock on a given day) to DolphinDB and retrieve the results

2. Test environment

2.1 Hardware configuration

This test uses three servers with identical configurations (SERVER1, SERVER2, SERVER3). Each server is configured as follows:

Host: PowerEdge R730xd

CPU: E5-2650, 24 cores, 48 threads

Memory: 512 GB

Hard disk: 12 × 1.8 TB HDD

Network: 10 Gigabit Ethernet

OS: CentOS Linux release 7.6.1810

2.2 Software configuration

C++: GCC 4.8.5

JRE: 1.8.0

C#: .NET Core 2.2.105

Python: 3.7.0

R: 3.5.2

DolphinDB: 0.94.2

2.3 Test framework

The DolphinDB cluster is deployed on SERVER1; the API programs run on SERVER2 and SERVER3 and connect to the DolphinDB data nodes on SERVER1 over the network for testing.

The DolphinDB cluster configuration is as follows:

The cluster contains 1 controller node and 6 data nodes.

Memory: 32 GB/node × 6 nodes = 192 GB

Threads: 8 threads/node × 6 nodes = 48 threads

Hard disk: each node has an independent HDD, 1.8 TB/node × 6 nodes = 10.8 TB

3. Single-user data upload performance test

This section tests a single user uploading data to the DolphinDB server through the API: an in-memory table is created on the SERVER1 DolphinDB cluster, and the API program runs on SERVER2 and writes data into that table.

The table contains 45 columns with fields of various types, including STRING, INT, LONG, SYMBOL, DOUBLE, DATE, and TIME. Each row is 336 bytes, and 1 million rows are uploaded in total, about 336 MB. We test throughput and latency when uploading 10 to 100,000 rows per batch.

Because this scenario involves a single user and no disk operations, it mainly tests the performance of converting the API program's data format into DolphinDB's data format; CPU performance and the network have the greatest impact on the results. The test results for each API are as follows:
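As a rough sanity check on the workload, the sketch below (plain Python arithmetic, not the DolphinDB API) computes the total payload and the number of network round trips implied by each batch size; fewer round trips per fixed total is why larger batches perform better.

```python
# Upload-workload arithmetic for the single-user test described above.
ROW_BYTES = 336        # bytes per row (45 columns of mixed types)
TOTAL_ROWS = 1_000_000

total_bytes = ROW_BYTES * TOTAL_ROWS          # 336,000,000 bytes ~ 336 MB
for batch in (10, 1_000, 100_000):
    round_trips = TOTAL_ROWS // batch         # one network round trip per batch
    print(f"batch={batch:>7} rows -> {round_trips:>7} round trips")
```

At a batch size of 10 the client makes 100,000 round trips; at 100,000 rows per batch only 10, which is consistent with the throughput trend in the tables below.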

Table 1. C++ API single-user upload to in-memory table test results

5d3af50a05487bd6a61ef72f30ee6442.png

Table 2. Java API single-user upload to in-memory table test results

40893d87d86b6cf8ffbab4d4d2517252.png

Table 3. C# API single-user upload to in-memory table test results

901a6609aa33605f4aa388866b7ef0dc.png

Table 4. Python API single-user upload to in-memory table test results

6822080b92f360ecb4a1677ed873957d.png

Table 5. R API single-user upload to in-memory table test results

d33abe0e2f0655ad08ef38feb97a749c.png

Table 6. Comparison of single-user upload throughput to the in-memory table across APIs (unit: MB/s)

30d0b9c2d300e7f3c003cc64e3b74ce2.png

Figure 1. Performance comparison of API uploads to the in-memory table

f4b1e14d2e029d76fec8c02b8c15213e.jpeg

The single-user in-memory table results show that performance improves significantly as the batch size increases: with the total data volume fixed, the more rows uploaded per batch, the fewer uploads and the fewer network round trips.

C++ performs best and C# worst. The underlying implementations of both the Python and R APIs are written in C++, so their performance trends are similar: when the batch size is small, the C++ module is called more often, incurring more overhead and lower performance; when the batch size reaches 1,000 rows or more, performance improves significantly. We therefore recommend increasing the upload batch size as much as possible when using Python or R.

4. Multi-user concurrent data upload performance test

This section tests multiple users concurrently uploading data through the API to a DFS table on SERVER1. Multiple users on SERVER2 and SERVER3 initiate write operations simultaneously over the network.

Each user writes a total of 5 million rows, 25,000 rows per write, with 336 bytes per row, for a total of 840 MB per user. We test the latency and throughput of concurrent writes with 1 to 128 concurrent users.

The users are evenly distributed across SERVER2 and SERVER3; for example, with 16 users, each server runs 8 client programs. Because the test involves concurrent writes, with the data transmitted to SERVER1 over the network and stored on disk, it shows whether DolphinDB can fully utilize the servers' CPU, disk, network, and other resources. The test results for each API are as follows:
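The structure of the concurrent-writer harness can be sketched as follows. This is a minimal illustration, not the actual test code: the mock `write_batch` merely tallies rows under a lock, whereas the real clients would send each 25,000-row batch to the DFS table through the respective API.

```python
import threading

# Minimal sketch of the concurrent-upload harness: each simulated user
# writes its rows in fixed-size batches. `write_batch` is a stand-in
# for the API call that inserts a batch into the DFS table.
ROWS_PER_USER = 5_000_000
BATCH_ROWS = 25_000

total_rows = 0
lock = threading.Lock()

def write_batch(n_rows):
    global total_rows
    with lock:                       # the mock "server" just tallies rows
        total_rows += n_rows

def user_task():
    for _ in range(ROWS_PER_USER // BATCH_ROWS):   # 200 batches per user
        write_batch(BATCH_ROWS)

def run(n_users):
    """Run n_users concurrent writers and return the total rows written."""
    global total_rows
    total_rows = 0
    threads = [threading.Thread(target=user_task) for _ in range(n_users)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total_rows
```

For example, `run(4)` simulates 4 concurrent users and returns 20,000,000 rows written in total; the real test scales `n_users` from 1 to 128.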

Table 7. C++ API concurrent upload to DFS table test results

cf788df69f099f33f457eb8ccf7938f1.png

Table 8. Java API concurrent upload to DFS table test results

bc1f20268e37e773fd2eebdf28c588c4.png

Table 9. C# API concurrent upload to DFS table test results

f6cc12a85b929d8585af2b1580035076.png

Table 10. Python API concurrent upload to DFS table test results

2f807365bff67691a0c75301c176ed31.png

Table 11. R API concurrent upload to DFS table test results

480400a757d3e6b6b05b68225266e15f.png

Table 12. Comparison of concurrent upload throughput to the DFS table across APIs (unit: MB/s)

21d7aa0804e7b8071826d823ecdf799e.png

Figure 2. Performance comparison of API uploading data to DFS table

04d63e968e5a9e48de61834aa4cec3cf.png

The results show that with fewer than 16 users, C++ and Java have a clear performance advantage, Python and C# perform slightly worse, and throughput grows roughly linearly with the number of users. With more than 16 users, network transmission reaches its limit and becomes the bottleneck, and throughput plateaus at the network's ceiling. The network is 10 Gigabit Ethernet with a limit of about 1 GB/s, but because the transmitted data is compressed, system throughput reaches up to 1.8 GB/s.

5. Multi-user concurrent data download performance test

This section tests the speed at which multiple users concurrently download data from DolphinDB through the API. The database is deployed on SERVER1; multiple users download data simultaneously from SERVER2 and SERVER3, each user randomly choosing a data node to connect to. Each user downloads a total of 5 million rows (45 bytes per row, 225 MB in total), 25,000 rows per request; concurrency is tested with 1 to 128 users.

We tested the performance of concurrent client download data in the following two scenarios:

  • 5-year data volume: dates and symbols are randomly selected from 5 years of data, about 12 TB in total. Since this greatly exceeds system memory, every download must load data from disk.
  • 1-week data volume: symbols are randomly selected from the most recent week of data, about 60 GB in total. The memory allocated to DolphinDB can hold all 60 GB, so all the data stays in the cache and no disk loads are needed.
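The random selection described above can be sketched as a small query generator. This is illustrative only: the symbol universe and date window are hypothetical placeholders, not the actual test data.

```python
import random
import datetime

# Sketch of the 5-year download workload: each request picks a random
# date from the 5-year window and a random symbol. Both the symbol list
# and the start date are hypothetical, for illustration only.
SYMBOLS = [f"S{i:04d}" for i in range(1000)]   # hypothetical symbol universe
START = datetime.date(2014, 1, 1)              # hypothetical window start
DAYS = 5 * 365

def random_query(rng=random):
    """Return a (symbol, date) pair for one download request."""
    day = START + datetime.timedelta(days=rng.randrange(DAYS))
    return rng.choice(SYMBOLS), day
```

Because each `(symbol, date)` pair is drawn uniformly from a 12 TB data set, consecutive requests rarely hit cached partitions, which is what forces the disk loads noted above.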

The performance test results of each API are as follows:

Table 13. C++ API concurrent data download test results

4ef34c7041624a759a50c36bf7424646.jpeg

Table 14. Java API concurrent data download test results

6f9038ab72e6586c3389a6a5dd29a76a.png

Table 15. C# API concurrent data download test results

e6bb278ae9f40fd6fbee5fec003b313e.png

Table 16. Python API concurrent data download test results

7cf954a6252daf8f9b1446c3ca97a435.png

Table 17. R API concurrent data download test results

5f6e6d4221415c09685a3abbedc791d2.png

Table 18. Comparison of 5-year data download throughput across APIs (unit: MB/s)

d45420752fdd9dc85a271ccde6fb5baf.png

Figure 3. API 5-year data download throughput comparison

07667320e97a62eb8078fb7938b1ae07.jpeg

The results show that with fewer than 64 users, throughput grows roughly linearly with the number of users and the APIs perform similarly, peaking at about 350 MB/s. Since the data set is 12 TB, DolphinDB cannot cache it all; data must be loaded from disk on every request, and the disk becomes the system bottleneck.

With 128 users, performance drops. The reason is that DolphinDB loads data by partition: when a user requests one stock's data for one day, the entire partition is loaded into memory before the requested rows are returned. With many concurrent users issuing download requests simultaneously, and the data too large to cache, nearly every request hits the disk; 128 users reading the disk concurrently intensifies I/O contention and lowers overall throughput.

Therefore, for scenarios with highly concurrent reads, we recommend configuring multiple independent data volumes on each node to improve I/O concurrency.
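As an illustration, a data node's configuration can list several volumes on separate disks; the paths below are hypothetical, a minimal sketch rather than the configuration used in this test.

```ini
; Hypothetical data-node configuration fragment: listing multiple
; independent disks under `volumes` lets DolphinDB spread partitions
; across them and serve concurrent reads with higher I/O parallelism.
volumes=/hdd1/dolphindb/volumes,/hdd2/dolphindb/volumes,/hdd3/dolphindb/volumes
```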

Table 19. Comparison of 1-week data download throughput across APIs (unit: MB/s)

e0fa445821ec9afc43d95eb3532c22b9.png

Figure 4. One-week concurrent data download throughput comparison of various APIs

f202f2feb387e181e2eae7795fb5f433.png

The results show that each API's throughput grows roughly linearly with the number of concurrent users. Since the memory allocated to DolphinDB can hold the full week's data, no disk loads are needed; the maximum throughput is around 1.4 GB/s, which reaches the network limit (about 1 GB/s on the wire; with data compression, the effective business data volume is 1.4 GB/s).

6. Concurrent computation performance test

This section submits concurrent computing tasks to DolphinDB through the API, each task computing the minute-level K-line of a given stock on a given day; the computation involves about 100 million records in total.

We test the computing performance of different numbers of concurrent users (1~128) under two scenarios: 5-year data volume and 1-week data volume.

  • The total 5-year data volume is 12 TB and cannot be fully cached in memory, so nearly every calculation must load data from disk. This is an I/O-intensive scenario, and the disk is expected to become the bottleneck.
  • The 1-week data volume is about 60 GB and can be fully cached by the DolphinDB data nodes, making this a compute-intensive scenario; with many concurrent users, the CPU is expected to become the bottleneck.
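The K-line computation itself is a standard OHLC aggregation. The sketch below (plain Python over simulated ticks, not the server-side DolphinDB script actually used in the test) shows what each submitted task computes: group one day's ticks for one stock by minute and take the open, high, low, and close of each group.

```python
from collections import OrderedDict

# Minute-level K-line (OHLC bar) aggregation over (timestamp_seconds,
# price) ticks, grouped by minute. Ticks must be in time order.
def minute_bars(ticks):
    bars = OrderedDict()
    for ts, price in ticks:
        minute = ts // 60
        if minute not in bars:
            bars[minute] = [price, price, price, price]   # O, H, L, C
        else:
            b = bars[minute]
            b[1] = max(b[1], price)    # high
            b[2] = min(b[2], price)    # low
            b[3] = price               # close = last trade in the minute
    return bars

# Simulated ticks for two minutes of trading:
ticks = [(0, 10.0), (5, 10.5), (59, 9.8), (61, 9.9), (119, 10.1)]
# minute 0: open 10.0, high 10.5, low 9.8, close 9.8
# minute 1: open 9.9,  high 10.1, low 9.9, close 10.1
```

Since each task ships only the resulting bars back to the client, very little data crosses the network, which is why this scenario stresses server-side CPU and disk rather than the APIs themselves.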

The test results of each API are as follows:

Table 20. C++ API minute-level K-line computation test results

a42c333fee52ba406401c65bbcf98fde.jpeg

Table 21. Java API minute-level K-line computation test results

f3309c0ccd855612a2d4e75d18706ebf.png

Table 22. C# API minute-level K-line computation test results

a22af8e8f8900d35fc14612215e81191.png

Table 23. Python API minute-level K-line computation test results

1d8032f536835c79c5708ea6e0a2c5c0.png

Table 24. R API minute-level K-line computation test results

439e40dee987f4e54ef9091714fc6587.png

Table 25. Comparison of 5-year data computation throughput across APIs (unit: MB/s)

a688d6dba08f909bdc5f4dc5581e4e76.png

Figure 5. Comparison of 5-year data concurrent computing throughput of various APIs

77abd27859cbdaeb6b5809989fe410f0.jpeg

The figure shows that with fewer than 16 users, each API's throughput grows roughly linearly; throughput peaks at 64 users and drops at 128. There are two reasons. First, with 12 TB of data over 5 years and dates and symbols chosen at random, once concurrency rises past a certain point the working set no longer fits in DolphinDB's memory, causing heavy data exchange between memory and disk and degrading performance. Second, too many concurrent users create too many computing tasks in the system, and the time spent scheduling and dispatching them grows, lowering throughput.

Table 26. Comparison of 1-week data computation throughput across APIs (unit: MB/s)

ec5b35ad6482bf28e59c7e12733ac38e.png

Figure 6. Comparison of 1-week concurrent computation throughput across APIs

509e902a7ac894fd56d10680b118de32.jpeg

The results show that with fewer than 64 users, throughput grows steadily and the APIs perform similarly. Performance peaks at 64 concurrent users, with computation throughput close to 7 GB/s. At 128 users, the number of system tasks far exceeds the machine's thread count (the cluster's physical machine has 48 threads in total), causing frequent thread switching and increased task scheduling and dispatch time within the cluster, which lowers throughput.

7. Summary

We tested in detail the performance of the DolphinDB C++, Java, C#, Python, and R APIs for data upload, data download, and computation under different numbers of concurrent users. The conclusions are as follows:

Single-user upload to an in-memory table: C++ performs best, with a maximum throughput of 265 MB/s; Java, Python, and R reach 160-200 MB/s; C# performs worst, at about 60 MB/s. Throughput increases significantly with batch size, especially for Python and R, so when writing, increase the batch size as much as latency and memory allow.

Multi-user concurrent writes to the distributed DFS table: throughput grows steadily with the number of users until the network limit is reached, with C++ and Java showing a clear overall advantage. At about 32 concurrent users the network becomes the bottleneck and the APIs perform roughly the same; with data compression, maximum system throughput reaches 1.8 GB/s.

Multi-user concurrent downloads: with the 5-year, 12 TB data set, maximum throughput of about 380 MB/s is reached at 64 users. In this scenario all data must be loaded from disk, and disk reads become the bottleneck; at 128 users, each node serves many downloading users at once, disk I/O contention is fierce, and overall throughput drops. With the 1-week, roughly 60 GB data set, the 6 data nodes can cache all the data, so nothing needs to be loaded from disk and throughput reaches the network limit; with data compression, cluster throughput is 1.8 GB/s and grows steadily with the number of concurrent users.

Multi-user concurrent computation: each API sends DolphinDB a task that computes the minute-level K-line of a given stock on a given day and retrieves the result. Little data crosses the network and most of the work happens server-side, so the APIs perform roughly the same. The throughput trends for the 5-year and 1-week data sets are similar, both peaking at 64 users: about 1.3 GB/s for the 5-year data, and up to 7 GB/s for the 1-week data since it all fits in memory. At 128 users, throughput drops mainly because the number of system tasks far exceeds the machine's thread count (the cluster's physical machine has 48 threads in total), causing frequent thread switching and increased task scheduling and dispatch overhead within the cluster.

Overall, when using the DolphinDB APIs for data retrieval, data upload, and computation, performance improves steadily as concurrency increases, and it meets the performance requirements of most businesses.


Source: blog.51cto.com/15022783/2605191