NFTScan x TiDB | One-stack HTAP database delivers millisecond-level multi-dimensional queries for Web3 data services

Introduction

NFTScan is a multi-chain NFT data infrastructure service provider, providing efficient and concise NFT asset search and query services for Web3 users, and professional NFT API data services for Web3 developers and new generation financial technology companies.

As a distributed HTAP database, TiDB can handle massive data storage and highly concurrent reads and writes at the same time. Its strengths in high availability, distributed architecture, ACID transaction support, and real-time multi-dimensional queries make it a good fit for Web3 scenarios.

As the business grew rapidly, NFTScan found that a traditional MySQL database could no longer keep up. TiDB offers millisecond-level multi-dimensional query capabilities that let NFTScan serve its users more efficiently, so NFTScan chose TiDB as its core data architecture. This article covers the challenges the NFTScan data architecture faced, the considerations behind the technology selection, the migration to TiDB, and the benefits gained afterwards. An integrated HTAP architecture can replace the MySQL + Elasticsearch combination and is a strong choice for supporting online data services.

Founded in April 2021, NFTScan is a multi-chain NFT data infrastructure service provider. As of January 2023, we support 11 blockchain networks: Ethereum, Solana, BNBChain, Moonbeam, Polygon, Arbitrum, Optimism, Avalanche, Fantom, Cronos, and PlatON.

NFTScan has two core businesses: the NFTScan.COM multi-chain NFT data explorer and the NFTScan OpenAPI developer platform. NFTScan provides Web3 users with efficient and concise NFT asset search and query services, and provides Web3 developers and new-generation financial technology companies with professional NFT API data services.


Currently, the NFTScan database contains more than 1 million NFT contract addresses, more than 700 million NFT assets, and 1.7 billion multi-chain on-chain NFT interaction records. These numbers are still growing by about 3,000 NFT contract addresses and 2 million NFT assets per day. The data shows two characteristics of NFTScan's workload: large daily increments and high activity. These characteristics place very high demands on the database architecture, which must be comprehensive, real-time, and efficient while supporting high concurrency and low latency. Choosing a data storage system that can meet these business needs is therefore critical for NFTScan.


Challenges of previous MySQL solutions

Previously, NFTScan used MySQL and Elasticsearch on Amazon Web Services (AWS) as its core database solution. MySQL stored all business data, including the data analyzed and processed for business (B-side) and consumer (C-side) users. NFT transaction records and asset records are the core business data models, and most B-side and C-side queries revolve around these two types of data. Because NFT data grows every day, multi-dimensional queries are unevenly distributed. NFTScan therefore synchronized NFT transaction and asset data to Elasticsearch with full indexing and served multi-dimensional NFT queries from an index covering nearly every field, which worked around MySQL's performance and efficiency bottleneck in multi-dimensional retrieval over massive data.

After using this solution for half a year, we found that it could not keep up with the rapid growth of the business. It had the following shortcomings:

  • Poor scalability, high storage and maintenance costs. The volume of new blockchain data increased dramatically every day, but MySQL could not scale out automatically to handle the growing workload. We had to shard tables manually and add MySQL primary/standby clusters to spread and balance CPU and memory usage, which greatly increased storage and maintenance costs.
  • Rising costs, falling utilization. Elasticsearch was deployed on AWS. Because of the limitations of AWS's native cluster configuration, we had to keep adding high-spec Elasticsearch data nodes to serve online queries, which raised costs while lowering utilization.
  • Recurring precision errors. Elasticsearch is designed for search rather than computation, so aggregation calculations produced precision errors.
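
The article does not say which aggregations were affected. One plausible cause, offered here only as an assumption, is that numeric aggregations are computed in 64-bit floating point, which cannot represent every integer above 2^53. The minimal Python sketch below uses made-up sale prices in wei (the values and function names are hypothetical) to show how a floating-point sum can drift from the exact integer sum:

```python
# Hypothetical NFT sale prices in wei (1 ETH = 10**18 wei).
# These values are illustrative and do not come from NFTScan.
prices_wei = [
    1_000_000_000_000_000_001,
    2_000_000_000_000_000_002,
    3_000_000_000_000_000_003,
]


def sum_as_double(values):
    """Sum values after converting each to a 64-bit float,
    mimicking an engine that aggregates in double precision."""
    total = 0.0
    for v in values:
        total += float(v)
    return total


def sum_exact(values):
    """Sum with Python's arbitrary-precision integers (no rounding)."""
    return sum(values)


exact = sum_exact(prices_wei)
approx = sum_as_double(prices_wei)

# Number of wei lost to floating-point rounding.
lost_wei = exact - int(approx)
print(f"exact={exact} approx={int(approx)} lost={lost_wei}")
```

A database that aggregates with exact decimal or integer types avoids this class of error, which is one reason to run such calculations in a SQL engine rather than a search index.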

Why choose TiDB?

After nearly a month of research and testing, we chose TiDB as the core data architecture to replace the original database system. The NFTScan R&D team selected TiDB mainly for the following reasons:

  • High compatibility with MySQL: TiDB is highly compatible with MySQL in both wire protocol and SQL syntax, so NFTScan can migrate data to TiDB easily. MySQL compatibility greatly reduces the learning cost, time, and effort the R&D team needs to adopt a new database, and it speeds up the migration of database schemas;
  • Elastic scaling: TiDB uses a distributed architecture that separates computing from storage, built on a distributed storage layer. NFTScan can scale computing and storage resources flexibly as read and write traffic changes, maximizing resource utilization and greatly reducing costs;
  • Integrated HTAP architecture: TiDB's HTAP capability handles transactional and analytical workloads at the same time, so a single database covers the needs of both a transactional and an analytical database. This meets NFTScan's growing business needs and also reduces overall operating costs;
  • High availability: TiDB's built-in data replication mechanism and disaster recovery solution ensure high availability of the overall database service.

Migration plan

After two months, we finished switching all the underlying database systems to TiDB. We deployed 2 TiDB servers, 9 TiKV servers, and 2 TiFlash servers in the same region across three availability zones (AZs) to ensure high availability of the overall architecture.
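
To illustrate why spreading storage nodes across three AZs helps availability, here is a toy Python sketch. It assumes three replicas per data Region (the TiKV default) with one replica per AZ and a Raft-style majority requirement; the article does not describe NFTScan's actual replica count or placement rules, and the AZ names, store names, and helper functions are hypothetical. In a real cluster the scheduler makes these placement decisions, not application code.

```python
from collections import Counter
from itertools import cycle

# Hypothetical AZ names; the article only says "three availability zones".
AZS = ["az-a", "az-b", "az-c"]


def assign_stores(num_stores, azs):
    """Spread storage nodes evenly across AZs in round-robin order."""
    return {f"tikv-{i}": az for i, az in zip(range(num_stores), cycle(azs))}


def place_replicas(stores, replicas=3):
    """Pick at most one store per AZ for a Region's replicas.

    Assumes replicas <= number of AZs.
    """
    chosen = {}
    for store, az in stores.items():
        if az not in chosen:
            chosen[az] = store
        if len(chosen) == replicas:
            break
    return list(chosen.values())


def survives_az_loss(replica_stores, stores, lost_az):
    """A Raft group stays available while a strict majority of replicas is alive."""
    alive = [s for s in replica_stores if stores[s] != lost_az]
    return len(alive) > len(replica_stores) // 2


stores = assign_stores(9, AZS)           # 9 TiKV servers, as in the article
per_az = Counter(stores.values())        # how many stores land in each AZ
region_replicas = place_replicas(stores)
for az in AZS:
    ok = survives_az_loss(region_replicas, stores, az)
    print(f"lose {az}: region available = {ok}")
```

With one replica in each AZ, losing any single AZ still leaves two of three replicas, so the data remains readable and writable under this model.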

As of November 2022, NFTScan's TiDB database stores about 6 TB of business data and serves about 5,000 QPS with an average query time of 40 ms. Various applications run stably on TiDB.

Smooth Migration Experience

Throughout the migration process, we were impressed with TiDB's performance and the smoothness of data migration.

TiDB provides a set of data synchronization tools, such as Dumpling and TiDB Data Migration (DM), which helped NFTScan migrate historical data from MySQL to TiDB. Some of NFTScan's business data could not be migrated to TiDB directly and had to be adjusted first. In those cases, TiDB's synchronization tools could write large amounts of data concurrently. When parsing and storing real-time NFT data, execution efficiency improved by about 30% compared with the previous storage solution.

In addition, TiDB's online schema change design let NFTScan run data definition language (DDL) operations, such as changing columns and adding indexes asynchronously, during the migration without blocking reads and writes on the whole table. This makes the data schema more flexible when business logic changes. After the migration was complete, NFTScan reworked the data queries of its B-side and C-side applications. After thorough tuning and testing, we gradually switched all production applications to TiDB.

Benefits

  • TiDB supports real-time multi-dimensional queries with short query times. TiDB meets NFTScan's core requirements for high throughput and low latency. For the business-side API service, for example, the average query time dropped from 10-100 milliseconds to 10 milliseconds or less. Query speed stays stable even at 1,000 QPS.
  • TiFlash, TiDB's columnar storage engine, handles analytical workloads efficiently. For example, a complex query on a table with hundreds of millions of rows returns results in seconds.
  • TiDB's SQL optimizer selects the most cost-effective execution plan based on data distribution, and developers can still adjust and tune SQL execution plans flexibly.
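
As a rough illustration of the last two points, the Python sketch below builds two TiDB SQL statements: one that requests TiFlash replicas for a table, and one analytical query that uses an optimizer hint to read from TiFlash. The `ALTER TABLE ... SET TIFLASH REPLICA` statement and the `READ_FROM_STORAGE` hint are TiDB syntax; the table name `nft_transfers`, the column `contract_address`, and the helper functions are hypothetical and do not reflect NFTScan's actual schema or queries.

```python
import re

# Accept only simple identifiers so that table names can be
# interpolated into SQL text safely.
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def _check_ident(name: str) -> str:
    if not _IDENT.match(name):
        raise ValueError(f"unsafe identifier: {name!r}")
    return name


def tiflash_replica_ddl(table: str, replicas: int = 1) -> str:
    """Build the DDL that asks TiDB to keep columnar replicas of a table."""
    if replicas < 0:
        raise ValueError("replicas must be non-negative")
    return f"ALTER TABLE {_check_ident(table)} SET TIFLASH REPLICA {replicas};"


def top_contracts_query(table: str, limit: int = 10) -> str:
    """Build an aggregation query with a hint to read from TiFlash.

    The column name contract_address is an assumed example.
    """
    t = _check_ident(table)
    return (
        f"SELECT /*+ READ_FROM_STORAGE(TIFLASH[{t}]) */ "
        f"contract_address, COUNT(*) AS transfers "
        f"FROM {t} GROUP BY contract_address "
        f"ORDER BY transfers DESC LIMIT {int(limit)};"
    )


ddl = tiflash_replica_ddl("nft_transfers", replicas=2)
query = top_contracts_query("nft_transfers", limit=10)
print(ddl)
print(query)
```

Without the hint, the optimizer chooses between row and columnar storage on its own based on cost; the hint is one way for developers to override that choice for a specific query.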


Origin blog.csdn.net/TiDB_PingCAP/article/details/129185570