Last October, TiDB version 1.0 was released. Over the following six months, the development team maintained the stability of version 1.0 and added necessary new features, while developing version 2.0 at full speed. After six RC versions, the TiDB 2.0 GA version was officially released on April 27th.
Version 2.0 Planning
Based on the needs of existing users, technology trends, and feedback from the community, TiDB 2.0 mainly focuses on the following points:
Ensure the stability and correctness of TiDB. These two properties are fundamental for any database software: as the cornerstone of a business, any jitter or error can have a huge impact. A large number of users already run TiDB in production, with growing data volumes and constantly evolving services.
Improve TiDB's query performance on large data volumes. Many TiDB customers hold anywhere from hundreds of gigabytes to hundreds of terabytes of data, so better query performance at this scale directly benefits them.
Optimize TiDB for ease of use and maintainability. The TiDB system as a whole is fairly complex, and operating and using it is harder than for a standalone database, so we want to provide the most convenient solutions possible: simplifying deployment, upgrade, and scaling as much as we can, and making abnormal states in the system easy to locate.
Around these three principles, TiDB has made many improvements. Some are externally visible, such as a significant improvement in OLAP performance, many more monitoring items, and various optimizations of the operation and maintenance tools; many more are hidden inside the database, quietly improving its stability and correctness.
Correctness and Stability
After the release of version 1.0, the TiDB team began building and improving Schrodinger, an automated testing platform, completely replacing the previous practice of manually deploying test clusters. Many test cases were added at the same time, so that tests cover the stack from RocksDB at the bottom, through Raft and the transaction layer, up to SQL.
In chaos testing, TiDB introduced more fault-injection tools, such as using SystemTap to delay I/O, and also ran fault-injection tests against specific pieces of business logic, ensuring that TiDB keeps running stably under abnormal conditions.
The TiDB development team had previously done some TLA+ verification work along with simple tests; after 1.0, they began using TLA+ systematically to verify that the implementation is correct by design.
On the storage engine side, to improve the stability and performance of large-scale clusters, TiDB optimized the Raft workflow and introduced new features such as Region Merge and Raft Learner; improved the hotspot scheduling mechanism to collect more information and schedule more sensibly based on it; and optimized RocksDB performance, using features such as DeleteFilesInRanges to reclaim space more efficiently, reduce disk load, and use disk resources more smoothly.
OLAP Performance Optimization
TiDB 2.0 refactors the SQL optimizer and execution engine, aiming to choose the optimal query plan as quickly as possible and execute it as efficiently as possible.
Version 1.0 already moved from a rule-based query optimizer to a cost-based one, but it was far from perfect. Version 2.0 improves both the accuracy and update timeliness of statistics and the capabilities of the SQL optimizer itself: query cost estimation is more accurate, complex filter conditions are analyzed in more detail, correlated subqueries are handled more elegantly, and physical operator selection is more flexible and precise.
In this version, the SQL execution engine introduces a new internal data representation, `Chunk`, which stores a batch of rows in a single structure, with the data of each column laid out contiguously in memory. This more compact layout brings several advantages: 1. memory consumption drops significantly; 2. memory is allocated in batches, reducing GC overhead; 3. data is passed between operators in batches, reducing call overhead; 4. in some scenarios, vectorized computation becomes possible and CPU cache misses are reduced.
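The idea behind `Chunk` can be illustrated with a small sketch. This is plain Python with hypothetical class and method names, not TiDB's actual Go implementation: instead of one object per row, a batch of rows is stored as one contiguous typed array per column.

```python
from array import array

class Chunk:
    """Toy columnar chunk: one contiguous typed buffer per column,
    instead of one object per row. Illustrative only; TiDB's real
    Chunk is implemented in Go with per-type column buffers."""
    def __init__(self, column_types):
        # e.g. {"id": "q", "amount": "d"} -> typed, contiguous buffers
        self.columns = {name: array(tc) for name, tc in column_types.items()}

    def append_row(self, row):
        for name, value in row.items():
            self.columns[name].append(value)

    def num_rows(self):
        return len(next(iter(self.columns.values())))

    def column_sum(self, name):
        # Scans one contiguous buffer: cache-friendly and amenable to
        # batch (vectorized) processing, unlike walking row objects.
        return sum(self.columns[name])

chunk = Chunk({"id": "q", "amount": "d"})
for i in range(4):
    chunk.append_row({"id": i, "amount": 1.5 * i})

print(chunk.num_rows())            # 4
print(chunk.column_sum("amount"))  # 9.0
```

Operators exchange whole chunks rather than single rows, which is where the reduced call overhead and GC pressure come from.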
With these two changes in place, TiDB's performance in OLAP scenarios improved dramatically. TPC-H comparisons show every query running faster in 2.0, most of them by several times or even orders of magnitude, and some queries that could not finish at all in 1.0 now execute smoothly.
Ease of Use and Operability
To make TiDB easier to install and use, version 2.0 also brings many optimizations in monitoring, operations, and tooling.
On the monitoring side, more than 100 monitoring items have been added, and some runtime information is exposed through HTTP interfaces, SQL statements, and the like, for use in system tuning or locating problems.
On the operations side, the tools have been optimized to simplify procedures, reduce their complexity, and reduce their impact on online services. They are also richer in functionality, supporting automatic deployment of the Binlog component and enabling TLS.
Detailed Update List for 2.0
TiDB:
1. SQL Optimizer
Streamline the statistics data structure to reduce memory usage
Speed up loading statistics at process startup
Support dynamic update of statistics [experimental]
Optimize the cost model for more accurate cost estimation
Use `Count-Min Sketch` to estimate point query counts more accurately
Supports analysis of more complex conditions, using indexes as fully as possible
Support for manually specifying Join order via `STRAIGHT_JOIN` syntax
Use the Stream Aggregation operator when the `GROUP BY` clause is empty to improve performance
Support for calculating `Max/Min` functions using indexes
Optimize the handling of correlated subqueries, supporting decorrelation of more types of correlated subqueries into `Left Outer Join`
Expand the applicability of `IndexLookupJoin`; the algorithm can now also be used with index prefix matching
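The Count-Min Sketch mentioned in the optimizer list estimates how often a value occurs using a small, fixed-size matrix of counters. A minimal Python sketch of the idea follows (illustrative only; TiDB's statistics module implements this in Go with its own hashing):

```python
import hashlib

class CountMinSketch:
    """Minimal Count-Min Sketch: depth hash rows over width counters.
    Estimates never undercount; collisions can only inflate them."""
    def __init__(self, width=1024, depth=4):
        self.width, self.depth = width, depth
        self.table = [[0] * width for _ in range(depth)]

    def _buckets(self, item):
        for row in range(self.depth):
            digest = hashlib.md5(f"{row}:{item}".encode()).hexdigest()
            yield row, int(digest, 16) % self.width

    def add(self, item, count=1):
        for row, col in self._buckets(item):
            self.table[row][col] += count

    def estimate(self, item):
        # The minimum across rows is the tightest upper bound, since
        # every counter touched by `item` holds at least its true count.
        return min(self.table[row][col] for row, col in self._buckets(item))

cms = CountMinSketch()
for value in ["a", "a", "b", "a", "c"]:
    cms.add(value)
print(cms.estimate("a"))  # 3 here; in general an upper bound on the true count
```

This lets the optimizer keep approximate per-value frequencies for equality (point query) conditions in memory proportional to `width * depth`, independent of the number of distinct values.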
2. SQL execution engine
Rewrite all executor operators around the Chunk structure, improving analytical query performance, reducing memory usage, and significantly improving TPC-H results
Support Streaming Aggregation operator pushdown
Optimize the performance of the `Insert Into Ignore` statement by more than 10 times
Optimize the performance of the `Insert On Duplicate Key Update` statement by more than 10 times
Push down more data types and functions to TiKV for computation
Optimize `Load Data` performance by more than 10 times
Support tracking the memory usage of physical operators, with the behavior on exceeding a threshold configurable via configuration files and system variables
Support limiting the memory used by a single SQL statement to reduce the risk of the process running out of memory
Support for implicit row IDs in CRUD operations
Improve check performance
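The per-statement memory control described above can be sketched as a simple accounting model. This is a toy illustration with hypothetical names, not TiDB's actual API: each operator reports allocations to a shared tracker, and crossing the configured quota triggers a configurable action such as logging or cancelling the query.

```python
class QueryOOMError(Exception):
    pass

class MemoryTracker:
    """Toy per-query memory accountant. Operators call consume() on
    each allocation; exceeding the quota triggers the configured
    action. Names are hypothetical, not TiDB's implementation."""
    def __init__(self, quota_bytes, action="cancel"):
        self.quota = quota_bytes
        self.action = action
        self.consumed = 0

    def consume(self, nbytes):
        self.consumed += nbytes
        if self.consumed > self.quota:
            if self.action == "cancel":
                raise QueryOOMError(
                    f"query exceeded memory quota: {self.consumed} > {self.quota}")
            print(f"warning: query memory {self.consumed} over quota {self.quota}")

tracker = MemoryTracker(quota_bytes=1 << 20, action="cancel")
tracker.consume(512 * 1024)      # fine: 0.5 MiB of a 1 MiB quota
try:
    tracker.consume(600 * 1024)  # pushes the total past the quota
except QueryOOMError as e:
    print("cancelled:", e)
```

The key design point is that a runaway query fails (or logs) by itself instead of taking the whole server down with an out-of-memory crash.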
3. Server
Support Proxy Protocol
Add many monitoring items and optimize logging
Support validation of configuration files
Support HTTP API to obtain TiDB parameter information
Use Batch Resolve Lock to improve garbage collection speed
Supports multi-threaded garbage collection
TLS support
4. Compatibility
Support for more MySQL syntax
Support modifying the `lower_case_table_names` system variable via configuration file, to support the OGG data synchronization tool
Improve compatibility with Navicat
Support showing table creation time in `Information_Schema`
Fix the problem that the return type of some functions/expressions is different from that of MySQL
Improve compatibility with JDBC
Support more `SQL_MODE`
5.DDL
Optimize the execution speed of `Add Index`, the speed is greatly improved in some scenarios
The `Add Index` operation is changed to a low priority to reduce the impact on online business
`Admin Show DDL Jobs` outputs more detailed DDL job status information
Support `Admin Show DDL Job Queries JobID` to query the raw statement of currently running DDL jobs
Supports the `Admin Recover Index` command for recovering index data in disaster recovery situations
Support modifying Table Options via `Alter` statement
PD:
1. Add `Region Merge` support, merge empty Regions generated after data deletion [experimental]
2. Add `Raft Learner` support [experimental]
3. Scheduler optimization
The scheduler adapts to different Region sizes
Improve the priority and speed of data recovery when a TiKV node goes down
Improve the speed of data migration away from TiKV nodes being taken offline
Optimize the scheduling strategy when TiKV nodes run low on space, trying to keep disks from filling up
Improve the scheduling efficiency of the balance-leader scheduler
Reduce the scheduling overhead of the balance-region scheduler
Optimize the execution efficiency of hot-region scheduler
4. Operation and maintenance interface and configuration
Add TLS support
Support setting PD leader priority
Support label-based configuration properties
Support configuring labels so that Region leaders are not scheduled onto nodes carrying specific labels
Support manually splitting a Region, which can be used to handle hotspots on a single Region
Support scattering a specified Region, for manually adjusting hotspot Region distribution in some cases
Added configuration parameter inspection rules to improve the validity of configuration items
5. Debug interface
Added `Drop Region` debugging interface
Add an interface for enumerating the health status of each PD node
6. Statistics related
Add statistics for abnormal Regions
Add Region isolation level statistics
Add scheduling related metrics
7. Performance optimization
The PD leader tries to keep pace with the etcd leader to improve write performance
Optimized Region heartbeat performance, now supports over 1 million Regions
TiKV:
1. Features
Protect critical configurations from erroneous modifications
Support `Region Merge` [experimental]
Add `Raw DeleteRange` API
Add `GetMetric` API
Add `Raw Batch Put`, `Raw Batch Get`, `Raw Batch Delete`, and `Raw Batch Scan` APIs
Add Column Family parameter to Raw KV API, can operate on specific Column Family
Coprocessor supports streaming mode and streaming aggregation
Supports configuring the timeout period for Coprocessor requests
Heartbeat packets carry timestamps
Support online modification of some parameters of RocksDB, including `block-cache-size` size, etc.
Support for configuring the behavior of the Coprocessor when it encounters certain errors
Support to start in data import mode to reduce write amplification during data import
Supports manually splitting the region in half
Improve the data repair tool tikv-ctl
Coprocessor returns more statistics to guide TiDB's behavior
Support ImportSST API, can be used for SST file import [experimental]
Added TiKV Importer binary, integrated with TiDB Lightning for fast data import [experimental]
2. Performance
Use ReadPool to optimize read performance, `raw_get/get/batch_get` improves by 30%
Improve metrics performance
Notify PD immediately after Raft snapshot is processed to speed up scheduling
Solve the performance jitter problem caused by RocksDB flushing
Improve space reclamation after data deletion
Speed up garbage cleanup during startup
Use `DeleteFilesInRanges` to reduce I/O overhead during replica migration
3. Stability
Fix the problem that a gRPC call does not return when the PD leader switches
Fix the problem that taking a node offline is slow due to snapshots
Limit the temporary disk space occupied when moving replicas
Report Regions that have gone without a Leader for a long time
Update Region size statistics promptly based on compaction events
Limit the amount of data scanned by a single scan lock request to prevent timeouts
Limit the memory usage in the process of receiving snapshots to prevent OOM
Improve the speed of CI tests
Solve the OOM problem caused by too many snapshots
Configure gRPC's `keepalive` parameter
Fix the problem that rapid growth in the number of Regions can easily cause OOM
In addition, the TiSpark 1.0 GA version was released at the same time. The TiSpark 1.0 component provides the ability to run distributed computation with Apache Spark over data stored in TiDB. Updates include:
1. Provides a gRPC communication framework for reading from TiKV
2. Provides encoding and decoding of TiKV data formats and the communication protocol
3. Provides computation pushdown, including:
Aggregate pushdown
Predicate pushdown
TopN pushdown
Limit pushdown
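The pushdown idea in the list above is that filters, aggregates, TopN, and limits are evaluated close to the data (in TiKV's Coprocessor) instead of after all rows have been shipped to the Spark side. A toy Python illustration of why this matters, with hypothetical helper names rather than TiSpark's API:

```python
def storage_scan(rows, predicate=None, limit=None):
    """Toy 'storage side' scan: applying the predicate and limit here
    means far fewer rows cross the storage/compute boundary."""
    shipped = []
    for row in rows:
        if predicate is None or predicate(row):
            shipped.append(row)
            if limit is not None and len(shipped) >= limit:
                break
    return shipped

table = [{"id": i, "amount": i * 10} for i in range(1000)]

# Without pushdown: ship all 1000 rows, then filter and limit on the
# compute side.
all_rows = storage_scan(table)
no_push = [r for r in all_rows if r["amount"] > 100][:5]

# With pushdown: the storage side ships only the 5 matching rows.
pushed = storage_scan(table, predicate=lambda r: r["amount"] > 100, limit=5)

print(len(all_rows), len(pushed))  # 1000 5
print(no_push == pushed)           # True
```

Both plans return the same answer; pushdown changes only how much data moves, which is what makes it a pure performance optimization.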
4. Provides index-related support
Transform predicates into clustered index ranges
Transform predicates into secondary index ranges
Index-only query optimization
Runtime index degradation to full table scan
5. Provides cost-based optimization
Statistics support
Index selection
Broadcast table cost estimation
6. Multiple Spark Interface support
Spark Shell support
ThriftServer/JDBC support
Spark-SQL interactive support
PySpark Shell support
SparkR support