VoltDB Database

Reposted from: https://blog.csdn.net/ransom0512/article/details/50440316

http://www.360doc.com/content/16/0712/11/9200790_574921580.shtml

https://www.cnblogs.com/kkyycom/p/9359090.html

 

VoltDB is a distributed, scalable, shared-nothing in-memory database. Transactions are defined as stored procedures written in Java, and data is accessed with standard SQL. Each partition processes its work on a single thread while the partitions run in parallel; this guarantees data consistency and avoids the locking, latching, and resource-management overhead of a traditional database. 
VoltDB has the following characteristics:

  • High throughput: millions of transactions per second.
  • Horizontal scaling: nodes can be added on demand, and performance grows linearly.
  • High availability: data is kept in multiple copies and can also be persisted to disk; a dual-active (active-active) cluster mechanism is supported as well.
  • Real-time data analysis: all computation happens in memory, so results are available with very low latency.
  • Full ACID support, ensuring transactional integrity and reliability.

VoltDB's design was motivated by falling memory costs and by the growing volume of time-critical data that systems must handle. Traditional databases store their data in local files, so neither their concurrency nor their processing speed can meet these requirements. The newer NoSQL databases lack SQL support and full ACID support, so they cannot replace traditional databases for transactional workloads. 
The comparison between VoltDB, NoSQL, and traditional relational databases is as follows: 
[Figure: comparison of VoltDB, NoSQL, and traditional relational databases]

Applicable scenarios

VoltDB suits OLTP systems in which each transaction is small but the total number of transactions is very large, such as finance, retail, Web 2.0, and other traditional OLTP applications. It is not suitable for scenarios with frequent range queries or multi-table joins.

Design ideas

High throughput, real-time

By analyzing traditional databases, the VoltDB designers found that only 12% of CPU time goes to meaningful data manipulation; the rest is consumed by buffer management, concurrency control, and other overhead. 
[Figure: traditional database performance cost analysis]

  • Index management: database indexes are generally based on B-trees, and maintaining them consumes significant I/O and CPU.
  • Logging: a traditional database usually writes twice, once to data storage and once to the recovery log, and these writes must be forcibly flushed to disk, which costs significant I/O.
  • Locking: every read and write of data involves locks, so locking is a very frequent operation.
  • Latching: globally shared structures such as index data and table metadata must stay correct in a multi-threaded environment, so the latches that protect them consume additional CPU.
  • Buffer management: data is stored in fixed-size disk pages, and the buffer pool that manages these pages performs I/O-intensive operations.

In summary, 88% of CPU time is spent on steps that do no useful work. The only fundamental way to improve database performance is to cut these redundant steps so that the CPU is devoted to actual data operations. 
VoltDB achieves high performance through in-memory storage, data partitioning, and lock-free execution.

  • In-memory storage 
    VoltDB keeps all data in memory (for reliability, data is also flushed to disk; see the high-availability and ACID sections). Memory access is several orders of magnitude faster than disk access, and this is an important reason for VoltDB's high performance.
  • Data partitioning 
    VoltDB creates and manages multiple in-memory partitions on each node. The rows of every partitioned table are spread across these partitions, and reads and writes on different partitions proceed concurrently, so performance scales linearly. 
    This partitioning mechanism also has a drawback: to expand the cluster, the entire cluster must be stopped first. When the cluster restarts, VoltDB redistributes the data, and it starts serving requests only after the redistribution has finished.
  • Lock-free execution 
    Because data is stored by partition, when a SQL statement is executed the client automatically determines from the statement's conditions which partition holds the data and sends the statement to that partition. If the query does not include the partitioning column, the client coordinates a query against every partition and then merges the results into a single response; this case greatly reduces performance. 
    All work in VoltDB runs as stored procedures, which are written in Java. Within a partition, stored procedures execute serially on a single thread, which is what makes a single partition lock-free. When a statement has to read or write several partitions, VoltDB coordinates them by placing a lock in each partition's queue and releases the locks only after the statement has finished, so multi-partition operations pay a performance penalty. 
    For partition management, the recommendation is to create one partition per physical CPU, so that a partition's data stays in that CPU's cache, data is not operated on across CPUs, CPU utilization is maximized, and concurrency locks are avoided. In theory, VoltDB can therefore reach 100 percent CPU utilization.
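
The routing rule described in the list above can be sketched in a few lines. This is a minimal illustration, not VoltDB's implementation: the `Cluster` class, the `partition_for` helper, the use of CRC32 as the hash, and the partition count of 4 are all assumptions made for the example.

```python
import zlib

NUM_PARTITIONS = 4  # hypothetical number of partitions


def partition_for(key, num_partitions=NUM_PARTITIONS):
    """Map a partitioning-column value to a partition id with a stable hash.

    CRC32 is an assumption for illustration; it is not VoltDB's hash function.
    """
    return zlib.crc32(str(key).encode("utf-8")) % num_partitions


class Cluster:
    def __init__(self, num_partitions=NUM_PARTITIONS):
        # Each partition owns a private dict; no other partition touches it.
        self.partitions = [dict() for _ in range(num_partitions)]

    def insert(self, key, row):
        # Single-partition write: the key alone decides where the row lives.
        pid = partition_for(key, len(self.partitions))
        self.partitions[pid][key] = row

    def get(self, key):
        # Condition contains the partitioning column: visit exactly one partition.
        pid = partition_for(key, len(self.partitions))
        return self.partitions[pid].get(key), [pid]

    def scan(self, predicate):
        # No partitioning column in the condition: visit every partition, then merge.
        visited, rows = [], []
        for pid, part in enumerate(self.partitions):
            visited.append(pid)
            rows.extend(r for r in part.values() if predicate(r))
        return rows, visited
```

A lookup by key returns a `visited` list with one entry, while `scan` always returns every partition id, which is the fan-out cost the text warns about.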

Horizontal expansion

VoltDB's multi-partition design spreads data across partitions, and each partition can be accessed concurrently, which both improves performance and removes the need for locks. In theory, scaling VoltDB out therefore yields a linear performance improvement.

High Availability

VoltDB combines K-safety, dual-active clusters, snapshots, and write-ahead logging (WAL) to ensure high data availability.

  • K-safety 
    This is essentially an N + 1 copy mechanism. When VoltDB performs a write, the statement is executed on every copy, which guarantees that the data is correctly inserted into each one. All N + 1 copies can serve requests at the same time, and up to N copies may be lost (for example, through partition failure). When all N + 1 copies are unavailable, VoltDB stops serving requests until it is repaired.
  • Dual-active 
    An active-active mechanism across clusters: both clusters serve requests, and data is replicated asynchronously between them, partition by partition. When one cluster goes down, the other continues to provide service. When the failed cluster recovers, its data is synchronized automatically, and it resumes service only once the data matches. This mechanism still has a problem: asynchronous replication can leave data inconsistent, so a synchronous replication mechanism is still needed.
  • Snapshot 
    Because data is stored in memory, it is lost when a node fails, so VoltDB regularly takes a snapshot of each partition's data to prepare for node recovery after a power loss.
  • WAL 
    Write-ahead log. Before inserting data, VoltDB first writes an operation log, as a traditional database does, but because the log is written sequentially, performance is much better than in a traditional database. 
    The snapshot and WAL mechanisms reduce performance by about 5%, but this is a cost that has to be paid for full ACID support.
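
The N + 1 copy rule of K-safety can be checked with a small availability calculation. This is a sketch under stated assumptions: `place_copies` and `is_available` are hypothetical helpers, and the round-robin placement is chosen for simplicity; it is not a description of how VoltDB actually assigns copies to nodes.

```python
def place_copies(num_partitions, num_nodes, k):
    """Return {partition: [node, ...]} with k + 1 copies on distinct nodes.

    Round-robin placement is an assumption made for this illustration.
    """
    if k + 1 > num_nodes:
        raise ValueError("k + 1 copies need at least k + 1 nodes")
    return {
        p: [(p + i) % num_nodes for i in range(k + 1)]
        for p in range(num_partitions)
    }


def is_available(placement, failed_nodes):
    """The cluster can serve requests while every partition has a live copy."""
    failed = set(failed_nodes)
    return all(
        any(node not in failed for node in nodes)
        for nodes in placement.values()
    )
```

With `k = 1` on three nodes, any single node failure leaves every partition with one live copy. Losing two nodes that together hold both copies of some partition makes `is_available` return `False`, which corresponds to the point where the text says VoltDB goes out of service for repair.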

Partitioned tables and replicated tables

  • Partitioned tables 
    After creating a table, you partition it explicitly with a PARTITION TABLE statement: 
    PARTITION TABLE towns ON COLUMN state_num; 
    This statement partitions the towns table on the state_num column. From then on, VoltDB automatically places each inserted row in the appropriate partition. 
    VoltDB currently supports only hash partitioning; range partitioning may be considered later. 
    A VoltDB partition is a logical concept: one partition can hold slices of several tables. The recommended number of partitions equals the number of physical CPUs, so that each CPU works on its own partition and the CPUs do not compete for locks.
  • Replicated tables 
    A table for which no partitioning is specified is a replicated table. Every partition stores a full copy of such a table, so joins between it and other tables never cross nodes, which greatly speeds up the join. 
    Replicated tables suit data that is relatively stable and rarely updated. Updating such a table means performing the same insert on every partition, which significantly reduces system concurrency.
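
The trade-off between the two table types can be made concrete with a toy store that counts per-partition writes. Everything here is hypothetical: the `MiniStore` class, the CRC32 hash, and the `states` table are assumptions for the sketch; only `towns` and `state_num` come from the statement above.

```python
import zlib


class MiniStore:
    def __init__(self, num_partitions):
        self.n = num_partitions
        # data[p][table] is the list of rows that partition p holds for a table.
        self.data = [dict() for _ in range(num_partitions)]
        self.writes = 0  # number of per-partition insert operations performed

    def _pid(self, value):
        # CRC32 is an illustrative stand-in for the real hash function.
        return zlib.crc32(str(value).encode("utf-8")) % self.n

    def insert_partitioned(self, table, row, column):
        # A partitioned row is written to exactly one partition.
        p = self._pid(row[column])
        self.data[p].setdefault(table, []).append(row)
        self.writes += 1

    def insert_replicated(self, table, row):
        # A replicated row is written once per partition.
        for p in range(self.n):
            self.data[p].setdefault(table, []).append(row)
            self.writes += 1

    def local_join(self, p, part_table, repl_table, column):
        """Join inside partition p only; the replicated table is fully present."""
        lookup = {r[column]: r for r in self.data[p].get(repl_table, [])}
        return [
            {**row, **lookup[row[column]]}
            for row in self.data[p].get(part_table, [])
            if row[column] in lookup
        ]
```

With four partitions, one `insert_replicated` call costs four writes while one `insert_partitioned` call costs one, and `local_join` never needs to look outside the partition it runs in.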

Performance

VoltDB claims very high scalability: with more than 120 partitions on 39 servers and 300 CPU cores, it can process 1.6 million complex transactions per second.

VoltDB has also been compared against another database, with the following results. 
Test hardware: Dell R610, 2x 2.66GHz Quad-Core Xeon 5550, 12x 4GB (48GB) DDR3-1333 Registered ECC DIMMs, 3x 72GB 15K RPM 2.5in 6Gbps Enterprise SAS drives. 
[Figure: VoltDB performance comparison]

ACID

  • Atomicity 
     VoltDB guarantees atomicity through stored procedures: each stored procedure either succeeds as a whole or is rolled back on failure, and the next stored procedure must wait until the current one has finished.
  • Consistency 
    VoltDB enforces strong data typing: the schema and data type constraints are enforced on all database queries.
  • Isolation 
     Global transactions (those that affect all partitions) are executed sequentially, without interleaving. On any partition only one transaction executes at a time, that is, execution is serial.
  • Durability 
     VoltDB provides the K-safety and snapshot mechanisms to ensure that data is persisted.
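
The atomicity and isolation properties above can be illustrated with a single partition that runs queued procedures one at a time and undoes a failed one. This is a simplified sketch: the `Partition` class and the `transfer` procedure are hypothetical, and the full copy of the state taken before each procedure is a stand-in for whatever undo mechanism a real engine uses.

```python
import copy
import queue


class Partition:
    def __init__(self):
        self.rows = {}
        self.inbox = queue.Queue()  # procedures wait here and run one at a time

    def submit(self, procedure, *args):
        self.inbox.put((procedure, args))

    def run_all(self):
        """Drain the queue serially; each procedure commits fully or not at all."""
        results = []
        while not self.inbox.empty():
            procedure, args = self.inbox.get()
            backup = copy.deepcopy(self.rows)  # simplistic undo: copy the state
            try:
                results.append(("ok", procedure(self.rows, *args)))
            except Exception as exc:
                self.rows = backup  # discard every change the procedure made
                results.append(("rolled back", str(exc)))
        return results


def transfer(rows, src, dst, amount):
    # Credit first on purpose, so a failure leaves a partial change to undo.
    rows[dst] = rows.get(dst, 0) + amount
    if rows.get(src, 0) < amount:
        raise ValueError("insufficient funds")
    rows[src] -= amount
    return rows[src]
```

Because only one procedure runs at a time, no locks are needed to isolate them, and a procedure that raises leaves the partition exactly as it was before it started.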

License

VoltDB is licensed under the AGPL, which is stricter than the GPL: even a product that is delivered only over the web must still be open source. Take care when using it.

Origin www.cnblogs.com/goodfuture/p/11584225.html