This article mainly presents the test results and performance figures for a B+ tree I implemented. The details of the B+ tree insert, delete, update, and lookup algorithms are not covered here; they may be added later.
project address
B+ tree overview
Quoted from Wikipedia
A B+ tree is a tree data structure commonly used in databases and in operating-system file systems. Its defining characteristic is that it keeps data stable and ordered, with insertion and modification running in fairly stable logarithmic time. Elements are inserted into a B+ tree from the bottom up, which is the opposite of a binary tree.
B+ tree structure
A B+ tree has an important parameter called the order (m), which determines how many keys each node of the tree stores.
Each node stores a set of keys in sorted order; for a non-root node, the number of keys s satisfies s >= (m + 1) / 2. A leaf node stores, for each key, a pointer to the corresponding value, plus a next pointer to the next sibling leaf, so starting from the leftmost leaf you can walk the linked list in key order. A non-leaf node stores s pointers to its child nodes.
A B+ tree stays balanced by splitting nodes on insertion and, on deletion, by borrowing a key from a sibling or merging with a sibling, so all leaf nodes are at the same level. Lookup, insertion, and deletion all run in O(log N).
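The structure described above can be illustrated with a small sketch. It is not the project's code: `Leaf`, `minKeys`, and `collectKeys` are hypothetical names, keys are plain `int`s, and the lower bound (m + 1) / 2 is assumed to use integer division.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified leaf node: only the fields needed for ordered traversal.
struct Leaf {
    std::vector<int> keys;  // keys stored in this leaf, in sorted order
    Leaf *next = nullptr;   // next sibling leaf, nullptr at the right end
};

// Minimum number of keys a non-root node holds for order m,
// assuming (m + 1) / 2 is evaluated with integer division.
int minKeys(int order) {
    assert(order > 0);
    return (order + 1) / 2;
}

// Walk the leaf chain from the leftmost leaf and collect every key in order.
std::vector<int> collectKeys(const Leaf *leftmost) {
    std::vector<int> out;
    for (const Leaf *leaf = leftmost; leaf != nullptr; leaf = leaf->next) {
        out.insert(out.end(), leaf->keys.begin(), leaf->keys.end());
    }
    return out;
}
```

Because every key lives in some leaf and the leaves are linked left to right, an ordered scan never has to go back up through the index nodes.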
Public API of the implementation
```cpp
template<typename K, typename V>
class BPTree {
private:
    ...
public:
    // constructor and destructors
    ...
    /**
     * deserialize from a file
     */
    static BPTree<K, V> deserialize(const std::string &path);
    static BPTree<K, V> deserialize(const std::string &path, comparator<K> comp);

    void put(const K &key, const V &value);
    void remove(K &key);
    /**
     * @return NULL if not exists else a pointer to the value
     */
    V *get(const K &key);
    bool containsKey(const K &key);
    int getOrder();
    int getSize();
    /**
     * iterate order by key
     * @param func call func(key, value) for each. func returns true means iteration ends
     */
    void foreach(biApply<K, V> func);
    void foreachReverse(biApply<K, V> func);
    void foreachIndex(biApplyIndex<K, V> func);
    void foreachIndexReverse(biApplyIndex<K, V> func);

    void serialize(std::string &path);
    /**
     * clear the tree
     * note that all values allocated will be freed
     */
    void clear();
};
```
Tip: to support custom key types, either pass a comparison function pointer or overload the corresponding operators (>, ==, <, and so on) for the type.
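The two options in the tip might look like the following sketch. The declaration of `comparator<K>` is not shown in this article, so the alias below (a function pointer returning a negative, zero, or positive value) is an assumption, and `Point` and `comparePoint` are hypothetical names used only for illustration.

```cpp
// Assumption: comparator<K> is a plain function pointer with three-way semantics.
// The project's real alias may be declared differently.
template <typename K>
using comparator = int (*)(const K &, const K &);

// Hypothetical custom key type.
struct Point {
    int x;
    int y;
};

// Option 1: a free comparison function that can be passed as a function pointer.
// Orders by x first, then by y.
int comparePoint(const Point &a, const Point &b) {
    if (a.x != b.x) return a.x < b.x ? -1 : 1;
    if (a.y != b.y) return a.y < b.y ? -1 : 1;
    return 0;
}

// Option 2: overload the comparison operators so that generic code
// written in terms of <, > and == works directly on the type.
bool operator<(const Point &a, const Point &b) { return comparePoint(a, b) < 0; }
bool operator>(const Point &a, const Point &b) { return comparePoint(a, b) > 0; }
bool operator==(const Point &a, const Point &b) { return comparePoint(a, b) == 0; }
```

Defining the operators in terms of one comparison function keeps both options consistent with each other.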
Important data structures
Node: a node of the B+ tree (leaf or index)
Its main fields are as follows:
```cpp
struct Node {
    // parent
    // if root, parentPtr == NULL
    Node *parentPtr = NULL;
    // flag
    bool leaf;
    List<K> keys;
    /*-------leaf--------*/
    Node *previous = NULL;
    Node *next = NULL;
    List<V> values;
    /*-------index-------*/
    List<Node *> childNodePtrs;
    // for init
    int initCap;
    // constructor
    ...
};
```
List<T>: a list implemented on top of a fixed-length array. It offers fewer features than std::vector<T> but is simpler and more efficient, and it releases memory once a certain number of elements have been removed.
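To make the idea concrete, here is a minimal sketch of an array-backed list with a range removal and a shrink step. It is not the project's `List<T>`: the class name `SimpleList`, its member names, and the shrink policy (halve the capacity once less than a quarter is in use) are all assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>

// Hypothetical minimal array-backed list; T must be default-constructible.
template <typename T>
class SimpleList {
public:
    explicit SimpleList(std::size_t initCap = 8)
        : minCap_(std::max<std::size_t>(initCap, 1)),
          cap_(minCap_), size_(0), data_(new T[cap_]) {}

    // Append at the tail, doubling the array when it is full.
    void add(const T &value) {
        if (size_ == cap_) resize(cap_ * 2);
        data_[size_++] = value;
    }

    // Remove the elements in [from, to) with a single shift of the tail.
    void rangeRemove(std::size_t from, std::size_t to) {
        assert(from <= to && to <= size_);
        if (from == to) return;
        std::move(data_.get() + to, data_.get() + size_, data_.get() + from);
        size_ -= to - from;
        // Assumed shrink policy: halve the capacity once under a quarter is used.
        if (cap_ > minCap_ && size_ < cap_ / 4) resize(std::max(cap_ / 2, minCap_));
    }

    const T &get(std::size_t i) const {
        assert(i < size_);
        return data_[i];
    }
    std::size_t size() const { return size_; }
    std::size_t capacity() const { return cap_; }

private:
    // Move the live elements into a new array of the requested capacity.
    void resize(std::size_t newCap) {
        std::unique_ptr<T[]> fresh(new T[newCap]);
        std::move(data_.get(), data_.get() + size_, fresh.get());
        data_ = std::move(fresh);
        cap_ = newCap;
    }

    std::size_t minCap_;
    std::size_t cap_;
    std::size_t size_;
    std::unique_ptr<T[]> data_;
};
```

Removing a whole range shifts the tail once instead of once per element, which is one plausible reason a range removal can be much cheaper than repeated single removals.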
Serialization
- File suffix: bpt
- Header format:
Offset (bytes) | Size (bytes) | Content |
---|---|---|
0 | 4 | LYC\0, the file header identifier |
4 | 4 | order, int, the order of the B+ tree |
8 | 4 | initCap, int, the capacity pre-allocated for each node |
12 | 4 | size, int, the number of elements |
- If size is not 0, the header is followed by the root node. All nodes share the same format; each node starts with this common part:
Offset (relative to the start of the node, bytes) | Size (bytes) | Content |
---|---|---|
0 | 4 | leaf, int, whether the node is a leaf node |
4 | 4 | sizeofK, int, the number of bytes in the key type |
8 | 4 | kSize, int, the number of keys the node holds |
12 | kSize * sizeofK | The keys, stored in order |
- For leaf nodes:
Offset (relative to the start of the node, bytes) | Size (bytes) | Content |
---|---|---|
12 + kSize * sizeofK | 4 | sizeofV, int, the number of bytes in the value type |
16 + kSize * sizeofK | kSize * sizeofV | The values, stored in order |
- For non-leaf nodes:
Offset (relative to the start of the node, bytes) | Size (bytes) | Content |
---|---|---|
12 + kSize * sizeofK | kSize * 8 | long, the file offsets of the child nodes, stored in order |
Implementation notes
- The first implementation used std::vector, but its measured performance was not particularly good;
- Because the keys within a node are ordered, lookups use binary search;
- In memory, a node should store pointers to its child nodes. Splits and merges need to copy the list; if the child structures themselves were stored, every copy would recurse through them, which is inefficient and makes memory hard to control;
- Because the root node has no minimum key count, after a delete operation you need to check how many child nodes root has. If it has only one, point root directly at that child; otherwise, once that child is removed, root may be left with a key whose child node is lost.
- Whenever the last key of a node is inserted or deleted, the change must be propagated upward to the parent.
- Split and merge operations must update the next and previous pointers of the leaf nodes.
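Two of the points above, binary search inside a node and collapsing a root that is left with a single child, can be sketched as follows. This is an illustration under stated assumptions rather than the project's code: `SketchNode`, `lowerBound`, and `collapseRoot` are hypothetical names, keys are plain `int`s, and `std::vector` stands in for `List<T>`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Index of the first key that is not less than target,
// or keys.size() if every key is smaller. Requires sorted keys.
template <typename K>
std::size_t lowerBound(const std::vector<K> &keys, const K &target) {
    std::size_t lo = 0, hi = keys.size();
    while (lo < hi) {
        std::size_t mid = lo + (hi - lo) / 2;
        if (keys[mid] < target) {
            lo = mid + 1;  // target can only be to the right of mid
        } else {
            hi = mid;      // keys[mid] is a candidate, keep it in range
        }
    }
    return lo;
}

// Hypothetical, simplified node that stores pointers to its children.
struct SketchNode {
    bool leaf = true;
    std::vector<int> keys;
    std::vector<SketchNode *> children;
};

// After a deletion: while a non-leaf root has exactly one child,
// free the old root and make that child the new root.
SketchNode *collapseRoot(SketchNode *root) {
    assert(root != nullptr);
    while (!root->leaf && root->children.size() == 1) {
        SketchNode *onlyChild = root->children.front();
        delete root;
        root = onlyChild;
    }
    return root;
}
```

Storing child pointers rather than child structures is what makes `collapseRoot` cheap: only one pointer changes hands and no subtree is copied.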
Tests
Test environment:
The file main.cpp defines the following macros; set one to 1 to enable the corresponding test:
```cpp
// Test List performance (compared with vector)
#define TEST_LIST 0
// Test the functional correctness of the B+ tree
#define TEST_FUNC 0
// Test B+ tree speed (insert, delete, update, lookup)
#define TEST_SPEED 0
// Test the B+ tree for heap memory leaks (run a leak-checking tool after the build)
#define TEST_MEM 0
// Test B+ tree serialization and deserialization
#define TEST_SERIAL 0
```
List test
- Data volume: 10^5
- Add and delete data, asserting that each position holds the expected value (functional test)
- Tail insertion test (without and with pre-allocation)
- Head insertion test
- Head removal test
- Tail removal test
- rangeRemove test
Test run results:
Table:
Operation | List (ms) | vector (ms) |
---|---|---|
Tail insert | 1.506 | 4.724 |
Tail insert (pre-allocated space) | 1.201 | 2.765 |
Head insert | 834.804 | 906.981 |
Remove half of the elements (from the head) | 619.493 | 805.379 |
Remove half of the elements (from the tail) | 1.444 | 7.523 |
rangeRemove (half) | 0.065 | 0.558 |
Histogram:
B+ tree functional test
- Data volume: 10^5
- After inserting the data, assert that it exists
- Remove half of the data; assert that the removed data no longer exists and that the remaining data still exists
- Re-insert all the data and assert that all of it exists (checks whether deletion damaged the structure)
- After clear, assert that the earlier data no longer exists
- After inserting all the data, test the traversal methods (checking the key order)
Speed test
- Data volume: 10^6, 10^7, 10^8
- B+ tree order: log(TEST_SIZE)^2
- Insert the data in a loop
- Access all the data in a loop
- Remove the data in a loop
Test run results:
Table:
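The insert, access, and remove loops can be timed with a small helper like the one below, shown here for the std::map baseline only. This is a sketch, not the benchmark from main.cpp: `timeMs`, `Timings`, and `benchmarkMap` are hypothetical names, and sequential `int` keys are an assumption, since the article does not say which keys the test inserts.

```cpp
#include <chrono>
#include <cstddef>
#include <map>

// Run fn once and return the elapsed wall-clock time in milliseconds.
template <typename Fn>
double timeMs(Fn &&fn) {
    auto start = std::chrono::steady_clock::now();
    fn();
    auto end = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(end - start).count();
}

// Timings of the three phases plus values used to check correctness.
struct Timings {
    double insertMs;
    double accessMs;
    double removeMs;
    long long checksum;     // sum of all values read during the access phase
    std::size_t remaining;  // elements left after the remove phase
};

// Insert keys 0..n-1, read each one back, then remove each one.
Timings benchmarkMap(int n) {
    std::map<int, int> m;
    Timings t{};
    t.insertMs = timeMs([&] {
        for (int i = 0; i < n; ++i) m[i] = i;
    });
    long long sum = 0;
    t.accessMs = timeMs([&] {
        for (int i = 0; i < n; ++i) sum += m.find(i)->second;
    });
    t.checksum = sum;
    t.removeMs = timeMs([&] {
        for (int i = 0; i < n; ++i) m.erase(i);
    });
    t.remaining = m.size();
    return t;
}
```

Accumulating a checksum during the access phase gives the compiler a reason to keep the lookups, and it doubles as a sanity check that every key was found.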
Operation | B+ tree (ms) | STL map (ms) |
---|---|---|
Insert (10^8) | 192808.064 | 325621.333 |
Access (10^8) | 163102.022 | 280150.403 |
Remove (10^8) | 213982.406 | 366576.836 |
Insert (10^7) | 11825.821 | 22213.139 |
Access (10^7) | 10190.870 | 18137.073 |
Remove (10^7) | 15130.015 | 22133.154 |
Insert (10^6) | 1057.291 | 1624.615 |
Access (10^6) | 888.186 | 1155.504 |
Remove (10^6) | 1099.584 | 1495.433 |
Histogram: