B+ Tree Performance Test (C++ Implementation)

This article mainly presents benchmark results for a B+ tree that I implemented. The CRUD algorithms of the B+ tree are not covered here; they may be added later.

Project address

github.com/SirLYC/BPTr…

B+ tree overview

Quoted from Wikipedia:

A B+ tree is a tree data structure commonly used in databases and in operating system file systems. Its defining characteristic is that it keeps data stable and ordered, and its insertions and modifications have fairly stable logarithmic time complexity. Elements are inserted into a B+ tree from the bottom up, which is the opposite of a binary tree.

B+ tree structure

A B+ tree has an important parameter called the order (m), which determines how many keys each node of the tree stores.

Each node stores a set of keys in sorted order. For a non-root node, the number of keys s satisfies s >= (m + 1) / 2. A leaf node stores a pointer to the value for each key, plus a next pointer to its next sibling leaf, so starting from the leftmost leaf you can traverse the whole list in key order. A non-leaf node stores s pointers to its child nodes.

A B+ tree stays balanced by splitting nodes on insertion and, on deletion, by borrowing a key from a sibling or merging with a sibling, so all leaf nodes are at the same level. Query, insertion, and deletion all run in O(log N).
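The leaf chain described above can be illustrated with a minimal sketch. `Leaf`, `minKeys`, and `collectInOrder` are hypothetical names made up for this example and are not the project's types; the sketch only shows that following `next` from the leftmost leaf visits the keys in ascending order, and how the minimum key count follows from the order m (assuming integer division).

```cpp
#include <cassert>
#include <vector>

// Hypothetical leaf node for illustration only (not the project's Node type).
struct Leaf {
    std::vector<int> keys;   // keys stored in ascending order
    Leaf *next = nullptr;    // next sibling leaf, or nullptr for the last leaf
};

// Minimum number of keys in a non-root node of order m, from s >= (m + 1) / 2.
// Integer division is an assumption; the text does not say how to round.
int minKeys(int m) {
    assert(m > 0);
    return (m + 1) / 2;
}

// Walk the leaf chain from the leftmost leaf and collect every key.
std::vector<int> collectInOrder(const Leaf *leftmost) {
    std::vector<int> out;
    for (const Leaf *leaf = leftmost; leaf != nullptr; leaf = leaf->next) {
        out.insert(out.end(), leaf->keys.begin(), leaf->keys.end());
    }
    return out;
}
```

Because every leaf is reachable through `next`, an ordered scan never needs to go back up through the index nodes.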

Public API of the implementation

template<typename K, typename V>
class BPTree {
private:
    ...

public:
    // constructor and destructors
    ...

    /**
     * deserialize from a file
     */
    static BPTree<K, V> deserialize(const std::string &path);

    static BPTree<K, V> deserialize(const std::string &path, comparator<K> comp);

    void put(const K &key, const V &value);

    void remove(K &key);

    /**
     * @return NULL if not exists else a pointer to the value
     */
    V *get(const K &key);

    bool containsKey(const K &key);

    int getOrder();

    int getSize();

    /**
     * iterate order by key
     * @param func call func(key, value) for each. func returns true means iteration ends
     */
    void foreach(biApply<K, V> func);

    void foreachReverse(biApply<K, V> func);

    void foreachIndex(biApplyIndex<K, V> func);

    void foreachIndexReverse(biApplyIndex<K, V> func);

    void serialize(std::string &path);

    /**
     * clear the tree
     * note that all values allocated will be freed
     */
    void clear();
};

Tip: to support custom key types, either pass a comparison function pointer or overload the relevant operators (>, ==, <, and so on) for the type.
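Both options from the tip can be sketched for a custom key type. The `Date` type, the `compareDate` function, and the `Comparator<K>` alias below are assumptions made for this example; the project's own `comparator<K>` type may have a different signature or return convention.

```cpp
#include <cassert>

// Hypothetical custom key type.
struct Date {
    int year;
    int month;
};

// Option 1: overload the comparison operators for the key type.
bool operator<(const Date &a, const Date &b) {
    return a.year != b.year ? a.year < b.year : a.month < b.month;
}
bool operator>(const Date &a, const Date &b) { return b < a; }
bool operator==(const Date &a, const Date &b) {
    return a.year == b.year && a.month == b.month;
}

// Option 2: a free comparison function that can be passed as a pointer.
// Assumed convention: negative if a < b, zero if equal, positive if a > b.
int compareDate(const Date &a, const Date &b) {
    if (a < b) return -1;
    if (b < a) return 1;
    return 0;
}

// Assumed shape of a comparator function pointer type (not the project's definition).
template <typename K>
using Comparator = int (*)(const K &, const K &);
```

A function pointer such as `Comparator<Date> cmp = compareDate;` is useful when the key type cannot be modified, while operator overloads keep call sites free of an extra argument.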

Important data structures

Node: a node of the B+ tree

Its main fields are as follows:

struct Node {
    // parent
    // if root, parentPtr == NULL
    Node *parentPtr = NULL;
    // flag
    bool leaf;
    List<K> keys;
    /*-------leaf--------*/
    Node *previous = NULL;
    Node *next = NULL;
    List<V> values;
    /*-------index-------*/
    List<Node *> childNodePtrs;
    // for init
    int initCap;
    // constructor
    ...
};

List<T>: a list backed by a fixed-length array. It offers fewer features than std::vector<T> but is simpler and more efficient, and it shrinks its memory once a certain number of elements have been removed.
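The idea of an array-backed list that gives memory back after removals can be sketched as follows. `SimpleList` is a hypothetical stand-in, not the project's `List<T>`, and the growth and shrink thresholds (double when full, halve when at most a quarter is used) are assumptions chosen for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical array-backed list; not the project's List<T>.
template <typename T>
class SimpleList {
    T *data_;
    std::size_t size_ = 0;
    std::size_t cap_;

    // Copy the elements into a freshly allocated array of newCap slots.
    void reallocate(std::size_t newCap) {
        T *fresh = new T[newCap];
        for (std::size_t i = 0; i < size_; ++i) fresh[i] = data_[i];
        delete[] data_;
        data_ = fresh;
        cap_ = newCap;
    }

public:
    explicit SimpleList(std::size_t initCap = 4)
        : data_(new T[initCap > 0 ? initCap : 1]), cap_(initCap > 0 ? initCap : 1) {}
    ~SimpleList() { delete[] data_; }
    SimpleList(const SimpleList &) = delete;
    SimpleList &operator=(const SimpleList &) = delete;

    void add(const T &value) {
        if (size_ == cap_) reallocate(cap_ * 2);  // grow when full
        data_[size_++] = value;
    }

    void removeAt(std::size_t index) {
        assert(index < size_);
        for (std::size_t i = index + 1; i < size_; ++i) data_[i - 1] = data_[i];
        --size_;
        // Assumed policy: halve the capacity once at most a quarter of it is in use.
        if (cap_ > 4 && size_ <= cap_ / 4) reallocate(cap_ / 2);
    }

    T &operator[](std::size_t index) {
        assert(index < size_);
        return data_[index];
    }
    std::size_t size() const { return size_; }
    std::size_t capacity() const { return cap_; }
};
```

Shrinking only at a quarter of the capacity, rather than at half, avoids repeated reallocation when the size hovers around a boundary.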

Serialization

  • File suffix: bpt
  • Header format:
Offset (bytes)  Size (bytes)  Content
0               4             "LYC\0", the header magic
4               4             order, int: the order of the B+ tree
8               4             initCap, int: the capacity pre-allocated for each node
12              4             size, int: the number of elements
  • If size is not 0, the root node follows the header. Every node starts with the same common layout:
Offset (bytes, relative to node start)  Size (bytes)     Content
0                                       4                leaf, int: whether the node is a leaf node
4                                       4                sizeofK, int: the number of bytes in the key type
8                                       4                kSize, int: the number of keys in the node
12                                      kSize * sizeofK  the keys, stored in order
  • For leaf nodes:
Offset (bytes, relative to node start)  Size (bytes)     Content
12 + kSize * sizeofK                    4                sizeofV, int: the number of bytes in the value type
16 + kSize * sizeofK                    kSize * sizeofV  the values, stored in order
  • For non-leaf nodes:
Offset (bytes, relative to node start)  Size (bytes)     Content
12 + kSize * sizeofK                    kSize * 8        long: the file offsets of the child nodes, stored in order
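The 16-byte header layout can be sketched as a pair of read and write helpers. `BptHeader`, `writeHeader`, and `readHeader` are hypothetical names, and writing the integers in the host's native byte order is an assumption; the project's actual serialization code may differ.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <istream>
#include <ostream>
#include <sstream>

// Header fields in file order; the struct name is hypothetical.
struct BptHeader {
    std::int32_t order;    // offset 4
    std::int32_t initCap;  // offset 8
    std::int32_t size;     // offset 12
};

// Write the 16-byte header: the 4-byte magic "LYC\0" followed by three 4-byte ints.
// Assumption: integers are written in the host's native byte order.
void writeHeader(std::ostream &out, const BptHeader &h) {
    const char magic[4] = {'L', 'Y', 'C', '\0'};
    out.write(magic, 4);
    out.write(reinterpret_cast<const char *>(&h.order), 4);
    out.write(reinterpret_cast<const char *>(&h.initCap), 4);
    out.write(reinterpret_cast<const char *>(&h.size), 4);
}

// Read the header back. Returns false if the magic does not match
// or the stream ends before all 16 bytes are read.
bool readHeader(std::istream &in, BptHeader &h) {
    char magic[4];
    if (!in.read(magic, 4) || std::memcmp(magic, "LYC", 4) != 0) return false;
    in.read(reinterpret_cast<char *>(&h.order), 4);
    in.read(reinterpret_cast<char *>(&h.initCap), 4);
    in.read(reinterpret_cast<char *>(&h.size), 4);
    return static_cast<bool>(in);
}
```

The helpers take generic streams, so they work with a `std::stringstream` for testing as well as with a file stream opened in binary mode.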

Implementation notes

  • The initial implementation used std::vector, but its measured performance was not particularly good;
  • Because the keys within a node are ordered, lookups inside a node use binary search;
  • In memory, a node should store pointers to its child nodes. Splitting and merging require copying lists; if the child structures themselves were stored, each copy would trigger a recursive copy, which is inefficient and makes memory hard to control;
  • Because the root node has no minimum key count, after a delete operation you need to check how many children the root has. If it has only 1, point root directly at that child; otherwise, once that child is removed, the root could be left with a key whose child node is lost.
  • Whenever an insertion or deletion changes the last key of a node, the change must be propagated upward to the parent.
  • Split and merge operations must update the next and previous pointers of the leaf nodes.
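The binary search inside a node, mentioned in the notes above, can be sketched with `std::lower_bound`. This is an illustration rather than the project's code: it uses `std::vector<int>` as a stand-in for `List<K>`, and `lowerIndex` and `nodeContains` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Index of the first key that is >= target, or keys.size() if there is none.
// keys must be sorted in ascending order.
std::size_t lowerIndex(const std::vector<int> &keys, int target) {
    return static_cast<std::size_t>(
        std::lower_bound(keys.begin(), keys.end(), target) - keys.begin());
}

// Exact lookup within one node: binary search, then compare the candidate.
bool nodeContains(const std::vector<int> &keys, int target) {
    std::size_t i = lowerIndex(keys, target);
    return i < keys.size() && keys[i] == target;
}
```

If each key in an index node is the largest key of the corresponding child (an assumption suggested by the note about updating parents when a node's last key changes), `lowerIndex` also gives the position of the child to descend into.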

Tests

Test environment:

The file main.cpp contains the following macros; set one to 1 to enable the corresponding test:

// Test List performance (compared with vector)
#define TEST_LIST 0
// Test the functional correctness of the B+ tree
#define TEST_FUNC 0
// Test B+ tree speed (insert, delete, update, query)
#define TEST_SPEED 0
// Test the B+ tree for heap memory leaks (use a testing tool after building)
#define TEST_MEM 0
// Test B+ tree serialization and deserialization
#define TEST_SERIAL 0
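How such macros typically gate test code can be sketched with `#if` blocks. Only the macro names come from the article; the `enabledTests` function is hypothetical, and TEST_FUNC is set to 1 here purely as an example.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Same macro names as in main.cpp; TEST_FUNC is switched on for this example.
#define TEST_LIST 0
#define TEST_FUNC 1
#define TEST_SPEED 0

// Hypothetical helper: report which test sections were compiled in.
std::vector<std::string> enabledTests() {
    std::vector<std::string> names;
#if TEST_LIST
    names.push_back("list");   // compiled only when TEST_LIST is 1
#endif
#if TEST_FUNC
    names.push_back("func");   // compiled only when TEST_FUNC is 1
#endif
#if TEST_SPEED
    names.push_back("speed");  // compiled only when TEST_SPEED is 1
#endif
    return names;
}
```

Code inside a disabled `#if` block is removed by the preprocessor, so a disabled test costs nothing at run time.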

List test

  • Data volume: 10^5
  • Add and remove data, asserting that the element at each position is as expected (functional test)
  • Tail insert test (with and without pre-allocation)
  • Head insert test
  • Head remove test
  • Tail remove test
  • rangeRemove test

Test results:

Table:

Operation                                  List (ms)   vector (ms)
Tail insert                                1.506       4.724
Tail insert (pre-allocated space)          1.201       2.765
Head insert                                834.804     906.981
Remove half of the elements (from head)    619.493     805.379
Remove half of the elements (from tail)    1.444       7.523
rangeRemove (half)                         0.065       0.558


B+ tree functional test

  • Data volume: 10^5
  • After inserting the data, assert that all of it exists
  • Remove half of the data; assert that the removed data no longer exists and that the remaining data still exists
  • Re-insert all the data and assert that all of it exists (checks whether deletion damaged the structure)
  • After clear, assert that the previous data no longer exists
  • After inserting all the data, test the traversal methods (checks the key order)

Speed test

  • Data volume: 10^6, 10^7, 10^8
  • B+ tree order: log(TEST_SIZE)^2
  • Insert the data in a loop
  • Access all the data in a loop
  • Remove the data in a loop
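The three timed loops can be sketched for the std::map side of the comparison. This is a minimal harness under stated assumptions, not the article's benchmark code: `timeMs`, `PhaseTimes`, and `benchStdMap` are hypothetical names, and sequential integer keys are assumed because the article does not say which keys were used.

```cpp
#include <cassert>
#include <chrono>
#include <map>

// Run fn once and return the elapsed wall-clock time in milliseconds.
template <typename Fn>
double timeMs(Fn fn) {
    auto start = std::chrono::steady_clock::now();
    fn();
    auto end = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(end - start).count();
}

struct PhaseTimes {
    double insertMs;
    double accessMs;
    double removeMs;
};

// Insert, access, and remove n sequential keys in a std::map, timing each phase.
PhaseTimes benchStdMap(int n) {
    std::map<int, int> m;
    PhaseTimes t{};
    t.insertMs = timeMs([&] { for (int i = 0; i < n; ++i) m[i] = i; });
    long long sum = 0;  // consume the looked-up values so the loop does real work
    t.accessMs = timeMs([&] { for (int i = 0; i < n; ++i) sum += m.find(i)->second; });
    assert(sum == static_cast<long long>(n) * (n - 1) / 2);
    t.removeMs = timeMs([&] { for (int i = 0; i < n; ++i) m.erase(i); });
    assert(m.empty());
    return t;
}
```

`std::chrono::steady_clock` is used because it is monotonic, so the measured interval is not affected by system clock adjustments. Timing the B+ tree would follow the same pattern with `put`, `get`, and `remove` in place of the map calls.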

Test results:

Table:

Operation       B+ tree (ms)   std::map (ms)
Insert (10^8)   192808.064     325621.333
Access (10^8)   163102.022     280150.403
Remove (10^8)   213982.406     366576.836
Insert (10^7)   11825.821      22213.139
Access (10^7)   10190.870      18137.073
Remove (10^7)   15130.015      22133.154
Insert (10^6)   1057.291       1624.615
Access (10^6)   888.186        1155.504
Remove (10^6)   1099.584       1495.433


Source: blog.csdn.net/XZQ121963/article/details/90812730