Understanding B-tree search

    In this tutorial, you will learn what a B-tree is and how to search it. You will also find a working example in C.
    A B-tree is a self-balancing search tree in which each node can hold multiple keys and have more than two children. It is a generalization of the binary search tree.
    It is also known as a height-balanced m-way tree.

1. Why choose B-tree?

    The need for B-trees arose from the need to spend less time accessing physical storage media such as hard disks. Secondary storage devices are slower than main memory but have a larger capacity, so data structures that minimize the number of disk accesses are needed.
    Other data structures such as the binary search tree, AVL tree, and red-black tree store only one key per node. If you have to store a large number of keys, these trees become very tall and the access time increases.
    A B-tree, by contrast, can store many keys in a single node and can have many children per node. This greatly reduces the height of the tree, so fewer disk accesses are needed to reach a key.

2. B-tree properties
  1. For each node x, the keys are stored in increasing order.
  2. Each node x has a boolean value x.leaf, which is true if x is a leaf.
  3. If n is the order of the tree, each internal node can contain at most n-1 keys, together with a pointer to each of its children.
  4. Every node has at most n children, and every internal node except the root has at least ⌈n/2⌉ children.
  5. All leaves have the same depth, which is the height h of the tree.
  6. The root contains at least 1 key and, unless it is a leaf, has at least 2 children.
  7. For any B-tree with N ≥ 1 keys, height h, and minimum degree t ≥ 2: $h \le \log_t \frac{N+1}{2}$. (Here N is the number of keys, not the order n used above.)
3. Operation
3.1 Search

    Searching for a key in a B-tree generalizes searching in a binary search tree. To search for a key k, follow the steps below.

  1. Starting from the root node, compare k with the first key of the node. If k equals that key, return the node and the index of the key.
  2. If k < the first key, recursively search the left child of that key.
  3. If k > the first key and the node has more keys, compare k with the next key in the node.
    If k equals the next key, return the node and the index of that key.
    If k < the next key, search the left child of that key (that is, k lies between the previous key and this one).
    Otherwise, move on to the following key. If there is none, search the rightmost child.
  4. If the node is a leaf and k was not found in it, return NULL (that is, not found).
  5. Repeat steps 1 to 4 until k is found or a leaf has been searched.
3.2 Search example
  1. Let us search for the key k = 17 in a B-tree of degree 3 whose root contains the single key 11.
  2. k is not in the root, so compare it with the root key 11.
  3. Because k > 11, go to the right child of the root.
  4. Compare k with 16. Since k > 16, compare k with the next key, 18.
  5. Since k < 18, k lies between 16 and 18. Search the right child of 16, which is also the left child of 18.
  6. k is found in that child.
4. Algorithm for searching elements
BtreeSearch(x, k)
 i = 1
 while i ≤ n[x] and k > key_i[x]        // n[x] is the number of keys in node x
     do i = i + 1
 if i ≤ n[x] and k = key_i[x]
     then return (x, i)
 if leaf[x]
     then return NIL
 else
     return BtreeSearch(c_i[x], k)
5. C example
// Searching a key on a B-tree in C

#include <stdio.h>
#include <stdlib.h>

#define MAX 3
#define MIN 2

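// A node holds up to MAX keys in val[1..count] (val[0] is unused)
// and count + 1 child pointers in link[0..count].
struct BTreeNode {
  int val[MAX + 1], count;
  struct BTreeNode *link[MAX + 1];
};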

struct BTreeNode *root;

// Create a new root holding val; the old root becomes its left child
struct BTreeNode *createNode(int val, struct BTreeNode *child) {
  struct BTreeNode *newNode;
  newNode = (struct BTreeNode *)malloc(sizeof(struct BTreeNode));
  if (!newNode) {
    perror("malloc");
    exit(EXIT_FAILURE);
  }
  newNode->val[1] = val;
  newNode->count = 1;
  newNode->link[0] = root;
  newNode->link[1] = child;
  return newNode;
}

// Insert val and its right child into node, just after position pos
void insertNode(int val, int pos, struct BTreeNode *node,
        struct BTreeNode *child) {
  int j = node->count;
  while (j > pos) {
    node->val[j + 1] = node->val[j];
    node->link[j + 1] = node->link[j];
    j--;
  }
  node->val[j + 1] = val;
  node->link[j + 1] = child;
  node->count++;
}

// Split a full node: the upper keys move to *newNode and the median key
// is passed up to the parent through *pval
void splitNode(int val, int *pval, int pos, struct BTreeNode *node,
         struct BTreeNode *child, struct BTreeNode **newNode) {
  int median, j;

  if (pos > MIN)
    median = MIN + 1;
  else
    median = MIN;

  *newNode = (struct BTreeNode *)malloc(sizeof(struct BTreeNode));
  if (!*newNode) {
    perror("malloc");
    exit(EXIT_FAILURE);
  }
  j = median + 1;
  while (j <= MAX) {
    (*newNode)->val[j - median] = node->val[j];
    (*newNode)->link[j - median] = node->link[j];
    j++;
  }
  node->count = median;
  (*newNode)->count = MAX - median;

  if (pos <= MIN) {
    insertNode(val, pos, node, child);
  } else {
    insertNode(val, pos - median, *newNode, child);
  }
  *pval = node->val[node->count];
  (*newNode)->link[0] = node->link[node->count];
  node->count--;
}

// Recursively insert val below node. Returns 1 when a key (*pval) and its
// right child (*child) still have to be inserted into the caller's node.
int setValue(int val, int *pval,
           struct BTreeNode *node, struct BTreeNode **child) {
  int pos;
  if (!node) {
    *pval = val;
    *child = NULL;
    return 1;
  }

  if (val < node->val[1]) {
    pos = 0;
  } else {
    for (pos = node->count;
       (val < node->val[pos] && pos > 1); pos--)
      ;
    if (val == node->val[pos]) {
      printf("Duplicates are not permitted\n");
      return 0;
    }
  }
  if (setValue(val, pval, node->link[pos], child)) {
    if (node->count < MAX) {
      insertNode(*pval, pos, node, *child);
    } else {
      splitNode(*pval, pval, pos, node, *child, child);
      return 1;
    }
  }
  return 0;
}

// Insert the value; grow a new root if the old root was split
void insert(int val) {
  int flag, i;
  struct BTreeNode *child;

  flag = setValue(val, &i, root, &child);
  if (flag)
    root = createNode(i, child);
}

// Search for val in the subtree rooted at myNode. *pos receives the index
// of the key, or of the child followed, in the last node visited.
void search(int val, int *pos, struct BTreeNode *myNode) {
  if (!myNode) {
    printf("%d is not found\n", val);
    return;
  }

  if (val < myNode->val[1]) {
    *pos = 0;
  } else {
    for (*pos = myNode->count;
       (val < myNode->val[*pos] && *pos > 1); (*pos)--)
      ;
    if (val == myNode->val[*pos]) {
      printf("%d is found\n", val);
      return;
    }
  }
  search(val, pos, myNode->link[*pos]);
}

// Traverse the nodes in order, printing the keys in increasing order
void traversal(struct BTreeNode *myNode) {
  int i;
  if (myNode) {
    for (i = 0; i < myNode->count; i++) {
      traversal(myNode->link[i]);
      printf("%d ", myNode->val[i + 1]);
    }
    traversal(myNode->link[i]);
  }
}

int main(void) {
  int ch;

  insert(8);
  insert(9);
  insert(10);
  insert(11);
  insert(15);
  insert(16);
  insert(17);
  insert(18);
  insert(20);
  insert(23);

  traversal(root);

  printf("\n");
  search(11, &ch, root);
  return 0;
}
6. B-tree search complexity

    Worst-case time complexity: Θ(log n)
    Average-case time complexity: Θ(log n)
    Best-case time complexity: Θ(log n)
    Average-case space complexity: Θ(n)
    Worst-case space complexity: Θ(n)

7. B-tree applications
  • Databases and file systems
  • Storing blocks of data on secondary storage media
  • Multilevel indexing
Reference documents

[1] Parewa Labs Pvt. Ltd. B-tree [EB/OL]. https://www.programiz.com/dsa/b-tree, 2020-01-01.

Origin blog.csdn.net/zsx0728/article/details/114298539