Common data structures and applications

Preface

Data structures are how computers store and organize data. At work we usually reach straight for the collection APIs a language already provides, which lets us finish tasks quickly. But mastering data structures is still important for a programmer: it helps us understand and design algorithms, which in turn improves the efficiency and reliability of our programs. This article introduces several common data structures; by understanding their characteristics and strengths, you can choose the right one for each scenario.

Introduction to data structures

Common data structures are broadly divided into two types: linear and nonlinear.

A linear data structure is, as the name implies, one whose elements are arranged like a straight line; examples include arrays, linked lists, stacks, and queues.
Nonlinear data structures include trees, heaps, graphs, and so on.

Array

An array is a collection composed of multiple elements, as shown in the figure below

[Figure: an array as a collection of elements]

An array occupies a contiguous block of memory, and each element takes up a fixed amount of space. This is why you must specify the length when declaring an array; otherwise the amount of space to reserve is unknown. Taking Java as an example, once an array is created, the array variable points to the starting address of the array object, which is the position of the first element, as shown below.

[Figure: the array variable pointing to the starting address of the array object in memory]

Because of this layout, the memory address of any element can be computed from its index. For example, to access the element at index 2, the address of arr[2] = the starting address of arr + 2 * element size. The element can therefore be read directly through its address, so random access takes O(1) time.

However, arrays also have a drawback: because their length is fixed, adding or deleting elements requires creating a new array to replace the original one, which is costly. For example, the following code shows how to append an element to the end of an array:

int[] arr = {1, 2, 3, 4, 5};

int[] newArr = new int[arr.length + 1]; // create a new array, one element longer

for (int i = 0; i < arr.length; i++) {
    newArr[i] = arr[i]; // copy the elements of the original array into the new array
}

newArr[newArr.length - 1] = 6; // put the element to be added in the last position

arr = newArr; // replace the original array with the new one

What the sample code does in memory is shown below:

[Figure: the memory activity of the sample code: the old array is copied into the new, longer array]

Many Java collections are implemented on top of arrays, such as the commonly used ArrayList, Vector, HashMap, and ArrayBlockingQueue.
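
As a small illustration (the class name ArrayListDemo is just for this example), ArrayList hides exactly the grow-and-copy work shown above behind its add() method:

import java.util.ArrayList;
import java.util.List;

public class ArrayListDemo {
    public static void main(String[] args) {
        // ArrayList is backed by an array; when capacity runs out it allocates
        // a larger array and copies the old elements over, much like the manual
        // example above, but hidden behind add().
        List<Integer> list = new ArrayList<>();
        for (int i = 1; i <= 5; i++) {
            list.add(i);
        }
        list.add(6);
        System.out.println(list); // [1, 2, 3, 4, 5, 6]
    }
}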

Linked list

A linked list consists of a series of nodes. Each node includes two parts: one is the data field that stores the data element, and the other is the pointer field that stores the address of the next node. Taking Java as an example, the structure of a node is expressed as follows:

public class Node<T> {

    // data field that stores the element
    private T value;

    // pointer field that stores the address of the next node
    private Node<T> next;
}

The pointer of each element points to the next element, thus forming a linked list, as shown in the figure below.

[Figure: nodes connected by next pointers to form a linked list]

Unlike arrays, a linked list does not occupy a contiguous block of memory. It can make full use of scattered memory, supports flexible dynamic allocation, and removes the array's requirement of knowing the data size in advance. Its layout in memory is shown below.

[Figure: linked-list nodes scattered across non-contiguous memory locations]

Compared with arrays, inserting or deleting a node in a linked list takes O(1) time once the position is known (you only need to re-point the affected node's next pointer to the new node, or to null), but finding a node by value or accessing the node at a given index takes O(n) time.
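
As a minimal sketch of that O(1) insertion (ListNode here is a simplified, hypothetical stand-in for the Node class above), inserting after a node you already hold only re-points two next references, while locating that node in the first place is the O(n) part:

class ListNode {
    int value;
    ListNode next;
    ListNode(int value) { this.value = value; }
}

public class SinglyLinkedListDemo {
    public static void main(String[] args) {
        ListNode head = new ListNode(1);
        head.next = new ListNode(2);
        head.next.next = new ListNode(4); // list: 1 -> 2 -> 4

        // Insert 3 after the node holding 2: two pointer updates, O(1).
        ListNode prev = head.next;
        ListNode inserted = new ListNode(3);
        inserted.next = prev.next;
        prev.next = inserted; // list: 1 -> 2 -> 3 -> 4

        // Walking the list to find or print nodes is the O(n) part.
        for (ListNode cur = head; cur != null; cur = cur.next) {
            System.out.print(cur.value + " ");
        }
    }
}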

The singly linked list introduced above has a drawback: it can only be traversed from head to tail. If you want to delete the second-to-last node, you have to walk the list from the beginning to find its predecessor. For more flexible operations and higher efficiency there is the doubly linked list, whose structure is shown below:

[Figure: a doubly linked list whose nodes have both forward and backward pointers]

With a doubly linked list, deleting the second-to-last node only requires stepping back from the tail to its predecessor and unlinking it. LinkedList in Java is an implementation of a doubly linked list.
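
A minimal sketch of that deletion, assuming a node shaped like the one above but with an extra prev pointer (the DNode class and its field names are illustrative, not the article's code):

class DNode {
    int value;
    DNode prev, next;
    DNode(int value) { this.value = value; }
}

public class DoublyLinkedListDemo {
    public static void main(String[] args) {
        // Build 1 <-> 2 <-> 3 by hand.
        DNode head = new DNode(1), mid = new DNode(2), tail = new DNode(3);
        head.next = mid; mid.prev = head;
        mid.next = tail; tail.prev = mid;

        // Delete the second-to-last node by stepping back from the tail.
        DNode target = tail.prev;       // reached in O(1) via the prev pointer
        target.prev.next = target.next; // predecessor skips over the target
        target.next.prev = target.prev; // successor points back to the predecessor

        for (DNode cur = head; cur != null; cur = cur.next) {
            System.out.print(cur.value + " "); // prints: 1 3
        }
    }
}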

Queues and Stacks

Arrays and linked lists are mainly about how data is stored and accessed, while queues and stacks are about the order and logic in which data is processed; each has its own characteristics.

A queue is first-in-first-out (FIFO): the first element to enter the queue is the first to be accessed or removed. In other words, elements are enqueued at the tail and dequeued at the head, as shown below.

[Figure: a queue: elements enter at the tail and leave at the head]
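
A small illustration of FIFO behavior, using java.util.ArrayDeque as the Queue implementation (the article does not name this class; it is just one convenient choice):

import java.util.ArrayDeque;
import java.util.Queue;

public class QueueDemo {
    public static void main(String[] args) {
        Queue<String> queue = new ArrayDeque<>();
        queue.offer("first");  // enqueued at the tail
        queue.offer("second");
        queue.offer("third");

        // Dequeued from the head: elements come out in the order they went in.
        System.out.println(queue.poll()); // first
        System.out.println(queue.poll()); // second
        System.out.println(queue.poll()); // third
    }
}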

A stack is first-in-last-out (FILO): the first element pushed onto the stack is the last to be accessed or removed, which is the same as saying the most recently pushed element is the first to come out. Insertion and deletion are only allowed at the top of the stack.

A vivid way to picture it: think of the stack as a gun magazine. The first bullet loaded is pressed to the bottom, and bullets are fired from the top, so the last one loaded is the first to go.

[Figure: a stack, pictured as a magazine loaded and fired from the top]
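
A matching sketch of FILO behavior, again using ArrayDeque, this time through its Deque (stack) methods push and pop:

import java.util.ArrayDeque;
import java.util.Deque;

public class StackDemo {
    public static void main(String[] args) {
        Deque<String> stack = new ArrayDeque<>();
        stack.push("first");  // loaded first, pressed to the bottom
        stack.push("second");
        stack.push("third");  // loaded last, sits on top

        // Popped from the top: the last element in is the first one out.
        System.out.println(stack.pop()); // third
        System.out.println(stack.pop()); // second
        System.out.println(stack.pop()); // first
    }
}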

Both can be built on either arrays or linked lists, depending on the specific needs and scenario. For example, ArrayBlockingQueue in Java is a blocking queue backed by an array, and LinkedBlockingQueue is a blocking queue backed by a linked list.

Tree

A tree is a nonlinear structure: a finite set of n nodes organized in a hierarchy. There are many kinds of trees, such as binary trees, balanced trees, 2-3-4 trees, red-black trees, B-trees, and B+ trees.

A binary tree is a tree in which each node has at most two subtrees. It is often used to implement a binary search tree, whose defining property is that the values in a node's left subtree are less than the node's value and the values in its right subtree are greater. Taking Java as an example, a binary search tree node can be expressed as follows:

public class Node {

    // value of the current node
    private int value;

    // parent node, left child, right child
    private Node parent, left, right;
}

Its structure is shown in the figure below.

[Figure: a binary search tree]
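
A minimal search sketch against the Node structure above (assuming the method lives inside the Node class, or that the fields are otherwise accessible; it is illustrative, not part of the article's class):

// Search a binary search tree for a target value.
static Node search(Node root, int target) {
    Node current = root;
    while (current != null) {
        if (target == current.value) {
            return current;             // found it
        } else if (target < current.value) {
            current = current.left;     // smaller values live in the left subtree
        } else {
            current = current.right;    // larger values live in the right subtree
        }
    }
    return null;                        // not in the tree
}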

The time complexity of a query is O(log n), a big improvement over a linked list. In the worst case, however, the tree can degenerate and queries fall back to O(n), as in the following situation:

[Figure: a binary search tree that has degenerated into a linked-list shape]


To avoid this situation, the AVL tree was invented. An AVL tree is a self-balancing binary search tree: on insertion and deletion it automatically adjusts its structure with left or right rotations so that the height difference between the left and right subtrees of every node never exceeds 1. This keeps the tree balanced and guarantees O(log n) query time.

Take the following figure as an example. When node 5 is inserted, the height difference between the left and right subtrees of node 7 is 2. At this time, node 7 needs to perform a right rotation to maintain the balance of the tree.

[Figure: a right rotation performed at node 7 after inserting node 5]
A right rotation means: taking a node as the pivot, its left child becomes its parent, the right child of that left child becomes the pivot's new left child, and the pivot's right child is unchanged.
Symmetrically, a left rotation means: taking a node as the pivot, its right child becomes its parent, the left child of that right child becomes the pivot's new right child, and the pivot's left child is unchanged.
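
A minimal sketch of the right rotation described above (TreeNode and rotateRight are illustrative names, not the article's code; a real AVL node would also track its height):

class TreeNode {
    int value;
    TreeNode left, right;
    TreeNode(int value) { this.value = value; }
}

public class RotationDemo {
    // Right rotation around pivot: its left child becomes the new subtree root,
    // and that child's former right subtree becomes the pivot's new left subtree.
    static TreeNode rotateRight(TreeNode pivot) {
        TreeNode newRoot = pivot.left;
        pivot.left = newRoot.right;
        newRoot.right = pivot;
        return newRoot; // the caller re-attaches this node to the parent
    }

    public static void main(String[] args) {
        // An unbalanced left-leaning chain 7 -> 6 -> 5 (an assumed shape; the
        // article's figure may differ in detail).
        TreeNode seven = new TreeNode(7);
        seven.left = new TreeNode(6);
        seven.left.left = new TreeNode(5);

        TreeNode root = rotateRight(seven);
        System.out.println(root.value);       // 6
        System.out.println(root.left.value);  // 5
        System.out.println(root.right.value); // 7
    }
}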


Although the AVL tree stays balanced through rotations, in workloads with frequent insertions and deletions those frequent rotations degrade performance. To address this, the red-black tree was proposed.

Everyone has probably heard of red-black trees; they come up in interviews all the time, though whether people really understand them is another matter.

The red-black tree is also a self-balancing binary search tree; it uses node colors to keep the tree balanced. Compared with the AVL tree it is harder to understand, and the first doubt is usually: "Doesn't it also rotate left and right? That is troublesome enough, and the node colors keep changing on top of it. Who is that supposed to confuse?"

I will write a separate article about the red-black tree. For now, the conclusion: a red-black tree performs fewer rotations than an AVL tree, so it is usually the choice when there are many insertions and deletions. The HashMap everyone knows is one example. The figure below shows the AVL tree and the red-black tree produced by inserting 9, 7, 6, 10, 5, 8, 4, 2, 1, 0 in order; you can see clear structural differences between the two.

[Figure: the AVL tree and the red-black tree built by inserting 9, 7, 6, 10, 5, 8, 4, 2, 1, 0 in order]


The trees mentioned so far are all binary trees: each node has at most two children, and they are ordered. That limit of two branches is also their common weakness. As the amount of data grows, the height of the tree grows with it, which lowers query efficiency; in scenarios involving disk I/O, such as databases and file systems, the taller the tree, the worse it is for querying.

To solve this, multiway trees are used: by allowing more children per node they effectively reduce the height of the tree and improve query efficiency.

Common multiway trees include the 2-3-4 tree, the B-tree, and the B+ tree, which are widely used in databases and file systems. They look like this:

[Figure: examples of a 2-3-4 tree, a B-tree, and a B+ tree]

The B+ tree is an extension of the B-tree and is better suited to disks and other storage devices. In a B+ tree, non-leaf nodes store only keys and child pointers, not data records, so each node can hold more index entries. In addition, each leaf node holds a pointer to the adjacent leaf node, which makes range queries in a database very efficient.

Heap

A heap is a special tree data structure whose defining property is that every node is greater than or equal to (or less than or equal to) each of its child nodes.

Common heaps include the binary heap, the Fibonacci heap, and others. A binary heap is a complete binary tree and comes in two flavors: in a max heap every node is greater than or equal to its children, and in a min heap every node is less than or equal to its children. The left side of the figure below shows a max heap; the tree on the right is not a complete binary tree (in a complete binary tree each level is filled from left to right).

[Figure: a max heap (left) and, on the right, a tree that is not a complete binary tree]

Heaps are often used to implement priority queues, because the root of the heap is always the largest or smallest element.
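
A small illustration using java.util.PriorityQueue, which is backed by a binary min-heap (the article itself does not name this class):

import java.util.PriorityQueue;

public class HeapDemo {
    public static void main(String[] args) {
        // The smallest element is always at the root of the min-heap,
        // so poll() returns elements in ascending order.
        PriorityQueue<Integer> minHeap = new PriorityQueue<>();
        minHeap.offer(5);
        minHeap.offer(1);
        minHeap.offer(3);

        System.out.println(minHeap.poll()); // 1
        System.out.println(minHeap.poll()); // 3
        System.out.println(minHeap.poll()); // 5
    }
}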

Summary

Many programming languages provide various collection classes. Taking Java as an example, the collections we use most often are List, Set, Queue, and Map, and underneath they are implemented with data structures such as arrays, linked lists, and trees. Understanding these data structures lets us choose and use collections more wisely, and even design more efficient data structures ourselves when the problem calls for it.
