Common Data Structure Problems Series (2)


1. Common data structures

1). Array: sequential (contiguous) storage, random access. Linked list: linked (non-contiguous) storage, sequential access.
2). Stack: has a top and a bottom and follows the "first in, last out" principle.
3). Queue: follows the "first in, first out" principle (like waiting in line).
4). Tree: binary tree, balanced binary tree, big-root and small-root heaps (max-heap and min-heap), etc.
5). Graph: shortest path, critical path.

2. What are linked lists, queues, and stacks

Linked list:

When you need to store multiple elements of the same data type, you can use an array. An array element can be accessed directly through its subscript, but arrays cannot efficiently insert or delete elements (especially at the first position), because every element after that position has to be shifted. The linked list makes up for this defect: insertion and deletion are very convenient, but accessing an element is much slower.

In a singly linked list, each node holds only one pointer, which stores the address of the next element (node). As long as the address of the first node is known, the entire list can be traversed. Because linked list nodes are allocated dynamically on the heap, their addresses are not contiguous, so random access is impossible: a node can only be located through the next pointer of the node before it.
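A minimal Python sketch of the singly linked list described above. Python has no raw pointers, so object references stand in for addresses; the names `Node`, `SinglyLinkedList`, `push_front`, `insert_after` and `find` are hypothetical, chosen only for illustration.

```python
class Node:
    """A singly linked list node: a value plus one pointer to the next node."""
    def __init__(self, value, next=None):
        self.value = value
        self.next = next


class SinglyLinkedList:
    def __init__(self):
        self.head = None  # address of the first node; None means the list is empty

    def push_front(self, value):
        # O(1): the new node simply points at the old head
        self.head = Node(value, self.head)

    def insert_after(self, node, value):
        # O(1) once we already hold a reference to `node`; nothing is shifted
        node.next = Node(value, node.next)

    def find(self, value):
        # O(n): no random access, so follow next pointers from the head
        cur = self.head
        while cur is not None and cur.value != value:
            cur = cur.next
        return cur

    def to_list(self):
        out, cur = [], self.head
        while cur is not None:
            out.append(cur.value)
            cur = cur.next
        return out
```

Pushing 3, 2, 1 to the front yields the list 1, 2, 3; inserting 99 after the node holding 2 yields 1, 2, 99, 3 without moving any other element.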

A singly linked list can only be traversed forward (from head to tail), not in reverse, which is why the doubly linked list is more widely used: each node has an additional prev pointer that stores the address of the previous node. A doubly linked list can be traversed in both directions, but access is still sequential only.
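A small sketch of a doubly linked list, assuming the list keeps both a head and a tail reference (the names `DNode`, `DoublyLinkedList`, `append`, `remove`, `forward`, `backward` are hypothetical). The prev pointer is what makes backward traversal and O(1) removal of a known node possible.

```python
class DNode:
    """Doubly linked list node: pointers to both the previous and the next node."""
    def __init__(self, value):
        self.value = value
        self.prev = None
        self.next = None


class DoublyLinkedList:
    def __init__(self):
        self.head = None
        self.tail = None

    def append(self, value):
        node = DNode(value)
        if self.tail is None:          # empty list: node is both head and tail
            self.head = self.tail = node
        else:
            node.prev = self.tail
            self.tail.next = node
            self.tail = node
        return node

    def remove(self, node):
        # O(1): the prev pointer lets us unlink without searching for the predecessor
        if node.prev is not None:
            node.prev.next = node.next
        else:
            self.head = node.next
        if node.next is not None:
            node.next.prev = node.prev
        else:
            self.tail = node.prev
        node.prev = node.next = None

    def forward(self):
        cur, out = self.head, []
        while cur is not None:
            out.append(cur.value)
            cur = cur.next
        return out

    def backward(self):
        cur, out = self.tail, []
        while cur is not None:
            out.append(cur.value)
            cur = cur.prev
        return out
```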

Queue:

A queue works just like a line of people: elements are served in the order they arrive. A new node is always inserted at the rear, and nodes are removed (dequeued) only from the front. In short, elements follow the "first in, first out" principle. Because queues see frequent insertions and deletions, a fixed-length array is generally used for efficiency, and the array space is reused circularly. Before each operation you must check whether the queue is full (before enqueueing) or empty (before dequeueing). If you need a dynamic length, you can use a linked list instead, as long as you keep track of both the address of the first node (front) and the address of the last node (rear).
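A sketch of the fixed-length circular queue described above. It assumes a separate `size` counter to tell "full" from "empty" (another common convention leaves one array slot unused instead); the class and method names are hypothetical.

```python
class CircularQueue:
    """Fixed-length array queue whose slots are reused circularly."""
    def __init__(self, capacity):
        self.buf = [None] * capacity
        self.front = 0      # index of the current first element
        self.size = 0       # number of stored elements

    def is_empty(self):
        return self.size == 0

    def is_full(self):
        return self.size == len(self.buf)

    def enqueue(self, value):
        if self.is_full():
            raise OverflowError("queue is full")
        rear = (self.front + self.size) % len(self.buf)  # wrap around the array end
        self.buf[rear] = value
        self.size += 1

    def dequeue(self):
        if self.is_empty():
            raise IndexError("queue is empty")
        value = self.buf[self.front]
        self.buf[self.front] = None                      # free the slot for reuse
        self.front = (self.front + 1) % len(self.buf)
        self.size -= 1
        return value
```

With capacity 3, after enqueueing 1, 2, 3 and dequeueing 1, a fourth enqueue reuses the freed slot at index 0 instead of growing the array.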

Stack:

A stack behaves in the opposite way to a queue: elements are popped in the reverse of the order in which they were pushed, that is, "first in, last out". Each pushed element is placed on the top of the stack. A fixed-length array is generally used to store stack elements, instead of allocating node space dynamically. Adding an element is called pushing, and removing one is called popping.

Since both push and pop operate only at the top of the stack, a single size field is enough to track the current stack. It starts at 0; each push increments size by 1 (check first whether the stack is full), and each pop decrements it by 1 (check first whether the stack is empty).
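A minimal array-backed stack that uses only the `size` field described above; `size` doubles as the index of the next free slot. The names `ArrayStack`, `push`, `pop`, `peek` are illustrative, not from any particular library.

```python
class ArrayStack:
    """Fixed-length array stack tracked by a single `size` field."""
    def __init__(self, capacity):
        self.buf = [None] * capacity
        self.size = 0          # starts at 0; also the index of the next free slot

    def push(self, value):
        if self.size == len(self.buf):
            raise OverflowError("stack is full")
        self.buf[self.size] = value
        self.size += 1         # size + 1 on every push

    def pop(self):
        if self.size == 0:
            raise IndexError("stack is empty")
        self.size -= 1         # size - 1 on every pop
        value = self.buf[self.size]
        self.buf[self.size] = None   # drop the reference to the popped element
        return value

    def peek(self):
        if self.size == 0:
            raise IndexError("stack is empty")
        return self.buf[self.size - 1]
```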

3. What is a tree (balanced binary tree, binary sort tree, B tree, B+ tree, R tree, red-black tree)

Why do we need trees? Because the existing data structures (arrays and linked lists) cannot make both static operations (lookup) and dynamic operations (insertion, deletion) fast at the same time:

Time complexity	Array	Linked list
Static operation (lookup by index)	O(1)	O(n)
Dynamic operations (insert, delete at a known position)	O(n)	O(1)

The most significant feature of a tree is that it has exactly one root node (unless the tree is empty); each node can have multiple child nodes, and every node except the root has exactly one parent node. There are many kinds of trees; the most commonly discussed is the binary tree, in which each node has at most two child nodes.

Balanced Binary Tree & Binary Sort Tree

A binary sort tree (also called a binary search tree, BST) is very simple to construct: values larger than the root node go into the root's right subtree, and values smaller than the root go into its left subtree (how values equal to the root are handled depends on the situation); the same rule is then applied within each subtree. When writing the program you can use recursion, and since there are no overlapping subproblems, plain recursion performs well enough (setting aside the risk of stack overflow).
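A recursive sketch of the construction rule above. It assumes duplicates go to the right subtree (the text leaves this open); `BSTNode`, `bst_insert`, `bst_contains` and `inorder` are hypothetical names.

```python
class BSTNode:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def bst_insert(root, key):
    """Recursive insert; returns the (possibly new) root of this subtree.
    Assumption: keys equal to a node's key go into its right subtree."""
    if root is None:
        return BSTNode(key)
    if key < root.key:
        root.left = bst_insert(root.left, key)     # smaller: left subtree
    else:
        root.right = bst_insert(root.right, key)   # larger or equal: right subtree
    return root


def bst_contains(root, key):
    # Each comparison discards one whole subtree
    while root is not None:
        if key == root.key:
            return True
        root = root.left if key < root.key else root.right
    return False


def inorder(root):
    # An in-order traversal of a BST visits the keys in sorted order
    if root is None:
        return []
    return inorder(root.left) + [root.key] + inorder(root.right)
```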

In typical cases a binary sort tree supports both static and dynamic operations in O(log n) time, but there is a special case: if the input is already sorted, the binary tree degenerates into a linked list, and each operation costs O(n). To remove this sensitivity to input order, the balanced binary tree (AVL tree) was introduced; strictly speaking, it should be called a balanced binary sort tree. A balanced binary tree only needs to guarantee that, at every node, the heights of the left and right subtrees differ by at most 1.
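The sketch below illustrates the degeneration and the AVL balance condition; it does not implement AVL rotations. It inserts keys into a plain BST (iteratively, to avoid deep recursion) and checks the condition |h(left) - h(right)| <= 1 at every node. The function names are hypothetical, and `is_avl_balanced` recomputes heights naively (quadratic time), which is fine for a demonstration.

```python
class Node:
    def __init__(self, key):
        self.key, self.left, self.right = key, None, None


def insert(root, key):
    """Plain (unbalanced) BST insert, done iteratively."""
    new = Node(key)
    if root is None:
        return new
    cur = root
    while True:
        if key < cur.key:
            if cur.left is None:
                cur.left = new
                return root
            cur = cur.left
        else:
            if cur.right is None:
                cur.right = new
                return root
            cur = cur.right


def height(node):
    # Empty tree has height 0; a single node has height 1
    if node is None:
        return 0
    return 1 + max(height(node.left), height(node.right))


def is_avl_balanced(node):
    # AVL condition: at every node the subtree heights differ by at most 1
    if node is None:
        return True
    return (abs(height(node.left) - height(node.right)) <= 1
            and is_avl_balanced(node.left)
            and is_avl_balanced(node.right))
```

Inserting the already sorted keys 1..50 produces a tree of height 50 (a linked list in disguise), while inserting 4, 2, 6, 1, 3, 5, 7 produces a perfectly balanced tree of height 3.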

B tree & B+ tree

In a computer system, storage is organized as a hierarchy in which speed and capacity trade off against each other: the fastest (and smallest) are the CPU registers, then the cache, then main memory, then external disk, and so on. Data exchange between different levels of storage (such as between memory and disk) is called I/O. I/O is time-consuming, so it should be avoided as much as possible.

The B-tree (B-Tree; the hyphen is not a minus sign) was introduced to solve this problem. A B-tree is a multi-way search tree rather than a binary one: the root, unless it is a leaf, has at least two children, and the other internal nodes typically have many more. As a result, its height is much lower than that of a balanced binary tree. Generally speaking, each level a search descends in a balanced binary tree costs one disk I/O; taking 1 GB of data as an example, reading a record takes about 30 disk I/Os on average (a balanced binary tree over roughly 2^30 keys is about 30 levels deep). In a B-tree, each level descended reads a whole node containing many keys in a single I/O, so far fewer levels are needed. This makes the B-tree well suited to disk-based reading and writing.
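The arithmetic behind that comparison can be checked quickly. This sketch assumes about 2^30 keys (matching the 1 GB example) and a hypothetical fan-out of 1024 children per B-tree node; it counts the levels of a full tree, treating each level as one disk I/O as the text does.

```python
def levels_needed(n_keys, fanout):
    """Smallest h with fanout**h >= n_keys: the height of a full tree whose
    nodes each branch `fanout` ways (integer arithmetic, no float log)."""
    h, capacity = 0, 1
    while capacity < n_keys:
        capacity *= fanout
        h += 1
    return h


n = 2 ** 30                       # assumption: about 1 G keys, as in the 1 GB example
print(levels_needed(n, 2))        # balanced binary tree: prints 30
print(levels_needed(n, 1024))     # B-tree with 1024-way nodes (hypothetical): prints 3
```

Real B-tree nodes are not always full, so actual heights are somewhat larger, but the gap between tens of levels and a handful of levels is the point.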

A B-tree is a balanced multi-way search tree designed for disks and other storage devices: compared with a binary tree, each internal node of a B-tree has many branches. It is very similar to the red-black tree described below, but it does better at reducing disk I/O operations. Many database systems use B-trees or variants of them (such as B+ trees) to store information.
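A search-only sketch of a B-tree: each node holds a sorted list of keys and, if it is internal, one more child than it has keys. Each loop iteration models reading one node (one disk page) and descending one level. Insertion with node splitting is omitted; `BTreeNode` and `btree_search` are hypothetical names, while `bisect_left` is from Python's standard `bisect` module.

```python
from bisect import bisect_left


class BTreeNode:
    def __init__(self, keys, children=None):
        self.keys = keys                  # sorted keys stored in this node
        self.children = children or []    # len(keys) + 1 children, or [] for a leaf


def btree_search(node, key):
    """Return True if key is in the tree rooted at node."""
    while node is not None:
        i = bisect_left(node.keys, key)              # binary search inside the node
        if i < len(node.keys) and node.keys[i] == key:
            return True
        if not node.children:                        # reached a leaf: not found
            return False
        node = node.children[i]                      # keys[i-1] < key < keys[i]
    return False


root = BTreeNode([10, 20], [
    BTreeNode([2, 5, 7]),
    BTreeNode([12, 15]),
    BTreeNode([25, 30, 40]),
])
```

Looking up 15 reads only two nodes: the root (where 15 falls between 10 and 20) and the middle leaf.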

R tree

The R-tree addresses the problem of spatial storage and has achieved remarkable results in fields such as databases; it handles search in high-dimensional space very well. Here is a real-world problem an R-tree can solve: find all restaurants within 20 miles. What would you do without an R-tree? Normally, you would store each restaurant's coordinates (x, y) in two database fields, one recording the longitude and the other the latitude. You would then have to scan all restaurants, fetch their locations, and calculate whether each one meets the requirement. With 100 restaurants in an area, that means 100 location calculations. For a very large database such as Google Maps, this approach is clearly not feasible.

The R-tree extends the idea of the B-tree to multi-dimensional space: like a B-tree, it partitions the space, and during insertion and deletion it splits and merges nodes to keep the tree balanced. The R-tree is therefore a balanced tree for storing high-dimensional data.

Each leaf node of an R-tree contains multiple pointers to data, which may be stored on disk or in memory. Internal nodes record the minimum bounding rectangle of everything beneath them, so a high-dimensional query only descends into nodes whose rectangles overlap the query region, and in the end it only needs to traverse the pointers in a few leaf nodes and check whether the data they point to meet the requirement. This way we get the answer without scanning all the data, and efficiency improves significantly.
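The pruning idea can be shown with a deliberately simplified, single-level structure: a list of leaves, each with the minimum bounding rectangle (MBR) of its entries. This is not a full R-tree (there is no hierarchy of internal nodes, no insertion heuristic and no node splitting), and it uses a rectangular query instead of a 20-mile radius; an exact distance check could be applied to the returned candidates. All names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Rect:
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    def intersects(self, other):
        return (self.xmin <= other.xmax and other.xmin <= self.xmax and
                self.ymin <= other.ymax and other.ymin <= self.ymax)

    def contains_point(self, x, y):
        return self.xmin <= x <= self.xmax and self.ymin <= y <= self.ymax


def mbr(entries):
    """Minimum bounding rectangle of a list of (name, x, y) entries."""
    xs = [e[1] for e in entries]
    ys = [e[2] for e in entries]
    return Rect(min(xs), min(ys), max(xs), max(ys))


def range_query(leaves, query):
    """leaves: list of (Rect, entries). A leaf whose MBR misses the query is
    skipped entirely. Returns (matching names, number of entries examined)."""
    found, examined = [], 0
    for box, entries in leaves:
        if not box.intersects(query):
            continue                      # prune the whole leaf
        for name, x, y in entries:
            examined += 1
            if query.contains_point(x, y):
                found.append(name)
    return found, examined


west = [("A", 1, 1), ("B", 2, 3), ("C", 3, 2)]
east = [("D", 50, 50), ("E", 52, 51), ("F", 55, 49)]
leaves = [(mbr(west), west), (mbr(east), east)]
```

A query for the rectangle (0, 0) to (4, 4) examines only the three western entries; the eastern leaf is rejected by a single rectangle test.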

Red black tree

The RB tree, short for Red-Black Tree, is a special kind of binary search tree. Each node of a red-black tree has an extra storage bit indicating its color, which can be red or black.
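The color bit only matters together with the standard red-black rules, which the paragraph above does not list: the root is black, the empty (NIL) leaves count as black, a red node never has a red child, and every path from a node down to a NIL leaf passes through the same number of black nodes. These rules keep the height O(log n). The sketch below is a validator for the color rules only (it does not check key order or perform insertions with rotations); the names are hypothetical.

```python
RED, BLACK = "red", "black"


class RBNode:
    def __init__(self, key, color, left=None, right=None):
        self.key, self.color = key, color
        self.left, self.right = left, right


def black_height(node):
    """Return the black-height of the subtree, or -1 if a rule is violated.
    None stands for the black NIL leaves."""
    if node is None:
        return 1
    if node.color == RED:
        for child in (node.left, node.right):
            if child is not None and child.color == RED:
                return -1                 # a red node with a red child
    lh, rh = black_height(node.left), black_height(node.right)
    if lh == -1 or rh == -1 or lh != rh:
        return -1                         # paths with different black counts
    return lh + (1 if node.color == BLACK else 0)


def is_red_black(root):
    # An empty tree is valid; otherwise the root must be black
    return root is None or (root.color == BLACK and black_height(root) != -1)
```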

Red-black trees are widely used, mainly to store ordered data; their operations take O(log n) time, which is very efficient. Examples include TreeSet and TreeMap in the Java collections and set and map in the C++ STL. In firewall systems that require dynamic rules, using red-black trees instead of hash tables has proved more scalable. The Linux kernel uses red-black trees to keep track of memory regions when managing vm_area_struct (virtual memory areas).

For ordinary linked lists, arrays, trees, and graphs, each dynamic operation completely discards the previous state and produces a brand-new state; such a data structure is called an ephemeral structure. Another kind of data structure keeps the state at earlier points in time, so that data can be accessed by version plus target key; this is called a persistent structure. Red-black trees can in fact be made to record historical versions in this way.
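A common way to make a tree persistent is path copying: an update copies only the nodes on the path from the root to the change and shares every other subtree with the previous version. For brevity this sketch swaps in a plain, unbalanced BST instead of a red-black tree (a persistent red-black tree applies the same path copying while also recoloring and rotating); `PNode`, `insert` and `keys` are hypothetical names.

```python
from typing import NamedTuple, Optional


class PNode(NamedTuple):
    """Immutable BST node; changing a field means building a new node."""
    key: int
    left: Optional["PNode"]
    right: Optional["PNode"]


def insert(node, key):
    """Return a NEW root. Only nodes on the search path are copied;
    all other subtrees are shared with the previous version."""
    if node is None:
        return PNode(key, None, None)
    if key < node.key:
        return node._replace(left=insert(node.left, key))
    if key > node.key:
        return node._replace(right=insert(node.right, key))
    return node                           # key already present: reuse this version


def keys(node):
    if node is None:
        return []
    return keys(node.left) + [node.key] + keys(node.right)


versions = [None]                         # version 0: the empty tree
for k in [5, 2, 8, 6]:
    versions.append(insert(versions[-1], k))
```

Every earlier version stays readable: `keys(versions[2])` still gives [2, 5] after later inserts, and versions 3 and 4 share the same left subtree object.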

The biggest difference between a B-tree and a red-black tree is that a B-tree node can have many children, from a few to several thousand. Why, then, are B-trees said to be similar to red-black trees? Because, like a red-black tree, a B-tree with n nodes has height O(log n), although its actual height can be much smaller than a red-black tree's because its branching factor is large. Therefore a B-tree can also carry out dynamic-set operations such as insertion and deletion in O(log n) time.

4. What is a heap (big-root heap, small-root heap)

The "heap" mentioned here is a data structure; do not confuse it with the heap memory area in the JVM. A heap must satisfy two conditions: (1) it is a complete binary tree; (2) the values stored in it are partially ordered (a partial order relates only some pairs of elements by the relation R, while a total order relates every pair of elements in the set).

Big-root heap (max-heap): every parent node's value is greater than or equal to the values of its child nodes. Small-root heap (min-heap): every parent node's value is less than or equal to the values of its child nodes.
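Because a heap is a complete binary tree, it is usually stored in a plain array: the children of index i sit at 2*i + 1 and 2*i + 2. The checkers below (hypothetical helper names) test the big-root and small-root conditions in that layout; only parent/child pairs are compared, which is exactly the partial order mentioned above. Python's standard `heapq` module maintains a small-root heap inside a list.

```python
import heapq


def is_max_heap(a):
    """Big-root heap check: every parent >= each of its children."""
    n = len(a)
    return all(a[i] >= a[c]
               for i in range(n)
               for c in (2 * i + 1, 2 * i + 2) if c < n)


def is_min_heap(a):
    """Small-root heap check: every parent <= each of its children."""
    n = len(a)
    return all(a[i] <= a[c]
               for i in range(n)
               for c in (2 * i + 1, 2 * i + 2) if c < n)


data = [7, 2, 9, 4, 1]
heapq.heapify(data)               # rearranges the list into a min-heap in place
smallest = heapq.heappop(data)    # smallest == 1; data is still a min-heap
```

For example, [9, 7, 8, 3, 5] is a valid big-root heap even though the siblings 7 and 8 are not ordered relative to each other.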