18 pictures: understand 8 common data structures in one article

A few days ago, I chatted with Ao Bing. He said that writers like us are constantly burning ourselves out, so we need to keep refueling. I couldn't agree more, so I started frantically brushing up on computer fundamentals, including data structures, which have always been a relatively weak spot for me.

Baidu Baike defines a data structure as a collection of data elements that have one or more specific relationships with each other. The definition is very abstract; you have to read it aloud several times to get a feel for it. How can that feeling be made stronger and more concrete? Let me walk through 8 common data structures: arrays, linked lists, stacks, queues, trees, heaps, graphs, and hash tables.

What are the differences between these 8 data structures?

①, array

Advantages:

  • Looking up an element by index is very fast;
  • It is also convenient to traverse the array by index.

Disadvantages:

  • The size of an array is fixed at creation and cannot be expanded;
  • An array can only hold elements of one type;
  • Adding and deleting elements is time-consuming because the other elements have to be shifted.
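To make the trade-off concrete, here is a minimal Java sketch (the class `ArrayDemo` and the helper `insertAt` are hypothetical names for illustration): reading by index is a single O(1) access, while inserting into a fixed-size array means allocating a new array and shifting every later element one slot to the right.

```java
import java.util.Arrays;

class ArrayDemo {
    // Inserts value at index by copying into a new array one slot larger,
    // shifting the elements at index..end one position to the right (O(n)).
    static int[] insertAt(int[] arr, int index, int value) {
        int[] result = new int[arr.length + 1];
        System.arraycopy(arr, 0, result, 0, index);
        result[index] = value;
        System.arraycopy(arr, index, result, index + 1, arr.length - index);
        return result;
    }

    public static void main(String[] args) {
        int[] arr = {10, 20, 30, 40};
        // Random access by index: O(1)
        System.out.println(arr[2]); // 30
        // A fixed-size array cannot grow, so insertion needs a new array
        int[] bigger = insertAt(arr, 1, 15);
        System.out.println(Arrays.toString(bigger)); // [10, 15, 20, 30, 40]
    }
}
```

Java's `ArrayList` hides this copying behind `add(index, element)`, but the cost of shifting the later elements is the same.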

②, linked list

The book "Algorithms (4th Edition)" defines the linked list as follows:

A linked list is a recursive data structure that is either empty (null) or a reference to a node. The node holds an element and a reference to another linked list.

Java's LinkedList class vividly expresses the structure of a linked list in code (simplified excerpt):

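public class LinkedList<E> {
    // Reference to the first node (head); null when the list is empty
    transient Node<E> first;
    // Reference to the last node (tail); null when the list is empty
    transient Node<E> last;

    // Each node stores one element plus links to both of its neighbors
    private static class Node<E> {
        E item;        // the element stored in this node
        Node<E> next;  // the following node, or null for the last node
        Node<E> prev;  // the preceding node, or null for the first node

        Node(Node<E> prev, E element, Node<E> next) {
            this.item = element;
            this.next = next;
            this.prev = prev;
        }
    }
}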

This is a doubly linked list: each node has both a prev and a next reference, except that the prev of first is null and the next of last is null. A singly linked list has only next and no prev.

The disadvantage of a singly linked list is that it can only be traversed in one direction, from beginning to end. A doubly linked list can move forward or backward, finding the next node as well as the previous one, at the cost of one extra reference per node.

The data in a linked list is stored in a "chain", so its nodes do not need to occupy contiguous memory, whereas an array must occupy one contiguous block.

Because nodes need not be stored sequentially, a linked list can insert or delete in O(1) time once the position is known: only the references need to be redirected, and no other elements have to be moved as in an array. A linked list also overcomes the array's need to know the data size in advance, which allows flexible, dynamic memory management.

Advantages:

  • No initial capacity needs to be specified;
  • Any element can be added;
  • Inserting or deleting only requires updating references.

Disadvantages:

  • Every node carries extra references, so it takes up more memory;
  • Finding an element requires walking the list node by node from the head, which is time-consuming.
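The following is a minimal singly linked list in Java, written for illustration (the class `SinglyLinkedList` and its methods are hypothetical, not part of the JDK). It shows both sides of the lists above: adding or removing at the head only rewires references, while finding an element means walking node by node from the head.

```java
class SinglyLinkedList<E> {
    private static class Node<E> {
        E item;        // the element stored in this node
        Node<E> next;  // the following node, or null at the end
        Node(E item, Node<E> next) {
            this.item = item;
            this.next = next;
        }
    }

    private Node<E> head;
    private int size;

    // O(1): no elements move; the new node just points at the old head
    void addFirst(E item) {
        head = new Node<>(item, head);
        size++;
    }

    // O(1): unlink the head by redirecting a single reference
    E removeFirst() {
        if (head == null) throw new java.util.NoSuchElementException();
        E item = head.item;
        head = head.next;
        size--;
        return item;
    }

    // O(n): the only way to find an element is to walk from the head
    int indexOf(E target) {
        int i = 0;
        for (Node<E> n = head; n != null; n = n.next, i++) {
            if (java.util.Objects.equals(n.item, target)) return i;
        }
        return -1;
    }

    int size() { return size; }

    public static void main(String[] args) {
        SinglyLinkedList<String> list = new SinglyLinkedList<>();
        list.addFirst("c");
        list.addFirst("b");
        list.addFirst("a");                     // list is now a -> b -> c
        System.out.println(list.indexOf("c"));  // 2
        System.out.println(list.removeFirst()); // a
        System.out.println(list.size());        // 2
    }
}
```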

③, stack

A stack is like a bucket: the bottom is sealed and the top is open, so water can only go in and come out through the top. Anyone who has used a bucket understands this: the water poured in first sits at the bottom and the water poured in later sits on top; when you pour it out, the later water comes out first and the earlier water comes out last.

In the same way, a stack stores data on a "last in, first out" basis, or equivalently "first in, last out". The data inserted first is pushed to the bottom of the stack, and the data inserted later sits on top. Reading starts from the top of the stack and proceeds downward in order.
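In Java, a stack is commonly modeled with the `Deque` interface, whose Javadoc recommends it over the legacy `java.util.Stack` class. The sketch below (the class `StackDemo` and the method `reverse` are illustrative names) uses `ArrayDeque` to show last-in, first-out order: characters pushed in one order pop out reversed.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class StackDemo {
    // Pushing characters and popping them back reverses their order (LIFO)
    static String reverse(String s) {
        Deque<Character> stack = new ArrayDeque<>();
        for (char c : s.toCharArray()) {
            stack.push(c);           // each push lands on top
        }
        StringBuilder sb = new StringBuilder();
        while (!stack.isEmpty()) {
            sb.append(stack.pop());  // pop always takes the current top
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Deque<Integer> stack = new ArrayDeque<>();
        stack.push(1);               // bottom
        stack.push(2);
        stack.push(3);               // top
        System.out.println(stack.peek());   // 3: the last element pushed
        System.out.println(reverse("abc")); // cba
    }
}
```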

④, queue

A queue is like a section of water pipe open at both ends: water enters from one end and comes out of the other. The water that goes in first comes out first, and the water that goes in later comes out later.

Unlike a water pipe, a queue gives its two ends names: one is called the head and the other the tail. Deletion (dequeue) is only allowed at the head, and insertion (enqueue) is only allowed at the tail.

Note that a stack is first-in-last-out while a queue is first-in-first-out. Although both are linear lists, they follow different principles and have different structures.
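A queue can also be modeled with `ArrayDeque`, this time through the `Queue` interface: `offer` inserts at the tail and `poll` removes from the head. In this sketch (the class `QueueDemo` and the method `fifoOrder` are illustrative names), elements come out in exactly the order they went in, the opposite of a stack.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

class QueueDemo {
    // Enqueue every item at the tail, then dequeue from the head:
    // items come out in the same order they went in (FIFO)
    static List<Integer> fifoOrder(int... items) {
        Queue<Integer> queue = new ArrayDeque<>();
        for (int item : items) {
            queue.offer(item);        // insert at the tail
        }
        List<Integer> out = new ArrayList<>();
        while (!queue.isEmpty()) {
            out.add(queue.poll());    // remove from the head
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(fifoOrder(1, 2, 3)); // [1, 2, 3]
    }
}
```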

⑤, tree

A tree is a typical nonlinear structure: a hierarchical collection made up of a finite number n (n > 0) of nodes.

It is called a "tree" because it looks like an upside-down tree, with the root at the top and the leaves at the bottom. A tree has the following characteristics:

  • Each node has a finite number of child nodes, or none;
  • The node without a parent is called the root node;
  • Every non-root node has exactly one parent node;
  • Apart from the root, the remaining nodes can be divided into multiple disjoint subtrees.

The following figure shows some tree terminology:

The root node is at level 0, its children are at level 1, their children are at level 2, and so on.

  • Depth: for any node n, the depth of n is the length of the unique path from the root to n; the root has depth 0.
  • Height: for any node n, the height of n is the length of the longest path from n down to a leaf; every leaf has height 0.
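The depth and height definitions above translate directly into recursion. This is a minimal sketch assuming a general tree in which each node keeps a list of its children (the `TreeNode` class and its methods are hypothetical): height takes the longest path down to a leaf, and depth follows the unique path from the root.

```java
import java.util.ArrayList;
import java.util.List;

class TreeNode {
    final String name;
    final List<TreeNode> children = new ArrayList<>();

    TreeNode(String name) { this.name = name; }

    // Attaches child under this node and returns the child for chaining
    TreeNode add(TreeNode child) {
        children.add(child);
        return child;
    }

    // Height: length of the longest downward path to a leaf (a leaf has height 0)
    int height() {
        int h = 0;
        for (TreeNode c : children) {
            h = Math.max(h, 1 + c.height());
        }
        return h;
    }

    // Depth: length of the unique path from root to target (root has depth 0),
    // or -1 if target is not in the subtree rooted at root
    static int depth(TreeNode root, TreeNode target) {
        if (root == target) return 0;
        for (TreeNode c : root.children) {
            int d = depth(c, target);
            if (d >= 0) return d + 1;
        }
        return -1;
    }

    public static void main(String[] args) {
        TreeNode a = new TreeNode("A");        // root, level 0
        TreeNode b = a.add(new TreeNode("B"));
        TreeNode c = a.add(new TreeNode("C"));
        TreeNode d = b.add(new TreeNode("D")); // level 2
        System.out.println(depth(a, d)); // 2
        System.out.println(a.height());  // 2
        System.out.println(c.height());  // 0: C is a leaf
    }
}
```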

There are many types of trees; the common ones are:

  • Unordered tree: there is no order among the children of any node in the tree. So how should an unordered tree be understood, and what does it look like?

If there are three nodes, one parent and two children on the same level, there are three possible cases:

If there are three nodes, one parent and two children on different levels, there are six possible cases:

In total, three nodes can form nine (3 + 6) distinct unordered trees.

  • Binary tree: each node has at most two subtrees. Binary trees can be further divided into several types according to their shape.

Complete binary tree: suppose a binary tree has depth d (d > 1). If every layer except layer d holds the maximum possible number of nodes, and all nodes on layer d are packed contiguously from left to right, the tree is called a complete binary tree.

Taking the figure above as an example, d is 3. Every layer above the last one is full (each node there has 2 children), and the nodes on the last layer (H, I, J, K, L) are packed contiguously from left to right, so the tree meets the requirements of a complete binary tree.

Full binary tree: there are two common definitions. The first, shown in the figure below, requires every layer to be full: each non-leaf node has the maximum of 2 children and all leaves are on the bottom layer.

The second, as in the following figure, does not require every layer to be full; it only requires every node to have either no children or the maximum of 2.

Binary search tree (BST): a binary tree that must satisfy the following conditions:

  • If the left subtree of any node is not empty, the values of all nodes in the left subtree are less than the value of that node;
  • If the right subtree of any node is not empty, the values of all nodes in the right subtree are greater than the value of that node;
  • The left and right subtrees of any node are also binary search trees.

Thanks to these properties, a binary search tree has an advantage over many other data structures: search and insertion take O(log n) time on average, although a badly unbalanced tree degrades to O(n). To find 5 in the figure above, start at the root 7: 5 is less than 7, so go left to 4; 5 is greater than 4, so go right to 6; 5 is less than 6, so go left, and there it is.

Ideally, each comparison in a BST halves the number of nodes that still need to be checked.
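Below is a minimal BST sketch in Java (the `Bst` class and its `searchPath` method are hypothetical). Since the figure is not reproduced here, the keys 7, 4, 6 and 5 are inserted in an order that recreates the search walk described above, while 9 and 2 are made-up filler; `searchPath` records every node compared on the way to the target.

```java
import java.util.ArrayList;
import java.util.List;

class Bst {
    private static class Node {
        final int key;
        Node left, right;
        Node(int key) { this.key = key; }
    }

    private Node root;

    // Smaller keys go left, larger keys go right; duplicates are ignored
    void insert(int key) {
        root = insert(root, key);
    }

    private static Node insert(Node node, int key) {
        if (node == null) return new Node(key);
        if (key < node.key) node.left = insert(node.left, key);
        else if (key > node.key) node.right = insert(node.right, key);
        return node;
    }

    // Returns the keys visited while searching; each comparison discards one subtree
    List<Integer> searchPath(int key) {
        List<Integer> path = new ArrayList<>();
        Node node = root;
        while (node != null) {
            path.add(node.key);
            if (key == node.key) return path;
            node = key < node.key ? node.left : node.right;
        }
        return path; // key not found; the path ends at the last node checked
    }

    public static void main(String[] args) {
        Bst tree = new Bst();
        for (int k : new int[] {7, 4, 9, 2, 6, 5}) tree.insert(k);
        System.out.println(tree.searchPath(5)); // [7, 4, 6, 5]
    }
}
```

Inserting already-sorted keys (1, 2, 3, ...) into this tree produces a chain, which is exactly the degenerate O(n) case that balanced trees are designed to avoid.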

Balanced binary tree: a binary tree in which, for every node, the heights of the two subtrees differ by at most 1. The height-balanced binary tree proposed in 1962 by the Soviet mathematicians Adelson-Velsky and Landis is also called the AVL tree, after their names.

A balanced binary tree is essentially a binary search tree with an extra constraint that keeps it from tilting and degenerating toward a linear structure: the height difference between the left and right subtrees of each node is called the balance factor, and the absolute value of every node's balance factor must be no greater than 1.

The difficulty with balanced binary trees lies in maintaining this left-right balance through left or right rotations when nodes are added or deleted.
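As a hedged illustration of one such rotation, here is a right rotation in Java (the `AvlRotation` class and its helpers are hypothetical names). It uses the height convention from the tree section, where a leaf has height 0 (an empty subtree is treated as -1), and fixes the left-leaning chain produced by inserting 3, 2, 1 into a plain BST. A full AVL insertion would also need the mirror-image left rotation and the double-rotation cases.

```java
class AvlRotation {
    static class Node {
        int key;
        int height = 0;      // a leaf has height 0
        Node left, right;
        Node(int key) { this.key = key; }
    }

    // An empty subtree counts as height -1, so a leaf comes out as 0
    static int height(Node n) { return n == null ? -1 : n.height; }

    static void update(Node n) {
        n.height = 1 + Math.max(height(n.left), height(n.right));
    }

    // Balance factor = height(left) - height(right); AVL requires |bf| <= 1
    static int balanceFactor(Node n) { return height(n.left) - height(n.right); }

    // Right rotation: fixes a left-left imbalance by lifting the left child
    //       y            x
    //      / \          / \
    //     x   C   ->   A   y
    //    / \              / \
    //   A   B            B   C
    static Node rotateRight(Node y) {
        Node x = y.left;
        y.left = x.right;   // subtree B moves under y
        x.right = y;
        update(y);          // y is now lower, so recompute it first
        update(x);
        return x;           // x is the new subtree root
    }

    public static void main(String[] args) {
        // Inserting 3, 2, 1 into a plain BST yields a left-leaning chain
        Node n3 = new Node(3), n2 = new Node(2), n1 = new Node(1);
        n3.left = n2;
        n2.left = n1;
        update(n1);
        update(n2);
        update(n3);
        System.out.println(balanceFactor(n3)); // 2: violates the AVL rule
        Node root = rotateRight(n3);
        System.out.println(root.key + " " + balanceFactor(root)); // 2 0
    }
}
```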

The balanced binary tree most commonly seen in Java is the red-black tree (java.util.TreeMap, for example, is built on one). Every node is red or black, and the tree is kept approximately balanced through color constraints:

1) Every node is either red or black.

2) The root node is black.

3) Every leaf node (a NIL node, i.e., an empty node) is black.

4) If a node is red, both of its children are black. In other words, two red nodes can never be adjacent on a path.

5) Every path from any node down to its descendant leaves contains the same number of black nodes.
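Rules 2 to 5 can be checked mechanically. The sketch below is a hypothetical validator (`RedBlackCheck` is an illustrative name), not an insertion algorithm and not the code behind Java's `TreeMap`. It assumes null children stand in for the black NIL leaves of rule 3 and counts each of them as one black node; a return value of -1 from `blackHeight` means rule 4 or rule 5 is broken.

```java
class RedBlackCheck {
    static final boolean RED = true, BLACK = false;

    static class Node {
        int key;
        boolean color;
        Node left, right;   // null children play the role of black NIL leaves (rule 3)
        Node(int key, boolean color) { this.key = key; this.color = color; }
    }

    static boolean isRed(Node n) { return n != null && n.color == RED; }

    // Returns the number of black nodes on every path from n down to a NIL leaf,
    // or -1 if rule 4 (no red node has a red child) or rule 5 (equal black counts) fails
    static int blackHeight(Node n) {
        if (n == null) return 1;   // NIL leaves count as black
        if (n.color == RED && (isRed(n.left) || isRed(n.right))) return -1;
        int left = blackHeight(n.left);
        int right = blackHeight(n.right);
        if (left == -1 || right == -1 || left != right) return -1;
        return left + (n.color == BLACK ? 1 : 0);
    }

    // Rule 1 holds by construction (the color is a boolean); rule 2 needs a black root
    static boolean isValid(Node root) {
        return !isRed(root) && blackHeight(root) != -1;
    }

    public static void main(String[] args) {
        Node root = new Node(2, BLACK);
        root.left = new Node(1, RED);
        root.right = new Node(3, RED);
        System.out.println(isValid(root)); // true
        root.left.left = new Node(0, RED); // red node with a red child
        System.out.println(isValid(root)); // false
    }
}
```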

  • B-tree: a self-balancing search tree optimized for reading and writing large blocks of data. It keeps data sorted, and unlike the trees above it is not binary: each node can hold several keys and have more than two subtrees. B-trees are used in database index technology.

⑥, heap

A heap can be regarded as a tree stored in an array, with the following characteristics:

  • The value of every node is always no greater than (or, in the other variant, always no less than) the value of its parent node;
  • A heap is always a complete binary tree.

A heap whose root holds the largest value is called a max heap (large-root heap), and one whose root holds the smallest value is called a min heap (small-root heap).
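Because a heap is a complete binary tree, it can be stored in an array with no gaps: for index i, the children sit at 2i+1 and 2i+2 and the parent at (i-1)/2. This minimal min-heap sketch (the `MinHeap` class is hypothetical) restores the heap property by sifting a new value up after insertion, and by sifting the relocated last leaf down after the root is removed.

```java
import java.util.ArrayList;
import java.util.List;

class MinHeap {
    // Complete binary tree stored level by level in a list:
    // the children of index i are at 2i+1 and 2i+2, its parent at (i-1)/2
    private final List<Integer> a = new ArrayList<>();

    void push(int value) {
        a.add(value);                                     // next free leaf
        int i = a.size() - 1;
        while (i > 0 && a.get(i) < a.get((i - 1) / 2)) {  // sift up
            swap(i, (i - 1) / 2);
            i = (i - 1) / 2;
        }
    }

    int pop() {
        if (a.isEmpty()) throw new java.util.NoSuchElementException();
        int min = a.get(0);                   // the root is the smallest value
        int last = a.remove(a.size() - 1);
        if (!a.isEmpty()) {
            a.set(0, last);                   // move the last leaf to the root
            int i = 0;
            while (true) {                    // sift down
                int l = 2 * i + 1, r = l + 1, smallest = i;
                if (l < a.size() && a.get(l) < a.get(smallest)) smallest = l;
                if (r < a.size() && a.get(r) < a.get(smallest)) smallest = r;
                if (smallest == i) break;
                swap(i, smallest);
                i = smallest;
            }
        }
        return min;
    }

    boolean isEmpty() { return a.isEmpty(); }

    private void swap(int i, int j) {
        int t = a.get(i);
        a.set(i, a.get(j));
        a.set(j, t);
    }

    public static void main(String[] args) {
        MinHeap heap = new MinHeap();
        for (int v : new int[] {5, 3, 8, 1, 4}) heap.push(v);
        StringBuilder out = new StringBuilder();
        while (!heap.isEmpty()) out.append(heap.pop()).append(' ');
        System.out.println(out.toString().trim()); // 1 3 4 5 8
    }
}
```

In everyday Java code, `java.util.PriorityQueue` provides a ready-made min heap under natural ordering.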

⑦, graph

A graph is a complex nonlinear structure made up of a finite, non-empty set of vertices and a set of edges between those vertices. It is usually written G(V, E), where G denotes the graph, V is the set of vertices in G, and E is the set of edges in G.

The above figure has 4 vertices V0, V1, V2, V3, and there are 5 edges between the 4 vertices.
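One common way to store G(V, E) in code is an adjacency list, where each vertex keeps a list of its neighbors. Since the figure's edges are not listed in the text, the five edges in this sketch are hypothetical, chosen only to match the count of 4 vertices and 5 edges; the `Graph` class is also an illustrative name, and it treats edges as undirected.

```java
import java.util.ArrayList;
import java.util.List;

class Graph {
    // Adjacency list: adj.get(v) holds the neighbors of vertex v
    private final List<List<Integer>> adj = new ArrayList<>();
    private int edgeCount;

    Graph(int vertexCount) {
        for (int v = 0; v < vertexCount; v++) adj.add(new ArrayList<>());
    }

    // Undirected edge: record it in both endpoints' neighbor lists
    void addEdge(int u, int w) {
        adj.get(u).add(w);
        adj.get(w).add(u);
        edgeCount++;
    }

    boolean hasEdge(int u, int w) { return adj.get(u).contains(w); }
    int vertexCount() { return adj.size(); }
    int edgeCount() { return edgeCount; }

    public static void main(String[] args) {
        // V = {V0, V1, V2, V3}; these five edges are hypothetical
        Graph g = new Graph(4);
        g.addEdge(0, 1);
        g.addEdge(0, 2);
        g.addEdge(0, 3);
        g.addEdge(1, 2);
        g.addEdge(2, 3);
        System.out.println(g.vertexCount() + " vertices, " + g.edgeCount() + " edges");
        System.out.println(g.hasEdge(1, 3)); // false: V1 and V3 are not adjacent here
    }
}
```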

In a linear structure, data elements have a single linear relationship: every element except the first and the last has exactly one predecessor and one successor;

In a tree structure, data elements have a clear hierarchical relationship: each element is related to only one element on the layer above (its parent) and possibly several elements on the layer below (its children);

In a graph structure, the relationships between nodes are arbitrary: any two elements in the graph may be related.

⑧, hash table

A hash table, also called a hash map, is a data structure that accesses values directly by key. Its biggest strength is fast lookup, insertion, and deletion.

An array is easy to search by index but costly to insert into and delete from; a linked list, on the contrary, is slow to search but easy to insert into and delete from. A hash table combines the strengths of both, and Java's HashMap goes a step further by also borrowing the strengths of trees.

The hash function plays a key role in a hash table. It maps an input of any length to a fixed-length output, called the hash value, so that the location of a data element can be computed directly and found quickly.

If the key is k, its value is stored at location hash(k). The value for k can therefore be obtained directly, without traversal.

For a good hash function (cryptographic hash functions in particular), two different inputs are extremely unlikely to produce the same hash value; that is, given one input, it is extremely difficult to find another with the same hash value. Moreover, changing even a single bit of the input changes the hash value drastically. This is what makes hashing valuable!

Although unlikely, collisions do still happen. When they do, Java's HashMap chains the colliding entries in a linked list at the same array slot, and if that list grows longer than 8 it is converted into a red-black tree (provided the table has at least 64 buckets; otherwise the table is resized instead). This is the so-called zipper method, also known as separate chaining (array + linked list).
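The zipper method can be sketched in a few dozen lines of Java. This is a simplified, hypothetical `ChainedHashTable`: it keeps a fixed number of buckets and never resizes or converts chains into red-black trees the way `HashMap` does. Each key goes to bucket `hash(k) mod capacity`, and colliding keys share that bucket's linked list.

```java
import java.util.LinkedList;
import java.util.Objects;

class ChainedHashTable<K, V> {
    // One entry in a chain: a key and its (mutable) value
    private static class Entry<K, V> {
        final K key;
        V value;
        Entry(K key, V value) { this.key = key; this.value = value; }
    }

    // Each bucket is a linked list of entries whose keys map to that index
    private final LinkedList<Entry<K, V>>[] buckets;

    @SuppressWarnings("unchecked")
    ChainedHashTable(int capacity) {
        buckets = new LinkedList[capacity];
        for (int i = 0; i < capacity; i++) buckets[i] = new LinkedList<>();
    }

    // hash(k) -> array index; floorMod keeps negative hash codes in range
    private int indexFor(K key) {
        return Math.floorMod(Objects.hashCode(key), buckets.length);
    }

    void put(K key, V value) {
        LinkedList<Entry<K, V>> bucket = buckets[indexFor(key)];
        for (Entry<K, V> e : bucket) {
            if (Objects.equals(e.key, key)) { e.value = value; return; } // overwrite
        }
        bucket.add(new Entry<>(key, value)); // new key: append to the chain
    }

    V get(K key) {
        for (Entry<K, V> e : buckets[indexFor(key)]) {
            if (Objects.equals(e.key, key)) return e.value;
        }
        return null;
    }

    public static void main(String[] args) {
        // Capacity 4 is deliberately tiny so that collisions can occur
        ChainedHashTable<String, Integer> table = new ChainedHashTable<>(4);
        table.put("apple", 1);
        table.put("banana", 2);
        table.put("cherry", 3);
        table.put("apple", 10);
        System.out.println(table.get("apple"));  // 10
        System.out.println(table.get("durian")); // null
    }
}
```

Whichever keys end up sharing a bucket, `get` still finds the right one by walking that bucket's chain and comparing keys with `equals`.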

To be honest, at this pace I feel like I'm on track to go bald, but if it makes me stronger, it's worth it. Yes, worth it. Some readers have also asked me to recommend books on algorithms and data structures. I have collected some popular ones from GitHub, and you can download them via the link below. I hope they help.

Link: https://pan.baidu.com/s/1rB-CCjjpKPidOio7Ov_0YA Password: g5pl

I am Silent King Two, a silent but interesting programmer; following me can improve your learning efficiency. If you liked this article, please don't forget the four-in-one: like, bookmark, share, and leave a comment. You are the most beautiful and the most handsome!


Origin blog.csdn.net/qing_gee/article/details/108725651