An Overview of the 8 Major Data Structures in Software Programming


Foreword

In day-to-day Java programming, data structures are easy to overlook. Most developers rely on a handful of common collection types and, as long as the business logic works, never build a systematic understanding of the structures behind them. I have recently been planning to read through the source code to deepen my understanding of the data structures commonly used in Java.


1. Why study data structures

  1. Preparing for interviews at large companies, and for promotions and raises
  2. Familiarity with data structures helps us understand source code and architecture design, including the subtleties behind design decisions
  3. Data structures help us optimize program performance
  4. Data structures are the cornerstone of software programming and apply universally. They are often described as a programmer's "internal strength": unlike a specific technology stack, they do not easily become obsolete.

2. Classification of data structures

program = algorithm + data structure

A data structure is the way a computer stores and organizes data: a collection of data elements that have one or more specific relationships with each other. Data structures can be divided into eight common kinds: array, stack, linked list, graph, hash table, queue, tree, and heap.

1. Array (Array)

The array is the most basic data structure, and many languages have built-in support for it. An array stores a group of elements of the same type in one contiguous block of memory. The number of elements it can hold is fixed when the memory is allocated.

  • Accessing the nth element of an array (random access by index) takes O(1) time, but searching an unsorted array for a given value takes O(N);
  • When inserting into or deleting from an array:
    the best case is operating at the end of the array, which takes O(1);
    the worst case is inserting or deleting the first element, which takes O(N). Inserting or deleting at an arbitrary position requires moving every subsequent element, and the number of elements moved is proportional to the array size, so the average-case time complexity is also O(N).
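
The costs above can be illustrated with a small sketch. The `FixedArray` class below is a hypothetical helper written for this article, not a JDK class; it assumes an `int` array with a fixed capacity and shifts elements with `System.arraycopy` on insert and remove.

```java
import java.util.Arrays;

// Hypothetical helper (not a JDK class): a fixed-capacity int array
// that tracks how many slots are currently in use.
class FixedArray {
    private final int[] data;
    private int size;

    FixedArray(int capacity) {
        data = new int[capacity];
    }

    // O(1): direct access by index.
    int get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        return data[index];
    }

    // O(N) in the worst case: every element from index onward moves one slot right.
    void insert(int index, int value) {
        if (size == data.length) {
            throw new IllegalStateException("array is full");
        }
        if (index < 0 || index > size) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        System.arraycopy(data, index, data, index + 1, size - index);
        data[index] = value;
        size++;
    }

    // O(N) in the worst case: every element after index moves one slot left.
    int remove(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        int removed = data[index];
        System.arraycopy(data, index + 1, data, index, size - index - 1);
        size--;
        return removed;
    }

    int size() {
        return size;
    }

    // Copy of the used portion, for inspection.
    int[] toArray() {
        return Arrays.copyOf(data, size);
    }
}
```

Inserting at `size()` moves nothing, which is the O(1) best case; inserting at index 0 moves every stored element.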

Features: lookup by index is fast, and traversing the array is easy.

Limitations:

  • Once the array size is fixed, it cannot be expanded;

  • An array can only store one type of data;

  • Adding and deleting is slow (other elements need to be moved);

Usage scenarios:
frequent lookups, small storage requirements, and few insertions and deletions.

2. Linked List

A linked list stores data in non-contiguous memory units and links them together with pointers; the pointer of the last node points to NULL. A linked list does not need a fixed-size block of storage allocated in advance. When data needs to be stored, a piece of memory is allocated and inserted into the list.

A linked list is a storage structure that is non-contiguous and non-sequential in physical memory. The logical order of the data elements is established through the pointers in the list. Each node contains two fields: a data field that stores the element, and a pointer field that points to the next node. Depending on how the pointers are arranged, linked lists can take different forms.

Finding the nth element of a linked list, or finding a given value, takes O(N) time. Inserting or deleting takes O(1) once the position is known, because only pointers need to be adjusted.
Inserting into and deleting from a plain linked list without a head node is harder to program, because the code has to keep track of the node before the current one in order to complete the operation. For simplicity, a linked list with a head node is usually used.
The storage location of the first node of a singly linked list is called the head pointer, so every traversal of the list must start from the head pointer. Each subsequent node is located at the position that its predecessor's next pointer points to.
The head pointer serves as the name of the linked list; it is only a pointer, not a node.

The benefits of adding a head node:

  • Mainly uniform handling of operations. The head node exists for consistency and convenience. It is placed before the first element node, and its data field is generally meaningless (although it can sometimes store the length of the list or act as a sentinel).
    With a head node, inserting a node before the first element node and deleting the first node work the same way as the operations on any other node.
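
A minimal sketch of this idea follows. The `SinglyLinkedList` class and its method names are hypothetical, chosen for illustration; the point is that with a dummy head node, `insert` and `remove` need no special case for position 0.

```java
import java.util.StringJoiner;

// Hypothetical class for illustration: a singly linked list of ints with a dummy head node.
class SinglyLinkedList {
    private static final class Node {
        final int val;
        Node next;

        Node(int val) {
            this.val = val;
        }
    }

    // Dummy head: its value is never read; head.next is the first real element.
    private final Node head = new Node(0);
    private int size;

    // Walks to the node just before position index (the dummy head when index == 0).
    private Node nodeBefore(int index) {
        Node prev = head;
        for (int i = 0; i < index; i++) {
            prev = prev.next;
        }
        return prev;
    }

    // Same code path for the first position and every other position.
    void insert(int index, int val) {
        if (index < 0 || index > size) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        Node prev = nodeBefore(index);
        Node node = new Node(val);
        node.next = prev.next;
        prev.next = node;
        size++;
    }

    int remove(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        Node prev = nodeBefore(index);
        Node target = prev.next;
        prev.next = target.next;
        size--;
        return target.val;
    }

    int size() {
        return size;
    }

    @Override
    public String toString() {
        StringJoiner joiner = new StringJoiner(", ", "[", "]");
        for (Node n = head.next; n != null; n = n.next) {
            joiner.add(String.valueOf(n.val));
        }
        return joiner.toString();
    }
}
```

Relinking the pointers is O(1); the walk in `nodeBefore` is what makes positional access O(N).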

The list described above is a singly linked list. There is also the doubly linked list, in which each node contains a pointer to the next node and a pointer to the previous node.
A doubly linked list does not run into the singly linked list's problem when inserting and deleting, even without a head node. In addition, there is the circular linked list, which connects the tail of the list back to its head.
Inserting into or deleting from a circular doubly linked list or a circular linked list simply updates a few more pointers.
Features:
A very commonly used data structure. There is no need to set an initial capacity, and elements can be added or removed freely;
adding or deleting an element only requires changing the pointer fields of the neighboring nodes, so insertion and deletion are very fast.

Limitations:
Because it contains a large number of pointer fields, it takes up more space;
finding an element requires traversing the list, which is time-consuming.

Usage scenarios:
A small amount of data with frequent insertions and deletions.

Note ⚠️:
There is a saying that software programming has only two data structures: the array and the linked list. All other data structures evolve from these two, which shows how important they are.

3. Queue

A queue implements first-in-first-out semantics and can be implemented with either an array or a linked list. A queue is also a kind of linear list. The difference is that elements are added at one end and removed at the other.
A queue only allows adding data at the tail and removing data at the head, although the elements at both the head and the tail can be inspected. There is also the double-ended queue (deque), which allows insertion and deletion at both ends.
Features: first in, first out (FIFO);
Usage scenarios: because of its FIFO behavior, a queue is well suited to blocking queues in multithreaded code, in-order consumption of MQ messages, and in-order traversal of collection objects.

A queue (Queue) in Java can be implemented with either an array or a doubly linked list.

ArrayDeque is a double-ended queue backed by an array, and it satisfies the first-in-first-out (FIFO) behavior of a queue.

LinkedList is a double-ended queue backed by a doubly linked list, and it also satisfies the first-in-first-out (FIFO) behavior of a queue.
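
As a short illustration, both classes can be used through the `java.util.Queue` interface. The `QueueDemo` class and its `drain` helper are hypothetical names for this sketch; `offer` and `poll` are the standard `Queue` methods.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;
import java.util.Queue;

// Hypothetical demo class: shows that both implementations behave as FIFO queues.
class QueueDemo {
    // Enqueues the values in order, then dequeues everything.
    // The returned list is the order in which elements left the queue.
    static List<Integer> drain(Queue<Integer> queue, int... values) {
        for (int v : values) {
            queue.offer(v); // add at the tail
        }
        List<Integer> out = new ArrayList<>();
        while (!queue.isEmpty()) {
            out.add(queue.poll()); // remove from the head
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(drain(new ArrayDeque<>(), 1, 2, 3)); // prints [1, 2, 3]
        System.out.println(drain(new LinkedList<>(), 1, 2, 3)); // prints [1, 2, 3]
    }
}
```

Programming against the `Queue` interface keeps the choice between the array-backed and the linked implementation a one-line change.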

4. Stack

A stack is a special kind of linear list (a bucket-like linear data structure). Operations are only allowed at one end of the list: the top of the stack can be operated on, but the bottom cannot.
Features: first in, last out (FILO);

Restrictions: only the top of the stack can be operated on, not the bottom;

Usage scenarios: stacks are often used to implement recursive functions, such as computing the Fibonacci sequence.

Stack implementations in Java:
A Deque (double-ended queue) can also be used as a LIFO (last in, first out) stack. The official documentation recommends using the Deque interface in preference to the legacy Stack class. When a deque is used as a stack, elements are pushed and popped at the front of the deque. The Stack methods are equivalent to the Deque methods shown in the following table:
Comparison of Stack and Deque methods

Stack method    Equivalent Deque method
push(e)         addFirst(e)
pop()           removeFirst()
peek()          peekFirst()
Note that the peek method works equally well when the deque is used as a queue or as a stack.
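
To show these methods in use, here is a small sketch that uses a `Deque` as a stack to check whether brackets are balanced. The `BracketChecker` class is a hypothetical example, not something from the JDK; only `push`, `pop`, and `isEmpty` come from `java.util.Deque`.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical example: a Deque used as a LIFO stack to match brackets.
class BracketChecker {
    static boolean isBalanced(String s) {
        Deque<Character> stack = new ArrayDeque<>();
        for (char c : s.toCharArray()) {
            switch (c) {
                // For every opening bracket, push the closing bracket we expect later.
                case '(': stack.push(')'); break;
                case '[': stack.push(']'); break;
                case '{': stack.push('}'); break;
                // A closing bracket must match the most recently pushed expectation.
                case ')':
                case ']':
                case '}':
                    if (stack.isEmpty() || stack.pop() != c) {
                        return false;
                    }
                    break;
                default:
                    // Other characters are ignored.
                    break;
            }
        }
        // Anything left on the stack is an unclosed bracket.
        return stack.isEmpty();
    }
}
```

The last bracket opened must be the first one closed, which is exactly the last-in-first-out behavior a stack provides.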

Note ⚠️:
Java also has two implementations of the Deque double-ended queue, one based on an array and the other on a linked list. This again confirms the central role of arrays and linked lists among data structures.

  • Array-based:
Deque<Integer> stack = new ArrayDeque<Integer>();
  • Linked-list-based:
Deque<Integer> stack = new LinkedList<Integer>();

Anyone who has practiced on LeetCode will have noticed that in problems about recursion and stacks, the stack is usually a Deque:

For example, using a stack instead of recursion to implement pre-order traversal of a tree:

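import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedList;
import java.util.List;

// TreeNode is the node class defined in the Tree section.
class Solution {
    public List<Integer> preorderTraversal(TreeNode root) {
        // Pre-order traversal implemented iteratively (without recursion)
        List<Integer> res = new ArrayList<Integer>();
        if (root == null) {
            return res;
        }
        // The official Java documentation recommends using a Deque as a stack.
        // push adds to the top, pop removes the top, peek reads the top without removing it
        Deque<TreeNode> stack = new LinkedList<TreeNode>();
        TreeNode node = root;

        while (!stack.isEmpty() || node != null) {
            while (node != null) {
                // Pre-order: visit the root node first
                res.add(node.val);

                stack.push(node);
                // Continue down to the next left child
                node = node.left;
            }
            // LIFO: pop the most recently visited node and move on to its right child
            node = stack.pop();
            node = node.right;
        }
        return res;
    }
}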

5. Hash table (Hash)

A hash table is a data structure that is accessed directly by key. It locates a record by mapping the key to a position in a table, which speeds up lookups. The mapping function is called a hash function, and the array that stores the records is called the hash table. An exact-match lookup takes O(1) time on average.

An array is easy to address but hard to insert into and delete from; a linked list is hard to address but easy to insert into and delete from.

Can the two be combined into a data structure that is both easy to address and easy to insert into and delete from? Yes: that is the hash table.

Advantages: exact-match (one-to-one) lookup is very efficient;
Disadvantages: several keys may map to the same hash address, and range queries perform poorly.
Hash collision: different keys produce the same hash address after being run through the hash function.

A good hash function = simple to compute + evenly distributed (the computed hash addresses are spread evenly)

A hash table is a data structure that provides fast insertion and fast exact-match lookup.

In Java, HashMap is a typical hash table.
Its underlying implementation is: array + linked list + (red-black tree)
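
The "array + linked list" idea can be sketched as follows. `ChainedHashTable` is a hypothetical, heavily simplified class written for this article: it assumes `String` keys, `int` values, and a fixed number of buckets, and it leaves out the resizing and red-black tree conversion that the real HashMap performs.

```java
import java.util.LinkedList;

// Hypothetical, simplified hash table: an array of buckets, each bucket a linked list.
// This is not how HashMap is written; it only illustrates separate chaining.
class ChainedHashTable {
    private static final class Entry {
        final String key;
        int value;

        Entry(String key, int value) {
            this.key = key;
            this.value = value;
        }
    }

    private final LinkedList<Entry>[] buckets;

    @SuppressWarnings("unchecked")
    ChainedHashTable(int bucketCount) {
        buckets = new LinkedList[bucketCount];
        for (int i = 0; i < bucketCount; i++) {
            buckets[i] = new LinkedList<>();
        }
    }

    // Hash function: maps a key to a bucket index (floorMod keeps it non-negative).
    private int indexFor(String key) {
        return Math.floorMod(key.hashCode(), buckets.length);
    }

    void put(String key, int value) {
        LinkedList<Entry> bucket = buckets[indexFor(key)];
        for (Entry e : bucket) {
            if (e.key.equals(key)) {
                e.value = value; // key already present: overwrite
                return;
            }
        }
        bucket.add(new Entry(key, value)); // new key, or a collision: append to the chain
    }

    // Returns the stored value, or null when the key is absent.
    Integer get(String key) {
        for (Entry e : buckets[indexFor(key)]) {
            if (e.key.equals(key)) {
                return e.value;
            }
        }
        return null;
    }
}
```

In Java the strings "Aa" and "BB" have the same `hashCode()` (2112), so they always collide; in this sketch they end up in the same bucket and are told apart by `equals`.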

6. Tree

Wikipedia's definition of a tree:
In computer science, a tree is an abstract data type (ADT), or a data structure implementing that abstract data type, used to model a collection of data with a tree-like hierarchy. It consists of n (n>0) nodes, a finite number, forming a set with hierarchical relationships. It is called a "tree" because it looks like an upside-down tree, with the root at the top and the leaves at the bottom. It has the following characteristics:

  • Each node has a finite number of child nodes, or none;
  • A node without a parent node is called the root node;
  • Each non-root node has exactly one parent node;
  • Apart from the root node, the remaining nodes can be divided into multiple disjoint subtrees;
  • There are no cycles in a tree.

There are many kinds of trees, defined for different usage scenarios:

  • Unordered tree: a tree in which the child nodes of any node have no order among them, also known as a free tree;
  • Ordered tree: a tree in which the child nodes of any node are ordered;
  • Binary tree: a tree in which each node has at most two subtrees;
  • Complete binary tree: a very efficient data structure derived from the full binary tree. A binary tree of depth K with n nodes is a complete binary tree if and only if each of its nodes corresponds one-to-one to the nodes numbered 1 to n in the full binary tree of depth K.
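
One way to test the complete-binary-tree property is a level-order walk: numbering nodes 1 to n without gaps means that, in level order, no node may appear after an empty position. The sketch below assumes a hypothetical `Node` class local to the example, and it treats an empty tree as complete, which is a choice made here rather than part of the definition above.

```java
import java.util.LinkedList;
import java.util.Queue;

// Hypothetical example: checks the complete-binary-tree property with a level-order walk.
class CompleteTreeCheck {
    static final class Node {
        final int val;
        final Node left;
        final Node right;

        Node(int val, Node left, Node right) {
            this.val = val;
            this.left = left;
            this.right = right;
        }
    }

    static boolean isComplete(Node root) {
        // LinkedList is used because it accepts null elements (ArrayDeque does not).
        Queue<Node> queue = new LinkedList<>();
        queue.offer(root);
        boolean seenGap = false;
        while (!queue.isEmpty()) {
            Node node = queue.poll();
            if (node == null) {
                seenGap = true; // an empty position in level order
                continue;
            }
            if (seenGap) {
                return false; // a node after an empty position breaks the 1..n numbering
            }
            queue.offer(node.left);
            queue.offer(node.right);
        }
        return true;
    }
}
```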


Define a node of the tree:

public class TreeNode {
    int val;
    TreeNode left;
    TreeNode right;

    TreeNode() {
    }

    TreeNode(int val) {
        this.val = val;
    }

    TreeNode(int val, TreeNode left, TreeNode right) {
        this.val = val;
        this.left = left;
        this.right = right;
    }
}

Note ⚠️:
A tree can be understood as a special linked structure in which a node may link to more than one following node.

7. Heap

A heap is a general term for a special class of data structures in computer science. A heap is usually an array object that can be viewed as a complete binary tree. The root node is greater than or equal to every child node (or less than or equal to every child node, depending on the ordering chosen). There is no restriction on access: any node can be accessed directly. Max heap: the value of each node is greater than or equal to the values of its children. Min heap: the value of each node is less than or equal to the values of its children.

Note ⚠️:
It is important to stress that the heap here is only a way of structuring stored data. It is not the same concept as the heap and stack areas of memory.

Features:

  • The value of a node in the heap is never greater than (max heap) or never less than (min heap) the value of its parent node;
  • A heap is always a complete binary tree.

A heap whose root node is the largest is called a max heap or large-root heap, and a heap whose root node is the smallest is called a min heap or small-root heap.
Uses of the heap:

  • Building priority queues
  • Supporting heap sort
  • Quickly finding the minimum (or maximum) value in a collection
  • Showing off in front of friends

Differences between heaps and ordinary trees
A heap cannot replace a binary search tree. The two have similarities and differences; the main differences are:

  • Node order. In a binary search tree, the left child must be smaller than its parent and the right child must be larger than its parent. A heap is different: in a max heap both children must be no larger than the parent, and in a min heap both must be no smaller than the parent.

  • Memory usage. An ordinary tree takes up more memory than the data it stores, because memory must be allocated for the node objects and the left/right child pointers. A heap stores its data in a single array and uses no pointers.

  • Balance. A binary search tree must be "balanced" for most of its operations to run in O(log n). You can either insert and delete data in random order, or use a self-balancing tree such as an AVL tree or a red-black tree. A heap, however, does not need the whole tree to be sorted. It only needs to satisfy the heap property, so balancing is not an issue: the way a heap organizes its data already guarantees O(log n) performance.

  • Search. Searching a binary search tree is fast, but searching a heap is slow. Search is not the priority in a heap, because the purpose of a heap is to keep the largest (or smallest) node at the front so that insertions and deletions can be done quickly.

Priority queue: the dequeue order does not depend on the order of arrival, only on priority.
A heap is one way to implement a priority queue.
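
In Java, `java.util.PriorityQueue` is such a heap-backed priority queue. The sketch below uses a hypothetical `PriorityQueueDemo` class with a helper that inserts values in arrival order and then polls them; with natural ordering the queue behaves as a min heap, and with `Comparator.reverseOrder()` it behaves as a max heap.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical demo class: elements leave the queue by priority, not by arrival order.
class PriorityQueueDemo {
    static List<Integer> inPriorityOrder(Comparator<Integer> order, int... values) {
        PriorityQueue<Integer> pq = new PriorityQueue<>(order);
        for (int v : values) {
            pq.offer(v); // insert in arrival order
        }
        List<Integer> out = new ArrayList<>();
        while (!pq.isEmpty()) {
            out.add(pq.poll()); // always removes the current head of the heap
        }
        return out;
    }

    public static void main(String[] args) {
        // Natural ordering: smallest element first (min heap).
        System.out.println(inPriorityOrder(Comparator.naturalOrder(), 3, 1, 2)); // prints [1, 2, 3]
        // Reversed ordering: largest element first (max heap).
        System.out.println(inPriorityOrder(Comparator.reverseOrder(), 3, 1, 2)); // prints [3, 2, 1]
    }
}
```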

import java.util.ArrayList;

/**
 * A max heap backed by an array.
 * Elements in the heap must be comparable, so E must implement Comparable.
 * Elements are stored starting at index 0, because an ArrayList plays the role of the array.
 *
 * @author 01
 * @date 2021-01-19
 **/
public class MaxHeap<E extends Comparable<E>> {

    /**
     * An ArrayList is used so that growing and shrinking the array does not have to be handled manually.
     */
    private final ArrayList<E> data;

    public MaxHeap(int capacity) {
        this.data = new ArrayList<>(capacity);
    }

    public MaxHeap() {
        this.data = new ArrayList<>();
    }

    /**
     * Returns the number of elements in the heap.
     */
    public int size() {
        return data.size();
    }

    /**
     * Returns whether the heap is empty.
     */
    public boolean isEmpty() {
        return data.isEmpty();
    }

    /**
     * Returns the index of the parent of the node at the given index.
     */
    private int parent(int index) {
        if (index == 0) {
            throw new IllegalArgumentException("index 0 doesn't have a parent.");
        }
        return (index - 1) / 2;
    }

    /**
     * Returns the index of the left child of the node at the given index.
     */
    private int leftChild(int index) {
        return index * 2 + 1;
    }

    /**
     * Returns the index of the right child of the node at the given index.
     */
    private int rightChild(int index) {
        return index * 2 + 2;
    }
}
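
The class above only provides the index helpers. As a hedged sketch of how insertion and removal could be built on the same index arithmetic, the hypothetical `SimpleMaxHeap` below adds `add` (sift up) and `extractMax` (sift down). The class and method names are assumptions made for this example, not the original author's implementation.

```java
import java.util.ArrayList;
import java.util.Collections;

// Hypothetical sketch: a max heap with add and extractMax, stored from index 0 in an ArrayList.
class SimpleMaxHeap<E extends Comparable<E>> {
    private final ArrayList<E> data = new ArrayList<>();

    int size() {
        return data.size();
    }

    // Append at the end, then move the new element up while it is larger than its parent.
    void add(E e) {
        data.add(e);
        siftUp(data.size() - 1);
    }

    // Remove the root, move the last element to the root, then move it down into place.
    E extractMax() {
        if (data.isEmpty()) {
            throw new IllegalStateException("heap is empty");
        }
        E max = data.get(0);
        E last = data.remove(data.size() - 1);
        if (!data.isEmpty()) {
            data.set(0, last);
            siftDown(0);
        }
        return max;
    }

    private void siftUp(int index) {
        while (index > 0) {
            int parent = (index - 1) / 2;
            if (data.get(index).compareTo(data.get(parent)) <= 0) {
                break; // heap property holds
            }
            Collections.swap(data, index, parent);
            index = parent;
        }
    }

    private void siftDown(int index) {
        int n = data.size();
        while (true) {
            int left = index * 2 + 1;
            if (left >= n) {
                break; // no children
            }
            int right = left + 1;
            int largest = left;
            if (right < n && data.get(right).compareTo(data.get(left)) > 0) {
                largest = right;
            }
            if (data.get(index).compareTo(data.get(largest)) >= 0) {
                break; // heap property holds
            }
            Collections.swap(data, index, largest);
            index = largest;
        }
    }
}
```

Both loops follow a single path between the root and a leaf, so each operation takes O(log n) steps on a heap of n elements.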

8. Graph

A graph consists of a finite, non-empty set of vertices and a set of edges between those vertices. It is usually written G(V, E), where G denotes the graph, V is the set of vertices of G, and E is the set of edges of G. The data elements of a graph are called vertices, and the vertex set is finite and non-empty. Any two vertices may be related; the logical relationship between vertices is represented by an edge, and the edge set may be empty.

A graph data structure may also associate a value with each edge, such as a label or a numeric weight (representing cost, capacity, length, and so on).
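
One common way to store G(V, E) is an adjacency list: a map from each vertex to its outgoing edges. The sketch below is a hypothetical `WeightedGraph` class that assumes `String` vertex names, `int` weights, and undirected edges; none of these choices come from the article.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: an undirected, weighted graph stored as an adjacency list.
class WeightedGraph {
    static final class Edge {
        final String to;
        final int weight;

        Edge(String to, int weight) {
            this.to = to;
            this.weight = weight;
        }
    }

    // V is the key set of the map; E is spread across the per-vertex edge lists.
    private final Map<String, List<Edge>> adjacency = new LinkedHashMap<>();

    void addVertex(String v) {
        adjacency.computeIfAbsent(v, k -> new ArrayList<>());
    }

    // Undirected: the edge is recorded on both endpoints.
    void addEdge(String from, String to, int weight) {
        addVertex(from);
        addVertex(to);
        adjacency.get(from).add(new Edge(to, weight));
        adjacency.get(to).add(new Edge(from, weight));
    }

    List<Edge> neighbors(String v) {
        return adjacency.getOrDefault(v, Collections.emptyList());
    }

    int vertexCount() {
        return adjacency.size();
    }

    // Each undirected edge appears in two lists, so halve the total.
    int edgeCount() {
        int total = 0;
        for (List<Edge> edges : adjacency.values()) {
            total += edges.size();
        }
        return total / 2;
    }
}
```

A vertex added with `addVertex` and never connected has an empty edge list, which matches the point above that the edge set may be empty while the vertex set is not.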
Graphs are encountered in relatively few scenarios in Java, such as state machines and graph databases.


Summary

This article gives an overview of 8 common data structures in software programming.
1. The eight data structures are: array, stack, linked list, graph, hash table, queue, tree, and heap.
2. Of these, the array and the linked list are the core; many other structures evolve from these two.


Origin blog.csdn.net/w1014074794/article/details/127227306