The principles behind several common data structures

array

Arrays are the most commonly used data structure. An array requires a contiguous block of memory, and every element stored in it must have the same data type. For example, suppose we create an array of 10 integers whose storage starts at memory address 1000. Its layout in memory is as follows.

[Figure: a 10-element integer array stored in contiguous memory starting at address 1000]

Since each integer occupies 4 bytes, the whole array occupies addresses 1000 to 1039. From this we can easily calculate the memory address of any element. As long as we know the array subscript, that is, the position of the element in the array, we can compute where it lives in memory. For subscript 2, the address is 1000 + 2 × 4 = 1008, so we can read or write the value stored there, 241, directly. The time complexity of this access is O(1).
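The address arithmetic above can be written out as a small sketch. The base address 1000 and the 4-byte element size come from the example in the text; the function name `element_address` is only an illustrative choice.

```python
# Values taken from the example in the text.
BASE_ADDRESS = 1000   # address of the element at subscript 0
ELEMENT_SIZE = 4      # bytes occupied by one integer


def element_address(index, base=BASE_ADDRESS, size=ELEMENT_SIZE):
    """Return the memory address of the element at the given subscript."""
    # One multiplication and one addition, regardless of array length: O(1).
    return base + index * size


print(element_address(2))   # 1008
print(element_address(9))   # 1036 (the last element occupies 1036..1039)
```

The cost of the calculation does not depend on how many elements the array holds, which is why access by subscript is O(1).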

Fast random reads and writes are an important feature of arrays, but random access requires knowing the subscript of the element. If we only know a value and want to find it in the array, we can only traverse the array, and the time complexity is O(N).
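A minimal sketch of that traversal, assuming a plain Python list stands in for the array. The sample values are made up, except that 241 sits at subscript 2 as in the example above.

```python
def find_index(values, target):
    """Return the subscript of the first element equal to target, or -1."""
    # In the worst case every element is inspected once: O(N).
    for index, value in enumerate(values):
        if value == target:
            return index
    return -1


data = [17, 8, 241, 5, 99]    # made-up sample values

print(find_index(data, 241))  # 2
print(find_index(data, 1))    # -1, not present
```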

linked list

Unlike arrays, which need contiguous memory, linked lists can store data in fragmented memory. However, because the elements of a linked list are not contiguous in memory, each element must contain a pointer holding the memory address of the next element. As shown in the figure below, each element of the linked list has two parts: the data, and a pointer to the next element. The last element points to null, indicating that the linked list ends here.

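[Figure: a linked list in which each element holds data and a pointer to the next element, ending in null]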

Because a linked list is not stored contiguously, the only way to find a piece of data in it is to traverse the list from the head, so searching a linked list is always O(N).
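The structure and the traversal described above might look like the following sketch. The class name `Node` and the function name `find` are illustrative assumptions, and Python's `None` plays the role of null.

```python
class Node:
    """One element of a singly linked list: data plus a pointer to the next."""

    def __init__(self, data, next=None):
        self.data = data
        self.next = next   # None marks the end of the list


def find(head, target):
    """Follow the pointers from head until target is found or the list ends."""
    node = head
    while node is not None:
        if node.data == target:
            return node
        node = node.next
    return None


# Build the list a -> b -> c -> None.
head = Node("a", Node("b", Node("c")))

print(find(head, "c").data)   # c
print(find(head, "z"))        # None
```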

But because a linked list is not stored contiguously, inserting or deleting a piece of data is very easy: find the position to insert at (or delete from) and modify the pointers. As shown in the figure, to insert an element x between b and c, you only need to change b's pointer, which currently points to c, so that it points to x, and point x's pointer to c.

[Figure: inserting element x between b and c in a linked list]
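The pointer changes for inserting x between b and c can be sketched as follows. The names `Node`, `insert_after`, and `to_list` are illustrative assumptions. Note that the sketch sets x's pointer first, so that the reference to c is not lost when b's pointer is overwritten.

```python
class Node:
    """One element of a singly linked list."""

    def __init__(self, data, next=None):
        self.data = data
        self.next = next


def insert_after(node, new_node):
    """Insert new_node directly after node by changing two pointers."""
    new_node.next = node.next   # x now points to c
    node.next = new_node        # b now points to x


def to_list(head):
    """Collect the data of every element, in order, for easy inspection."""
    items = []
    while head is not None:
        items.append(head.data)
        head = head.next
    return items


c = Node("c")
b = Node("b", c)
a = Node("a", b)

insert_after(b, Node("x"))
print(to_list(a))   # ['a', 'b', 'x', 'c']
```

No other element is touched or moved, whatever the length of the list.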

By contrast, inserting or deleting a piece of data in an array changes the size of its contiguous memory block, so the memory has to be reallocated and the existing elements moved, which is much more complicated.
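To make the cost concrete, here is a rough sketch that imitates a fixed-size array. Python lists actually resize themselves, so the sketch deliberately allocates a new block and copies elements by hand; `insert_into_array` is a hypothetical helper, not a library function.

```python
def insert_into_array(array, index, value):
    """Return a new, larger array with value inserted at index."""
    # Allocate a new contiguous block with room for one more element.
    new_array = [None] * (len(array) + 1)
    # Copy the elements before the insertion point unchanged.
    for i in range(index):
        new_array[i] = array[i]
    new_array[index] = value
    # Copy the remaining elements, each shifted one position to the right.
    for i in range(index, len(array)):
        new_array[i + 1] = array[i]
    return new_array


print(insert_into_array([1, 2, 4], 2, 3))   # [1, 2, 3, 4]
```

Every existing element is copied once, so a single insertion costs O(N), compared with two pointer updates in a linked list.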

Hash table

As mentioned earlier, fast O(1) access to the data in an array requires the array subscript. If you only know the data, or part of it, and want to find it in the array, you still need to traverse the array, and the time complexity is O(N).

In fact, finding complete data from a known part of it is very common in software development. For example, you know a product ID and want the complete product information, or you know an entry name and want the detailed content of the encyclopedia entry, and so on.

Such scenarios call for the Hash table data structure. Data in a Hash table is stored as Key and Value pairs. In the examples above, the product ID and the entry name are the Keys, and the product information and the entry details are the Values. To store data, write the Key and Value into the Hash table. To read it, you only need to provide the Key to find the Value quickly.

The physical storage of a Hash table is actually an array. If we can calculate an array subscript from the Key, we can quickly find the required Key and Value in the array. Many programming languages support obtaining a hash code for any object. In Java, for example, the hashCode method is defined on the root class Object, and its return value is an int. We can use this int hash code to calculate the array subscript. The simplest approach is the remainder method: divide the hash code by the length of the Hash table's array and take the remainder. The remainder is the subscript into the array, and with it you can directly access the Key and Value stored in the Hash table.

[Figure: the Key "abc" hashed to 101 and mapped to subscript 5 of an array of length 8]

In the example above, the Key is the string abc and the Value is the string hello. We first calculate the hash value of the Key, which gives the integer 101. We then take 101 modulo 8, where 8 is the length of the Hash table's array. 101 modulo 8 is 5, and this 5 is the array subscript. In this way, a Key and Value pair such as ("abc", "hello") is stored in the array slot with subscript 5.

To read the data, we are given the Key abc and apply the same algorithm: first obtain its HashCode, 101, then take it modulo 8. Because the length of the array has not changed, the remainder is still 5, so we go to subscript 5 of the array and find the Value that was stored earlier for abc.
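The write and read steps above can be sketched in a few lines. The hash code 101 is the illustrative value used in the text, not what any real hash function returns for "abc", and `bucket_index` is a hypothetical helper name.

```python
TABLE_LENGTH = 8   # length of the hash table's array, as in the example
ABC_HASH = 101     # the hash code the text assumes for the key "abc"


def bucket_index(hash_code, table_length=TABLE_LENGTH):
    """Remainder method: map a hash code to an array subscript."""
    return hash_code % table_length


table = [None] * TABLE_LENGTH

# Write: compute the subscript from the key's hash code and store the pair.
table[bucket_index(ABC_HASH)] = ("abc", "hello")

# Read: the same calculation leads back to the same slot.
print(bucket_index(ABC_HASH))          # 5
print(table[bucket_index(ABC_HASH)])   # ('abc', 'hello')
```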

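But what if different Keys produce the same array subscript? HashCode 101 modulo 8 leaves a remainder of 5, and HashCode 109 modulo 8 also leaves a remainder of 5. In other words, different Keys may map to the same array subscript. This is called a Hash collision, and a common way to resolve it is the linked list method.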

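In fact, Key and Value data such as ("abc", "hello") is not stored directly in the Hash table's array. An array requires a fixed data type, mainly so that every array element holds data of a fixed length. So what the array stores is an address pointer to a Key and Value data element, and each such element can in turn point to the next one, forming a linked list. When a Hash collision occurs, the data element with the same subscript but a different Key simply needs to be added to this linked list. On lookup, the linked list is traversed to match the correct Key.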

As shown in the figure below:

[Figure: a Hash table whose array slots point to linked lists of Key and Value elements]
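The linked list method can be sketched as follows. This is a minimal illustration, not a production hash table: the class names `Entry` and `ChainedHashTable` are assumptions, the key "xyz" is made up, and the toy hash codes 101 and 109 are the colliding values from the text rather than real hash values.

```python
class Entry:
    """A Key and Value element that can link to the next element in its slot."""

    def __init__(self, key, value, next=None):
        self.key = key
        self.value = value
        self.next = next


class ChainedHashTable:
    def __init__(self, length, hash_function):
        self.slots = [None] * length        # each slot points to a linked list
        self.hash_function = hash_function

    def _index(self, key):
        return self.hash_function(key) % len(self.slots)

    def put(self, key, value):
        index = self._index(key)
        entry = self.slots[index]
        while entry is not None:            # key already present: overwrite
            if entry.key == key:
                entry.value = value
                return
            entry = entry.next
        # New key: link it in at the head of this slot's list.
        self.slots[index] = Entry(key, value, self.slots[index])

    def get(self, key):
        entry = self.slots[self._index(key)]
        while entry is not None:            # traverse the list, matching keys
            if entry.key == key:
                return entry.value
            entry = entry.next
        return None


# Toy hash codes chosen so that both keys land in slot 5 (101 % 8 == 109 % 8).
TOY_HASH_CODES = {"abc": 101, "xyz": 109}


def toy_hash(key):
    return TOY_HASH_CODES[key]


table = ChainedHashTable(length=8, hash_function=toy_hash)
table.put("abc", "hello")
table.put("xyz", "world")

print(table.get("abc"))   # hello
print(table.get("xyz"))   # world
```

Both pairs end up in the list hanging off subscript 5, and `get` tells them apart by comparing Keys while walking that list.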

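Because Hash collisions exist, the question "Why is the time complexity of a Hash table O(1)?" is not strictly rigorous. In the extreme case, if the array subscripts of all Keys collide, the Hash table degenerates into a single linked list and a lookup takes O(N). As an interview question, however, "Why is the time complexity of a Hash table O(1)?" is perfectly acceptable.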

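stack

Arrays and linked lists are both called linear tables, because the data in them is organized linearly: each data element can have only one element before it (its predecessor) and only one element after it (its successor). But operations on arrays and linked lists can be random, that is, any element can be operated on. If we restrict how they may be operated on, we get new data structures.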

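A stack is a linear table with exactly this kind of restriction added: data added later must be removed first, which is usually described as "last in, first out". We can picture a stack as a big bucket that we fill with food one layer at a time. When we want to eat, we must start from the top layer. If we eat a few layers and then add more food, it again goes on top of whatever is currently the top layer.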

[Figure: a stack]

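A stack adds an operation restriction on top of a linear table. As for implementation, because a stack needs neither random access nor insertion or deletion in the middle, it can be implemented with an array or with a linked list. So what is the benefit of adding an operation restriction to a linear table?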

As mentioned in the previous article, while a program is running, method calls use a stack to manage the work area of each method. No matter how deeply method calls are nested, the top element of the stack is always the work area of the currently executing method. This keeps things simple, and simplicity is exactly what we should strive for in software development.
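A minimal sketch of a stack built on a Python list, used here to mimic nested method calls. The class `Stack` and the method names pushed onto it ("main", "load_order", "query_database") are hypothetical, chosen only for illustration.

```python
class Stack:
    """Last in, first out: items are only added to and removed from the top."""

    def __init__(self):
        self._items = []

    def push(self, item):
        self._items.append(item)

    def pop(self):
        return self._items.pop()

    def peek(self):
        return self._items[-1]

    def is_empty(self):
        return not self._items


call_stack = Stack()
call_stack.push("main")             # main starts running
call_stack.push("load_order")       # main calls load_order
call_stack.push("query_database")   # load_order calls query_database

print(call_stack.peek())   # query_database, the method currently executing
call_stack.pop()           # query_database returns
print(call_stack.peek())   # load_order is the current method again
```

Because the only operations are on the top element, the caller never has to track which work area is current: it is always whatever `peek` returns.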

queue

A queue is also a linear table with restricted operations. A stack is last in, first out, while a queue is first in, first out.

[Figure: a queue]
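A short sketch of first in, first out, assuming Python's standard `collections.deque` as the underlying container. The item names are placeholders.

```python
from collections import deque

queue = deque()

# Enqueue: new items always join at the back.
queue.append("first")
queue.append("second")
queue.append("third")

# Dequeue: items always leave from the front, in arrival order.
served = [queue.popleft(), queue.popleft()]

print(served)        # ['first', 'second']
print(list(queue))   # ['third']
```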

While software is running, we often run short of resources. When a task is submitted to a thread pool whose threads are all in use, the task is placed in a queue and the queued tasks are executed in first-in, first-out order. When a thread needs to access the database but the limited database connections are all in use, the thread enters a blocking queue. When a database connection is released, a thread is woken up from the head of the blocking queue and leaves the queue to obtain the connection and access the database.
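The first scenario can be imitated with a single worker thread that takes tasks from a FIFO queue. This is only a sketch using Python's standard `queue.Queue` and `threading` modules, not how any particular thread pool is implemented; the task names and the `None` sentinel are assumptions made for the example.

```python
import queue
import threading

task_queue = queue.Queue()   # FIFO queue shared by submitter and worker
completed = []               # records the order in which tasks actually ran


def worker():
    """Take tasks from the head of the queue until the sentinel arrives."""
    while True:
        task = task_queue.get()   # blocks while the queue is empty
        if task is None:          # sentinel: no more tasks
            break
        completed.append(task)    # stand-in for doing the real work


thread = threading.Thread(target=worker)
thread.start()

# Submit tasks faster than one worker can be assumed to handle them.
for i in range(5):
    task_queue.put(f"task-{i}")
task_queue.put(None)

thread.join()
print(completed)   # ['task-0', 'task-1', 'task-2', 'task-3', 'task-4']
```

With one worker, tasks run in exactly the order they were submitted, because the worker always takes from the head of the queue.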

When discussing the stack above, I used the example of storing food in a big bucket. In fact, if you store food this way, the food at the very bottom may never be eaten and will eventually expire.

The same is true in real life. When supermarkets stock their shelves, they arrange the food as a queue, not as a stack. Staff always put new food at the back, so that food shelved earlier is sold first.

Tree

Arrays, linked lists, stacks, and queues are all linear tables, that is, each data element has only one predecessor and one successor. A tree is a non-linear structure, and it looks like this.

[Figure: a tree]

Trees are also used in many places in software development. For example, if we develop an OA (office automation) system, the organizational structure of the departments is a tree. When a program we write is compiled, the first step is to turn the source code into an abstract syntax tree. Traditionally, tree traversal uses recursion, but I personally prefer to use the Composite pattern from design patterns for tree traversal, which I will discuss in detail in the design patterns section.
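As an illustration of the traditional recursive approach (not the Composite pattern the author prefers), here is a small sketch that walks a made-up organizational structure. The class `TreeNode`, the function `walk`, and the department names are all assumptions for the example.

```python
class TreeNode:
    """A tree element: one parent (implicit) and any number of children."""

    def __init__(self, name, children=None):
        self.name = name
        self.children = children or []


def walk(node, depth=0):
    """Recursively yield (depth, name) for node and all of its descendants."""
    yield depth, node.name
    for child in node.children:
        yield from walk(child, depth + 1)


# A made-up organizational structure.
company = TreeNode("Company", [
    TreeNode("Engineering", [TreeNode("Backend"), TreeNode("Frontend")]),
    TreeNode("Sales"),
])

# Print each department indented by its depth in the tree.
for depth, name in walk(company):
    print("  " * depth + name)
```

Each node is visited before its children, so a department is always listed ahead of the teams beneath it.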

This article is published by mdnice multi-platform

Origin blog.csdn.net/qq_35030548/article/details/131179823