Data Structures and Algorithms - Q&A 2023


1. What is a hash table? How to resolve collisions?

A hash table is a data structure that implements a dictionary, that is, a collection of key-value pairs. A hash function maps each key to an index (bucket) in an underlying array, and the value is stored in that bucket. Its main advantage is that lookup, insertion, and deletion take O(1) time on average; in the worst case, when many keys land in the same bucket, they degrade to O(n).

Hash table implementations rely on a hash function that maps keys to a fixed index range (usually the size of the array). A collision occurs when two or more keys map to the same index. Handling collisions is the central problem in any hash table implementation.

There are two main methods for hash tables to resolve collisions:

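Chaining: each bucket stores a linked list. When a collision occurs, the new key-value pair is appended to that bucket's list. The drawbacks are that the list nodes need extra memory and that looking up or deleting a key takes longer as the chains grow.

Open addressing: when a collision occurs, the table follows a probe sequence to find the next available bucket. This avoids the per-node memory overhead of chaining and tends to use the cache better, but the probe sequence has to be chosen carefully so that a free bucket is found quickly. Common variants are linear probing, quadratic probing, and double hashing.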
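As a rough illustration of chaining, here is a minimal sketch of a hash table with a fixed number of buckets. The class name ChainedMap and its member functions are made up for this example, it assumes string keys, int values, and a bucket count greater than zero, and it never resizes, which a real implementation would do.

```cpp
#include <cstddef>
#include <functional>
#include <list>
#include <string>
#include <utility>
#include <vector>

// Hypothetical teaching class: a fixed-size hash table with separate chaining.
// Assumes bucket_count > 0; there is no rehashing.
class ChainedMap {
public:
    explicit ChainedMap(std::size_t bucket_count = 8) : buckets_(bucket_count) {}

    // Insert a new pair, or overwrite the value of an existing key.
    void put(const std::string& key, int value) {
        auto& chain = buckets_[index_of(key)];
        for (auto& kv : chain) {
            if (kv.first == key) { kv.second = value; return; }
        }
        chain.emplace_back(key, value);  // empty bucket or collision: append to the chain
        ++size_;
    }

    // Copy the value for key into out and return true, or return false if absent.
    bool get(const std::string& key, int& out) const {
        const auto& chain = buckets_[index_of(key)];
        for (const auto& kv : chain) {
            if (kv.first == key) { out = kv.second; return true; }
        }
        return false;
    }

    // Remove key if present; report whether anything was removed.
    bool erase(const std::string& key) {
        auto& chain = buckets_[index_of(key)];
        for (auto it = chain.begin(); it != chain.end(); ++it) {
            if (it->first == key) { chain.erase(it); --size_; return true; }
        }
        return false;
    }

    std::size_t size() const { return size_; }

private:
    // Map a key to a bucket index with the standard string hash.
    std::size_t index_of(const std::string& key) const {
        return std::hash<std::string>{}(key) % buckets_.size();
    }

    std::vector<std::list<std::pair<std::string, int>>> buckets_;
    std::size_t size_ = 0;
};
```

Constructing the table with a single bucket, as in ChainedMap m(1), forces every key to collide, which makes it easy to see that the chain still keeps the keys apart.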

2、

3. What is the difference between size and capacity in vector?

In the vector container in C++ STL, size and capacity are two different concepts:

size: the number of elements actually stored in the vector.

capacity: the number of elements the vector can hold in its currently allocated storage before it has to reallocate.

When an insertion would make size exceed capacity, the vector allocates a larger block of memory, moves or copies the existing elements into it, and releases the old block. Capacity is therefore always greater than or equal to size.

The size() function returns the number of stored elements, and the capacity() function returns how many elements fit in the current allocation. reserve(n) raises the capacity without changing the size, while resize(n) changes the size itself.
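The difference between size() and capacity() can be observed by counting how often the capacity changes while elements are appended. The helper below is a hypothetical sketch written for this note; the exact growth factor depends on the standard library implementation, so only the general pattern should be relied on.

```cpp
#include <cstddef>
#include <vector>

// Append n elements one by one and count how many times capacity() changes.
// If reserve_first is true, the storage is reserved up front.
std::size_t count_capacity_changes(std::size_t n, bool reserve_first) {
    std::vector<int> v;
    if (reserve_first) v.reserve(n);  // capacity >= n, size still 0
    std::size_t changes = 0;
    std::size_t last = v.capacity();
    for (std::size_t i = 0; i < n; ++i) {
        v.push_back(static_cast<int>(i));
        if (v.capacity() != last) {   // a reallocation happened
            ++changes;
            last = v.capacity();
        }
    }
    return changes;
}
```

Without reserve() the vector reallocates a handful of times as it grows; with reserve(n) the same n insertions cause no reallocation at all.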

4. What is the difference between an array and a linked list?

Arrays and linked lists are two commonly used data structures that differ greatly in their implementation and usage.

Implementation:

An array is a sequential storage structure: its elements are stored contiguously in memory, and its size is fixed when it is created.

A linked list is a dynamic structure whose nodes may be scattered in memory; each node holds a data element and a pointer to the next node.

Insertion and deletion:

Inserting into or deleting from the middle of an array is expensive: the elements after that position must be shifted, which takes O(n) time, and because the size is fixed the array cannot grow in place.

Inserting or deleting a node in a linked list only requires updating pointers, which takes O(1) time once the position is known. Finding that position still requires an O(n) traversal.

Access:

Array access is fast: any element can be reached directly by its index in O(1) time.

Accessing the k-th element of a linked list requires walking the list from the head, which takes O(n) time, where n is the length of the list.

Memory usage:

Because array elements are stored contiguously with no per-element overhead, an array uses space more densely than a linked list. However, a fixed-size array may reserve more space than is needed, and growing beyond the allocated size means allocating a new block and copying the elements over.

Each linked-list node carries at least one extra pointer, so the space used per element is higher, but nodes are allocated only as needed, so no capacity sits unused.

To sum up, arrays and linked lists have their own advantages and disadvantages, and the appropriate data structure should be selected according to the actual situation. In general, if you need efficient random access to elements, you can choose an array; if you need efficient insertion and deletion operations, you can choose a linked list.

5. What is the underlying implementation of map?

In the C++ STL, map is an ordered associative container. The standard does not mandate a particular data structure, but mainstream implementations use a red-black tree. A red-black tree is a self-balancing binary search tree that performs insertion, lookup, and deletion in O(log n) time.

The basic properties of a red-black tree:

Every node is either red or black.
The root is black.
Every leaf (NIL, or null, node) is black.
If a node is red, both of its children are black.
For every node, all simple paths from that node down to its descendant leaves contain the same number of black nodes.

Insertion and deletion restore these properties by rotating nodes and recoloring them.

In practice, map stores key-value pairs sorted by key. Accessing an element with operator[] performs a lookup in O(log n) time; if the key is not present, operator[] inserts it with a value-initialized value, so find() or at() should be used when insertion is not intended.

In short, map is typically backed by a red-black tree, which gives O(log n) insertion, lookup, and deletion and keeps the keys in sorted order.
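Two behaviours of map are easy to check in code: iteration visits keys in ascending order, and find() looks a key up without inserting it. The helper names keys_in_order and value_or below are invented for this sketch.

```cpp
#include <map>
#include <string>
#include <vector>

// Collect the keys of m in iteration order; std::map iterates in ascending key order.
std::vector<std::string> keys_in_order(const std::map<std::string, int>& m) {
    std::vector<std::string> keys;
    for (const auto& kv : m) keys.push_back(kv.first);
    return keys;
}

// Look up a key without inserting it; return fallback when the key is absent.
int value_or(const std::map<std::string, int>& m, const std::string& key, int fallback) {
    auto it = m.find(key);
    return it == m.end() ? fallback : it->second;
}
```

By contrast, reading m["missing"] on a non-const map adds the key "missing" with the value 0, so the size of the map grows by one.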

6. What is iterator invalidation?

Iterator invalidation means that an operation that modifies a container deletes or moves the element an iterator points to, so the iterator can no longer be used to access the container correctly. Using an invalidated iterator is undefined behavior.

Iterator invalidation is a common problem when working with containers. It mainly arises in the following situations:

Inserting elements

Inserting an element may move existing elements and thereby invalidate iterators. For example, when an element is inserted into a vector whose size has reached its capacity, the vector reallocates memory and moves the existing elements to the new block, which invalidates every iterator into the vector. Even without reallocation, iterators at or after the insertion point are invalidated.

Deleting elements

Deleting an element may also move other elements. For example, the erase() function of vector shifts all elements after the erased position forward by one, so iterators at or after that position become invalid. Node-based containers such as list and map behave differently: erase() invalidates only iterators to the erased element.

Changing the container size

Changing the size of a container may move or destroy elements. For example, when the resize() function of vector shrinks the container, the elements past the new end are destroyed and iterators to them become invalid; when it grows the container beyond its capacity, a reallocation invalidates all iterators.

To avoid using invalidated iterators, re-obtain iterators after any modifying operation instead of holding them across it, or use indices for a vector. Many container operations also return a valid iterator for this purpose: erase() returns an iterator to the element following the erased one, which makes it safe to erase while traversing.
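The erase() return value can be used as in the following sketch, which removes even numbers while traversing. The function name erase_even is made up for the example; the template works for both vector and list because both provide an erase() that returns the next valid iterator.

```cpp
#include <list>
#include <vector>

// Remove every even value from a sequence container while iterating over it.
template <typename Container>
void erase_even(Container& c) {
    for (auto it = c.begin(); it != c.end(); ) {
        if (*it % 2 == 0) {
            it = c.erase(it);  // continue from the iterator erase() hands back
        } else {
            ++it;              // advance only when nothing was erased
        }
    }
}
```

Writing c.erase(it) followed by ++it instead would use an invalidated iterator and is undefined behavior.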

7. What is the time complexity of bubble sort and quick sort? Implementation principle?

Bubble Sort

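Bubble sort is a simple sorting algorithm. It repeatedly compares adjacent elements and swaps them when they are out of order, so that after each pass the largest remaining element has moved to the end of the unsorted part. Its time complexity is O(n^2) in the average and worst cases.

Implementation principle:

Traverse the array, comparing each element with the one immediately after it.
If the current element is larger than the next one, swap them so that the larger element moves toward the end.
Repeat these passes until all elements are in ascending order.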
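The passes described above might look like this in code. This is a minimal sketch for a vector of int; the early exit when a pass makes no swap is a common optimization added here and is not part of the description above.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Sort a in ascending order with bubble sort.
void bubble_sort(std::vector<int>& a) {
    if (a.size() < 2) return;
    for (std::size_t pass = 0; pass + 1 < a.size(); ++pass) {
        bool swapped = false;
        // After each pass the last `pass` positions already hold their final values.
        for (std::size_t i = 0; i + 1 < a.size() - pass; ++i) {
            if (a[i] > a[i + 1]) {
                std::swap(a[i], a[i + 1]);  // the larger element moves toward the end
                swapped = true;
            }
        }
        if (!swapped) break;  // added optimization: no swaps means already sorted
    }
}
```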

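Quick Sort

Quick sort is a sorting algorithm based on divide and conquer. Its basic idea is to select a pivot element and partition the array into two parts, so that the elements on the left are all less than or equal to the pivot and the elements on the right are all greater than or equal to it, and then to quick-sort the left and right parts recursively. The time complexity of quick sort is O(n log n) on average and O(n^2) in the worst case.

Implementation principle:

Select a pivot element, usually the first or the last element of the array.
Scan from both ends of the array: from the left, find the first element larger than the pivot; from the right, find the first element smaller than the pivot; then swap the two.
Continue scanning from where the previous step stopped until the two scans meet.
Swap the pivot with the element at the position where the scans met, which is not larger than the pivot. The pivot is now at its final position: every element to its left is less than or equal to it, and every element to its right is greater than or equal to it.
Recursively quick-sort the left and right parts.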
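One way to write the two-ended scan is sketched below, assuming the first element of each range is taken as the pivot. The function names are chosen for this example. With the pivot on the left, the right-hand scan has to run first so that the meeting point holds a value that is not larger than the pivot.

```cpp
#include <utility>
#include <vector>

// Sort a[lo..hi] (inclusive bounds) using the first element as the pivot.
void quick_sort(std::vector<int>& a, int lo, int hi) {
    if (lo >= hi) return;
    const int pivot = a[lo];
    int i = lo, j = hi;
    while (i < j) {
        while (i < j && a[j] >= pivot) --j;  // from the right: find an element smaller than the pivot
        while (i < j && a[i] <= pivot) ++i;  // from the left: find an element larger than the pivot
        if (i < j) std::swap(a[i], a[j]);
    }
    std::swap(a[lo], a[i]);  // i == j here and a[i] <= pivot: the pivot reaches its final position
    quick_sort(a, lo, i - 1);
    quick_sort(a, i + 1, hi);
}

// Convenience overload that sorts the whole vector.
void quick_sort(std::vector<int>& a) {
    quick_sort(a, 0, static_cast<int>(a.size()) - 1);
}
```

Because the pivot is always the first element, this version hits the O(n^2) worst case on input that is already sorted.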

Both bubble sort and quick sort are common sorting algorithms. Bubble sort is simple but has a high time complexity, so it is suitable only for small inputs; quick sort is more complex but has a lower average time complexity and is suitable for large inputs.

8. Algorithm - reverse string?

9. Algorithm - the nearest common ancestor of the binary tree?

10. What is the difference between vector and list?

Vector and list are two common C++ containers with the following differences:

Different underlying implementations

vector stores its elements in one contiguous block of memory, that is, a dynamic array. list is a doubly linked list whose nodes are connected by pointers.

Different random access efficiency

Because vector is backed by an array, it supports random access by index in O(1) time. list does not support indexing; elements can only be reached sequentially through iterators, which takes O(n) time.

Different insertion and deletion efficiency

Inserting or deleting in the middle of a vector requires shifting the subsequent elements backward or forward, which takes O(n) time. In a list, inserting or deleting at a known iterator position only changes the pointers of the adjacent nodes and takes O(1) time.

Different memory usage

A vector usually holds some unused capacity, and when it grows it must allocate a new block and move the existing elements into it. A list allocates exactly one node per element and never relocates elements, but every node carries two extra pointers and is allocated separately, so its overhead per element is higher and its cache locality is worse.

According to specific needs, choosing different containers can make the code more efficient and easier to implement. For example, when you need to access elements randomly, you can use vector; when you need to frequently insert and delete elements, you can use list.

11. What are the characteristics of red-black tree?

Red-black tree is a self-balancing binary search tree, which has the following characteristics:

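Every node is red or black. The root is black, and all leaf nodes (NIL nodes) are black.

Both children of every red node are black, so there are never two consecutive red nodes on a path.

All paths from any node to its descendant leaves contain the same number of black nodes.

Newly inserted nodes are colored red.

Balance is maintained through rotations and recoloring.

These properties guarantee that the longest root-to-leaf path is at most twice as long as the shortest one, so the height of a tree with n nodes is O(log n). Rebalancing needs at most two rotations after an insertion and at most three after a deletion, plus recoloring along the path, so insertion, deletion, and lookup all take O(log n) time.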

Red-black trees are often used in the underlying implementation of containers such as set and map in C++ STL.

12. What is the difference between binary search tree, balanced binary tree and red-black tree?

Binary search tree, balanced binary tree and red-black tree are commonly used tree data structures, and their main differences are in the following aspects:

Different structure

A binary search tree is a binary tree in which each node has at most two children, every key in a node's left subtree is smaller than the node's key, and every key in its right subtree is larger.

A balanced binary tree (AVL tree) is a binary search tree that rebalances itself with rotations on insertion and deletion so that, for every node, the heights of the left and right subtrees differ by at most 1.

A red-black tree is a self-balancing binary search tree that maintains balance by marking nodes as red or black and by applying rotations and recoloring.

Different balance

A plain binary search tree has no balance guarantee. It may degenerate into a linked list, for example when keys are inserted in sorted order, in which case search, insertion, and deletion take O(n) time.

An AVL tree ensures that the height difference between the left and right subtrees never exceeds 1, so search, insertion, and deletion take O(log n) time.

A red-black tree enforces a looser balance through its coloring rules. It typically needs fewer rotations per update than an AVL tree, and search, insertion, and deletion likewise take O(log n) time.

Different storage structure

The node of a binary search tree is simple: it usually stores only a value and two pointers.

The nodes of AVL trees and red-black trees are more complex, because they store extra information, such as a height or balance factor or a color, to maintain balance.

Based on these differences: a plain binary search tree is adequate only when the data set is small or the keys arrive in effectively random order; an AVL tree suits workloads dominated by lookups, because its stricter balance keeps the tree shallower; a red-black tree suits workloads with frequent insertions and deletions, because it rebalances with fewer rotations while still guaranteeing O(log n) operations.
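The degeneration of an unbalanced binary search tree is easy to reproduce. The sketch below uses a hypothetical BstNode type and plain insertion without any rebalancing, then measures the height of the resulting tree.

```cpp
#include <algorithm>
#include <memory>
#include <vector>

// Hypothetical node type for an unbalanced binary search tree.
struct BstNode {
    int key;
    std::unique_ptr<BstNode> left, right;
    explicit BstNode(int k) : key(k) {}
};

// Plain BST insertion with no rebalancing; duplicate keys are ignored.
void bst_insert(std::unique_ptr<BstNode>& root, int key) {
    std::unique_ptr<BstNode>* cur = &root;
    while (*cur) {
        if (key < (*cur)->key) cur = &(*cur)->left;
        else if (key > (*cur)->key) cur = &(*cur)->right;
        else return;
    }
    *cur = std::make_unique<BstNode>(key);
}

// Height counted in nodes: an empty tree has height 0.
int bst_height(const BstNode* n) {
    return n ? 1 + std::max(bst_height(n->left.get()), bst_height(n->right.get())) : 0;
}

// Build a tree from keys in the given order and return its height.
int height_after_inserting(const std::vector<int>& keys) {
    std::unique_ptr<BstNode> root;
    for (int k : keys) bst_insert(root, k);
    return bst_height(root.get());
}
```

Inserting 1 through 7 in ascending order produces a chain of height 7, while inserting the same keys in the order 4, 2, 6, 1, 3, 5, 7 produces a tree of height 3. An AVL or red-black tree would keep the height logarithmic for either order.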

13、

14. What is the difference between B-tree and B+ tree?

Both B-tree and B+ tree are commonly used balanced multi-way search trees. Their main differences lie in the following aspects:

Different node structure

Every node of a B-tree holds keys together with their associated data, plus pointers to child subtrees. In a B+ tree, internal nodes hold only keys and child pointers; all data records are stored in the leaf nodes.

Different storage

In a B-tree, data can be stored in any node, internal or leaf. In a B+ tree, all data is stored in the leaf nodes, and the non-leaf nodes serve only as an index. Because the non-leaf nodes carry no data, each one can hold more keys, which makes the tree shallower.

Different leaf linkage

In a B-tree, the leaf nodes are not linked to each other. In a B+ tree, all leaf nodes are connected by pointers into an ordered linked list, which is convenient for range queries.

Different search

In a B-tree, a search can end at a non-leaf node as soon as the key is found there. In a B+ tree, every search must continue down to a leaf node, since that is where the data is stored, so all lookups follow a path of the same length.

Based on the above differences, B-trees are suitable for random reading and modification, while B+ trees are suitable for range search and sequential traversal. Therefore, B+ trees are often used in application scenarios that need to support fast range search, such as database indexes and file systems.

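15. Are STL containers thread-safe?

Containers in the STL (Standard Template Library) are not thread-safe for concurrent modification: if multiple threads access the same container at the same time and at least one of them writes to it without synchronization, a data race occurs and the behavior is undefined. Concurrent read-only access to the same container is safe.

In a multi-threaded environment, the following methods can be adopted to ensure the thread safety of the container:

Use a mutex: each thread acquires a mutex before accessing the container and releases it afterwards. This guarantees that only one thread accesses the container at a time, which avoids data races.

Use a read-write lock: if reads are much more frequent than writes, a read-write lock can improve concurrency. It allows multiple threads to read at the same time, while a writer blocks all other readers and writers.

Use thread-safe containers: the C++ standard library itself does not provide thread-safe containers, but some third-party libraries do, such as the concurrent containers in Intel oneTBB. These containers can be accessed by multiple threads without extra synchronization; internally they rely on locks or other concurrency-control mechanisms.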
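The mutex approach can be sketched as a thin wrapper around a vector. The class LockedVector is hypothetical and covers only three operations; the point is that every access to the underlying container happens while the mutex is held.

```cpp
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical wrapper: all access to the vector goes through a single mutex.
class LockedVector {
public:
    void push_back(int value) {
        std::lock_guard<std::mutex> lock(mutex_);  // released automatically at scope exit
        data_.push_back(value);
    }

    std::size_t size() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return data_.size();
    }

    // Return a copy so that callers never hold references into the guarded data.
    std::vector<int> snapshot() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return data_;
    }

private:
    mutable std::mutex mutex_;  // mutable so const member functions can lock it
    std::vector<int> data_;
};
```

The wrapper deliberately does not expose iterators or references, since those would let a caller touch the data after the lock has been released. For the read-write variant, std::shared_mutex (C++17) with std::shared_lock for readers and std::unique_lock for writers plays the same role.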

16. Algorithm - delete the nodes of the linked list?

17. Algorithm - merge two linked lists?

18. Algorithms - commonly used sorting algorithms, which ones are stable? Which are unstable?

19. Algorithm - the principle and implementation of quick sort?

Quick sort is a commonly used sorting algorithm that applies divide and conquer: it sorts an array by recursively partitioning it into smaller sub-arrays. The core idea is to select a pivot element, place the elements less than or equal to the pivot on its left and the elements greater than it on its right, and then recursively sort the left and right sub-arrays until the whole array is sorted.

The implementation steps of quick sort are as follows:

Select a pivot element, usually the first or the last element of the array.
Move the elements less than or equal to the pivot to the left and the elements greater than the pivot to the right. This step is called partitioning.
Recursively quick-sort the left and right sub-arrays.

The average time complexity of quick sort is O(n log n), where n is the length of the array. In the worst case it degrades to O(n^2): for example, when the pivot is always the first or last element and the array is already sorted or reverse-sorted, each partition step removes only one element from the problem. Choosing the pivot at random, or using other optimizations such as median-of-three pivot selection, makes this case unlikely.
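A randomized variant could be sketched as follows. It picks the pivot uniformly at random and then partitions with a single left-to-right pass (the Lomuto scheme), which is one of several possible partitioning methods. The function names are illustrative.

```cpp
#include <random>
#include <utility>
#include <vector>

// Partition a[lo..hi] around a randomly chosen pivot and return the pivot's final index.
int partition_random_pivot(std::vector<int>& a, int lo, int hi, std::mt19937& rng) {
    std::uniform_int_distribution<int> pick(lo, hi);
    std::swap(a[pick(rng)], a[hi]);  // move the random pivot to the end
    const int pivot = a[hi];
    int store = lo;                  // a[lo..store-1] holds the elements <= pivot
    for (int i = lo; i < hi; ++i) {
        if (a[i] <= pivot) std::swap(a[i], a[store++]);
    }
    std::swap(a[store], a[hi]);      // place the pivot between the two parts
    return store;
}

// Recursively sort a[lo..hi] (inclusive bounds).
void sort_range(std::vector<int>& a, int lo, int hi, std::mt19937& rng) {
    if (lo >= hi) return;
    const int p = partition_random_pivot(a, lo, hi, rng);
    sort_range(a, lo, p - 1, rng);
    sort_range(a, p + 1, hi, rng);
}

// Sort the whole vector in ascending order.
void randomized_quick_sort(std::vector<int>& a) {
    std::mt19937 rng(std::random_device{}());
    sort_range(a, 0, static_cast<int>(a.size()) - 1, rng);
}
```

With a random pivot, sorted or reverse-sorted input is no longer a special bad case; the O(n^2) behaviour remains possible but does not depend on the order of the input.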

20. Algorithm - Two stacks implement queue?

A queue can be implemented with two stacks by using one as an input stack and the other as an output stack.

The specific implementation method is as follows:

To enqueue an element, push it onto the input stack.
To dequeue the front element, pop the top of the output stack if it is not empty; otherwise, pop every element from the input stack and push it onto the output stack, then pop the top of the output stack.
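The two operations above can be sketched with std::stack. The class name TwoStackQueue is made up for this example, and throwing on an empty queue is one possible design choice among several.

```cpp
#include <stack>
#include <stdexcept>

// Hypothetical FIFO queue of int built from two LIFO stacks.
class TwoStackQueue {
public:
    // Enqueue: always push onto the input stack.
    void push(int value) { in_.push(value); }

    // Dequeue: pop from the output stack, refilling it from the input stack when empty.
    int pop() {
        refill();
        if (out_.empty()) throw std::out_of_range("pop from empty queue");
        const int front = out_.top();
        out_.pop();
        return front;
    }

    bool empty() const { return in_.empty() && out_.empty(); }

private:
    // Moving every element across reverses the order, so the oldest ends up on top.
    void refill() {
        if (!out_.empty()) return;
        while (!in_.empty()) {
            out_.push(in_.top());
            in_.pop();
        }
    }

    std::stack<int> in_, out_;
};
```

Each element is pushed and popped at most twice in total, so both operations run in O(1) amortized time even though a single pop() may move many elements.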

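21. How is the Tower of Hanoi solved?

The Tower of Hanoi problem is naturally solved with recursion, where the recursive function hanoi(n, A, B, C) moves n disks from peg A to peg C, using peg B as the auxiliary peg:

When n equals 1, move the disk directly from A to C.
When n is greater than 1, first move the top n-1 disks from A to B, using C as the auxiliary peg.
Then move the bottom disk from A to C.
Finally, move the n-1 disks from B to C, using A as the auxiliary peg.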
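The recursion can be written almost line for line from the steps above. In this sketch the moves are recorded in a vector instead of being printed; the Move alias and the parameter names are choices made for the example.

```cpp
#include <utility>
#include <vector>

// A move is recorded as (source peg, destination peg).
using Move = std::pair<char, char>;

// Move n disks from peg `from` to peg `to`, using `via` as the auxiliary peg.
void hanoi(int n, char from, char via, char to, std::vector<Move>& moves) {
    if (n <= 0) return;
    if (n == 1) {
        moves.emplace_back(from, to);     // a single disk moves directly
        return;
    }
    hanoi(n - 1, from, to, via, moves);   // top n-1 disks: from -> via
    moves.emplace_back(from, to);         // bottom disk: from -> to
    hanoi(n - 1, via, from, to, moves);   // n-1 disks: via -> to
}
```

Calling hanoi(n, 'A', 'B', 'C', moves) records 2^n - 1 moves, which is the minimum number needed for n disks.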

22. Design an optimal algorithm to find all possible subsets of a set

23、

24、

25、

26、

27、

28、

29、

30、


Origin blog.csdn.net/qq_45908742/article/details/129087958