STL source code analysis (5) - associative containers

1. Overview

The so-called associative container is conceptually similar to a relational database: each piece of data has a key and a real value. When an element is inserted into an associative container, the container's internal structure (which may be a red-black tree or a hash table) places the element in the appropriate position according to its key value and a fixed set of rules. Associative containers have no so-called head and tail (only a largest element and a smallest element).

1.1 Tree

A tree consists of nodes and edges. The topmost node is the root node. Each node has directed edges that connect it to other nodes. A node without child nodes is called a leaf node. If every node has at most two child nodes, the tree is called a binary tree. Nodes with the same parent node are siblings of each other. The length of the path from the root node to a given node is that node's depth; the depth of the root node is 0. The length of the path from a node to its deepest descendant is the node's height, and the height of the entire tree is the height of the root node.

1.2 Binary search tree

A binary search tree can provide element insertion and access in logarithmic time. Its node placement rule is: the key value of any node must be greater than the key value of every node in its left subtree, and less than the key value of every node in its right subtree. Therefore, walking from the root all the way to the left reaches the smallest element, and walking all the way to the right reaches the largest.
Insertion and removal are the cumbersome operations. To insert, start from the root node and go left when the new key is smaller than the current node's key, right when it is larger, until a null position is reached; that null position is the insertion point.
As shown in the figure, there are two cases for deleting a node A. If A has only one child node, connect A's child to A's parent and delete A. If A has two child nodes, replace A with the smallest node of its right subtree (found by stepping to A's right child and then following left children all the way down).
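A minimal sketch of these placement rules in C++ (an illustrative Node struct, not the STL's internal implementation):

#include <iostream>

// Minimal binary-search-tree node; illustrative only.
struct Node {
    int key;
    Node* left;
    Node* right;
    Node(int k) : key(k), left(nullptr), right(nullptr) {}
};

// Walk down from the root: go left when the new key is smaller,
// right when it is larger, until a null link -- the insertion point.
Node* insert(Node* root, int key) {
    if (root == nullptr) return new Node(key);
    if (key < root->key)
        root->left = insert(root->left, key);
    else if (key > root->key)
        root->right = insert(root->right, key);
    return root;  // duplicate keys are ignored in this sketch
}

// In-order traversal visits the keys in ascending order.
void inorder(const Node* n) {
    if (!n) return;
    inorder(n->left);
    std::cout << n->key << ' ';
    inorder(n->right);
}

int main() {
    Node* root = nullptr;
    int keys[] = {8, 3, 10, 1, 6, 14};
    for (int k : keys) root = insert(root, k);
    inorder(root);  // prints: 1 3 6 8 10 14
    std::cout << std::endl;
}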

1.3 Balanced Binary Search Tree

Because the input values are not sufficiently random, among other reasons, a binary search tree may fall out of balance after a series of insertion and deletion operations, reducing search efficiency, as shown in the figure.
"Balance" roughly means: no node is too deep. Different balance conditions produce different efficiency and different implementation complexity. Several special structures such as AVL-tree, RB-tree, and AA-tree implement balanced binary search trees. They are more complex than an ordinary binary search tree, so their average insertion and deletion times are longer, but by avoiding severe imbalance they typically reduce search time by about 25%.

1.4 AVL-tree

An AVL-tree is a binary search tree with an additional balance condition, established to ensure that the depth of the tree is O(log n). Intuitively, the best balance condition would be for the left and right subtrees of every node to have the same height, but that is too strict, so an AVL-tree only requires that the heights of any node's left and right subtrees differ by at most 1.
As shown in the AVL-tree figure, after node 11 is inserted, the gray node violates the AVL-tree's balance condition, so the deepest such node is adjusted to rebalance the tree.
Only the deepest node whose balance is destroyed on the path from the insertion point to the root needs to be adjusted to rebalance the entire tree. Call that node X. Since a node has at most two children, a broken balance means the heights of X's left and right subtrees differ by 2, so there are four cases:
1. The insertion point is in the left subtree of X's left child: left-left.
2. The insertion point is in the right subtree of X's left child: left-right.
3. The insertion point is in the left subtree of X's right child: right-left.
4. The insertion point is in the right subtree of X's right child: right-right.
Cases 1 and 4 are mirror images of each other, as are cases 2 and 3.
Cases 1 and 4 are called outside insertions and are resolved by a single rotation; cases 2 and 3 are called inside insertions and are resolved by a double rotation.

1.4.1 Single rotation

As shown in the figure, we want subtree A to rise one level and subtree C to drop one level: lift k1 up, let k2 slide down naturally, and hang subtree B on k2's left side. The binary-search-tree rules explain why this works: k2 > k1, so k2 must become the right child of k1, the new root of the subtree, and the key of every node in subtree B lies between k1 and k2, so in the new tree B must land on k2's left side.
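A sketch of the single rotation for case 1, following the common textbook formulation with a hypothetical AvlNode struct that caches the node height (SGI STL does not implement an AVL-tree, so this is illustration only):

#include <algorithm>

struct AvlNode {
    int key;
    int height;  // cached height of this node
    AvlNode* left;
    AvlNode* right;
};

int heightOf(AvlNode* n) { return n ? n->height : -1; }

// Single rotation for a left-left (case 1) imbalance:
// k1 is lifted up, k2 slides down to become k1's right child,
// and subtree B (k1's old right subtree) is re-hung on k2's left.
AvlNode* rotateWithLeftChild(AvlNode* k2) {
    AvlNode* k1 = k2->left;
    k2->left = k1->right;   // subtree B moves under k2
    k1->right = k2;         // k2 becomes k1's right child
    k2->height = std::max(heightOf(k2->left), heightOf(k2->right)) + 1;
    k1->height = std::max(heightOf(k1->left), k2->height) + 1;
    return k1;              // k1 is the new root of this subtree
}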

1.4.2 Double rotation

The left side of the figure shows the unbalanced state caused by an inside insertion, which a single rotation cannot fix. We cannot use k3 as the root node, and a single rotation between k3 and k1 leaves the tree still unbalanced. The only possibility is to make k2 the new root node, which forces k1 to become k2's left child and k3 to become k2's right child. The new tree shape satisfies the AVL-tree balance condition and, as in the single-rotation case, restores the height the subtree had before the insertion.
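Expressed in code, a double rotation is just two single rotations, as in the usual textbook treatment (reusing the hypothetical AvlNode and rotateWithLeftChild above):

// Mirror image of rotateWithLeftChild, used for right-right (case 4).
AvlNode* rotateWithRightChild(AvlNode* k1) {
    AvlNode* k2 = k1->right;
    k1->right = k2->left;
    k2->left = k1;
    k1->height = std::max(heightOf(k1->left), heightOf(k1->right)) + 1;
    k2->height = std::max(k1->height, heightOf(k2->right)) + 1;
    return k2;
}

// Double rotation for a left-right (case 2) imbalance:
// first rotate k3's left child k1 with its right child k2,
// then rotate k3 with its (new) left child; k2 ends up as the root.
AvlNode* doubleWithLeftChild(AvlNode* k3) {
    k3->left = rotateWithRightChild(k3->left);
    return rotateWithLeftChild(k3);
}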

1.5 RB-tree

A red-black tree must satisfy the following rules:
1. Every node is either red or black.
2. The root node is black.
3. If a node is red, its child nodes must be black.
4. Every path from any node down to a NULL link contains the same number of black nodes.
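A simplified sketch of the node layout these rules imply; the actual SGI source uses __rb_tree_node_base with a color field plus parent/left/right pointers, and the names below are illustrative:

// Each node stores its color (rule 1) alongside the usual links;
// the parent pointer makes bottom-up rebalancing after insertion possible.
enum Color { RED, BLACK };

struct RbNode {
    Color color;
    int key;
    RbNode* parent;
    RbNode* left;
    RbNode* right;
};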

1.6 Hashtable

A binary search tree has logarithmic average-time performance, but that performance rests on the assumption that the input data is sufficiently random. A hashtable offers "constant average time" for operations such as insertion, deletion, and search, and this performance is based on statistics rather than on the randomness of the input elements.
A hashtable can access and remove any named item. Since its operands are named items, a hashtable can be regarded as a kind of dictionary. The intent of this structure is to provide the basic operations in constant average time, much as a stack or a queue does.
Array indexing gives O(1) search, but it raises two problems:
(1) when the range of the elements is large, the array must cover that range, consuming a huge amount of memory;
(2) non-numeric types such as strings cannot be used as array indexes.
To avoid a huge array, we use some kind of mapping function that maps large numbers into a small range, i.e., maps each element to an "acceptably sized index". Such a function is called a hash function. For example, if x is any integer and TableSize is the array size, then x % TableSize yields an integer in the range 0 to TableSize-1.
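A sketch of that modulo mapping, plus one possible way to handle problem (2) by folding a string into an integer first (similar in spirit to, but not identical with, SGI's __stl_hash_string):

#include <cstddef>
#include <string>

// Map an integer key into the range [0, tableSize - 1].
std::size_t hashInt(std::size_t x, std::size_t tableSize) {
    return x % tableSize;
}

// Fold the characters of a string into an integer, then take the modulo.
std::size_t hashString(const std::string& s, std::size_t tableSize) {
    std::size_t h = 0;
    for (std::size_t i = 0; i < s.size(); ++i)
        h = 5 * h + static_cast<unsigned char>(s[i]);
    return h % tableSize;
}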
Using a hash function brings a problem: different elements may be mapped to the same position, since the number of possible elements is greater than the capacity of the array. This is the collision problem, and the methods for resolving it include linear probing, quadratic probing, and open chaining.
Linear probing: the load factor is the number of elements divided by the table size; it always lies between 0 and 1 unless open chaining is used. When the hash function computes an element's insertion position and that position is no longer available, the simplest approach is to search forward slot by slot until free space is found; as long as the table is big enough, a slot can always be found, and searching proceeds the same way. Deletion, however, must be lazy: the slot is only marked as deleted, and the actual removal waits for the table to be reorganized, because each element in such a table not only expresses itself but also affects the placement of other elements.
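A minimal sketch of linear probing with lazy-deletion marks (assuming non-negative integer keys and a table that is never allowed to fill; a real implementation would rehash as the load factor grows):

#include <cstddef>
#include <vector>

enum Slot { EMPTY, OCCUPIED, DELETED };  // DELETED is the lazy-deletion mark

struct ProbeTable {
    std::vector<int> keys;
    std::vector<Slot> state;
    explicit ProbeTable(std::size_t n) : keys(n), state(n, EMPTY) {}

    // From the hashed position, scan forward one slot at a time
    // until a free (EMPTY or DELETED) slot is found.
    void insert(int key) {
        std::size_t i = key % keys.size();
        while (state[i] == OCCUPIED)
            i = (i + 1) % keys.size();
        keys[i] = key;
        state[i] = OCCUPIED;
    }

    // Erase only marks the slot: probes for other keys may need to
    // walk "through" this position, so it cannot simply become EMPTY.
    bool erase(int key) {
        std::size_t i = key % keys.size();
        for (std::size_t n = 0; n < keys.size(); ++n) {
            if (state[i] == EMPTY) return false;  // key not present
            if (state[i] == OCCUPIED && keys[i] == key) {
                state[i] = DELETED;
                return true;
            }
            i = (i + 1) % keys.size();
        }
        return false;
    }
};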
In the worst case linear probing traverses the whole table; in the average case it traverses half the table, which is far from constant time. Worse, occupied slots clump together: in the figure's example, positions #4 through #7 can never be used unless a new element hashes directly onto #4-#7, while a new element hashing to any of #8, #9, #0, #1, #2, or #3 ends up falling on #3. The average insertion cost therefore grows much faster than the load factor; this phenomenon is called primary clustering.
Quadratic probing mainly solves the primary clustering problem. It is so named because the collision-resolution function F(i) = i^2 is quadratic: if the hash function computes position H and H is taken, we try H+1^2, H+2^2, H+3^2, ... in turn, rather than the H+1, H+2, H+3, ... of linear probing.
With this method, it can be shown that when the table size is a prime number and the load factor is always kept below 0.5, inserting a new element never requires more than two probes.
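The only change from linear probing is the probe sequence; a sketch:

#include <cstddef>

// The i-th probe position under quadratic probing, F(i) = i^2:
// H, H + 1, H + 4, H + 9, ... all taken modulo the table size.
std::size_t quadraticProbe(std::size_t h, std::size_t i, std::size_t tableSize) {
    return (h + i * i) % tableSize;
}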
Quadratic probing eliminates primary clustering, but it can in turn cause secondary clustering: if two elements hash to the same position, the sequence of positions probed during insertion is also exactly the same, which wastes probes. Double hashing solves the secondary clustering problem.
Open chaining: each table entry maintains a list; the hash function assigns each element to one of the lists, and insertion, deletion, and search are carried out on that list. As long as the lists are short enough, these operations are fast.
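A sketch of the open-chaining layout (SGI's hashtable also uses this scheme, with a vector of buckets of linked nodes; the code below is an illustration, not the SGI implementation):

#include <cstddef>
#include <list>
#include <vector>

// Each bucket keeps a list of the keys that hash to it.
struct ChainedTable {
    std::vector< std::list<int> > buckets;
    explicit ChainedTable(std::size_t n) : buckets(n) {}

    std::list<int>& bucketFor(int key) {
        return buckets[static_cast<std::size_t>(key) % buckets.size()];
    }

    void insert(int key) { bucketFor(key).push_back(key); }
    void erase(int key) { bucketFor(key).remove(key); }

    bool contains(int key) {
        std::list<int>& b = bucketFor(key);
        for (std::list<int>::iterator it = b.begin(); it != b.end(); ++it)
            if (*it == key) return true;
        return false;
    }
};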

2. set

2.1 Common Interfaces

begin()
end()
rbegin()
rend()
lower_bound() // returns an iterator to the first element not less than the given key (used below as the start of a range to erase)
upper_bound() // returns an iterator to the first element greater than the given key (used below as the end of a range to erase)
find() // finds an element, returning an iterator to it; used with erase() below
erase() // removes an element (by key or iterator) or a range of elements
clear() // deletes all elements in the set container
empty() // determines whether the set container is empty
max_size() // the maximum number of elements the set may contain
size() // returns the number of elements in the set container

2.2 Implementation

#include <iostream>
#include <set>
using namespace std;

void print(const set<int>& s) {
	cout << "forward iteration:" << endl;
	set<int>::const_iterator it = s.begin();
	while (it != s.end()) {
		cout << *it << endl;
		++it;
	}
	cout << endl;
}

void testSet()
{
	set<int> s1;
	for (int i = 0; i < 10; i++) {
		s1.insert(i * 10);
	}
	// erase the element found at pos
	set<int>::iterator pos = s1.find(20);
	if (pos != s1.end())
		s1.erase(pos);
	// erase by key
	s1.erase(30);
	// erase the values between 15 and 50
	set<int>::iterator low = s1.lower_bound(15);
	set<int>::iterator up = s1.upper_bound(50);
	s1.erase(low, up);
	print(s1);
	cout << endl;
	if (s1.count(60)) cout << "60 exists" << endl;
	s1.clear();
	print(s1);
}

int main()
{
	testSet();
	return 0;
}

3. map

3.1 Common Interfaces

A map stores key-value pairs, in which the key acts as an index and the value is the data associated with that index. A dictionary is a practical example of a map: the word serves as the key and its definition as the value.
begin()
end()
count()
empty()
equal_range()
erase()
find()
get_allocator()
insert()
key_comp()
lower_bound()
max_size()
rbegin()
rend()
size()
swap() // swaps two maps
upper_bound() // returns an iterator to the first element whose key is greater than the given key
value_comp() // returns the comparison object used to compare elements

#include <iostream>
#include <map>
#include <string>
using namespace std;

int main()
{
	map<int, string> map1;
	// several ways to insert
	map1[3] = "Saniya";
	map1.insert(map<int, string>::value_type(2, "Diyabi"));
	map1.insert(pair<int, string>(1, "sijgj"));
	map1.insert(make_pair(4, string("v4")));
	string str = map1[4];          // operator[] inserts and returns an empty string when the key does not exist
	auto iter_map = map1.begin();  // iterator to the first element
	int key = iter_map->first;
	string value = iter_map->second;
	cout << key << ": " << value << endl;
	map1.erase(3);                 // erase by key
	cout << map1.size() << endl;   // number of remaining elements
	// traverse
	for (map<int, string>::iterator iter = map1.begin(); iter != map1.end(); iter++) {
		cout << iter->first << ": " << iter->second << endl;
	}
	map1.clear();
	return 0;
}

4. multiset and multimap

multiset and multimap behave like set and map except that they allow duplicate keys: in SGI STL they are built on the same underlying RB-tree but insert with insert_equal() instead of insert_unique().
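A quick illustration of the duplicate-key behavior:

#include <iostream>
#include <set>
using namespace std;

int main()
{
	multiset<int> ms;
	ms.insert(10);
	ms.insert(10);  // duplicate key is accepted, unlike in set
	ms.insert(20);
	cout << ms.count(10) << endl;  // prints 2
	cout << ms.size() << endl;     // prints 3
	return 0;
}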
