(Boolan) C++ STL and Generic Programming - Containers 2

Containers make up a very large part of the standard library. I have already covered list, vector, array, and forward_list (slist), but many containers have not been discussed yet. Today I will go through all of the remaining ones and see what secrets are hidden behind them.

Container structure classification

Derivative relationship of Sequence Container

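array (C++2.0): contiguous space

vector: contiguous space

heap: presented in the form of algorithms (xxx_heap())

priority_queue

list: doubly linked list

slist: forward_list in C++2.0, singly linked list

deque: segmented contiguous space

stack: Container Adapter

queue: Container Adapter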

Derivative relationship (composite) of Associative Containers

rb_tree: red-black tree, not public

set

map

multiset

multimap

hashtable: not public

hash_set: non-standard; unordered_set in C++2.0

hash_map: non-standard; unordered_map in C++2.0

hash_multiset: non-standard; unordered_multiset in C++2.0

hash_multimap: non-standard; unordered_multimap in C++2.0

Container deque - a seemingly contiguous space that is open at both ends

On the surface, a deque looks like the figure below: it can be regarded as one contiguous memory space. A deque can also grow in both directions, so space can be added at the front as well as at the back.

Surface representation of the memory structure

In real memory, however, growing in both directions is more complicated. Memory is shared by all running programs, so the space just before or just after our block may already have been given to another program by the operating system. If that space were reserved for our program in advance instead, it would be wasted whenever we did not use it, and it is unclear how much should be reserved on each side. So a layout that literally matches our mental picture is not easy to implement in practice.

An effective scheme is therefore needed to support growth in both directions in such a memory space.

The solution the STL actually uses has the memory layout shown in the following figure.

How to actually manage memory

A deque does own one contiguous piece of memory, but it serves only as a control center. This space behaves like a vector and holds nothing but pointers. Each pointer refers to a contiguous block of space called a buffer, and the actual data is stored in the buffers.

This solves the problem raised earlier. To grow, the deque only has to update the pointer array in the contiguous control space and place the new elements in the corresponding buffer. The existing buffers do not move, so the difficulty of extending one large contiguous space largely disappears.

Since the actual memory is not contiguous, it becomes very important to hide this design from the user. Next, we will look closely at how a deque makes users feel that its storage is contiguous.

deque's iterator

The most critical part is the one shown in the figure above: the iterator. It is the iterator that makes "continuity is an illusion, segmentation is the fact" possible. Whenever the iterator moves, it must detect whether it has reached the boundary of its buffer; if so, it goes back to the control center (the pointer sequence) to find the neighboring buffer. The iterator has four members: cur points to the current element, first points to the first slot of the buffer, last points to the position just past the end of the buffer, and node points to the buffer's entry in the sequence of control pointers.

That is the principle. Now let's look at what the actual code looks like.

// If BufSiz is non-zero, the developer chooses the buffer size.
// If BufSiz is 0, the buffer size is determined by a preset default.
template<class T, class Alloc = alloc, size_t BufSiz = 0>
// T: type of the stored elements
// Alloc: allocator
// BufSiz: buffer size
class deque{
public:
      typedef T value_type;
      typedef value_type* pointer;
      typedef size_t size_type;
      typedef __deque_iterator<T, T&, T*, BufSiz> iterator;  // buffer size is the number of elements each buffer holds
// the source code of __deque_iterator is given below
protected:
      typedef pointer* map_pointer; // T**
protected:
      iterator start;
      iterator finish;
      map_pointer map;
      size_type map_size;
public:
      iterator begin(){return start;}
      iterator end() {return finish;}
      size_type size() const {return finish - start; }
...................
};

// Determines the buffer size. n is BufSiz and sz is sizeof(value_type).
// If n != 0, return n (the size chosen by the developer).
// Otherwise, if sz < 512, return 512/sz; if sz >= 512, return 1.
inline size_t __deque_buf_size(size_t n, size_t sz){
    return n != 0 ? n : (sz < 512 ? size_t(512/sz) : size_t(1));
}

// Source code of __deque_iterator
template<class T, class Ref, class Ptr, size_t BufSiz>
struct __deque_iterator{
    typedef random_access_iterator_tag  iterator_category;
    typedef T value_type;
    typedef Ptr pointer;
    typedef Ref reference;
    typedef size_t size_type;
    typedef ptrdiff_t difference_type;
    typedef T** map_pointer;
    typedef __deque_iterator self;

    // Data members of the iterator: the cur, first, last and node described earlier
    T* cur;
    T* first;
    T* last;
    map_pointer node;
............
};

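// Insertion into a deque
// Elements are stored in order, so inserting in the middle has to move other elements.
// It is like inserting a book into a bookshelf: the books before or after it must be moved.
// If the insertion point is closer to the front, moving the front part is cheaper.
// If the insertion point is closer to the back, moving the elements after it backward is faster.

// Insert an element x at position
iterator insert(iterator position, const value_type& x){
    if(position.cur == start.cur)  // the insertion point is the very front of the deque
    {
        push_front(x);  // simply use push_front
        return start;
    }
    else if(position.cur == finish.cur)  // the insertion point is the very end of the deque
    {
        push_back(x);  // simply hand over to push_back
        iterator tmp = finish;
        --tmp;
        return tmp;
    }
    else
    {
        return insert_aux(position, x);
    }
}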

template <class T, class Alloc, size_t BufSize>
typename deque<T, Alloc, BufSize>::iterator
deque<T, Alloc, BufSize>::insert_aux(iterator pos, const value_type& x){
    difference_type index = pos - start;    // number of elements before the insertion point
    value_type x_copy = x;
    if(index < size() / 2){  // fewer elements before the insertion point
        push_front(front());  // add a copy of the first element at the front
        .......
        copy(front2, pos1, front1);  // move the elements
    }
    else {    // fewer elements after the insertion point
        push_back(back());  // add a copy of the last element at the back
        ......
        copy_backward(pos, back2, back1);  // move the elements
    }
    *pos = x_copy;  // store the new value at the insertion point
    return pos;
}

How deques model continuous space

The credit goes mainly to the iterator.

reference operator[](size_type n)
{
      return start[difference_type(n)];
}
reference front()
{
    return *start;
}
reference back()
{
    iterator tmp = finish;
    --tmp;
    return *tmp;
}
size_type size() const
{
    return finish - start;
    // the memory is not contiguous here, so operator- must be overloaded
}
bool empty() const
{
    return finish == start;
}
reference operator* () const
{
    return *cur;
}
pointer operator->() const
{
    return &(operator*());
}
// The distance between two iterators equals
// 1. the total length of the full buffers between the two iterators,
// 2. plus the number of elements from the start of this iterator's buffer to cur,
// 3. plus the number of elements from x.cur to the end of x's buffer.
difference_type
operator- (const self& x) const
{
    return difference_type(buffer_size()) * (node - x.node - 1) + (cur - first) + (x.last - x.cur);
    // buffer size * number of buffers between the first and last buffers
    // + elements in the last (current) buffer + elements in the starting buffer
}

// Overloads of the -- and ++ operators
self& operator++()
{
    ++cur;  // move to the next element
    if(cur == last){  // reached the end of the buffer
        set_node(node + 1);  // jump to the start of the next node (buffer)
        cur = first;
    }
    return *this;
}
self operator++(int)
{
    self tmp = *this;
    ++*this;
    return tmp;
}

self& operator--()
{
    if(cur == first){  // at the start of the buffer
        set_node(node - 1);  // jump to the end of the previous node (buffer)
        cur = last;
    }
    --cur;  // move to the previous element
    return *this;
}
self operator--(int)
{
    self tmp = *this;
    --*this;
    return tmp;
}

void set_node(map_pointer new_node)
{
    node = new_node;
    first = *new_node;
    last = first + difference_type(buffer_size());
}

self& operator+=(difference_type n){
    difference_type offset = n + (cur - first);
    if(offset >= 0 && offset < difference_type(buffer_size()))
        // the target position is in the same buffer
        cur += n;
    else{
        // the target position is not in the same buffer
        difference_type node_offset = offset > 0 ? offset / difference_type(buffer_size())
                                                 : -difference_type((-offset - 1) / buffer_size()) - 1;
        // switch to the correct buffer
        set_node(node + node_offset);
        cur = first + (offset - node_offset * difference_type(buffer_size()));
    }
    return *this;
}

self operator+(difference_type n) const
{
    self tmp = *this;
    return tmp += n;
}

self& operator-=(difference_type n)
{
    return *this += -n;
}
self operator-(difference_type n) const
{
    self tmp = *this;
    return tmp -= n;
}
reference operator[] (difference_type n) const
{
    return *(*this + n);
}

Versions in GNU 4.9

UML

container queue

A queue holds a deque internally and exposes only the functions needed for first-in, first-out behavior.

template <class T, class Sequence = deque<T> >
class queue
{
............
public:
    typedef typename Sequence::value_type value_type;
    typedef typename Sequence::size_type size_type;
    typedef typename Sequence::reference reference;
    typedef typename Sequence::const_reference const_reference;
protected:
    Sequence c;  // underlying container
public:
    bool empty() const{return c.empty();}
    size_type size() const{return c.size();}
    reference front() {return c.front();}
    const_reference front() const{ return c.front();}
    reference back(){return c.back(); }
    const_reference back() const {return c.back();}
    void push (const value_type& x){ c.push_back(x); }
    void pop(){c.pop_front();}
};

Container stack

A stack holds a deque internally and exposes only the functions needed for first-in, last-out behavior.

template <class T, class Sequence = deque<T> >
class stack
{
............
public:
    typedef typename Sequence::value_type value_type;
    typedef typename Sequence::size_type size_type;
    typedef typename Sequence::reference reference;
    typedef typename Sequence::const_reference const_reference;
protected:
    Sequence c;  // underlying container
public:
    bool empty() const{return c.empty();}
    size_type size() const{return c.size();}
    reference top() {return c.back();}
    const_reference top() const{ return c.back();}
    void push (const value_type& x){ c.push_back(x); }
    void pop(){c.pop_back();}
};

Both stack and queue can use list or deque as the underlying structure.
queue cannot use vector as the underlying structure: vector has no pop_front() function, which queue's pop() needs to call.
Neither stack nor queue allows traversal, and neither provides iterators.

rb-tree (red-black tree)

red-black tree

A red-black tree is a kind of balanced binary search tree. The characteristics of a balanced binary search tree are that its elements are arranged by a regular rule, which makes searching and insertion easy, and that it maintains a moderate balance so that no subtree grows excessively deep.

rb_tree provides traversal operations and iterators. Traversing in the normal way (++ite) yields the elements in sorted order.

We should not use rb_tree's iterators to change element values, because the elements are arranged by strict rules. The code does not prohibit this, however, and that design is correct: rb_tree serves as the underlying support for set and map, and map allows an element's data to change; only the element's key must not change.

rb_tree provides two insertion operations: insert_unique() and insert_equal(). The former requires the key of the node to be unique in the entire tree, otherwise the insertion fails; the latter allows keys to repeat.

Standard library implementation of rb_tree

template <class Key,
                class Value,
                class KeyOfValue,
                class Compare,
                class Alloc = alloc>
class rb_tree{
protected:
    typedef __rb_tree_node<Value> rb_tree_node;
    .....
public:
    typedef rb_tree_node* link_type;
......
protected:
    // rb_tree represents itself with only three data members
    size_type node_count;  // size of the rb_tree (number of nodes)
    link_type header;  // a pointer to an rb_tree_node
    Compare key_compare;  // comparison criterion for keys; should be a function object
     ..........
};

Structure of rb_tree after GNU4.9

UML

container set, multiset

set and multiset use rb_tree as the underlying structure, so their elements are sorted automatically. The sorting is based on the key, and in set and multiset the key and the value of an element are one and the same: the value is the key.

set and multiset provide traversal operations and iterators. Traversing in the normal way (++ite) yields the elements in sorted order.

We cannot use the iterators of set and multiset to change the value of an element, because the value is the key and the ordering depends on it. The iterator of set and multiset is the const_iterator of the underlying red-black tree, which prevents developers from assigning to elements.

The keys of set elements must be unique, so insert() uses the red-black tree's insert_unique().
The keys of multiset elements can repeat, so insert() uses the red-black tree's insert_equal().

template <class Key, class Compare = less<Key>, class Alloc = alloc>
class set{
public:
      // typedefs:
      typedef Key key_type;
      typedef Key value_type;
      typedef Compare key_compare;
      typedef Compare value_compare;
private:
      typedef rb_tree<key_type, value_type, identity<value_type>, key_compare, Alloc> rep_type;
      rep_type t;
public:
      typedef typename rep_type::const_iterator iterator;  // rep_type::const_iterator, so elements cannot be modified
..........
// All set operations call functions of the underlying rb_tree;
// from this point of view, set could be regarded as a container adapter.
};

set to rb_tree template

multiset works on basically the same principle as set; the difference lies in which insertion function of the tree it calls.

Container map and multimap

map and multimap use rb_tree as the underlying structure, so their elements are sorted automatically, and the sorting is based on the key.

map and multimap provide traversal operations and iterators. Traversing in the normal way (++ite) yields the elements in sorted order.

We cannot use the iterators of map and multimap to change an element's key (because the ordering depends on the key), but we can use them to change an element's data. To enforce this, map and multimap automatically make the key type specified by the developer const, which prevents assignment to the key.

The keys of map elements must be unique, so insert() uses the red-black tree's insert_unique().
The keys of multimap elements can repeat, so insert() uses the red-black tree's insert_equal().

template <class Key, class T, class Compare = less<Key>, class Alloc = alloc>
class map{
public:
      // typedefs:
      typedef Key key_type;
      typedef T data_type;
      typedef T mapped_type;
      typedef pair<const Key, T> value_type;  // the key is const, the data is not
      typedef Compare key_compare;
private:
      typedef rb_tree<key_type, value_type, select1st<value_type>, key_compare, Alloc> rep_type;
      rep_type t;
public:
      typedef typename rep_type::iterator iterator;  // not const_iterator: the data may be modified, the key is protected by const Key
..........
// All map operations call functions of the underlying rb_tree;
// from this point of view, map could be regarded as a container adapter.
};

map overloads operator[] so that an element can be looked up by its key. If [] is used with a key that does not yet exist, an element with that key is automatically added to the map.
The keys of a multimap can repeat, so multimap does not provide operator[].

hashtable

To decide where each element is stored, the element's numeric value is divided by the number of available slots, and the remainder is used as the index of the slot that stores the element. Different elements can end up at the same index, which is a collision. To handle this, the elements that fall into the same slot are organized in a linked list, so several elements can share one slot (separate chaining).

Memory diagram of hash table

Managing elements this way solves the space problem, but if a linked list becomes too long, traversing it takes a lot of time and efficiency drops. How can this be solved?

The slots that hold the linked lists are called buckets. A rule of thumb from experience applies: if the number of elements exceeds the number of buckets, increase the number of buckets and recompute the position of every element (rehashing).

The number of buckets is generally a prime number. When the table grows, a prime close to twice the previous number of buckets is chosen as the new bucket count.

template<class Value, class Key, class HashFcn, class ExtractKey, class EqualKey, class Alloc = alloc>
class hashtable{
public:
      typedef HashFcn hasher;
      typedef EqualKey key_equal;
      typedef size_t size_type;
private:
      hasher hash;
      key_equal equals;
      ExtractKey get_key;
      typedef __hashtable_node<Value> node;
      vector<node*, Alloc> buckets;
      size_type num_elements;
public:
      size_type bucket_count() const{return buckets.size();}
...........
};

template<class Value, class Key, class HashFcn, class ExtractKey, class EqualKey, class Alloc = alloc>
struct __hashtable_iterator{
    .....
    node* cur;
    hashtable* ht;
};

template<class Value>
struct __hashtable_node{
  __hashtable_node* next;
  Value val;
};

The purpose of the hash function (HashFcn): from the value of an element, compute a hash code (a value on which the modulus operation can be performed), so that the elements mapped through their hash codes are placed in the hashtable in a sufficiently random way. The more scattered the hash codes are, the less likely collisions become.

// Generic (primary) template
template<class Key> struct hash{};
// Specializations
#define __STL_TEMPLATE_NULL template<>

__STL_TEMPLATE_NULL struct hash<char> {
    size_t operator()(char x) const {return x;}
};
__STL_TEMPLATE_NULL struct hash<short> {
    size_t operator()(short x) const {return x;}
};
__STL_TEMPLATE_NULL struct hash<unsigned short> {
    size_t operator()(unsigned short x) const {return x;}
};
.......

// Helper used by the char* specialization; it must be declared before its use
inline size_t __stl_hash_string(const char* s)
{
    unsigned long h = 0;
    for(; *s; ++s){
        h = 5 * h + *s;
    }
    return size_t(h);
}

// Specialization of the hash function for char*
// The library version shown here does not provide hash<std::string>
__STL_TEMPLATE_NULL struct hash<char *> {
    size_t operator()(char* x) const {return __stl_hash_string(x);}
};

To use a C++ string (std::string) as the element of a container built on this hashtable, you need to write the hash function for it yourself.

unordered_set (hash_set) and unordered_multiset (hash_multiset), unordered_map (hash_map) and unordered_multimap (hash_multimap)

The underlying structure of these containers is hashtable, so their number of buckets is always kept at least as large as their number of elements; once the elements would outnumber the buckets, the table is rehashed.