C++ STL: vector, map, set, list, deque, etc.

What is STL

The STL (Standard Template Library) is made up of three main parts: algorithms, containers, and iterators.

  • Algorithms: common algorithms such as sorting and copying
  • Containers: the forms in which data is stored, divided into sequence containers (e.g. list, vector) and associative containers (e.g. set, map)
  • Iterators: traverse a container without exposing its internal structure
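The three parts work together. Containers hold the data, algorithms operate only on iterator ranges, and iterators connect the two. The sketch below shows this; the function names are illustrative and do not come from the original text.

```cpp
#include <algorithm>
#include <set>
#include <vector>

// Container: std::vector holds the data. Algorithms: std::sort and std::unique
// see only the iterator range [begin, end), never the vector itself.
std::vector<int> sorted_unique(std::vector<int> values) {
    std::sort(values.begin(), values.end());
    values.erase(std::unique(values.begin(), values.end()), values.end());
    return values;
}

// Same result through an associative container: std::set keeps elements
// ordered and unique, and its iterators feed the vector's range constructor.
std::vector<int> sorted_unique_via_set(const std::vector<int>& values) {
    std::set<int> s(values.begin(), values.end());
    return std::vector<int>(s.begin(), s.end());
}
```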

Which is better for iterators, ++it or it++?

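Prefer ++it. The prefix form increments in place and returns a reference. The postfix form it++ must save a copy of the old value and return it by value, which creates a temporary object and lowers efficiency. The difference matters for class-type iterators; for built-in types such as int or raw pointers, compilers usually optimize it away.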

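// ++i (prefix): increment in place and return a reference to *this.
// i++ (postfix): the dummy int parameter marks it as postfix; it must copy
// the old state, increment, and return that copy by value.
struct Counter {
    int value = 0;

    Counter& operator++() {      // prefix: no temporary object
        value += 1;
        return *this;
    }

    Counter operator++(int) {    // postfix: one temporary copy
        Counter temp = *this;
        ++*this;
        return temp;
    }
};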


The five associated types of an iterator

  • value_type: the type of the element the iterator points to.
  • difference_type: the type of the distance between two iterators, usually a signed integer.
  • pointer: a pointer to the element type.
  • reference: a reference to the element type.
  • iterator_category: the kind of iterator: input, output, forward, bidirectional, or random access.
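The five types can be queried through std::iterator_traits. The sketch below checks them for std::vector<int>::iterator at compile time. It also shows tag dispatch on iterator_category, which is the classic use of that type. The helper names category_name and category_of are illustrative.

```cpp
#include <cstddef>
#include <iterator>
#include <list>
#include <string>
#include <type_traits>
#include <vector>

using VecTraits = std::iterator_traits<std::vector<int>::iterator>;

// The five associated types of std::vector<int>::iterator
static_assert(std::is_same<VecTraits::value_type, int>::value, "value_type");
static_assert(std::is_same<VecTraits::difference_type, std::ptrdiff_t>::value, "difference_type");
static_assert(std::is_same<VecTraits::pointer, int*>::value, "pointer");
static_assert(std::is_same<VecTraits::reference, int&>::value, "reference");
static_assert(std::is_same<VecTraits::iterator_category,
                           std::random_access_iterator_tag>::value, "iterator_category");

// Overload resolution picks the most specific tag (the tags form a hierarchy)
std::string category_name(std::input_iterator_tag) { return "input"; }
std::string category_name(std::forward_iterator_tag) { return "forward"; }
std::string category_name(std::bidirectional_iterator_tag) { return "bidirectional"; }
std::string category_name(std::random_access_iterator_tag) { return "random access"; }

template <typename It>
std::string category_of(It) {
    return category_name(typename std::iterator_traits<It>::iterator_category{});
}
```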

Hashtable implementation in STL

The STL hashtable resolves hash collisions with separate chaining (the "open chain" method): each bucket holds a linked list of the elements that hash to it.
The bucket lists are neither list nor slist. They are singly linked lists built from the hashtable's own hashtable_node structure. The buckets themselves are stored in a vector, because vector can grow dynamically. hashtable iterators support only forward movement (++), not backward movement.

  • For the number of buckets, 28 prime numbers [53, 97, 193, …, 4294967291] are built in. When a hashtable is created, the smallest listed prime that is greater than or equal to the requested number of elements becomes the number of buckets.

If, after an insertion, the number of elements exceeds the number of buckets, the table is rebuilt. The next prime is chosen, a new bucket vector is created, and each element's position is recomputed for the new table.

How separate chaining works

When inserting an element, the element's hash value is taken modulo the number of buckets to get an index, and the element is inserted into that bucket's linked list. Suppose the list already contains an element with the same key. Unique-key insertion then rejects the new element, and multi-key insertion links it immediately after the existing one. Otherwise, the new element is inserted at the head of the list and becomes the new head.

  • To find an element, compute the bucket index from the key's hash value and search that bucket's list. If the element is found, return its address; otherwise return a null pointer.

  • To delete an element, first search for it in the corresponding list and unlink it if found. If the list becomes empty, the bucket slot is left empty (a null pointer).
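The steps above can be sketched as a small separate-chaining set of ints. This is a simplified illustration, not SGI's hashtable:

- Only unique keys are supported.
- Rehashing to the next prime is omitted.
- The class name ChainedIntSet is hypothetical.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Simplified separate-chaining table (unique int keys, no rehash)
class ChainedIntSet {
    struct Node {
        int key;
        Node* next;
    };

public:
    explicit ChainedIntSet(std::size_t buckets = 53) : buckets_(buckets, nullptr) {}
    ~ChainedIntSet() {
        for (Node* head : buckets_) {
            while (head) {
                Node* next = head->next;
                delete head;
                head = next;
            }
        }
    }
    ChainedIntSet(const ChainedIntSet&) = delete;
    ChainedIntSet& operator=(const ChainedIntSet&) = delete;

    bool insert(int key) {
        std::size_t i = index(key);
        for (Node* n = buckets_[i]; n; n = n->next)
            if (n->key == key) return false;       // unique keys: reject duplicate
        buckets_[i] = new Node{key, buckets_[i]};  // insert at the head of the chain
        ++size_;
        return true;
    }

    const int* find(int key) const {
        for (Node* n = buckets_[index(key)]; n; n = n->next)
            if (n->key == key) return &n->key;
        return nullptr;                            // not found: null pointer
    }

    bool erase(int key) {
        Node** link = &buckets_[index(key)];
        while (*link) {
            if ((*link)->key == key) {
                Node* victim = *link;
                *link = victim->next;              // slot becomes nullptr if chain empties
                delete victim;
                --size_;
                return true;
            }
            link = &(*link)->next;
        }
        return false;
    }

    std::size_t size() const { return size_; }

private:
    std::size_t index(int key) const {
        return std::hash<int>{}(key) % buckets_.size();  // hash value modulo bucket count
    }

    std::vector<Node*> buckets_;
    std::size_t size_ = 0;
};
```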

What is RAII

Resource Acquisition Is Initialization (RAII) means acquiring a resource in an object's constructor and releasing it in the destructor, which ties the resource's lifetime to the object's scope. Smart pointers are the most representative implementation of RAII. Memory is freed automatically, so you don't have to worry about leaks caused by forgetting delete.
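The sketch below shows the pattern, using a global counter as a stand-in for "resources currently held". The names live_resources, ResourceGuard, and use_resources are hypothetical. The destructor runs on every exit path, including the early return. std::unique_ptr (C++14 std::make_unique) applies the same idea to heap memory.

```cpp
#include <memory>

// Hypothetical counter standing in for "resources currently held"
int live_resources = 0;

// RAII guard: acquire in the constructor, release in the destructor
class ResourceGuard {
public:
    ResourceGuard() { ++live_resources; }       // acquire
    ~ResourceGuard() { --live_resources; }      // release, runs on every exit path
    ResourceGuard(const ResourceGuard&) = delete;
    ResourceGuard& operator=(const ResourceGuard&) = delete;
};

int use_resources(bool fail_early) {
    ResourceGuard guard;
    auto buffer = std::make_unique<int[]>(16);  // freed automatically by ~unique_ptr
    buffer[0] = live_resources;
    if (fail_early) return -1;                  // guard and buffer are still released
    return buffer[0];
}
```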

Two-level space allocator in STL

Why is a second-level allocator needed?

Dynamic memory is allocated on the heap. Frequent small allocations easily cause heavy external fragmentation and reduce efficiency, and each malloc also carries its own bookkeeping overhead. The STL therefore uses two levels:

  • Requests larger than 128 bytes go to the first-level allocator, which calls malloc/free directly.
  • Requests of 128 bytes or less are served by the second-level allocator from a memory pool.

How the second-level allocator works

  1. Maintain 16 free lists for block sizes 8, 16, …, 128 bytes. The requested byte count is rounded up to a multiple of 8 and used to pick the matching list. If that list is not empty, a block is taken from its head.
  2. If the list is empty, refill it from the memory pool. If the pool has room for 20 blocks (20 × block size), take 20 blocks, give one to the user, and hang the other 19 on the free list. If it has room for fewer than 20 but at least 1, take as many as fit, give one to the user, and hang the rest on the free list. If it cannot supply even one block, hang the pool's leftover space on the appropriate free list, then request new memory from the heap.
  3. To replenish the pool, call malloc() for 2 × 20 × (rounded-up block size) plus an extra amount that grows with the total allocated so far. Half of the 2 × 20 blocks are used for the refill, and the remainder stays in the memory pool.
  4. If malloc fails, the second-level allocator searches the free lists for larger block sizes and takes one block to replenish the pool. If none is found, it falls back to the first-level allocator.

When memory is freed, deallocate() is called. Blocks larger than 128 bytes are handed to the first-level allocator. Smaller blocks are pushed directly onto the matching free list.
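The free-list mechanism can be sketched roughly as follows. This is a heavily simplified illustration, not SGI's source, and it differs from the real allocator in several ways:

- There is no memory pool: each refill takes 20 blocks directly from malloc.
- It is not thread-safe.
- The names (sketch, round_up, list_index, refill) are illustrative.

Like the real allocator, it never returns small blocks to the operating system.

```cpp
#include <cstddef>
#include <cstdlib>

namespace sketch {

constexpr std::size_t kAlign = 8;                      // block sizes are multiples of 8
constexpr std::size_t kMaxBytes = 128;                 // larger requests use malloc directly
constexpr std::size_t kNumLists = kMaxBytes / kAlign;  // 16 free lists
constexpr int kRefillCount = 20;                       // blocks fetched per refill

union Obj {                                            // a free block doubles as a list node
    Obj* next;
    char data[1];
};

Obj* free_list[kNumLists] = {};

std::size_t round_up(std::size_t bytes) { return (bytes + kAlign - 1) & ~(kAlign - 1); }
std::size_t list_index(std::size_t bytes) { return (bytes + kAlign - 1) / kAlign - 1; }

// Fetch 20 blocks of size n in one malloc: return block 0, chain blocks 1..19
void* refill(std::size_t n) {
    char* chunk = static_cast<char*>(std::malloc(n * kRefillCount));
    if (!chunk) return nullptr;
    Obj* current = reinterpret_cast<Obj*>(chunk + n);
    free_list[list_index(n)] = current;
    for (int i = 2; i < kRefillCount; ++i) {
        Obj* next = reinterpret_cast<Obj*>(chunk + i * n);
        current->next = next;
        current = next;
    }
    current->next = nullptr;
    return chunk;
}

void* allocate(std::size_t bytes) {
    if (bytes == 0) bytes = 1;
    if (bytes > kMaxBytes) return std::malloc(bytes);  // first-level path
    Obj*& head = free_list[list_index(bytes)];
    if (head) {                                        // pop from the free list
        Obj* result = head;
        head = head->next;
        return result;
    }
    return refill(round_up(bytes));
}

void deallocate(void* p, std::size_t bytes) {
    if (bytes == 0) bytes = 1;
    if (bytes > kMaxBytes) { std::free(p); return; }  // first-level path
    Obj* node = static_cast<Obj*>(p);                  // push onto the free list;
    Obj*& head = free_list[list_index(bytes)];         // never returned to the OS
    node->next = head;
    head = node;
}

}  // namespace sketch
```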

Drawbacks of the second-level allocator

  • Internal fragmentation: every request is rounded up to a multiple of 8 bytes, so a 13-byte request occupies a 16-byte block.
  • All of its members are static, so the memory it obtains is returned to the operating system only when the process exits. If a program keeps allocating small blocks, much of the heap can end up hanging on the free lists. A later request for a large block may then fail. If many of those free-list blocks go unused while the process holds a lot of memory, the memory stays tied up in this process and is unavailable to other processes, which can cause many problems.

underlying implementation

remove element from container

sequence container

For a vector, erase invalidates not only the iterator to the erased element but also every iterator after it. (list is the exception.) So the erase(it++) idiom cannot be used. Instead, use the return value of erase, which is the next valid iterator.

Because a vector's elements are contiguous in memory, erasing one shifts every later element forward by one position. Iterators that pointed at those elements now point at the wrong place.
In a list, elements are stored in separately allocated nodes, so erasing one does not affect the positions of the others.

it = c.erase(it);

associative container

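For associative containers, erase invalidates only the iterators to the erased element. In C++98, erase(iterator) returned void, so the idiom was erase(it++), which advances the iterator before the element is erased. Since C++11, erase also returns the next iterator, so it = c.erase(it) works here as well.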
c.erase(it++)
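Both idioms in one place: the sketch below removes even values from a vector and even keys from a map while iterating. The function names are illustrative.

```cpp
#include <map>
#include <vector>

// Sequence container: erase returns the next valid iterator, so reassign it
void erase_even(std::vector<int>& v) {
    for (auto it = v.begin(); it != v.end();) {
        if (*it % 2 == 0)
            it = v.erase(it);
        else
            ++it;
    }
}

// Associative container: only the erased iterator is invalidated, so the
// post-increment moves past it first (C++98 idiom; it = m.erase(it) also works since C++11)
void erase_even_keys(std::map<int, char>& m) {
    for (auto it = m.begin(); it != m.end();) {
        if (it->first % 2 == 0)
            m.erase(it++);
        else
            ++it;
    }
}
```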

vector

Allocation space and underlying implementation

  1. vector is used much like an array, but it manages a contiguous linear space that grows automatically when it runs out.
  2. Growing takes three steps: allocate a new, larger block, move (or copy) the elements into it, and free the old block. MSVC grows the capacity by 1.5×, while GCC (libstdc++) on Linux doubles it.
  3. Why grow by a factor (2× or 1.5×) instead of a fixed amount? Geometric growth gives push_back amortized O(1) time, because the total copying work over n insertions is O(n). Growing by a fixed increment makes each push_back O(n) amortized, or O(n²) in total.
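Counting reallocations makes the amortized argument concrete. With geometric growth, n push_backs trigger only about log(n) reallocations. The helper below (a hypothetical name) records every distinct capacity the vector passes through. The exact values are implementation-defined.

```cpp
#include <cstddef>
#include <vector>

// Record each distinct capacity a vector passes through while growing
// one push_back at a time; a new entry means a reallocation happened
std::vector<std::size_t> capacity_history(std::size_t n) {
    std::vector<int> v;
    std::vector<std::size_t> history;
    for (std::size_t i = 0; i < n; ++i) {
        v.push_back(static_cast<int>(i));
        if (history.empty() || history.back() != v.capacity())
            history.push_back(v.capacity());
    }
    return history;
}
```

With GCC's libstdc++, capacity_history(10) typically yields 1, 2, 4, 8, 16.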

free up space

All memory is released when the vector is destroyed. If you need storage that shrinks dynamically, consider deque instead. To release a vector's memory explicitly, use swap.

clear() destroys the elements but does not change the capacity. The std::remove algorithm does not change the container's size at all: it moves the kept elements forward and returns the new logical end, so it is normally followed by erase. Member functions such as pop_back() and erase() change the size, but not the capacity. To actually reduce the capacity, use swap, or use deque as noted above.
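The usual way to pair std::remove with erase is the erase-remove idiom, sketched below. The helper name remove_value is illustrative. Only the size shrinks; the capacity stays the same.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// std::remove shifts the kept elements forward and returns the new logical
// end; erase then shrinks size(), while capacity() is left unchanged
std::size_t remove_value(std::vector<int>& v, int value) {
    auto new_end = std::remove(v.begin(), v.end(), value);
    std::size_t removed = static_cast<std::size_t>(v.end() - new_end);
    v.erase(new_end, v.end());
    return removed;
}
```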

swap

#include <iostream>
#include <vector>

int main() {
    std::vector<int> vec1 = {1, 2, 3};
    std::cout << "vec1 size: " << vec1.size() << ", capacity: " << vec1.capacity() << std::endl;

    std::vector<int> vec2 = {4, 5, 6, 7};
    std::cout << "vec2 size: " << vec2.size() << ", capacity: " << vec2.capacity() << std::endl;

    // swap exchanges the internal buffers in O(1): sizes and capacities trade places
    vec1.swap(vec2);

    std::cout << "After swap:" << std::endl;
    std::cout << "vec1 size: " << vec1.size() << ", capacity: " << vec1.capacity() << std::endl;
    std::cout << "vec2 size: " << vec2.size() << ", capacity: " << vec2.capacity() << std::endl;

    return 0;
}
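Because swap only trades buffers, swapping with a temporary vector is the classic way to give memory back. The two helper names below are illustrative. Since C++11, shrink_to_fit() offers a similar, non-binding request.

```cpp
#include <vector>

// Swap with an empty temporary: v takes the empty buffer, and the temporary
// frees v's old buffer when it is destroyed at the end of the statement
template <typename T>
void release_memory(std::vector<T>& v) {
    std::vector<T>().swap(v);
}

// Shrink by swap: copy into a right-sized vector, then trade buffers
template <typename T>
void shrink_by_swap(std::vector<T>& v) {
    std::vector<T>(v).swap(v);
}
```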

delete element

dynamic growth

When size() equals capacity(), the vector's current space is used up, and adding another element triggers dynamic growth.
Growth reallocates space and copies the existing elements, which costs time. reserve(n) lets you pre-allocate enough space up front. It only changes the capacity when n > capacity(). After that, no reallocation occurs until the reserved space is used up.

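// SGI STL vector::resize (excerpt): shrinking erases the tail elements,
// growing appends copies of __x (capacity changes only if insert must reallocate)
void resize(size_type __new_size, const _Tp& __x) {
    if (__new_size < size())
        erase(begin() + __new_size, end());
    else
        insert(end(), __new_size - size(), __x);
}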


resize vs. reserve comparison
#include <iostream>
#include <vector>

int main() {
  std::vector<int> my_vector;

  // Reserve space for 10 ints before inserting any elements
  my_vector.reserve(10);

  for (int i = 0; i < 10; ++i) {
    my_vector.push_back(i);  // no reallocation: capacity is already >= 10
  }

  std::cout << "Vector size: " << my_vector.size() << '\n';          // 10
  std::cout << "Vector capacity: " << my_vector.capacity() << '\n';  // at least 10

  return 0;
}
  • When the space runs out, the new allocation is a multiple of the old capacity (2× with GCC's libstdc++, 1.5× with MSVC).
  • After reserve pre-allocates memory, no reallocation happens until that space is full.
  • If the argument to reserve is not larger than the current capacity, nothing is reallocated (reserve never shrinks).
  • resize changes the number of elements (size()). It changes capacity only if the new size exceeds the current capacity.
  • reserve(size_type) only raises the capacity. The extra slots hold no constructed elements, so accessing them with [ ] is out of bounds (undefined behavior). resize(size_type new_size), by contrast, makes the container actually hold new_size objects.
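The contrast between the two calls fits in a few lines. The SizeCap struct and both function names are illustrative.

```cpp
#include <cstddef>
#include <vector>

struct SizeCap {
    std::size_t size;
    std::size_t capacity;
};

// reserve: capacity grows, but size (the number of real elements) stays 0
SizeCap after_reserve(std::size_t n) {
    std::vector<int> v;
    v.reserve(n);
    return {v.size(), v.capacity()};
}

// resize: n value-initialized elements (0 for int) now actually exist
SizeCap after_resize(std::size_t n) {
    std::vector<int> v;
    v.resize(n);
    return {v.size(), v.capacity()};
}
```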

emplace_back

emplace_back inserts a new element at the end of the container, constructing it directly in the container's storage from the given arguments.
Because the element is constructed in place, there is no need to first create an object and then copy or move it into the container. This avoids the move and copy overhead.

Note that the element's constructor may throw. When a single element is appended, emplace_back gives the strong exception guarantee: if an exception is thrown, the container is left unchanged. The exception is a type that is not copyable and whose move constructor can throw; in that case the effects are unspecified. As with any insertion, make sure the provided arguments can successfully construct a new element.
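The saving can be made visible with a type that counts its copies. Point and both helper functions are illustrative. reserve is called first so that reallocation copies do not skew the count. Point declares a copy constructor, so it has no move constructor, and push_back of a temporary falls back to a copy.

```cpp
#include <vector>

// Illustrative type that counts how often it is copied
struct Point {
    static int copies;
    int x, y;
    Point(int x_, int y_) : x(x_), y(y_) {}
    Point(const Point& other) : x(other.x), y(other.y) { ++copies; }
};
int Point::copies = 0;

int copies_with_push_back() {
    Point::copies = 0;
    std::vector<Point> v;
    v.reserve(4);               // avoid reallocation copies
    v.push_back(Point(1, 2));   // temporary built first, then copied into the vector
    return Point::copies;
}

int copies_with_emplace_back() {
    Point::copies = 0;
    std::vector<Point> v;
    v.reserve(4);
    v.emplace_back(1, 2);       // constructed in place from the arguments (1, 2)
    return Point::copies;
}
```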

map and set

The underlying structure is a red-black tree. Insertion and deletion take O(log n) time, elements are sorted automatically, and an in-order traversal visits them in sorted order. map stores key + value pairs; set stores only values, each of which is its own key.

The difference between map and set

set exposes a single data type: each element is both its key and its value. Internally, the red-black tree is instantiated with identity<value_type> as the key-extraction functor, which returns its input unchanged, so a node's key is the stored value itself. A set node therefore stores one element, not two.

map exposes two data types, key and mapped value, stored together in each node as a pair<const Key, T>. Its key-extraction functor is select1st, which returns the pair's first member. In both containers, ordering uses key_compare (std::less<Key> by default) applied to the extracted keys.
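The difference shows up directly in the standard member types. The compile-time checks below confirm what set and map store per node and which comparator they use. The helper map_orders_by_key is illustrative. It shows that map's value_comp() compares whole pairs by key only.

```cpp
#include <functional>
#include <map>
#include <set>
#include <string>
#include <type_traits>
#include <utility>

// set: the stored value is the key
static_assert(std::is_same<std::set<int>::value_type,
                           std::set<int>::key_type>::value, "set: value is the key");

// map: one pair per node, with a const key
static_assert(std::is_same<std::map<int, std::string>::value_type,
                           std::pair<const int, std::string>>::value, "map: pair<const Key, T>");

// Both order by the same key comparator by default
static_assert(std::is_same<std::set<int>::key_compare, std::less<int>>::value, "");
static_assert(std::is_same<std::map<int, std::string>::key_compare, std::less<int>>::value, "");

bool map_orders_by_key() {
    std::map<int, std::string> m;
    auto cmp = m.value_comp();   // compares pairs by their first member only
    return cmp({1, "z"}, {2, "a"}) && !cmp({2, "a"}, {1, "z"});
}
```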

Map implementation principle

Once a key is in a map it cannot be modified, but the value associated with it can. A map iterator is therefore neither a fully constant iterator nor a fully mutable one. Because the element type is pair<const Key, T>, you can change it->second through it but not it->first.
Like set, map is built on a red-black tree. By default keys are sorted in ascending order (std::less<Key>), and the alloc allocator is used for node space. Insertion calls the red-black tree's insert_unique() rather than insert_equal(), which is the one multimap uses.

subscript operation

The subscript operator can be used as an lvalue to modify the content (m[k] = v) or as an rvalue to read it (x = m[k]).
In SGI STL it is implemented as (*((insert(value_type(k, T()))).first)).second. It proceeds as follows:

  1. It builds an element from the key and a value-initialized mapped value T(), since the real value is not known yet, and tries to insert it.
  2. insert returns a pair whose first member is an iterator and whose second member is a bool. If the key was new, insertion succeeds (true) and the iterator points to the newly inserted element. If the key already exists, insertion fails (false) and the iterator points to the existing element.
  3. Either way, operator[] returns a reference to that element's mapped value, which is why the result can be used as an lvalue or an rvalue.
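The same logic can be written as a free function, as a sketch of the idea rather than the library's source. The name subscript is hypothetical. One practical consequence is that using [] as an rvalue on a missing key still inserts a default element. Use find() or at() to read without inserting.

```cpp
#include <map>
#include <string>
#include <utility>

// Sketch of operator[]: insert (k, T()) and use the iterator from the
// returned pair<iterator, bool>, whether the insertion succeeded or not
template <typename K, typename T>
T& subscript(std::map<K, T>& m, const K& k) {
    auto result = m.insert(std::make_pair(k, T()));
    return result.first->second;   // new element or the existing one
}
```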

How is the map expanded?

It does not expand the way vector does. map is a node-based red-black tree. Each insertion allocates a single node and rebalances the tree by relinking pointers and recoloring nodes, and each erase frees one node. Existing elements are never copied or moved to a new memory block, so pointers, references, and iterators to other elements remain valid.
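A quick check of node stability: take a pointer to one element, insert many more, and confirm the element has not moved. The function name is illustrative.

```cpp
#include <map>

// A node-based map never relocates existing nodes, so the pointer stays valid
bool map_nodes_stay_put() {
    std::map<int, int> m{{0, 42}};
    const int* before = &m.at(0);
    for (int i = 1; i < 10000; ++i)
        m.emplace(i, i);
    return before == &m.at(0) && *before == 42;
}
```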

set bottom layer and iterator

  • The underlying structure is a red-black tree, and almost every set operation simply forwards to the red-black tree.
  • set does not allow its elements to be modified through iterators, because changing a value could break the tree's ordering. Its iterators are constant iterators.

unordered_map and unordered_set

Both are unordered: elements are sorted neither by value nor by insertion order. The underlying structure of unordered_map is a hash table.

The slots of the hash table are called buckets. The hashtable uses a vector as the container that holds the buckets, because vector can grow dynamically.
Why is unordered_set not kept in insertion order? Elements are grouped into buckets by their hash value so that any element can be reached quickly. Iteration follows that bucket layout, not the insertion sequence.
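The standard bucket interface exposes this layout. bucket(key) gives the index, and the local iterators begin(b)/end(b) walk one bucket's chain. The helper names below are illustrative.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>

std::unordered_map<std::string, int> make_table(int n) {
    std::unordered_map<std::string, int> m;
    for (int i = 0; i < n; ++i)
        m.emplace(std::to_string(i), i);  // rehashes when load_factor() would exceed max_load_factor()
    return m;
}

// Walk only the chain of the bucket that key hashes to
bool in_its_bucket(const std::unordered_map<std::string, int>& m, const std::string& key) {
    std::size_t b = m.bucket(key);
    for (auto it = m.begin(b); it != m.end(b); ++it)
        if (it->first == key) return true;
    return false;
}
```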

multimap, unordered_multimap, multiset, and unordered_multiset

  • multimap: a red-black tree underneath, ordered, keys can repeat
  • unordered_multimap: a hash table underneath, unordered, keys can repeat
  • multiset: a red-black tree underneath, ordered, values can repeat
  • unordered_multiset: a hash table underneath, unordered, values can repeat
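With repeated keys, lookups return a range instead of a single element. The sketch below uses equal_range on a multimap; the function name values_for is illustrative. Since C++11, elements with equal keys keep their insertion order.

```cpp
#include <map>
#include <string>
#include <vector>

// equal_range returns [first, second): all elements whose key equals key
std::vector<std::string> values_for(const std::multimap<int, std::string>& mm, int key) {
    std::vector<std::string> out;
    auto range = mm.equal_range(key);
    for (auto it = range.first; it != range.second; ++it)
        out.push_back(it->second);
    return out;
}
```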

unordered_map and map application scenarios

map suits scenarios that need ordered data, such as ordered traversal or range queries, with O(log n) operations. unordered_map suits scenarios that need fast lookup: average O(1), degrading to O(n) in the worst case.

list and slist

list

list is not just a doubly linked list but a circular doubly linked list, so the list object needs only one pointer. That pointer refers to a blank sentinel node that serves as end(), and the node after it is begin().
list iterators are bidirectional iterators.
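The single-pointer layout can be sketched as a minimal ring with one sentinel. This is an illustration in the spirit of the description above, not the library's code. The class name IntList is hypothetical, and iterators and allocators are omitted. size() walks the ring as SGI's list does; C++11 std::list keeps a stored count instead.

```cpp
#include <cstddef>

// Minimal circular doubly linked list: the only data member is the sentinel
class IntList {
    struct Node {
        Node* prev;
        Node* next;
        int value;
    };

public:
    IntList() : node_(new Node{nullptr, nullptr, 0}) {
        node_->prev = node_->next = node_;          // empty list: sentinel points to itself
    }
    ~IntList() {
        while (!empty()) pop_front();
        delete node_;
    }
    IntList(const IntList&) = delete;
    IntList& operator=(const IntList&) = delete;

    bool empty() const { return node_->next == node_; }
    void push_back(int v) { insert_before(node_, v); }   // before end()
    void push_front(int v) { insert_before(node_->next, v); }

    void pop_front() {
        Node* first = node_->next;
        first->prev->next = first->next;
        first->next->prev = first->prev;
        delete first;
    }

    int front() const { return node_->next->value; }
    int back() const { return node_->prev->value; }      // O(1) thanks to the ring

    std::size_t size() const {                            // walks the ring
        std::size_t n = 0;
        for (Node* p = node_->next; p != node_; p = p->next) ++n;
        return n;
    }

private:
    void insert_before(Node* pos, int v) {
        Node* n = new Node{pos->prev, pos, v};
        pos->prev->next = n;
        pos->prev = n;
    }

    Node* node_;   // sentinel: end(); node_->next is begin()
};
```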

space management

By default, alloc is used as the space allocator. To allocate space conveniently in units of one node, list defines list_node_allocator (a simple_alloc<list_node, Alloc> typedef), which can allocate one or several nodes at a time.

slist

slist is a singly linked list whose iterators are forward iterators. Compared with list, it uses less space per node, and some operations are faster. std::forward_list, introduced in C++11, plays the same role, but unlike slist it has no size() method. The C++ standards committee did not adopt the name slist.
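A short std::forward_list sketch shows the two consequences of singly linked storage. Insertion happens after a position (insert_after). Counting needs a full traversal because there is no size(). The function names are illustrative.

```cpp
#include <cstddef>
#include <forward_list>
#include <iterator>

// No size() member: counting is an O(n) walk
std::size_t count_nodes(const std::forward_list<int>& fl) {
    return static_cast<std::size_t>(std::distance(fl.begin(), fl.end()));
}

std::forward_list<int> build() {
    std::forward_list<int> fl;
    fl.push_front(3);
    fl.push_front(1);                 // 1 3
    fl.insert_after(fl.begin(), 2);   // 1 2 3
    return fl;
}
```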

deque

The data structure of deque is as follows

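class deque
{
    ...
protected:
    typedef pointer* map_pointer;  // pointer to an element pointer (the map's type)
    map_pointer map;               // points to the map: an array of buffer pointers
    size_type map_size;            // number of nodes (slots) in the map
public:
    ...
    iterator begin();
    iterator end();
    ...
};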

Inside the deque is a pointer to the map. The map is a small contiguous array. Each element of the map, called a node, is a pointer to another, larger contiguous block called a buffer. The buffers are where the deque actually stores its data; by default each buffer is 512 bytes.

The data structure of the deque iterator is as follows

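struct __deque_iterator
{
    ...
    T* cur;            // current element in the buffer this iterator points into
    T* first;          // start of that buffer
    T* last;           // one past the end of that buffer (including unused space)
    map_pointer node;  // the map node that points to this buffer
    ...
};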
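Given this layout, random access reduces to dividing by the buffer size. The sketch below has two parts. The first is the default buffer-size rule (512 bytes per buffer, or one element if an element is larger). The second is a hypothetical helper, locate, that turns an index into a (map node, offset) pair in the way operator[] has to. start_offset is assumed to be begin().cur - begin().first.

```cpp
#include <cstddef>
#include <utility>

// Default rule: a 512-byte buffer, or one element per buffer if an element
// is larger than 512 bytes
constexpr std::size_t deque_buf_size(std::size_t elem_size) {
    return elem_size < 512 ? 512 / elem_size : 1;
}

// Hypothetical helper: returns (map node relative to begin's node,
// offset inside that node's buffer) for element i
std::pair<std::size_t, std::size_t> locate(std::size_t i, std::size_t start_offset,
                                           std::size_t buf_size) {
    std::size_t pos = start_offset + i;
    return {pos / buf_size, pos % buf_size};
}
```

For a 4-byte int, deque_buf_size(4) is 128, so element 130 of a deque whose begin() sits at the start of a buffer lives in the second buffer at offset 2.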


zero copy

Typically, we need to copy data from one application's memory space to the operating system's kernel space, and then copy data from the kernel space to another application's memory space. This copying process is expensive because each copy consumes time and memory resources.

Zero-copy technology can be used in the following ways

  1. Use memory-mapped file (mmap) technology to map files into memory, avoiding the process of file copying.
  2. Use direct memory access (DMA): data is transferred directly from the network device's buffer to memory, avoiding a CPU copy.
  3. Using shared memory (shared memory) technology, the data is shared among multiple applications, avoiding the process of data duplication.

The STL does not do zero copy in the operating-system sense, but some of its facilities avoid unnecessary copies in the same spirit:

  • std::vector::reserve() reserves capacity in advance. This avoids the repeated reallocation, and the element copying it causes, when inserting elements.
  • std::string::substr() returns a new string, so it does copy. std::string_view (C++17) and its substr() refer to the characters of the original string without copying them.
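The difference is easy to check by comparing data pointers. The sketch below requires C++17 for std::string_view; the function names are illustrative. The view must not outlive the string it refers to.

```cpp
#include <string>
#include <string_view>

// std::string::substr: a new string that owns a copy of the characters
std::string copy_word(const std::string& s) { return s.substr(6, 5); }

// std::string_view::substr: a view into the same characters, no copy
std::string_view view_word(const std::string& s) { return std::string_view(s).substr(6, 5); }
```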


Origin blog.csdn.net/qaaaaaaz/article/details/130655878