"STL source code analysis" reading notes

Reprinted: https://www.cnblogs.com/xiaoyi115/p/3721922.html

 

     Let's get straight to the point. The Standard Template Library is abbreviated as STL. It can be divided into six parts: containers, iterators, allocators, adapters, algorithms, and functors.

      Iterators and generic programming are pushed almost to the extreme here. Template (generic) programming means that an algorithm is written without naming a concrete type; the type is supplied where the algorithm is used, and the compiler instantiates the template for it. In STL, iterators are the glue between algorithms and containers: an algorithm works on a range of iterators, so it does not need to know the concrete data structure underneath.
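    As a minimal sketch of this idea, the hand-written my_find below (a hypothetical name; it mirrors what std::find does) is written once against iterators and then instantiated for two unrelated containers.

```cpp
#include <cassert>
#include <list>
#include <vector>

// Written once, without naming a container or element type.
// It only requires that the iterators support !=, ++ and *.
template <class InputIt, class T>
InputIt my_find(InputIt first, InputIt last, const T& value) {
    for (; first != last; ++first) {
        if (*first == value) {
            return first;
        }
    }
    return last;  // not found
}

// The compiler instantiates my_find separately for each iterator type.
void generic_find_demo() {
    std::vector<int> v{1, 2, 3};
    std::list<int> l{4, 5, 6};
    assert(my_find(v.begin(), v.end(), 2) != v.end());
    assert(my_find(l.begin(), l.end(), 5) != l.end());
    assert(my_find(l.begin(), l.end(), 9) == l.end());
}
```

    The algorithm never mentions vector or list; the iterator types carry everything it needs.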

    The essence of a functor is a struct or class that overloads operator(), so that its objects can be called like functions; comparison functors such as less and greater wrap operators like < and > in this way. Functors are more a concept and a way of thinking than a new physical mechanism.
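    A small sketch of a functor follows. GreaterThan and count_above are hypothetical names chosen for illustration; std::count_if is the standard algorithm that calls the functor.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// A functor: an ordinary struct whose operator() makes it callable.
// Unlike a plain function, it can carry state (here, the threshold).
struct GreaterThan {
    int threshold;
    bool operator()(int x) const { return x > threshold; }
};

// Count the elements above a limit by handing the functor to an algorithm.
long count_above(const std::vector<int>& v, int limit) {
    return static_cast<long>(std::count_if(v.begin(), v.end(), GreaterThan{limit}));
}

void functor_demo() {
    GreaterThan above_three{3};
    assert(above_three(4));   // called like a function
    assert(!above_three(3));
    assert(count_above({1, 5, 8, 10}, 4) == 3);
}
```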

     Containers and algorithms are the core components of STL; everything else (iterators, allocators, and so on) exists to support them. The following outlines the main points of containers and algorithms.

     container:

     1、vector

     vector is a sequence container. "Sequence" here does not mean sorted; it means the elements are kept in a definite order of positions. vector is very similar to a built-in array. The biggest difference is that an array is static while a vector is dynamic, which is why we use vector most of the time: we often do not know in advance how large the array must be. Allocate too much and memory is wasted; allocate too little and the data does not fit. In fact, a vector is essentially an array that can grow. So how does a fixed-size array become a vector? A look at its data structure makes this clearer.

    template <class T, class Alloc = alloc>
    class vector {
    public:
        typedef T           value_type;
        typedef value_type* pointer;
        typedef value_type* iterator;
        typedef value_type& reference;
        typedef size_t      size_type;
        ...
    protected:
        iterator start;           // start of the space currently in use
        iterator finish;          // end of the space currently in use
        iterator end_of_storage;  // end of the allocated (available) space
        ...
    };

      We notice start, finish, end_of_storage in the vector.

     With these three pointers, everything has an answer. For example, if the vector is given a capacity of 10, then end_of_storage = start + 10. It looks dynamic because we maintain finish. If we insert 5 values, size() is 5, computed as the difference between finish and start. We can keep inserting; each insertion just advances finish by one position. But then the next question arises: what happens when finish equals end_of_storage? There is no room left, so the insertion cannot proceed as is.

     Don't panic, and don't doubt the ingenuity of programmers. We allocate a new array that is larger than the previous one, copy the old array into the new array, and then release the old array. After that, insertion can continue. The allocation can fail, however, so an insertion does not always succeed. If allocating or copying fails, any newly allocated array is released and the vector keeps its old array unchanged.

     So a fixed-size array becomes a dynamic vector by moving to a larger array. How much larger? In the SGI STL implementation analyzed in the book, the new capacity is twice the current size (or 1 if the vector is empty). The factor of two is probably an empirical choice; the C++ standard does not mandate a particular growth factor.
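    The growth scheme can be sketched as follows. MiniVector is a hypothetical, int-only simplification that reuses the member names start, finish, and end_of_storage from the excerpt above; it uses new[]/delete[] instead of an allocator, so it is an illustration rather than the SGI code.

```cpp
#include <cassert>
#include <cstddef>

class MiniVector {
public:
    MiniVector() = default;
    MiniVector(const MiniVector&) = delete;
    MiniVector& operator=(const MiniVector&) = delete;
    ~MiniVector() { delete[] start; }

    int* begin() const { return start; }
    int* end() const { return finish; }
    std::size_t size() const { return static_cast<std::size_t>(finish - start); }
    std::size_t capacity() const { return static_cast<std::size_t>(end_of_storage - start); }

    void push_back(int value) {
        if (finish == end_of_storage) {
            grow();  // no spare slot left
        }
        *finish = value;
        ++finish;  // advance the end of the used space by one position
    }

private:
    void grow() {
        const std::size_t old_size = size();
        const std::size_t new_cap = old_size != 0 ? 2 * old_size : 1;
        // If this allocation throws, the old array is still intact.
        int* new_start = new int[new_cap];
        for (std::size_t i = 0; i < old_size; ++i) {
            new_start[i] = start[i];
        }
        delete[] start;
        start = new_start;
        finish = new_start + old_size;
        end_of_storage = new_start + new_cap;
    }

    int* start = nullptr;
    int* finish = nullptr;
    int* end_of_storage = nullptr;
};

void mini_vector_demo() {
    MiniVector v;
    for (int i = 0; i < 5; ++i) {
        v.push_back(i);
    }
    assert(v.size() == 5);
    assert(v.capacity() == 8);  // capacity went 1, 2, 4, 8
}
```

    Note that begin(), end(), size(), and capacity() need no extra bookkeeping; each is read directly from the three pointers.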

    vector provides member functions such as insert, erase, and push_back, as well as overloaded operators such as operator[]. push_back appends a value at the end; insert places a value at a given position and shifts the following elements back. In both cases, if the capacity is not enough, the vector first grows and then inserts.

    erase removes a value and then moves all the following values forward by one position, so erasing from the front or middle of a vector costs O(n). If frequent erase operations are required, a different data structure is a better choice.
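    To make the cost visible, here is a rough sketch of a single-element erase over a raw int range. erase_at is a hypothetical helper, not the library's code; the second function shows the same effect through std::vector::erase.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Remove the element at pos from the used range [.., finish).
// Every element after pos is copied one slot forward, hence O(n).
// Returns the new end of the used range.
int* erase_at(int* pos, int* finish) {
    assert(pos != finish);            // pos must point at a real element
    std::copy(pos + 1, finish, pos);  // shift the tail left by one
    return finish - 1;
}

std::vector<int> erase_second(std::vector<int> v) {
    assert(v.size() >= 2);
    v.erase(v.begin() + 1);  // later elements move forward
    return v;
}
```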

    Member functions such as begin(), end(), and size() are all derived from start, finish, and end_of_storage.

    2、list

    In STL, list is a circular doubly linked list with a header (sentinel) node. Whereas vector has to maintain start, finish, and end_of_storage, list only needs to maintain a single node pointer, node, which points to the header node.

    An empty circular doubly linked list is initialized as follows:

    node = get_node();   // allocate the header (sentinel) node
    node->next = node;   // its successor is itself
    node->prev = node;   // its predecessor is itself

   Other operations such as insertion, deletion, and element access are ordinary linked-list operations and are easy to follow once the linked list itself is understood.
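    A minimal sketch of such a list follows, assuming int elements and plain new/delete. MiniList and ListNode are hypothetical names; the point is that, thanks to the sentinel, insert and erase need no special cases for an empty list or for the ends.

```cpp
#include <cassert>

struct ListNode {
    ListNode* prev;
    ListNode* next;
    int data;
};

class MiniList {
public:
    MiniList() : node(new ListNode{nullptr, nullptr, 0}) {
        node->next = node;  // empty list: the sentinel points at itself
        node->prev = node;
    }
    MiniList(const MiniList&) = delete;
    MiniList& operator=(const MiniList&) = delete;
    ~MiniList() {
        while (!empty()) {
            erase(begin());
        }
        delete node;
    }

    bool empty() const { return node->next == node; }
    ListNode* begin() const { return node->next; }  // first real element
    ListNode* end() const { return node; }          // the sentinel itself

    // Link a new node in front of pos; returns the new node.
    ListNode* insert(ListNode* pos, int value) {
        ListNode* fresh = new ListNode{pos->prev, pos, value};
        pos->prev->next = fresh;
        pos->prev = fresh;
        return fresh;
    }
    void push_back(int value) { insert(end(), value); }
    void push_front(int value) { insert(begin(), value); }

    // Unlink and free the node at pos; returns the node after it.
    ListNode* erase(ListNode* pos) {
        assert(pos != node);  // the sentinel is never erased
        ListNode* after = pos->next;
        pos->prev->next = after;
        after->prev = pos->prev;
        delete pos;
        return after;
    }

private:
    ListNode* node;  // the single pointer the list maintains
};
```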

    3、deque (double-ended queue; elements can be added and removed at both ends)

    A deque is also array-like, but it is segmented. Segmented means the following: the deque may have one buffer with ten contiguous slots for values and a second buffer with ten more slots, but the two buffers are not contiguous with each other; that is, the address one past the end of the first buffer is not the start of the second. The buffers are organized and managed through a central array of pointers, one per buffer, which STL calls the map. (This map means a lookup chart of buffers; it is not the map container.)
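    The segmented layout can be sketched roughly as below. MiniDeque and kBufferSize are hypothetical; this simplification only grows at the back and omits push_front, iterators, and the map reallocation logic of a real deque. It shows how an index is split into a buffer number and an offset inside that buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

constexpr std::size_t kBufferSize = 4;  // slots per buffer (illustrative value)

class MiniDeque {
public:
    void push_back(int value) {
        if (size_ == map_.size() * kBufferSize) {
            // All buffers are full: allocate another, separate buffer.
            map_.emplace_back(new int[kBufferSize]);
        }
        map_[size_ / kBufferSize][size_ % kBufferSize] = value;
        ++size_;
    }

    // Element i lives in buffer i / kBufferSize at offset i % kBufferSize.
    int& operator[](std::size_t i) {
        assert(i < size_);
        return map_[i / kBufferSize][i % kBufferSize];
    }

    std::size_t size() const { return size_; }
    std::size_t buffer_count() const { return map_.size(); }

private:
    std::vector<std::unique_ptr<int[]>> map_;  // the "map": one pointer per buffer
    std::size_t size_ = 0;
};

void mini_deque_demo() {
    MiniDeque d;
    for (int i = 0; i < 10; ++i) {
        d.push_back(i * i);
    }
    assert(d.size() == 10);
    assert(d.buffer_count() == 3);  // 4 + 4 + 2 elements
    assert(d[9] == 81);
}
```

    Existing elements never move when a new buffer is added; only the small array of pointers grows.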

   4、stack (first-in, last-out data structure), queue (first-in, first-out data structure)

    These are container adapters: they have no storage of their own and simply restrict the interface of an underlying sequence container. In SGI STL the default underlying container is deque, and list can be used as well. It is like one piece of clothing: different people alter its appearance a little when they wear it, but the essence does not change.
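    As an illustration, std::stack and std::queue take the underlying container as their second template argument. The sketch below plugs in std::list for both; the function name adapter_demo is hypothetical.

```cpp
#include <cassert>
#include <list>
#include <queue>
#include <stack>

void adapter_demo() {
    // Same underlying container type, two different restricted interfaces.
    std::stack<int, std::list<int>> s;
    std::queue<int, std::list<int>> q;
    for (int x : {1, 2, 3}) {
        s.push(x);
        q.push(x);
    }
    assert(s.top() == 3);    // last in, first out
    assert(q.front() == 1);  // first in, first out

    s.pop();
    q.pop();
    assert(s.top() == 2);
    assert(q.front() == 2);
}
```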

   The above are all sequence containers. There are other sequence containers as well, such as priority_queue and slist; they are similar and are not elaborated here.

  5、set、map、multiset、multimap

    The biggest difference between set and map is that set stores single values while map stores key-value pairs. A set can also be seen as a special map in which the key and the value are the same thing, which already suggests that both share one underlying implementation. What about multiset versus set? The difference is whether equal values may be stored more than once: use multiset if duplicates must be kept, otherwise use set. Internally, set inserts through the tree's insert_unique, while multiset inserts through insert_equal. The same holds for map and multimap.
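    The observable difference can be checked with a short sketch using the standard containers (set_demo is a hypothetical name):

```cpp
#include <cassert>
#include <set>

void set_demo() {
    std::set<int> s;
    std::multiset<int> ms;
    for (int x : {1, 2, 2, 3}) {
        s.insert(x);   // the duplicate 2 is rejected
        ms.insert(x);  // the duplicate 2 is stored again
    }
    assert(s.size() == 3);
    assert(ms.size() == 4);
    assert(ms.count(2) == 2);

    // set::insert reports whether the value was actually inserted.
    assert(!s.insert(2).second);
    assert(s.insert(4).second);
}
```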

    map keeps its elements sorted by key and still supports fast insertion, lookup, and deletion. How does it do that?

     It comes down to the underlying implementation. All four containers are built on a red-black tree, a self-balancing binary search tree whose insert, find, and erase operations take O(log n) time. For details, see this blogger's article: "The bottom layer implementation of map and set: red-black trees [multiple pictures, mobile phones be careful]".

    The blog post has made it very clear, so I won't repeat it here.
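    Without going into the tree itself, its effect is easy to observe from the outside. The sketch below (sorted_keys and map_demo are hypothetical names) inserts keys out of order and reads them back in sorted order.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Iterating a map visits the keys in ascending order,
// regardless of the order in which they were inserted.
std::vector<std::string> sorted_keys(const std::map<std::string, int>& m) {
    std::vector<std::string> keys;
    for (const auto& kv : m) {
        keys.push_back(kv.first);
    }
    return keys;
}

void map_demo() {
    std::map<std::string, int> m;
    m["pear"] = 3;
    m["apple"] = 1;
    m["fig"] = 2;

    assert((sorted_keys(m) == std::vector<std::string>{"apple", "fig", "pear"}));
    assert(m.find("fig")->second == 2);  // lookup by key
    m.erase("fig");                      // deletion by key
    assert(m.count("fig") == 0);
}
```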

  6、hashtable、hash_set、hash_map、hash_multiset、hash_multimap

    This part is really about one thing: hashtable. As before, the difference between hash_set and hash_map is whether elements are key-value pairs, and the multi variants differ in whether equal keys may be stored more than once.

    The main strength of a hash table is that insertion and lookup are very fast. We reach for it because the average time complexity of a lookup is close to O(1). So how does a hash table achieve such efficient lookup?

   We need to look at its data structure. A hash table is an array, and each value is mapped to some slot array[i] by a suitable function. This is the core idea of hashing. Put simply, there are a few problems to solve:

    a. What is an efficient mapping?  

    b. What if two different elements map to the same array[i]?

     c. If there is little data and the array is large, space is wasted; if there is a lot of data and the array is small, lookup becomes slow. How should the array be sized?

     Let's look at them one by one, starting with a. The mapping needs a suitable hash function. In practice, taking the remainder modulo a prime works well, for example (value) mod (a prime number).

     Problem b is collision handling. Common methods are: linear probing, where a colliding element is placed in the next free slot, and later collisions continue on from there; quadratic probing, where the slots tried after a collision at position H are H+1^2, H+2^2, H+3^2, and so on; and separate chaining (the open-chain method), where each array[i] holds a linked list and colliding elements are placed in that list. STL chooses separate chaining.

     For problem c, the STL solution is to use a prime number N as the number of buckets. When the number of elements would exceed N, N is replaced with a larger prime and every existing element is rehashed into the new bucket array.
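    The three answers can be combined into one rough sketch. MiniHashSet is a hypothetical int-only table, and the short prime list kPrimes is made up for illustration; it is not the prime table the real implementation uses. Hashing is value mod bucket count, collisions go into a per-bucket linked list, and the table moves to the next prime when the element count would exceed the bucket count.

```cpp
#include <cassert>
#include <cstddef>
#include <forward_list>
#include <vector>

const std::size_t kPrimes[] = {5, 11, 23, 47, 97};  // illustrative only

class MiniHashSet {
public:
    MiniHashSet() : buckets_(kPrimes[0]) {}

    bool insert(int value) {
        if (contains(value)) {
            return false;  // unique keys only
        }
        if (size_ + 1 > buckets_.size()) {
            rehash();  // problem c: too many elements for this bucket count
        }
        buckets_[index(value, buckets_.size())].push_front(value);
        ++size_;
        return true;
    }

    bool contains(int value) const {
        // Problem b: walk the chain hanging off the bucket.
        for (int x : buckets_[index(value, buckets_.size())]) {
            if (x == value) {
                return true;
            }
        }
        return false;
    }

    std::size_t size() const { return size_; }
    std::size_t bucket_count() const { return buckets_.size(); }

private:
    // Problem a: map a value to a bucket with "mod a prime".
    static std::size_t index(int value, std::size_t n) {
        return static_cast<std::size_t>(static_cast<unsigned>(value)) % n;
    }

    void rehash() {
        std::size_t next = buckets_.size();
        for (std::size_t p : kPrimes) {
            if (p > buckets_.size()) {
                next = p;
                break;
            }
        }
        if (next == buckets_.size()) {
            return;  // prime list exhausted: chains simply get longer
        }
        std::vector<std::forward_list<int>> fresh(next);
        for (const auto& chain : buckets_) {
            for (int x : chain) {
                fresh[index(x, next)].push_front(x);  // re-place every element
            }
        }
        buckets_.swap(fresh);
    }

    std::vector<std::forward_list<int>> buckets_;
    std::size_t size_ = 0;
};
```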

     algorithms:

     Algorithms such as find and accumulate all work through iterators. Their parameters generally follow one pattern: a first iterator, a last iterator, and a value to operate with.
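    The pattern looks like this in use. sum_of and algorithm_demo are hypothetical names; std::accumulate and std::find are the standard algorithms, each taking a half-open range [first, last) plus a value.

```cpp
#include <algorithm>
#include <cassert>
#include <list>
#include <numeric>
#include <vector>

// Works for any range of ints, whatever container it comes from.
template <class InputIt>
int sum_of(InputIt first, InputIt last) {
    return std::accumulate(first, last, 0);  // 0 is the starting value
}

void algorithm_demo() {
    std::vector<int> v{1, 2, 3, 4};
    std::list<int> l{10, 20, 30};

    assert(sum_of(v.begin(), v.end()) == 10);
    assert(sum_of(l.begin(), l.end()) == 60);
    assert(sum_of(v.begin() + 1, v.end()) == 9);  // a sub-range works too

    // find returns an iterator to the match, or last if there is none.
    assert(std::find(v.begin(), v.end(), 3) == v.begin() + 2);
    assert(std::find(l.begin(), l.end(), 99) == l.end());
}
```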

     From start to finish, iterators and generics are present in almost every STL container or algorithm, and that's the essence!

  

    space configurator

     The space configurator in STL is the allocator; it is mostly concerned with requesting and releasing memory. The SGI STL has two levels of configurators: the first-level configurator allocates in the traditional way, while the second-level configurator is a pool, generally called a memory pool.

       Here is a brief overview. The strategy is as follows: when a request is larger than 128 bytes, the first-level configurator is called, which allocates and releases with malloc and free; when a request is 128 bytes or smaller, the second-level configurator is called. The second-level configurator maintains free lists (linked lists of free blocks). A block of suitable size is looked up in the matching free list and handed directly to the requester. If the free list is empty, it is refilled from the memory pool; if the pool cannot supply the full amount, it supplies what it can, and if it is exhausted it requests more from the heap with malloc. When a block is released, it is put back on its free list.

       One advantage of the second-level configurator is that it greatly reduces the fragmentation and per-allocation overhead caused by many small allocations, which improves memory utilization and reduces waste.
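    The two-level idea can be sketched as below, assuming a 128-byte threshold and size classes in multiples of 8 bytes. MiniPool and its constants are hypothetical, and the sketch is deliberately simplified: an empty free list is served by a single malloc rather than by refilling many blocks from a shared pool.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

constexpr std::size_t kAlign = 8;                      // size-class step (assumed)
constexpr std::size_t kMaxBytes = 128;                 // small-request limit (assumed)
constexpr std::size_t kNumLists = kMaxBytes / kAlign;  // 16 free lists

struct FreeNode {
    FreeNode* next;  // a free block stores the link inside its own memory
};
static_assert(sizeof(FreeNode) <= kAlign, "a block must be able to hold a link");

class MiniPool {
public:
    MiniPool() = default;
    MiniPool(const MiniPool&) = delete;
    MiniPool& operator=(const MiniPool&) = delete;
    ~MiniPool() {
        for (FreeNode*& head : free_list_) {
            while (head != nullptr) {
                FreeNode* next = head->next;
                std::free(head);
                head = next;
            }
        }
    }

    static std::size_t round_up(std::size_t n) { return (n + kAlign - 1) & ~(kAlign - 1); }
    static std::size_t list_index(std::size_t n) { return (n + kAlign - 1) / kAlign - 1; }

    void* allocate(std::size_t n) {
        if (n > kMaxBytes) {
            return std::malloc(n);  // first level: plain malloc
        }
        if (n == 0) n = 1;
        FreeNode*& head = free_list_[list_index(n)];
        if (head == nullptr) {
            return std::malloc(round_up(n));  // simplification: no chunk refill
        }
        FreeNode* block = head;  // second level: reuse a freed block
        head = block->next;
        return block;
    }

    void deallocate(void* p, std::size_t n) {
        if (p == nullptr) return;
        if (n > kMaxBytes) {
            std::free(p);  // first level: plain free
            return;
        }
        if (n == 0) n = 1;
        FreeNode*& head = free_list_[list_index(n)];
        head = new (p) FreeNode{head};  // push the block onto its free list
    }

private:
    FreeNode* free_list_[kNumLists] = {};
};
```

    Releasing a small block and then requesting the same size class returns the same address, which is the reuse that cuts down on repeated calls to malloc.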

 

     References: STL source code analysis
