Analyzing the STL Source Code: the vector Container

Foreword

vector is the STL container we use most often, and we are all familiar with its operations. Yet using it can still feel a little unreal, because we don't know how its interface is implemented underneath. To us it is a black box: dangerous and charming at the same time.

To dispel that unease, this article takes you down to the bottom of vector, examines in detail how its interface is implemented, and opens up the black box. Once we really understand it, we won't panic when using it.

Overview of how vector works underneath

vector manages a dynamically sized block of memory: as elements are added, it expands its storage on its own to make room for them.

When a vector grows, it does not extend its current block in place, because there is no guarantee that the memory right after it is still free. Instead, it allocates a new, larger block (in SGI STL, twice the old size), copies the existing elements into it, constructs the new element after them, and then releases the old block.
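The growth policy is easy to observe on a real std::vector. The sketch below (the function name observed_capacities is ours, for illustration) records every distinct capacity seen while pushing n elements; each change marks one reallocation. Keep in mind that the growth factor is implementation-defined: SGI STL, libstdc++ and libc++ double the capacity, while MSVC grows it by about 1.5x, so the exact sequence depends on your standard library.

```cpp
#include <cassert>
#include <cstddef>
#include <iostream>
#include <vector>

// Push n ints into an empty vector and record each distinct capacity seen.
// Every change of capacity corresponds to one reallocation:
// allocate a bigger block, copy/move the old elements, free the old block.
std::vector<std::size_t> observed_capacities(std::size_t n) {
    std::vector<int> v;
    std::vector<std::size_t> caps{v.capacity()};  // starting capacity
    for (std::size_t i = 0; i < n; ++i) {
        v.push_back(static_cast<int>(i));
        assert(v.capacity() >= v.size());  // invariant: used space fits in allocated space
        if (v.capacity() != caps.back()) {
            caps.push_back(v.capacity());
        }
    }
    return caps;
}

// Print the observed capacities on one line.
void print_capacities(std::size_t n) {
    for (std::size_t c : observed_capacities(n)) std::cout << c << ' ';
    std::cout << '\n';
}
```

With a doubling implementation the printed values are powers of two. Either way, reallocations become rarer as the vector grows, which is what keeps push_back amortized O(1).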

Key parts of the source code

1. The iterator type

Let's take a look at how iterators are defined in the STL source code.

template <class T, class Alloc = alloc>
class vector {
public:
    // nested type definitions of vector
    typedef T value_type;
    typedef value_type* iterator;    // the iterator is simply a plain pointer to T
    typedef value_type& reference;
    ...
};

As the code above shows, in SGI STL the iterator of vector is not a class at all: it is simply a plain pointer to the element type (value_type*). A pointer already supports *, ->, ++ and pointer arithmetic, so no operator overloading is needed. Other standard library implementations wrap the pointer in a class type that overloads these operators (libstdc++, for example, uses __gnu_cxx::__normal_iterator), but either way the iterator behaves like a pointer to an element stored inside the vector.
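Because SGI's iterator is a plain pointer, a container built this way works with the standard algorithms without defining any iterator class. The following is a minimal sketch under that assumption; MiniVector and its fixed-size array storage are hypothetical simplifications, not part of SGI STL.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hypothetical toy container: storage is a fixed array, iterator is a raw
// pointer, mirroring SGI STL's `typedef value_type* iterator;`.
template <class T, std::size_t N>
class MiniVector {
public:
    typedef T value_type;
    typedef value_type* iterator;   // the iterator is just a pointer
    typedef value_type& reference;

    void push_back(const T& x) {
        assert(count_ < N && "toy container has a fixed capacity");
        data_[count_++] = x;
    }
    iterator begin() { return data_; }
    iterator end() { return data_ + count_; }  // one past the last element

private:
    T data_[N] = {};
    std::size_t count_ = 0;
};

// The iterator type really is a pointer, so *, ++ and arithmetic are built in.
static_assert(std::is_pointer<MiniVector<int, 4>::iterator>::value,
              "MiniVector's iterator is a plain pointer");
```

With this, std::sort(mv.begin(), mv.end()) and std::find work directly, since pointers satisfy the random-access iterator requirements.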

Soul-searching question 1: What is the difference between an iterator and a pointer?

An iterator is an abstraction rather than a specific type: whatever it is underneath (a plain pointer in SGI STL, a class object elsewhere), it behaves like a pointer through *, -> and ++. What the iterator concept adds is the notion of validity. An iterator refers to an element of a particular container, and when that element is erased, or the container moves its elements (for example, when vector reallocates), the iterator is invalidated. Dereferencing or incrementing an invalid iterator is undefined behavior. This is what we usually call iterator invalidation.

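A raw pointer into the vector's storage has exactly the same problem. After an erase or a reallocation, the pointer still holds its old address, but that address now holds a different element or has already been released, so dereferencing it is undefined behavior as well. For vector the difference is mostly one of vocabulary: the standard specifies, operation by operation, which iterators, pointers and references are invalidated, and a stale pointer is just as dangerous as a stale iterator even though its value still looks normal.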
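The practical consequence is the same for iterators and raw pointers: after a reallocation, both refer to storage that has been released. A minimal sketch assuming standard std::vector behavior (the function names are hypothetical): the first function compares addresses, saved as integers, before and after one forced reallocation; the second shows the safe alternative of remembering a position as an index.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns true if growing past the current capacity moved the elements to a
// new block. The old address is saved as an integer *before* reallocating,
// so no stale pointer is ever used afterwards.
bool growth_moves_elements() {
    std::vector<int> v{10, 20, 30};
    const std::uintptr_t old_addr = reinterpret_cast<std::uintptr_t>(v.data());
    const std::size_t old_cap = v.capacity();
    while (v.capacity() == old_cap) v.push_back(0);  // force exactly one reallocation
    return reinterpret_cast<std::uintptr_t>(v.data()) != old_addr;
}

// Safe pattern: remember the position as an index, not as an iterator/pointer.
int value_after_growth() {
    std::vector<int> v{10, 20, 30};
    const std::size_t idx = 1;                       // instead of: auto it = v.begin() + 1;
    for (int i = 0; i < 1000; ++i) v.push_back(i);   // may reallocate many times
    return v[idx];                                   // indices survive reallocation
}
```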

According to the above definition, an iterator can be declared like this:

vector<int>::iterator ivite;
vector<Shape>::iterator svite;

The source above also explains this syntax: iterator is a typedef nested inside the class template vector<T>, so it has to be qualified with the class name, as in vector<int>::iterator.

2. The vector data structure

vector uses two iterators, start and finish, to mark the range of space in use, and a third iterator, end_of_storage, to mark the end of the allocated space, as shown below:

template <class T, class Alloc = alloc>
class vector {
...
protected:
    iterator start;          // start of the space currently in use
    iterator finish;         // end of the space in use, i.e. one past the last element
    iterator end_of_storage; // end of the whole allocated space
    ...
};

Using the above three iterators, we can encapsulate various member functions of vector.

template <class T, class Alloc = alloc>
class vector {
...
public:
  iterator begin() { return start; }
  iterator end() { return finish; }
  size_type size() const { return size_type(end() - begin()); }
  bool empty() const { return begin() == end(); }
  reference front() { return *begin(); }
  reference back() { return *(end() - 1); }
  reference operator[](size_type n) { return *(begin() + n); } // operator[] is built on iterator arithmetic
};

Most of these operations are self-explanatory, so I won't go through them one by one. Just two points are worth mentioning. First, operator[] is overloaded as *(begin() + n), so elements can be accessed by index just like an array.

Second, the iterator finish, and therefore end(), points one past the last element of the vector. This is the familiar half-open range of STL containers: closed at the front and open at the back, written [begin(), end()).
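To tie the pieces together, here is a minimal, hypothetical sketch of the three-pointer layout with the accessors above. It assumes a capacity fixed at construction (no growth) and uses raw memory plus placement new to keep allocation separate from construction, in the spirit of SGI's allocator; ThreePointerVector is not a real library class.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Hypothetical sketch: three pointers delimit the used range [start, finish)
// and the allocated range [start, end_of_storage).
template <class T>
class ThreePointerVector {
public:
    typedef T* iterator;
    typedef T& reference;
    typedef std::size_t size_type;

    explicit ThreePointerVector(size_type cap)
        : start(static_cast<T*>(::operator new(cap * sizeof(T)))),
          finish(start), end_of_storage(start + cap) {}
    ~ThreePointerVector() {
        for (T* p = start; p != finish; ++p) p->~T();  // destroy the elements
        ::operator delete(start);                      // then release raw memory
    }
    ThreePointerVector(const ThreePointerVector&) = delete;
    ThreePointerVector& operator=(const ThreePointerVector&) = delete;

    void push_back(const T& x) {
        assert(finish != end_of_storage && "this sketch does not grow");
        ::new (static_cast<void*>(finish)) T(x);       // construct in place
        ++finish;
    }

    iterator begin() { return start; }
    iterator end() { return finish; }                  // one past the last element
    size_type size() const { return size_type(finish - start); }
    size_type capacity() const { return size_type(end_of_storage - start); }
    bool empty() const { return start == finish; }
    reference front() { return *begin(); }
    reference back() { return *(end() - 1); }
    reference operator[](size_type n) { return *(begin() + n); }

private:
    iterator start;           // head of the used space
    iterator finish;          // one past the last element
    iterator end_of_storage;  // end of the allocated space
};
```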

Soul-searching question 2: Why are container ranges designed to be half-open?

This keeps the loop condition for traversing elements simple and uniform. The core of STL is generic programming, so its interfaces are meant to work with every container. Only random-access iterators (such as vector's) support the < and > operators, whereas every iterator supports !=, so generic traversal code relies on !=.

If end() points to the next element after the last element of the container, the traversal operation only needs to be written as:

vector<int> vec;
auto it = vec.begin();
while (it != vec.end()) {
    // ... process *it ...
    ++it;
}

If end() pointed at the last element instead, this loop would skip that element, so an extra check would be needed inside the loop, and that check might differ from container to container (an empty container would need yet another special case, since it has no last element to point at). With the half-open convention, the same loop works for any sequence container, which saves a lot of redundant work.
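The payoff of the half-open convention shows up in generic code. This sketch (sum_range is a hypothetical helper) walks any range [first, last) using only !=, ++ and *, so the same template works for vector, list and forward_list, and an empty range needs no special case because begin() == end().

```cpp
#include <cassert>
#include <forward_list>
#include <list>
#include <vector>

// Sums a half-open range [first, last) using only operator!=, operator++ and
// operator*, so it works for any iterator category, including forward_list,
// whose iterators do not support operator<.
template <class Iter>
int sum_range(Iter first, Iter last) {
    int total = 0;
    while (first != last) {  // no special case for the last element
        total += *first;
        ++first;
    }
    return total;
}
```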

3. Element operations of vector

The constructors of vector

vector has many constructors. Here is an excerpt from the source code:

// Constructors: the size and initial value of the vector can be specified
vector() : start(0), finish(0), end_of_storage(0) {}
vector(size_type n, const T& value) { fill_initialize(n, value); }
explicit vector(size_type n) { fill_initialize(n, T()); }

These correspond, respectively, to the following initializations:

vector<int> vec1;        // empty vector
vector<int> vec2(2, 3);  // two elements, both equal to 3
vector<int> vec3(2);     // two elements, both int() == 0
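These forms can be checked against std::vector. The sketch below also includes brace initialization, which did not exist when SGI STL was written: since C++11, vector<int>{2, 3} selects the initializer_list constructor and yields the elements 2 and 3, not two copies of 3. The function name constructor_forms is only for illustration.

```cpp
#include <cassert>
#include <vector>

// The three forms from the excerpt, plus the C++11 brace-initialization pitfall.
void constructor_forms() {
    std::vector<int> a;        // empty: size 0
    std::vector<int> b(2, 3);  // two elements, each equal to 3 -> {3, 3}
    std::vector<int> c(2);     // two value-initialized elements -> {0, 0}
    std::vector<int> d{2, 3};  // initializer_list constructor -> {2, 3}

    assert(a.empty());
    assert(b.size() == 2 && b[0] == 3 && b[1] == 3);
    assert(c.size() == 2 && c[0] == 0 && c[1] == 0);
    assert(d.size() == 2 && d[0] == 2 && d[1] == 3);
}
```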

push_back() and pop_back()

When push_back() adds a new element at the end of the vector, it first checks whether spare space is left. If so, it constructs the element directly in that space and advances the iterator finish. If not, it expands the storage (allocates a new block, moves the data over, and releases the old block).

void push_back(const T& x) {
  if (finish != end_of_storage) {   // spare space is available
    construct(finish, x);
    ++finish;
  }
  else                              // no spare space left
    insert_aux(end(), x);           // helper that inserts, growing if needed
}
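The push_back path can be sketched as follows. This is a simplified, hypothetical version, not SGI's code: it stores only int (so copying needs no special care), it only ever appends, and it folds the no-spare-space branch of insert_aux into a helper named grow_and_append. The growth rule, 1 for an empty vector and double otherwise, follows the SGI policy described earlier.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Hypothetical SGI-style push_back for int elements only.
class IntVector {
public:
    IntVector() = default;
    ~IntVector() { ::operator delete(start); }  // int needs no destructor calls
    IntVector(const IntVector&) = delete;
    IntVector& operator=(const IntVector&) = delete;

    void push_back(int x) {
        if (finish != end_of_storage) {          // spare space left
            ::new (static_cast<void*>(finish)) int(x);
            ++finish;
        } else {                                 // no spare space: grow
            grow_and_append(x);
        }
    }
    std::size_t size() const { return std::size_t(finish - start); }
    std::size_t capacity() const { return std::size_t(end_of_storage - start); }
    int operator[](std::size_t n) const { return start[n]; }

private:
    void grow_and_append(int x) {
        const std::size_t old_size = size();
        // SGI policy: 1 if empty, otherwise twice the old size.
        const std::size_t new_cap = old_size != 0 ? 2 * old_size : 1;
        int* new_start = static_cast<int*>(::operator new(new_cap * sizeof(int)));
        for (std::size_t i = 0; i < old_size; ++i)
            ::new (static_cast<void*>(new_start + i)) int(start[i]);  // copy old content
        ::new (static_cast<void*>(new_start + old_size)) int(x);       // append new element
        ::operator delete(start);                                      // release old block
        start = new_start;
        finish = new_start + old_size + 1;
        end_of_storage = new_start + new_cap;
    }

    int* start = nullptr;
    int* finish = nullptr;
    int* end_of_storage = nullptr;
};
```

Starting from empty, five push_backs trigger reallocations when the size is 0, 1, 2 and 4, leaving a capacity of 8.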

The insert function prototype is:

void insert_aux(iterator position, const T& x);

This function is fairly long, but the idea is simple. If spare space is available, it constructs a copy of the last element in the first free slot, increments finish, shifts the elements from position onward back by one slot, and writes x into position. If there is no spare space, it allocates a new block twice the current size (or of size 1 if the vector is empty), copies the elements before position into it, constructs x, copies the remaining elements, and then destroys the old elements and releases the old block.

Note: insert_aux inserts the element at position; the element originally at position and all elements after it shift back by one slot.
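The spare-space branch of insert_aux is the subtle part. Here is a sketch of the same three steps on a hypothetical fixed-capacity buffer of ints (FixedBuffer is not from the source; with int, "constructing" the new last slot is a plain assignment).

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical fixed-capacity buffer illustrating the spare-space branch of
// insert_aux: copy the last element into the first free slot, shift
// [pos, old_last) back by one with copy_backward, then write x at pos.
struct FixedBuffer {
    std::array<int, 8> slots{};
    std::size_t count = 0;

    void insert(std::size_t pos, int x) {
        assert(count < slots.size() && pos <= count);
        if (pos == count) {                          // inserting at end(): nothing to shift
            slots[count++] = x;
            return;
        }
        slots[count] = slots[count - 1];             // copy of the last element into the free slot
        std::copy_backward(slots.begin() + pos,      // shift [pos, count - 1) back by one
                           slots.begin() + (count - 1),
                           slots.begin() + count);
        slots[pos] = x;                              // fill the gap
        ++count;
    }
};
```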

pop_back(), which removes the last element of the vector, is simpler:

void pop_back() {
  --finish;
  destroy(finish);
}

It simply moves finish back by one and destroys the element there. Because finish points one past the last element, after decrementing it points exactly at the old last element. Note that destroy() only runs the element's destructor; the memory itself is not released, so the capacity stays the same.
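That pop_back destroys but does not free can be verified with std::vector and an element type that counts its destructor calls. Tracked and pop_back_keeps_capacity are hypothetical names used only for this check.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Counts destructor calls so we can see what pop_back actually does.
struct Tracked {
    static int destroyed;
    ~Tracked() { ++destroyed; }
};
int Tracked::destroyed = 0;

// After pop_back, exactly one element has been destroyed, but the memory is
// kept: size shrinks by one while capacity does not change.
bool pop_back_keeps_capacity() {
    std::vector<Tracked> v(3);
    const std::size_t cap = v.capacity();
    Tracked::destroyed = 0;
    v.pop_back();
    return Tracked::destroyed == 1 && v.size() == 2 && v.capacity() == cap;
}
```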

erase() and clear()
erase() removes either a single element of a vector or all elements in a range.

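// Erase the element at one position of the vector
iterator erase(iterator position) {
  if (position + 1 != end())
    copy(position + 1, finish, position);  // shift the following elements forward
  --finish;
  destroy(finish);
  return position;
}
// Erase the elements in a range of the vector
iterator erase(iterator first, iterator last) {
  iterator i = copy(last, finish, first);  // slide the tail down over [first, last)
  destroy(i, finish);                      // destroy the leftover copies at the end
  finish = finish - (last - first);
  return first;
}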
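The range version boils down to one std::copy that slides the tail down. The sketch below applies the same idea to a plain int array, where [first, last) is the range to erase and finish marks the end of the used part; erase_range is a hypothetical helper, and since int has no destructor, SGI's destroy(i, finish) step has nothing to do here. The second function checks, on std::vector, that erase returns an iterator to the element that followed the erased one.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical sketch of SGI's range erase on a plain int array.
// Copies [last, finish) down onto first and returns the new finish.
int* erase_range(int* first, int* last, int* finish) {
    int* new_finish = std::copy(last, finish, first);
    // For a class type, the objects in [new_finish, finish) would be destroyed here.
    return new_finish;
}

// With std::vector, erase returns an iterator to the element that followed
// the erased one, which now sits at the same position.
int value_at_returned_iterator() {
    std::vector<int> v{10, 20, 30};
    auto it = v.erase(v.begin());
    return *it;
}
```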

Used carelessly, erase() can leave you holding an invalid iterator.

Soul-searching question 3: When do iterators become invalid with erase()?

The following common pattern runs straight into an invalid iterator:

for (auto it = vec.begin(); it != vec.end(); ++it) {
    if (/* condition for erasing the element */) {
        vec.erase(it);
    }
}

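From question 1 above we know that erasing an element shifts the elements behind it forward, so the iterator to the erased element and every iterator after it are invalidated. In the loop above, it is invalidated by erase(), and the ++it that follows is undefined behavior. In practice, with a pointer-based iterator, the loop skips the element that moved into the erased slot, and if the last element was erased, ++it steps past end() and the loop runs off the end of the vector.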

How should it be fixed? As the erase() source above shows, erase() returns an iterator to the position of the erased element, which, after the data has been moved, holds the element that used to follow it. This returned iterator is valid, so we update our iterator with it.

The correct version is:

for (auto it = vec.begin(); it != vec.end(); ) {
    if (/* condition for erasing the element */) {
        it = vec.erase(it);  // update the iterator with the one erase() returns
    } else {
        ++it;
    }
}
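The corrected loop can be checked against the erase-remove idiom, which does the same job in one pass: std::remove_if moves the kept elements to the front, and a single range erase() trims the leftovers, avoiding the repeated shifting that erasing one element at a time causes. The function names and the "remove even numbers" condition are placeholders; in C++20, std::erase_if(vec, pred) wraps the same idiom.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// The corrected loop from the text: advance only when nothing was erased.
void remove_evens_loop(std::vector<int>& vec) {
    for (auto it = vec.begin(); it != vec.end();) {
        if (*it % 2 == 0) {
            it = vec.erase(it);  // erase returns the next valid iterator
        } else {
            ++it;
        }
    }
}

// Erase-remove idiom: compact the kept elements in one pass, then erase
// the tail once. Erasing one by one can cost O(n^2) element moves.
void remove_evens_idiom(std::vector<int>& vec) {
    vec.erase(std::remove_if(vec.begin(), vec.end(),
                             [](int x) { return x % 2 == 0; }),
              vec.end());
}
```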

clear() removes all elements of the vector:

void clear() { erase(begin(), end()); }
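Because clear() is just a range erase(), it destroys the elements but keeps the allocated block. A sketch with std::vector, assuming a mainstream implementation in which a default-constructed vector has capacity 0 (the standard does not spell that out); Capacities and clear_then_release are hypothetical names. Swapping with an empty temporary is the classic C++98 way to actually give the memory back; C++11's shrink_to_fit() is an alternative, but it is only a non-binding request.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Capacities {
    std::size_t before;
    std::size_t after_clear;
    std::size_t after_swap;
};

// clear() destroys all elements but keeps the memory block; swapping with an
// empty temporary hands the block to the temporary, which frees it on destruction.
Capacities clear_then_release(std::vector<int> v) {
    Capacities c{v.capacity(), 0, 0};
    v.clear();                   // size becomes 0, capacity is unchanged
    c.after_clear = v.capacity();
    std::vector<int>().swap(v);  // v takes the empty temporary's (empty) storage
    c.after_swap = v.capacity();
    return c;
}
```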

Final words

As Mr. Hou Jie said, "Before the source code, there are no secrets." After reading the source of the vector container, I understand its underlying mechanism much better. I used to feel lost in the fog; now the fog is slowly lifting, and I can see scenery that was hidden from me before.