The vector class in C++ [detailed analysis and simulation implementation]

vector class


1. Introduction and use of vector

1. Document introduction of vector

① vector is a sequence container that represents an array whose size can change.

② Just like an array, vector stores its elements in contiguous memory. This means elements can be accessed by subscript as efficiently as in an array. Unlike an array, its size can change dynamically, and the container manages its storage automatically.

③ Internally, vector uses a dynamically allocated array to store its elements. When new elements are inserted and the array is full, it must grow: the vector allocates a new, larger array and moves all elements into it. This is relatively expensive, so the vector does not reallocate every time an element is added.

④ Allocation strategy: vector may allocate extra storage to accommodate growth, so its capacity can be larger than the storage strictly needed for its current elements. Different libraries make different trade-offs between memory use and reallocation frequency. In any case, reallocation should happen only at logarithmically growing intervals of size, so that inserting an element at the end takes amortized constant time.

⑤ Therefore, vector may consume more memory than a plain array in exchange for the ability to manage its storage and grow dynamically in an efficient way.

⑥ Compared with the other dynamic sequence containers (deque, list and forward_list), vector is very efficient at accessing elements and relatively efficient at adding and removing elements at the end. Insertions and removals at other positions are less efficient than in the other containers, and vector's iterators and references are less stable than those of list and forward_list.

2. Constructing a vector

| Constructor declaration | Description |
| --- | --- |
| vector() (important) | Default construction, no arguments |
| vector(size_type n, const value_type& val = value_type()) | Construct with n copies of val |
| vector(const vector& x) (important) | Copy construction |
| vector(InputIterator first, InputIterator last) | Construct from the iterator range [first, last) |

Because vector is a class template, the element type is given explicitly as a template argument when declaring an object, for example:

vector<int> v1;

Since most of the interface names and usages of the vector class and the string class are very similar, we can quickly get started with the vector class:

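// The snippets in this article assume these headers and this using-directive
#include <iostream>
#include <vector>
using namespace std;

void test_vector1()
{
	vector<int> v;
	v.push_back(1);
	v.push_back(2);
	v.push_back(3);
	v.push_back(4);

	// Traverse with an index-based for loop
	for (size_t i = 0; i < v.size(); i++)
	{
		cout << v[i] << " ";
	}
	cout << endl;

	// Traverse with an iterator
	vector<int>::iterator it = v.begin();
	while (it != v.end())
	{
		cout << *it << " ";
		++it;
	}
	cout << endl;

	// Traverse with a range-based for loop
	for (auto e : v)
	{
		cout << e << " ";
	}
	cout << endl;
}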


Therefore, we focus below on the interfaces where vector differs from string or that are unique to vector.

3. Capacity growth

The following small program makes the capacity changes visible:

void TestVectorExpand()
{
	size_t sz;
	vector<int> v;
	sz = v.capacity();
	cout << "making v grow:\n";
	for (int i = 0; i < 100; ++i)
	{
		v.push_back(i);
		// Report whenever the capacity has changed
		if (sz != v.capacity())
		{
			sz = v.capacity();
			cout << "capacity changed: " << sz << '\n';
		}
	}
}

The growth factor is implementation-defined. With the standard library shipped with Visual Studio (MSVC) on Windows, the capacity typically grows by about 1.5 times; with GCC's libstdc++ on Linux, it typically doubles.

Doubling is a common compromise:

If each expansion is too small, reallocation happens frequently.

If each expansion is too large, much of the space is never used and is wasted.

Because repeated reallocation is expensive, we can call reserve to allocate the space in advance when the number of elements is known.

4. Does the interface provide const?

In summary:

1. Read-only interface functions provide only a const version, e.g. size().

2. Write-only interface functions provide only a non-const version, e.g. push_back().

3. Interface functions that can both read and write provide const and non-const versions, e.g. operator[].

For example, the read-only function size() and the read-write operator[] are declared as follows:

size_type size() const;

reference operator[] (size_type n);
const_reference operator[] (size_type n) const;

Like operator[], at() provides element access with both const and non-const overloads. The difference is in how out-of-range access is handled: at() checks the index and throws a std::out_of_range exception, while operator[] is not required to check at all (some implementations, such as debug builds with Visual Studio, report the error with an assertion).

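vector<int> v;
v.reserve(10);   // capacity is at least 10, but size() is still 0
v[0] = 1;        // out of range: undefined behavior (assertion failure in some debug builds)
v.at(0) = 1;     // out of range: throws std::out_of_range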

5. assign

① Assigning n copies of a value:

// Test case:
void test_vector()
{
	vector<int> v;
	v.push_back(1);
	v.push_back(2);
	v.push_back(3);
	v.push_back(4);

	for (auto e : v)
	{
		cout << e << " ";
	}
	cout << endl;
	// Replace the contents with ten 1s
	v.assign(10, 1);

	for (auto e : v)
	{
		cout << e << " ";
	}
	cout << endl;
}

This overload replaces the contents of the vector with n copies of val.

② A template overload accepts an iterator range:

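// Test case:
void test_vector()
{
	// Assign from an iterator range
	vector<int> src;
	src.push_back(10);
	src.push_back(20);
	src.push_back(30);

	vector<int> v;
	// The range must not point into v itself, so copy from another vector
	v.assign(src.begin(), src.end());
	for (auto e : v)
	{
		cout << e << " ";
	}
	cout << endl;

	string str("hello world");
	// Because assign is a template, iterators of another container type also work
	v.assign(str.begin(), str.end());
	// Each char is converted to int, so the character codes (ASCII) are printed
	for (auto e : v)
	{
		cout << e << " ";
	}
	cout << endl;
}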

Note that an iterator range is half-open, [first, last): end() refers to the position one past the last element, which is not part of the range.

6. insert

The insert interface of the string class works mainly with subscripts, while vector's insert takes an iterator that marks the position.

void test_vector()
{
	vector<int> v;
	v.push_back(1);
	v.push_back(1);
	v.push_back(1);

	// Insert at the front
	v.insert(v.begin(), 2);
	v.insert(v.begin(), 3);

	// Insert at the end
	v.insert(v.end(), 20);
	v.insert(v.end(), 30);

	// Insert in the middle (the iterator can be offset like a pointer)
	v.insert(v.begin() + 2, 10);

	for (auto e : v)
	{
		cout << e << " ";
	}
	cout << endl;
}

With the string class, we combined find and insert to insert at the position of a given value. The vector class has no find member. Instead, find is a function template in the algorithm library (header <algorithm>) that works on iterator ranges, so one implementation is reused by all containers. For example:

vector<int>::iterator it = find(v.begin(), v.end(), 1);
if (it != v.end())
{
	v.insert(it, 30);
}

Why doesn't the string class simply use find from the algorithm library?

Unlike other containers, string needs to search not only for a single element but also for substrings, and it reports the result as a subscript rather than an iterator. For these reasons the string class provides its own find interface.

7. shrink_to_fit

As the name suggests, this function reduces the capacity.

reserve never shrinks, and resize changes only size, not capacity (the design idea is that existing storage is not shrunk). The underlying reason lies in memory management: a block of memory cannot be released in part, only as a whole.

Therefore, shrinking can only be done by allocating a new, smaller block, copying the elements into it and releasing the old block.

For this reason shrink_to_fit is rarely used: shrinking by reallocation is costly.

Purpose: requests the container to reduce its capacity to fit its size. The request is non-binding, so the implementation may ignore it.

v.shrink_to_fit();
cout << v.size() << endl;
cout << v.capacity() << endl;


2. In-depth analysis and simulated implementation of vector

Looking at the source code of the vector class, we find a declaration along these lines:

template<class T>
class vector
{
public:
	typedef T value_type;
	typedef value_type* iterator;
	typedef const value_type* const_iterator;
	//...

protected:
	iterator start;
	iterator finish;
	iterator end_of_storage;
};

Based on our simulated implementation of the string class, we might instead have expected the vector class to be declared with members like these:

T* a;
size_t size;
size_t capacity;

The two layouts correspond as follows: start plays the role of a, finish equals a + size, and end_of_storage equals a + capacity. In other words, size is finish - start and capacity is end_of_storage - start.

For the vector class, we first build the basic framework:

// Class template: T stands for the element type
template<class T>
class vector
{
public:
	typedef T* iterator;
	typedef const T* const_iterator;

	// Iterators (non-const and const)
	iterator begin();
	iterator end();
	const_iterator begin() const;
	const_iterator end() const;

	// operator[] (non-const and const)
	T& operator[](size_t pos);
	const T& operator[](size_t pos) const;

	// Default constructor
	vector();

	// Iterator range constructor
	template <class InputIterator>
	vector(InputIterator first, InputIterator last);

	// Copy constructor: v2(v1)
	vector(const vector<T>& v);

	// Construct with n copies of val
	vector(size_t n, const T& val = T());

	// Assignment: v1 = v2 (v1 becomes a copy of v2)
	vector<T>& operator=(vector<T> v);

	// Destructor
	~vector();

	// Adjust the capacity
	void reserve(size_t n);

	// Adjust the number of elements
	void resize(size_t n, T val = T());

	// Number of elements
	size_t size() const;

	// Capacity
	size_t capacity() const;

	// Whether the vector is empty
	bool empty() const;

	// Insert at the end
	void push_back(const T& x);

	// Remove the last element
	void pop_back();

	// Erase the element at pos
	iterator erase(iterator pos);

	// Swap
	void swap(vector<T>& v);

private:
	iterator _start;
	iterator _finish;
	iterator _endofstorage;
};

The member variables of this simulated class have the following meanings:

iterator _start;: points to the first element of the vector

iterator _finish;: points to the position one past the last element of the vector

iterator _endofstorage;: points to the position one past the end of the allocated storage

In the above code, two typedef declarations are used:

typedef T* iterator;
typedef const T* const_iterator;

These two declarations define two iterator types for the vector: iterator and const_iterator. An iterator can be used to modify the elements it refers to, while a const_iterator can only read them. const_iterator is therefore the natural choice for read-only traversal and for const vectors, and iterator is used when elements need to be modified.

Because both iterator and const_iterator are pointer types here, pointer arithmetic can be used to move through the vector. For example, ++it advances an iterator to the next element, and the same holds for a const_iterator. These operations are the same as the iterator operations of other container classes, which keeps code that uses vector consistent and readable.

1. Iterator

Since this vector is built on a contiguous array, its iterator types are plain pointers. We provide begin() and end() for both non-const and const objects:

// Iterators (non-const and const)
iterator begin()
{
	return _start;
}
iterator end()
{
	return _finish;
}
const_iterator begin() const
{
	return _start;
}
const_iterator end() const
{
	return _finish;
}

2. operator[] overload

We can give the vector class an array-like [] operator for reading and writing elements. For const objects, the returned reference must be const as well:

// assert requires <cassert>
T& operator[](size_t pos)
{
	assert(pos < size());
	// _start acts like the array name, so it can be subscripted directly
	return _start[pos];
}

const T& operator[](size_t pos) const
{
	assert(pos < size());
	// _start acts like the array name, so it can be subscripted directly
	return _start[pos];
}

3. Constructor

3.1 Default construction

The default constructor only needs to set the three pointers to nullptr in the member initializer list:

vector()
	:_start(nullptr)
	,_finish(nullptr)
	,_endofstorage(nullptr)
{
}

3.2 Copy construction

Copy construction here means a deep copy, so it necessarily involves allocating new space:

vector(const vector<T>& v)
	:_start(nullptr)
	,_finish(nullptr)
	,_endofstorage(nullptr)
{
	reserve(v.capacity());
	// Insert the elements of the source object v into the newly allocated space
	for (const auto& e : v)
	{
		push_back(e);
	}
}

Note: reserve(v.capacity()); and push_back(e); are used here before they have been introduced in the simulated implementation. Their meaning is clear, and both interfaces are implemented later; they are used early only for convenience.

3.3 Iterator range construction

The vector class in the standard library can also be constructed from an iterator range.

Iterator range construction means that the constructor accepts a pair of iterators delimiting a range of elements. A template parameter is used for the iterator type:

template <class InputIterator>

The reason is that the constructor should accept any kind of iterator, so that a vector can be built from the elements of different containers. Different containers (such as vector, list and set) have different iterator types, and making the constructor a template lets it work with all of them, as long as the iterators can be compared, dereferenced and incremented:

template <class InputIterator>
vector(InputIterator first, InputIterator last)
	:_start(nullptr)
	, _finish(nullptr)
	, _endofstorage(nullptr)
{
	while (first != last)
	{
		push_back(*first);
		++first;
	}
}

With iterator range construction available, we can build a temporary object tmp from the range and swap with it. This gives the modern way of writing the copy constructor, just as we used a temporary object when simulating the string class.

For this we need a swap function that exchanges the member variables:

// Swap
void swap(vector<T>& v)
{
	std::swap(_start, v._start);
	std::swap(_finish, v._finish);
	std::swap(_endofstorage, v._endofstorage);
}
// Copy constructor: modern style
vector(const vector<T>& v)
	:_start(nullptr)
	,_finish(nullptr)
	,_endofstorage(nullptr)
{
	// Construct a temporary object, then swap with it
	// The temporary is built with the iterator range constructor
	vector<T> tmp(v.begin(), v.end());
	swap(tmp);
}

3.4 Construction with n copies of val

The vector class in the standard library can also be constructed with n copies of a value:

vector (size_type n, const value_type& val = value_type());

size_type n is the number of elements in the constructed vector; const value_type& val = value_type() is the initial value of every element. The default, value_type(), is a value-initialized object of type T (for a class type, the result of its default constructor). In our implementation we likewise supply a temporary (anonymous) object T() as the default argument:

vector(size_t n, const T& val = T())
	:_start(nullptr)
	, _finish(nullptr)
	, _endofstorage(nullptr)
{
	reserve(n);
	for (size_t i = 0; i < n; i++)
	{
		push_back(val);
	}
}

3.5 Destructors

~vector()
{
	delete[] _start;
	_start = _finish = _endofstorage = nullptr;
}

4. Assignment overloading

From the simulated string class we know that assignment can be implemented by copy-constructing a temporary object tmp and swapping it with the object being assigned to. We can go one step further and take the parameter by value: the copy is then made when the argument is passed, and we do not even need to create tmp ourselves:

vector<T>& operator=(vector<T> v)
{
	// v is already a copy of the right-hand side (made by the copy constructor); just swap with it
	swap(v);
	return *this;
}

5. Capacity adjustment

5.1 reserve

reserve only expands the capacity and never shrinks it. Expansion is done by allocating a new block elsewhere, so it always involves allocating new space:

void reserve(size_t n)
{
	if (n > capacity())
	{
		// Record the element count before _start changes
		size_t oldSize = size();
		T* tmp = new T[n];
		if (_start != nullptr)
		{
			// Copy element by element so that T's assignment performs a deep copy
			for (size_t i = 0; i < oldSize; i++)
			{
				tmp[i] = _start[i];
			}
			delete[] _start;
		}
		_start = tmp;
		_finish = tmp + oldSize;
		_endofstorage = _start + n;
	}
}

Here oldSize records the number of elements before the storage is replaced. When the capacity has to grow to n, the function allocates a new array tmp of n elements and copies the existing elements into it.

The saved value is needed because size() is computed as _finish - _start. Once _start points to the new array, _finish still points into the old one, so calling size() at that moment would give a meaningless result. Setting _finish = tmp + oldSize keeps the element count unchanged across the reallocation.

5.2 resize

resize has three cases, which were covered in detail in the simulated string class: n is larger than the capacity (expand, then fill), n is between size and capacity (fill only), and n is not larger than size (drop the trailing elements). We follow the same idea:

void resize(size_t n, T val = T())
{
	// Expand if the new size exceeds the capacity
	if (n > capacity())
	{
		reserve(n);
	}
	// Append new elements
	if (n > size())
	{
		while (_finish < _start + n)
		{
			*_finish = val;
			_finish++;
		}
	}
	// Drop trailing elements
	else
	{
		_finish = _start + n;
	}
}

In resize, the parameter val of type T specifies the value given to newly added elements. Its default argument is T(), so when the caller does not pass val, new elements are initialized from a value-initialized T (the default constructor for class types).

Why use an anonymous object as the default?

① It guarantees a correct default value

T() produces a value-initialized temporary of type T: for a class type this is the default-constructed object, and for a built-in type such as int it is zero. Writing the default argument as T() therefore works for every element type and guarantees that, when the caller omits val, new elements receive the same value that T's default construction would produce.

② It avoids constructing objects that are not needed

A default argument is evaluated only in calls that omit the argument. When the caller passes its own val, no T() temporary is created, and there is no need for a named default object that exists solely to supply the value.

In short, a T() default argument gives new elements a correct default value for any T and is constructed only when it is actually needed.

6. Size and capacity

Since the member variables are pointers rather than counts, size and capacity are computed from pointer differences:

// Number of elements
size_t size() const
{
	return _finish - _start;
}
// Capacity
size_t capacity() const
{
	return _endofstorage - _start;
}

7. empty and clear

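// Whether the vector is empty
bool empty() const
{
	return _finish == _start;
}
// Remove all elements
void clear()
{
	// The storage itself is kept
	// _start must not be set to nullptr here: the block could no longer be found, which would leak it
	_finish = _start;
}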

8. Insertion, deletion, lookup and modification

8.1 push_back (insertion at the end)

The only thing to check when inserting is whether the storage must grow:

void push_back(const T& x)
{
	if (_finish == _endofstorage)
	{
		size_t newCapacity = capacity() == 0 ? 4 : capacity() * 2;
		reserve(newCapacity);
	}
	*_finish = x;
	_finish++;
}

8.2 pop_back (deletion at the end)

The only thing to check is that the vector is not empty:

void pop_back()
{
	assert(!empty());
	_finish--;
}

8.3 Insert at a specified position

Inserting at a specified position covers insertion at the front, in the middle and at the end. ① First, we check that the insertion position is valid. ② Second, we handle capacity growth.

Note that growth allocates a new block elsewhere, and pos is given as an iterator (a pointer). After reallocation, pos still points into the old block, so it must be recomputed relative to the new _start.

This is the key issue of this class: iterator invalidation, which is discussed in detail later.

void insert(iterator pos, const T& val)
{
	// Check that the position is valid; pos == _finish means insertion at the end
	assert(pos >= _start);
	assert(pos <= _finish);

	// Grow if the storage is full
	if (_finish == _endofstorage)
	{
		size_t len = pos - _start;
		size_t newCapacity = capacity() == 0 ? 4 : capacity() * 2;
		reserve(newCapacity);
		// Recompute pos relative to the new storage
		pos = _start + len;
	}

	// Shift the elements in [pos, _finish) one position to the right
	iterator end = _finish;
	while (end > pos)
	{
		*end = *(end - 1);
		end--;
	}
	*pos = val;
	_finish++;
}

8.4 Erase at a specified position

In the same way, we first check that the position is valid, then shift the following elements forward over the erased one.

erase also raises the issue of iterator invalidation. When an element is erased, the elements after it move, so an iterator that pointed at the erased position should no longer be relied on. To deal with this, vector's erase returns an iterator to the element that followed the erased one. Use this return value to update your iterator and keep it valid:

// Updating the iterator:
// it points to the element with subscript i
it = v.begin() + i;

// Replace it with the iterator returned by erase
it = v.erase(it);

// Now it points to the element that followed the erased one

The simulated erase is implemented as follows:

iterator erase(iterator pos)
{
	assert(pos >= _start);
	assert(pos < _finish);
	// Shift the following elements one position to the left
	iterator begin = pos;
	while (begin < _finish - 1)
	{
		*(begin) = *(begin + 1);
		begin++;
	}
	_finish--;
	return pos;
}

To erase the element with subscript i:

vector<int>::iterator it = v.begin() + i;
v.erase(it);

3. Iterator invalidation problem

In vector, insertions and deletions can invalidate iterators, because elements may be shifted within the storage or the storage itself may be reallocated, which changes where the elements live in memory.

The following operations may invalidate iterators:

  1. Inserting elements: if the vector is full when an element is inserted, new storage is allocated and the existing elements are copied into it. Iterators obtained before the insertion then point into the old storage and are invalid.
  2. Erasing elements: when an element is erased, all elements after it are shifted forward to fill the gap. Iterators at or after the erased position no longer refer to the elements they referred to before.

Iterator invalidation can be handled with the facilities vector itself provides:

  1. Use subscripts instead of iterators: an index stays meaningful after a reallocation, whereas an iterator obtained before it does not.
  2. If the final size is known, call reserve() first. As long as the size does not exceed the reserved capacity, insertions at the end do not reallocate.
  3. Use the iterator returned by erase() to update the original iterator.

In short, whenever elements are added to or removed from a vector, assume that existing iterators may have become invalid, and refresh them using the techniques above.

Origin blog.csdn.net/kevvviinn/article/details/129779141