[STL] Simulation implementation of vector

Table of contents

foreword

Structure analysis

constructor destructor

structure

default construction

initialized to n vals 

constructed from an iterator range

copy construction

destruct

operator overloading

assignment overloading

subscript access

iterator

const iterator

capacity operation

View size and capacity

capacity modification

data modification

Tail insertion and tail deletion

insert and delete

insert

erase

empty

exchange

source code


foreword

Starting with vector, class templates are needed for generic programming, so that the container can store elements of any type.

Like string, vector is a container backed by contiguous memory, so most operations are implemented in a similar way. The main difficulties are the use of templates and iterator invalidation.

If you are not yet familiar with vector, you may want to read the previous article, [STL] The use of vector, before studying this simulated implementation.

Structure analysis

The data structure behind vector is a linear, contiguous block of memory. To manage it, we use an iterator _start that points to the first element of the block, plus two more iterators: _finish, which points one past the last element currently in use, and _end_of_storage, which points one past the end of the allocated block.

In other words, when _finish equals _end_of_storage the block is full, and inserting another element requires expanding the capacity first.

namespace Alpaca
{
	template <class T>
	class vector
	{
	public:
		typedef T* iterator;
		typedef const T* const_iterator;
	private:
		iterator _start = nullptr;           // start of the storage
		iterator _finish = nullptr;          // one past the last element in use
		iterator _end_of_storage = nullptr;  // one past the end of the allocated storage
	};
}
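
To make the three-pointer layout concrete, here is a small sketch that is separate from the Alpaca::vector code. The Layout struct and the make_layout helper are made-up names for illustration only; the sketch just shows how the size, the capacity, and the "full" condition fall out of pointer arithmetic.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the three members of Alpaca::vector<int>.
struct Layout
{
	int* start;           // first element
	int* finish;          // one past the last element in use
	int* end_of_storage;  // one past the end of the allocated block

	std::size_t size() const     { return finish - start; }
	std::size_t capacity() const { return end_of_storage - start; }
	bool full() const            { return finish == end_of_storage; }
};

// Builds a layout over an existing buffer of cap ints, of which used are in use.
inline Layout make_layout(int* buf, std::size_t used, std::size_t cap)
{
	assert(used <= cap);
	return Layout{ buf, buf + used, buf + cap };
}
```

With a buffer of 8 ints and 3 in use, size() is 3, capacity() is 8, and full() only becomes true once all 8 slots are used.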

constructor destructor

structure

The constructors of vector covered first fall into three kinds:

  • default construction
  • initialization with n copies of val
  • construction from an iterator range

default construction

Since the three member variables already have default values where they are declared, the body of the default constructor can be left empty.

vector()
{}

initialized to n vals 

To initialize with n values, we need to allocate memory for the vector and then add the values to it one by one.

A vector can store not only int but also string, double, or even another vector, so we cannot know in advance what value to use as the default for val. Instead, we use a temporary (anonymous) object, T(), which calls the default constructor of the element type, as the default argument. The reserve and push_back functions used here are explained below.

vector(size_t n, const T& val = T())
{
	reserve(n);
	for (size_t i = 0; i < n; i++)
	{
		push_back(val);
	}
}

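But wasn't the lifetime of an anonymous object supposed to end with the line that creates it? Here the parameter is only a reference, yet it can still be used throughout the function body.

This works because a temporary lives until the end of the full expression that creates it, and for a default argument that expression is the whole function call. A related rule applies when a temporary is bound directly to a local const reference: its lifetime is then extended to match that of the reference. We can verify the second rule with the following code.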

#include <iostream>
using namespace std;

class A
{
public:
	A()
	{
		cout << "A()" << endl;   // printing A() means the constructor ran
	}
	~A()
	{
		cout << "~A()" << endl;  // printing ~A() means the destructor ran
	}
};

int main()
{
	A();
	const A& a3 = A();
}

 

The output shows clearly that the first anonymous object is destroyed as soon as its line ends, while the second one is not destroyed until main returns.
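
The default-argument case can be checked in a similar way. The sketch below is only an illustration: Tracker and alive_inside_call are hypothetical names, and a live-instance counter is used in place of printing so that the result can be checked in code.

```cpp
#include <cassert>

// Hypothetical helper type: counts how many instances are currently alive.
struct Tracker
{
	static int alive;
	Tracker()               { ++alive; }
	Tracker(const Tracker&) { ++alive; }
	~Tracker()              { --alive; }
};
int Tracker::alive = 0;

// The default argument is a temporary created by the caller. It stays alive
// until the full expression containing the call has finished, so it is still
// valid everywhere inside the function body.
inline int alive_inside_call(const Tracker& t = Tracker())
{
	(void)t;
	return Tracker::alive;
}
```

Calling alive_inside_call() with no argument returns 1, because the temporary still exists inside the body, and the counter is back to 0 once the calling statement has finished.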

constructed from an iterator range

By passing in an iterator range, the vector is initialized with the elements in that range. Note that the parameter type should not be vector's own iterator; instead, the constructor is written as a function template, so that a range of any iterator type can be used for initialization.

template <class InputIterator>
vector(InputIterator first, InputIterator last)
{
	reserve(last - first);   // allocate space up front (last - first assumes random-access iterators)
	while (first != last)
	{
		push_back(*first);
		++first;
	}
}

Once this function is written, however, a call such as vector<int> v(10, 1), which is meant to use the n-values constructor above, fails to compile.

The reason is that n in that constructor has type size_t, while we usually pass an int literal, so a conversion from int to size_t is required. The new function template was not written for this call, but with InputIterator deduced as int it accepts both arguments without any conversion.

Overload resolution therefore picks the template as the better match, the int arguments end up being dereferenced, and the compiler reports an illegal indirection error.

The fix is simple: add an overload whose n is an int.

vector(int n, const T& val = T())
{
	reserve(n);
	for (int i = 0; i < n; i++)
	{
		push_back(val);
	}
}
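
The overload-resolution behavior can be reproduced in isolation. The sketch below uses hypothetical free functions named pick in place of the constructors; each one returns a tag so that we can see which overload the compiler selects before and after the int overload is added.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical free functions that mimic the constructor signatures.
// Each returns a tag so we can see which overload was selected.
namespace before_fix
{
	inline int pick(std::size_t, const int&) { return 1; }   // (n, val)

	template <class InputIterator>
	int pick(InputIterator, InputIterator) { return 2; }     // (first, last)
}

namespace after_fix
{
	inline int pick(std::size_t, const int&) { return 1; }   // (n, val)
	inline int pick(int, const int&) { return 3; }           // the added int overload

	template <class InputIterator>
	int pick(InputIterator, InputIterator) { return 2; }     // (first, last)
}
```

With two int arguments, before_fix::pick selects the template, because it needs no conversion. In after_fix the int overload matches equally well, and a non-template function is preferred over a template when both match exactly.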

copy construction

Passing another vector of the same type as the argument is copy construction. We only need to allocate the space first and then copy the elements over.

vector(const vector<T>& v)
{
	_start = new T[v.capacity()];    // allocate the space
	for (size_t i = 0; i < v.size(); i++)  // copy element by element
	{
		_start[i] = v._start[i];
	}
	_finish = _start + v.size();    // update the boundaries
	_end_of_storage = _start + v.capacity();
}

Do not use memcpy for the copy here. It works for built-in types, but as soon as the element type manages its own resources (string, for example), memcpy produces a shallow copy, the same resource is released twice when both objects are destroyed, and the program crashes. The elements must therefore be deep-copied by assignment.
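
As a small illustration of copying by assignment, the sketch below copies an array of std::string element by element. The deep_copy helper is a hypothetical name and is not part of the vector class; it only shows that each copied element ends up independent of its source.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: copies n strings into a freshly allocated array by
// assignment, so every element gets its own character buffer (deep copy).
// The caller owns the result and must release it with delete[].
inline std::string* deep_copy(const std::string* src, std::size_t n)
{
	std::string* dst = new std::string[n];
	for (std::size_t i = 0; i < n; ++i)
	{
		dst[i] = src[i];   // std::string::operator= duplicates the characters
	}
	return dst;
}
```

Modifying an element of the copy leaves the source untouched, and both arrays can be destroyed independently, which is exactly what a byte-wise memcpy of the string objects would not guarantee.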

destruct

The destructor cleans up: it first releases the allocated space and then sets all member variables to nullptr.

~vector()
{
	delete[] _start;
	_start = _finish = _end_of_storage = nullptr;
}

operator overloading

assignment overloading

This is similar to copy construction above, except that the vector already owns storage, so we first need to check whether the current capacity is sufficient.

Then we copy the data and update the end of the data to complete the assignment.

vector& operator =(const vector<T>& v)
{
	if (capacity() < v.capacity())  // expand first if the current capacity is too small
	{
		reserve(v.capacity());
	}
	for (size_t i = 0; i < v.size(); i++)  // copy the data
	{
		_start[i] = v._start[i];
	}
	_finish = _start + v.size();  // update the end of the data; reserve already maintains _end_of_storage
	return *this;
}

subscript access

To support const objects, whose elements must be readable but not writable through the subscript, we need two overloads of operator[].

After the bounds check, a reference to the element is returned. Const and non-const vectors call different overloads.

T& operator [](size_t pos)
{
	assert(pos < size());  // pos is unsigned, so only the upper bound needs checking
	return _start[pos];
}

const T& operator[](size_t pos) const
{
	assert(pos < size());
	return _start[pos];
}
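
The way the two overloads are selected can be seen in a reduced example. IntBox and read_first below are hypothetical names used only for this sketch; the fixed size of 3 is an arbitrary assumption.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical fixed-size wrapper, used only to show overload selection.
class IntBox
{
public:
	int& operator[](std::size_t pos)              // chosen for non-const objects
	{
		assert(pos < 3);
		return _data[pos];
	}
	const int& operator[](std::size_t pos) const  // chosen for const objects
	{
		assert(pos < 3);
		return _data[pos];
	}
private:
	int _data[3] = { 1, 2, 3 };
};

// Reads through a const reference, so only the const overload is usable here.
inline int read_first(const IntBox& box)
{
	return box[0];
}
```

Writing box[0] = 42 on a non-const IntBox compiles and uses the first overload, while the same assignment through a const IntBox& would be rejected by the compiler, because the second overload returns a const reference.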

iterator

vector maintains a contiguous linear space, so regardless of the element type, a raw pointer satisfies all the requirements of a vector iterator.

Operations such as ++, --, and * are built into pointers, so we use raw pointers as vector iterators.

typedef T* iterator;
typedef const T* const_iterator;

iterator begin()
{
	return _start;
}

iterator end() 
{
	return _finish;
}

As always, the range is left-closed and right-open: [begin, end). begin and end correspond exactly to the positions of _start and _finish, so those iterators can be returned directly.

const iterator

We defined not only an ordinary iterator but also a const iterator. To make it usable, we need const-qualified versions of begin and end that return a const_iterator to the user.

const_iterator begin() const
{
	return _start;
}

const_iterator end() const
{
	return _finish;
}

Now that begin and end are available, the range-based for loop works with our vector.
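
A range-based for loop only needs begin() and end(). The sketch below shows this with a hypothetical read-only IntView class that exposes raw pointers as iterators, in the same way as the vector in this article; it is an illustration, not part of the vector implementation.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical read-only view over an int array. It exposes begin() and
// end() as raw pointers, the same way the vector above does.
class IntView
{
public:
	typedef const int* const_iterator;

	IntView(const int* data, std::size_t n) : _start(data), _finish(data + n) {}

	const_iterator begin() const { return _start; }
	const_iterator end() const   { return _finish; }
private:
	const int* _start;
	const int* _finish;
};

// Range-based for expands to a loop from begin() to end().
inline int sum(const IntView& v)
{
	int total = 0;
	for (int x : v)
	{
		total += x;
	}
	return total;
}
```

Because sum takes a const reference, the loop uses the const versions of begin and end, which is why a container needs both pairs.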

capacity operation

Memory management is crucial to a container, so attention needs to be paid to implementation details.

View size and capacity

Since _finish and _end_of_storage each point one past the end of their range, together with _start they delimit left-closed, right-open ranges. Subtracting _start from _finish gives the number of elements, and subtracting _start from _end_of_storage gives the capacity.

size_t size() const
{
	return _finish - _start;
}
size_t capacity() const
{
	return _end_of_storage - _start;
}

capacity modification

Earlier, every time we needed more room we called reserve, precisely because it only expands the capacity without changing the number of elements.

When implementing it, note that:

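  • The request is only processed if n is greater than the current capacity
  • The new block is allocated separately before the old one is released, so the data is not lost if allocation fails
  • The elements are deep-copied into the new block

void reserve(size_t n)
{
	if (n > capacity())  // only grow; requests that fit in the current capacity are ignored
	{
		iterator tmp = new T[n];  // allocate the new block first, so the old data survives if allocation fails
		size_t len = size();
		if (_start)
		{
			for (size_t i = 0; i < len; i++)  // deep copy element by element
			{
				tmp[i] = _start[i];
			}
			delete[] _start;
		}
		_start = tmp;         // update the member variables
		_finish = tmp + len;
		_end_of_storage = tmp + n;
	}
}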
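
The three points listed above can be exercised with a reduced, non-template buffer. StrBuf below is a hypothetical class written only for this sketch; it stores std::string so that the element-wise copy actually matters, and copying of the buffer itself is disabled to keep the example short.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical minimal buffer of std::string, just enough to exercise reserve.
class StrBuf
{
public:
	StrBuf() : _start(nullptr), _finish(nullptr), _end_of_storage(nullptr) {}
	~StrBuf() { delete[] _start; }
	StrBuf(const StrBuf&) = delete;             // copying is out of scope here
	StrBuf& operator=(const StrBuf&) = delete;

	std::size_t size() const     { return _finish - _start; }
	std::size_t capacity() const { return _end_of_storage - _start; }
	const std::string& operator[](std::size_t i) const { return _start[i]; }

	void reserve(std::size_t n)
	{
		if (n <= capacity()) return;            // never shrink
		std::string* tmp = new std::string[n];  // allocate the new block first
		std::size_t len = size();
		for (std::size_t i = 0; i < len; ++i)
		{
			tmp[i] = _start[i];                 // element-wise (deep) copy
		}
		delete[] _start;                        // deleting nullptr is a no-op
		_start = tmp;
		_finish = tmp + len;
		_end_of_storage = tmp + n;
	}

	void push_back(const std::string& s)
	{
		if (_finish == _end_of_storage) reserve(capacity() == 0 ? 4 : capacity() * 2);
		*_finish++ = s;
	}
private:
	std::string* _start;
	std::string* _finish;
	std::string* _end_of_storage;
};
```

After reserve(10) on a buffer holding two strings, the size is still 2, the capacity is 10, and both strings are intact; a later reserve(3) changes nothing.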

Next, let's look at resize. There are three cases depending on n:

  • n is less than the number of elements
  • n is greater than the number of elements but not greater than the current capacity
  • n is greater than the number of elements and greater than the current capacity

In the first case we only need to move _finish. In the second and third cases we need to check whether the capacity must be expanded and then fill the new slots with val. The second and third cases can therefore share one branch: the filling is identical, and only the capacity check differs.

void resize(size_t n, T val = T())
{
	if (n < size())     // n is less than the number of elements: shrink to n elements
	{
		_finish = _start + n;
	}
	else
	{
		if (n > capacity())  // n is greater than the current capacity
		{
			reserve(n);      // expand the capacity to n
		}
		while (_finish != _start + n)   // fill the new slots with val
		{
			*_finish = val;
			++_finish;
		}
	}
}
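
The three cases can be checked against std::vector, whose resize behaves the same way from the caller's point of view. In the sketch below, the ResizeCase enum and the classify and demo functions are hypothetical names introduced for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// The three cases, named for illustration only (the enum is hypothetical).
enum ResizeCase { Shrink, FillInPlace, GrowThenFill };

// Decides which branch a resize(n) call takes for a given size and capacity.
inline ResizeCase classify(std::size_t size, std::size_t capacity, std::size_t n)
{
	if (n < size) return Shrink;
	if (n <= capacity) return FillInPlace;
	return GrowThenFill;
}

// Runs the same three cases against std::vector and returns the final contents.
inline std::vector<int> demo()
{
	std::vector<int> v;
	v.reserve(8);
	v.resize(5, 1);   // FillInPlace:  1 1 1 1 1
	v.resize(2);      // Shrink:       1 1
	v.resize(4, 7);   // FillInPlace:  1 1 7 7
	v.resize(10, 9);  // GrowThenFill: the capacity has to grow past 8
	return v;
}
```

Only the last call has to reallocate; the first three work entirely inside the 8 slots reserved at the start.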

data modification

Tail insertion and tail deletion

We have written tail insertion many times before. First check whether there is room for one more element; if there is, carry on, otherwise expand the capacity first.

_finish points to the slot where the next element goes, so we assign to it directly and then advance _finish.

void push_back(const T& x)
{
	if (_finish >= _end_of_storage)  // check the capacity
	{
		reserve(capacity() == 0 ? 4 : capacity() * 2);
	}
	*_finish = x;   // assign
	_finish++;      // advance the end of the data
}

void pop_back()
{
	assert(!empty());
	--_finish;
}

Tail deletion is even simpler. First check that the vector is not empty, since otherwise there is nothing to delete. If it is not empty, decrementing _finish is all it takes.

The value left in that slot is simply overwritten the next time an element is inserted.
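
The growth rule in push_back (start at 4, then double) means that reallocations become rare as the vector grows. The sketch below simulates only the bookkeeping, without allocating anything; next_capacity and reallocations_for are hypothetical helper names.

```cpp
#include <cassert>
#include <cstddef>

// The growth rule used by push_back above: start at 4, then double.
inline std::size_t next_capacity(std::size_t cap)
{
	return cap == 0 ? 4 : cap * 2;
}

// Hypothetical helper: counts how many reallocations n push_back calls
// trigger on an initially empty vector that follows this rule.
inline std::size_t reallocations_for(std::size_t n)
{
	std::size_t cap = 0, size = 0, count = 0;
	for (std::size_t i = 0; i < n; ++i)
	{
		if (size == cap)   // full: the next push_back has to expand first
		{
			cap = next_capacity(cap);
			++count;
		}
		++size;
	}
	return count;
}
```

For 100 elements the capacity passes through 4, 8, 16, 32, 64 and 128, so only 6 of the 100 push_back calls have to reallocate.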

insert and delete

insert

insert has several overloads; two of them are implemented here.

insert a value

Although insert accepts any position, we still need to check that pos is within bounds. pos may be equal to _finish, in which case the call is equivalent to a tail insertion.

Talking about insert means talking about iterator invalidation. As mentioned in the previous article on using vector, when an operation triggers expansion the storage is reallocated elsewhere, so the space that existing iterators point to has been released and they become dangling (wild) pointers.

To solve this, we record the offset of pos from _start before expanding, and recompute pos as _start plus that offset afterwards.

iterator insert(iterator pos, const T& val)
{
	assert(pos >= _start);    // bounds check
	assert(pos <= _finish);
	if (size() == capacity())  // check the capacity
	{
		size_t sz = pos - _start;    // remember the offset: expansion invalidates pos
		reserve(capacity() == 0 ? 4 : capacity() * 2);
		pos = _start + sz;
	}
	iterator it = _finish;  // shift [pos, _finish) one slot to the right
	while (it > pos)
	{
		*it = *(it - 1);
		it--;
	}
	*pos = val;   // assign
	_finish++;    // advance the end of the data
	return pos;   // return an iterator to the inserted element
}

After that, as before, shift the elements, assign at the insertion point, and advance _finish. Don't forget to return an iterator to the inserted element; otherwise the caller is left with an iterator that may have been invalidated.
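
From the caller's side, the same two ideas look as follows. This sketch uses std::vector rather than the vector written in this article, and insert_before_each and reserve_keep_pos are hypothetical helper names: the first refreshes the iterator from insert's return value, the second repeats the offset trick around a reallocation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Inserts val before every element equal to target. The iterator is always
// refreshed from insert's return value, so it stays valid even when the
// insertion reallocates the storage.
inline void insert_before_each(std::vector<int>& v, int target, int val)
{
	std::vector<int>::iterator it = v.begin();
	while (it != v.end())
	{
		if (*it == target)
		{
			it = v.insert(it, val);  // it now points at the inserted val
			++it;                    // step onto the original target again
		}
		++it;                        // move past the current element
	}
}

// The same offset trick that insert uses internally: remember the offset,
// reallocate, then rebuild the iterator from the new begin().
inline std::vector<int>::iterator reserve_keep_pos(std::vector<int>& v,
                                                   std::vector<int>::iterator pos,
                                                   std::size_t new_cap)
{
	std::size_t offset = pos - v.begin();
	v.reserve(new_cap);              // may move the elements elsewhere
	return v.begin() + offset;
}
```

Continuing to use the old iterator after v.insert or v.reserve, instead of the refreshed one, would be undefined behavior whenever the call reallocated.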

Insert an iterator range

Like the constructor that takes an iterator range, this overload has to be a function template, because we cannot know in advance what kind of iterator will be passed in.

We can then reuse the single-element insert above to insert the elements of the range one by one.

template <class InputIterator>
void insert(iterator pos, InputIterator first, InputIterator last)
{
	for (; first != last; ++first) 
	{
		pos = insert(pos, *first);
		++pos;
	}
}

One more remark: insert usually has to move existing elements, so frequent use is not recommended.

erase

Two versions of erase are implemented here: one deletes a single position and the other deletes a range.

The first checks the bounds, shifts the following elements forward, and updates _finish. After the shift, the element that followed the erased one sits exactly at pos, so pos is returned.

iterator erase(iterator pos)
{
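	assert(pos >= _start);  // bounds check
	assert(pos < _finish);

	iterator it = pos + 1;  // shift the following elements forward
	while (it != _finish)
	{
		*(it - 1) = *it;
		it++;
	}
	_finish--;   // update the end of the data
	return pos;  // return an iterator to the element that followed the erased one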
}

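iterator erase(iterator first, iterator last)
{
	assert(first >= begin());  // bounds check
	assert(first <= last);
	assert(last <= end());
	iterator ret = first;     // the element that follows the erased range ends up here
	while (last != _finish)   // elements remain after last, so copy them forward
	{
		*first = *last;
		++first;
		++last;
	}
	_finish = first;  // first now sits at the new end of the data
	return ret;
}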

The range version also starts with a bounds check. If any elements remain after last, they need to be copied forward.

A loop handles this. As long as last is not equal to _finish, there are still elements at or after last; otherwise there is nothing to move. Since the erased range is left-closed and right-open, the element at last itself must be kept. So we copy the value at last to first and then advance both. The position that first finally reaches is the new _finish.
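
The returned iterator is what makes erasing inside a loop safe. The sketch below shows the usual pattern with std::vector (not the vector written in this article); remove_even and erase_range are hypothetical helper names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Removes every even number. erase returns the iterator to the element that
// slid into the erased slot, so the loop only advances when nothing was erased.
inline void remove_even(std::vector<int>& v)
{
	std::vector<int>::iterator it = v.begin();
	while (it != v.end())
	{
		if (*it % 2 == 0)
		{
			it = v.erase(it);
		}
		else
		{
			++it;
		}
	}
}

// Erases the half-open range [from, to), given as indexes, and returns the
// index that the iterator returned by erase points to.
inline std::size_t erase_range(std::vector<int>& v, std::size_t from, std::size_t to)
{
	std::vector<int>::iterator next = v.erase(v.begin() + from, v.begin() + to);
	return next - v.begin();
}
```

For the range version, the returned iterator points at the first element after the removed range, which now sits where the range used to begin.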

empty

These two functions are clear and empty. clear only needs to move _finish back to _start.

empty, in turn, only needs to check whether _start equals _finish.

void clear()
{
	_finish = _start;
}

bool empty() const
{
	return _start == _finish;
}

exchange

As with string earlier, swap should neither be a naive shallow copy nor construct and assign redundant temporaries; it only needs to exchange the member variables.

void swap(vector<T>& v)
{
	std::swap(_start, v._start);
	std::swap(_finish, v._finish);
	std::swap(_end_of_storage, v._end_of_storage);
}

Then we can use this swap function to simplify our assignment overload.

vector& operator =(vector<T> v)
{
	swap(v);
	return *this;
}

In essence, since the parameter is passed by value rather than by reference, the copy constructor is called to build the parameter, and we then exchange its pointers with those of the current vector. When the function returns, the parameter is destroyed and the original storage is released automatically.
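
The same copy-and-swap pattern works for any class that owns a buffer. The sketch below applies it to a hypothetical IntArray class, reduced to the members the pattern needs; it is an illustration and not part of the vector code.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

// Hypothetical owning int array, reduced to the parts copy-and-swap needs.
class IntArray
{
public:
	explicit IntArray(std::size_t n, int val = 0) : _data(new int[n]), _size(n)
	{
		for (std::size_t i = 0; i < n; ++i) _data[i] = val;
	}
	IntArray(const IntArray& other) : _data(new int[other._size]), _size(other._size)
	{
		for (std::size_t i = 0; i < _size; ++i) _data[i] = other._data[i];
	}
	~IntArray() { delete[] _data; }

	void swap(IntArray& other)
	{
		std::swap(_data, other._data);
		std::swap(_size, other._size);
	}

	// other is a copy made by the caller; after the swap it holds our old
	// buffer and releases it when the function returns.
	IntArray& operator=(IntArray other)
	{
		swap(other);
		return *this;
	}

	std::size_t size() const { return _size; }
	int& operator[](std::size_t i) { return _data[i]; }
private:
	int* _data;
	std::size_t _size;
};
```

A side benefit of this design is that self-assignment needs no special check: the parameter is an independent copy, so swapping with it is always safe.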

source code

If you want to see the source code, you can go here

source code


Well, that is all for the simulated implementation of vector. If this article was useful to you, please like, favorite, and comment.

Origin blog.csdn.net/Lin_Alpaca/article/details/130674840