A simulated implementation of the C++ STL vector

Article directory

Preface
Member variables
Member functions
- Constructor
- push_back
- pop_back
- insert
- erase
- destructor
- copy construction

Preface

Member variables

namespace but
{
	template<class T>
	class vector
	{
	public:
		typedef T* iterator;
	private:
		iterator _start;
		iterator _finish;
		iterator _end_of_storage;
	};
}

When we implemented the sequence table earlier, we maintained it with a pointer to the array plus an element count and a capacity. The three-pointer layout used here looks different, but it records the same information: _start points to the first element, _finish points one past the last element, and _end_of_storage points one past the end of the allocated space.
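
To make the correspondence concrete, here is a minimal sketch showing how the count and the capacity fall out of the three pointers. ThreePointers and make_view are hypothetical names used only for this illustration, and the sketch works on a caller-owned int array rather than on memory it allocates itself.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper type: the same bookkeeping as the three-pointer layout,
// shown for int only.
struct ThreePointers {
    int* start;           // first element
    int* finish;          // one past the last element
    int* end_of_storage;  // one past the end of the allocation

    // The element count is the distance between start and finish.
    std::size_t size() const { return static_cast<std::size_t>(finish - start); }
    // The capacity is the distance between start and end_of_storage.
    std::size_t capacity() const { return static_cast<std::size_t>(end_of_storage - start); }
};

// Builds the three pointers over a caller-owned array of `total` slots,
// of which the first `used` slots hold data.
inline ThreePointers make_view(int* data, std::size_t used, std::size_t total) {
    return ThreePointers{data, data + used, data + total};
}
```

So a pointer plus two integers and three pointers are interchangeable; the pointer form simply matches how iterators are used.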

Member functions

Constructor

vector()
	:_start(nullptr)
	, _finish(nullptr)
	, _end_of_storage(nullptr)
{
}

push_back

void push_back(const T& x)
{
	if (_finish == _end_of_storage)
	{
		// If the capacity is 0, doubling it would still give 0, so start at 4
		reserve(capacity() == 0 ? 4 : capacity() * 2);
	}
	*_finish = x;
	++_finish;
}

How do we store the new element?
By direct assignment.

Why is direct assignment valid here?
Because the space was allocated with new T[n]. For a built-in type, assignment works whether or not the memory was initialized.
An object of a class type, however, can only be assigned to once it has been constructed. new T[n] default-constructs every element, so the assignment is safe.
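
A small sketch can make this visible by counting constructor calls. Probe and count_constructions are hypothetical names introduced only for this illustration; the point is that new[] has already constructed every slot before push_back assigns into it.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical probe type that counts how many objects were default-constructed.
struct Probe {
    static int constructed;
    std::string name;
    Probe() { ++constructed; }
};
int Probe::constructed = 0;

// Allocates n (> 0) Probe objects with new[], assigns into the first slot,
// and returns how many default constructors new[] ran.
inline int count_constructions(std::size_t n) {
    assert(n > 0);
    Probe::constructed = 0;
    Probe* p = new Probe[n];              // every element is constructed here
    int after_new = Probe::constructed;   // snapshot before creating anything else

    Probe value;
    value.name = "x";
    p[0] = value;                         // safe: p[0] is already a live object

    delete[] p;
    return after_new;
}
```

This is also why the approach differs from the real library, which allocates raw memory and constructs elements in place; the sketch only covers the new T[n] strategy used in this article.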

Expansion

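void reserve(size_t n)
{
	// Never shrink:
	// 1. shrinking is too expensive
	// 2. repeatedly shrinking and growing hurts performance
	if (n > capacity())
	{
		size_t sz = size();
		T* tmp = new T[n];
		// If the old space holds no data, there is nothing to copy
		if (_start)
		{
			// T is not necessarily a character type, so use memcpy instead of strcpy
			memcpy(tmp, _start, sizeof(T)*size());
			delete[] _start;
		}
		_start = tmp;
		// There is a small pitfall here:
		//_finish=_start + size();

		// Record size() beforehand so that this line cannot go wrong.
		// Working with tmp instead would also work, but it reads less naturally and is harder to maintain.
		_finish = _start + sz;
		_end_of_storage = _start + n;
	}
}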
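
The pitfall is about ordering: size() is computed as _finish - _start, so once _start points to the new block while _finish still points into the old one, the result is meaningless. Below is a minimal int-only sketch of the safe ordering. IntVec is a hypothetical stand-in for the class in this article, and it copies element by element instead of using memcpy.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical int-only buffer that mirrors the reserve() logic above.
class IntVec {
public:
    IntVec() = default;
    ~IntVec() { delete[] _start; }
    IntVec(const IntVec&) = delete;
    IntVec& operator=(const IntVec&) = delete;

    std::size_t size() const { return static_cast<std::size_t>(_finish - _start); }
    std::size_t capacity() const { return static_cast<std::size_t>(_end_of_storage - _start); }

    void reserve(std::size_t n) {
        if (n <= capacity()) return;
        std::size_t sz = size();  // taken while _start and _finish still belong to the same block
        int* tmp = new int[n];
        for (std::size_t i = 0; i < sz; ++i) tmp[i] = _start[i];
        delete[] _start;
        _start = tmp;
        // Calling size() here would subtract the new _start from the old _finish.
        _finish = _start + sz;
        _end_of_storage = _start + n;
    }

    void push_back(int x) {
        if (_finish == _end_of_storage) reserve(capacity() == 0 ? 4 : capacity() * 2);
        *_finish = x;
        ++_finish;
    }

    int operator[](std::size_t i) const { return _start[i]; }

private:
    int* _start = nullptr;
    int* _finish = nullptr;
    int* _end_of_storage = nullptr;
};
```

Pushing five elements goes through two expansions (to 4, then to 8), and the saved sz keeps the element count intact across both.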

Let's fill in some other code that needs to be used, and then test it.

	size_t capacity() const
	{
		return _end_of_storage - _start;
	}

	size_t size() const
	{
		return _finish - _start;
	}

Iterator

// A plain (non-const) iterator is quite simple to implement
iterator begin()
{
	return _start;
}

iterator end()
{
	return _finish;
}

pop_back

Removing the last element looks trivial: just --_finish. But there is a problem. What exactly is it? Let's take a look.
Let's do a simple test.
If we keep calling pop_back after the vector is already empty, something goes wrong.

_finish keeps decreasing and ends up in front of _start, so iterating over the vector afterwards misbehaves.

Let's make a small change.

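void pop_back()
{
	// Popping from an empty vector is a usage error
	assert(!empty());
	--_finish;
}
// const, so that it can also be called on a const vector
bool empty() const
{
	return _start == _finish;
}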

Access to const objects
Calling a non-const member function on a const object is essentially an amplification of permissions, which the compiler rejects. We can fix this by marking every member function that does not modify the object as const.
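
As a rough sketch of what this looks like, the class below offers const versions of its read-only functions plus a const_iterator. IntRange, const_iterator and sum are hypothetical names for this illustration; the article's vector would add the same overloads to its own begin() and end().

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical range over a caller-owned int array, used to illustrate const members.
class IntRange {
public:
    typedef int* iterator;
    typedef const int* const_iterator;

    IntRange(int* first, int* last) : _start(first), _finish(last) {}

    // Non-const overloads: the caller may modify elements.
    iterator begin() { return _start; }
    iterator end() { return _finish; }
    // Const overloads: selected for const objects, elements are read-only.
    const_iterator begin() const { return _start; }
    const_iterator end() const { return _finish; }

    std::size_t size() const { return static_cast<std::size_t>(_finish - _start); }
    bool empty() const { return _start == _finish; }

private:
    int* _start;
    int* _finish;
};

// Takes a const reference, so only const member functions can be called on r.
inline int sum(const IntRange& r) {
    int total = 0;
    for (IntRange::const_iterator it = r.begin(); it != r.end(); ++it) total += *it;
    return total;
}
```

Removing the const overloads of begin() and end() would make sum fail to compile, which is the permission amplification described above.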

resize()

void resize(size_t n, T val = T()) // T() is a default-constructed anonymous object; explained below
{
	if (n < size())
	{
		// Remove the trailing data
		_finish = _start + n;
	}
	else
	{
		if (n > capacity())
			reserve(n);

		while (_finish != _start+n)
		{
			*_finish = val;
			++_finish;
		}
	}
}

Can the default value of val in resize above simply be 0?
Clearly not. T is a template parameter, so the type is not necessarily int. 0 only makes sense for arithmetic types; for a class type it is generally not a meaningful initial value.

That raises another question: does int have a default constructor?
When we studied classes and objects, we learned that built-in types have no constructors. To make templates work, however, the language lets built-in types use the same syntax: int() yields 0, double() yields 0.0, and a pointer type yields a null pointer.
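
A quick way to check this is a tiny function template that returns T(). default_value is a hypothetical helper name used only here.

```cpp
#include <cassert>
#include <string>

// Returns a value-initialized T: zero for built-in types,
// the default constructor's result for class types.
template <class T>
T default_value() {
    return T();
}
```

The same T() expression therefore works as a default argument for any element type that can be default-constructed.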

insert

What is wrong with a first draft of insert?

If the capacity is insufficient, shifting the data back writes out of bounds.

Are there any other problems besides insufficient capacity?
A hint: consider inserting at the very front. In fact, the unsigned wraparound problem we hit when simulating string does not occur here, because pos is a pointer rather than an unsigned index.

Written like this, something still goes wrong in testing.

Note that func(v1) is called twice, and the program crashes at run time.

Let's analyze briefly. This could be a memory problem or an out-of-bounds access.
Why is there no problem after calling push_back 5 times, but there is one after calling it 4 times?
What is the difference between 5 and 4?

With 4 elements the vector is exactly full (the initial capacity is 4), so insert has to expand, and the expansion is where things go wrong.

Pay attention to what happened. After the expansion, _start and _finish changed. Why? Because reserve allocated a new block, copied the data over, and freed the old one.
This is the classic iterator invalidation problem.

pos still points into the freed block, so it has become a dangling (wild) pointer.
This also breaks the loop that shifts the elements, because pos and end now point into different blocks.
So how do we solve this?
Update pos after the expansion.
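
The idea is to remember the distance from the start rather than the pointer itself. The sketch below shows it with std::vector instead of the class in this article; read_after_growth is a hypothetical helper name, and it assumes offset is smaller than v.size().

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Forces a reallocation and then reads the element at `offset`
// through a rebuilt iterator. Assumes offset < v.size().
inline int read_after_growth(std::vector<int>& v, std::size_t offset) {
    std::vector<int>::iterator pos = v.begin() + static_cast<std::ptrdiff_t>(offset);
    std::size_t len = static_cast<std::size_t>(pos - v.begin());  // remember the distance, not the pointer

    v.reserve(v.capacity() + 1);  // larger than the current capacity, so the data moves

    // The old pos is invalid now; rebuild it inside the new block.
    pos = v.begin() + static_cast<std::ptrdiff_t>(len);
    return *pos;
}
```

The distance stays valid across the move because the elements keep their order; only their addresses change.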

//void insert(iterator pos, const T& val)
iterator insert(iterator pos, const T& val)
{
	assert(pos >= _start);
	assert(pos <= _finish);

	if (_finish == _end_of_storage)
	{
		// Remember the offset of pos before the data moves
		size_t len = pos - _start;
		reserve(capacity() == 0 ? 4 : capacity() * 2);

		// Update pos after the expansion so that it is valid again
		pos = _start + len;
	}

	// Shift [pos, _finish) back by one slot, starting from the last element
	iterator end = _finish - 1;
	while (end >= pos)
	{
		*(end + 1) = *end;
		--end;
	}

	*pos = val;
	++_finish;

	return pos;
}

Next question. In the previous test, can we still modify an element through the iterator pos after the insert?

(*pos)++; // I want to modify the element 3 that pos referred to
func(v1);

The program reports no error, but the result is clearly not what we expected. Why?
Didn't we already update pos? Why does it still not work outside the function?
Because insert takes pos by value: changing the parameter inside the function does not change the caller's argument.

How do we deal with it?

Can we solve it by taking pos by reference? It looks attractive but does not work. A call such as v.insert(v.begin(), x) then fails to compile, because begin() returns a temporary, and a temporary cannot bind to a non-const reference.

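insert can invalidate an iterator in two ways:
1. Dangling pointer: an expansion moved the data, so pos points into freed memory.
2. Changed meaning: no expansion happened, but pos no longer refers to the element it referred to before.

The solution is to return an iterator to the newly inserted element.
It is best not to keep using the old pos, because you cannot know when an expansion will invalidate it.
After insert, treat pos as invalid and do not use it again.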
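
With std::vector, the pattern looks roughly like the sketch below: the caller replaces its iterator with the one insert returns. insert_before_and_bump is a hypothetical helper name chosen for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Inserts `val` before the first element equal to `target`,
// then increments the newly inserted element through the returned iterator.
inline void insert_before_and_bump(std::vector<int>& v, int target, int val) {
    std::vector<int>::iterator pos = std::find(v.begin(), v.end(), target);
    if (pos == v.end()) return;  // target not present: nothing to do

    // The old pos must not be used after this call; take the returned iterator instead.
    pos = v.insert(pos, val);

    ++*pos;  // modifies the inserted element, whether or not the data moved
}
```

Because the returned iterator always refers to the inserted element, the code behaves the same with and without an expansion.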

erase

Now another question: does pos become invalid after erase?
In our implementation it does not appear to, but with the library's vector it does: VS reports a hard error.

An assertion failure is reported.

Now let's look at the behavior under g++, which performs no such check.

So is pos invalid after erase or not? Which is more reasonable, VS or g++?
If pos referred to the last element (the 4 here), then after erase it equals end() and no longer refers to any element, so accessing it makes no sense.

Therefore we treat pos as invalid after erase and do not access it. The behavior is undefined and depends on the compiler.
Be careful here, or it will cost you dearly.

The fix is to use the return value of erase. The point is not to skip the element that moved into the position of pos.

iterator erase(iterator pos)
{
	assert(pos >= _start);
	assert(pos < _finish);

	iterator start = pos + 1;
	while (start != _finish)
	{
		*(start - 1) = *start;
		++start;
	}

	--_finish;

	return pos;
}

In the following test, which erases the even numbers, both consecutive even numbers and a trailing even number are handled correctly, so there is no major problem.

1. VS performs a mandatory check: after erase, the iterator must not be accessed.
2. g++ performs no mandatory check; what happens depends on the specific case, and the result is undefined.
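
The erase test described above can be sketched with std::vector as follows. erase_evens is a hypothetical helper name; the important detail is that the iterator is only advanced when nothing was erased.

```cpp
#include <cassert>
#include <vector>

// Removes every even number from v.
inline void erase_evens(std::vector<int>& v) {
    std::vector<int>::iterator it = v.begin();
    while (it != v.end()) {
        if (*it % 2 == 0)
            it = v.erase(it);  // the returned iterator refers to the element that followed
        else
            ++it;              // advance only when nothing was erased
    }
}
```

Advancing after every erase would skip the element that slid into the erased position, and it would step past end() when the last element is even.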

Does string suffer from iterator invalidation?
Yes, but it rarely shows up in practice.
string's insert and erase are usually called with indexes rather than iterators.
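
For comparison, the index-based style with std::string looks like this. edit_by_index is a hypothetical helper name for this illustration.

```cpp
#include <cassert>
#include <string>

// Edits a copy of s using index-based insert and erase.
inline std::string edit_by_index(std::string s) {
    s.insert(0, "x");  // insert "x" at index 0
    s.erase(1, 2);     // erase 2 characters starting at index 1
    return s;
}
```

An index stays meaningful even if the string reallocates, which is why this style rarely runs into invalidation.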

destructor

~vector()
{
	delete[] _start;
	_start = _finish = _end_of_storage = nullptr;
}

Now for the constructor that builds a vector from n copies of a value.

Here is a big pitfall.
Suppose the constructor is written without initializing the member pointers. Is there a problem with writing it this way?
The program crashes.

Leaving the members uninitialized causes all kinds of problems, because reserve then computes capacity() and size() from garbage pointers.
Stepping through in a debugger makes the problem easy to see.

The fix is to add an initializer list.

vector(size_t n, const T& val = T()) // T() was explained above
	: _start(nullptr)
	, _finish(nullptr)
	, _end_of_storage(nullptr)
{
	reserve(n);
	for (size_t i = 0; i < n; ++i)
	{
		push_back(val);
	}
}

Next, the constructor that takes an iterator range.

If its parameters are declared as vector's own iterator type, it is too restrictive: only a vector's iterators could be used to initialize it.

Must an iterator-range constructor take vector iterators?
No. A container should be constructible from an iterator range of any container type.

This brings in another piece of syntax: a member function of a class template can itself be a function template.

// [first, last)
template <class InputIterator>
vector(InputIterator first, InputIterator last)
	: _start(nullptr)
	, _finish(nullptr)
	, _end_of_storage(nullptr)
{
	// Use != rather than <=: list iterators, for example, cannot be compared with <=
	while (first != last)
	{
		push_back(*first);
		++first;
	}
}

If we do not want to write an initializer list in every constructor, we can use the C++11 syntax and give each member variable a default value at its declaration. The constructor uses that default value for every member its initializer list does not mention.
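
A minimal sketch of that syntax is shown below. DefaultedVec is a hypothetical class name, and its second constructor exists only to show that an explicit initializer list still overrides the defaults.

```cpp
#include <cassert>
#include <cstddef>

template <class T>
class DefaultedVec {
public:
    // No initializer list: all three pointers take the defaults declared below.
    DefaultedVec() {}

    // An explicit initializer list overrides the defaults.
    explicit DefaultedVec(std::size_t n)
        : _start(new T[n]())
        , _finish(_start + n)
        , _end_of_storage(_start + n)
    {}

    ~DefaultedVec() { delete[] _start; }
    DefaultedVec(const DefaultedVec&) = delete;
    DefaultedVec& operator=(const DefaultedVec&) = delete;

    std::size_t size() const { return static_cast<std::size_t>(_finish - _start); }
    std::size_t capacity() const { return static_cast<std::size_t>(_end_of_storage - _start); }

private:
    // C++11 default member initializers
    T* _start = nullptr;
    T* _finish = nullptr;
    T* _end_of_storage = nullptr;
};
```

With the defaults in place, a newly added constructor can no longer forget to initialize the pointers.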

Something strange happens with a call such as vector<int> v(10, 1): compilation fails.

Why does it match the wrong overload, the iterator-range constructor written above?

The compiler calls the best match. To call vector(size_t n, const T& val = T()), the first argument must be converted from int to size_t. For the iterator-range template, InputIterator is simply deduced as int and both arguments match exactly, so the template wins.

Now look at the body of the iterator-range constructor. An int cannot be dereferenced, so *first is reported as an error.

How do we deal with it?
1. Add a u suffix to the first argument, which makes it unsigned, so the two arguments no longer deduce to the same type.

2. Look at the STL source code and see how it solves this problem.
Here we can solve it very simply by providing an additional overload whose first parameter is int.
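
The overload resolution itself can be reproduced with plain functions. In the sketch below, pick and pick_fixed are hypothetical stand-ins for the constructors; each returns a tag telling which overload the compiler selected.

```cpp
#include <cassert>
#include <cstddef>

// Stand-in for vector(size_t n, const T& val)
inline int pick(std::size_t, const int&) { return 1; }

// Stand-in for the iterator-range constructor
template <class InputIterator>
int pick(InputIterator, InputIterator) { return 2; }

// Same pair, plus an extra overload whose first parameter is int
inline int pick_fixed(std::size_t, const int&) { return 1; }
inline int pick_fixed(int, const int&) { return 3; }

template <class InputIterator>
int pick_fixed(InputIterator, InputIterator) { return 2; }
```

With two int arguments, pick chooses the template because it needs no conversion. With 10u the template cannot deduce a single type, so the size_t overload is chosen. In pick_fixed the int overload matches exactly, and a non-template function is preferred over a template when both match equally well.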

Returning a reference carries some risk, so use it with caution unless, as with operator[], the caller is meant to modify the element.

Here is something neat: a vector can be initialized from another container's iterator range.
As long as the element types are convertible it works; for example, the char elements of a string convert to int.

Even better, the range can come from an array. Why does an array work?
A raw pointer is a natural iterator, provided that it points into an array.
In fact, vector iterators and string iterators can themselves be implemented as raw pointers.
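
The same ideas can be tried with std::vector, whose iterator-range constructor works the same way. The function names below are hypothetical helpers for this illustration.

```cpp
#include <cassert>
#include <list>
#include <string>
#include <vector>

// Raw pointers into an array act as iterators.
inline std::vector<int> from_array() {
    int a[] = {3, 1, 2};
    return std::vector<int>(a, a + 3);
}

// Each char of the string converts to int.
inline std::vector<int> from_string(const std::string& s) {
    return std::vector<int>(s.begin(), s.end());
}

// Iterators of a different container type work as well.
inline std::vector<int> from_list(const std::list<int>& l) {
    return std::vector<int>(l.begin(), l.end());
}
```

All three calls go through the same constructor template; only the deduced iterator type differs.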

A brief digression.
sort

The default is ascending order.
sort is a function template, and its template parameter is named RandomAccessIterator. So what is random access? It usually means the underlying storage is an array.

It sorts for us and is pleasant to use.

For descending order, pass a comparison function object as the third argument. It can be passed either as a named object or as an anonymous temporary.

These two are actually equivalent.
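
The original screenshots are not available, so the following is an assumption about what they showed: the usual approach is std::greater from <functional>, passed once as a named object and once as an anonymous one. The two function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <vector>

// Form 1: a named comparison object
inline void sort_desc_named(std::vector<int>& v) {
    std::greater<int> g;
    std::sort(v.begin(), v.end(), g);
}

// Form 2: an anonymous comparison object
inline void sort_desc_anonymous(std::vector<int>& v) {
    std::sort(v.begin(), v.end(), std::greater<int>());
}
```

Both calls hand sort the same kind of object, so they produce the same ordering.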

copy construction

Here we run into the issue of deep and shallow copies.
First of all, if we do not write a copy constructor, the compiler-generated one copies the three pointers member by member. This is the classic shallow copy problem.

Let's first write a deep copy in the traditional style.

vector(const vector<T>& v)
	: _start(nullptr)
	, _finish(nullptr)
	, _end_of_storage(nullptr)
{
	_start = new T[v.capacity()];
	memcpy(_start, v._start, sizeof(T) * v.size());
	_finish = _start + v.size();
	_end_of_storage = _start + v.capacity();
}

With this copy constructor, copying a vector whose elements are strings crashes. Why?

In other words, when the elements are int the program runs normally, but when they are string it crashes.

Because memcpy is itself a shallow copy. What does memcpy do inside the copy constructor?
It copies the bytes one after another, starting from the first element.

There is another layer here that we did not consider: each element needs its own deep copy inside the vector's deep copy. memcpy gives us a deep copy of the outer array but a shallow copy of each element, so the strings in both vectors point to the same character buffers.
The program crashes when the destructors run, because those buffers are freed twice.

How do we solve it?
The solution must cover three cases: the elements are int, string, or themselves vectors (a vector of vectors).

How do we deep-copy each element? Can we write that copy ourselves?
We cannot, because T is a template parameter and we do not know what type it is.
Even if we knew, the element's internals are private and we cannot touch them.

So we call a function of T that performs the deep copy for us.
Assignment is such a deep copy, so we assign the elements one by one.
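
A minimal sketch of element-wise copying is shown below. copy_elements is a hypothetical helper name, and it returns a std::unique_ptr<T[]> only so that the example cleans up after itself; the vector in this article would write into its own _start instead.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>

// Copies n elements from src into a freshly allocated array,
// using T's own assignment operator for each element.
template <class T>
std::unique_ptr<T[]> copy_elements(const T* src, std::size_t n) {
    std::unique_ptr<T[]> dst(new T[n]);
    for (std::size_t i = 0; i < n; ++i)
        dst[i] = src[i];  // T::operator= performs whatever deep copy T needs
    return dst;
}
```

For int the loop is a plain value copy; for string each element gets its own character buffer, so the source and the copy can be modified and destroyed independently.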

We have not solved everything yet. Besides the copy constructor, reserve also uses memcpy when it expands.

So we modify the expansion code too.

void reserve(size_t n)
{
	if (n > capacity())
	{
		size_t sz = size();
		T* tmp = new T[n];
		if (_start)
		{
			//memcpy(tmp, _start, sizeof(T)*size());
			for (size_t i = 0; i < sz; ++i)
			{
				tmp[i] = _start[i];
			}
			delete[] _start;
		}

		_start = tmp;
		_finish = _start + sz;
		_end_of_storage = _start + n;
	}
}

What problem remains? Consider elements that are themselves vector objects, such as vector<vector<int>>. Take Yang Hui's triangle (Pascal's triangle) as an example.
Test:

Something goes wrong. Why? What is still unresolved?
The outer vector is deep-copied, but the inner vectors are shallow-copied.
The problem is still in the copy path: the element-wise copy now relies on assignment, but we have not written an assignment operator for our vector. The compiler therefore uses the default-generated one, which is a shallow copy.

We can solve it by writing the assignment operator ourselves.
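
One common way to write the assignment operator is copy-and-swap: take the parameter by value, so the copy constructor makes the deep copy, then swap with it. The sketch below is an assumption about the approach, not a reproduction of the lost screenshot; MiniVec is a hypothetical, stripped-down class with just enough members to test a vector of vectors.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

template <class T>
class MiniVec {
public:
    MiniVec() {}
    MiniVec(std::size_t n, const T& val) {
        _start = new T[n];
        _finish = _end_of_storage = _start + n;
        for (std::size_t i = 0; i < n; ++i) _start[i] = val;
    }
    // Deep copy: element-wise assignment, which relies on T::operator=
    MiniVec(const MiniVec<T>& v) {
        _start = new T[v.size()];
        _finish = _end_of_storage = _start + v.size();
        for (std::size_t i = 0; i < v.size(); ++i) _start[i] = v._start[i];
    }
    // Copy-and-swap: v is already a deep copy made by the copy constructor
    MiniVec<T>& operator=(MiniVec<T> v) {
        swap(v);
        return *this;  // the old contents are destroyed together with v
    }
    ~MiniVec() { delete[] _start; }

    void swap(MiniVec<T>& v) {
        std::swap(_start, v._start);
        std::swap(_finish, v._finish);
        std::swap(_end_of_storage, v._end_of_storage);
    }
    std::size_t size() const { return static_cast<std::size_t>(_finish - _start); }
    T& operator[](std::size_t i) { return _start[i]; }
    const T& operator[](std::size_t i) const { return _start[i]; }

private:
    T* _start = nullptr;
    T* _finish = nullptr;
    T* _end_of_storage = nullptr;
};
```

With this operator in place, copying a MiniVec<MiniVec<int>> gives every inner vector its own storage, so changing the copy leaves the original untouched.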

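Modern writing method
The modern style directly reuses existing code: build a temporary with the iterator-range constructor, then swap with it.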

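void swap(vector<T>& v)
{
	std::swap(_start, v._start);
	std::swap(_finish, v._finish);
	std::swap(_end_of_storage, v._end_of_storage);
}
vector(const vector<T>& v)
	: _start(nullptr)
	, _finish(nullptr)
	, _end_of_storage(nullptr)
{
	// v is const, so this needs const overloads of begin() and end()
	vector<T> tmp(v.begin(), v.end());
	// tmp takes over our null pointers and is destroyed safely
	swap(tmp);
}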

One last detail: inside the class, the language allows writing the class name without its template arguments (vector instead of vector<T>), but writing it this way is not recommended.
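
A short sketch shows both spellings side by side. Box is a hypothetical class used only for this illustration.

```cpp
#include <cassert>

template <class T>
class Box {
public:
    explicit Box(const T& v) : _value(v) {}

    // Allowed: inside the class, Box means Box<T>.
    Box(const Box& other) : _value(other._value) {}

    // The explicit spelling, which states the type in full.
    Box<T>& operator=(const Box<T>& other) {
        _value = other._value;
        return *this;
    }

    const T& get() const { return _value; }

private:
    T _value;
};
```

Both forms name the same type, so the choice is purely about readability.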