[C++] Implementing our own vector (modeled on the SGI version)

I can't bear the pain of self-discipline, and I can't accept the sin of mediocrity. I want to feel better, but I also want to be comfortable.
You, you... either don't think about it, think about it but don't do it, hesitate, hesitate, worry about gains and losses...




1. Four constructors

1. The vector framework and the default constructor

1.
The skeleton of vector is shown below. Its member variables are three pointers: _start points to the first element of the space in use, _finish points one past the last element, and _end_of_storage points one past the end of the allocated space. Because vector is backed by a contiguous array (a sequence table), its iterator is simply the raw pointer T*. We define both a const and a non-const iterator so that const and non-const objects can each be traversed.

Our implementation follows the SGI version of the source, stl_vector.h.


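namespace wyn
{
	template<class T>
	class vector
	{
	public:
		typedef T* iterator;
		typedef const T* const_iterator;
		// ……
		// ……
	private:
		iterator _start;          // first element of the space in use
		iterator _finish;         // one past the last element
		iterator _end_of_storage; // one past the end of the allocated space
	};
}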

2.
The default constructor initializes the three members in the initializer list. nullptr is the right initial value, because calling delete (or free) on a null pointer is a harmless no-op.

vector()
	:_start(nullptr)
	,_finish(nullptr)
	,_end_of_storage(nullptr)
{
}
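To make the three-pointer layout concrete, here is a minimal sketch. ThreePtr, make_layout and destroy are hypothetical names invented for this illustration, not part of the SGI source. The sketch only shows how size and capacity fall out of pointer subtraction, and why the all-null state is harmless to release.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the three members of vector<int>.
struct ThreePtr
{
	int* start = nullptr;           // first element in use
	int* finish = nullptr;          // one past the last element in use
	int* end_of_storage = nullptr;  // one past the end of the allocation

	std::size_t size() const { return finish - start; }
	std::size_t capacity() const { return end_of_storage - start; }
};

// Builds a layout with room for `cap` ints, of which `used` are in use.
inline ThreePtr make_layout(std::size_t cap, std::size_t used)
{
	ThreePtr p;
	p.start = new int[cap]();
	p.finish = p.start + used;
	p.end_of_storage = p.start + cap;
	return p;
}

// Releases the allocation; delete[] on a null pointer is a no-op,
// so this is safe even for a layout that never allocated.
inline void destroy(ThreePtr& p)
{
	delete[] p.start;
	p.start = p.finish = p.end_of_storage = nullptr;
}
```

A default-constructed ThreePtr reports size 0 and capacity 0 and can be passed to destroy without any special case, which is the property the nullptr initialization buys.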

2. Ambiguous constructor calls (problems caused by overload-resolution priority)

1.
Besides the default constructor, a commonly used constructor takes an iterator range. It has to be a member function template, because the iterators used to build a vector need not be vector iterators; they may, for example, come from a string. The iterator type is therefore a template parameter.


template<class InputIterator>
vector(InputIterator first, InputIterator last)
	:_start(nullptr)
	, _finish(nullptr)
	, _end_of_storage(nullptr)
{
	// Compare with !=, not <: input iterators in general only support (in)equality.
	while (first != last)
	{
		push_back(*first);
		++first;
	}
}

2.
Another constructor builds the vector from n copies of a value. The value may be of a built-in or a user-defined type, so the parameter is passed by reference, and it must be a const reference. Otherwise temporaries and literals (such as 1 or T()) could not bind to it.


3.
With the n-copies constructor in place, constructing v1 from ten ints of value 1, that is vector<int> v1(10, 1), fails to compile, whereas constructing from ten chars of value 'A', that is vector<char> v1(10, 'A'), works. The cause is overload resolution: the first call is matched to a different constructor than the one we intended.

4.
For the call (10, 1), the constructor taking (size_t, const T&) is not the best match: 10 is an int, so the first argument needs an implicit int-to-size_t conversion.
The iterator-range constructor, in contrast, is a function template: InputIterator is deduced as int, and both arguments then match exactly. So (10, 1) actually selects the iterator-range constructor, which dereferences its arguments, that is, it dereferences the int 10, and the compiler reports illegal indirection. With (10, 'A') the two arguments have different types, so deduction of InputIterator fails and the (size_t, const T&) constructor is the only candidate.

5.
One fix is to change the parameter type from size_t to int, or to cast the 10 to size_t at the call site. The STL source does neither. It uses overloading instead and adds a constructor whose first parameter is int.
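The matching behavior described above can be observed directly. The sketch below uses two hypothetical probe types, Probe and ProbeFixed (names invented for this illustration), with the same constructor signatures as our vector; each constructor only records which overload was selected, so nothing is dereferenced and every call compiles.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical probe: same constructor signatures as the vector in this
// article, but each one only records which overload was selected.
template <class T>
struct Probe
{
	std::string chosen;

	template <class InputIterator>
	Probe(InputIterator, InputIterator) : chosen("range") {}

	Probe(std::size_t, const T& = T()) : chosen("size_t") {}
};

// The same probe with the extra int overload added, as in the SGI source.
template <class T>
struct ProbeFixed
{
	std::string chosen;

	template <class InputIterator>
	ProbeFixed(InputIterator, InputIterator) : chosen("range") {}

	ProbeFixed(std::size_t, const T& = T()) : chosen("size_t") {}
	ProbeFixed(int, const T& = T()) : chosen("int") {}
};
```

Probe<int>(10, 1) reports "range", because the template matches both ints exactly, while Probe<char>(10, 'A') reports "size_t". With the int overload present, ProbeFixed<int>(10, 1) reports "int": when a template and a non-template match equally well, the non-template wins. One caveat of the overload approach is that a call such as ProbeFixed<int>(10u, 1) becomes ambiguous, since unsigned int converts equally well to size_t and to int.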


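// vector<int> v1(10, 1);
// 1. Both arguments are int, so v1 prefers the template whose two parameters share one type
//    (class InputIterator) over the constructor below: the constructor below needs an implicit
//    int-to-size_t conversion for 10, while the template matches exactly after one deduction.
// 2. Once the template is chosen, dereferencing an int is an error: illegal indirection.
//
// vector<char> v1(10, 'A'); one int and one char, so deduction fails and the constructor below is the only choice.

// We could change the first parameter type to int, but the library uses size_t, so that is not ideal.
// What then? Read the source: the problem is solved with an overloaded constructor.
vector(size_t n, const T& val = T()) // must be a const reference, otherwise constant values cannot bind to it
	:_start(nullptr)
	, _finish(nullptr)
	, _end_of_storage(nullptr)
{
	reserve(n);
	while (n--)
	{
		push_back(val);
	}
}
vector(int n, const T& val = T()) // must be a const reference, otherwise constant values cannot bind to it
	:_start(nullptr)
	, _finish(nullptr)
	, _end_of_storage(nullptr)
{
	reserve(n);
	while (n--)
	{
		push_back(val);
	}
}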
void test_vector8()
{
	std::string s("hello world");
	vector<int> v(s.begin(), s.end());
	for (auto e : v)
	{
		cout << e << " ";
	}
	cout << endl;

	//vector<int> v1(10, 1); // without the int overload: error, illegal indirection
	vector<char> v1(10, 'A'); // always fine, provided the second parameter is const-qualified so the literal can bind
	for (auto e : v1)
	{
		cout << e << " ";
	}
	cout << endl;
	// Compile error: comment code out to narrow down the offending part.
	// Runtime error: use the debugger.
}

2. Copy construction and copy assignment for vector

1.
As with string, the copy constructor can be written in a traditional style or in a modern style that delegates to a "worker"; the modern style is essentially code reuse. There are two traditional versions. One allocates new space and copies the data with memcpy. The other reserves the space up front and then push_backs each element into it. The memcpy version carries a hidden risk: memcpy copies byte by byte, so if the element type owns resources, the result is a shallow copy. So we use the modern style.

2.
In terms of readability and practicality the modern style is better. It constructs a temporary object tmp from the iterators of the parameter v and then swaps tmp with *this. When tmp is destroyed at the end of the function it takes the old members of *this with it, so to keep its destructor from freeing a wild pointer, the initializer list first sets all three members of *this to nullptr.

3.
With the copy constructor in place, copy assignment is simple. Take the parameter by value, so that it is already a copy-constructed temporary, then call the member function swap. To support chained assignment, return a reference.

4. With the modern copy constructor and copy assignment, deep copies work for vector<vector<int>> and for more deeply nested vector types, provided reserve copies element by element (see section 4).
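The worker-plus-swap pattern described in the points above can be tried in isolation. The following is a minimal sketch on a hypothetical IntBuffer class (invented for this illustration, much simpler than vector): a range constructor plays the worker, the copy constructor builds a temporary with it and swaps, and copy assignment takes its parameter by value.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>

// Hypothetical minimal owner of a heap array, used only to show the pattern.
class IntBuffer
{
public:
	IntBuffer() : _data(nullptr), _size(0) {}

	// "Worker" constructor: copies the half-open range [first, last).
	IntBuffer(const int* first, const int* last)
		: _data(first != last ? new int[last - first] : nullptr), _size(last - first)
	{
		std::copy(first, last, _data);
	}

	// Copy constructor: start from the empty state, let the worker build tmp,
	// then swap. tmp leaves with the null members and destroys them safely.
	IntBuffer(const IntBuffer& other) : _data(nullptr), _size(0)
	{
		IntBuffer tmp(other._data, other._data + other._size);
		swap(tmp);
	}

	// Copy assignment: the by-value parameter is already a copy, so swap with it.
	// The old contents of *this are released when `other` is destroyed.
	IntBuffer& operator=(IntBuffer other)
	{
		swap(other);
		return *this;
	}

	~IntBuffer() { delete[] _data; }

	void swap(IntBuffer& other)
	{
		std::swap(_data, other._data);
		std::swap(_size, other._size);
	}

	std::size_t size() const { return _size; }
	int& operator[](std::size_t i) { return _data[i]; }

private:
	int* _data;
	std::size_t _size;
};
```

After IntBuffer b(a), writing to b leaves a untouched, and c = b goes through the copy constructor and swap without any hand-written copying code in operator=.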

Below is a pitfall I fell into and, after some thought, climbed out of.


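vector& swap(vector<T>& v)
{
	std::swap(_start, v._start);
	std::swap(_finish, v._finish);
	std::swap(_end_of_storage, v._end_of_storage);

	return *this;
}
// v2(v1): copy-construct v2 from v1
// Traditional version, shown for comparison.
vector(vector<T>& v)
	:_start(nullptr)
	, _finish(nullptr)
	, _end_of_storage(nullptr)
{
	// 1. The plain traditional style: allocate space, then copy the data.
	//_start = new T[v.capacity()];
	// 2. The improved traditional style.
	reserve(v.capacity());
	// Each element of v is bound to e in turn. Without the reference, a user-defined T
	// would be copy-constructed on every iteration, so take it by reference.
	for (auto& e : v)
	{
		push_back(e);
	}
}
vector(const vector<T>& v) // the worker is again a constructor that takes arguments
	:_start(nullptr)
	, _finish(nullptr)
	, _end_of_storage(nullptr)
{
	// As with string, the copy constructor calls a parameterized constructor to build tmp; that constructor is the worker.
	vector<T> tmp(v.begin(), v.end());
	swap(tmp); // this is where initializing the members to nullptr pays off
}

// v1 = v2;
// v1 = v1; // self-assignment is rare and does no harm here, so we tolerate it.
vector<T>& operator=(vector<T> v) // returns a reference to support chained assignment
{
	swap(v);
	return *this;
}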

3. Iterator invalidation (wild pointers caused by reallocation)

1. Iterator invalidation caused by reallocation in insert

1.
Iterators are the glue between containers and algorithms: they let an algorithm ignore how the underlying data structure is implemented. Underneath, an iterator is either a pointer or a class that wraps one.
vector's iterator is a typedef for a raw pointer, so iterator invalidation is really pointer invalidation: the pointer refers to space that is no longer valid, and using it is a wild-pointer access.

2.
insert takes an iterator to the insertion position. If no reallocation happens during insert, the iterator still points into live storage afterward. As soon as insert has to grow the storage, however, the iterator is invalidated: reserve reallocates and moves the data to a new block, so the original iterator still points into the old block, which has been freed. Continuing to use it is a wild-pointer access. This is what we call iterator invalidation.

3.
The fix is simple: use the return value of insert, which is an iterator to the first newly inserted element. Continuing with the returned iterator avoids the invalidation problem.
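As a small illustration of the return-value fix, the sketch below uses std::vector, whose insert returns an iterator to the inserted element; the function name insert_then_modify is made up for this example. The same calling pattern applies to any vector whose insert returns the new position.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Inserts 30 before the element 3, then increments the inserted element
// through the iterator returned by insert.
inline std::vector<int> insert_then_modify()
{
	std::vector<int> v = {1, 2, 3, 4};
	std::vector<int>::iterator it = std::find(v.begin(), v.end(), 3);
	if (it != v.end())
	{
		it = v.insert(it, 30);  // the old `it` may dangle; keep the returned one
		++(*it);                // safe: `it` refers to the new element
	}
	return v;
}
```

The only change from the failing pattern is the assignment it = v.insert(it, 30); whether or not a reallocation happened, the returned iterator refers to the current storage.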



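// Iterator invalidation: the wild-pointer problem
iterator insert(iterator pos, const T& val)
{
	assert(pos >= _start);
	assert(pos <= _finish); // pos == _finish inserts at the end
	if (_finish == _end_of_storage)
	{
		size_t len = pos - _start; // remember where pos sits relative to _start in the old block
		size_t new_capacity = capacity() == 0 ? 4 : 2 * capacity();
		// Growing invalidates pos: reserve reallocates, so pos points into freed space.
		reserve(new_capacity);
		// So whenever we grow, pos has to be recomputed.
		pos = _start + len;
	}
	// Shift the elements back by one. With pointers there is no unsigned-index underflow at position 0,
	// but pointers bring the invalidation problem handled above.
	iterator end = _finish;
	while (end > pos)
	{
		*end = *(end - 1);
		--end;
	}
	*pos = val;
	++_finish;
	return pos; // iterator to the newly inserted element
}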
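void test_vector4() // tests iterator invalidation
{
	vector<int> v;
	//v.reserve(10); // with the space reserved up front the iterator is not invalidated; the cause really is the reallocation
	v.push_back(1);
	v.push_back(2);
	v.push_back(3);
	v.push_back(4);

	vector<int>::iterator it = find(v.begin(), v.end(), 3);
	if (it != v.end())
	{
		v.insert(it, 30);
	}
	// After insert, it must not be used: it may be a wild pointer. insert fixed up its own pos,
	// but pos was passed by value, so the caller's it is unchanged and the problem outside remains.
	(*it)++; // Deliberate bug: a reallocation happened, so it points into the old block. The compiler does not diagnose this, which is what makes it dangerous.
	// If the old block has since been handed out to hold, say, critical financial data, this write would silently corrupt it.
	// A bug like this in production code is a disaster.

	// 1. Why not pass pos by reference? Because a temporary such as v.begin() cannot bind to a non-const reference (int*&).
	// 2. So do not use it after insert. Even if it happens to work under VS, can you guarantee the same on other platforms, such as g++?
	for (auto e : v)
	{
		cout << e << " ";
	}
	cout << endl;
}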

2. How VS and g++ treat iterators after erase (one strict, one lenient)

1.
VS treats the iterator as invalid after erase: the 2013 version fails an assertion outright, and the 2022 version reports an out-of-bounds access. g++ checks iterator invalidation less strictly and does not react as drastically.

2.
After erase removes an element at some position, the iterator still works under Linux (g++): the storage is unchanged, the later elements have simply moved forward, and it still points inside the block. Under VS the same code fails immediately. So, to keep programs portable, we uniformly treat an iterator as invalid after erase. To keep using it, reassign it from the return value of erase.
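A short sketch of the portable pattern, using std::vector; the helper name value_after_erase and the -1 sentinel are assumptions made for this example. The iterator is reassigned from the return value of erase and compared against end() before it is read.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Erases the first occurrence of `target` and returns the value of the
// element that followed it, or -1 if nothing follows (or nothing was found).
inline int value_after_erase(std::vector<int> v, int target)
{
	std::vector<int>::iterator it = std::find(v.begin(), v.end(), target);
	if (it != v.end())
	{
		it = v.erase(it);  // refreshed: the element after the erased one, or end()
	}
	return it != v.end() ? *it : -1;
}
```

Erasing 3 from {1, 2, 3, 4} yields 4, while erasing the last element yields the sentinel, because erase then returns end() and dereferencing it would be an error on every platform.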


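iterator erase(iterator pos)
{
	assert(pos >= _start);
	assert(pos < _finish);

	// Move every element after pos forward by one.
	iterator begin = pos + 1;
	while (begin < _finish)
	{
		*(begin - 1) = *(begin);
		begin++;
	}
	--_finish;

	return pos; // now refers to the element that followed the erased one
}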
void test_vector5() // tests iterator invalidation
{
	std::vector<int> v;
	v.push_back(1);
	v.push_back(2);
	v.push_back(3);
	v.push_back(4);

	// Is it invalidated or not?
	std::vector<int>::iterator it = find(v.begin(), v.end(), 3);
	if (it != v.end())
	{
		v.erase(it);
	}
	// Read
	cout << *it << endl; // Deliberate misuse. The P.J. Plauger implementation (VS) is far more complex than the SGI one and can detect it.
	// Write
	//++(*it);
	for (auto e : v)
	{
		cout << e << " ";
	}
	cout << endl;
}

3.
One more point. Deleting all even numbers is slightly trickier: with the wrong loop control, even g++ can crash. The issue is the difference between a lone if and an if/else inside the while loop. With a lone if, it is incremented on every iteration, whether the element was even or odd; after an erase this skips the element that moved into the erased slot, and it can step past end(). With if/else, it is incremented only in the odd case and never right after an erase.
Which of the two forms is appropriate depends on the scenario. Before I understood the difference, picking the wrong one cost me a lot of bugs.
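The difference between the two loop shapes can be shown side by side. The sketch below deliberately uses indices on std::vector instead of iterators, so that the wrong variant stays within bounds and is safe to run; with iterators the same mistake can step past end(). Both function names are made up for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Lone if: always advances, even right after an erase, so the element
// that slid into the erased slot is never examined.
inline std::vector<int> remove_evens_always_advance(std::vector<int> v)
{
	std::size_t i = 0;
	while (i < v.size())
	{
		if (v[i] % 2 == 0)
		{
			v.erase(v.begin() + i);
		}
		++i;
	}
	return v;
}

// if/else: advances only when nothing was erased.
inline std::vector<int> remove_evens_if_else(std::vector<int> v)
{
	std::size_t i = 0;
	while (i < v.size())
	{
		if (v[i] % 2 == 0)
		{
			v.erase(v.begin() + i);
		}
		else
		{
			++i;
		}
	}
	return v;
}
```

For the input 1 2 3 4 4 9, the lone-if version leaves 1 3 4 9: the second 4 slides into the slot of the first and is skipped. The if/else version leaves 1 3 9.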

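void test_vector6() // here our own erase is used to refresh the invalidated iterator
{
	// Task: delete all even numbers
	vector<int> v;
	v.push_back(1);
	v.push_back(2);
	v.push_back(3);
	v.push_back(4);
	v.push_back(4);
	v.push_back(9);

	vector<int>::iterator it = v.begin();
	while (it != v.end())
	{
		if (*it % 2 == 0)
		{
			it = v.erase(it);
			// The correct usage: continue with the return value of erase, so the iterator is never stale.
			// We uniformly treat it as invalid after erase and refresh it from the return value; this runs under both VS and g++.
		}
		//it++; // after refreshing, do not increment: the return value of erase already refers to the next element
		else
		{
			++it;
		}
		// VS behaves the same way: it++ right after erase counts as using an invalid iterator. Do not use an iterator after erase.
		// Uniformly treat iterators as invalid after erase.
	}
	for (auto e : v)
	{
		cout << e << " ";
	}
	cout << endl;
}
// Summary: treat an iterator as invalid after an insert that reallocates and after any erase.
// To keep going, do not touch the old iterator; refresh it from the return value.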

4. Deep copies inside vector

1. The shallow copy caused by memcpy's byte-by-byte copying

1.
In the test function we create a vector<vector<int>>. Each element is a vector<int> holding ten 1s. The first four push_back calls succeed, but the fifth crashes the program.


2.
The fifth push_back calls reserve, which allocates a new block and copies the data. If that copy uses memcpy, it copies byte by byte, so for a user-defined element type only the pointers are copied: a shallow copy. When reserve then releases the old block with delete[], the destructor of each old element frees the space its pointers refer to, and the pointers in every vector<int> of the new block become wild pointers. push_back then touches them, and the program crashes.

3.
The fix needs a deep copy at two levels: of the vector<vector<int>> itself and of each element in it. memcpy cannot do that, but copy assignment can. Copy assignment takes its parameter by value, which invokes the copy constructor, which in turn uses the iterator-range constructor as its worker to build an identical object. Inside copy assignment, all that remains is to swap the three pointers of the inner vector.
So reusing code this way is elegant and solves the deep copy problem with little effort.
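To see why memcpy is ruled out and what the element-wise alternative looks like in isolation, here is a minimal sketch. grow_copy is a hypothetical free function that mirrors the copy loop in reserve, and std::vector<int> stands in for the element type. The static_asserts state the rule: only trivially copyable types may be copied with memcpy.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <vector>

// memcpy is only valid for trivially copyable types.
static_assert(std::is_trivially_copyable<int>::value,
              "int can be copied with memcpy");
static_assert(!std::is_trivially_copyable<std::vector<int>>::value,
              "std::vector<int> owns heap memory and must not be copied with memcpy");

// Hypothetical helper mirroring the loop in reserve: allocate n slots and
// copy the first `count` elements with T's copy assignment (a deep copy).
// The caller owns the returned array and releases it with delete[].
template <class T>
T* grow_copy(const T* old_data, std::size_t count, std::size_t n)
{
	T* tmp = new T[n];
	for (std::size_t i = 0; i < count; ++i)
	{
		tmp[i] = old_data[i];
	}
	return tmp;
}
```

Because each element is copied through its own assignment operator, the old block can be released with delete[] afterward without affecting the new one.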


~vector()
{
	delete[] _start;
	_start = _end_of_storage = _finish = nullptr;
}
void reserve(size_t n)
{
	if (n > capacity()) // only ever grows
	{
		size_t oldsize = size();
		T* tmp = new T[n]; // no need to check the result: in C++ a failed new throws
		if (_start) // an empty object has _start == nullptr, so there is nothing to copy
		{
			//memcpy(tmp, _start, sizeof(T) * size());
			// For a user-defined T this is a shallow copy: the same space gets freed twice, and wild pointers follow.
			// Deep-copying only the outer array of vector<int> objects is not enough; each element needs its own deep copy.

			for (size_t i = 0; i < oldsize; ++i)
			{
				tmp[i] = _start[i]; // copy assignment deep-copies each element for us
			}
			delete[] _start;
		}
		/*_start = tmp;
		_finish = _start + size();*/ // wrong: _start already points at tmp while _finish is still the old value (null for an empty object)
		_start = tmp;
		_finish = tmp + oldsize;
		_end_of_storage = _start + n;
	}
}
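void resize(size_t n, const T& val = T()) // for a user-defined T, val is default-constructed first and then copied into the vector
{
	if (n > capacity())
	{
		reserve(n);
	}
	if (n > size())
	{
		// Fill the new tail. This also covers size() < n <= capacity(), where no reallocation is needed.
		while (_finish < _start + n) // stop once _finish reaches _start + n, otherwise we write out of bounds
		{
			*_finish = val; // for a user-defined T this calls copy assignment
			++_finish;
		}
	}
	else
	{
		_finish = _start + n;
	}
}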
void push_back(const T& x)
{
	if (_finish == _end_of_storage)
	{
		size_t new_capacity = capacity() == 0 ? 4 : 2 * capacity();
		reserve(new_capacity);
	}

	*_finish = x;
	++_finish;
}
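void test_vector9() // this one involves a deep copy inside a deep copy
{
	vector<vector<int>> vv;
	vector<int> v(10, 1);
	vv.push_back(v); // the first call allocates space and stores the data; nothing is copied over yet
	vv.push_back(v);
	vv.push_back(v);
	vv.push_back(v);
	vv.push_back(v); // with memcpy in reserve it crashes here: on growth, while copying the data

	for (size_t i = 0; i < vv.size(); ++i)
	{
		for (size_t j = 0; j < vv[i].size(); ++j)
		{
			cout << vv[i][j] << " ";
		}
		cout << endl;
	}
	cout << endl;


	vector<vector<int>> vvret = Solution().generate(5);
	// A non-static member function must be called on an object: 1. make it static, or 2. call it on an anonymous object, as here.
	// The failure here had the same cause, the memcpy shallow copy in reserve: the copy constructor's worker is the
	// iterator-range constructor, which still goes through push_back and reserve, so as soon as an array of objects
	// is copy-constructed, the memcpy in reserve produces wild pointers.
	for (size_t i = 0; i < vvret.size(); i++)
	{
		for (size_t j = 0; j < vvret[i].size(); j++)
		{
			cout << vvret[i][j] << " ";
		}
		cout << endl;
	}
	cout << endl;
}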

2. Other small convenience interfaces

iterator begin()
{
	return _start;
}
iterator end()
{
	return _finish;
}
const_iterator begin()const
{
	return _start;
}
const_iterator end()const
{
	return _finish;
}
T& operator[](size_t pos)
{
	assert(pos < size());
	return _start[pos]; // treat _start as an array; explicit dereferencing is clumsier
}
const T& operator[](size_t pos)const
{
	assert(pos < size());
	return _start[pos]; // treat _start as an array; explicit dereferencing is clumsier
}
size_t size()const // callable on both const and non-const objects
{
	return _finish - _start;
	// The difference of a half-open range is exactly its element count: [0,10) holds 10 elements.
	// For a closed range the difference counts the gaps, so the element count would need +1.
}
size_t capacity()const
{
	return _end_of_storage - _start;
}
bool empty()const
{
	return _start == _finish;
}
void pop_back()
{
	assert(!empty()); // aborts if the condition is false
	--_finish;
}


Origin blog.csdn.net/erridjsis/article/details/129269431