[C++] Use of list and implementation of basic iterator framework & explanation of string structure under vs and g++

True maturity should not be about pursuing perfection, but facing your own shortcomings. This is the essence of life.




1. Getting to know list

1. List iterator invalidation and basic use

1.
Under the hood, list is typically implemented as a circular doubly linked list with a sentinel head node. Unlike vector and string, a list can only be traversed through iterators. We can use list without knowing how its iterator is implemented, and that is the benefit iterators bring to every container: whatever the container is, iterators give a single, uniform way to traverse it.

2.
Range-based for is built on iterators. It walks the container's iterators, dereferences each one, and copies the result into the element e in turn. So C++11's range for is not a new mechanism: when compiling, the compiler mechanically rewrites the range for into the equivalent iterator code and then compiles that.
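
To make the rewrite concrete, here is a rough sketch of what a range for over a list corresponds to. The function names sum_range_for and sum_iterators are made up for this illustration, and the real expansion differs in small details, so treat it as an approximation.

```cpp
#include <list>

// Sum the elements using range-based for.
int sum_range_for(const std::list<int>& lt)
{
    int sum = 0;
    for (int e : lt)   // e is a copy of each element in turn
    {
        sum += e;
    }
    return sum;
}

// Roughly the iterator code that the loop above stands for.
int sum_iterators(const std::list<int>& lt)
{
    int sum = 0;
    for (std::list<int>::const_iterator it = lt.begin(), last = lt.end(); it != last; ++it)
    {
        int e = *it;   // dereference the iterator, then copy into e
        sum += e;
    }
    return sum;
}
```

Both functions return the same result for any list, which is the point: range for needs nothing from the container beyond begin(), end() and an iterator that supports *, ++ and !=.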

3.
The push_back, pop_back, push_front and pop_front operations familiar from a first data structures course all work as expected on list.

#include <iostream>
#include <list>
using namespace std;

void test_list1()
{
	list<int> lt;
	lt.push_back(1);
	lt.push_back(2);
	lt.push_back(3);
	lt.push_back(4);

	list<int>::iterator it = lt.begin(); // iterator is a nested type of the class
	while (it != lt.end())
	{
		cout << *it << " ";
		++it;
	}
	cout << endl;

	lt.push_front(10);
	lt.push_front(20);
	lt.push_front(30);
	lt.push_front(40);

	for (int e : lt)
	{
		cout << e << " ";
	}
	cout << endl;

	lt.pop_back();
	lt.pop_back();
	lt.pop_front();
	lt.pop_front();

	for (int e : lt)
	{
		cout << e << " ";
	}
	cout << endl;
}

4.
As with vector, we can pass a list's iterator range to find to search for an element. find returns an iterator to the first matching node, or end() if there is no match.

5.
In list, any operation on a particular node goes through an iterator. A linked list does not support random access, so subscript access is not available, and the iterators provided by the STL are the only way to reach a given node.

6.
For list, insert does not invalidate iterators. A vector iterator becomes invalid when the vector grows, because it reallocates: the elements move to a new buffer and the old one is freed, so the old iterator points at released memory.
A list never reallocates like this. It allocates nodes on demand, one per inserted element, and links each new node in front of the given position. The nodes that existing iterators refer to are neither destroyed nor moved, so those iterators remain usable after insert.

7.
erase is different. It frees the node the iterator refers to, so that iterator is invalid afterwards (iterators to other nodes are unaffected). To keep iterating, use the return value: erase returns an iterator to the node after the erased one, and we can assign it back to our iterator.

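#include <algorithm>
#include <iostream>
#include <list>
using namespace std;

void test_list2()
{
	list<int> lt;
	lt.push_back(1);
	lt.push_back(2);
	lt.push_back(3);
	lt.push_back(4);

	list<int>::iterator pos = find(lt.begin(), lt.end(), 3);
	if (pos == lt.end())
	{
		return; // not found, nothing to demonstrate
	}

	lt.insert(pos, 30); // pos is still valid after insert

	cout << *pos << endl;
	(*pos)++;

	for (int e : lt)
	{
		cout << e << " ";
	}
	cout << endl;

	// After erase the old pos is invalid because its node has been freed,
	// so take the iterator that erase returns instead of dereferencing pos.
	pos = lt.erase(pos);
	if (pos != lt.end())
	{
		cout << *pos << endl;
	}
	for (int e : lt)
	{
		cout << e << " ";
	}
	cout << endl;
}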
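
The return value of erase matters most when erasing inside a loop. The following is a small sketch of the usual pattern; the helper name erase_evens is an assumption made for this example.

```cpp
#include <list>

// Erase every even value, using the iterator returned by erase to keep going.
void erase_evens(std::list<int>& lt)
{
    std::list<int>::iterator it = lt.begin();
    while (it != lt.end())
    {
        if (*it % 2 == 0)
        {
            it = lt.erase(it);   // it now refers to the node after the erased one
        }
        else
        {
            ++it;                // only advance when nothing was erased
        }
    }
}
```

Writing lt.erase(it); ++it; instead would increment an iterator whose node has already been freed, which is exactly the invalidation problem described above.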

2. List operations interface (it looks like a pretty good interface, but unfortunately it’s not very practical)

1.
resize changes the number of elements in the list. When growing, it allocates nodes one by one and appends them at the tail.
When shrinking, it frees nodes one by one from the tail, which is equivalent to repeated pop_back.
In practice this interface is rarely used with list.

2.
clear releases every node except the head node, so the size of the list becomes 0. It should be distinguished from the destructor: the destructor also frees the head node, while clear only frees the nodes that store valid data. This becomes clearer once we implement our own list later.
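
As a quick check of the behavior of resize and clear, the sketch below records the size after each step. The struct SizeTrace and the function resize_and_clear_demo are names invented for this example.

```cpp
#include <cstddef>
#include <list>

// Sizes observed after each step, so the effect of each call is easy to check.
struct SizeTrace
{
    std::size_t after_grow;
    std::size_t after_shrink;
    std::size_t after_clear;
    std::size_t after_reuse;
};

SizeTrace resize_and_clear_demo()
{
    SizeTrace t;
    std::list<int> lt;
    lt.push_back(1);
    lt.push_back(2);

    lt.resize(5);          // grows: three value-initialized ints (0) are appended
    t.after_grow = lt.size();

    lt.resize(1);          // shrinks: nodes are removed from the tail
    t.after_shrink = lt.size();

    lt.clear();            // all data nodes are released, the list object stays usable
    t.after_clear = lt.size();

    lt.push_back(42);      // the list still works after clear
    t.after_reuse = lt.size();
    return t;
}
```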

3.
The operations below are rarely used. They look useful, but they do not come up often in real code. The library designers planned them well, yet programmers seldom reach for them in practice.

4.
remove works like find + erase applied to every match: it deletes all nodes whose value equals the argument. If the value is not in the list, nothing happens and no error is reported.

5.
list provides its own member function sort instead of relying on sort from the algorithm library. The reason lies in iterator categories.
By capability, iterators fall into three main categories: forward iterators, which only support ++ (singly linked lists, hash tables); bidirectional iterators, which support both ++ and -- (list, the circular doubly linked list with a head node); and random access iterators, which support ++ and -- and also + and - with an integer offset (string, vector).
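
The category of a container's iterator can be checked at compile time through std::iterator_traits. The sketch below does this for one container of each kind; the alias category_of and the function supports_random_access are helper names assumed for this illustration.

```cpp
#include <forward_list>
#include <iterator>
#include <list>
#include <type_traits>
#include <vector>

// Shorthand for "the category tag of container C's iterator".
template <class C>
using category_of = typename std::iterator_traits<typename C::iterator>::iterator_category;

// forward_list (singly linked list): ++ only
static_assert(std::is_same<category_of<std::forward_list<int>>, std::forward_iterator_tag>::value,
              "forward_list has forward iterators");
// list: ++ and --
static_assert(std::is_same<category_of<std::list<int>>, std::bidirectional_iterator_tag>::value,
              "list has bidirectional iterators");

// True when C's iterators are random access (++, --, + n, - n).
template <class C>
constexpr bool supports_random_access()
{
    return std::is_base_of<std::random_access_iterator_tag, category_of<C>>::value;
}
```

An algorithm that needs random access, such as the algorithm library's sort, can only be used with containers for which supports_random_access returns true.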

6.
sort in the algorithm library is typically built on quicksort. To pick a good pivot, quicksort uses median-of-three selection, which needs to subtract iterators, and list's bidirectional iterators do not support subtraction. Calling the algorithm library's sort on a list therefore fails to compile.
To sort a list, call its member function sort, which is typically implemented with merge sort.
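
The member sort also accepts a comparison object, just like the algorithm library's sort. A minimal sketch, with sort_descending as a name made up for this example:

```cpp
#include <functional>
#include <list>

// Sort a list in descending order with the member function.
// std::sort(lt.begin(), lt.end()) would not compile here, because
// list iterators are bidirectional, not random access.
void sort_descending(std::list<int>& lt)
{
    lt.sort(std::greater<int>());
}
```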


7.
unique removes duplicates from a list, but it only works correctly on sorted data.
Only equal values that sit next to each other are collapsed; duplicates that are not adjacent are left in place, so calling unique on unsorted data gives an incomplete result. This is similar to the classic problem of removing duplicates from a sorted array with fast and slow pointers, which also relies on the duplicates being adjacent. unique works on the same principle.
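
The difference is easy to see by running unique with and without a preceding sort. The two helper functions below are hypothetical names used only for this comparison; both take the list by value so the caller's list is untouched.

```cpp
#include <list>

// unique() only collapses runs of equal neighbours.
std::list<int> unique_without_sort(std::list<int> lt)
{
    lt.unique();
    return lt;
}

// Sorting first puts equal values next to each other, so all duplicates go.
std::list<int> unique_after_sort(std::list<int> lt)
{
    lt.sort();
    lt.unique();
    return lt;
}
```

For the input 1 9 5 2 5 2, the first function leaves the list unchanged because no two equal values are adjacent, while the second returns 1 2 5 9.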

8.
merge merges two sorted lists, reverse reverses the list, and splice transfers a single node, a range of nodes, or all nodes from one list into another.
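
Of these, splice is the one that really depends on the linked structure, since it relinks nodes instead of copying data. Here is a sketch of the single-node form; the function move_to_front is an assumption made for this example.

```cpp
#include <list>

// Move the first node holding `value` (if any) from `from` to the front of `to`.
// Returns true when a node was moved.
bool move_to_front(std::list<int>& from, std::list<int>& to, int value)
{
    for (std::list<int>::iterator it = from.begin(); it != from.end(); ++it)
    {
        if (*it == value)
        {
            // Single-element splice: the node is relinked, nothing is copied or allocated.
            to.splice(to.begin(), from, it);
            return true;
        }
    }
    return false;
}
```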

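// The operations interface: useful, but not as useful as it looks. Like a carefully made
// schedule, it is well thought out but not used that often in practice.
#include <iostream>
#include <list>
using namespace std;

void test_list3()
{
	list<int> lt;
	lt.push_back(1);
	lt.push_back(9);
	lt.push_back(5);
	lt.push_back(2);
	lt.push_back(5);
	lt.push_back(2);
	for (int e : lt)
	{
		cout << e << " ";
	}
	cout << endl;
	lt.remove(9); // remove = find + erase for every matching node
	lt.remove(30); // if the value is not in the list, nothing happens
	for (int e : lt)
	{
		cout << e << " ";
	}
	cout << endl;

	lt.sort(); // list provides its own sort instead of using the algorithm library's
	//sort(lt.begin(), lt.end()); // does not compile for a list
	// Iterator categories by capability:
	// 1. forward iterator: ++ (singly linked list)
	// 2. bidirectional iterator: ++ -- (list)
	// 3. random access iterator: ++ -- + - (vector, string)
	for (int e : lt)
	{
		cout << e << " ";
	}
	cout << endl;

	// Sort first, then remove duplicates
	lt.unique(); // unique only collapses adjacent equal values (like fast/slow pointers on an array), so non-adjacent duplicates would be missed
	for (int e : lt)
	{
		cout << e << " ";
	}
	cout << endl;

	list<int> other;
	other.push_back(3);
	other.push_back(4);
	lt.merge(other); // merge: both lists must be sorted; other is left empty
	lt.reverse(); // reverse the list
	list<int> extra;
	extra.push_back(100);
	lt.splice(lt.begin(), extra); // splice: transfer all nodes of extra to the front of lt
}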

3. Comparison of the sorting performance of vector and list (the reason why the sort interface of list is not commonly used: the sorting efficiency of list is not high)

1.
Sorting performance should be measured in a release build. Debug builds turn off optimizations and add extra checks, and how much this slows things down differs between containers and compiler versions, so debug timings are not accurate enough to compare.

2.
With about 100,000 elements, sorting a vector is roughly twice as fast as sorting a list in this test, so list's sorting performance is clearly worse than vector's.

3.
A common observation is that, to sort a list, it is better to copy its data into a vector, sort the vector, and copy the sorted data back into the list. Even with the two copies, this is faster than sorting the list directly, which shows how much faster vector sorts than list.
Of course, with a small amount of data the difference is negligible and the sorting times are almost the same.
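
The copy, sort, copy back approach can be sketched in a few lines. The helper name sort_via_vector is an assumption for this example, and whether it actually beats the member sort depends on the data size and the platform, so it should be measured rather than assumed.

```cpp
#include <algorithm>
#include <list>
#include <vector>

// Sort a list by copying into a vector, sorting there, and copying back.
void sort_via_vector(std::list<int>& lt)
{
    std::vector<int> v(lt.begin(), lt.end());   // copy list -> vector
    std::sort(v.begin(), v.end());              // sort in contiguous memory
    std::copy(v.begin(), v.end(), lt.begin());  // overwrite the existing nodes in order
}
```

The copy back reuses the list's existing nodes, so no nodes are allocated or freed and iterators into the list stay valid, although the values they refer to change.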

4.
So when there is a lot of data, list's sort is usually not the right choice. The main reason is that list nodes are scattered in memory, and accessing non-contiguous memory is expensive. vector's storage is contiguous, so access is cheap and the CPU cache hit rate is high. This is a natural advantage of vector's layout.

5.
If you do not need to insert or delete at the head, vector is the better choice. If you frequently insert at the head, or insert and delete in the middle, list's structure pays off: vector is one contiguous block, so the elements after the position must be shifted, while list is made of separate nodes, so nothing needs to move.

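// 1. Compares the sorting performance of vector and list. If you need to sort, do not put
//    the data in a list; this is why list's sort interface is not commonly used.
// 2. N elements to sort: vector + algorithm sort vs. list + member sort.
#include <algorithm>
#include <cstdio>
#include <cstdlib>
#include <ctime>
#include <list>
#include <vector>
using namespace std;

void test_op() // measure in a release build; debug builds distort the comparison
{
	srand((unsigned)time(NULL));
	const int N = 100000;
	vector<int> v;
	v.reserve(N);

	list<int> lt1;
	list<int> lt2;
	for (int i = 0; i < N; ++i)
	{
		auto e = rand();
		v.push_back(e);
		lt1.push_back(e);
		//lt2.push_back(e);
	}

	// Variant: copy the list into the vector, sort, then copy back
	clock_t begin1 = clock();
	//for (auto e : lt1)
	//{
	//	v.push_back(e);
	//}
	sort(v.begin(), v.end()); // sort the vector with the algorithm library's sort
	//size_t i = 0;
	//for (auto& e : lt1)
	//{
	//	e = v[i++];
	//}
	clock_t end1 = clock();

	clock_t begin2 = clock();
	lt1.sort();
	clock_t end2 = clock();

	printf("vector sort:%ld\n", (long)(end1 - begin1));
	printf("list sort:%ld\n", (long)(end2 - begin2));
}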

2. The basic framework of the list iterator (structure pointers cannot meet the needs, class encapsulation + operator overloading makes the iterator behave like a pointer)

1.
To support generic programming, C++ lets built-in types be initialized with constructor-like syntax such as T(). The compiler handles built-in types specially here: a built-in type is initialized to a zero value such as 0 or a null pointer, while a class type calls its default constructor.
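
A one-line template is enough to see this uniform treatment. The function make_default below is a hypothetical helper written for this illustration.

```cpp
#include <string>

// Returns a value-initialized T: zero for built-in types,
// the default constructor's result for class types.
template <class T>
T make_default()
{
    return T();
}
```

make_default<int>() gives 0, make_default<int*>() gives a null pointer, and make_default<std::string>() gives an empty string, all through the same T() expression. This is what lets a list create its head node with new node(T()) for any element type.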


2.
An iterator is a nested type of its container class that behaves like a pointer: it can be dereferenced and moved with ++ or --.
The iterators of vector and string can be implemented as raw pointers, because the underlying storage is a dynamic array with contiguous memory. Dereferencing the iterator dereferences the raw pointer and yields the element at that position. A list is made of nodes, and a node is a struct, a user-defined type. If the list iterator were a raw node pointer, dereferencing it would give the node object rather than its data, and ++ would not move to the next node. That does not match what an iterator is supposed to do, since dereferencing an iterator should give the data.
So list's iterator has to be built with class encapsulation and operator overloading. Operator overloading requires a class, so the iterator is no longer a raw pointer but an object whose member functions implement dereference, ++ and --.

3.
This is like dates: year, month and day do not support ++ or -- by themselves, so we wrap them in a date class and implement ++, -- and the other operations there. The iterator follows the same idea. A plain node pointer node* cannot be dereferenced to get the data, and its ++ and -- do not follow the links, so we wrap the pointer in a class and use operator overloading to make it support iterator operations.

4.
A pointer to a node serves as the member variable of the list iterator. The iterator is an object whose only member is that node pointer, and it is through the iterator class and its objects that list supports dereference, ++, -- and similar operations.

5.
To support generics, the STL takes its parameters as templates, and its implementation treats built-in types the same way as user-defined types. C++ lets built-in types use the same syntax for construction, assignment and copy construction, so template code can handle both kinds of type uniformly. The template is instantiated for whatever type argument is supplied. This greatly improves maintainability, and generic programming saves a lot of duplicated code.


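namespace wyn
{
	template<class T>
	struct list_node
	{
		list_node* _next; // pointer to the next node
		list_node* _prev; // pointer to the previous node
		T _data; // the data type is generic: it may be a built-in type or a user-defined type

		// new calls a constructor when creating a node, and a parameterless constructor
		// is not enough here: the data has to be passed in when the node is created,
		// so we write a constructor that takes a parameter.
		list_node(const T& x)
			:_next(nullptr)
			,_prev(nullptr)
			,_data(x)
		{}
	};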
	
	template<class T>
	struct __list_iterator
	{
		typedef list_node<T> node;

		node* _pnode; // the iterator's only member is a node pointer

		__list_iterator(node* p)
			:_pnode(p)
		{}

		// Returns a reference to _data, so dereferencing the iterator can modify the node's data
		T& operator*()
		{
			return _pnode->_data;
		}

		__list_iterator<T>& operator++()
		{
			_pnode = _pnode->_next;
			return *this;
		}

		// Two iterators are equal when their node pointers are equal
		bool operator!=(const __list_iterator<T>& it) const
		{
			return _pnode != it._pnode;
		}
	};
	
	template<class T>
	class list
	{
		typedef list_node<T> node; // alias the instantiated class template list_node<T> as node
	public:
		typedef __list_iterator<T> iterator;
		// alias the instantiated class template __list_iterator<T> as iterator

		iterator begin()
		{
			//iterator it(_head->_next);
			//return it;
			// The forms above and below are equivalent.
			return iterator(_head->_next);
			// Returns a temporary iterator built from the node pointer, which is the
			// iterator's only member. This saves defining a named object just to return it.
		}

		iterator end() // the iterator object is destroyed when the function returns, so return by value, not by reference
		{
			return iterator(_head);
			// end() is the position one past the last element, so we return an iterator to the sentinel node.
		}

		void empty_initialize()
		{
			_head = new node(T()); // node's constructor takes a parameter; T() value-initializes it
			// new calls node's constructor, and we pass it a temporary T so that the
			// sentinel works for any element type. For a class type, T() calls the
			// default constructor; for a built-in type it yields a zero value such as 0
			// or a null pointer. The compiler treats built-in types specially so that
			// this syntax supports generic programming.
			_head->_next = _head;
			_head->_prev = _head;
		}

		list()
		{
			empty_initialize();
		}

		void push_back(const T& x)
		{
			node* newnode = new node(x);
			// The type of x depends on the template argument supplied by the caller:
			// an object if T is a class type, a plain variable if T is a built-in type.
			node* tail = _head->_prev;

			tail->_next = newnode;
			newnode->_prev = tail;
			newnode->_next = _head;
			_head->_prev = newnode;

			//insert(end(), x);
		}

	private:
		node* _head;
	};
}
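#include <iostream>
using std::cout;
using std::endl;

void test_list1()
{
	wyn::list<int> lt;
	lt.push_back(1);
	lt.push_back(2);
	lt.push_back(3);
	lt.push_back(4);
	// Nested type: an iterator must (1) give the node's data when dereferenced and (2) move with ++ or --.
	// Raw pointers work as iterators for string and vector because the array layout already supports this.
	// A raw pointer cannot do this for list, because list's nodes are not contiguous.
	// So list's iterator is supported through class encapsulation and operator overloading.

	wyn::list<int>::iterator it = lt.begin();
	// The iterator class defines no copy constructor, so the compiler-generated shallow copy is used.
	while (it != lt.end()) // vector and string could compare with <, but list can only use !=, which calls the iterator's overloaded operator
	{
		// it.operator*() and it.operator++(): calls to the overloaded operators of the iterator class
		cout << *it << " "; // *it calls the iterator's overloaded operator*
		++it; // ++it also calls an overloaded operator
		//(*it)++; // would increment through the reference returned by it.operator*()
	}
	cout << endl;

	for (auto e : lt) // range for is a mechanical rewrite into iterator code: it works once begin(), end(), *, ++ and != are supported
	{
		cout << e << " ";
	}
	cout << endl;
}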
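
The framework above only supports *, prefix ++ and !=. The following is a sketch of how the remaining bidirectional operations could be added in the same style. It lives in a separate namespace named sketch, which is an assumption for this example, and it is not how any particular standard library writes its iterator.

```cpp
namespace sketch   // hypothetical namespace, kept separate from wyn
{
    template <class T>
    struct list_node
    {
        list_node* _next;
        list_node* _prev;
        T _data;

        explicit list_node(const T& x) : _next(nullptr), _prev(nullptr), _data(x) {}
    };

    template <class T>
    struct list_iterator
    {
        typedef list_node<T> node;
        node* _pnode;   // still the only member: a node pointer

        explicit list_iterator(node* p) : _pnode(p) {}

        T& operator*() const { return _pnode->_data; }
        T* operator->() const { return &_pnode->_data; }   // enables it->member

        // Prefix forms move and return the iterator itself.
        list_iterator& operator++() { _pnode = _pnode->_next; return *this; }
        list_iterator& operator--() { _pnode = _pnode->_prev; return *this; }

        // Postfix forms move but return a copy of the old position.
        list_iterator operator++(int) { list_iterator old(*this); _pnode = _pnode->_next; return old; }
        list_iterator operator--(int) { list_iterator old(*this); _pnode = _pnode->_prev; return old; }

        bool operator==(const list_iterator& it) const { return _pnode == it._pnode; }
        bool operator!=(const list_iterator& it) const { return _pnode != it._pnode; }
    };
}
```

Because the list is circular, -- on the iterator returned by end() lands on the last element, which is what makes backward traversal work without any special case.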

3. Description of string structure under vs and g++

1. String structure under vs

1.
The environment assumed below is a 32-bit platform, where a pointer is 4 bytes. The printed results tell us two things: s1 and s2 have the same size, and that size is 28 bytes.
They are the same because the size of the object does not depend on the stored data: long data lives in dynamically allocated heap memory, and sizeof only counts the object's own member variables. That is why s1 and s2 have the same size.
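
The same observation can be reproduced on any platform, although the actual number printed by sizeof differs between compilers and build settings. The function same_object_size below is a name assumed for this sketch.

```cpp
#include <string>

// The object size is fixed at compile time, regardless of the contents.
bool same_object_size(const std::string& a, const std::string& b)
{
    return sizeof(a) == sizeof(b);   // both are sizeof(std::string)
}
```

Calling it with a five-character string and a thousand-character string returns true, and the thousand-character string is clearly longer than the object itself, so its characters must live somewhere else.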

2.
Why 28 bytes rather than 12? That comes from how string is laid out under vs. The string we implemented ourselves has three members, _ptr, _size and _capacity, which under the alignment rules add up to 12 bytes. But our version is not the standard one; it only covers most of string's functionality.
In the P.J. Plauger version of the STL used by vs, string occupies 28 bytes in total, and its layout is a little more complicated. It starts with a union that defines where the characters are stored: when the string is shorter than 16 characters, they are kept in a fixed internal character array _Buf; when the length is 16 or more, space is allocated on the heap and _Buf is no longer used. The union takes 16 bytes, and the remaining 12 bytes are the size field, the capacity field, and one more pointer-sized member, 4 bytes each.

3.
The debugger shows this directly. When fewer than 16 characters are stored, the content is in the _Buf array and _Ptr does not point to valid character data. When 16 or more are stored, the content is in the dynamically allocated space that _Ptr points to, and _Buf holds nothing meaningful.

4.
The main idea behind vs's design is to trade space for time: the string object is made larger so that small strings can be stored in the preallocated _Buf array, which avoids the cost of a dynamic allocation.
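
Without a debugger, one way to observe this is to check whether the character data lies inside the string object's own bytes. The function stored_inline below is a hypothetical probe written for this illustration; what it reports for short strings depends on the library implementation.

```cpp
#include <functional>
#include <string>

// True when the character data lives inside the string object itself.
bool stored_inline(const std::string& s)
{
    const char* obj_begin = reinterpret_cast<const char*>(&s);
    const char* obj_end = obj_begin + sizeof(s);
    const char* data = s.data();
    std::less<const char*> before;   // gives a total order even for unrelated pointers
    return !before(data, obj_begin) && before(data, obj_end);
}
```

For a string much longer than the object, such as one of 100 characters, the function returns false on any implementation. For a short string it is expected to return true on implementations that use an internal buffer like _Buf, and false on those that always allocate.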


2. String structure under g++

1.
In older versions of libstdc++ (before the ABI change in GCC 5), string under g++ is implemented with copy-on-write. The string object occupies 4 bytes in total and contains only a single pointer. That pointer refers to a block of heap memory with four fields: the total size of the space, the effective length of the string, the reference count, and the pointer to the heap space that stores the string.

2.
The size actually printed is 8 bytes, however, because the test environment is x86_64, a 64-bit platform where a pointer is 8 bytes.


3.
As long as nothing writes to the space that the object's pointer refers to, copying a string under g++ is only a shallow copy and allocates no new space. Once that space is written to, copy-on-write kicks in and a deep copy of the space is made first.
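
The mechanism can be illustrated with a toy class. The class cow_string below is a hypothetical sketch that uses std::shared_ptr for the reference count; it is not the layout or code used by libstdc++, and it ignores thread safety.

```cpp
#include <cstddef>
#include <memory>
#include <string>

// A toy copy-on-write string (hypothetical, not libstdc++'s implementation).
class cow_string
{
public:
    explicit cow_string(const std::string& s)
        : _buf(std::make_shared<std::string>(s))
    {}

    // The compiler-generated copy constructor copies only the shared_ptr,
    // so a copy shares the buffer with the original: the shallow copy.

    char read(std::size_t i) const { return (*_buf)[i]; }

    void write(std::size_t i, char c)
    {
        if (_buf.use_count() > 1)                         // buffer is shared with another object
        {
            _buf = std::make_shared<std::string>(*_buf);  // deep copy before the first write
        }
        (*_buf)[i] = c;
    }

    bool shares_buffer_with(const cow_string& other) const { return _buf == other._buf; }

private:
    std::shared_ptr<std::string> _buf;   // reference-counted storage
};
```

After cow_string b = a; the two objects share one buffer. The first b.write(...) detects that the reference count is greater than one, makes a private copy, and only then modifies it, so a is left unchanged.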



Origin blog.csdn.net/erridjsis/article/details/129337327