C++ STL string class simulation implementation

Table of contents

String class member variables

1. Constructor

2. Destructor

3. Copy construction

4. size(), capacity()

5. operator[]

6. operator=

7. String comparison

8. reserve()

9. push_back(), append()

10. operator+=

11. insert()

12. Iterators

12. erase()

13. swap()

14. find()

15. Stream extraction, stream output

16. Compare the library string with our String


The previous article gave a brief introduction to the string class, enough to use it in everyday code. Interviewers, however, often ask candidates to implement a simplified string class themselves; the most important parts are the constructor, the copy constructor, the overloaded assignment operator, and the destructor. Implementing the class ourselves also deepens our understanding of classes and objects.

String class member variables

A String class needs three things: a buffer that stores the characters, the number of characters stored, and the capacity of the buffer.

class String
{
public:
    //member functions
private:
	char* _str;   //buffer that stores the characters
	int _size;    //number of characters
	int _capacity;//capacity
    static const size_t npos = -1;
};

1. Constructor

When implementing the constructors, keep in mind that a class generally needs a default constructor, and that a string class must also support initialization from a constant string.

	//default constructor
    String()
		:_str(new char[1]),_size(0),_capacity(0)
	{
		_str[0] = '\0';
	}
    //initialization from a constant string
	String(const char* str)
		:_size(strlen(str))
	{
		_capacity = _size == 0 ? 3 : _size;
		_str = new char[_capacity + 1];//allocate one extra byte to hold the terminating '\0'
		strcpy(_str, str);
	}

This works, but there is a more concise way to write it: a single constructor with a default argument.


	//serves as the default constructor and also accepts a constant string
	String(const char* str = "")
		:_size(strlen(str))
	{
		_capacity = _size == 0 ? 3 : _size;
		_str = new char[_capacity+1];//allocate one extra byte to hold the terminating '\0'
		//copy the data from str into _str
		strcpy(_str, str);
	}

Note: Avoid setting _capacity to 0 initially. A nonzero starting capacity avoids special cases later, for example when the capacity is doubled during expansion (0 * 2 is still 0).

2. Destructor

The destructor is simple, so let's just look at it:

    ~String()
	{
		delete[] _str;
		_size = _capacity = 0;
 	}

Note: If we do not write a destructor, the compiler generates one, but the generated destructor is not sufficient here. It does not release the heap memory we allocated with new[], so omitting our own destructor would cause a memory leak.

3. Copy construction

Copy construction is to use an existing String object to initialize another String object.

If we do not write one, the compiler generates a copy constructor automatically, but the generated one is not suitable here because it only performs a shallow (member-by-member) copy.

Note: After a shallow copy, the _str pointers of both objects hold the same address, so an operation on one object inevitably affects the other. Worse, the same memory is released twice when the two objects are destroyed, which is undefined behavior and typically crashes the program.

Deep copy:

	//copy constructor: initializes a new String from an existing String object
	//1. note that this must be a deep copy
	String(const String& s)
	{
		_capacity = s._capacity;
		_size = s._size;
        //deep copy: allocate a separate buffer
		_str = new char[_capacity+1];
		strcpy(_str, s._str);
	}

With a deep copy, the two objects have separate underlying buffers, so they no longer affect each other.
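To make the difference observable, here is a minimal sketch of a deep-copying class. MiniString and copy_is_independent are names made up for this illustration and are not part of the String class built in this article; the sketch only assumes the same "allocate, then strcpy" approach shown above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// MiniString is a hypothetical, stripped-down stand-in for the article's String.
class MiniString {
public:
    MiniString(const char* s = "") {
        assert(s != nullptr);
        _size = std::strlen(s);
        _str = new char[_size + 1];
        std::strcpy(_str, s);
    }
    // Deep copy: allocate a separate buffer, then copy the characters.
    MiniString(const MiniString& other) {
        _size = other._size;
        _str = new char[_size + 1];
        std::strcpy(_str, other._str);
    }
    MiniString& operator=(const MiniString&) = delete;  // not needed for this sketch
    ~MiniString() { delete[] _str; }

    char& operator[](std::size_t i) {
        assert(i < _size);
        return _str[i];
    }
    const char* c_str() const { return _str; }

private:
    char* _str;
    std::size_t _size;
};

// Returns true when the copy owns its own buffer and is unaffected
// by a later write to the original.
bool copy_is_independent() {
    MiniString a("hello");
    MiniString b(a);
    a[0] = 'J';  // modify the original only
    return a.c_str() != b.c_str()
        && std::strcmp(a.c_str(), "Jello") == 0
        && std::strcmp(b.c_str(), "hello") == 0;
}
```

Because each object releases only its own buffer, both destructors can run without a double free.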

4. size(), capacity()

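A String keeps track of its number of characters (its length) and its capacity. Both are private members, so we provide member functions that let outside code read them; the functions return by value, so callers cannot modify the members through them. Overloads are provided for ordinary objects and for const objects, although the const versions alone would suffice, since a const member function can also be called on a non-const object.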

	//called on ordinary (non-const) objects
    const int size() 
	{
		return _size;
	}

    const int capacity() 
	{
		return _capacity;
	}
    
    
    //called on const objects
    const int size() const
	{
		return _size;
	}

	const int capacity() const
	{
		return _capacity;
	}

	

5. operator[]

Overloading operator[] lets a String access its individual characters like an array. The overload for const objects returns a const reference so that the characters cannot be modified, and both overloads check that the index is in bounds.

	//operator[] for ordinary objects
	char& operator[](const int index)
	{
		assert(index >= 0 && index < _size);
		return _str[index];
	}
	//operator[] for const objects; modification is not allowed
	const char& operator[](const int index) const
	{
		assert(index >= 0 && index < _size);
		return _str[index];
	}

 

6. operator=

The overloaded operator= assigns an existing String object to another existing String object. We have to consider the capacity of the left operand. If it is large enough, the data could be copied in place, at worst wasting some space; if it is too small, the buffer has to be expanded. To keep the design simple, we allocate a new buffer regardless of whether the left operand's capacity is sufficient.


	//overloaded assignment operator (first version, returns nothing)
	void operator=(const String& s)
	{
		char* tmp = new char[s.capacity() + 1];
		_size = s.size();
		_capacity = s.capacity();
		strcpy(tmp, s._str);
		delete[] _str;
		_str = tmp;
	}

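To support chained assignment (a = b = c), operator= must return a reference to the left operand itself. The return statement has to sit outside the if block so that self-assignment also returns a value.

	//overloaded assignment operator
	String& operator=(const String& s)
	{
		if (this != &s)//nothing to do for self-assignment (str = str)
		{
			char* tmp = new char[s.capacity() + 1];
			_size = s.size();
			_capacity = s.capacity();
			strcpy(tmp, s._str);
			delete[] _str;
			_str = tmp;
		}
		return *this;//returned on every path, including self-assignment
    }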

Note:

  • realloc cannot be used to expand the buffer here, because the memory was allocated with new; instead, allocate a new buffer with new and copy into it.
  • Copy the data into the temporary tmp first and release _str only afterwards. If _str were released first and new then threw an exception, the original data would already be lost. Releasing first would also break self-assignment, because the source data would be destroyed before it is copied.
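The two points above can be sketched as follows. This is a reduced illustration rather than the article's full class: MiniString is a hypothetical name, and the capacity member is left out to keep the example short.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// MiniString is a hypothetical, reduced version of the article's String.
class MiniString {
public:
    MiniString(const char* s = "") {
        assert(s != nullptr);
        _size = std::strlen(s);
        _str = new char[_size + 1];
        std::strcpy(_str, s);
    }
    MiniString(const MiniString& other) : MiniString(other._str) {}
    ~MiniString() { delete[] _str; }

    MiniString& operator=(const MiniString& other) {
        if (this != &other) {
            // Allocate and copy first: if new throws, *this is still intact.
            char* tmp = new char[other._size + 1];
            std::strcpy(tmp, other._str);
            delete[] _str;  // release the old buffer only after the copy succeeded
            _str = tmp;
            _size = other._size;
        }
        return *this;  // returned on every path, which enables a = b = c
    }

    const char* c_str() const { return _str; }
    std::size_t size() const { return _size; }

private:
    char* _str;
    std::size_t _size;
};
```

With this shape, chained assignment such as c = b = a works, and assigning an object to itself leaves it unchanged.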

7. String comparison

For string comparison, we only need to implement == and <; the other operators reuse them.

	//equal to
	bool operator==(const String& s)const
	{
		return strcmp(_str, s._str)==0;
	}
	//less than
	bool operator<(const String& s)const
	{
		return  strcmp(_str, s._str) < 0;
	}
	//not equal to
	bool operator!=(const String& s)const
	{
		return !(*this == s);
	}
	//less than or equal to
	bool operator<=(const String& s)const
	{
		return  *this == s || *this < s;
	}
	//greater than
	bool operator>(const String& s)const
	{
		return  !( * this == s || *this < s);
	}
	//greater than or equal to
	bool operator>=(const String& s)const
	{
		return  *this == s || *this > s;
	}
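The same reuse pattern can be tried out in isolation. The sketch below uses a hypothetical non-owning wrapper called Word (not part of the article's code) and derives the remaining four operators from == and <.

```cpp
#include <cassert>
#include <cstring>

// Word is a hypothetical non-owning wrapper used only for this illustration.
struct Word {
    const char* text;
};

// The only place that actually looks at the characters.
int compare(const Word& a, const Word& b) {
    assert(a.text != nullptr && b.text != nullptr);
    return std::strcmp(a.text, b.text);
}

// Two primitive operators...
bool operator==(const Word& a, const Word& b) { return compare(a, b) == 0; }
bool operator<(const Word& a, const Word& b)  { return compare(a, b) < 0; }

// ...and four operators derived from them.
bool operator!=(const Word& a, const Word& b) { return !(a == b); }
bool operator<=(const Word& a, const Word& b) { return a < b || a == b; }
bool operator>(const Word& a, const Word& b)  { return !(a <= b); }
bool operator>=(const Word& a, const Word& b) { return !(a < b); }
```

If the comparison rule ever changes, only compare() has to be touched; the derived operators follow automatically.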

8. reserve()

reserve() changes the capacity. Our approach is to allocate a new buffer and copy the existing data into it.

	//change the capacity
	void reserve(size_t capacity)
	{
		if (capacity > _capacity)//the capacity is never reduced
		{
			char* tmp = new char[capacity + 1];
			_capacity = capacity;
			strcpy(tmp, _str);
			delete[] _str;
			_str = tmp;
		}
	}

Note: reserve() can increase the capacity, but in this implementation it never reduces it.
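A small sketch makes the "grow only" rule easy to check. Buffer is a hypothetical class written for this example; it keeps only what reserve() needs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Buffer is a hypothetical class that keeps only what reserve() needs.
class Buffer {
public:
    Buffer(const char* s = "") {
        assert(s != nullptr);
        _size = std::strlen(s);
        _capacity = _size == 0 ? 3 : _size;
        _str = new char[_capacity + 1];
        std::strcpy(_str, s);
    }
    Buffer(const Buffer&) = delete;             // copying is not needed here
    Buffer& operator=(const Buffer&) = delete;
    ~Buffer() { delete[] _str; }

    void reserve(std::size_t n) {
        if (n > _capacity) {                    // requests to shrink are ignored
            char* tmp = new char[n + 1];        // +1 for the terminating '\0'
            std::strcpy(tmp, _str);             // keep the existing characters
            delete[] _str;
            _str = tmp;
            _capacity = n;
        }
    }

    std::size_t capacity() const { return _capacity; }
    const char* c_str() const { return _str; }

private:
    char* _str;
    std::size_t _size;
    std::size_t _capacity;
};
```

Calling reserve(10) on a Buffer holding "abc" raises the capacity to 10 and keeps the text; a following reserve(2) changes nothing.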

9. push_back(), append()

push_back() appends a single character and append() appends a string. Every insertion must first check the capacity so that there is always room for the new data.

	//reserve() from the previous section, repeated here because push_back and append depend on it
	void reserve(size_t capacity)
	{
		if (capacity > _capacity)//the capacity is never reduced
		{
			char* tmp = new char[capacity + 1];
			_capacity = capacity;
			strcpy(tmp, _str);
			delete[] _str;
			_str = tmp;
		}
	}

	void push_back(char ch)
	{
		//expand when the capacity is insufficient
		if (_size + 1 >= _capacity)
		{
			reserve(_capacity * 2);
		}
		_str[_size++] = ch;
		_str[_size] = '\0';
	}

	void append(const char* s)
	{
		//expand when the capacity is insufficient
		int len = strlen(s);
		if (len + _size >= _capacity)
		{
			reserve(_capacity + len);
		}
		strcpy(_str + _size, s);
		_size += len;
	}

Note:

append() cannot simply double the capacity, because the string being appended may be longer than twice the original capacity.
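The difference between the two growth policies shows up as soon as a long string is appended to a small buffer. The following sketch uses a hypothetical class GrowString with an assumed starting capacity of 3; the exact expansion conditions differ slightly from the code above but follow the same idea.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// GrowString is a hypothetical, reduced version of the article's String.
class GrowString {
public:
    GrowString() : _str(new char[4]), _size(0), _capacity(3) { _str[0] = '\0'; }
    GrowString(const GrowString&) = delete;
    GrowString& operator=(const GrowString&) = delete;
    ~GrowString() { delete[] _str; }

    void reserve(std::size_t n) {
        if (n > _capacity) {
            char* tmp = new char[n + 1];
            std::strcpy(tmp, _str);
            delete[] _str;
            _str = tmp;
            _capacity = n;
        }
    }

    void push_back(char ch) {
        if (_size == _capacity) {
            reserve(_capacity * 2);   // one character always fits after doubling
        }
        _str[_size++] = ch;
        _str[_size] = '\0';
    }

    void append(const char* s) {
        assert(s != nullptr);
        std::size_t len = std::strlen(s);
        if (_size + len > _capacity) {
            reserve(_size + len);     // doubling (3 -> 6) could still be too small
        }
        std::strcpy(_str + _size, s);
        _size += len;
    }

    std::size_t size() const { return _size; }
    std::size_t capacity() const { return _capacity; }
    const char* c_str() const { return _str; }

private:
    char* _str;
    std::size_t _size;
    std::size_t _capacity;
};
```

Appending a 10-character string to an empty GrowString needs room for 10 characters, which doubling the capacity from 3 to 6 would not provide.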

10. operator+=

operator+= accepts a character, a C string, or a String object. It has basically the same effect as push_back and append but is more convenient to use, so here we simply reuse push_back and append.

    String& operator+=(char ch)
	{
		push_back(ch);
		return *this;
	}

	String& operator+=(const char* s)
	{
		append(s);
		return *this;
	}

	String& operator+=(const String& s )
	{
		append(s._str);
		return *this;
	}

11. insert()

insert() supports inserting either a single character or a string at position index.

    void insert(size_t index, char ch)
	{
		//check that the position is valid (inserting at the end is allowed)
		assert(index <= _size);
		//check whether the buffer needs to grow
		if (_size + 1 >= _capacity)
		{
			reserve(_capacity * 2);
		}
		//shift the data
		int end = _size+1;//start at _size+1 so that the '\0' is moved as well
		//Note: end is an int and index is a size_t. When they are compared,
		//end is converted from int to size_t. With the loop below and index == 0,
		//the loop is supposed to stop once end reaches -1, but after the conversion
		//-1 becomes a very large value, so the loop would keep running.
		//int end = _size;
		//while (end >= index)
		//{
		//	_str[end + 1] = _str[end];
		//	end--;
		//}
		//Writing the loop as follows avoids end ever reaching -1 when index == 0.
		while (end >= index+1)
		{
			_str[end] = _str[end - 1];
			end--;
		}
		//insert the character ch
		_str[index] = ch;
		_size++;
	}

	void insert(size_t index, const char* str)
	{
		//check that the position is valid (inserting at the end is allowed)
		assert(index <= _size);
		int len = strlen(str);
		if (len + _size >= _capacity)
		{
			reserve(_capacity + len);
		}
		//shift the data (including the '\0') len positions to the right
		int end = _size+len;
		while (end >= index + len)
		{
			_str[end] = _str[end-len];
			end--;
		} 
		//insert the data
		int j = 0;
		for (int i = index; i < index + len; i++)
		{
			_str[i] = str[j++];
		}
		_size += len;
		
	}
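The signed/unsigned pitfall described in the comments of insert() can be reproduced on its own. In this sketch, insert_char and int_end_passes_check are hypothetical helpers that work on a plain character buffer; buf_len (the total size of the buffer) is an assumption of the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Inserts ch at position index in the NUL-terminated string stored in buf.
// buf_len is the total number of bytes available in buf.
void insert_char(char* buf, std::size_t buf_len, std::size_t index, char ch) {
    std::size_t size = std::strlen(buf);
    assert(index <= size);
    assert(size + 2 <= buf_len);  // room for the new character and the '\0'
    // Start at the new position of the '\0' and copy from the left neighbour.
    // The loop stops when end == index, so end never has to drop below zero.
    for (std::size_t end = size + 1; end > index; --end) {
        buf[end] = buf[end - 1];
    }
    buf[index] = ch;
}

// Mirrors the comparison `end >= index` with an int end and a size_t index:
// the int is converted to size_t before the comparison takes place.
bool int_end_passes_check(int end, std::size_t index) {
    return static_cast<std::size_t>(end) >= index;
}
```

int_end_passes_check(-1, 5) returns true, because -1 converts to the largest size_t value. That is why a loop of the form while (end >= index) with a mixed int/size_t comparison does not stop at -1.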

12. Iterators

Under the hood, the iterator of our String is just a pointer. Iterators provide a uniform way of traversing containers.

	typedef char* iterator;
	typedef const char* const_iterator;
    //called on ordinary objects
	iterator begin()
	{
        //returns the position of the first character in the array
		return _str;
	}
	iterator end()
	{
        //returns the position one past the last character, so [begin, end) is a half-open range
		return _str + _size;
	}

    //called on const objects; the return type const_iterator is const char*
	const_iterator begin()const
	{
		return _str;
	}
	const_iterator end()const
	{
		return _str + _size;
	}

Note:

const_iterator is const char*: the iterator itself can be changed (for example, advanced), but the character it points to cannot be modified.

A range-based for loop is itself implemented in terms of iterators: the compiler rewrites it into calls to begin() and end().
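As a quick check that pointer iterators are enough for a range-based for loop, consider the following sketch. CharSeq, to_upper, and count_char are hypothetical names, and the fixed 32-byte buffer is an assumption made to keep the example short.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <cstring>

// CharSeq is a hypothetical fixed-size container with pointer iterators.
class CharSeq {
public:
    typedef char* iterator;
    typedef const char* const_iterator;

    CharSeq(const char* s) {
        assert(s != nullptr && std::strlen(s) < sizeof(_buf));
        std::strcpy(_buf, s);
        _size = std::strlen(s);
    }

    iterator begin() { return _buf; }
    iterator end() { return _buf + _size; }
    const_iterator begin() const { return _buf; }
    const_iterator end() const { return _buf + _size; }
    const char* c_str() const { return _buf; }

private:
    char _buf[32];
    std::size_t _size;
};

// Non-const object: the loop uses iterator (char*), so elements can be modified.
void to_upper(CharSeq& s) {
    for (char& c : s) {
        c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
    }
}

// Const object: the loop uses const_iterator (const char*), read-only access.
std::size_t count_char(const CharSeq& s, char ch) {
    std::size_t n = 0;
    for (char c : s) {
        if (c == ch) {
            ++n;
        }
    }
    return n;
}
```

Writing char& c inside count_char would not compile, because dereferencing a const char* yields a const char.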

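12. erase()

erase() deletes len characters starting at position pos. If len is omitted or reaches past the end of the string, everything from pos onward is deleted.


	//delete len characters starting at position pos
	void erase(size_t pos, size_t len = npos)
	{
		assert(pos < _size);
		//len == npos, or len reaches the end: delete everything from pos onward
		//(comparing against _size - pos avoids overflow in pos + len)
		if (len == npos || len >= _size - pos)
		{
			_str[pos] = '\0';
			_size = pos;
			return;
		}
		//1. Move the tail forward. Source and destination overlap,
		//so use memmove instead of strcpy; the +1 also moves the '\0'
		memmove(_str + pos, _str + pos + len, _size - pos - len + 1);
		//2. Alternative: move the data one character at a time
		//size_t index = pos;
		//while (index + len <= _size)
		//{
		//	_str[index] = _str[index + len];
		//	index++;
		//}
		_size -= len;
	}

Note: erase() only deletes characters and reduces the length of the string; it does not affect the capacity.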
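The erase logic can be exercised on a plain character array. The sketch below is a free function under assumed names (erase_chars, kNpos); it uses memmove because the source and destination ranges overlap, which strcpy does not support.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for String::npos.
const std::size_t kNpos = static_cast<std::size_t>(-1);

// Removes up to len characters starting at pos from the NUL-terminated
// string str and returns the new length.
std::size_t erase_chars(char* str, std::size_t pos, std::size_t len = kNpos) {
    std::size_t size = std::strlen(str);
    assert(pos <= size);
    // Covers len == kNpos as well, without ever computing pos + len.
    if (len >= size - pos) {
        str[pos] = '\0';
        return pos;
    }
    // Overlapping ranges: memmove is required. The +1 carries the '\0' along.
    std::memmove(str + pos, str + pos + len, size - pos - len + 1);
    return size - len;
}
```

Only the characters and the terminator move; the size of the underlying array is untouched, which matches the note that erase() does not change the capacity.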

13. swap()

The string class also provides its own swap member function. The previous article introduced it along with the rest of the string interface and pointed out that it is much more efficient than the generic swap provided by std.

	//swap provided by String
    void swap(String &str)
	{
		std::swap(_size, str._size);
		std::swap(_capacity, str._capacity);
		std::swap(_str, str._str);
	}


    //generic swap as provided by std (simplified; since C++11 std::swap moves instead of copying)
    template<class T>
    void swap(T& e1,T& e2)
    {
	    T tmp = e1;
	    e1 = e2;
        e2 = tmp;
    }    

Note: The member swap is more efficient because it only exchanges the three private members (a pointer and two integers). The generic swap is a non-member function template; for our String, which defines copy operations but no move operations, it performs three deep copies: one copy construction and two copy assignments.
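The number of deep copies can be counted directly. In this sketch, Tracked is a hypothetical class with a static counter, and naive_swap is a copy-based generic swap like the one shown above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <utility>

// Tracked is a hypothetical string-like class that counts its deep copies.
class Tracked {
public:
    static int deep_copies;

    Tracked(const char* s = "") { alloc(s); }
    Tracked(const Tracked& o) {
        alloc(o._str);
        ++deep_copies;
    }
    Tracked& operator=(const Tracked& o) {
        if (this != &o) {
            char* tmp = new char[std::strlen(o._str) + 1];
            std::strcpy(tmp, o._str);
            delete[] _str;
            _str = tmp;
            ++deep_copies;
        }
        return *this;
    }
    ~Tracked() { delete[] _str; }

    // Member swap: exchanges the pointers only, no allocation and no copy.
    void swap(Tracked& o) { std::swap(_str, o._str); }

    const char* c_str() const { return _str; }

private:
    void alloc(const char* s) {
        assert(s != nullptr);
        _str = new char[std::strlen(s) + 1];
        std::strcpy(_str, s);
    }
    char* _str;
};

int Tracked::deep_copies = 0;

// Copy-based generic swap: one copy construction plus two copy assignments.
template <class T>
void naive_swap(T& a, T& b) {
    T tmp = a;
    a = b;
    b = tmp;
}
```

After naive_swap the counter reads 3, while the member swap leaves it at 0.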

14. find()

string provides functions that search for a character or a substring, starting from a given position in the string.

size_t find( const char ch, size_t pos = 0)
	{
		assert(pos < _size);
		for (int i = pos; i < _size; i++)
		{
			if (_str[i] == ch)
			{
				return i;
			}
		}
		return npos;
	}

	size_t find(const char* str, size_t pos = 0)
	{
		assert(pos < _size);
		char* pindex = strstr(_str + pos, str);
		if (pindex == nullptr)
		{
			return npos;
		}
		else
		{
			return pindex - _str;
		}
	}

Note: strstr searches for a substring; a straightforward implementation of it is a brute-force search. If the substring is found, strstr returns the address of its first character; otherwise it returns a null pointer. Subtracting the start address of our buffer from the returned address gives the number of characters in between, which is exactly the index of the first character of the match.
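The pointer subtraction is easy to verify with a free function. find_sub and kNpos are hypothetical names for this sketch; unlike the member function above, it returns kNpos instead of asserting when pos is past the end.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for String::npos.
const std::size_t kNpos = static_cast<std::size_t>(-1);

// Returns the index of the first occurrence of pattern in text at or
// after pos, or kNpos if there is none.
std::size_t find_sub(const char* text, const char* pattern, std::size_t pos = 0) {
    assert(text != nullptr && pattern != nullptr);
    if (pos > std::strlen(text)) {
        return kNpos;
    }
    const char* hit = std::strstr(text + pos, pattern);
    if (hit == nullptr) {
        return kNpos;
    }
    // Address of the match minus address of the first character = index.
    return static_cast<std::size_t>(hit - text);
}
```

Note that the subtraction uses the start of the whole text, not text + pos, so the result is an index into the original string.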

15. Stream extraction, stream output

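//1. Overloaded stream extraction (simple version)
istream& operator>>(istream& in, String& str)
{
	str.clear();
	char ch;
	//in.get(ch) fails at end of input, which also ends the loop
	while (in.get(ch) && ch != ' ' && ch != '\n')
	{
		str.push_back(ch);
	}

	return in;
}
//2. Overloaded stream extraction (buffered version)
istream& operator>>(istream& in, String& str)
{
	str.clear();
	char buffer[128];
	char ch;
	int i = 0;
	while (in.get(ch) && ch != ' ' && ch != '\n')
	{
		buffer[i++] = ch;
		if (i == 127)//buffer is full: append it to str and start over
		{
			buffer[i] = '\0';
			str += buffer;
			i = 0;
		}
	}
	if (i != 0)//append whatever is left in the buffer
	{
		buffer[i] = '\0';
		str += buffer;
	}

	return in;
}

//Overloaded stream insertion (output)
ostream& operator<<(ostream& out,const String& str)
{
	for (auto e : str)
	{
		out << e;//write to the stream that was passed in, not always to cout
	}
	return out;
}

Note: Only one of the two stream extraction overloads can be defined at a time. The first version tends to waste capacity, because push_back doubles the capacity as the string grows. The second version collects the characters in a local buffer and appends them in chunks, so the capacity follows the actual length more closely.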
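The buffered approach can be tested without a console by reading from a string stream. The sketch below is a free function named read_word (a hypothetical name) that fills a std::string instead of our String, and it uses a deliberately small 16-byte buffer so that the flush path is exercised.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Reads characters up to the next space or newline into out, collecting
// them in a small local buffer and appending chunk by chunk.
std::istream& read_word(std::istream& in, std::string& out) {
    const std::size_t kBufSize = 16;  // small on purpose, to exercise the flush
    char buffer[kBufSize];
    std::size_t i = 0;
    out.clear();
    char ch;
    // in.get(ch) fails at end of input, which also ends the loop.
    while (in.get(ch) && ch != ' ' && ch != '\n') {
        buffer[i++] = ch;
        if (i == kBufSize - 1) {      // buffer full: flush it and start over
            buffer[i] = '\0';
            out += buffer;
            i = 0;
        }
    }
    assert(i < kBufSize);
    buffer[i] = '\0';                 // flush the remainder (possibly empty)
    out += buffer;
    return in;
}
```

A word longer than the buffer is assembled from several chunks, so the result is the same as with the character-by-character version.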

16. Compare the library string with our String

The library string object is 16 bytes larger than ours. This is because the library string embeds a 16-byte character array. A string shorter than 16 bytes is stored directly in that array; a longer one is stored in space allocated on the heap.

//private members of the library's string class (simplified)
class string
{
public:
	//....
private:
	char* _str;
	size_t _size;
	size_t _capacity;
	char __str[16];
};

The implementation in g++ is different again. Besides the pointer, length, and capacity, older versions of the g++ standard library (libstdc++) also keep a reference count.

//private members of the string class under g++ (simplified)
class string
{
public:
	//...
private:
	char* _str;
	size_t _size;
	size_t _capacity;
	size_t _refcount;//reference count
};

With this design, copying does not make a deep copy right away. It first makes a shallow copy and increments the reference count; a deep copy into newly allocated space happens only when one of the objects is written to. This mechanism is called copy-on-write. The same idea is used elsewhere, for example by Linux for memory pages shared after fork(), and it can save space and improve efficiency to some extent. Note that the default std::string in libstdc++ since GCC 5 no longer uses copy-on-write.
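The copy-on-write idea can be illustrated with a few lines of modern C++. This is only a sketch of the mechanism, not how any standard library implements it: CowText is a hypothetical class, and std::shared_ptr stands in for the hand-written reference count.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>

// CowText is a hypothetical class; shared_ptr provides the reference count.
class CowText {
public:
    explicit CowText(const std::string& s)
        : _data(std::make_shared<std::string>(s)) {}

    // The compiler-generated copy constructor copies the shared_ptr only:
    // a shallow copy that increments the reference count.

    char get(std::size_t i) const {
        assert(i < _data->size());
        return (*_data)[i];
    }

    void set(std::size_t i, char ch) {
        assert(i < _data->size());
        if (_data.use_count() > 1) {
            // The buffer is shared: make a private deep copy before writing.
            _data = std::make_shared<std::string>(*_data);
        }
        (*_data)[i] = ch;
    }

    bool shares_buffer_with(const CowText& other) const {
        return _data == other._data;
    }
    const std::string& str() const { return *_data; }

private:
    std::shared_ptr<std::string> _data;
};
```

Two copies share one buffer until the first write; after the write, each has its own buffer and the other copy still sees the old contents.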

17. Complete code example

#pragma once
#include<cstring>
#include<iostream>
#include<cassert>
using namespace std;


class String
{
public:
	typedef char* iterator;
	typedef const char* const_iterator;
	iterator begin()
	{
		return _str;
	}
	iterator end()
	{
		return _str + _size;
	}
	const_iterator begin()const
	{
		return _str;
	}
	const_iterator end()const
	{
		return _str + _size;
	}

	//serves as the default constructor and also accepts a constant string
	String(const char* str = "")
		:_size(strlen(str))
	{
		_capacity = _size == 0 ? 3 : _size;
		_str = new char[_capacity+1];//allocate one extra byte to hold the terminating '\0'
		//copy the data from str into _str
		strcpy(_str, str);
	}

	//copy constructor: initializes a new String from an existing String object
	//1. note that this must be a deep copy
	String(const String& s)
	{
		_capacity = s._capacity;
		_size = s._size;
		_str = new char[_capacity+1];
		strcpy(_str, s._str);
	}

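	//operator[] for ordinary objects
	char& operator[](const size_t index)
	{
		assert(index < _size);//index is unsigned, so only the upper bound needs checking
		return _str[index];
	}
	//operator[] for const objects; modification is not allowed
	const char& operator[](const size_t index) const
	{
		assert(index < _size);
		return _str[index];
	}
	
	//overloaded assignment operator
	String& operator=(const String& s)
	{
		if (this != &s)//nothing to do for self-assignment (str = str)
		{
			//Note: realloc cannot be used here because the memory was allocated with new;
			//allocate a new buffer with new and copy into it.
			//Detail: copy the data into the temporary tmp first and release _str afterwards.
			//If _str were released first and new then threw an exception, the original data would be lost.
			char* tmp = new char[s.capacity() + 1];
			_size = s.size();
			_capacity = s.capacity();
			strcpy(tmp, s._str);
			delete[] _str;
			_str = tmp;
		}
		return *this;//returned on every path, including self-assignment
	}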
	//change the capacity
	void reserve(size_t capacity)
	{
		if (capacity > _capacity)//the capacity is never reduced
		{
			char* tmp = new char[capacity + 1];
			_capacity = capacity;
			strcpy(tmp, _str);
			delete[] _str;
			_str = tmp;
		}
	}

	void push_back(char ch)
	{
		//expand when the capacity is insufficient
		if (_size + 1 >= _capacity)
		{
			reserve(_capacity * 2);
		}
		_str[_size++] = ch;
		_str[_size] = '\0';
	}

	void append(const char* s)
	{
		int len = strlen(s);
		if (len + _size >= _capacity)
		{
			reserve(_capacity + len);
		}
		strcpy(_str + _size, s);
		_size += len;
	}

	String& operator+=(char ch)
	{
		push_back(ch);
		return *this;
	}

	String& operator+=(const char* s)
	{
		append(s);
		return *this;
	}

	String& operator+=(const String& s )
	{
		append(s._str);
		return *this;
	}

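	void insert(size_t index, char ch)
	{
		//check that the position is valid (inserting at the end is allowed)
		assert(index <= _size);
		//check whether the buffer needs to grow
		if (_size + 1 >= _capacity)
		{
			reserve(_capacity * 2);
		}
		//shift the data, starting at _size+1 so that the '\0' is moved as well
		int end = _size+1;
		//Note: end is an int and index is a size_t. When they are compared,
		//end is converted from int to size_t. With the loop below and index == 0,
		//the loop is supposed to stop once end reaches -1, but after the conversion
		//-1 becomes a very large value, so the loop would keep running.
		//int end = _size;
		//while (end >= index)
		//{
		//	_str[end + 1] = _str[end];
		//	end--;
		//}
		//Writing the loop as follows avoids end ever reaching -1 when index == 0.
		while (end >= index+1)
		{
			_str[end] = _str[end - 1];
			end--;
		}
		//insert the character ch
		_str[index] = ch;
		_size++;
	}

	void insert(size_t index, const char* str)
	{
		//check that the position is valid (inserting at the end is allowed)
		assert(index <= _size);
		int len = strlen(str);
		if (len + _size >= _capacity)
		{
			reserve(_capacity + len);
		}
		//shift the data (including the '\0') len positions to the right
		int end = _size+len;
		while (end >= index + len)
		{
			_str[end] = _str[end-len];
			end--;
		} 
		//insert the data
		int j = 0;
		for (int i = index; i < index + len; i++)
		{
			_str[i] = str[j++];
		}
		_size += len;
		
	}

	//delete len characters starting at position pos
	void erase(size_t pos, size_t len = npos)
	{

		assert(pos < _size);
		//len == npos, or len reaches the end: delete everything from pos onward
		//(comparing against _size - pos avoids overflow in pos + len)
		if (len == npos || len >= _size - pos)
		{
			_str[pos] = '\0';
			_size = pos ;
			return;
		}
		//1. Move the tail forward. Source and destination overlap,
		//so use memmove instead of strcpy; the +1 also moves the '\0'
		memmove(_str + pos, _str + pos + len, _size - pos - len + 1);
		//2. Alternative: move the data one character at a time
		//size_t index = pos;
		//while (index + len <= _size)
		//{
		//	_str[index] = _str[index + len];
		//	index++;
		//}
		_size -= len;
	}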

	void swap(String &str)
	{
		std::swap(_size, str._size);
		std::swap(_capacity, str._capacity);
		std::swap(_str, str._str);
	}

	size_t find( const char ch, size_t pos = 0)
	{
		assert(pos < _size);
		for (int i = pos; i < _size; i++)
		{
			if (_str[i] == ch)
			{
				return i;
			}
		}
		return npos;
	}

	size_t find(const char* str, size_t pos = 0)
	{
		assert(pos < _size);
		char* pindex = strstr(_str + pos, str);
		if (pindex == nullptr)
		{
			return npos;
		}
		else
		{
			return pindex - _str;
		}
	}

	void clear()
	{
		_str[0] = '\0';
		_size = 0;
	}


	//equal to
	bool operator==(const String& s)const
	{
		return strcmp(_str, s._str)==0;
	}
	//less than
	bool operator<(const String& s)const
	{
		return  strcmp(_str, s._str) < 0;
	}
	//not equal to
	bool operator!=(const String& s)const
	{
		return !(*this == s);
	}
	//less than or equal to
	bool operator<=(const String& s)const
	{
		return  *this == s || *this < s;
	}
	//greater than
	bool operator>(const String& s)const
	{
		return  !( * this == s || *this < s);
	}
	//greater than or equal to
	bool operator>=(const String& s)const
	{
		return  *this == s || *this > s;
	}

	const size_t size() 
	{
		return _size;
	}

	const size_t size() const
	{
		return _size;
	}

	const size_t capacity() const
	{
		return _capacity;
	}

	const size_t capacity()
	{
		return _capacity;
	}

	const char* c_str() const
	{
		return _str;
	}

	~String()
	{
		delete[] _str;
		_size = _capacity = 0;
 	}



private:
	char* _str;
	size_t _size;
	size_t _capacity;
	static const size_t npos = -1;
};

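//Overloaded stream insertion (output)
ostream& operator<<(ostream& out,const String& str)
{
	for (auto e : str)
	{
		out << e;//write to the stream that was passed in, not always to cout
	}
	return out;
}

//1. Overloaded stream extraction (simple version, kept for reference)
//istream& operator>>(istream& in, String& str)
//{
//	str.clear();
//	char ch;
//	while (in.get(ch) && ch != ' ' && ch != '\n')
//	{
//		str.push_back(ch);
//	}
//
//	return in;
//}
//2. Overloaded stream extraction (buffered version)
istream& operator>>(istream& in, String& str)
{
	str.clear();
	char buffer[128];
	char ch;
	int i = 0;
	//in.get(ch) fails at end of input, which also ends the loop
	while (in.get(ch) && ch != ' ' && ch != '\n')
	{
		buffer[i++] = ch;
		if (i == 127)//buffer is full: append it to str and start over
		{
			buffer[i] = '\0';
			str += buffer;
			i = 0;
		}
	}
	if (i != 0)//append whatever is left in the buffer
	{
		buffer[i] = '\0';
		str += buffer;
	}

	return in;
}

//generic swap as provided by std (simplified)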
template<class T>
void swap(T& e1,T& e2)
{
	T tmp = e1;
	e1 = e2;
	e2 = tmp;
}




Origin blog.csdn.net/qq_63943454/article/details/132124609