[Road to Advanced C++] Simulation and Implementation of String Class

Preface

This article belongs to the column [ Road to Advanced C++ ].

 In the previous article, we covered the basic use of the string class interface. Today we will implement our own string class from scratch. Implementing every interface would be a lot of work, so we focus on the commonly used ones~

1. String class

①Key points

  • 1. To avoid conflicting with the library's string, we wrap our own class in a namespace.
  • 2. The underlying data structure is a sequential list (a dynamically growing array of characters), implemented as follows.
  • 3. To make things easier to follow, each interface is explained separately below, and the complete source code is given at the end.

②Private members

First, here is the overall framework~

namespace my_string
{
	class string
	{
	public:
		// Iterators, used by begin and end
		typedef char* iterator;
		typedef const char* const_iterator;

	private:
		char* _str;
		size_t _size;
		size_t _capacity;
		static size_t npos;
	};
	size_t string::npos = -1;
}

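The first three private members are self-explanatory, but the last one, npos, needs some explanation. It is of type size_t, which is 8 bytes on a 64-bit platform and 4 bytes on a 32-bit platform. Because size_t is unsigned, it can only hold values greater than or equal to 0, so assigning -1 to it produces the largest value the type can hold. Because npos is modified by static, it is not initialized through the constructor's initializer list; the line inside the class is only a declaration, so it must be defined (and initialized) outside the class. There we only need to give the variable's type and the class scope (string::npos), without repeating static. That is why it is written this way. What is it for? We will use it in the insert interface.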
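To see what npos = -1 actually stores, here is a small sketch. The struct demo_string is a hypothetical stand-in for the class above; it only shows the declare-inside, define-outside pattern for a static member and the unsigned wraparound that insert has to deal with.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>

// Hypothetical stand-in for the class above: only the npos pattern is kept.
struct demo_string {
    static std::size_t npos;  // declaration only: no storage, no initializer
};

// Definition outside the class: give the type and the class scope, not "static".
// -1 converts to the largest value a std::size_t can hold.
std::size_t demo_string::npos = -1;

// Decrementing an unsigned 0 wraps around to that same largest value,
// which is exactly the situation insert has to detect.
std::size_t decrement_from_zero() {
    std::size_t i = 0;
    --i;
    return i;
}
```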

③Constructor

We roughly align with the library here, so first let's pin down what the members mean. _size and _capacity both count valid characters and do not include the \0 at the end of the string. Therefore, the constructor allocates one extra byte to store \0, while _size and _capacity both start out equal to the string's length.

1.Construction

string(const char* str = "")
{
	size_t len = strlen(str);
	_str = new char[len + 1];    // one extra byte for '\0'
	memcpy(_str, str, len + 1);  // copies the '\0' as well
	_size = len;
	_capacity = len;  // _capacity counts valid characters, excluding '\0'
}

2.Copy construction

Two issues come up here. First, deep copy versus shallow copy. Second, which one the copy should follow: the \0 terminator or _size. Let's solve the first problem first.


  • With a shallow copy, two objects manage the same block of memory. This is very dangerous: when both objects are destroyed, the same block is released twice, and the program crashes with an error.

A deep copy, on the other hand, costs extra memory even when we only read the copy; a shallow copy avoids that cost but runs into the problem above.

Is there a way to get the best of both? There is: copy-on-write combined with reference counting.

Here I will only briefly introduce the principle without going into depth; if you are interested, you can explore it on your own~

First, the benefit: while we only read, a copy is just a shallow copy, and a deep copy is made only when one of the copies is modified.

Let’s talk about the principle:

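The principle can be sketched as follows. This is a minimal illustration under our own assumptions, not how any particular library implements it: the class name cow_string and its get/set members are hypothetical. Copies share one buffer plus a shared counter; reading never copies, and writing to a shared buffer first detaches by making a private deep copy. The sketch omits assignment and is not thread-safe (a real implementation would need an atomic counter).

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Minimal copy-on-write sketch (hypothetical cow_string, not std::string).
// Copies share one heap block; a shared counter records how many owners it has.
class cow_string {
public:
    explicit cow_string(const char* s = "") {
        size_ = std::strlen(s);
        buf_ = new char[size_ + 1];
        std::memcpy(buf_, s, size_ + 1);
        refcount_ = new std::size_t(1);
    }
    // Copy: shallow copy plus count increment (cheap, no new buffer).
    cow_string(const cow_string& other)
        : buf_(other.buf_), size_(other.size_), refcount_(other.refcount_) {
        ++*refcount_;
    }
    cow_string& operator=(const cow_string&) = delete;  // left out of the sketch
    ~cow_string() { release(); }

    // Read: never copies.
    char get(std::size_t pos) const { return buf_[pos]; }

    // Write: if the block is shared, detach first by making a private deep copy.
    void set(std::size_t pos, char c) {
        if (*refcount_ > 1) {
            char* fresh = new char[size_ + 1];
            std::memcpy(fresh, buf_, size_ + 1);
            --*refcount_;                    // the other owners keep the old block
            buf_ = fresh;
            refcount_ = new std::size_t(1);
        }
        buf_[pos] = c;
    }
    std::size_t use_count() const { return *refcount_; }
    const char* c_str() const { return buf_; }

private:
    // The last owner frees the buffer and the counter.
    void release() {
        if (--*refcount_ == 0) {
            delete[] buf_;
            delete refcount_;
        }
    }
    char* buf_;
    std::size_t size_;
    std::size_t* refcount_;
};
```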

Now for the second problem, which concerns how the library behaves. The library's copy constructor does not stop at the first \0 in the string; it copies all the way to the end, that is, all _size characters, even if some of them are \0. How do we match that? It is simple: copy according to _size instead of looking for \0. The function to use is memcpy rather than strcpy.

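string(const string& str)
{
	// Allocate by capacity (not size) so _capacity matches the real buffer;
	// allocating only _size + 1 bytes would let a later push_back overrun it
	_str = new char[str._capacity + 1];
	// Copy by _size with memcpy, so embedded '\0' characters are copied too
	memcpy(_str, str._str, str._size + 1);
	_size = str._size;
	_capacity = str._capacity;
}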
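A quick check of why copying by _size matters. The buffer struct and the two helper functions are hypothetical; the character sequence a, b, \0, c, d has five valid characters, but a strcpy-based copy stops after two.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical buffer type: holds `size` characters that may include '\0'.
struct buffer {
    char* data;
    std::size_t size;
};

// Deep copy by length, as the copy constructor does with memcpy.
buffer copy_by_size(const buffer& src) {
    buffer dst{new char[src.size + 1], src.size};
    std::memcpy(dst.data, src.data, src.size + 1);  // embedded '\0' copied too
    return dst;
}

// Copy that stops at the first '\0', which is what strcpy does.
buffer copy_by_strcpy(const buffer& src) {
    std::size_t len = std::strlen(src.data);
    buffer dst{new char[len + 1], len};
    std::strcpy(dst.data, src.data);
    return dst;
}
```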

④Destructor


~string()
{
	delete[] _str;
	_str = nullptr;
	_size = _capacity = 0;
}

⑤c_str

const char* c_str() const
{
	return _str;
}

The const after the parameter list has two effects:

  • It makes *this read-only inside the function.
  • The function can be called on both const and non-const string objects.

⑤size

Since _size is a private member, code outside the class has to read it through a public member function.

size_t size() const
{
	return _size;
}
  • The const serves the same purpose as above.

⑥[]

1.Read and write

  • Note: check whether pos is out of bounds.
char& operator[](size_t pos)
{
	// Check whether pos is out of bounds
	assert(pos < _size);
	return _str[pos];
}

2.Read only

const char& operator[](size_t pos) const
{
	// Check whether pos is out of bounds
	assert(pos < _size);
	return _str[pos];
}
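The two overloads are chosen by the constness of the object, as this small sketch shows. chars is a hypothetical class with the same pair of operator[] overloads, plus call counters added only for the demonstration.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical class with the same pair of operator[] overloads.
class chars {
public:
    // Non-const object: returns char&, so the caller may write.
    char& operator[](std::size_t pos) {
        assert(pos < 3);
        ++nonconst_calls;
        return data_[pos];
    }
    // Const object: returns const char&, read-only.
    const char& operator[](std::size_t pos) const {
        assert(pos < 3);
        ++const_calls;  // allowed because the counter is mutable
        return data_[pos];
    }
    mutable int const_calls = 0;
    int nonconst_calls = 0;

private:
    char data_[4] = "abc";
};
```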

⑦reserve

  • We only expand the capacity when n is greater than the capacity.
void reserve(size_t n = 0)
{
	if (n > _capacity)
	{
		char* tmp = new char[n + 1];
		memcpy(tmp, _str, _size + 1);
		delete[] _str;
		_str = tmp;
		_capacity = n;
	}
}

⑧push_back

Three things to consider:

  • Whether the buffer needs to grow
  • Whether the string is empty when growing (a capacity of 0 cannot be doubled)
  • After appending, the string must be terminated with \0 again
void push_back(char c)
{
	// Expand if full
	if (_size == _capacity)
	{
		// The string may be empty (capacity 0)
		size_t new_capacity = _capacity == 0 ? 4 : _capacity * 2;
		reserve(new_capacity);
	}
	_str[_size++] = c;
	// Be sure to re-terminate with '\0'
	_str[_size] = '\0';
}
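The growth policy above (4 when empty, otherwise double) can be checked in isolation. grow_string is a hypothetical minimal class containing only reserve and push_back written the same way; copying is disabled to keep it short.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical minimal class: reserve + push_back with the
// "4 if empty, otherwise double" growth policy.
class grow_string {
public:
    grow_string() : str_(new char[1]), size_(0), capacity_(0) { str_[0] = '\0'; }
    ~grow_string() { delete[] str_; }
    grow_string(const grow_string&) = delete;
    grow_string& operator=(const grow_string&) = delete;

    void reserve(std::size_t n) {
        if (n > capacity_) {                  // never shrinks
            char* tmp = new char[n + 1];      // +1 for '\0'
            std::memcpy(tmp, str_, size_ + 1);
            delete[] str_;
            str_ = tmp;
            capacity_ = n;
        }
    }
    void push_back(char c) {
        if (size_ == capacity_)
            reserve(capacity_ == 0 ? 4 : capacity_ * 2);
        str_[size_++] = c;
        str_[size_] = '\0';                   // keep the string terminated
    }
    std::size_t size() const { return size_; }
    std::size_t capacity() const { return capacity_; }
    const char* c_str() const { return str_; }

private:
    char* str_;
    std::size_t size_;
    std::size_t capacity_;
};
```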

⑨append

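  • Since we append a whole string, the check cannot be _size == _capacity as in push_back: the new size _size + len may exceed the capacity by any amount, so we compare _size + len with _capacity and reserve exactly what is needed.
void append(const char* str)
{
	size_t len = strlen(str);
	if (_size + len > _capacity)
	{
		reserve(_size + len);
	}
	// strcpy would copy the '\0' automatically; with memcpy we copy one extra byte to include it
	memcpy(_str + _size, str, len + 1);
	_size += len;
}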
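The size check for append can be isolated into two small helpers. Both names are hypothetical: capacity_for_append computes the capacity append needs, and append_into does the memcpy of len + 1 bytes into a buffer that already has room.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Capacity needed so that len more characters fit. The new size can
// overshoot the capacity by any amount, so compare size + len, not ==.
std::size_t capacity_for_append(std::size_t size, std::size_t capacity, std::size_t len) {
    return size + len > capacity ? size + len : capacity;
}

// Append a C string to a buffer that already has room for it.
// memcpy copies len + 1 bytes so the terminating '\0' comes along.
void append_into(char* dst, std::size_t& size, const char* src) {
    std::size_t len = std::strlen(src);
    std::memcpy(dst + size, src, len + 1);
    size += len;
}
```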

⑩+=

These simply reuse push_back (⑧) and append (⑨) above.

1. String

string& operator+=(const char* str)
{
	append(str);
	return *this;
}

2. Characters

string& operator+=(char c)
{
	push_back(c);
	return *this;
}

⑪insert

Main logic:
Since pos is a size_t, we have to consider the case pos == 0. The data is moved starting from the back, so the index is decremented on every step, and the loop continues as long as the index is greater than or equal to pos. When pos is 0, decrementing the index from 0 wraps it around to a very large number, so the condition never becomes false. There are two fixes: cast everything to a signed integer type, or use npos to detect the wraparound. Here we take the latter.
Notes:

  • Check that pos is valid
  • Check whether the buffer needs to grow

1.Insert characters

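string& insert(size_t pos, size_t n, char c)
{
	// Check that pos is valid (pos == _size means inserting at the end)
	assert(pos <= _size);
	// Expand if needed
	reserve(_size + n);
	// Shift the data from pos onward (including '\0') right by n, starting from the back.
	// The i != npos check stops the loop when i-- wraps around: size_t is never below 0,
	// so when pos is 0, i >= pos is always true and the loop would never end otherwise.
	for (size_t i = _size; i >= pos && i != npos; i--)
	{
		_str[i + n] = _str[i];
	}
	// Fill the gap with c
	for (size_t i = pos; i < pos + n; i++)
	{
		_str[i] = c;
	}
	_size += n;
	return *this;
}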

2. Insert string

string& insert(size_t pos, const char* str)
{
	assert(pos <= _size);
	size_t len = strlen(str);

	reserve(_size + len);
	for (size_t i = _size; i >= pos && i != npos; i--)
	{
		_str[i + len] = _str[i];
	}
	// Copy the string in; its '\0' must not be copied here
	memcpy(_str + pos, str, len);
	_size += len;

	return *this;
}
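An alternative to the npos check is to let the index run over the destination positions instead, so it never drops below pos + n and cannot wrap. This sketch works on a plain char buffer (insert_chars and insert_str are hypothetical names), and it asserts pos <= size, so inserting at the very end is allowed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Insert n copies of c at pos. buf holds `size` characters plus '\0'
// and must have room for size + n + 1 bytes.
void insert_chars(char* buf, std::size_t& size, std::size_t pos, std::size_t n, char c) {
    assert(pos <= size);                      // pos == size means "append"
    if (n == 0) return;
    // end is a destination index and stops at pos + n >= 1: no wraparound.
    for (std::size_t end = size + n; end >= pos + n; --end)
        buf[end] = buf[end - n];              // moves '\0' as well
    for (std::size_t i = 0; i < n; ++i)
        buf[pos + i] = c;
    size += n;
}

// Same idea for a C string; its own '\0' is not copied.
void insert_str(char* buf, std::size_t& size, std::size_t pos, const char* s) {
    assert(pos <= size);
    std::size_t len = std::strlen(s);
    if (len == 0) return;
    for (std::size_t end = size + len; end >= pos + len; --end)
        buf[end] = buf[end - len];
    std::memcpy(buf + pos, s, len);
    size += len;
}
```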

⑫erase

Main logic:
Notes:

  • pos must be valid
  • Whether len, or the end position pos + len, reaches past _size
string& erase(size_t pos = 0, size_t len = npos)
{
	assert(pos < _size);
	// len >= _size - pos cannot overflow, unlike pos + len
	if (len == npos || len >= _size - pos)
	{
		// Erase everything from pos to the end
		_str[pos] = '\0';
		_size = pos;
	}
	else
	{
		// Shift the tail (including '\0') left by len
		for (size_t i = pos + len; i <= _size; i++)
		{
			_str[i - len] = _str[i];
		}
		_size -= len;
	}
	return *this;
}
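The length check can be written so that it never overflows. In this sketch on a plain buffer (erase_range and the global npos are hypothetical stand-ins), len is compared with size - pos instead of computing pos + len, and memmove replaces the hand-written shifting loop.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

const std::size_t npos = static_cast<std::size_t>(-1);

// Erase up to len characters starting at pos; returns the new size.
std::size_t erase_range(char* buf, std::size_t size, std::size_t pos, std::size_t len = npos) {
    assert(pos < size);
    if (len >= size - pos) {   // everything from pos to the end goes
        buf[pos] = '\0';
        return pos;
    }
    // Shift the tail, including '\0', left by len (regions overlap: memmove).
    std::memmove(buf + pos, buf + pos + len, size - pos - len + 1);
    return size - len;
}
```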

⑬find

  • Return npos on failure

1. Characters

size_t find(char c, size_t pos = 0) const
{
	assert(pos < _size);
	// Start searching at pos, not at 0
	for (size_t i = pos; i < _size; i++)
	{
		if (_str[i] == c)
		{
			return i;
		}
	}
	return npos;
}

2. String

  • If you are interested, you can look into the KMP and BM (Boyer-Moore) string-matching algorithms on your own.

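Here I simply use the library's strstr, which returns a null pointer when the substring is not found. Keep in mind that strstr stops at the first \0, so unlike the copy constructor, this find does not see characters after an embedded \0.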

size_t find(const char* str, size_t pos = 0) const
{
	assert(pos < _size);
	// strstr returns a null pointer if str is not found
	const char* ret = strstr(_str + pos, str);
	if (ret == nullptr)
	{
		return npos;
	}
	else
	{
		return ret - _str;
	}
}
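A sketch of both searches on a plain buffer. find_char starts its scan at pos, which is what the pos parameter promises; find_str uses strstr and therefore shares its limitation with embedded \0. The function names and the global npos are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

const std::size_t npos = static_cast<std::size_t>(-1);

// Search for c starting at pos; npos if not found.
std::size_t find_char(const char* str, std::size_t size, char c, std::size_t pos = 0) {
    for (std::size_t i = pos; i < size; ++i)
        if (str[i] == c) return i;
    return npos;
}

// Substring search starting at pos. strstr stops at the first '\0',
// so characters after an embedded '\0' are never examined.
std::size_t find_str(const char* str, std::size_t size, const char* sub, std::size_t pos = 0) {
    if (pos > size) return npos;
    const char* ret = std::strstr(str + pos, sub);
    return ret == nullptr ? npos : static_cast<std::size_t>(ret - str);
}
```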

⑭substr

Notes:

  • pos must be valid
  • The interval must be valid: clamp it to the end of the string

Main logic:

  • Find the start and end of the interval, then copy the characters in it
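string substr(size_t pos = 0, size_t len = npos) const
{
	assert(pos < _size);
	size_t begin = pos;
	size_t end = pos + len;
	// Compare len with _size - pos: pos + len can overflow for large len
	if (len == npos || len >= _size - pos)
	{
		end = _size;
	}
	string tmp;
	tmp.reserve(end - begin);  // allocate once instead of growing in +=
	for (size_t i = begin; i < end; i++)
	{
		tmp += _str[i];
	}
	return tmp;
}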
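The interval check is where overflow can bite: pos + len wraps around when len is close to npos without being equal to it. This sketch (all names hypothetical) compares a naive end computation with one that clamps using size - pos.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

const std::size_t npos = static_cast<std::size_t>(-1);

// Naive: computes pos + len first, which can wrap around.
std::size_t naive_end(std::size_t size, std::size_t pos, std::size_t len) {
    std::size_t end = pos + len;
    if (len == npos || pos + len >= size) end = size;
    return end;
}

// Safe: size - pos is the number of characters available from pos.
std::size_t substr_end(std::size_t size, std::size_t pos, std::size_t len) {
    assert(pos < size);
    return len >= size - pos ? size : pos + len;
}

// Copy [pos, end) into out, which needs room for the characters plus '\0'.
std::size_t substr_copy(const char* str, std::size_t size, std::size_t pos,
                        std::size_t len, char* out) {
    std::size_t end = substr_end(size, pos, len);
    std::memcpy(out, str + pos, end - pos);
    out[end - pos] = '\0';
    return end - pos;
}
```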

⑮resize

void resize(size_t n, char c = '\0')
{
	// Growing: make room, then fill the new positions with c
	if (n > _size)
	{
		reserve(n);
		memset(_str + _size, c, n - _size);
	}
	// Growing or shrinking: terminate at n
	_str[n] = '\0';
	_size = n;
}
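resize either grows or shrinks. In this sketch on a hypothetical raw_buf struct with free reserve_buf and resize_buf functions, growing fills the new positions with c, while shrinking only moves the terminating \0 and leaves the capacity unchanged.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical minimal buffer for illustration.
struct raw_buf {
    char* str;
    std::size_t size;
    std::size_t capacity;
};

void reserve_buf(raw_buf& b, std::size_t n) {
    if (n > b.capacity) {
        char* tmp = new char[n + 1];
        std::memcpy(tmp, b.str, b.size + 1);
        delete[] b.str;
        b.str = tmp;
        b.capacity = n;
    }
}

// Grow: fill the new positions with c. Shrink: just move the terminator.
void resize_buf(raw_buf& b, std::size_t n, char c = '\0') {
    if (n > b.size) {
        reserve_buf(b, n);
        std::memset(b.str + b.size, c, n - b.size);
    }
    b.str[n] = '\0';
    b.size = n;
}
```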

⑯clear

void clear()
{
	_str[0] = '\0';
	_size = 0;
}

⑰>

Main logic:


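bool operator>(const string& s) const
{
	// Compare the common prefix (the shorter length) first
	size_t less = _size > s._size ? s._size : _size;
	int ret = memcmp(_str, s._str, less);

	// Equal prefixes: the longer string is greater
	if (ret == 0)
	{
		return _size > s._size;
	}
	return ret > 0;
}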

⑱ ==

bool operator==(const string& s) const
{
	//size_t less = _size > s.size() ? s.size() : _size;
	//int  ret = memcmp(_str, s.c_str(), less);
	//if (ret == 0)
	//{
	//	return _size == s.size();
	//}

	// memcmp returns 0 when the bytes are equal, so compare its result with 0
	return _size == s._size &&
		memcmp(_str, s._str, _size) == 0;
}

The remaining operators <, <=, >= and != can all be implemented by reusing > and ==; here we show >=.

⑲>=

bool operator>=(const string& s) const
{
	return *this > s || *this == s;
}
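memcmp returns 0 when the compared bytes are equal, so an equality test must compare its result with 0. The sketch below (view is a hypothetical pointer-plus-length struct) implements > and == that way and derives the other four operators from them.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical lightweight view: pointer + length.
struct view {
    const char* str;
    std::size_t size;
};

bool operator>(const view& a, const view& b) {
    std::size_t less = a.size < b.size ? a.size : b.size;
    int ret = std::memcmp(a.str, b.str, less);
    if (ret == 0) return a.size > b.size;   // same prefix: the longer one wins
    return ret > 0;
}

// memcmp returns 0 for equal bytes, hence the == 0.
bool operator==(const view& a, const view& b) {
    return a.size == b.size && std::memcmp(a.str, b.str, a.size) == 0;
}

// Everything else is derived from > and ==.
bool operator>=(const view& a, const view& b) { return a > b || a == b; }
bool operator<(const view& a, const view& b) { return !(a >= b); }
bool operator<=(const view& a, const view& b) { return !(a > b); }
bool operator!=(const view& a, const view& b) { return !(a == b); }
```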

⑳swap and =

These two are closely related and will be discussed together here.

swap

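void swap(string& str)
{
	std::swap(_str, str._str);
	std::swap(_size, str._size);
	std::swap(_capacity, str._capacity);
	// Note: we cannot swap the objects themselves, because std::swap uses
	// assignment, which would call swap again: infinite recursion.
}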

Traditional assignment

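string& operator=(const string& str)
{
	// Traditional version
	// Guard against self-assignment
	if (this != &str)
	{
		// Allocate and copy first, so *this is untouched if new throws
		char* tmp = new char[str._capacity + 1];
		memcpy(tmp, str._str, str._size + 1);  // copy str's size, not ours
		delete[] _str;
		_str = tmp;
		_capacity = str._capacity;
		_size = str._size;
	}
	return *this;
}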

Modern version

  • The temporary tmp does the copying for us; after the swap it holds our old buffer and frees it in its destructor.
string& operator=(const string& str)
{
	string tmp(str);
	swap(tmp);
	return *this;
}

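Most concise modern version

  • The parameter is passed by value, so the copy constructor builds the temporary directly.
  • The temporary is destroyed on return, taking our old buffer with it.
  • Only one operator= can exist in the class at a time: declaring both operator=(const string&) and operator=(string) makes a = b ambiguous.
string& operator=(string tmp)
{
	swap(tmp);
	return *this;
}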
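Putting the copy-and-swap pieces together in a small self-contained class. mini_string is hypothetical and keeps only what assignment needs; it declares a single by-value operator=, and its swap exchanges members rather than whole objects.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <utility>

// Hypothetical minimal class showing copy-and-swap assignment.
class mini_string {
public:
    mini_string(const char* s = "") : size_(std::strlen(s)) {
        str_ = new char[size_ + 1];
        std::memcpy(str_, s, size_ + 1);
    }
    mini_string(const mini_string& other) : size_(other.size_) {
        str_ = new char[size_ + 1];
        std::memcpy(str_, other.str_, size_ + 1);
    }
    ~mini_string() { delete[] str_; }

    // Swap members only; std::swap on whole objects would call operator= again.
    void swap(mini_string& other) {
        std::swap(str_, other.str_);
        std::swap(size_, other.size_);
    }
    // tmp is already a copy; we take its buffer and tmp's destructor
    // frees our old one. Self-assignment needs no special case.
    mini_string& operator=(mini_string tmp) {
        swap(tmp);
        return *this;
    }
    const char* c_str() const { return str_; }

private:
    char* str_;
    std::size_t size_;
};
```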

Error example:

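void swap(string& str)
{
	std::swap(*this, str);
	// Note: we cannot swap the objects themselves, because std::swap uses
	// assignment, which would call swap again: infinite recursion.
}
string& operator=(string tmp)
{
	swap(tmp);
	return *this;
}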

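Why this fails: std::swap works roughly as T c(a); a = b; b = c;, so it calls our operator=. Our operator= calls swap, which calls std::swap again, and the recursion never ends until the stack overflows.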

㉑Iterator

begin

iterator begin()
{
	return _str;
}
const_iterator begin() const
{
	return _str;
}

end

iterator end()
{
	return _str + _size;
}
const_iterator end() const
{
	return _str + _size;
}

㉒<<

Two points need to be explained here:

  • The stream must be passed and returned by reference, because the library deletes the stream's copy constructor, so a stream cannot be copied.
  • A range-based for loop is implemented with iterators under the hood, so it works once begin and end are defined.

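// const is required so that s cannot be modified
std::ostream& operator<<(std::ostream& out, const string& s)
{
	// Range-based for works because string provides begin() and end()
	for (auto c : s)
	{
		out << c;  // write to out, not cout, so any stream works
	}
	return out;
}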
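The range-based for loop in operator<< is roughly shorthand for an explicit iterator loop, as this sketch shows. letters is a hypothetical class that, like string, exposes pointer iterators through begin and end; writing to out rather than std::cout lets the operator work with any stream, including std::ostringstream.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>

// Hypothetical class with pointer iterators, like the string above.
class letters {
public:
    typedef const char* const_iterator;
    const_iterator begin() const { return data_; }
    const_iterator end() const { return data_ + 3; }

private:
    char data_[4] = "abc";
};

std::ostream& operator<<(std::ostream& out, const letters& s) {
    for (char c : s)   // calls s.begin() and s.end() behind the scenes
        out << c;
    return out;
}

// Roughly what the range-based for above expands to.
void print_expanded(std::ostream& out, const letters& s) {
    for (letters::const_iterator it = s.begin(), last = s.end(); it != last; ++it)
        out << *it;
}
```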

㉓>>

Notes:

  • Input overwrites the old contents, so we clear the string first.

  • To match the library, leading spaces and \n are skipped: entering two spaces followed by hello stores hello.

  • We collect characters in a local buffer to reduce reallocations. It is like switching from drinking water with a spoon to drinking with a bowl: characters go into the bowl first, and only when the bowl is full, or the input runs out, is it poured into the string.

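std::istream& operator>>(std::istream& in, string& s)
{
	// First clear the current contents of s
	s.clear();
	// Skip leading spaces and '\n'; checking in stops the loop at end of input
	char ch = in.get();
	while (in && (ch == ' ' || ch == '\n'))
	{
		ch = in.get();
	}
	// Read characters. Since += keeps reallocating, collect them in buf first
	// and append buf to s only when it is full, which reduces reallocations
	char buf[128];
	size_t i = 0;
	while (in && ch != ' ' && ch != '\n')
	{
		buf[i++] = ch;
		if (i == 127)
		{
			buf[i] = '\0';
			s += buf;
			i = 0;
		}
		ch = in.get();
	}
	// buf may still hold data
	if (i != 0)
	{
		buf[i] = '\0';
		s += buf;
	}
	return in;
}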
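Here is the same buffered-reading logic as a free function, tested with std::istringstream. To keep the sketch short, std::string stands in for the destination (only clear and += are used), and read_word is a hypothetical name. The in checks make the loops stop at end of input: without them, the EOF value returned by get() is stored into a char and compared against ' ' and '\n', and the loop may never end.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Read one word into s through a 128-byte staging buffer.
std::istream& read_word(std::istream& in, std::string& s) {
    s.clear();
    char ch = in.get();
    while (in && (ch == ' ' || ch == '\n'))   // skip leading separators
        ch = in.get();
    char buf[128];
    std::size_t i = 0;
    while (in && ch != ' ' && ch != '\n') {
        buf[i++] = ch;
        if (i == 127) {                       // buffer full: flush it
            buf[i] = '\0';
            s += buf;
            i = 0;
        }
        ch = in.get();
    }
    if (i != 0) {                             // flush the remainder
        buf[i] = '\0';
        s += buf;
    }
    return in;
}
```

One difference from the library remains: at end of input, get() sets failbit, so a loop such as while (in >> s) would skip the last word. The library's operator>> for std::string sets only eofbit in that case.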

Source code

#include <cassert>
#include <cstring>
#include <iostream>
#include <utility>

namespace my_string
{
	class string
	{
	public:
		// Iterators
		typedef char* iterator;
		typedef const char* const_iterator;
		iterator begin()
		{
			return _str;
		}
		const_iterator begin() const
		{
			return _str;
		}
		iterator end()
		{
			return _str + _size;
		}
		const_iterator end() const
		{
			return _str + _size;
		}
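		// Construct from a C string
		string(const char* str = "")
		{
			size_t len = strlen(str);
			_str = new char[len + 1];
			memcpy(_str, str, len + 1);
			_size = len;
			_capacity = len; // _capacity counts valid characters, excluding '\0'
		}
		// Copy constructor
		string(const string& str)
		{
			// Inside the class scope, private members of any string object are accessible
			// Allocate by capacity so that _capacity matches the real buffer size
			_str = new char[str._capacity + 1];
			memcpy(_str, str._str, str._size + 1);
			_size = str._size;
			_capacity = str._capacity;
		}
		// Destructor
		~string()
		{
			delete[] _str;
			_str = nullptr;
			_size = _capacity = 0;
		}
		// Return a C-style string
		// const, so that both const and non-const objects can call it
		const char* c_str() const
		{
			return _str;
		}
		// Number of valid characters
		size_t size() const // const for the same reason
		{
			return _size;
		}
		// Overloaded subscript operator
		// Returns a reference, so it supports both reading and writing
		char& operator[](size_t pos)
		{
			// Check whether pos is out of bounds
			assert(pos < _size);
			return _str[pos];
		}
		// Read-only version
		const char& operator[](size_t pos) const
		{
			// Check whether pos is out of bounds
			assert(pos < _size);
			return _str[pos];
		}
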
		// Grow the capacity; n is the desired capacity
		void reserve(size_t n = 0)
		{
			if (n > _capacity)
			{
				char* tmp = new char[n + 1];
				memcpy(tmp, _str, _size + 1);
				delete[] _str;
				_str = tmp;
				_capacity = n;
			}
		}
		// Append one character
		void push_back(char c)
		{
			// Expand if full
			if (_size == _capacity)
			{
				// The string may be empty (capacity 0)
				size_t new_capacity = _capacity == 0 ? 4 : _capacity * 2;
				reserve(new_capacity);
			}
			_str[_size++] = c;
			// Be sure to re-terminate with '\0'
			_str[_size] = '\0';
		}
		// Append a C string
		void append(const char* str)
		{
			size_t len = strlen(str);
			if (_size + len > _capacity)
			{
				reserve(_size + len);
			}
			memcpy(_str + _size, str, len + 1);
			_size += len;
		}
		// operator+=
		string& operator+=(const char* str)
		{
			append(str);
			return *this;
		}
		string& operator+=(char c)
		{
			push_back(c);
			return *this;
		}
		string& insert(size_t pos, size_t n, char c)
		{
			// Check that pos is valid (pos == _size means inserting at the end)
			assert(pos <= _size);
			// Expand if needed
			reserve(_size + n);
			// Shift the data from pos onward right by n, starting from the back.
			// The i != npos check stops the loop when i-- wraps around: size_t is
			// never below 0, so when pos is 0 the loop would never end otherwise.
			for (size_t i = _size; i >= pos && i != npos; i--)
			{
				_str[i + n] = _str[i];
			}
			// Fill the gap with c
			for (size_t i = pos; i < pos + n; i++)
			{
				_str[i] = c;
			}
			_size += n;
			return *this;
		}
		string& insert(size_t pos, const char* str)
		{
			assert(pos <= _size);
			size_t len = strlen(str);

			reserve(_size + len);
			for (size_t i = _size; i >= pos && i != npos; i--)
			{
				_str[i + len] = _str[i];
			}
			// Copy the string in (without its '\0')
			memcpy(_str + pos, str, len);
			_size += len;

			return *this;
		}
		string& erase(size_t pos = 0, size_t len = npos)
		{
			assert(pos < _size);
			// len >= _size - pos cannot overflow, unlike pos + len
			if (len == npos || len >= _size - pos)
			{
				_str[pos] = '\0';
				_size = pos;
			}
			else
			{
				for (size_t i = pos + len; i <= _size; i++)
				{
					_str[i - len] = _str[i];
				}
				_size -= len;
			}
			return *this;
		}
		// Search functions
		size_t find(char c, size_t pos = 0) const
		{
			assert(pos < _size);
			// Start searching at pos, not at 0
			for (size_t i = pos; i < _size; i++)
			{
				if (_str[i] == c)
				{
					return i;
				}
			}
			return npos;
		}
		size_t find(const char* str, size_t pos = 0) const
		{
			assert(pos < _size);
			const char* ret = strstr(_str + pos, str);
			if (ret == nullptr)
			{
				return npos;
			}
			else
			{
				return ret - _str;
			}
		}
		string substr(size_t pos = 0, size_t len = npos) const
		{
			assert(pos < _size);
			size_t begin = pos;
			size_t end = pos + len;
			// Compare len with _size - pos: pos + len can overflow
			if (len == npos || len >= _size - pos)
			{
				end = _size;
			}
			string tmp;
			tmp.reserve(end - begin);
			for (size_t i = begin; i < end; i++)
			{
				tmp += _str[i];
			}
			return tmp;
		}
		// Resize
		//void resize(size_t n)
		//{
		//	if (n > _size)
		//	{
		//		reserve(n);
		//	}
		//	_str[n] = '\0';
		//	_size = n;
		//}
		void resize(size_t n, char c = '\0')
		{
			if (n > _size)
			{
				reserve(n);
				memset(_str + _size, c, n - _size);
			}
			_str[n] = '\0';
			_size = n;
		}
		void clear()
		{
			_str[0] = '\0';
			_size = 0;
		}
		// Comparisons
		bool operator>(const string& s) const
		{
			// Compare the common prefix (the shorter length) first
			size_t less = _size > s._size ? s._size : _size;
			int ret = memcmp(_str, s._str, less);

			// Equal prefixes: the longer string is greater
			if (ret == 0)
			{
				return _size > s._size;
			}

			return ret > 0;
		}
		bool operator==(const string& s) const
		{
			//size_t less = _size > s.size() ? s.size() : _size;
			//int  ret = memcmp(_str, s.c_str(), less);
			//if (ret == 0)
			//{
			//	return _size == s.size();
			//}

			// memcmp returns 0 when the bytes are equal
			return _size == s._size &&
				memcmp(_str, s._str, _size) == 0;
		}
		bool operator>=(const string& s) const
		{
			return *this > s || *this == s;
		}
		bool operator<(const string& s) const
		{
			return !(*this >= s);
		}
		bool operator<=(const string& s) const
		{
			return !(*this > s);
		}
		bool operator!=(const string& s) const
		{
			return !(*this == s);
		}
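		void swap(string& str)
		{
			std::swap(_str, str._str);
			std::swap(_size, str._size);
			std::swap(_capacity, str._capacity);
			// Note: we cannot swap the objects themselves, because std::swap uses
			// assignment, which would call swap again: infinite recursion.
			/*std::swap(*this, str);*/
		}
		// Only one operator= may be active: declaring both operator=(const string&)
		// and operator=(string) makes a = b ambiguous, so this version stays commented out.
		//string& operator=(const string& str)
		//{
		//	// Modern version
		//	//string tmp(str);
		//	//swap(tmp);
		//	// Traditional version; guard against self-assignment
		//	if (this != &str)
		//	{
		//		char* tmp = new char[str._capacity + 1];
		//		memcpy(tmp, str._str, str._size + 1);
		//		delete[] _str;
		//		_str = tmp;
		//		_capacity = str._capacity;
		//		_size = str._size;
		//	}
		//	return *this;
		//}
		// std::swap in the library, roughly:
		//template <class T> void swap(T& a, T& b)
		//{
		//	T c(a); a = b; b = c;
		//}
		// Best modern version
		string& operator=(string tmp)
		{
			swap(tmp);
			return *this;
		}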
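	private:
		char* _str;
		size_t _size;
		size_t _capacity;
		static size_t npos;
		//const static size_t npos = -1;
		// This also works: a const static integral member may be initialized
		// inside the class. Just remember it as a special rule.
	};
	size_t string::npos = -1;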
	// Note: streams cannot be copied (the library deletes their copy constructor),
	// so they must be passed and returned by reference
	std::ostream& operator<<(std::ostream& out, const string& s)
	{
		//for (size_t begin = 0; begin < s.size(); begin++)
		//{
		//	out << s[begin];
		//}
		for (auto c : s)
		{
			out << c;
		}
		return out;
	}
	std::istream& operator>>(std::istream& in, string& s)
	{
		// First clear the current contents of s
		s.clear();
		// Skip leading spaces and '\n'; checking in stops at end of input
		char ch = in.get();
		while (in && (ch == ' ' || ch == '\n'))
		{
			ch = in.get();
		}
		// Read characters. Since += keeps reallocating, collect them in buf first
		// and append buf to s only when it is full, which reduces reallocations
		char buf[128];
		size_t i = 0;
		while (in && ch != ' ' && ch != '\n')
		{
			buf[i++] = ch;
			if (i == 127)
			{
				buf[i] = '\0';
				s += buf;
				i = 0;
			}
			ch = in.get();
		}
		// buf may still hold data
		if (i != 0)
		{
			buf[i] = '\0';
			s += buf;
		}
		return in;
	}
}

Summary

 That's it for today's sharing. If you found the article helpful, please give it a like! See you in the next article!


Origin blog.csdn.net/Shun_Hua/article/details/131616491