Simulated implementation of the C++ string class

This blog uses the main member functions of the string class described in the official C++ documentation as a reference, and implements a simplified My-string class that mimics them.

 The string class itself was introduced in two earlier blog posts. If anything is unclear, refer to them:

C++ string class-2_chihiro1122's blog-CSDN blog

C++ string class iterator range for_string type iterator_chihiro1122's blog-CSDN blog

string class

 To distinguish it from the official string class, the My-string class is implemented inside a custom namespace.

Implementing the basic functions

 Member variables of the My-string class

 The member variables should not be modifiable from outside the class, so they are declared under the protected access specifier:

	protected:
		char* _str;
		size_t _size;
		size_t _capacity;

 Constructors and destructor

 The official documentation lists many constructors, but it is not necessary to implement them all. The two most commonly used ones are enough: constructing from a C string and constructing an empty object. The implementation is as follows:

		// Constructors
		string()
			:_str(new char[1]),
			_size(0),
			_capacity(0)
		{
			_str[0] = '\0';
		}

		string(const char* str)
			: _str(new char[strlen(str) + 1]),
			_size(strlen(str)),
			_capacity(strlen(str))
		{
			strcpy(_str, str);
		}

Note: The member initializer list above assigns values and allocates space for the member variables of the string class. Keep the order of the initializers consistent with the order in which the members are declared in the class, because members are always initialized in declaration order, not in the order they appear in the initializer list.

In the following example, the initializer list is written in the wrong order and causes a bug:

class string
	{
	public:
        string(const char* str = "")
			: 
			_size(strlen(str)),
			_capacity(strlen(str)),
            _str(new char[_capacity + 1])
		{
			strcpy(_str, str);
		}

    protected:
		char* _str;
		size_t _size;
		size_t _capacity;
	};

In the class above, _str is declared first but appears last in the constructor's initializer list. Members are initialized in declaration order, so _str is initialized first, not _size. Its initializer computes the array size from _capacity, but at that point _capacity has not been initialized yet and holds an indeterminate value, so the allocation size is not what we want and the program may misbehave.

 Therefore, when using a member initializer list, make sure the order of the initializers matches the order in which the members are declared in the class.
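The declaration-order rule can be observed directly. The following is a minimal sketch, not part of the My-string class: Demo, note() and init_order() are hypothetical names introduced only for this illustration. Each initializer appends a tag to a log, so the log shows the order in which the members were actually initialized.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: appends a tag to the log and returns the value,
// so the order of initialization becomes visible.
static int note(std::string& log, char tag, int value)
{
    log += tag;
    return value;
}

struct Demo
{
    // The initializer list names _b first, but _a is declared first.
    explicit Demo(std::string& log)
        : _b(note(log, 'b', 2)),
          _a(note(log, 'a', 1))
    {
    }

    int _a;   // declared first, so initialized first
    int _b;   // declared second, so initialized second
};

// Returns the observed initialization order as a string.
std::string init_order()
{
    std::string log;
    Demo d(log);
    (void)d;
    return log;
}

// Small self-check; call it from main() to run the example.
void check_init_order()
{
    assert(init_order() == "ab");   // declaration order wins
}
```

Many compilers warn about such a mismatch (for example with -Wreorder in GCC and Clang), which is a convenient way to catch it early.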

The two constructors we want to implement above can actually be implemented with a constructor with default parameters:

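		//string(const char* str = '\0') // wrong: '\0' is a char, not a string
		//string(const char* str = nullptr) // wrong: strlen() would dereference a null pointer
		//string(const char* str = "\0") // works, but unnecessary; the version below is better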
		string(const char* str = "")
			: _str(new char[strlen(str) + 1]),
			_size(strlen(str)),
			_capacity(strlen(str))
		{
			strcpy(_str, str);
		}

 Copy constructor implementation:

		// Copy constructor (deep copy)
		string(const string& s)
		{
			_str = new char[s._capacity + 1];
			strcpy(_str, s._str);
			_capacity = s._capacity;
			_size = s._size;
		}

 Destructor implementation:

		// Destructor
		~string()
		{
			delete[] _str;	// allocated with new[], so release with delete[]
			_str = nullptr;
			_size = _capacity = 0;
		}

 Some simple member functions

		// Return the string as a C-style string
		const char* c_str() const
		{
			return _str;
		}

		// Return the number of valid characters
		size_t size() const
		{
			return _size;
		}

operator [] 

 The subscript operator [] of the official string class is very useful, so we implement it here as well. It is not difficult, but it needs two overloads: one that allows both reading and writing, for non-const objects, and one that is read-only, for const objects:

Readable and writable (non-const object):

		char& operator[](size_t pos)
		{
			assert(pos < _size);

			return _str[pos];
		}

Read-only (const object):

		const char& operator[](size_t pos) const
		{
			assert(pos < _size);

			return _str[pos];
		}

In the read-only overload, the trailing const qualifies the implicit this pointer, which points to the object the function is called on. If the object is const and only the non-const (readable and writable) overload existed, calling it would widen the object's access rights, and the compiler would report an error.

 Because the two versions differ in the const-qualification of the implicit this parameter, they form an overload set. When operator[] is used on an ordinary object, the first function is called; on a const object, the second is called.
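The overload selection can be tried on a small stand-in. This is only a sketch: Tiny and write_then_read() are hypothetical names, and the fixed three-character buffer replaces the dynamically allocated array of the real class.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical miniature of the class above: a fixed buffer with both overloads.
class Tiny
{
public:
    char& operator[](std::size_t pos)              // chosen for non-const objects
    {
        assert(pos < 3);
        return _buf[pos];
    }

    const char& operator[](std::size_t pos) const  // chosen for const objects
    {
        assert(pos < 3);
        return _buf[pos];
    }

private:
    char _buf[4] = { 'a', 'b', 'c', '\0' };
};

// Writes through the non-const overload, then reads through the const one.
char write_then_read()
{
    Tiny t;
    t[0] = 'x';             // non-const object: char& overload, assignment allowed
    const Tiny& view = t;   // const view of the same object
    // view[0] = 'y';       // would not compile: const char& is read-only
    return view[0];
}
```

Calling write_then_read() returns 'x': the write went through the first overload and the read through the second.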

Simple implementation of iterators and the range-based for loop

 Iterators

 In this string class, the iterator is simply a typedef. The implementation of begin() and end() below shows how it works.

begin() points to the first character of the array, and end() points one past the last valid character, which is the position of the terminating '\0'.

 The implementation is then very simple:

	public:
		typedef char* iterator;

		iterator begin()
		{
			return _str;
		}

		iterator end()
		{
			return _str + _size;
		}

In other words, iterator is just another name for the char* type.

The iterator above applies to non-const objects and is readable and writable. For const objects, a separate read-only iterator is needed:

		typedef const char* const_iterator;

		const_iterator begin() const
		{
			return _str;
		}

		const_iterator end() const
		{
			return _str + _size;
		}

Using the iterator

	string_begin::string::iterator it = str.begin();
	while (it != str.end())
	{
		cout << *it << endl;
		it++;
	}

Note: Usage is simple. To name the iterator type, qualify it with the namespace first and then with the class, as above (string_begin::string::iterator), so that the compiler can find it; an unqualified name is only looked up in the current and global scopes.

Alternatively, use auto and let the compiler deduce the type.

 Range-based for

 With the iterator in place, a range-based for works on My-string just as it does on the official string class.

 This is because the range-based for is built on iterators underneath, so once the iterators exist, the range-based for is supported.

The range-based for looks smart: it starts by itself, detects the end by itself, and so on. In fact, the compiler does all of this. At compile time it mechanically rewrites every range-based for into an iterator loop, so what looks like magic is entirely the compiler's work.

 Usage:

	for (auto ch : str)
	{
		cout << ch << " ";
	}
	cout << endl; 

This prints every character of str followed by a space.

 How can we prove it? It is actually simple: because the range-based for is a mechanical replacement performed by the compiler, it stops working as soon as the expected function names are missing.

If we comment out the end() function, compilation fails immediately: at the range-based for, the compiler cannot find end().

If we rename end() to Myend(), it does not work either; the compiler again reports that end() cannot be found.

 To sum up, the range-based for is a mechanical substitution: it requires not only iterator functionality but also the exact names begin() and end().

 If we look at the assembly generated for the range-based for, we find that it also calls begin() and end(), just like the iterator loop.
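The substitution can be written out by hand. In the sketch below, Chars is a hypothetical minimal type that offers nothing but begin() and end(); collect_by_hand() shows roughly the loop the compiler produces for the range-based for in collect_range_for() (the real expansion uses compiler-internal variable names).

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical minimal type: it only provides begin() and end().
struct Chars
{
    const char* data;
    std::size_t size;

    const char* begin() const { return data; }
    const char* end() const { return data + size; }
};

// Collects the characters with a range-based for loop.
std::string collect_range_for(const Chars& c)
{
    std::string out;
    for (char ch : c)
    {
        out += ch;
    }
    return out;
}

// Roughly what the compiler generates for the loop above.
std::string collect_by_hand(const Chars& c)
{
    std::string out;
    for (const char* it = c.begin(), *last = c.end(); it != last; ++it)
    {
        char ch = *it;
        out += ch;
    }
    return out;
}

// Small self-check; call it from main() to run the example.
void check_range_for()
{
    Chars c{ "abc", 3 };
    assert(collect_range_for(c) == collect_by_hand(c));
}
```

Renaming begin() or end() in Chars makes collect_range_for() fail to compile, while collect_by_hand() could simply be adapted to the new names.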

Insert, delete, find, and modify

Insert

 For adding to the string we implement two functions, push_back() and append(). push_back() appends one character at the end; append() appends a string at the end.

Both may need to grow the buffer. For push_back() we can simply double _capacity. For append(), which appends a whole string, doubling is not safe: the number of valid characters after appending may still exceed the doubled capacity. So we use strlen(str) + _size, the length of the string to be appended plus the number of valid characters already stored, both as the condition for growing and as the new capacity.

The official string class has a reserve() function for growing the capacity. We implement it here and use it in both push_back() and append().

 reserve():

This function follows the out-of-place strategy that realloc in C may use: allocate a new, larger block, copy the contents of the old block into it, and then free the old block:

		// Grow the capacity
		void reserve(size_t n)
		{
			if (n > _capacity)
			{
				char* tmp = new char[n + 1];
				strcpy(tmp, _str);
				delete[] _str;
				_str = tmp;
				_capacity = n; 
			}
		}

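 With reserve() in place, the two functions above can be implemented:

		// Append
		void push_back(char ch)
		{
			// Grow if the number of valid characters has reached the capacity
			if (_size >= _capacity)
			{
				// Double the capacity (start at 4 when it is 0)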
				reserve(_capacity == 0 ? 4 : _capacity * 2);
			}
			_str[_size] = ch;
			_size++;
			_str[_size] = '\0';
		}

		void append(const char* str)
		{
			size_t len = strlen(str);
			if (len + _size > _capacity)
			{
				// Grow
				reserve(len + _size);
			}
			strcpy(_str + _size, str);
			_size += len;
		}

Of course, these two functions are not the ones we use most. The most commonly used is the overloaded operator+=, which is built on top of them but is more convenient:

It also has two overloads, one that appends a character and one that appends a string:

		// += append a character
		string& operator+=(char ch)
		{
			push_back(ch);
			return *this;
		}

		string& operator+=(const char* str)
		{
			append(str);
			return *this;
		}

For insertion at a given position pos, insert() has one version that inserts characters and one that inserts a string:

		// Insert n copies of a character at position pos
		void insert(size_t pos, size_t n, char ch)
		{
			assert(pos <= _size);

			if (_size + n >= _capacity)
			{
				// Grow
				reserve(_size + n);
			}

			// Shift the existing data backwards
			size_t end = _size;
			while (pos <= end && end != npos)
			{
				_str[end + n] = _str[end];
				--end;
			}

			// Write the new characters
			for (size_t i = 0; i < n; i++)
			{
				_str[pos + i] = ch;
			}

			_size += n;
		}
		// Insert a string at position pos
		void insert(size_t pos, const char* str)
		{
			assert(pos <= _size);
			size_t len = strlen(str);

			if (_size + len >= _capacity)
			{
				// Grow
				reserve(_size + len);
			}

			// Shift the existing data backwards
			size_t end = _size;
			while (pos <= end && end != npos)
			{
				_str[end + len] = _str[end];
				--end;
			}

			// Write the new characters
			for (size_t i = 0; i < len; i++)
			{
				_str[pos + i] = *(str + i);
			}

			_size += len;
		}

The problem encountered during implementation was as follows:

 In an earlier version, end was declared as int while pos was size_t. Debugging showed that when end reached -1 the loop should have exited, but it kept running.

 The reason is that in C and C++, when the two operands of an operator have different types, an implicit conversion (the usual arithmetic conversions) is applied first. Usually the smaller type is converted to the larger one; for example, with an int and a double, the int is converted to double.

 So although end has reached -1 and the loop looks like it should exit, the int end is converted to the unsigned type size_t for the comparison. As an unsigned value, -1 has all bits set to 1, which is the maximum value of size_t, so the condition pos <= end is still true and the loop does not exit.

 There are several ways to solve this. One is to cast pos to a signed type in the comparison.

 Another is to define an npos constant, as the official string class does, with the value -1 converted to size_t.

 With npos defined, end can stay a size_t and the while loop gets one more condition, end != npos, so that it stops once end wraps around past 0.
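The conversion and both fixes can be reproduced in isolation. This is a sketch under assumptions: the function names are hypothetical, long long is used as the signed type for the cast (assuming pos fits in it), and npos is a local constant here rather than a class member.

```cpp
#include <cassert>
#include <cstddef>

// Mixed comparison: end is converted to size_t before comparing.
// The cast only makes the implicit conversion visible.
bool mixed_compare(std::size_t pos, int end)
{
    return pos <= static_cast<std::size_t>(end);
}

// Fix 1: cast pos to a signed type so the comparison stays signed.
bool signed_compare(std::size_t pos, int end)
{
    return static_cast<long long>(pos) <= end;
}

// Fix 2: keep end unsigned and stop at the npos sentinel.
// Returns how many elements the shifting loop would move.
std::size_t count_steps(std::size_t pos, std::size_t size)
{
    const std::size_t npos = static_cast<std::size_t>(-1);
    std::size_t steps = 0;
    std::size_t end = size;
    while (pos <= end && end != npos)
    {
        ++steps;
        --end;   // wraps around to npos after 0
    }
    return steps;
}

// Small self-check; call it from main() to run the example.
void check_compare()
{
    assert(mixed_compare(0, -1));     // -1 became the largest size_t
    assert(!signed_compare(0, -1));   // signed comparison behaves as expected
    assert(count_steps(0, 3) == 4);   // positions 3, 2, 1, 0, then stop
}
```

The npos variant is the one the insert() functions above use; it avoids casts entirely at the cost of one extra condition in the loop.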

 Delete

 The erase() function deletes len characters starting at position pos; by default it deletes everything from pos to the end:

		// Delete one or more characters starting at position pos
		void erase(size_t pos, size_t len = npos)
		{
			assert(pos <= _size);

			if (len == npos || pos + len >= _size)
			{
				_str[pos] = '\0';
				_size = pos;
			}
			else
			{
				size_t end = pos + len;
				while (end <= _size)
				{
					_str[pos++] = _str[end++];
				}
				_size -= len;
			}
		}

 The clear() function deletes all valid characters in a string:

		// Remove all valid characters
		void clear()
		{
			_str[0] = '\0';
			_size = 0;
		}

Find

 This find() overload searches for a character starting from position pos:

		// Search for a character starting from position pos
		size_t find(char ch, size_t pos = 0)
		{
			for (size_t i = pos; i < _size; i++)
			{
				if (_str[i] == ch)
				{
					return i;
				}
			}

			return npos;
		}

The find() function searches for a string starting from position pos:

		// Search for a string starting from position pos
		size_t find(const char* str, size_t pos = 0)
		{
			assert(pos < _size);
			const char* ptr = strstr(_str + pos, str);
			if (ptr)
			{
				return ptr - _str;
			}
			else
			{
				return npos;
			}
		}

The substr() function extracts a string of len characters from the pos position of the string array:

		// Take len characters starting at position pos and return them as a new string
		string substr(size_t pos, size_t len = npos)
		{
			assert(pos < _size);

			size_t n = len;
			if (len == npos || len + pos >= _size)
			{
				n = _size - pos;
			}

			string tmp;
			tmp.reserve(n);
			for (size_t i = pos; i < pos + n; i++)
			{
				tmp += _str[i];
			}

			return tmp;
		}

Other

 The resize() function in the official string class can either grow the string or shrink it (removing elements), and the resize() in our My-string should do the same. Taking n = 5, 15, and 25 as examples, we consider three situations:

  • n = 5 is less than _size. The string needs to shrink, which amounts to deleting elements: modify _size directly and write the terminator;
  • n = 15 is between _size and _capacity. The space is sufficient, so we only need to fill the new positions with the given character;
  • n = 25 exceeds _capacity. The buffer must be grown first, and then the new positions are filled;
  • The 15 and 25 cases can be handled uniformly. The reserve() function implemented above already checks whether the requested size exceeds _capacity, so in both cases we call reserve() first and then fill in the characters;

Code:

		// resize: shrink the string, or grow it and fill with ch
		void resize(size_t n, char ch = '\0')
		{
			if (n < _size)
			{
				_size = n;
				_str[_size] = '\0';
			}
			else
			{
				// Grow first if necessary (reserve() checks the capacity)
				reserve(n);

				for (size_t i = _size; i < n; i++)
				{
					_str[i] = ch;
				}
				_size = n;
				_str[_size] = '\0';
			}
		}
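For comparison, the behaviour being imitated can be checked against the official std::string. The helper resized() below is a hypothetical wrapper introduced only for this sketch; it applies resize() to a copy and returns the result.

```cpp
#include <cassert>
#include <string>

// Applies resize() to a copy of s and returns the result.
std::string resized(std::string s, std::size_t n, char fill)
{
    s.resize(n, fill);
    return s;
}

// Small self-check; call it from main() to run the example.
void check_resize()
{
    // Shrinking keeps the first n characters.
    assert(resized("0123456789", 5, 'x') == "01234");
    // Growing appends the fill character.
    assert(resized("0123456789", 12, 'x') == "0123456789xx");
    // Growing beyond the current capacity reallocates as needed.
    std::string big = resized("0123456789", 25, 'x');
    assert(big.size() == 25 && big.capacity() >= 25);
}
```

The My-string resize() above should give the same contents in all three cases.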

I/O (stream insertion and stream extraction) (operator<< and operator>>)

 Note 1: Both the return type and the stream parameter of these two functions must be references (ostream& and istream&). Without references the code does not compile, because the stream classes forbid copying:

 The copy constructor of ostream is declared = delete. If the parameter or return value is a plain ostream instead of a reference, a copy has to be made to create the temporary object, so the compiler reports an error.

Example (incorrect):

ostream operator<< (ostream out, string& str)
{
	// Approach 1
	//for (int i = 0; i < str.size(); i++)
	//{
	//	out << str[i];
	//}

	// Approach 2
	for (auto ch : str)
	{
		out << ch;
	}

	return out;
}

 Because the example above passes and returns ostream by value, it fails to compile.

 Note 2: This function is best defined as a non-member function rather than a member function. The first parameter of a member function is always the implicit this pointer of the current object, so a member operator<< would have to be written as str << out, which does not match the usual out << str form. We want the ostream to be the first parameter, so a non-member function fits better.

Note 3: Since we placed our string class in a namespace to distinguish it from the official one, this non-member function should be defined inside that namespace but outside the class. If it is accidentally defined in the global scope outside the namespace, the unqualified name string in its parameter list no longer refers to our class, and the compiler may treat it as the standard library string.

 operator<< (stream insertion)

 A friend declaration would solve the problem of accessing private members, but friends are generally not recommended. Instead, access the characters through operator[] or the iterators, as follows:

	ostream& operator<< (ostream& out, const string& str)
	{
		// Approach 1
		//for (size_t i = 0; i < str.size(); i++)
		//{
		//	out << str[i];
		//}

		// Approach 2
		for (auto ch : str)
		{
			out << ch;
		}

		return out;
	}

 Printing this way is actually different from printing the result of c_str(). In most cases the output is the same, but not when the valid characters contain '\0', for example a string holding "hello world", then a '\0', then "!!!!!!!!!".

 c_str() returns a const char*, a built-in type, and printing it stops at the first '\0'. The stream insertion above prints all _size characters, so it does not stop at '\0'. How the '\0' itself shows up depends on the environment: the author's test used VS2019, where nothing visible is printed for it, while VS2013 prints a space between "hello world" and "!!!!!!!!!".

This leads to another problem. The C library function strcpy also treats '\0' as the end of the string, so if the valid characters contain '\0' it copies only part of the data. We should use memcpy() instead.
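Both effects can be measured with the official std::string standing in for My-string. This is a sketch: the function names are hypothetical, and an std::ostringstream replaces cout so that the number of characters written can be counted.

```cpp
#include <cassert>
#include <cstring>
#include <sstream>
#include <string>

// Prints through a C-string pointer: output stops at the first '\0'.
std::size_t printed_via_c_str(const std::string& s)
{
    std::ostringstream out;
    out << s.c_str();
    return out.str().size();
}

// Prints character by character: all size() characters are written.
std::size_t printed_char_by_char(const std::string& s)
{
    std::ostringstream out;
    for (char ch : s)
    {
        out << ch;
    }
    return out.str().size();
}

// Copies the same bytes with strcpy and memcpy and compares the results.
bool memcpy_keeps_tail()
{
    const char src[] = "hello\0world";   // 11 characters plus the final '\0'
    char a[sizeof(src)] = {};            // zero-filled
    char b[sizeof(src)] = {};
    std::strcpy(a, src);                 // copies "hello" and one '\0' only
    std::memcpy(b, src, sizeof(src));    // copies all 12 bytes
    return a[6] == '\0' && b[6] == 'w';
}

// Small self-check; call it from main() to run the example.
void check_embedded_null()
{
    std::string s("hello\0world", 11);
    assert(printed_via_c_str(s) == 5);
    assert(printed_char_by_char(s) == 11);
    assert(memcpy_keeps_tail());
}
```

This is why the complete code at the end uses memcpy() with an explicit length wherever the stored size is known.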

  operator>> (stream extraction)

 Stream extraction (input) stops at whitespace. To read a line containing spaces, such as a sentence, use the getline() function, which is implemented later.

So we only need to extract one character at a time from the buffer and append it to the string with +=.

Also, stream extraction should overwrite the contents of the string object rather than append to them as operator+= does, so we call clear() first to empty the string.

A first attempt that has a problem:

	istream& operator>> (istream& in, string& str)
	{
		char ch;
		in >> ch;
		while (ch != ' ' && ch != '\n')
		{
			str += ch;
			in >> ch;
		}

		return in;
	}

The code above falls into an infinite loop. Stream extraction uses spaces and newlines (\n) as separators between values, so operator>> on an istream skips them and never stores them. ch can therefore never become a space or a newline, and the loop never stops.

 The solution is the get() member function of istream, which reads exactly one character at a time, whether or not it is a space or a newline.

The code is as follows:

	istream& operator>> (istream& in, string& str)
	{
        str.clear();
		char ch = in.get();
		while (ch != ' ' && ch != '\n')
		{
			str += ch;
			ch = in.get();
		}

		return in;
	}
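The difference between >> and get() is easy to see in isolation. The sketch below uses std::istringstream instead of cin so that the input is fixed; the two function names are hypothetical. Unlike the loops above, these loops test the stream state, so they also end when the input is exhausted.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Reads every character with operator>>, which skips whitespace.
std::string read_with_extract(const std::string& input)
{
    std::istringstream in(input);
    std::string seen;
    char ch;
    while (in >> ch)
    {
        seen += ch;
    }
    return seen;
}

// Reads every character with get(), which returns whitespace as well.
std::string read_with_get(const std::string& input)
{
    std::istringstream in(input);
    std::string seen;
    char ch;
    while (in.get(ch))
    {
        seen += ch;
    }
    return seen;
}

// Small self-check; call it from main() to run the example.
void check_extract_vs_get()
{
    assert(read_with_extract("ab c\n") == "abc");     // space and newline skipped
    assert(read_with_get("ab c\n") == "ab c\n");      // everything preserved
}
```

Testing the stream itself, as in while (in.get(ch)), is one possible way to make the operator>> above terminate at end of input as well.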

The operator>> above can still be optimized. Characters are appended to str one at a time with operator+=, so a very long input triggers many reallocations. The impact is not large, but it can be avoided:

  • Method 1: Call reserve() to allocate space in advance. This only solves part of the problem. Suppose we reserve 1024 characters: if the input has only 10 characters, the other 1014 are wasted; if the input is several times larger than 1024, the buffer still has to grow several times. So we do not adopt this approach.
  • Method 2: Use a temporary array of a fixed size, here 128 characters. The input is split into groups of 127 valid characters; each time a group is full, it is appended to the string in str in one step, which avoids most of the reallocations in operator+=.

 One more fix is needed. Our implementation stops at the first space or newline, so if the input begins with spaces or newlines it stops immediately. The official string's stream extraction skips leading spaces and newlines, so we add a loop that discards them before the first valid character.

Final code implementation:

istream& operator>> (istream& in, string& str)
	{
		str.clear();
		char ch = in.get();

		// Skip leading spaces and newlines
		while (ch == ' ' || ch == '\n')
		{
			ch = in.get();
		}

		char Buff[128];
		int i = 0;

		while (ch != ' ' && ch != '\n')
		{
			Buff[i++] = ch;

			if (i == 127)
			{
				Buff[i] = '\0';
				str += Buff;
				// Reset i
				i = 0;
			}
			ch = in.get();
		}

		// If i is not 0 here, Buff still holds characters that have not been appended
		if (i != 0)
		{
			Buff[i] = '\0';
			str += Buff;
		}

		return in;
	}

Comparison operators

 operator<

 Strings are not compared by length but character by character according to their ASCII codes. For example, with str1 = "bb" and str2 = "aaa", str1 > str2, because 'b' is greater than 'a'.

This could be implemented directly with the C library function strcmp(), but the same problem as above appears: if the valid characters contain '\0', the result is wrong. So we should use memcmp().

However, memcmp() alone is still not enough. It needs a byte count, and when the two strings have different lengths (for example, one string is a prefix of the other), the lengths have to be handled separately.

 So we can implement the comparison ourselves, which is not difficult:

  • Walk both strings together. If the current characters differ, the string whose character has the larger ASCII value is the larger string; if they are equal, move on to the next position.
  • When one string runs out, there are two situations. If all characters were equal and the lengths are equal too, the two strings are equal. Otherwise the shorter string ended first while the longer one still has characters left, so the longer string is the larger one;

 Code:

		bool operator< (const string& str) const
		{
			//return strcmp(_str, str._str) < 0;

			size_t i1 = 0;
			size_t i2 = 0;

			while (i1 < _size && i2 < str._size)
			{
				if (_str[i1] < str._str[i2])
				{
					return true;
				}
				else if (_str[i1] > str._str[i2])
				{
					return false;
				}
				else
				{
					++i1;
					++i2;
				}
			}

			/*if (i1 == _size && i2 != str._size)
			{
				return true;
			}
			else
			{
				return false;
			}*/
			// or

			// return _size < str._size;

			// or

			return i1 == _size && i2 != str._size;
		}

The version that reuses the memcmp() function is as follows:

		bool operator< (const string& str) const
		{
			int Mybool = memcmp(_str, str._str, _size < str._size ? _size : str._size);

			return Mybool == 0 ? _size < str._size : Mybool < 0;
		}
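The same logic can be exercised outside the class. In this sketch, less_than() is a hypothetical free function that takes the two character arrays and their lengths explicitly; it compares the common prefix with memcmp() and falls back to the lengths when the prefix is equal.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical free-function version of the memcmp-based operator<.
bool less_than(const char* a, std::size_t a_size,
               const char* b, std::size_t b_size)
{
    std::size_t common = a_size < b_size ? a_size : b_size;
    int r = std::memcmp(a, b, common);
    // Equal prefix: the shorter string is the smaller one.
    return r == 0 ? a_size < b_size : r < 0;
}

// Small self-check; call it from main() to run the example.
void check_less_than()
{
    assert(less_than("hello", 5, "helloxx", 7));    // prefix is smaller
    assert(!less_than("helloxx", 7, "hello", 5));
    assert(!less_than("hello", 5, "hello", 5));     // equal strings
    assert(less_than("aaa", 3, "bb", 2));           // content beats length
    assert(less_than("a\0b", 3, "a\0c", 3));        // embedded '\0' is compared too
}
```

The last case is the one strcmp() would get wrong: it would stop at the embedded '\0' and report the two strings as equal.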

operator== / <= / > / >= / !=

 Once operator< and operator== are written, the remaining operators are simple and can reuse them directly:

		bool operator== (const string& str) const
		{
			return _size == str._size &&
				memcmp(_str, str._str, _size) == 0;
		}

		bool operator<= (const string& str) const
		{
			return *this < str || *this == str;
		}

		bool operator> (const string& str) const
		{
			return !(*this <= str);
		}

		bool operator>= (const string& str) const
		{
			return !(*this < str);
		}

		bool operator!= (const string& str) const
		{
			return !(*this == str);
		}

operator=

 There are two kinds of copy, shallow copy and deep copy:

  • A shallow copy is a member-by-member value copy. For plain built-in values this is fine. But if a pointer to allocated memory is shallow-copied, the pointer simply ends up pointing at the other object's memory: the memory it originally pointed to can no longer be reached, which is a memory leak, and both objects later try to free the same block;
  • A deep copy, for the same example, allocates a new block of the same size as the source block, copies the contents of the source block into it, and then makes the pointer point to the new block.
  • That description covers only the case where the source is larger than the destination. There are two other cases. If the source is smaller than the destination, the data could be copied directly, but to avoid wasting space we would still allocate a smaller block and copy into it. If the two are the same size, the data can be copied directly.

 To keep things simple, we handle all cases the same way: allocate a new block, copy into it, and release the original block.

		string& operator= (const string& str)
		{
			if (this != &str)
			{
				char* tmp = new char[str._capacity + 1];
				memcpy(tmp, str._str , str._size + 1);
				delete[] _str;
				_str = tmp;

				_size = str._size;
				_capacity = str._capacity;
			}

			return *this;
		}

The above is the deep-copy assignment operator.

There is an even better way to write it. Look at the following code first:

string& operator= (const string& str)
		{
			if (this != &str)
			{
				string tmp(str);

				std::swap(_str, tmp._str);
				std::swap(_size, tmp._size);
				std::swap(_capacity, tmp._capacity);
			}
			return *this;
		}

Here we create tmp, a new object that is a copy of str (by calling the copy constructor), and then swap all members of tmp and *this, including the array pointers. This achieves the deep copy. It is a neat trick:

 Say we execute s1 = s3. tmp allocates new space holding a copy of s3. Since tmp only lives inside this function and its destructor will release whatever it owns when the function returns, s1 hands its old space to tmp and takes tmp's newly allocated copy of s3 in exchange. When tmp is destroyed, it releases s1's original space on s1's behalf.

 Note: The assignment operator cannot be written as follows:

  string& operator= (const string& str)
    {
        string tmp(str);
        std::swap(tmp , *this);

        return *this;
    }

Written this way, the code recurses infinitely.

 When std::swap() is called on two objects, it assigns one to the other internally, which calls our operator=.

 So the code keeps jumping back and forth between swap and operator=, resulting in infinite recursion.

 To use swap in this way, we need to implement our own swap() function.

The swap for the string class above is implemented as follows:

class string
{
		// ...

		void swap(string& str)
		{
			std::swap(_str, str._str);
			std::swap(_size, str._size);
			std::swap(_capacity, str._capacity);
		}

		// ...
};

Based on the swap() function implemented above, a first attempt at simplifying operator= might look like this:

		string& operator= (string& str)
		{
			swap(str);

			return *this;
		}

But this simply exchanges all members of the two string objects, so the right-hand side is modified as well. Taking the parameter by value fixes that:

		string& operator= (string str)
		{
			swap(str);

			return *this;
		}

This works like the tmp version before. The parameter is passed by value, so str is a temporary copy of the right-hand side. It is a local variable whose lifetime ends when the function returns, so after the swap str releases the original space of s1 (*this) on its behalf.
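The whole idiom fits in a small stand-alone class. The following is a sketch, not the My-string class itself: Buf is a hypothetical minimal type with only a pointer and a size, written just to show the by-value operator= together with a member swap().

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <utility>

// Hypothetical minimal buffer class, just enough to show copy-and-swap.
class Buf
{
public:
    Buf(const char* s = "")
        : _str(new char[std::strlen(s) + 1]),
          _size(std::strlen(s))
    {
        std::memcpy(_str, s, _size + 1);
    }

    Buf(const Buf& other)
        : _str(new char[other._size + 1]),
          _size(other._size)
    {
        std::memcpy(_str, other._str, _size + 1);
    }

    // Pass by value: 'other' is already a copy of the right-hand side.
    Buf& operator=(Buf other)
    {
        swap(other);     // take the copy's buffer, hand over our old one
        return *this;    // 'other' is destroyed here and frees the old buffer
    }

    ~Buf() { delete[] _str; }

    void swap(Buf& other)
    {
        std::swap(_str, other._str);
        std::swap(_size, other._size);
    }

    const char* c_str() const { return _str; }
    std::size_t size() const { return _size; }

private:
    char* _str;          // declared first, initialized first
    std::size_t _size;
};

// Small self-check; call it from main() to run the example.
void check_copy_and_swap()
{
    Buf a("hello");
    Buf b("hi");
    a = b;
    assert(std::strcmp(a.c_str(), "hi") == 0);
    assert(std::strcmp(b.c_str(), "hi") == 0);   // source is unchanged
    assert(a.c_str() != b.c_str());              // separate buffers: deep copy
}
```

A side benefit of this form is that no self-assignment check is needed: assigning an object to itself just swaps with a temporary copy.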

 Optimizing the copy constructor in the same way

The same technique can be applied to the copy constructor, again letting a temporary object release the space for us:

		string(const string& s)
		{
			string tmp(s._str);
			// Let tmp release whatever *this held before the swap
			swap(tmp);
		}

  This version of the copy constructor still has a problem:

The code may crash at run time. The compiler is not required to initialize built-in member types automatically. Some environments happen to zero them, but not all do, so we cannot rely on it and must initialize the members ourselves.

Here the members of *this (_str, _size and _capacity) are not initialized before the swap, so they hold indeterminate values. After the swap, tmp holds those values, and when the function ends tmp's destructor calls delete on an invalid pointer, which is where the problem arises.

So we should write it like this:

		string(const string& s)
			:_str(nullptr)
			,_size(0)
			,_capacity(0)
		{
			string tmp(s._str);
			// Let tmp release whatever *this held before the swap
			swap(tmp);
		}

 Note that tmp is constructed directly from s._str, so there is a problem with strings such as:

"hello\0worle"

 For this string, only hello is copied. If such strings need to be supported, use the earlier copy constructor instead.
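The truncation can be reproduced with the official std::string. In this sketch the two helper functions are hypothetical: one copies by re-reading the source as a C string, as string tmp(s._str) does, and the other copies using the stored size.

```cpp
#include <cassert>
#include <string>

// Copies by re-reading the source as a C string: stops at the first '\0'.
std::string copy_via_c_str(const std::string& s)
{
    return std::string(s.c_str());
}

// Copies using the stored size: embedded '\0' characters survive.
std::string copy_with_size(const std::string& s)
{
    return std::string(s.data(), s.size());
}

// Small self-check; call it from main() to run the example.
void check_copy_truncation()
{
    std::string s("hello\0world", 11);
    assert(copy_via_c_str(s).size() == 5);    // only "hello"
    assert(copy_with_size(s) == s);           // all 11 characters
}
```

A size-aware copy is what the memcpy()-based copy constructor does, which is why it handles this case correctly.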

 The complete code of the string class

#pragma once
#include <assert.h>
#include <string.h>   // strlen, memcpy, strstr
#include <iostream>   // ostream, istream

namespace string_begin
{
	class string
	{

	public:
		typedef char* iterator;
		typedef const char* const_iterator;

		iterator begin()
		{
			return _str;
		}

		iterator end()
		{
			return _str + _size;
		}

		const_iterator begin() const
		{
			return _str;
		}

		const_iterator end() const
		{
			return _str + _size;
		}
		// Constructors
		//string()
		//	:_str(new char[1]),
		//	_size(0),
		//	_capacity(0)
		//{
		//	_str[0] = '\0';
		//}

		//string(const char* str)
		//	: _str(new char[strlen(str) + 1]),
		//	_size(strlen(str)),
		//	_capacity(strlen(str))
		//{
		//	strcpy(_str, str);
		//}
		
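		//string(const char* str = '\0') // wrong: '\0' is a char, not a const char*
		//string(const char* str = nullptr) // wrong: strlen(nullptr) is undefined behavior
		//string(const char* str = "\0") // works, but unnecessary; the version below is better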
		string(const char* str = "")
			: _str(new char[strlen(str) + 1]),
			_size(strlen(str)),
			_capacity(strlen(str))
		{
			memcpy(_str, str, _size + 1);
		}

		// Copy constructor (deep copy)
		//string(const string& s)
		//{
		//	_str = new char[s._capacity + 1];
		//	//strcpy(_str, s._str);
		//	memcpy(_str, s._str, s._size + 1);
		//	_capacity = s._capacity;
		//	_size = s._size;
		//}

		string(const string& s)
			:_str(nullptr)
			,_size(0)
			,_capacity(0)
		{
			string tmp(s._str);
			// Swap with tmp; tmp's destructor then releases whatever *this held before
			swap(tmp);
		}

		// Destructor
		~string()
		{
			// _str was allocated with new[], so it must be released with delete[]
			delete[] _str;
			_str = nullptr;
			_size = _capacity = 0;
		}

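		// Return the C-style string
		const char* c_str() const
		{
			return _str;
		}

		// Return the number of valid characters
		size_t size() const
		{
			return _size;
		}

		// operator[] returning a reference.
		// Two overloads are needed: one for non-const objects, one for const objects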
		char& operator[](size_t pos)
		{
			assert(pos < _size);

			return _str[pos];
		}

		const char& operator[](size_t pos) const
		{
			assert(pos < _size);

			return _str[pos];
		}

		// Grow the capacity to at least n
		void reserve(size_t n)
		{
			if (n > _capacity)
			{
				char* tmp = new char[n + 1];
				memcpy(tmp, _str, _size + 1);
				delete[] _str;
				_str = tmp;
				_capacity = n; 
			}
		}

		// Append
		void push_back(char ch)
		{
			// If the buffer is full, grow it first
			if (_size >= _capacity)
			{
				// Double the capacity
				reserve(_capacity == 0 ? 4 : _capacity * 2);
			}
			_str[_size] = ch;
			_size++;
			_str[_size] = '\0';
		}

		void append(const char* str)
		{
			size_t len = strlen(str);
			if (len + _size > _capacity)
			{
				// Grow
				reserve(len + _size);
			}
			//strcpy(_str + _size, str);
			memcpy(_str + _size, str, len + 1);
			_size += len;
		}

		// += appends a character
		string& operator+=(char ch)
		{
			push_back(ch);
			return *this;
		}

		string& operator+=(const char* str)
		{
			append(str);
			return *this;
		}

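		// Insert n copies of ch at position pos
		void insert(size_t pos, size_t n, char ch)
		{
			assert(pos <= _size);

			if (_size + n >= _capacity)
			{
				// Grow
				reserve(_size + n);
			}

			// Shift the tail (including '\0') back by n.
			// end is unsigned: when pos is 0, --end wraps to npos, which stops the loop
			size_t end = _size;
			while (pos <= end && end != npos)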
			{
				_str[end + n] = _str[end];
				--end;
			}

			// Fill in the new characters
			for (size_t i = 0; i < n; i++)
			{
				_str[pos + i] = ch;
			}

			_size += n;
		}

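		// Insert a C string at position pos
		void insert(size_t pos, const char* str)
		{
			assert(pos <= _size);
			size_t len = strlen(str);

			if (_size + len >= _capacity)
			{
				// Grow
				reserve(_size + len);
			}

			// Shift the tail (including '\0') back by len.
			// end is unsigned: when pos is 0, --end wraps to npos, which stops the loop
			size_t end = _size;
			while (pos <= end && end != npos)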
			{
				_str[end + len] = _str[end];
				--end;
			}

			// Fill in the new characters
			for (size_t i = 0; i < len; i++)
			{
				_str[pos + i] = *(str + i);
			}

			_size += len;
		}

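		// Erase len characters starting at pos (to the end by default)
		void erase(size_t pos, size_t len = npos)
		{
			assert(pos <= _size);

			// Compare against _size - pos so that pos + len cannot overflow
			if (len == npos || len >= _size - pos)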
			{
				_str[pos] = '\0';
				_size = pos;
			}
			else
			{
				size_t end = pos + len;
				while (end <= _size)
				{
					_str[pos++] = _str[end++];
				}
				_size -= len;
			}
		}

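		// Find a character, searching forward from pos
		size_t find(char ch, size_t pos = 0) const
		{
			assert(pos <= _size);
			// Stop before _size so the terminating '\0' is never reported as a match
			for (size_t i = pos; i < _size; i++)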
			{
				if (_str[i] == ch)
				{
					return i;
				}
			}

			return npos;
		}

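		// Find a substring, searching forward from pos
		size_t find(const char* str, size_t pos = 0) const
		{
			assert(pos <= _size);
			// Start the search at pos, not at the beginning of the buffer
			const char* ptr = strstr(_str + pos, str);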
			if (ptr)
			{
				return ptr - _str;
			}
			else
			{
				return npos;
			}
		}

		// Return a new string holding up to len characters starting at pos
		string substr(size_t pos, size_t len = npos) const
		{
			assert(pos < _size);

			size_t n = len;
			if (len == npos || len + pos >= _size)
			{
				n = _size - pos;
			}

			string tmp;
			tmp.reserve(n);
			for (size_t i = pos; i < pos + n; i++)
			{
				tmp += _str[i];
			}

			return tmp;
		}

		// resize: truncate, or grow and pad with ch
		void resize(size_t n, char ch = '\0')
		{
			if (n < _size)
			{
				_size = n;
				_str[_size] = '\0';
			}
			else
			{
				// Make sure there is enough capacity first
				reserve(n);

				for (size_t i = _size; i < n; i++)
				{
					_str[i] = ch;
				}
				_size = n;
				_str[_size] = '\0';
			}
		}

		// Remove all valid characters
		void clear()
		{
			_str[0] = '\0';
			_size = 0;
		}

		// Compare two strings
		//int operator< (const string& str) const
		//{
		//	//return strcmp(_str, str._str) < 0;

		//	size_t i1 = 0;
		//	size_t i2 = 0;

		//	while (i1 < _size && i2 < str._size)
		//	{
		//		if (_str[i1] < str._str[i2])
		//		{
		//			return true;
		//		}
		//		else if (_str[i1] > str._str[i2])
		//		{
		//			return false;
		//		}
		//		else
		//		{
		//			++i1;
		//			++i2;
		//		}
		//	}

		//	/*if (i1 == _size && i2 != str._size)
		//	{
		//		return true;
		//	}
		//	else
		//	{
		//		return false;
		//	}*/
		//	// or
		//	
		//	// return _size < str._size;

		//	// or

		//	return i1 == _size && i2 != str._size;
		//}

		// operator< built on memcmp
		bool operator< (const string& str) const
		{
			// Compare the common prefix first
			int cmp = memcmp(_str, str._str, _size < str._size ? _size : str._size);

			// Equal prefix: the shorter string is the smaller one
			return cmp == 0 ? _size < str._size : cmp < 0;
		}

		bool operator== (const string& str) const
		{
			return _size == str._size &&
				memcmp(_str, str._str, _size) == 0;
		}

		bool operator<= (const string& str) const
		{
			return *this < str || *this == str;
		}

		bool operator> (const string& str) const
		{
			return !(*this <= str);
		}

		bool operator>= (const string& str) const
		{
			return !(*this < str);
		}

		bool operator!= (const string& str) const
		{
			return !(*this == str);
		}

		void swap(string& str)
		{
			std::swap(_str, str._str);
			std::swap(_size, str._size);
			std::swap(_capacity, str._capacity);
		}

		// Copy assignment operator
		//string& operator= (string& str)
		//{
		//	if (this != &str)
		//	{
		//		char* tmp = new char[str._capacity + 1];
		//		memcpy(tmp, str._str , str._size + 1);
		//		delete[] _str;
		//		_str = tmp;

		//		_size = str._size;
		//		_capacity = str._capacity;
		//	}

		//	return *this;
		//}

		//string& operator= (const string& str)
		//{
		//	if (this != &str)
		//	{
		//		string tmp(str);

		//		//std::swap(_str, tmp._str);
		//		//std::swap(_size, tmp._size);
		//		//std::swap(_capacity, tmp._capacity);
		//		swap(tmp);

		//	}
		//	return *this;
		//}

		string& operator= (string str)
		{
			swap(str);

			return *this;
		}

	protected:
		char* _str;
		size_t _size;
		size_t _capacity;

		static const size_t npos;
	};
	const size_t string::npos = -1;

	std::ostream& operator<< (std::ostream& out, const string& str)
	{
		// Approach 1
		//for (int i = 0; i < str.size(); i++)
		//{
		//	out << str[i];
		//}

		// Approach 2
		for (auto ch : str)
		{
			out << ch;
		}

		return out;
	}

	std::istream& operator>> (std::istream& in, string& str)
	{
		str.clear();
		char ch = in.get();

		// Skip leading spaces and newlines; stop if the stream fails (e.g. at EOF)
		while (in && (ch == ' ' || ch == '\n'))
		{
			ch = in.get();
		}

		char Buff[128];
		int i = 0;

		while (in && ch != ' ' && ch != '\n')
		{
			Buff[i++] = ch;

			if (i == 127)
			{
				Buff[i] = '\0';
				str += Buff;
				// Reset i
				i = 0;
			}
			ch = in.get();
		}

		// If i is not 0, Buff still holds characters that have not been appended yet
		if (i != 0)
		{
			Buff[i] = '\0';
			str += Buff;
		}

		return in;
	}
}


 copy-on-write (delayed copy)

We have implemented and explained both deep copy and shallow copy above. Suppose there is an object s1, and s2 is copy-constructed from s1. In the implementation above, the copy constructor makes a deep copy. But suppose s2 is hardly used: we just want a quick copy, and s2 is expected to be destroyed around the same time as s1. Why not use a shallow copy in that case?

The reason is simple: a shallow copy has several problems. As described above, a shallow copy only copies the member values, so the _str pointers of the two objects point to the same space. When both objects are destructed, that space is freed twice; and if one object modifies the array, the other object is affected as well.

To solve this, copy-on-write was invented.

Copy-on-write: add a reference count that records how many objects currently share a piece of space. For s1 and s2 above, the reference count of the space they both point to is 2; before s2 was copy-constructed, it was 1.

The reference count tracks how many objects manage the space. When an object is about to be destructed, it first decrements the count (--). If the count is not 0 afterwards, the space is not freed. When the last object is destructed, the count drops to 0 and the space is released. Put simply: the last person to leave turns off the light.

That solves the double free. The other problem, a modification affecting several objects, is also handled through the reference count.

When an object wants to modify the space, it first checks whether the reference count is 1. If it is, the space belongs exclusively to this object and can be modified directly. If it is not, the space is shared, so the object makes a deep copy on the heap, modifies the copy, and decrements the reference count of the original space by 1.

The newly deep-copied space also has its own reference count.
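The rules above can be sketched in a small class. This is only a minimal illustration under stated assumptions: the class name cow_string and the members use_count, set, release and detach are hypothetical, the code is not thread-safe, and it is not how any particular standard library implements its string.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <utility>

// Minimal copy-on-write sketch (hypothetical names, not thread-safe).
class cow_string
{
public:
	cow_string(const char* str = "")
		: _size(std::strlen(str))
		, _str(new char[_size + 1])
		, _count(new std::size_t(1))   // this object is the only owner so far
	{
		std::memcpy(_str, str, _size + 1);
	}

	// Copying only shares the buffer and increments the reference count.
	cow_string(const cow_string& s)
		: _size(s._size), _str(s._str), _count(s._count)
	{
		++*_count;
	}

	cow_string& operator=(cow_string s)   // copy-and-swap
	{
		swap(s);
		return *this;
	}

	// The last owner to leave frees the space.
	~cow_string() { release(); }

	const char* c_str() const { return _str; }          // reading never copies
	std::size_t use_count() const { return *_count; }

	// Writing: detach from the shared buffer first if necessary.
	void set(std::size_t pos, char ch)
	{
		assert(pos < _size);
		detach();
		_str[pos] = ch;
	}

	void swap(cow_string& s)
	{
		std::swap(_size, s._size);
		std::swap(_str, s._str);
		std::swap(_count, s._count);
	}

private:
	void release()
	{
		if (--*_count == 0)
		{
			delete[] _str;
			delete _count;
		}
	}

	void detach()
	{
		if (*_count == 1) return;          // exclusive owner: modify in place
		char* copy = new char[_size + 1];  // shared: deep copy first
		std::memcpy(copy, _str, _size + 1);
		--*_count;                         // leave the shared buffer
		_str = copy;
		_count = new std::size_t(1);       // the new buffer has its own count
	}

	std::size_t _size;
	char* _str;
	std::size_t* _count;
};
```

With this sketch, copy-constructing s2 from s1 leaves both objects pointing at the same buffer with a count of 2. The first call to s2.set(...) gives s2 its own buffer, and both counts become 1.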

The copy-on-write scheme above was used by the string in g++ (older versions of its standard library; since GCC 5 the default std::string no longer uses it). Not every compiler ships a copy-on-write string, though. VS2019 does not use copy-on-write, because the C++ standard does not require it.

VS2019 makes a different optimization. When the official string stores a relatively short string, the characters are kept in a buf array inside the object and no space is allocated on the heap: the string is considered small enough that a heap allocation is not worth it. Only when the string is relatively long is space allocated on the heap.

This improves efficiency for small strings, but when a large string is stored, the space of the buf array is wasted.
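One way to observe this behavior is to check whether the character data of a std::string lies inside the string object itself. The helper below is a hypothetical sketch, not a standard facility; the threshold at which a string moves to the heap, and whether a small buffer exists at all, depends on the standard library in use.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>

// Hypothetical helper: returns true if the characters of s are stored inside
// the std::string object itself (a small internal buffer) rather than in a
// separate heap allocation.
bool stored_inline(const std::string& s)
{
	const char* obj_begin = reinterpret_cast<const char*>(&s);
	const char* obj_end = obj_begin + sizeof(std::string);
	std::less<const char*> before;   // well-defined order for unrelated pointers
	return !before(s.data(), obj_begin) && before(s.data(), obj_end);
}
```

On a library with a small internal buffer, stored_inline is expected to return true for a short string such as "hi". A string of 1000 characters cannot fit inside the object, so it is stored on the heap.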

Further reading on the drawbacks of copy-on-write:

https://coolshell.cn/articles/1443.html

Good articles on the string class:

What happened to the STL's string class? (haoel's blog, CSDN): https://blog.csdn.net/haoel/article/details/1491219

A correct way of writing a string class in a C++ interview (CoolShell): https://coolshell.cn/articles/10478.html

Origin blog.csdn.net/chihiro1122/article/details/131689912