C++ Basics: Mastering the string Class (Simulated Implementation)

1. Why simulate the implementation of string

Simulating the implementation of std::string is a challenging exercise that pays dividends in many ways, especially for learning C++ and gaining insight into the mechanics of string manipulation and dynamic memory management. Here are some benefits and significance of emulating the implementation of std::string:

  1. Learn C++ memory management: std::string is a container that dynamically allocates memory, and a simulated implementation has to handle memory allocation and release manually. This gives you a deeper understanding of how dynamic memory management works, how to use the new and delete operators correctly, and how to avoid memory leaks and dangling pointers.
  2. Practice string operations: In the simulated implementation you need to implement operations such as string concatenation, insertion, deletion, and search, as well as other string-processing functions. This helps you become familiar with how strings are manipulated and handled in C++.
  3. In-depth understanding of classes and objects: std::string is a class type (an alias for the class template specialization std::basic_string<char>), and simulating it requires a solid grasp of classes and objects, including constructors, destructors, member functions, and member variables. By implementing a class like std::string, you can better understand how a class is designed and used.
  4. Improve programming skills: Simulating std::string is a challenging task that exercises your programming skills and makes you more proficient with C++ syntax and features.
  5. In-depth learning of template programming: Because std::string is built on the class template std::basic_string, studying and simulating it is a good entry point to the mechanisms and techniques of template programming.
  6. Implementing a custom container: std::string is a container class in the C++ standard library, and simulating it is an exercise in implementing a custom container, which helps you better understand container design and implementation.

2. What to pay attention to when simulating string

Simulating the std::string class is challenging because std::string is a complex type in the C++ standard library: it has many functions and features, and its implementation involves dynamic memory management, string operations, copy semantics, and more. Here are some key issues to keep in mind when writing a simulated implementation:

  1. Memory management: std::string is a container that dynamically allocates memory, and the simulated implementation needs to handle memory allocation and release correctly. You can use dynamic arrays, pointers, or other data structures to manage the memory.
  2. String operations: The simulated implementation needs to support operations such as string concatenation, insertion, deletion, and search, as well as other string-processing functions (such as size(), substr(), and find()).
  3. Exception handling: std::string may throw exceptions in some cases, such as memory allocation failure or out-of-bounds access. A simulated implementation needs to consider how to handle these cases to keep the program stable and safe.
  4. Memory copy: std::string uses deep copy semantics, that is, the content of the entire string is copied when the object is copied. A simulated implementation needs to copy memory correctly to avoid dangling pointers and resource leaks.
  5. Iterator support: std::string supports iterators for accessing the contents of the string, and the simulated implementation needs to provide corresponding iterator support.
  6. Performance optimization: Standard implementations of std::string are usually optimized, for example by using a growth strategy that reduces the number of memory allocations. A simulated implementation can adopt similar strategies to improve performance.
  7. Boundary conditions: Pay special attention to boundary conditions and special cases to ensure the implementation is correct and robust.
  8. Completeness: std::string is a very complex type, and its functions and interfaces should be implemented as completely as is practical.

While simulating the implementation of std::string is a complex task, it's also a good learning exercise to deepen your understanding of C++ memory management, string handling, and more.

3. Classic string problems

The previous article gave a brief introduction to the string class, which is enough for everyday use. In interviews, however, candidates are often asked to implement a string class themselves. The essential parts are the constructor, copy constructor, assignment operator overload, and destructor. Is there anything wrong with the following implementation of a String class?

// Named String here to distinguish it from the standard library's string.
// Requires <cassert> and <cstring>.
class String
{
public:
	/*
	String()
		: _str(new char[1])
	{
		*_str = '\0';
	}
	*/
	// String(const char* str = "\0")    // bad example: redundant, "" already ends with '\0'
	// String(const char* str = nullptr) // bad example: strlen(nullptr) is undefined behavior
	String(const char* str = "")
	{
		// Passing nullptr when constructing a String is treated as a program error.
		if (nullptr == str)
		{
			assert(false);
			return;
		}
		_str = new char[strlen(str) + 1];
		strcpy(_str, str);
	}
	~String()
	{
		if (_str)
		{
			delete[] _str;
			_str = nullptr;
		}
	}
private:
	char* _str;
};

// Test
void TestString()
{
	String s1("hello bit!!!");
	String s2(s1);
}

The String class above does not explicitly define a copy constructor or an assignment operator overload, so the compiler synthesizes default ones. When s1 is used to construct s2, the compiler-generated copy constructor is called. As a result, s1 and s2 point to the same memory, and that memory is released twice when the two objects are destroyed, which typically crashes the program. This way of copying is called shallow copy.

What is a shallow copy

Shallow copy: Also known as bitwise copy. The compiler just copies the values of the object's members. If the object manages a resource, multiple objects end up sharing the same resource. When one object is destroyed, the resource is released; the other objects do not know this and still treat it as valid, so an access violation occurs when they continue to operate on it. The shallow copy problem can be solved with deep copy, in which each object has its own independent resource that is not shared with other objects.

What is deep copy

Deep copy means that when an object is copied, not only the member variables of the object itself are copied, but also the dynamically allocated resources (such as heap memory) pointed to by the object are copied to the new object. This means that the copied object and the original object have independent copies of resources and will not affect each other.

When an object holds dynamically allocated resources, such as memory blocks pointed to by pointers or other resources (file handles, network connections, etc.), deep copying is essential to prevent multiple objects from sharing the same resource, which leads to problems such as double release and dangling pointers.

If resource management is involved in a class, its copy constructor, assignment operator overload, and destructor must be explicitly given. In general, it is provided in the form of deep copy.
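
As a minimal sketch of this rule, the hypothetical `Buffer` class below (the name and its members are illustrative and do not come from the article) owns a heap array and explicitly defines its copy constructor, assignment operator, and destructor with deep-copy semantics. The helper `copies_are_independent` is likewise an assumption added only to make the effect observable.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical resource-owning class used only for illustration.
class Buffer
{
public:
    explicit Buffer(const char* text)
        : _data(new char[std::strlen(text) + 1])
    {
        std::strcpy(_data, text);
    }

    // Deep copy: allocate a separate array and copy the characters.
    Buffer(const Buffer& other)
        : _data(new char[std::strlen(other._data) + 1])
    {
        std::strcpy(_data, other._data);
    }

    // Deep-copy assignment: build the new array first, then release the old one.
    Buffer& operator=(const Buffer& other)
    {
        if (this != &other)
        {
            char* fresh = new char[std::strlen(other._data) + 1];
            std::strcpy(fresh, other._data);
            delete[] _data;
            _data = fresh;
        }
        return *this;
    }

    ~Buffer() { delete[] _data; }

    char* data() { return _data; }
    const char* data() const { return _data; }

private:
    char* _data;
};

// Returns true when a copy owns its own storage and is unaffected
// by later changes to the original.
inline bool copies_are_independent()
{
    Buffer a("hello");
    Buffer b(a);          // copy constructor
    a.data()[0] = 'J';    // modify the original only
    return a.data() != b.data()
        && std::strcmp(a.data(), "Jello") == 0
        && std::strcmp(b.data(), "hello") == 0;
}
```

Because each object deletes only its own array, destroying `a` and `b` releases two different blocks, which is exactly what the shallow-copied String above fails to guarantee.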

4. Copy-on-write

"Copy on Write" (COW for short) is an optimization technique that is commonly applied in operating system memory management and in data structures to save memory and improve performance. In COW, when multiple objects share the same resource, the actual copy is performed only when one object tries to modify the resource; until then, all objects share the same original resource. This avoids copying the entire resource before any modification happens, saving memory and execution time.

The most common applications of COW are process management and memory allocation in operating systems. When a process forks (copies) itself, the COW mechanism is usually used. After fork, the child process shares the same memory, that is, the same physical page frames, with the parent process. Only when either the child or the parent attempts to modify a page does the operating system perform the actual copy, copying the contents of that page frame into a new page frame so that the two processes' memory becomes independent. In this way, the parent and child can share most of their resources without large-scale memory copying, which makes fork more efficient.

In programming languages and data structures, copy-on-write can also be used to optimize copy operations. For example, in some container classes (such as strings, arrays, and vectors), when multiple objects share the same data, the actual copy is performed only when one of the objects tries to modify the data, ensuring that the objects' data remain independent of each other.

It should be noted that COW is not a general optimization that applies to all situations; its effectiveness depends on the specific scenario. In some cases COW brings significant performance benefits, but in others it adds complexity and overhead. Therefore, it is necessary to weigh the pros and cons carefully and choose an optimization strategy that fits the actual needs.

In fact, copy-on-write is a form of deferred (lazy) copying, implemented by adding reference counting on top of shallow copy.

Reference count: records the number of users of a resource. When the resource is constructed, its count is set to 1. Each time another object starts using the resource, the count is increased by 1. When an object is destroyed, the count is first decremented by 1 and then checked: if the count has reached 0, the object was the last user of the resource and the resource is released; otherwise it must not be released, because other objects are still using it.

A common example is copying a string.

In many programming languages, strings are immutable, that is, once created, their contents cannot be modified. In this case, when multiple variables or objects refer to the same string and one of them tries to modify its contents, a new string object has to be created instead of modifying the original string in place.

Suppose there are two variables str1 and str2 both pointing to the same string "Hello":

std::string str1 = "Hello";
std::string str2 = str1;

Here, str2 is created from str1 via the copy constructor. With a traditional deep copy, this copies the entire string "Hello": str1 and str2 point to different memory addresses, but their contents are the same.

Copy-on-write, however, can optimize this case. With copy-on-write, a new copy of the string is not created immediately when str2 is copied from str1. Instead, str2 and str1 share the same underlying string data. The actual copy is triggered only when one of the strings tries to modify its content.

For example, if the modification operation is performed on str2 now:

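str2[0] = 'h'; // change the first character to lowercase 'h'

Under the copy-on-write mechanism, a new string "hello" is created and str2 is pointed at it, while the content of str1 remains unchanged. From this point on, str1 and str2 no longer share the same underlying data, and their contents differ. (This describes the technique in general. Since C++11, the requirements on std::string effectively rule out a copy-on-write implementation, so current standard libraries copy eagerly.)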

Copy-on-write can effectively save memory, especially in the case of long-term shared strings, avoiding unnecessary memory copying. But in other cases, there may be added complexity and overhead. Therefore, it is necessary to carefully weigh the pros and cons during implementation, and choose an appropriate optimization strategy according to actual needs.
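
The reference-counting scheme described in this section can be sketched as follows. `CowString`, `set`, `at`, and `use_count` are hypothetical names chosen for this illustration; the class is not thread-safe and is a minimal sketch under those assumptions, not how any particular library implements it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical copy-on-write string: a shallow copy plus a shared reference count.
// Not thread-safe; for illustration only.
class CowString
{
public:
    explicit CowString(const char* text = "")
        : _str(new char[std::strlen(text) + 1])
        , _count(new std::size_t(1))   // the creator is the first user
    {
        std::strcpy(_str, text);
    }

    // Copying shares the buffer and increments the count.
    CowString(const CowString& other)
        : _str(other._str)
        , _count(other._count)
    {
        ++*_count;
    }

    CowString& operator=(const CowString& other)
    {
        if (_str != other._str)   // also covers self-assignment
        {
            release();
            _str = other._str;
            _count = other._count;
            ++*_count;
        }
        return *this;
    }

    ~CowString() { release(); }

    // Read access never copies.
    char at(std::size_t pos) const { return _str[pos]; }

    // Write access detaches from the shared buffer first.
    void set(std::size_t pos, char ch)
    {
        detach();
        _str[pos] = ch;
    }

    const char* c_str() const { return _str; }
    std::size_t use_count() const { return *_count; }

private:
    // Decrement the count; the last user frees the resource.
    void release()
    {
        if (--*_count == 0)
        {
            delete[] _str;
            delete _count;
        }
    }

    // If the buffer is shared, make a private copy before modifying it.
    void detach()
    {
        if (*_count > 1)
        {
            char* copy = new char[std::strlen(_str) + 1];
            std::strcpy(copy, _str);
            --*_count;                     // leave the shared buffer
            _str = copy;
            _count = new std::size_t(1);   // start a fresh count
        }
    }

    char* _str;
    std::size_t* _count;
};
```

With this sketch, copying a `CowString` only bumps the counter, and the first call to `set` on a shared object is the point where the deferred deep copy actually happens.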

5. The traditional version of the String class (reference)

class String
{
public:
	String(const char* str = "")
	{
		// Passing nullptr when constructing a String is treated as a program error.
		if (nullptr == str)
		{
			assert(false);
			return;
		}
		_str = new char[strlen(str) + 1];
		strcpy(_str, str);
	}
	// Copy constructor: deep copy into a newly allocated buffer.
	String(const String& s)
		: _str(new char[strlen(s._str) + 1])
	{
		strcpy(_str, s._str);
	}
	String& operator=(const String& s)
	{
		if (this != &s)
		{
			// Allocate and copy first, then release the old buffer.
			char* pStr = new char[strlen(s._str) + 1];
			strcpy(pStr, s._str);
			delete[] _str;
			_str = pStr;
		}
		return *this;
	}
	~String()
	{
		if (_str)
		{
			delete[] _str;
			_str = nullptr;
		}
	}
private:
	char* _str;
};

6. The modern version of the String class (reference)

class String
{
public:
	String(const char* str = "")
	{
		if (nullptr == str)
		{
			assert(false);
			return;
		}
		_str = new char[strlen(str) + 1];
		strcpy(_str, str);
	}
	String(const String& s)
		: _str(nullptr)
	{
		// Let the constructor make the deep copy, then take over its buffer.
		String strTmp(s._str);
		std::swap(_str, strTmp._str); // std::swap requires <utility>
	}

	// The by-value parameter s is already a copy of the right-hand side.
	String& operator=(String s)
	{
		std::swap(_str, s._str);
		return *this;
	}
	/*
	String& operator=(const String& s)
	{
		if (this != &s)
		{
			String strTmp(s);
			std::swap(_str, strTmp._str);
		}
		return *this;
	}
	*/
	~String()
	{
		if (_str)
		{
			delete[] _str;
			_str = nullptr;
		}
	}
private:
	char* _str;
};

7. Simulation implementation of string class (explain)

Based on the material above, we can implement the framework of the string class and most of its interface functions. In an actual interview we rarely need to implement that many, so here we simulate only the most commonly used parts.

7.1 Namespace and member variables of the string class

namespace mystring
{
	class string
	{
	public:
		//...

	private:
		size_t _capacity;
		size_t _size;
		char* _str;
	public:
		// Special case: a const static member of integral type
		// may be initialized directly inside the class.
		const static size_t npos = -1;
	};
}

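First, we define our own namespace so that our class does not conflict with the string class in the standard library; alternatively, the class could simply be given a different name. The class members are the capacity _capacity, the size _size, and the character array _str. npos is a public static constant initialized to -1, which, converted to the unsigned type size_t, is the largest value size_t can hold.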
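
To see why initializing npos with -1 yields the largest size_t value, here is a small sketch. `npos_demo` and `npos_is_max` are hypothetical names used only for this check; they are not part of the class above.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>

// Hypothetical stand-in for the class above, reduced to the npos member.
struct npos_demo
{
    // -1 converted to an unsigned type wraps around to the largest value.
    static const std::size_t npos = -1;
};

// Returns true when npos equals the largest value size_t can hold.
inline bool npos_is_max()
{
    return npos_demo::npos == std::numeric_limits<std::size_t>::max();
}
```

This is why npos works as a "not found" or "until the end" marker: no valid index into a string can be that large.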

7.2 String class constructor

string(const char* str = "")
{
	_size = strlen(str);
	_capacity = _size;
	_str = new char[_capacity + 1];

	strcpy(_str, str);
}

const char* str = "" gives the constructor parameter a default argument. A default argument supplies a value for a parameter in the function declaration; it is used when the caller does not pass that argument. The empty string literal "" still contains one character, the terminating '\0'. The constructor computes the length of the input string, allocates enough memory to store it (_size + 1 bytes, where _size is the length of the input string and the extra byte holds the '\0'), and then copies the C-style string into _str with strcpy.

7.3 String class copy constructor

Traditional writing:

string(const string& s)
	: _str(new char[s._capacity + 1])
	, _size(s._size)
	, _capacity(s._capacity)
{
	strcpy(_str, s._str);
}

Modern writing:

void swap(string& tmp)
{
	// std::swap requires <utility>
	std::swap(_str, tmp._str);
	std::swap(_size, tmp._size);
	std::swap(_capacity, tmp._capacity);
}

string(const string& s)
	: _str(nullptr)
	, _size(0)
	, _capacity(0)
{
	string tmp(s._str);
	swap(tmp); // this->swap(tmp);
}

In the first piece of code, the copy constructor uses the traditional deep copy. It first allocates a buffer as large as the source object's (s._capacity + 1 bytes, including the terminating null character) and then copies the contents of the source object into it.

This implementation ensures that the newly created object and the source object have independent memory, that is, they share no resources. When one object modifies its content, the other is not affected, which keeps the objects' data isolated.

The second piece of code uses the copy-and-swap idiom. It is often called the modern style, but it does not rely on the move semantics introduced by C++11. It first constructs a temporary object tmp from s._str, which is where the deep copy happens. Then the members of the current object and of tmp are exchanged by calling the member function swap(tmp).

After the swap, the current object owns the buffer that tmp allocated (a copy of the contents of s), and tmp holds the current object's original members (nullptr, 0, 0). This is why the members must be initialized in the initializer list: otherwise tmp would call delete[] on an indeterminate pointer when it is destroyed.

This implementation does not avoid the copy; one allocation and one character copy still happen inside the constructor of tmp. Its benefit is that the allocation and copy logic lives in a single place. When tmp is destroyed at the end of the function, delete[] on nullptr does nothing, and the source object s is never modified, so there is no memory leak and no double release.

Both implementations are valid copy constructors. The second is shorter and reuses the constructor, which is why it is commonly preferred. Note one difference: because tmp is built from s._str, the new object's capacity equals s._size rather than s._capacity.

7.4 String class assignment operator overloading

Traditional writing:

string& operator=(const string& s)
{
	if (this != &s)
	{
		string tmp(s);
		swap(tmp);
	}
	return *this;
}

Modern writing:

string& operator=(string s)
{
	swap(s);
	return *this;
}

In the first function, the assignment operator takes its argument by const reference. It first checks whether the target object and the source object are the same object (by comparing addresses); if they are, nothing is done, which handles self-assignment.

Then it creates a temporary string object tmp as a deep copy of the source object s. Next, it exchanges the members of the current object with those of tmp by calling swap(tmp).

After the swap, the current object owns the copy of the contents of s, and tmp holds the current object's original resources, which it releases automatically when it is destroyed. The copy itself is still performed; what the swap saves is releasing the old buffer by hand.

In the second function, the parameter is a string object s passed by value. This is still the copy-and-swap idiom rather than C++11 move semantics: the copy is made by the copy constructor when the argument is passed.

Inside the function, swap(s) exchanges the members of the current object with those of the parameter s. Since s is a copy of the argument, the current object takes over that copy's resources, and s, now holding the current object's original resources, is destroyed at the end of the function and releases them.

So the assignment is implemented with one copy plus one swap, and no explicit self-assignment check is needed, because s is always a separate object.

Summary of differences:

Parameter passing: the first function takes a const reference, while the second takes its parameter by value.
Where the copy happens: the first function copies explicitly inside the body (string tmp(s)), while the second lets the copy happen when the by-value parameter is initialized.

Both implement assignment correctly with a deep copy followed by a swap. The second is shorter, and if the class also defines a C++11 move constructor (the class here does not), a by-value parameter initialized from a temporary is move-constructed instead of copied, which saves the copy in that case. For this reason the second form is commonly preferred.
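
The following sketch illustrates the by-value assignment operator together with a move constructor. `MiniString` is a hypothetical, reduced class written for this illustration, and the move constructor is an addition that the article's class does not define; it is shown only to make clear where C++11 can actually save a copy.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <utility>

// Hypothetical reduced string used to illustrate copy-and-swap assignment.
class MiniString
{
public:
    MiniString(const char* str = "")
        : _str(new char[std::strlen(str) + 1])
        , _size(std::strlen(str))
    {
        std::strcpy(_str, str);
    }

    MiniString(const MiniString& s)
        : _str(new char[s._size + 1])
        , _size(s._size)
    {
        std::strcpy(_str, s._str);
    }

    // Assumption: a move constructor, which the article's class does not define.
    // It takes over the buffer and leaves the source empty but destructible.
    MiniString(MiniString&& s) noexcept
        : _str(s._str)
        , _size(s._size)
    {
        s._str = nullptr;
        s._size = 0;
    }

    // By-value parameter: copy-constructed from lvalues, move-constructed from rvalues.
    MiniString& operator=(MiniString s)
    {
        swap(s);
        return *this;   // s now holds the old buffer and frees it on return
    }

    ~MiniString() { delete[] _str; }

    void swap(MiniString& other)
    {
        std::swap(_str, other._str);
        std::swap(_size, other._size);
    }

    const char* c_str() const { return _str; }
    std::size_t size() const { return _size; }

private:
    char* _str;
    std::size_t _size;
};
```

Assigning from a named object (`a = b`) still performs one deep copy, while assigning from `std::move(b)` hands the existing buffer over without allocating. Self-assignment needs no special check because the parameter is always a distinct object.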

7.5 String class destructor and easy-to-implement member functions

~string()
{
	delete[] _str;
	_str = nullptr;
	_size = _capacity = 0;
}

The destructor here releases the dynamically allocated character array (string memory) pointed to by _str through the delete[] operation, then sets _str to nullptr, and sets _size and _capacity to 0. This ensures that the memory is properly released when the object is destroyed, preventing memory leaks.

const char* c_str() const
{
	return _str;
}

The c_str() function is used to return a pointer to a character array that stores a string.

size_t size() const
{
	return _size;
}

The size() function is used to return the size of the string, that is, the number of characters actually stored in the string, and the return type is size_t. The _size member variable here indicates the number of characters actually stored, so just return _size directly.

size_t capacity() const
{
	return _capacity;
}

The capacity() function is used to return the capacity of the string, that is, the size of the memory space currently allocated in the string, and the return type is size_t. The _capacity member variable here represents the current capacity, so just return _capacity directly.

const char& operator[](size_t pos) const
{
	assert(pos < _size);

	return _str[pos];
}

operator[](size_t pos) const is the const version of the subscript operator overload, used to access the character at position pos in the string. Its return type is const char&, a reference to a constant character, so the character cannot be modified through this reference. This guarantees that a const string object cannot be modified through operator[].

char& operator[](size_t pos)
{
	assert(pos < _size);

	return _str[pos];
}

operator[](size_t pos) is the non-const version of the subscript operator overload. It works like the const version above, but its return type is char&, a reference to a modifiable character, so the character can be modified through the returned reference.
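
How the compiler chooses between the two overloads can be sketched with a small stand-in class. `Word` and `first_char` are hypothetical names introduced for this illustration only.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical fixed-content class; only the two operator[] overloads matter here.
class Word
{
public:
    Word() : _size(3)
    {
        _str[0] = 'a'; _str[1] = 'b'; _str[2] = 'c'; _str[3] = '\0';
    }

    // Chosen for const objects and const references: read-only access.
    const char& operator[](std::size_t pos) const
    {
        assert(pos < _size);
        return _str[pos];
    }

    // Chosen for non-const objects: the returned reference can be assigned to.
    char& operator[](std::size_t pos)
    {
        assert(pos < _size);
        return _str[pos];
    }

private:
    char _str[4];
    std::size_t _size;
};

// Reads through a const reference, so the const overload is selected.
inline char first_char(const Word& w)
{
    return w[0];
    // w[0] = 'x';  // would not compile: assignment through const char&
}
```

Writing `w[0] = 'z'` on a non-const `Word` selects the second overload, while any access through a `const Word&` is limited to reading.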

void clear()
{
	_str[0] = '\0';
	_size = 0;
}

The clear function empties the string: it sets the first character of the character array (the starting position of the string) to the null character '\0' and sets the actual size _size to 0. The capacity is left unchanged.

7.6 String class reserve function

void reserve(size_t n)
{
	if (n > _capacity)
	{
		char* tmp = new char[n + 1];
		strcpy(tmp, _str);
		delete[] _str;

		_str = tmp;
		_capacity = n;
	}
}

void reserve(size_t n): The declaration of the reserve function in our string class. n is the number of characters for which space should be reserved.

if (n > _capacity): Compares n with the current capacity _capacity. The buffer is reallocated only when n is greater than _capacity; reserve never shrinks the string.
char* tmp = new char[n + 1];: Allocates a new character array tmp of n + 1 bytes: n characters plus one extra byte for the terminating null character. The capacity counts only the characters, which is why the array is one byte larger than the capacity.
strcpy(tmp, _str);: Copies the original string content, including its '\0', into the new array tmp.
delete[] _str;: Releases the original array pointed to by _str.
_str = tmp;: Points _str at the new array, which completes the expansion.
_capacity = n;: Updates _capacity to the new capacity n.

In this way, when more space needs to be reserved, the reserve function creates a new character array, copies the original string content into it, releases the original memory, points _str at the new array, and updates _capacity to the new value n.
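
The same steps can be sketched on a plain struct, which makes the buffer arithmetic easy to test in isolation. `RawString`, `make_raw`, and the free function `reserve` are hypothetical names for this illustration. As a deliberate variation, the sketch copies `size + 1` bytes with memcpy instead of calling strcpy, so the number of bytes copied does not depend on where the first '\0' appears.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical plain struct mirroring the three members of the class.
struct RawString
{
    char* str;
    std::size_t size;
    std::size_t capacity;
};

// Grows the buffer to hold at least n characters; never shrinks.
inline void reserve(RawString& s, std::size_t n)
{
    if (n > s.capacity)
    {
        char* tmp = new char[n + 1];            // +1 for the terminating '\0'
        std::memcpy(tmp, s.str, s.size + 1);    // copy the characters and the '\0'
        delete[] s.str;
        s.str = tmp;
        s.capacity = n;
    }
}

// Builds a RawString from a C string, with capacity equal to its length.
inline RawString make_raw(const char* text)
{
    std::size_t len = std::strlen(text);
    RawString s{ new char[len + 1], len, len };
    std::memcpy(s.str, text, len + 1);
    return s;
}
```

Calling `reserve` with a value that is not larger than the current capacity leaves the pointer untouched, which is the "only grow" behavior described above. The caller of this sketch is responsible for the final `delete[]`.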

7.7 string class resize function

void resize(size_t n, char ch = '\0')
{
	if (n > _size)
	{
		// Grow: append ch until the size reaches n
		reserve(n);
		for (size_t i = _size; i < n; ++i)
		{
			_str[i] = ch;
		}
		_str[n] = '\0';
		_size = n;
	}
	else
	{
		// Shrink: truncate at n
		_str[n] = '\0';
		_size = n;
	}
}

The resize function is used to change the size of a string, that is, to increase or decrease the number of characters in the string. Here is a brief explanation of the implementation of this function:

void resize(size_t n, char ch = '\0'): The declaration of the resize function in our string class. n is the new string size. The parameter ch is optional, defaults to '\0', and is the character used to fill the new positions when the string grows.
if (n > _size): This branch handles growth. If the new size n is greater than the current size _size, new characters have to be added at the end of the string.
reserve(n);: First calls reserve to make sure the string has enough capacity for the new characters. reserve does nothing if the capacity is already sufficient.
for (size_t i = _size; i < n; ++i): Starting at the current size _size, fills each new position up to n with ch, the second parameter.
_str[n] = '\0';: After the loop, writes the null character at the new end so that the string is correctly terminated.
_size = n;: Finally updates _size to the new size n.
else: This branch handles the case where n is not greater than _size, so any characters from position n onward are dropped.
_str[n] = '\0';: Writes the null character at position n, which truncates the string there.
_size = n;: Finally updates _size to the new size n. The capacity is not reduced.

This way, the resize function can expand or shrink the size of the string, adding or removing characters as necessary, depending on the size n passed in.
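
For comparison, the behavior being simulated can be observed on std::string itself. The two helper functions below are hypothetical wrappers added only so the results can be checked; the calls to `resize` are the standard library's own.

```cpp
#include <cassert>
#include <string>

// Grows "abc" to five characters, padding with 'x'.
inline std::string grow_example()
{
    std::string s = "abc";
    s.resize(5, 'x');
    return s;            // "abcxx"
}

// Shrinks "abcdef" to two characters.
inline std::string shrink_example()
{
    std::string s = "abcdef";
    s.resize(2);
    return s;            // "ab"
}
```

A simulated resize should produce the same contents and sizes for these inputs, including padding with '\0' when no fill character is given.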

7.8 string class insert function, append function, push_back function, += overload

Simulation implementation of the insert function
insert mainly implements two kinds of insertion: inserting a character and inserting a string.

string& insert(size_t pos, char ch)
{
	assert(pos <= _size);

	// Grow when full
	if (_size == _capacity)
	{
		reserve(_capacity == 0 ? 4 : _capacity * 2);
	}

	// Shift [pos, _size], including the '\0', one slot to the right
	size_t end = _size + 1;
	while (end > pos)
	{
		_str[end] = _str[end - 1];
		--end;
	}

	_str[pos] = ch;
	++_size;

	return *this;
}

The insert function inserts a character at a specified position in a string. Here is a brief explanation of the implementation of this function:

string& insert(size_t pos, char ch): The declaration of the insert function in our string class. pos is the index of the insertion position; ch is the character to insert.
assert(pos <= _size);: Asserts that the insertion position pos does not exceed the actual size _size. pos equal to _size is allowed and means inserting at the end. If the assertion fails, the program stops with an assertion failure, which helps locate the error during debugging.
if (_size == _capacity): Checks whether the string is full (_size equals _capacity). If it is, reserve is called to grow the capacity: to 4 if the capacity is 0, otherwise to twice the current capacity.
size_t end = _size + 1;: end starts one position past the terminating '\0' (which sits at index _size), that is, at the slot the '\0' will move into.
while (end > pos): Each iteration copies _str[end - 1] into _str[end], moving the characters from pos onward, including the '\0', one position to the right. Because the condition is end > pos, the loop also terminates correctly when pos is 0, even though end is unsigned.
_str[pos] = ch;: Writes the character ch at the insertion position pos.
++_size;: Increases the actual size _size by 1.
return *this;: Returns a reference to the current object to support chained calls.
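
The shifting loop is the part of insert that is easiest to get wrong with an unsigned index, so here it is isolated as a free function. `insert_char` is a hypothetical helper for this illustration and assumes the caller has already provided a large enough buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Shifts buf[pos..size] (including the '\0' at buf[size]) one slot to the right
// and writes ch at pos. Assumes buf has room for at least size + 2 bytes.
inline void insert_char(char* buf, std::size_t size, std::size_t pos, char ch)
{
    assert(pos <= size);

    // end starts one past the terminator and the loop stops when end == pos,
    // so end - 1 is never evaluated with end == 0. Starting at size with the
    // condition end >= pos would never terminate for pos == 0, because an
    // unsigned end cannot drop below zero.
    std::size_t end = size + 1;
    while (end > pos)
    {
        buf[end] = buf[end - 1];
        --end;
    }
    buf[pos] = ch;
}
```

Inserting at the front (`pos == 0`) and at the end (`pos == size`) are the two boundary cases worth checking whenever this loop is rewritten.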

string insertion

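string& insert(size_t pos, const char* str)
{
	assert(pos <= _size);
	size_t len = strlen(str);
	// Nothing to insert; also keeps the unsigned loop below
	// from wrapping around when pos == 0 and len == 0
	if (len == 0)
	{
		return *this;
	}
	if (_size + len > _capacity)
	{
		reserve(_size + len);
	}

	// Shift [pos, _size], including the '\0', len slots to the right
	size_t end = _size + len;
	while (end >= pos + len)
	{
		_str[end] = _str[end - len];
		--end;
	}

	strncpy(_str + pos, str, len);
	_size += len;

	return *this;
}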

Compared with the previous insert function, the parameter str here is a C-style string (const char*) rather than a single character, so the function inserts a whole C-style string at the specified position. The implementation works as follows:

string& insert(size_t pos, const char* str): The declaration of this insert overload in our simulated string class. pos is the index at which to insert, and str is the C-style string to be inserted.
assert(pos <= _size);: Ensures that the insertion position pos is not past the end of the string. pos == _size is allowed and means inserting at the very end. If pos is greater than _size, the assertion fails (when assertions are enabled), which helps locate the bug while debugging.
size_t len = strlen(str);: Computes the length of str, that is, the number of characters before its terminating '\0'.
if (_size + len > _capacity): Checks whether the string would exceed the current capacity _capacity after the insertion.
reserve(_size + len);: If so, calls reserve to grow the capacity so that the result fits.
size_t end = _size + len;: end starts at the index where the terminating '\0' will sit after the insertion, which is len positions past its current index _size.
while (end >= pos + len): Walking backward from end, each iteration copies _str[end - len] into _str[end]. Every character from pos onward, including the terminating '\0', moves back by len positions, which opens a gap of len characters at pos. Because end is unsigned, this condition can never become false when both pos and len are 0, so an empty str should be handled before the loop.
strncpy(_str + pos, str, len);: Copies exactly len characters of str into the gap. With a count of len, strncpy does not write the terminating '\0' of str, so the character that now follows the inserted text is not overwritten.
_size += len;: After the insertion, increases the actual size _size by len.
return *this;: Returns a reference to the current object to support chained calls.
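
The shifting step is the easiest part to get wrong, so here is a minimal standalone sketch of the same idea on a plain char buffer. The helper name insert_cstr is hypothetical and not part of the class above. It swaps the hand-written loop for memmove, which is defined for overlapping ranges, and it assumes the caller has already made the buffer large enough.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper (not part of the article's class): inserts `str` into
// `buf` at index `pos`. `size` is the current length of `buf` without '\0'.
// Assumes `buf` has room for size + strlen(str) + 1 bytes.
// Returns the new length.
std::size_t insert_cstr(char* buf, std::size_t size, std::size_t pos, const char* str)
{
    assert(pos <= size);
    std::size_t len = std::strlen(str);

    // Shift the tail [pos, size], including the '\0', right by len positions.
    std::memmove(buf + pos + len, buf + pos, size - pos + 1);

    // Copy the new characters into the gap, without their '\0'.
    std::memcpy(buf + pos, str, len);
    return size + len;
}
```

Because the shift is expressed as a byte count rather than an unsigned loop index, an empty str at pos 0 needs no special case in this version.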

append function

void append(const char* str)
{
	size_t len = strlen(str);

	// Expand if there is not enough room
	if (_size + len > _capacity)
	{
		reserve(_size + len);
	}

	strcpy(_str + _size, str);
	//strcat(_str, str); // would have to scan for '\0' first, which is slower
	_size += len;
}

The append function adds a C-style string to the end of the string. The implementation works as follows:

void append(const char* str): The declaration of append in our simulated string class. str is the C-style string to be added.
size_t len = strlen(str);: Computes the length of str, that is, the number of characters before its terminating '\0'.
if (_size + len > _capacity): Checks whether the string would exceed the current capacity _capacity after the append.
reserve(_size + len);: If so, calls reserve to grow the capacity so that the result fits.
strcpy(_str + _size, str);: Copies str, including its terminating '\0', to the end of the string, starting at index _size of _str. Because _size is already known, this avoids the scan for '\0' that strcat would have to perform.
_size += len;: After the append, increases the actual size _size by len.

In this way, append adds the C-style string str to the end of the string and expands the memory if necessary.
Of course, append can also be implemented by reusing the insert function:

void append(const char* str)
{
	insert(_size, str);
}

push_back function

void push_back(char ch)
{
	// Expand if full
	if (_size == _capacity)
	{
		reserve(_capacity == 0 ? 4 : _capacity * 2);
	}

	_str[_size] = ch;
	++_size;
	_str[_size] = '\0';
}

The push_back function adds a single character to the end of the string. The implementation works as follows:

void push_back(char ch): The declaration of push_back in our simulated string class. ch is the character to be added.
if (_size == _capacity): Checks whether the string is full (that is, _size equals _capacity). If it is, reserve is called to grow the capacity: to 4 if the capacity is currently 0, otherwise to twice the current capacity.
_str[_size] = ch;: Writes the character ch at index _size, which is the position just past the last character.
++_size;: Increments the actual size _size by 1.
_str[_size] = '\0';: Writes the null character '\0' after the new character so that the string stays properly terminated.

In this way, push_back adds the character ch to the end of the string and expands the memory if necessary.
Similarly, push_back can be implemented by reusing the insert function:

void push_back(char ch)
{
	insert(_size, ch);
}
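
To see what the doubling policy in push_back buys, the following sketch simulates only the capacity bookkeeping and counts reallocations. The function name count_reallocations is made up for illustration, and it assumes the string starts empty with capacity 0, as a default-constructed object of the class above does.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: simulates push_back's growth policy (start at 4,
// then double) and counts how many times reserve would have to reallocate
// while n characters are pushed one at a time into an empty string.
std::size_t count_reallocations(std::size_t n)
{
    std::size_t size = 0, capacity = 0, reallocs = 0;
    for (std::size_t i = 0; i < n; ++i)
    {
        if (size == capacity)
        {
            capacity = (capacity == 0) ? 4 : capacity * 2;
            ++reallocs;
        }
        ++size;
    }
    return reallocs;
}
```

Under this policy, pushing 1000 characters triggers only 9 reallocations (capacities 4, 8, 16, and so on up to 1024), whereas growing by one character each time would reallocate on every push.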

+= overload

string& operator+=(char ch)
{
	push_back(ch);
	return *this;
}

string& operator+=(const char* str)
{
	append(str);
	return *this;
}

The operator+= overloads append a character or a C-style string to an existing string. The implementation works as follows:

string& operator+=(char ch): The first overload appends a single character ch to the end of the string. It simply calls push_back.
string& operator+=(const char* str): The second overload appends a C-style string str to the end of the string. It simply calls append.

Both overloads return a reference to the current object, which is what makes chained calls possible.
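
As a small illustration of why returning *this matters, here is a sketch with a stand-in type. TinyStr is a hypothetical fixed-capacity struct invented for this example (it is not the class from this article) so that the block stays self-contained.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical fixed-capacity stand-in for the string class, used only to
// show how returning *this from operator+= enables chaining.
struct TinyStr
{
    char data[64] = "";
    std::size_t size = 0;

    TinyStr& operator+=(char ch)
    {
        assert(size + 1 < sizeof(data));
        data[size++] = ch;
        data[size] = '\0';
        return *this;   // the returned reference is the left operand of the next +=
    }

    TinyStr& operator+=(const char* str)
    {
        std::size_t len = std::strlen(str);
        assert(size + len < sizeof(data));
        std::memcpy(data + size, str, len + 1);   // len + 1 copies the '\0' too
        size += len;
        return *this;
    }
};
```

With these overloads, an expression such as (s += "ab") += 'c' appends both pieces to the same object. If operator+= returned void, the second += would not compile.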

7.9 string class erase function

void erase(size_t pos, size_t len = npos)
{
	assert(pos < _size);

	// Comparing len with the remaining length avoids overflow in pos + len
	if (len == npos || len >= _size - pos)
	{
		_str[pos] = '\0';
		_size = pos;
	}
	else
	{
		// Source and destination overlap, so use memmove rather than strcpy
		memmove(_str + pos, _str + pos + len, _size - pos - len + 1);
		_size -= len;
	}
}

The erase function deletes len characters from the string, starting at the specified position. The implementation works as follows:

void erase(size_t pos, size_t len = npos): The declaration of erase in our simulated string class. pos is the index at which deletion starts, and len is the number of characters to delete. The default value npos means deleting everything from pos to the end.
assert(pos < _size);: Ensures that the starting position pos is inside the string. If pos is greater than or equal to _size, the assertion fails, which helps locate the bug while debugging.
if (len == npos || len >= _size - pos): Checks whether everything from pos to the end should be deleted, either because len is npos or because len reaches or passes the end of the string. The test is written against the remaining length _size - pos rather than as pos + len >= _size, because the sum could wrap around for a very large len.
_str[pos] = '\0'; and _size = pos;: In that case, the string is simply truncated at pos: the character at index pos becomes the null character '\0', and _size is updated to pos.
else: Otherwise some characters remain after the deleted range, and they have to be moved forward.
memmove(_str + pos, _str + pos + len, _size - pos - len + 1);: Moves the characters starting at pos + len, together with the terminating '\0', to position pos, overwriting the deleted characters. memmove is used because the source and destination ranges overlap, and strcpy has undefined behavior for overlapping ranges.
_size -= len;: After the deletion, subtracts len from the actual size _size.
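
The same erase logic can be tried out on a plain char buffer. This is a minimal sketch under stated assumptions: erase_range is a hypothetical helper name, it returns the new length instead of updating a member, and it uses memmove for the overlapping copy.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper (not part of the article's class): removes up to `len`
// characters from `buf` starting at `pos`. `size` is the current length of
// `buf` without '\0'. Returns the new length.
std::size_t erase_range(char* buf, std::size_t size, std::size_t pos, std::size_t len)
{
    assert(pos < size);

    // Deleting up to or past the end: just truncate at pos.
    if (len >= size - pos)
    {
        buf[pos] = '\0';
        return pos;
    }

    // Move the tail (including '\0') forward over the deleted range.
    std::memmove(buf + pos, buf + pos + len, size - pos - len + 1);
    return size - len;
}
```

Passing the largest size_t value as len plays the role of npos here, since it always satisfies len >= size - pos.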

7.10 string class find function

size_t find(char ch, size_t pos = 0) const
{
	assert(pos < _size);

	for (size_t i = pos; i < _size; ++i)
	{
		if (ch == _str[i])
		{
			return i;
		}
	}
	return npos;
}

The find function searches the string for a specified character or substring and returns its position. The implementation works as follows:

size_t find(char ch, size_t pos = 0) const: The first version of find in our simulated string class. It searches for the character ch, starting at position pos. The default value of pos is 0, which means searching from the beginning of the string.
assert(pos < _size);: Ensures that the starting position pos is inside the string. If pos is greater than or equal to _size, the assertion fails, which helps locate the bug while debugging. Note that this also rejects any search in an empty string, whereas std::string::find simply returns npos in that case.
This version uses a simple loop that starts at pos and walks through the string looking for ch. If the character is found, its index is returned; otherwise npos is returned.

size_t find(const char* sub, size_t pos = 0) const
{
	assert(sub);
	assert(pos < _size);

	const char* ptr = strstr(_str + pos, sub);
	if (ptr == nullptr)
	{
		return npos;
	}
	else
	{
		return ptr - _str;
	}
}

size_t find(const char* sub, size_t pos = 0) const: The second version of find in our simulated string class. It searches for the substring sub, starting at position pos. The default value of pos is 0, which means searching from the beginning of the string.
assert(sub);: Ensures that sub is not a null pointer. If it is, the assertion fails, which helps locate the bug while debugging.
assert(pos < _size);: As before, ensures that the starting position pos is inside the string.
This version calls strstr on _str + pos to look for sub. strstr returns a pointer to the first match, so the index is obtained by subtracting the start of the buffer (ptr - _str). If there is no match, strstr returns a null pointer and find returns npos.
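
The pointer-to-index conversion can be checked in isolation. The sketch below is a standalone function on C-style strings; the names find_sub and NPOS are assumptions made for this example, and unlike the member function it returns NPOS for an out-of-range pos instead of asserting.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Stand-in for the class's npos: the largest value of size_t.
const std::size_t NPOS = static_cast<std::size_t>(-1);

// Hypothetical helper: returns the index of the first occurrence of `sub`
// in `s` at or after `pos`, or NPOS if there is none.
std::size_t find_sub(const char* s, const char* sub, std::size_t pos = 0)
{
    assert(s && sub);
    if (pos > std::strlen(s))
    {
        return NPOS;   // starting point is past the end
    }

    const char* ptr = std::strstr(s + pos, sub);
    // The match pointer minus the start of the buffer is the index.
    return ptr ? static_cast<std::size_t>(ptr - s) : NPOS;
}
```

Note that the returned index is measured from the start of s, not from pos, which is why the subtraction uses s rather than s + pos.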

7.11 string class substr function

string substr(size_t pos, size_t len = npos) const
{
	assert(pos < _size);
	size_t realLen = len;
	// Compare against the remaining length so that pos + len cannot overflow
	if (len == npos || len > _size - pos)
	{
		realLen = _size - pos;
	}

	string sub;
	for (size_t i = 0; i < realLen; ++i)
	{
		sub += _str[pos + i];
	}

	return sub;
}

The substr function extracts a substring that starts at the specified position pos and has an optional length len. The implementation works as follows:

string substr(size_t pos, size_t len = npos) const: The declaration of substr in our simulated string class. pos is the index at which the substring starts, and len is its length. The default value npos means extracting everything from pos to the end.
assert(pos < _size);: Ensures that the starting position pos is inside the string. If pos is greater than or equal to _size, the assertion fails, which helps locate the bug while debugging.
size_t realLen = len;: Defines realLen to hold the actual length of the substring. Its initial value is the parameter len.
if (len == npos || len > _size - pos): Checks whether everything from pos to the end should be extracted, either because len is npos or because len is larger than the number of characters remaining after pos. In that case realLen is set to _size - pos, the number of characters from pos to the end. The test is written against _size - pos rather than as pos + len > _size, because the sum could wrap around for a very large len.
string sub;: Creates a new, empty string object named sub to hold the result.
for (...) sub += _str[pos + i];: Appends realLen characters to sub one at a time, starting at position pos.
return sub;: Returns the extracted substring by value.
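
The length-clamping step is worth isolating, because the obvious form pos + len > size can wrap around when len is very large. The following sketch shows an overflow-safe version; clamped_len is a hypothetical helper name, not something defined in this article.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: given a string of length `size`, returns how many
// characters a substring starting at `pos` with requested length `len`
// can actually contain.
std::size_t clamped_len(std::size_t size, std::size_t pos, std::size_t len)
{
    assert(pos <= size);
    std::size_t remaining = size - pos;   // cannot underflow because pos <= size

    // Comparing len with remaining never overflows, unlike pos + len > size.
    return len > remaining ? remaining : len;
}
```

With this form the npos case needs no separate check, since the largest size_t value is always greater than or equal to remaining.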

7.12 String class comparison operator overloading

bool operator>(const string& s) const
{
	return strcmp(_str, s._str) > 0;
}

This is the overloaded greater-than operator >, which compares the current string with another string s. The two buffers _str and s._str are compared lexicographically with strcmp. The operator returns true if _str is greater than s._str and false otherwise.

bool operator==(const string& s) const
{
	return strcmp(_str, s._str) == 0;
}

This is the overloaded equality operator ==, which checks whether the current string is equal to another string s. It also uses strcmp to compare _str and s._str. If the contents are the same it returns true, otherwise false.

bool operator>=(const string& s) const
{
	return *this > s || *this == s;
}

This is the overloaded greater-than-or-equal operator >=. It combines the already defined operators > and ==: if the current string is greater than s or equal to s it returns true, otherwise false.

bool operator<=(const string& s) const
{
	return !(*this > s);
}

This is the overloaded less-than-or-equal operator <=. It negates the already defined operator >: if the current string is not greater than s, it is less than or equal to s and the operator returns true, otherwise false.

bool operator<(const string& s) const
{
	return !(*this >= s);
}

This is the overloaded less-than operator <. It negates the already defined operator >=: if the current string is not greater than or equal to s, it is less than s and the operator returns true, otherwise false.

bool operator!=(const string& s) const
{
	return !(*this == s);
}

This is the overloaded not-equal operator !=. It negates the already defined operator ==: if the current string is not equal to s it returns true, otherwise false.

This follows the same pattern as the comparison operator overloads of the date class in the earlier article on classes and objects: implement > and == (or < and ==) first, then build the remaining operators by reusing them.
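
To check how these operators behave on concrete inputs, here is a compact sketch of the same reuse pattern. Word is a hypothetical wrapper around a const char* made up for this example. One point the comparison makes visible is that strcmp orders by character code, so on an ASCII system every uppercase letter sorts before every lowercase letter.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical minimal type: only > and == call strcmp, the other four
// operators are derived from those two.
struct Word
{
    const char* text;

    bool operator>(const Word& o) const  { return std::strcmp(text, o.text) > 0; }
    bool operator==(const Word& o) const { return std::strcmp(text, o.text) == 0; }

    bool operator>=(const Word& o) const { return *this > o || *this == o; }
    bool operator<=(const Word& o) const { return !(*this > o); }
    bool operator<(const Word& o) const  { return !(*this >= o); }
    bool operator!=(const Word& o) const { return !(*this == o); }
};
```

A string that is a proper prefix of another compares as smaller, so Word{"abc"} < Word{"abcd"} holds. Assuming ASCII, Word{"Zebra"} < Word{"apple"} also holds, which may be surprising if dictionary order was expected.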

7.13 string class stream insertion << and stream extraction >> overload

The first thing to note is that stream insertion and stream extraction are defined here as global functions, outside the class. They have to be non-members because their left operand is the stream (ostream or istream), not a string object, while a member operator always takes the class object as its left operand. An operator overload defined this way is not a member of the class, so it cannot access the private members directly and has to go through the public interface of the class (unless it is declared a friend).

Operator overloads can be defined as member functions or as global non-member functions, depending on the usage scenario and design requirements. Usually, if the left operand of the operator is the class object itself, or the operator needs direct access to the private members of the class, consider defining it as a member function. If the left operand is of a type other than the class, or the operation is not limited to the class object itself, consider defining it as a global non-member function.

stream insertion <<

ostream& operator<<(ostream& out, const string& s)
{
	for (size_t i = 0; i < s.size(); ++i)
	{
		out << s[i];
	}
	return out;
}

This is the overloaded output operator <<, which writes our string object s to the output stream out.

A loop iterates over the characters of s through the public size() and operator[] functions and writes each character to out in turn. Finally, out is returned to support chained output.
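
The non-member pattern can be tried on any small class. In the sketch below, Label is a hypothetical class invented for this example; it keeps its data private and exposes only size() and operator[], and the free operator<< is written purely against that public interface.

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical class with private data and a small public interface.
class Label
{
public:
    explicit Label(const char* text) : _text(text) {}
    std::size_t size() const { return _text.size(); }
    char operator[](std::size_t i) const { return _text[i]; }
private:
    std::string _text;
};

// Non-member overload: the stream is the left operand, so this cannot be a
// member of Label. It needs no friend access because it only uses size()
// and operator[].
std::ostream& operator<<(std::ostream& out, const Label& l)
{
    for (std::size_t i = 0; i < l.size(); ++i)
    {
        out << l[i];
    }
    return out;   // returning the stream allows out << a << b
}
```

Writing two labels in one expression, for example oss << Label("a") << Label("bc") on a std::ostringstream, works because each call returns the stream it was given.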

stream extraction >>

istream& operator>>(istream& in, string& s)
{
	s.clear();

	const size_t N = 32;
	char buff[N];   // small buffer so that s grows in chunks
	size_t i = 0;

	char ch;
	// in.get(ch) fails at end of input, so the loop also stops at EOF
	while (in.get(ch) && ch != ' ' && ch != '\n')
	{
		buff[i++] = ch;
		if (i == N - 1)
		{
			buff[i] = '\0';
			s += buff;
			i = 0;
		}
	}

	buff[i] = '\0';
	s += buff;

	return in;
}

This is the overloaded input operator >>, which reads characters from the input stream in and stores them in the string object s.

First, s.clear() empties s so that it can receive the new input. Then a loop reads characters one at a time from in. As long as the character ch is neither a space nor a newline, it is stored in the temporary array buff and the index i is incremented. Once buff is full (i == N - 1), a '\0' is written at buff[i], buff is appended to s, and i is reset to 0 to keep receiving characters. Batching characters through buff avoids growing s one character at a time. The loop condition also tests the result of in.get(ch); without that test, a stream that reaches end of input before any space or newline would keep the loop running forever. When the loop ends, whatever remains in buff is terminated with '\0' and appended to s. Finally, the stream in is returned to support chained input. Note that, unlike the operator>> of std::string, this version does not skip leading whitespace, so input that begins with a space yields an empty string.
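
The buffered reading loop can be exercised without the custom class by targeting std::string and feeding it from a std::istringstream. This is only a sketch: read_word is a hypothetical function name, and the buffer size is shrunk to 8 so that the flush path is hit by short inputs.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical free function mirroring the buffered loop of operator>>.
// Reads characters up to the next space, newline, or end of input.
std::istream& read_word(std::istream& in, std::string& s)
{
    s.clear();

    const std::size_t N = 8;   // deliberately small for demonstration
    char buff[N];
    std::size_t i = 0;

    char ch;
    while (in.get(ch) && ch != ' ' && ch != '\n')
    {
        buff[i++] = ch;
        if (i == N - 1)        // buffer full: flush N - 1 characters
        {
            buff[i] = '\0';
            s += buff;
            i = 0;
        }
    }

    buff[i] = '\0';            // flush whatever is left
    s += buff;
    return in;
}
```

One consequence of this design is that when the last word ends at end of input, get sets the stream's fail state even though a word was read, so callers should check the string as well as the stream.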

8. Simulation implementation of string class (complete code)

#include<iostream>
#include<string.h>
#include<assert.h>
using namespace std;
namespace mystring
{
	class string
	{
	public:
		typedef char* iterator;
		typedef const char* const_iterator;

		iterator begin()
		{
			return _str;
		}

		iterator end()
		{
			return _str + _size;
		}

		const_iterator begin() const
		{
			return _str;
		}

		const_iterator end() const
		{
			return _str + _size;
		}

		string(const char* str = "")
		{
			_size = strlen(str);
			_capacity = _size;
			_str = new char[_capacity + 1];

			strcpy(_str, str);
		}

		// Traditional approach
		//string(const string& s)
		//	:_str(new char[s._capacity+1])
		//	, _size(s._size)
		//	, _capacity(s._capacity)
		//{
		//	strcpy(_str, s._str);
		//}

		// Modern approach: build a temporary copy and swap with it
		void swap(string& tmp)
		{
			::swap(_str, tmp._str);
			::swap(_size, tmp._size);
			::swap(_capacity, tmp._capacity);
		}

		string(const string& s)
			:_str(nullptr)
			, _size(0)
			, _capacity(0)
		{
			string tmp(s._str);
			swap(tmp);
		}

		//string& operator=(const string& s)
		//{
		//	if (this != &s)
		//	{
		//		//string tmp(s._str);
		//		string tmp(s);
		//		swap(tmp); // this->swap(tmp);
		//	}

		//	return *this;
		//}

		// s is passed by value, so it is already a copy; swap takes over its buffer
		string& operator=(string s)
		{
			swap(s);
			return *this;
		}

		~string()
		{
			delete[] _str;
			_str = nullptr;
			_size = _capacity = 0;
		}

		const char* c_str() const
		{
			return _str;
		}

		size_t size() const
		{
			return _size;
		}

		size_t capacity() const
		{
			return _capacity;
		}

		const char& operator[](size_t pos) const
		{
			assert(pos < _size);

			return _str[pos];
		}

		char& operator[](size_t pos)
		{
			assert(pos < _size);

			return _str[pos];
		}

		void reserve(size_t n)
		{
			if (n > _capacity)
			{
				// One extra byte for the terminating '\0'
				char* tmp = new char[n + 1];
				strcpy(tmp, _str);
				delete[] _str;

				_str = tmp;
				_capacity = n;
			}
		}

		void resize(size_t n, char ch = '\0')
		{
			if (n > _size)
			{
				// Grow: fill the new positions with ch
				reserve(n);
				for (size_t i = _size; i < n; ++i)
				{
					_str[i] = ch;
				}
				_str[n] = '\0';
				_size = n;
			}
			else
			{
				// Shrink: truncate to n characters
				_str[n] = '\0';
				_size = n;
			}
		}

		void push_back(char ch)
		{
			// Expand if full
			if (_size == _capacity)
			{
				reserve(_capacity == 0 ? 4 : _capacity * 2);
			}

			_str[_size] = ch;
			++_size;
			_str[_size] = '\0';
			//insert(_size, ch);
		}

		void append(const char* str)
		{
			size_t len = strlen(str);

			// Expand if there is not enough room
			if (_size + len > _capacity)
			{
				reserve(_size + len);
			}

			strcpy(_str + _size, str);
			//strcat(_str, str); // would have to scan for '\0' first, which is slower
			_size += len;
			//insert(_size, str);
		}

		string& operator+=(char ch)
		{
			push_back(ch);
			return *this;
		}

		string& operator+=(const char* str)
		{
			append(str);
			return *this;
		}

		string& insert(size_t pos, char ch)
		{
			assert(pos <= _size);

			// Expand if full
			if (_size == _capacity)
			{
				reserve(_capacity == 0 ? 4 : _capacity * 2);
			}

			// Shift [pos, _size], including '\0', back by one position
			size_t end = _size + 1;
			while (end > pos)
			{
				_str[end] = _str[end - 1];
				--end;
			}

			_str[pos] = ch;
			++_size;

			return *this;
		}

		string& insert(size_t pos, const char* str)
		{
			assert(pos <= _size);
			size_t len = strlen(str);
			if (len == 0)
			{
				// Nothing to insert; this also keeps the unsigned loop
				// below from wrapping around when pos == 0
				return *this;
			}

			if (_size + len > _capacity)
			{
				reserve(_size + len);
			}

			// Shift [pos, _size], including '\0', back by len positions
			size_t end = _size + len;
			while (end >= pos + len)
			{
				_str[end] = _str[end - len];
				--end;
			}

			strncpy(_str + pos, str, len);
			_size += len;

			return *this;
		}

		void erase(size_t pos, size_t len = npos)
		{
			assert(pos < _size);

			// Comparing len with the remaining length avoids overflow in pos + len
			if (len == npos || len >= _size - pos)
			{
				_str[pos] = '\0';
				_size = pos;
			}
			else
			{
				// Source and destination overlap, so use memmove rather than strcpy
				memmove(_str + pos, _str + pos + len, _size - pos - len + 1);
				_size -= len;
			}
		}

		void clear()
		{
			_str[0] = '\0';
			_size = 0;
		}

		size_t find(char ch, size_t pos = 0) const
		{
			assert(pos < _size);

			for (size_t i = pos; i < _size; ++i)
			{
				if (ch == _str[i])
				{
					return i;
				}
			}

			return npos;
		}

		size_t find(const char* sub, size_t pos = 0) const
		{
			assert(sub);
			assert(pos < _size);

			// Could also be implemented with KMP or BM (Boyer-Moore)
			const char* ptr = strstr(_str + pos, sub);
			if (ptr == nullptr)
			{
				return npos;
			}
			else
			{
				return ptr - _str;
			}
		}

		string substr(size_t pos, size_t len = npos) const
		{
			assert(pos < _size);
			size_t realLen = len;
			// Compare against the remaining length so that pos + len cannot overflow
			if (len == npos || len > _size - pos)
			{
				realLen = _size - pos;
			}

			string sub;
			for (size_t i = 0; i < realLen; ++i)
			{
				sub += _str[pos + i];
			}

			return sub;
		}

		bool operator>(const string& s) const
		{
			return strcmp(_str, s._str) > 0;
		}

		bool operator==(const string& s) const
		{
			return strcmp(_str, s._str) == 0;
		}

		bool operator>=(const string& s) const
		{
			return *this > s || *this == s;
		}

		bool operator<=(const string& s) const
		{
			return !(*this > s);
		}

		bool operator<(const string& s) const
		{
			return !(*this >= s);
		}

		bool operator!=(const string& s) const
		{
			return !(*this == s);
		}
	private:
		size_t _capacity;
		size_t _size;
		char* _str;
	public:
		const static size_t npos = -1;
	};

	ostream& operator<<(ostream& out, const string& s)
	{
		for (size_t i = 0; i < s.size(); ++i)
		{
			out << s[i];
		}

		return out;
	}

	istream& operator>>(istream& in, string& s)
	{
		s.clear();

		const size_t N = 32;
		char buff[N];   // small buffer so that s grows in chunks
		size_t i = 0;

		char ch;
		// in.get(ch) fails at end of input, so the loop also stops at EOF
		while (in.get(ch) && ch != ' ' && ch != '\n')
		{
			buff[i++] = ch;
			if (i == N - 1)
			{
				buff[i] = '\0';
				s += buff;
				i = 0;
			}
		}

		buff[i] = '\0';
		s += buff;

		return in;
	}
}

epilogue

If you are interested, feel free to follow the author, and if you think the content is good, please give it the one-click triple of support. Thank you!
These articles take effort to write, so please point out anything that is inaccurate.
Thank you for visiting; your readership is what keeps me going.
With time as the catalyst, may we all help each other become better!

Origin blog.csdn.net/kingxzq/article/details/131953899