The string class explained, with a simulated implementation (C++)

Contents

1. What is STL

1. What is STL

2. STL version

3. Six components of STL

4. Defects of STL

2. What is string

 1. Why learn string?

2. The introduction of string in the document

3. Summary

 3. Member functions of the string class

1. Common construction of string class objects

 2. Capacity operation of string class objects

3. Access and traversal operations of string class objects

 4. The modification operation of the string class object

5. String class non-member function

4. The simulation implementation of the string class

1. Member variable & iterator declaration & friend declaration of input and output function overloading

2. Common construction of string class objects

3. Iterator (think of it as a pointer)

4. Capacity operation of string class objects

5. Modification operation of string class object

6. Access to string class objects

7. String class non-member function

8. Special operations related to strings in the string class

9. Input and output function overloading of string objects

5. Matters needing attention when using the string class

1. Real string class internal member variables

2. The real string class expansion method (different compilers may be different)

3. The process of using reserve for capacity expansion

4. The difference between [ ] and at

5. Details about copy constructor and assignment operator overloading

6. string and vector<char>


1. What is STL

To write our own version of string, we must first understand what string is, and to understand string we must first know the STL.

1. What is STL

STL stands for Standard Template Library. It is an important part of the C++ standard library: not only a library of reusable components, but also a software framework that covers data structures and algorithms. The data structures are encapsulated as templates, and a set of general, flexible algorithms is provided to work with them.

2. STL version

①. The original (HP) version was completed by Alexander Stepanov and Meng Lee at HP Labs. In the spirit of open source, they declared that anyone may use, copy, modify, distribute, and commercially exploit the code without payment; the only condition is that derived work must be made as openly available as the original. The HP version is the ancestor of all STL implementations.

②. The P.J. version was developed by P.J. Plauger and is derived from the HP version. It is used by Windows Visual C++ and may not be published or modified. Its drawbacks are low readability and odd symbol naming.

③. The RW version was developed by Rogue Wave Software and is derived from the HP version. It is used by C++ Builder, may not be published or modified, and has average readability.

④. The SGI version was developed by Silicon Graphics Computer Systems, Inc. and is derived from the HP version. It is used by GCC (Linux), is highly portable, and may be published, modified, and even sold. Its naming and programming style make it very readable. Later, when we read parts of the source code to learn STL, this version will be our main reference.

3. Six components of STL

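STL consists of six components: containers, algorithms, iterators, functors (function objects), adapters, and allocators. The string class we are going to simulate belongs to the containers.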

4. Defects of STL

①. The STL library is updated too slowly, and this deserves a serious complaint. The previous dependable version was C++98; C++03, in between, was basically a set of minor revisions. It took 13 years, until C++11 came out, for the STL to be updated further.

②. STL still does not provide thread safety. In a concurrent environment we have to do the locking ourselves, and the lock granularity is relatively coarse.

③. STL's extreme pursuit of efficiency leads to complex internals, such as type traits and iterator traits.

④. Using STL causes code bloat. For example, using vector<int>, vector<double>, and vector<char> generates several copies of the code. Of course, this is caused by the template syntax itself.

2. What is string

 1. Why learn string?

Strings in C: in C, a string is a sequence of characters terminated by '\0'. For convenience, the C standard library provides the str family of functions, but these functions are separate from the string data itself, which does not fit the idea of OOP (object-oriented programming). The underlying space also has to be managed by the user, and a careless mistake can lead to out-of-bounds access.

Therefore, in everyday work the string class is used almost exclusively because it is simple, convenient, and fast; few people still use the string functions from the C library.

2. The introduction of string in the document

①. string is a class that represents a sequence of characters.

②. The standard string class provides support for such objects. Its interface is similar to that of a standard container of bytes, with added features designed specifically for operating on strings of single-byte characters.

③. The string class uses char as its character type, together with the default char_traits and allocator types (for more information on the template, see basic_string).

④. The string class is an instantiation of the basic_string class template: it instantiates basic_string with char and uses char_traits and allocator as the default template arguments (for more information on the template, see basic_string).

⑤. Note that this class handles bytes independently of the encoding used. If it is used to handle sequences of multibyte or variable-length characters (such as UTF-8), all members of this class (such as length or size), as well as its iterators, still operate in terms of bytes rather than actual encoded characters.
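Point ⑤ can be checked with a short sketch. The helper name `utf8_code_points` is hypothetical (it is not part of the standard library), and the sketch assumes its input is valid UTF-8. It counts every byte that is not a continuation byte of the form 10xxxxxx.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of <string>): counts UTF-8 code points
// by skipping continuation bytes, which always look like 10xxxxxx.
// Assumes the input is valid UTF-8.
std::size_t utf8_code_points(const std::string& s) {
    std::size_t count = 0;
    for (unsigned char byte : s) {
        if ((byte & 0xC0) != 0x80) {
            ++count;
        }
    }
    return count;
}
```

For `std::string s = "\xC3\xA9";` (the UTF-8 encoding of é), `s.size()` reports 2 bytes, while `utf8_code_points(s)` reports 1 character.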

Note: to use the string class, you must write #include <string>, and either add using namespace std; or refer to the class as std::string.

3. Summary

 3. Member functions of the string class

Note: We only explain some commonly used member functions here, not all.

1. Common construction of string class objects

 2. Capacity operation of string class objects

Note:

1. size() and length() are implemented in exactly the same way. size() was introduced to be consistent with the interfaces of the other containers, and in general size() is the one to use.

2. clear() only removes the valid characters of the string; it does not change the size of the underlying space.

3. resize(size_t n) and resize(size_t n, char c) both change the number of valid characters in the string to n. They differ when the number of characters grows: resize(n) fills the extra positions with '\0', while resize(size_t n, char c) fills them with the character c. Note: when resize increases the number of elements, it may also increase the underlying capacity; when it reduces the number of elements, the total size of the underlying space stays the same.

4. reserve(size_t res_arg=0) reserves space for the string without changing the number of valid elements. When the argument is smaller than the current total size of the underlying space, reserve does not change the capacity.
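The notes above can be tried out on std::string with a small sketch. The names `CapacityReport` and `run_capacity_demo` are made up for this illustration. The exact capacity values depend on the standard library in use, so the sketch only records them and makes no claim about specific numbers.

```cpp
#include <cstddef>
#include <string>

// Hypothetical record of what the capacity-related calls did.
struct CapacityReport {
    bool size_equals_length;            // note 1
    std::size_t size_after_clear;       // note 2
    std::size_t capacity_before_clear;  // note 2 (implementation-dependent value)
    std::size_t capacity_after_clear;   // note 2 (implementation-dependent value)
    std::string after_resize_default;   // note 3: resize(n)
    std::string after_resize_fill;      // note 3: resize(n, c)
    std::size_t size_after_reserve;     // note 4
    std::size_t capacity_after_reserve; // note 4
};

CapacityReport run_capacity_demo() {
    CapacityReport r{};

    std::string s("hello");
    r.size_equals_length = (s.size() == s.length());

    r.capacity_before_clear = s.capacity();
    s.clear();                          // removes the characters only
    r.size_after_clear = s.size();
    r.capacity_after_clear = s.capacity();

    std::string a("ab");
    a.resize(4);                        // appends two '\0' characters
    r.after_resize_default = a;

    std::string b("ab");
    b.resize(4, 'x');                   // appends "xx"
    r.after_resize_fill = b;

    std::string c("abc");
    c.reserve(100);                     // capacity grows, size stays 3
    r.size_after_reserve = c.size();
    r.capacity_after_reserve = c.capacity();
    return r;
}
```

Comparing `capacity_before_clear` with `capacity_after_clear` on your own compiler shows whether clear() kept the underlying space.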

3. Access and traversal operations of string class objects

 4. The modification operation of the string class object

Note:

1. To append a character at the end of a string, s.push_back(c), s.append(1, c), and s += 'c' work in much the same way. In practice the += operator is used most often, because it can append not only single characters but also whole strings.

2. When working with a string, if you can roughly estimate how many characters it will hold, reserve the space with reserve first.
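A minimal sketch of both notes, using std::string. The function names `append_three_ways` and `repeat` are invented for the example.

```cpp
#include <cstddef>
#include <string>

// Appends the same character in the three ways mentioned in note 1.
std::string append_three_ways(char c) {
    std::string s;
    s.push_back(c);
    s.append(1, c);
    s += c;
    return s;
}

// Builds `piece` repeated `times` times. The final size is known in
// advance, so the space is reserved once before appending (note 2).
std::string repeat(const std::string& piece, std::size_t times) {
    std::string result;
    result.reserve(piece.size() * times);
    for (std::size_t i = 0; i < times; ++i) {
        result += piece;  // += also accepts whole strings
    }
    return result;
}
```

`append_three_ways('x')` yields "xxx", and `repeat("ab", 3)` yields "ababab".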

5. String class non-member function

4. The simulation implementation of the string class

1. Member variable & iterator declaration & friend declaration of input and output function overloading

// Overloads of the input and output operators
        friend ostream& operator<<(ostream& _cout, const lzStr::string& s);
        friend istream& operator>>(istream& _cin, lzStr::string& s);
// Iterator declaration
        typedef char* iterator;
// Member variables
        char* _str;
        size_t _capacity;
        size_t _size;

2. Common construction of string class objects

		string(const char* str = "") // constructor
			:_str(nullptr)
			, _size(0)
			, _capacity(0)
		{
			if (str == nullptr) { // treat a null pointer as an empty string
				str = "";
			}
			_size = strlen(str);
			_capacity = _size;
			_str = new char[_capacity + 1]; // +1 for the terminating '\0'
			strcpy(_str, str);
		}

		string(const string& s) // copy constructor
			:_str(nullptr)
			, _size(0)
			, _capacity(0) // start empty so the temporary is destroyed safely after the swap
		{
			lzStr::string tempStr(s._str);
			swap(tempStr);
		}

		string& operator=(const string& s) { // overloaded assignment operator
			if (this != &s) {
				lzStr::string tempStr(s._str);
				this->swap(tempStr);
			}
			return *this;
		}

		~string() { // destructor
			if (_str != nullptr) {
				delete[] _str;
				_str = nullptr;
			}
		}

3. Iterator (think of it as a pointer)

		iterator begin() {
			return _str;
		}

		iterator end() {
			return _str + _size;
		}

4. Capacity operation of string class objects

		size_t size()const { // number of valid characters
			return _size;
		}

		size_t capacity()const { // capacity
			return _capacity;
		}

		bool empty()const { // true when the number of valid characters is 0
			return size() == 0;
		}

		void resize(size_t n, char c = '\0') { // change the number of valid characters; new positions are filled with c
			if (n > size()) {
				if (n > capacity()) {
					reserve(n);
				}
				for (size_t i = size(); i < n; i++) { // size_t avoids signed/unsigned comparisons
					_str[i] = c;
				}
			}
			_size = n;
			_str[n] = '\0';
		}

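		void reserve(size_t n) { // grow the capacity to at least n; never shrinks
			if (n > capacity()) {
				char* temp = new char[n + 1]; // +1 for the terminating '\0'
				memcpy(temp, _str, _size + 1); // copies the characters and the '\0', even after an embedded '\0'
				delete[] _str;
				_str = temp;
				_capacity = n;
			}
		}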

5. Modification operation of string class object

		void push_back(char c) { // append one character
			if (_size == _capacity) {
				reserve(_capacity == 0 ? 4 : _capacity * 2); // doubling a capacity of 0 would never grow
			}
			*end() = c;
			_size++;
			*end() = '\0';
		}

		string& operator+=(char c) { // operator+= for a character, same as push_back
			push_back(c);
			return *this;
		}

		void append(const char* str) { // append a C string
			size_t len = strlen(str);
			size_t new_size = _size + len;
			if (new_size > _capacity) {
				// doubling alone may not be enough for a long str
				reserve(new_size > _capacity * 2 ? new_size : _capacity * 2);
			}
			memcpy(_str + _size, str, len + 1); // copies the terminating '\0' as well
			_size = new_size;
		}

		string& operator+=(const char* str) { // operator+= for a C string, same as append
			append(str);
			return *this;
		}

		void clear() { // set the number of valid characters to 0
			*begin() = '\0';
			_size = 0;
		}

		void swap(string& s) { // swap the contents of two strings
			std::swap(_str, s._str);
			std::swap(_size, s._size);
			std::swap(_capacity, s._capacity);
		}

		const char* c_str()const { // return the contents as a C string
			return _str;
		}

6. Access to string class objects

		char& operator[](size_t index) { // operator[] for non-const objects
			assert(index < size()); // needs <cassert>; size_t is unsigned, so there is no "< 0" case
			return *(_str + index);
		}

		const char& operator[](size_t index)const { // operator[] for const objects
			assert(index < size());
			return *(_str + index);
		}

7. String class non-member function

		bool operator<(const string& s) const { // operator<
			return strcmp(_str, s._str) < 0;
		}

		bool operator<=(const string& s) const { // operator<=
			return strcmp(_str, s._str) <= 0;
		}

		bool operator>(const string& s) const { // operator>
			return strcmp(_str, s._str) > 0;
		}

		bool operator>=(const string& s) const { // operator>=
			return strcmp(_str, s._str) >= 0;
		}

		bool operator==(const string& s) const { // operator==
			return strcmp(_str, s._str) == 0;
		}

		bool operator!=(const string& s) const { // operator!=
			return strcmp(_str, s._str) != 0;
		}

8. Special operations related to strings in the string class

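// Return the position of the first occurrence of c at or after pos, or size_t(-1) if there is none

		size_t find(char c, size_t pos = 0) const {
			for (size_t i = pos; i < size(); i++) { // an out-of-range pos simply finds nothing
				if (_str[i] == c) {
					return i;
				}
			}
			return static_cast<size_t>(-1); // the same value std::string uses for npos
		}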

// Return the position of the first occurrence of the substring s at or after pos, or size_t(-1) if there is none

		size_t find(const char* s, size_t pos = 0) const {
			size_t len = strlen(s); // computed once instead of on every comparison
			for (size_t i = pos; i + len <= size(); i++) {
				if (strncmp(_str + i, s, len) == 0) {
					return i;
				}
			}
			return static_cast<size_t>(-1);
		}

// Insert the character c / the string str at position pos and return the string itself

		string& insert(size_t pos, char c) {
			assert(pos <= size()); // pos == size() inserts at the end; needs <cassert>
			if (_size == _capacity) {
				reserve(_capacity == 0 ? 4 : _capacity * 2);
			}
			// shift the tail, including the terminating '\0', one place to the right
			for (size_t j = _size + 1; j > pos; --j) {
				_str[j] = _str[j - 1];
			}
			_str[pos] = c;
			++_size;
			return *this;
		}

		string& insert(size_t pos, const char* str) {
			assert(pos <= size()); // pos == size() inserts at the end; needs <cassert>
			size_t len = strlen(str);
			if (_size + len > _capacity) {
				// doubling alone may not be enough for a long str
				reserve(_size + len > _capacity * 2 ? _size + len : _capacity * 2);
			}
			// memmove handles the overlapping shift of the tail and its '\0'
			memmove(_str + pos + len, _str + pos, _size - pos + 1);
			memcpy(_str + pos, str, len);
			_size += len;
			return *this;
		}
		
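// Erase up to len characters starting at position pos and return the string itself

		string& erase(size_t pos, size_t len) {
			assert(pos <= size()); // needs <cassert>
			if (len > _size - pos) {
				len = _size - pos; // never erase past the end
			}
			// move the tail, including the terminating '\0', down over the erased range
			memmove(_str + pos, _str + pos + len, _size - pos - len + 1);
			_size -= len;
			return *this;
		}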

9. Input and output function overloading of string objects

// Overloaded output operator
ostream& lzStr::operator<<(ostream& _cout, const lzStr::string& s) {
	for (size_t i = 0; i < s.size(); i++) {
		_cout << s[i];
	}
	return _cout;
}

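// Overloaded input operator: skips leading blanks and newlines, then reads one word
istream& lzStr::operator>>(istream& _cin, lzStr::string& s) {
	s.clear();
	s.reserve(15); // start from a non-zero capacity so that doubling works
	int next; // int rather than char, so that EOF can be told apart from real characters
	// skip leading blanks and newlines; read from _cin instead of getchar()
	while ((next = _cin.peek()) == ' ' || next == '\n') {
		_cin.get();
	}
	// append through += so the buffer stays owned by new[]/delete[], never malloc
	while ((next = _cin.peek()) != EOF && next != ' ' && next != '\n') {
		s += static_cast<char>(_cin.get());
	}
	if (s.size() == 0) {
		_cin.setstate(ios::failbit); // nothing was read
	}
	return _cin;
}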

5. Matters needing attention when using the string class

1. Real string class internal member variables

When we create a string object and print its size with sizeof(), we find that it is 28 bytes on a 32-bit platform (Visual Studio). Yet the members we would expect are only char* str, size_t size, and size_t capacity, which add up to 12 bytes. Where do the extra 16 bytes come from? The implementation also keeps a char str[16] buffer inside the object. When the string is shorter than 16 bytes, it is stored in this buffer and no dynamic character array is allocated with new, which makes short strings much cheaper to use.

The same thing can be observed when changing the capacity of a string object with reserve. Growing the capacity always works. Asking reserve for a smaller capacity normally leaves the capacity unchanged: allocating space is expensive, so there is no reason to give it back before the object's lifetime ends. There is one exception. When the requested capacity drops below 16 bytes, the internal str[16] is sufficient, so the dynamically allocated space is released and str[16] is used directly. The capacity is then 15, because one byte is needed to store '\0', and shrinking further changes nothing.
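Whether a given string currently lives in such an internal buffer can be probed with a sketch like the following. The function name `stored_inline` is hypothetical. The sketch only compares addresses, and the result for short strings depends on the standard library implementation, so it demonstrates nothing about any particular compiler until it is run there.

```cpp
#include <functional>
#include <string>

// Hypothetical probe: returns true when the character data of `s` is
// located inside the string object itself rather than in separately
// allocated memory.
bool stored_inline(const std::string& s) {
    const char* object_begin = reinterpret_cast<const char*>(&s);
    const char* object_end = object_begin + sizeof(std::string);
    // std::less gives a well-defined order even for unrelated pointers.
    std::less<const char*> before;
    return !before(s.data(), object_begin) && before(s.data(), object_end);
}
```

Calling it on a short string such as `std::string("hi")` and on a long one such as `std::string(100, 'a')` shows the two storage modes on implementations that use an internal buffer; the long string cannot fit inside the object.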

2. The real string class expansion method (different compilers may be different)

When reserve is used to grow a string, the capacity does not necessarily end up at exactly the value we ask for. Under VS, the capacity grows by a factor of about 1.5 until it is greater than or equal to the requested value. vector is different: when reserve is used to grow a vector, the capacity becomes exactly what we request, and a request smaller than the current capacity has no effect.
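Because the growth policy differs between implementations, it is easier to observe it than to assume a factor. The sketch below (the function name `capacity_steps` is made up) appends characters one at a time and records each new capacity value that appears.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Appends `count` characters one at a time and records every distinct
// capacity the string passes through, starting with the initial one.
std::vector<std::size_t> capacity_steps(std::size_t count) {
    std::string s;
    std::vector<std::size_t> steps{s.capacity()};
    for (std::size_t i = 0; i < count; ++i) {
        s.push_back('a');
        if (s.capacity() != steps.back()) {
            steps.push_back(s.capacity());
        }
    }
    return steps;
}
```

Printing the result of `capacity_steps(1000)` and dividing neighboring values gives the growth factor used by the library at hand.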

3. The process of using reserve for capacity expansion

1. Open up new space.   

2. Copy the element.   

3. Free up old space.   

4. Use the new space.

4. The difference between [ ] and at

Both [ ] and at access an element of the string, and for a valid index they behave the same. The difference shows when the index is out of range: with [ ] the behavior is undefined and the program will typically crash, whereas at throws a std::out_of_range exception at the point of the error.
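The behavior of at can be shown with a small sketch on std::string. The helper name `at_throws` is invented for the example. The out-of-range case for [ ] is deliberately not exercised, since its behavior is undefined.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Returns true if s.at(index) rejects the index by throwing std::out_of_range.
bool at_throws(const std::string& s, std::size_t index) {
    try {
        (void)s.at(index);
        return false;
    } catch (const std::out_of_range&) {
        return true;
    }
}
```

For the string "abc", index 2 is accepted, while index 3 makes at throw.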

5. Details about copy constructor and assignment operator overloading

a. Question:

When we implement a string class ourselves, we must watch out for shallow copies in the copy constructor and the overloaded assignment operator: with a shallow copy, two objects end up pointing at the same space, and that space is released twice.

b. Several ways to avoid the shallow copy:

①. Deep copy. (Traditional version)

(Precondition: the class manages a resource such as dynamically allocated space.) The copy constructor and the overloaded assignment operator are both written as a traditional deep copy: whenever a resource has to be copied, a new block of space is allocated for the object being constructed or assigned to, and the contents are copied into it.
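A minimal sketch of the traditional deep copy described above. The class `SimpleStr` is a hypothetical stand-in written only for this illustration and is much smaller than the string class of this article.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical minimal class, used only to illustrate the traditional deep copy.
class SimpleStr {
public:
    explicit SimpleStr(const char* s = "")
        : _str(new char[std::strlen(s) + 1]) {
        std::strcpy(_str, s);
    }

    // Copy constructor: allocate a separate buffer, then copy the characters.
    SimpleStr(const SimpleStr& other)
        : _str(new char[std::strlen(other._str) + 1]) {
        std::strcpy(_str, other._str);
    }

    // Assignment: allocate and copy first, release the old buffer last,
    // so the object stays intact if the allocation throws.
    SimpleStr& operator=(const SimpleStr& other) {
        if (this != &other) {
            char* copy = new char[std::strlen(other._str) + 1];
            std::strcpy(copy, other._str);
            delete[] _str;
            _str = copy;
        }
        return *this;
    }

    ~SimpleStr() { delete[] _str; }

    char& operator[](std::size_t i) { return _str[i]; }
    const char* c_str() const { return _str; }

private:
    char* _str;
};
```

After `SimpleStr b(a);`, changing `b[0]` leaves `a` untouched, because each object owns its own buffer.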

②. Deep copy. (Modern version)

(Precondition: the class manages a resource such as dynamically allocated space.) The copy constructor and the overloaded assignment operator are both written as a modern deep copy: each of them passes the _str of the parameter object to the class constructor to build a temporary object, and then swaps the members of the temporary object with the members of the current object one by one. Afterwards the current object owns the new members, and the old members, now held by the temporary object, are released when the temporary is destroyed.

③. Copy-on-write. (Not recommended because of its defects)

What is copy-on-write?

The copy constructor and the overloaded assignment operator are written as shallow copies. This raises an obvious question: after a shallow copy, several objects share the same space, so how can that work? As long as nobody modifies the contents, sharing causes no problem. We therefore attach a counter to the space that records how many objects share it. Each time the copy constructor or the assignment operator is called, the counter is incremented by 1 and a shallow copy is made. Every member function that modifies the contents first checks the counter. If more than one object shares the space, it allocates a new space, copies the contents of the original space into it, sets the counter of the new space to 1, decrements the counter of the old space by 1, and lets the object point to the new space before operating on it. This is copy-on-write. The destructor needs care as well: it must first check whether the counter of the space used by the object is 1. If it is not 1, the counter is decremented by 1; if it is 1, the space is released.
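The counter mechanism just described can be sketched as follows. The class `CowStr` and its member names are hypothetical, the counter is kept in a separate allocation for simplicity, and the sketch is not thread-safe. It illustrates the idea only and is not how any particular library implements it.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical minimal class illustrating copy-on-write with a shared counter.
class CowStr {
public:
    explicit CowStr(const char* s = "")
        : _str(new char[std::strlen(s) + 1]), _count(new std::size_t(1)) {
        std::strcpy(_str, s);
    }

    // Copying is shallow: share the buffer and increment the counter.
    CowStr(const CowStr& other) : _str(other._str), _count(other._count) {
        ++*_count;
    }

    CowStr& operator=(const CowStr& other) {
        if (_str != other._str) {  // also covers self-assignment
            release();
            _str = other._str;
            _count = other._count;
            ++*_count;
        }
        return *this;
    }

    ~CowStr() { release(); }

    // Read-only access never copies.
    char get(std::size_t i) const { return _str[i]; }

    // Writing first makes sure this object is the only owner of its buffer.
    void set(std::size_t i, char c) {
        detach();
        _str[i] = c;
    }

    const char* c_str() const { return _str; }
    std::size_t use_count() const { return *_count; }

private:
    // Give up this object's share; the last owner frees buffer and counter.
    void release() {
        if (--*_count == 0) {
            delete[] _str;
            delete _count;
        }
    }

    // If the buffer is shared, move this object onto a private copy.
    void detach() {
        if (*_count > 1) {
            char* copy = new char[std::strlen(_str) + 1];
            std::strcpy(copy, _str);
            --*_count;
            _str = copy;
            _count = new std::size_t(1);
        }
    }

    char* _str;
    std::size_t* _count;
};
```

Splitting access into `get` and `set` is a simplification made for this sketch. A single non-const operator[] that returns char& cannot tell a read from a write, so it would have to detach on every call.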

Copying even on a read? (The defect)

[ ] and iterators can be used both to modify and to read the contents of the space, so the class cannot know which of the two we intend. A check is therefore added inside these member functions: when only one object uses the space, the element is returned directly, but when more than one object uses it, copy-on-write is applied anyway. So when we traverse a shared string with [ ], even if we only read, the [ ] function still goes through the whole copy-on-write sequence: allocate a new space --> copy the elements --> use the new space.

6. string and vector<char>

Question:

The string class is a dynamic sequence that manages characters, and vector is a dynamic sequence that manages elements of any type. Since vector<char> is also a character array, why does STL provide a separate string class?

Differences:

1. A character array is not necessarily a string, because in C/C++ a string must be terminated by '\0'.

2. The string class has some special operations related to strings.
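Both differences can be seen in a short sketch. The function name `string_from_chars` is made up for this example.

```cpp
#include <string>
#include <vector>

// Builds a std::string from a plain character sequence. The vector holds
// only the listed characters, while c_str() on the resulting string is
// guaranteed to be terminated by '\0'.
std::string string_from_chars(const std::vector<char>& chars) {
    return std::string(chars.begin(), chars.end());
}
```

Given `std::vector<char> v{'h', 'i'}`, the vector stores exactly two elements and no terminator. The string built from it can be passed to C functions through c_str() and offers string-specific operations such as find and substr, which vector<char> does not have.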

Origin blog.csdn.net/weixin_49312527/article/details/124106668