C++ STL containers: a simulated implementation of the String class

       

Table of contents

 1. First, let’s take a look at the member structure of the String class:

 2. Implementing the ordinary function interface:

 3. Next, simulate the expansion mechanism of String class objects:

 4. Add, delete, find, and modify

         Adding data: push_back, append, and the += overloads

         The insert function: adding data anywhere in the array

         Delete

         Find

         Modify

 5. Copy construction and assignment overloading:

        5.1 Traditional style

        5.2 Modern style

 6. Stream insertion/stream extraction overloaded functions:

 7. Partial implementation of iterators:

 Complete String class code (.h file):


         As the character-sequence class in C++, String supports adding, deleting, finding, and modifying string data. Let’s look at how several commonly used member functions of the String class can be implemented underneath:

 1. First, let’s take a look at the member structure of the String class:

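class string {
public:
    // Constructor
    string(const char* str = "") {
    }

    // Destructor
    ~string() {
    }

private:
    char* _str;                      // heap buffer that holds the characters
    size_t _size;                    // number of characters, excluding '\0'
    size_t _capacity;                // usable capacity, excluding '\0'
    static const size_t npos = -1;   // -1 converts to the largest size_t value
};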

Underneath, this class is a contiguous block of storage, in effect a character array. The class has four member variables:

        1. The character pointer _str points to a block of heap memory that stores the object's string content. Stack space is small and the length of a string is usually not known in advance, so the data is kept on the heap;

        2. _size is the number of characters currently stored, not counting the '\0' terminator. It is updated every time data is added or deleted;

        3. _capacity is the capacity of the current array, that is, the largest number of characters it can hold. Writing past this limit is an out-of-bounds access. The '\0' terminator is not counted in _capacity!

        4. Finally, there is npos. It is a static const member: it cannot be modified and is shared by all objects of the class. Initializing a size_t with -1 gives it the largest value size_t can hold.
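
To make the point about npos concrete, here is a minimal sketch. NposDemo and npos_is_max are hypothetical names used only for this illustration; the check compares npos with the maximum value reported by std::numeric_limits.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>

// Hypothetical stand-in for the class above; only npos matters here.
struct NposDemo {
    static const std::size_t npos = -1;  // -1 wraps around to the largest size_t
};

// Returns true when npos equals the largest value size_t can hold.
bool npos_is_max() {
    std::size_t value = NposDemo::npos;  // copy the constant into a local
    return value == std::numeric_limits<std::size_t>::max();
}
```

This is why npos works as a "not found" or "until the end" marker: no real index or length can reach it.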

2. Implementing the ordinary function interface:

class string {
public:
    // Constructor: the default argument "" builds an empty string
    string(const char* str = "") {
        _size = strlen(str);
        _capacity = _size;
        _str = new char[_capacity + 1];   // one extra byte for '\0'
        strcpy(_str, str);
    }

    // Destructor
    ~string() {
        delete[] _str;
        _str = nullptr;
        _size = _capacity = 0;
    }

    // Length of the string
    size_t size() const {
        return _size;
    }

    // Capacity of the underlying array
    size_t capacity() const {
        return _capacity;
    }

    // Read-only access to the C-style string
    const char* c_str() const {
        return _str;
    }

    // Whether the string is empty
    bool empty() const {
        return _size == 0;
    }

    // Upper bound this class reports for the amount of stored data
    size_t Max_size() const {
        return npos;
    }

    // Remove all content (the capacity is kept)
    void clear() {
        _size = 0;
        _str[0] = '\0';
    }

private:
    char* _str;
    size_t _size;
    size_t _capacity;
    static const size_t npos = -1;
};

       For the constructor, I gave the parameter a default value: the empty string. If an object is created without an initializer it has no content, so an empty string is the natural default. If an initializer is given, strlen gets its length, space is allocated from that length, and _size and _capacity are set to match. Note that one extra byte is allocated for _str. It holds the '\0' terminator, which is not counted in _size or _capacity.

        The destructor releases the heap space and resets the remaining member variables to zero.

        1. Most of the member functions above are marked const. A const variable cannot be modified after it is initialized. A const member function is similar: the first parameter of every non-static member function is the hidden this pointer, and the trailing const makes it a pointer to a const object, so the function cannot modify the member variables (_size, _str, _capacity) through this. It also means the function can be called on const objects.

        2. Almost all of the member functions above wrap access to member variables. This encapsulation keeps the underlying members from being exposed and modified at will by outside code.
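
The effect of const on a member function can be seen in a small sketch. Counter and read_only are hypothetical names made up for this example and are not part of the String class.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical example class with one const and one non-const member function.
class Counter {
public:
    explicit Counter(std::size_t n) : _size(n) {}
    std::size_t size() const { return _size; }  // here `this` is const Counter*
    void grow() { ++_size; }                    // here `this` is Counter*
private:
    std::size_t _size;
};

// A const reference may only call const member functions.
std::size_t read_only(const Counter& c) {
    // c.grow();   // would not compile: grow() is not a const member function
    return c.size();
}
```

So marking size(), capacity(), and empty() as const is what allows them to be called on a const String.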

 Note: The following functions are all member functions placed in the class!

  3. Next, simulate the expansion mechanism of String class objects:

// Expansion mechanism
void reserve(size_t n) {
    if (n > _capacity) {
        // Allocate a new, larger block; the extra byte is for '\0'
        char* tmp = new char[n + 1];
        strcpy(tmp, _str);
        // Release the old block
        delete[] _str;
        _str = tmp;       // hand the new block over to the member _str
        _capacity = n;    // update the capacity
    }
}

        Expansion happens when the current _capacity is not enough. In general there are two ways a heap block can grow. In-place expansion adds the required bytes directly behind the block when that memory happens to be free (C's realloc may do this). Off-site expansion picks a new location of suitable size, copies the data over, and releases the old block.

        new[] cannot grow an existing block, so reserve always takes the off-site route. A temporary pointer (a worker) does the job for us: it receives the new block and the copied data, and when the work is finished we take the result from the worker by assigning it to _str.

        Note: with off-site expansion it is easy to forget to release the original heap block, so remember to delete[] it~
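
The same allocate, copy, release sequence can be tried outside the class. The sketch below uses a hypothetical free function grow_buffer and assumes new_capacity is at least strlen(old_buf).

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: moves a C string into a block of new_capacity + 1 bytes.
// Assumes old_buf came from new[] and new_capacity >= strlen(old_buf).
char* grow_buffer(char* old_buf, std::size_t new_capacity) {
    char* tmp = new char[new_capacity + 1];  // + 1 for the '\0' terminator
    std::strcpy(tmp, old_buf);               // copy the content, '\0' included
    delete[] old_buf;                        // release the old block
    return tmp;
}

// Grows a small buffer and checks that the content survived the move.
bool grow_keeps_content() {
    char* buf = new char[3];
    std::strcpy(buf, "hi");
    buf = grow_buffer(buf, 16);
    bool same = std::strcmp(buf, "hi") == 0;
    delete[] buf;
    return same;
}
```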

4. Add, delete, find, and modify

        Adding data: push_back, append, and the += overloads:

// Append a character
void push_back(char c) {
    // The array may be full (or have capacity 0 for an empty string),
    // so it has to be expanded by hand first
    if (_size == _capacity) {
        size_t newcapacity = _capacity == 0 ? 4 : _capacity * 2;
        reserve(newcapacity);
    }
    _str[_size++] = c;
    _str[_size] = '\0';
}

// Append a string
void append(const char* str) {
    size_t len = strlen(str);

    if (_size + len > _capacity) {
        reserve(_size + len);
    }
    // Method 1:
    /*for (size_t i = 0; i < len; i++) {
        _str[_size++] = str[i];
    }
    _str[_size] = '\0';*/

    // Method 2: strcpy also copies the '\0' at the end of str
    strcpy(_str + _size, str);
    _size += len;
}

// += for a character
string& operator+=(char c) {
    push_back(c);
    return *this;
}

// += for a string
string& operator+=(const char* str) {
    append(str);
    return *this;
}

        Both push_back and append insert at the tail. Tail insertion is the most efficient insertion for an array because no existing data has to be moved!
        The += overloads are also tail insertions, so they simply reuse push_back and append.
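
To get a feel for how rarely push_back has to expand, the following sketch replays the growth rule (capacity 0 grows to 4, then doubles) without allocating anything. count_growths is a hypothetical helper written for this illustration.

```cpp
#include <cassert>
#include <cstddef>

// Counts how often the capacity would grow during n push_back calls,
// starting from capacity 0 and using the rule: 0 -> 4, then double.
std::size_t count_growths(std::size_t n) {
    std::size_t size = 0, capacity = 0, growths = 0;
    for (std::size_t i = 0; i < n; ++i) {
        if (size == capacity) {
            capacity = (capacity == 0) ? 4 : capacity * 2;
            ++growths;
        }
        ++size;
    }
    return growths;
}
```

For 1000 push_back calls this gives 9 expansions (capacities 4, 8, 16, up to 1024), which is why doubling keeps the average cost of a tail insertion low.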

        The insert function: adding data anywhere in the array:

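// Insert a character at position pos
string& insert(size_t pos, char c) {
    assert(pos <= _size);
    // The array may be full (or have capacity 0), so expand by hand first
    if (_size == _capacity) {
        size_t newcapacity = _capacity == 0 ? 4 : _capacity * 2;
        reserve(newcapacity);
    }

    // Method 1: end stays above pos, so the unsigned index never wraps around
    size_t end = _size + 1;
    while (end > pos) {
        _str[end] = _str[end - 1];
        --end;
    }
    _str[pos] = c;
    _size++;

    // Method 2: a signed index, which needs a cast for the comparison
    /*int end = _size;
    while (end >= (int)pos) {
        _str[end + 1] = _str[end];
        --end;
    }
    _str[pos] = c;
    _size++;*/

    return *this;
}

// Insert a string at position pos
string& insert(size_t pos, const char* str) {
    assert(pos <= _size);
    size_t len = strlen(str);
    if (len == 0) {
        return *this;    // nothing to insert; also keeps the loop below from wrapping around
    }

    if (_size + len > _capacity) {
        reserve(_size + len);
    }
    // Method 1: shift the tail (including '\0') right by len
    size_t end = _size + len;
    while (end >= pos + len) {
        _str[end] = _str[end - len];
        --end;
    }
    // Method 2 is analogous to the one above and is not shown

    strncpy(_str + pos, str, len);    // copies exactly len characters, no '\0'
    _size += len;
    return *this;
}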

        The insert function can insert data at any position, so three cases need to be considered: inserting at the tail, at the head, and in the middle of the array.

        For head and middle insertion, the existing data has to be moved to make room at the insertion position. Moving data in an array takes O(N) time, so insert is best used sparingly.

        All of the functions above that add data must check the capacity before inserting, to see whether the array is full and needs to be expanded!
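
The shifting step can also be written with memmove, which is allowed to copy between overlapping ranges. This is only a sketch on a raw buffer, not the class's own method; insert_at is a hypothetical helper and it assumes the buffer is already large enough.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: inserts str at index pos of the C string in buf.
// Assumes pos <= strlen(buf) and that buf has room for the longer string.
void insert_at(char* buf, std::size_t pos, const char* str) {
    std::size_t size = std::strlen(buf);
    std::size_t len = std::strlen(str);
    // Shift the tail, '\0' included, right by len positions.
    std::memmove(buf + pos + len, buf + pos, size - pos + 1);
    // Fill the gap with the new characters (no terminator needed).
    std::memcpy(buf + pos, str, len);
}

// Middle insertion: "helloworld" becomes "hello, world".
bool insert_demo() {
    char buf[32] = "helloworld";
    insert_at(buf, 5, ", ");
    return std::strcmp(buf, "hello, world") == 0;
}
```

Head insertion is the same call with pos equal to 0; the whole string is then shifted.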

         Delete:

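// Delete len characters starting at pos
string& erase(size_t pos, size_t len = npos) {
    assert(pos <= _size);
    if (len == npos || len >= _size - pos) {
        // Delete everything from pos to the end
        _str[pos] = '\0';
        _size = pos;
    }
    else {
        // Source and destination overlap, so use memmove instead of strcpy;
        // the + 1 moves the '\0' as well
        memmove(_str + pos, _str + pos + len, _size - pos - len + 1);
        _size -= len;
    }
    return *this;
}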

        Three cases need to be considered when deleting data: deleting from the head, from the tail, and from the middle. Because the default value of len is npos, meaning "delete everything from pos to the end", that case needs special handling: the string is simply truncated at pos.
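
The three cases can be tried on a plain character array. The sketch below uses a hypothetical helper erase_at that returns the new length; it treats any len that reaches or passes the end as "delete to the end", which is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: removes len characters starting at pos from the
// C string in buf and returns the new length.
std::size_t erase_at(char* buf, std::size_t pos, std::size_t len) {
    std::size_t size = std::strlen(buf);
    if (pos > size) {
        return size;                 // nothing to erase
    }
    if (len >= size - pos) {         // tail deletion, including len == npos
        buf[pos] = '\0';
        return pos;
    }
    // Head or middle deletion: pull the rest (and '\0') forward.
    std::memmove(buf + pos, buf + pos + len, size - pos - len + 1);
    return size - len;
}
```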

        Find:

// Find a character, starting the search at pos
size_t find(char c, size_t pos = 0) const {
    assert(pos < _size);
    while (pos < _size) {
        if (_str[pos] == c) {
            return pos;
        }
        ++pos;
    }
    // Not found: return npos
    return npos;
}

// Find a substring, starting the search at pos
size_t find(const char* str, size_t pos = 0) const {
    assert(pos < _size);
    const char* ptr = strstr(_str + pos, str);
    if (ptr == nullptr) {
        return npos;
    }
    else {
        return ptr - _str;    // pointer difference = index of the match
    }
}

        The find functions are easy to write. To find a character, loop over the array and compare one character at a time; on a match, return the index of that character, otherwise return npos.

        To find a substring, the C library function strstr is used. It scans a string for the first occurrence of another string and returns a pointer to the match, or a null pointer if there is none. Here ptr points to where the match starts, and subtracting the start pointer _str from it (pointer minus pointer gives a count) yields the index of the match within the object's array!
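
The pointer subtraction is easiest to see on a plain C string. find_in below is a hypothetical free function written for this illustration; it returns the largest size_t value when nothing is found.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: index of the first occurrence of pattern in text
// at or after pos, or the largest size_t value if there is none.
std::size_t find_in(const char* text, const char* pattern, std::size_t pos = 0) {
    const std::size_t not_found = static_cast<std::size_t>(-1);
    if (pos > std::strlen(text)) {
        return not_found;
    }
    const char* hit = std::strstr(text + pos, pattern);
    if (hit == nullptr) {
        return not_found;
    }
    return static_cast<std::size_t>(hit - text);  // pointer minus pointer = index
}
```

For example, searching "hello world" for "world" gives 6, because the match starts 6 characters after the start of the text.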

         Modify:

// Access the character at position pos (readable and writable)
char& operator[](size_t pos) {
    assert(pos < _size);
    return _str[pos];
}

        With the [ ] operator overloaded, a loop in main can traverse the object and modify its characters by index!
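
A traversal of that kind might look like the sketch below. Word is a hypothetical fixed-size stand-in for the String class (it holds at most 31 characters), and the const overload of operator[] is an addition of this sketch so that const objects can be read as well.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical fixed-size stand-in for the String class.
class Word {
public:
    explicit Word(const char* s) : _size(std::strlen(s)) {
        assert(_size < sizeof(_str));
        std::strcpy(_str, s);
    }
    std::size_t size() const { return _size; }
    char& operator[](std::size_t pos) {              // read and write
        assert(pos < _size);
        return _str[pos];
    }
    const char& operator[](std::size_t pos) const {  // read only
        assert(pos < _size);
        return _str[pos];
    }
    const char* c_str() const { return _str; }
private:
    char _str[32];
    std::size_t _size;
};

// Walks over the object by index and upper-cases ASCII letters in place.
void to_upper_ascii(Word& w) {
    for (std::size_t i = 0; i < w.size(); ++i) {
        if (w[i] >= 'a' && w[i] <= 'z') {
            w[i] = static_cast<char>(w[i] - 'a' + 'A');
        }
    }
}
```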

5. Copy construction and assignment overloading:

        5.1 Traditional style:

// Copy constructor: allocate a separate block and copy the characters
string(const string& s) {
    _str = new char[s._capacity + 1];
    _size = s._size;
    _capacity = s._capacity;
    strcpy(_str, s._str);
}

// Assignment overload
string& operator=(const string& s) {
    if (this != &s) {    // guard against self-assignment (s1 = s1)
        char* tmp = new char[s._capacity + 1];
        strcpy(tmp, s._str);    // copy into the new block first
        delete[] _str;          // then release the old block
        _str = tmp;
        _size = s._size;
        _capacity = s._capacity;
    }
    return *this;
}

      1. The copy constructor and the assignment overload both copy the data of one object into another object!

      2. These two functions must be written by hand because _str points to heap space. The compiler-generated versions make a shallow copy, which leaves two objects pointing to the same heap block; both then release it on destruction, and the double free crashes the program. To avoid the shallow copy, the target object must get its own heap space and copy the characters into it; only _size and _capacity can be copied directly.

      3. For copy construction and assignment overloading, pass parameters and return values by reference wherever possible. This avoids extra copies of the arguments and improves efficiency!
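
The difference a deep copy makes can be checked with a small sketch. Buffer is a hypothetical class that owns a single heap block; after copying, the two objects hold different pointers, so changing one does not touch the other.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical class that owns one heap block and copies it deeply.
class Buffer {
public:
    explicit Buffer(const char* s) : _str(new char[std::strlen(s) + 1]) {
        std::strcpy(_str, s);
    }
    Buffer(const Buffer& other) : _str(new char[std::strlen(other._str) + 1]) {
        std::strcpy(_str, other._str);   // own block, same characters
    }
    Buffer& operator=(const Buffer& other) {
        if (this != &other) {
            char* tmp = new char[std::strlen(other._str) + 1];
            std::strcpy(tmp, other._str);
            delete[] _str;               // release the old block
            _str = tmp;
        }
        return *this;
    }
    ~Buffer() { delete[] _str; }
    char* data() { return _str; }
    const char* data() const { return _str; }
private:
    char* _str;
};
```

With the compiler-generated copy both objects would share one block, and both destructors would call delete[] on it.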

        5.2 Modern style:

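// Modern style (for the sake of concise code)
void swap(string& s) {
    std::swap(_str, s._str);
    std::swap(_size, s._size);
    std::swap(_capacity, s._capacity);
}

// Copy constructor:  string s3(s1);
string(const string& s)
    : _str(nullptr)
    , _size(0)
    , _capacity(0) {
    string tmp(s._str);    // build a separate copy with the ordinary constructor
    this->swap(tmp);       // take over its data; tmp is left with nullptr
}

// Assignment overload:  s3 = s1;
string& operator=(const string& s) {
    if (this != &s) {
        string tmp(s);
        this->swap(tmp);
        // tmp now owns the old buffer of *this and releases it in its destructor
    }
    return *this;
}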

The modern style is more concise and easier to read than the traditional style.

        The core of the modern style is to exchange all the data of the current object with another object using swap (the member function above, which calls the library function std::swap on each member). The other object is always a freshly built temporary that owns its own copy of the data, so no shallow copy occurs and no heap block is destroyed twice.

       Analysis of the assignment overload for s3 = s1: the parameter s is a reference to s1, so passing it copies nothing. Swapping with s directly is not an option, because s is const and s1 must keep its data. Instead, the temporary object tmp is copy-constructed from s and gets its own heap space, and then *this swaps with tmp. tmp is like a worker doing a job for the boss: when the work is done, the boss takes the worker's result (the new copy) and hands over the old buffer in exchange. When tmp goes out of scope, its destructor releases that old buffer, so nothing leaks and nothing is freed twice.
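
One way to convince yourself that the old buffer really is released is to count live objects. The sketch below uses a hypothetical class Text with a static counter named live; both are inventions for this illustration and not part of the String class.

```cpp
#include <cassert>
#include <cstring>
#include <utility>

// Hypothetical class using copy-and-swap; `live` counts existing objects,
// each of which owns exactly one heap block.
class Text {
public:
    explicit Text(const char* s = "") : _str(new char[std::strlen(s) + 1]) {
        std::strcpy(_str, s);
        ++live;
    }
    Text(const Text& o) : _str(new char[std::strlen(o._str) + 1]) {
        std::strcpy(_str, o._str);
        ++live;
    }
    Text& operator=(const Text& o) {
        if (this != &o) {
            Text tmp(o);   // separate copy of o's data
            swap(tmp);     // *this takes the copy, tmp takes the old block
        }                  // tmp is destroyed here and frees the old block
        return *this;
    }
    ~Text() {
        delete[] _str;
        --live;
    }
    void swap(Text& o) { std::swap(_str, o._str); }
    const char* c_str() const { return _str; }
    static int live;
private:
    char* _str;
};

int Text::live = 0;
```

After an assignment between two objects the counter is still 2, and it drops back to 0 once both go out of scope, so every block is freed exactly once.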

6. Stream insertion/stream extraction overloaded functions:

class string {
public:
    friend ostream& operator<<(ostream& out, const string& s);
    friend istream& operator>>(istream& in, string& s);
};


// Stream insertion
ostream& operator<<(ostream& out, const string& s) {
    // As a friend, this function may read the private members directly
    for (size_t i = 0; i < s._size; i++) {
        out << s._str[i];
    }
    return out;
}


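// Stream extraction
istream& operator>>(istream& in, string& s) {
    s.clear();
    char buff[128] = { '\0' };    // local buffer, so s is not expanded for every character
    char ch;
    size_t i = 0;
    // get(ch) extracts one character at a time, whitespace included;
    // stop at a space, a newline, or the end of the input
    while (in.get(ch) && ch != ' ' && ch != '\n') {
        if (i == 127) {           // buffer full: buff[127] is still '\0'
            s += buff;
            i = 0;
        }
        buff[i++] = ch;
    }
    if (i > 0) {
        buff[i] = '\0';
        s += buff;
    }
    return in;
}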

        The stream insertion and stream extraction functions have to be non-member functions. If operator<< were a member function, its first parameter would be the hidden this pointer, so the String object would have to be the left operand. The statement cout<<s1; would then not compile, and you would have to write s1<<cout; instead. Nobody writes it that way, so the functions are placed outside the class.

        A function outside the class cannot access the private members of the class. The friend declaration solves this: declaring the function inside the class with the friend keyword lets that outside function access the private members!
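
Here is a small sketch of the friend mechanism in isolation. Label and render are hypothetical names; the output goes to a std::ostringstream so the result can be inspected as a string.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical class with a private member and a friend operator<<.
class Label {
public:
    explicit Label(const char* s) : _text(s) {}
    // Declared inside the class, defined outside: not a member function.
    friend std::ostream& operator<<(std::ostream& out, const Label& l);
private:
    const char* _text;   // assumed to point to a string that outlives the Label
};

// The stream is the left operand, the object the right one.
std::ostream& operator<<(std::ostream& out, const Label& l) {
    for (const char* p = l._text; *p != '\0'; ++p) {
        out << *p;       // friend access to the private member
    }
    return out;
}

// Writes the label between brackets and returns the text.
std::string render(const Label& l) {
    std::ostringstream os;
    os << "[" << l << "]";   // chaining works because the stream is returned
    return os.str();
}
```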


7. Partial implementation of iterators:

public:
    typedef char* iterator;
    typedef const char* const_iterator;

    // Iterators
    iterator begin() {
        return _str;            // begin points to the first character
    }

    iterator end() {
        return _str + _size;    // end points one past the last valid character
    }

    // Const iterators
    const_iterator cbegin() const {
        return _str;
    }

    const_iterator cend() const {
        return _str + _size;
    }

        The iterator type is just a typedef of char* (and const_iterator of const char*). begin and end return pointers: begin points to the first character of the object's array, and end points one past the last valid character.
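
Because begin and end return plain pointers, a range-based for loop works on such a class without further effort. The sketch below uses a hypothetical fixed-size class Chars; unlike the code above, it names the const versions begin and end as well, since that is what a range-based for loop looks for on a const object.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical fixed-size class with pointer iterators (at most 31 characters).
class Chars {
public:
    typedef char* iterator;
    typedef const char* const_iterator;

    explicit Chars(const char* s) : _size(std::strlen(s)) {
        assert(_size < sizeof(_str));
        std::strcpy(_str, s);
    }
    iterator begin() { return _str; }
    iterator end() { return _str + _size; }
    const_iterator begin() const { return _str; }
    const_iterator end() const { return _str + _size; }
private:
    char _str[32];
    std::size_t _size;
};

// Counts how often target occurs, using a range-based for loop.
std::size_t count_char(const Chars& c, char target) {
    std::size_t n = 0;
    for (char ch : c) {
        if (ch == target) {
            ++n;
        }
    }
    return n;
}
```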


Complete String class code (.h file):

#include<string.h>
#include<iostream>
#include<assert.h>
#include<utility>    // std::swap
using namespace std;    // must come after the headers that declare namespace std

namespace Cheng {
    class string {
    public:
        typedef char* iterator;
        typedef const char* const_iterator;

        // Iterators
        iterator begin() {
            return _str;            // begin points to the first character
        }

        iterator end() {
            return _str + _size;    // end points one past the last valid character
        }

        // Const iterators
        const_iterator cbegin() const {
            return _str;
        }

        const_iterator cend() const {
            return _str + _size;
        }

        // Constructor. str = "" is the default argument: with no argument the
        // default is used (no-argument construction); otherwise the given
        // argument is used and the default is ignored
        string(const char* str = "") {
            _size = strlen(str);
            _capacity = _size;
            _str = new char[_capacity + 1];    // one extra byte for '\0'
            strcpy(_str, str);
        }

        // Destructor
        ~string() {
            delete[] _str;
            _str = nullptr;
            _size = _capacity = 0;
        }

        // Copy constructor, traditional style
        /*
        string(const string& s) {
            _str = new char[s._capacity + 1];
            _size = s._size;
            _capacity = s._capacity;
            strcpy(_str, s._str);
        }
        */

        // Copy constructor, modern style (for the sake of concise code)
        void swap(string& s) {
            std::swap(_str, s._str);
            std::swap(_size, s._size);
            std::swap(_capacity, s._capacity);
        }

        string(const string& s)
            : _str(nullptr)
            , _size(0)
            , _capacity(0) {
            string tmp(s._str);
            this->swap(tmp);
        }

        // Assignment, traditional style
        /*
        string& operator=(const string& s) {
            if (this != &s) {    // guard against self-assignment
                char* tmp = new char[s._capacity + 1];
                strcpy(tmp, s._str);
                delete[] _str;
                _str = tmp;
                _size = s._size;
                _capacity = s._capacity;
            }
            return *this;
        }
        */

        // Assignment, modern style
        // s1 = s3;
        string& operator=(const string& s) {
            if (this != &s) {
                string tmp(s);
                this->swap(tmp);    // tmp releases the old buffer in its destructor
            }
            return *this;
        }

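        // Length of the string
        size_t size() const {
            return _size;
        }
        // Capacity of the underlying array
        size_t capacity() const {
            return _capacity;
        }

        // Access the character at position pos
        // Non-const objects: readable and writable
        char& operator[](size_t pos) {
            assert(pos < _size);
            return _str[pos];
        }
        // Const objects: read-only
        const char& operator[](size_t pos) const {
            assert(pos < _size);
            return _str[pos];
        }
        // Read-only access to the C-style string
        const char* c_str() const {
            return _str;
        }

        // Whether the string is empty
        bool empty() const {
            return _size == 0;
        }

        // Release unused capacity by moving the data into an exact-size block
        void shrink_to_fit() {
            if (_capacity > _size) {
                char* tmp = new char[_size + 1];
                strcpy(tmp, _str);
                delete[] _str;
                _str = tmp;
                _capacity = _size;
            }
        }

        // Upper bound this class reports for the amount of stored data
        size_t Max_size() const {
            return npos;
        }

        // Expansion mechanism
        void reserve(size_t n) {
            if (n > _capacity) {
                // Allocate a new, larger temporary block; the extra byte is for '\0'
                char* tmp = new char[n + 1];
                strcpy(tmp, _str);
                // Release the old block
                delete[] _str;
                _str = tmp;       // hand the new block over to the member _str
                _capacity = n;    // update the capacity
            }
        }

        // Change the length to n; new positions are filled with ch
        void resize(size_t n, char ch = '\0') {
            if (n > _size) {
                reserve(n);    // does nothing when n <= _capacity
                for (size_t i = _size; i < n; ++i) {
                    _str[i] = ch;
                }
            }
            _str[n] = '\0';
            _size = n;
        }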

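        // Append a character
        void push_back(char c) {
            // The array may be full (or have capacity 0 for an empty string),
            // so it has to be expanded by hand first
            if (_size == _capacity) {
                size_t newcapacity = _capacity == 0 ? 4 : _capacity * 2;
                reserve(newcapacity);
            }
            _str[_size++] = c;
            _str[_size] = '\0';
        }

        // Append a string
        void append(const char* str) {
            size_t len = strlen(str);

            if (_size + len > _capacity) {
                reserve(_size + len);
            }
            // Method 1:
            /*for (size_t i = 0; i < len; i++) {
                _str[_size++] = str[i];
            }
            _str[_size] = '\0';*/

            // Method 2:
            // _str points to the first character, so _str + _size points one past
            // the last character, which is where the '\0' sits. The new string is
            // copied starting there. strcpy also copies the '\0' of str, so no
            // terminator has to be added afterwards
            strcpy(_str + _size, str);
            _size += len;
        }

        string& operator+=(char c) {
            push_back(c);
            return *this;
        }

        string& operator+=(const char* str) {
            append(str);
            return *this;
        }

        // Insert a character at position pos
        string& insert(size_t pos, char c) {
            assert(pos <= _size);
            // The array may be full (or have capacity 0), so expand by hand first
            if (_size == _capacity) {
                size_t newcapacity = _capacity == 0 ? 4 : _capacity * 2;
                reserve(newcapacity);
            }

            // Move the data
            // When inserting at the head (pos = 0), the condition must be end > pos.
            // With end >= pos the loop would never stop: size_t cannot be negative,
            // so instead of reaching -1 it wraps around to a huge value
            // (more than 4.2 billion for a 32-bit size_t)
            // Method 1:
            size_t end = _size + 1;
            while (end > pos) {
                _str[end] = _str[end - 1];
                --end;
            }
            _str[pos] = c;
            _size++;

            // Method 2: not recommended
            /*int end = _size;
            while (end >= (int)pos) {
                _str[end + 1] = _str[end];
                --end;
            }
            _str[pos] = c;
            _size++;*/

            return *this;
        }

        // Insert a string at position pos
        string& insert(size_t pos, const char* str) {
            assert(pos <= _size);

            size_t len = strlen(str);
            if (len == 0) {
                return *this;    // nothing to insert; also keeps the loop below from wrapping around
            }

            // Expand by hand if the result does not fit
            if (_size + len > _capacity) {
                reserve(_size + len);
            }

            // Shift the tail (including '\0') right by len
            size_t end = _size + len;
            while (end >= pos + len) {
                _str[end] = _str[end - len];
                --end;
            }
            strncpy(_str + pos, str, len);    // copies exactly len characters, no '\0'
            _size += len;
            return *this;
        }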

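        // Delete len characters starting at pos.
        // len = npos is the default argument: if the caller gives no second
        // argument, everything from pos up to the '\0' is deleted
        string& erase(size_t pos, size_t len = npos) {
            assert(pos <= _size);
            if (len == npos || len >= _size - pos) {
                _str[pos] = '\0';
                _size = pos;
            }
            else {
                // Source and destination overlap, so use memmove instead of strcpy;
                // the + 1 moves the '\0' as well
                memmove(_str + pos, _str + pos + len, _size - pos - len + 1);
                _size -= len;
            }
            return *this;
        }

        // Find a character. pos has a default argument because find in the
        // standard string class can also be called without it and then starts at 0
        size_t find(char c, size_t pos = 0) const {
            assert(pos < _size);
            while (pos < _size) {
                if (_str[pos] == c) {
                    return pos;
                }
                ++pos;
            }
            // Not found: return npos
            return npos;
        }

        // Find a substring
        size_t find(const char* str, size_t pos = 0) const {
            assert(pos < _size);
            const char* ptr = strstr(_str + pos, str);
            if (ptr == nullptr) {
                return npos;
            }
            else {
                return ptr - _str;    // pointer difference = index of the match
            }
        }

        // Remove all content (the capacity is kept)
        void clear() {
            _size = 0;
            _str[0] = '\0';
        }

    private:
        char* _str;          // declared first to match the constructor initializer list
        size_t _size;
        size_t _capacity;
        static const size_t npos = -1;    // largest size_t value
    };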

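    // Stream insertion
    ostream& operator<<(ostream& out, const string& s) {
        // Only the public const iterators are needed here, so no friend declaration
        for (string::const_iterator it = s.cbegin(); it != s.cend(); ++it) {
            out << *it;
        }
        return out;
    }

    // Stream extraction
    istream& operator>>(istream& in, string& s) {
        s.clear();
        char buff[128] = { '\0' };    // local buffer, so s is not expanded for every character
        char ch;
        size_t i = 0;
        // get(ch) extracts one character at a time, whitespace included;
        // stop at a space, a newline, or the end of the input
        while (in.get(ch) && ch != ' ' && ch != '\n') {
            if (i == 127) {           // buffer full: buff[127] is still '\0'
                s += buff;
                i = 0;
            }
            buff[i++] = ch;
        }
        if (i > 0) {
            buff[i] = '\0';
            s += buff;
        }
        return in;
    }
}    // namespace Cheng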


Origin blog.csdn.net/weixin_69283129/article/details/131883775