[C++] Simulating the string Class, Part 1 (with complete source code)

Preface

In the previous article, we covered the commonly used interfaces of the string class in detail. In this article, we write our own simplified implementation of string to understand more deeply how it works.

1. Basic structure of string

In the previous article we learned that, underneath, a string is a character array that can grow dynamically. With that structure settled, we can start implementing it.

First, create a new header file string.h and define a string class:

class string
{
public:
    // member functions
private:
    char*  _str;
    size_t _size;
    size_t _capacity;
};

The string class has three member variables: the character pointer _str points to the dynamically allocated array, _size records the number of valid characters, and _capacity records the capacity (not counting the terminating '\0').

Because the standard library already has a string class, we put our own implementation into a namespace of our own to avoid name conflicts:

namespace w
{
    class string
    {
    public:
        // member functions
    private:
        char*  _str;
        size_t _size;
        size_t _capacity;
    };
}

2. Constructors and destructor

2.1 Implementation of constructor

2.1.1 Constructor with parameters

First, let's implement a constructor that takes a parameter.

The standard library's string class has many constructors. Here we implement only the most commonly used ones.

In the previous article, we mentioned that members should be initialized in the initialization list whenever possible, so a first attempt might initialize _str directly from the parameter, as in :_str(str).

But this does not compile. First, it is a case of permission amplification (discussed in the previous article): str is a const char*, so the characters it points to must not be modified, while _str is a char*, through which they could be. Second, even if it compiled, _str would point at a constant string (for example a string literal), which cannot be modified.

So what should we do? Instead of storing the pointer we were given, we allocate our own space and copy the characters into it with strcpy:

    string(const char* str)
        :_str(new char[strlen(str) + 1])
        ,_size(strlen(str))
        ,_capacity(strlen(str))
    {
        strcpy(_str, str);
    }

Along the way, we also provide c_str(), which returns the underlying C-style string:

    // const, so it can also be called on const objects
    const char* c_str() const
    {
        return _str;
    }

Next, create a test.cpp file to test the interfaces we have written:
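A minimal check in the spirit of test.cpp is sketched below. It repeats a cut-down copy of the w::string class (only the constructor, c_str() and a destructor) so that it compiles on its own; keeps_own_copy() is a hypothetical helper added for this illustration. It confirms the point of using strcpy: the string owns a copy of the characters, so changing the caller's buffer afterwards does not affect it.

```cpp
#include <cassert>
#include <cstring>
#include <iostream>

namespace w
{
    class string
    {
    public:
        // Allocate our own buffer (+1 for '\0') and copy the characters into it
        string(const char* str)
            : _str(new char[std::strlen(str) + 1])
            , _size(std::strlen(str))
            , _capacity(std::strlen(str))
        {
            std::strcpy(_str, str);
        }

        // Release the buffer allocated with new[]
        ~string() { delete[] _str; }

        const char* c_str() const { return _str; }

    private:
        char*       _str;
        std::size_t _size;
        std::size_t _capacity;
    };
}

// Hypothetical helper: build a w::string from a mutable buffer, then change
// the buffer and check that the string still holds the original text.
bool keeps_own_copy()
{
    char buf[] = "hello world";
    w::string s(buf);
    buf[0] = 'J';                         // modifies only the caller's buffer
    std::cout << s.c_str() << std::endl;  // prints: hello world
    return std::strcmp(s.c_str(), "hello world") == 0;
}
```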

2.2 Destructor

The destructor is straightforward:

    ~string()
    {
        delete[] _str;
        _str = nullptr;
        _size = _capacity = 0;
    }

2.3 Parameterless constructor

Sometimes we want to create an empty string without passing any argument (for example, w::string s2;).

So here we need to implement a parameterless constructor.

Suppose we implemented the parameterless constructor by initializing _str to nullptr and setting _size and _capacity to 0.

Does this really work?
If _str is a null pointer, the c_str() function we just implemented returns nullptr, and code such as cout << s2.c_str() then dereferences a null pointer, which is undefined behavior and typically crashes the program. The standard library's c_str(), on the other hand, always returns a valid C string, even when the string is empty.

So what should we do? We can write it like this:

    string()
        :_str(new char[1])
        ,_size(0)
        ,_capacity(0)
    {
        _str[0] = '\0';
    }

Here we allocate space for one character in _str and store '\0' in it. This way c_str() always returns a valid (empty) C string, and the problem above does not occur.
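The sketch below, a cut-down copy of the class with only the parameterless constructor, c_str() and size(), checks that an empty w::string now hands out a real C string: c_str() is non-null and std::strlen on it returns 0, whereas with a nullptr buffer the same call would be undefined behavior.

```cpp
#include <cassert>
#include <cstring>

namespace w
{
    class string
    {
    public:
        // Parameterless constructor: one char holding '\0', so even an empty
        // string has a valid, null-terminated buffer
        string()
            : _str(new char[1])
            , _size(0)
            , _capacity(0)
        {
            _str[0] = '\0';
        }

        ~string() { delete[] _str; }   // delete[] matches new char[1]

        const char* c_str() const { return _str; }
        std::size_t size() const { return _size; }

    private:
        char*       _str;
        std::size_t _size;
        std::size_t _capacity;
    };
}
```

With this constructor, printing an empty string with cout << s.c_str() simply prints nothing instead of crashing.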

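2.4 Merging the parameterless and parameterized constructors

As mentioned before, the parameterless and parameterized constructors can be merged into one full-default constructor, that is, a constructor whose parameter has a default value.

Let's look at several ways to write it.

Could the default argument be the character '\0', as in string(const char* str = '\0')? Definitely not: the types do not match, since '\0' is a character (char) while the parameter is a string (const char*).

Could it be nullptr, as in string(const char* str = nullptr)? Definitely not either: the constructor calls strlen(str), and strlen on a null pointer dereferences it, which is undefined behavior and typically crashes.

In fact, it should be written as string(const char* str = "").

Here the default argument is the empty string "", which still contains the terminating '\0', so strlen returns 0 and strcpy copies just the '\0'.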
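As a quick check, here is a cut-down sketch of the class with only the merged constructor (plus c_str(), size() and the destructor). The same constructor serves both w::string empty; and w::string hello("hello");, and both objects own a buffer allocated with new[], so the destructor's delete[] is correct in both cases.

```cpp
#include <cassert>
#include <cstring>

namespace w
{
    class string
    {
    public:
        // One full-default constructor replaces both earlier ones:
        // with no argument it copies "" (just '\0'), otherwise it copies str
        string(const char* str = "")
            : _str(new char[std::strlen(str) + 1])
            , _size(std::strlen(str))
            , _capacity(std::strlen(str))
        {
            std::strcpy(_str, str);
        }

        ~string() { delete[] _str; }

        const char* c_str() const { return _str; }
        std::size_t size() const { return _size; }

    private:
        char*       _str;
        std::size_t _size;
        std::size_t _capacity;
    };
}
```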

3. String traversal

3.1 operator[]

In the standard library, you can access a single character of a string by subscript. Let's implement this by overloading operator[].

First we need a size() interface, which simply returns _size.

Next, we overload operator[]. It asserts that pos < _size and returns a reference to _str[pos].

Here we have implemented two versions: the ordinary version is used by non-const objects and returns a modifiable char&, while the const version is used by const objects and returns a const char&. Because they differ in whether the implicit this pointer is const, the two functions form a valid function overload.

Let’s verify it below:
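One way to verify both versions is the cut-down sketch below. count_char() is a hypothetical helper that takes a const w::string&, so inside it only the const operator[] can be called; the test then writes through the non-const version.

```cpp
#include <cassert>
#include <cstring>

namespace w
{
    class string
    {
    public:
        string(const char* str = "")
            : _str(new char[std::strlen(str) + 1])
            , _size(std::strlen(str))
            , _capacity(std::strlen(str))
        {
            std::strcpy(_str, str);
        }
        ~string() { delete[] _str; }

        const char* c_str() const { return _str; }
        std::size_t size() const { return _size; }

        // Non-const version: returns a modifiable reference
        char& operator[](std::size_t pos)
        {
            assert(pos < _size);
            return _str[pos];
        }

        // Const version: chosen for const objects, read-only access
        const char& operator[](std::size_t pos) const
        {
            assert(pos < _size);
            return _str[pos];
        }

    private:
        char*       _str;
        std::size_t _size;
        std::size_t _capacity;
    };
}

// Hypothetical helper: s is const here, so s[i] calls the const operator[]
std::size_t count_char(const w::string& s, char ch)
{
    std::size_t n = 0;
    for (std::size_t i = 0; i < s.size(); ++i)
        if (s[i] == ch)
            ++n;
    return n;
}
```

Removing the const overload would make count_char() fail to compile, which is why both versions are needed.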

3.2 Implementing iterators (a simple version)

Besides [], we can also traverse and access a string object with iterators.

As we said before, an iterator can be thought of as something like a pointer, but it is not necessarily a pointer.
As introduced earlier, there are several versions of the STL, and different versions may implement things differently.
For example, the string iterator in VS is not a raw pointer, while the SGI-derived STL used by g++ builds it on a pointer.
So here we implement the iterator simply as a pointer (typedef char* iterator), with begin() returning _str and end() returning _str + _size.

Let’s verify it below:
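A cut-down sketch of that check follows. The class keeps only what iterators need, and shift_each() is a hypothetical helper that walks the string with *it and ++it, exactly as with a pointer, changing each character in place.

```cpp
#include <cassert>
#include <cstring>

namespace w
{
    class string
    {
    public:
        // As in the article, the iterator is just a raw pointer
        typedef char* iterator;

        string(const char* str = "")
            : _str(new char[std::strlen(str) + 1])
            , _size(std::strlen(str))
            , _capacity(std::strlen(str))
        {
            std::strcpy(_str, str);
        }
        ~string() { delete[] _str; }

        iterator begin() { return _str; }          // first character
        iterator end() { return _str + _size; }    // one past the last character

        const char* c_str() const { return _str; }

    private:
        char*       _str;
        std::size_t _size;
        std::size_t _capacity;
    };
}

// Hypothetical helper: shift every character by one ('a' -> 'b', ...)
void shift_each(w::string& s)
{
    w::string::iterator it = s.begin();
    while (it != s.end())
    {
        *it += 1;   // write through the iterator
        ++it;
    }
}
```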

Similarly, we can also traverse it with a range-based for loop.

Under the hood, range-based for uses iterators.
You can think of it as somewhat similar to the macros we learned about before: the compiler mechanically replaces the loop with an iterator loop that calls begin() and end() and, on each iteration, assigns *it to ch. Because the replacement is purely mechanical, range-based for only works when begin() and end() are available under exactly those names.
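To make the replacement concrete, the sketch below counts characters twice: once with range-based for and once in roughly the form the compiler expands it to. The real expansion also binds the range to a reference and uses internal variable names; first and last here are illustrative, and both helper functions are hypothetical.

```cpp
#include <cassert>
#include <cstring>

namespace w
{
    class string
    {
    public:
        typedef char* iterator;

        string(const char* str = "")
            : _str(new char[std::strlen(str) + 1])
            , _size(std::strlen(str))
            , _capacity(std::strlen(str))
        {
            std::strcpy(_str, str);
        }
        ~string() { delete[] _str; }

        // Range-based for looks up these names; rename them and it stops compiling
        iterator begin() { return _str; }
        iterator end() { return _str + _size; }

    private:
        char*       _str;
        std::size_t _size;
        std::size_t _capacity;
    };
}

// Counting with range-based for
std::size_t count_range_for(w::string& s, char target)
{
    std::size_t n = 0;
    for (auto ch : s)
        if (ch == target)
            ++n;
    return n;
}

// Roughly what the compiler turns the loop above into
std::size_t count_expanded(w::string& s, char target)
{
    std::size_t n = 0;
    auto first = s.begin();
    auto last = s.end();
    for (; first != last; ++first)
    {
        auto ch = *first;   // each element is copied into ch
        if (ch == target)
            ++n;
    }
    return n;
}
```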

3.3 Implementing const iterators

Here we implement the const version for const objects to use:
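A cut-down sketch of the const version: const_iterator is const char*, and begin()/end() get const overloads. count_digits() is a hypothetical helper taking a const w::string&; it compiles only because the const overloads exist, and since it receives a const_iterator, a write such as *it = 'x' inside it would be rejected by the compiler.

```cpp
#include <cassert>
#include <cstring>

namespace w
{
    class string
    {
    public:
        typedef char* iterator;
        typedef const char* const_iterator;

        string(const char* str = "")
            : _str(new char[std::strlen(str) + 1])
            , _size(std::strlen(str))
            , _capacity(std::strlen(str))
        {
            std::strcpy(_str, str);
        }
        ~string() { delete[] _str; }

        iterator begin() { return _str; }
        iterator end() { return _str + _size; }

        // Const overloads: chosen for const objects, hand out read-only pointers
        const_iterator begin() const { return _str; }
        const_iterator end() const { return _str + _size; }

    private:
        char*       _str;
        std::size_t _size;
        std::size_t _capacity;
    };
}

// Hypothetical helper: read-only traversal of a const string
std::size_t count_digits(const w::string& s)
{
    std::size_t n = 0;
    for (w::string::const_iterator it = s.begin(); it != s.end(); ++it)
        if (*it >= '0' && *it <= '9')
            ++n;
    return n;
}
```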

4. Inserting, deleting, finding, and modifying data

First, let's implement push_back() and append(). Both insert data, so we must deal with growing the capacity.

How much should we grow the capacity each time?
For push_back, doubling each time works fine, but for append, doubling may not be enough.
Why?
If the string currently holds 10 characters with a capacity of 10 and we append a string of length 25, we need room for 35 characters, but doubling the capacity gives only 20, which is not enough.

So we rely on another string interface, reserve, which changes the capacity to the size we specify and does the expansion for us.
Let's implement reserve first.

4.1 reserve

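Let's first look at how reserve works. When n is greater than _capacity, it allocates a new buffer of n + 1 characters (the extra one for '\0'), copies the old contents over with strcpy, frees the old buffer, and sets _capacity to n.

Note the if check on n > _capacity. Without it, a value of n smaller than _capacity would shrink the buffer, and if n were even smaller than _size, strcpy would write past the end of the new buffer. The library's reserve does not shrink the capacity in this case, so we add this condition.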
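The sketch below shows reserve with that check in a cut-down class. The capacity() accessor is an assumption added only so the effect can be observed; a request smaller than the current capacity is ignored, and a larger one reallocates while keeping the contents.

```cpp
#include <cassert>
#include <cstring>

namespace w
{
    class string
    {
    public:
        string(const char* str = "")
            : _str(new char[std::strlen(str) + 1])
            , _size(std::strlen(str))
            , _capacity(std::strlen(str))
        {
            std::strcpy(_str, str);
        }
        ~string() { delete[] _str; }

        const char* c_str() const { return _str; }
        // Added for this sketch so the capacity can be checked
        std::size_t capacity() const { return _capacity; }

        void reserve(std::size_t n)
        {
            if (n > _capacity)                // only grow, never shrink
            {
                char* tmp = new char[n + 1];  // +1 for '\0'
                std::strcpy(tmp, _str);       // copy old contents, '\0' included
                delete[] _str;
                _str = tmp;
                _capacity = n;
            }
        }

    private:
        char*       _str;
        std::size_t _size;
        std::size_t _capacity;
    };
}
```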

4.2 push_back and append

With reserve in place, we can now implement push_back and append.

For push_back, we simply double the capacity (starting at 4 when the capacity is 0).

For append, the capacity is grown to at least _size + len.

Let’s implement it below:
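Here is a cut-down sketch of both functions, following the growth rules above. capacity() is again added only for the sketch. The test reproduces the earlier example: a string of 10 characters with capacity 10 plus a 25-character append needs 35, more than the 20 that doubling would give.

```cpp
#include <cassert>
#include <cstring>

namespace w
{
    class string
    {
    public:
        string(const char* str = "")
            : _str(new char[std::strlen(str) + 1])
            , _size(std::strlen(str))
            , _capacity(std::strlen(str))
        {
            std::strcpy(_str, str);
        }
        ~string() { delete[] _str; }

        const char* c_str() const { return _str; }
        std::size_t size() const { return _size; }
        std::size_t capacity() const { return _capacity; }   // added for this sketch

        void reserve(std::size_t n)
        {
            if (n > _capacity)
            {
                char* tmp = new char[n + 1];
                std::strcpy(tmp, _str);
                delete[] _str;
                _str = tmp;
                _capacity = n;
            }
        }

        void push_back(char ch)
        {
            if (_size == _capacity)
                reserve(_capacity == 0 ? 4 : _capacity * 2);   // double, starting at 4
            _str[_size] = ch;
            ++_size;
            _str[_size] = '\0';   // keep the buffer null-terminated
        }

        void append(const char* str)
        {
            std::size_t len = std::strlen(str);
            if (_size + len > _capacity)
                reserve(_size + len);       // doubling might be too small
            std::strcpy(_str + _size, str); // also copies the '\0'
            _size += len;
        }

    private:
        char*       _str;
        std::size_t _size;
        std::size_t _capacity;
    };
}
```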

4.3 +=

Although we have push_back and append, in practice we prefer the overloaded +=. Its implementation can simply call push_back and append underneath.

Let’s implement it below:
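A cut-down sketch of the two += overloads on top of push_back and append. Returning string& (a reference to *this), as the standard library does, is what allows an expression like (s += '!') += "!!".

```cpp
#include <cassert>
#include <cstring>

namespace w
{
    class string
    {
    public:
        string(const char* str = "")
            : _str(new char[std::strlen(str) + 1])
            , _size(std::strlen(str))
            , _capacity(std::strlen(str))
        {
            std::strcpy(_str, str);
        }
        ~string() { delete[] _str; }

        const char* c_str() const { return _str; }

        void reserve(std::size_t n)
        {
            if (n > _capacity)
            {
                char* tmp = new char[n + 1];
                std::strcpy(tmp, _str);
                delete[] _str;
                _str = tmp;
                _capacity = n;
            }
        }

        void push_back(char ch)
        {
            if (_size == _capacity)
                reserve(_capacity == 0 ? 4 : _capacity * 2);
            _str[_size] = ch;
            ++_size;
            _str[_size] = '\0';
        }

        void append(const char* str)
        {
            std::size_t len = std::strlen(str);
            if (_size + len > _capacity)
                reserve(_size + len);
            std::strcpy(_str + _size, str);
            _size += len;
        }

        // += forwards to push_back / append and returns *this by reference
        string& operator+=(char ch)
        {
            push_back(ch);
            return *this;
        }

        string& operator+=(const char* str)
        {
            append(str);
            return *this;
        }

    private:
        char*       _str;
        std::size_t _size;
        std::size_t _capacity;
    };
}
```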

4.4 insert

For insert, we implement two of the library's overloads: inserting n copies of a character at pos, and inserting a C string at pos.

First, let's implement inserting n characters at position pos.
The logic is fairly simple: first check whether the capacity needs to grow, then insert the data. When inserting in the middle, the existing characters from pos onward must first be moved back by n positions.

Is there any problem with this implementation? Let's test it:

There seems to be no problem. But is it really correct?

Let's look at a special case: inserting data at pos = 0.

The program hangs here. Why?
With pos = 0, the loop still runs when end is 0, because 0 >= 0. What does end become after --end? Is it -1?

No. end is a size_t, an unsigned integer, so decrementing it from 0 does not give -1; it wraps around to the maximum value of size_t. The condition end >= pos therefore stays true, the loop never ends normally, and it accesses memory out of bounds, so the program crashes.

So how do we fix it? Could we change end to int?

That does not work either. When end (now an int) is compared with pos, which is still a size_t, the usual arithmetic conversions inherited from C convert end to size_t before the comparison, so -1 again becomes the maximum value. So how should we solve it?
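The two integer facts behind this bug can be checked directly. The sketch below is plain C++ with no string class, and both function names are hypothetical. The comparison is written with an explicit cast to show what the compiler does implicitly for end >= pos (written without the cast, compilers usually also warn about a signed/unsigned comparison).

```cpp
#include <cassert>
#include <cstddef>

// Decrementing an unsigned 0 wraps around to the largest size_t value
std::size_t decrement_zero()
{
    std::size_t end = 0;
    --end;
    return end;
}

// int end = -1 compared with size_t pos = 0: the int is converted to
// size_t first, so -1 becomes the maximum value and the test is true
bool int_minus_one_ge_zero()
{
    int end = -1;
    std::size_t pos = 0;
    return static_cast<std::size_t>(end) >= pos;
}
```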

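There are several ways to fix this. Here we use npos, mentioned in the previous article: npos is the maximum value of size_t, which is exactly what end becomes when it is decremented past 0, so adding end != npos to the loop condition stops the loop at the right time.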
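A cut-down sketch of insert(pos, n, ch) with the npos guard follows. npos is declared here with an in-class initializer, which C++ allows for a static const integral member. The test includes the pos = 0 case that previously looped forever.

```cpp
#include <cassert>
#include <cstring>

namespace w
{
    class string
    {
    public:
        // Largest size_t value
        static const std::size_t npos = static_cast<std::size_t>(-1);

        string(const char* str = "")
            : _str(new char[std::strlen(str) + 1])
            , _size(std::strlen(str))
            , _capacity(std::strlen(str))
        {
            std::strcpy(_str, str);
        }
        ~string() { delete[] _str; }

        const char* c_str() const { return _str; }

        void reserve(std::size_t n)
        {
            if (n > _capacity)
            {
                char* tmp = new char[n + 1];
                std::strcpy(tmp, _str);
                delete[] _str;
                _str = tmp;
                _capacity = n;
            }
        }

        // Insert n copies of ch at position pos
        void insert(std::size_t pos, std::size_t n, char ch)
        {
            assert(pos <= _size);
            if (_size + n > _capacity)
                reserve(_size + n);

            // Move [pos, _size] ('\0' included) back by n, from the end.
            // For pos == 0, --end turns 0 into npos and end != npos ends the loop.
            std::size_t end = _size;
            while (end >= pos && end != npos)
            {
                _str[end + n] = _str[end];
                --end;
            }

            for (std::size_t i = 0; i < n; ++i)
                _str[pos + i] = ch;
            _size += n;
        }

    private:
        char*       _str;
        std::size_t _size;
        std::size_t _capacity;
    };
}
```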

Let’s test it again:

We have just inserted characters; now let's insert a string. The logic is the same as above, except that instead of moving the data back by n positions, we move it back by strlen(str) positions to make room for the string.

Let's test it below:
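A cut-down sketch of insert(pos, str): the shift distance is len = strlen(str), the same npos check protects the pos = 0 case, and the characters of str are then copied into the gap.

```cpp
#include <cassert>
#include <cstring>

namespace w
{
    class string
    {
    public:
        static const std::size_t npos = static_cast<std::size_t>(-1);

        string(const char* str = "")
            : _str(new char[std::strlen(str) + 1])
            , _size(std::strlen(str))
            , _capacity(std::strlen(str))
        {
            std::strcpy(_str, str);
        }
        ~string() { delete[] _str; }

        const char* c_str() const { return _str; }

        void reserve(std::size_t n)
        {
            if (n > _capacity)
            {
                char* tmp = new char[n + 1];
                std::strcpy(tmp, _str);
                delete[] _str;
                _str = tmp;
                _capacity = n;
            }
        }

        // Insert the C string str at position pos
        void insert(std::size_t pos, const char* str)
        {
            assert(pos <= _size);
            std::size_t len = std::strlen(str);
            if (_size + len > _capacity)
                reserve(_size + len);

            // Move [pos, _size] ('\0' included) back by len
            std::size_t end = _size;
            while (end >= pos && end != npos)
            {
                _str[end + len] = _str[end];
                --end;
            }

            // Copy str into the gap (without its '\0')
            for (std::size_t i = 0; i < len; ++i)
                _str[pos + i] = str[i];
            _size += len;
        }

    private:
        char*       _str;
        std::size_t _size;
        std::size_t _capacity;
    };
}
```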

4.5 erase

Next, let's implement erase, which deletes len characters starting at position pos.

erase has two cases. In the first case, pos + len is less than the length of the string: we delete the len characters starting at pos but keep the characters after them, so we move the later data forward to overwrite the characters being deleted.
In the other case, len is large enough that pos + len is greater than or equal to the length of the string, so everything from pos onward is deleted. The same applies when the len argument is not passed and takes its default value npos. These two situations can be handled together: we only need to put a '\0' at pos and set _size to pos.

Let’s test it out:

To be consistent with the standard library, erase returns a reference to the string itself (*this).
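A cut-down sketch of erase covering both cases. One deliberate difference from a literal pos + len >= _size test: this sketch writes the condition as len >= _size - pos, which means the same thing but cannot overflow when len is very large. The test mirrors test_string4 and also chains two calls through the returned reference.

```cpp
#include <cassert>
#include <cstring>

namespace w
{
    class string
    {
    public:
        static const std::size_t npos = static_cast<std::size_t>(-1);

        string(const char* str = "")
            : _str(new char[std::strlen(str) + 1])
            , _size(std::strlen(str))
            , _capacity(std::strlen(str))
        {
            std::strcpy(_str, str);
        }
        ~string() { delete[] _str; }

        const char* c_str() const { return _str; }

        // Delete len characters starting at pos; returns *this
        string& erase(std::size_t pos, std::size_t len = npos)
        {
            assert(pos <= _size);
            if (len == npos || len >= _size - pos)
            {
                _str[pos] = '\0';   // cut everything from pos onward
                _size = pos;
            }
            else
            {
                // Move the tail ('\0' included) forward over the deleted part
                std::size_t end = pos + len;
                while (end <= _size)
                    _str[pos++] = _str[end++];
                _size -= len;
            }
            return *this;
        }

    private:
        char*       _str;
        std::size_t _size;
        std::size_t _capacity;
    };
}
```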

4.6 find

Next, let's implement find. Finding a character is simple: traverse the string starting at pos and return the index of the first match, or npos if the character is not found.

find can also search for a string starting at position pos. Here we reuse the C library function strstr and convert the pointer it returns into an index by subtracting _str.
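A cut-down sketch of both find overloads. They are marked const here since searching does not modify the string, and, like the article's version, they assert that pos < _size.

```cpp
#include <cassert>
#include <cstring>

namespace w
{
    class string
    {
    public:
        static const std::size_t npos = static_cast<std::size_t>(-1);

        string(const char* str = "")
            : _str(new char[std::strlen(str) + 1])
            , _size(std::strlen(str))
            , _capacity(std::strlen(str))
        {
            std::strcpy(_str, str);
        }
        ~string() { delete[] _str; }

        // Linear search for a character, starting at pos
        std::size_t find(char ch, std::size_t pos = 0) const
        {
            assert(pos < _size);
            for (std::size_t i = pos; i < _size; ++i)
                if (_str[i] == ch)
                    return i;
            return npos;
        }

        // Substring search via strstr; match pointer minus _str is the index
        std::size_t find(const char* str, std::size_t pos = 0) const
        {
            assert(pos < _size);
            const char* ptr = std::strstr(_str + pos, str);
            if (ptr)
                return static_cast<std::size_t>(ptr - _str);
            return npos;
        }

    private:
        char*       _str;
        std::size_t _size;
        std::size_t _capacity;
    };
}
```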

Let's test it below:

4.7 substr

Next, let's implement substr. Its logic is also simple.

One thing to note: when len is npos, or when the requested length runs past the end of the string, we only take the characters from pos to the end of the string.
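A cut-down sketch of substr with that clamping. Because substr returns a string by value, the sketch also includes the deep-copy constructor from section 5; without it, a copy of the returned object would share its buffer and free it twice.

```cpp
#include <cassert>
#include <cstring>

namespace w
{
    class string
    {
    public:
        static const std::size_t npos = static_cast<std::size_t>(-1);

        string(const char* str = "")
            : _str(new char[std::strlen(str) + 1])
            , _size(std::strlen(str))
            , _capacity(std::strlen(str))
        {
            std::strcpy(_str, str);
        }

        // Deep copy, needed because substr returns a string by value
        string(const string& s)
            : _str(new char[s._capacity + 1])
            , _size(s._size)
            , _capacity(s._capacity)
        {
            std::strcpy(_str, s._str);
        }

        ~string() { delete[] _str; }

        const char* c_str() const { return _str; }

        void reserve(std::size_t n)
        {
            if (n > _capacity)
            {
                char* tmp = new char[n + 1];
                std::strcpy(tmp, _str);
                delete[] _str;
                _str = tmp;
                _capacity = n;
            }
        }

        void push_back(char ch)
        {
            if (_size == _capacity)
                reserve(_capacity == 0 ? 4 : _capacity * 2);
            _str[_size] = ch;
            ++_size;
            _str[_size] = '\0';
        }

        string substr(std::size_t pos = 0, std::size_t len = npos) const
        {
            assert(pos < _size);
            // Clamp: take at most the characters from pos to the end
            std::size_t n = len;
            if (len == npos || len > _size - pos)
                n = _size - pos;

            string tmp;
            tmp.reserve(n);
            for (std::size_t i = pos; i < pos + n; ++i)
                tmp.push_back(_str[i]);
            return tmp;
        }

    private:
        char*       _str;
        std::size_t _size;
        std::size_t _capacity;
    };
}
```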

5. Copy constructor

Let's first write code like this: w::string s1("hello world"); w::string s2(s1);

Here s2 is copy-constructed from s1.

5.1 Shallow copy: the default copy constructor

As we saw in the earlier article on classes and objects, if we do not write a copy constructor ourselves, the compiler generates one by default. Let's run the code above as it is:

The program goes wrong here: this is the classic shallow copy problem. As discussed in earlier articles, if no copy constructor is explicitly defined, the compiler generates a default one that copies the object's members value by value. This kind of copy is called a shallow copy, or value copy. Here it copies only the pointer, so s1._str and s2._str point to the same buffer; when the two objects are destroyed, delete[] runs twice on that buffer and the program crashes. Once a class manages resources such as heap memory, we must write the copy constructor ourselves; otherwise the shallow copy causes exactly this kind of problem.

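5.2 Deep copy: writing our own copy constructor

Here we implement the copy constructor ourselves so that it performs a deep copy: it allocates a new buffer of s._capacity + 1 characters, copies s._str into it with strcpy, and then copies _size and _capacity.

Let's test it: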
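A cut-down sketch of that test: after w::string s2(s1), the two objects hold different buffers, so writing through s2 leaves s1 unchanged, and each destructor frees only its own memory.

```cpp
#include <cassert>
#include <cstring>

namespace w
{
    class string
    {
    public:
        string(const char* str = "")
            : _str(new char[std::strlen(str) + 1])
            , _size(std::strlen(str))
            , _capacity(std::strlen(str))
        {
            std::strcpy(_str, str);
        }

        // Deep copy: a separate buffer instead of sharing s._str
        string(const string& s)
            : _str(new char[s._capacity + 1])
            , _size(s._size)
            , _capacity(s._capacity)
        {
            std::strcpy(_str, s._str);
        }

        ~string() { delete[] _str; }   // each object frees only its own buffer

        const char* c_str() const { return _str; }

        char& operator[](std::size_t pos)
        {
            assert(pos < _size);
            return _str[pos];
        }

    private:
        char*       _str;
        std::size_t _size;
        std::size_t _capacity;
    };
}
```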

6. Source code (Part 1)

6.1 string.h

#pragma once

#include <iostream>
#include <cstring>   // strlen, strcpy, strstr
#include <cassert>   // assert
using namespace std;

namespace w
{
    class string
    {
    public:
        typedef char* iterator;
        typedef const char* const_iterator;

        iterator begin()
        {
            return _str;
        }

        iterator end()
        {
            return _str + _size;
        }

        const_iterator begin() const
        {
            return _str;
        }

        const_iterator end() const
        {
            return _str + _size;
        }

        string(const char* str = "")
            :_str(new char[strlen(str) + 1])
            ,_size(strlen(str))
            ,_capacity(strlen(str))
        {
            strcpy(_str, str);
        }

        // Deep copy: allocate a separate buffer instead of sharing s._str
        string(const string& s)
        {
            _str = new char[s._capacity + 1];
            strcpy(_str, s._str);
            _size = s._size;
            _capacity = s._capacity;
        }

        ~string()
        {
            delete[] _str;
            _str = nullptr;
            _size = _capacity = 0;
        }

        const char* c_str() const
        {
            return _str;
        }

        size_t size() const
        {
            return _size;
        }

        char& operator[](size_t pos)
        {
            assert(pos < _size);
            return _str[pos];
        }

        const char& operator[](size_t pos) const
        {
            assert(pos < _size);
            return _str[pos];
        }

        void reserve(size_t n)
        {
            // Only grow; never shrink the buffer
            if (n > _capacity)
            {
                char* tmp = new char[n + 1];
                strcpy(tmp, _str);
                delete[] _str;
                _str = tmp;
                _capacity = n;
            }
        }

        void push_back(char ch)
        {
            if (_size == _capacity)
            {
                // Grow by a factor of 2
                reserve(_capacity == 0 ? 4 : _capacity * 2);
            }

            _str[_size] = ch;
            ++_size;
            _str[_size] = '\0';
        }

        void append(const char* str)
        {
            size_t len = strlen(str);
            if (_size + len > _capacity)
            {
                // Grow to at least _size + len
                reserve(_size + len);
            }

            strcpy(_str + _size, str);
            _size += len;
        }

        string& operator+=(char ch)
        {
            push_back(ch);
            return *this;
        }

        string& operator+=(const char* str)
        {
            append(str);
            return *this;
        }

        void insert(size_t pos, size_t n, char ch)
        {
            assert(pos <= _size);

            if (_size + n > _capacity)
            {
                // Grow to at least _size + n
                reserve(_size + n);
            }

            // Move [pos, _size] (including '\0') back by n, starting from the end.
            // When pos == 0, --end wraps 0 around to npos, which ends the loop.
            size_t end = _size;
            while (end >= pos && end != npos)
            {
                _str[end + n] = _str[end];
                --end;
            }

            for (size_t i = 0; i < n; i++)
            {
                _str[pos + i] = ch;
            }

            _size += n;
        }

        void insert(size_t pos, const char* str)
        {
            assert(pos <= _size);

            size_t len = strlen(str);
            if (_size + len > _capacity)
            {
                // Grow to at least _size + len
                reserve(_size + len);
            }

            // Move [pos, _size] (including '\0') back by len; same npos check as above
            size_t end = _size;
            while (end >= pos && end != npos)
            {
                _str[end + len] = _str[end];
                --end;
            }

            for (size_t i = 0; i < len; i++)
            {
                _str[pos + i] = str[i];
            }

            _size += len;
        }

        string& erase(size_t pos, size_t len = npos)
        {
            assert(pos <= _size);

            // len >= _size - pos is the same test as pos + len >= _size,
            // but cannot overflow when len is very large
            if (len == npos || len >= _size - pos)
            {
                // Delete everything from pos onward
                _str[pos] = '\0';
                _size = pos;
            }
            else
            {
                // Move the tail (including '\0') forward over the deleted part
                size_t end = pos + len;
                while (end <= _size)
                {
                    _str[pos++] = _str[end++];
                }
                _size -= len;
            }

            return *this;
        }

        size_t find(char ch, size_t pos = 0) const
        {
            assert(pos < _size);

            for (size_t i = pos; i < _size; i++)
            {
                if (_str[i] == ch)
                {
                    return i;
                }
            }

            return npos;
        }

        size_t find(const char* str, size_t pos = 0) const
        {
            assert(pos < _size);

            const char* ptr = strstr(_str + pos, str);
            if (ptr)
            {
                return ptr - _str;
            }
            else
            {
                return npos;
            }
        }

        string substr(size_t pos = 0, size_t len = npos) const
        {
            assert(pos < _size);

            // Take at most the characters from pos to the end of the string
            size_t n = len;
            if (len == npos || len > _size - pos)
            {
                n = _size - pos;
            }

            string tmp;
            tmp.reserve(n);
            for (size_t i = pos; i < pos + n; i++)
            {
                tmp += _str[i];
            }

            return tmp;
        }

    private:
        char*  _str;
        size_t _size;
        size_t _capacity;

    public:
        // In-class initializer (allowed for a static const integral member), so this
        // header can be included from several .cpp files without a duplicate definition
        static const size_t npos = -1;
    };
}

6.2 test.cpp

#include "string.h"

void test_string1()
{
    w::string s1("hello world");
    cout << s1.c_str() << endl;

    for (size_t i = 0; i < s1.size(); i++)
    {
        cout << s1[i] << " ";
    }
    cout << endl;

    w::string::iterator it = s1.begin();
    while (it != s1.end())
    {
        cout << *it << " ";
        ++it;
    }
    cout << endl;

    for (auto ch : s1)
    {
        cout << ch << " ";
    }
    cout << endl;
}

void test_string2()
{
    w::string s1("hello world");
    cout << s1.c_str() << endl;

    s1.push_back(' ');
    s1.push_back('#');
    s1.append("hello");
    cout << s1.c_str() << endl;

    w::string s2("hello world");
    cout << s2.c_str() << endl;

    s2 += ' ';
    s2 += '#';
    s2 += "hello code";
    cout << s2.c_str() << endl;
}

void test_string3()
{
    w::string s1("helloworld");
    cout << s1.c_str() << endl;

    s1.insert(5, 3, '#');
    cout << s1.c_str() << endl;

    s1.insert(0, 3, '#');
    cout << s1.c_str() << endl;

    w::string s2("helloworld");
    s2.insert(5, "%%%%%");
    cout << s2.c_str() << endl;
}

void test_string4()
{
    w::string s1("helloworld");
    cout << s1.c_str() << endl;

    s1.erase(5, 3);
    cout << s1.c_str() << endl;

    s1.erase(5, 30);
    cout << s1.c_str() << endl;

    s1.erase(2);
    cout << s1.c_str() << endl;
}

void test_string5()
{
    w::string s1("helloworld");
    cout << s1.find('w', 2) << endl;
}

void test_string6()
{
    w::string s1("hello world");
    w::string s2(s1);

    cout << s1.c_str() << endl;
    cout << s2.c_str() << endl;
}

int main()
{
    test_string6();
    return 0;
}

7. Summary

Due to space limitations, the remaining content will be covered in the next article.