Preface
In the previous article, we covered the commonly used interfaces of the string class in detail. In this article, we write our own simulated implementation of string to understand it more deeply.
1. Basic structure of string
In the previous article we learned that, under the hood, string is a character array that supports dynamic growth. With that structure settled, we can start on the simulated implementation.
First create a new header file string.h
and define a string class:
class string
{
public :
// member functions
private :
char* _str;
size_t _size;
size_t _capacity;
};
Here the string class has three member variables: the character pointer _str points to the dynamically allocated array, _size records the number of valid characters, and _capacity records the capacity (excluding '\0').
But because the standard library already has a string class, we define a namespace and put our own string class inside it to avoid conflicts.
namespace w
{
class string
{
public :
// member functions
private :
char* _str;
size_t _size;
size_t _capacity;
};
}
2. Constructor, destructor
2.1 Implementation of constructor
2.1.1 Constructor with parameters
First, let's implement a constructor with parameters.
The string class in the standard library has many constructors. Here we only implement the most commonly used one, which takes a C string.
In the previous article, we mentioned that members should be initialized in the initializer list whenever possible, so a first attempt might initialize _str directly from the parameter str.
But the program then reports an error. First, this is the permission amplification issue discussed in the previous article: the parameter is a const char* and cannot be used to modify the characters, whereas _str is a char* and can. Second, even if it compiled, _str would point at a constant string, which cannot be modified.
So what should we do? Instead of storing the parameter directly, we allocate space and copy the characters with strcpy:
string(const char* str)
:_str(new char[strlen(str)+1])
,_size(strlen(str))
,_capacity(strlen(str))
{
strcpy(_str, str);
}
By the way, here we provide an interface to return a string:
const char* c_str()
{
return _str;
}
Next we create a test.cpp file to test the interfaces we wrote, for example by constructing w::string s1("hello world") and printing s1.c_str().
2.2 Destructor
Here we give the destructor directly:
~string()
{
delete[] _str;
_str = nullptr;
_size = _capacity = 0;
}
2.3 Parameterless constructor
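Sometimes we need to create a string without passing any arguments, for example w::string s1; with only the constructor above, this does not compile.
So here we need to implement a parameterless constructor.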
Suppose the parameterless constructor simply initializes _str to a null pointer and _size and _capacity to 0.
Is this really feasible?
If _str is a null pointer, the c_str function we just implemented returns a null pointer, and printing it crashes the program. The standard library's c_str, by contrast, still returns a valid string even when the string is empty.
So what should we do here? We can write this:
string()
:_str(new char[1])
,_size(0)
,_capacity(0)
{
_str[0] = '\0';
}
Here we allocate one byte for _str and store '\0' in it. This way, the problem above does not occur.
2.4 Merging no-parameter and parameterized constructors
As mentioned before, the parameterless and parameterized constructors can be merged into a single constructor with a default argument.
Let's look at several candidate default values.
Can the default be the character '\0'? Definitely not. The types do not match: one is a character and the other is a string (const char*).
Can the default be a null pointer? Definitely not either: strlen(str) would then dereference a null pointer.
In fact, the default should be the empty string, string(const char* str = "").
An empty string literal already contains the terminating '\0', so strlen returns 0 and strcpy copies just that one character.
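A minimal sketch of the merged constructor, assuming a stripped-down class. The name mini_string and the capacity() accessor are hypothetical, and copying is disabled only to keep the sketch short:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

namespace sketch
{
    // Hypothetical stand-in for w::string, reduced to construction only.
    class mini_string
    {
    public:
        // "" holds just '\0', so strlen(str) is 0 and one byte is allocated.
        mini_string(const char* str = "")
            : _str(new char[std::strlen(str) + 1])
            , _size(std::strlen(str))
            , _capacity(_size)
        {
            std::strcpy(_str, str);
        }
        ~mini_string() { delete[] _str; }

        // Copying is outside the scope of this sketch.
        mini_string(const mini_string&) = delete;
        mini_string& operator=(const mini_string&) = delete;

        const char* c_str() const { return _str; }
        std::size_t size() const { return _size; }
        std::size_t capacity() const { return _capacity; }
    private:
        char* _str;
        std::size_t _size;
        std::size_t _capacity;
    };
}
```

With this sketch, sketch::mini_string s; and sketch::mini_string s("hello"); both go through the same constructor, and c_str() never returns a null pointer.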
3. String traversal
3.1 operator[ ]
We know that in the standard library, a character in a string can be accessed by subscript. Let's implement the [] overload.
First we need the size() interface, which simply returns _size.
Next we implement the [] overloads, each of which asserts pos < _size and returns a reference to _str[pos].
We implement two versions: the ordinary version serves non-const objects and the const version serves const objects. The two functions form an overload set.
Let's verify them by looping from 0 to size() - 1 and printing each character.
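The two overloads can be sketched as follows. This assumes a hypothetical fixed-capacity class named indexed_string (a 32-byte internal array instead of a heap buffer) so the example stays short; only size() and operator[] mirror the article's class:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

namespace sketch
{
    // Hypothetical fixed-capacity stand-in for w::string.
    class indexed_string
    {
    public:
        indexed_string(const char* str)
            : _size(std::strlen(str))
        {
            assert(_size < sizeof(_str));   // must fit, including '\0'
            std::strcpy(_str, str);
        }
        std::size_t size() const { return _size; }

        // Non-const objects get a writable reference.
        char& operator[](std::size_t pos)
        {
            assert(pos < _size);
            return _str[pos];
        }
        // const objects get a read-only reference.
        const char& operator[](std::size_t pos) const
        {
            assert(pos < _size);
            return _str[pos];
        }
    private:
        char _str[32];
        std::size_t _size;
    };
}
```

Writing s[0] = 'j' compiles only for a non-const object; through a const reference the second overload is chosen and the character can only be read.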
3.2 Iterator simulation implementation (simple implementation)
Besides [], we can also traverse and access a string object with iterators.
As we said, an iterator can be thought of as something like a pointer, but it is not necessarily a pointer.
We mentioned earlier that STL has several versions, and their implementations may differ.
In fact, the string iterator under VS is not a raw pointer, while the SGI-derived version used by G++ is essentially a pointer underneath.
So here we simulate the iterator with a pointer: iterator is a typedef for char*, begin() returns _str, and end() returns _str + _size.
Let's verify it by walking from begin() to end() and printing *it.
Similarly, we can traverse with range for, which is built on iterators underneath.
You can think of range for as somewhat similar to the macros we learned before: the compiler mechanically rewrites the loop into iterator code and, on each step, initializes ch from *it. It only requires that begin() and end() exist under exactly those names.
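To make the rewriting concrete, here is a small sketch that writes the same traversal twice, once with range for and once with explicit iterators. The type char_range and both function names are hypothetical, and std::string is used only to collect the visited characters:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

namespace sketch
{
    // Hypothetical view type: range for only needs begin() and end().
    struct char_range
    {
        typedef const char* const_iterator;
        const char* data;
        std::size_t size;
        const_iterator begin() const { return data; }
        const_iterator end() const { return data + size; }
    };

    // Range for, as written by the programmer.
    inline std::string collect_with_range_for(const char_range& r)
    {
        std::string out;
        for (char ch : r)
        {
            out += ch;
        }
        return out;
    }

    // Roughly what the compiler produces for the loop above.
    inline std::string collect_with_iterators(const char_range& r)
    {
        std::string out;
        char_range::const_iterator it = r.begin();
        char_range::const_iterator last = r.end();
        for (; it != last; ++it)
        {
            char ch = *it;   // the loop variable is initialized from *it
            out += ch;
        }
        return out;
    }
}
```

Renaming begin() to something else, such as Begin(), would make the range for version stop compiling while the explicit loop could simply be adjusted, which shows that the rewrite depends on the exact names.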
3.3 const iterator simulation implementation
Here we implement the const versions of begin() and end(), which return const_iterator (a typedef for const char*), for const objects to use.
4. Inserting, deleting, finding, and modifying data
First, let's implement push_back() and append(). Both insert data, so we must consider expansion.
When we expand, how much should the capacity grow at a time?
For push_back, doubling the capacity each time is fine, but for append, doubling may not be enough.
Why?
If the current capacity is 10 and we append a string of length 25, doubling the capacity to 20 is still not enough.
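The arithmetic can be checked with a tiny helper. Both function names are hypothetical, and the starting value of 4 for an empty string is taken from the push_back code in the source listing at the end of this article:

```cpp
#include <cassert>
#include <cstddef>

namespace sketch
{
    // Capacity after one doubling step (4 when the capacity is still 0).
    inline std::size_t doubled(std::size_t capacity)
    {
        return capacity == 0 ? 4 : capacity * 2;
    }

    // True if doubling once leaves room for len more characters.
    inline bool doubling_is_enough(std::size_t size, std::size_t capacity,
                                   std::size_t len)
    {
        return size + len <= doubled(capacity);
    }
}
```

For a full string with size 10 and capacity 10, appending 25 characters needs room for 35, while doubling only provides 20.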
So here we rely on another string interface, reserve, which changes the capacity to a size we specify and performs the expansion for us.
Let's implement reserve first.
4.1 reserve
Let's first take a look at how to implement reserve: when n is greater than _capacity, we allocate a new array of n + 1 characters, copy the old data into it, release the old array, and update _capacity.
When the parameter n is less than _capacity, omitting the if check would shrink the capacity. But the library's reserve is not required to shrink the capacity (and since C++20 it never does), so this check is needed.
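A sketch of the same reserve logic on a plain struct, so the effect of the n > capacity check can be observed in isolation. The struct buffer and the helpers make_buffer and destroy are hypothetical names introduced only for this example:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

namespace sketch
{
    // Hypothetical buffer with the same three fields as w::string.
    struct buffer
    {
        char* str;
        std::size_t size;
        std::size_t capacity;   // does not count the trailing '\0'
    };

    inline buffer make_buffer(const char* s)
    {
        std::size_t len = std::strlen(s);
        buffer b = { new char[len + 1], len, len };
        std::strcpy(b.str, s);
        return b;
    }

    inline void reserve(buffer& b, std::size_t n)
    {
        if (n > b.capacity)               // smaller requests are ignored
        {
            char* tmp = new char[n + 1];  // +1 for '\0'
            std::strcpy(tmp, b.str);      // copy the old data
            delete[] b.str;               // release the old array
            b.str = tmp;
            b.capacity = n;
        }
    }

    inline void destroy(buffer& b)
    {
        delete[] b.str;
        b.str = nullptr;
        b.size = b.capacity = 0;
    }
}
```

Calling reserve(b, 20) and then reserve(b, 3) leaves the capacity at 20 and the contents unchanged.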
4.2 push_back and append
With reserve in place, we continue with push_back and append.
For push_back, we simply double the capacity when it is full.
For append, the capacity is expanded to at least _size + len.
Both then write the new characters starting at _str + _size, update _size, and keep the trailing '\0' in place.
4.3 +=
Although we have push_back and append, we usually prefer the overloaded +=. Underneath, += can simply reuse them: the char overload calls push_back, the const char* overload calls append, and both return *this.
4.4 insert
For insert, we mainly implement two of the library's versions: inserting n copies of a character at pos, and inserting a C string at pos.
First, let's implement inserting n characters at position pos.
The logic is fairly simple: first determine whether expansion is needed, then insert the data. If the insertion point is in the middle, the existing data must be moved first.
A first version moves the data with a loop that starts at size_t end = _size, copies _str[end] to _str[end + n], decrements end, and continues while end >= pos.
Is there any problem with writing it this way? Testing an insertion in the middle, such as insert(5, 3, '#'), seems to work. Is it really okay?
Let's look at a special case, inserting data when pos = 0.
The program crashes here. Why?
When pos = 0 and end reaches 0, the loop body still runs. What does end become after that? Is it -1?
The type of end is size_t, an unsigned integer, so decrementing it from 0 does not give -1 but the maximum value of the type. The condition end >= pos stays true, the accesses go out of bounds, and the program crashes.
So how do we solve it? Can we change end to int?
That does not work either. When end (an int) is compared with pos (a size_t), the usual arithmetic conversions of C convert end to the unsigned type, so -1 again becomes a huge value. So how should we solve it?
There are many solutions. We use one of them, based on the npos mentioned in the previous article: the loop additionally checks end != npos, so it stops as soon as end wraps around.
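The wraparound and the npos guard can be sketched on a raw character buffer. The free functions insert_fill, decrement, and int_compare_as_converted are hypothetical, and the caller is assumed to provide a buffer with room for size + n + 1 characters:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

namespace sketch
{
    const std::size_t npos = static_cast<std::size_t>(-1);

    // Decrementing an unsigned value of 0 wraps to the largest value.
    inline std::size_t decrement(std::size_t value)
    {
        return value - 1;
    }

    // What comparing an int with a size_t really evaluates:
    // the int is converted to the unsigned type first.
    inline bool int_compare_as_converted(int end, std::size_t pos)
    {
        return static_cast<std::size_t>(end) >= pos;
    }

    // Inserts n copies of ch at pos and returns the new size.
    // buf must have room for size + n + 1 characters.
    inline std::size_t insert_fill(char* buf, std::size_t size,
                                   std::size_t pos, std::size_t n, char ch)
    {
        assert(pos <= size);
        std::size_t end = size;   // start at the '\0'
        // When pos == 0, --end wraps from 0 to npos; the second test stops the loop.
        while (end >= pos && end != npos)
        {
            buf[end + n] = buf[end];
            --end;
        }
        for (std::size_t i = 0; i < n; ++i)
        {
            buf[pos + i] = ch;
        }
        return size + n;
    }
}
```

With a 32-byte buffer holding "helloworld", insert_fill(buf, 10, 0, 3, '#') terminates normally and leaves "###helloworld" in the buffer.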
Let's test it again, this time with insert(0, 3, '#').
So far we have inserted repeated characters; now we insert a string. The logic is the same as above, except that instead of making room for n characters, we move the data to make room for strlen(str) characters.
We can test it with insert(5, "%%%%%").
4.5 erase
Next let's implement erase, which deletes len characters starting at position pos.
In the first case, pos + len is less than the length of the string: we delete the len characters starting at pos but keep the characters after them. This is done by moving the trailing data forward to overwrite the characters being deleted.
In the other case, len is large enough that pos + len is greater than or equal to the length of the string, so everything from pos onward is deleted. The same applies when len is omitted and takes its default value npos, so these two situations can be handled together: we only need to write '\0' at pos and set _size to pos.
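Both cases can be sketched on a raw buffer. The function erase_chars and the constant to_end (which plays the role of npos) are hypothetical; as an assumption of this sketch, the range check is written as len >= size - pos so that pos + len is never computed and cannot wrap around:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

namespace sketch
{
    const std::size_t to_end = static_cast<std::size_t>(-1);  // plays the role of npos

    // Erases up to len characters of buf starting at pos.
    // size is the current length; the new length is returned.
    inline std::size_t erase_chars(char* buf, std::size_t size,
                                   std::size_t pos, std::size_t len = to_end)
    {
        assert(pos <= size);
        if (len >= size - pos)    // also true when len == to_end
        {
            buf[pos] = '\0';      // cut everything from pos onward
            return pos;
        }
        // Slide the tail (including '\0') left over the erased span.
        std::size_t src = pos + len;
        while (src <= size)
        {
            buf[pos++] = buf[src++];
        }
        return size - len;
    }
}
```

Starting from "helloworld", erasing (5, 3) leaves "hellold", erasing (5, 30) then leaves "hello", and erasing from position 2 with the default length leaves "he".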
We can test it with erase(5, 3), erase(5, 30), and erase(2) on "helloworld".
Of course, to be consistent with the standard library, erase returns a reference to the string itself (string&).
4.6 find
Next let's implement find. The implementation is simple: it traverses the string starting at pos; if the character is found it returns its subscript, otherwise it returns npos.
find also supports searching for a string starting from position pos. Here we reuse the C library's search function strstr and convert the returned pointer back into a subscript by subtracting _str.
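Both searches can be sketched as free functions over a C string. The names find_char, find_str, and not_found are hypothetical; unlike the article's member functions, which assert on pos, this sketch simply reports not_found when pos is past the end:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

namespace sketch
{
    const std::size_t not_found = static_cast<std::size_t>(-1);  // plays the role of npos

    // Index of the first ch at or after pos, or not_found.
    inline std::size_t find_char(const char* s, char ch, std::size_t pos = 0)
    {
        std::size_t size = std::strlen(s);
        for (std::size_t i = pos; i < size; ++i)
        {
            if (s[i] == ch)
            {
                return i;
            }
        }
        return not_found;
    }

    // Index of the first occurrence of sub at or after pos, or not_found.
    inline std::size_t find_str(const char* s, const char* sub, std::size_t pos = 0)
    {
        if (pos > std::strlen(s))
        {
            return not_found;
        }
        const char* ptr = std::strstr(s + pos, sub);
        // Pointer subtraction converts the match address back into an index.
        return ptr ? static_cast<std::size_t>(ptr - s) : not_found;
    }
}
```

The key step is ptr - s: strstr returns a pointer into the searched buffer, and subtracting the start of the buffer yields the subscript that find is expected to return.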
We can test it with s1.find('w', 2) on "helloworld", which prints 5.
4.7 substr
Next let's implement substr. Its logic is also simple.
One thing to note: when len is npos, or the requested length runs past the end of the string, the length we actually take is from pos to the end of the string, that is, _size - pos.
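The length adjustment can be isolated in a small helper. The name clamp_len is hypothetical, and the default argument plays the role of npos; comparing against the remaining length instead of computing pos + len is a choice made in this sketch to avoid wraparound:

```cpp
#include <cassert>
#include <cstddef>

namespace sketch
{
    // Number of characters substr(pos, len) should copy from a string
    // of length size.
    inline std::size_t clamp_len(std::size_t size, std::size_t pos,
                                 std::size_t len = static_cast<std::size_t>(-1))
    {
        assert(pos <= size);
        std::size_t remaining = size - pos;
        return len > remaining ? remaining : len;
    }
}
```

For a string of length 10 and pos 5, a requested length of 3 stays 3, while a requested length of 30 or the default is reduced to 5.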
5. Copy construction
Let's first write a piece of code that constructs s1 from "hello world" and then constructs s2 from s1 with w::string s2(s1).
This is a copy construction: s2 is copy-constructed from s1.
5.1 Shallow copy: the default copy constructor
In the earlier article on classes and objects, we learned that the compiler generates a copy constructor if we do not write one ourselves. Here we run the code above directly with that default.
The program crashes with the classic shallow copy problem. As discussed in previous articles, if a copy constructor is not explicitly defined, the compiler generates a default one that copies the object member by member. For our class that copies only the pointer _str, so s1 and s2 share one array and both destructors delete it. This kind of copy is called a shallow copy, or value copy. Once a class manages a resource, the copy constructor must be written explicitly; otherwise the copy is shallow and problems occur.
5.2 Deep copy: writing the copy constructor
Here we implement the copy constructor ourselves to perform a deep copy: it allocates a new array of s._capacity + 1 characters, copies the characters from s._str, and copies _size and _capacity.
Running the same test again, s1 and s2 now print the same text and the program no longer crashes.
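A minimal sketch of a deep-copying class, assuming a reduced layout without _capacity. The name deep_string is hypothetical, and assignment is disabled because it is outside the scope of this example:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

namespace sketch
{
    // Hypothetical stand-in for w::string with a deep-copying copy constructor.
    class deep_string
    {
    public:
        deep_string(const char* str = "")
            : _str(new char[std::strlen(str) + 1])
            , _size(std::strlen(str))
        {
            std::strcpy(_str, str);
        }
        // Deep copy: allocate a separate array, then copy the characters.
        deep_string(const deep_string& s)
            : _str(new char[s._size + 1])
            , _size(s._size)
        {
            std::strcpy(_str, s._str);
        }
        // Assignment is not covered by this sketch.
        deep_string& operator=(const deep_string&) = delete;
        ~deep_string() { delete[] _str; }

        const char* c_str() const { return _str; }
        char& operator[](std::size_t pos)
        {
            assert(pos < _size);
            return _str[pos];
        }
    private:
        char* _str;
        std::size_t _size;
    };
}
```

After sketch::deep_string s2(s1), the two objects hold equal text in different arrays, so modifying s2 leaves s1 untouched and each destructor releases its own memory.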
6. Source code (part 1)
6.1 string.h
#include <iostream>
#include <cassert>   // assert
#include <cstring>   // strlen, strcpy, strstr
using namespace std;
namespace w
{
class string
{
public :
typedef char* iterator;
typedef const char* const_iterator;
iterator begin()
{
return _str;
}
iterator end()
{
return _str + _size;
}
const_iterator begin() const
{
return _str;
}
const_iterator end() const
{
return _str + _size;
}
string(const char* str = "")
:_str(new char[strlen(str)+1])
,_size(strlen(str))
,_capacity(strlen(str))
{
strcpy(_str, str);
}
string(const string& s)
{
_str = new char[s._capacity + 1];
strcpy(_str, s._str);
_size = s._size;
_capacity = s._capacity;
}
~string()
{
delete[] _str;
_str = nullptr;
_size = _capacity = 0;
}
const char* c_str() const
{
return _str;
}
size_t size() const
{
return _size;
}
char& operator[](size_t pos)
{
assert(pos < _size);
return _str[pos];
}
const char& operator[](size_t pos) const
{
assert(pos < _size);
return _str[pos];
}
void reserve(size_t n)
{
if (n > _capacity)
{
char* tmp = new char[n + 1];
strcpy(tmp, _str);
delete[] _str;
_str = tmp;
_capacity = n;
}
}
void push_back(char ch)
{
if (_size == _capacity)
{
// double the capacity (start at 4 when it is 0)
reserve(_capacity == 0 ? 4 : _capacity * 2);
}
_str[_size] = ch;
++_size;
_str[_size] = '\0';
}
void append(const char* str)
{
size_t len = strlen(str);
if (_size + len > _capacity)
{
// expand to at least _size + len
reserve(_size+len);
}
strcpy(_str + _size, str);
_size += len;
}
string& operator+=(char ch)
{
push_back(ch);
return *this;
}
string& operator+=(const char* str)
{
append(str);
return *this;
}
void insert(size_t pos, size_t n, char ch)
{
assert(pos <= _size);
if (_size +n > _capacity)
{
// expand to at least _size + n
reserve(_size + n);
}
// shift the tail (including '\0') right by n; end != npos stops the loop once end wraps past 0
size_t end = _size;
while (end >= pos && end != npos)
{
_str[end + n] = _str[end];
--end;
}
for (size_t i = 0; i < n; i++)
{
_str[pos + i] = ch;
}
_size += n;
}
void insert(size_t pos, const char* str)
{
assert(pos <= _size);
size_t len = strlen(str);
if (_size + len > _capacity)
{
// expand to at least _size + len
reserve(_size + len);
}
// shift the tail (including '\0') right by len; end != npos stops the loop once end wraps past 0
size_t end = _size;
while (end >= pos && end != npos)
{
_str[end + len] = _str[end];
--end;
}
for (size_t i = 0; i < len; i++)
{
_str[pos + i] = str[i];
}
_size += len;
}
string& erase(size_t pos, size_t len = npos)
{
assert(pos <= _size);
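// len >= _size - pos avoids computing pos + len, which can wrap around for large len
if (len == npos || len >= _size - pos)
{
// delete everything from pos onward
_size = pos;
_str[_size] = '\0';
}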
else
{
size_t end = pos + len;
while (end <= _size)
{
_str[pos++] = _str[end++];
}
_size -= len;
}
return *this;
}
size_t find(char ch, size_t pos = 0) const
{
assert(pos < _size);
for (size_t i = pos; i < _size; i++)
{
if (_str[i] == ch)
{
return i;
}
}
return npos;
}
size_t find(const char* str, size_t pos = 0) const
{
assert(pos < _size);
const char* ptr = strstr(_str + pos, str);
if (ptr)
{
return ptr - _str;
}
else
{
return npos;
}
}
string substr(size_t pos = 0, size_t len = npos) const
{
assert(pos < _size);
size_t n = len;
if (len == npos || len > _size - pos)
{
n = _size - pos;
}
string tmp;
tmp.reserve(n);
for (size_t i = pos; i < pos + n; i++)
{
tmp += _str[i];
}
return tmp;
}
private :
char* _str;
size_t _size;
size_t _capacity;
public:
const static size_t npos;
};
const size_t string::npos = -1;
}
6.2 test.cpp
#include "string.h"
void test_string1()
{
w ::string s1("hello world");
cout << s1.c_str() << endl;
for (size_t i = 0; i < s1.size(); i++)
{
cout << s1[i] << " ";
}
cout << endl;
w ::string::iterator it = s1.begin();
while (it != s1.end())
{
cout << *it << " ";
++it;
}
cout <<endl;
for (auto ch : s1)
{
cout << ch <<" ";
}
cout <<endl;
}
void test_string2()
{
w::string s1("hello world");
cout << s1.c_str() << endl;
s1.push_back(' ');
s1.push_back('#');
s1.append("hello");
cout << s1.c_str() << endl;
w::string s2("hello world");
cout << s2.c_str() << endl;
s2 += ' ';
s2 += '#';
s2 += "hello code";
cout << s2.c_str() << endl;
}
void test_string3()
{
w::string s1("helloworld");
cout << s1.c_str() << endl;
s1.insert(5, 3, '#');
cout << s1.c_str() << endl;
s1.insert(0, 3, '#');
cout << s1.c_str() << endl;
w::string s2("helloworld");
s2.insert(5, "%%%%%");
cout << s2.c_str() << endl;
}
void test_string4()
{
w::string s1("helloworld");
cout << s1.c_str() << endl;
s1.erase(5, 3);
cout << s1.c_str() << endl;
s1.erase(5, 30);
cout << s1.c_str() << endl;
s1.erase(2);
cout << s1.c_str() << endl;
}
void test_string5()
{
w::string s1("helloworld");
cout << s1.find('w',2) << endl;
}
void test_string6()
{
w::string s1("hello world");
w::string s2(s1);
cout << s1.c_str() << endl;
cout << s2.c_str() << endl;
}
int main()
{
test_string6();
return 0;
}
7. Summary
The length of the article is limited, and the remaining content will be explained in the next article.