Analysis of the use of string

How computers represent characters

Computers represent characters with numeric encodings. The classic encoding is ASCII (American Standard Code for Information Interchange). ASCII is a 7-bit code, usually stored one character per 8-bit byte, and defines 128 characters in total: letters, digits, punctuation marks and some control characters. The ASCII code table is shown in the figure below:
[Figure: ASCII code table (ascii_Table2.png)]
Image source: http://www.asciima.com/ascii/12.html
With the development of computer technology, ASCII has largely been superseded by Unicode. Unicode is a standard that aims to cover the characters of almost every language in the world, and its first 128 code points are identical to ASCII. Unicode code points range from U+0000 to U+10FFFF, about 1.1 million values; they are stored using an encoding form such as UTF-8 (1 to 4 bytes per character), UTF-16, or UTF-32 (a fixed 32 bits per character).
Regardless of the encoding used, computers convert characters to numbers for processing.

The character type char in C language

To repeat the basic principle: inside the computer, characters are converted into numbers for processing.
In C, the char type is used to represent character data. A char occupies one byte, which is usually 8 bits, so it can hold 256 distinct values: the 128 standard ASCII codes plus the extended ASCII codes.

  1. char is a number

In C, a character is stored in memory as its numeric code (its ASCII code on virtually all platforms), so a variable of type char can be treated as a small integer, and its code value can be used directly in arithmetic.
The C standard guarantees that unsigned char is an unsigned integer and signed char is a signed integer, each of at least 8 bits (exactly 8 on common platforms). Plain char has the same size, but whether it is signed or unsigned is implementation-defined, that is, the compiler decides.

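// Note: cout makes this a C++ program; the char arithmetic is the same in C.
#include <iostream>
using namespace std;

int main()
{
    char temp = 'A';            // stored as the integer 65
    cout << (int)temp << endl;  // prints 65
    temp += 32;                 // 65 + 32 = 97, the code for 'a'
    cout << temp << endl;       // prints a
    return 0;
}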
  2. C strings (char arrays)

In C, a string is an array of characters stored contiguously and terminated by the null character \0. Why must a C string end with \0? The characters are packed one after another in the array and no length is stored, so the program needs a special marker to tell where the string ends. That marker is the null character \0.
A handy consequence of this "terminated by 0" rule: writing 0 over a character that was originally non-zero ends the string early.

#include <cstring>
#include <iostream>
using namespace std;

int main()
{
    char temp[12] = "hello c"; // the space character has ASCII value 32
    cout << "sizeof: " << sizeof(temp) << endl;
    cout << "strlen: " << strlen(temp) << endl;

    temp[4] = 0;               // overwrite 'o' with the null character
    cout << "sizeof: " << sizeof(temp) << endl;
    cout << "strlen: " << strlen(temp) << endl;
    return 0;
}
//----------outputs-----------------//
// sizeof: 12
// strlen: 7
// sizeof: 12
// strlen: 4
//----------------------------------//
  3. Escape characters in C
Common escape sequences:
'\n': newline
'\\': backslash
'\0': null character, ASCII value 0
'0' : the digit character 0, ASCII value 48

C++ string class

In C++, std::string is the string class provided by the standard library. It manages its own memory and can hold strings of any length, up to max_size().

Constructor

C++98 provides 7 ways to construct a string object:

  1. Default constructor: builds an empty string of length 0;
string();
  2. Copy constructor;
string (const string& str);
  3. Substring constructor: copies part of another string object, starting at pos and spanning len characters;
string (const string& str, size_t pos, size_t len = npos);
  4. From a C string;
string (const char* s);
  5. From the first n characters of a C string (a buffer);
string (const char* s, size_t n);
  6. Fill constructor: n copies of a single character;
string (size_t n, char c);
  7. From an iterator range;
template <class InputIterator>
  string  (InputIterator first, InputIterator last);

An example of string object initialization is as follows:

// string constructor
#include <iostream>
#include <string>

int main ()
{
  std::string s0 ("Initial string");

  // constructors used in the same order as described above:
  std::string s1;                                      // 1
  std::string s2 (s0);                                 // 2
  std::string s3 (s0, 8, 3);                           // 3
  std::string s4 ("A character sequence");             // 4
  std::string s5 ("Another character sequence", 12);   // 5
  std::string s6a (10, 'x');                           // 6
  std::string s6b (10, 42);      // 42 is the ASCII code for '*'
  std::string s7 (s0.begin(), s0.begin()+7);           // 7

  std::cout << "s1: " << s1 << "\ns2: " << s2 << "\ns3: " << s3;
  std::cout << "\ns4: " << s4 << "\ns5: " << s5 << "\ns6a: " << s6a;
  std::cout << "\ns6b: " << s6b << "\ns7: " << s7 << '\n';
  return 0;
}

Differences between C++ strings and C strings

The main difference between a C++ string and a C string is that a C++ string is a class object, while a C string is a plain array of characters. They also differ in how the end of the string is determined:

  • A C string is just a char* ptr; its end is marked by the terminating \0;
  • A C++ string is an object of the string class which, conceptually, has two members: char* ptr and size_t len. The second member records where the string ends, so the content does not rely on \0 as a terminator and may even contain \0 characters;
char data[20] = "hello\0world";
cout << data << endl;       // hello
string str1(data);
cout << str1 << endl;       // hello
string str2(data, 11);
cout << str2 << endl;       // helloworld (11 characters; the embedded \0 is written too but is usually invisible)

String Common Operations

capacity operation

  1. Return the length of the string
size_t size() const;
size_t length() const;

These two functions are equivalent; both return the number of characters in the string.

  2. Resize the string to a length of n characters; when growing, the new positions are filled with c (or with null characters if c is not given)
void resize (size_t n);
void resize (size_t n, char c);
  3. Return the size of the storage currently allocated for the string, measured in characters
size_t capacity() const;
  4. Request a change in the storage allocated for the string (its capacity)
void reserve (size_t n = 0);
  5. Clear the content
void clear();
  6. Test whether the string is empty
bool empty() const;

access element

A string object offers two ways to access elements: operator[] and the at function. The difference is that at checks the index: if the subscript i is out of bounds, it throws a std::out_of_range exception, which terminates the program if it is not caught. operator[] performs no check and throws nothing; it simply adds i to the address of the first character and dereferences the resulting pointer. If i is out of bounds, the behavior is undefined: the program may crash or misbehave.

CRUD

  1. insert function

The insert function inserts a string, or n copies of a character, before the character at position pos in the original string.

string& insert (size_t pos, const char* s);
string& insert (size_t pos, size_t n, char c);
  2. push_back function

Appends a single character to the end of the string, which is equivalent to inserting it at position size().

void push_back (char c);
  3. append function

Appends the content of a string to the end of the string.

string& append (const char* s);
  4. operator+= overloads

Append a string or a single character to the end of the string.

string& operator+= (const char* s);
string& operator+= (char c);
  5. erase function

The erase function deletes len characters starting at position pos.

string& erase (size_t pos = 0, size_t len = npos);
  6. find function

The find function searches, starting from position pos, for the string s or the single character c. It returns the position of the first match, or npos if there is none.

size_t find (const char* s, size_t pos = 0) const;
size_t find (char c, size_t pos = 0) const;

cut off a substring

The substr function extracts a substring of length len starting at position pos. The original string is not modified.
Boundary cases:

  • If fewer than len characters remain after pos, the returned substring is simply shorter than len; no error is raised.
  • If pos is greater than the length of the original string, a std::out_of_range exception is thrown.
string substr (size_t pos = 0, size_t len = npos) const;

Simulation implementation

#include <cassert>
#include <cstring>

// Note: the copy constructor and copy assignment are not defined here;
// see the section on shallow copy and deep copy.
class string
{
    private:
        char* _str;         // heap buffer of _capacity + 1 chars, always '\0'-terminated
        size_t _size;       // number of characters, not counting the '\0'
        size_t _capacity;   // number of characters the buffer can hold
    public:
        static const size_t npos;   // "not found" / "until the end" marker

        typedef char* iterator;
        typedef const char* const_iterator;

        iterator begin()
        {
            return _str;
        }
        const_iterator begin() const
        {
            return _str;
        }
        iterator end()
        {
            return _str + _size;
        }
        const_iterator end() const
        {
            return _str + _size;
        }
        // constructor
        string(const char* str="")
        : _size(strlen(str)), _capacity(_size)
        {
            _str = new char[_capacity + 1];
            strcpy(_str, str);
        }
        // destructor
        ~string()
        {
            delete[] _str;
            _str = nullptr;
            _size = _capacity = 0;
        }

        // common operations
        const char* c_str() const
        {
            return _str;
        }

        size_t size() const
        {
            return _size;
        }

        char& operator[] (size_t pos)
        {
            assert(pos < _size);
            return _str[pos];
        }
        const char& operator[] (size_t pos) const
        {
            assert(pos < _size);
            return _str[pos];
        }
        // request a change in capacity
        void reserve(size_t n=0)
        {
            if(n > _capacity)
            {
                // allocate a new buffer, copy the content, then free the old buffer
                char* tmp = new char[n + 1];
                memcpy(tmp, _str, _size + 1);   // _size + 1 so the '\0' is copied too
                delete [] _str;
                _str = tmp;
                _capacity = n;
            }
        }
        void resize(size_t n, char c = '\0')
        {
            if (n <= _size)
            {
                // shrink: cut the string off at n
                _size = n;
                _str[n] = '\0';
            }
            else
            {
                // grow: make room if needed, then fill the new part with c
                if(n > _capacity)
                {
                    reserve(n);
                }
                memset(_str + _size, c, n - _size);
                _size = n;
                _str[_size] = '\0';
            }
        }

        // insert one character at the given position
        string& insert(size_t pos, char c)
        {
            assert(pos <= _size); // the buffer holds _size + 1 chars, so pos == _size is valid
            if(_size == _capacity)
            {
                // grow to twice the capacity
                reserve(_capacity == 0 ? 4 : _capacity * 2);
            }
            size_t end = _size + 1;
            while(end > pos)
            {
                // shift everything from pos on, including the '\0', one step to the right
                _str[end] = _str[end - 1];
                --end;
            }
            _str[pos] = c;
            ++_size;
            return *this;
        }
        // insert a C string at the given position
        string& insert(size_t pos, const char* s)
        {
            assert(pos <= _size);
            size_t len = strlen(s);
            if(_size + len > _capacity)
            {
                reserve(_size + len);
            }
            // shift the tail, including the '\0', len steps to the right;
            // memmove handles overlapping ranges and the len == 0 case
            memmove(_str + pos + len, _str + pos, _size - pos + 1);
            memcpy(_str + pos, s, len);
            _size += len;
            return *this;
        }
        void push_back(char ch)
        {
            insert(_size, ch);
        }
        void append(const char* str)
        {
            insert(_size, str);
        }
        string& operator+= (char ch)
        {
            push_back(ch);
            return *this;
        }
        string& operator+=(const char* str)
        {
            append(str);
            return *this;
        }
        // erase
        string& erase(size_t pos=0, size_t len=npos)
        {
            assert(pos <= _size);
            if(len >= _size - pos)
            {
                // erase everything from pos to the end (this also covers len == npos)
                _str[pos] = '\0';
                _size = pos;
            }
            else
            {
                // move the tail, including the '\0', forward over the erased range;
                // memmove is used because the two ranges overlap
                memmove(_str + pos, _str + pos + len, _size - pos - len + 1);
                _size -= len;
            }
            return *this;
        }
        void clear()
        {
            _str[0] = '\0';
            _size = 0;
        }
        // find
        size_t find(char ch) const
        {
            // linear search
            for(size_t i=0; i<_size; ++i)
            {
                if(ch == _str[i])
                    return i;
            }
            return npos; // -1
        }
        size_t find(const char* s, size_t pos = 0) const
        {
            assert(pos <= _size);
            const char* ptr = strstr(_str+pos, s);
            if(ptr == nullptr)
                return npos;
            else
                return ptr - _str;
        }
};
const size_t string::npos = -1;   // converts to the largest size_t value

Common operator overloading implementations:

#include <iostream>

// common comparison operator overloads
bool operator< (const string& s1, const string& s2)
{
    return strcmp(s1.c_str(), s2.c_str()) < 0;
}
bool operator== (const string& s1, const string& s2)
{
    return strcmp(s1.c_str(), s2.c_str()) == 0;
}
bool operator<= (const string& s1, const string& s2)
{
    return s1 < s2 || s1 == s2;
}
bool operator> (const string& s1, const string& s2)
{
    return !(s1 <= s2);
}
bool operator>= (const string& s1, const string& s2)
{
    return !(s1 < s2);
}
bool operator!= (const string& s1, const string& s2)
{
    return !(s1 == s2);
}

// output stream
std::ostream& operator<< (std::ostream& out, const string& s)
{
    for (auto ch : s)
    {
        out << ch;
    }
    return out;
}

// input stream: read one word, stopping at a space, a newline, or end of input
std::istream& operator>>(std::istream& in, string& s)
{
    s.clear();
    char ch;
    while(in.get(ch) && ch != ' ' && ch != '\n')
    {
        s += ch;
    }
    return in;
}

shallow copy and deep copy

shallow copy

Shallow copy: the compiler-generated copy simply copies the values of the members one by one. If the object manages a resource, several objects end up sharing the same resource. When one of them is destroyed, the resource is released, but the other objects do not know that it is gone; if they keep using it, an access violation occurs.
The following code defines a string class with only a constructor and a destructor; the copy constructor and the assignment operator are the defaults generated by the compiler.

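#include <cstring>

class string
{
private:
    char* _str;
public:
    // str must not be a null pointer
    string(const char* str="")
    : _str(new char[strlen(str) + 1])
    {
        strcpy(_str, str);
    }
    ~string()
    {
        delete [] _str;
        _str = nullptr;
    }
    // No copy constructor or operator= is declared, so the compiler
    // generates versions that copy only the _str pointer (a shallow copy).
};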

In the following code, the copy-initialized objects all end up pointing to the same memory. When return 0 is reached, the destructors of the three objects are called in turn, in reverse order of construction, and each one tries to release that memory.
Because the three string objects point to the same address, str3 is destroyed first and frees the memory; when str2 is destroyed next, it tries to free memory that has already been released, and an error (a double free) occurs.

int main()
{
    string str1("Hello C++");
    string str2(str1);          // calls the copy constructor
    string str3 = str2;         // also calls the copy constructor (copy-initialization, not operator=)

    return 0;
}

deep copy

To avoid the shallow copy problem, a deep copy can be used. The simple implementation code is as follows:

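// copy constructor: allocate a separate buffer and copy the characters
string::string(const string& s)
: _str(new char[strlen(s._str) + 1])
{
    strcpy(_str, s._str);
}
// copy assignment operator
string& string::operator=(const string& s)
{
    if(this != &s)      // guard against self-assignment
    {
        // allocate and copy first, so *this stays valid if new throws
        char* tmp = new char[strlen(s._str) + 1];
        strcpy(tmp, s._str);
        delete[] _str;
        _str = tmp;
    }
    return *this;
}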

reference count

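#include <cstring>

class string
{
private:
    char* _str;
    int* _pCount;   // reference count shared by all objects that share _str

    // give up this object's reference; free the buffer if it was the last one
    void release()
    {
        if(--(*_pCount) == 0)
        {
            delete [] _str;
            delete _pCount;
        }
        _str = nullptr;
        _pCount = nullptr;
    }
public:
    string(const char* str="")
    : _str(new char[strlen(str) + 1]), _pCount(new int(1))
    {
        strcpy(_str, str);
    }
    // copy constructor: share the buffer and increment the count
    string(const string& s)
    : _str(s._str), _pCount(s._pCount)
    {
        ++(*_pCount);
    }
    // copy assignment operator
    string& operator=(const string& s)
    {
        if(_str != s._str)      // nothing to do if both already share one buffer
        {
            release();          // drop the reference to the old buffer first
            _str = s._str;
            _pCount = s._pCount;
            ++(*_pCount);
        }
        return *this;
    }
    ~string()
    {
        release();
    }
};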

Why use a pointer for the reference count instead of an ordinary member variable or a static member variable?

  • Ordinary member variable: each object would hold its own copy of the count. When an object is copied or assigned, only the count in the current object changes, while the counts in the other objects sharing the resource are not updated;
  • Static member variable: the whole class has a single count shared by every object, even objects that manage unrelated resources, as the following example shows:
#include <cstring>
#include <iostream>
using namespace std;

class String
{
private:
    char* _str;

public:
    static int _count;   // one counter for the whole class (this is the flaw being shown)
    String(const char* s="")
    : _str(new char[strlen(s) + 1])
    {
        strcpy(_str, s);
    }
    String(const String& s)
    {
        _str = s._str;
        ++_count;
    }
    String& operator=(const String& s)
    {
        delete[] _str;
        _str = s._str;
        ++_count;
        return *this;
    }
    ~String()
    {
        if(--_count == 0)
        {
            delete[] _str;
            _str = nullptr;
        }
    }
};

int String::_count = 1;

int main()
{
    String str1("Hello C++");
    String str2(str1);
    String str3 = str1;
    cout << String::_count << endl;		// 3

    String str4("other data");
    cout << String::_count << endl;		// 3, although str4 shares nothing with the others
    return 0;
}

With a static member variable, the class has only one copy of the counter. After an existing object has been copied or assigned, several objects point to the same memory and the count is greater than 1. If a new, unrelated object is then created, there is still only that one counter; it cannot tell the two resources apart, so the count no longer matches either of them and the bookkeeping is wrong.
Using a pointer to a heap-allocated counter meets the requirement: each distinct resource gets its own reference count, shared by exactly the objects that share that resource, and it does not interfere with the counts of other resources.

copy-on-write cow

Problems with shallow copy:

  • The same memory is released more than once when the objects are destroyed
  • A modification made through one object affects the others

To address these problems, copy-on-write based on reference counting was proposed. It works as follows:

  • For the first problem: with a shallow copy, several objects point to the same memory. A counter is introduced and incremented on every copy. When an object is destroyed, the counter is decremented; the memory is actually released only when the last object sharing it is destroyed;
  • For the second problem: if the counter is greater than 1 and an object needs to be modified, that object first makes a deep copy and then modifies its own copy;

The advantage of reference-counted copy-on-write is therefore that, as long as an object is not modified, copying only increments the count and no deep copy is needed, which improves efficiency. The disadvantage is that the reference count raises thread-safety issues: it has to be protected with locks or atomic operations, which has a cost in a multi-threaded environment.

Origin blog.csdn.net/hello_dear_you/article/details/129664013