C++ string class (1)—initialization, capacity operation, iterator

The standard library provides several string types. Each one is a specialization of the basic_string class template for a specific character type, and each suits a different kind of character data.

std::string: This is the most common string type. It stores char elements, one byte each, and is typically used for ASCII text or other byte-oriented encodings. It is suitable for most regular string operations.
std::wstring: This is the wide-character version of the string type. It uses wchar_t to represent characters, whose size depends on the platform, and is suitable for situations where multi-language character sets need to be processed.
std::u16string: This is a string type for UTF-16 encoded text. It stores 16-bit code units (char16_t), which represent most commonly used Unicode characters directly.
std::u32string: This is a string type for UTF-32 encoded text. It stores 32-bit code units (char32_t), so every Unicode character fits in a single element.

These different string class templates provide support for different character encodings and character sets to correctly represent and manipulate characters when working with different types of text data. By choosing the appropriate string class template, you can ensure that character data is correctly processed and manipulated in different application scenarios.
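As a rough illustration of how these types differ, the sketch below reports the size of one element (code unit) for each string type. The helper name code_unit_bytes is made up for this example, and the sizes listed in the comments are what mainstream platforms use rather than a guarantee for every platform.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: size in bytes of one element (code unit) of a string type.
template <class String>
constexpr std::size_t code_unit_bytes() {
    return sizeof(typename String::value_type);
}

// Typical results on mainstream platforms:
//   code_unit_bytes<std::string>()    is 1
//   code_unit_bytes<std::u16string>() is 2
//   code_unit_bytes<std::u32string>() is 4
//   code_unit_bytes<std::wstring>()   is 2 on Windows and 4 on Linux
```

The wstring result is the reason code that must behave the same on every platform often prefers u16string or u32string.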

Let’s first understand how computers store characters:

Computers work in binary and cannot store letters and symbols directly. A mapping table between characters and numbers is needed, and that is how ASCII was born.
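The mapping can be seen directly in C++, because a char is just a small integer. The two helpers below are hypothetical names used only for this sketch, and the values in the comment assume an ASCII-compatible execution character set, which is what mainstream compilers use.

```cpp
// Hypothetical helpers showing that a char is stored as a small integer.
int ascii_code(char c) { return static_cast<unsigned char>(c); }
char from_ascii_code(int code) { return static_cast<char>(code); }

// With an ASCII-compatible character set:
//   ascii_code('A') is 65, ascii_code('a') is 97, from_ascii_code(48) is '0'
```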

To display text from many countries on a computer, Unicode was created.

Unicode, also called Universal Code, was developed by the Unicode Consortium and is an industry standard in computer science that covers a character set, encoding schemes, and more.

Unicode was created to overcome the limitations of traditional character encoding schemes. It defines a unified, unique binary code for every character in every language, to meet the requirements of cross-language and cross-platform text conversion and processing.

Unicode character encoding schemes are divided into UTF-8, UTF-16 and UTF-32, which are designed to represent and process different ranges of Unicode characters in computer systems.

UTF-8: UTF-8 is a variable-length encoding scheme that uses 1 to 4 bytes to represent different Unicode characters. It is one of the most commonly used Unicode encoding schemes because it is compatible with the ASCII character set and saves space when representing common characters. UTF-8 is suitable for saving space when storing and transmitting text data, especially when it is widely used on the Internet and computer networks.
UTF-16: UTF-16 is a variable-length encoding scheme built from 16-bit code units. Most common Unicode characters take one 16-bit unit, while less commonly used characters take two 16-bit units (a surrogate pair). UTF-16 is suitable for situations where larger character sets need to be processed, such as multi-language text processing and internationalization applications.
UTF-32: UTF-32 is a fixed-length encoding scheme that uses 32-bit encoding to represent Unicode characters. Every Unicode character is represented using a 32-bit encoding, regardless of whether the character is common or not. UTF-32 is suitable for situations where a character set containing all Unicode characters needs to be processed, such as text processing and character-level operations in some specific fields.
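To make the 1-to-4-byte rule of UTF-8 concrete, here is a minimal sketch that encodes a single code point by hand. The function names encode_utf8 and utf16_units are illustrative, not part of the standard library, and the encoder skips validation (it does not reject surrogates or values above U+10FFFF).

```cpp
#include <string>

// Hypothetical helper: encode one Unicode code point as UTF-8 bytes.
// Minimal sketch without validation of the input range.
std::string encode_utf8(char32_t cp) {
    std::string out;
    if (cp < 0x80) {            // 1 byte:  0xxxxxxx (same as ASCII)
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {    // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {  // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {                    // 4 bytes: 11110xxx followed by three 10xxxxxx
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// Number of 16-bit code units UTF-16 needs for the same code point.
int utf16_units(char32_t cp) { return cp < 0x10000 ? 1 : 2; }
```

For example, the letter A (U+0041) stays a single byte, the Chinese character U+4E2D becomes the three bytes E4 B8 AD, and a code point such as U+1F600 needs four bytes in UTF-8 and two code units in UTF-16.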

These different Unicode encoding schemes provide different trade-offs and applicability. According to specific needs and application scenarios, the appropriate encoding scheme can be selected to represent and process Unicode characters.

Among them, UTF-8 is used the most.

We in China also have GBK encoding for Chinese characters, which provides support for some rare characters.

1. String class

Document introduction

String is a class that represents a sequence of characters. The standard string class provides support for such objects. Its interface is similar to that of a standard character container, but it adds design features specifically for operating single-byte character strings.

The string class uses char as its character type, using its default char_traits and allocator type (see basic_string for more information on templates).

The string class is an instance of the basic_string template class. It uses char to instantiate the basic_string template class, and uses char_traits and allocator as the default parameters of basic_string (for more template information, please refer to basic_string).

Note that this class handles bytes independently of the encoding used: if it is used to handle sequences of multi-byte or variable-length characters (such as UTF-8), all members of this class (such as length or size), as well as its iterators, will still operate in terms of bytes (rather than actual encoded characters).
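The gap between bytes and characters is easy to observe. The sketch below counts UTF-8 code points by skipping continuation bytes; count_code_points is a hypothetical helper written for this example, and it assumes the input is valid UTF-8.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: count UTF-8 code points by skipping continuation
// bytes (those of the form 10xxxxxx). Assumes the input is valid UTF-8.
std::size_t count_code_points(const std::string& utf8) {
    std::size_t count = 0;
    for (unsigned char byte : utf8) {
        if ((byte & 0xC0) != 0x80) ++count;
    }
    return count;
}
```

For a string holding the six bytes E4 B8 AD E6 96 87 (two Chinese characters in UTF-8), size() reports 6 while count_code_points reports 2.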

Summary:

string is the class that represents strings.
The interface of this class is basically the same as that of a regular container, with the addition of some operations specifically for strings.
Under the hood, string is an alias for a specialization of the basic_string class template: typedef basic_string<char, char_traits<char>, allocator<char>> string;
It is not aware of multi-byte or variable-length characters: such sequences are processed byte by byte, not character by character.

When using the string class, you must include the header with #include <string> and either write using namespace std; or qualify the name as std::string.

2. Initialization

Default constructor string(): Creates an empty string object.
Copy constructor string(const string& str): Create a new string object by copying another string object str .
Substring constructor string(const string& str, size_t pos, size_t len = npos): Creates a new string object of length len, starting at position pos of the string object str. If the len argument is not provided, the substring runs to the end of str by default.
From C-String constructor string(const char* s): Creates a new string object from a null-terminated C string s .
Constructor from sequence string(const char* s, size_t n): Creates a new string object from the first n characters of the C string s .
Fill constructor string(size_t n, char c): Creates a new string object containing n characters c .
Range constructor template <class InputIterator> string(InputIterator first, InputIterator last): Creates a new string object through the characters in the iterator range [first, last) . This constructor can accept different types of iterators, such as pointers, container iterators, etc.
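The range constructor is the least obvious entry in this list, so here is a small sketch of it with two kinds of iterators: raw pointers and container iterators. The function names are invented for the example.

```cpp
#include <string>
#include <vector>

// Range constructor with two raw pointers into a character array.
std::string from_pointer_range(const char* first, const char* last) {
    return std::string(first, last);
}

// Range constructor with the iterators of another container.
std::string from_vector(const std::vector<char>& chars) {
    return std::string(chars.begin(), chars.end());
}
```

Given const char text[] = "hello world", the call from_pointer_range(text, text + 5) yields "hello", because the range [first, last) excludes the element that last points to.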

1. Without or with parameters

It can be initialized without parameters or initialized with parameters.

int main()
{
	string s1;
	string s2("hello world");
	string s3 = "hello";
	return 0;
}

We can also access a certain position in the string through the [ ] operator.

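#include <iostream>
#include <string>
using namespace std;

int main()
{
	string s2("hello world");
	// Shift every character to the next character code, e.g. 'h' becomes 'i'.
	for (size_t i = 0; i < s2.size(); ++i) {
		s2[i]++;
	}
	cout << s2 << endl;
	return 0;
}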

We can output string class objects through the stream insertion operator <<.

This works because the stream insertion operator<< is overloaded for string. The stream extraction operator>> is overloaded as well.

int main()
{
	string s2;
	cin >> s2;
	for (size_t i = 0; i < s2.size(); ++i) {
		s2[i]++;
	}
	cout << s2 << endl;
	return 0;
}
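The behavior of the two stream operators can be tried without a keyboard by using string streams in place of cin and cout. This is only a sketch; first_word and echo are made-up helper names.

```cpp
#include <sstream>
#include <string>

// operator>> skips leading whitespace and stops at the next whitespace.
std::string first_word(const std::string& line) {
    std::istringstream in(line);
    std::string word;
    in >> word;
    return word;
}

// operator<< writes the whole string, spaces included.
std::string echo(const std::string& text) {
    std::ostringstream out;
    out << text;
    return out.str();
}
```

One consequence worth remembering: reading "hello world" with >> stores only "hello", so a single extraction does not read a whole line.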

2. Initialize with string variables

Initializes a new string with a specified number of characters taken from another string, starting at a specified position.

int main()
{
	string s3 = "hello";
	string s4(s3, 2, 3);
	cout << s4 << endl;
	return 0;
}

If the number of characters to be fetched exceeds the total character length, just fetch to the end.

int main()
{
	string s3 = "hello";

	string s4(s3, 2, 3);
	cout << s4 << endl;

	string s5(s3, 2, 10);
	cout << s5 << endl;
	return 0;
}

The third parameter has a default value (npos). If it is omitted, characters are taken from the specified position to the end.

int main()
{
	string s3 = "hello";

	string s4(s3, 2, 3);
	cout << s4 << endl;

	string s5(s3, 2);
	cout << s5 << endl;
	return 0;
}

3. Initialize with string

You can also put the string to be assigned directly in the first parameter position, and the second parameter is the number of assigned characters.

int main()
{
	string s7("hello world", 5);
	cout << s7 << endl;
	string s8("hello world", 5 , 6);
	cout << s8 << endl;

	return 0;
}

In the first constructor, the only argument after the C string is the number of characters to copy, counting from the first character of the string.
In the second constructor, there are two arguments after the string: the first is the starting position, and the second is the number of characters to take. If that length runs past the end of the actual string, it is truncated to the end of the string.

4. Specify the number of characters

int main()
{
	string s9(10, '$');
	cout << s9 << endl;

	return 0;
}

3. Capacity operation

1. size

size() and length() are implemented in exactly the same way; both return the number of valid characters in the string. size() was introduced to be consistent with the interfaces of other containers. Generally, size() is used.

int main()
{
	string s1("hello world");
	cout << s1.size() << endl;
	cout << s1.length() << endl;

	return 0;
}

The following two functions are also worth knowing:

The max_size() function returns an unsigned integer representing the maximum number of characters that the string object can hold. This value usually depends on system limitations and therefore may vary between operating systems and compilers.

The capacity() function returns an unsigned integer indicating the size of the storage currently allocated for the string object, expressed as a number of characters. This value may be larger than the actual number of characters the string contains, since string classes usually reserve some extra space for expansion.

int main()
{
	string s1("hello world");
	cout << s1.max_size() << endl;
	cout << s1.capacity() << endl;

	return 0;
}

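Under a 64-bit build, max_size() prints a very large number and capacity() prints a value of at least 11 (the length of the string); the exact values depend on the compiler and standard library.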

2. push_back

Adds a single character to the end of the string.

int main()
{
	string s1("hello");
	s1.push_back(' ');
	s1.push_back('!');
	cout << s1 << endl;

	return 0;
}

3. append

We can also use append to add a string, or one or more copies of a character, to the end of the string. Appending a string is the most commonly used form.

int main()
{
	string s1("hello");
	s1.push_back(' ');
	s1.push_back('!');
	cout << s1 << endl;

	s1.append("world");
	cout << s1 << endl;
}

4. += operator

We can also use the += operator to add characters or strings at the end of the string. Under the hood, += typically calls push_back or append. When appending a character to the end of the string, s.push_back(c), s.append(1, c), and s += c are three similar ways to do it. In general, += is used most often with the string class, because it can append not only single characters but also whole strings.
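The equivalence of the three ways to append one character can be checked with a small sketch like the following. The wrapper function names are invented for illustration; each takes its string by value so the caller's string is left untouched.

```cpp
#include <string>

// Three ways to append a single character; all produce the same result.
std::string with_push_back(std::string s, char c)   { s.push_back(c); return s; }
std::string with_append(std::string s, char c)      { s.append(1, c); return s; }
std::string with_plus_equals(std::string s, char c) { s += c; return s; }

// += also accepts a whole string, which push_back does not.
std::string join(std::string left, const std::string& right) {
    left += right;
    return left;
}
```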

int main()
{
	string s1("hello");
	s1 += ' ';
	s1 += '!';
	s1 += "world";
	cout << s1 << endl;
}

5. The structure of string under VS

First look at the following code:

int main()
{
	string s; // the string whose capacity growth we observe
	size_t sz = s.capacity();
	cout << "making s grow:\n";
	cout << "capacity changed: " << sz << '\n';
	for (int i = 0; i < 100; ++i)
	{
		s.push_back('c');
		if (sz != s.capacity())
		{
			sz = s.capacity();
			cout << "capacity changed: " << sz << '\n';
		}
	}
	return 0;
}

The code above uses the push_back() function to add characters to a string object and observes how the capacity of the string changes.

First, a variable sz of type size_t is declared in main() and initialized to the capacity of the string object s.
Then, the prompt message "making s grow:" is printed with cout and the << operator.
Next, a loop runs from 0 to 99 and appends the character 'c' to s one at a time. After each character is added, the previously recorded capacity sz is compared with the current capacity of s to determine whether the capacity has changed.
If the capacity has changed, the new value is assigned to sz, and the message "capacity changed:" is printed together with the new capacity.

The capacity printed here does not count the terminating \0, so the space actually allocated is the reported capacity plus 1.
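The point about the terminating \0 can be checked with a short sketch. The two helper names are hypothetical. The first relies on the fact that c_str() always returns a null-terminated buffer, even when the string is filled up to its capacity.

```cpp
#include <string>

// True if the buffer holds a terminator right after the last valid character.
bool has_terminator(const std::string& s) {
    return s.c_str()[s.size()] == '\0';
}

// The terminator is counted by neither size() nor capacity(),
// so capacity() is always at least size().
bool capacity_covers_size(const std::string& s) {
    return s.capacity() >= s.size();
}
```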

In the debugger's watch window, you can see that the characters are actually stored in the _Buf array.

In the VS implementation, the std::string class uses one of two storage areas for the characters of the string. When the string is at most 15 characters long (the sixteenth element holds \0), the characters are stored in an internal fixed-size array named _Buf, whose length is 16. This avoids dynamic memory allocation and improves performance.

When the string is longer than 15 characters, the characters are stored in a dynamically allocated array pointed to by _Ptr, and the length of that array is adjusted as needed. This allows longer strings to be accommodated.

This design avoids a heap allocation when the string is short and provides sufficient storage space when the string is long. The specific implementation may vary depending on the compiler and standard library, but distinguishing between short and long strings is a common optimization technique.
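One way to observe this short/long distinction from the outside is to check whether the character data lives inside the string object itself. The sketch below does that by comparing addresses. stored_inline is a hypothetical helper, and the result for short strings depends on the standard library in use, since the C++ standard does not require this optimization.

```cpp
#include <functional>
#include <string>

// Hypothetical helper: true if the character data is stored inside the
// string object itself rather than in a separate heap block.
bool stored_inline(const std::string& s) {
    const char* object_begin = reinterpret_cast<const char*>(&s);
    const char* object_end = object_begin + sizeof(std::string);
    std::less<const char*> before; // well-defined ordering for unrelated pointers
    return !before(s.data(), object_begin) && before(s.data(), object_end);
}
```

A string of 100 characters cannot fit inside the object, so stored_inline returns false for it. For a short string such as "hi", implementations with a small internal buffer are expected to return true.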

The structure of string under VS: in a 32-bit build, string occupies a total of 28 bytes, and the internal structure is a little more complicated. First, there is a union that determines where the characters of the string are stored:

When the string length is less than or equal to 15, an internal fixed-size character array is used to store it.

When the string length is greater than or equal to 16, space is allocated from the heap.

union _Bxty
{ // storage for small buffer or pointer to larger one
	value_type _Buf[_BUF_SIZE];
	pointer _Ptr;
	char _Alias[_BUF_SIZE]; // to permit aliasing
} _Bx;

This design makes sense. In most cases, the length of a string is less than 16, so once the string object is created it already has a fixed 16-character array inside and does not need to allocate from the heap, which is efficient.
Secondly: there is a size_t field that stores the length of the string, and a size_t field that stores the total capacity of the space allocated on the heap.
Finally: there is also a pointer that is used for other purposes.
Therefore, it occupies a total of 16+4+4+4=28 bytes.

Output to check the size of string class object s

string s;

cout << sizeof(s) << endl;

You can see that it is 28. Once the string no longer fits in the internal buffer, the first block allocated on the heap is 32 bytes.

Observing the growth, we can see that starting from size 32, the capacity grows by roughly 1.5 times each time.
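To examine the growth pattern on any compiler, the capacities can be collected into a list instead of printed. This is a sketch with a made-up helper name, capacity_steps; the actual numbers it returns depend on the standard library.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: record each distinct capacity seen while appending
// `count` characters one at a time.
std::vector<std::size_t> capacity_steps(std::size_t count) {
    std::string s;
    std::vector<std::size_t> steps{s.capacity()};
    for (std::size_t i = 0; i < count; ++i) {
        s.push_back('c');
        if (s.capacity() != steps.back()) steps.push_back(s.capacity());
    }
    return steps;
}
```

Dividing each entry plus 1 (for the terminating \0) by the previous entry plus 1 gives the growth factor of the implementation being tested. Other standard libraries may use a factor different from the one seen under VS.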

6. The structure of string under g++

Under older versions of g++, string is implemented through copy-on-write (since GCC 5, the default std::string no longer uses it). The string object occupies a total of 4 bytes in a 32-bit build and contains only one pointer internally.

The pointer points to a block of heap space, which contains the following fields:

Total space size
Effective length of the string
Reference count

struct _Rep_base
{
    size_type _M_length;
    size_type _M_capacity;
    _Atomic_word _M_refcount;
};

Pointer to heap space used to store strings.
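The copy-on-write idea itself can be sketched in a few lines. The class below is a toy written for illustration only, not the libstdc++ implementation: the name CowString and its members are invented, and std::shared_ptr stands in for the hand-written reference count.

```cpp
#include <cstddef>
#include <memory>
#include <string>

// Toy copy-on-write string. Illustrative only; not the libstdc++ code.
class CowString {
public:
    explicit CowString(const char* text)
        : data_(std::make_shared<std::string>(text)) {}

    // Copying shares the buffer; shared_ptr maintains the reference count.
    CowString(const CowString&) = default;
    CowString& operator=(const CowString&) = default;

    char get(std::size_t i) const { return (*data_)[i]; }

    // Writing makes a private copy first if the buffer is shared.
    void set(std::size_t i, char c) {
        if (data_.use_count() > 1) {
            data_ = std::make_shared<std::string>(*data_);
        }
        (*data_)[i] = c;
    }

    const std::string& str() const { return *data_; }
    bool shares_buffer_with(const CowString& other) const {
        return data_ == other.data_;
    }

private:
    std::shared_ptr<std::string> data_;
};
```

After CowString b = a; both objects point at the same buffer. The first call to b.set(...) gives b its own copy, so a is not affected by the write.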

7. reserve

When the required space is known in advance, we can use reserve to allocate it up front, which reduces the number of capacity expansions and improves efficiency.

Use reserve to allocate space for 100 characters at once.

int main()
{
	string s;

	s.reserve(100);
	size_t sz = s.capacity();
	cout << "making s grow:\n";
	cout << "capacity changed: " << sz << '\n';
	for (int i = 0; i < 100; ++i)
	{
		s.push_back('c');
		if (sz != s.capacity())
		{
			sz = s.capacity();
			cout << "capacity changed: " << sz << '\n';
		}
	}
}

At this time, there is no need to expand the capacity again and again.

When operating on strings, if you can roughly estimate how many characters to put in, you can reserve the space first through reserve.
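The effect of reserving up front can be measured by counting capacity changes instead of printing them. The helper below is a hypothetical name for this sketch.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: count capacity changes while appending `count`
// characters, optionally reserving the space first.
std::size_t count_capacity_changes(std::size_t count, bool reserve_first) {
    std::string s;
    if (reserve_first) s.reserve(count);
    std::size_t changes = 0;
    std::size_t last = s.capacity();
    for (std::size_t i = 0; i < count; ++i) {
        s.push_back('c');
        if (s.capacity() != last) {
            last = s.capacity();
            ++changes;
        }
    }
    return changes;
}
```

With reserve_first set to true the function returns 0, because the capacity is already sufficient before the loop starts. Without it, the count is at least 1 for any input that outgrows the initial capacity.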

8. resize

resize resizes the string to a length of n characters.

If n is less than the current string length, the string is shortened to its first n characters and the characters beyond the nth are removed.
If n is greater than the current string length, the current content is extended by appending as many characters as needed to reach a size of n.
If the fill character c is specified, the new elements are initialized to copies of c; otherwise, they are value-initialized characters (null characters).

int main()
{
	// Expand capacity only
	string s1("hello world");
	s1.reserve(100);
	cout << s1.size() << endl;
	cout << s1.capacity() << endl;

	// Expand capacity and initialize the new characters
	string s2("hello world");
	s2.resize(100);
	cout << s2.size() << endl;
	cout << s2.capacity() << endl;

	return 0;
}

resize(size_t n) and resize(size_t n, char c) both change the number of valid characters in the string to n. The difference shows up when the number of characters increases: resize(n) fills the extra element space with the null character \0, while resize(size_t n, char c) fills it with the character c. Note: when resize increases the number of elements, the underlying capacity may change; when it reduces the number of elements, the total size of the underlying space remains unchanged.
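The three cases of resize (shrinking, growing with the default fill, growing with a chosen fill character) can be seen side by side in a sketch like this. The wrapper name resized is made up; it works on a copy so each case starts from the same input.

```cpp
#include <cstddef>
#include <string>

// Return a resized copy; new characters are null characters.
std::string resized(std::string s, std::size_t n) {
    s.resize(n);
    return s;
}

// Return a resized copy; new characters are copies of `fill`.
std::string resized(std::string s, std::size_t n, char fill) {
    s.resize(n, fill);
    return s;
}
```

Starting from "hello", resized(s, 2) gives "he", resized(s, 8, '!') gives "hello!!!", and resized(s, 8) gives a string of size 8 whose last three characters are \0. Those null characters count toward size() even though they are invisible when printed.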

reserve(size_t res_arg = 0): reserves space for the string without changing the number of valid elements. When the argument of reserve is less than the current capacity of the string, reserve is not required to reduce the capacity and typically leaves it unchanged.

4. Iterator

1. Forward iterator

begin points to the first character of the string, and end points to the position after the last character of the string. Its function is similar to that of a pointer.

int main()
{
	string s1("hello world");
	string::iterator it = s1.begin();
	while (it != s1.end()) {
		cout << *it << " ";
		++it;
	}
    return 0;
}

Under the hood, the range-based for loop is implemented with iterators.

int main()
{
	string s1("hello world");
	string::iterator it = s1.begin();
	while (it != s1.end()) {
		cout << *it << " ";
		++it;
	}
	cout << endl;
	for (auto ch : s1) {
		cout << ch << " ";
	}
	cout << endl;
}

2. Reverse iterator

rbegin points to the last character of the string, and rend points to the position before the first character of the string.

int main()
{
	string s1("hello world");
	string::reverse_iterator rit = s1.rbegin();
	while (rit != s1.rend()) {
		cout << *rit << " ";
		++rit;
	}
	cout << endl;
    return 0;
}
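Reverse iterators are not limited to printing. As a small sketch, the function below builds a reversed copy of a string by walking from rbegin() to rend(); the name reversed_copy is made up for the example.

```cpp
#include <string>

// Build a reversed copy by walking from rbegin() to rend().
std::string reversed_copy(const std::string& s) {
    std::string out;
    out.reserve(s.size()); // the final length is known in advance
    for (std::string::const_reverse_iterator rit = s.rbegin(); rit != s.rend(); ++rit) {
        out.push_back(*rit);
    }
    return out;
}
```

Note that ++rit moves toward the front of the string, which is why the loop looks the same as a forward traversal.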

3. const iterator (forward and reverse)

Const forward and reverse iterators can only traverse and read data.

int main()
{
	string s1("hello world");

	string::const_iterator it = s1.begin();
	while (it != s1.end()) {
		cout << *it << " ";
		++it;
	}
	cout << endl;

	string::const_reverse_iterator rit = s1.rbegin();
	while (rit != s1.rend()) {
		cout << *rit << " ";
		++rit;
	}
	cout << endl;

	cout << s1 << endl;

	return 0;
}

At this time, you can use auto to automatically deduce the type.

int main()
{
	string s1("hello world");

	//string::const_iterator it = s1.cbegin();
	auto it = s1.cbegin(); // cbegin() makes auto deduce const_iterator
	while (it != s1.cend()) {
		cout << *it << " ";
		++it;
	}
	cout << endl;

	//string::const_reverse_iterator rit = s1.crbegin();
	auto rit = s1.crbegin(); // crbegin() makes auto deduce const_reverse_iterator
	while (rit != s1.crend()) {
		cout << *rit << " ";
		++rit;
	}
	cout << endl;

	return 0;
}
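The practical difference between the ordinary and const iterators in this section is whether writing through the iterator is allowed. A minimal sketch, with invented function names:

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// A plain iterator can write through *it.
void to_upper_in_place(std::string& s) {
    for (std::string::iterator it = s.begin(); it != s.end(); ++it) {
        *it = static_cast<char>(std::toupper(static_cast<unsigned char>(*it)));
    }
}

// A const_iterator can only read; "*it = ..." would not compile here.
std::size_t count_char(const std::string& s, char target) {
    std::size_t n = 0;
    for (auto it = s.cbegin(); it != s.cend(); ++it) {
        if (*it == target) ++n;
    }
    return n;
}
```

Taking the parameter as const std::string& in count_char means even begin() would return a const_iterator there; cbegin() simply makes the read-only intent explicit.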

5. OJ exercises

Reverse only letters

917. Reverse Only Letters - LeetCode

Use the two-pointer idea from the partition step of quick sort.

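class Solution {
public:
    string reverseOnlyLetters(string s) {
        // Signed indices: end can drop to -1, which would wrap around with size_t.
        int begin = 0, end = static_cast<int>(s.size()) - 1;
        while (begin < end) {
            // Skip non-letters from the left.
            while (begin < end && !isalpha(static_cast<unsigned char>(s[begin])))
                ++begin;
            // Skip non-letters from the right.
            while (begin < end && !isalpha(static_cast<unsigned char>(s[end])))
                --end;
            swap(s[begin], s[end]);
            ++begin;
            --end;
        }
        return s;
    }
};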

Find the first character that appears only once in a string

387. First Unique Character in a String - LeetCode

Use the idea of counting sort.

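class Solution {
public:
    int firstUniqChar(string s) {
        // count[k] is the number of occurrences of the letter 'a' + k.
        // Assumes s contains only lowercase English letters.
        int count[26] = {0};
        for (auto ch : s) {
            count[ch - 'a']++;
        }
        // Return the index of the first character whose count is 1.
        for (size_t i = 0; i < s.size(); i++) {
            if (count[s[i] - 'a'] == 1)
                return static_cast<int>(i);
        }
        return -1;
    }
};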