C++ learning (C++17) - 2.X. Using string and string_view


Original post address : LeoRanbom's blog

Blogger : LeoRanbom

This post is published only on the blog at the original address; copies found anywhere else were scraped.

If you think it's good, I hope you like it.

Foreword

A couple of days ago I finished a small, basic program, and now I am starting to study in more depth. In this section I start with a review of strings, covering C-style strings, C++ strings, and the new string features of the C++17 standard.

1. Dynamic string

After working with object-oriented languages such as Java and Python, you get used to their convenient strings: they can grow to any size, and substrings can be extracted or replaced. Turning back to C then feels particularly painful. It doesn't even have bounds checking.

1.1. C-style strings

As we all know, a C string is an array of char. Usually we allocate its memory dynamically and refer to it through a char*.

Disadvantages:

  1. A C string does not record its own length. Getting the length means traversing the byte array until a null character is found, which is O(n). (The official name of the null character '\0' is NUL, with only one L.)
  2. Because a C string does not record its length, nothing checks whether the memory is large enough when the string is modified. If it is not, the result is a buffer overflow.

Although we are writing C++, interfaces written in C are often called, so we also have to understand C-style strings.

1.1.1. Error-prone

The terminating '\0' of a C string is often forgotten. For example, "hello" has 5 letters, but it consists of 6 characters, 'h', 'e', 'l', 'l', 'o', '\0', so it occupies 6 bytes in memory. In addition, C's string functions strcpy() and strcat() are extremely unsafe: they do not check the size of the destination.

1.1.2. strcpy()

This function copies a string. It takes 2 string parameters and copies the second string into the first, but it does not check whether the destination has enough space.

Here we wrap it in a function that allocates the right amount of memory, which solves the problem:

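// requires <cstring> for strlen() and strcpy()
char* copyString(const char* str){
    char* newStr = new char[strlen(str) + 1]; // strlen() returns the length without the '\0', hence the + 1
    strcpy(newStr,str);
    return newStr;
} // remember to release the heap memory allocated by copyString() with delete[] after use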

1.1.3. strcat()

This function concatenates strings. It also takes 2 string parameters: the second string is appended to the first, and the result is stored in the first. Again, the available memory is not checked.

Here we also wrap it in a function, so that it concatenates 3 strings and allocates enough space automatically.

char* addStrings(const char* str1,const char* str2,const char* str3){
    char* newStr = new char[strlen(str1) + strlen(str2) + strlen(str3) + 1];
    strcpy(newStr,str1);
    strcat(newStr,str2);
    strcat(newStr,str3);
    return newStr;
}

1.1.4. The difference between using the sizeof operator and the strlen() function on a string

As you can see in 1.1.2 and 1.1.3 above, I used strlen() to get the length of a string. sizeof yields a number of bytes, from which the length can sometimes also be derived.

Here I will sort out the pitfalls. (In a test organized by the laboratory I got confused because these concepts were unclear to me. Even so, my senior Wang privately told me that I had come first, and he passed his circuits textbook on to me. That book is the only textbook I have touched during the epidemic, and I am grateful.)

For a C-style string, sizeof returns different sizes depending on how the string is stored. If it is a char[], sizeof returns the actual size of the array, including the '\0'; if it is a char*, sizeof returns the size of a pointer on your platform (4 on 32-bit, 8 on 64-bit).

1.1.5. Secure C library

When you use C-style string functions in Visual Studio, a warning says that they are deprecated. You can avoid the warning by using the "secure C library" functions strcpy_s() and strcat_s(). However, it is best to switch to the C++ std::string class.

1.2. String literals

1.2.1. Literal

cout<<"hello"<<endl;

This statement contains the string itself rather than a string variable: "hello" is a string literal, written as a value and not as a variable. The memory associated with a literal is usually in a read-only region. This lets the compiler reuse identical string literals to optimize memory usage. For example, a program may use the string literal "hello" hundreds of times while only one instance of hello is actually created; this is called literal pooling.

The C++ standard states that string literals are arrays of const char. A literal can be assigned to a variable, but there is a risk when a non-const pointer points to it. For example:

char* ptr = "hello"; // a pointer variable that points to the string literal (ill-formed since C++11; many compilers only warn)
ptr[1] = 'a'; // undefined behavior

The literal itself is const and must not be modified, so modifying it here is undefined behavior (UB). Depending on the compiler, the program may crash or keep running, and the change may take effect or be ignored.

A safer habit is to write:

const char* ptr = "hello";
ptr[1] = 'a'; // ERROR: attempt to write to read-only memory

Written this way, the compiler reports an error because the code tries to write through a pointer to const.

Note: if a character array char[] is initialized with the literal, memory of the corresponding size is allocated for the array and the literal is copied into it. The array is not placed in read-only memory and does not use the literal pool, so it can be modified freely.

1.2.2. Raw string literal

A raw string literal does not process escape characters. It begins with R"( and ends with )".

cout<<R"(hello "world"!\n)";

The console will output

hello "world"!\n

If you want a line break, just press Enter inside the raw string literal.

But because the literal ends with )", the string itself must not contain )", otherwise the compiler reports an error. The solution is the extended raw string literal syntax with an optional delimiter sequence.

R"d-char-sequence(lalala)d-char-sequence"

The delimiter sequence can have at most 16 characters.

1.3. The C++ std::string class

In C++, std::string is a class (actually an instantiation of the basic_string class template). This class supports many of the functions provided in <cstring> and also manages memory allocation automatically. It is defined in the <string> header file.

1.3.1. There are C strings, so why are there C++ strings?

Advantages and disadvantages of C-style strings

Advantages:

  • Simple: they use only a basic data type (char) and an array structure.
  • Lightweight: thanks to the simple structure, they occupy only the memory they need.
  • Low-level: you can operate on the string in the same way as on raw memory.
  • Familiar to C programmers.

Disadvantages:

  • It takes a lot of effort to simulate a full-featured string type.
  • Difficult to use, bug-prone, and unsafe.
  • They do not take advantage of C++'s object-oriented ideas.
  • They require programmers to understand the underlying representation.

1.3.2. Using the string class

Although string is a class, I prefer to think of it as a simple data type. Through operator overloading, the operation of string is more intuitive and simple.

string A("12");
string B("34");
string C;
C = A + B;//C is "1234"

+=, ==, !=, <, [], etc. have all been overloaded.

Comparing strings with == in C is unreliable: if one is a char[] and the other a char*, == returns false, because it compares pointer values, not string contents. You need strcmp(a, b) == 0 to compare the contents.

For compatibility, you can use the c_str() method of the string class to obtain a C-style const character pointer. However, once the string reallocates its memory or the object is destroyed, this pointer is invalidated.

Be careful never to return from a function the result of calling c_str() on a stack-based string.

In C++17, the data() method returns char* when it is called on a non-const string (in C++14 and earlier it always returns const char*).

1.3.3. std::string literals

A string literal in source code is still treated as a const char*, but appending the suffix s, as in "xxx"s, turns the literal into a std::string.

This requires a using directive for the std::string_literals namespace or for std.

1.3.4. High-level numeric conversion

The std namespace has many helper functions for converting between numeric values and strings.

  • string to_string(int val);
  • string to_string(unsigned val);
  • string to_string(long val);
  • string to_string(unsigned long val);
  • string to_string(long long val);
  • string to_string(unsigned long long val);
  • string to_string(float val);
  • string to_string(double val);
  • string to_string(long double val);

There are also conversions from a string to a numeric value. Here str is the string to convert, idx is a pointer that receives the index of the first unconverted character, and base is the numeric base used during the conversion. idx may be a null pointer, in which case it is ignored. If no conversion can be performed, an invalid_argument exception is thrown; if the converted value is outside the range of the return type, an out_of_range exception is thrown.

  • int stoi(const string& str, size_t *idx=0, int base=10);
  • long stol(const string& str, size_t *idx=0, int base=10);
  • unsigned long stoul(const string& str, size_t *idx=0, int base=10);
  • long long stoll(const string& str, size_t *idx=0, int base=10);
  • unsigned long long stoull(const string& str, size_t *idx=0, int base=10);
  • float stof(const string& str, size_t *idx=0);
  • double stod(const string& str, size_t *idx=0);
  • long double stold(const string& str, size_t *idx=0);

1.3.5. Low-level conversion (C++17)

C++17 provides a number of low-level numeric conversion functions in the <charconv> header file. These functions do not allocate memory; they use buffers allocated by the caller. They are tuned for high performance and are locale-independent. Compared with the high-level conversions they can be orders of magnitude faster. (If you need high-performance, locale-independent conversions, use these functions; for example, for serialization/deserialization between numeric data and human-readable formats such as JSON or XML.)

To convert an integer to characters, use the following set of functions: to_chars_result to_chars(char* first, char* last, IntegerT value, int base = 10);. Here IntegerT can be any signed or unsigned integer type or char. The result is of type to_chars_result, defined as follows:

struct to_chars_result{
    char* ptr;
    errc ec;
};

If the conversion succeeds, ptr points one past the last written character (one-past-the-end). If the conversion fails (i.e. ec == errc::value_too_large), ptr is equal to last.

Here is an example:

[Image: Low-level conversion.png]

Similarly, floating point numbers can be converted:

to_chars_result to_chars(char* first,char* last, FloatT value);
to_chars_result to_chars(char* first,char* last, FloatT value, chars_format format);
to_chars_result to_chars(char* first,char* last, FloatT value, chars_format format, int precision);

You can choose the floating-point output format with chars_format:

enum class chars_format{
    scientific,                  // Style: (-)d.ddde±dd
    fixed,                       // Style: (-)ddd.ddd
    hex,                         // Style: (-)h.hhhp±d (Note: no 0x)
    general = fixed | scientific
};

The default format is chars_format::general, which makes to_chars() convert the floating-point value to either fixed or scientific decimal notation, whichever is shorter, with at least one digit before the decimal point (if present). If a format is specified but no precision, the precision is chosen automatically to give the shortest representation in that format that still converts back to exactly the same value.

For the opposite conversion (i.e. string -> number), there is a set of functions:

from_chars_result from_chars(const char* first, const char* last, IntegerT& value, int base = 10);
from_chars_result from_chars(const char* first, const char* last, FloatT& value, chars_format format = chars_format::general);

Here, from_chars_result is a type defined as follows:

struct from_chars_result{
    const char* ptr;
    errc ec;
};

from_chars does not skip leading whitespace. On success, ptr points to the first unconverted character, or equals last if all characters were converted. If nothing could be converted, ptr equals first and ec is errc::invalid_argument. If the parsed value is too large for the type, ec is errc::result_out_of_range.

1.4. The std::string_view class (C++17)

Before C++17 it was always a dilemma to choose the parameter type for a function that accepts a read-only string: const string& or const char*? C++17 introduced the std::string_view class to solve this kind of problem.

It is an instantiation of the std::basic_string_view class template, defined in the <string_view> header file. It is a simple drop-in alternative to const string&. It never copies the string, so there is no extra memory overhead. It supports an interface similar to string but lacks c_str(), and it adds the remove_prefix(size_t) and remove_suffix(size_t) methods. The former shrinks the view by moving its start pointer forward by the given offset, and the latter shrinks it by moving its end pointer backward by the given offset.

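Note: a string and a string_view cannot be concatenated with +; such code does not compile (solution: use string_view's data() method, which is only safe when the viewed characters are null-terminated, or convert the view to a string first).

When the formal parameter is a string_view, you can pass in a string, a char*, or a string literal.

If the parameter is a const string& instead, passing a string literal or a char* constructs a temporary string (a copy), and a string_view cannot be passed at all, because there is no implicit conversion from string_view to string. There are two ways to convert a string_view to a string:

1. xxx.data();

2. string(xxx) // explicit ctor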

1.4.1. string_view literals

You can use "xxxxx"sv to have the literal interpreted as a std::string_view. This requires a using directive for the std::string_view_literals namespace or for std.

1.5. Non-standard strings

Many people don't like to use C++-style strings, some because they don't know them well and some because they find them unpleasant. But whether you use the string class built into MFC or Qt, or write your own, you should establish one standard for the project or team.


Origin www.cnblogs.com/ranbom/p/12675229.html