"C++ Programming Principles and Practice" Notes Chapter 11 Custom Input/Output

In this chapter, we focus on how to adapt the general iostream framework introduced in Chapter 10 to specific needs and preferences.

11.1 Regularity and irregularity

The iostream library, the input/output part of the C++ standard library, provides a unified and extensible framework for text input and output.

So far we have treated all input sources as equivalent, and sometimes this is not enough. For example, a file differs from other input sources such as a network connection in that its individual bytes can be addressed, whereas the bytes of a network connection simply arrive as a stream (the difference is similar to that between vectors and iterators). We have also assumed that an object's type completely determines its input and output format, but this is not entirely true. For example, we often want to specify the number of digits (the precision) when outputting floating-point numbers. This chapter presents some ways to customize input and output as required.

As programmers, we prefer regularity: treating all objects consistently, treating all input sources as equivalent, enforcing a single standard for the way objects are represented gives the cleanest, simplest, most maintainable and generally the most efficient code. However, programs exist to serve humans, and humans have strong preferences (irregularity). Therefore, as programmers, we must strive to strike a balance between program complexity and satisfying user preferences.

11.2 Output Formatting

11.2.1 Integer output

Integer values can be output in octal, decimal, or hexadecimal. A hexadecimal digit represents exactly 4 bits, so 2 hexadecimal digits represent one byte.

The (decimal) number 1234 can be specified to output in decimal, hexadecimal, or octal:

cout << dec << 1234 << " (decimal)\n"
        << hex << 1234 << "  (hexadecimal)\n"
        << oct << 1234 << " (octal)\n";

will output

1234 (decimal)
4d2  (hexadecimal)
2322 (octal)

That is, 1234 in decimal = 4d2 in hexadecimal = 2322 in octal.

Here dec, hex and oct do not output values; they tell the output stream that any subsequent integer values should be output in decimal, hexadecimal, or octal. If none is specified, the default is dec. These three manipulators are persistent ("sticky"): each one applies to every integer output that follows until another base is specified. Terms such as hex and oct that change the behavior of a stream are called manipulators.

Output integers using different bases
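The stickiness described above can be checked with a small sketch. The helper name sticky_bases is made up for this illustration, and an ostringstream (see Section 11.4) is used instead of cout so that the result can be inspected as a string:

```cpp
#include <ios>
#include <sstream>
#include <string>

// Hypothetical helper (not from the book): writes n five times,
// changing the base only where a manipulator appears.
std::string sticky_bases(int n)
{
    std::ostringstream os;
    os << n << ' '               // decimal is the default
       << std::hex << n << ' '   // switch to hexadecimal
       << n << ' '               // still hexadecimal: hex is sticky
       << std::oct << n << ' '   // switch to octal
       << std::dec << n;         // back to decimal
    return os.str();
}
```

Calling sticky_bases(1234) should return "1234 4d2 4d2 2322 1234": the third value is still hexadecimal because no other base was selected in between.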

Note:

  • dec, hex and oct are equivalent to the %d, %x and %o of the C standard library function printf, but the C++ manipulators are persistent, whereas a printf format specifier applies to only one argument.
  • These three manipulators are actually functions that set the corresponding format flags of the output stream. The frequently used endl is also a function; it writes a '\n' character to the output stream and flushes the stream. In addition, the overloaded operator << that takes a function pointer argument calls that function with the stream itself as the argument. The definitions are roughly equivalent to:
// overload used by dec, hex, oct, showbase, ...
ostream& ostream::operator<<(ios_base& (*pf)(ios_base&)) {
    pf(*this);
    return *this;
}

// overload used by endl
ostream& ostream::operator<<(ostream& (*pf)(ostream&)) {
    return pf(*this);
}

ios_base& dec(ios_base& base) {
    base.setf(ios_base::dec, ios_base::basefield);
    return base;
}

ios_base& hex(ios_base& base) {
    base.setf(ios_base::hex, ios_base::basefield);
    return base;
}

ios_base& oct(ios_base& base) {
    base.setf(ios_base::oct, ios_base::basefield);
    return base;
}

ostream& endl(ostream& os) {
    os.put('\n');
    os.flush();
    return os;
}

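Therefore, the following four ways of writing endl are equivalent:

cout << endl;
cout.operator<<(endl);
endl(cout);
cout.put('\n').flush();

and so are the following four ways of writing 1234 in hexadecimal (hex() returns an ios_base&, which has no << operator, so its result cannot be chained):

cout << hex << 1234;
cout.operator<<(hex).operator<<(1234);
hex(cout); cout << 1234;
cout.setf(ios_base::hex, ios_base::basefield); cout << 1234;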
  • Streams use the bitmask type ios_base::fmtflags to represent format flags, and set or clear them through the three member functions flags(), setf() and unsetf(). Manipulators such as hex, showbase and boolalpha are helper functions that set the corresponding flags by calling setf().
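As an illustration of the member functions flags() and setf() mentioned in the last note, here is a minimal sketch that changes the format flags directly and then restores them. The helper name hex_then_restore is an assumption made for this example; it is not part of the library or the book:

```cpp
#include <ios>
#include <sstream>
#include <string>

// Hypothetical helper: prints n in hexadecimal with a base prefix,
// then restores the stream's original flags and prints n again.
std::string hex_then_restore(int n)
{
    std::ostringstream os;
    const std::ios_base::fmtflags saved = os.flags();       // remember all flags

    os.setf(std::ios_base::hex, std::ios_base::basefield);  // same effect as os << std::hex
    os.setf(std::ios_base::showbase);                       // same effect as os << std::showbase
    os << n << ' ';

    os.flags(saved);                                        // put everything back
    os << n;
    return os.str();
}
```

hex_then_restore(1234) should return "0x4d2 1234". Saving and restoring the flags this way is one option for keeping sticky manipulators from leaking into later output.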

By default, the base is not shown when a value is output in another base (for example, 1234 is output in hexadecimal as "4d2" rather than "0x4d2"). The base can be shown using the manipulator showbase:

cout << dec << 1234 << ' ' << hex << 1234 << ' ' << oct << 1234 << '\n';
cout << showbase;    // show bases
cout << dec << 1234 << ' ' << hex << 1234 << ' ' << oct << 1234 << '\n';

will output

1234 4d2 2322
1234 0x4d2 02322

That is, decimal numbers have no prefix, octal numbers have the prefix "0", and hexadecimal numbers have the prefix "0x". This matches the notation for integer literals in C++ source code exactly (see Chapter 2 Types, Operators and Expressions, Section 2.3 of the "C Programming Language" Notes).

showbase is also persistent. The manipulator noshowbase restores the default behavior of not showing the base.

Note: showbase is equivalent to the # flag in a printf format specifier, i.e. %#x and %#o; the difference is that showbase is persistent.

Summary: integer output manipulators

manipulator   effect
dec           use decimal (default)
hex           use hexadecimal
oct           use octal
showbase      show the base prefix
noshowbase    do not show the base prefix (default)

Note: All flag-switching manipulators are defined in the header <ios> (which is included automatically by <iostream>); see the ios pages on cplusplus.com and cppreference.com for a complete list. Parameterized manipulators are defined in the header <iomanip> (see Section 11.2.4).

11.2.2 Integer input

By default, >> assumes that values are written in decimal; hex or oct can be used to specify that values are read as hexadecimal or octal:

Enter integers in different bases
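A minimal sketch of such a read, assuming a hypothetical helper read_dec_hex_oct_oct() that takes its input from a string through an istringstream (see Section 11.4) rather than from cin:

```cpp
#include <ios>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: reads four integers from input, the first in
// decimal, the second in hexadecimal, and the last two in octal.
std::vector<int> read_dec_hex_oct_oct(const std::string& input)
{
    std::istringstream is(input);
    int a = 0, b = 0, c = 0, d = 0;
    is >> a >> std::hex >> b >> std::oct >> c >> d;  // oct stays in effect for d
    return {a, b, c, d};
}
```

The four values are returned as ordinary ints, so printing them with the default dec shows each of them in decimal.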

If the input is

1234 4d2 2322 2322

the output is

1234 1234 1234 1234

When reading integers, dec accepts no base prefix, hex accepts an optional "0x" prefix, and oct accepts an optional "0" prefix (all three manipulators accept optional leading zeros). For example:

manipulator   input                                  integer value read (decimal)
dec           "1234", "01234", "00001234"            1234
hex           "4d2", "0x4d2", "04d2", "0x04d2"       1234
oct           "2322", "02322", "00002322"            1234

The stream member function unsetf() can be used to clear all of the dec, hex and oct flags: cin.unsetf(ios_base::basefield). The input stream then accepts all three bases and picks one according to the prefix of each value. Now, for the code

cin >> a >> b >> c >> d;

if the input is

1234 0x4d2 02322 02322

the output is

1234 1234 1234 1234

If unsetf() is not called, the input goes wrong at the second value, because "0x4d2" is not a valid decimal number: b receives only the leading 0, and reading c then fails at the 'x'.
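The effect of clearing basefield can be sketched as follows. The helper name read_any_base is hypothetical, and a string stands in for cin:

```cpp
#include <ios>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: reads all integers from input, letting the
// prefix of each value ("0x", "0", or none) decide its base.
std::vector<int> read_any_base(const std::string& input)
{
    std::istringstream is(input);
    is.unsetf(std::ios_base::basefield);  // none of dec, hex, oct is set
    std::vector<int> values;
    for (int x; is >> x; )
        values.push_back(x);
    return values;
}
```

With the input "1234 0x4d2 02322 02322" this should yield four values that are all 1234. Note that in this mode a leading 0 is significant: "010" is read as octal, that is, 8.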

11.2.3 Floating point output

The manipulators for floating-point output formats are as follows:

manipulator    effect
fixed          use fixed-point notation
scientific     use scientific notation (mantissa and exponent)
defaultfloat   choose whichever of fixed and scientific gives the more accurate representation (default)

Note: These three manipulators are persistent; they are similar to printf's %f, %e and %g respectively.

For example:

cout << 1234.56789 << "      (defaultfloat)\n"
        << fixed << 1234.56789 << "  (fixed)\n"
        << scientific << 1234.56789 << " (scientific)\n";

will output

1234.57      (defaultfloat)
1234.567890  (fixed)
1.234568e+03 (scientific)

11.2.4 Precision

By default, the defaultfloat format prints a floating-point value using 6 significant digits, choosing the most appropriate format and rounding to the nearest value. For example, 1234.5678 is printed as "1234.57", 1.2345678 is printed as "1.23457", and 1234567.0 is printed as "1.23457e+06".

Output floating point numbers using different formats

The manipulator setprecision(n) sets the precision: for defaultfloat it is the number of significant digits, and for fixed and scientific it is the number of digits after the decimal point. The default is 6. For example:

set precision
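One way to sketch this example is a hypothetical helper three_formats() that formats the same value in defaultfloat, fixed and scientific with a given precision, separated by tabs. The helper and its name are assumptions for illustration; std::defaultfloat requires C++11:

```cpp
#include <iomanip>
#include <ios>
#include <sstream>
#include <string>

// Hypothetical helper: formats x three times with the same precision,
// once per floating-point format, separated by tabs.
std::string three_formats(double x, int precision)
{
    std::ostringstream os;
    os << std::setprecision(precision)          // sticky: applies to all three outputs
       << std::defaultfloat << x << '\t'
       << std::fixed << x << '\t'
       << std::scientific << x;
    return os.str();
}
```

Printing three_formats(1234.56789, 6), three_formats(1234.56789, 5) and three_formats(1234.56789, 8) on separate lines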

will print (note the rounding)

1234.57	1234.567890	1.234568e+03
1234.6	1234.56789	1.23457e+03
1234.5679	1234.56789000	1.23456789e+03

Note:

  • setprecision() is persistent.
  • For floating-point numbers in fixed format, setprecision(n) is equivalent to the n in the printf format specifier %m.nf. It has no effect on integers and strings (whereas the precision in a printf format specifier does apply to integers and strings; see "C Programming Language" Notes Chapter 7 Input and Output Section 7.2).
  • setprecision() and the other parameterized manipulators are defined in the header <iomanip> (I/O manipulators). Unlike flag-switching manipulators such as hex and showbase, these manipulators take an argument.
  • os << setprecision(n) is equivalent to os.precision(n).

11.2.5 Fields

For integers, floating-point numbers, and strings, the manipulator setw(n) specifies exactly how many character positions a value should occupy in the output. This mechanism is called a field. It is useful for printing tables. For example:

set field width
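A sketch of this example, assuming a hypothetical function template field_demo() that writes the same value with no width, with a width of 4, with a width of 8, and again with no width:

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper: shows that setw() affects only the next output
// and never truncates a value that is wider than the field.
template<typename T>
std::string field_demo(const T& value)
{
    std::ostringstream os;
    os << value << '|'                   // no field width
       << std::setw(4) << value << '|'   // field narrower than the value: ignored
       << std::setw(8) << value << '|'   // field wider than the value: padded on the left
       << value << '|';                  // setw(8) is no longer in effect
    return os.str();
}
```

Printing field_demo(12345), field_demo(1234.5) and field_demo(std::string("abcde")) on separate lines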

will print

12345|12345|   12345|12345|
1234.5|1234.5|  1234.5|1234.5|
abcde|abcde|   abcde|abcde|

Note:

  • setw() is not persistent: it applies only to the next output operation.
  • setw(n) is equivalent to the n in the printf format specifier %nd.
  • When the field width is smaller than the actual width, the output is not truncated and the field width has no effect; when the field width is larger than the actual width, the output is padded with spaces and right-aligned by default. The manipulator setfill(c) specifies the padding character, and left and right specify the alignment.

print contact form
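As a rough sketch of how fields can line up the columns of such a table, the following hypothetical helper formats one row. The column widths, the '|' separators and the '.' fill character are arbitrary choices made for this illustration, not the layout used by the book:

```cpp
#include <iomanip>
#include <ios>
#include <sstream>
#include <string>

// Hypothetical helper: one table row with a left-aligned name column
// (width 10) and a right-aligned, dot-filled phone column (width 8).
std::string contact_row(const std::string& name, const std::string& phone)
{
    std::ostringstream os;
    os << std::left << std::setw(10) << name << '|'
       << std::right << std::setfill('.') << std::setw(8) << phone << '|';
    return os.str();
}
```

For example, contact_row("Ann", "12345") should return "Ann       |...12345|". Unlike setw(), the manipulators left, right and setfill() are persistent, so they stay in effect for later output on the same stream.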

11.3 File opening and positioning

From a C++ perspective, a file is an abstraction provided by the operating system. As described in Section 10.3, a file is a sequence of bytes numbered starting from 0. The properties of the stream determine what operations can be performed on the file after it is opened, and what the operations mean.

11.3.1 File Open Mode

There are several file open modes. By default, an ifstream opens its file for reading and an ofstream opens its file for writing, which covers the most common needs. Other open modes can also be chosen; they are represented by the bitmask type ios_base::openmode:

open mode            meaning
app ("append")       append: write at the end of the file
ate ("at end")       open and seek to the end of the file
binary               binary mode
in ("input")         open for reading
out ("output")       open for writing
trunc ("truncate")   truncate the file, discarding its original content

The open mode can be specified after the filename argument of the file stream constructor:

ofstream ofs(name1);  // defaults to ios_base::out
ifstream ifs(name2);  // defaults to ios_base::in
ofstream ofs2(name, ios_base::app);  // ofstreams by default include ios_base::out
fstream fs(name, ios_base::in | ios_base::out);  // both in and out

Here | is the bitwise OR operator, which can be used to combine several modes.

The exact effect of opening a file depends on the operating system. If the operating system cannot open the file in the requested mode, the stream ends up in a state that is not good(). The most common reason for failing to open a file for reading is that the file does not exist.

Note that opening a nonexistent file for writing makes the operating system create a new file, whereas opening a nonexistent file for reading fails.

Note:

  • The default open modes of ifstream and ofstream are in and out respectively, and in and out respectively are added automatically even when other modes are specified.
  • The difference between app and ate: app seeks to the end of the file before every write, so data can only be written at the end of the file; ate seeks to the end of the file once, immediately after opening, and the stream can then be positioned elsewhere in the file (see Section 11.3.3).
  • The open modes of C++ file streams correspond to the open modes of the C standard library function fopen() as follows:

C++ mode                                        C mode        meaning
in                                              r             read
out or out | trunc                              w             write (overwrite the original content)
app or out | app                                a             append
in | out                                        r+            read or write
in | out | trunc                                w+            read or write (overwrite the original content)
in | out | app                                  a+            read or append
in | binary                                     rb            read a binary file
out | binary or out | trunc | binary            wb            write a binary file (overwrite the original content)
app | binary or out | app | binary              ab            append to a binary file
in | out | binary                               r+b or rb+    read or write a binary file
in | out | trunc | binary                       w+b or wb+    read or write a binary file (overwrite the original content)
in | out | app | binary                         a+b or ab+    read or append to a binary file

Source: https://timsong-cpp.github.io/cppwp/n3337/input.output#tab:iostreams.file.open.modes
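The difference between the default write mode and append mode can be sketched with three small helpers. The names overwrite_file, append_to_file and read_file are assumptions made for this illustration:

```cpp
#include <fstream>
#include <ios>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical helper: replaces the file's content with text
// (an ofstream opened with the default mode out truncates the file).
void overwrite_file(const std::string& path, const std::string& text)
{
    std::ofstream ofs(path);
    if (!ofs) throw std::runtime_error("can't open " + path);
    ofs << text;
}

// Hypothetical helper: adds text at the end of the file,
// keeping what is already there.
void append_to_file(const std::string& path, const std::string& text)
{
    std::ofstream ofs(path, std::ios_base::app);
    if (!ofs) throw std::runtime_error("can't open " + path);
    ofs << text;
}

// Hypothetical helper: returns the whole content of the file as one string.
std::string read_file(const std::string& path)
{
    std::ifstream ifs(path);
    if (!ifs) throw std::runtime_error("can't open " + path);
    std::ostringstream content;
    content << ifs.rdbuf();
    return content.str();
}
```

Calling overwrite_file() and then append_to_file() on the same path should leave both pieces of text in the file, while a second overwrite_file() discards everything written before.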

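11.3.2 Binary files

By default, iostream reads and writes files in text mode, that is, as sequences of characters (bytes are converted to characters according to the encoding of a particular character set). However, istream and ostream can also be made to read and write bytes directly. This is called binary I/O and is done by opening the file in binary mode.

For example, the figure below shows how the integer 12345 is represented in a text file and in a binary file:

Text file and binary file

In the text file, the five bytes 31 32 33 34 35 represent the string "12345" (the ASCII code of the character '1' is 49, which is 0x31 in hexadecimal, and so on). In the binary file, the four bytes 39 30 00 00 represent the 4-byte integer 0x00003039 (in little-endian order), which is 12345 in decimal.

Note: A disk can only store binary data made of 0s and 1s, but hexadecimal is commonly used as a shorthand. Each hexadecimal digit corresponds to 4 binary digits, as shown in the following table:

hex binary hex binary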
0 0000 8 1000
1 0001 9 1001
2 0010 A 1010
3 0011 B 1011
4 0100 C 1100
5 0101 D 1101
6 0110 E 1110
7 0111 F 1111

Therefore, the actual data stored on disk for the two files shown in the figure above is, respectively,

31 32 33 34 35 = 00110001 00110010 00110011 00110100 00110101
   39 30 00 00 = 00111001 00110000 00000000 00000000

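From this point of view, there is no essential difference between text files and binary files: both are binary byte data, and the meaning of the bytes is defined entirely by the file format, which is a human convention (as described in Section 10.3).

Here is an example that reads and writes a binary file of integers:

Reading and writing a binary file of integers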
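A minimal sketch of such a program, split into two hypothetical functions write_ints() and read_ints(). The as_bytes() helper is modelled on the description in these notes (a reinterpret_cast to char*); the function names and error handling are assumptions for this illustration:

```cpp
#include <fstream>
#include <ios>
#include <stdexcept>
#include <string>
#include <vector>

// Treats an object as a sequence of bytes
// (modelled on the as_bytes() described in these notes).
template<class T>
char* as_bytes(T& object)
{
    return reinterpret_cast<char*>(&object);
}

// Hypothetical helper: writes the raw bytes of each int to a binary file.
void write_ints(const std::string& path, const std::vector<int>& values)
{
    std::ofstream ofs(path, std::ios_base::binary);
    if (!ofs) throw std::runtime_error("can't open output file " + path);
    for (int x : values)
        ofs.write(as_bytes(x), sizeof(int));
}

// Hypothetical helper: reads ints back until the file is exhausted.
std::vector<int> read_ints(const std::string& path)
{
    std::ifstream ifs(path, std::ios_base::binary);
    if (!ifs) throw std::runtime_error("can't open input file " + path);
    std::vector<int> values;
    for (int x; ifs.read(as_bytes(x), sizeof(int)); )
        values.push_back(x);
    return values;
}
```

A file written this way is only meaningful to a program that uses the same int size and byte order, which is one reason binary I/O is less portable than text I/O.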

Here the binary files are opened with the mode ios_base::binary:

ifstream ifs(iname, ios_base::binary);
ofstream ofs(oname, ios_base::binary);

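When we move from character-oriented I/O to binary I/O, the >> and << operators can no longer be used, because by default these two operators convert values to character sequences (for example, the string "asdf" becomes the characters 'a', 's', 'd', 'f', and the integer 123 becomes the characters '1', '2', '3'). The binary mode tells the stream not to try anything "clever" with the bytes.

In this example, the obvious thing to do with an int is to store it as its 4 bytes (exactly as it is represented in memory) and write those bytes directly to the file. Later, the bytes can be read back in the same way to reassemble the int: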

ifs.read(as_bytes(x), sizeof(int));
ofs.write(as_bytes(x), sizeof(int));

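Both istream's read() and ostream's write() take an address (supplied here by the function as_bytes()) and a number of bytes (characters) (obtained here with the sizeof operator). The address points to the first byte of the memory area that holds the value to be read or written. For example, given an int variable i with the value 1234 (0x000004d2 in hexadecimal), writing it to a binary file proceeds as shown in the figure below:

Writing a binary file

First, as_bytes(i) yields the address p of the first byte of i (say, 0xfc40); then ofs.write(p, 4) writes the 4 bytes starting at that address to ofs. In other words, all write() does is a plain byte copy, and the same holds for read().

Note:

  • As the figure above shows, memory is essentially like a file: both are numbered sequences of bytes. In memory the number is called an address and is obtained with the address-of operator &, for example &i; a variable that holds an address is called a pointer, for example p. See Section 17.3 and Chapter 5 Pointers and Arrays of the "C Programming Language" Notes.
  • The first parameter of read() and write() has the type char*, whereas the address &i of i has the type int*. The function as_bytes() therefore uses reinterpret_cast to convert it to char* (the value of the pointer is unchanged), so that the 4 bytes of an int are treated as 4 chars; see Section 17.8.

Binary I/O is complicated and error-prone. However, some file formats require binary I/O; image and sound files are typical examples. The character I/O that the iostream library provides by default is portable, human readable, and supported by the type system. Given a choice, prefer character I/O (text formats).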

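11.3.3 Positioning in files

Whenever possible, read and write files from beginning to end; this is the simplest and least error-prone way. Often, when a file needs to be modified, the better approach is to produce a new file.

However, if a file must be modified "in place", positioning (seeking) can be used: select a specific position (byte number) in the file for reading or writing. Every file opened for reading has a read position (get position), and every file opened for writing has a write position (put position), as shown in the figure below.

Read and write positions

The following functions of istream and ostream get and set the read and write positions:

function   effect
tellg()    get the current read position
seekg()    set the read position (g = "get")
tellp()    get the current write position
seekp()    set the write position (p = "put")

For example: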

fstream fs(name);  // open for input and output
if (!fs) error("can't open ", name);

fs.seekg(5);  // move reading position to 5 (the 6th character)
char ch;
fs >> ch;     // read and increment reading position
cout << "character[5] is " << ch << " (" << int(ch) << ")\n";

fs.seekp(1); // move writing position to 1
fs << 'y';   // write and increment writing position

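Suppose the file test.txt originally contains "abcdefgh", as shown in the figure above. After the program above has run, the read and write positions of the file are as shown in the figure below:

Read and write positions after running the program

Here the >> operator advances the read position by the number of characters read, and the << operator advances the write position by the number of characters written.

Note that the result of trying to seek beyond the end of a file is undefined; different operating systems may behave differently.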
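The positioning example can be packaged as a function so that its effect on the file is easy to check. This is a sketch under assumptions: the name read_then_patch and its parameters are made up for the illustration, and the file is expected to exist already:

```cpp
#include <fstream>
#include <ios>
#include <stdexcept>
#include <string>

// Hypothetical helper: reads the character at read_pos, then overwrites
// the character at write_pos with replacement. Returns the character read.
char read_then_patch(const std::string& path, std::streamoff read_pos,
                     std::streamoff write_pos, char replacement)
{
    std::fstream fs(path);    // fstream defaults to in | out
    if (!fs) throw std::runtime_error("can't open " + path);

    fs.seekg(read_pos);       // move the read position
    char ch = 0;
    fs >> ch;                 // read and advance the read position

    fs.seekp(write_pos);      // move the write position
    fs << replacement;        // write and advance the write position
    return ch;
}
```

For a file containing "abcdefgh", read_then_patch(path, 5, 1, 'y') should return 'f' and leave "aycdefgh" in the file.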

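11.4 String streams

A string can be used as the source of an istream or the target of an ostream. An istream that reads from a string is called an istringstream, and an ostream that writes to a string is called an ostringstream; both classes are defined in the header <sstream>. For example, an istringstream can be used to extract numeric values from a string:

Converting a string to a floating-point number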
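A minimal sketch of such a conversion. The function name str_to_double and the use of std::runtime_error for reporting a bad format are assumptions made for this illustration:

```cpp
#include <sstream>
#include <stdexcept>
#include <string>

// Converts the characters in s to a double, if possible;
// throws std::runtime_error if s does not start with a number.
double str_to_double(const std::string& s)
{
    std::istringstream is(s);   // make a stream that reads from s
    double d = 0;
    is >> d;
    if (!is) throw std::runtime_error("double format error: " + s);
    return d;
}
```

For example, str_to_double("12.5") should return 12.5, and str_to_double("twelve") should throw. Note that this sketch accepts trailing characters after the number, such as "12.5kg".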

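An attempt to read beyond the end of an istringstream's string puts the istringstream into the eof() state. This means that the "standard input loop" can be used with an istringstream.

An ostringstream can be used to produce formatted strings (similar to Java's StringBuilder):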

ostringstream os;  // stream for composing a message
os << setw(8) << label << ": "
        << fixed << setprecision(5) << temp << unit;
someobject.display(Point(100, 100), os.str());

The member function str() of ostringstream returns the resulting string.

A simple use of ostringstream is to concatenate strings:

int seq_no = get_next_number();  // get the number of a log file
ostringstream name;
name << "myfile" << seq_no << ".log";  // e.g., myfile17.log
ofstream logfile(name.str());  // e.g., open myfile17.log

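Both istringstream and ostringstream support the following operations:

member function        effect
default constructor    initialize the string stream with an empty string
constructor(s)         initialize the string stream with a copy of the string s
str()                  return a copy of the current content
str(s)                 make the string s the current content, replacing the original content

Usually, an istringstream is initialized with a string and characters are then read from that string using >>. Conversely, an ostringstream is usually initialized with an empty string and then filled using <<, and the result is obtained with str().

11.5 Line-oriented input

The >> operator reads into an object according to the standard format of the object's type. For example, reading an int reads up to the first non-digit character; reading a string skips whitespace and then reads up to the next whitespace character; reading a char skips whitespace and reads the next non-whitespace character. For example: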

string name;
cin >> name;           // input: Dennis Ritchie
cout << name << '\n';  // output: Dennis

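istream also provides facilities for reading single characters and whole lines:

member function   effect
get()             read one character and return it
get(c)            read one character into c and return the input stream
getline(s, n)     read characters into the C-style string s until a newline is encountered or n characters (including the terminating null character) have been stored

The header <string> additionally provides a non-member version, getline(is, s), which reads one line from the input stream is into the string s (discarding the newline). If EOF is encountered before a newline, the remaining input is still read into s and is is put into the eof() state.

For example: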

string name;
getline(cin, name);    // input: Dennis Ritchie
cout << name << '\n';  // output: Dennis Ritchie

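A common reason for wanting to read a whole line is that the standard definition of whitespace is not suitable, for example because newlines must be treated differently from other whitespace. A text interaction in a game might treat each line as one sentence; in that case a whole line has to be read first and the individual words extracted from it afterwards: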

string command;
getline(cin, command);  // read the line

stringstream ss(command);
vector<string> words;
for (string s; ss >> s;)
    words.push_back(s);  // extract the individual words

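However, whenever there is a choice, prefer >> to getline().

11.6 Character classification

Usually, integers, floating-point numbers, strings and so on are read in their conventional formats. Sometimes, however, single characters must be read and classified (for example, when the lexer of Section 7.8.2 reads identifiers).

The standard library header <cctype> defines a set of character classification and conversion functions, for example to test whether a character is a digit or an uppercase letter, or to convert between uppercase and lowercase.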
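A short sketch of two of these functions, std::tolower() and std::isdigit(). The helper names to_lower and count_digits are made up for the illustration. The cast to unsigned char is there because passing a negative char value to the <cctype> functions is undefined behavior:

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: returns a copy of s with every uppercase
// letter converted to lowercase.
std::string to_lower(std::string s)
{
    for (char& ch : s)
        ch = static_cast<char>(std::tolower(static_cast<unsigned char>(ch)));
    return s;
}

// Hypothetical helper: counts the decimal digits in s.
int count_digits(const std::string& s)
{
    int n = 0;
    for (char ch : s)
        if (std::isdigit(static_cast<unsigned char>(ch)))
            ++n;
    return n;
}
```

For example, to_lower("Hello, World 42!") should return "hello, world 42!" and count_digits("a1b22c") should return 3.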

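11.7 Using nonstandard separators

This section gives a semi-realistic example: reading strings with user-defined separators. When reading strings, the >> operator of istream uses the default whitespace characters as separators and offers no way to define custom separators. Therefore, in the example of Section 4.6.4, if the input is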

As planned, the guests arrived; then,

we get these "words":

As
arrived;
guests
planned,
the
then,

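To get rid of the punctuation, we can read a whole line, delete the punctuation characters or replace them with whitespace, and then read the processed input again: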

string line;
getline(cin, line);  // read into line
for (char& ch : line)  // replace each punctuation character by a space
    switch (ch) {
    case ';': case '.': case ',': case '?': case '!':
        ch = ' ';
    }
stringstream ss(line);  // make an istream ss reading from line
vector<string> vs;
for (string word; ss >> word;)  // read words without punctuation characters
    vs.push_back(word);

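However, this code is a bit messy and not general: it stops being usable as soon as the definition of "punctuation" changes. Below is a more general way of removing unwanted characters from an input stream. The basic idea is to read words from an ordinary input stream while treating user-specified separators as whitespace that separates words. For example, "as.not" should be the two words "as" and "not".

We can define a class Punct_stream to do this. The class must read characters from an istream, have a >> operator that reads strings, and let the user specify the separators.

The Punct_stream class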
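The following is a reconstruction of such a class based on the description in this section, not the book's exact code. It uses string::find() for is_whitespace() and tests source.fail() rather than !source.good(), the variant discussed in the notes at the end of this section, so that a last line without a trailing newline is still processed. The helper split_words() is a hypothetical addition that only shows how the class is used:

```cpp
#include <cctype>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Like an istream, but the user can add to the set of whitespace characters.
class Punct_stream {
public:
    explicit Punct_stream(std::istream& is) : source(is) { }

    void whitespace(const std::string& s) { white = s; }   // make s the separator set
    void case_sensitive(bool b) { sensitive = b; }
    bool is_whitespace(char c) const { return white.find(c) != std::string::npos; }

    Punct_stream& operator>>(std::string& s);
    explicit operator bool() const { return !source.fail(); }

private:
    std::istream& source;        // character source
    std::istringstream buffer;   // holds one processed line
    std::string white;           // characters treated as whitespace
    bool sensitive = true;       // is the stream case sensitive?
};

Punct_stream& Punct_stream::operator>>(std::string& s)
{
    while (!(buffer >> s)) {     // buffer is used up: try to refill it
        if (buffer.bad() || source.fail()) return *this;
        buffer.clear();

        std::string line;
        std::getline(source, line);
        for (char& ch : line) {
            if (is_whitespace(ch))
                ch = ' ';        // separator becomes a real space
            else if (!sensitive)
                ch = static_cast<char>(std::tolower(static_cast<unsigned char>(ch)));
        }
        buffer.str(line);        // put the processed line into the buffer
    }
    return *this;
}

// Hypothetical usage helper: all words of text, lowercased, split at
// whitespace and at the given separators.
std::vector<std::string> split_words(const std::string& text, const std::string& separators)
{
    std::istringstream input(text);
    Punct_stream ps(input);
    ps.whitespace(separators);
    ps.case_sensitive(false);
    std::vector<std::string> words;
    for (std::string word; ps >> word; )
        words.push_back(word);
    return words;
}
```

With this sketch, split_words("as.not", ".") should yield the two words "as" and "not".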

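As in the example above, the basic idea is to read one line at a time from the istream, convert the "whitespace" characters (separators) to spaces, and then read strings with the >> operator of an istringstream; when all characters of the line have been read, the next line is read. In addition, case_sensitive() specifies whether the stream is case sensitive.

The most interesting operation, and the hardest to implement, is the >> operator, which is analyzed step by step below.

First, while (!(buffer >> s)) tries to read a (real) whitespace-separated string from the istringstream named buffer. As long as there are characters in buffer, buffer >> s succeeds and there is nothing more to do, so the operator returns. When the characters in buffer have been used up, buffer >> s fails, that is, !(buffer >> s) is true, and the loop body is entered; a line must then be read from source to refill buffer. Note that buffer >> s is in a loop, so the read is tried again after buffer has been refilled (the second attempt can still fail, because the next line may be empty).

Next, if (buffer.bad() || !source.good()) gives up and returns if buffer is in the bad() state (which normally does not happen) or source is not in the good() state (for example, at the end of input). Otherwise, the error state of buffer is cleared (the usual reason for entering the loop is that buffer hit eof(), meaning the previous line has been used up). Dealing with stream state is always messy, and it is often the source of subtle errors that require tedious debugging to remove.

The rest of the loop is simple. getline(source, line) reads a line from source into line, and then each character is examined to see whether it needs to be changed: a user-defined "whitespace" character is replaced by a space, and if the stream is not case sensitive the character is converted to lowercase. Finally, buffer.str(line) stores the processed content in buffer.

Note that the state of source is not tested after the call to getline(). This is not needed, because execution eventually reaches !source.good() at the top of the loop.

Finally, a reference to the stream itself, *this, is returned as the result of >> (see Section 17.10).

is_whitespace() is simple to implement with the string member function find(), which searches the string for a given character and returns the index of its first occurrence if the character is present and string::npos otherwise.

Only one mysterious function remains: operator bool. operator T is a special kind of overloaded operator called a conversion operator; it converts an object to the type T and must be defined as a member function.

A common idiom with istream is to test the result of >>, for example if (is >> s). The result of is >> s is (a reference to) an istream, which means a way is needed to implicitly convert an istream to a bool value. That is what istream::operator bool() does (see Section 10.6):

  • if (is >> s) is equivalent to if ((is >> s).operator bool())
  • if (!(is >> s)) is equivalent to if ((is >> s).operator!())

Punct_stream supports this idiom too, which is why it defines operator bool.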

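Note:

  • From basic_ios::good() and the table in Section 10.6, the following logical implications can be derived:

    • is.bad() implies is.fail()
    • is.good() implies !is.fail()

    Hence the implementation of Punct_stream::operator bool() given in the book's code can be simplified to return source.good();

  • The condition under which operator>> leaves its loop must be consistent with operator bool; otherwise code such as while (ps >> word) results in an infinite loop.

  • If the last line of input does not end with a newline, it is still read into buffer, but source is then in the eof() state and source.good() returns false, so the words in the last line are not processed. To handle a last line without a trailing newline, change the exit condition in operator>> to buffer.bad() || !source and change the return value of operator bool to bool(source).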

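Now the program can be written:

Dictionary with user-defined separators

Note: Punct_stream resembles istream in many important ways (it supports >> and operator bool), but it is not really an istream (it does not derive from istream). For example, it has no states such as eof() and provides no >> for reading integers. Importantly, a Punct_stream object cannot be passed to a function that expects an istream argument.

11.8 And there is much more

  • Locales
  • Buffering: streambuf
  • printf() and scanf() of the C standard library

Drill

Exercises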


Origin blog.csdn.net/zzy979481894/article/details/128750516