We all know that files can use different encodings; for example, the encodings commonly used for Chinese text include UTF-8 and GB2312.
On Windows, a newly created text file may be prefixed with a few bytes at the beginning that identify its encoding (the byte order mark, or BOM).
For example, create a new text file, write the word Hello, and save it as UTF-8. Hello takes 5 bytes, but the file size is 8 bytes; the extra 3 bytes are the encoding prefix. (This is the case on Windows 7. On Windows 10 the encoding prefix has been removed, so the file size there is still 5 bytes. It seems that Microsoft itself has changed its mind.)
We use StreamWriter to generate the files.
// Requires: using System.IO; using System.Text;
using (StreamWriter sw = new StreamWriter("a.txt"))
{
    sw.Write("Hello"); // 5 bytes
}
using (StreamWriter sw = new StreamWriter("b.txt", false, Encoding.UTF8))
{
    sw.Write("Hello"); // 8 bytes
}
Something weird happened. The default encoding of StreamWriter is UTF-8, and Encoding.UTF8 is UTF-8 as well. How can the file sizes be different?
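One way to see where the extra bytes come from is to dump what actually lands on disk. The following is a minimal sketch, not part of the original experiment: the class name BomInspect and its helper methods are hypothetical names chosen for illustration, and it assumes the program may create a.txt and b.txt in the current directory.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper class, for illustration only.
static class BomInspect
{
    // Writes "Hello" with StreamWriter's default encoding and returns the raw file bytes.
    public static byte[] WriteDefault(string path)
    {
        using (StreamWriter sw = new StreamWriter(path))
        {
            sw.Write("Hello");
        }
        return File.ReadAllBytes(path);
    }

    // Writes "Hello" with Encoding.UTF8 passed explicitly and returns the raw file bytes.
    public static byte[] WriteWithEncodingUtf8(string path)
    {
        using (StreamWriter sw = new StreamWriter(path, false, Encoding.UTF8))
        {
            sw.Write("Hello");
        }
        return File.ReadAllBytes(path);
    }

    static void Main()
    {
        byte[] a = WriteDefault("a.txt");
        byte[] b = WriteWithEncodingUtf8("b.txt");

        // Print the length and a hex dump of each file.
        Console.WriteLine(a.Length + ": " + BitConverter.ToString(a));
        Console.WriteLine(b.Length + ": " + BitConverter.ToString(b));
    }
}
```

The second file should begin with the three bytes EF BB BF, followed by the same five bytes as the first file.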
UTF8Encoding has two private fields, emitUTF8Identifier and isThrowException, which are passed in through the constructor during initialization.
emitUTF8Identifier: whether to add the encoding prefix
isThrowException: whether an error is reported when an encoding error is encountered
This shows that whether the encoding prefix is added can be controlled.
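The effect of the two constructor flags can be observed through the public API. This is only a sketch: the class Utf8Flags and its methods are hypothetical names, and the public constructor parameters (encoderShouldEmitUTF8Identifier and throwOnInvalidBytes) are assumed to correspond to the two private fields named above.

```csharp
using System;
using System.Text;

// Hypothetical helper class, for illustration only.
static class Utf8Flags
{
    // Length of the prefix the encoding would emit: 3 bytes if enabled, 0 otherwise.
    public static int PreambleLength(bool emitIdentifier)
    {
        return new UTF8Encoding(emitIdentifier).GetPreamble().Length;
    }

    // Returns true if decoding an invalid byte sequence raises an exception.
    public static bool ThrowsOnInvalidBytes(bool throwOnInvalidBytes)
    {
        UTF8Encoding encoding = new UTF8Encoding(false, throwOnInvalidBytes);
        try
        {
            // 0xFF can never appear in well-formed UTF-8.
            encoding.GetString(new byte[] { 0xFF });
            return false;
        }
        catch (ArgumentException)
        {
            return true;
        }
    }

    static void Main()
    {
        Console.WriteLine(PreambleLength(true));
        Console.WriteLine(PreambleLength(false));
        Console.WriteLine(ThrowsOnInvalidBytes(true));
        Console.WriteLine(ThrowsOnInvalidBytes(false));
    }
}
```

With the second flag set to false, the invalid byte is replaced by a substitute character instead of raising an error.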
Encoding.UTF8 is defined as follows; it adds the encoding prefix.
public static Encoding UTF8 {
    get {
        if (utf8Encoding == null) utf8Encoding = new UTF8Encoding(true);
        return utf8Encoding;
    }
}
The default encoding used by StreamWriter is created with emitUTF8Identifier=false:
internal static Encoding UTF8NoBOM {
    get {
        if (_UTF8NoBOM == null) {
            UTF8Encoding noBOM = new UTF8Encoding(false, true);
            _UTF8NoBOM = noBOM;
        }
        return _UTF8NoBOM;
    }
}
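This is why the two files in the code at the beginning are not the same size: Encoding.UTF8 writes the 3-byte encoding prefix, while the default encoding of StreamWriter does not.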
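If the encoding has to be passed explicitly but the prefix is not wanted, a UTF8Encoding instance can be constructed by hand. The sketch below is one possible way to do it; the class BomChoice, the method WriteHello, and the file names are hypothetical.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper class, for illustration only.
static class BomChoice
{
    // Writes "Hello" as UTF-8, with or without the encoding prefix,
    // and returns the resulting file size in bytes.
    public static long WriteHello(string path, bool withPrefix)
    {
        Encoding encoding = new UTF8Encoding(withPrefix);
        using (StreamWriter sw = new StreamWriter(path, false, encoding))
        {
            sw.Write("Hello");
        }
        return new FileInfo(path).Length;
    }

    static void Main()
    {
        Console.WriteLine(WriteHello("no_prefix.txt", false));
        Console.WriteLine(WriteHello("with_prefix.txt", true));
    }
}
```

Choosing the flag at the call site keeps the decision visible instead of relying on which default a given API happens to use.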