[C#] How to remove the encoding prefix (BOM) when writing files

Files can be saved in different encodings. Common encodings for Chinese text include UTF-8, GB2312, and so on.

On Windows, a newly saved file may start with a few extra bytes that identify its encoding. This prefix is called the byte order mark (BOM); for UTF-8 it is the three bytes EF BB BF.

For example, create a new text file, type Hello, and save it as UTF-8. Hello takes 5 bytes, yet the file is 8 bytes. (That is the behavior on Windows 7. On Windows 10 the prefix is no longer added, so the file is only 5 bytes; Microsoft apparently changed this itself.)
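The 3 extra bytes are exactly the UTF-8 preamble. A minimal sketch, written as a top-level-statement program (C# 9 or later assumed), that asks `Encoding.UTF8` for its preamble and adds it to the byte count of "Hello":

```csharp
using System;
using System.Text;

// Encoding.UTF8 reports the UTF-8 BOM as its preamble.
byte[] bom = Encoding.UTF8.GetPreamble();
// "Hello" is plain ASCII, so it encodes to one byte per character.
int textBytes = Encoding.UTF8.GetByteCount("Hello");

Console.WriteLine(BitConverter.ToString(bom)); // EF-BB-BF
Console.WriteLine(textBytes);                  // 5
Console.WriteLine(bom.Length + textBytes);     // 8
```

3 bytes of BOM plus 5 bytes of text gives the 8-byte file size seen on Windows 7.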

Now let's generate files with StreamWriter:

using System.IO;
using System.Text;

// Default encoding
using (StreamWriter sw = new StreamWriter("a.txt"))
{
    sw.Write("Hello");  // 5 bytes
}

// Explicit Encoding.UTF8
using (StreamWriter sw = new StreamWriter("b.txt", false, Encoding.UTF8))
{
    sw.Write("Hello");  // 8 bytes
}
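To see where the difference lies, a small sketch can write both files the same way and dump their raw bytes. It is a top-level-statement program (C# 9 or later assumed) and reuses the file names a.txt and b.txt from the example, overwriting them in the current directory:

```csharp
using System;
using System.IO;
using System.Text;

// Same two writes as above: default encoding vs. Encoding.UTF8.
using (var sw = new StreamWriter("a.txt")) sw.Write("Hello");
using (var sw = new StreamWriter("b.txt", false, Encoding.UTF8)) sw.Write("Hello");

// Read back the raw bytes to see exactly what landed on disk.
byte[] a = File.ReadAllBytes("a.txt");
byte[] b = File.ReadAllBytes("b.txt");

Console.WriteLine($"a.txt ({a.Length} bytes): {BitConverter.ToString(a)}"); // a.txt (5 bytes): 48-65-6C-6C-6F
Console.WriteLine($"b.txt ({b.Length} bytes): {BitConverter.ToString(b)}"); // b.txt (8 bytes): EF-BB-BF-48-65-6C-6C-6F
```

The text bytes are identical; b.txt simply starts with the EF BB BF prefix.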

Something strange has happened. StreamWriter's default encoding is UTF-8, and Encoding.UTF8 is also UTF-8, so why are the file sizes different?

UTF8Encoding has two private fields, emitUTF8Identifier and isThrowException, which are set from the constructor arguments:

  • emitUTF8Identifier: whether to emit the encoding prefix (BOM)
  • isThrowException: whether to throw an exception when invalid data is encountered

So whether the prefix is written can be controlled.
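Both switches are exposed through the public constructor `UTF8Encoding(bool encoderShouldEmitUTF8Identifier, bool throwOnInvalidBytes)`. A minimal sketch (top-level statements, C# 9 or later assumed) that flips each one: the first changes the preamble, the second decides what happens with a lone surrogate, which is invalid UTF-16 and cannot be encoded:

```csharp
using System;
using System.Text;

// First argument: emit the BOM or not.
var withBom = new UTF8Encoding(true);
var noBom = new UTF8Encoding(false);
Console.WriteLine(withBom.GetPreamble().Length); // 3
Console.WriteLine(noBom.GetPreamble().Length);   // 0

// Second argument: throw on invalid data or substitute a replacement.
string invalid = "\uD800"; // lone high surrogate
var lenient = new UTF8Encoding(false, false);
var strict = new UTF8Encoding(false, true);

// Lenient: the bad character is replaced, no exception.
Console.WriteLine(BitConverter.ToString(lenient.GetBytes(invalid)));

// Strict: encoding fails with EncoderFallbackException.
bool threw = false;
try { strict.GetBytes(invalid); }
catch (EncoderFallbackException) { threw = true; }
Console.WriteLine(threw); // True
```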

Encoding.UTF8 is defined as follows, and it does emit the prefix:

public static Encoding UTF8 {
    get {
        if (utf8Encoding == null) utf8Encoding = new UTF8Encoding(true);
        return utf8Encoding;
    }
}

The default encoding used by StreamWriter, on the other hand, is created with emitUTF8Identifier = false:

internal static Encoding UTF8NoBOM {
    get { 
        if (_UTF8NoBOM == null) {
            UTF8Encoding noBOM = new UTF8Encoding(false, true);
            _UTF8NoBOM = noBOM;
        }
        return _UTF8NoBOM;
    }
}
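The two definitions can be checked from the outside through the public `StreamWriter.Encoding` property, without opening the source. A sketch (top-level statements, C# 9 or later assumed) that uses in-memory streams so no files are touched:

```csharp
using System;
using System.IO;
using System.Text;

// No encoding argument: StreamWriter picks its internal default.
using var defaultWriter = new StreamWriter(new MemoryStream());
// Explicit Encoding.UTF8, as in b.txt above.
using var utf8Writer = new StreamWriter(new MemoryStream(), Encoding.UTF8);

Console.WriteLine(defaultWriter.Encoding.WebName);              // utf-8
Console.WriteLine(defaultWriter.Encoding.GetPreamble().Length); // 0
Console.WriteLine(utf8Writer.Encoding.GetPreamble().Length);    // 3
```

Both report UTF-8, but only the explicit Encoding.UTF8 carries a preamble.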

This is why the two files in the code at the beginning are not the same size.
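To write UTF-8 without the prefix regardless of defaults, one option is to pass a `UTF8Encoding` constructed with `false` explicitly. A sketch (top-level statements, C# 9 or later assumed); the file names c.txt and d.txt are hypothetical and are created in the current directory:

```csharp
using System;
using System.IO;
using System.Text;

// UTF-8 encoding that does not emit a BOM.
var utf8NoBom = new UTF8Encoding(false);

// Option 1: hand it to StreamWriter.
using (var sw = new StreamWriter("c.txt", false, utf8NoBom))
{
    sw.Write("Hello");
}

// Option 2: File.WriteAllText also accepts an Encoding.
File.WriteAllText("d.txt", "Hello", utf8NoBom);

Console.WriteLine(new FileInfo("c.txt").Length); // 5
Console.WriteLine(new FileInfo("d.txt").Length); // 5
```

Passing the encoding explicitly documents the intent in code instead of relying on StreamWriter's default.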


Source: www.cnblogs.com/createwell/p/12731702.html