Handling a leading BOM character when Java reads stream data

When a project reads text file data through an InputStream, the resulting character stream often begins with an unexpected special character. Java does not remove this marker when it reads the file, and String.trim() does not strip it either, so the data read is one character longer than expected. This special first character is most likely the BOM that was added when the system saved the text file.

What is the BOM character?

BOM stands for Byte Order Mark, the mechanism the Unicode specification recommends for marking byte order. The mark is the code point U+FEFF placed at the very start of the data. For UTF-16, if the receiver sees the bytes FE FF, the byte stream is big-endian; if it sees FF FE (which would otherwise decode as U+FFFE, a noncharacter), the byte stream is little-endian. UTF-8 has no byte-order ambiguity and does not need a BOM, but the mark can still be used as a signature that identifies the data as UTF-8. The UTF-8 encoding of the BOM is EF BB BF (you can see it by opening the file in UltraEdit and switching to hexadecimal mode). So a receiver that gets a byte stream starting with EF BB BF can take it to be UTF-8.
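The byte patterns above can be checked directly on the first few bytes of a file. The following is a minimal sketch, not a complete detector: the class name BomSniffer and its methods are hypothetical, and it only recognises the three BOMs mentioned here (it ignores UTF-32, whose little-endian BOM also starts with FF FE).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Optional;

// Hypothetical helper for illustration; not part of the JDK or Commons IO.
final class BomSniffer {

    /** Returns the charset implied by a leading BOM, or empty if none is recognised. */
    static Optional<Charset> detect(byte[] head) {
        if (startsWith(head, 0xEF, 0xBB, 0xBF)) {
            return Optional.of(StandardCharsets.UTF_8);    // UTF-8 signature
        }
        if (startsWith(head, 0xFE, 0xFF)) {
            return Optional.of(StandardCharsets.UTF_16BE); // big-endian UTF-16
        }
        if (startsWith(head, 0xFF, 0xFE)) {
            return Optional.of(StandardCharsets.UTF_16LE); // little-endian UTF-16
        }
        return Optional.empty();                           // no BOM found
    }

    // Compares unsigned byte values, because Java bytes are signed.
    private static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) {
                return false;
            }
        }
        return true;
    }
}
```

A caller would pass in the first two or three bytes of the file and fall back to a default charset when the result is empty.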

A text file created with a text editor on Windows and saved in a Unicode format such as UTF-8 will, by default, have an invisible BOM added at the start of the file (as its first character).

Impact of the BOM character

Because the BOM character is not skipped when the data is read, and String.trim() does not remove it, it causes unnecessary trouble whenever we examine the first character. For example, when we need to check whether the string that was read starts with a certain character, the BOM at the beginning can make the check fail, so files saved in a Unicode format need special handling.
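The effect is easy to reproduce without a file. The sketch below decodes a small UTF-8 byte array that starts with EF BB BF and shows that trim() leaves the BOM in place; the class BomTrimDemo and its stripBom helper are hypothetical names used only for illustration.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; stripBom is an illustrative helper, not a JDK method.
final class BomTrimDemo {
    static final char BOM = '\uFEFF';

    /** Removes one leading U+FEFF if present; otherwise returns the string unchanged. */
    static String stripBom(String s) {
        return (!s.isEmpty() && s.charAt(0) == BOM) ? s.substring(1) : s;
    }

    public static void main(String[] args) {
        // The bytes a BOM-prefixed UTF-8 file containing "ok" would hold.
        byte[] fileBytes = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'o', 'k'};
        String text = new String(fileBytes, StandardCharsets.UTF_8);

        System.out.println(text.length());               // 3, not 2: the BOM is decoded as a char
        System.out.println(text.trim().length());        // 3: trim() does not treat U+FEFF as whitespace
        System.out.println(text.startsWith("ok"));       // false: the prefix check fails
        System.out.println(stripBom(text).equals("ok")); // true once the BOM is removed explicitly
    }
}
```

Stripping at the string level like this works for one-off cases, but it has to be repeated wherever a file is read, which is why handling the BOM at the stream level tends to be more uniform.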

A simple, uniform way to handle the BOM character

You can wrap the original InputStream in BOMInputStream from Apache Commons IO to obtain an input stream with the BOM filtered out, and then continue with the subsequent operations as usual.
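To illustrate what such a wrapper does, here is a rough JDK-only sketch of the same idea for the UTF-8 case. It is not the Commons IO implementation: the class Utf8BomSkipper and its methods are hypothetical, and it only handles the EF BB BF signature. It peeks at the first three bytes through a PushbackInputStream and pushes them back if they are not a BOM.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical JDK-only stand-in for illustration; handles the UTF-8 BOM only.
final class Utf8BomSkipper {

    /** Wraps the stream so that a leading EF BB BF is consumed and all other bytes are preserved. */
    static InputStream skipUtf8Bom(InputStream in) throws IOException {
        PushbackInputStream pin = new PushbackInputStream(in, 3);
        byte[] head = new byte[3];
        int n = 0;
        // A single read may return fewer bytes than requested, so loop until 3 bytes or end of stream.
        while (n < 3) {
            int r = pin.read(head, n, 3 - n);
            if (r < 0) {
                break;
            }
            n += r;
        }
        boolean isBom = n == 3
                && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB
                && (head[2] & 0xFF) == 0xBF;
        if (!isBom && n > 0) {
            pin.unread(head, 0, n); // not a BOM: give the bytes back to the caller
        }
        return pin;
    }

    /** Convenience method for the example: decodes in-memory bytes as UTF-8 without the BOM. */
    static String readUtf8(byte[] fileBytes) {
        try (InputStream in = skipUtf8Bom(new ByteArrayInputStream(fileBytes))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[1024];
            for (int r; (r = in.read(buf)) != -1; ) {
                out.write(buf, 0, r);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In real code the argument to skipUtf8Bom would be the file's input stream, and the returned stream would be passed to an InputStreamReader exactly as the original stream was.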

