Understanding the difference between byte streams and character streams in Java

What is a stream

A stream in Java is an abstract sequence of bytes. Picture a water pipe in which a sequence of bytes flows instead of water. Like water, a stream has a direction of flow: an object from which a sequence of bytes can be read is called an input stream, and an object to which a sequence of bytes can be written is called an output stream.
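The two directions can be sketched with in-memory streams. This is a minimal illustration rather than part of the original article: the class name StreamDirectionDemo and the helper pipe are made up for the example, and ByteArrayInputStream and ByteArrayOutputStream stand in for a real file or socket.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;

// Hypothetical demo class: bytes flow out of an input stream and into an output stream.
class StreamDirectionDemo {

    // Reads every byte from the input stream and writes it to the output stream.
    static void pipe(InputStream in, OutputStream out) {
        try {
            int b;
            // read() returns the next byte as a value from 0 to 255, or -1 at end of stream.
            while ((b = in.read()) != -1) {
                out.write(b);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] source = {0x64, 0x65, 0x6D, 0x6F};
        InputStream in = new ByteArrayInputStream(source);       // we read from this end
        ByteArrayOutputStream out = new ByteArrayOutputStream(); // we write to this end
        pipe(in, out);
        System.out.println(out.size() + " bytes copied");
    }
}
```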

Byte stream

A Java byte stream processes data one byte at a time and is usually used for binary data. The basic byte stream classes are InputStream and OutputStream, which represent the most basic byte input stream and byte output stream. Both are abstract classes, so in practice we use the subclasses that the Java class library provides. Here we take InputStream as an example to introduce byte streams. It defines a basic method, read, for reading a byte from the stream, declared as follows:

public abstract int read() throws IOException;

This is an abstract method, so every input byte stream class derived from InputStream must implement it. It reads one byte from the stream and returns it as an int in the range 0 to 255, or returns -1 when the end of the stream has been reached. Note that the method blocks until a byte is available, the end of the stream is reached, or an exception is thrown. In addition, byte streams do not buffer by default, which means that each call to read asks the operating system for one byte. For a file this is often accompanied by disk IO, so efficiency is low. Some readers may think that the overloaded read method in InputStream that takes a byte array as its parameter reads many bytes at once and so avoids the frequent disk IO. Is that really so? Let's look at the source code of this method:

public int read(byte b[]) throws IOException {
  return read(b, 0, b.length);
}

It calls another overload of read, so let's follow that one:

public int read(byte b[], int off, int len) throws IOException {
  if (b == null) {
    throw new NullPointerException(); 
  } else if (off < 0 || len < 0 || len > b.length - off) { 
    throw new IndexOutOfBoundsException(); 
  } else if (len == 0) { 
    return 0; 
  } 
  int c = read(); 
  if (c == -1) { 
    return -1; 
  } 
  b[off] = (byte) c; 
  int i = 1; 
  try { 
    for (; i < len ; i++) { 
      c = read(); 
      if (c == -1) { 
        break; 
      } 
      b[off + i] = (byte)c; 
    } 
  } catch (IOException ee) { } 
  return i; 
}

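From the code above we can see that the default implementation of read(byte[]) in InputStream fills the array by calling read() once per byte. In other words, this implementation is just a loop over read() and does not use a memory buffer by itself. Concrete subclasses such as FileInputStream do override read(byte[], int, int) to fetch many bytes in one call, but that only helps callers who pass an array. To have a memory buffer improve read efficiency for any stream, including single-byte reads, we should use BufferedInputStream.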
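The effect of BufferedInputStream can be made visible with a small experiment. The sketch below is an illustration under stated assumptions, not code from the article: CountingSource is a hypothetical in-memory stream that counts how often it is asked for data, standing in for a source where every call would reach the operating system.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

class BufferedReadDemo {

    // Hypothetical source that counts every request made to it.
    static class CountingSource extends InputStream {
        private final byte[] data;
        private int pos = 0;
        int calls = 0;

        CountingSource(int size) {
            this.data = new byte[size];
        }

        @Override
        public int read() {
            calls++;
            return pos < data.length ? (data[pos++] & 0xFF) : -1;
        }

        @Override
        public int read(byte[] b, int off, int len) {
            calls++;
            if (len == 0) {
                return 0;
            }
            if (pos >= data.length) {
                return -1;
            }
            int n = Math.min(len, data.length - pos);
            System.arraycopy(data, pos, b, off, n);
            pos += n;
            return n;
        }
    }

    // Reads the whole source one byte at a time and returns how many calls reached the source.
    static int callsToSource(int size, boolean buffered) {
        CountingSource source = new CountingSource(size);
        try (InputStream in = buffered ? new BufferedInputStream(source) : source) {
            while (in.read() != -1) {
                // discard the byte; only the number of calls matters here
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return source.calls;
    }

    public static void main(String[] args) {
        System.out.println("unbuffered calls: " + callsToSource(10_000, false));
        System.out.println("buffered calls:   " + callsToSource(10_000, true));
    }
}
```

With the buffer in place, the caller still reads one byte at a time, but the underlying source is only asked for data when the buffer runs empty, so it sees far fewer calls.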

Character stream

The basic unit processed by a Java character stream is the Unicode code unit (2 bytes in size), and character streams are usually used for text data. A Unicode code unit here is a 16-bit value in the range 0x0000 to 0xFFFF, the unit that Java's char type holds. Most values in this range correspond directly to a character, and characters outside it are represented by a pair of code units. Java's String type represents text in memory as a sequence of these code units (UTF-16). Unlike data in memory, however, data stored on disk can use a variety of encodings, and the same character has a different binary representation under each of them. A character stream actually works like this:

  • Output character stream: converts the sequence of characters to be written (actually a sequence of Unicode code units) into a sequence of bytes in the specified encoding, and then writes those bytes to the file;
  • Input character stream: decodes the sequence of bytes being read, according to the specified encoding, into the corresponding sequence of characters (again a sequence of Unicode code units) that can be stored in memory.
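The two directions described above can be sketched with OutputStreamWriter and InputStreamReader and an explicitly chosen encoding. This is only an illustration: the class EncodingDemo and its helpers encode, decode and hex are hypothetical names, and in-memory byte arrays stand in for a file.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodingDemo {

    // Output character stream: characters in, bytes in the chosen encoding out.
    static byte[] encode(String text, Charset charset) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(bytes, charset)) {
            writer.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Input character stream: bytes in the chosen encoding in, characters out.
    static String decode(byte[] bytes, Charset charset) {
        StringBuilder text = new StringBuilder();
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(bytes), charset)) {
            int c;
            while ((c = reader.read()) != -1) {
                text.append((char) c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return text.toString();
    }

    // Formats bytes the way a hex editor would show them.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(hex(encode("demo", StandardCharsets.UTF_8)));    // 64 65 6D 6F
        System.out.println(hex(encode("demo", StandardCharsets.UTF_16BE))); // 00 64 00 65 00 6D 00 6F
        System.out.println(decode(encode("demo", StandardCharsets.UTF_16BE), StandardCharsets.UTF_16BE)); // demo
    }
}
```

The same four characters produce different byte sequences under the two encodings, and decoding with the matching encoding recovers the original text.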

A demo helps to make this process concrete. The sample code is as follows:

import java.io.FileWriter;
import java.io.IOException;

public class FileWriterDemo {
  public static void main(String[] args) {
    // try-with-resources closes the writer even if write() fails, and avoids
    // calling close() on null when the constructor itself throws.
    try (FileWriter fileWriter = new FileWriter("demo.txt")) {
      fileWriter.write("demo");
    } catch (IOException e) {
      e.printStackTrace();
    }
  }
}

 

In the code above we use FileWriter to write the four characters "demo" to demo.txt. Opening demo.txt in the hex editor WinHex shows the result:

As the hex view shows, the "demo" we wrote is encoded as "64 65 6D 6F". We did not explicitly specify an encoding in the code above. When none is specified, the platform's default character encoding is used to encode the characters being written (historically taken from the operating system settings, and UTF-8 by default since JDK 18).
Because an output character stream has to convert the sequence of Unicode code units into a byte sequence in the target encoding before anything can actually be written, it uses a memory buffer to hold the converted bytes and then writes them to the disk file together rather than one at a time.
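The buffering can be observed by wrapping an in-memory sink and checking how many bytes have reached it before and after flush(). This is a hedged sketch: WriterBufferDemo and sizesAroundFlush are hypothetical names, UTF-8 is chosen explicitly for the example, and the exact number of bytes visible before the flush depends on the JDK's internal buffer (for a short string it is typically 0).

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class WriterBufferDemo {

    // Returns {bytes that reached the sink before flush(), bytes that reached it after flush()}.
    static int[] sizesAroundFlush(String text) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (OutputStreamWriter writer = new OutputStreamWriter(sink, StandardCharsets.UTF_8)) {
            writer.write(text);       // encoded bytes are accumulated in the writer's buffer
            int before = sink.size();
            writer.flush();           // pushes the buffered bytes to the underlying stream
            int after = sink.size();
            return new int[] {before, after};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] sizes = sizesAroundFlush("demo");
        System.out.println("before flush: " + sizes[0] + ", after flush: " + sizes[1]);
    }
}
```

This is also why forgetting to close or flush a writer can leave a file shorter than expected.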

The difference between character streams and byte streams

From the description above, the main differences between byte streams and character streams are the following:

  • The basic unit of a byte stream operation (read or write) is the byte; the basic unit of a character stream operation is the Unicode code unit.
  • A byte stream does not use a buffer by default; a character stream does.
  • Byte streams are commonly used for binary data. They can in fact handle any type of data, but they do not support reading or writing Unicode code units directly. Character streams are normally used for text, and they do support reading and writing Unicode code units.
  • When we only need to read or write a file and the operation is unrelated to the file's content, a byte stream is usually chosen.

Byte stream usage scenarios

Byte streams are suitable for transferring any type of file data, because the byte is the smallest meaningful unit of information in a computer; an ASCII character, for example, normally takes exactly one byte of storage.
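A content-agnostic file copy is the typical case. The sketch below is one possible way to write it, not the article's own code: ByteCopyDemo, copy and copyIsIdentical are hypothetical names, the 8192-byte chunk size is an arbitrary choice, and temporary files are used so the example does not depend on any existing path.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

class ByteCopyDemo {

    // Copies a file byte for byte; the content is never interpreted as text.
    static void copy(Path from, Path to) {
        try (InputStream in = new BufferedInputStream(new FileInputStream(from.toFile()));
             OutputStream out = new BufferedOutputStream(new FileOutputStream(to.toFile()))) {
            byte[] chunk = new byte[8192];
            int n;
            while ((n = in.read(chunk)) != -1) {
                out.write(chunk, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Writes the bytes to a temporary file, copies it, and reports whether the copy is identical.
    static boolean copyIsIdentical(byte[] content) {
        try {
            Path from = Files.createTempFile("copy-src", ".bin");
            Path to = Files.createTempFile("copy-dst", ".bin");
            try {
                Files.write(from, content);
                copy(from, to);
                return Arrays.equals(content, Files.readAllBytes(to));
            } finally {
                Files.deleteIfExists(from);
                Files.deleteIfExists(to);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Arbitrary binary data, including bytes that are not valid text in most encodings.
        byte[] binary = {(byte) 0xFF, (byte) 0xD8, (byte) 0xFF, (byte) 0xE0, 0x00, 0x10};
        System.out.println("identical copy: " + copyIsIdentical(binary));
    }
}
```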

Character stream usage scenarios

Character streams can only handle plain text data (text files), not other types of data, but for processing text they are more convenient than byte streams.
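One example of that convenience is reading text line by line, which BufferedReader supports directly. The following is a small sketch under assumptions: LineReaderDemo and readLines are hypothetical names, and a StringReader stands in for a text file.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class LineReaderDemo {

    // Collects all lines from a character stream; line terminators are stripped by readLine().
    static List<String> readLines(Reader source) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : readLines(new StringReader("first line\nsecond line"))) {
            System.out.println(line);
        }
    }
}
```

Doing the same with a raw byte stream would mean finding line terminators and decoding the bytes by hand.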

A character stream reads data character by character: it decodes the bytes it reads according to the encoding and returns an int holding the code of the corresponding character (for a two-byte encoding, two bytes are consumed per character). When writing, it encodes each character back into bytes. If a binary file is copied this way, its raw bytes are first interpreted as characters and those characters are then written out again as bytes, so the resulting file is stored as encoded text rather than as the original data. Image data is stored as raw bytes, and byte sequences that are not valid in the encoding do not survive the round trip, so the copied picture can no longer be decoded when it is opened. A byte stream reads and writes the bytes as they are, with no encoding or decoding step, so image data is copied correctly. Encoding and decoding are only needed when converting between bytes and characters.
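The corruption can be reproduced in memory. In this sketch, which is an illustration rather than the article's code, the four bytes are the start of a typical JPEG header, UTF-8 is assumed as the encoding, and ImageCopyDemo with its two copy helpers is a hypothetical name.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class ImageCopyDemo {

    // Byte stream copy: bytes pass through unchanged.
    static byte[] copyWithByteStream(byte[] data) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (InputStream in = new ByteArrayInputStream(data)) {
            int b;
            while ((b = in.read()) != -1) {
                out.write(b);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    // Character stream copy: bytes are decoded to characters and then encoded again.
    static byte[] copyWithCharStream(byte[] data) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(data), StandardCharsets.UTF_8);
             Writer writer = new OutputStreamWriter(out, StandardCharsets.UTF_8)) {
            int c;
            while ((c = reader.read()) != -1) {
                writer.write(c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] header = {(byte) 0xFF, (byte) 0xD8, (byte) 0xFF, (byte) 0xE0};
        System.out.println("byte stream intact: " + Arrays.equals(header, copyWithByteStream(header)));
        System.out.println("char stream intact: " + Arrays.equals(header, copyWithCharStream(header)));
    }
}
```

The byte 0xFF can never occur in valid UTF-8, so the decoder substitutes a replacement character for it and the re-encoded output no longer matches the input.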

Original link: www.jianshu.com

Origin www.cnblogs.com/isxiaoming/p/12397850.html