An in-depth look inside BufferedInputStream

1 Overview

While studying the JDK source code recently, I found that BufferedInputStream in the IO system is quite interesting. There are many common misconceptions about this class, so I wrote this post as a study note.

2 BufferedInputStream source code analysis

/**
  * This class extends FilterInputStream, which applies the decorator design pattern; the FilterInputStream source is very simple.
  */
public class BufferedInputStream extends FilterInputStream {
    
    // Default size of the buf[] buffer array
    private static int DEFAULT_BUFFER_SIZE = 8192;

    /**
     * The maximum size of array to allocate.
     * Some VMs reserve some header words in an array.
     * Attempts to allocate larger arrays may result in
     * OutOfMemoryError: Requested array size exceeds VM limit
     *
     * Maximum size of the buf[] buffer array. Why subtract 8? Because some JVMs store header data at the start of an array.
     */
    private static int MAX_BUFFER_SIZE = Integer.MAX_VALUE - 8;

    /**
     * The internal buffer array where the data is stored. When necessary,
     * it may be replaced by another array of
     * a different size.
     *
     * The buffer array: the core field; every operation revolves around buf[].
     */
    protected volatile byte buf[];

    /**
     * Atomic updater to provide compareAndSet for buf. This is
     * necessary because closes can be asynchronous. We use nullness
     * of buf[] as primary indicator that this stream is closed. (The
     * "in" field is also nulled out on close.)
     *
     * Concurrency-related: keeps updates to buf thread-safe.
     */
    private static final
        AtomicReferenceFieldUpdater<BufferedInputStream, byte[]> bufUpdater =
        AtomicReferenceFieldUpdater.newUpdater
        (BufferedInputStream.class,  byte[].class, "buf");

    /**
     * The index one greater than the index of the last valid byte in
     * the buffer.
     * This value is always
     * in the range <code>0</code> through <code>buf.length</code>;
     * elements <code>buf[0]</code>  through <code>buf[count-1]
     * </code>contain buffered input data obtained
     * from the underlying  input stream.
     *
     * The total amount of valid data in buf[].
     */
    protected int count;

    /**
     * The current position in the buffer. This is the index of the next
     * character to be read from the <code>buf</code> array.
     * <p>
     * This value is always in the range <code>0</code>
     * through <code>count</code>. If it is less
     * than <code>count</code>, then  <code>buf[pos]</code>
     * is the next byte to be supplied as input;
     * if it is equal to <code>count</code>, then
     * the  next <code>read</code> or <code>skip</code>
     * operation will require more bytes to be
     * read from the contained  input stream.
     *
     * @see     java.io.BufferedInputStream#buf
     *
     * The current read position in buf[].
     */
    protected int pos;

    /**
     * The value of the <code>pos</code> field at the time the last
     * <code>mark</code> method was called.
     * <p>
     * This value is always
     * in the range <code>-1</code> through <code>pos</code>.
     * If there is no marked position in  the input
     * stream, this field is <code>-1</code>. If
     * there is a marked position in the input
     * stream,  then <code>buf[markpos]</code>
     * is the first byte to be supplied as input
     * after a <code>reset</code> operation. If
     * <code>markpos</code> is not <code>-1</code>,
     * then all bytes from positions <code>buf[markpos]</code>
     * through  <code>buf[pos-1]</code> must remain
     * in the buffer array (though they may be
     * moved to  another place in the buffer array,
     * with suitable adjustments to the values
     * of <code>count</code>,  <code>pos</code>,
     * and <code>markpos</code>); they may not
     * be discarded unless and until the difference
     * between <code>pos</code> and <code>markpos</code>
     * exceeds <code>marklimit</code>.
     *
     * @see     java.io.BufferedInputStream#mark(int)
     * @see     java.io.BufferedInputStream#pos
     *
     * The position recorded by the most recent call to mark().
     */
    protected int markpos = -1;

    /**
     * The maximum read ahead allowed after a call to the
     * <code>mark</code> method before subsequent calls to the
     * <code>reset</code> method fail.
     * Whenever the difference between <code>pos</code>
     * and <code>markpos</code> exceeds <code>marklimit</code>,
     * then the  mark may be dropped by setting
     * <code>markpos</code> to <code>-1</code>.
     *
     * @see     java.io.BufferedInputStream#mark(int)
     * @see     java.io.BufferedInputStream#reset()
     *
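     * This field is set only through mark(int readlimit). For example, after mark(1024), once more
     * than 1024 bytes have been read the mark may be treated as invalid, and a subclass may choose to
     * discard it and start over. The exact behavior depends on the subclass; BufferedInputStream
     * discards the mark and sets markpos back to -1.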
     */
    protected int marklimit;

    /**
     * Check to make sure that underlying input stream has not been
     * nulled out due to close; if not return it;
     *
     * Returns the actual underlying input stream.
     */
    private InputStream getInIfOpen() throws IOException {
        InputStream input = in;
        if (input == null)
            throw new IOException("Stream closed");
        return input;
    }

    /**
     * Check to make sure that buffer has not been nulled out due to
     * close; if not return it;
     *
     * Returns the buffer array.
     */
    private byte[] getBufIfOpen() throws IOException {
        byte[] buffer = buf;
        if (buffer == null)
            throw new IOException("Stream closed");
        return buffer;
    }

    /**
     * Creates a <code>BufferedInputStream</code>
     * and saves its  argument, the input stream
     * <code>in</code>, for later use. An internal
     * buffer array is created and  stored in <code>buf</code>.
     *
     * @param   in   the underlying input stream.
     *
     * The default buffer array size is 8 KB.
     */
    public BufferedInputStream(InputStream in) {
        this(in, DEFAULT_BUFFER_SIZE);
    }

    /**
     * Creates a <code>BufferedInputStream</code>
     * with the specified buffer size,
     * and saves its  argument, the input stream
     * <code>in</code>, for later use.  An internal
     * buffer array of length  <code>size</code>
     * is created and stored in <code>buf</code>.
     *
     * @param   in     the underlying input stream.
     * @param   size   the buffer size.
     * @exception IllegalArgumentException if {@code size <= 0}.
     */
    public BufferedInputStream(InputStream in, int size) {
        super(in);
        if (size <= 0) {
            throw new IllegalArgumentException("Buffer size <= 0");
        }
        buf = new byte[size];
    }

    /**
     * Fills the buffer with more data, taking into account
     * shuffling and other tricks for dealing with marks.
     * Assumes that it is being called by a synchronized method.
     * This method also assumes that all data has already been read in,
     * hence pos > count.
     *
     * Purpose: make room by discarding data in buf[] or by growing buf[], then store
     * new data from the input stream in buf[].
     */
    private void fill() throws IOException {
        byte[] buffer = getBufIfOpen();
        if (markpos < 0)
            // No mark is set, so simply discard the data in buf[]
            pos = 0;            /* no mark: throw away the buffer */
        else if (pos >= buffer.length)  /* no room left in buffer */
            if (markpos > 0) {  /* can throw away early part of the buffer */
                int sz = pos - markpos;
                System.arraycopy(buffer, markpos, buffer, 0, sz);
                pos = sz;
                markpos = 0;
            // Note: in every branch below, markpos is 0
            } else if (buffer.length >= marklimit) {
                markpos = -1;   /* buffer got too big, invalidate mark */
                pos = 0;        /* drop buffer contents */
            } else if (buffer.length >= MAX_BUFFER_SIZE) {
                throw new OutOfMemoryError("Required array size too large");
            } else {            /* grow buffer */
                int nsz = (pos <= MAX_BUFFER_SIZE - pos) ?
                        pos * 2 : MAX_BUFFER_SIZE;
                if (nsz > marklimit)
                    // buf[] never grows beyond marklimit; within that length the mark stays valid
                    nsz = marklimit;
                byte nbuf[] = new byte[nsz];
                System.arraycopy(buffer, 0, nbuf, 0, pos);
                if (!bufUpdater.compareAndSet(this, buffer, nbuf)) {
                    // Can't replace buf if there was an async close.
                    // Note: This would need to be changed if fill()
                    // is ever made accessible to multiple threads.
                    // But for now, the only way CAS can fail is via close.
                    // assert buf == null;
                    throw new IOException("Stream closed");
                }
                buffer = nbuf;
            }
        count = pos;
        // Read data from the input stream into buf[]
        int n = getInIfOpen().read(buffer, pos, buffer.length - pos);
        if (n > 0)
            count = n + pos;
    }

    /**
     * See
     * the general contract of the <code>read</code>
     * method of <code>InputStream</code>.
     *
     * @return     the next byte of data, or <code>-1</code> if the end of the
     *             stream is reached.
     * @exception  IOException  if this input stream has been closed by
     *                          invoking its {@link #close()} method,
     *                          or an I/O error occurs.
     * @see        java.io.FilterInputStream#in
     */
    public synchronized int read() throws IOException {
        // Every buffered byte has been consumed, so fill() is needed
        if (pos >= count) {
            fill();
            // fill() did not read any data
            if (pos >= count)
                return -1;
        }
        return getBufIfOpen()[pos++] & 0xff;
    }

    /**
     * Read characters into a portion of an array, reading from the underlying
     * stream at most once if necessary.
     */
    private int read1(byte[] b, int off, int len) throws IOException {
        int avail = count - pos;
        if (avail <= 0) {
            /* If the requested length is at least as large as the buffer, and
               if there is no mark/reset activity, do not bother to copy the
               bytes into the local buffer.  In this way buffered streams will
               cascade harmlessly. */
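            // Important: this branch matters a great deal.
            /**
              * When the requested length len (for the target array b) is at least the size of
              * BufferedInputStream's internal buffer buf[] and markpos < 0, the data is read
              * straight from the underlying stream into b without going through buf[], which
              * keeps buf[] from growing sharply.
              */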
            if (len >= getBufIfOpen().length && markpos < 0) {
                return getInIfOpen().read(b, off, len);
            }
            fill();
            avail = count - pos;
            if (avail <= 0) return -1;
        }
        int cnt = (avail < len) ? avail : len;
        System.arraycopy(getBufIfOpen(), pos, b, off, cnt);
        pos += cnt;
        return cnt;
    }

    /**
     * Reads bytes from this byte-input stream into the specified byte array,
     * starting at the given offset.
     *
     * <p> This method implements the general contract of the corresponding
     * <code>{@link InputStream#read(byte[], int, int) read}</code> method of
     * the <code>{@link InputStream}</code> class.  As an additional
     * convenience, it attempts to read as many bytes as possible by repeatedly
     * invoking the <code>read</code> method of the underlying stream.  This
     * iterated <code>read</code> continues until one of the following
     * conditions becomes true: <ul>
     *
     *   <li> The specified number of bytes have been read,
     *
     *   <li> The <code>read</code> method of the underlying stream returns
     *   <code>-1</code>, indicating end-of-file, or
     *
     *   <li> The <code>available</code> method of the underlying stream
     *   returns zero, indicating that further input requests would block.
     *
     * </ul> If the first <code>read</code> on the underlying stream returns
     * <code>-1</code> to indicate end-of-file then this method returns
     * <code>-1</code>.  Otherwise this method returns the number of bytes
     * actually read.
     *
     * <p> Subclasses of this class are encouraged, but not required, to
     * attempt to read as many bytes as possible in the same fashion.
     *
     * @param      b     destination buffer.
     * @param      off   offset at which to start storing bytes.
     * @param      len   maximum number of bytes to read.
     * @return     the number of bytes read, or <code>-1</code> if the end of
     *             the stream has been reached.
     * @exception  IOException  if this input stream has been closed by
     *                          invoking its {@link #close()} method,
     *                          or an I/O error occurs.
     *
     * This method mainly delegates to read1(byte[] b, int off, int len).
     */
    public synchronized int read(byte b[], int off, int len)
        throws IOException
    {
        getBufIfOpen(); // Check for closed stream
        if ((off | len | (off + len) | (b.length - (off + len))) < 0) {
            throw new IndexOutOfBoundsException();
        } else if (len == 0) {
            return 0;
        }

        int n = 0;
        for (;;) {
            int nread = read1(b, off + n, len - n);
            if (nread <= 0)
                return (n == 0) ? nread : n;
            n += nread;
            if (n >= len)
                return n;
            // if not closed but no bytes available, return
            InputStream input = in;
            if (input != null && input.available() <= 0)
                return n;
        }
    }

    /**
     * See the general contract of the <code>skip</code>
     * method of <code>InputStream</code>.
     *
     * @exception  IOException  if the stream does not support seek,
     *                          or if this input stream has been closed by
     *                          invoking its {@link #close()} method, or an
     *                          I/O error occurs.
     *
     * Skips the given number of bytes in the stream. This method seems of limited use; so far
     * I have never used skip() myself.
     */
    public synchronized long skip(long n) throws IOException {
        getBufIfOpen(); // Check for closed stream
        if (n <= 0) {
            return 0;
        }
        long avail = count - pos;

        if (avail <= 0) {
            // If no mark position set then don't keep in buffer
            if (markpos <0)
                return getInIfOpen().skip(n);

            // Fill in buffer to save bytes for reset
            fill();
            avail = count - pos;
            if (avail <= 0)
                return 0;
        }

        long skipped = (avail < n) ? avail : n;
        pos += skipped;
        return skipped;
    }

    /**
     * Returns an estimate of the number of bytes that can be read (or
     * skipped over) from this input stream without blocking by the next
     * invocation of a method for this input stream. The next invocation might be
     * the same thread or another thread.  A single read or skip of this
     * many bytes will not block, but may read or skip fewer bytes.
     * <p>
     * This method returns the sum of the number of bytes remaining to be read in
     * the buffer (<code>count&nbsp;- pos</code>) and the result of calling the
     * {@link java.io.FilterInputStream#in in}.available().
     *
     * @return     an estimate of the number of bytes that can be read (or skipped
     *             over) from this input stream without blocking.
     * @exception  IOException  if this input stream has been closed by
     *                          invoking its {@link #close()} method,
     *                          or an I/O error occurs.
     *
     * Bytes remaining in buf[] plus bytes available from the underlying input stream.
     */
    public synchronized int available() throws IOException {
        int n = count - pos;
        int avail = getInIfOpen().available();
        return n > (Integer.MAX_VALUE - avail)
                    ? Integer.MAX_VALUE
                    : n + avail;
    }

    /**
     * See the general contract of the <code>mark</code>
     * method of <code>InputStream</code>.
     *
     * @param   readlimit   the maximum limit of bytes that can be read before
     *                      the mark position becomes invalid.
     * @see     java.io.BufferedInputStream#reset()
     *
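     * Sets the mark. This is the only place where marklimit is assigned. readlimit is the maximum
     * number of bytes that may be read from the stream after mark(); once that is exceeded, fill()
     * may treat the mark as invalid and reset markpos = -1, pos = 0.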
     */
    public synchronized void mark(int readlimit) {
        marklimit = readlimit;
        markpos = pos;
    }

    /**
     * See the general contract of the <code>reset</code>
     * method of <code>InputStream</code>.
     * <p>
     * If <code>markpos</code> is <code>-1</code>
     * (no mark has been set or the mark has been
     * invalidated), an <code>IOException</code>
     * is thrown. Otherwise, <code>pos</code> is
     * set equal to <code>markpos</code>.
     *
     * @exception  IOException  if this stream has not been marked or,
     *                  if the mark has been invalidated, or the stream
     *                  has been closed by invoking its {@link #close()}
     *                  method, or an I/O error occurs.
     * @see        java.io.BufferedInputStream#mark(int)
     */
    public synchronized void reset() throws IOException {
        getBufIfOpen(); // Cause exception if closed
        if (markpos < 0)
            throw new IOException("Resetting to invalid mark");
        pos = markpos;
    }

    /**
     * Tests if this input stream supports the <code>mark</code>
     * and <code>reset</code> methods. The <code>markSupported</code>
     * method of <code>BufferedInputStream</code> returns
     * <code>true</code>.
     *
     * @return  a <code>boolean</code> indicating if this stream type supports
     *          the <code>mark</code> and <code>reset</code> methods.
     * @see     java.io.InputStream#mark(int)
     * @see     java.io.InputStream#reset()
     */
    public boolean markSupported() {
        return true;
    }

    /**
     * Closes this input stream and releases any system resources
     * associated with the stream.
     * Once the stream has been closed, further read(), available(), reset(),
     * or skip() invocations will throw an IOException.
     * Closing a previously closed stream has no effect.
     *
     * @exception  IOException  if an I/O error occurs.
     */
    public void close() throws IOException {
        byte[] buffer;
        while ( (buffer = buf) != null) {
            if (bufUpdater.compareAndSet(this, buffer, null)) {
                InputStream input = in;
                in = null;
                if (input != null)
                    input.close();
                return;
            }
            // Else retry in case a new buf was CASed in fill()
        }
    }
}
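
Before moving on, it may help to see what fill() buys for single-byte reads. The following is only a minimal sketch, not part of the JDK or of the original measurements: UnderlyingReadCounter and CountingInputStream are hypothetical names, and the in-memory ByteArrayInputStream stands in for a real file. It counts how many bulk reads reach the wrapped stream while the caller consumes the data one byte at a time.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical demo class; not part of the JDK or of the original post.
class UnderlyingReadCounter {

    // Wrapper that counts how often the wrapped stream's bulk read is invoked.
    static final class CountingInputStream extends FilterInputStream {
        int bulkReads = 0;

        CountingInputStream(InputStream in) {
            super(in);
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            bulkReads++;
            return super.read(b, off, len);
        }
    }

    // Reads totalBytes one byte at a time through a BufferedInputStream and
    // returns the number of bulk reads that reached the underlying stream.
    static int countUnderlyingReads(int totalBytes, int bufferSize) {
        CountingInputStream counting =
                new CountingInputStream(new ByteArrayInputStream(new byte[totalBytes]));
        try (BufferedInputStream in = new BufferedInputStream(counting, bufferSize)) {
            while (in.read() != -1) {
                // consume every byte, one at a time
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return counting.bulkReads;
    }

    public static void main(String[] args) {
        // 20,000 single-byte reads, default-sized 8192-byte buffer
        System.out.println(countUnderlyingReads(20_000, 8192));
    }
}
```

With a fill() that behaves like the listing above, 20,000 calls to read() should reach the underlying stream only four times: three bulk reads that return data and one that reports end of stream.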

3 BufferedInputStream is of limited use in practice

Many blog posts claim that BufferedInputStream is very useful: it reads a large amount of data from IO at once and caches it in buf[], which reduces IO overhead. Many bloggers even provide working code proving that BufferedInputStream does improve efficiency. That in itself is correct. After studying the source code in depth, however, I found that in real scenarios the class is rarely needed, and you can often do without BufferedInputStream.

The following code makes the point more concretely:

    // The test file is about 1 GB
    private static String file = "D:\\StudySoftware\\VMware_virtualbox\\Data_vmware\\VMwareMachine\\kafka_single\\kafka-single-103-da5cf665.vmem";

    // Reads the whole file with a plain FileInputStream
    private static void file() throws IOException {
        long beginTime = System.currentTimeMillis();
        // try-with-resources closes the stream even if a read fails
        try (FileInputStream input = new FileInputStream(file)) {
            byte[] bytes = new byte[1024 * 1];
            int read = 0;
            while ((read = input.read(bytes, 0, bytes.length)) != -1) {
                // Do nothing; just read the file
            }
        }
        long endTime = System.currentTimeMillis();
        System.out.println("file: time taken (ms): " + (endTime - beginTime));
    }

    // Reads the whole file through a BufferedInputStream (default 8 KB buf[])
    private static void bufferd() throws IOException {
        long beginTime = System.currentTimeMillis();
        try (FileInputStream input = new FileInputStream(file);
             BufferedInputStream bufferedInput = new BufferedInputStream(input)) {
            byte[] bytes = new byte[1024 * 1];
            int read = 0;
            while ((read = bufferedInput.read(bytes, 0, bytes.length)) != -1) {
                // Do nothing; just read the file
            }
        }
        long endTime = System.currentTimeMillis();
        System.out.println("buffered: time taken (ms): " + (endTime - beginTime));
    }

Note:

When running this code, do not run both methods against the same file in the same run. After the first method has read the whole file, the JVM or the operating system may still hold part of the data when the second method reads it, which makes the measurements inaccurate. To keep the test data as accurate as possible, run only one test method per JVM launch.

Results (times in milliseconds):

① byte[] bytes = new byte[1024 * 1]; array size 1024

buffered: 855
file: 3073

② byte[] bytes = new byte[1024 * 2]; array size 2048

buffered: 813
file: 1909

③ byte[] bytes = new byte[1024 * 3]; array size 3072

buffered: 1304
file: 1476

④ byte[] bytes = new byte[1024 * 4]; array size 4096

buffered: 844
file: 1287

⑤ byte[] bytes = new byte[1024 * 5]; array size 5120

buffered: 1343
file: 1061

⑥ byte[] bytes = new byte[1024 * 6]; array size 6144

buffered: 1280
file: 985

⑦ byte[] bytes = new byte[1024 * 7]; array size 7168

buffered: 1443
file: 851

⑧ byte[] bytes = new byte[1024 * 8]; array size 8192

buffered: 774
file: 739

⑨ byte[] bytes = new byte[1024 * 9]; array size 9216

buffered: 734
file: 749

⑩ byte[] bytes = new byte[1024 * 10]; array size 10240

buffered: 739
file: 697

...

We can draw the following important conclusion:

When bytes is relatively small, BufferedInputStream really is much faster at reading the file. As bytes grows, and especially once it reaches 8 KB, BufferedInputStream and FileInputStream read the file at about the same speed, with no significant difference.

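Digging into the source code explains why: in read1(), when len >= buf.length and markpos < 0, the data is read directly from the underlying stream into the caller's array, and buf[] is bypassed.

So once bytes in while ((read = input.read(bytes, 0, bytes.length)) != -1) is large enough, BufferedInputStream has no effect (unless mark/reset is needed).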
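
The bypass can be observed without a debugger. The sketch below is an illustration under assumptions, not the code used for the measurements: BypassDemo and RecordingInputStream are hypothetical names, and a ByteArrayInputStream replaces the 1 GB file. It records the len argument of the first bulk read that reaches the wrapped stream.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; not part of the JDK or of the original post.
class BypassDemo {

    // Records the len argument of every bulk read that reaches the wrapped stream.
    static final class RecordingInputStream extends FilterInputStream {
        final List<Integer> requestedLengths = new ArrayList<>();

        RecordingInputStream(InputStream in) {
            super(in);
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            requestedLengths.add(len);
            return super.read(b, off, len);
        }
    }

    // Performs one read(bytes, 0, bytes.length) on a fresh BufferedInputStream and
    // returns the length that the underlying stream was asked for first.
    static int firstRequestedLength(int internalBufSize, int callerArraySize) {
        RecordingInputStream recording =
                new RecordingInputStream(new ByteArrayInputStream(new byte[100_000]));
        try (BufferedInputStream in = new BufferedInputStream(recording, internalBufSize)) {
            byte[] bytes = new byte[callerArraySize];
            int ignored = in.read(bytes, 0, bytes.length);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return recording.requestedLengths.get(0);
    }

    public static void main(String[] args) {
        // Small caller array: fill() asks for a full internal buffer.
        System.out.println(firstRequestedLength(8192, 1024));   // prints 8192
        // Caller array at least as large as buf[]: the request is passed straight through.
        System.out.println(firstRequestedLength(8192, 16384));  // prints 16384
    }
}
```

In the first call the underlying stream is asked for 8192 bytes, which is fill() loading buf[]. In the second it is asked for 16384 bytes, the caller's own length, which suggests that buf[] played no part in that read.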

Some readers will surely say: can't I just increase the size of buf[] in BufferedInputStream?

You can, but couldn't I just as well increase the size of bytes in while ((read = input.read(bytes, 0, bytes.length)) != -1)? In the end both are byte arrays, one outside BufferedInputStream and one inside it. When reading from a stream today we usually do not need mark/reset, and the outer bytes array we allocate is usually fairly large. In that case BufferedInputStream is not needed.
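
As a rough illustration of the "large outer array" alternative, here is a minimal sketch that reads a file with a plain FileInputStream and a 64 KB caller-side array. LargeArrayRead, countBytes, and demo are hypothetical names, the 64 KB size is an arbitrary assumption rather than a recommendation, and a small temporary file replaces the 1 GB test file.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical demo class; not part of the JDK or of the original post.
class LargeArrayRead {

    // Reads the whole file through a plain FileInputStream using one large
    // caller-side array, and returns the number of bytes read.
    static long countBytes(Path file, int arraySize) {
        byte[] bytes = new byte[arraySize];
        long total = 0;
        try (InputStream input = new FileInputStream(file.toFile())) {
            int read;
            while ((read = input.read(bytes, 0, bytes.length)) != -1) {
                total += read;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return total;
    }

    // Writes a 100,000-byte temporary file, reads it back with a 64 KB array,
    // and deletes the file again.
    static long demo() {
        try {
            Path tmp = Files.createTempFile("large-array-read", ".bin");
            try {
                Files.write(tmp, new byte[100_000]);
                return countBytes(tmp, 64 * 1024);
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints 100000
    }
}
```

The loop is the same shape as the file() test method above; only the array size differs, so no wrapper stream is involved at all.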

4 The one real use case for BufferedInputStream

In my view, the only real use case for BufferedInputStream is when mark and reset are needed. Using mark and reset requires particular care, though, because a lot is going on underneath, especially when BufferedInputStream executes fill().

public static void main(String[] args) {
    try {
        final byte[] src = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20};
        final ByteArrayInputStream bis = new ByteArrayInputStream(src);
        final BufferedInputStream bufis = new BufferedInputStream(bis, 5);
        int data = -1;
        int i = 0;
        while((data = bufis.read()) != -1) {
            if(data == 4) {
                bufis.mark(2);
            }
            if(i++ == 9) {
                bufis.reset();
            }
            System.out.printf("%d", data);
        }
    } catch(IOException ioex) {
        ioex.printStackTrace();
    }
}
// Original link: https://blog.csdn.net/qq_26971305/article/details/79472696

Interested readers can step through the code above in a debugger with each of the following conditions, which should give a deeper understanding of BufferedInputStream:

if(i++ == 5)

if(i++ == 6)

if(i++ == 7)

if(i++ == 8)

if(i++ == 9)

if(i++ == 10)

... Readers with more time can also vary the length of buf[] in BufferedInputStream and the value in the if (i++ == xx) condition to observe how BufferedInputStream executes.
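
For readers who prefer not to use a debugger, the experiment can be parameterized. This is a minimal sketch under assumptions: MarkResetTrace and run are hypothetical names, the values are collected in a list instead of being printed, and -1 is appended as a sentinel when reset() throws an IOException.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; not part of the JDK or of the original post.
class MarkResetTrace {

    // Replays the loop from the example: mark(2) when byte 4 is read,
    // reset() when the loop counter equals resetAt. Returns the values in
    // the order they would be printed; -1 at the end means reset() threw.
    static List<Integer> run(int bufSize, int resetAt) {
        byte[] src = new byte[21];
        for (int k = 0; k < src.length; k++) {
            src[k] = (byte) k;
        }
        List<Integer> seen = new ArrayList<>();
        try (BufferedInputStream bufis =
                     new BufferedInputStream(new ByteArrayInputStream(src), bufSize)) {
            int data;
            int i = 0;
            while ((data = bufis.read()) != -1) {
                if (data == 4) {
                    bufis.mark(2);
                }
                if (i++ == resetAt) {
                    bufis.reset();
                }
                seen.add(data);
            }
        } catch (IOException e) {
            seen.add(-1); // sentinel: reset() found no valid mark
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(run(5, 9));  // bytes 5..9 are replayed once
        System.out.println(run(5, 10)); // ends with the -1 sentinel
    }
}
```

With a fill() that matches the listing in section 2, run(5, 9) should yield 0 to 9, then 5 to 9 again, then 10 to 20, even though mark(2) asked for only two bytes of read-ahead. run(5, 10) should stop after 9, because the refill that loads byte 10 has already dropped the mark.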

mark and reset must not be used carelessly, or reset() will throw an exception:

    public synchronized void reset() throws IOException {
        getBufIfOpen(); // Cause exception if closed
        if (markpos < 0)
            throw new IOException("Resetting to invalid mark");
        pos = markpos;
    }
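
When exactly the mark becomes invalid depends on fill(), not only on readlimit. The probe below is a minimal sketch under assumptions (MarkLimitProbe and canResetAfter are hypothetical names, and the data comes from an in-memory array): it marks the start of the stream, reads a given number of bytes, and reports whether reset() still succeeds.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical demo class; not part of the JDK or of the original post.
class MarkLimitProbe {

    // Marks position 0 with the given readLimit, reads bytesRead single bytes,
    // then reports whether reset() still succeeds.
    static boolean canResetAfter(int bufSize, int readLimit, int bytesRead) {
        try (BufferedInputStream in =
                     new BufferedInputStream(new ByteArrayInputStream(new byte[64]), bufSize)) {
            in.mark(readLimit);
            for (int k = 0; k < bytesRead; k++) {
                in.read();
            }
            try {
                in.reset();
                return true;
            } catch (IOException invalidMark) {
                return false; // "Resetting to invalid mark"
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(canResetAfter(4, 2, 4)); // true: still inside the first buffer
        System.out.println(canResetAfter(4, 2, 5)); // false: the refill dropped the mark
        System.out.println(canResetAfter(4, 8, 5)); // true: buf[] grew instead
        System.out.println(canResetAfter(4, 8, 9)); // false: buf[] already reached marklimit
    }
}
```

Assuming fill() behaves as in the listing in section 2, the first case shows that the mark can outlive readlimit as long as no refill is required, and the third shows buf[] growing (up to marklimit) to keep the marked bytes. Code that relies on reset() is safest when it never reads more than readlimit bytes after mark().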



Reference Links:
https://blog.csdn.net/qq_26971305/article/details/79472696

Author: a cup of hot coffee AAA
Source: https://www.cnblogs.com/AdaiCoffee/
This article is for learning, research, and sharing; reprints are welcome. If anything in it is wrong or inappropriate, please point it out so that it does not mislead later readers. If you have better ideas or opinions, feel free to discuss them in the comments. Thank you!

Origin www.cnblogs.com/AdaiCoffee/p/11369699.html