Scenario

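A while ago I was working on data management and built an interface for reading files. The protocol specification is roughly as follows:

The client fetches the data stream through an HTTP interface. Today a few problems surfaced while fetching it; I'm sharing them here in the hope that readers can avoid the same pitfalls.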

The initial implementation

The network request part is omitted here; let's go straight to the part that reads the stream:


    /**
     * Reads one file; each file is one data record.
     * @param result list that collects the results as {@link JSONObject}s
     * @param input the data input stream ({@link InputStream})
     * @param dataInfo {@link DataInfo}
     * @param fileLength the file length in bytes
     * @throws IOException
     */
    private void readDataByFile(List<JSONObject> result, InputStream input, DataInfo dataInfo, int fileLength)
            throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        byte[] fileBytes = new byte[fileLength];
        input.read(fileBytes);
        bos.write(fileBytes);
        result.add(preProcessor.run(new String(bos.toByteArray(), StandardCharsets.UTF_8), dataInfo));
    }

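In this method, the InputStream parameter is the input stream obtained over the network; DataInfo is business-specific and can be ignored; fileLength is the length of one file, obtained according to the protocol. The goal of the code is to read the content of one file and, after some further processing, turn it into a JSONObject.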

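The method worked fine at first because the files being read were all small. But a couple of days ago a new business went live with considerably larger files, and once reads became frequent, the system ran out of memory.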

To address the out-of-memory problem, I made two improvements:

Read less at a time;

Stop converting to a byte array.

1. Read less at a time

byte[] fileBytes = new byte[fileLength];

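Every call allocates a new byte array as large as the whole file. When reads are frequent, these large objects can pile up faster than the GC reclaims them, which easily leads to an out-of-memory error.

To read fileLength bytes: if there are fewer than 4096, read them all at once; otherwise read 4096 bytes at a time. The reworked code looks roughly like this: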

    private void readDataByFile(List<JSONObject> result, InputStream input, DataInfo dataInfo, int fileLength)
            throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        // If the file is shorter than 4096 bytes, read it in one go
        if (fileLength < 4096) {
            byte[] fileBytes = new byte[fileLength];
            input.read(fileBytes);
            bos.write(fileBytes);
        } else {
            // Otherwise read x full 4096-byte chunks, then the remaining `left` bytes
            int left = fileLength % 4096;
            int x = fileLength / 4096;
            byte[] buffer = new byte[4096];
            int bytesRead = -1;
            for (int i = 0; i < x; i++) {
                bytesRead = input.read(buffer);
                bos.write(buffer);
            }
            if (left > 0) {
                byte[] leftBytes = new byte[left];
                input.read(leftBytes);
                bos.write(leftBytes);
            }
        }
        result.add(preProcessor.run(new String(bos.toByteArray(), StandardCharsets.UTF_8), dataInfo));
    }
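In this second version, `bytesRead` is assigned but never used, and `bos.write(buffer)` always writes the full buffer. A minimal sketch of the consequence, assuming a hypothetical `ShortReadStream` wrapper that returns at most 3 bytes per call (imitating a network stream) and a 4-byte buffer in place of 4096 so the result is easy to inspect; the class names are illustrative:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class IgnoredCountDemo {

    // Hypothetical wrapper: returns at most maxPerRead bytes per read() call,
    // imitating a network stream that delivers data in small pieces.
    static class ShortReadStream extends FilterInputStream {
        private final int maxPerRead;

        ShortReadStream(InputStream in, int maxPerRead) {
            super(in);
            this.maxPerRead = maxPerRead;
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            return super.read(b, off, Math.min(len, maxPerRead));
        }
    }

    // Same pattern as the loop above: the value returned by read() is ignored
    // and the whole buffer is written on every iteration.
    static String readIgnoringCount(InputStream input, int chunks, int bufferSize) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        byte[] buffer = new byte[bufferSize];
        for (int i = 0; i < chunks; i++) {
            input.read(buffer);
            bos.write(buffer); // writes bufferSize bytes even when fewer were read
        }
        return bos.toString("UTF-8");
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "abcdefgh".getBytes(StandardCharsets.UTF_8);
        InputStream in = new ShortReadStream(new ByteArrayInputStream(data), 3);
        String s = readIgnoringCount(in, 2, 4);
        // Prints abc?def?: each 4-byte chunk ends with a zero byte that read() never filled,
        // and "gh" is never read at all
        System.out.println(s.replace('\0', '?'));
    }
}
```

The expected 8 bytes "abcdefgh" come back as "abc", a zero byte, "def", another zero byte, and the last two bytes stay in the stream, where they would be mistaken for the start of the next file.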

2. Stop converting to a byte array

Look at the last line of the code:

new String(bos.toByteArray(), StandardCharsets.UTF_8)

Looking at the ByteArrayOutputStream source, toByteArray actually calls Arrays.copyOf to copy the array:

    /**
     * Creates a newly allocated byte array. Its size is the current
     * size of this output stream and the valid contents of the buffer
     * have been copied into it.
     *
     * @return  the current contents of this output stream, as a byte array.
     * @see     java.io.ByteArrayOutputStream#size()
     */
    public synchronized byte toByteArray()[] {
        return Arrays.copyOf(buf, count);
    }
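A quick check of that behavior (class and method names are illustrative): every toByteArray() call allocates a new array, so changes to the returned array never reach the stream, and for a large payload the internal buffer and its full copy are alive at the same time.

```java
import java.io.ByteArrayOutputStream;

public class ToByteArrayCopyDemo {

    // Returns true if two successive toByteArray() calls hand back distinct array objects
    static boolean returnsFreshCopy(ByteArrayOutputStream bos) {
        byte[] first = bos.toByteArray();
        byte[] second = bos.toByteArray();
        return first != second;
    }

    public static void main(String[] args) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        byte[] data = {1, 2, 3};
        bos.write(data, 0, data.length);

        byte[] copy = bos.toByteArray();
        copy[0] = 42;                              // modify the returned array only
        System.out.println(bos.toByteArray()[0]);  // 1: the stream's own buffer is untouched
        System.out.println(returnsFreshCopy(bos)); // true
    }
}
```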

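In fact, ByteArrayOutputStream has a toString method that decodes its buffer directly, with no intermediate copy. I simply hadn't studied the API closely enough; allocating that extra array was pure waste. One caveat: the no-argument toString() shown below decodes with the platform's default charset, so to keep the UTF-8 decoding of the original line, use the toString(String charsetName) overload, e.g. bos.toString("UTF-8") (Java 10 and later also offer toString(Charset)).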

    /**
     * Converts the buffer's contents into a string decoding bytes using the
     * platform's default character set. The length of the new <tt>String</tt>
     * is a function of the character set, and hence may not be equal to the
     * size of the buffer.
     *
     * <p> This method always replaces malformed-input and unmappable-character
     * sequences with the default replacement string for the platform's
     * default character set. The {@linkplain java.nio.charset.CharsetDecoder}
     * class should be used when more control over the decoding process is
     * required.
     *
     * @return String decoded from the buffer's contents.
     * @since  JDK1.1
     */
    public synchronized String toString() {
        return new String(buf, 0, count);
    }
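A small sketch checking that bos.toString("UTF-8") yields the same string as new String(bos.toByteArray(), StandardCharsets.UTF_8) while skipping the intermediate array; the class and method names are illustrative:

```java
import java.io.ByteArrayOutputStream;
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;

public class DecodeDemo {

    // Original approach: copy the buffer with toByteArray(), then decode the copy
    static String viaCopy(ByteArrayOutputStream bos) {
        return new String(bos.toByteArray(), StandardCharsets.UTF_8);
    }

    // Decode directly from the stream's internal buffer, without the intermediate array
    static String direct(ByteArrayOutputStream bos) {
        try {
            return bos.toString("UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8, so this should not happen
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        // Multi-byte UTF-8 text: two Chinese characters followed by ASCII
        byte[] bytes = "\u6570\u636e data".getBytes(StandardCharsets.UTF_8);
        bos.write(bytes, 0, bytes.length);
        System.out.println(direct(bos).equals(viaCopy(bos))); // true
    }
}
```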

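Not done yet...

Running the code again, errors still showed up later on. After more digging, I found that the content read was occasionally incomplete. The cause: the InputStream is a network input stream, so a single read call may well return fewer bytes than requested. So I reworked the code once more: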

private void readDataByFile(List<JSONObject> result, InputStream input, DataInfo dataInfo, int fileLength)
        throws IOException {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] buffer = new byte[4096];
    int total = 0; // total number of bytes read so far
    // Read exactly fileLength bytes: a single read() on a network stream
    // may return fewer bytes than requested, so keep reading until done
    while (total < fileLength) {
        // Never request more than what remains, otherwise bytes of the next file get consumed
        int toRead = Math.min(buffer.length, fileLength - total);
        int len = input.read(buffer, 0, toRead);
        if (len == -1) {
            throw new EOFException("Expected " + fileLength + " bytes, but the stream ended after " + total);
        }
        out.write(buffer, 0, len); // write only the bytes actually read
        total += len;
    }
    result.add(preProcessor.run(out.toString("UTF-8"), dataInfo));
}
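The JDK already provides helpers for "read exactly N bytes": DataInputStream.readFully(byte[], int, int) keeps reading until the range is filled and throws EOFException if the stream ends first, while InputStream.readNBytes(byte[], int, int) (Java 9+) loops the same way but returns a short count at end of stream instead of throwing. A sketch of the same 4096-byte chunked read built on readFully; the class and method names are illustrative, the JSONObject post-processing is left out, and `ShortReadStream` is a hypothetical wrapper that hands out at most 3 bytes per call to imitate a network stream:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class ExactReadSketch {

    // Hypothetical wrapper: returns at most maxPerRead bytes per read() call
    static class ShortReadStream extends FilterInputStream {
        private final int maxPerRead;

        ShortReadStream(InputStream in, int maxPerRead) {
            super(in);
            this.maxPerRead = maxPerRead;
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            return super.read(b, off, Math.min(len, maxPerRead));
        }
    }

    // Reads exactly fileLength bytes and decodes them as UTF-8.
    // DataInputStream adds no buffering of its own, so wrapping the stream on each
    // call does not swallow bytes that belong to the next file.
    static String readFile(InputStream input, int fileLength) throws IOException {
        DataInputStream in = new DataInputStream(input);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        int remaining = fileLength;
        while (remaining > 0) {
            int n = Math.min(buffer.length, remaining);
            in.readFully(buffer, 0, n); // blocks until n bytes arrive, or throws EOFException
            out.write(buffer, 0, n);
            remaining -= n;
        }
        return out.toString("UTF-8");
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "helloworld".getBytes(StandardCharsets.UTF_8);
        InputStream input = new ShortReadStream(new ByteArrayInputStream(data), 3);
        System.out.println(readFile(input, 5)); // hello
        System.out.println(readFile(input, 5)); // world
    }
}
```

Even though every underlying read() returns at most 3 bytes, each call gets exactly its 5 bytes and the two "files" stay cleanly separated.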

OK! By the way, if you're wondering why a ByteArrayOutputStream doesn't need to be closed to release resources, take a look at what its close method does.
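For the record, the JDK documents ByteArrayOutputStream.close() as having no effect: the data lives only in an in-memory byte array, which the GC reclaims like any other object, and the stream's methods can still be called after close() without an IOException. A tiny check (the class name is illustrative):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;

public class CloseDemo {

    // Writes 'a', closes the stream, writes 'b' and returns the contents
    static String writeAfterClose() throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        bos.write('a');
        bos.close();    // documented as having no effect
        bos.write('b'); // still works: no file handle or socket sits behind this stream
        return bos.toString("UTF-8");
    }

    public static void main(String[] args) throws IOException {
        System.out.println(writeAfterClose()); // ab
    }
}
```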
