How to implement resumable transfer and instant transfer for large files

Let's first understand a few concepts:

  • "File chunking" : split large files into small files, upload/download small files, and finally assemble small files into large files;

  • "Breakpoint resume upload" : On the basis of file division, each small file is uploaded/downloaded in a separate thread. If a network failure occurs, you can continue to upload/download from the part that has already been uploaded/downloaded part, and there is no need to upload\download from scratch;

  • "File transfer in seconds" : The file already exists in the resource server, and the URI of the file is returned directly when uploaded by others.

1. RandomAccessFile

Usually we read and write files with IO streams such as FileInputStream, FileOutputStream, FileReader, and FileWriter. Today we will look at RandomAccessFile.

It is a standalone class that extends Object directly and implements the DataInput and DataOutput interfaces. It supports random access to a file, which it treats like a large byte array stored in the file system.

Its implementation is based on a "file pointer" (a cursor, i.e. an index into that implied array). The file pointer can be read with the getFilePointer method and moved with the seek method.

On input, bytes are read starting at the file pointer, which advances past the bytes read; an output operation that writes past the current end of the implied array extends the array. The class offers four access modes:

  • r: open the file read-only; any write operation throws an IOException;
  • rw: open the file for reading and writing; if the file does not exist, try to create it;
  • rws: like rw, but additionally require every update to the file's content or metadata to be written synchronously to the underlying storage device;
  • rwd: like rw, but additionally require every update to the file's content to be written synchronously to the underlying storage device;

In rw mode, writes are buffered by default; the data is only guaranteed to actually reach the file once the buffer is flushed (for example when it fills up) or the stream is closed with RandomAccessFile.close().

1.1 API

1. void seek(long pos): sets the file-pointer offset at which the next read or write will occur. In plain terms, it specifies the position of the next file access.

The offset may be set beyond the end of the file; the file length does not change until data is actually written past the old end;

2. native long getFilePointer(): returns the current position of the file pointer;

3. native long length(): returns the current length of the file;

4. "Read" methods: read(), read(byte[] b), readInt(), readLine(), readUTF(), and so on;

5. "Write" methods: write(int b), write(byte[] b), writeInt(int v), writeUTF(String str), and so on;

6. void readFully(byte[] b): fills the buffer b with content from the file. The call blocks until b has been filled completely, and throws an EOFException if the end of the file is reached first;

7. FileChannel getChannel(): returns the unique FileChannel object associated with this file;

8. int skipBytes(int n): attempts to skip over n bytes of input, discarding the skipped bytes;

Most uses of RandomAccessFile have since been superseded by the memory-mapped files introduced in JDK 1.4's NIO: the file is mapped into memory and operated on there, avoiding frequent disk I/O.
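To make the pointer semantics concrete, here is a minimal, self-contained sketch (the file name and contents are just illustrative):

import java.io.IOException;
import java.io.RandomAccessFile;

public class RandomAccessFileDemo {
    public static void main(String[] args) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile("demo.bin", "rw")) {
            raf.write(new byte[]{1, 2, 3, 4});        // file pointer is now at 4
            System.out.println(raf.getFilePointer()); // prints 4

            raf.seek(2);                              // jump back to offset 2
            raf.write(9);                             // overwrite the third byte

            raf.seek(10);                             // seeking past EOF is allowed...
            raf.write(7);                             // ...and writing here extends the file
            System.out.println(raf.length());         // prints 11

            raf.seek(0);
            byte[] buf = new byte[(int) raf.length()];
            raf.readFully(buf);                       // blocks until buf is full; EOFException if the file is shorter
        }
    }
}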

2. File chunking

File chunking is handled on the front end; a capable JS library or an off-the-shelf component can do the splitting. You need to decide on the chunk size and the number of chunks, and then assign each chunk an index.

To keep the chunks of an upload from being confused with those of other files, the file's MD5 value is used to identify them. This value can also be used to check whether the file already exists on the server, and what its upload status is:

  • If the file already exists, return the file address directly;
  • If the file does not exist but an upload status is present, i.e. some chunks were uploaded successfully, return the array of indexes of the chunks not yet uploaded;
  • If the file does not exist and the upload status is empty, every chunk needs to be uploaded.

fileReaderInstance.readAsBinaryString(file);
fileReaderInstance.addEventListener("load", (e) => {
    let fileBlob = e.target.result;
    fileMD5 = md5(fileBlob);
    const formData = new FormData();
    formData.append("md5", fileMD5);
    axios
        .post(http + "/fileUpload/checkFileMd5", formData)
        .then((res) => {
            // "文件已存在" is the "file already exists" message returned by the server
            if (res.data.message == "文件已存在") {
                // The file already exists: skip chunking and return the file
                // address to the page directly
                success && success(res);
            } else {
                // The file does not exist yet. Two cases: data is null if it has
                // never been uploaded; data is [xx, xx] if those chunks are still missing
                if (res.data.data) {
                    // Some chunks are missing: resume the upload from the breakpoint
                    chunkArr = res.data.data;
                }
                readChunkMD5();
            }
        })
        .catch((e) => {});
});

Before calling the upload interface, use the slice method to extract the chunk at the corresponding index from the file.

const getChunkInfo = (file, currentChunk, chunkSize) => {
    // Compute the byte range of the chunk at this index
    let start = currentChunk * chunkSize;
    let end = Math.min(file.size, start + chunkSize);
    // Slice the chunk out of the file
    let chunk = file.slice(start, end);
    return { start, end, chunk };
};

Then call the upload interface to complete the upload.

3. Resumable upload and instant transfer

The backend is built with Spring Boot and uses Redis to store the upload status of the file and the address of the uploaded file.

If the file has been fully uploaded, the file path is returned; if it has been partially uploaded, the array of not-yet-uploaded chunk indexes is returned; if it has never been uploaded, a prompt message is returned.

Uploading in chunks produces two files: the body of the file itself and a temporary file. The temporary file can be regarded as an array with one byte per chunk, and a byte is set to 127 once its chunk has been uploaded.

Two values are used when verifying the MD5 value:

  • File upload status: non-empty as long as the file has ever been uploaded; true if the upload is complete, false if it is only partial;
  • File upload address: the file path if the upload is complete, otherwise the path of the temporary file.

/**
 * Verify the file's MD5
 **/
public Result checkFileMd5(String md5) throws IOException {
    // Upload status: this key exists as long as the file has ever been uploaded
    Object processingObj = stringRedisTemplate.opsForHash().get(UploadConstants.FILE_UPLOAD_STATUS, md5);
    if (processingObj == null) {
        return Result.ok("该文件没有上传过"); // "this file has never been uploaded"
    }
    boolean processing = Boolean.parseBoolean(processingObj.toString());
    // The final file path once the upload is complete; otherwise the path of the
    // temporary file (one byte per chunk, set to 127 when that chunk is done)
    String value = stringRedisTemplate.opsForValue().get(UploadConstants.FILE_MD5_KEY + md5);
    // processing is true when the whole file has been uploaded, false otherwise
    if (processing) {
        return Result.ok(value, "文件已存在"); // "the file already exists"
    } else {
        File confFile = new File(value);
        byte[] completeList = FileUtils.readFileToByteArray(confFile);
        List<Integer> missChunkList = new LinkedList<>();
        for (int i = 0; i < completeList.length; i++) {
            if (completeList[i] != Byte.MAX_VALUE) {
                // Chunk i has not been uploaded yet
                missChunkList.add(i);
            }
        }
        return Result.ok(missChunkList, "该文件上传了一部分"); // "part of this file has been uploaded"
    }
}

At this point you will surely ask: once all the chunks of a file have been uploaded, how do we get the complete file back? Next, let's talk about merging the chunks.

4. Chunked upload and file merging

As mentioned above, the file's MD5 value maintains the relationship between the chunks and the file, so we merge chunks that share the same MD5 value. Since each chunk carries its own index, we insert each chunk into the file at its index, just like writing into an array, to form the complete file.

A chunked upload request must carry the same chunk size, chunk count, and current chunk index as the front end, for use when merging the file. Here we use the "memory mapping" approach to merge the files.
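The merge snippets below read these values from a multipartFileDTO. The original post does not show that class; judging from the getters used, it presumably looks something like this hypothetical sketch:

import org.springframework.web.multipart.MultipartFile;

// Hypothetical DTO, inferred from the getters used in the snippets below
public class MultipartFileDTO {
    private String md5;          // MD5 of the whole file
    private int chunk;           // index of the current chunk (0-based)
    private int chunks;          // total number of chunks
    private MultipartFile file;  // binary content of this chunk

    public String getMd5() { return md5; }
    public int getChunk() { return chunk; }
    public int getChunks() { return chunks; }
    public MultipartFile getFile() { return file; }

    public void setMd5(String md5) { this.md5 = md5; }
    public void setChunk(int chunk) { this.chunk = chunk; }
    public void setChunks(int chunks) { this.chunks = chunks; }
    public void setFile(MultipartFile file) { this.file = file; }
}

With a DTO like that in hand, the merge step maps the chunk's region of the file into memory and writes the bytes there: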

// Both read and write operations are allowed
RandomAccessFile tempRaf = new RandomAccessFile(tmpFile, "rw");
// Returns the unique NIO FileChannel associated with this file
FileChannel fileChannel = tempRaf.getChannel();

// Write this chunk's data; offset = chunk size * chunk index
long offset = CHUNK_SIZE * multipartFileDTO.getChunk();
// The bytes of this chunk
byte[] fileData = multipartFileDTO.getFile().getBytes();
// Map the target region of the file directly into memory
MappedByteBuffer mappedByteBuffer = fileChannel.map(FileChannel.MapMode.READ_WRITE, offset, fileData.length);
mappedByteBuffer.put(fileData);
// Release the mapped buffer
FileMD5Util.freedMappedByteBuffer(mappedByteBuffer);
fileChannel.close();
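The FileMD5Util.freedMappedByteBuffer helper is not shown in the original post. A MappedByteBuffer is only unmapped when it is garbage collected, which can keep the file locked (notably on Windows); one widely circulated workaround for JDK 8, relying on internal APIs that changed in JDK 9+, looks roughly like this:

import java.lang.reflect.Method;
import java.nio.MappedByteBuffer;
import java.security.AccessController;
import java.security.PrivilegedAction;

public class FileMD5Util {
    // Force-release a MappedByteBuffer via the internal cleaner (JDK 8 era hack)
    public static void freedMappedByteBuffer(final MappedByteBuffer buffer) {
        if (buffer == null) {
            return;
        }
        AccessController.doPrivileged((PrivilegedAction<Object>) () -> {
            try {
                // DirectByteBuffer has a public cleaner() method, but the class
                // itself is package-private, hence the reflection
                Method cleanerMethod = buffer.getClass().getMethod("cleaner");
                cleanerMethod.setAccessible(true);
                Object cleaner = cleanerMethod.invoke(buffer);
                Method cleanMethod = cleaner.getClass().getMethod("clean");
                cleanMethod.invoke(cleaner);
            } catch (Exception e) {
                // Worst case: fall back to letting GC reclaim the buffer
                e.printStackTrace();
            }
            return null;
        });
    }
}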

Whenever a chunk finishes uploading, we also need to check the overall upload progress to see whether the whole file is complete.

RandomAccessFile accessConfFile = new RandomAccessFile(confFile, "rw");
// Mark this chunk as done: set the byte at its index to Byte.MAX_VALUE
accessConfFile.setLength(multipartFileDTO.getChunks());
accessConfFile.seek(multipartFileDTO.getChunk());
accessConfFile.write(Byte.MAX_VALUE);

// Check whether everything is finished: the upload is complete only if every
// byte in completeList is Byte.MAX_VALUE (i.e. all chunks uploaded successfully)
byte[] completeList = FileUtils.readFileToByteArray(confFile);
byte isComplete = Byte.MAX_VALUE;
for (int i = 0; i < completeList.length && isComplete == Byte.MAX_VALUE; i++) {
    // Bitwise AND: if any chunk is missing, isComplete stops being Byte.MAX_VALUE
    isComplete = (byte) (isComplete & completeList[i]);
}
accessConfFile.close();

Then update the upload progress of the file to Redis.

// Update the status in Redis: "true" means the whole large file has been uploaded
if (isComplete == Byte.MAX_VALUE) {
    stringRedisTemplate.opsForHash().put(UploadConstants.FILE_UPLOAD_STATUS, multipartFileDTO.getMd5(), "true");
    stringRedisTemplate.opsForValue().set(UploadConstants.FILE_MD5_KEY + multipartFileDTO.getMd5(), uploadDirPath + "/" + fileName);
} else {
    if (!stringRedisTemplate.opsForHash().hasKey(UploadConstants.FILE_UPLOAD_STATUS, multipartFileDTO.getMd5())) {
        stringRedisTemplate.opsForHash().put(UploadConstants.FILE_UPLOAD_STATUS, multipartFileDTO.getMd5(), "false");
    }
    if (!stringRedisTemplate.hasKey(UploadConstants.FILE_MD5_KEY + multipartFileDTO.getMd5())) {
        stringRedisTemplate.opsForValue().set(UploadConstants.FILE_MD5_KEY + multipartFileDTO.getMd5(), uploadDirPath + "/" + fileName + ".conf");
    }
}
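Once isComplete equals Byte.MAX_VALUE, the temporary files can be finalized. The original post does not show this last step; a plausible sketch (the temporary-file naming here is an assumption) renames the assembled body to the final file name and removes the .conf progress file:

// Hypothetical finalization step (not in the original post): promote the
// assembled temp file to the final name and drop the .conf progress file
private void finalizeUpload(String uploadDirPath, String fileName) {
    File tmpFile = new File(uploadDirPath, fileName + "_tmp"); // temp body name is an assumption
    File realFile = new File(uploadDirPath, fileName);
    File confFile = new File(uploadDirPath, fileName + ".conf");
    if (tmpFile.renameTo(realFile)) {
        confFile.delete();
    }
}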

Source: blog.csdn.net/qq_34272760/article/details/129406494