How to upload and download large files (above 100 MB) in Java

Core principle:

The core of the project is chunked upload. The front end and back end must cooperate closely and agree on shared metadata to complete the transfer of a large file in chunks. The following issues need attention:

* How to split the file into chunks;

* How to reassemble the chunks into the original file;

* After an interruption, which chunk to resume from.

How to split: use a capable JS library to ease the work. Mature chunked-upload libraries already exist; although the programmer in me wanted to reinvent the wheel, time and workload forced me to give that up. In the end I chose Baidu's WebUploader for the front end.

How to combine: before merging, we first have to solve the problem of identifying which file each chunk belongs to. Initially I had the front end generate a unique UUID to mark the file and attached it to every chunk request. However, I abandoned that while implementing instant upload and switched to the file's MD5 to maintain the chunk-to-file relationship.
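As a concrete sketch of the MD5-based identification, the hash can be computed server-side with the standard `MessageDigest` API. This helper is illustrative, not code from the project:

```java
import java.io.IOException;
import java.io.InputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class FileMd5 {
    // Compute the hex MD5 of a stream; the same value sent by the front
    // end can then key the chunk-to-file relationship.
    public static String md5Hex(InputStream in) throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("MD5");
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
            md.update(buf, 0, n);
        }
        StringBuilder sb = new StringBuilder();
        for (byte b : md.digest()) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }
}
```

Streaming through a fixed buffer keeps memory use constant even for files above 100 MB.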

For merging files on the server and recording chunk state, the industry already provides a pattern. Look at Xunlei: every download produces two files, the file body itself and a temporary file that records the state of the bytes belonging to each chunk.
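That chunk-state bookkeeping can be sketched as a Xunlei-style temporary file with one status byte per chunk. The class and the one-byte-per-chunk layout here are illustrative assumptions, not the project's actual format:

```java
import java.io.IOException;
import java.io.RandomAccessFile;

// Temporary status file: one byte per chunk, 0 = pending, 1 = received.
public class ChunkStatusFile {
    private final RandomAccessFile raf;

    public ChunkStatusFile(String path, int totalChunks) throws IOException {
        raf = new RandomAccessFile(path, "rw");
        raf.setLength(totalChunks); // one status byte per chunk, zeroed
    }

    public void markDone(int chunkIndex) throws IOException {
        raf.seek(chunkIndex);
        raf.write(1);
    }

    public boolean isDone(int chunkIndex) throws IOException {
        raf.seek(chunkIndex);
        return raf.read() == 1;
    }

    public void close() throws IOException {
        raf.close();
    }
}
```

Because the status file lives on disk, the record of completed chunks survives a crash or restart, which is exactly what resuming needs.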

All of this requires close coordination between the front and back ends. The front end must split the file into chunks of a fixed size and include the chunk sequence number and size in each request. When a request arrives, the server only needs to compute the start position from the chunk sequence number and the chunk size (fixed and uniform across chunks), then write the chunk data into the file at that offset.
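The offset arithmetic can be sketched as follows. `ChunkWriter` and its method names are hypothetical, but the logic (start position = chunk index × fixed chunk size, then seek and write) is what the paragraph describes:

```java
import java.io.IOException;
import java.io.RandomAccessFile;

public class ChunkWriter {
    // With a fixed chunk size, a chunk's start position in the file
    // is simply index * chunkSize.
    public static long offsetOf(int chunkIndex, long chunkSize) {
        return chunkIndex * chunkSize;
    }

    // Write one chunk's bytes into the target file at its computed offset.
    public static void writeChunk(String targetPath, int chunkIndex,
                                  long chunkSize, byte[] data) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(targetPath, "rw")) {
            raf.seek(offsetOf(chunkIndex, chunkSize));
            raf.write(data);
        }
    }
}
```

Because each chunk lands at a fixed offset, chunks can arrive in any order and the file still assembles correctly.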

To ease development, I divided the server-side business logic into modules: initialization, chunk handling, upload completion, and so on.

The server's business-logic modules are as follows.

Functional analysis:

Folder generation module

After a folder is uploaded, the server scans it.

For chunked uploads, chunk handling should be the simplest part of the logic. Up6 has already split the file into chunks and tagged each chunk with identifying data: the chunk index, size, offset, the file's MD5, and the chunk's MD5 (this feature must be enabled). With this information the server can process each chunk very conveniently, for example by saving the chunk data to a distributed storage system.
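The chunk metadata listed above might be modeled as a plain value class. The field names here are illustrative; Up6's actual parameter names may differ:

```java
// Metadata carried with each uploaded chunk (field names are illustrative).
public class ChunkInfo {
    public final int index;        // chunk sequence number
    public final long size;        // chunk size in bytes
    public final long offset;      // start position within the whole file
    public final String fileMd5;   // MD5 of the whole file
    public final String blockMd5;  // MD5 of this chunk (optional feature)

    public ChunkInfo(int index, long size, long offset, String fileMd5, String blockMd5) {
        this.index = index;
        this.size = size;
        this.offset = offset;
        this.fileMd5 = fileMd5;
        this.blockMd5 = blockMd5;
    }
}
```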


Chunked upload is the foundation of the whole project: features such as resumable upload and pause all depend on the file being split into chunks.

Chunking itself is relatively simple. The front end uses WebUploader, which already encapsulates chunking and other basic features, so it is easy to use.

With the file API that WebUploader provides, the front-end work is extremely simple.

Front-end HTML template

Splitting must be paired with merging. Once a large file has been chunked, the pieces no longer function as the original file, so we need to merge them back together. We only need to write each chunk into the file at its original position. As explained above, we know the chunk size and chunk sequence number, so we can compute each chunk's start position within the file. RandomAccessFile is a natural fit here, since it can seek back and forth within a file. However, much of RandomAccessFile's functionality has been superseded by NIO's "memory-mapped files", introduced in JDK 1.4. In this project I implemented the merge both ways, using RandomAccessFile and MappedByteBuffer; the corresponding methods are uploadFileRandomAccessFile and uploadFileByMappedByteBuffer.
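As a minimal sketch of the MappedByteBuffer variant (a simplified stand-in, not the project's actual uploadFileByMappedByteBuffer), one can map just the chunk's region of the target file and copy the bytes in:

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

public class MappedChunkWriter {
    // Map only this chunk's region of the file into memory and copy in.
    public static void writeChunk(String targetPath, long offset, byte[] data) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(targetPath, "rw");
             FileChannel channel = raf.getChannel()) {
            MappedByteBuffer buf =
                channel.map(FileChannel.MapMode.READ_WRITE, offset, data.length);
            buf.put(data);
            buf.force(); // flush the mapped region to storage
        }
    }
}
```

Mapping only the chunk's region (rather than the whole file) keeps the address-space footprint small even when the target file is very large.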

Instant upload

Server logic

I expect everyone has seen instant upload in action: when uploading to a network drive, a file that has been uploaded before finishes in seconds. Anyone who has looked into the principle knows it comes down to checking the file's MD5. The system records the MD5 of every uploaded file; before a new upload, the client computes the MD5 of the file's content (or of part of it) and checks it against the system's records.

Breakpoint-http implements instant upload as follows. After the client selects a file and clicks upload, it computes the file's MD5, then calls a system endpoint (/index/checkFileMd5) to ask whether that MD5 already exists. (In this project I store the data in Redis, with the file's MD5 as the key and the file's storage path as the value.) The endpoint returns the check status, and the client proceeds accordingly. The code should make this clear.
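A minimal sketch of the check endpoint's logic follows. The article uses Redis (file MD5 as key, storage path as value); a ConcurrentHashMap stands in here so the example is self-contained, and the class and method names are illustrative:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class Md5Checker {
    // Stand-in for Redis: key = file MD5, value = storage path.
    private final Map<String, String> md5ToPath = new ConcurrentHashMap<>();

    // Record a finished upload: file MD5 -> storage path.
    public void recordUpload(String md5, String storagePath) {
        md5ToPath.put(md5, storagePath);
    }

    // Instant-upload check: a non-null result means the file already
    // exists and the transfer can be skipped entirely.
    public String checkFileMd5(String md5) {
        return md5ToPath.get(md5);
    }
}
```

In production the map would be replaced by Redis GET/SET calls against the same key scheme.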

The front end's MD5 calculation also uses WebUploader, which remains a good tool.

After the control finishes computing the file's MD5, it fires the md5_complete event and passes the MD5 value; developers only need to handle this event.

HTTP resumable upload

Up6 handles resumable transfers automatically; no separate development is required.

f_post.jsp receives these parameters and processes them. Developers only need to focus on their business logic, nothing else.

Resumable upload addresses interruptions during a file upload: human factors (a pause) or force majeure (a dropped or poor network connection) can make an upload fail halfway. When conditions recover, the upload resumes from where it stopped rather than starting over.

As mentioned earlier, resumable upload is built on chunked upload. A large file is split into many small chunks, and the server can persist each successfully uploaded chunk. When the client starts uploading the file again, it calls the check interface first and conditionally skips the chunks that are already present.

The principle: before each file upload, the client obtains the file's MD5 and calls the check interface (/index/checkFileMd5 — yes, the same one used for instant upload). If the file's status comes back as incomplete, the server returns the sequence numbers of all chunks that have not yet been uploaded; the front end then filters its chunks accordingly and uploads only the missing ones.
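The server-side computation of the not-yet-uploaded chunk numbers can be sketched like this (class and method names are illustrative):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class ResumePlanner {
    // Given the set of chunk indices the server has already landed,
    // return the indices the front end still needs to send.
    public static List<Integer> missingChunks(int totalChunks, Set<Integer> uploaded) {
        List<Integer> missing = new ArrayList<>();
        for (int i = 0; i < totalChunks; i++) {
            if (!uploaded.contains(i)) {
                missing.add(i);
            }
        }
        return missing;
    }
}
```

The check interface would return this list alongside the "incomplete" status, and the front end uploads only those indices.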

When a file chunk is received, it can be written directly into the file on the server.

This is the effect of file block upload

This is the effect after uploading the folder

This is the storage structure on the server after the folder is uploaded

Reference article: http://blog.ncmem.com/wordpress/2019/08/12/java-http%E5%A4%A7%E6%96%87%E4%BB%B6%E6%96%AD%E7%82%B9%E7%BB%AD%E4%BC%A0%E4%B8%8A%E4%BC%A0/

Welcome to join the group to discuss together: 374992201


Origin blog.csdn.net/weixin_45525177/article/details/108645946