Java upload and download components for large files (over 100 MB)

1. Functional requirements and non-functional requirements

Operation must be convenient: multiple files and folders can be selected and uploaded at one time.
All PC operating systems are supported: Windows, Linux, and Mac.

Support batch download of files and folders and resumable upload. The transfer continues after the page is refreshed, and the progress information is kept after the browser is closed.

Support batch upload and download of folders. The server retains the folder hierarchy, so the server-side hierarchy is the same as the local one.

Support batch upload and download of large files (20 GB) without freezing the user's computer during upload.
Support folder upload where the folder contains more than 10,000 files in a hierarchical structure.

Support resumable (breakpoint) upload: the progress is retained after the browser is closed or refreshed.

Support folder structure management: creating new folders and navigating the folder directory.

Friendly interaction: upload progress is reported promptly.

Server safety: a JVM out-of-memory error caused by the file upload function must not affect the use of other functions.

Make maximum use of the network uplink bandwidth to increase upload speed.


2. Design analysis

When processing large files, neither the user side nor the server side should read, send, or receive the whole file at one time, because doing so can easily cause memory problems. Large files are therefore uploaded in segments.
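The arithmetic behind segmented upload can be sketched as follows. This is only an illustration: the class name ChunkPlan and the 5 MB segment size are assumptions made here, not values taken from WebUploader or up6.

```java
// Hypothetical helper: how many fixed-size segments a file needs and where each one starts.
final class ChunkPlan {

    /** Number of segments needed to cover fileSize bytes (ceiling division). */
    static long chunkCount(long fileSize, long chunkSize) {
        if (fileSize < 0 || chunkSize <= 0) {
            throw new IllegalArgumentException("invalid size");
        }
        return (fileSize + chunkSize - 1) / chunkSize;
    }

    /** Byte offset at which the segment with the given zero-based index starts. */
    static long offsetOf(long index, long chunkSize) {
        return index * chunkSize;
    }

    /** Length of a segment: chunkSize for every segment except possibly the last one. */
    static long lengthOf(long index, long fileSize, long chunkSize) {
        return Math.min(chunkSize, fileSize - offsetOf(index, chunkSize));
    }

    public static void main(String[] args) {
        long fileSize = 20L * 1024 * 1024 * 1024; // the 20 GB case from the requirements
        long chunkSize = 5L * 1024 * 1024;        // assumed segment size of 5 MB
        System.out.println("chunks: " + chunkCount(fileSize, chunkSize)); // prints "chunks: 4096"
    }
}
```

Only one segment has to be held in memory at a time, so memory use depends on the segment size rather than on the file size.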

From the perspective of upload efficiency, uploading segments concurrently with multiple threads makes better use of the available bandwidth.


3. Solution:

The front end of the file upload page can use an existing upload component, such as Baidu's open-source WebUploader or Zeyou Software's up6. These components cover most everyday upload needs, such as asynchronous upload of files and folders, drag-and-drop upload, paste upload, upload progress monitoring, file thumbnails, and even resumable upload and instant upload of large files.

 

Uploading folders in web projects has become a mainstream requirement, and OA and enterprise ERP systems have similar needs. Uploading a folder while preserving its hierarchy guides users well, is more convenient for them, and provides a basis for more advanced application features.
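One way to keep the local hierarchy on the server is to resolve each uploaded item's relative path under a fixed upload root. The sketch below is an assumption about how that could be done, not the up6 implementation: the class FolderLayout and the method resolveUnderRoot are hypothetical names. It also rejects relative paths that would escape the root.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: maps a client-supplied relative path to a location under the upload root.
final class FolderLayout {

    static Path resolveUnderRoot(Path uploadRoot, String pathRel) throws IOException {
        Path root = uploadRoot.toAbsolutePath().normalize();
        // Accept both Windows and Unix separators from the client.
        Path target = root.resolve(pathRel.replace('\\', '/')).normalize();
        // Reject the root itself and anything outside it, such as "../x" or an absolute path.
        if (target.equals(root) || !target.startsWith(root)) {
            throw new IOException("path escapes upload root: " + pathRel);
        }
        // Create the intermediate folders so the hierarchy matches the client's.
        Files.createDirectories(target.getParent());
        return target;
    }
}
```

For example, resolveUnderRoot(root, "docs/2020/report.pdf") creates docs/2020 under the root and returns the full target path for the file.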

Folder data table structure

CREATE TABLE IF NOT EXISTS `up6_folders` (

  `f_id`               char(32) NOT NULL ,

  `f_nameLoc`               varchar(255) default '',

  `f_pid`                   char(32) default '',

  `f_uid`                   int(11) default '0',

  `f_lenLoc`           bigint(19) default '0',

  `f_sizeLoc`               varchar(50) default '0',

  `f_pathLoc`               varchar(255) default '',

  `f_pathSvr`               varchar(255) default '',

  `f_pathRel`               varchar(255) default '',

  `f_folders`               int(11) default '0',

  `f_fileCount`        int(11) default '0',

  `f_filesComplete`    int(11) default '0',

  `f_complete`              tinyint(1) default '0',

  `f_deleted`               tinyint(1) default '0',

  `f_time`                  timestamp NULL default CURRENT_TIMESTAMP on update CURRENT_TIMESTAMP,

  `f_pidRoot`               char(32) default '',

  PRIMARY KEY  (`f_id`)

) ENGINE=InnoDB DEFAULT CHARSET=utf8 AUTO_INCREMENT=1 ;

File data table structure

CREATE TABLE IF NOT EXISTS `up6_files` (

  `f_id`               char(32) NOT NULL,

  `f_pid` char(32) default '', /*Parent folder ID*/

  `f_pidRoot` char(32) default '', /*Root folder ID*/

  `f_fdTask` tinyint(1) default '0', /*Whether this is a folder record*/

  `f_fdChild` tinyint(1) default '0', /*Whether this is a file inside a folder*/

  `f_uid`                   int(11) default '0',

  `f_nameLoc` varchar(255) default '', /*Local name of the file (original file name)*/

  `f_nameSvr` varchar(255) default '', /*Name of the file on the server*/

  `f_pathLoc` varchar(512) default '', /*Local path of the file*/

  `f_pathSvr` varchar(512) default '', /*Location of the file on the remote server*/

  `f_pathRel`               varchar(512) default '',

  `f_md5` varchar(40) default '', /*File MD5*/

  `f_lenLoc` bigint(19) default '0', /*File size*/

  `f_sizeLoc` varchar(10) default '0', /*File size (formatted)*/

  `f_pos` bigint(19) default '0', /*Resume position*/

  `f_lenSvr` bigint(19) default '0', /*Uploaded size*/

  `f_perSvr` varchar(7) default '0%', /*Percentage uploaded*/

  `f_complete` tinyint(1) default '0', /*Is the upload completed?*/

  `f_time`                  timestamp NULL default CURRENT_TIMESTAMP on update CURRENT_TIMESTAMP,

  `f_deleted`               tinyint(1) default '0',

  `f_scan`                  tinyint(1) default '0',

  PRIMARY KEY  (`f_id`)

) ENGINE=InnoDB DEFAULT CHARSET=utf8 AUTO_INCREMENT=1 ;

The core of the project is uploading files in blocks. The front end and back end must cooperate closely and agree on some data in order to split large files into blocks. The project needs to focus on the following issues.

* How to slice the file;

* How to merge the blocks into a file;

* From which block to resume after an interruption.

How to slice: a capable JS library eases the work. Ready-made components for large files already exist. A programmer's nature pushed me to reinvent the wheel, but time and work constraints made me give that up. In the end I chose Baidu's WebUploader to meet the front-end needs.

How to merge: before merging, one problem has to be solved first, namely how to tell which file a block belongs to. At first I had the front end generate a unique UUID to mark the file and attached it to every block request. However, I gave that up when implementing instant upload and used the file's MD5 to maintain the relationship between blocks and files.
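In this project the front end computes the MD5, but a server may also want to recompute it, for example to confirm the value sent by the client once all blocks have arrived. The following is a minimal sketch under that assumption. The class name FileMd5 is hypothetical, and the file is read through a small buffer so that a large file is never loaded into memory at once.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper: streams a file through MessageDigest and returns the MD5 as lowercase hex.
final class FileMd5 {

    static String md5Hex(Path file) throws IOException {
        MessageDigest md;
        try {
            md = MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
        byte[] buf = new byte[8192]; // fixed-size buffer, independent of the file size
        try (InputStream in = Files.newInputStream(file)) {
            int n;
            while ((n = in.read(buf)) != -1) {
                md.update(buf, 0, n);
            }
        }
        StringBuilder sb = new StringBuilder(32);
        for (byte b : md.digest()) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }
}
```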

Merging files on the server side and recording block state is a problem the industry has already solved. Take Xunlei as a reference: every download produces two files, one being the file body and the other a temporary file. The temporary file stores the state of each block in that block's corresponding byte.
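A temporary state file of this kind can be sketched with one byte per block, where 0 means "not received" and 1 means "received". The layout and the names ChunkStateFile, create, markDone, and isComplete are assumptions made for illustration. They describe neither Xunlei's format nor up6's.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sidecar file: byte i holds the state of block i (0 = missing, 1 = received).
final class ChunkStateFile {

    /** Creates the state file with every block marked as missing. */
    static void create(Path stateFile, int chunkCount) throws IOException {
        Files.write(stateFile, new byte[chunkCount]);
    }

    /** Marks one block as received by overwriting its byte in place. */
    static void markDone(Path stateFile, int chunkIndex) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(stateFile.toFile(), "rw")) {
            if (chunkIndex < 0 || chunkIndex >= raf.length()) {
                throw new IndexOutOfBoundsException("chunk " + chunkIndex);
            }
            raf.seek(chunkIndex);
            raf.writeByte(1);
        }
    }

    /** Returns true once every block has been marked as received. */
    static boolean isComplete(Path stateFile) throws IOException {
        for (byte b : Files.readAllBytes(stateFile)) {
            if (b != 1) {
                return false;
            }
        }
        return true;
    }
}
```

Because the state lives on disk, it survives a server restart as well as a closed browser, which is what makes resuming possible.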

All of this requires close cooperation between the front end and the back end. The front end splits the file into blocks of a fixed size and includes the block sequence number and size in each request. After the request reaches the back end, the server only needs to calculate the starting position from the block sequence number and the block size (which is fixed and identical for every block), and then write the block data into the file at that position.
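The server-side step described above can be sketched like this, assuming zero-based block sequence numbers. ChunkWriter and writeChunk are illustrative names, not part of WebUploader or up6.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Path;

// Hypothetical helper: writes one block at the position derived from its sequence number.
final class ChunkWriter {

    static void writeChunk(Path target, long chunkIndex, long chunkSize, byte[] data)
            throws IOException {
        long offset = chunkIndex * chunkSize; // start position = sequence number * fixed block size
        try (RandomAccessFile raf = new RandomAccessFile(target.toFile(), "rw")) {
            raf.seek(offset);
            raf.write(data);
        }
    }
}
```

Since every block has its own position, blocks can arrive in any order, and the file is complete once all of them have been written.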

To simplify development, I divided the server's business logic into initialization, block processing, upload completion, and so on.

The business logic module of the server is as follows

 

Functional Analysis:

Folder generation module

After the folder is uploaded, the server scans it. The code is as follows:

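import java.io.File;
import java.io.IOException;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.UUID;

// DbHelper and FileInf are project classes that are not shown in this article.

public class fd_scan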

{

    DbHelper db;

    Connection con;

    PreparedStatement cmd_add_f = null;

    PreparedStatement cmd_add_fd = null;

    public FileInf root = null;//root node

   

    public fd_scan()

    {

        this.db = new DbHelper();

        this.con = this.db.GetCon();       

    }

   

    public void makeCmdF()

    {

        StringBuilder sb = new StringBuilder();

        sb.append("insert into up6_files (");

        sb.append(" f_id");//1

        sb.append(",f_pid");//2

        sb.append(",f_pidRoot");//3

        sb.append(",f_fdTask");//4

        sb.append(",f_fdChild");//5

        sb.append(",f_uid");//6

        sb.append(",f_nameLoc");//7

        sb.append(",f_nameSvr");//8

        sb.append(",f_pathLoc");//9

        sb.append(",f_pathSvr");//10

        sb.append(",f_pathRel");//11

        sb.append(",f_md5");//12

        sb.append(",f_lenLoc");//13

        sb.append(",f_sizeLoc");//14

        sb.append(",f_lenSvr");//15

        sb.append(",f_perSvr");//16

        sb.append(",f_complete");//17

       

        sb.append(") values(");

       

        sb.append(" ?");

        sb.append(",?");

        sb.append(",?");

        sb.append(",?");

        sb.append(",?");

        sb.append(",?");

        sb.append(",?");

        sb.append(",?");

        sb.append(",?");

        sb.append(",?");

        sb.append(",?");

        sb.append(",?");

        sb.append(",?");

        sb.append(",?");

        sb.append(",?");

        sb.append(",?");

        sb.append(",?");

        sb.append(")");

 

        try {

            this.cmd_add_f = this.con.prepareStatement(sb.toString());

            this.cmd_add_f.setString(1, "");//id

            this.cmd_add_f.setString(2, "");//pid

            this.cmd_add_f.setString(3, "");//pidRoot

            this.cmd_add_f.setBoolean(4, true);//fdTask

            this.cmd_add_f.setBoolean(5, false);//f_fdChild

            this.cmd_add_f.setInt(6, 0);//f_uid

            this.cmd_add_f.setString(7, "");//f_nameLoc

            this.cmd_add_f.setString(8, "");//f_nameSvr

            this.cmd_add_f.setString(9, "");//f_pathLoc

            this.cmd_add_f.setString(10, "");//f_pathSvr

            this.cmd_add_f.setString(11, "");//f_pathRel

            this.cmd_add_f.setString(12, "");//f_md5

            this.cmd_add_f.setLong(13, 0);//f_lenLoc

            this.cmd_add_f.setString(14, "");//f_sizeLoc

            this.cmd_add_f.setLong(15, 0);//f_lenSvr            

            this.cmd_add_f.setString(16, "");//f_perSvr

            this.cmd_add_f.setBoolean(17, true);//f_complete

        } catch (SQLException e) {

            // TODO Auto-generated catch block

            e.printStackTrace ();

        }

    }

   

    public void makeCmdFD()

    {

        StringBuilder sb = new StringBuilder();

        sb.append("insert into up6_folders (");

        sb.append(" f_id");//1

        sb.append(",f_pid");//2

        sb.append(",f_pidRoot");//3

        sb.append(",f_nameLoc");//4

        sb.append(",f_uid");//5

        sb.append(",f_pathLoc");//6

        sb.append(",f_pathSvr");//7

        sb.append(",f_pathRel");//8

        sb.append(",f_complete");//9

        sb.append(") values(");//

        sb.append(" ?");

        sb.append(",?");

        sb.append(",?");

        sb.append(",?");

        sb.append(",?");

        sb.append(",?");

        sb.append(",?");

        sb.append(",?");

        sb.append(",?");

        sb.append(")");

 

        try {

            this.cmd_add_fd = this.con.prepareStatement(sb.toString());

            this.cmd_add_fd.setString(1, "");//id

            this.cmd_add_fd.setString(2, "");//pid

            this.cmd_add_fd.setString(3, "");//pidRoot

            this.cmd_add_fd.setString(4, "");//name

            this.cmd_add_fd.setInt(5, 0);//f_uid

            this.cmd_add_fd.setString(6, "");//pathLoc

            this.cmd_add_fd.setString(7, "");//pathSvr

            this.cmd_add_fd.setString(8, "");//pathRel

            this.cmd_add_fd.setBoolean(9, true);//complete

        } catch (SQLException e) {

            // TODO Auto-generated catch block

            e.printStackTrace ();

        }

    }

   

    protected void GetAllFiles(FileInf inf,String root)

    {

        File dir = new File(inf.pathSvr);

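        File [] allFile = dir.listFiles();

        if (allFile == null) return;//listFiles() returns null if the path is not a directory or an I/O error occurs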

        for(int i = 0; i < allFile.length; i++)

        {

            if(allFile[i].isDirectory())

            {

                FileInf fd = new FileInf();

                String uuid = UUID.randomUUID().toString();

                uuid = uuid.replace("-", "");

                fd.id = uuid;

                fd.pid = inf.id;

                fd.pidRoot = this.root.id;

                fd.nameSvr = allFile[i].getName();

                fd.nameLoc = fd.nameSvr;

                fd.pathSvr = allFile[i].getPath();

                fd.pathSvr = fd.pathSvr.replace("\\", "/");

                fd.pathRel = fd.pathSvr.substring(root.length() + 1);

                fd.perSvr = "100%";

                fd.complete = true;

                this.save_folder(fd);

               

                this.GetAllFiles(fd, root);

            }

            else

            {

                FileInf fl = new FileInf();

                String uuid = UUID.randomUUID().toString();

                uuid = uuid.replace("-", "");

                fl.id = uuid;

                fl.pid = inf.id;

                fl.pidRoot = this.root.id;

                fl.nameSvr = allFile[i].getName();

                fl.nameLoc = fl.nameSvr;

                fl.pathSvr = allFile[i].getPath();

                fl.pathSvr = fl.pathSvr.replace("\\", "/");

                fl.pathRel = fl.pathSvr.substring(root.length() + 1);

                fl.lenSvr = allFile[i].length();

                fl.lenLoc = fl.lenSvr;

                fl.perSvr = "100%";

                fl.complete = true;

                this.save_file(fl);

            }

        }

    }

   

    protected void save_file(FileInf f)

    {      

        try {

            this.cmd_add_f.setString(1, f.id);//id

            this.cmd_add_f.setString(2, f.pid);//pid

            this.cmd_add_f.setString(3, f.pidRoot);//pidRoot

            this.cmd_add_f.setBoolean(4, f.fdTask);//fdTask

            this.cmd_add_f.setBoolean(5, true);//f_fdChild

            this.cmd_add_f.setInt(6, f.uid);//f_uid

            this.cmd_add_f.setString(7, f.nameLoc);//f_nameLoc

            this.cmd_add_f.setString(8, f.nameSvr);//f_nameSvr

            this.cmd_add_f.setString(9, f.pathLoc);//f_pathLoc

            this.cmd_add_f.setString(10, f.pathSvr);//f_pathSvr

            this.cmd_add_f.setString(11, f.pathRel);//f_pathRel

            this.cmd_add_f.setString(12, f.md5);//f_md5

            this.cmd_add_f.setLong(13, f.lenLoc);//f_lenLoc

            this.cmd_add_f.setString(14, f.sizeLoc);//f_sizeLoc

            this.cmd_add_f.setLong(15, f.lenSvr);//f_lenSvr        

            this.cmd_add_f.setString(16, f.perSvr);//f_perSvr

            this.cmd_add_f.setBoolean(17, f.complete);//f_complete

            this.cmd_add_f.executeUpdate();

        } catch (SQLException e) {

            // TODO Auto-generated catch block

            e.printStackTrace ();

        }//

    }

   

    protected void save_folder(FileInf f)

    {

        try {

            this.cmd_add_fd.setString(1, f.id);//id

            this.cmd_add_fd.setString(2, f.pid);//pid

            this.cmd_add_fd.setString(3, f.pidRoot);//pidRoot

            this.cmd_add_fd.setString(4, f.nameSvr);//name

            this.cmd_add_fd.setInt(5, f.uid);//f_uid

            this.cmd_add_fd.setString(6, f.pathLoc);//pathLoc

            this.cmd_add_fd.setString(7, f.pathSvr);//pathSvr

            this.cmd_add_fd.setString(8, f.pathRel);//pathRel

            this.cmd_add_fd.setBoolean(9, f.complete);//complete

            this.cmd_add_fd.executeUpdate();

        } catch (SQLException e) {

            // TODO Auto-generated catch block

            e.printStackTrace ();

        }

    }

   

    public void scan(FileInf inf, String root) throws IOException, SQLException

    {

        this.makeCmdF();

        this.makeCmdFD();

        this.GetAllFiles(inf, root);

        this.cmd_add_f.close();

        this.cmd_add_fd.close();

        this.con.close();

    }

}

 

For block upload, block processing should be the simplest logic. Up6 has already divided the file into blocks and labeled each block's data. The labels include the block's index, size, and offset, the file MD5, the block MD5 (which must be enabled), and other information, so the server can process a block very conveniently after receiving it, for example by saving the block data in a distributed storage system.
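Before a block is stored, the server can check the received labels against the data. The sketch below shows the kind of checks that are possible. It is an assumption for illustration, not how up6 validates blocks: BlockValidator and its parameter names are hypothetical, and the block MD5 is treated as optional because it must be enabled.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical check of one block's labels against the received data.
final class BlockValidator {

    /** Returns null if the block is acceptable, otherwise a short reason for rejecting it. */
    static String validate(long offset, long declaredSize, long fileSize,
                           byte[] data, String blockMd5) {
        if (offset < 0 || offset > fileSize) return "offset out of range";
        if (declaredSize != data.length) return "size mismatch";
        if (offset + data.length > fileSize) return "block exceeds file length";
        // The block MD5 is optional: skip the comparison when it was not sent.
        if (blockMd5 != null && !blockMd5.isEmpty()
                && !blockMd5.equalsIgnoreCase(md5Hex(data))) {
            return "block MD5 mismatch";
        }
        return null;
    }

    static String md5Hex(byte[] data) {
        try {
            StringBuilder sb = new StringBuilder(32);
            for (byte b : MessageDigest.getInstance("MD5").digest(data)) {
                sb.append(String.format("%02x", b & 0xff));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }
}
```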

Block upload is the foundation of the whole project: features such as resumable upload and pause all depend on splitting the file into blocks.

Splitting into blocks is relatively simple. The front end uses WebUploader, which already encapsulates basic functions such as splitting and is easy to use.

With the file API that WebUploader provides, the front end is extremely simple.

Front-end HTML template

this.GetHtmlFiles = function()

{

     var acx = "";

     acx += '<div class="file-item" id="tmpFile" name="fileItem">\

                <div class="img-box"><img name="file" src="js/file.png"/></div>\

                   <div class="area-l">\

                       <div class="file-head">\

                            <div name="fileName" class="name">HttpUploader program development.pdf</div>\

                            <div name="percent" class="percent">(35%)</div>\

                            <div name="fileSize" class="size" child="1">1000.23MB</div>\

                    </div>\

                       <div class="process-border"><div name="process" class="process"></div></div>\

                       <div name="msg" class="msg top-space">15.3MB 20KB/S 10:02:00</div>\

                   </div>\

                   <div class="area-r">\

                    <span class="btn-box" name="cancel" title="Cancel"><img name="stop" src="js/stop.png"/><div>Cancel</div></span>\

                    <span class="btn-box hide" name="post" title="Continue"><img name="post" src="js/post.png"/><div>Continue</div></span>\

                       <span class="btn-box hide" name="stop" title="Stop"><img name="stop" src="js/stop.png"/><div>Stop</div></span>\

                       <span class="btn-box hide" name="del" title="Delete"><img name="del" src="js/del.png"/><div>Delete</div></span>\

                   </div>';

     acx += '</div>';

     acx += '<div class="file-item" name="folderItem">\

                   <div class="img-box"><img name="folder" src="js/folder.png"/></div>\

                   <div class="area-l">\

                       <div class="file-head">\

                            <div name="fileName" class="name">HttpUploader program development.pdf</div>\

                            <div name="percent" class="percent">(35%)</div>\

                            <div name="fileSize" class="size" child="1">1000.23MB</div>\

                    </div>\

                       <div class="process-border top-space"><div name="process" class="process"></div></div>\

                       <div name="msg" class="msg top-space">15.3MB 20KB/S 10:02:00</div>\

                   </div>\

                   <div class="area-r">\

                    <span class="btn-box" name="cancel" title="Cancel"><img name="stop" src="js/stop.png"/><div>Cancel</div></span>\

                    <span class="btn-box hide" name="post" title="Continue"><img name="post" src="js/post.png"/><div>Continue</div></span>\

                       <span class="btn-box hide" name="stop" title="Stop"><img name="stop" src="js/stop.png"/><div>Stop</div></span>\

                       <span class="btn-box hide" name="del" title="Delete"><img name="del" src="js/del.png"/><div>Delete</div></span>\

                   </div>';

     acx += '</div>';

     acx += '<div class="files-panel" name="post_panel">\

                   <div name="post_head" class="toolbar">\

                       <span class="btn" name="btnAddFiles">Select multiple files</span>\

                       <span class="btn" name="btnAddFolder">Select a folder</span>\

                       <span class="btn" name="btnPasteFile">Paste files and directories</span>\

                       <span class="btn" name="btnSetup">Install control</span>\

                   </div>\

                   <div class="content" name="post_content">\

                       <div name="post_body" class="file-post-view"></div>\

                   </div>\

                   <div class="footer" name="post_footer">\

                       <span class="btn-footer" name="btnClear">Clear completed files</span>\

                   </div>\

              </div>';

     return acx;

};

 

What is split must be merged. A large file has been split into blocks, but the blocks on their own cannot serve as the original file, so they must be merged back into it. It is enough to write each block into the file at its original position. As explained earlier, the block size and block sequence number are known, so the starting position of each block in the file can be calculated. RandomAccessFile is a sensible choice here because it can move back and forth inside a file. However, most of the functions of RandomAccessFile have been superseded by the memory-mapped files of NIO introduced in JDK 1.4. In this project I implemented merging with both RandomAccessFile and MappedByteBuffer; the corresponding methods are uploadFileRandomAccessFile and uploadFileByMappedByteBuffer.
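The memory-mapped variant can be sketched as follows. This is not the project's uploadFileByMappedByteBuffer method, whose code is not reproduced in this article. It is a minimal illustration with hypothetical names (MappedChunkWriter, writeChunk), using FileChannel.map to map only the region that the block occupies.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper: writes one block through a memory-mapped region of the target file.
final class MappedChunkWriter {

    static void writeChunk(Path target, long chunkIndex, long chunkSize, byte[] data)
            throws IOException {
        if (data.length == 0) {
            return; // nothing to map or write
        }
        long offset = chunkIndex * chunkSize;
        // READ_WRITE mapping needs a channel opened for both reading and writing.
        try (FileChannel ch = FileChannel.open(target,
                StandardOpenOption.CREATE, StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            // Mapping a region beyond the current end of the file extends the file.
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_WRITE, offset, data.length);
            buf.put(data);
            buf.force(); // flush the mapped region to the storage device
        }
    }
}
```

Mapping one block at a time keeps the mapped region small. Note that a mapping stays valid until the buffer is garbage-collected, which on some platforms delays deleting or truncating the file.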

Instant upload function

Server logic

Most people have probably seen instant upload in action: when uploading to a network disk, some files finish uploading within seconds. Anyone who has looked into the principle knows that it works by checking the file's MD5. The system records the MD5 of every file uploaded to it. Before a file is uploaded, the MD5 of its content (or of part of its content) is computed and then looked up in the system.

The principle is implemented as follows. After the client selects a file and clicks upload, the file's MD5 value is computed. Once the MD5 is available, the client calls a system interface (/index/checkFileMd5) to query whether that MD5 already exists. (In this project I use Redis to store the data, with the file's MD5 value as the key and the file's storage address as the value.) The interface returns the check status, and the client then proceeds to the next step.
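The lookup behind such a check can be sketched as follows. To keep the example self-contained, an in-memory ConcurrentHashMap stands in for Redis, which is a deliberate substitution. InstantUploadIndex, register, and lookup are hypothetical names, and the real /index/checkFileMd5 handler is not shown in this article.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical index: file MD5 -> storage address. A map stands in for Redis here.
final class InstantUploadIndex {

    private final Map<String, String> pathByMd5 = new ConcurrentHashMap<>();

    /** Records a fully uploaded file so that later uploads of the same content can be skipped. */
    void register(String md5, String storedPath) {
        pathByMd5.put(md5.toLowerCase(Locale.ROOT), storedPath);
    }

    /** Returns the storage address if a file with this MD5 is already on the server. */
    Optional<String> lookup(String md5) {
        return Optional.ofNullable(pathByMd5.get(md5.toLowerCase(Locale.ROOT)));
    }
}
```

If lookup returns a value, the server can answer "already uploaded" immediately and only has to create a new record that points to the existing file.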

The front end also computes the MD5 value with WebUploader's built-in function, which makes it a handy tool.

After the control has calculated the file's MD5, it triggers the md5_complete event and passes the MD5 value. Developers only need to handle this event.

 


Up6 handles resumable upload automatically, so no separate development is required.

Receive these parameters in f_post.jsp and process them. Developers only need to pay attention to business logic, not other aspects.

 

Resumable upload deals with an interrupted upload: a human action (pause) or an external failure (a dropped or poor network connection) causes the upload to fail partway. When the environment recovers, the upload resumes from where it stopped instead of starting over.

As mentioned earlier, resumable upload is implemented on top of block upload. A large file is divided into many small blocks, and the server persists each block that is uploaded successfully. When the client starts uploading a file, it calls the interface for a quick check and skips the blocks that are already on the server.

The implementation principle is as follows: before a file is uploaded, its MD5 value is obtained and the interface is called (/index/checkFileMd5, which is also the instant-upload check interface). If the file's status is incomplete, the interface returns the numbers of all blocks that have not been uploaded, and the front end then filters on them and uploads only the missing blocks.
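Computing the list of blocks that still have to be uploaded can be sketched with a BitSet that has one bit per block. How the server actually stores this state is not described in the article, so the representation and the names MissingChunks and missing are assumptions.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Hypothetical helper: lists the block numbers that have not been received yet.
final class MissingChunks {

    static List<Integer> missing(BitSet received, int chunkCount) {
        List<Integer> result = new ArrayList<>();
        // nextClearBit jumps straight to the next block that is still missing.
        for (int i = received.nextClearBit(0); i < chunkCount; i = received.nextClearBit(i + 1)) {
            result.add(i);
        }
        return result;
    }

    public static void main(String[] args) {
        BitSet received = new BitSet();
        received.set(0);
        received.set(2);
        System.out.println(missing(received, 5)); // prints "[1, 3, 4]"
    }
}
```

An empty list means the file is complete, which is the same condition the instant-upload check relies on.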

When a file block is received, it can be written directly to the file on the server.

This is the effect after uploading the folder

This is the storage structure on the server after the folder is uploaded

Reference article: http://blog.ncmem.com/wordpress/2019/08/12/java-http%E5%A4%A7%E6%96%87%E4%BB%B6%E6%96%AD%E7%82%B9%E7%BB%AD%E4%BC%A0%E4%B8%8A%E4%BC%A0/

Welcome to join the group to discuss "374992201"


Origin blog.csdn.net/weixin_45525177/article/details/108593386