Finally sorting out the knowledge points of front-end large file upload and resumable upload, with source code

Preface

Whether in interviews or in day-to-day work, we run into the problem of uploading large files. In a previous interview I was asked how to handle uploading a large file (an Excel file), and my stumbling answer pretty much ended the interview on the spot. I recalled the whole thing recently and took some time to put together this demo. The article covers not only the thought process but also the source code at the end.

Front end: Vue.js, Element-UI

Back end: Node.js, Express, fs

Ideas

Front end

Large file upload

  • Convert the large file into a binary stream
  • Use the slice capability of the stream to split it into multiple chunks
  • Wrap each chunk into its own request and send the requests in parallel or serially
  • Once we detect that all chunk requests have succeeded, send a merge signal to the server

Resumable upload

  • Give each file chunk a unique identifier
  • When a chunk uploads successfully, record its identifier
  • When the upload is paused or a request fails, only the chunks that have not yet uploaded successfully need to be re-sent

Back end

  • Receive each chunk, save it to a designated location once it is received successfully, and tell the front end it was received
  • On receiving the merge signal, sort and merge all the chunks into the final large file, delete the small chunk files, and return the address of the large file to the front end (a minimal Express sketch follows this list)
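
The back-end code is not shown in this section, but as a rough idea, a minimal sketch with Express might look like the following. The routes /upload and /merge, the uploads/ directory, and the use of the multiparty package to parse the multipart form are my own assumptions, not the author's actual implementation.

// Minimal server-side sketch (assumed routes and directory layout)
const express = require('express')
const multiparty = require('multiparty')
const fs = require('fs')
const path = require('path')

const app = express()
const UPLOAD_DIR = path.resolve(__dirname, 'uploads') // assumed chunk directory

// Receive one chunk and store it under its filename, e.g. <hash>_<index>.<suffix>
app.post('/upload', (req, res) => {
  new multiparty.Form().parse(req, (err, fields, files) => {
    if (err) return res.json({ code: 1, msg: 'parse error' })
    const [filename] = fields.filename
    const [chunk] = files.chunk
    fs.mkdirSync(UPLOAD_DIR, { recursive: true })
    fs.copyFileSync(chunk.path, path.join(UPLOAD_DIR, filename))
    res.json({ code: 0, msg: 'chunk received' })
  })
})

// Merge signal: sort the chunks by index, concatenate them, delete the chunk files
app.get('/merge', (req, res) => {
  const { hash, suffix } = req.query
  const chunkNames = fs.readdirSync(UPLOAD_DIR)
    .filter(name => name.startsWith(`${hash}_`))
    .sort((a, b) => a.split('_')[1].split('.')[0] - b.split('_')[1].split('.')[0])
  const target = path.join(UPLOAD_DIR, `${hash}.${suffix}`)
  chunkNames.forEach(name => {
    fs.appendFileSync(target, fs.readFileSync(path.join(UPLOAD_DIR, name)))
    fs.unlinkSync(path.join(UPLOAD_DIR, name))
  })
  res.json({ code: 0, url: `/uploads/${hash}.${suffix}` })
})

app.listen(3000)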

To be honest, if you understand the ideas above, you have already learned 80% of large file upload + resumable upload. The specific implementation that follows will be a trivial matter for you...

Large file upload code part

In the html part we use the Element-UI upload component; the code is easy to understand:

<el-upload
      drag
      action
      :auto-upload="false" 
      :show-file-list="false" 
      :on-change="changeFile"
      >
      <i class="el-icon-upload"></i>
      <div class="el-upload__text">Drag the file here, or <em>click to upload</em></div>
</el-upload>

For the js logic, based on the analysis above we can sketch out the following structure:

methods: {
    // Triggered after a file is selected
    changeFile() {
        this.filepParse()

        // coding... split the file into chunks
        // ...
        
        // Create and send the chunk requests;
        // mergeUpload() is triggered once every chunk has succeeded
        this.createSendQeq()
        this.sendQeq()
    },
    // Turn the file into binary so it can be sliced later
    filepParse() {
    },
    // Create the chunk requests
    createSendQeq() {
    },
    // Send each chunk in parallel/serial fashion
    sendQeq() {
    },
    // Send the merge request
    mergeUpload() {
    }
  }

With this skeleton in place, all that is left is to fill in the logic of each method.
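
The later snippets read and write this.hash, this.partList and this.count. The article does not show the component's data(), but presumably it holds at least something like the following (the comments are my interpretation):

data() {
  return {
    hash: '',      // md5 hash of the whole file, used to name the chunks
    partList: [],  // chunk requests that still need to be uploaded
    count: 0       // number of chunks uploaded successfully
  }
},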

Turn the file into binary to facilitate subsequent fragmentation

The common binary formats in js are Blob, ArrayBuffer and Buffer. Instead of the Blob commonly used in other articles, we use ArrayBuffer here. And because parsing the file can take a while, we wrap it in a Promise and handle it asynchronously.

filepParse(file, type) {
  // Map the requested output type to the matching FileReader method
  const caseType = {
    'base64': 'readAsDataURL',
    'buffer': 'readAsArrayBuffer'
  }
  const fileRead = new FileReader()
  return new Promise(resolve => {
    fileRead[caseType[type]](file)
    // Resolve with the parsed result once reading finishes
    fileRead.onload = (res) => {
      resolve(res.target.result)
    }
  })
}

Fragmenting large files

Once we have the binary data, we can slice it up, which is as convenient as working with an array.

Of course, when we split a large file into slices we also have to think about merging them back together, so the split must follow a predictable naming rule, such as 1-1, 1-2, 1-3, and so on. Then, when the server has the slice data and receives the merge signal, it can sort the slices and merge them.

At the same time, to avoid the same file being uploaded more than once (for example after it has been renamed), we introduce spark-md5 to generate a hash from the actual file content.

const buffer = await this.filepParse(file, 'buffer')

const sparkMD5 = new SparkMD5.ArrayBuffer()

sparkMD5.append(buffer)
this.hash = sparkMD5.end()

So when we name each slice, we use names like hash_1 and hash_2.

There are two ways to divide a large file: a fixed number of slices or a fixed slice size. Here we take the simpler approach, a fixed number of slices, as the example.

// Fixed number of slices: cut the file into 10 chunks
const partSize = Math.ceil(file.size / 10)
// The suffix is assumed to come from the original file name
const suffix = /\.([0-9a-zA-Z]+)$/.exec(file.name)[1]
const partList = []
let current = 0

for (let i = 0; i < 10; i++) {
  const reqItem = {
    chunk: file.slice(current, current + partSize),
    filename: `${this.hash}_${i}.${suffix}`
  }
  current += partSize
  partList.push(reqItem)
}
this.partList = partList

Once the large file has been cut using the fixed-number-of-slices approach and the chunks are saved into an array, we can wrap each chunk in a request. For comparison, a sketch of the fixed-slice-size variant follows.
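
The other approach mentioned above, a fixed slice size, is only a small variation. The 1 MB chunk size below is an arbitrary assumption rather than a value from the article:

// Fixed slice size variant (1 MB per chunk is an assumed value)
const CHUNK_SIZE = 1024 * 1024
const count = Math.ceil(file.size / CHUNK_SIZE)
const partList = []

for (let i = 0; i < count; i++) {
  partList.push({
    chunk: file.slice(i * CHUNK_SIZE, (i + 1) * CHUNK_SIZE),
    filename: `${this.hash}_${i}.${suffix}`
  })
}
this.partList = partList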

Create slice request

Note that the data we send out must be in FormData format.

createSendQeq() {
    const reqPartList = []
    this.partList.forEach((item, index) => {
      // Wrap each chunk in a function that sends it as FormData when invoked
      const reqFn = () => {
        const formData = new FormData();
        formData.append("chunk", item.chunk);
        formData.append("filename", item.filename);
        return axios.post("/upload", formData, {
          headers: {"Content-Type": "multipart/form-data"}
        }).then(res => {
          console.log(res)
        })
      }
      reqPartList.push(reqFn)
    })
    return reqPartList
}

Send each slice in parallel/serial mode

Now the slices have been cut and the requests have been wrapped. We have two options, parallel or serial; since serial is easier to understand, we use serial as the example here.

Every time a request succeeds, we increment the index i to mark that chunk as sent. When i equals the number of slices, we consider the whole upload finished and trigger the merge request.

sendQeq() {
  const reqPartList = this.createSendQeq()
  let i = 0
  const send = async () => {
    if (i >= reqPartList.length) {
      // All chunks are uploaded, ask the server to merge them
      this.mergeUpload()
      return
    }
    // Wait for the current chunk to finish before sending the next one
    await reqPartList[i]()
    i++
    send()
  }
  send()
}

Of course, the biggest disadvantage of serial upload is that it is slower than parallel upload, but the advantage is that the code is simple and easy to understand. A parallel sketch follows.
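
The parallel version is not shown in the article; as a rough sketch of my own (not the author's code), the same request list could be fired off at once with Promise.all:

// Parallel variant (my own sketch): send every chunk request at once,
// then trigger the merge after all of them have resolved
async sendQeqParallel() {
  const reqPartList = this.createSendQeq()
  await Promise.all(reqPartList.map(reqFn => reqFn()))
  this.mergeUpload()
}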

Resumable upload code part

Once the ideas above are clear, this part of the code is also easy to work out. When the user clicks Pause we stop uploading; when they click Continue we upload the remaining requests. To do that, we need to handle the requests that have already succeeded:

if (res.data.code === 0) {
    this.count += 1;
    // Remove a chunk from the list once it has been uploaded
    this.partList.splice(index, 1);
}

If a chunk uploads successfully, we strip it out of the request array, so whatever remains is exactly what still needs to be uploaded. That is the core of resumable upload; the rest is just business logic. A sketch of the pause/continue handlers follows.
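
The pause/continue handlers themselves are not shown in the article. Assuming the component keeps a boolean flag and the serial loop in sendQeq checks it before sending the next chunk, they might look like this (all names here are hypothetical):

// Hypothetical pause/continue handlers (names are assumptions)
pauseUpload() {
  // the serial loop is assumed to check this flag before sending the next chunk
  this.isPaused = true
},
continueUpload() {
  this.isPaused = false
  // partList now only holds the chunks that were not uploaded yet,
  // so calling sendQeq() again resumes from where we stopped
  this.sendQeq()
},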

Summary of remaining problems

The current example only demonstrates a simple idea; there are still questions left open, such as:

  • What should we do if a single slice fails to upload?
  • How do we upload the slices in parallel (which is faster)?
  • The number of slices is currently fixed; how would we handle a fixed slice size instead?
  • What happens if the page is refreshed in the middle of an upload?
  • How can a Web Worker be used to help handle large file uploads?
  • How do we implement instant upload (skipping a file the server already has)?