Large file upload/download

1. Introduction

Uploading and downloading large files is a perennial front-end topic: it is common in real projects and frequently asked in interviews. This article walks through the approach to large-file upload/download and the front-end implementation code.

2. Chunked upload

Overall process

  1. Slice the file: after a file is selected, call slice on the file object to cut it into chunks of a specified size. This works because File inherits from Blob, so every File has Blob.prototype.slice().
  2. Use spark-md5.min.js to compute the hash of the entire file. This step can run inside a web worker so that it does not freeze the browser.
  3. Send the file hash to the backend to check whether the file already exists:
    If it exists, the "instant upload" (秒传) condition is met and the upload succeeds immediately.
    If it does not exist, query whether any chunks of this file have been uploaded (断点续传, resumable upload); if some exist, return the details of those chunks.
  4. If nothing has been uploaded yet, upload all chunks; if some chunks already exist, upload only the remaining ones (with concurrency control and resuming).
  5. After all chunks are uploaded, notify the backend to merge them; once merging is done, the backend returns the file address.

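The steps above can be condensed into a runnable sketch. Everything on the `api` object (`calculateHash`, `checkFileExists`, `getUploadedPartList`, `uploadChunk`, `mergeChunks`) is a hypothetical stand-in for the real worker/backend calls, and concurrency control is omitted here for brevity:

```javascript
// Sketch of the whole upload flow described above. The `api` object bundles
// hypothetical backend calls; all of its method names are illustrative.
async function uploadLargeFile(file, api, chunkSize = 1024 * 50) {
  // 1. slice the file
  const chunks = []
  for (let cur = 0; cur < file.size; cur += chunkSize) {
    chunks.push(file.slice(cur, cur + chunkSize))
  }
  // 2. hash the whole file (done in a web worker in the real code)
  const hash = await api.calculateHash(chunks)
  // 3. "instant upload": the server already has the file
  if (await api.checkFileExists(hash)) return { hash }
  // 3b/4. resumable upload: only send chunks the server is missing
  const uploaded = await api.getUploadedPartList(hash)
  for (let i = 0; i < chunks.length; i++) {
    if (!uploaded.includes(i)) await api.uploadChunk(hash, i, chunks[i])
  }
  // 5. ask the backend to merge the chunks
  return { hash, url: await api.mergeChunks(hash) }
}
```

Node 18+ exposes `Blob` globally, so the sketch can be exercised outside a browser as well.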

1. File Slicing

/**
 * Split a file into chunks
 * @param file  the File (Blob) object to split
 * @param size  chunk size in bytes
 * @returns {Blob[]} the list of chunks
 */
const CHUNK_SIZE = 1024 * 50 // 50KB per chunk
function splitFile(file, size = CHUNK_SIZE) {
  const fileChunkList = []
  let curChunkIndex = 0
  while (curChunkIndex < file.size) {
    const chunk = file.slice(curChunkIndex, curChunkIndex + size)
    fileChunkList.push(chunk)
    curChunkIndex += size
  }
  return fileChunkList
}
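A quick sanity check of the slicing logic (restated here so it runs standalone; Node 18+ exposes `Blob` globally, while in the browser the file object comes from an `<input type="file">`):

```javascript
const CHUNK_SIZE = 1024 * 50 // 50KB per chunk

function splitFile(file, size = CHUNK_SIZE) {
  const fileChunkList = []
  let cur = 0
  while (cur < file.size) {
    fileChunkList.push(file.slice(cur, cur + size))
    cur += size
  }
  return fileChunkList
}

const fakeFile = new Blob([new Uint8Array(1024 * 120)]) // a 120KB "file"
const chunks = splitFile(fakeFile)
console.log(chunks.length)  // 3 (50KB + 50KB + 20KB)
console.log(chunks[2].size) // 20480
```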

2. Determine whether the file exists (full verification)

Example of web worker usage:

// Main-thread code
const worker = new Worker('worker.js') // worker.js is the script to run, resolved relative to the page
worker.postMessage(files) // pass the file object to the worker thread
worker.onmessage = (e) => {
  // receive the data the worker posts back
  console.log(e.data) // the payload is in e.data
}

// worker.js  (worker-side code)
onmessage = (e) => {
  const files = e.data // once we have the file object we can slice it, hash it, and so on
}

Here we still slice the file in the main thread, but move the file-existence verification (the hash calculation) into the web worker.

// Compute the hash of the file
const chunkList = splitFile(file) // the chunks produced above
function calculateHash(chunkList) {
  return new Promise(resolve => {
    const worker = new Worker('/hash.js')
    worker.postMessage({ chunkList })
    worker.onmessage = e => {
      const { hash } = e.data
      if (hash) {
        resolve(hash)
      }
    }
  })
}

In hash.js (the web worker script), import spark-md5.min.js and feed each chunk into the MD5 calculation.
Note:

  • The MD5 value of a file is unique to its content; spark.end() returns the MD5 of everything appended so far, i.e. the MD5 of the whole file.
  • Browsers limit the number of concurrent requests per origin. If all chunk requests are fired at once, requests beyond the limit queue up and may time out, so we throttle chunk uploads with a concurrency pool.
/**
 * Web worker: verify the file by computing its MD5 hash chunk by chunk
 * @param e message event carrying { chunkList }
 */
self.onmessage = e => {
  self.importScripts("/spark-md5.min.js");
  const { chunkList } = e.data;
  const spark = new self.SparkMD5.ArrayBuffer();
  let percentage = 0;
  let count = 0;
  const loadNext = index => {
    const reader = new FileReader();
    reader.readAsArrayBuffer(chunkList[index]);
    reader.onload = e => {
      count++;
      spark.append(e.target.result);
      if (count === chunkList.length) {
        self.postMessage({
          percentage: 100,
          hash: spark.end()
        });
        self.close();
      } else {
        percentage += 100 / chunkList.length;
        self.postMessage({
          percentage: Number.parseFloat(percentage).toFixed(2)
        });
        // process the next chunk recursively
        loadNext(count);
      }
    };
  };
  loadNext(0);
};

3. Slice upload

    // Get which chunks have already been uploaded; hash is the MD5 of the whole file
    const uploadRes = await getUploadedPartList(hash)
    let fileData = fileChunkList.map((file, index) => ({
      fileHash: hash, // unique hash of the whole file
      hash,
      file, // the chunk Blob
      index,
      size: file.size, // chunk size
      percentage: uploadRes.data.includes(index) ? 100 : 0
    }))
    // Start uploading the chunks
    let targetRes = await uploadChunks(uploadRes.data, hash, fileData, file, loaded => {
      onProgress(parseInt((loaded / file.size).toFixed(2)))
    })
    resolve(targetRes)

// Upload the chunks and report overall progress
/**
 * @param uploadList chunk indexes that are already uploaded
 * @param hash       MD5 of the whole file
 * @param fileData   the chunk descriptors built above
 * @param fileOriginalData  the original File object
 * @param onProgress progress callback
 * @returns {Promise<any>}
 */
async function uploadChunks(uploadList = [], hash, fileData = [], fileOriginalData, onProgress) {
  const pool = []     // concurrency pool
  const max = 3       // maximum concurrency
  let finish = 0      // number of settled requests
  const failList = [] // failed chunks, kept for a retry pass

  // Keep only the chunks that are not uploaded yet and build their form data
  const requestList = fileData
    .filter(({ index }) => !uploadList.includes(index))
    .map(({ file, fileHash, index }) => {
      const formData = new FormData()
      formData.append('file', file)
      formData.append('uploadId', fileHash)
      formData.append('partNumber', index)
      return { formData, index }
    })

  // Concurrency control and resumable upload
  for (let i = 0; i < requestList.length; i++) {
    const item = requestList[i]
    const task = request({
      url: 'xxxx',
      method: 'POST',
      data: item.formData,
      isUpload: 1,
      // track this chunk's upload progress
      onUploadProgress: e => {
        fileData[item.index].percentage = parseInt(String((e.loaded / e.total) * 100))
        const loaded = fileData.map(i => i.size * i.percentage).reduce((acc, cur) => acc + cur)
        onProgress(loaded)
      }
    }).catch(() => {
      failList.push(item)
    }).finally(() => {
      // remove the settled promise from the pool
      pool.splice(pool.indexOf(task), 1)
      finish++
      // once every request has settled, notify the backend to merge the file
      if (finish === requestList.length) {
        mergeRequestList(hash, fileOriginalData)
      }
    })
    pool.push(task)
    if (pool.length === max) {
      // whenever the pool finishes a task, admit the next one
      await Promise.race(pool)
    }
  }
  // wait for the tail of the pool to drain
  await Promise.all(pool)
}
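The pool logic can be exercised in isolation. A minimal, self-contained mirror of the loop above (with `setTimeout` standing in for the axios request, and failure indexes collected instead of aborting the whole upload):

```javascript
// Run `taskFns` (functions returning promises) with at most `max` in flight.
async function runPool(taskFns, max) {
  const pool = []
  const failList = []
  let finish = 0
  for (const [i, taskFn] of taskFns.entries()) {
    const p = taskFn()
      .catch(() => failList.push(i)) // collect failed chunk indexes for retry
      .finally(() => {
        finish++
        pool.splice(pool.indexOf(p), 1) // free the slot
      })
    pool.push(p)
    if (pool.length === max) await Promise.race(pool) // wait for a free slot
  }
  await Promise.all(pool) // drain what's left
  return { finish, failList }
}

// Five fake uploads; the third one fails
const tasks = [0, 1, 2, 3, 4].map(i => () =>
  new Promise((resolve, reject) =>
    setTimeout(i === 2 ? () => reject(new Error('net')) : resolve, 5)))
runPool(tasks, 3).then(r => console.log(r)) // { finish: 5, failList: [ 2 ] }
```

Pushing the already-handled chain (`catch` + `finally`) into the pool, rather than the raw request promise, keeps `Promise.race(pool)` from rejecting when one chunk fails.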

// After all chunks are uploaded, notify the backend to merge the file
function mergeRequestList(hash, file) {
  return new Promise(async (resolve, reject) => {
    let { data } = await request({
      url: 'xxx',
      method: 'POST',
      data: { uploadId: hash, fileName: file.name }, // payload shape depends on your backend
      isUpload: 0
    })
    if (data.success) {
      Message.success('Upload succeeded!')
      resolve(data.data)
    } else {
      reject(data.message)
    }
  })
}
/**
 * Custom axios wrapper
 * @param url
 * @param method
 * @param data
 * @param isUpload
 * @param onUploadProgress
 * @returns {Promise<any>}
 */
function request({ url, method = 'post', data, isUpload, onUploadProgress = e => e }) {
  const service = axios.create({
    baseURL,
    timeout: 0,
    onUploadProgress,
    headers: {
      'Content-Type': 'application/json'
    },
  })
  if (isUpload) {
    // set headers on this instance only, not on the global axios defaults
    service.defaults.headers.post['Content-Type'] = 'multipart/form-data'
    service.defaults.headers.post['UpLoadFile'] = '1'
  }
  service.defaults.headers.common['Authorization'] = `Bearer ${token}`
  service.interceptors.response.use(response => {
    if (response) {
      return Promise.resolve(response)
    }
  }, error => {
    return Promise.reject(error)
  })
  return service.request({
    url,
    method,
    data
  })
}


3. Chunked download

Principle:
The server uses the Accept-Ranges response header to advertise that it supports range (partial) requests; the header's value specifies the unit in which ranges are expressed.
When the browser sees the Accept-Ranges header, it can try to resume an interrupted download instead of restarting it.

The server needs to support the Range request header

Description of Range

A single Range header may request multiple ranges at once; the server then responds with a multipart body.
Status 206 Partial Content: the server returned the requested range.
Status 416 Range Not Satisfiable: the requested range is invalid, which is a client error.
Status 200 OK: the server is allowed to ignore the Range header and return the whole file.

Syntax

Range: <unit>=<range-start>-
Range: <unit>=<range-start>-<range-end>
Range: <unit>=<range-start>-<range-end>, <range-start>-<range-end>
Range: <unit>=<range-start>-<range-end>, <range-start>-<range-end>, <range-start>-<range-end>
  • unit : The unit used by the range request, usually bytes.
  • range-start : The starting value of the range, an integer.
  • range-end : The end value of the range, optional, if not present, it will continue to the end of the file.


Simple implementation

When a URL points to a resource, the browser renders it if it can parse it and downloads it otherwise: .html/.jpg/.mp4 render in a tab, while .rar/.zip and the like download directly.

If you want every resource to download directly, regardless of media type, add Content-Disposition: attachment to the response header.

app.use(async (ctx, next) => {
    
    
  fileFilter(ctx)
  await next()
})
function fileFilter(ctx){
    
    
  const url = ctx.request.url
  const p = /^\/files\//
  if(p.test(url)){
    
    
    ctx.set('Accept-Ranges', 'bytes')
    ctx.set('Content-Disposition', 'attachment')
  }
}
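Koa itself is not required to check the filter's behaviour; a minimal mock of ctx (an assumption standing in for Koa's real context object) is enough:

```javascript
const p = /^\/files\//

function fileFilter(ctx) {
  if (p.test(ctx.request.url)) {
    ctx.set('Accept-Ranges', 'bytes')
    ctx.set('Content-Disposition', 'attachment')
  }
}

// Minimal stand-in for Koa's ctx: records the headers that were set
function mockCtx(url) {
  const headers = {}
  return { request: { url }, set: (k, v) => { headers[k] = v }, headers }
}

const hit = mockCtx('/files/movie.mp4')
fileFilter(hit)
console.log(hit.headers) // { 'Accept-Ranges': 'bytes', 'Content-Disposition': 'attachment' }

const miss = mockCtx('/api/list')
fileFilter(miss)
console.log(miss.headers) // {}
```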

Resumable download

The overall process is as follows:

[image: resumable download flow]

  1. Get the file size
    First, get the total size of the file so that we can calculate the chunk ranges and download the file in pieces.
// 获取待下载文件的大小
async getFileSize (name = this.fileName) {
  try {
    const res = await http.get(`/size/${name}`)
    this.fileSize = res.data.data
    return res.data.data
  } catch (error) {
    console.log({ error })
  }
}
  2. Calculate the number of chunks from the file size and chunk size
    const CHUNK_SIZE = 10 * 1024 * 1024 // 10MB per chunk
    async onDownload () {
      try {
        // compute the number of chunks from the file size and chunk size
        const fileSize = await this.getFileSize(this.fileName)
        const chunksCount = Math.ceil(fileSize / CHUNK_SIZE)
        // download the chunks concurrently with asyncPool
        const results = await asyncPool(3, [...new Array(chunksCount).keys()], (i) => {
          const start = i * CHUNK_SIZE
          // Range ends are inclusive, so the last byte index is fileSize - 1
          const end = i + 1 === chunksCount ? fileSize - 1 : (i + 1) * CHUNK_SIZE - 1
          return this.getBinaryContent(start, end, i)
        })
        results.sort((a, b) => a.index - b.index)
        // collect the chunk payloads in order
        const buffers = results.map((r) => r.data.data)
        // merge the chunks and download the file
        saveFile(this.fileName, buffers)
      } catch (error) {
        console.log({ error })
      }
    }
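The start/end arithmetic is easy to sanity-check on its own (Range ends are inclusive, so the last byte index of the file is fileSize - 1):

```javascript
const CHUNK_SIZE = 10 * 1024 * 1024 // 10MB per chunk

// Compute the inclusive byte range of every chunk for a file of fileSize bytes
function chunkRanges(fileSize) {
  const chunksCount = Math.ceil(fileSize / CHUNK_SIZE)
  return [...Array(chunksCount).keys()].map(i => ({
    start: i * CHUNK_SIZE,
    end: i + 1 === chunksCount ? fileSize - 1 : (i + 1) * CHUNK_SIZE - 1
  }))
}

// A 25MB file splits into two full 10MB chunks plus a 5MB tail
console.log(chunkRanges(25 * 1024 * 1024))
// [ { start: 0,        end: 10485759 },
//   { start: 10485760, end: 20971519 },
//   { start: 20971520, end: 26214399 } ]
```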

The code above uses asyncPool to download the file chunks concurrently. The function is implemented as follows:

async function asyncPool(poolLimit, array, iteratorFn) {
  const allTask = [] // every task, in input order
  const executing = [] // tasks currently in flight
  for (const item of array) {
    // wrap iteratorFn in a promise to create the task
    const p = Promise.resolve().then(() => iteratorFn(item, array))
    allTask.push(p) // keep the new task

    // only throttle when poolLimit is no larger than the total number of tasks
    if (poolLimit <= array.length) {
      // when a task settles, remove it from the executing array
      const e = p.then(() => executing.splice(executing.indexOf(e), 1))
      executing.push(e) // track the in-flight task
      if (executing.length >= poolLimit) {
        await Promise.race(executing) // wait for the fastest task to settle
      }
    }
  }
  return Promise.all(allTask)
}
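Because asyncPool returns Promise.all over the tasks in input order, the results come back in order even though tasks overlap. For example, doubling five numbers with at most two "requests" in flight (asyncPool restated so the demo runs standalone):

```javascript
async function asyncPool(poolLimit, array, iteratorFn) {
  const allTask = []
  const executing = []
  for (const item of array) {
    const p = Promise.resolve().then(() => iteratorFn(item, array))
    allTask.push(p)
    if (poolLimit <= array.length) {
      const e = p.then(() => executing.splice(executing.indexOf(e), 1))
      executing.push(e)
      if (executing.length >= poolLimit) await Promise.race(executing)
    }
  }
  return Promise.all(allTask)
}

// Double five numbers, at most two tasks in flight at a time
asyncPool(2, [1, 2, 3, 4, 5], i =>
  new Promise(resolve => setTimeout(() => resolve(i * 2), 5))
).then(results => console.log(results)) // [ 2, 4, 6, 8, 10 ]
```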
  3. Request the chunk content
    Set start and end in the Range request header to fetch the corresponding byte range of the file. Because all chunks must be merged after downloading, we also record each chunk's index.
    /**
     * Download one chunk
     * @param {*} start   first byte of the range (inclusive)
     * @param {*} end     last byte of the range (inclusive)
     * @param {*} i       chunk index, used when merging
     * @param {*} ifRange whether to send a Range header
     */
    async getBinaryContent (start, end, i, ifRange = true) {
      try {
        let options = {
          responseType: "arraybuffer",
        }
        // for a chunked download, add the Range request header
        if (ifRange) {
          options.headers = {
            Range: `bytes=${start}-${end}`
          }
        }
        const result = await http.get(`/down/${this.fileName}`, options)
        return { index: i, data: result }
      } catch (error) {
        return {}
      }
    },
  4. Merge and download the file
    The chunk data we receive is of type ArrayBuffer, which cannot be manipulated directly, so we use a typed array (Uint8Array) to merge the chunk data, and finally generate a blob URL to download the file.
const saveFile = (name, buffers, mime = 'application/octet-stream') => {
  // merge the chunks; `result` holds the combined file data
  if (!buffers.length) return
  const chunks = buffers.map((b) => new Uint8Array(b)) // wrap each ArrayBuffer
  const totalLength = chunks.reduce((acc, value) => acc + value.length, 0)
  const result = new Uint8Array(totalLength)
  let length = 0
  for (const array of chunks) {
    result.set(array, length)
    length += array.length
  }
  // trigger the download via a temporary anchor element
  const blob = new Blob([result], { type: mime })
  const blobUrl = URL.createObjectURL(blob)
  const a = document.createElement('a')
  a.download = name
  a.href = blobUrl
  a.click()
  URL.revokeObjectURL(blobUrl)
}
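The typed-array merge can be verified without a browser (the Blob/anchor steps from saveFile are omitted here, since they need the DOM):

```javascript
// Merge an array of ArrayBuffer chunks into one Uint8Array
function mergeChunks(buffers) {
  const chunks = buffers.map((b) => new Uint8Array(b))
  const totalLength = chunks.reduce((acc, c) => acc + c.length, 0)
  const result = new Uint8Array(totalLength)
  let offset = 0
  for (const chunk of chunks) {
    result.set(chunk, offset) // copy the chunk in at the current offset
    offset += chunk.length
  }
  return result
}

const part1 = new Uint8Array([1, 2, 3]).buffer
const part2 = new Uint8Array([4, 5]).buffer
console.log([...mergeChunks([part1, part2])]) // [ 1, 2, 3, 4, 5 ]
```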

Origin blog.csdn.net/qq_38974163/article/details/128845382