[Front-end interview] Medium/large file upload & download: pass medium files through the proxy server + slice transfer for large files + concurrent requests + localStorage for resumable transfers

Table of contents

Medium files: let through at the proxy server (on the order of 10MB)

proxy

nginx

Large file slicing: on the order of 100MB

Resume point: store slice hashes

Front-end solution A

localStorage

Back-end solution B

Server

Upload

Front end

Back end

Download

Partial download: Content-Range response header + 206 status code [real-world approach]

Front end

Back end

Transferring multiple large files: spark-md5

Hash collision

Summary

Blob.prototype.slice for slicing

Web Worker: run spark-md5 in a worker thread to compute the hash from the file content

Promise.allSettled() for concurrent requests


Medium files: let through at the proxy server (on the order of 10MB)

proxy

proxy_buffering controls whether proxy buffering is enabled;

proxy_buffer_size and proxy_buffers adjust the buffer sizes.
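For illustration, a minimal sketch of where these directives sit in the config (the upstream address and buffer sizes are arbitrary assumptions, not recommendations):

location /upload {
    proxy_pass http://backend;   # assumed upstream address
    proxy_buffering on;          # enable or disable buffering of proxied responses
    proxy_buffer_size 16k;       # buffer used for the response headers
    proxy_buffers 8 64k;         # number and size of buffers for the response body
}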

nginx

In the nginx.conf configuration file, find or add an http, server, or location block, depending on the scope you want the change to apply to. In that block, add or modify the client_max_body_size directive:

http {
    ...
    server {
        ...
        location /upload {
            client_max_body_size 100M;
            ...
        }
        ...
    }
    ...
}

Check the configuration file for syntax errors:

sudo nginx -t

If no errors are reported, reload Nginx for the configuration changes to take effect:

sudo systemctl reload nginx

For the React version, see: Front-end file streaming, slice download and upload: Optimizing file transfer efficiency and user experience - Nuggets


Blob (Binary Large Object): an object that stores binary data.

ArrayBuffer: a fixed-length buffer of raw binary data in memory.
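A quick sketch of how the two relate (not from the original article; Blob.prototype.arrayBuffer() returns a Promise in modern browsers):

const blob = new Blob(['hello'], { type: 'text/plain' }); // Blob: an immutable container of binary data
blob.arrayBuffer().then(buffer => {                       // ArrayBuffer: the raw bytes in memory
  console.log(buffer.byteLength); // 5
});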

Large file slicing: on the order of 100MB

Each slice is typically a few hundred KB to a few MB in size.

Resume point: store slice hashes

Front-end solution A

localStorage

  1. Capacity limit: limits vary by browser, but are usually 5MB–10MB, which is enough to store the indexes of uploaded slices.

  2. Follows the same-origin policy.

  3. Persistence: the data survives closing the browser; it is only removed when the user clears browser storage or code deletes it.

  4. Synchronous access: reading or writing large amounts of data can block the main thread.

  5. Data type: strings only.

  6. Applicable scenarios: small, non-sensitive, persistent data. For larger volumes of data, or for sharing data across origins, consider IndexedDB or server-side storage.

This way, the next upload can skip the parts that were already uploaded. There are two ways to implement this "memory" function.

Back-end solution B

Server

The front-end solution has a flaw: if the user switches browsers, the localStorage record is no longer available, so the back-end solution is recommended.
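The article does not show code for solution B. As a rough sketch of the idea, the server itself can report which chunk indexes it already has, so the client no longer depends on localStorage (this assumes the chunk_<index> file naming used by the upload back end shown later):

// Hypothetical endpoint for the back-end solution: list the chunk indexes already on disk
app.get('/uploaded-chunks', (req, res) => {
  fs.readdir(chunkDirectory, (err, files) => {
    if (err) return res.json({ uploadedChunks: [] });
    const uploadedChunks = files
      .filter(name => name.startsWith('chunk_'))
      .map(name => parseInt(name.replace('chunk_', ''), 10));
    res.json({ uploadedChunks });
  });
});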

Upload

Front end

<template>
  <div>
    <input type="file" @change="handleFileChange" />
    <button @click="startUpload">Start Upload</button>
  </div>
</template>

<script>
export default {
  data() {
    return {
      file: null,
      chunkSize: 1024 * 1024, // 1MB
      totalChunks: 0,
      uploadedChunks: [],
    };
  },
  methods: {
    handleFileChange(event) {
      this.file = event.target.files[0];
    },
    startUpload() {
      if (this.file) {
        this.totalChunks = this.getTotalChunks();
        this.uploadedChunks = JSON.parse(localStorage.getItem('uploadedChunks')) || [];
        this.uploadChunks(0);
      }
    },
    uploadChunks(startChunk) {
      if (startChunk >= this.totalChunks) {
        // Only clear the record once every chunk has actually been uploaded
        if (this.uploadedChunks.length === this.totalChunks) {
          console.log('Upload complete');
          localStorage.removeItem('uploadedChunks');
        }
        return;
      }
      // Launch at most 5 concurrent requests per batch; in real development the limit depends on the available request resources
      const endChunk = Math.min(startChunk + 5, this.totalChunks);

      const uploadPromises = [];
      const pendingIndexes = []; // chunk indexes matching uploadPromises by position
      for (let chunkIndex = startChunk; chunkIndex < endChunk; chunkIndex++) {
        if (!this.uploadedChunks.includes(chunkIndex)) {
          const startByte = chunkIndex * this.chunkSize;
          const endByte = Math.min((chunkIndex + 1) * this.chunkSize, this.file.size);
          const chunkData = this.file.slice(startByte, endByte);

          const formData = new FormData();
          formData.append('chunkIndex', chunkIndex);
          formData.append('file', chunkData);

          pendingIndexes.push(chunkIndex);
          uploadPromises.push(
            fetch('/upload', {
              method: 'POST',
              body: formData,
            })
          );
        }
      }
      // Promise.allSettled never rejects; check each result and only record the
      // chunks whose upload succeeded, so failed chunks are retried next time
      Promise.allSettled(uploadPromises)
        .then(results => {
          const succeeded = pendingIndexes.filter(
            (_, i) => results[i].status === 'fulfilled' && results[i].value.ok
          );
          this.uploadedChunks = Array.from(new Set([...this.uploadedChunks, ...succeeded]));
          localStorage.setItem('uploadedChunks', JSON.stringify(this.uploadedChunks));

          this.uploadChunks(endChunk);
        });
    },
    getTotalChunks() {
      return Math.ceil(this.file.size / this.chunkSize);
    },
  },
};
</script>

Back end

const express = require('express');
const path = require('path');
const fs = require('fs');
const multer = require('multer');
const app = express();
const chunkDirectory = path.join(__dirname, 'chunks');

app.use(express.json());
app.use(express.static(chunkDirectory));

const storage = multer.diskStorage({
  destination: chunkDirectory,
  filename: (req, file, callback) => {
    // req.body.chunkIndex is available here because the front end appends it before the file field
    callback(null, `chunk_${req.body.chunkIndex}`);
  },
});

const upload = multer({ storage });

app.post('/upload', upload.single('file'), (req, res) => {
  const { chunkIndex } = req.body;
  console.log(`Uploaded chunk ${chunkIndex}`);
  res.sendStatus(200);
});

app.listen(3000, () => {
  console.log('Server started on port 3000');
});
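The demo above only stores the chunks; at some point the server has to stitch them back together. A minimal sketch of such a step, assuming the chunk_<index> naming used above and a hypothetical /merge route that the client would call once all chunks are uploaded (not part of the original code):

// Hypothetical merge endpoint: appends chunk_0 .. chunk_{totalChunks - 1} in order
app.post('/merge', (req, res) => {
  const { fileName, totalChunks } = req.body; // assumed request fields
  const writeStream = fs.createWriteStream(path.join(__dirname, fileName));

  const appendChunk = index => {
    if (index >= totalChunks) {
      writeStream.end();
      return res.sendStatus(200);
    }
    const readStream = fs.createReadStream(path.join(chunkDirectory, `chunk_${index}`));
    readStream.pipe(writeStream, { end: false }); // keep the target file open between chunks
    readStream.on('end', () => appendChunk(index + 1));
  };

  appendChunk(0);
});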

Download

Partial download: Content-Range response header + 206 status code [real-world approach]

Content-Range: bytes <start>-<end>/<total>

206 status code indicates that the server successfully processed part of the request and returned the corresponding data range.

Many platforms, such as Xbox and PlayStation, download game files larger than 100 GB in the same way.
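The demo back end below returns each slice by index with a plain 200 response. For reference, a sketch of the approach described above, where the client sends a Range request header and the server answers with 206 plus Content-Range (route name and file path are assumptions, not part of the original code):

// Sketch: serve byte ranges with a 206 status and a Content-Range header
app.get('/download-range', (req, res) => {
  const filePath = path.join(__dirname, 'file.txt');
  const total = fs.statSync(filePath).size;
  const range = req.headers.range; // e.g. "bytes=0-1048575"

  if (!range) {
    return res.sendFile(filePath); // no Range header: send the whole file
  }

  const [start, end] = range.replace('bytes=', '').split('-').map(Number);
  const endByte = Number.isNaN(end) || end >= total ? total - 1 : end;

  res.status(206).set({
    'Content-Range': `bytes ${start}-${endByte}/${total}`,
    'Accept-Ranges': 'bytes',
    'Content-Length': endByte - start + 1,
  });
  fs.createReadStream(filePath, { start, end: endByte }).pipe(res);
});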

Front end

<template>
  <div>
    <button @click="startDownload">Start Download</button>
  </div>
</template>

<script>
import { saveAs } from 'file-saver';

export default {
  data() {
    return {
      totalChunks: 0,
      chunkSize: 1024 * 1024, // 1MB by default
      fileNm: "file.txt",
      downloadedChunks: [],
      chunks: [], // holds the downloaded chunk data
      concurrentDownloads: 5, // number of concurrent downloads
    };
  },
  methods: {
    startDownload() {
      this.fetchMetadata();
    },
    fetchMetadata() {
      fetch('/metadata')
        .then(response => response.json())
        .then(data => {
          this.totalChunks = data.totalChunks;
          this.chunkSize = data.chunkSize;
          this.fileNm = data.fileNm;
          this.continueDownload();
        })
        .catch(error => {
          console.error('Error fetching metadata:', error);
        });
    },
   async continueDownload() {
      const storedChunks = JSON.parse(localStorage.getItem('downloadedChunks')) || [];
      this.downloadedChunks = storedChunks;

      let chunkIndex = 0;

      while (chunkIndex < this.totalChunks) {
        const chunkPromises = [];
        
        for (let i = 0; i < this.concurrentDownloads; i++) {
          if (chunkIndex < this.totalChunks && !this.downloadedChunks.includes(chunkIndex)) {
            chunkPromises.push(this.downloadChunk(chunkIndex));
          }
          chunkIndex++;
        }

        await Promise.allSettled(chunkPromises);
      }
      // Merge the chunks only once every chunk has been downloaded successfully
      if (this.downloadedChunks.length === this.totalChunks) {
        this.mergeChunks();
      }
    },
    
    downloadChunk(chunkIndex) {
      return new Promise((resolve, reject) => {
        const startByte = chunkIndex * this.chunkSize;
        const endByte = Math.min((chunkIndex + 1) * this.chunkSize, this.totalChunks * this.chunkSize);
        // Not sure whether real-world implementations locate a chunk by index, by startByte/endByte, or both...
        fetch(`/download/${chunkIndex}?start=${startByte}&end=${endByte}`)
          .then(response => response.blob())
          .then(chunkBlob => {
            this.downloadedChunks.push(chunkIndex);
            localStorage.setItem('downloadedChunks', JSON.stringify(this.downloadedChunks));

            this.chunks[chunkIndex] = chunkBlob; // 存储切片数据

            resolve();
          })
          .catch(error => {
            console.error('Error downloading chunk:', error);
            reject();
          });
      });
    },
    mergeChunks() {
      const mergedBlob = new Blob(this.chunks);
      // Save the merged Blob as a local file
      saveAs(mergedBlob, this.fileNm);
      // Clear the in-memory chunks and the downloadedChunks record in localStorage
      this.chunks = [];
      localStorage.removeItem('downloadedChunks');
    },
  },
};
</script>

Back end

const express = require('express');
const path = require('path');
const fs = require('fs');
const app = express();
const chunkDirectory = path.join(__dirname, 'chunks');

app.use(express.json());

app.get('/metadata', (req, res) => {
  const filePath = path.join(__dirname, 'file.txt'); 
  const chunkSize = 1024 * 1024; // 1MB
  const fileNm='file.txt';
  const fileStats = fs.statSync(filePath);
  const totalChunks = Math.ceil(fileStats.size / chunkSize);
  res.json({ totalChunks, chunkSize, fileNm });
});

app.get('/download/:chunkIndex', (req, res) => {
  const chunkIndex = parseInt(req.params.chunkIndex);
  const chunkSize = 1024 * 1024; // 1MB
  const startByte = chunkIndex * chunkSize;
  const endByte = (chunkIndex + 1) * chunkSize;

  const filePath = path.join(__dirname, 'file.txt'); 

  fs.readFile(filePath, (err, data) => {
    if (err) {
      res.status(500).send('Error reading file.');
    } else {
      const chunkData = data.slice(startByte, endByte);
      res.send(chunkData);
    }
  });
});

app.listen(3000, () => {
  console.log('Server started on port 3000');
});

Transferring multiple large files: spark-md5

MD5 (Message Digest Algorithm 5): a hash function.

If file name + slice index is used as the slice hash, the record becomes useless as soon as the file is renamed.

So spark-md5 should be used to generate the hash from the file content instead.

webpack's contenthash is implemented on the same idea.

In addition, for a very large file, reading its content and computing the hash is time-consuming and would freeze the page, so we use a web-worker to compute the hash in a worker thread. The main thread stays available for normal interaction and the UI is not blocked.

// /public/hash.js

// Import the spark-md5 script
self.importScripts("/spark-md5.min.js");

// Generate the file hash
self.onmessage = e => {
  const { fileChunkList } = e.data;
  const spark = new self.SparkMD5.ArrayBuffer();
  let percentage = 0;
  let count = 0;

  // Recursively load the next file chunk
  const loadNext = index => {
    const reader = new FileReader();
    reader.readAsArrayBuffer(fileChunkList[index].file);
    reader.onload = e => {
      count++;
      spark.append(e.target.result);

      // Check whether all chunks have been processed
      if (count === fileChunkList.length) {
        self.postMessage({
          percentage: 100,
          hash: spark.end()
        });
        self.close();
      } else {
        // Update the progress percentage and post it to the main thread
        percentage += 100 / fileChunkList.length;
        self.postMessage({
          percentage
        });

        // Recurse to load the next chunk
        loadNext(count);
      }
    };
  };

  // Start with the first chunk
  loadNext(0);
};
Hashing and transmitting the file slice by slice serves several purposes:

  1. Memory efficiency: loading a large file into memory all at once can cause excessive memory usage or even crash the browser. By cutting the file into small chunks, only one chunk needs to be handled at a time, reducing memory pressure.

  2. Performance optimization: passing the entire file to the hash function at once can mean a long computation, especially for large files. Splitting the file into small blocks and hashing them one by one also allows multiple blocks to be processed in parallel, improving efficiency.

  3. Error recovery: during upload or download, a network interruption or other error may leave some chunks untransferred. By hashing per chunk, you can easily detect which chunks did not arrive correctly and recover or retransmit just those chunks.

    // Generate the file hash (web-worker)
    calculateHash(fileChunkList) {
      return new Promise(resolve => {
        // Create a new Web Worker that loads the "hash.js" script
        this.container.worker = new Worker("/hash.js");
    
        // Send the list of file chunks to the Web Worker
        this.container.worker.postMessage({ fileChunkList });
    
        // Handler invoked when the Web Worker posts a message back
        this.container.worker.onmessage = e => {
          const { percentage, hash } = e.data;
    
          // Update the hash computation progress
          this.hashPercentage = percentage;
    
          if (hash) {
            // If the computation is finished, resolve with the final hash
            resolve(hash);
          }
        };
      });
    },
    
    // Function that handles the file upload
    async handleUpload() {
      if (!this.container.file) return;
    
      // Split the file into a list of chunks
      const fileChunkList = this.createFileChunk(this.container.file);
    
      // Compute the file hash and store the result on the container
      this.container.hash = await this.calculateHash(fileChunkList);
    
      // Build the upload data objects from the chunk list
      this.data = fileChunkList.map(({ file, index }) => ({
        fileHash: this.container.hash,
        chunk: file,
        hash: this.container.file.name + "-" + index,
        percentage: 0
      }));
    
      // Upload the chunks
      await this.uploadChunks();
    }
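handleUpload above calls this.createFileChunk, which the original article does not show. A minimal sketch, assuming it returns { file, index } objects (the shape consumed by handleUpload and the hash worker) and an arbitrary default chunk size:

// Hypothetical helper: split a File into { file, index } chunks via Blob.prototype.slice
createFileChunk(file, size = 10 * 1024 * 1024) {
  const fileChunkList = [];
  let index = 0;
  for (let cur = 0; cur < file.size; cur += size) {
    fileChunkList.push({ file: file.slice(cur, cur + size), index });
    index++;
  }
  return fileChunkList;
},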
    

Hash collision

The input space is larger than the output space, so collisions cannot be completely avoided. For example, with a toy modulo-10 hash:

Hash(A) = 21 % 10 = 1

Hash(B) = 31 % 10 = 1

That is why the spark-md5 documentation has you append each slice when computing the hash rather than feeding the entire file in at once; otherwise different files could end up with the same hash.

Summary

Blob.prototype.slice for slicing

Web Worker: run spark-md5 in a worker thread to compute the hash from the file content

Promise.allSettled() for concurrent requests

The interviewer Jie Jie smiled: Have you never done a large file upload function? Then go back and wait for the notification! - Nuggets


Origin blog.csdn.net/qq_28838891/article/details/132358193