Table of contents
Medium file proxy server release: 10MB increments
Large file slices: 100MB units
Partial download: Content-Range response header + 206 status code [used in real development]
Multiple large file transfer: spark-md5
web-worker: use spark-md5 in a worker thread to calculate the hash from the file content
Promise.allSettled() for concurrent requests
Medium file proxy server release: 10MB increments
When proxying through Nginx, use the proxy_buffering directive to control whether proxy buffering is enabled, and proxy_buffer_size and proxy_buffers to resize the buffers.
In the nginx.conf configuration file, find or add an http, server, or location block, depending on the scope of the change you want to make. In that block, add or modify the client_max_body_size directive:
http {
    ...
    server {
        ...
        location /upload {
            client_max_body_size 100M;
            ...
        }
        ...
    }
    ...
}
Check the configuration file for syntax errors:
sudo nginx -t
If no errors are reported, reload Nginx for the configuration changes to take effect:
sudo systemctl reload nginx
For the React version, see: Front-end file streaming, slice download and upload: Optimizing file transfer efficiency and user experience - Nuggets
Blob (Binary Large Object): an object that stores immutable raw binary data
ArrayBuffer: a fixed-length buffer of raw binary data, used when the bytes need to be read or processed directly
Large file slices: 100MB units
In practice each slice typically ranges from a few hundred KB to several MB, balancing per-request overhead against the cost of retrying a failed slice.
Breakpoint resume: store the indices/hashes of the slices already transferred
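Code later in this article calls a createFileChunk helper that is never shown; a plausible sketch (the chunk size and return shape are assumptions) based on Blob.prototype.slice:

```javascript
// Hypothetical createFileChunk helper: split a File/Blob into fixed-size
// chunks with slice(). slice() does not read any bytes; it only creates
// a Blob view, so chunking is cheap even for multi-GB files.
function createFileChunk(file, chunkSize = 10 * 1024 * 1024) { // 10MB default
  const fileChunkList = [];
  let cur = 0;
  let index = 0;
  while (cur < file.size) {
    fileChunkList.push({ file: file.slice(cur, cur + chunkSize), index });
    cur += chunkSize;
    index++;
  }
  return fileChunkList;
}
```

In the browser, file would come from an `<input type="file">` change event; each entry's .file Blob can then be appended to a FormData for upload.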
Front-end solution A
localStorage
- Capacity limit: varies by browser, but usually between 5MB and 10MB, which is sufficient for storing breakpoint indices
- Follows the same-origin policy
- Persistence: survives after the browser is closed; the data disappears only if the user clears the browser cache or it is deleted in code
- Synchronous access: reading or writing large amounts of data may block the main thread
- Data type: strings only, so objects must be serialized (e.g. with JSON.stringify)
- Applicable scenarios: small, non-sensitive, persistent data. For larger volumes of data, or to share data across origins, consider IndexedDB or server-side storage.
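Persisting the breakpoint indices takes only a few lines; the sketch below (the function names are made up) takes the storage object as a parameter so the same code works with localStorage or any compatible store, and guards against the quota limit mentioned above:

```javascript
// Persist / restore uploaded-chunk indices. The storage parameter is any
// object with localStorage's getItem/setItem API, which also makes the
// helpers easy to test outside the browser.
function saveUploadedChunks(storage, chunks) {
  try {
    // Deduplicate and sort before persisting
    const unique = [...new Set(chunks)].sort((a, b) => a - b);
    storage.setItem('uploadedChunks', JSON.stringify(unique));
  } catch (e) {
    // setItem throws (QuotaExceededError) when the 5-10MB limit is hit
    console.warn('Could not persist breakpoint data', e);
  }
}

function loadUploadedChunks(storage) {
  try {
    return JSON.parse(storage.getItem('uploadedChunks')) || [];
  } catch {
    return []; // corrupted or missing record: start from scratch
  }
}
```

In the browser these would be called as saveUploadedChunks(localStorage, ...) and loadUploadedChunks(localStorage).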
In this way, the next upload can skip the previously uploaded parts. There are two ways to implement this resume capability.
Backend solution B
Server-side record
The front-end solution has a flaw: if the user switches browsers (or devices), the localStorage record is no longer available, so the server-side approach is recommended.
Upload
Front end
<template>
  <div>
    <input type="file" @change="handleFileChange" />
    <button @click="startUpload">Start Upload</button>
  </div>
</template>
<script>
export default {
  data() {
    return {
      file: null,
      chunkSize: 1024 * 1024, // 1MB
      totalChunks: 0,
      uploadedChunks: [],
    };
  },
  methods: {
    handleFileChange(event) {
      this.file = event.target.files[0];
    },
    startUpload() {
      if (this.file) {
        this.totalChunks = this.getTotalChunks();
        this.uploadedChunks = JSON.parse(localStorage.getItem('uploadedChunks')) || [];
        this.uploadChunks(0);
      }
    },
    uploadChunks(startChunk) {
      if (startChunk >= this.totalChunks) {
        console.log('Upload complete');
        localStorage.removeItem('uploadedChunks');
        return;
      }
      // Send at most 5 concurrent requests per batch; in real development,
      // tune this to the browser's per-host connection limit and server load
      const endChunk = Math.min(startChunk + 5, this.totalChunks);
      const pendingChunks = [];
      const uploadPromises = [];
      for (let chunkIndex = startChunk; chunkIndex < endChunk; chunkIndex++) {
        if (!this.uploadedChunks.includes(chunkIndex)) {
          const startByte = chunkIndex * this.chunkSize;
          const endByte = Math.min((chunkIndex + 1) * this.chunkSize, this.file.size);
          const chunkData = this.file.slice(startByte, endByte);
          const formData = new FormData();
          formData.append('chunkIndex', chunkIndex);
          formData.append('file', chunkData);
          pendingChunks.push(chunkIndex);
          uploadPromises.push(
            fetch('/upload', {
              method: 'POST',
              body: formData,
            })
          );
        }
      }
      // Promise.allSettled never rejects; inspect each result so that only
      // chunks whose request actually succeeded are recorded, and failed
      // chunks are retried on the next upload attempt
      Promise.allSettled(uploadPromises).then(results => {
        results.forEach((result, i) => {
          if (result.status === 'fulfilled' && result.value.ok) {
            this.uploadedChunks.push(pendingChunks[i]);
          }
        });
        localStorage.setItem('uploadedChunks', JSON.stringify(this.uploadedChunks));
        this.uploadChunks(endChunk);
      });
    },
    getTotalChunks() {
      return Math.ceil(this.file.size / this.chunkSize);
    },
  },
};
</script>
Back end
const express = require('express');
const path = require('path');
const fs = require('fs');
const multer = require('multer');

const app = express();
const chunkDirectory = path.join(__dirname, 'chunks');
// Make sure the chunk directory exists before multer writes into it
fs.mkdirSync(chunkDirectory, { recursive: true });

app.use(express.json());
app.use(express.static(chunkDirectory));

const storage = multer.diskStorage({
  destination: chunkDirectory,
  filename: (req, file, callback) => {
    // req.body.chunkIndex is only populated here because the front end
    // appends chunkIndex to the FormData before the file field
    callback(null, `chunk_${req.body.chunkIndex}`);
  },
});
const upload = multer({ storage });

app.post('/upload', upload.single('file'), (req, res) => {
  const { chunkIndex } = req.body;
  console.log(`Uploaded chunk ${chunkIndex}`);
  res.sendStatus(200);
});

app.listen(3000, () => {
  console.log('Server started on port 3000');
});
Download
Partial download: Content-Range response header + 206 status code [used in real development]
Content-Range: bytes <start>-<end>/<total>
The 206 Partial Content status code indicates that the server successfully processed part of the request and returned the corresponding byte range.
Game downloads on many platforms, such as Xbox and PlayStation, use this mechanism for files exceeding 100 GB.
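The two header formats are easy to mix up: the request uses Range: bytes=<start>-<end>, while the response uses Content-Range: bytes <start>-<end>/<total>, and the end position is inclusive in both. A small helper sketch (names are made up) for building and parsing them:

```javascript
// Build the value of the Range *request* header. The end byte is inclusive,
// so bytes=0-99 asks for exactly 100 bytes.
function buildRangeHeader(start, end) {
  return `bytes=${start}-${end}`;
}

// Parse the Content-Range *response* header: "bytes <start>-<end>/<total>".
// The total may be "*" when the server does not know the full length.
function parseContentRange(header) {
  const match = /^bytes (\d+)-(\d+)\/(\d+|\*)$/.exec(header);
  if (!match) throw new Error(`Malformed Content-Range: ${header}`);
  return {
    start: Number(match[1]),
    end: Number(match[2]),
    total: match[3] === '*' ? null : Number(match[3]),
  };
}
```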
Front end
<template>
  <div>
    <button @click="startDownload">Start Download</button>
  </div>
</template>
<script>
import { saveAs } from 'file-saver';

export default {
  data() {
    return {
      totalChunks: 0,
      chunkSize: 1024 * 1024, // 1MB by default
      fileNm: 'file.txt',
      downloadedChunks: [], // indices of chunks already fetched
      chunks: [], // chunk Blob data, held in memory until the merge
      concurrentDownloads: 5, // number of concurrent requests
    };
  },
  methods: {
    startDownload() {
      this.fetchMetadata();
    },
    fetchMetadata() {
      fetch('/metadata')
        .then(response => response.json())
        .then(data => {
          this.totalChunks = data.totalChunks;
          this.chunkSize = data.chunkSize;
          this.fileNm = data.fileNm;
          this.continueDownload();
        })
        .catch(error => {
          console.error('Error fetching metadata:', error);
        });
    },
    async continueDownload() {
      // Note: only the indices survive a page reload; the chunk Blobs live in
      // memory, so a true cross-reload resume would need IndexedDB or similar
      const storedChunks = JSON.parse(localStorage.getItem('downloadedChunks')) || [];
      this.downloadedChunks = storedChunks;
      let chunkIndex = 0;
      while (chunkIndex < this.totalChunks) {
        const chunkPromises = [];
        for (let i = 0; i < this.concurrentDownloads; i++) {
          if (chunkIndex < this.totalChunks && !this.downloadedChunks.includes(chunkIndex)) {
            chunkPromises.push(this.downloadChunk(chunkIndex));
          }
          chunkIndex++;
        }
        await Promise.allSettled(chunkPromises);
      }
      // Merge only when every chunk is present; otherwise some requests
      // failed and merging would produce a corrupt file
      if (this.downloadedChunks.length === this.totalChunks) {
        this.mergeChunks();
      } else {
        console.warn('Some chunks failed to download; start again to resume');
      }
    },
    downloadChunk(chunkIndex) {
      const startByte = chunkIndex * this.chunkSize;
      const endByte = Math.min((chunkIndex + 1) * this.chunkSize, this.totalChunks * this.chunkSize);
      // Real projects may address a slice by index, by byte range, or both;
      // here both are sent and the server decides which to use
      return fetch(`/download/${chunkIndex}?start=${startByte}&end=${endByte}`)
        .then(response => {
          if (!response.ok && response.status !== 206) {
            throw new Error(`Unexpected status ${response.status}`);
          }
          return response.blob();
        })
        .then(chunkBlob => {
          this.downloadedChunks.push(chunkIndex);
          localStorage.setItem('downloadedChunks', JSON.stringify(this.downloadedChunks));
          this.chunks[chunkIndex] = chunkBlob; // keep the chunk data for the merge
        })
        .catch(error => {
          console.error('Error downloading chunk:', error);
          throw error;
        });
    },
    mergeChunks() {
      // new Blob([...]) concatenates the chunk Blobs in array order
      const mergedBlob = new Blob(this.chunks);
      // Save the merged Blob to a local file
      saveAs(mergedBlob, this.fileNm);
      // Clear the in-memory chunks and the breakpoint record
      this.chunks = [];
      localStorage.removeItem('downloadedChunks');
    },
  },
};
</script>
Back end
const express = require('express');
const path = require('path');
const fs = require('fs');

const app = express();
app.use(express.json());

app.get('/metadata', (req, res) => {
  const filePath = path.join(__dirname, 'file.txt');
  const chunkSize = 1024 * 1024; // 1MB
  const fileNm = 'file.txt';
  const fileStats = fs.statSync(filePath);
  const totalChunks = Math.ceil(fileStats.size / chunkSize);
  res.json({ totalChunks, chunkSize, fileNm });
});

app.get('/download/:chunkIndex', (req, res) => {
  const chunkIndex = parseInt(req.params.chunkIndex, 10);
  const chunkSize = 1024 * 1024; // 1MB
  const filePath = path.join(__dirname, 'file.txt');
  const fileSize = fs.statSync(filePath).size;
  const startByte = chunkIndex * chunkSize;
  // The end position is inclusive and must not run past the end of the file
  const endByte = Math.min((chunkIndex + 1) * chunkSize, fileSize) - 1;
  if (startByte >= fileSize) {
    res.status(416).send('Requested range not satisfiable');
    return;
  }
  res.status(206);
  res.set('Content-Range', `bytes ${startByte}-${endByte}/${fileSize}`);
  // Stream only the requested byte range instead of reading the whole file
  // into memory with fs.readFile (createReadStream's end option is inclusive)
  fs.createReadStream(filePath, { start: startByte, end: endByte }).pipe(res);
});

app.listen(3000, () => {
  console.log('Server started on port 3000');
});
Multiple large file transfer: spark-md5
MD5 (Message Digest Algorithm 5): a hash function
If the slice hash were built from the file name plus the slice index, renaming the file would invalidate it.
So spark-md5 should be used to generate the hash from the file content instead.
webpack's contenthash is implemented based on the same idea.
In addition, for a very large file, reading the content and computing the hash is time-consuming and would freeze the page, so we use a web worker to compute the hash in a worker thread. The main thread stays responsive and the UI is not blocked while the user keeps interacting with the page.
// /public/hash.js
// Import the spark-md5 script into the worker
self.importScripts("/spark-md5.min.js");

// Generate the file hash
self.onmessage = e => {
  const { fileChunkList } = e.data;
  const spark = new self.SparkMD5.ArrayBuffer();
  let percentage = 0;
  let count = 0;
  // Load the next file chunk recursively
  const loadNext = index => {
    const reader = new FileReader();
    reader.readAsArrayBuffer(fileChunkList[index].file);
    reader.onload = e => {
      count++;
      spark.append(e.target.result);
      // Check whether all chunks have been processed
      if (count === fileChunkList.length) {
        self.postMessage({
          percentage: 100,
          hash: spark.end()
        });
        self.close();
      } else {
        // Update the progress percentage and report it
        percentage += 100 / fileChunkList.length;
        self.postMessage({
          percentage
        });
        // Recurse to load the next chunk
        loadNext(count);
      }
    };
  };
  // Start with the first chunk
  loadNext(0);
};
Why hash and transmit in slices:
- Memory efficiency: loading an entire large file into memory at once can cause excessive memory usage or even crash the browser. By cutting the file into small chunks, only a single chunk needs to be held during processing, reducing memory pressure.
- Performance: passing the entire file to the hash function at once may mean long computation times, especially for large files. Appending small blocks one by one keeps the work incremental and lets multiple blocks be processed to improve efficiency.
- Error recovery: during upload or download, network interruptions or other errors may leave some chunks untransferred. Hashing per chunk makes it easy to detect which chunks did not arrive correctly and to retransmit just those chunks.
// Generate the file hash (web worker)
calculateHash(fileChunkList) {
  return new Promise(resolve => {
    // Create a new web worker that runs the "hash.js" script
    this.container.worker = new Worker("/hash.js");
    // Send the chunk list to the worker
    this.container.worker.postMessage({ fileChunkList });
    // Handle messages coming back from the worker
    this.container.worker.onmessage = e => {
      const { percentage, hash } = e.data;
      // Update the hash-calculation progress
      this.hashPercentage = percentage;
      if (hash) {
        // Calculation finished: resolve with the final hash
        resolve(hash);
      }
    };
  });
},
// Handle the file upload
async handleUpload() {
  if (!this.container.file) return;
  // Split the file into a list of chunks
  const fileChunkList = this.createFileChunk(this.container.file);
  // Compute the file hash and store it on the container
  this.container.hash = await this.calculateHash(fileChunkList);
  // Build the upload payload from the chunk list
  this.data = fileChunkList.map(({ file, index }) => ({
    fileHash: this.container.hash,
    chunk: file,
    hash: this.container.file.name + "-" + index,
    percentage: 0
  }));
  // Upload the chunks
  await this.uploadChunks();
}
Hash collision
The input space is usually larger than the output space, so collisions cannot be completely avoided:
Hash(A) = 21 % 10 = 1
Hash(B) = 31 % 10 = 1
This is why the spark-md5 documentation requires passing in all the slices when computing the hash. You cannot compute it from only part of the file, otherwise two different files could end up with the same hash.