Table of contents
1. The easiest file upload
2. Drag + paste + style optimization
3. Breakpoint resume + second transmission + progress bar
   - Breakpoint resume + second transmission (front end)
   - Breakpoint resume + second transmission (backend)
4. Sampling hash and webWorker
5. File type judgment
   - Determine file type by file header
6. Asynchronous concurrency control (important)
7. Concurrent error retry
8. Slow start control
9. Debris cleanup
This article is aimed at front-end students who already have some Node back-end experience. If you know nothing about the back end, please pick up those basics first.
Without further ado, let's get straight to the point.
First, let's look at what the file upload component gains at each level:
| Grade | Features |
| --- | --- |
| Bronze - trash | native + axios.post |
| Silver - experience upgrade | paste, drag and drop, progress bar |
| Gold - feature upgrade | breakpoint resume, second transmission, type judgment |
| Platinum - speed upgrade | web worker, time slicing, sampling hash |
| Diamond - network upgrade | asynchronous concurrency control, slice error retry |
| King - finely crafted | slow start control, debris cleanup and more |
1. The easiest file upload
For file upload, we get the file object, wrap it in a FormData, and post it to the backend:
function upload(file) {
  const formData = new FormData();
  formData.append('newFile', file);
  return axios.post(
    'http://localhost:8000/uploader/upload',
    formData,
    { headers: { 'Content-Type': 'multipart/form-data' } }
  );
}
2. Drag + paste + style optimization
I won't hand-roll this part here: there are plenty of libraries online that handle drag-and-drop and paste, or a full component library can solve it outright.
3. Breakpoint resume + second transmission + progress bar
file slice
We split a file into multiple small chunks, keep them in an array, and send them to the backend one by one; this is what makes resumable (breakpoint) upload possible.
// Compute the file hash to use as its id
const { hash } = await calculateHashSample(file)
//todo Build the list of file chunks
// Use file.slice() to cut the file into pieces
const fileList = [];
const count = Math.ceil(file.size / globalProp.SIZE);
const partSize = Math.ceil(file.size / count);
let cur = 0 // current offset into the file
for (let i = 0; i < count; i++) {
    let item = {
        chunk: file.slice(cur, cur + partSize),
        filename: `${hash}_${i}`
    };
    fileList.push(item);
    cur += partSize; // advance the offset, otherwise every chunk would be identical
}
calculate hash
To let the backend know that each slice belongs to the same file (so it can reassemble them into a complete file), we compute a unique value (an md5 hash) over the whole file and use it in each slice's filename.
// Get the file from the input's change event
<input type="file" @change="getFile">
// Use SparkMD5 to compute the file hash: read the file as a binary string, then hash it
let fileReader = new FileReader();
fileReader.onload = (e) => {
    let hexHash = SparkMD5.hash(e.target.result);
    console.log(hexHash);
};
fileReader.readAsBinaryString(file);
Breakpoint resume + second transmission (front end)
At this point we hold an array of, say, 100 file slices; we simply iterate over it and keep firing axios.post requests to the backend. A boolean switch implements start/pause.
But what if we have sent 50 slices and the user closes the browser?
This is where the backend comes in: before uploading, we ask it how many slices it has already received.
And of course, if it turns out the backend already has the complete file, we immediately report the upload as finished (second transmission, i.e. instant upload).
// Query, by file hash and suffix, which parts of this file have already been uploaded
// and whether the whole file is already there
const { data } = await axios.get(
  `http://localhost:8000/uploader/count?hash=${hash}&suffix=${fileSuffix}`
);
const { isFileUploaded, uploadedList } = data.data;
Breakpoint resume + second transmission (backend)
The backend side is comparatively simple:

- create a folder named after the file hash and save the file slices into it;
- check a file's upload status and return it to the front end through an interface.

With a folder of file slices like that, the check looks like this:
//! -------- Query by hash how many chunks the server already holds (and whether the full file exists) ------
function checkChunks(hash, suffix) {
    //! List the chunks already on disk and extract their uploaded indices
    const chunksPath = `${uploadChunksDir}${hash}`;
    const chunksList = (fs.existsSync(chunksPath) && fs.readdirSync(chunksPath)) || [];
    const indexList = chunksList.map((item) => item.split('_')[1])
    //! Check by hash + suffix whether the complete file has already been uploaded
    const filename = `${hash}${suffix}`
    const fileList = (fs.existsSync(uploadFileDir) && fs.readdirSync(uploadFileDir)) || [];
    const isFileUploaded = fileList.includes(filename)
    console.log('chunks already uploaded:', chunksList.length);
    console.log('file already exists:', isFileUploaded);
    return {
        code: 200,
        data: {
            count: chunksList.length,
            uploadedList: indexList,
            isFileUploaded: isFileUploaded
        }
    }
}
progress bar
All it takes is counting the successfully uploaded chunks in real time; it is easy enough to implement ourselves.
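One possible sketch (the helper names here are mine, not from the article): keep a map of bytes uploaded per chunk, fed by each request's axios `onUploadProgress` callback, and derive an overall percentage from it.

```javascript
// Aggregate per-chunk upload progress into one overall percentage.
// Each chunk's request would report bytes via axios's onUploadProgress, e.g.
//   axios.post(url, data, { onUploadProgress: e => tracker.onChunkProgress(i, e.loaded) })
function createProgressTracker(totalSize) {
  const loadedPerChunk = new Map(); // chunk index -> bytes uploaded so far

  return {
    onChunkProgress(index, loaded) {
      loadedPerChunk.set(index, loaded); // overwrite, onUploadProgress reports cumulative bytes
    },
    percent() {
      let loaded = 0;
      for (const bytes of loadedPerChunk.values()) loaded += bytes;
      return Math.min(100, Math.round((loaded / totalSize) * 100));
    },
  };
}

const tracker = createProgressTracker(200); // 200-byte "file" for the demo
tracker.onChunkProgress(0, 50);
tracker.onChunkProgress(1, 50);
```

Binding `percent()` to the progress bar's width is then a one-liner in the view layer.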
4. Sampling hash and webWorker
Before uploading, we must compute the file's md5 and use it as the slices' id. Computing md5 is very time-consuming: if the file is large, JS blocks on this step and the page freezes for a long time.
Here we provide three ideas for optimization:
Sampling hash (md5)
Sampling means we hash only selected parts of the file instead of the whole thing, trading a tiny collision risk for a much faster calculation.
1. Read the file as binary buffer data.
2. Take the first 2MB (and the final partial interval) in full; from every 2MB interval in between, take only the first 2 bytes and the last 2 bytes.
3. Concatenate these fragments into a new buffer and run md5 over it.
sample code
//! --------------- sampled md5 calculation -------------------
function calculateHashSample(file) {
    return new Promise((resolve) => {
        //! Parse the file into ArrayBuffer data for the md5 calculation
        const spark = new SparkMD5.ArrayBuffer();
        const { size } = file;
        const OFFSET = 2 * 1024 * 1024; // sampling interval: 2MB
        const reader = new FileReader();
        let index = OFFSET;
        // Take head and tail in full; sample 2 bytes from each end of the middle intervals
        const chunks = [file.slice(0, index)];
        while (index < size) {
            if (index + OFFSET > size) {
                chunks.push(file.slice(index)); // final partial interval: take it all
            } else {
                const CHUNK_OFFSET = 2;
                chunks.push(file.slice(index, index + CHUNK_OFFSET),
                    file.slice(index + OFFSET - CHUNK_OFFSET, index + OFFSET)
                );
            }
            index += OFFSET;
        }
        // Feed the sampled fragments to spark and resolve with the hash
        reader.onload = (event) => {
            spark.append(event.target.result);
            resolve({
                hash: spark.end(), // the promise resolves with the hash
            });
        }
        reader.readAsArrayBuffer(new Blob(chunks));
    });
}
webWorker
Besides sampling, we can offload the md5 calculation to a webWorker thread.
webWorker: it gives JS a multi-threaded running environment. The main thread creates worker threads and hands tasks to them; main and worker threads run concurrently without interfering with each other, and when a worker finishes it posts the result back to the main thread.
For specific usage methods, please refer to MDN or other articles:
Using Web Workers - Web API Reference | MDN (mozilla.org)[1]
Learn to use web workers thoroughly in one article - Juejin (juejin.cn)[2]
time slice
Students familiar with React's time slicing can also try this approach, though personally I think it is not as good as the two above.
If you are not familiar with it, there are plenty of articles to read; I will only outline the idea here.
Time slicing rests on the two legendary APIs, requestIdleCallback and requestAnimationFrame, or on a higher-level wrapper built with MessageChannel.
Split the hash calculation into many short tasks and run a few in each frame, reducing page lag.
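A framework-free sketch of the idea: break the work into small batches, yielding control between them with a generator. In a browser each step would be scheduled via requestIdleCallback or a MessageChannel; `setTimeout(..., 0)` stands in for that here.

```javascript
// Time-slice a long computation: process items in small batches,
// handing control back to the scheduler between batches so the
// main thread can keep rendering.
function* sliceWork(items, batchSize, fn) {
  const results = [];
  for (let i = 0; i < items.length; i += batchSize) {
    for (const item of items.slice(i, i + batchSize)) results.push(fn(item));
    yield; // give the main thread a chance to breathe
  }
  return results;
}

function runSliced(gen) {
  return new Promise((resolve) => {
    function step() {
      const { done, value } = gen.next();
      if (done) resolve(value);
      else setTimeout(step, 0); // schedule the next slice (rIC/MessageChannel in a browser)
    }
    step();
  });
}
```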
5. File type judgment
At the simple end, we can restrict the type via the accept attribute of the input tag, or by inspecting the file extension:
<input id="file" type="file" accept="image/*" />
const ext = file.name.substring(file.name.lastIndexOf('.') + 1);
Of course, this restriction is trivially bypassed by renaming the file extension, so it is not rigorous.
Determine file type by file header
A file is binary data, and its first few bytes (the magic number) indicate its type; we can read those bytes and check them.
For example, the following code
// Check whether the file is a .jpg
async function isJpg(file) {
    // Read the first few bytes and convert them to a string
    const res = await blobToString(file.slice(0, 3))
    return res === 'FF D8 FF'
}
// Check whether the file is a .png
async function isPng(file) {
    const res = await blobToString(file.slice(0, 4))
    return res === '89 50 4E 47'
}
// Check whether the file is a .gif
async function isGif(file) {
    const res = await blobToString(file.slice(0, 4))
    return res === '47 49 46 38'
}
Of course, we have ready-made libraries that can do this, such as the file-type library
file-type - npm (npmjs.com)[3]
6. Asynchronous concurrency control (important)
We need to upload many file chunks to the backend, and sending them strictly one by one would be slow. Instead, we send several in parallel while capping the number of in-flight requests, i.e. concurrency control.
First, wrap each of the 100 file chunks in an axios.post function and store them in a task pool. Then:

- create a concurrency pool and execute the tasks in it to send the chunks;
- keep a counter i: while i is below the concurrency limit, more tasks may be pushed into the pool;
- Promise.race resolves with whichever in-flight request finishes first; in its .then we push the next request (recursively);
- once the last request has been sent, ask the backend to merge the file chunks.
the code
//! Takes the request list, the max concurrency, and a callback to run once everything is done
function concurrentSendRequest(requestArr: any, max = 3, callback: any) {
    let i = 0 // counter of started tasks
    let concurrentRequestArr: any[] = [] // pool of in-flight requests
    let toFetch: any = () => {
        // (i increases by 1 on each call.) If i === requestArr.length, every task
        // has been started: return a resolved promise so the final toFetch().then()
        // can run Promise.all() and, once all tasks settle, fire the merge callback.
        if (i === requestArr.length) {
            return Promise.resolve()
        }
        //TODO start the async task and push it into the pool (counter +1)
        let it = requestArr[i++]()
        concurrentRequestArr.push(it)
        //TODO when the task settles, remove it from the pool
        it.then(() => {
            concurrentRequestArr.splice(concurrentRequestArr.indexOf(it), 1)
        })
        //todo if the pool is full, wait for one in-flight task to finish before adding more
        let p = Promise.resolve()
        if (concurrentRequestArr.length >= max) {
            //! race resolves with the fastest-settling task in the pool
            p = Promise.race(concurrentRequestArr)
        }
        //todo once the fastest promise settles, recurse into toFetch in its .then
        if (globalProp.stop) { return p.then(() => { console.log('sending stopped') }) }
        return p.then(() => toFetch())
    }
    // After the last batch fully settles, fire the callback (the merge request),
    // unless the upload was paused or has already been merged
    toFetch().then(() =>
        Promise.all(concurrentRequestArr).then(() => {
            if (!globalProp.stop && !globalProp.finished) { callback() }
        })
    )
}
7. Concurrent error retry

- Catch task errors with .catch; when one of the axios.post tasks above fails, put it back into the task queue.
- Give each task object a tag recording how many times it has been retried.
- If a chunk task fails more than 3 times, reject it outright and abort the whole file transfer.
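Those steps can be sketched as a small wrapper around each chunk's upload task (`withRetry` and its parameter names are illustrative, not from the article):

```javascript
// Retry a failing async task up to maxRetries attempts before giving up.
// Each chunk's upload would be wrapped like: withRetry(() => axios.post(...), 3)
function withRetry(task, maxRetries = 3) {
  let attempts = 0;
  const run = () =>
    task().catch((err) => {
      attempts += 1;
      if (attempts >= maxRetries) throw err; // give up; the caller may abort the transfer
      return run(); // put the task back in the queue
    });
  return run();
}
```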
8. Slow start control

Because files vary in size, fixing the size of every slice is a bit clumsy. We can borrow the slow-start strategy from the TCP protocol: set an initial size, then dynamically adjust the size of the next slice based on how the previous upload went, so the slice size tracks the current network speed.

- Carry a size value in each chunk (the number of progress-bar segments is then no longer fixed), modify createFileChunk accordingly, and record timing for each request.
- For example, suppose our ideal is one chunk every 30 seconds and the initial size is 1MB. If the upload takes 10 seconds, the next chunk grows to 3MB; if it takes 60 seconds, the next chunk shrinks to 500KB, and so on.
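The arithmetic above amounts to scaling the next chunk by target time / actual time, with clamping; a sketch under assumed bounds (TARGET_MS, MIN_SIZE, MAX_SIZE are my choices):

```javascript
// Slow-start style chunk sizing: scale the next chunk by how far the
// last upload's duration was from the target, clamped to sane bounds.
const TARGET_MS = 30_000;           // ideal: one chunk every 30 seconds
const MIN_SIZE = 100 * 1024;        // assumed lower bound
const MAX_SIZE = 50 * 1024 * 1024;  // assumed upper bound

function nextChunkSize(prevSize, elapsedMs) {
  const next = prevSize * (TARGET_MS / elapsedMs);
  return Math.min(MAX_SIZE, Math.max(MIN_SIZE, Math.round(next)));
}
```

A 1MB chunk that took 10s triples to 3MB; one that took 60s halves to 512KB (which the article rounds to 500KB).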
9. Debris Cleanup
If a user aborts an upload halfway and never resumes it, the file chunks saved on the backend become useless.
We can set up a scheduled task on the Node side with setInterval to check for and clean up orphaned chunk files every so often.
A library such as node-schedule makes the scheduled tasks easier to manage: for example, scan the directory once a day and delete any files more than a month old.
postscript
That covers all the features of a complete, fairly advanced file upload component. I hope the beginners who patiently read this far can master it. Make small but daily progress.
References
[1] Using Web Workers - MDN: https://developer.mozilla.org/zh-CN/docs/Web/API/Web_Workers_API/Using_web_workers
[2] Learn to use web workers thoroughly in one article - Juejin: https://juejin.cn/post/7139718200177983524
[3] file-type - npm: https://www.npmjs.com/package/file-type