What should an enterprise-level file upload component look like?

Table of contents

1. The simplest file upload

2. Drag-and-drop + paste + style optimization

3. Resumable upload + instant upload + progress bar

File slicing

Calculating the hash

Resumable upload + instant upload (front end)

Resumable upload + instant upload (back end)

Progress bar

4. Sampling hash and web worker

Sampling hash (md5)

Web worker

Time slicing

5. File type detection

Determining the file type by the file header

6. Asynchronous concurrency control (important)

7. Concurrent error retry

8. Slow start control

9. Fragment cleanup

Postscript

References


This article is aimed at front-end developers with some Node.js back-end experience. If you have no back-end knowledge at all, please brush up on the prerequisites first.

Without further ado, let's get straight to the point.


Let's take a look at what the file upload component looks like at each level:

Tier - features

Bronze - bare bones: native input + axios.post
Silver - experience upgrade: paste, drag-and-drop, progress bar
Gold - feature upgrade: resumable upload, instant upload, type detection
Platinum - speed upgrade: web worker, time slicing, sampling hash
Diamond - network upgrade: async concurrency control, chunk error retry
King - finely crafted: slow start control, fragment cleanup, and more

1. The simplest file upload

For the most basic file upload, we grab the file object, wrap it in a FormData, and post it to the backend:

function upload(file) {
    let formData = new FormData();
    formData.append('newFile', file);

    axios.post(
        'http://localhost:8000/uploader/upload',
        formData,
        { headers: { 'Content-Type': 'multipart/form-data' } }
    )
}
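For completeness, here is a minimal sketch of how the file object can be obtained from a plain file input and handed to upload(); the element id and the event wiring are illustrative, not from the original:

// grab the File object from a native file input and pass it to upload()
const input = document.querySelector('#file');
input.addEventListener('change', (e) => {
    const file = e.target.files[0];   // the first selected file
    if (file) upload(file);
});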

2. Drag-and-drop + paste + style optimization

I won't write this part out; there are plenty of libraries online for it, or a component library can solve it outright.

3. Resumable upload + instant upload + progress bar

File slicing

We split a file into many small chunks, keep them in an array, and send them to the backend one by one; this is what makes resumable upload possible.


// Compute the file hash to use as the file's id
const { hash } = await calculateHashSample(file)
//todo build the list of chunks
// cut the file into chunks with file.slice()
const fileList = [];
const count = Math.ceil(file.size / globalProp.SIZE);
const partSize = Math.ceil(file.size / count);
let cur = 0  // current offset into the file
for (let i = 0; i < count; i++) {

    let item = {
        chunk: file.slice(cur, cur + partSize),
        filename: `${hash}_${i}`
    };

    fileList.push(item);
    cur += partSize;  // advance the offset, otherwise every chunk would be identical
}

Calculating the hash

So that the backend knows which file each chunk belongs to (and can later merge the chunks back into a complete file), we compute a unique value (an md5 hash) over the whole file and use it in the chunk filenames.

// Get the file from the input's change event
<input type="file" @change="getFile">

// Read the file as an ArrayBuffer and compute its hash with SparkMD5
let fileReader = new FileReader();

fileReader.onload = (e) => {
    let hexHash = SparkMD5.ArrayBuffer.hash(e.target.result);
    console.log(hexHash);
};
fileReader.readAsArrayBuffer(file);

Resumable upload + instant upload (front end)

At this point we have an array of, say, 100 file chunks; we simply iterate over it and fire axios.post requests to the backend one after another. A flag is enough to implement start/pause.

But what if we have uploaded 50 of them and then the browser is closed?

This is where the backend has to cooperate: before uploading, we ask the backend which chunks it has already received.

And if the backend reports that the whole file has already been uploaded, we just show the upload as complete immediately (instant upload).

// Query, by file hash and suffix, which chunks of this file have already been
// uploaded and whether the complete file already exists on the server
// (axios puts the response body in res.data; the backend below returns { code, data })
const res = await axios.get(
    `http://localhost:8000/uploader/count?hash=${hash}&suffix=${fileSuffix}`
)
const { isFileUploaded, uploadedList } = res.data.data

Resumable upload + instant upload (back end)

The backend's job here is relatively simple:

  1. Create a folder per file hash and save the file chunks into it

  2. Report a file's upload status to the front end through an interface

For example, the chunk folder of a file is named after its hash and contains chunks named hash_0, hash_1, and so on.


//! -------- Query how many chunks of a file the server already holds (and whether the full file exists) --------
function checkChunks(hash, suffix) {
    //! list the chunks already saved for this hash and collect their indexes
    const chunksPath = `${uploadChunksDir}${hash}`;
    const chunksList = (fs.existsSync(chunksPath) && fs.readdirSync(chunksPath)) || [];
    const indexList = chunksList.map((item) => item.split('_')[1])
    //! check by hash + suffix whether the complete file has already been uploaded
    const filename = `${hash}${suffix}`
    const fileList = (fs.existsSync(uploadFileDir) && fs.readdirSync(uploadFileDir)) || [];
    const isFileUploaded = fileList.indexOf(filename) !== -1

    console.log('chunks already uploaded:', chunksList.length);
    console.log('file already exists:', isFileUploaded);

    return {
        code: 200,
        data: {
            count: chunksList.length,
            uploadedList: indexList,
            isFileUploaded: isFileUploaded
        }
    }
}
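The chunk-saving and merging side is not shown in the original; here is a minimal sketch assuming an Express server with multer, where the directory names, route paths, and the "chunk"/"filename" field names are assumptions rather than the author's code:

const express = require('express');
const multer = require('multer');
const fs = require('fs');
const path = require('path');

const app = express();
const upload = multer({ dest: 'tmp/' });        // multer first writes the raw chunk to a temp file
const uploadChunksDir = 'upload/chunks/';       // same folder layout as checkChunks above
const uploadFileDir = 'upload/files/';

// Receive one chunk: a text field "filename" (`${hash}_${index}`) plus the binary "chunk"
app.post('/uploader/upload', upload.single('chunk'), (req, res) => {
    const [hash] = req.body.filename.split('_');
    const dir = path.join(uploadChunksDir, hash);                      // one folder per file hash
    if (!fs.existsSync(dir)) fs.mkdirSync(dir, { recursive: true });
    fs.renameSync(req.file.path, path.join(dir, req.body.filename));   // move the chunk into place
    res.json({ code: 200 });
});

// Merge all chunks of a file into the final file (called once every chunk has arrived)
function mergeChunks(hash, suffix) {
    const dir = path.join(uploadChunksDir, hash);
    const chunks = fs.readdirSync(dir)
        .sort((a, b) => a.split('_')[1] - b.split('_')[1]);            // order chunks by their index
    const target = path.join(uploadFileDir, `${hash}${suffix}`);
    for (const name of chunks) {
        fs.appendFileSync(target, fs.readFileSync(path.join(dir, name)));
        fs.unlinkSync(path.join(dir, name));                           // remove each chunk after appending
    }
    fs.rmdirSync(dir);                                                 // remove the now-empty chunk folder
}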

Progress bar

The progress bar is really just a matter of counting the successfully uploaded chunks in real time; I'll leave the full implementation to you, with a small sketch below.
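A minimal sketch, assuming the fileList built in the slicing step and a hypothetical chunk-upload helper; the "chunk" and "filename" field names are assumptions:

let uploadedCount = 0;  // chunks confirmed by the backend so far

function uploadChunk(item) {
    const formData = new FormData();
    formData.append('chunk', item.chunk);
    formData.append('filename', item.filename);
    return axios.post('http://localhost:8000/uploader/upload', formData, {
        // per-request progress reported by axios
        onUploadProgress: (e) => { item.progress = e.loaded / e.total; }
    }).then(() => {
        uploadedCount++;
        const percent = Math.round((uploadedCount / fileList.length) * 100);
        console.log(`overall progress: ${percent}%`);   // drive the progress bar with this value
    });
}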

4. Sampling hash and web worker

Before uploading we have to compute the file's md5 to use as the chunk id, and md5 over a large file is expensive: JS gets stuck on that calculation and the page freezes for a long time.

Here we provide three ideas for optimization:

Sampling hash (md5)

Sampling the hash means we only hash part of the file instead of the whole thing, trading a tiny bit of accuracy for a much faster calculation.

1. Parse the file into binary buffer data.

2. Take the first 2MB and the final block of the file in full, and from every 2MB block in between sample a few bytes from its head and tail (2 bytes each in the code below).

3. Concatenate these pieces into a new buffer and run the md5 calculation on it.


sample code

//! --------------- Sampled md5 calculation -------------------
function calculateHashSample(file) {

    return new Promise((resolve) => {
        //! parse the file as an ArrayBuffer for the md5 calculation
        const spark = new SparkMD5.ArrayBuffer();
        const { size } = file;
        const OFFSET = Math.floor(2 * 1024 * 1024); // sampling window: 2MB
        const reader = new FileReader();
        let index = OFFSET;
        // keep head and tail in full, sample 2 bytes from each window in between
        const chunks = [file.slice(0, index)];
        while (index < size) {
            if (index + OFFSET > size) {
                chunks.push(file.slice(index));
            } else {
                const CHUNK_OFFSET = 2;
                chunks.push(file.slice(index, index + CHUNK_OFFSET),
                    file.slice(index + OFFSET - CHUNK_OFFSET, index + OFFSET)
                );
            }
            index += OFFSET;
        }
        // append the sampled pieces to spark and resolve with the resulting hash
        reader.onload = (event) => {
            spark.append(event.target.result);
            resolve({
                hash: spark.end(), // the promise resolves with the hash
            });
        }
        reader.readAsArrayBuffer(new Blob(chunks));
    });
}

Web worker

Besides sampling the hash, we can also spin up a web worker thread to compute the md5.

Web worker: it gives JS a multi-threaded environment. The main thread can create worker threads and hand tasks to them; the main thread and the workers run side by side without interfering with each other, and when a worker finishes it posts its result back to the main thread.

For specific usage, see MDN or other articles:

Using Web Workers - Web API Reference | MDN (mozilla.org)[1]

Learn to use web worker thoroughly in one article - Juejin (juejin.cn)[2]
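A minimal sketch of hashing inside a worker, assuming spark-md5.min.js is served next to the worker script (the file names are illustrative):

// hash.worker.js — runs off the main thread
importScripts('spark-md5.min.js');   // exposes SparkMD5 on the worker's global scope

self.onmessage = (e) => {
    const spark = new SparkMD5.ArrayBuffer();
    const reader = new FileReaderSync();              // synchronous reads are allowed in workers
    spark.append(reader.readAsArrayBuffer(e.data.file));
    self.postMessage({ hash: spark.end() });          // send the md5 back to the main thread
};

// main thread
const worker = new Worker('hash.worker.js');
worker.postMessage({ file });
worker.onmessage = (e) => console.log('md5:', e.data.hash);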

Time slicing

Those familiar with React-style time slicing can also give it a try, though personally I think this approach is not as good as the two above.

If you are not familiar with it, there are plenty of articles to look up; I'll only sketch the idea here.

requestIdleCallback and requestAnimationFrame are the two classic time-slicing APIs, or you can build a more sophisticated scheduler on top of MessageChannel.

The idea: compute the hash incrementally, spreading many short tasks across frames so the page doesn't lock up.
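A minimal sketch of the requestIdleCallback flavor, assuming the chunk list from the slicing step (each item carries a .chunk Blob):

function calculateHashIdle(chunks) {
    return new Promise((resolve) => {
        const spark = new SparkMD5.ArrayBuffer();
        let count = 0;

        // read one chunk and feed it to spark
        const appendToSpark = (blob) => new Promise((res) => {
            const reader = new FileReader();
            reader.onload = (e) => { spark.append(e.target.result); res(); };
            reader.readAsArrayBuffer(blob);
        });

        const workLoop = async (deadline) => {
            // keep hashing while the browser says there is idle time left in this frame
            while (count < chunks.length && deadline.timeRemaining() > 1) {
                await appendToSpark(chunks[count].chunk);
                count++;
            }
            if (count < chunks.length) {
                requestIdleCallback(workLoop);     // yield, continue in the next idle period
            } else {
                resolve({ hash: spark.end() });
            }
        };
        requestIdleCallback(workLoop);
    });
}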

5. File type detection

The simple approach is to constrain the type with the input tag's accept attribute, or to check the file extension:

<input id="file" type="file" accept="image/*" />

const ext = file.name.substring(file.name.lastIndexOf('.') + 1);

Of course, this restriction is trivially bypassed by renaming the file extension, so it is not rigorous.

Determining the file type by the file header

Read the file as binary data: the first few bytes of a file (its magic number) identify the real type, so we can read them and check.

For example, the following code

// check whether the file is a .jpg
async function isJpg(file) {
  // read the first few bytes and compare them as a hex string
  const res = await blobToString(file.slice(0, 3))
  return res === 'FF D8 FF'
}
// check whether the file is a .png
async function isPng(file) {
  const res = await blobToString(file.slice(0, 4))
  return res === '89 50 4E 47'
}
// check whether the file is a .gif
async function isGif(file) {
  const res = await blobToString(file.slice(0, 4))
  return res === '47 49 46 38'
}
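The blobToString helper is not shown in the original; a minimal sketch could read the blob and format each byte as an upper-case, space-separated hex string so the comparisons above work:

function blobToString(blob) {
    return new Promise((resolve) => {
        const reader = new FileReader();
        reader.onload = (e) => {
            const bytes = new Uint8Array(e.target.result);
            // e.g. [0xFF, 0xD8, 0xFF] -> "FF D8 FF"
            resolve([...bytes]
                .map((b) => b.toString(16).toUpperCase().padStart(2, '0'))
                .join(' '));
        };
        reader.readAsArrayBuffer(blob);
    });
}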

Of course, there are ready-made libraries that do this, such as file-type:

file-type - npm (npmjs.com)[3]

6. Asynchronous concurrency control (important)

We need to upload many file chunks to the backend. Sending them strictly one at a time wastes bandwidth, but firing them all at once can overwhelm the browser and the network, so we control how many upload requests run concurrently.


First, we wrap each of the 100 file chunks in a function that issues its axios.post request, and store these functions in a task pool.

  1. Create a concurrency pool; the tasks in the pool are in flight, i.e. currently sending their chunks

  2. Keep a counter i of started tasks; while the pool holds fewer than the concurrency limit, keep pushing new tasks into it

  3. Promise.race resolves with whichever request in the pool finishes first; in its .then we start the next request (recursion)

  4. After the last request completes, ask the backend to merge the file chunks


the code

//! Takes the request list, the max concurrency, and a callback to run after all requests finish
function concurrentSendRequest(requestArr: any, max = 3, callback: any) {
    let i = 0 // counter of started tasks
    let concurrentRequestArr: any[] = [] // pool of in-flight requests

    let toFetch: any = () => {
        // (i increases by 1 on every call) if i === requestArr.length, every task has been started,
        // so return a resolved promise to act as the final toFetch().then()
        // (Promise.all() then waits for the remaining tasks and fires the callback
        // that asks the backend to merge the file chunks)
        if (i === requestArr.length) {
            return Promise.resolve()
        }

        //TODO start the async task and push it into the pool (counter +1)
        let it = requestArr[i++]()
        concurrentRequestArr.push(it)

        //TODO once the task settles, remove it from the pool
        it.then(() => {
            concurrentRequestArr.splice(concurrentRequestArr.indexOf(it), 1)
        })

        //todo if the pool is full, wait for one of the in-flight tasks to finish before adding more
        let p = Promise.resolve()
        if (concurrentRequestArr.length >= max) {
            //! Promise.race resolves with the fastest task in the pool
            p = Promise.race(concurrentRequestArr)
        }
        //todo when the fastest promise settles, recurse into toFetch in its .then
        if (globalProp.stop) { return p.then(() => { console.log('upload paused') }) }
        return p.then(() => toFetch())
    }

    // after the last batch of tasks finishes, run the callback (the merge request),
    // unless the upload was paused or has already been merged
    toFetch().then(() =>
        Promise.all(concurrentRequestArr).then(() => {
            if (!globalProp.stop && !globalProp.finished) { callback() }
        })
    )
}
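Example usage (a sketch): wrap each chunk in a function that returns its upload promise, cap concurrency at 3, and merge afterwards; the /uploader/merge endpoint name is an assumption:

// reuse the uploadChunk helper sketched in the progress bar section
const requestArr = fileList.map((item) => () => uploadChunk(item));

concurrentSendRequest(requestArr, 3, () =>
    axios.post('http://localhost:8000/uploader/merge', { hash, suffix: fileSuffix })
);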

7. Concurrent error retry

  1. Use catch to capture a failed chunk upload: when one of the axios.post tasks above fails, push it back into the task queue (see the sketch below)

  2. Give each task object a field that records how many times it has been retried

  3. If a chunk still fails after 3 retries, reject it outright, and the whole file transfer can be aborted
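A minimal sketch of the retry idea (not the original author's code): wrap a chunk-upload task so a failure re-queues it up to 3 times before giving up:

function sendWithRetry(task, maxRetries = 3) {
    task.retries = task.retries || 0;             // per-task retry counter
    return task().catch((err) => {
        if (task.retries >= maxRetries) {
            return Promise.reject(err);           // too many failures: let the caller abort the upload
        }
        task.retries++;
        return sendWithRetry(task, maxRetries);   // put the task back and try again
    });
}

// plug it into the concurrency pool from section 6:
// const requestArr = fileList.map((item) => () => sendWithRetry(() => uploadChunk(item)));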

8. Slow start control

Because file sizes vary widely, using one fixed chunk size for every upload is a bit clumsy. We can borrow the slow start strategy from the TCP protocol: pick an initial size, then dynamically adjust the size of the next chunk based on how the previous upload went, so that the chunk size matches the current network speed.

  1. Carry the size in each chunk. Since the number of chunks is no longer fixed, the progress bar can't rely on a fixed count, so createFileChunk has to be adjusted and each request should record how long it took

  2. For example, suppose the ideal is one chunk every 30 seconds. Start with 1MB: if that upload takes 10 seconds, grow the next chunk to 3MB; if it takes 60 seconds, shrink the next chunk to 500KB, and so on (a sketch follows below)
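A minimal sketch of the slow start adjustment, where uploadChunk is the hypothetical helper from the progress bar section and the numbers follow the example above:

const TARGET_TIME = 30 * 1000;        // ideal duration per chunk, in ms
let chunkSize = 1 * 1024 * 1024;      // start with a 1MB chunk

async function uploadWithSlowStart(file, cur, hash, index) {
    const chunk = file.slice(cur, cur + chunkSize);
    const start = Date.now();
    await uploadChunk({ chunk, filename: `${hash}_${index}` });
    const cost = Date.now() - start;
    // 10s -> grow towards 3MB, 60s -> shrink towards 500KB
    chunkSize = Math.max(64 * 1024, Math.round(chunkSize * (TARGET_TIME / cost)));
    return cur + chunk.size;          // offset of the next chunk
}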

9. Fragment cleanup

If a user aborts an upload halfway and never resumes it, the file chunks saved on the backend become useless.

We can set up a scheduled task on the Node side (setInterval or similar) to check for and clean up stale chunk files every so often.

node-schedule is a convenient way to manage such jobs, for example scanning the chunk directory once a day and deleting anything that is more than a month old (a sketch follows below).
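A minimal sketch, assuming node-schedule and the chunk directory used earlier (the schedule and the 30-day threshold are illustrative):

const schedule = require('node-schedule');
const fs = require('fs');
const path = require('path');

const uploadChunksDir = 'upload/chunks/';

// every day at 03:00, delete chunk folders untouched for more than 30 days
schedule.scheduleJob('0 0 3 * * *', () => {
    if (!fs.existsSync(uploadChunksDir)) return;
    for (const name of fs.readdirSync(uploadChunksDir)) {
        const dir = path.join(uploadChunksDir, name);
        const ageMs = Date.now() - fs.statSync(dir).mtimeMs;
        if (ageMs > 30 * 24 * 60 * 60 * 1000) {
            fs.rmSync(dir, { recursive: true, force: true });  // remove the stale chunk folder
        }
    }
});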


Postscript

That covers all the features of a complete, reasonably advanced file upload component. I hope beginner readers have the patience to get this far and can master it all. Make small but steady progress every day.

References

[1] Using Web Workers - MDN: https://developer.mozilla.org/zh-CN/docs/Web/API/Web_Workers_API/Using_web_workers

[2] Learn to use web worker thoroughly in one article - Juejin: https://juejin.cn/post/7139718200177983524

[3] file-type - npm: https://www.npmjs.com/package/file-type
