Uploading large files in chunks with Python

Please credit the source when reposting: http://blog.csdn.net/jinixin/article/details/77545140


introduction


This article briefly describes how to implement large-file uploads with WebUploader on the front end and Python on the back end.

WebUploader is a front-end component from the Baidu team for uploading large files; the back end has to be implemented separately in whatever language you use. Here I build the back end with Flask, a Python framework, and use the Bootstrap front-end framework to render the upload progress bar. Screenshots are at the end of the article.

WebUploader official website: click here ; WebUploader API: click here



implementation


HTTP is not well suited to uploading a large file in a single request, so the file should be split into chunks before uploading. That is what WebUploader does: it splits a large file and uploads it to the server chunk by chunk. The HTTP request that uploads each chunk needs to carry:

1) The unique identifier of the file: task_id;

2) The total number of chunks in the file: chunks;

3) The position of this chunk among all chunks of the file: chunk;

WebUploader sends the last two automatically. For the first, task_id, we only need to call the corresponding function to generate it and then add it to the form-data.
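To make the three fields concrete, here is a minimal Python sketch of the metadata a client would attach to each chunk request. It only models the arithmetic: `plan_chunks` is a hypothetical helper, `uuid.uuid4()` stands in for `WebUploader.Base.guid()`, and the byte ranges are an assumption about how a file is cut, not WebUploader's internal code.

```python
import math
import uuid


def plan_chunks(file_size, chunk_size, task_id=None):
    """Return the fields a client would send with each chunk.

    Hypothetical helper: it mirrors the task_id/chunk/chunks fields
    described above, not WebUploader's internal code.
    """
    task_id = task_id or uuid.uuid4().hex  # stand-in for WebUploader.Base.guid()
    chunks = max(1, math.ceil(file_size / chunk_size))  # total number of chunks
    plan = []
    for chunk in range(chunks):
        start = chunk * chunk_size
        end = min(start + chunk_size, file_size)  # the last chunk may be shorter
        plan.append({
            'task_id': task_id,  # same for every chunk of this file
            'chunk': chunk,      # zero-based position of this chunk
            'chunks': chunks,    # total count, same in every request
            'start': start,
            'end': end,
        })
    return plan


if __name__ == '__main__':
    # A 50 MB file cut into 20 MB chunks
    mb = 1024 * 1024
    for fields in plan_chunks(50 * mb, 20 * mb, task_id='demo'):
        print(fields['chunk'], fields['chunks'], fields['end'] - fields['start'])
```

The last chunk is simply whatever remains, which is why the server cannot assume that every chunk has the same size.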


WebUploader is a front-end framework, so the receiving side has to be implemented by ourselves; I chose Python and its Flask framework. The back end has to receive all the chunks and merge them back into one file, which raises three questions:


1) After a chunk is uploaded, how do we know whether the whole file has been uploaded?

WebUploader solves this for us: it fires the uploadSuccess event once all chunks of a file have been uploaded. See the code below for details.

<script type="text/javascript">
$(document).ready(function() {
    var task_id = WebUploader.Base.guid(); //Generate task_id to uniquely identify the file
    var uploader = WebUploader.create({
        server: '/upload/accept', //URL on the server that receives and processes each chunk
        formData: {
            task_id: task_id, //Data carried by every HTTP request that uploads a chunk
        },
    });

    uploader.on('uploadSuccess', function(file) { //Called when all chunks of the file have been uploaded successfully
        //Information about the uploaded file (unique identifier, file extension, file type)
        var data = { 'task_id': task_id, 'ext': file.source['ext'], 'type': file.source['type'] };
        $.get('/upload/complete', data); //Send an ajax request with this data to the URL
    });
});
</script>


2) How should receiving chunks relate to writing their content to a file?
Option 1: Keep a string in memory; as each chunk arrives, read its content and append it to the string. After all chunks have arrived, write the string to a new file.
Option 2: Create a file; as each chunk arrives, read its content and append it to the end of the file.

Option 3: Create a temporary file for each chunk to hold its content; after all chunks have been uploaded, read the temporary files in order and write their data to a new file.


The first two options look fine but have problems. Option 1 can run out of memory, because the entire file is held in memory while waiting for the remaining chunks. Option 2 does not work either, because chunks do not necessarily arrive in order and appending would scramble the file. That leaves option 3.
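Option 3 can be sketched in a few lines of plain Python, independent of any web framework. This is a minimal illustration under stated assumptions: `save_chunk` and `merge_chunks` are hypothetical helper names, the total chunk count is assumed to be known at merge time, and chunk files are named task_id followed by the chunk index.

```python
import os
import shutil
import tempfile


def save_chunk(upload_dir, task_id, chunk, data):
    """Step 1: store one chunk in its own temporary file."""
    path = os.path.join(upload_dir, '%s%d' % (task_id, chunk))
    with open(path, 'wb') as f:
        f.write(data)


def merge_chunks(upload_dir, task_id, chunks, target_name):
    """Step 2: concatenate the chunk files in index order."""
    target_path = os.path.join(upload_dir, target_name)
    with open(target_path, 'wb') as target:
        for chunk in range(chunks):
            path = os.path.join(upload_dir, '%s%d' % (task_id, chunk))
            with open(path, 'rb') as source:
                shutil.copyfileobj(source, target)  # streams in small blocks
            os.remove(path)  # the chunk file is no longer needed
    return target_path


if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as upload_dir:
        # Chunks arrive out of order: 2, 0, 1
        for chunk, data in [(2, b'!'), (0, b'hello '), (1, b'world')]:
            save_chunk(upload_dir, 'demo', chunk, data)
        merged = merge_chunks(upload_dir, 'demo', 3, 'out.txt')
        with open(merged, 'rb') as f:
            print(f.read())
```

Because every chunk lands in its own file, arrival order does not matter; only the merge step has to respect the index order.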


3) How does the back end tell the files of different users apart, and how does it order the chunks of one file?
Different files are distinguished by the task_id carried in the HTTP request, and the order of the chunks within one file is given by the chunk field carried in the same request. The combination task_id+chunk therefore uniquely identifies one chunk among all chunks of all files from all users, so each chunk is saved under the name task_id+chunk.
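The naming scheme can be illustrated with a short Python sketch. The helper names below are hypothetical, and the sketch assumes that no task_id is another task_id followed by digits, so that task_id+chunk stays unambiguous. It also shows one detail that is easy to miss: chunk names must be ordered by their numeric index, since a plain string sort would put chunk 10 before chunk 2.

```python
import re


def chunk_name(task_id, chunk):
    """Name of one chunk file: task_id followed by the chunk index."""
    return '%s%d' % (task_id, int(chunk))


def chunk_index(name, task_id):
    """Recover the integer chunk index from a chunk file name, or None."""
    match = re.fullmatch(re.escape(task_id) + r'(\d+)', name)
    return int(match.group(1)) if match else None


def ordered_chunk_names(names, task_id):
    """Keep only this task's chunk names and sort them by numeric index."""
    own = [n for n in names if chunk_index(n, task_id) is not None]
    return sorted(own, key=lambda n: chunk_index(n, task_id))


if __name__ == '__main__':
    names = ['demo10', 'demo2', 'demo0', 'other1', 'demo1']
    print(sorted(names))                       # string order mixes things up
    print(ordered_chunk_names(names, 'demo'))  # numeric order for task 'demo'
```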


key code



front-end code

<!DOCTYPE html>
<html>

<head>
    <meta http-equiv="content-type" content="text/html; charset=utf-8" />
    <script src="./static/jquery-1.11.1.min.js"></script>
    <script src="./static/bootstrap/js/bootstrap.min.js"></script>
    <script src="./static/webuploader/webuploader.min.js"></script>
    <link rel="stylesheet" type="text/css" href="./static/webuploader/webuploader.css">
    <link rel="stylesheet" type="text/css" href="./static/bootstrap/css/bootstrap.min.css">
</head>

<body>
    <div>
        <div id="picker">Please select</div> <!-- Upload button; its id is referenced by the pick option below -->
        <div class="progress"> <!-- progress bar -->
            <div class="progress-bar progress-bar-striped active" role="progressbar" style="width:0%;"></div>
        </div>
    </div>
    <script type="text/javascript">
    $(document).ready(function() {
        var task_id = WebUploader.Base.guid(); //Generate task_id
        var uploader = WebUploader.create({ //Create the uploader
            swf: './static/webuploader/Uploader.swf', //Path to Uploader.swf, used by the Flash runtime
            server: '/upload/accept', //URL on the server that receives each chunk
            pick: '#picker', //Selector of the upload button
            auto: true, //Start uploading automatically once a file is selected
            chunked: true, //Split the file into chunks
            chunkSize: 20 * 1024 * 1024, //Size of each chunk, here 20 MB
            chunkRetry: 3, //Number of retries if a chunk fails to upload
            threads: 1, //Number of concurrent uploads; 1 here to go easy on the server
            duplicate: true, //Setting for WebUploader's duplicate check, which applies to files rather than chunks
            formData: { //Extra data sent with every chunk
                task_id: task_id,
            },
        });

        uploader.on('startUpload', function() { //Called when the upload starts
            $('.progress-bar').css('width', '0%');
            $('.progress-bar').text('0%');
        });

        uploader.on('uploadProgress', function(file, percentage) { //Called as the upload progresses
            //Stay 1% below the real value until uploadSuccess fires, but never go below 0
            var percent = Math.max(0, Math.floor(percentage * 100) - 1);
            $('.progress-bar').css('width', percent + '%');
            $('.progress-bar').text(percent + '%');
        });

        uploader.on('uploadSuccess', function(file) { //Called when all chunks of the file have been uploaded successfully
            //Information about the uploaded file (unique identifier, file name)
            var data = {'task_id': task_id, 'filename': file.source['name'] };
            $.get('/upload/complete', data); //Send an ajax request with this data to the URL
            $('.progress-bar').css('width', '100%');
            $('.progress-bar').text('Upload completed');
        });

        uploader.on('uploadError', function(file) { //Called when an error occurs during the upload
            $('.progress-bar').css('width', '100%');
            $('.progress-bar').text('Upload failed');
        });

        uploader.on('uploadComplete', function(file) { //Called when the upload ends, whether it succeeded or failed
            $('.progress-bar').removeClass('active progress-bar-striped');
        });

    });
    </script>
</body>

</html>

backend code

import os

from flask import Flask, request, render_template as rt

app = Flask(__name__)


@app.route('/', methods=['GET'])
def index():  # Serve the upload page
    return rt('./index.html')


@app.route('/upload/accept', methods=['POST'])  # Matches the `server` option in the front-end code
def upload_accept():  # Called once for every uploaded chunk
    upload_file = request.files['file']
    task = request.form.get('task_id')  # Unique identifier of the file
    chunk = request.form.get('chunk', 0)  # Index of this chunk; defaults to 0 if the field is absent
    filename = '%s%s' % (task, chunk)  # Unique name of this chunk
    upload_file.save('./upload/%s' % filename)  # Save the chunk locally
    return rt('./index.html')


@app.route('/upload/complete', methods=['GET'])  # Matches the $.get call in the front-end code
def upload_complete():  # Called after all chunks have been uploaded
    target_filename = request.args.get('filename')  # File name of the uploaded file
    task = request.args.get('task_id')  # Unique identifier of the file
    chunk = 0  # Chunk index
    with open('./upload/%s' % target_filename, 'wb') as target_file:  # Create the merged file
        while True:
            filename = './upload/%s%d' % (task, chunk)
            try:
                with open(filename, 'rb') as source_file:  # Open the chunks in order
                    target_file.write(source_file.read())  # Append the chunk to the merged file
            except IOError:
                break  # No chunk with this index: merging is finished
            os.remove(filename)  # Delete the chunk to save space
            chunk += 1
    return rt('./index.html')
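One gap worth noting: the merge loop in the backend code above stops at the first chunk file it cannot open, so a chunk that never arrived would produce a silently truncated file. A minimal sketch of a check that could run before merging follows. It assumes the server knows the total chunk count (the `chunks` value); the completion request in this article does not send it, so passing it along is an assumption, and `missing_chunks` is a hypothetical helper.

```python
import os
import tempfile


def missing_chunks(upload_dir, task_id, chunks):
    """Return the indices in range(chunks) that have no chunk file yet."""
    return [
        chunk for chunk in range(chunks)
        if not os.path.isfile(os.path.join(upload_dir, '%s%d' % (task_id, chunk)))
    ]


if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as upload_dir:
        for chunk in (0, 2):  # chunk 1 never arrived
            with open(os.path.join(upload_dir, 'demo%d' % chunk), 'wb') as f:
                f.write(b'x')
        print(missing_chunks(upload_dir, 'demo', 3))
```

If the returned list is not empty, the server could refuse to merge and report which chunks need to be sent again.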


result

[Screenshot: upload succeeded]

[Screenshot: upload failed]


test

The test used three computers, one of them acting as the server. The other two each uploaded a movie at the same time, 2.6 GB and 3.8 GB in size. After the uploads finished, both movies played normally on the server.
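Playing the movie is a quick sanity check. A stricter comparison, not part of the original test, would be to hash the file on the client and on the server and compare the digests. The sketch below is one way to do that with the standard library; `file_sha256` is a hypothetical helper, and the two temporary files merely stand in for the original and the merged upload.

```python
import hashlib
import os
import tempfile


def file_sha256(path, block_size=1024 * 1024):
    """Hash a file in fixed-size blocks so large files are not loaded whole."""
    digest = hashlib.sha256()
    with open(path, 'rb') as f:
        for block in iter(lambda: f.read(block_size), b''):
            digest.update(block)
    return digest.hexdigest()


if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as d:
        original = os.path.join(d, 'original.bin')  # stand-in for the client's file
        uploaded = os.path.join(d, 'uploaded.bin')  # stand-in for the merged file
        for path in (original, uploaded):
            with open(path, 'wb') as f:
                f.write(b'movie bytes' * 1000)
        print(file_sha256(original) == file_sha256(uploaded))
```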


Remark

If you would like the source code, it is available here. The project will keep being improved; if you are interested, please give it a star. Thank you.

