Resuming WinForm file downloads from a breakpoint

In the first two articles of this series, we introduced the basic usage of WebClient and WinINet for download tasks, along with some practical tips.

This article covers the most common problem in downloading: resuming an interrupted download from the point where it stopped (the breakpoint).

To be clear, "breakpoint resuming" in this article refers specifically to resuming over the HTTP protocol. The article describes the approach and the key code for implementing it; readers who want more details can download and study the demo attached to this article.


How it works

HTTP defines a set of request and response headers that, used together, let a client download the same file in several batches. For example, one request asks for only part of the file and the received data is saved; the next request asks for only the remaining part. Once all the data has been downloaded locally, the pieces are merged into the complete file.

The HTTP protocol lets a request specify the range of data it wants through the Range header.

The Range header is simple to use; its format is:

Range: bytes=500-999

This means: request only bytes 500 through 999 of the target file, 500 bytes in total (byte positions are zero-based and both ends are inclusive).

For example, suppose a 1000-byte file needs to be downloaded. The first request specifies no Range header, so it asks for the entire file; the download is then interrupted after byte 499 (the first 500 bytes) has been received. When the file is downloaded again, only bytes 500 through 999 need to be requested.
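The arithmetic in this example can be sketched as a small C# helper. RangeHelper and BuildRangeValue are hypothetical names, not code from the demo; the sketch only builds the header value from the number of bytes already saved, assuming the caller passes the total length, or -1 when it is not known.

```csharp
using System;

// Hypothetical helper, not taken from the demo.
public static class RangeHelper
{
    // Builds the value of the Range request header.
    // downloaded:  bytes already saved locally (offsets 0 .. downloaded-1).
    // totalLength: full size of the file, or -1 if it is not known yet.
    // Byte positions are zero-based and inclusive, so after the first 500 bytes
    // of a 1000-byte file the remaining request is "bytes=500-999".
    public static string BuildRangeValue(long downloaded, long totalLength = -1)
    {
        if (downloaded < 0)
            throw new ArgumentOutOfRangeException(nameof(downloaded));

        // Unknown length: the open-ended form asks for everything from the offset on.
        if (totalLength < 0)
            return $"bytes={downloaded}-";

        if (downloaded >= totalLength)
            throw new InvalidOperationException("Nothing left to download.");

        return $"bytes={downloaded}-{totalLength - 1}";
    }
}
```

With HttpWebRequest, Range cannot be set through the Headers collection; the AddRange methods are the supported way. For instance, request.AddRange(500L, 999L) sends bytes=500-999, and request.AddRange(500L) sends the open-ended bytes=500- (the long overloads exist since .NET Framework 4.0).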

The principle seems simple, but the following issues need to be considered:

1. Do all web servers support the Range header?

2. There may be a long interval between multiple requests. What if the file on the server changes?

3. How do we save the partially downloaded data and related information?

4. After the pieces have been stitched back together into a file of the original size, how can we verify that it is exactly the same as the source file?

The rest of this article gives a solution to each of these problems.


1. How to check whether the server side supports the Range header?

When the server responds to a request, the Accept-Ranges response header indicates whether it accepts requests for part of a resource. A small complication is that different servers may return different values to express this. A more uniform approach relies on the convention that a server that does not support partial requests signals it with Accept-Ranges: none, so it is enough to check whether the value equals none. Servers are not required to send this header at all, so the check treats a missing header as "ranges may be supported".

The code is as follows:

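using System;
using System.Net;

// Returns false only when the server explicitly answers "Accept-Ranges: none";
// a missing header is optimistically treated as "ranges may be supported".
private static bool IsAcceptRanges ( WebResponse res )
{
    string s = res.Headers["Accept-Ranges"];
    if ( s != null && string.Equals( s.Trim(), "none", StringComparison.OrdinalIgnoreCase ) )
    {
        return false;
    }
    return true;
}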
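Because Accept-Ranges is only advisory and may be absent, a more direct confirmation is the status code of the response to a request that carried a Range header: 206 Partial Content means the server honoured the range, while 200 OK means it ignored the range and is sending the whole file. A minimal sketch combining both signals follows; ResumeCheck and CanAppend are hypothetical names, not part of the demo.

```csharp
using System;
using System.Net;

// Hypothetical helper: decides whether the body of a ranged response can be
// appended to the local partial file.
public static class ResumeCheck
{
    public static bool CanAppend(HttpStatusCode status, string acceptRanges)
    {
        // An explicit "Accept-Ranges: none" rules out resuming.
        if (acceptRanges != null &&
            string.Equals(acceptRanges.Trim(), "none", StringComparison.OrdinalIgnoreCase))
        {
            return false;
        }

        // 206 Partial Content: the server returned only the requested range.
        // 200 OK: the range was ignored and the full file follows, so the
        // partial file must be discarded and the download restarted.
        return status == HttpStatusCode.PartialContent;
    }
}
```

For a response obtained from an HttpWebRequest, cast the WebResponse to HttpWebResponse and pass its StatusCode together with Headers["Accept-Ranges"].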


2. How to check whether the file on the server side has changed?

Suppose a download is interrupted by a network failure or some other cause. If the file on the server has changed in the meantime, the download must start over from the beginning; resuming from the breakpoint only makes sense when the file on the server has not changed.

So when we continue the download next time, how do we determine that the file on the server is still the same file whose first part we already downloaded?

HTTP response headers offer two options for this: ETag and Last-Modified.

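Let's look at ETag first:

The ETag response-header field provides the current value of the entity tag for the requested variant. (Quoted from RFC 2616, section 14.19, ETag)

Put simply, an ETag is a string that identifies the current content of the requested resource; when the resource changes, its ETag changes as well. The simplest approach, then, is to save the ETag from the response headers on the first request and compare it on the next request.

The code is as follows: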

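using System.IO;
using System.Net;

string newEtag = GetEtag( response );
bool resumeDowload = false;

// tempFileName: the part of the file already downloaded to the local disk
// tempFileInfoName: temporary file that stores the ETag of that partial download
if ( File.Exists(tempFileName) && File.Exists(tempFileInfoName) )
{
    string oldEtag = File.ReadAllText( tempFileInfoName );
    if ( !string.IsNullOrEmpty(oldEtag) && !string.IsNullOrEmpty(newEtag) && newEtag == oldEtag )
    {
        // ETag unchanged: the download can resume from the breakpoint
        resumeDowload = true;
    }
}

if ( !resumeDowload && !string.IsNullOrEmpty(newEtag) )
{
    // First download, or the file has changed: record the current ETag
    // (any old partial data must be discarded before starting over)
    File.WriteAllText( tempFileInfoName, newEtag );
}

// GetEtag helper: returns the ETag response header, or null if the server sent none
private static string GetEtag( WebResponse res )
{
    return res.Headers["ETag"];
}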
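The newEtag == oldEtag comparison treats every ETag alike. HTTP/1.1 also defines weak ETags, written with a W/ prefix, which only promise semantically equivalent content rather than identical bytes, and it does not allow weak validators for sub-range requests. The following is a hedged sketch of a strong comparison that could stand in for the plain equality check; EtagCompare and IsStrongMatch are hypothetical names.

```csharp
using System;

// Hypothetical helper: strong ETag comparison for deciding whether byte ranges
// from two different responses can be stitched together.
public static class EtagCompare
{
    public static bool IsStrongMatch(string oldEtag, string newEtag)
    {
        if (string.IsNullOrEmpty(oldEtag) || string.IsNullOrEmpty(newEtag))
            return false;

        // Weak ETags (W/"...") do not guarantee byte-for-byte identical content.
        if (oldEtag.StartsWith("W/", StringComparison.Ordinal) ||
            newEtag.StartsWith("W/", StringComparison.Ordinal))
            return false;

        // Entity tags are opaque quoted strings: compare them exactly.
        return string.Equals(oldEtag, newEtag, StringComparison.Ordinal);
    }
}
```

When the server only sends weak ETags, falling back to Last-Modified or simply restarting the download is the safer choice.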

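Next, Last-Modified:

The Last-Modified entity-header field indicates the date and time at which the origin server believes the variant was last modified. (Quoted from RFC 2616, section 14.29, Last-Modified)

Last-Modified is the time at which the requested resource was last modified on the server, and it is used in much the same way as ETag.

Either ETag or Last-Modified can be used to detect whether the file on the server has changed.

You can also use both together as a double check, which makes the detection more reliable.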
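The double check can be sketched as follows, assuming the info file is extended to hold both values. SavedValidators and ChangeCheck are hypothetical names, and Last-Modified is compared as the raw header string, which assumes the server formats the same date the same way each time.

```csharp
using System;

// Hypothetical container for the validators saved next to the partial file.
public sealed class SavedValidators
{
    public string ETag { get; set; }
    public string LastModified { get; set; }
}

public static class ChangeCheck
{
    // Returns true only if at least one validator is available on both sides
    // and every validator present on both sides is unchanged.
    public static bool IsUnchanged(SavedValidators saved, string currentEtag, string currentLastModified)
    {
        bool compared = false;

        if (!string.IsNullOrEmpty(saved.ETag) && !string.IsNullOrEmpty(currentEtag))
        {
            if (!string.Equals(saved.ETag, currentEtag, StringComparison.Ordinal))
                return false;
            compared = true;
        }

        if (!string.IsNullOrEmpty(saved.LastModified) && !string.IsNullOrEmpty(currentLastModified))
        {
            // Raw string comparison of the HTTP date (assumes consistent formatting).
            if (!string.Equals(saved.LastModified, currentLastModified, StringComparison.Ordinal))
                return false;
            compared = true;
        }

        // Nothing to compare means we cannot prove the file is the same: restart.
        return compared;
    }
}
```

Returning false when no validator can be compared errs on the side of re-downloading, which costs bandwidth but never produces a corrupted file.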


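3. How to save the partially downloaded data and related information?

This mainly concerns saving the data and its related information in C#. The general idea: if a file has not finished downloading, keep the data downloaded so far at some local path, then append each newly downloaded chunk of bytes to the end of that partial file.

For the detailed implementation, please see the demo code.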
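A minimal sketch of the append step, assuming the response body is read from the Stream returned by WebResponse.GetResponseStream(). PartialFile, ResumeOffset and AppendFrom are hypothetical names, not the demo's code. The resume offset is taken from the file's length on disk, so it always matches the bytes actually saved.

```csharp
using System;
using System.IO;

// Hypothetical helper: keeps partially downloaded data in a local temp file
// and appends each newly received chunk to its end.
public static class PartialFile
{
    // Offset to resume from = number of bytes already on disk (0 if none).
    public static long ResumeOffset(string tempFileName)
    {
        return File.Exists(tempFileName) ? new FileInfo(tempFileName).Length : 0;
    }

    // Copies the source stream to the end of the temp file, creating it if needed.
    // Returns the number of bytes written in this call.
    public static long AppendFrom(Stream source, string tempFileName, int bufferSize = 81920)
    {
        long written = 0;
        var buffer = new byte[bufferSize];

        // FileMode.Append opens (or creates) the file and positions at its end;
        // it must be combined with FileAccess.Write.
        using (var target = new FileStream(tempFileName, FileMode.Append, FileAccess.Write))
        {
            int read;
            while ((read = source.Read(buffer, 0, buffer.Length)) > 0)
            {
                target.Write(buffer, 0, read);
                written += read;
            }
        }
        return written;
    }
}
```

Because the offset comes from the file size rather than a separately stored counter, any bytes that never reached the disk before an interruption are simply requested again on the next attempt.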


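4. How to verify that the downloaded file matches the source file?

During resumable downloading, the file is downloaded and merged byte by byte. If anything goes wrong at any point in the process, the final file may differ from the source file, so it is best to verify the finished file against the source. This is an important step, and also the hardest to implement, because it needs server-side support: for example, the server must provide not only the file to download but also the file's MD5 hash.

Of course, if we build the server ourselves, we can add that support on the server side. Some products already offer resumable downloads; the Spread Studio spreadsheet control is one of them.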
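Once the server publishes an MD5 hash, the client-side check is short. This sketch assumes the hash arrives as a 32-character hex string; how it is obtained (for example, from a separate file next to the download) is an assumption, and DownloadVerifier is a hypothetical name.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

// Hypothetical helper: compares the MD5 of the finished file with the
// hash published by the server.
public static class DownloadVerifier
{
    // Returns the MD5 of the file as 32 lowercase hex characters.
    public static string ComputeMd5Hex(string path)
    {
        using (var md5 = MD5.Create())
        using (var stream = File.OpenRead(path))
        {
            byte[] hash = md5.ComputeHash(stream);
            return BitConverter.ToString(hash).Replace("-", "").ToLowerInvariant();
        }
    }

    // Case-insensitive comparison, since servers may publish upper- or lowercase hex.
    public static bool MatchesMd5(string path, string expectedHex)
    {
        if (string.IsNullOrWhiteSpace(expectedHex))
            return false;
        return string.Equals(ComputeMd5Hex(path), expectedHex.Trim(), StringComparison.OrdinalIgnoreCase);
    }
}
```

If the check fails, the stitched file cannot be trusted; deleting it together with its info file and downloading again from the start is the straightforward recovery.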

Demo download
