16 | Putting the elephant into the refrigerator: how HTTP transfers large files

Note: "Perspective HTTP protocol" is a Geek Time course column by Luo Jianfeng (a technical expert at Qihoo 360). These are my study notes, for reference only.

Last time we talked about the body of an HTTP message and saw that HTTP can transfer many types of data: not only text, but also pictures, audio and video.

In the early Internet, what was transferred was basically text of a few KB and small pictures; today the situation is very different. Web pages carry far more information: an ordinary home page's HTML may be hundreds of KB, high-resolution images are measured in MB, and movies and TV series can run to several or even dozens of GB.

By comparison, under the pressure of these large files, a 100M fiber line or a 4G mobile network becomes a "small pipe": whether uploading or downloading, the network link gets squeezed "full to the brim."

So how to transfer these large files quickly and efficiently over limited bandwidth has become an important issue. It is as if the refrigerator door is already open (the connection is established): how do we get the elephant (the file) inside and close the door (complete the transfer)?

Today let's look at the means HTTP offers to solve this problem.

Data compression

Remember "data types and encoding" from the last lecture? If you do, you can surely think of a basic solution: "data compression", which turns the elephant into Peppa Pig before putting it into the refrigerator.

When a browser sends a request, it usually includes the "Accept-Encoding" header field, a list of the compression formats the browser supports, such as gzip, deflate, br, etc. The server can then pick one of these algorithms, name it in the "Content-Encoding" response header, and send the compressed data to the browser.
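A minimal sketch of this negotiation in Python, assuming a hypothetical server that can only produce gzip and deflate (Brotli "br" is not in Python's standard library). The names `choose_encoding` and `encode_body` are illustrative, and q-values such as "gzip;q=0" are deliberately ignored, which a real server must not do.

```python
import gzip
import zlib
from typing import Dict, Optional, Tuple

# Encodings this hypothetical server can produce, in its order of preference.
SUPPORTED = ("gzip", "deflate")

def choose_encoding(accept_encoding: str) -> Optional[str]:
    """Pick the first server-preferred encoding that the client lists.

    Simplification: parameters such as ";q=0.5" are stripped and ignored.
    """
    offered = {token.split(";")[0].strip().lower()
               for token in accept_encoding.split(",") if token.strip()}
    for enc in SUPPORTED:
        if enc in offered:
            return enc
    return None

def encode_body(body: bytes, accept_encoding: str) -> Tuple[Dict[str, str], bytes]:
    """Return (response headers, possibly compressed body)."""
    enc = choose_encoding(accept_encoding)
    if enc == "gzip":
        data = gzip.compress(body)
    elif enc == "deflate":
        data = zlib.compress(body)  # HTTP "deflate" means zlib-wrapped deflate data
    else:
        # No common encoding: send the body unchanged, without Content-Encoding.
        return {"Content-Length": str(len(body))}, body
    return {"Content-Encoding": enc, "Content-Length": str(len(data))}, data
```

For a typical browser header like "gzip, deflate, br", this sketch answers with gzip.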

If the compression ratio reaches 50%, i.e. 100K of data shrinks to 50K, that is equivalent to doubling the transfer speed with the same bandwidth, a very noticeable speedup.

However, this approach has a drawback: compression algorithms such as gzip usually compress only text files well. Multimedia data such as pictures, audio and video is already highly compressed, so gzip cannot make it smaller (it may even grow slightly), and compression is ineffective there.
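The following sketch, using only Python's standard `gzip` module, compares gzip's effect on repetitive text and on random bytes. The random bytes are an assumption standing in for already-compressed media; real JPEG or MP4 files behave similarly but not identically.

```python
import gzip
import os

def ratio(data: bytes) -> float:
    """Compressed size divided by original size (lower is better)."""
    return len(gzip.compress(data)) / len(data)

# Repetitive text, like HTML markup, compresses very well.
text = b"<li class='item'>HTTP large file transfer</li>\n" * 2000
# Random bytes have no redundancy left for gzip to remove, much like
# media that has already been compressed by its own codec.
media_like = os.urandom(100_000)

print(f"text:       {ratio(text):.3f}")
print(f"media-like: {ratio(media_like):.3f}")  # just above 1.0: gzip only adds overhead
```

The second ratio ends up slightly above 1 because the gzip header, trailer and block framing are added while nothing is saved.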

Still, data compression works very well on text, so the servers of major websites use it as a "safety net" when handling text. For example, Nginx uses the "gzip on" directive to enable compression of "text/html".

Chunked transfer

Besides data compression, is there any other way to deal with large files?

Compression makes a large file smaller as a whole. Thinking the other way round: instead of shrinking the whole file, we can "break it up" into small pieces, send the pieces to the browser in batches, and let the browser reassemble them after receiving them.

This way, neither the browser nor the server has to hold the whole file in memory, and each time only a small part is sent, so the network is not tied up by a large file for a long time, saving resources such as memory and bandwidth.

In HTTP, this "break it into pieces" idea is the "chunked" transfer encoding, indicated in the response by the header field "Transfer-Encoding: chunked". It means that the body is not sent all at once but split into many chunks that are sent one by one.

It's like magically turning the elephant into "Lego" bricks, putting them into the refrigerator one by one, and then reassembling them at the destination to bring it back "at full health."

Chunked transfer can also be used for "streaming data", for example pages generated dynamically from a database. In this case the length of the body is unknown and cannot be stated exactly in the "Content-Length" header field, so chunked transfer is the only option.

"Transfer-Encoding: chunked" and "Content-Length" These two fields are mutually exclusive , that is to say the response message in these two fields can not appear at the same time, a response packet transmission length either known or is the length of the unknown (chunked), this is something you must remember.

Now let's look at the encoding rules of chunked transfer. They are actually very simple and, like the response headers, use plain text:

Each chunk consists of two parts: a length header and the data block;

The length header is a single line of plain text ending with CRLF (carriage return + line feed, i.e. \r\n), giving the length as a hexadecimal number;

The data block immediately follows the length header and also ends with CRLF, but the length does not count that CRLF;

Finally, a chunk of length 0 marks the end, i.e. "0\r\n\r\n".
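These rules translate almost directly into code. A minimal Python encoder, with the hypothetical name `encode_chunked`, that turns a sequence of byte strings into a chunked body (no chunk extensions or trailer fields):

```python
from typing import Iterable, Iterator

def encode_chunked(pieces: Iterable[bytes]) -> Iterator[bytes]:
    """Yield a chunked-encoded body, one chunk per non-empty input piece."""
    for piece in pieces:
        if not piece:
            continue  # a zero-length chunk would prematurely signal the end
        # Length line: size in hex + CRLF; then the data and a CRLF
        # that is not counted in the size.
        yield f"{len(piece):x}\r\n".encode("ascii") + piece + b"\r\n"
    yield b"0\r\n\r\n"  # last chunk: length 0 followed by the final empty line
```

For example, a 16-byte piece gets the length line "10", since the length is hexadecimal.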

This may sound a little hard to follow, but the diagram makes it clear:

The URI "/16-1" in the experimental environment simply simulates chunked transfer; you can access it with Chrome to see the result:

However, after receiving chunked data the browser automatically strips the chunked encoding according to the rules and reassembles the content. So to see the raw message sent by the server, you have to send the request manually with Telnet (or capture packets with Wireshark):

GET /16-1 HTTP/1.1
Host: www.chrono.com

Because Telnet simply receives the response bytes without parsing the chunks, you can clearly see the chunked format in the response: first a line with the hexadecimal length, then the data, then another hex length and its data, and so on, until the final chunk of length zero.
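Going the other way, this is roughly what the browser does before showing the page. A simplified Python decoder (hypothetical name `decode_chunked`) that skips chunk extensions and does not parse trailer fields; note that it reads exactly the announced number of bytes rather than searching for the next CRLF.

```python
def decode_chunked(raw: bytes) -> bytes:
    """Reassemble a complete chunked body into the original bytes."""
    body = bytearray()
    pos = 0
    while True:
        eol = raw.index(b"\r\n", pos)                # end of the hex length line
        size = int(raw[pos:eol].split(b";")[0], 16)  # ignore any ";name=value" extension
        pos = eol + 2
        if size == 0:
            break                                    # zero-length chunk: end of body
        body += raw[pos:pos + size]                  # take exactly `size` bytes
        pos += size
        if raw[pos:pos + 2] != b"\r\n":
            raise ValueError("chunk data not followed by CRLF")
        pos += 2                                     # skip the CRLF after the data
    return bytes(body)
```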

Range request

With chunked transfer encoding, the server can easily send and receive large files, but for very large files of several GB there are still some issues to consider.

For example, suppose you are watching a popular time-travel drama and want to skip the opening credits and go straight to the main episode, or a stretch of the plot is boring and you want to drag the progress bar forward a few minutes. What you actually want is a fragment of data from inside a large file, and chunked transfer does not have this ability.

To meet this need, HTTP introduced the concept of "range requests", which lets the client use a dedicated header field to request only part of a file. It is the client-side counterpart of "breaking into pieces".

Range requests are not a mandatory feature of a Web server; a server may or may not implement them. So a server that does must use the "Accept-Ranges: bytes" header field in its response to tell the client explicitly: "I support range requests."

What if it doesn't support them? The server can send "Accept-Ranges: none", or simply not send the "Accept-Ranges" field at all; the client then assumes that range requests are not implemented and can only send and receive whole files.

The request header field Range is dedicated to HTTP range requests. Its format is "bytes=x-y", where x and y are byte positions in the data.

Note that x and y are "offsets": they are counted from 0 and both ends are included. For example, the first 10 bytes are "0-9", the next 10 bytes are "10-19", and "0-10" is in fact the first 11 bytes.

The Range format is also flexible: either the start x or the end y may be omitted, which makes it easy to express ranges counted from the start or from the end. Assuming the file is 100 bytes:

"0-" means the document from the beginning to the end of the document, the equivalent of "0-99", that is, the entire file;

"10-" is started from the 10th byte to the end of the document, corresponds to the "10-99";

"-1" is the last byte of the file, corresponds to the "99-99";

"-10" from the end of the document is the reciprocal of 10 bytes, the equivalent of "90-99."

When the server receives a Range field, it has to do four things.

First, it must check that the range is valid. For example, if the file is only 100 bytes but the request asks for "200-300", the range is out of bounds. The server then returns status code 416 (Range Not Satisfiable), meaning "the range you requested is wrong, I can't handle it, please check it again."

Second, if the range is valid, the server computes the offsets from the Range header, reads that slice of the file, and returns status code "206 Partial Content", which means much the same as 200 but indicates that the body is only part of the original data.

Third, the server adds the header field Content-Range to the response, telling the client the actual offsets of the fragment and the total size of the resource. Its format is "bytes x-y/length": compared with the Range header it has no "=" and adds the total length at the end. For example, for a request with the range "0-10", the value is "bytes 0-10/100".

Finally, all that remains is to send the data: the server sends the fragment directly to the client over TCP, and the range request is done.
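Steps one to four can be combined into a small Python function. It is a sketch under stated assumptions: the range has already been resolved into inclusive offsets (None meaning it fell outside the resource), the resource fits in memory, and `partial_response` is a hypothetical name. It also sends "Content-Range: bytes */length" with the 416, which RFC 7233 recommends.

```python
from typing import Dict, Optional, Tuple

def partial_response(data: bytes,
                     rng: Optional[Tuple[int, int]]) -> Tuple[int, Dict[str, str], bytes]:
    """Build (status, headers, body) for a single-range request on `data`."""
    total = len(data)
    # Step 1: reject ranges that are missing or outside the resource.
    if rng is None or not (0 <= rng[0] <= rng[1] < total):
        return 416, {"Content-Range": f"bytes */{total}"}, b""
    start, end = rng
    body = data[start:end + 1]                  # Step 2: read the slice (end is inclusive)
    headers = {
        "Accept-Ranges": "bytes",
        "Content-Range": f"bytes {start}-{end}/{total}",  # Step 3: offsets + total size
        "Content-Length": str(len(body)),
    }
    return 206, headers, body                   # Step 4: send the fragment
```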

You can test range requests with the URI "/16-2" in the experimental environment; the object it operates on is "/mime/a.txt". We can't use Chrome here, because it has no way to edit HTTP request headers (Firefox is more convenient in this respect), so we use Telnet again.

In the following example, the request uses the Range field to get the first 32 bytes of the file:

GET /16-2 HTTP/1.1
Host: www.chrono.com
Range: bytes=0-31

The returned data (with some irrelevant fields removed):

HTTP/1.1 206 Partial Content
Content-Length: 32
Accept-Ranges: bytes
Content-Range: bytes 0-31/96

// this is a plain text json doc

With range requests, HTTP handles large files much more easily: when watching a video, the client can compute the Range from the playback time point and fetch exactly the fragment containing that content, without downloading the whole file.

It is not only dragging a video's progress bar that needs range requests; the multi-segment downloading commonly used by download tools is also built on them over HTTP. The key points are:

First send a HEAD request to check whether the server supports range requests and to get the file size;

Start N threads, each using the Range field to request the fragment it is responsible for, and transfer the data;

An unexpected interruption is no problem and you don't have to start over: based on the record of what has already been downloaded, just send a Range request for the remaining part.
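A sketch of such a downloader in Python, using only `urllib.request` and `concurrent.futures`. It assumes the server answers HEAD with a Content-Length and honours single ranges with 206; `plan_ranges`, `probe`, `fetch_part`, `download` and `resume_range` are hypothetical helpers, and the whole file is kept in memory for simplicity.

```python
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple

def plan_ranges(size: int, parts: int) -> List[Tuple[int, int]]:
    """Split a file of `size` bytes into up to `parts` inclusive byte ranges."""
    base, extra = divmod(size, parts)
    ranges, start = [], 0
    for i in range(parts):
        length = base + (1 if i < extra else 0)
        if length == 0:
            break  # more parts than bytes
        ranges.append((start, start + length - 1))
        start += length
    return ranges

def resume_range(received: int) -> str:
    """Range value asking only for the bytes not yet downloaded."""
    return f"bytes={received}-"

def probe(url: str) -> Tuple[bool, int]:
    """HEAD request: does the server support ranges, and how big is the file?"""
    req = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(req) as resp:
        supports = resp.headers.get("Accept-Ranges", "none").lower() == "bytes"
        return supports, int(resp.headers["Content-Length"])

def fetch_part(url: str, start: int, end: int) -> bytes:
    """One worker downloads its own fragment."""
    req = urllib.request.Request(url, headers={"Range": f"bytes={start}-{end}"})
    with urllib.request.urlopen(req) as resp:
        if resp.status != 206:
            raise RuntimeError(f"expected 206 Partial Content, got {resp.status}")
        return resp.read()

def download(url: str, parts: int = 4) -> bytes:
    supports, size = probe(url)
    if not supports:  # no range support: fall back to one plain GET
        with urllib.request.urlopen(url) as resp:
            return resp.read()
    ranges = plan_ranges(size, parts)
    with ThreadPoolExecutor(max_workers=parts) as pool:
        # map() yields results in input order, so the pieces join back correctly.
        pieces = pool.map(lambda r: fetch_part(url, r[0], r[1]), ranges)
        return b"".join(pieces)
```

For a 100-byte file split three ways, `plan_ranges` yields "0-33", "34-66" and "67-99"; after an interruption at byte 40, `resume_range(40)` asks for "bytes=40-".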

Multiple pieces of data

The range requests above fetch only one fragment at a time. In fact, Range also supports several "x-y" ranges in one header, to fetch multiple fragments of data in a single request.

This case requires a special MIME type, "multipart/byteranges", which indicates that the body consists of several byte-range parts, with a parameter "boundary=xxx" giving the delimiter that separates the parts.

The format of multi-range data is somewhat similar to chunked transfer, but it needs the boundary delimiter to separate the parts; you can compare the two in the figure.

Each part must start with "--boundary" (the boundary preceded by two "-"), followed by "Content-Type" and "Content-Range" fields giving the type and range of that part's data, then a blank line (CRLF) just as at the end of an ordinary response header, then the part's data. Finally, "--boundary--" (with two "-" before and after) marks the end of all parts.
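To make the layout concrete, here is a Python sketch that builds a multipart/byteranges body from inclusive ranges. The function name `build_byteranges` is hypothetical, and the default part type text/plain simply mirrors the example that follows.

```python
from typing import List, Tuple

def build_byteranges(data: bytes, ranges: List[Tuple[int, int]],
                     boundary: str, content_type: str = "text/plain") -> bytes:
    """Build a multipart/byteranges body for inclusive (start, end) ranges."""
    total = len(data)
    out = bytearray()
    for start, end in ranges:
        out += f"--{boundary}\r\n".encode("ascii")          # part delimiter
        out += f"Content-Type: {content_type}\r\n".encode("ascii")
        out += f"Content-Range: bytes {start}-{end}/{total}\r\n".encode("ascii")
        out += b"\r\n"                                      # blank line ends part headers
        out += data[start:end + 1] + b"\r\n"                # part data, CRLF before next delimiter
    out += f"--{boundary}--\r\n".encode("ascii")            # closing delimiter
    return bytes(out)
```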

For example, let's use Telnet to send a request with two ranges to the experimental environment:

GET /16-2 HTTP/1.1
Host: www.chrono.com
Range: bytes=0-9, 20-29

What we get back is this:

HTTP/1.1 206 Partial Content
Content-Type: multipart/byteranges; boundary=00000000001
Content-Length: 189
Connection: keep-alive
Accept-Ranges: bytes

--00000000001
Content-Type: text/plain
Content-Range: bytes 0-9/96

// this is
--00000000001
Content-Type: text/plain
Content-Range: bytes 20-29/96

ext json d
--00000000001--

In the message, "--00000000001" is the delimiter between parts; the client can use it to easily split out the data of each range.
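On the receiving side, the boundary is all a client needs. A simplified Python parser (hypothetical name `parse_byteranges`) that assumes the boundary never occurs inside the data, which is exactly the condition a server must ensure when choosing it:

```python
import re
from typing import List, Tuple

def parse_byteranges(body: bytes, boundary: str) -> List[Tuple[int, int, bytes]]:
    """Split a multipart/byteranges body into (start, end, data) tuples."""
    delim = b"--" + boundary.encode("ascii")
    parts = []
    for chunk in body.split(delim)[1:]:     # anything before the first delimiter is preamble
        if chunk.startswith(b"--"):
            break                           # "--boundary--" closes the body
        head, _, data = chunk.partition(b"\r\n\r\n")  # part headers / part data
        if data.endswith(b"\r\n"):
            data = data[:-2]                # this CRLF belongs to the next delimiter
        m = re.search(rb"Content-Range:\s*bytes (\d+)-(\d+)/", head, re.IGNORECASE)
        if m is None:
            raise ValueError("part without Content-Range")
        parts.append((int(m.group(1)), int(m.group(2)), data))
    return parts
```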

Summary

Today we learned about transferring large files over HTTP. A brief summary:

Compressing text files such as HTML is the basic method for transferring large files;

Chunked transfer can send and receive streaming data and saves memory and bandwidth. It is indicated by the response header field "Transfer-Encoding: chunked", and each chunk is a hexadecimal length line followed by the data block;

A range request fetches only part of the data, i.e. a "fragment request", enabling video dragging or resuming interrupted downloads. It uses the request header field "Range" and the response header field "Content-Range", and the response status code must be 206;

Multiple ranges can be requested at once; the response data type is then "multipart/byteranges", and the body contains multiple parts separated by the boundary string.

Note that these four methods are not mutually exclusive and can be combined, for example compressing first and then transferring in chunks, or splitting into ranges and then chunking. The URI "/16-3" in the experimental environment simulates one such case; you can try it yourself with Telnet.
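As an illustration of "compress first, then chunk", the Python sketch below streams a gzip-compressed body in chunked form using `zlib.compressobj` with `wbits=16 + zlib.MAX_WBITS`, which selects the gzip container. The function name `gzip_chunked` is hypothetical. Content-Encoding describes the compressed content, while Transfer-Encoding only describes how those compressed bytes are framed on the wire.

```python
import zlib
from typing import Dict, Iterable, Iterator, Tuple

def gzip_chunked(pieces: Iterable[bytes]) -> Tuple[Dict[str, str], Iterator[bytes]]:
    """Return response headers and a generator of chunked, gzip-compressed data."""
    headers = {"Content-Encoding": "gzip", "Transfer-Encoding": "chunked"}

    def body() -> Iterator[bytes]:
        comp = zlib.compressobj(wbits=16 + zlib.MAX_WBITS)  # 16+ selects gzip format
        for piece in pieces:
            out = comp.compress(piece)
            if out:  # the compressor may buffer input and emit nothing yet
                yield f"{len(out):x}\r\n".encode("ascii") + out + b"\r\n"
        tail = comp.flush()  # remaining compressed data plus the gzip trailer
        if tail:
            yield f"{len(tail):x}\r\n".encode("ascii") + tail + b"\r\n"
        yield b"0\r\n\r\n"   # last chunk

    return headers, body()
```

Because the content is produced piece by piece, the total compressed length is not known in advance, which is exactly the situation where chunked framing is needed.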

Homework

When data is transferred in chunks, if the data itself contains a carriage return and line feed (\r\n), does that affect chunk processing?

If you make a range request on a gzip-compressed file, such as "Range: bytes=10-19", does the range apply to the original file or to the compressed file?


Source: www.cnblogs.com/wxcx/p/12641384.html