A Detailed Look at HTTP-Based Communication on Android

Android itself ships with a download mechanism, such as the DownloadManager used by the browser. Unfortunately, in early versions of Android DownloadManager is available only to the browser; ordinary applications cannot call it. In addition, if downloads are triggered frequently, using DownloadManager is not very efficient. To solve these problems, I think the best approach is to implement downloading ourselves. This article introduces some simple downloading techniques based on the HTTP protocol.

1. Introduction to the HTTP protocol

HTTP is an object-oriented application-layer protocol. Because it is simple and fast, it is well suited to distributed hypermedia information systems. It was proposed in 1990 and, after years of use and development, has been continuously improved and extended. The version currently used on the WWW is the sixth revision of HTTP/1.0; standardization of HTTP/1.1 is in progress, and a proposal for HTTP-NG (the Next Generation of HTTP) has been put forward.
 
The main characteristics of the HTTP protocol can be summarized as follows:

    1. Supports the client/server model.

    2. Simple and fast: when a client requests a service from the server, it only needs to send the request method and the path. The commonly used request methods are GET, HEAD, and POST. Each method specifies a different kind of interaction between the client and the server. Because the HTTP protocol is simple, HTTP server programs are small, so communication is fast.

    3. Flexible: HTTP allows any type of data object to be transmitted. The type of the data being transmitted is marked by Content-Type.

    4. Connectionless: each connection handles only one request. After the server has processed the client's request and received the client's acknowledgement, it closes the connection. This saves transmission time.

    5. Stateless: HTTP is a stateless protocol, meaning that the protocol keeps no memory of earlier transactions. If later processing needs earlier information, that information must be retransmitted, which may increase the amount of data sent per connection. On the other hand, when the server does not need earlier information, it responds faster.


 1.1 URL

    An HTTP URL (a URL is a special type of URI that contains enough information to locate a resource) has the following format:

     http://host[":"port][abs_path]

    http indicates that the network resource is to be located via the HTTP protocol;

    host is a valid Internet host domain name or IP address;

    port specifies a port number; if it is omitted, the default port 80 is used;

    abs_path specifies the URI of the requested resource;

    Note: If the URL gives no abs_path, then when it is used as a request URI it must be given in the form "/". The browser usually does this for us automatically.

  E.g.:

    1. Enter: www.guet.edu.cn
       The browser automatically converts it to: http://www.guet.edu.cn/

    2. http://192.168.0.116:8080/index.jsp (explicit port 8080 and abs_path /index.jsp)
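
    The defaulting rules above (port 80 when no port is given, "/" when no abs_path is given) can be sketched with the JDK's java.net.URI class. This is only an illustration under stated assumptions: the class name HttpUrlParts and its fields are hypothetical and do not come from Android or HttpClient.

```java
import java.net.URI;

// Hypothetical helper (not part of any library): splits an HTTP URL into
// the host, port and request URI described in section 1.1.
final class HttpUrlParts {
    final String host;
    final int port;
    final String requestUri;

    private HttpUrlParts(String host, int port, String requestUri) {
        this.host = host;
        this.port = port;
        this.requestUri = requestUri;
    }

    static HttpUrlParts parse(String url) {
        URI uri = URI.create(url);
        // getPort() returns -1 when the URL gives no port; HTTP defaults to 80.
        int port = uri.getPort() == -1 ? 80 : uri.getPort();
        // An absent abs_path must be sent as "/" in the request line.
        String path = uri.getRawPath();
        if (path == null || path.isEmpty()) {
            path = "/";
        }
        String query = uri.getRawQuery();
        String requestUri = (query == null) ? path : path + "?" + query;
        return new HttpUrlParts(uri.getHost(), port, requestUri);
    }
}
```

    For example, HttpUrlParts.parse("http://www.guet.edu.cn") yields port 80 and request URI "/", while the second URL above keeps its explicit port 8080 and path /index.jsp.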


 1.2 Request

    An HTTP request consists of three parts: the request line, the message headers, and the request body.

    1.2.1 Request line

    The request line begins with a method token, followed by the Request-URI and the protocol version, each separated by a space, in the following format:
    Method Request-URI HTTP-Version CRLF

    where:

    Method is the request method;
    Request-URI is a uniform resource identifier;
    HTTP-Version is the HTTP protocol version of the request;
    CRLF is a carriage return followed by a line feed (apart from the terminating CRLF, a standalone CR or LF character is not allowed).

    E.g.:

    POST /hello.htm HTTP/1.1 ("\r\n")
    
    1) Request method:

    The request method (always written in uppercase) can be one of several methods, interpreted as follows:

    GET: request the resource identified by the Request-URI

    POST: append new data to the resource identified by the Request-URI

    HEAD: request the response message headers of the resource identified by the Request-URI

    PUT: ask the server to store a resource and use the Request-URI as its identifier

    DELETE: ask the server to delete the resource identified by the Request-URI

    TRACE: ask the server to echo back the request it received, mainly used for testing or diagnosis

    CONNECT: reserved for use with a proxy that can switch to being a tunnel

    OPTIONS: query the capabilities of the server, or the options and requirements associated with a resource

    2) Request-URI:

    The identifier of the network resource to be accessed. Usually only the path relative to the server's root directory is given, so it begins with "/".

    3) Protocol version.

    1.2.2 Message headers

    HTTP messages consist of requests from the client to the server and responses from the server to the client. Both request and response messages are made up of a start line (the request line for a request message, the status line for a response message), message headers (optional), a blank line (a line containing only CRLF), and a message body (optional).

    1) General headers:

    A few header fields apply to all request and response messages. They describe the message being transmitted, not the entity it carries.

    Cache-Control: specifies caching directives. Caching directives are unidirectional (a directive that appears in a response does not necessarily appear in the request) and independent (a directive in one message does not affect the caching of another message). The similar header field used by HTTP/1.0 is Pragma.

    Request caching directives include: no-cache (indicates that the request or response message must not be cached), no-store, max-age, max-stale, min-fresh, only-if-cached;
    Response caching directives include: public, private, no-cache, no-store, no-transform, must-revalidate, proxy-revalidate, max-age, s-maxage.

    Date: a general header field that gives the date and time at which the message was generated.

    Connection: a general header field that specifies options for the connection. For example, it can request a persistent connection, or specify the "close" option to tell the server to close the connection once the response is complete.

    2) Request headers:

    Request headers allow the client to pass additional information about the request, and about the client itself, to the server. Common request headers are as follows:

    Accept:

    The Accept request header field specifies which types of content the client accepts. E.g.: Accept: image/gif indicates that the client wishes to receive resources in GIF image format; Accept: text/html indicates that the client wishes to receive HTML text.

    Accept-Charset:

    The Accept-Charset request header field specifies the character sets the client accepts. E.g.: Accept-Charset: iso-8859-1, gb2312. If this field is absent from the request message, any character set is assumed to be acceptable.

    Accept-Encoding:

    The Accept-Encoding request header field is similar to Accept, but it specifies the acceptable content encodings. E.g.: Accept-Encoding: gzip, deflate. If this field is absent from the request message, the server assumes that the client accepts all content encodings.

    Accept-Language:

    The Accept-Language request header field is similar to Accept, but it specifies a natural language. E.g.: Accept-Language: zh-cn. If this field is absent from the request message, the server assumes that the client accepts all languages.

    Authorization:

    The Authorization request header field is mainly used to prove that the client has permission to view a resource. When a browser accesses a page and receives a 401 (Unauthorized) response code from the server, it can send a request containing the Authorization request header field to ask the server to verify it.

    Host (required when sending a request):

    The Host request header field specifies the Internet host and port number of the requested resource. It is usually extracted from the HTTP URL.

    E.g.: we enter http://www.guet.edu.cn/index.html in the browser. The request message sent by the browser contains the Host request header field, as follows:

    Host: www.guet.edu.cn

    Here the default port number 80 is used. If a port number is specified, it becomes: Host: www.guet.edu.cn:port

    User-Agent:

    When we log in to Internet forums, we often see a welcome message that lists the name and version of our operating system and browser. Many people find this quite magical, but the server application simply reads this information from the User-Agent request header field. The User-Agent request header field allows the client to tell the server about its operating system, browser, and other attributes. However, this header field is not required: if we write our own browser and do not send the User-Agent request header field, the server cannot learn this information about us.

    Example request:
    GET /form.html HTTP/1.1 (CRLF)
    Accept:image/gif,image/x-xbitmap,image/jpeg,application/x-shockwave-flash,application/vnd.ms-excel,application/vnd.ms-powerpoint,application/msword,*/* (CRLF)
    Accept-Language:zh-cn (CRLF)
    Accept-Encoding:gzip,deflate (CRLF)
    If-Modified-Since:Wed,05 Jan 2007 11:21:25 GMT (CRLF)
    If-None-Match:W/"80b1a4c018f3c41:8317" (CRLF)
    User-Agent:Mozilla/4.0(compatible;MSIE6.0;Windows NT 5.0) (CRLF)
    Host:www.guet.edu.cn (CRLF)
    Connection:Keep-Alive (CRLF)
    (CRLF)

    3) Response headers:

    Response headers allow the server to pass additional information that cannot be placed in the status line, together with information about the server and about further access to the resource identified by the Request-URI.

    Common response headers:

    Location:

    The Location response header field redirects the recipient to a new location. It is commonly used when a domain name has changed.

    Server:

    The Server response header field contains information about the software the server uses to handle the request. It corresponds to the User-Agent request header field. Here is an example of the Server response header field:

    Server: Apache-Coyote/1.1

    WWW-Authenticate:

    The WWW-Authenticate response header field must be included in a 401 (Unauthorized) response message. When the client receives a 401 response and then sends a request with the Authorization header field asking the server to verify it, the server's response contains this header field. E.g.: WWW-Authenticate: Basic realm="Basic Auth Test!" // shows that the server uses the basic authentication mechanism for the requested resource.

    4) Entity headers:

    Both request and response messages can carry an entity. An entity consists of entity header fields and an entity body, but the two do not have to be sent together: it is possible to send only the entity header fields. Entity headers define meta-information about the entity body (e.g. whether an entity body is present) and about the resource identified by the request.

    Common entity headers:

    Content-Encoding:

    The Content-Encoding entity header field is used as a modifier of the media type. Its value indicates which additional content encodings have been applied to the entity body, and therefore which decoding mechanisms must be used to obtain the media type referenced by the Content-Type header field. Content-Encoding is typically used to record the compression method of a document. E.g.: Content-Encoding: gzip

    Content-Language:

    The Content-Language entity header field describes the natural language used by the resource. If this field is not set, the entity content is assumed to be intended for readers of all languages. E.g.: Content-Language: da

    Content-Length:

    The Content-Length entity header field gives the length of the entity body as a decimal number of bytes.

    Content-Type:

    The Content-Type entity header field tells the recipient the media type of the entity body. E.g.: Content-Type: text/html; charset=ISO-8859-1, Content-Type: text/html; charset=GB2312

    Last-Modified:

    The Last-Modified entity header field gives the date and time at which the resource was last modified.

    Expires:

    The Expires entity header field gives the date and time at which the response expires. So that a proxy server or browser refreshes its cached copy of a page after a certain time (a previously visited page is loaded directly from the cache when visited again, which shortens response time and reduces server load), we can use the Expires entity header field to specify when the page expires. E.g.: Expires: Thu, 15 Sep 2006 16:23:12 GMT

    HTTP/1.1 clients and caches must treat invalid date formats (including 0) as already expired. E.g.: to stop the browser from caching a page, we can also set the Expires entity header field to 0. In a JSP program: response.setDateHeader("Expires", 0);

 1.3 Response

    After receiving and interpreting a request message, the server returns an HTTP response message. An HTTP response consists of three parts: the status line, the message headers, and the response body.

    Here we mainly discuss the status line. Its format is as follows:
    HTTP-Version Status-Code Reason-Phrase CRLF

    where:

    HTTP-Version is the HTTP protocol version of the server;
    Status-Code is the response status code sent back by the server;
    Reason-Phrase is a text description of the status code.

    The status code consists of three digits. The first digit defines the category of the response and has five possible values:

    1xx: Informational - the request has been received and processing continues
    2xx: Success - the request has been successfully received, understood, and accepted
    3xx: Redirection - further action must be taken to complete the request
    4xx: Client Error - the request contains a syntax error or cannot be fulfilled
    5xx: Server Error - the server failed to fulfill a valid request

    Common status codes with their reason phrases and descriptions:

    200 OK // the client request succeeded
    400 Bad Request // the client request has a syntax error and cannot be understood by the server
    401 Unauthorized // the request is unauthorized; this status code must be used together with the WWW-Authenticate header field
    403 Forbidden // the server received the request but refuses to serve it
    404 Not Found // the requested resource does not exist, e.g. a wrong URL was entered
    500 Internal Server Error // the server encountered an unexpected error
    503 Service Unavailable // the server currently cannot handle the client's request, but may recover after a period of time

    E.g.: HTTP/1.1 200 OK (CRLF)
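
    As a rough illustration of the status line format, the following sketch splits a status line into its three fields and maps the first digit of the status code to its category. The class name StatusLine is hypothetical, and the parsing is deliberately minimal (it assumes a well-formed line).

```java
// Hypothetical helper: parses "HTTP-Version Status-Code Reason-Phrase".
final class StatusLine {
    final String httpVersion;
    final int statusCode;
    final String reasonPhrase;

    private StatusLine(String httpVersion, int statusCode, String reasonPhrase) {
        this.httpVersion = httpVersion;
        this.statusCode = statusCode;
        this.reasonPhrase = reasonPhrase;
    }

    static StatusLine parse(String line) {
        // trim() drops the trailing CRLF; limit 3 keeps spaces inside the
        // reason phrase (e.g. "Not Found") together.
        String[] parts = line.trim().split(" ", 3);
        if (parts.length < 2 || !parts[0].startsWith("HTTP/")) {
            throw new IllegalArgumentException("Not a status line: " + line);
        }
        int code = Integer.parseInt(parts[1]);
        String reason = parts.length == 3 ? parts[2] : "";
        return new StatusLine(parts[0], code, reason);
    }

    // The first digit of the status code defines the response category.
    String category() {
        switch (statusCode / 100) {
            case 1: return "Informational";
            case 2: return "Success";
            case 3: return "Redirection";
            case 4: return "Client Error";
            case 5: return "Server Error";
            default: return "Unknown";
        }
    }
}
```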

2. Downloading over the HTTP protocol

    Having learned the basic rules of the HTTP protocol, we can apply them to downloading files. This section describes how downloading over HTTP works.

    2.1 File request

    The request sent to the server looks like this:

    GET /Path/FileName HTTP/1.0
    Host: www.server.com:80
    Accept: */*
    User-Agent: GeneralDownloadApplication
    Connection: close

    Each line is terminated by a carriage return and line feed (CRLF), and one extra CRLF is appended at the end to mark the end of the whole request.

    The Host field gives the host name and port number; the port can be omitted if it is the default port 80.
    */* in the Accept field means that any type of data is accepted.
    User-Agent identifies the user agent. This field is optional, but adding it is strongly recommended, because servers use it for statistics, tracking, and identifying clients.
    close in the Connection field requests a non-persistent connection.
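
    The request above can be assembled as a plain string before being written to a socket. The sketch below is one possible way to do it; the class name DownloadRequest is hypothetical, and it omits the port from the Host field when the port is 80, as the text allows.

```java
// Hypothetical helper: builds the GET request text shown in section 2.1.
final class DownloadRequest {
    private static final String CRLF = "\r\n";

    static String build(String host, int port, String path) {
        StringBuilder sb = new StringBuilder();
        sb.append("GET ").append(path).append(" HTTP/1.0").append(CRLF);
        sb.append("Host: ").append(host);
        if (port != 80) {
            // The default port 80 may be left out of the Host field.
            sb.append(':').append(port);
        }
        sb.append(CRLF);
        sb.append("Accept: */*").append(CRLF);
        sb.append("User-Agent: GeneralDownloadApplication").append(CRLF);
        sb.append("Connection: close").append(CRLF);
        // The extra CRLF marks the end of the whole request.
        sb.append(CRLF);
        return sb.toString();
    }
}
```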

    2.2 Server response

    If the server receives the request successfully and no error occurs, it returns data similar to the following:

    HTTP/1.0 200 OK
    Content-Length: 13057672
    Content-Type: application/octet-stream
    Last-Modified: Wed, 10 Oct 2005 00:56:34 GMT
    Accept-Ranges: bytes
    ETag: "2f38a6cac7cec51:160c"
    Server: Microsoft-IIS/6.0
    X-Powered-By: ASP.NET
    Date: Wed, 16 Nov 2005 01:57:54 GMT
    Connection: close

    Content-Length is the more important field here. It gives the length of the data returned by the server, not counting the HTTP headers. In other words, since our request carries no Range field (discussed later), we are requesting the entire file, so Content-Length is the size of the entire file. The remaining fields describe attributes of the file and of the server.

    As with the request, the returned headers end with the CRLF of the last line plus one extra CRLF, that is, "\r\n\r\n". The contents of the file follow immediately after "\r\n\r\n", so we can locate "\r\n\r\n" and, starting from the first byte after it, keep reading the data and writing it to a file.
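
    Locating the "\r\n\r\n" terminator in the bytes read so far can be sketched as follows. The class name HeaderSplitter is hypothetical; a real downloader would call something like this on its receive buffer and start writing the file from the returned offset.

```java
// Hypothetical helper: finds where the response body starts.
final class HeaderSplitter {
    /**
     * Returns the index of the first body byte, i.e. the byte right after
     * "\r\n\r\n", or -1 if the terminator is not in the first `length` bytes.
     */
    static int bodyStart(byte[] data, int length) {
        for (int i = 0; i + 3 < length; i++) {
            if (data[i] == '\r' && data[i + 1] == '\n'
                    && data[i + 2] == '\r' && data[i + 3] == '\n') {
                return i + 4;
            }
        }
        return -1;
    }
}
```

    A result of -1 simply means the headers have not been fully received yet, so more data has to be read before searching again.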

    2.3 Resumable download

    Resuming an interrupted download is very simple to implement: just add a Range field to the request. If a file has 1000 bytes, its byte range is 0-999, and then:

    Range: bytes=500-      reads bytes 500-999 of the file, 500 bytes in total.
    Range: bytes=500-599   reads bytes 500-599 of the file, 100 bytes in total.

    Range can be written in several other ways, but these two are the most commonly used and are sufficient for resuming downloads. If the HTTP request contains a Range field, the server returns 206 (Partial Content), and the HTTP headers contain a corresponding Content-Range field in a format similar to the following:

    Content-Range: bytes 500-999/1000

    The Content-Range field gives the range returned by the server and the total length of the file. Note that in this case the Content-Length field is no longer the size of the entire file, but the number of bytes in the returned range.
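
    The arithmetic above can be checked with a small sketch that builds the Range value for resuming and parses a Content-Range value of the form shown. The class name ContentRange is hypothetical, and only the "bytes first-last/total" form from the text is handled.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for the Range / Content-Range fields of section 2.3.
final class ContentRange {
    private static final Pattern FORMAT =
            Pattern.compile("bytes\\s+(\\d+)-(\\d+)/(\\d+)");

    final long first;
    final long last;
    final long total;

    private ContentRange(long first, long last, long total) {
        this.first = first;
        this.last = last;
        this.total = total;
    }

    // Parses a value such as "bytes 500-999/1000".
    static ContentRange parse(String value) {
        Matcher m = FORMAT.matcher(value.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("Unsupported Content-Range: " + value);
        }
        return new ContentRange(Long.parseLong(m.group(1)),
                Long.parseLong(m.group(2)), Long.parseLong(m.group(3)));
    }

    // Number of bytes in the returned range; should equal Content-Length.
    long length() {
        return last - first + 1;
    }

    // Range value that resumes after `bytesAlreadySaved` bytes on disk.
    static String resumeHeader(long bytesAlreadySaved) {
        return "bytes=" + bytesAlreadySaved + "-";
    }
}
```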

    2.4 Redirection

    On many software download sites, the download links are redirected by a program. For example, pchome's HTTP download link for ACDSee is:

    http://download.pchome.net/php/tdownload2.php?sid=5547&url=/multimedia/viewer/acdc31sr1b051007.exe&svr=1&typ=0

    This address does not point directly at the file; instead, a program redirects the request. If we send a request for such a URL, the server returns 302 (Moved Temporarily), meaning that a redirect is required, and the HTTP headers contain a Location field whose value is the target URL of the redirect. We then need to close the current connection and send the request to the redirect target.


3. HttpClient

    Although the java.net package of the JDK already provides basic functionality for accessing the HTTP protocol, for most applications the JDK library is not rich or flexible enough. HttpClient is a subproject of Apache Jakarta Commons that provides an efficient, up-to-date, feature-rich client-side toolkit for programming against the HTTP protocol, and it supports the latest versions and recommendations of the HTTP protocol. HttpClient is used in many projects; for example, the well-known open source projects Cactus and HTMLUnit both use HttpClient. The HttpClient project is very active and has many users. At the time of writing, the current HttpClient version is 3.0 RC4, released on 2005.10.11.

    The main features of HttpClient are as follows:

    1) implements all the HTTP methods (GET, POST, PUT, HEAD, etc.);
    2) supports automatic redirection;
    3) supports the HTTPS protocol;
    4) supports proxy servers.

    3.1 Environment setup and required packages

    A Java development environment (JDK) and network access are required. An Android program needs the "android.permission.INTERNET" permission.

    Required packages:

    1. commons-httpclient-3.1.jar: contains the classes that implement the HTTP protocol.
    2. commons-logging-1.1.jar: contains the classes for logging program activity at runtime.
    3. commons-codec-1.3.jar: contains the encoding and decoding classes.

    These packages are all Apache open source projects and can be found on the official Apache website, http://www.apache.org/.

    3.2 Basic HTTP operations with HttpClient

    Before any other operation, we must first create an HttpClient instance, i.e. initialize a client.

HttpClient client = new HttpClient();

    3.2.1 Request

    We take a GET request as an example.

    a. Instantiate a request method.

HttpMethod method = new GetMethod("http://www.google.cn");

    Note:

    ① Although Google has moved its servers out of mainland China, HttpClient follows redirects automatically: when the server returns a 3xx status code, the request is redirected until it reaches the actual location of the file.

    ② The string passed to the GetMethod constructor is the URI of the file. A full URL is needed here only because no server host has been specified beforehand. It could also be written like this:

client.getHostConfiguration().setHost("www.imobile.com.cn", 80, "http");

    ……

HttpMethod method = new GetMethod("/simcard.php?simcard=1330227");

    b. Add any message header information you need.

method.addRequestHeader("Range", "bytes=500-");

    HttpClient builds the message headers itself, so if there are no special requirements nothing needs to be changed. If you do need to add special information to the headers, for example for a resumable download, you can use the method shown above.

    c. Send the request (execute the method).

int statusCode = client.executeMethod(method);

    At this point the program actually sends the request to the server. Once the connection succeeds, the function returns, and its return value is the status code.
    
    3.2.2 Response

    This applies once the connection has succeeded.

    a. Status code.

    "statusCode" in the example above is the status code. Besides that, you can also obtain it like this:

int statusCode = method.getStatusCode();

    Note: the httpclient package contains a class named "HttpStatus", which defines most of the status codes, such as:

    HttpStatus.SC_OK
    HttpStatus.SC_FORBIDDEN, etc.

    b. Response headers.

Header[] headers = method.getResponseHeaders();

    Gets all the response headers returned by the server.

Header header = method.getResponseHeader("Content-Type");

    Gets the response header with the specified name.

    The details can then be obtained by calling header.getName() and header.getValue().

    c. Response body.

byte[] bytes = method.getResponseBody();

InputStream inputStream = method.getResponseBodyAsStream();

String string = method.getResponseBodyAsString();

    Choose whichever of these three methods suits your needs. For large downloads, getResponseBodyAsStream() avoids holding the whole body in memory.

    3.2.3 Disconnect

method.releaseConnection();

    Releases the connection once the response has been processed.

    3.2.4 Other

    Some other things that are unrelated to downloading, but basic and useful.

    a. POST data.

    A POST request works much like a GET request. The only difference is that the data to be sent must be attached to the POST request, in one of the following ways:

postMethod.setRequestBody(InputStream body);

postMethod.setRequestBody(NameValuePair[] parameterBody);

postMethod.setRequestBody(String body);

    b. Proxy server.

    Simply specify the proxy on the HttpClient instance; all operations performed through that instance will then go through the proxy.

client.getHostConfiguration().setProxy(hostName, port);

    c. Character encoding.

    The character encoding of the target page may appear in two places:

    The first is the HTTP header returned by the server (the charset parameter of the Content-Type field in the response headers);

    The other is the HTML/XML page itself, such as:

    <meta http-equiv="Content-Type" content="text/html; charset=gb2312"/>
    or <?xml version="1.0" encoding="gb2312"?>
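
    Reading the encoding from the first of these two places can be sketched as follows. The class name CharsetSniffer is hypothetical, and the fallback value is whatever default the caller chooses; the sketch only looks at a Content-Type value such as the one returned by header.getValue().

```java
import java.util.Locale;

// Hypothetical helper: extracts the charset parameter from a Content-Type
// value, e.g. "text/html; charset=gb2312".
final class CharsetSniffer {
    static String charsetOf(String contentType, String fallback) {
        if (contentType == null) {
            return fallback;
        }
        // Parameters follow the media type, separated by ';'.
        for (String param : contentType.split(";")) {
            String p = param.trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String value = p.substring("charset=".length()).trim();
                // The value may optionally be wrapped in double quotes.
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                return value.isEmpty() ? fallback : value;
            }
        }
        return fallback;
    }
}
```

    If the header carries no charset, the page content itself (the meta tag or XML declaration shown above) still has to be inspected.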

    d. Automatic redirection.

    HttpClient can follow redirects automatically for GET requests. For POST and PUT requests, which require confirmation from the user, automatic redirection is not supported.

    When the server returns a 3xx status code for such a request, the redirect has to be carried out by reading the target address from the "Location" field of the message headers. Note that the address in the "Location" field may be a relative address, which you need to resolve yourself.
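
    One way to handle a relative "Location" value is to resolve it against the URL of the original request with the JDK's java.net.URI class. This is a minimal sketch rather than HttpClient's own behaviour; the class name RedirectResolver and the example.com addresses are made up for illustration.

```java
import java.net.URI;

// Hypothetical helper: turns a Location value into an absolute URL.
final class RedirectResolver {
    static String resolve(String requestUrl, String location) {
        // URI.resolve() returns `location` unchanged when it is already
        // absolute, and otherwise resolves it against the request URL.
        return URI.create(requestUrl).resolve(location).toString();
    }
}
```

    For instance, with a request URL of http://download.example.com/php/get.php?id=1, a Location of /files/a.exe resolves to http://download.example.com/files/a.exe.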

    Another possibility is that the redirect is implemented in the page itself. For example, in HTML: <meta http-equiv="Refresh" content="5; url=http://www.ibm.com/us">.

    e. HTTPS protocol.

    See: "HttpClient Getting Started."

 


Origin www.cnblogs.com/zx-blog/p/11836524.html