HTTP Definitive Guide - Study Notes

HTTP (Hypertext Transfer Protocol)

HTTP: the foundation of the Web

  • Web clients and servers: Web content is stored on Web servers, which speak the HTTP protocol to clients.
  • Resources: a Web server hosts Web resources, and Web resources are the source of Web content.
  • HTTP tags each object it transports with a data format label called a MIME (Multipurpose Internet Mail Extensions) type.
  • A uniform resource identifier (Uniform Resource Identifier, URI) uniquely identifies and locates an information resource.
    • URIs come in two forms: URLs (Uniform Resource Locators, by far the most widely used) and URNs (Uniform Resource Names).
  • An HTTP transaction consists of a request command and a response result. Both are carried in formatted blocks of data called HTTP messages.
    • Each HTTP request contains a method, which tells the server what action to perform: get a Web page, run a gateway program, delete a file, and so on.
    • GET: send the named resource from the server to the client.
    • PUT: store data from the client in a named server resource.
    • DELETE: delete the named resource from the server.
    • POST: send client data to a server gateway application.
    • HEAD: send only the HTTP headers of the response for the named resource.
  • Each HTTP response carries a status code, a three-digit numeric code, in the message it returns.
  • HTTP messages are simple, line-oriented sequences of characters.
    • HTTP messages are plain text, not binary, so they are easy to read and write.
    • Each message has a start line, headers, and a body. Messages sent from client to server are request messages; those sent from server to client are response messages.
  • HTTP is an application-layer protocol. It does not worry about the details of network communication; it leaves them to TCP/IP, the common, reliable Internet transport protocol.
    • TCP provides error-free data transport, in-order delivery (data always arrives in the order it was sent), and an unsegmented data stream (data of any size can be sent at any time).
  • You can talk directly to a Web server with the Telnet program.
  • Architectural components of the Web:
    • Proxy: an HTTP intermediary that sits between clients and servers.
    • Cache: an HTTP storehouse that keeps copies of popular pages close to clients.
      • A Web cache, or caching proxy, is a special HTTP proxy server that keeps copies of the popular documents passing through it.
    • Gateway: a special Web server that connects to other applications.
      • A gateway acts as an intermediary for other servers and is typically used to convert HTTP traffic to another protocol.
    • Tunnel: a special proxy that blindly forwards HTTP communications.
      • Once set up, a tunnel is an HTTP application that blindly relays raw data between two connections.
      • HTTP tunnels are typically used to carry non-HTTP data over one or more HTTP connections without looking at the data.
      • A common use is carrying encrypted Secure Sockets Layer (SSL) traffic through an HTTP connection.
    • User agent: a semi-intelligent Web client that makes automated HTTP requests.
      • Web browsers are the most familiar example.
    • HTTP proxy servers are important building blocks for Web security, application integration, and performance optimization. (A proxy is often a trusted intermediary through which all Web traffic flows.)
      • A proxy can filter requests and responses.
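
As a rough illustration of the plain-text message structure noted above, the sketch below builds the kind of request you could type into a Telnet session by hand. It is a minimal example, not part of the original notes: the function name build_request and the host www.example.com are placeholders chosen for illustration.

```python
def build_request(method: str, path: str, host: str) -> bytes:
    """Return a minimal HTTP/1.1 request message as bytes."""
    lines = [
        f"{method} {path} HTTP/1.1",  # start line: method, request URL, version
        f"Host: {host}",              # header naming the site being asked for
        "Connection: close",          # header asking the server to close afterwards
        "",                           # blank line that ends the header block
        "",                           # empty body
    ]
    # HTTP lines are terminated by CRLF.
    return "\r\n".join(lines).encode("ascii")


request = build_request("GET", "/index.html", "www.example.com")
print(request.decode("ascii"))
```

Nothing is sent over the network here; the bytes are only assembled and printed, so the three parts of the message are easy to see.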

URL and Resources

  • URLs are the standardized names for Internet resources.
  • Topics covered: URL syntax, URL character encoding rules, common URL schemes, and the future of URLs (URNs).
  • A URI is a general resource identifier; URLs are a subset of URIs.
  • The first part of a URL is the scheme, the second part is the server location, and the third part is the resource path.
  • Resources named by URLs can be reached through HTTP, FTP, SMTP, and other protocols.
  • General URL syntax: <scheme>://<user>:<password>@<host>:<port>/<path>;<params>?<query>#<frag>
  • Scheme names are case-insensitive. URLs support a fragment (frag) component that identifies a piece within a resource.
  • The default port for http is 80 and for https is 443. The rtsp scheme (Real Time Streaming Protocol) identifies audio and video media resources.
  • Persistent uniform resource locators (PURLs) add a level of indirection to resource lookup: an intermediary resource locator server registers and tracks the actual URL of the resource.
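
The URL components listed above can be pulled apart with Python's standard urllib.parse module. This is a small sketch for illustration only: the example URL, the DEFAULT_PORTS table, and the effective_port helper are assumptions made here, not something taken from the notes.

```python
from urllib.parse import urlparse

# Hypothetical URL that exercises every component of the general syntax.
url = "http://joe:secret@www.example.com:8080/tools/hammers;sale=false?item=12731#prices"
parts = urlparse(url)

print(parts.scheme)    # scheme
print(parts.username)  # user
print(parts.password)  # password
print(parts.hostname)  # host
print(parts.port)      # port
print(parts.path)      # path
print(parts.params)    # params (after ';' in the last path segment)
print(parts.query)     # query
print(parts.fragment)  # frag

# Default ports mentioned in the notes; used when the URL names no port.
DEFAULT_PORTS = {"http": 80, "https": 443}


def effective_port(url: str):
    """Return the explicit port, or the scheme's default if none is given."""
    parsed = urlparse(url)
    return parsed.port or DEFAULT_PORTS.get(parsed.scheme)
```

Because urlparse lowercases the scheme, effective_port("HTTPS://www.example.com/") resolves to the https default, which matches the rule that scheme names are case-insensitive.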

HTTP messages

  • HTTP messages are the parcels, the blocks of data sent between HTTP applications. Each message has three parts: a start line, headers, and an entity body.
  • The terms "inbound" and "outbound", and "upstream" and "downstream", describe message direction.
    • All messages flow downstream: the sender of any message is upstream of the receiver.
  • The three parts of a message: start line, headers, body.
    • The start line and headers are line-oriented ASCII text. The message body is an optional block of data that can contain text or binary data, or be empty. Content-Type says what the body is; Content-Length says how big it is.
  • All HTTP messages fall into two types, request messages and response messages:
    • Request message format:
     <method> <request-URL> <version>
     <headers>
    
     <entity-body>
    • Response message format:
     <version> <status> <reason-phrase>
     <headers>
    
     <entity-body>
  • Common HTTP methods:
    • GET: get a document from the server.
    • HEAD: get only the headers for a document from the server.
    • POST: send data to the server for processing.
    • PUT: store the body of the request on the server.
    • TRACE: trace the message through proxy servers to the server.
    • OPTIONS: determine which methods can operate on a server.
    • DELETE: delete a document from the server.
  • Status code classes:
    • 100 to 199: informational.
    • 200 to 299: success (200 OK).
    • 300 to 399: redirection.
    • 400 to 499: client error (401 Unauthorized, 404 Not Found).
    • 500 to 599: server error.
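
To make the response format and the status code classes concrete, here is a minimal parsing sketch. It is an illustration under simplifying assumptions (one header per line, no folded headers, no chunked bodies), and the names parse_response and status_class as well as the sample message are made up for this example.

```python
def status_class(code: int) -> str:
    """Map a three-digit status code to its class by the first digit."""
    classes = {
        1: "informational",
        2: "success",
        3: "redirection",
        4: "client error",
        5: "server error",
    }
    return classes.get(code // 100, "unknown")


def parse_response(raw: bytes):
    """Split a raw response into version, status, reason, headers, and body."""
    # A blank line (CRLF CRLF) separates the headers from the body.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("ascii").split("\r\n")
    # Start line: <version> <status> <reason-phrase>
    version, status, reason = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return version, int(status), reason, headers, body


raw = (
    b"HTTP/1.0 404 Not Found\r\n"
    b"Content-Type: text/plain\r\n"
    b"Content-Length: 9\r\n"
    b"\r\n"
    b"Not here\n"
)
version, status, reason, headers, body = parse_response(raw)
print(version, status, reason, status_class(status))
print(headers["content-type"], headers["content-length"], len(body))
```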

Connection Management

  • HTTP connection optimizations: parallel connections, keep-alive (persistent) connections, and pipelined connections.
  • TCP gives HTTP a reliable bit pipe: bytes stuffed into one end of a TCP connection come out the other end correctly and in the right order.
  • TCP streams are segmented and shipped in IP packets: TCP sends its data in small chunks called IP packets (or IP datagrams).
  • HTTP transaction performance depends heavily on the performance of the underlying TCP channel; in many cases most of an HTTP delay is TCP network delay.
  • The most common TCP-related delays:
    • The TCP connection setup handshake;
      • The SYN and SYN+ACK exchange adds a measurable delay. The final ACK of the handshake is usually large enough to carry the entire HTTP request, and many HTTP server responses fit into a single IP packet.
      • Small HTTP transactions suffer most: they can spend 50% or more of their time on TCP connection setup.
    • TCP delayed acknowledgments for piggybacking;
      • Each TCP segment carries a sequence number and a data-integrity checksum. The receiver returns a small acknowledgment packet for every segment that arrives intact; if the sender does not receive the acknowledgment within a specified time, it retransmits the data.
      • TCP lets an acknowledgment piggyback on an outgoing data packet headed in the same direction, which uses the network more efficiently.
      • To raise the odds that an acknowledgment finds a data packet to ride on, many TCP stacks implement a "delayed acknowledgment" algorithm.
      • The algorithm holds outgoing acknowledgments in a buffer for a certain window of time (usually 100 to 200 ms) while it looks for an outgoing data packet to piggyback on. If none appears in that time, the acknowledgment is sent in its own packet.
      • The bimodal request-reply behavior of HTTP reduces the chances of piggybacking, so the delayed acknowledgment algorithm can introduce significant delays.
      • Before modifying any TCP stack parameters, be sure you understand exactly what you are doing.
    • TCP slow-start congestion control;
      • TCP performance also depends on the age of the connection. TCP connections tune themselves over time: they first limit the maximum speed of the connection and raise it as data is transmitted successfully. This tuning is called TCP slow start, and it prevents sudden overloading and congestion of the Internet.
      • Slow start limits the number of packets a TCP endpoint can have in transit at any one time. Because tuned connections are faster, HTTP includes facilities for reusing existing connections (persistent connections).
    • Nagle's algorithm for data aggregation;
      • TCP has a data stream interface that lets applications hand data of any size to the TCP stack. But each TCP segment carries at least 40 bytes of flags and headers, so network performance degrades severely if TCP sends large numbers of packets containing small amounts of data.
      • Nagle's algorithm tries to bundle a large amount of TCP data together before sending a packet, which improves network efficiency.
      • Nagle's algorithm discourages sending segments that are not full-size (a maximum-size packet is around 1,500 bytes on a LAN, or a few hundred bytes across the Internet).
      • It allows a non-full-size packet to be sent only if all other packets have been acknowledged.
      • Nagle's algorithm can delay small HTTP messages that are waiting for additional data that never arrives. It also interacts badly with delayed acknowledgments: it holds data until an acknowledgment arrives, but the acknowledgment itself is delayed 100 to 200 ms by the delayed acknowledgment algorithm.
      • HTTP applications often disable Nagle's algorithm to improve performance by setting the TCP_NODELAY parameter on their stacks.
    • TIME_WAIT delays and port exhaustion.
      • TIME_WAIT port exhaustion is a serious performance problem that affects performance benchmarking.
      • When a TCP endpoint closes a connection, it keeps a small control block in memory recording the IP addresses and port numbers of the recently closed connection. This information is kept for a short time, typically around twice the estimated maximum segment lifetime (2MSL).
  • HTTP allows a chain of HTTP intermediaries (proxies, caches, and so on) between the client and the ultimate origin server.
  • The HTTP Connection header field holds a comma-separated list of connection tokens that specify options for the connection; these options are not propagated to other connections.
  • Techniques for improving HTTP connection performance:
    • Parallel connections: concurrent HTTP requests across multiple TCP connections.
    • Persistent connections: reusing TCP connections to eliminate connect and close delays.
    • Pipelined connections: concurrent HTTP requests across a shared TCP connection.
    • Multiplexed connections: interleaving chunks of requests and responses.
  • Each HTTP response should have an accurate Content-Length header that describes the size of the response body.
  • Clients must not automatically retry non-idempotent methods or sequences, although a user agent may let a human operator choose whether to retry the request.
  • Full and half closes:
    • close() closes both the input and output channels of a TCP connection; this is called a full close.
    • shutdown() closes either the input or the output channel individually; this is called a half close.
      • Using half closes to keep peers from receiving unexpected write errors is critical. An application that wants to close gracefully should half close its output channel first, then wait for the peer on the other side to close its own output channel.
      • After half closing its output channel, the application periodically checks the status of its input channel (looking for data or the end of the stream). If the peer does not close the input channel within some timeout period, the application may force the connection closed to save resources.
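
The TCP_NODELAY setting and the half-close sequence described above can be tried locally with Python's socket module. This is a minimal sketch, not a production pattern: a local socket pair stands in for a real client and server, read_until_eof is a hypothetical helper name, and the 5-second timeout is an arbitrary value picked for the example.

```python
import socket

# Disable Nagle's algorithm on a TCP socket.
# The option can be set before the socket is connected.
tcp = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
tcp.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
nodelay_enabled = tcp.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0
tcp.close()


def read_until_eof(sock: socket.socket) -> bytes:
    """Read until the peer closes its output channel (recv returns b'')."""
    chunks = []
    while True:
        chunk = sock.recv(1024)
        if not chunk:
            break
        chunks.append(chunk)
    return b"".join(chunks)


# A connected pair of local sockets standing in for client and server.
client, server = socket.socketpair()
client.settimeout(5.0)  # give up if the peer never closes its side
server.settimeout(5.0)

client.sendall(b"last request")
client.shutdown(socket.SHUT_WR)       # half close: the client stops sending

received = read_until_eof(server)     # server sees the data, then end of stream
server.sendall(b"final response")     # server's output channel is still open
server.close()                        # full close on the server side

reply = read_until_eof(client)        # client's input channel still works
client.close()                        # now fully close the client

print(nodelay_enabled, received, reply)
```

The client closes only its output channel first, so the server never writes to a connection whose reader has already gone away, which is the failure the graceful-close advice is meant to avoid.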

HTTP architecture

Covers HTTP servers, proxies, caches, gateways, and applications.

Web server

  • A Web server processes HTTP requests and serves responses. Every Web server receives HTTP requests for resources and sends the content back to the client.
  • A Web server implements HTTP and the related TCP connection handling. It also manages the resources it serves and provides administrative features to configure, control, and extend the server.
  • Multithreaded Web servers cap the maximum number of threads or processes.
  • Multiplexed I/O servers: in a multiplexed architecture, all connections are watched for activity at the same time. When a connection changes state, a small amount of processing is done on it; when that processing finishes, the connection returns to the open connection list to wait for the next state change.
  • Multiplexed multithreaded servers: multiple threads each watch the open connections and perform a small amount of work on each connection.
  • A Web server's filesystem has a special folder reserved for Web content, called the document root (docroot).
    • The server must not let relative URLs back up out of the docroot, which would expose the rest of the filesystem.
    • A virtually hosted Web server identifies the correct document root from the IP address or from the hostname in the URI or Host header.
    • Virtual hosting serves multiple Web sites from the same Web server, giving each site its own document root on the server.
  • Many Web servers also support server-side includes (SSI).
  • If a server receives a request while overloaded, it can redirect the client to a less heavily loaded server. The 303 See Other and 307 Temporary Redirect status codes can be used for this kind of redirect.
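
The docroot and virtual hosting rules above can be sketched as a single lookup function. This is an illustrative sketch only: the DOCROOTS table, its hostnames and paths, and the function name resolve are hypothetical, and the check works on path strings alone, so it does not account for symbolic links on a real filesystem.

```python
import posixpath
from urllib.parse import unquote

# Hypothetical virtual hosts, each with its own document root.
DOCROOTS = {
    "www.joes-hardware.com": "/docs/joe",
    "www.marys-antiques.com": "/docs/mary",
}


def resolve(host_header: str, request_path: str):
    """Map a Host header and request path to a file path inside the docroot.

    Returns None for unknown hosts and for paths that escape the docroot.
    """
    host = host_header.split(":")[0].lower()  # drop any port, ignore case
    docroot = DOCROOTS.get(host)
    if docroot is None:
        return None
    # Decode %-escapes first, then collapse "." and ".." segments.
    relative = unquote(request_path).lstrip("/")
    candidate = posixpath.normpath(posixpath.join(docroot, relative))
    # Reject anything that backed up out of the document root.
    if candidate != docroot and not candidate.startswith(docroot + "/"):
        return None
    return candidate


print(resolve("www.joes-hardware.com", "/index.html"))
print(resolve("www.joes-hardware.com", "/../mary/private.txt"))
print(resolve("www.marys-antiques.com:80", "/%2e%2e/%2e%2e/etc/passwd"))
```

Decoding before normalizing matters: an encoded "%2e%2e" would otherwise slip past the ".." check.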

Proxy

A Web proxy server is a network intermediary. It sits between clients and servers and acts as a middleman, shuffling HTTP messages back and forth between the parties.

  • An HTTP proxy server is both a Web server and a Web client. Because HTTP clients send request messages to the proxy, the proxy must handle requests and connections properly and return responses, just like a Web server.
  • A proxy dedicated to a single client is called a private proxy; a proxy shared among many clients is called a public proxy.
    • Caching proxy servers take advantage of requests common to many users, so the more users funneled into the same proxy server, the more useful it is.
  • A proxy connects two or more applications that speak the same protocol, while a gateway connects two or more parties that speak different protocols.
    • A gateway acts as a protocol converter, letting a client complete a transaction with a server even when the two speak different protocols.
  • Commercial proxy servers implement gateway functionality to support SSL security protocols, SOCKS firewalls, FTP access, and Web-based applications.
  • Proxy servers can see and touch all the HTTP traffic that flows by, so they can monitor and modify it to implement many useful value-added Web services.
    • Network security engineers often use proxy servers to enhance security.
    • Reverse proxies can improve performance when accessing slow Web servers for common content. In this configuration they are often called server accelerators.
  • How proxies get traffic:
    • Modify the client: manual configuration, preconfigured browsers, proxy auto-configuration (PAC), or WPAD (Web Proxy Auto-Discovery Protocol) proxy discovery.
    • Modify the network
    • Modify the DNS namespace
    • Modify the Web server
  • Proxies can serve as access-control devices. HTTP defines a mechanism called proxy authentication that blocks requests for content until the user provides valid access credentials to the proxy.

Cache

Caches reduce redundant data transfers, ease network bottlenecks, lower the demand on origin servers, and reduce distance delays (pages load more slowly from farther away).

  • Caching is especially important during flash crowds, when a sudden event sends many people to the same document at nearly the same time and congests the network.
  • Cache topologies:
    • A cache can be dedicated to a single user or shared among thousands of users. Dedicated caches are called private caches; shared caches are called public caches.
  • Different Web servers provide different mechanisms for setting the HTTP Cache-Control and Expires headers.
  • Specific algorithms compute the age of a cached document and its freshness lifetime: a cached copy is fresh while its age is less than its freshness lifetime.
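
The freshness check mentioned above can be sketched in a few lines. This is a simplified illustration rather than the full algorithm: it looks only at Cache-Control: max-age and at Expires minus Date, and it ignores heuristic expiration, shared-cache directives, and the details of how a cache computes a document's age. The function names and sample header values are assumptions made for this example.

```python
from email.utils import parsedate_to_datetime


def freshness_lifetime(headers: dict):
    """Return the freshness lifetime in seconds, or None if it is not stated."""
    # Cache-Control: max-age takes priority over Expires.
    for directive in headers.get("Cache-Control", "").split(","):
        name, _, value = directive.strip().partition("=")
        if name.lower() == "max-age" and value.isdigit():
            return int(value)
    # Otherwise fall back to Expires relative to the server's Date.
    if "Expires" in headers and "Date" in headers:
        expires = parsedate_to_datetime(headers["Expires"])
        date = parsedate_to_datetime(headers["Date"])
        return max(0, int((expires - date).total_seconds()))
    return None


def is_fresh(headers: dict, age_seconds: int) -> bool:
    """A cached copy is fresh while its age is below its freshness lifetime."""
    lifetime = freshness_lifetime(headers)
    return lifetime is not None and age_seconds < lifetime


with_max_age = {"Cache-Control": "public, max-age=600"}
with_expires = {
    "Date": "Sat, 29 Jun 2002 14:30:00 GMT",
    "Expires": "Sat, 29 Jun 2002 15:30:00 GMT",
}
print(freshness_lifetime(with_max_age), freshness_lifetime(with_expires))
print(is_fresh(with_max_age, 120), is_fresh(with_max_age, 900))
```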

Integration points: gateways, tunnels, and relays

  • The Web is a powerful tool for publishing content (database content or dynamically generated HTML pages).
  • Gateways act as interfaces between HTTP and other protocols and applications.
  • Application interfaces let different types of Web applications communicate with one another.
  • Tunnels let users send non-HTTP traffic over HTTP connections.
  • Relays are simplified HTTP proxies that forward data one hop at a time.

Gateway

A gateway acts as a kind of interpreter that abstracts the way a resource is reached. It is the glue between resources and applications.

  • A gateway can send a query to a database or generate dynamic content.

Tunnel

  • Web tunnels are another use of HTTP: they carry non-HTTP traffic over HTTP connections, so applications can reach services that speak other protocols.
  • SSL tunneling: SSL traffic can be carried through an HTTP tunnel connection, so it can pass through port-80 HTTP firewalls.

Relay

  • HTTP relays are simple HTTP proxies that do not fully adhere to the HTTP specification. A relay does enough HTTP processing to establish connections and then blindly forwards bytes.
  • A common problem with blind relays is that they can hang keep-alive connections, because they do not process the Connection header correctly.
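
The step a blind relay skips can be sketched as follows: before forwarding a message, an intermediary that processes the Connection header removes that header and every header it names, since those options apply to one connection only. This is a minimal sketch under stated assumptions: headers are held in a plain dict, and strip_hop_by_hop is a hypothetical function name, not an API from any library.

```python
def strip_hop_by_hop(headers: dict) -> dict:
    """Return a copy of headers without Connection and the headers it names."""
    tokens = set()
    for name, value in headers.items():
        if name.lower() == "connection":
            # The value is a comma-separated list of connection tokens.
            tokens.update(t.strip().lower() for t in value.split(",") if t.strip())
    tokens.add("connection")
    # Header names are compared case-insensitively.
    return {n: v for n, v in headers.items() if n.lower() not in tokens}


incoming = {
    "Host": "www.example.com",
    "Accept": "text/html",
    "Connection": "Keep-Alive",
    "Keep-Alive": "max=5, timeout=120",
}
forwarded = strip_hop_by_hop(incoming)
print(forwarded)
```

A relay that forwards the incoming headers untouched passes the keep-alive request on to the next hop, which is how both ends can come to believe they have a persistent connection with a party that never agreed to one.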

Web robot

  • Web crawlers are robots that recursively traverse informational Web sites: they fetch a first Web page, then all the pages that page points to, then all the pages those pages point to, and so on.
  • The initial set of URLs a crawler starts visiting is called the root set.
  • Filesystem symbolic links can cause a particularly insidious kind of cycle, because they create the illusion of an infinitely deep directory hierarchy where none exists.
  • The more autonomous a crawler is (the less human oversight it has), the more likely it is to get into trouble. Some techniques help robots behave better:
    • Canonicalizing URLs: convert URLs into a standard form to avoid syntactic aliases.
    • Breadth-first crawling: minimizes the impact of cycles.
    • Throttling: limit the number of pages the robot can fetch from a Web site in a given period of time.
    • Limiting URL size, URL/site blacklists, pattern detection, content fingerprinting (checksums), and human monitoring.
  • Search engines, the big picture:
    • In the Web's early days, search engines were fairly simple databases that helped users locate documents. With billions of pages now accessible on the Web, search engines have become indispensable tools for Internet users looking for information.
    • Today's search engines build complicated local databases, called full-text indexes, about the Web pages around the world and what they contain. A full-text index is a database that takes a word and immediately returns all the documents containing that word; once the index is built, the documents themselves do not need to be scanned.
  • Relevancy ranking is the process of scoring and ordering a list of search results.
  • Some gateway applications generate fake pages tuned to certain words in order to fool search engines' relevancy algorithms.
  • HTTP-NG modularizes the protocol into three layers: a message transport layer, a remote invocation layer, and a Web application layer.
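
Two of the crawler techniques listed earlier in this section, URL canonicalization and breadth-first traversal with a record of visited pages, can be sketched together. This is an illustrative sketch, not a real crawler: no pages are fetched, the LINKS dictionary is a made-up in-memory link graph, and the canonical form chosen here (lowercase scheme and host, default port and fragment dropped, empty path turned into "/") is one reasonable assumption among several.

```python
from collections import deque
from urllib.parse import urlsplit, urlunsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def canonicalize(url: str) -> str:
    """Convert a URL to a standard form so that aliases compare equal."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    if parts.port is not None and parts.port != DEFAULT_PORTS.get(scheme):
        host = f"{host}:{parts.port}"        # keep only non-default ports
    path = parts.path or "/"                 # an empty path means "/"
    return urlunsplit((scheme, host, path, parts.query, ""))  # drop the fragment


def crawl(root_set, get_links, max_pages=100):
    """Breadth-first traversal that never visits the same canonical URL twice."""
    seen, queue, order = set(), deque(), []
    for url in root_set:
        canonical = canonicalize(url)
        if canonical not in seen:
            seen.add(canonical)
            queue.append(canonical)
    while queue and len(order) < max_pages:  # max_pages is a simple safety limit
        url = queue.popleft()
        order.append(url)
        for link in get_links(url):
            canonical = canonicalize(link)
            if canonical not in seen:        # breaks cycles and aliases
                seen.add(canonical)
                queue.append(canonical)
    return order


# Hypothetical link graph containing a cycle and two aliases of b.html.
LINKS = {
    "http://www.example.com/": ["http://www.example.com/a.html", "HTTP://www.example.com:80/b.html"],
    "http://www.example.com/a.html": ["http://www.example.com/b.html#top", "http://www.example.com/"],
    "http://www.example.com/b.html": ["http://www.example.com/a.html"],
}
visited = crawl(["http://www.example.com"], lambda url: LINKS.get(url, []))
print(visited)
```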

Identification, authentication and security

Client identification and cookies

  • HTTP headers can carry information about user identity.
  • Client IP address tracking identifies users by their IP addresses; user login identifies users through authentication; fat URLs embed identifying information in the URL.
  • Cookies are currently the best way to identify users and support persistent sessions. They come in two types: session cookies and persistent cookies.
    • A session cookie is a temporary cookie that records settings and preferences while the user navigates a site. It is deleted when the user exits the browser.
    • Persistent cookies live longer: they are stored on disk and survive browser exits and computer restarts. They are often used to keep a configuration profile or login name for a site the user visits periodically.
  • Even with cookies disabled, most of the same tracking can be done through log analysis or other means, so cookies themselves are not a large security risk.
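
The difference between session and persistent cookies shows up in the Set-Cookie header: a cookie with an Expires or Max-Age attribute is persistent, and one without either lasts only for the browser session. The sketch below uses Python's standard http.cookies module to tell them apart. The cookie names and values are invented for illustration, and classify_set_cookie is a hypothetical helper name.

```python
from http.cookies import SimpleCookie


def classify_set_cookie(header_value: str) -> dict:
    """Label each cookie in a Set-Cookie value as 'session' or 'persistent'."""
    cookie = SimpleCookie()
    cookie.load(header_value)
    kinds = {}
    for name, morsel in cookie.items():
        # A lifetime attribute means the browser keeps the cookie on disk.
        persistent = bool(morsel["expires"] or morsel["max-age"])
        kinds[name] = "persistent" if persistent else "session"
    return kinds


print(classify_set_cookie("session-id=abc123; Path=/"))
print(classify_set_cookie("user=mary17; Max-Age=2592000; Path=/"))
```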

Basic authentication mechanism

  • HTTP provides a native challenge/response framework that makes it easy to authenticate users.
  • Basic authentication is convenient and flexible but very insecure: the username and password are sent in the clear, and nothing prevents messages from being tampered with.
  • Improvements in digest authentication:
    • Never sends passwords across the network in the clear.
    • Prevents malicious parties from capturing and replaying the authentication handshake.
    • Optionally guards against tampering with message contents.
    • Guards against several other common forms of attack.
  • Transport Layer Security (TLS) and Secure HTTP (HTTPS) are more secure protocols.
  • Digest authentication uses MD5 digests, and the handshake uses a nonce (a random number) to prevent replay attacks:
    • The server computes a nonce and sends it to the client in a WWW-Authenticate challenge message.
    • The client selects an algorithm, computes the digest of the password and the other data, and sends the digest back to the server in an Authorization message.
    • The server receives the digest, the chosen algorithm, and the supporting data, and computes the same digest that the client did.
  • Multiple challenges: a server that does not know a client's capabilities may send both a basic and a digest authentication challenge. Faced with multiple challenges, the client must answer with the strongest authentication mechanism it supports.
  • The realm value, combined with the canonical root URL of the server being accessed, defines the protection space.
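
The contrast between basic and digest authentication can be seen in a short sketch. The basic scheme only base64-encodes "user:password", which is trivially reversible. The digest function below follows the simplest form of the calculation, without the optional qop, cnonce, and nonce-count parameters, so it is a reduced illustration rather than a complete implementation. The function names and the realm, URI, and password values are assumptions made for this example.

```python
import base64
import hashlib
import secrets


def basic_credentials(username: str, password: str) -> str:
    """Authorization value for basic auth: base64 is encoding, not encryption."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8"))
    return "Basic " + token.decode("ascii")


def md5_hex(text: str) -> str:
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def digest_response(username, realm, password, method, uri, nonce) -> str:
    """Simplest digest form: MD5(MD5(A1) : nonce : MD5(A2)), no qop."""
    ha1 = md5_hex(f"{username}:{realm}:{password}")  # secret-related data
    ha2 = md5_hex(f"{method}:{uri}")                 # message-related data
    return md5_hex(f"{ha1}:{nonce}:{ha2}")


# Server side: issue a fresh nonce in the challenge.
nonce = secrets.token_hex(16)

# Client side: answer with a digest; the password itself is never sent.
client_digest = digest_response("bob", "Shopping Cart", "hunter2", "GET", "/cart", nonce)

# Server side: recompute from its own copy of the secret and compare.
server_digest = digest_response("bob", "Shopping Cart", "hunter2", "GET", "/cart", nonce)
print(client_digest == server_digest)

# A digest captured earlier does not match once the nonce changes.
replayed_ok = client_digest == digest_response(
    "bob", "Shopping Cart", "hunter2", "GET", "/cart", secrets.token_hex(16)
)
print(replayed_ok)
```

Because the nonce is mixed into the digest, a captured response is only useful for the challenge it answered, which is how the handshake resists replay.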

Secure HTTP

Important transactions need HTTP combined with digital encryption technology to be secure.

  • HTTPS is the most popular secure form of HTTP. Its data is encrypted by a security layer (SSL or TLS).
  • Cipher: an algorithm that encodes text so that it is unreadable to eavesdroppers.
  • Key: a numeric parameter that changes the behavior of a cipher.
  • Symmetric-key cryptosystem: the encoding and decoding algorithms use the same key.
  • Asymmetric-key cryptosystem: the encoding and decoding algorithms use different keys.
  • Public-key cryptography: a system that makes it easy for millions of computers to send secret messages.
  • Digital signature: a checksum that verifies that a message has not been forged or tampered with.
  • Digital certificate: identifying information, verified and signed by a trusted organization.
  • Popular symmetric-key cipher algorithms include DES, Triple-DES, RC2, and RC4.

Content publishing and distribution

Web Hosting

  • The collective duties of storing, brokering, and administering content resources are called Web hosting. Hosting is one of the primary functions of a Web server.
  • Hosting services, dedicated hosting, and virtual hosting.

Redirection and load balancing

  • HTTP does not walk the Web alone. Many protocols manage the data in HTTP messages along its journey:
    • HTTP redirection
    • DNS redirection
    • Anycast routing
    • Policy-based routing (PBR)
    • IP MAC forwarding
    • IP address forwarding
    • WCCP (Web Cache Coordination Protocol)
    • ICP (Intercache Communication Protocol)
    • HTCP (Hyper Text Caching Protocol)
    • NECP (Network Element Control Protocol)
    • CARP (Cache Array Routing Protocol)
    • WPAD (Web Proxy Auto-Discovery Protocol)
  • Goals of network redirection: perform HTTP transactions reliably, minimize delay, and conserve network bandwidth.

Origin www.cnblogs.com/longjiang-uestc/p/11404501.html