Apache Week
   

Copyright 1996-2005
Red Hat, Inc.

HTTP/1.1:

HTTP/1.1 is a major revision of the HTTP standard, which defines how browsers, servers and proxies communicate.

First published: 16th August 1996

The Hypertext Transfer Protocol

From version 1.2, Apache has been fully compliant with the new HTTP/1.1 specification. This is the protocol which tells browsers and servers how to communicate, and the features added here determine how Web pages can be accessed. We take a look at what HTTP/1.1 includes and what changes it will bring to browsers and servers.

Part of Apache Week issue 28 (16th August 1996).

Hypertext Transfer Protocol (HTTP) defines how Web pages are requested and transmitted across the Internet. Almost all servers and browsers currently use version 1.0 of this protocol, but a major update, version 1.1, has been released. HTTP/1.1 adds a lot of new features to HTTP, which in turn will lead to new capabilities in both servers and browsers. We look at what is new in 1.1 and how it is likely to affect the Web.

History of HTTP

HTTP was initially a very simple protocol used to request pages from a server. The browser would connect to the server and send a command like:

  GET /welcome.html

and the server would respond with the contents of the requested file. There were no request headers, no methods other than GET, and the response had to be a HTML document. This protocol was first documented as HTTP/0.9. All current servers are capable of understanding and handling HTTP/0.9 requests, but the protocol is so basic it is not very useful today.

Browsers and servers extended the HTTP protocol from 0.9 with new features such as request headers and additional request methods. The resulting HTTP/1.0 protocol was only officially documented in early 1996 with the release of RFC1945, although servers and browsers have been using HTTP/1.0 for several years.

Even while 1.0 was being documented, the next version was in serious development. This time the specification was developed first. This new version, 1.1, is now available as RFC2068. HTTP/1.1 will include a lot of new features, and will also document for the first time some features already found in servers or browsers.

A Quick Guide to how HTTP Works

Knowing how HTTP works is very useful for a server administrator. It lets you check out the operation of your server without having to fire up a browser, and gives you a very useful diagnostic tool to check in detail how the server responds to individual requests.

You can use telnet to emulate how a browser requests documents from a server. With telnet you can connect to the server, issue a request, and see what the server responds with. For example, to get the home page from www.apacheweek.com, you would use:

  % telnet www.apacheweek.com 80
  Connected to www.apacheweek.com.
  GET / HTTP/1.0           [RETURN]
                           [RETURN]

This assumes you are connecting from a Unix system, starting at the command prompt (%) and with a telnet command available. You could also use any other telnet program such as the one in Windows 95. The lines you type are the telnet command and the GET request; [RETURN] marks where you press the RETURN key. The standard port for Web requests is port 80, so we connect to that port number. Once connected we can type in and send a HTTP request, followed by the request headers. In this case, the request is GET / HTTP/1.0. The / is the resource we want to obtain, and the HTTP/1.0 tells the server that this is a HTTP/1.0 request. After entering this line, press RETURN twice: the first ends the request line, and the second marks the end of the optional request headers (in this case, we did not enter any request headers). The server will respond by sending a number of response headers, followed by the text of the requested document.
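
The same exchange can be scripted instead of typed. The following is a minimal Python sketch, not part of the original article: the function names build_request and fetch are made up for illustration, and the sketch assumes the server closes the connection when it has finished responding, as a HTTP/1.0 server normally does.

```python
import socket


def build_request(method, path, version="HTTP/1.0", headers=None):
    """Return the raw bytes of a HTTP request.

    The request line and each header end with CRLF, and a blank line
    marks the end of the headers (the second RETURN in the telnet session).
    """
    lines = [f"{method} {path} {version}"]
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


def fetch(host, request, port=80, timeout=10):
    """Send the request bytes to host:port and return all bytes received."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(request)
        received = []
        while True:
            data = conn.recv(4096)
            if not data:  # the server closed the connection
                break
            received.append(data)
    return b"".join(received)
```

Calling fetch("www.apacheweek.com", build_request("GET", "/")) would repeat the telnet session above, provided that host still answers on port 80.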

It is often more convenient to send a 'HEAD' request instead of 'GET'. This makes the server behave exactly as if it was handling a GET, but it doesn't bother to send the actual document. This makes it much easier to see the response headers, and means you do not have to wait to download the document itself. For example, to see which response headers www.apacheweek.com sends for /, use:

  HEAD / HTTP/1.0

  HTTP/1.0 200 OK
  Date: Fri, 16 Aug 1996 11:48:52 GMT
  Server: Apache/1.1.1 UKWeb/1.0
  Content-type: text/html
  Content-length: 3406
  Last-modified: Fri, 09 Aug 1996 14:21:40 GMT

  Connection closed by foreign host.

The first response line is the status - in this case '200' means the request is okay. The rest are response headers, which give information either about the server or the resource. For example, Server: gives the server version, and Last-Modified: is the last modification date of the file.
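
When checking many responses it can help to split the status line and headers apart automatically. The sketch below is one way to do that in Python; parse_response_head is a name chosen for this example, and it assumes the text passed in is just the status line and headers, as returned for a HEAD request.

```python
def parse_response_head(text):
    """Split the head of a HTTP response into (version, status, reason, headers)."""
    lines = text.strip().splitlines()
    parts = lines[0].split(" ", 2)
    version, status = parts[0], int(parts[1])
    reason = parts[2] if len(parts) > 2 else ""
    headers = {}
    for line in lines[1:]:
        if not line.strip():
            break  # a blank line ends the headers
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so store them in lower case.
        headers[name.strip().lower()] = value.strip()
    return version, status, reason, headers
```

Given the response shown above, this returns the status 200 and a dictionary in which headers["server"] is "Apache/1.1.1 UKWeb/1.0".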


New in HTTP/1.1

The basic operation of HTTP/1.1 remains the same as for HTTP/1.0, and the protocol ensures that browsers and servers of different versions can all interoperate correctly. If the browser understands version 1.1, it uses HTTP/1.1 on the request line instead of HTTP/1.0. When the server sees this, it knows it can make use of new 1.1 features (if a 1.1 server sees a lower version, it must adjust its response to use that protocol instead).

HTTP/1.1 contains a lot of new facilities. The main ones are hostname identification, content negotiation, persistent connections, chunked transfers, byte ranges and support for proxies and caches.

Hostname Identification

Every request sent using HTTP/1.1 must identify the hostname of the request. For example, if the URL http://www.apache.org/ is used, the request must include the fact that the hostname part is 'www.apache.org'. In previous versions of HTTP, the server never knew the hostname used in the URL. Letting the server see the hostname allows the implementation of non-IP virtual hosts. For example, if two names, www.apache.org and www.someoneelse.com, point to the same IP address, a HTTP/1.1 server can use the hostname it receives to return different content for each request. HTTP/1.0 servers cannot differentiate between these two requests.

The hostname must be passed to the server either as a full URI on the request line, or on the new Host: header. For example, to test how www.apache.org responds to a HTTP/1.1 request, you could send

  GET / HTTP/1.1
  Host: www.apache.org

Note that the HTTP version on the GET request is now 'HTTP/1.1'. If the request does not give the hostname, either in the URI or on a Host: header, the server will respond with an error.
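
To illustrate how a server might use the hostname to implement non-IP virtual hosts, here is a rough Python sketch. It is not how Apache does it: the function names, the document root paths and the default root are all invented for the example, and the sketch only deals with HTTP/1.1 requests.

```python
def requested_host(request_text):
    """Return the hostname named by a request, or None if it names none.

    The host can come from a full URI on the request line or from a
    Host: header; this sketch prefers the full URI when both are present.
    """
    lines = request_text.split("\r\n")
    target = lines[0].split()[1]
    if target.lower().startswith("http://"):
        authority = target[len("http://"):].split("/", 1)[0]
        return authority.split(":", 1)[0].lower()
    for line in lines[1:]:
        if not line:
            break  # end of the request headers
        name, _, value = line.partition(":")
        if name.strip().lower() == "host":
            # Drop any :port suffix and compare names in lower case.
            return value.strip().split(":", 1)[0].lower()
    return None


def choose_site(request_text, sites, default_root):
    """Pick a document root; sites maps hostname -> (hypothetical) root."""
    host = requested_host(request_text)
    if host is None:
        # A HTTP/1.1 server answers such a request with an error.
        raise ValueError("HTTP/1.1 request does not name a host")
    return sites.get(host, default_root)
```

With sites set to {"www.apache.org": "/www/apache", "www.someoneelse.com": "/www/someoneelse"}, two requests arriving on the same IP address are served from different directories depending on the hostname they carry.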

Content Negotiation

Content Negotiation refers to the ability to have a number of different versions of a single resource. For example, a document might be available in English and French, with each of these available as either HTML or PDF. The possible responses are called representations or variants.

There are actually two sorts of content negotiation:

  • Server-driven Negotiation
    Here the server decides (or guesses) on the best representation to send to the browser, based on information the browser provides in the request.
  • Agent-driven Negotiation
    Here the server does not guess on the best representation, but instead returns a list of the representations it has. The browser can then either automatically request one of these, or present a choice to the user.

The first type, server negotiation, has been implemented in Apache since the summer of 1995 and is explained in a special feature from Apache Week issue 25. However, the HTTP/1.1 specification is the first place it is officially documented.

The second type, agent negotiation, is not fully documented. The HTTP/1.1 specification just contains basic definitions of some of the headers to be used, but no details. The details of content negotiation are being specified in an Internet draft. This draft also expands on how server-driven negotiation works, and defines how caches can perform negotiation on behalf of either the server or the user agent.
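
As a rough illustration of server-driven negotiation, the Python sketch below picks a language variant from the preferences a browser sends. It assumes the browser states its preferences in an Accept-Language header with optional q (quality) values, which this article does not cover; the function names and file names are invented, and real negotiation (including Apache's) weighs more dimensions than language alone.

```python
def parse_accept(header):
    """Parse an Accept-style header into a {value: quality} dictionary."""
    prefs = {}
    for item in header.split(","):
        item = item.strip()
        if not item:
            continue
        value, _, params = item.partition(";")
        quality = 1.0  # a value with no q parameter is fully acceptable
        for param in params.split(";"):
            key, _, val = param.partition("=")
            if key.strip().lower() == "q":
                quality = float(val)
        prefs[value.strip().lower()] = quality
    return prefs


def choose_variant(variants, accept_language):
    """Return the variant whose language the browser rates highest.

    variants maps a language tag to a file name. Returns None when no
    variant is acceptable to the browser.
    """
    prefs = parse_accept(accept_language)
    best, best_q = None, 0.0
    for lang, filename in variants.items():
        # Fall back to the wildcard "*" if the language is not listed.
        q = prefs.get(lang, prefs.get("*", 0.0))
        if q > best_q:
            best, best_q = filename, q
    return best
```

For a document available as welcome.html.en and welcome.html.fr, a browser sending "fr, en;q=0.5" would get the French variant. When the function returns None the server has no good guess, which is the point at which it could return a list of variants for the browser or user to choose from.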

Persistent Connections

Many pages today include inlined documents, usually images but increasingly also sounds and other types such as Shockwave presentations. These pages can be slow to download because each item needs to be requested separately from the server, each on a separate connection. Typically, for each inline document the browser needs to connect to the server, ask for the document, wait for it to be received, and disconnect from the server (although some browsers can make multiple requests in parallel).

This can be slow, especially across the Internet when there is a delay involved in each connection and disconnection. To help make pages with inline documents quicker to download, HTTP/1.1 defines persistent connections where a number of documents can be requested over a single connection, one at a time.

An early implementation of persistent connections was known as keep-alive, and Apache as well as a number of other servers and browsers support this sort of connection. However, persistent connections are first officially documented in HTTP/1.1, and will be implemented slightly differently from keep-alives.

For a start, in HTTP/1.1, persistent connections are the default. Unless the browser explicitly tells the server not to use persistent connections, the server should assume that it might be getting multiple requests on a single connection. Persistent connections are controlled by the Connection header. Unless a Connection: close header is given, the connection will remain open. This can be tested by connecting to www.apache.org and sending a simple request, for example:

  % telnet www.apache.org 80
  HEAD / HTTP/1.1
  Host: www.apache.org
  
  HTTP/1.1 200 OK
  Server: Apache/1.3.0
  ...

where the connection will remain open for a short period before closing (this is a server-configurable timeout). If the same request is sent with a Connection: close header, the connection will close immediately after the response headers have been sent.
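
The rule a server applies can be summed up in a few lines of Python. This is only a sketch: the function name is made up, and the HTTP/1.0 branch assumes the older keep-alive convention in which a client opts in by sending Connection: Keep-Alive.

```python
def keeps_connection_open(version, headers):
    """Decide whether the connection stays open after this request.

    headers maps lower-case header names to their values.
    """
    tokens = [t.strip().lower() for t in headers.get("connection", "").split(",")]
    if version == "HTTP/1.1":
        # Persistent by default; only an explicit "close" ends it.
        return "close" not in tokens
    # Assumption: older clients opt in with the pre-1.1 keep-alive token.
    return "keep-alive" in tokens
```

A server would still close an idle persistent connection once its configured timeout expires, which is the delay seen in the telnet session above.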

Chunked Transfers

Normally, when sending back a response the server has to know everything about the response it is about to send before it sends it. For instance, servers should set the Content-Length header on each response to the length of the response itself. This can be difficult for the server to do if the content is dynamically created (e.g. if it is the output of a CGI script). So in practice servers (including Apache) often do not send a Content-Length with dynamic documents. This has not been a problem with HTTP/1.0, but for persistent connections to work in HTTP/1.1, the Content-Length must be known in advance.

The server could find out the length of the output of a CGI script by reading it into memory until the script has finished, then setting the Content-Length and returning the stored content. This might be acceptable for small content, but could be a problem if the CGI produces a lot of output. One possible way around this is to use the new chunked encoding method. This lets the server send output a bit at a time. Each bit (or chunk) is small enough for its content-length to be known before it is sent. Using chunked encoding will let servers send out dynamic content that is either large or produced slowly without having to disable persistent connections.

In addition, after a chunked-encoded document has been completely sent, additional response headers can be transmitted. This could allow dynamically produced headers to be associated with the document, even if they are not available until after the script (or whatever produced the document) has finished.
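
To make the format concrete, here is a small Python sketch of chunked encoding and decoding. Each chunk is sent as its size in hexadecimal, then the data; a chunk of size zero ends the body and may be followed by the additional headers. The function names and the X-Script-Status header used in the usage note are invented for the example, and the decoder assumes it is handed a complete, well-formed body.

```python
def encode_chunked(pieces, trailers=None):
    """Encode an iterable of byte strings as a chunked body."""
    out = []
    for piece in pieces:
        if not piece:
            continue  # a zero-length chunk would signal the end of the body
        # Each chunk: its size in hexadecimal, CRLF, the data, CRLF.
        out.append(b"%x\r\n" % len(piece) + piece + b"\r\n")
    out.append(b"0\r\n")  # the zero-size chunk ends the body
    for name, value in (trailers or {}).items():
        out.append(f"{name}: {value}\r\n".encode("ascii"))
    out.append(b"\r\n")
    return b"".join(out)


def decode_chunked(data):
    """Decode a complete chunked body into (content, trailing headers)."""
    content = []
    pos = 0
    while True:
        line_end = data.index(b"\r\n", pos)
        # Ignore anything after a ';' on the size line.
        size = int(data[pos:line_end].split(b";", 1)[0], 16)
        pos = line_end + 2
        if size == 0:
            break
        content.append(data[pos:pos + size])
        pos += size + 2  # skip the data and its trailing CRLF
    trailers = {}
    for line in data[pos:].split(b"\r\n"):
        if not line:
            break
        name, _, value = line.partition(b":")
        trailers[name.strip().decode("ascii").lower()] = value.strip().decode("ascii")
    return b"".join(content), trailers
```

For example, encode_chunked([b"Hello, ", b"world"], {"X-Script-Status": "done"}) can be produced piece by piece as a script generates output, without the server ever knowing the total length in advance.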

Byte Ranges

Byte ranges allow browsers to request parts of documents. This can be used to continue an interrupted transfer, or to obtain just part of a long document (say, a single page).

Byte ranges are implemented by the Range header. For example, to request just the second 500 bytes of a document, the request would include:

  Range: bytes=500-999

A single request can also ask for more than one range at once (for example, it could ask for the first 500 bytes and the last 500 bytes of a file). When the server replies, it will send back all the parts in a single response, using MIME multipart encoding to distinguish the parts.
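
A server has to turn the Range header into actual byte offsets within the document. The Python sketch below shows one way to do this; parse_byte_ranges is a made-up name, offsets are inclusive and counted from zero, and ranges that fall outside the document are simply dropped rather than reported as an error.

```python
def parse_byte_ranges(header, size):
    """Turn a Range header value into a list of (first, last) byte offsets.

    Offsets are inclusive and clipped to a document of the given size.
    """
    unit, _, specs = header.partition("=")
    if unit.strip().lower() != "bytes":
        return []
    ranges = []
    for spec in specs.split(","):
        spec = spec.strip()
        if not spec:
            continue
        first, _, last = spec.partition("-")
        if first == "":
            # Suffix form "-N": the last N bytes of the document.
            count = int(last)
            if count == 0:
                continue
            start, end = max(size - count, 0), size - 1
        else:
            start = int(first)
            # An open-ended range "N-" runs to the end of the document.
            end = min(int(last), size - 1) if last else size - 1
        if start <= end:
            ranges.append((start, end))
    return ranges
```

For a 3406-byte document, "bytes=0-499,-500" (the first and last 500 bytes) resolves to the offsets (0, 499) and (2906, 3405). A browser continuing an interrupted transfer would use the open-ended form, such as "bytes=3000-".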

Proxies and Caches

HTTP/1.1 includes a lot of information and new features for people implementing proxies and caches. Until now, the operation of proxies and caches has been largely undocumented. In addition to documenting how they are supposed to work, HTTP/1.1 also includes a range of new features to make implementing proxies and caches easier, and in particular to reduce network traffic by allowing proxies and caches to send more 'conditional' requests and to do transparent content negotiation.

A conditional request is like a normal request, except the sender (the proxy or cache server) includes some information about whether it really needs the document. For example, a proxy or cache can send an entity-tag which identifies the copy of a document it already has, and the server only sends back the document if its current version differs from that copy. Conditional requests can also be based on the last-modified time of the document.
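
The decision the server makes can be sketched in Python as follows. The request headers If-None-Match and If-Modified-Since are the HTTP/1.1 names for these two conditions, although this article does not name them; the function name is invented, and checking the entity-tag first and falling back to the date only when no tag was sent is a simplification made for this sketch. A status of 304 means "not modified", so no document is sent.

```python
from email.utils import parsedate_to_datetime


def conditional_status(request_headers, etag, last_modified):
    """Return 304 if the sender's copy is still current, otherwise 200.

    request_headers maps lower-case header names to values; etag and
    last_modified describe the document as the server has it now.
    """
    candidates = request_headers.get("if-none-match")
    if candidates is not None:
        # Simplification: when a tag is supplied, only the tag is checked.
        tags = [t.strip() for t in candidates.split(",")]
        return 304 if etag in tags or "*" in tags else 200
    since = request_headers.get("if-modified-since")
    if since is not None:
        unchanged = parsedate_to_datetime(last_modified) <= parsedate_to_datetime(since)
        return 304 if unchanged else 200
    return 200  # an ordinary, unconditional request
```

A cache holding a copy fetched on 16 Aug 1996 of a document last modified on 9 Aug 1996 would get 304 back and could keep serving its stored copy.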

Other Changes

There are a lot of other changes between 1.0 and 1.1, including:

  • More status response codes
  • New request methods: OPTIONS, TRACE, DELETE, PUT
  • Digest authentication
  • Various new headers such as Retry-After: and Max-Forwards:
  • Definition of the media types message/http and multipart/byteranges

How this will Affect Servers and Browsers

Users of the Web will notice the following major changes when browsers and servers are available which implement HTTP/1.1:

  • Non-IP virtual Hosts
    Virtual hosts can be used without needing additional IP addresses.
  • Content Negotiation means more content types and better selection
    Using content negotiation means that resources can be stored in various formats, and the browser automatically gets the 'best' one (e.g. the correct language). If a best match cannot be determined, the browser or server can offer a list of choices to the user.
  • Faster Response
    Persistent connections will mean that accessing pages with inline or embedded documents should be quicker.
  • Better handling of interrupted downloads
    The ability to request byte ranges will let browsers continue interrupted downloads.
  • Better Behaviour and Performance from Caches
    Caches will be able to use persistent connections to increase performance both when talking to browsers and servers. Use of conditionals and content negotiation will mean caches can identify responses more quickly.


Comments or criticisms? Please email us at editors@apacheweek.com