Apache Week
   

Copyright ©2020 Red Hat, Inc

First published: 26th July 1996

Content Negotiation Explained

Content negotiation is an often-overlooked feature of Apache, but used correctly it can let you present documents in different languages and formats based on what the user wants. Apache is one of the few servers that actually implements content negotiation. However, a few problems are caused by browsers that do not do the right thing. We explain how to use negotiation correctly, and why some browsers make this difficult.

Why content negotiation is needed

Content negotiation is a powerful mechanism in which the browser says what type of information it can accept, and the server decides what type of information (if any) to return. The term type is used very loosely here, because negotiation can apply to several aspects of the information. For example, it can be used to choose the appropriate human language for a document (say, French or German), or to choose a media type that the browser can display (say, GIF or JPEG).

In order for the server to deliver the correct representation of the data, the browser must send some information about what it can accept. A browser used on a French-language machine, for instance, should indicate that it can accept data in French (of course, this should also be user-configurable).

The most common use of content negotiation at the moment is to select data based on media type. Here, the browser says what sort of data it can display. For example, when requesting an inline image, the browser could tell the server that it can accept GIF and JPEG images. In fact, the browser might prefer JPEG images to GIF images because they are quicker to download, and it can specify this preference as well. The ability to indicate which content types a browser can accept is particularly important now that plug-ins can extend the browser's capabilities. Unfortunately, many current browsers don't supply the correct information to the server.

Using Negotiation

To use negotiation, you need two things. Firstly, you need a resource that exists in more than one format (for example, a document in French and German, or an image stored as a GIF and a JPEG), and secondly you need to configure Apache to know that each of these files is actually the same resource. Apache has two methods for doing this: either using a special index file to identify the various versions of the information, or using the MultiViews facility where Apache gets the information it needs from file extensions.

Using a Variants File

The first method involves creating a variants file, usually referred to as a var file. This lists each of the files that contain the same resource, along with details of which representation each one is. Any request for this var file causes Apache to return the best file, based on the contents of the var file and the information supplied by the browser.

To get Apache to use variant files, first uncomment the following line in srm.conf:

   AddHandler type-map var

and restart the server as normal.

As an example, say there is a file in English and a file in German containing the same information. The files could be called english.html and german.html (both are HTML files). Create a var file that lists each of these files and specifies which language each one is in. For example, a var file called info.var could contain:

   URI: english.html
   Content-Language: en

   URI: german.html
   Content-Language: de

This file consists of a series of sections, separated by blank lines. Each section contains the name of the file (on the URI: line) and header information used in the negotiation.

Now, when a request for info.var is received, the server will read the var file and return the best file, based on which languages the browser has said it can accept. Similarly, the var file could be used to select files based on content type (using Content-Type:) or content encoding (using Content-Encoding:), or any combination.
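
The var file format is simple enough to sketch in a few lines. The following Python is a minimal illustration, not Apache's implementation: it splits the info.var example into sections at blank lines and picks the variant whose Content-Language the browser rates highest in its Accept-Language header. The function names are hypothetical, and the sketch ignores Content-Type, encodings, qs values, language-prefix matching (en-gb versus en) and whatever fallback rules a real server applies when nothing matches.

```python
# Minimal sketch: read a type-map (.var) file and choose a variant by language.
# Hypothetical helper names; not Apache's own code. Content-Type, encodings,
# qs values and fallback rules are deliberately ignored.


def parse_type_map(text):
    """Split a type map into sections separated by blank lines.

    Each section becomes a dict mapping lower-cased header names to values.
    """
    variants, current = [], {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            # A blank line ends the current section.
            if current:
                variants.append(current)
                current = {}
            continue
        name, _, value = line.partition(":")
        current[name.strip().lower()] = value.strip()
    if current:
        variants.append(current)
    return variants


def parse_accept_language(header):
    """Return {language tag: q} from an Accept-Language value (q defaults to 1.0)."""
    prefs = {}
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        if not parts[0]:
            continue
        q = 1.0
        for param in parts[1:]:
            key, _, val = param.partition("=")
            if key.strip() == "q":
                q = float(val)
        prefs[parts[0].lower()] = q
    return prefs


def choose_by_language(variants, accept_language):
    """Return the URI of the variant whose language the browser rates highest."""
    prefs = parse_accept_language(accept_language)
    best_uri, best_q = None, 0.0
    for variant in variants:
        q = prefs.get(variant.get("content-language", "").lower(), 0.0)
        if q > best_q:
            best_uri, best_q = variant["uri"], q
    return best_uri


INFO_VAR = """\
URI: english.html
Content-Language: en

URI: german.html
Content-Language: de
"""

variants = parse_type_map(INFO_VAR)
print(choose_by_language(variants, "de, en;q=0.5"))  # german.html
print(choose_by_language(variants, "fr"))  # None: no variant matches
```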

The Content-Type: line in a variants file can also give other content type parameters, such as the subjective quality factor. This is used in the negotiation when picking the 'best' match. For example, an image available as a JPEG might be regarded as having higher quality than the same image in GIF format. To tell the server this, the following .var contents could be used:

  URI: image.jpg
  Content-Type: image/jpeg; qs=0.6

  URI: image.gif
  Content-Type: image/gif; qs=0.4

Here the qs parameters give the 'source quality' for these two files, in the range 0.000 to 1.000, with the highest value being the most desirable. A browser that indicates it can handle GIF and JPEG files equally well would be sent the JPEG version rather than the GIF.
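
For media types, the negotiation algorithm described in Apache's documentation multiplies the browser's preference (q) for a type by the variant's source quality (qs). A small Python sketch of that scoring, assuming the browser rates image/jpeg and image/gif equally at q=1.0, shows why the JPEG wins with the var file above. The helper names are hypothetical, and the many tie-breaking steps of a real server are left out.

```python
# Sketch: score each variant as browser preference (q) times source quality (qs).
# Assumption: the browser rates image/jpeg and image/gif equally (q = 1.0).


def split_qs(content_type):
    """Split 'image/jpeg; qs=0.6' into ('image/jpeg', 0.6); qs defaults to 1.0."""
    parts = [p.strip() for p in content_type.split(";")]
    qs = 1.0
    for param in parts[1:]:
        key, _, val = param.partition("=")
        if key.strip() == "qs":
            qs = float(val)
    return parts[0], qs


def best_variant(variants, browser_q):
    """Return (best URI, scores) where score = q * qs for each variant."""
    scores = {}
    for uri, content_type in variants.items():
        media_type, qs = split_qs(content_type)
        scores[uri] = browser_q.get(media_type, 0.0) * qs
    return max(scores, key=scores.get), scores


# The two entries from the .var file above.
IMAGE_VARIANTS = {
    "image.jpg": "image/jpeg; qs=0.6",
    "image.gif": "image/gif; qs=0.4",
}

print(best_variant(IMAGE_VARIANTS, {"image/jpeg": 1.0, "image/gif": 1.0}))
# ('image.jpg', {'image.jpg': 0.6, 'image.gif': 0.4})
```

If the browser instead sent image/jpeg with q=0.5, the JPEG would score 0.3 against the GIF's 0.4 and the GIF would be returned, which shows how the browser's preferences and the server's source quality pull against each other.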

Using variant files gives complete control over the scope of the negotiation; however, it requires a var file to be created and maintained for each resource. An alternative interface to the negotiation mechanism is to get Apache to identify the negotiation parameters (language, content type, encoding) from file extensions.

Using File Extensions

Instead of using a var file, file extensions can be used to identify the content of files. For example, the extension eng could be used on English files, and ger on German files. Then the AddLanguage directive can be used to map these extensions onto the standard language tags.

To use this feature, the MultiViews option must first be turned on for the directory, either in access.conf or in a .htaccess file. Note that Options All does not turn on MultiViews.

After enabling MultiViews, you can give the directives that map extensions onto representation types. These are AddLanguage, AddEncoding and AddType (content types are also set in the mime.types file). For example:

   AddLanguage en               .eng
   AddLanguage de               .ger
   AddEncoding x-compress       .Z
   AddType     application/pdf  pdf

(The last line is shown only as an example; this mapping is already set in the mime.types file on recent Apache versions.)

When a request is received, the server looks at all the files in the directory that start with the requested filename. So a request for /about/info would cause the server to negotiate between all the files named /about/info.*

For each matching file, the server checks its extensions and sets the content type, language and encoding accordingly. For example, a file called info.eng.html would be associated with the language tag en and the content type text/html. The source quality is assumed to be 1.000 for all files. (It can actually be set on the MIME type, as in "text/html;qs=0.5", but this confuses most browsers, so it is probably best not used.)
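
The extension lookup can be pictured as a set of tables built from the AddLanguage, AddEncoding and AddType directives. In the Python sketch below, the tables simply mirror the example directives shown earlier, plus text/html for .html as supplied by mime.types. The function name describe is hypothetical, and silently ignoring unknown extensions is a simplification, not a statement of Apache's exact behaviour.

```python
# Hypothetical lookup tables mirroring the example directives:
#   AddLanguage en .eng / AddLanguage de .ger / AddEncoding x-compress .Z
# plus the html and pdf entries normally supplied by mime.types.
LANGUAGES = {"eng": "en", "ger": "de"}
ENCODINGS = {"Z": "x-compress"}
TYPES = {"html": "text/html", "pdf": "application/pdf"}


def describe(filename):
    """Work out type, language and encoding from a file's extensions.

    Extensions may appear in any order; unrecognised ones are ignored here.
    The source quality defaults to 1.0, as described above.
    """
    base, *extensions = filename.split(".")
    info = {"base": base, "type": None, "language": None,
            "encoding": None, "qs": 1.0}
    for ext in extensions:
        if ext in TYPES:
            info["type"] = TYPES[ext]
        elif ext in LANGUAGES:
            info["language"] = LANGUAGES[ext]
        elif ext in ENCODINGS:
            info["encoding"] = ENCODINGS[ext]
    return info


print(describe("info.eng.html"))
print(describe("info.html.ger.Z"))
```

Because each extension is looked up independently, info.eng.html and info.html.eng describe the same representation.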

The extensions can be listed in any order, and the request itself can include one or more extensions. For example, the files info.html.eng and info.html.ger could be requested with the URL info.html. This provides an easy way to upgrade a site to use negotiation without having to change existing links.
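
Which files take part in the negotiation depends on the name that was requested. This Python sketch assumes simple prefix matching: a request for info considers every file named info.<something>, while a request for info.html considers only files named info.html.<something>. It also assumes the requested name does not exist as a file in its own right, since MultiViews only comes into play when it does not. The function and file names are hypothetical.

```python
# Sketch of choosing MultiViews candidates by name prefix (hypothetical helper).


def multiviews_candidates(requested, filenames):
    """Return the files that would take part in negotiation for `requested`.

    Assumes `requested` does not itself exist as a file, and that the
    candidates are exactly the files named '<requested>.<anything>'.
    """
    prefix = requested + "."
    return sorted(name for name in filenames if name.startswith(prefix))


DIRECTORY = ["info.html.eng", "info.html.ger", "info.eng.html",
             "index.html", "information.txt"]

print(multiviews_candidates("info", DIRECTORY))
# ['info.eng.html', 'info.html.eng', 'info.html.ger']
print(multiviews_candidates("info.html", DIRECTORY))
# ['info.html.eng', 'info.html.ger']
```

Note that information.txt is not a candidate for info, because the match is on info followed by a dot, and that info.eng.html is not a candidate for info.html.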

Of course, for negotiation to work, browsers must send the correct information. While most make a reasonable attempt, there are some problems.

What Browsers Do

For negotiation to work, browsers must send the correct request information. For human languages, browsers should let the user pick which language or languages they are interested in. Recent beta versions of Netscape let the user select one or more languages (see the Options, General Preferences, Languages section).

For content-types, the browser should send a list of types it can accept. For example, "text/html, text/plain, image/jpeg, image/gif". Most browsers also add the catch-all type of "*/*" to indicate that they can accept any content type. The server treats this entry with lower priority than a direct match.
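
To see why a catch-all entry is weaker than an explicit one, here is a Python sketch that parses an Accept header and looks up the quality for a given media type, letting the most specific matching range win. The low default of 0.01 given to wildcard entries that carry no q value is an assumption for illustration. It shows that a type matched only by */* (here application/pdf) ranks far below any type the browser lists by name.

```python
# Sketch: look up the q value an Accept header gives a media type, letting the
# most specific matching range win. WILDCARD_Q is an illustrative assumption.
WILDCARD_Q = 0.01


def parse_accept(header):
    """Return [(media range, q)] from an Accept header value."""
    ranges = []
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        if not parts[0]:
            continue
        q = None
        for param in parts[1:]:
            key, _, val = param.partition("=")
            if key.strip() == "q":
                q = float(val)
        if q is None:
            # Wildcards without an explicit q get a low default; others get 1.0.
            q = WILDCARD_Q if "*" in parts[0] else 1.0
        ranges.append((parts[0].lower(), q))
    return ranges


def quality_for(media_type, ranges):
    """Return q for media_type; exact > type/* > */* when several ranges match."""
    media_type = media_type.lower()
    major = media_type.partition("/")[0]
    best_rank, best_q = -1, 0.0
    for media_range, q in ranges:
        if media_range == media_type:
            rank = 2
        elif media_range == major + "/*":
            rank = 1
        elif media_range == "*/*":
            rank = 0
        else:
            continue
        if rank > best_rank:
            best_rank, best_q = rank, q
    return best_q


ranges = parse_accept("text/html, text/plain, image/jpeg, image/gif, */*")
print(quality_for("text/html", ranges))        # 1.0
print(quality_for("application/pdf", ranges))  # 0.01
```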

Unfortunately, the */* type is sometimes used instead of explicitly listing acceptable types. For example, if the Adobe Acrobat Reader plug-in is installed in Netscape, Netscape should add application/pdf to its acceptable content types. This would let the server transparently send the most appropriate content type (PDF files to suitable browsers, HTML to others). Instead, Netscape does not send the content types it can accept and relies on the */* catch-all. This makes transparent content negotiation impossible.

In addition, most browsers do not indicate a preference for particular types. They should do this by adding a preference factor (q) to each content type. For example, a browser that can accept Acrobat files might prefer them to HTML, so it could send an accept list that includes text/html;q=0.7, application/pdf;q=0.8. When the server handles the request, it would combine this information with its own source quality information (if any) to pick the 'best' content type to return.
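
The PDF-versus-HTML example can be worked through with a short Python sketch. It assumes a hypothetical server holding the same document as text/html and application/pdf with equal source quality (qs=1.0), parses only exact media types (no wildcards), and scores each variant as q multiplied by qs. The function names are illustrative only.

```python
# Sketch: combine browser preferences (q) with server source quality (qs).
# Only exact media types are handled; wildcards such as */* are ignored here.


def parse_accept(header):
    """Return {media type: q} from an Accept header value (q defaults to 1.0)."""
    prefs = {}
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        if not parts[0]:
            continue
        q = 1.0
        for param in parts[1:]:
            key, _, val = param.partition("=")
            if key.strip() == "q":
                q = float(val)
        prefs[parts[0].lower()] = q
    return prefs


def negotiate(accept_header, source_quality):
    """Return (best media type or None, scores), scoring each as q * qs."""
    prefs = parse_accept(accept_header)
    scores = {t: prefs.get(t, 0.0) * qs for t, qs in source_quality.items()}
    best = max(scores, key=scores.get)
    return (best if scores[best] > 0 else None), scores


# Hypothetical server: one document stored as HTML and as PDF, equal qs.
SERVER = {"text/html": 1.0, "application/pdf": 1.0}

print(negotiate("text/html;q=0.7, application/pdf;q=0.8", SERVER)[0])
# application/pdf
print(negotiate("text/html, text/plain", SERVER)[0])
# text/html
```

Lowering the server's source quality for the PDF version to 0.8 would give it a score of about 0.64, so the HTML version (0.7) would be chosen instead.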

HTTP/1.1

The new HTTP/1.1 specification defines how content negotiation works for the first time. It also adds some new facilities which are not yet available in any browser or server. This includes the ability for the server to return a list of possible matches if it cannot identify the best one to use. Apache implements the server end of HTTP/1.1 content negotiation.