Apache Week

Issue 25, 26th July 1996

Copyright 1996-2005
Red Hat, Inc.


Apache Status

Release: 1.1.1
Beta: None
Bugs in 1.1.1:

  • No major bugs reported.

Conditional Config Directives

When modules are omitted from the compilation, the associated directives have to be commented out of the configuration files. If this isn't done, Apache will report the unknown directive and fail to run. To make it easier to change which modules are compiled in, there will be a new directive which makes parts of the configuration files conditional, depending on which modules have been compiled in.

For example, the proxy module directives might be specified like this:

<IfModule proxy_module>
ProxyRequests On
CacheRoot /usr/local/cache
... other proxy module directives ...
</IfModule>

The directives inside the <IfModule> ... </IfModule> section will only be interpreted if the proxy module has been compiled into the executable. This will make it much easier to experiment with different module combinations.

Regular Expressions

A number of optional modules for Apache make use of "regular expressions", including the rewrite module (see below), the extended SSI module (mod_xssi) and the php module (mod_php). Each of these links in its own regular expression code, which can make the Apache executable significantly larger. In future, Apache will come with an integrated regular expression library. This would let these modules, and others, use the Apache regular expression functions rather than including their own. In addition, a number of standard Apache modules could make use of regular expressions. For example, the userdir module already has very basic expression matching (using just the * character), but it could be much more powerful with full regular expressions.

URI Re-writing module updated

A new version of mod_rewrite is now available. This module provides a very general method of mapping the requested URI onto another URI. At its simplest, it can replace the alias module (which also maps URIs), but the rewrite module can do much more complex rewrites. It supports full "regular expressions", and can also link to external databases (plain-text or DBM) to provide a mapping function.

There are many changes from the previous release, including the ability to rewrite URLs from directives in .htaccess files, support for passing rewritten URLs to the proxy module, and a module which uses the rewrite functions to replace both the alias and userdir modules.


Content Negotiation Explained

Content negotiation is an often overlooked feature of Apache, but correctly used it can let you present documents in different languages and formats based on what the user wants. Apache is one of the few servers that actually implements content negotiation. However, there are a few problems caused by browsers which do not do the right thing. We explain how to use negotiation correctly, and why some browsers make this difficult.

Why content negotiation is needed

Content negotiation is a very powerful tool where the browser says what type of information it can accept, and the server decides what (if any) type of information to return. The term type is used very loosely here, because negotiation can apply to several aspects of the information. For example, it can be used to choose the appropriate human language for a document (say, French or German), or to choose the media type that the browser can display (say, GIF or JPEG).

In order for the server to deliver the correct representation of the data, the browser must send some information about what it can accept. A browser used on a French-language machine, for instance, should indicate that it can accept data in French (of course, this should also be user-configurable).

The most common use of content negotiation at the moment is to select data based on media type. Here, the browser says what sort of data it can display. For example, when requesting an inline image, the browser could tell the server that it can accept GIF and JPEG images. In fact, the browser might prefer JPEG over GIF images because they are quicker to download, so it can specify this as well. The ability to indicate what content types a browser can accept is particularly important now that plug-ins can extend the browser's capabilities. Unfortunately, many current browsers don't supply the correct information to the server.

Using Negotiation

To use negotiation, you need two things. Firstly, you need a resource that exists in more than one format (for example, a document in French and German, or an image stored as a GIF and a JPEG), and secondly you need to configure Apache to know that each of these files is actually the same resource. Apache has two methods for doing this: either using a special index file to identify the various versions of the information, or using the MultiViews facility where Apache gets the information it needs from file extensions.

Using a Variants File

The first method involves creating a variants file, usually referred to as a var file. This lists each of the files which contains the same resource, along with details of what representation it is. Any request for this var file causes Apache to return the best file, based on the contents of the var file and the information supplied by the browser.

To get Apache to use variant files, first uncomment the following line in srm.conf:

   AddHandler type-map var

and restart the server as normal.

As an example, say there is a file in English and a file in German containing the same information. The files could be called english.html and german.html (they are both HTML files). To negotiate between them, create a var file called (say) info.var which lists each of these files and specifies which language it is in:

   URI: english.html
   Content-Language: en

   URI: german.html
   Content-Language: de

This file consists of a series of sections, separated by blank lines. Each section contains the name of the file (on the URI: line) and header information used in the negotiation.

Now, when a request for info.var is received, the server will read the var file and return the best file, based on which languages the browser has said it can accept. Similarly, the var file could be used to select files based on content type (using Content-Type:) or content encoding (using Content-Encoding:), or any combination.
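To make the structure of a var file concrete, here is a minimal Python sketch of how such a file could be read and a variant chosen by language. It is an illustration only, not Apache's own code: the function names parse_type_map and pick_by_language are hypothetical, and the sketch assumes the browser's languages arrive as a simple list in order of preference, ignoring quality values.

```python
def parse_type_map(text):
    """Split a var file into sections; each section becomes a dict of header -> value."""
    variants = []
    current = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            # A blank line ends the current section.
            if current:
                variants.append(current)
                current = {}
            continue
        name, _, value = line.partition(":")
        current[name.strip().lower()] = value.strip()
    if current:
        variants.append(current)
    return variants


def pick_by_language(variants, accepted_languages):
    """Return the URI of the first variant in a language the browser listed, else None.

    Assumption: accepted_languages is ordered from most to least preferred.
    """
    for lang in accepted_languages:
        for variant in variants:
            if variant.get("content-language") == lang:
                return variant["uri"]
    return None


# The info.var example from the text.
INFO_VAR = """\
URI: english.html
Content-Language: en

URI: german.html
Content-Language: de
"""

variants = parse_type_map(INFO_VAR)
print(pick_by_language(variants, ["de", "en"]))
```

With the info.var contents above, a browser that lists German before English would be given german.html, while a browser that lists neither language gets no match from this sketch.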

The Content-Type: line in a variants file can also give any other content type parameters, such as the subjective quality factor. This will be used in the negotiation when picking the 'best' match. For example, an image available as a JPEG might be regarded as having higher quality than the same image in GIF format. To tell this to the server, the following .var contents could be used:

  URI: image.jpg
  Content-Type: image/jpeg; qs=0.6

  URI: image.gif
  Content-Type: image/gif; qs=0.4

Here the qs parameters give the 'source quality' for these two files, in the range 0.000 to 1.000, with the highest value being the most desirable. A browser that indicates it can handle both GIF and JPEG files equally would see the JPEG version rather than the GIF.

Using variant files gives complete control over the scope of the negotiation, but it does require the file to be created and maintained for each resource. An alternative interface to the negotiation mechanism is to get Apache to identify the negotiation parameters (language, content type, encoding) from the file extensions.

Using File Extensions

Instead of using a var file, file extensions can be used to identify the content of files. For example, the extension eng could be used on English files, and ger on German files. Then the AddLanguage directive can be used to map these extensions onto the standard language tags.

To use this feature, the MultiViews option must first be turned on in the directory, either in access.conf or a .htaccess file. Note that Options All does not turn on multiviews.

After enabling multiviews, the directives which map extensions onto representation types can be given. These are AddLanguage, AddEncoding and AddType (content types are also set in the mime.types file). For example:

   AddLanguage en               .eng
   AddLanguage de               .ger
   AddEncoding x-compress       .Z
   AddType     application/pdf  pdf

(the last line is shown as an example only; it is actually set in the mime.types file on recent Apache versions).

When a request is received, the server looks at all the files in the directory which start with the same filename. So a request for /about/info would cause the server to negotiate between all the files named /about/info.*

For each matching file, the server checks its extensions and sets the content type, language and encodings appropriately. For example, a file called info.eng.html would be associated with the language tag en and the content type text/html. The source quality is assumed to be 1.000 for all files (this can actually be set on the mime type, like "text/html;qs=0.5" but this confuses most browsers so is probably best not used).

The extensions can be listed in any order, and the request itself can include one or more extensions. For example, the files info.html.eng and info.html.ger could be requested with the URL info.html. This provides an easy way to upgrade a site to use negotiation without having to change existing links.
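The matching described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tables LANGUAGES, TYPES and ENCODINGS are hypothetical stand-ins for the AddLanguage, AddType and AddEncoding settings, the function names are invented for the example, and the case where a file exists with exactly the requested name is not handled.

```python
# Hypothetical extension tables, mirroring the AddLanguage, AddType
# and AddEncoding examples in the text.
LANGUAGES = {"eng": "en", "ger": "de"}
TYPES = {"html": "text/html", "pdf": "application/pdf"}
ENCODINGS = {"Z": "x-compress"}


def candidates(requested, filenames):
    """Return the files named as the request plus one or more extra extensions."""
    prefix = requested + "."
    return sorted(name for name in filenames if name.startswith(prefix))


def describe(filename):
    """Work out language, type and encoding from the extensions, in any order."""
    _base, *extensions = filename.split(".")
    info = {"language": None, "type": None, "encoding": None}
    for ext in extensions:
        if ext in LANGUAGES:
            info["language"] = LANGUAGES[ext]
        elif ext in TYPES:
            info["type"] = TYPES[ext]
        elif ext in ENCODINGS:
            info["encoding"] = ENCODINGS[ext]
    return info


directory = ["info.html.eng", "info.html.ger", "information.html", "index.html"]
for name in candidates("info.html", directory):
    print(name, describe(name))
```

Note that the sketch requires a dot after the requested name, so a request for info.html considers info.html.eng and info.html.ger but not information.html.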

Of course, for negotiation to work browsers must send the correct information. While most make a reasonable attempt there are some problems.

What Browsers Do

For negotiation to work, browsers must send the correct request information. For human languages, browsers should let the user pick what language or languages they are interested in. Recent beta versions of Netscape let the user select one or more languages (see the Options, General Preferences, Languages section).

For content-types, the browser should send a list of types it can accept. For example, "text/html, text/plain, image/jpeg, image/gif". Most browsers also add the catch-all type of "*/*" to indicate that they can accept any content type. The server treats this entry with lower priority than a direct match.

Unfortunately, the */* type is sometimes used instead of listing explicitly acceptable types. For example, if the Adobe Acrobat Reader plug-in is installed into Netscape, Netscape should add application/pdf to its acceptable content types. This would let the server transparently send the most appropriate content type (PDF files to suitable browsers, else HTML). Netscape does not send the content types it can accept, instead relying on the */* catch-all. This makes transparent content-negotiation impossible.

In addition, most browsers do not indicate a preference for particular types. This should be done by adding a preference factor (q) to the content type. For example, a browser which can accept Acrobat files might prefer them to HTML, so it could send an accept type list which includes text/html; q=0.7, application/pdf; q=0.8. When the server handles the request, it would combine this information with its source quality information (if any) to pick the 'best' content type to return.
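As a rough illustration of how the browser's q values and the server's qs values might be combined, the Python sketch below multiplies the two and picks the highest score. Multiplication is an assumption made for the example, as are the function names; the sketch also simplifies the handling of */*, which is only consulted when no more specific entry matches.

```python
def parse_accept(header):
    """Parse an Accept header into {media_type: q}; q defaults to 1.0."""
    prefs = {}
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        media_type = parts[0]
        if not media_type:
            continue
        q = 1.0
        for param in parts[1:]:
            key, _, value = param.partition("=")
            if key.strip() == "q":
                q = float(value)
        prefs[media_type] = q
    return prefs


def browser_q(prefs, media_type):
    """Look up q for a type: exact match first, then type/*, then */*; 0.0 if none."""
    major = media_type.split("/")[0]
    for pattern in (media_type, major + "/*", "*/*"):
        if pattern in prefs:
            return prefs[pattern]
    return 0.0


def best_variant(variants, accept_header):
    """variants is a list of (uri, media_type, qs) tuples.

    Returns the URI with the highest q * qs, or None if nothing is acceptable.
    Ties go to the variant listed first.
    """
    prefs = parse_accept(accept_header)
    best_uri, best_score = None, 0.0
    for uri, media_type, qs in variants:
        score = browser_q(prefs, media_type) * qs
        if score > best_score:
            best_uri, best_score = uri, score
    return best_uri


docs = [("doc.html", "text/html", 1.0), ("doc.pdf", "application/pdf", 1.0)]
images = [("image.jpg", "image/jpeg", 0.6), ("image.gif", "image/gif", 0.4)]

print(best_variant(docs, "text/html; q=0.7, application/pdf; q=0.8"))
print(best_variant(images, "image/gif, image/jpeg"))
```

With the accept list from the text, the PDF version scores 0.8 against 0.7 for HTML, so doc.pdf is chosen. For the image example, a browser that accepts both types equally gets image.jpg because of its higher source quality. A browser sending only */* gives every variant the same q, which is why explicit types are needed for the server to make a useful choice.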

The Future

The new HTTP/1.1 specification defines how content negotiation works for the first time. It also adds some new facilities which are not yet available in any browser or server. This includes the ability for the server to return a list of possible matches if it cannot identify the best one to use. Apache will implement the server end of HTTP/1.1 content negotiation in its next release.


Hints and Tips

Location can affect more than intended

The new <Location> directive can be used to restrict access to URIs, including individual files. The example access.conf file with Apache shows how to restrict access to the new information module, using the URL /info to access this module. However, Location matches any URL which starts with the same text, so the example

  <Location /info>
  ...
  </Location>

would also match requests for /information.html and /info/about.html, for example. This might affect existing information on a site. The workaround is to change the URI used to access the information module, to (say) /apache-info. In future versions of Apache the Location text will match only full URLs or directories.
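The difference between the two matching rules can be shown with a short Python sketch. Both functions are hypothetical illustrations rather than Apache code, and the second is only one reading of "full URLs or directories".

```python
def location_matches(location, url_path):
    """Current behaviour as described: any URL starting with the same text matches."""
    return url_path.startswith(location)


def location_matches_strict(location, url_path):
    """Assumed future behaviour: match the full URL, or anything beneath it as a directory."""
    return url_path == location or url_path.startswith(location.rstrip("/") + "/")


for path in ("/info", "/information.html", "/info/about.html"):
    print(path, location_matches("/info", path), location_matches_strict("/info", path))
```

Under the first rule all three paths match /info. Under the stricter rule /information.html no longer does.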


And finally...

Apache in good health

Healtheon, a company specialising in integrating Internet technology into health care, used Apache as its server until yesterday (25th July). Nothing remarkable about that, considering 100,000 other sites also use Apache. Except that Healtheon was founded by Jim Clark, the Chairman and founder of Netscape.

Apache goes for Gold

The results service of the Atlanta Olympic Games uses Apache. While the main service (www.atlanta.olympic.org) is run by IBM on IBM hardware and software, they serve the live results from Apache on results.atlanta.olympic.org.


Comments or criticisms? Please email us at editors@apacheweek.com