Apache Week
   
   Issue 51, 7th February 1997

Copyright 1996-2005
Red Hat, Inc.


Apache Status

Release: 1.1.3 (Released 14th January 1997)
Beta: 1.2b6 (Released 26th January 1997)

Bugs fixed in 1.2b6:
  • Configuration for HP MPE on HP3000, updates for QNX
  • Problem with negotiated documents where LanguagePriority was being ignored
  • Satisfy Any might not be applied in a directory if .htaccess file exists containing certain directives
  • Redirect from /index.html causes a core dump when a request for / is made (using DirectoryIndex)
  • Documentation says IdentityCheck and HostNameLookups are valid in .htaccess. This is not correct, and the docs have been updated

Patches to fix some Apache 1.2b6 bugs are available in the 1.2b6 patches directory on the Apache site.


Apache is currently in a 'beta release' cycle. This is where it is made available prior to full release for testing by anyone interested. Normally during the beta cycle no new major features will be added. The full release of Apache 1.2 is expected in February.

Performance Tweaks

Apache is designed for high performance sites. The use of a dynamically changing number of pre-forked servers means it can cope with rapidly changing load levels. However, this week has seen a number of changes designed to increase performance further. These have been in three specific areas: server side includes, response transmission, and the Apache core code which gets used on every request. Server side includes are being sped up by reading the file in large chunks (buffering). Additional speed-ups are being considered but may not make it into 1.2. These include a directive to turn off processing of SSI directives part-way through a file (so the rest of the file can be read quickly without searching for more <!-- sequences) and generating a valid "last-modified" date to allow clients to cache the pages.

The Apache core is also being updated. The number of network writes needed to send a document is being reduced (this applies when the document is being sent in "chunks", a new HTTP/1.1 format). A number of other network-related speed-ups have already been applied since 1.1.1, and the use of HTTP/1.1 persistent connections (keep-alives) will also speed up network response.

Finally, other areas of the Apache core code have been modified. For example, some things are fixed once the configuration file has been read, so they can be evaluated once in advance rather than for every request.

httpd_monitor Updated

Some systems still use a file to store the "scoreboard". This is where Apache records details of which child processes are running and what they are doing. On most systems this is stored in memory, and can be accessed by compiling in the optional status module. But where a file is used, the external program httpd_monitor (in the support directory) can be used to get roughly the same information. This program has been updated to know about the current scoreboard format.

Easier Compilation of Support Programs

Various support programs are available in the support directory. These include the htpasswd program to create and modify HTTP usernames and passwords. In previous releases, the Makefile in this directory had to be edited by hand to select the correct compiler and other OS-specific options. In Apache 1.2, a Configure script is used to set these things automatically for the main server. This has been extended so that it also configures the support directory. So after running Configure, the support programs can be made by going into the support directory and typing make.

Byteranges Workaround for Netscape

When Netscape Navigator requests an Adobe PDF file from a server, it uses HTTP/1.1 "byteranges" to get parts of the document in a particular order. However it does not recognise the response that Apache returns, because it is looking for a non-standard content type on the response. According to HTTP/1.1, a byterange response should be marked as type "multipart/byteranges". However Netscape Navigator only accepts "multipart/x-byteranges" (which, incidentally, is what Netscape servers send). Apache cannot be altered to send this since it is non-standard and would break clients which conform to HTTP/1.1. Fortunately, Netscape Navigator also sends an extra non-standard header. Apache will be updated to look for this header, and if present, return byteranges in a format the Netscape clients can understand.


Gathering Visitor Information: Customising Your Logfiles

Every time a browser hits your site it leaves a trail in your access log. This file is enough to tell you how many hits you received and gives you some basic information about the browser, such as its hostname. But there is a lot more information readily available that you could be gathering. Want to know which browser is most common on your site, or what languages your readers can understand? In Apache 1.2, logging information like this is easy.

Logging in Apache 1.1.1

If you've been using Apache 1.1.1 for a while, you've probably come across the customisable log file module. This is a replacement for the standard "common" log file module, and lets you select what to log. You can use this to log additional information, such as browser types. But because Apache 1.1.1 can only have one main log file, you have to store this information in your normal access log.

A better alternative is to put this additional information in a separate log file. This requires a new module for each additional log file. Apache 1.1.1 comes with two such modules: one to log the browser type and one to log the 'referrer' (the page the user came from before requesting the current page). While these modules are useful, they are limited. Changing the format of their log files, or logging other information, requires C programming and re-compiling your Apache.

Apache 1.2 provides a much neater solution: you can have any number of log files, each with its own customised format. We'll explain first how to customise the format of your existing log file, then show how to create multiple log files. Finally we'll explain how logging works when you have virtual hosts, where you can choose whether to log a virtual host into the main log files or have separate log files for each host.

Customising the Log File Format

The traditional format for web log files looks like this:

jupiter.ukweb.com - - [03/Feb/1997:00:06:59 +0000] "GET / HTTP/1.0" 200 4571
jupiter.ukweb.com - - [03/Feb/1997:00:07:00 +0000] "GET /img/awlogo.gif HTTP/1.0" 200 12706

(There are two lines here, both starting with "jupiter.ukweb.com". If you see more than two, the lines have been wrapped on the screen).

This format is called the common log format and is standard across most web servers (although it is not very well documented). There are various tools to analyse data in this format, and it is not too difficult to write custom tools (in, say, perl) to extract the data. But the lack of a common field delimiter makes such tools more complex than necessary and prevents the use of simple Unix programs such as cut.

In Apache 1.2 (and Apache 1.1.1 if you are using the config log module) you can customise this format. There are probably two common reasons for doing this: firstly, to make the format simpler by using a common delimiter character, and secondly, to log additional information such as the browser type at the end of each line (placing it at the end means the file can still be analysed by standard log analysis programs).

You customise the format by telling Apache a format to use. Special character sequences are used to represent specific information. For example, the sequence %h will be replaced with the name of the remote host. The common log format is defined like this:

  %h %l %u %t "%r" %>s %b

Additional sequences here are %l (the remote username, if using identd), %u (the HTTP authenticated username, if any), %t (the time in common log format), %r (the request), %>s (the status actually returned) and %b (the number of bytes in the document served).

Say, for example, you would prefer a file format with a common delimiter character between each field, so that you could use cut or write very simple perl scripts to extract the data. Using the common log format above as a guide, you could use

  %h|%l|%u|%t|%r|%>s|%b

Here the | character is being used as a delimiter. Note that this can cause problems if the | character occurs within a field (which is possible, if unlikely, in the %r request field).

To set this format for your log file, you use the LogFormat directive. For example

  LogFormat "%h|%l|%u|%t|%r|%>s|%b"
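
As a rough illustration of why a single delimiter helps, the Python sketch below splits one line written in this format into named fields. The sample line is the first common log entry shown earlier, rewritten by hand with | delimiters. The field names and the parse_delimited helper are our own choices for illustration, not anything Apache defines.

```python
# Field names chosen to mirror the LogFormat above; they are illustrative only.
FIELDS = ["host", "ident", "user", "time", "request", "status", "bytes"]


def parse_delimited(line, delimiter="|"):
    """Split one delimited log line into a dict keyed by field name."""
    parts = line.rstrip("\n").split(delimiter)
    if len(parts) != len(FIELDS):
        # A delimiter character inside a field (for example in the
        # request) produces too many parts and ends up here.
        raise ValueError("expected %d fields, got %d" % (len(FIELDS), len(parts)))
    return dict(zip(FIELDS, parts))


# Hand-converted version of the first sample line shown earlier.
sample = "jupiter.ukweb.com|-|-|[03/Feb/1997:00:06:59 +0000]|GET / HTTP/1.0|200|4571"
entry = parse_delimited(sample)
print(entry["host"], entry["status"], entry["bytes"])
```

The length check is there because of the caveat above: a | inside the request would silently shift every later field, so the sketch refuses such lines rather than guessing.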

Logging Browser and User Information

The % sequences introduced so far let you log various aspects of the request. There are some more sequences (covered below) that log additional aspects of the request. However, one of the most important features of the custom log format is being able to log any of the request headers supplied by the browser. This lets you log things like the user's language preferences, browser type and the page they just came from.

Logging a request header is done using the %{}i sequence. You put the name of the request header between the braces. For example, to log the browser type, you would use

  %{user-agent}i

In Apache 1.1.1, this information is typically added to the end of the common log format. (In Apache 1.2 you can put it in a separate log file, which is much more convenient; this is explained later.) To add the user-agent information to the end of the common log format, use

  LogFormat "%h %l %u %t \"%r\" %>s %b %{user-agent}i"

If the browser does not send a user-agent, the text "-" will be logged as the user-agent. Otherwise you will get the browser name, such as "Mozilla/3.0Gold (Win95; I)" or "Mozilla/2.0 (compatible; MSIE 3.01; Windows 95)" (the former is Netscape Gold version 3, the latter Microsoft Internet Explorer version 3, pretending to be Netscape 2).
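
To answer the "which browser is most common" question from a log in this extended format, something like the following Python sketch could be used. The regular expression, the count_agents helper and the three sample lines are ours: the lines are made up by combining the sample entries shown earlier with a user-agent string quoted above, so treat them as illustrative rather than real log output.

```python
import re
from collections import Counter

# Common log format followed by an unquoted user-agent, matching the
# LogFormat directive above. Group names are illustrative.
LINE_RE = re.compile(
    r'^(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>.*?)" (?P<status>\d{3}) (?P<bytes>\S+) (?P<agent>.*)$'
)


def count_agents(lines):
    """Tally the user-agent field of every line that matches the format."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.rstrip("\n"))
        if match:
            counts[match.group("agent")] += 1
    return counts


# Made-up sample lines for illustration only.
sample_lines = [
    'jupiter.ukweb.com - - [03/Feb/1997:00:06:59 +0000] "GET / HTTP/1.0" 200 4571 Mozilla/3.0Gold (Win95; I)',
    'jupiter.ukweb.com - - [03/Feb/1997:00:07:00 +0000] "GET /img/awlogo.gif HTTP/1.0" 200 12706 Mozilla/3.0Gold (Win95; I)',
    'jupiter.ukweb.com - - [03/Feb/1997:00:07:05 +0000] "GET / HTTP/1.0" 200 4571 -',
]
agent_counts = count_agents(sample_lines)
for agent, hits in agent_counts.most_common():
    print(hits, agent)
```

Because the user-agent is the last field and may contain spaces, the pattern simply takes everything after the byte count. Lines that do not match are skipped without comment, which may or may not be what you want in a real script.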

In addition to %{...}i, there is a corresponding sequence %{...}o to log any of the response headers (in these sequences, the i means incoming and the o outgoing headers).

Multiple Logs in Apache 1.2

Adding extra fields onto the end of the common log file format is inconvenient. Luckily, Apache 1.2 offers a completely customisable log file interface: you can create any number of log files, each in a different format. It is now almost trivial to add a log file for (say) user-agents or requested languages, without needing to compile in a new module or modify the Apache source code. You can even log all the common log file information in both common log format (for existing analysers) and a delimited format at the same time!

The interface to all this is via a single, simple directive: CustomLog. This directive takes both a file name to log to and a custom format. For example, to log user-agents to a file called agent in the logs directory, you would use:

  CustomLog logs/agent "%{user-agent}i"

Other useful log files can also be created. The next two directives create a referrer log and a log of the language preferences of your clients:

  CustomLog   logs/referer  "%{referer}i -> %U"
  CustomLog   logs/language "%{accept-language}i"

Advanced Configuration Options

You can tell the format to only log particular fields if the response status is (or is not) a particular value. For example, to only log the language preference for 200 or 304 statuses, use %200,304{accept-language}i. You can put an exclamation mark (!) straight after the % to reverse the condition (i.e. %!200,304{accept-language}i only logs the field if the status was not 200 or 304).
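
The rule can be sketched in a few lines of Python. This is only our reading of the behaviour described above, not Apache's own code: the should_log helper and its regular expression are hypothetical, and the sketch only decides whether a field would be logged for a given status.

```python
import re

# Optional "!" and comma-separated status list straight after the "%".
CONDITION_RE = re.compile(r"^%(?P<negate>!?)(?P<codes>\d{3}(?:,\d{3})*)?")


def should_log(item, status):
    """Return True if the format item would be logged for this status."""
    match = CONDITION_RE.match(item)
    if match is None:
        raise ValueError("not a %% sequence: %r" % item)
    codes = match.group("codes")
    if codes is None:
        return True  # no status list, so the item is unconditional
    listed = status in {int(code) for code in codes.split(",")}
    # A leading "!" reverses the condition.
    return not listed if match.group("negate") else listed


print(should_log("%200,304{accept-language}i", 200))
print(should_log("%200,304{accept-language}i", 404))
print(should_log("%!200,304{accept-language}i", 404))
```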

The time logged by %t is in common log file format. If you want to use another format, use %{format}t, where format is a date and time format as used by strftime (see man strftime for more information).
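
If you want to try out a format string before putting it in a %{format}t sequence, Python's strftime accepts largely the same format codes as the C function. The sketch below is only a convenience for experimenting. The COMMON_LOG_TIME pattern is our own approximation of the date and time part of the common log format (without the brackets or timezone offset), and month names depend on the locale in use.

```python
from datetime import datetime

# strftime pattern approximating the date and time part of the common
# log format; the surrounding brackets and timezone offset are left out.
COMMON_LOG_TIME = "%d/%b/%Y:%H:%M:%S"

# The timestamp of the first sample log line shown earlier.
stamp = datetime(1997, 2, 3, 0, 6, 59)

common_style = stamp.strftime(COMMON_LOG_TIME)
iso_style = stamp.strftime("%Y-%m-%d %H:%M:%S")

print(common_style)
print(iso_style)
```

In the default locale the first line comes out in the same form as the timestamps in the sample log entries, while the second shows an alternative layout that sorts more easily.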

In some cases, the request will be handled by an internal redirect (this is common for things like requests satisfied by a DirectoryIndex file). In these cases, a % sequence can refer either to the original request or to the one actually delivered. The characters < and > after the % determine whether to log the original value or the redirected value. For example, with %s you almost always want the status actually returned, so %>s is used in the common log format definition. Each % sequence knows which of the two it should use by default: for example, %r (the request line) uses the original request.

Logs and Virtual Hosts

The logging directives TransferLog, LogFormat and CustomLog can be used inside virtual hosts. They interact with the logs set up outside the virtual hosts like this:

  • If there are no TransferLog or CustomLog directives inside the virtual host, log requests for this host to the logs defined in the main server.
  • Otherwise log requests to the log files defined in this virtual host and do not use any of the log files defined in the main server.

  • If LogFormat is used in a virtual host, the format it defines is used for all TransferLog files defined inside that virtual host.
  • Otherwise the log format defined outside the virtual host is used by the TransferLogs defined inside the host, defaulting to the common log format if no LogFormat is defined in the main server.
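
The rules above can be summarised in a short Python sketch. This models each configuration section as a plain dictionary and is purely an illustration of the decision logic as described here. The effective_logging function and the dictionary layout are hypothetical and have nothing to do with how Apache stores its configuration internally.

```python
COMMON_LOG_FORMAT = '%h %l %u %t "%r" %>s %b'


def log_files(section):
    """List the log file names defined by TransferLog and CustomLog."""
    files = list(section.get("TransferLog", []))
    files += [name for name, _format in section.get("CustomLog", [])]
    return files


def effective_logging(main, vhost):
    """Return (log files used, format used by the TransferLog files)."""
    # Logs defined in the virtual host replace the main server's logs;
    # with none defined, requests go to the main server's logs.
    files = log_files(vhost) or log_files(main)
    # The virtual host's LogFormat wins, then the main server's, then
    # the common log format.
    log_format = (vhost.get("LogFormat") or main.get("LogFormat")
                  or COMMON_LOG_FORMAT)
    return files, log_format


# Hypothetical configuration for illustration only.
main_server = {
    "TransferLog": ["logs/access_log"],
    "CustomLog": [("logs/agent", "%{user-agent}i")],
    "LogFormat": "%h|%l|%u|%t|%r|%>s|%b",
}
quiet_host = {}
busy_host = {"TransferLog": ["logs/busy_access_log"]}

print(effective_logging(main_server, quiet_host))
print(effective_logging(main_server, busy_host))
```

Here quiet_host defines no logs of its own, so it falls back to the main server's files, while busy_host uses only its own TransferLog but still inherits the main server's LogFormat.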

Configurable Format Reference

Here are all the % sequences allowed in the configurable log format in Apache 1.2.

%b bytes sent, excluding HTTP headers
%f filename
%h remote host
%{Header}i The contents of Header: header line(s) in the request sent from the client
%l remote username (from identd, if supplied)
%{Note}n The contents of note "Note" from another module
%{Header}o The contents of Header: header line(s) in the reply
%p the port the request was served to
%P the process ID of the child that serviced the request
%r first line of request
%s response status. For requests that got internally redirected, this is the status of the original request: use %>s for the returned status
%t time, in common log format time format
%{format}t The time, in the form given by format, which should be in strftime format
%T the time taken to serve the request, in seconds
%u remote user (from auth; may be bogus if return status (%s) is 401)
%U the URL path requested
%v the name of the server (i.e. the virtual host)

Comments or criticisms? Please email us at editors@apacheweek.com