Apache Week
   

Copyright 1996-2005
Red Hat, Inc.

First published: 7th February 1997

Gathering Visitor Information: Customising Your Logfiles

Every time a browser hits your site it leaves a trail in your access log. This file is enough to tell you how many hits you received and gives you some basic information about the browser, such as its hostname. But there is a lot more information readily available that you could be gathering. Want to know which browser is most common on your site, or what languages your readers can understand? In Apache 1.2, logging information like this is easy.

First published in Apache Week issue 51 (7th February 1997).

Logging in Apache

Apache uses the TransferLog directive to create a single log file for storing details of every request. However, Apache's logging capabilities are far more advanced: it can write the log file in any format, it can write multiple log files (each with a different format), and it can send log messages to an external process via a "pipe".

This feature first explains how to customise the format of your existing log file, then shows how to create multiple log files. Finally it covers how logging works when you have virtual hosts, where you can choose whether to log a virtual host into the main log files or have separate log files for each host.

Customising the Log File Format

The traditional format for web log files looks like this:

jupiter.eu.c2.net - - [03/Feb/1997:00:06:59 +0000] "GET / HTTP/1.0" 200 4571
jupiter.eu.c2.net - - [03/Feb/1997:00:07:00 +0000] "GET /img/awlogo.gif HTTP/1.0" 200 12706

(There are two lines here, both starting with "jupiter.eu.c2.net". If you see more than two, the lines have been wrapped on the screen).

This format is called the common log format and is standard across most web servers (although it is not very well documented). There are various tools to analyse data in this format, and it is not too difficult to write custom tools (in, say, perl) to extract the data. But the lack of a common field delimiter makes such tools more complex than necessary and prevents the use of simple Unix programs such as cut.
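
To illustrate why the missing delimiter makes tools more complex, here is a minimal Python sketch of a common log format parser. The pattern and the names CLF_PATTERN and parse_clf_line are our own assumptions for illustration, not part of Apache or of any standard tool.

```python
import re

# One common log format line: host, ident, user, [time], "request", status, bytes.
# Spaces, brackets and quotes all act as separators, so a regular expression is needed.
CLF_PATTERN = re.compile(
    r'^(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>.*)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-)$'
)

def parse_clf_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = CLF_PATTERN.match(line.strip())
    return match.groupdict() if match else None

sample = 'jupiter.eu.c2.net - - [03/Feb/1997:00:06:59 +0000] "GET / HTTP/1.0" 200 4571'
fields = parse_clf_line(sample)
print(fields["host"], fields["status"], fields["bytes"])
```

Every value comes back as a string, so a real tool would still have to convert the status and byte count to numbers and parse the timestamp.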

You can customise this format. There are probably two common reasons for doing this: firstly, to make the format simpler by using a common delimiter character, and secondly, to log additional information such as the browser type at the end of each line (placing it at the end means the file can still be analysed by standard log analysis programs).

You customise the format by telling Apache a format to use. Special character sequences are used to represent specific information. For example, the sequence %h will be replaced with the name of the remote host. The common log format is defined like this:

  %h %l %u %t "%r" %>s %b

Additional sequences here are %l (the remote username, if using identd), %u (the HTTP authenticated username, if any), %t (the time in common log format), %r (the request), %>s (the status actually returned) and %b (the number of bytes in the document served).

Say, for example, you would prefer a file format with a common delimiter character between each field, so that you could use cut or write very simple Perl scripts to extract the data. Using the common log format above as a guide, you could use:

  %h|%l|%u|%t|%r|%>s|%b

Here the | character is used as a delimiter. Note that this causes problems if a | occurs within a field, which is possible in the %r request field.

To set this format for your log file, use the LogFormat directive. For example:

  LogFormat "%h|%l|%u|%t|%r|%>s|%b"
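
The delimited format above can be split with very little code. The following Python sketch shows one way to do it while tolerating a | inside the request field. The function name split_delimited and the sample line are hypothetical, and the approach assumes that the first four fields and the last two never contain a | themselves.

```python
def split_delimited(line):
    """Split a line written with the format %h|%l|%u|%t|%r|%>s|%b."""
    # Take the first four fields from the left ...
    host, ident, user, time, rest = line.rstrip("\n").split("|", 4)
    # ... and the last two from the right, so any | inside the request survives.
    request, status, size = rest.rsplit("|", 2)
    return [host, ident, user, time, request, status, size]

# Hypothetical log line whose request path happens to contain the delimiter.
line = "jupiter.eu.c2.net|-|-|[03/Feb/1997:00:06:59 +0000]|GET /a|b.html HTTP/1.0|200|4571"

print(len(line.split("|")))      # a naive split yields one field too many
print(split_delimited(line)[4])  # the request field is kept in one piece
```

A tool such as cut has no equivalent of splitting from both ends, so for cut the safer choice is a delimiter that cannot appear in a request.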

Logging Browser and User Information

The % sequences introduced so far let you log various aspects of the request. There are some more sequences (covered below) that log additional aspects of the request. However, one of the most important features of the custom log format is being able to log any of the request headers supplied by the browser. This lets you log things like the user's language preferences, the browser type and the page they just came from.

Logging a request header is done using the %{}i sequence. You put the name of the request header between the braces. For example, to log the browser type, you would use:

  %{user-agent}i

In Apache 1.1.1 this information is typically added to the end of the common log format. In Apache 1.2 you can instead put it in a separate log file, which is much more convenient (this is explained later). To add the user-agent information to the end of the common log format, use:

  LogFormat "%h %l %u %t \"%r\" %>s %b %{user-agent}i"

If the browser does not send a user-agent, the text "-" will be logged as the user-agent. Otherwise you will get the browser name, such as "Mozilla/3.0Gold (Win95; I)" or "Mozilla/2.0 (compatible; MSIE 3.01; Windows 95)" (the former is Netscape Gold version 3, the latter Microsoft Internet Explorer version 3, pretending to be Netscape 2).

In addition to %{...}i, there is a corresponding sequence %{...}o to log any of the response headers (in these sequences, the i means incoming and the o outgoing headers).

Multiple Logs

Adding extra fields onto the end of the common log file format can be inconvenient, especially if you already have software which processes the log files in their current format. Luckily, Apache offers a completely customisable log file interface: you can create any number of log files, each in a different format. It is now almost trivial to add a log file for (say) user-agents or requested languages, without needing to compile in a new module or modify the Apache source code. You can even log all the common log file information in both common log format (for existing analysers) and a delimited format at the same time!

The interface to all this is via a single, simple directive: CustomLog. This directive takes both a file name to log to and a custom format. For example, to log user-agents to a file called agent in the logs directory, you would use:

  CustomLog logs/agent "%{user-agent}i"

Other useful log files can also be created. The next two directives create a referrer log and a log of the language preferences of your clients:

  CustomLog   logs/referer  "%{referer}i -> %U"
  CustomLog   logs/language "%{accept-language}i"
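
A log with one header value per line, such as the agent log above, is easy to summarise. Here is a minimal Python sketch that answers the question of which browser is most common. The function name count_agents is our own, and the sample lines are made up from the user-agent strings quoted earlier rather than taken from a real log.

```python
from collections import Counter

def count_agents(lines):
    """Tally identical user-agent strings, one per log line, skipping blank lines."""
    return Counter(line.strip() for line in lines if line.strip())

# Illustrative stand-in for the contents of the agent log.
sample_lines = [
    "Mozilla/3.0Gold (Win95; I)\n",
    "Mozilla/2.0 (compatible; MSIE 3.01; Windows 95)\n",
    "Mozilla/3.0Gold (Win95; I)\n",
    "-\n",
]

for agent, hits in count_agents(sample_lines).most_common():
    print(hits, agent)
```

Because the function accepts any iterable of lines, an open file object for the real log can be passed in place of sample_lines.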

Advanced Configuration Options

You can tell the format to log particular fields only if the response status is (or is not) a particular value. For example, to log the language preference only for 200 or 304 statuses, use %200,304{accept-language}i. You can put an exclamation mark (!) straight after the % to reverse the condition, so %!200,304{accept-language}i logs the field only if the status was not 200 or 304.
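
The rule for status conditions can be sketched as a small decision function. This Python example is an illustration of the logic only, not Apache's implementation. The name conditional_field is hypothetical, and writing "-" for a field whose condition fails is an assumption, chosen to match what is logged for a missing header.

```python
def conditional_field(value, status, statuses, negate=False):
    """Mimic %200,304{...}i, or %!200,304{...}i when negate is True."""
    matches = status in statuses
    wanted = (not matches) if negate else matches
    # Assumption: a field whose condition fails is written as "-".
    return value if wanted else "-"

ok_statuses = {200, 304}
print(conditional_field("en", 200, ok_statuses))               # condition met
print(conditional_field("en", 404, ok_statuses))               # condition not met
print(conditional_field("en", 404, ok_statuses, negate=True))  # reversed condition met
```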

The time logged by %t is in common log file format. If you want to use another format, use %{format}t, where format is a date and time format as used by strftime (see man strftime for more information).
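
Python's strftime accepts the same conversion codes as the C function, so it is a convenient way to try out a format string before putting it in a %{format}t sequence. In this short sketch the request time is a made-up value taken from the sample log lines, and the format strings are only examples.

```python
from datetime import datetime

# Hypothetical request time, matching the first sample log line.
request_time = datetime(1997, 2, 3, 0, 6, 59)

# An easily sortable format, as would be produced by %{%Y-%m-%d %H:%M:%S}t
print(request_time.strftime("%Y-%m-%d %H:%M:%S"))

# The date and time part of the common log format (month names depend on the locale)
print(request_time.strftime("%d/%b/%Y:%H:%M:%S"))
```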

In some cases, the request will be handled by an internal redirect (this is common for things like requests satisfied by a DirectoryIndex file). In these cases, a % sequence can apply either to the original request or to the one actually delivered. The characters < and > after the % select the original value or the redirected value respectively. For example, for the status you normally want the value actually returned, so %>s is used in the common log format definition. Each % sequence has its own default - for example, %r (the request line) uses the original request.

Logs and Virtual Hosts

The logging directives TransferLog, LogFormat and CustomLog can be used inside virtual hosts. They interact with the logs set up outside the virtual hosts like this:

  • If there are no TransferLog or CustomLog directives inside the virtual host, requests for this host are logged to the logs defined in the main server.
  • Otherwise, requests are logged to the log files defined in this virtual host, and none of the log files defined in the main server are used.

  • If LogFormat is used in a virtual host, the format it defines is used for all TransferLog files defined inside that virtual host.
  • Otherwise, the log format defined outside the virtual host is used by the TransferLogs defined inside the host, defaulting to the common log format if no LogFormat is defined in the main server.
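
The two pairs of rules above can be sketched as a pair of small Python functions. This is a model of the decision logic only, not Apache code. The function names and the file names in the usage lines are hypothetical.

```python
# The common log format, used when no LogFormat is set anywhere.
COMMON_LOG_FORMAT = '%h %l %u %t "%r" %>s %b'

def logs_for_request(main_logs, vhost_logs):
    """Return the log files that a request to a virtual host is written to."""
    # Logs defined inside the virtual host replace the main server's logs entirely.
    return vhost_logs if vhost_logs else main_logs

def format_for_transferlog(main_format=None, vhost_format=None):
    """Return the format used by a TransferLog defined inside a virtual host."""
    # The virtual host's LogFormat wins, then the main server's, then the default.
    return vhost_format or main_format or COMMON_LOG_FORMAT

print(logs_for_request(["logs/access_log"], []))
print(logs_for_request(["logs/access_log"], ["logs/vhost_access_log"]))
print(format_for_transferlog(main_format="%h|%l|%u|%t|%r|%>s|%b"))
```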

Configurable Format Reference

Here are all the % sequences allowed in the configurable log format in Apache.

%b bytes sent, excluding HTTP headers
%f filename
%h remote host
%{Header}i The contents of Header: header line(s) in the request sent from the client
%l remote username (from identd, if supplied)
%{Note}n The contents of note "Note" from another module
%{Header}o The contents of Header: header line(s) in the reply
%p the port the request was served to
%P the process ID of the child that serviced the request
%r first line of request
%s response status. For requests that got internally redirected, this is the status of the original request: use %>s for the returned status
%t time, in common log format time format
%{format}t The time, in the form given by format, which should be in strftime format
%T the time taken to serve the request, in seconds
%u remote user (from auth; may be bogus if return status (%s) is 401)
%U the URL path requested
%v the name of the server (i.e. the virtual host)
