Apache Week
   
Issue 55, 7th March 1997

Copyright 1996-2005
Red Hat, Inc.



Apache Status

Release: 1.1.3 (Released 14th January 1997)
Beta: 1.2b7 (Released 22nd February 1997)

Bugs reported in 1.2b7:
  • Trailing slash after a reference to a type-map (.var) URL causes Apache child to enter a loop and consume resources
  • suExec bug allows CGI program to write to the suExec log file
  • Possible memory corruption if the local host name is longer than 128 characters
  • The default virtual host (as designated with fake address 255.255.255.255) does not work on all systems

Bugs fixed in next release:

  • OS-specific updates for FreeBSD 2.2 and MachTen
  • Proxy module could close files too early
  • File descriptor leak if client disconnects immediately after connection is established
  • Server could use the wrong virtual host when using non-IP virtual hosts

Patches to fix some Apache 1.2b6 bugs are available in the 1.2b6 patch directory


Apache is currently in a 'beta release' cycle. This is where it is made available prior to full release for testing by anyone interested. Normally during the beta cycle no new major features will be added. The full release of Apache 1.2 is expected this month.


Apache At 350,000 Sites

The March server survey shows that over 356,000 sites now use Apache, almost 43% of the servers surveyed. The percentage change in share for Apache (+1.67%) was higher than for both Netscape (-0.16%) and Microsoft (+0.72%).


Apache Performance In The News

PC Magazine in the UK recently published performance comparisons of several web servers. Apache was reported as the slowest server on test, handling 100 requests per second, compared to Microsoft's IIS which could handle 900 requests per second. While they no doubt did get these figures from their tests, there are a number of reasons why these figures may not reflect the true capabilities of the servers. The article is not available online.

Firstly, they tested Apache version 1.4.1. This is not an Apache version number, and the URL they give is for NCSA httpd, so they appear to have tested NCSA httpd instead. Apache was originally based on NCSA httpd 1.3, but is now very significantly different, with many changes designed to increase performance over NCSA. Figures from a test of NCSA httpd cannot be applied to Apache. Even assuming that they did test Apache, their figures may not be valid, for the following reasons.

Secondly, web server software is complex and highly configurable, so it is probably possible to get a wide variety of performance figures just by changing the configuration. In particular, Apache needs to be configured to get the best performance, and if this was not done, poorer results will be obtained. The article gave no details of whether or how the servers were configured.

Thirdly, they turned on the local caching option on IIS and Netscape Enterprise, but not on Apache. For a valid comparison they should have used either Apache's proxy module or a local cache such as Squid. (Of course, since they were not even testing Apache, they could not use the module.)

Fourthly, they used very different hardware for the Windows and Unix based tests: for Windows they used a dual-processor Pentium Pro system, while they ran the Unix server on an SGI O2. A better test would have been to use the same hardware, with a PC version of Unix (Linux, BSD or similar).

Fifthly, they used a local 100Mbit/s Ethernet network for the testing. Apache contains a number of features designed to make it work more efficiently over real-world Internet connections, where the setup and transmission times are not negligible.

Lastly, the report did not clearly identify what was being measured. There are several ways to measure server performance (such as hits per second, concurrent requests processed or overall response time per request), all of which are valid in some situations. They appear to have tested hits per second, which on a very fast local network would tend to reduce the number of concurrent requests the servers have to process. In a real-life situation servers would have to cope with requests that take longer to transmit, so they would have more concurrent requests.
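The link between request rate, transmission time and concurrency can be made concrete with Little's law from queueing theory: the average number of requests in progress equals the request rate multiplied by the average time each request takes. The sketch below is illustrative only. The 100 requests per second figure is the one from the magazine's test, while both response times are assumed values, not measurements from either report.

```python
def concurrent_requests(requests_per_second: float, seconds_per_request: float) -> float:
    """Little's law: average requests in progress = arrival rate x time per request."""
    return requests_per_second * seconds_per_request


# Assumed response times, chosen only for illustration (not measured).
LAN_SECONDS = 0.01       # hypothetical: fast local test network
INTERNET_SECONDS = 2.0   # hypothetical: slow real-world client connection

rate = 100.0  # requests per second, the rate reported in the magazine's test

lan = concurrent_requests(rate, LAN_SECONDS)
internet = concurrent_requests(rate, INTERNET_SECONDS)
print(f"LAN: {lan:.0f} concurrent, Internet: {internet:.0f} concurrent")
```

At the same hit rate, the slower connections leave the server handling two hundred times as many requests at once, which is exactly the load a hits-per-second test on a fast LAN never exercises.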

By comparison, a Benchmark Report by BSDI compared MS IIS and Apache in August 1996 and came out with very different results. While the versions of the software they used are now out of date, this report explains in detail how the servers were configured and what tests were performed. They also used the same hardware for both tests.


Feature: Dynamic Page Languages

The easiest way to serve up pages is to store them in HTML format files. This is simple and efficient. However, there are many cases where you might want to generate a page 'on-the-fly': to add information which changes on each request, or to get information from a database. There are many different ways that you can add this sort of "dynamic" page to your site. In this feature we look at the range of options, from simple in-line HTML to full programming languages and CGI.

When choosing how to generate dynamic pages there are several things to consider:

  • Performance: dynamic pages require more work on the server, so are less efficient than static files, but some types of dynamic pages are more resource efficient than others.
  • Complexity: dynamic pages range from relatively simple code built into HTML pages (called "embedded"), through to self-contained programs written in C or Perl using the CGI interface.
  • Security: some methods of generating dynamic pages allow you to use a programming or scripting language on your server. If the pages are poorly written, there is a risk of letting users access things on your system that they should not.

Traditionally there were three ways of getting dynamic pages on your site: use "server side includes" (SSI) inside HTML pages, use a scripting language such as Perl or PHP, or use a compiled programming language such as C or Pascal. Both scripts and compiled programs were accessed using "CGI". But the distinctions are becoming more blurred. SSI as implemented in Apache 1.2 now has variables and conditional execution, making it more like a scripting language, while the PHP scripting language can be embedded into HTML pages. There is even a module to embed perl commands into HTML pages.

Also, many scripting languages can be built into Apache as Apache modules, rather than using CGI. This makes executing the scripts much more efficient, since an interpreter does not need to be started for every request.
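The cost of starting an interpreter for every request can be seen with a small experiment. This sketch uses Python as a stand-in for any scripting language and a trivial hypothetical "script"; it does not model how an Apache module actually embeds an interpreter, only the difference between launching a fresh process per request (as CGI does) and calling code in an interpreter that is already running.

```python
import subprocess
import sys
import time

SCRIPT = "print(sum(range(1000)))"  # hypothetical trivial 'CGI script'


def run_as_cgi() -> str:
    """Start a fresh interpreter for this one request, as CGI would."""
    result = subprocess.run(
        [sys.executable, "-c", SCRIPT], capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


def run_in_process() -> str:
    """Do the same work inside an interpreter that is already running."""
    return str(sum(range(1000)))


def time_requests(handler, n: int = 20) -> float:
    """Return the wall-clock seconds taken to handle n requests."""
    start = time.perf_counter()
    for _ in range(n):
        handler()
    return time.perf_counter() - start


if __name__ == "__main__":
    assert run_as_cgi() == run_in_process()
    print(f"new interpreter per request: {time_requests(run_as_cgi):.3f}s")
    print(f"persistent interpreter:      {time_requests(run_in_process):.3f}s")
```

The exact timings depend on the machine, but the per-process version is typically slower by orders of magnitude, because nearly all of its time goes on interpreter start-up rather than on the script itself.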

Complexity

There are two ways to get the server to run your programs: either embed a script into an HTML document, or create a standalone program which makes use of the CGI interface. Embedded scripts are easier to write but restrict you to the languages available for embedding, while CGI can be used with any language.

The traditional embedded language is "Server-Side Includes" (SSI), but other scripting languages are available which can be embedded. Embedded commands are executed by the server before it serves the page to the client (so serving HTML pages containing embedded commands is slower than serving straight HTML pages). Embedded pages can be processed either by an Apache module or by a CGI program; using a module will be much faster. Languages available for embedded use include SSI, PHP, Perl and NeoScript (of these, SSI is built into Apache by default, while the others require a new module to be compiled in).
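The extra work the server does for an embedded page can be illustrated with a toy SSI processor. This is a minimal sketch, assuming only the #set and #echo directives; it is not how Apache's own SSI module is implemented, and its messages merely mimic Apache's defaults. It shows that every request for such a page means scanning and rewriting its text, which is why these pages are slower to serve than static HTML.

```python
import re

# Matches directives such as <!--#echo var="NAME" --> and
# <!--#set var="NAME" value="VALUE" -->. Real SSI supports many more commands.
DIRECTIVE = re.compile(r'<!--#(\w+)((?:\s+\w+="[^"]*")*)\s*-->')
ATTRIBUTE = re.compile(r'(\w+)="([^"]*)"')


def expand_ssi(page: str, variables: dict) -> str:
    """Replace each SSI directive in page with its output, left to right."""
    env = dict(variables)  # copy so #set does not change the caller's dict

    def run(match: re.Match) -> str:
        command = match.group(1)
        attrs = dict(ATTRIBUTE.findall(match.group(2)))
        if command == "set":
            env[attrs["var"]] = attrs["value"]
            return ""  # #set produces no output of its own
        if command == "echo":
            return env.get(attrs["var"], "(none)")  # mimics Apache for unset vars
        # Anything else (e.g. #exec) is refused by this toy processor.
        return "[an error occurred while processing this directive]"

    return DIRECTIVE.sub(run, page)
```

For example, `expand_ssi('<!--#set var="TITLE" value="Apache Week" --><h1><!--#echo var="TITLE" --></h1>', {})` returns `<h1>Apache Week</h1>`.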

The alternative to embedding the commands into HTML is to write self-contained programs. These usually use the CGI, or Common Gateway Interface, to work with the server. The CGI specification says how servers should talk to the script or program and how the script or program formats its reply for use by the server. CGI is not a language itself. If you know the CGI protocol you can write programs for use with a web server in any language.
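As a rough illustration of the CGI conventions, the sketch below reads request details from the environment variables the server sets (REQUEST_METHOD and QUERY_STRING) and replies with a header, a blank line, and then the document. Python is used here only as an example language, and the `name` form field is hypothetical; the reply-building is kept in a function so it can be tried without a web server.

```python
import html
import os
from urllib.parse import parse_qs


def cgi_response(environ: dict) -> str:
    """Build a CGI reply: header lines, a blank line, then the document body."""
    # The server passes request details to the program in environment variables.
    method = environ.get("REQUEST_METHOD", "GET")
    query = parse_qs(environ.get("QUERY_STRING", ""))
    name = query.get("name", ["world"])[0]  # 'name' is a hypothetical form field

    # Escape user input before putting it into HTML.
    body = f"<html><body><p>Hello, {html.escape(name)} ({method})</p></body></html>\n"
    # The Content-type header tells the server what kind of document follows.
    return "Content-type: text/html\r\n\r\n" + body


if __name__ == "__main__":
    # When run by the web server as a CGI program, the reply goes to standard output.
    print(cgi_response(dict(os.environ)), end="")
```

Because CGI is only this agreement about environment variables, standard input and standard output, the same program could equally be written in C, Pascal or shell.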

Performance

If you want better performance from your pages (by performance we mean low use of resources, resulting in more pages served more quickly), you should use either a pre-compiled language (such as C) with CGI, or a scripting language which is available as an Apache module. With the Perl and Python modules, you should also preload scripts or data that will be used often.
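The idea behind preloading is to do expensive set-up work once, when the persistent interpreter loads your code, rather than on every request. This sketch does not use the actual Perl or Python module interfaces; the lookup table and the `handle_request` function are hypothetical, and the counter exists only to show how often the loading happens.

```python
import time

LOAD_COUNT = 0  # counts how often the expensive start-up work runs


def load_country_names() -> dict:
    """Hypothetical slow start-up work, e.g. reading a large lookup table."""
    global LOAD_COUNT
    LOAD_COUNT += 1
    time.sleep(0.1)  # stands in for slow disk or database access
    return {"uk": "United Kingdom", "us": "United States"}


# Runs once, when the persistent interpreter first loads this code...
COUNTRY_NAMES = load_country_names()


def handle_request(code: str) -> str:
    # ...so each request afterwards is only a cheap dictionary lookup.
    return COUNTRY_NAMES.get(code.lower(), "unknown")
```

Under plain CGI the whole file would be run again for every request, so the slow load would be repeated each time; inside a module it is paid once.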

Of course, the best performance can be obtained by using static pages instead of dynamic ones. You might consider pre-generating HTML files, rather than serving up dynamic pages, where possible. For example, if your readers access pages from a database, it might be faster to export those pages into HTML every so often, rather than look up the records in the database for every request.
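Exporting a database to static pages can be a short batch job run every so often (for example, from cron). In this sketch SQLite stands in for whatever database you use, and the `articles` table with its `id`, `title` and `body` columns is a hypothetical schema.

```python
import html
import sqlite3
from pathlib import Path


def export_pages(conn: sqlite3.Connection, out_dir: Path) -> list:
    """Write one static HTML file per database record; return the paths written."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for record_id, title, text in conn.execute("SELECT id, title, body FROM articles"):
        # Escape database text so it cannot break the generated HTML.
        page = (
            f"<html><head><title>{html.escape(title)}</title></head>\n"
            f"<body><h1>{html.escape(title)}</h1>\n<p>{html.escape(text)}</p></body></html>\n"
        )
        path = out_dir / f"article{record_id}.html"
        path.write_text(page, encoding="utf-8")
        written.append(path)
    return written
```

The server then serves the generated files as ordinary HTML; the database is only consulted when the export runs, at the cost of pages being as old as the last export.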

Alternatively (or in addition), consider using a local cache in front of your Apache server. The client would connect to the cache first, and if that page had already been requested recently, the cache would return it without calling the server. This sort of local cache is also called a "server accelerator". Your dynamic pages will have to be set up to allow them to be cached, though (SSI pages, for example, are not cacheable).
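The basic behaviour of a server accelerator can be sketched as a cache that keeps recent copies of pages and only calls the server when its copy is missing or too old. This toy is not how Squid works; the `backend` function and its "cacheable" flag are assumptions standing in for the real server and for the HTTP headers a real cache would inspect.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class AcceleratorCache:
    """Toy 'server accelerator': answer repeat requests without calling the server."""

    def __init__(self, backend: Callable[[str], Tuple[str, bool]], ttl_seconds: float = 60.0):
        # backend(url) returns (page, cacheable); it stands in for the real server.
        self.backend = backend
        self.ttl = ttl_seconds
        self.entries: Dict[str, Tuple[float, str]] = {}

    def get(self, url: str, now: Optional[float] = None) -> str:
        now = time.monotonic() if now is None else now
        entry = self.entries.get(url)
        if entry is not None and now - entry[0] < self.ttl:
            return entry[1]  # recent copy: the server is not contacted
        page, cacheable = self.backend(url)
        if cacheable:  # pages marked uncacheable go to the server every time
            self.entries[url] = (now, page)
        return page
```

Pages the backend marks as uncacheable, like the SSI pages mentioned above, still reach the server on every request, so the benefit depends on how many of your dynamic pages can be cached.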

Security

Security is a very important consideration when thinking about dynamic pages. All CGI programs, both scripted and compiled, are potentially insecure. You have to be very careful when writing CGI programs, for instance, to ensure that Internet users cannot execute programs on your server or read files they should not have access to.
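One common precaution is to check any file name that comes from the user before opening it, so that requests such as `../../etc/passwd` cannot escape the directory you intend to serve. This is a minimal sketch in Python with a hypothetical document root; it covers only path checking and does nothing about other risks, such as passing form input to a shell.

```python
import os


def safe_document_path(root: str, requested: str) -> str:
    """Map a user-supplied file name to a path under root, or raise ValueError.

    Rejects names such as '../../etc/passwd' or '/etc/passwd' that would
    resolve to a file outside root.
    """
    root = os.path.realpath(root)
    # realpath collapses '..' components and follows symbolic links.
    candidate = os.path.realpath(os.path.join(root, requested))
    if os.path.commonpath([root, candidate]) != root:
        raise ValueError(f"refusing to serve {requested!r}")
    return candidate
```

A CGI program would call this on every file name taken from the query string and return an error page when it raises, rather than trying to "clean up" suspicious names.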

Another security issue which might be important is related to other local users. For example, you might want to let your customers or colleagues use a dynamic language. But if you let them write CGI programs they could write a program which accesses other people's files (since by default all CGI programs run as the same user). More limited scripted languages (such as SSI) might be safer in this situation.

Dynamic Page Languages

Finally, here is a reference list of ways of including dynamic pages on your site.

  • SSI (embedded: yes; Apache module: yes). Traditional "Server Side Includes" allow simple dynamic pages. Apache 1.2 extends SSI to include variables and conditional code. Already part of Apache. Because of the restricted range of commands this can be more secure than other languages, and Apache has the ability to turn off some less secure features.
  • PHP (embedded: yes; Apache module: yes). A more comprehensive embedded language than SSI, with built-in support for various databases (such as mSQL, MySQL and DBM) and for page counters.
  • NeoScript (embedded: yes; Apache module: yes). An embedded scripting language based on Tcl.
  • Meta-HTML (embedded: yes; Apache module: no). An extended version of SSI.
  • Python (embedded: no; Apache module: yes). Python is an interpreted object-orientated language. This module builds the Python interpreter into Apache for better performance than normal CGI.
  • Embedded Perl, ePerl (embedded: yes; Apache module: no). Perl is a powerful general-purpose interpreted (scripting) language. This module lets you embed arbitrary Perl commands into your HTML.
  • Perl module (embedded: no; Apache module: yes). Perl is an advanced interpreted language. This very powerful module integrates Perl into Apache, letting you write Apache modules in Perl. This gives you much more access to and control over the server than CGI programs in Perl (which this module also supports). The ability to write modules in Perl makes it possible to extend the server's functionality relatively easily, without the complexity of writing a module in C.
  • Compiled languages such as C, Pascal and Fortran (embedded: no; Apache module: no). Facilities available depend on the language. Usually more efficient than scripted or embedded languages. Programs have to be written to use the CGI protocol.
  • Scripting languages such as Perl, Python and shell (embedded: no, except Perl when the ePerl module is used; Apache module: for some languages). Facilities available depend on the language. Unless an Apache module is available, scripts have to be written to use the CGI protocol. When run through CGI they are less efficient than compiled languages or scripting languages using an Apache module.

Summary

It is impossible to recommend the "best" dynamic page language since what is best will depend on your needs. However some general conclusions can be drawn.

If you do not already know a scripting or programming language, use one of the embedded languages. SSI is probably the simplest, but PHP has some useful extra features.

If you want a language that is quick to develop in and efficient, use an embedded language such as PHP or embedded Perl, or use Perl with the Perl module. If you prefer other scripting languages, use one with an Apache module (e.g. Python). If you already use Perl CGI programs, consider moving over to the Perl module, which will give you much better performance and more control over the server.

If you want a "full" programming language for arbitrary programs, either use any compiled language (e.g. C) or use Perl with the Perl module. If you've been put off Perl because of concerns about performance, think again. The module makes it very efficient, and the ease of development and large range of add-on Perl modules (packages) make developing applications more convenient.

The final way to make a top-performance dynamic page is to write an Apache module. This is complex and requires care to ensure that you do not "leak" resources or affect the rest of the server, but will give the best performance. Modules have to be written in C (although it might be possible to link in other languages). An alternative to writing modules in C is to use the perl module, which lets you develop Apache modules in perl.


Comments or criticisms? Please email us at editors@apacheweek.com