Apache Week
   
   Issue 173, 24th September 1999

Copyright ©2020 Red Hat, Inc

Apache 2.0: The Next Generation

It has been about a year since Apache 1.3 was released, and the core Apache members are now working on version 2.0. The new version will be significantly different to the current one, which raises issues such as "Why update Apache at all?" and "What does this update mean for Apache administrators?"

We hope to answer those and many other questions in this article and, as the release of 2.0 approaches, provide more up-to-date information.

It is important to note that presently there is only development code available for 2.0 and that downloading it now is not advised for anybody other than those who are already familiar with the Apache internals. The code in its current state is not guaranteed to compile from day to day or to work on many platforms.

Apache Week will announce any upcoming alpha or beta versions and the details of the 2.0 release as soon as they are ready.

Why Go Beyond 1.3?

Apache 1.3 is a great web server which serves pages for the vast majority of the web, but there are things it can't do. Firstly, it isn't particularly scalable on some platforms. AIX processes, for example, are very heavy-weight and a small AIX box serving 500 concurrent connections can become so heavily loaded that it can be impossible to telnet to it. In situations like this, using processes is not the right solution: we need a threaded web server.

Apache is renowned for being portable as it works on most POSIX platforms, all versions of Windows, and a couple of mainframes. However, like most good things, portability comes at a price, which in this case is ease of maintenance. Apache is reaching the point where porting to additional platforms is becoming more difficult. In order to give Apache the flexibility it needs to survive in the future, this problem must be resolved by making Apache easy to port to new platforms. In addition, Apache will be able to use any specialised APIs, where they are available, to give better performance.


Multi-Processing Modules (MPM)

The original reason for creating Apache 2.0 was scalability, and the first solution was a hybrid web server: one that has both processes and threads. This solution provides the reliability that comes with not having everything in one process, combined with the scalability that threads provide. The problem with this is that there is no single way of mapping requests to a thread or a process that suits every platform.

On platforms such as Linux, it is best to have multiple processes each with multiple threads serving the requests so that if a single thread dies, the rest of the server will continue to serve more requests. Other platforms such as Windows don't handle multiple processes well, so one process with multiple threads is required. Older platforms which do not have threads also had to be taken into account. For these platforms, it is necessary to continue with the 1.3 method of pre-forking processes to handle requests.

There are multiple ways to deal with the mapping issue, but the cleanest is to enhance the module features of Apache. Apache 2.0 sees the introduction of 'Multi-Processing Modules' (MPMs) - modules which determine how requests are mapped to threads or processes. The majority of users will never write an MPM or even know they exist. Each server uses a single MPM, and the correct one for a given platform is determined at compile time.

What MPMs are available?

There are currently five options available for MPMs. Their names will likely change before 2.0 ships, but their behaviours are basically set. All of the MPMs, except possibly the OS/2 MPM, retain the parent/child relationships from Apache 1.3. This means that the parent process will monitor the children and make sure that an adequate number are running.

PREFORK
This MPM mimics the old 1.3 behaviour by forking the desired number of servers at startup and then mapping each request to a process. When all of the processes are busy serving pages, more processes will be forked. This MPM should be used for older platforms, platforms without threads, or as the initial MPM for a new platform.
PMT_PTHREAD
This MPM is based on the PREFORK MPM and begins by forking the desired number of child processes, each of which starts the specified number of threads. When a request comes in, a thread will accept the request and serve the response. If most of the threads in the entire server are busy serving requests, a new child process will be forked. This MPM should be used on platforms that have threads, but which have a memory leak in their implementation. This may also be the proper MPM for platforms with user-land threads, although there has not been enough testing at this point to prove this hypothesis.
DEXTER
This MPM is the next step in the evolution of the hybrid concept. The server starts by forking a static number of processes which will not change during the life of the server. Each process will then create the specified number of threads. When a request comes in, a thread will accept and answer the request. At the point where a child process decides that too many of its threads are serving requests, more threads will be created. This MPM should be used on most modern platforms capable of supporting threads. It should create the lightest load on the CPU while serving the most requests possible.
WINNT
This MPM is designed for use on Windows NT. Before Apache 2.0 is released, it will also be made to work on Windows 95 and 98 although, just like Apache 1.3, it is unlikely to be as stable as on NT. This MPM creates one child process, which then creates a specified number of threads. When a request comes in, it is mapped to a thread that will serve the request.
OS/2
This MPM is designed for use on OS/2. It is purely threaded, and removes the concept of a parent process altogether. When a request comes in, a thread will serve it properly, unless all of the threads are busy, in which case more threads will be created.
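
The growth rules of the three Unix-style MPMs described above can be compared side by side in a small model. This is only a sketch in Python, not Apache code: the function names, the Child record, and the 0.75 busy-fraction threshold are assumptions made for illustration, since the descriptions say only "most" and "too many".

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Child:
    """One child process and the state of its worker threads."""
    threads: int  # threads this child has created
    busy: int     # threads currently serving a request


def prefork_needs_fork(busy_processes: int, total_processes: int) -> bool:
    """PREFORK: one request per process; fork more when all are busy."""
    return busy_processes >= total_processes


def pmt_pthread_needs_fork(children: List[Child], threshold: float = 0.75) -> bool:
    """PMT_PTHREAD: fork a new child when most threads server-wide are busy.

    The 0.75 threshold is an assumed stand-in for "most".
    """
    total = sum(c.threads for c in children)
    busy = sum(c.busy for c in children)
    return total > 0 and busy / total >= threshold


def dexter_needs_threads(child: Child, threshold: float = 0.75) -> bool:
    """DEXTER: the process count is fixed; a child adds threads to itself
    when too many of its own threads are busy (threshold assumed)."""
    return child.threads > 0 and child.busy / child.threads >= threshold


children = [Child(threads=4, busy=4), Child(threads=4, busy=1)]
print(prefork_needs_fork(busy_processes=5, total_processes=5))
print(pmt_pthread_needs_fork(children))   # looks at 5 of 8 threads server-wide
print(dexter_needs_threads(children[0]))  # looks at 4 of 4 threads in one child
```

With these assumed thresholds, the same load makes the DEXTER rule grow the first child while the PMT_PTHREAD rule does not yet fork, because one decision is made per child and the other across the whole server.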

Multi-processing modules are designed to work behind the scenes and do not interfere with requests in any way. In fact, an MPM's only function is to map the request to a thread or process. One advantage of this technique is that each MPM can define its own directives. This means that if you are using the PREFORK MPM, you won't be asked how many threads you want per server, and if you are using the WINNT MPM, you won't need to specify the number of processes.
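
One way to picture per-MPM directives is as a separate set of accepted settings for each MPM. The sketch below is in Python purely for illustration, and the directive names (StartProcesses, MaxProcesses, ThreadsPerProcess) are placeholders invented for this example rather than names taken from the 2.0 code.

```python
# Directive names here are illustrative placeholders, not taken from the
# Apache 2.0 source; each MPM declares only the settings it understands.
MPM_DIRECTIVES = {
    "PREFORK": {"StartProcesses", "MaxProcesses"},
    "PMT_PTHREAD": {"StartProcesses", "MaxProcesses", "ThreadsPerProcess"},
    "WINNT": {"ThreadsPerProcess"},
}


def check_config(mpm: str, config: dict) -> list:
    """Return the directives in `config` that the chosen MPM does not define."""
    allowed = MPM_DIRECTIVES[mpm]
    return sorted(name for name in config if name not in allowed)


# A process-only MPM has no use for a thread count, and vice versa.
print(check_config("PREFORK", {"StartProcesses": 5, "ThreadsPerProcess": 10}))
print(check_config("WINNT", {"ThreadsPerProcess": 50}))
```

The first call reports the thread setting as not applicable to the process-only MPM, while the second reports nothing.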


Will Apache 1.3 Modules work?

Modules written for 1.3 will not work with 2.0 without modification. There are many changes which will be documented by the time 2.0 is released.

In Apache 1.3, each module uses a table of callback routines and data structures. Instead of using this table to specify which functions to use when processing a request, 2.0 modules will have a new function to register any callbacks needed.

In the past, new features added in subsequent releases of Apache required the callback table to be expanded, causing existing modules to break. In 2.0, each module is able to define how many callbacks it wants to use instead of using a statically defined table with a set number of callbacks. If the Apache Group decides to add callbacks in the future, the changes are less likely to affect existing modules.
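
The difference between a fixed callback table and registration can be sketched as follows. This is a Python illustration of the idea only, not the 2.0 module API: register_hook, run_phase, and the phase names are hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, List

# Hypothetical registry: phase name -> callbacks registered for that phase.
hooks: Dict[str, List[Callable[[str], str]]] = defaultdict(list)


def register_hook(phase: str, callback: Callable[[str], str]) -> None:
    """Attach a callback to a named request-processing phase."""
    hooks[phase].append(callback)


def run_phase(phase: str, request: str) -> List[str]:
    """Call every callback registered for `phase`, in registration order."""
    return [callback(request) for callback in hooks.get(phase, [])]


# An "old" module that only knows about one phase.
def old_module_register() -> None:
    register_hook("handler", lambda req: f"old module handled {req}")


# A "new" module that also uses a phase added to the server later.
def new_module_register() -> None:
    register_hook("handler", lambda req: f"new module handled {req}")
    register_hook("late_added_phase", lambda req: f"new module saw {req}")


old_module_register()
new_module_register()
print(run_phase("handler", "/index.html"))
print(run_phase("late_added_phase", "/index.html"))
```

Because the old module never mentions the later phase, adding that phase does not change anything the old module has to provide, which is the property a fixed-size table lacks.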

Many things have been abstracted in Apache 2.0 and there are many new functions available. This means it will no longer be possible to access most of the internals of Apache data structures directly. For example, if a module needs access to the connection in order to send data to the client, it will have to use the provided functions rather than access the socket directly.
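
As a rough illustration of going through provided functions rather than the socket, consider the Python sketch below. The Connection class and send_to_client function are hypothetical names, and an in-memory buffer stands in for the client socket; none of this is the real 2.0 interface.

```python
import io


class Connection:
    """Hypothetical connection record; the transport is kept private."""

    def __init__(self, transport: io.BufferedIOBase) -> None:
        self._transport = transport  # modules are not meant to touch this
        self.bytes_sent = 0


def send_to_client(conn: Connection, data: bytes) -> int:
    """Accessor a module calls instead of writing to the socket itself."""
    written = conn._transport.write(data)
    conn.bytes_sent += written  # the server can keep its bookkeeping in one place
    return written


# An in-memory buffer stands in for the client socket in this sketch.
fake_client = io.BytesIO()
conn = Connection(fake_client)
send_to_client(conn, b"HTTP/1.0 200 OK\r\n\r\n")
send_to_client(conn, b"hello")
print(conn.bytes_sent, fake_client.getvalue())
```

Routing every write through one function means the server is free to change what sits behind the connection without modules needing to know.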


The Apache Portable Run-Time (APR)

APR was originally designed as a way to combine code across platforms. There are some sections of code that should be different for different platforms as well as sections of code that can safely be made common across all platforms.

Apache on Windows currently uses POSIX functions and types that are non-native and non-optimised for communicating across a network. Replacing these functions and types with their native Windows equivalents has produced a significant performance improvement. APR also allows shared logic to be written once. Spawning CGI processes, for example, is very confusing in Apache 1.3 because Unix, Windows, and OS/2 all handle spawning in different ways. By using APR, the logic for spawning CGI processes can be combined, decreasing the number of platform-specific bugs that are introduced later.

APR will make porting Apache to additional platforms easier. With a fully implemented APR layer, any platform will be able to run Apache. APR is small and well defined and, once it is fully integrated into Apache, will change very little in the future. Apache has never been well defined for porting purposes as there was too much code to make porting a simple task. In addition, the code was originally designed for use on Unix, which made porting to non-POSIX platforms very difficult. With APR, all a developer needs to do is implement the APR layer. APR was designed with Windows, Unix, OS/2, and BeOS in mind and is more flexible as a result.

APR acts as the abstraction layer in Apache 2.0. To allow the use of native types for the best performance, APR wraps each facility, such as sockets, in a single type which Apache then uses independently of the platform. The underlying type is invisible to the Apache developer, who is free to write code without worrying about how it will work on multiple platforms.
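
The idea of one type hiding a native implementation can be sketched in a few lines. The example is in Python and every name in it (PortableSocket, UnixSocket, WindowsSocket, open_socket, send_greeting) is hypothetical; the two implementations simply write to an in-memory buffer where a real port would call the platform's own socket API.

```python
from abc import ABC, abstractmethod


class PortableSocket(ABC):
    """The one socket type the server code is written against."""

    def __init__(self) -> None:
        self.wire = bytearray()  # in-memory stand-in for the network

    @abstractmethod
    def send(self, data: bytes) -> int:
        """Send data using whatever the platform does natively."""


class UnixSocket(PortableSocket):
    def send(self, data: bytes) -> int:
        # A real port would call the Unix socket API here.
        self.wire += data
        return len(data)


class WindowsSocket(PortableSocket):
    def send(self, data: bytes) -> int:
        # A real port would call the native Windows socket API here.
        self.wire += data
        return len(data)


def open_socket(platform: str) -> PortableSocket:
    """Pick the platform implementation; callers never see which one."""
    implementations = {"unix": UnixSocket, "windows": WindowsSocket}
    return implementations[platform]()


def send_greeting(sock: PortableSocket) -> int:
    """Platform-independent server code: identical on every port."""
    return sock.send(b"hello from the server\n")


for name in ("unix", "windows"):
    sock = open_socket(name)
    print(name, send_greeting(sock), bytes(sock.wire))
```

Only open_socket knows which implementation is in use, so porting to a new platform in this model means adding one class rather than touching send_greeting.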


When, When, When?

Apache 2.0 is a major re-working of Apache that will hopefully result in a web server that can continue to grow and serve the web. As has been traditional with previous Apache releases, the 2.0 upgrade will be made available when it is ready and stable. There is no promised release date although it is hoped that a beta version will be available either late in 1999 or early in 2000.

This article covers some of the major changes in Apache 2.0, such as MPMs, module callbacks, and the abstraction layer. Future editions of Apache Week will report on the progress of Apache 2.0 and highlight any major developments.