Apache Week Reviews

http://www.apacheweek.com/
The essential resource for anyone running an Apache server, or anyone responsible for running Apache-based services.
Copyright Red Hat Europe. Contact: editors@apacheweek.com (Mark Cox), webadmin@apacheweek.com (Mark Cox).

Apache in the News 2000

All the important news stories about Apache from the year 2000.

Since becoming the #1 Web server, Apache has featured in a number of reviews and articles. Here are the ones for the year 2000. If you have seen a story about Apache on the Web or in the press, let us know so that we can include it here.

InfoWorld.com, "Brian Behlendorf: Apache co-founder talks about open source"
"the fact that we don't have a multibillion-dollar marketing organization means that, sure, Microsoft is going to be able to claim things or do things that we can't, but that hasn't hurt us so far."

InfoWorld.com, "Apache founders hit Vegas in search of cash"
"Behlendorf said the ASF may need to look for a little cash to keep up with the demands that developing the leading Web server requires"

Apache Week, "Report from ApacheCon Europe 2000"
"As in all conferences, there were various technical glitches when presentation laptops froze and batteries ran out, some inexperienced speakers, and not enough seats but these were all minor issues considering the excellent detailed technical knowledge that was imparted by the speakers."

Apache Today, "Apache Guide: ApacheCon Europe"
"Last week, I was in London for ApacheCon 2000. In a break from my usual subjects, this will be a brief overview of the conference, touching on the highlights and some of the things that were talked about there."

NetworkWorldFusion, "Tips on pitching Apache to the big wigs"
"Apache cares about trademarks and it's helped us maintain a pretty good product," Behlendorf said.

NetworkWorldFusion, "IBM pitches its open source side"
"IBM Tuesday set out its open source agenda at ApacheCon Europe 2000. The message seemed to boil down to the notion that in a networked world, open source is good and IBM not only knows that but embraces the open-source programming community."

NetworkWorldFusion, "Sun says Java moving towards full open source"
"Sun is moving toward making its Java technology fully open source, a company executive said Tuesday, addressing an audience of programmers here at the ApacheCon Europe 2000."

Network Computing, "The 10 Most Important Products of the Decade"
"...Apache Web Server earns its place for changing the rules on the server side. The future of Apache hinges on its ability to function as an e-commerce server. If the past five years are any indication, Apache Web Server will deliver the whole shopping cart--and probably sooner than its competitors do."

InfoWorld, "E-business innovators"
"By general acclaim, it has done more to stimulate Web development -- and therefore e-commerce -- than any other Web-based server."

Edd Dumbill's Weblog (O'Reilly), "Dynamics of the Apache XML Project"
Edd Dumbill, editor of XML.com, writes about the "Dynamics of the Apache Group" in his Weblog. The article focuses on news that the Apache XML project could create another parser, and looks at the internal dynamics of the group members and some of the conflicts.
"IBM and Lotus in particular are responsible for the XML parser, Xerces, and the XSLT processor, Xalan. Sun also play a significant part in Apache's Java projects. Though nobody has suggested that Apache is in any way in the sway of these organizations as a consequence of their donations, it seems inevitable that the corporate and hacker cultures may well clash. This weekend seems a good example of this."

Qube Quorner, "AOLserver faster than Apache?"
Qube Quorner reveal that Apache 1.3.12 comes second to AOLserver 3.0 in terms of requests/second and transfer speeds. Benchmarks do not give a true picture of the speed of a web server, since they provide an environment unlike the real use of the software. Commercial software is often tuned to perform well in benchmarks, so a good performance simply indicates that the software works well for that benchmark, not that it has good real-world performance.

News Alert, "US Toyota and Lexus dealers adopt Apache technology"
Over the last week, there have been a large number of stories about Internet Appliances for both home and business use. An increasing number of these units are now being run on open source platforms such as Linux. Dell have announced that Toyota in the US are to be equipped with Dell PowerApp.web servers to provide customised content to their dealer network.

C|Net News.com, "IBM donates Net communications technology"
As reported by C|Net, the Apache Software Foundation has received technology from IBM which will help developers create services using an open, vendor-neutral process. IBM's Java-built Simple Object Access Protocol (SOAP) will be contributed to the open source Apache XML project. The system provides a simple method of using XML to send messages and access web services across distributed networks.
"We want to move at Internet speed and respond to the needs of the developer community by making it available to the open-source community," said Marie Wieck, IBM's director of e-markets infrastructure. "It's valuable to further adoption."

CNet Investor, "Apache Software Foundation join Java committee"
CNet Investor reported that Sun Microsystems have set up two executive committees to oversee their Java Community Process(SM) community-based Java technology development programmes. The first committee will oversee the Java technologies for the desktop/server space and the other will oversee the Java technologies for the consumer/embedded space.
"As is evident by the depth, diversity and strength of the JCP program's Executive Committee members, the future of Java technology specifications is in capable and caring hands," said George Paolini, vice president of Java Community Development at Sun Microsystems, Inc.

ZD Net, "Red Hat Leads The Way To IA-64 Itanium Linux"
Red Hat Inc. this week released public alpha code of a full version of Linux for Intel's new IA-64 Itanium processor. The release of the software, combined with the release of Intel's "Itanium Processor Microarchitecture Reference", gives developers access to all the information they need to start working on Itanium development.
"On May 17, Red Hat Inc. released an alpha version of a complete IA-64 Linux distribution to developers. This edition, built within the Trillian Project, is the first alpha public code release of a full IA-64 Linux from kernel to drivers to such popular applications as Apache."

ZDNet, "Picking The Right Web Server Is Key"
ZDNet examine web server platforms in their article, "Picking the Right Server is Key". They compare Windows 2000 Advanced Server, Netware 5.1, Red Hat Linux using Apache, Solaris using iPlanet, and Solaris using Apache.
"There are other compelling reasons to choose Linux/Apache. For one thing, you'll never find a back door, as with the recent IIS debacle, in open-source code. And it's getting so easy to install that the hardcore Linux gurus are grumbling about dumbing down."

SecuritySpace.com, "April Web Server Survey"
If you are a regular reader of Apache Week you'll know that Apache has been the top web server in all the probe-based web surveys for some time, now with over 60% market share. The April survey from E-Soft also gives some other interesting statistics for modules in use, the most popular being the PHP scripting language, in use on 29% of Apache sites.
"The Apache module report documents the market share of Apache, internet's most popular web server, for a variety of add-on modules.
Since most add on modules modify the web server "signature" that is returned on each web page, we are able to see who's using PHP, perl, SSL mods, language converters, language mods, etc."

Userland, "Scripting News / Manila"
UserLand hosts an interesting open forum about commercial software, which originally started as an email discussion between Dave Winer and Brian Behlendorf. In his own comments Dave picks out some of the discussion and gives his own point of view, accusing Apache of being boring.
"Apache is like MS-DOS. Lots of people use it, we do too. But where's the Lotus 1-2-3? Apache is boring! Where's the revolution for writers and thinkers?"

Linux Today, "VNU Net: Apache Server Commentary [Book Review]"
A short review of the new book "Apache Server Commentary" is available. The book is aimed at developers and contains source code listings of the Apache server.
"This is one in a series of books which sets out to give an insight into the various Open Source products currently on the market. It is aimed at those who either want to write extension modules to Apache or customise the underlying code. In fact, Apache Server Commentary appears to be little more than a reference guide for those who already understand the concept of Apache and just want help on specific modules. It certainly isn't the architectural document I was expecting."

InformationWeek.com, "Open Source Moves To The Mainstream"
The article discusses the secure server survey from E-Soft, which shows Apache with 63% market share, but notes that the "battle over E-commerce territory has been a little more difficult for open source, perhaps an indication that security-minded companies prefer to use commercial products".
"One of the leading open-source success stories is the Apache Web server, which for many sites is the backbone of Web applications. Apache is a flagship open-source project, continually developed by a self-selected group of coordinated volunteer programmers. It costs nothing to use. As of March, Apache is deployed on more than 7.8 million domains, or some 60% of Internet Web sites."

INRIA, "Elliptic Curve Discrete Logarithms: ECC2K-108 - SOLVED!"
Apache Week reported in issue 180 on the attempt to solve the Elliptic Curve Challenge from Certicom. The solution was found at the end of March, and the Apache Software Foundation will receive a donation of US$8000 from the prize.
"The biggest public-key crypto crack ever has just finished! Certicom have confirmed that the solution is correct."

Linux Magazine, "Brian Behlendorf on the Apache name"
Linux Magazine have an interview with Brian Behlendorf, one of the initial Apache group founders. In addition to talking about the founding and success of Apache, Brian explains that the Apache name never meant "A patchy server"; instead it "just sort of connoted: 'Take no prisoners. Be kind of aggressive and kick some ass.'"
"While there would still be a World Wide Web without the Apache Web server, pundits have suggested that it would belong to Microsoft. Since drawing up the plan for the Apache project in 1993, Apache Software Foundation President Brian Behlendorf has helped lead the volunteer development team that proved that you can take on Microsoft and win -- just so long as you change the rules."

Linux Magazine, "A Conversation With the Man Behind the Animal Books"
The article discusses the evolving open source industry and pays particular attention to Apache.
"I think Apache plays an enormously important role here. Because it has dominant market share, it keeps the Internet open. I think it's more important for Apache to have dominant market share than for Linux. If Linux is dominant too, that's better, but I'd hate to see us lose Apache. That's a really important battleground."

ZD Net - EWeek, "Solaris 8 weds reliability to must-have upgrades"
PC Week mention Apache being bundled with Solaris in "Solaris 8 weds reliability to must-have upgrades".
"Apache Web server is also bundled with Solaris 8, but neither PC Week Labs nor Sun recommends its use in high-transaction environments."

Slashdot, "Reflections On ApacheCon 2000"
ASF member Jim Jagielski gives his personal opinion of ApacheCon 2000 in "Reflections on ApacheCon 2000".
"It's been a week now since ApacheCon 2000 ended. There's been some discussion over the events, with the release of Apache 2.0a being the main topic of conversation. But AC2K was more than just the venue that 2.0a was announced. It was an important and noteworthy conference in it's own right."

NetWorldFusion, "The Netware Version Of Apache"
The NetWare version of Apache is examined in a Network World Fusion Newsletter. Over the past few years Novell have shipped a couple of different Web servers with NetWare, but now Apache is available for this system.
"The NetWare version of Apache 1.3 is still in the "experimental" stage, and it (so far) only runs on NetWare 5 or 5.1. Nevertheless, if you support a major Web site and ... if you want to take advantage of the hundreds of Web server applications available (also for free) for Apache - it would be worth your effort to download and test the new Apache in your environment."

Apache Week, "Report from ApacheCon 2000"
"In total, just over 1000 people attended the conference and this included a large number of Apache Software Foundation members. At the very first session of the conference, the opening plenary, the previous record for the most Apache developers in the same place at the same time was broken."

Melbourne Linux Users Group Inc, "ApacheCon 2000"
The Melbourne Linux Users Group posted a number of pictures from the conference.
"The ApacheCon show was very well done. The exhibit floor featured many cool companies and the keynote and PHP presentations I attended were very informative. Here are some pics of the event."

Open Source IT, "The Buzz At Apache Conference: World Domination"
ApacheCon 2000 is still in the news as Open Source IT reports on ApacheCon 2000 in "The Buzz at Apache Conference: World Domination".
"More than 1,000 Apache developers and users gathered at ApacheCon 2000 in Orlando last week to discuss -- among other things -- the progress the Apache Web server is making towards World Domination."

O'Reilly, "ApacheCon 2000: Day One, Day Two, Day Three"
O'Reilly published a detailed report on each day of the conference: Wednesday, Thursday, and Friday.
"The conference is being held at the Caribe Royale Resort Suites, which despite a strong conference turnout, is mainly inhabited by lots of parents and their young children, due to the proximity to Disney World."

LinuxPlanet, "ApacheCon: Fuelling The Web Revolution"
The article gives a brief overview of the conference and highlights one of the popular talks on open source from IBM.
"ApacheCon is the yearly convention dedicated to Apache and Apache products. There are over 1,000 visitors this year, and the show creators were sitting around saying things to me like, "Wow, this is going so mainstream so fast." God, I hope so. It'd be a terrible thing for something that has captured 60 percent of the Internet Web-server market share to not be mainstream."

Wired.com News, "A Patchy Start: Apache's Strong"
The article examines why Apache is not as well known as other projects such as Linux and finds that the companies providing support and services based on Apache are not as visible.
"Apache is the Web's most widely used and -- outside of the Nerd Zone -- its most unknown application. It has achieved dominance in a crucial market that Microsoft and Netscape have struggled mightily to conquer. Both companies have invested massive amounts of money and programming skills into server software programs -- and yet it's Apache, a freeware application, that is installed on just over half of all publicly accessible Web servers."

Security levels
Cox, Mark J
A quick summary of security levels that Apache Week apply to Apache web server vulnerabilities

Apache Week rates the impact of each security flaw that affects the Apache web server. We've chosen a rating scale quite similar to those used by other major vendors in order to be consistent. The goal of the rating system is to answer the question "How worried should I be about this vulnerability?".

Note that the rating chosen for each flaw is the worst possible case across all architectures. In the past, for example, we've had flaws that have a Critical impact on some BSD architectures whilst having no real impact on others. To determine the exact impact of a particular vulnerability on your own systems you will still need to read the security advisories to find out more about the flaw.

We use the following descriptions to decide on the impact rating to give each vulnerability:

A vulnerability rated with a Critical impact is one which could potentially be exploited by a remote attacker to get Apache to execute arbitrary code (either as the user the server is running as, or root). These are the sorts of vulnerabilities that could be exploited automatically by worms.

A vulnerability rated as Important impact is one which could result in the compromise of data or availability of the server. For the Apache web server this includes issues that allow an easy remote denial of service (something that is out of proportion to the attack or with a lasting consequence), access to arbitrary files outside of the document root, or access to files that should be otherwise prevented by limits or authentication.

A vulnerability is likely to be rated as Moderate if there is significant mitigation that reduces the impact of the issue. This might be because the flaw does not affect likely configurations, or it affects a configuration that isn't widely used, or because a remote user must be authenticated in order to exploit the issue. Flaws that allow Apache to serve directory listings instead of index files are included here, as are flaws that might crash an Apache child process in Apache 1.3.

All other security flaws are classed as Low impact. This rating is used for issues that are believed to be extremely hard to exploit, or where an exploit gives minimal consequences.

Vendor patches to Apache 1.3
Cox, Mark J
We take a peek inside ten popular vendor distributions of Apache 1.3 to find out what has been added

We decided to take a look at what custom patches vendors add to the versions of Apache 1.3 they ship. The Apache Software Foundation would rather that vendors of Apache didn't add any third-party modifications to Apache at all, since it adds to brand confusion. You might think you are getting a copy of the Apache web server but you're actually getting something that is based on the Apache web server.

There are hundreds of distributions and hundreds of vendors, so in order to make this manageable we started out by looking at just the Linux vendors that publicised security updates for Apache to the bugtraq mailing list in the first few months of 2003. Where a vendor has multiple versions of products we tried to look at the most recent version of Apache 1.3 (since most vendors do not yet ship Apache 2). Our survey consisted of Conectiva, Debian, EnGarde, Gentoo, Mandrake, OpenPKG, Red Hat, SCO, SuSE, and Trustix.

At the time of the survey, not all the Linux vendors were shipping Apache 1.3.27. Several shipped older versions for which they had backported security fixes. Mandrake, Debian, and Conectiva included Apache 1.3.26 with backported patches for , , and .
SuSE included Apache 1.3.23 with backported security fixes for only and . SuSE also add a backported patch for mod_proxy ().

All the vendors shipped with EAPI, the interface that links Apache to mod_ssl, and most bundled some selection of extra modules.

All the vendors shipped a custom httpd.conf file or made patches to the default file. Examining the configuration file changes was outside the scope of this survey since these are things that can be easily changed by the user. All the vendors except OpenPKG and SuSE pointed the magic mime types file at the system /etc/mime.types file, with many adding additional types using AddType directives in httpd.conf.

SysV init is a standard process used by Linux distributions to control which software the init command launches or shuts off on a given runlevel. SysV init scripts are sometimes confused with the apachectl command, which provides similar functionality. All the vendors except OpenPKG included custom init scripts or patches with their Apache packages.

All the vendors provided patches to help build Apache on their particular Linux distribution and to customise it to their environment. Conectiva, Gentoo, and Mandrake added a serverroot configuration option and then used that to help build Apache. Most vendors patched apxs and changed file and directory locations.

Debian, Gentoo, Mandrake, Red Hat, and SuSE added dbm patches to ensure that the files created for dbm-based authentication by Perl tools like dbmmanage are in a format that Apache can understand.

Conectiva, Debian, EnGarde, Gentoo, Mandrake, Red Hat, and SCO all included a patch for , a vulnerability in htpasswd and htdigest that could allow local users to overwrite arbitrary files via a symlink attack. This vulnerability is not yet fixed in Apache, as it's tricky to get right cross-platform. The vendors patching this themselves only have to worry about the Linux architecture, so they can add a platform-specific fix.
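
As a rough illustration of the class of problem, a symlink attack of this kind usually depends on a tool writing through a predictable file name in a directory that other local users can write to. The Python sketch below shows one common mitigation on POSIX systems: create the temporary file with an unpredictable name and exclusive creation, then rename it into place. This is not the vendors' patch (htpasswd and htdigest are C programs); the function name write_atomically and the file names are hypothetical.

```python
import os
import tempfile


def write_atomically(path: str, data: bytes) -> None:
    """Replace the file at `path` without writing through a planted symlink.

    Hypothetical helper for illustration only.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # mkstemp() picks an unpredictable name and opens it with O_EXCL,
    # so an attacker cannot pre-create a symlink at that name.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
        # Renaming over the destination replaces a symlink itself
        # rather than following it to whatever it points at.
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temporary file if anything went wrong.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

If the destination is a symlink planted by another user, the rename swaps out the link rather than overwriting its target, which is the property the naive "open a fixed temporary name and write" approach lacks.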

Altering the server version string can help users determine that they are running a vendor-modified version of Apache. It can also help the vendor track market share through surveys like those from Netcraft. Four of the distributions had patches to make sure that they added a customised string to the server version string. These distributions were quite well behaved and did not add their customised string if the ServerTokens directive was set to 'product only' or 'minimum'.

    Debian GNU/arch
    (Gentoo/Linux)
    (Red-Hat/Linux)
    (Trustix Secure Linux/Linux)

Conectiva and SCO were a little more invasive, with Conectiva adding (Conectiva/Linux) to the server version string no matter what the ServerTokens directive was set to. SCO did a similar thing, with their extra string giving the version of an acceleration patch they add. Finally, Mandrake changed the base product name altogether, renaming it from Apache to Apache-AdvancedExtranetServer.

In Apache 1.3, a compile-time constant defines the maximum possible number of server processes, defaulting to 256. Only three vendors changed this default: Debian set it to 512 processes via a build-time define, EnGarde patched it to 1024, and SuSE set it to 2048 via a define.

Debian, Mandrake, SuSE, and SCO build Apache with Large File Support (LFS), so that on 32-bit systems Apache can use files larger than 2 gigabytes. This is particularly useful for log files. Enabling LFS does slightly change the Apache 1.3 binary module ABI, which can cause problems if using binary modules built against a different version of Apache.

After taking account of all the patches and modifications above, we're left with only four vendors that add additional patches.

SuSE added:
- A patch to change the ap_set_content_length API function to accept a length of type off_t instead of long, to improve the support for Large Files mentioned above.

Gentoo added:
- A patch to make the regexp library work with Large File Support on 32-bit systems. This is a modification that affects the ABI.
- A patch to fix a segmentation fault when using a custom response in a module ()
- A patch to fix a problem when using server-parsed HTML with suexec where an <!--#exec tag with a cmd attribute contains more than one word. (Debian bug 47951)
- A patch to allow SSL environment variables to be accessible when using mod_ssl and suExec. (similar to )
- A patch to cause Apache to not run if user or group directives are found within a VirtualHost but suExec is not configured correctly. (Debian bug 21525)

Debian added the same patches as Gentoo and additionally:
- A fix for a htdigest buffer overflow if arguments passed to it are too long. This is only a security issue if htdigest is used setuid.
- Changes to ApacheBench to support round-robin DNS

SCO added:
- A patch to mod_proxy needed for mod_backhand
- A patch to add a new API function, ap_call_execute, needed by the old mod-frontpage-VR module
- The "Accelerating Apache" performance patches from SGI.

The "Accelerating Apache" performance patches were first submitted to the Apache Group by SGI in 1999. We reported that they were designed to improve the performance of Apache when measured specifically by the SPECweb96 benchmark. The patches were named after the tenfold increase in speed they gave over regular Apache on a dual processor SGI IRIX machine. Some of the patches were folded in to Apache in 2000, but other parts were rejected by the Apache developers. The Accelerating Apache project was dropped by SGI in February 2001.

In March 2003 a vulnerability was found in the Oracle modifications to mod_dav. This was not the first security hole that has been introduced by third party modifications to Apache by vendors. However, our own research based on issues listed in the CVE dictionary shows that the majority of these vulnerabilities are due to poor configuration defaults rather than patches for new functionality that went wrong:

CVE | Type of Issue                                          | Severity | Affected
    | Remote attacker can run arbitrary commands             | High     | Oracle
    | Remote attacker can run arbitrary commands             | High     | SCO (briefly)
    | Remote attacker can run arbitrary commands             | High     | IBM
    | Remote attacker can see files in /usr/doc              | Low      | SuSE Linux
    | Remote attacker can see files in /perl                 | Medium   | Mandrake Linux
    | Remote attacker can read and write any file in docroot | High     | SuSE Linux
    | Remote attacker can obtain the source to CGI scripts   | Medium   | SuSE Linux
    | Remote attacker can read .htaccess files               | Medium   | Cobalt
    | Remote attacker can see files in /usr/doc              | Low      | Debian Linux

What we found in our survey was that no two of the ten vendors were alike; some vendors like OpenPKG made only the expected build and configuration changes, whilst others made fairly substantial changes, including ones affecting the ABI. ABI changes mean that you can't reliably take a module precompiled for one distribution and start using it on another.

Third party modifications to Apache have been known to cause bugs and security issues. This is often frustrating for the Apache Software Foundation, who end up receiving all the bug reports for issues that don't even exist in the official Apache releases. This is one of the reasons why the Apache Software Foundation insists that vendors who modify Apache change the name of their version so it is not confused with official Apache releases.

One thing that impressed us was how easy it was to identify the changes that the vendors had made. In almost all cases the vendor's source package contained a pristine copy of Apache along with one or more patch files for the various changes.
Working out what those changes did and where they came from was another issue, though: vendors could do a much better job of labelling the origin of, and reason for, each of the patches they make.

Apache 2.0.44 Released
Orton, Joe
Apache 2.0.44 was released on the 21st January 2003. This release addresses recent security issues in Apache 2.0.43

Apache 2.0.44 was released on 21st January 2003 and is now the latest version of the Apache 2.0 server. The previous release was 2.0.43, released on the 3rd October 2002. See what was new in Apache 2.0.43.

Apache 2.0.44 is available for download.

This is a security, bug fix and minor upgrade release. Due to security issues, any sites using versions prior to Apache 2.0.44 on Windows should upgrade to Apache 2.0.44. Read more about the other security issues that affect Apache 2.0.

- Apache was vulnerable to a denial of service attack via a request for an MS-DOS device name on Windows 9x and Me.
- Apache allowed arbitrary code execution via a crafted POST request containing an MS-DOS device name on Windows 9x and Me.
- Apache could be forced to serve unexpected files on Windows platforms by appending illegal characters such as '<' to the request URL.

The following bugs were found in Apache 2.0.43 and have been fixed in Apache 2.0.44:

- Allow escaping % sign in CustomLog format strings
- mod_setenvif: fix BrowserMatchNoCase for non-regex patterns.
- Return appropriate MIME response headers for negotiated responses from a body embedded in a type-map
- Prevent 416 "Range not satisfiable" response in place of a redirect
- Prevent files being left open for the duration of a keepalive connection, which could cause a "Too many open files" error
- mod_ssl: several fixes for memory handling and leaks
- mod_proxy: fix invalid Content-Length from pages fetched during server-side include processing.
- LDAP modules: ensure correct load order in httpd.conf (); fix compatibility with Netscape LDAP libraries; fix Win32 build
- mod_deflate: fix a memory leak when compressing dynamic content; always emit Vary headers
- mod_isapi: fix several compatibility problems (, ), and fix bug which caused invalid responses or log entries ()
- CGI modules: fix streaming output from "nph-" scripts, for example CGI::IRC (); fix construction of command line from query strings (); handle environment variables which contain newlines in mod_cgid (); terminate CGI scripts when connection is dropped ()
- Caching modules: many bug fixes (including ), and an HTTP compliance fix ()

- Add an --enable-v4-mapped configure option to allow or disallow connections from IPv4-mapped addresses to IPv6 addresses, on applicable platforms (, )
- Add IndexOptions IgnoreCase option to mod_autoindex ()
- Add EnableSendfile directive to disable use of sendfile() when necessary (for instance when serving an NFS share)
- Add ProxyBadHeader directive to dictate handling of invalid HTTP response headers
- Add SERVER_ADDR keyword to mod_setenvif, to represent the server IP address for a particular request
- Performance improvements
- Add -S command-line option to httpd, equivalent to -t -DDUMP_VHOSTS

Apache Related Links

This document contains a set of pointers of interest to people using or developing with Apache. From here, you can link to all the relevant standard definitions, documentation on most aspects of using Apache, module information, and even some links to how Apache is reported by the media.

Organisations

- W3C, who maintain W3 standards development
- Apache project page

Document Access

HTTP is the protocol for transferring Web pages. The current version is 1.1, which is now an RFC on the standards track. It replaces the widely implemented 1.0. Note: this is not related to Apache version numbers!
HTTP 0.9 (of historical interest only) HTTP 1.0 [RFC1945] (or in HTML PS format) HTTP 1.1 [RFC2616] Use and interpretation of HTTP version numbers [RFC2145] Basic and Digest Access Authentication [RFC2617] PEP: an Extension Mechanism for HTTP [Internet Draft] Transparent Content Negotiation [RFC2295] and Remote Variant Selection Algorithm 1.0 [RFC2296] See also: other HTTP Internet drafts, the W3C HTTP specifications Uniform Resource Identifiers or Names (URI, URN) are the generic names for Uniform Resource Locators (URLs), used to identify resources on the WWW and Internet. Uniform Resource Identifiers (URI): Generic Syntax [RFC2396] A Trivial Convention for using HTTP in URN Resolution [RFC2169] URN Syntax [RFC2141] Uniform Resource Locators [RFC1738] Relative Uniform Resource Locators [RFC1808] Cookies let you maintain state with the client, or track 'clickstreams'. HTTP State Management Mechanism [RFC2109] Internet Draft intended to replace RFC2109 Netscape's Original Cookie specification (no longer available) Content Hypertext Markup Language is the protocol used to design Web hypertext pages. Current widely used version is 2.0, often with extensions. Version 3.2 summarises the current practise. HTML 3.2 W3C Reference Specification, more information HTML 4.0 more information Cascading Style Sheets (CSS) W3C Recommendation, more information. Internationalization of HTML [RFC2070] Hypertext Markup Language 2.0 [RFC1866] HTML Tables [RFC1942 experimental] Netscape extensions to HTML 2.0 and HTML 3.0 Microsoft HTML, DHTML and CSS information See also: HTML Internet drafts, W3C HTML specifications CGI is the common gateway interface, which specifies how web servers can call external applications (scripts, programs or other gateways). CGI information and tutorials (NCSA) CGI specification CGI provides a simple way of running programs on the server when a request is received. However they can be inefficient because they need to be started each time a request is made. 
There are various ways of creating more efficient dynamic responses. JServ module for Java programs mod_perl for efficient Perl scripts and modules FastCGI: a faster version of CGI (Apache module available) Server-Side Includes are a way of writing commands into normal HTML files. When the HTML file is served to the user, the SSI commands are parsed and executed. Apache implements standard SSI, or you can use an alternate module for more advanced SSI implementations Using Server Side Includes Apache Week feature Dynamic Page Languages Apache Week feature Apache SSI commands NCSA tutorial PHP: a full programming language, available as CGI or Apache module NeoScript: scripting language module Meta-HTML: scripting language CGI ePerl: CGI which allows perl to be embedded into HTML Imagemaps come in several flavours: old-style NCSA cgi-bin program, new Apache imagemap module and client-side imagemaps. Using Imagemaps Apache Week feature Apache imagemap module NCSA imagemap cgi-bin program: NCSA imagemap tutorial Client-side imagemaps [RFC1980] RFC1766 Language Tags (Specification of tags to identify content language) RFC1700 IANA Assigned Numbers (IANA allocates MIME types, character set identifiers) RFC2279 UTF-8, a transformation format of ISO 10646 (An expanded character set compatible with US-ASCII) RFC2046 MIME Media Types (MIME types are used to identify content type) RFC2083 PNG Specification (A portable, lossless, compressed format for graphics) All RFCs Inclusion of a link from this document to an external site does not imply endorsement by Apache Week or Red Hat, who cannot be held responsible for the contents of the remote site. Lists of resources may not be exhaustive. ApacheCon 2002 Las Vegas Weinstein, Paul Paul Weinstein visited the Las Vegas ApacheCon in November 2002 and gives his highlights of the interesting news and events ApacheCon 2002: Day 2 Paul Weinstein visited the four day Apache conference in Las Vegas in November and gives his highlights.
The first day of the conference was taken up by tutorials; the presentations started on the second day. Some 500 miles and 19 months after the last conference on the state of the world for Apache, developers and users gathered in Las Vegas to converse again about the world's most popular web server. After a day of tutorials, Ken Coar, Apache Software Foundation member and Conference Chair, introduced this year's conference to the over 300 attendees. The conference included 60 presentations, 16 Birds of a Feather sessions, 3 keynotes, and free access to the Comdex convention floor. After a brief break, Ken Coar introduced Tim O'Reilly, Founder and President of O'Reilly and Associates, and his topic "Watching the Alpha Geeks." O'Reilly opened with a quote from Sci-Fi writer William Gibson, "The future is here, it's just not evenly distributed yet," saying that Gibson describes exactly how one can understand the ever evolving world of computer technology. O'Reilly's premise is that the evolution of technology follows a simple pattern that can be seen with the adoption and evolution of the personal computer: Hackers such as those who formed the famous Homebrew Computer Club started tinkering and developing computers for personal use as they pushed the technological envelope; These explorations evolved into businesses such as Apple and Microsoft as entrepreneurs started to make the new technology easier for ordinary users; Dominant players then emerged that integrated the new technology into a platform, such as the Wintel platform, where either barriers can be raised to keep other entrepreneurs from integrating into the new platform or a healthy ecosystem of corporations can evolve to help the new platform develop; And finally the hackers and entrepreneurs turn their attention to new areas, looking for new frontiers such as that of the Internet and its growth into a new computing platform.
O'Reilly moved on to what he sees going on now within the world of hackers and the next group of entrepreneurs, with the growing world of wireless networks, web services and the open source world. So why then have companies struggled with trying to bring the wireless world to the public or struggled to build a model around open source software? Because, according to O'Reilly, these companies are still trapped thinking in the old model of cheap hardware and proprietary software that defined the growth of the PC world. Just as companies such as IBM had to shift from their world of mainframes and other proprietary hardware, the business leaders of today need to change their point of reference in order to fare better in these new, emerging worlds. Most importantly, O'Reilly noted, the programmers who build and define these emerging technologies are designing the architecture of the next iteration of the computing world. This, O'Reilly feels, is where the world of Apache can help: by showing what models work in the evolving computer industry, that of adhering to standards, of building a small, but robust application with a modular design. In other words, what the hackers and programmers have succeeded in doing with the Apache server and related projects, and how it is done, shows exactly what can and does work in the technological world of tomorrow. The schedule of sessions about Apache on Tuesday included a talk by Mark Cox on Revealing Apache Security Secrets, Jim Jagielski's talk on Migrating to Apache 2.0, a presentation on the new Proxy module for Apache 2.0 by Graham Leggett, along with Theo Schlossnagle and George Schlossnagle who put together a session on deploying scalable network architectures. The evening ended with a welcoming reception offering food and drinks for attendees to enjoy while they socialized and viewed the exhibit floor.
ApacheCon 2002: Day 3 Paul Weinstein visited the four day Apache conference in Las Vegas in November and gave his highlights from the third day of the conference. Wednesday's late morning keynote featured John Fowler, CTO of Software for Sun, whose speech "Sun and Open Source: A Bright Future" allowed Fowler to discuss Sun's commitment to Open Standards and the Open Source community. Fowler noted that since Sun's founding over two decades ago, the use of open standards and community participation has been of major importance. Fowler believes that since the founding of Sun there has been an overall shift within the computer industry from developing and selling new technology to building solutions that implement open standards. This shift is allowing technology that might originate from competing vendors to work together, providing an overall solution a customer can use, instead of having various vendor components that might solve one problem or another, but overall don't communicate or work together. Moreover, Fowler believes that the Apache project is a perfect example of open standards at work, since the server is widely used and of such benefit because of what standards it implements and how it handles those implementations. In relation to the open source community at large, Fowler noted the major contributions Sun has made not only to Apache and related projects such as Tomcat, but also to non-Apache related projects such as the Gnome desktop and OpenOffice.org. Fowler feels that the work Sun has done with projects such as Apache has fundamentally changed how Sun operates, noting that open source communities can magnify the impact of a software project, not just in how many developers contribute or what is contributed but also in actual deployment of a project's technical solutions, because of the overall openness of the community.
A number of large and small companies shared their unique view of Apache and the open source world on the expo floor during the three days of talks. AMD and Covalent took the most advantage of the conference by announcing a co-development project that includes Red Hat to port the Apache code base from the 32-bit architecture that allows it to run on the most commonly found x86 microprocessors to the 64-bit architecture that AMD is developing for its Opteron line of processors. To help highlight John Fowler's speech, the Sun booth was dedicated to the various open source projects, both Apache and non-Apache, as well as exhibiting the versatility of its Java programming language, again in conjunction with the Apache server as well as on its own. Apple highlighted its Apple Developer Connection, which assists developers in deploying desktop and server systems based on Apple's Macintosh OS X platform. Apple of course has a number of web and network related tools available and includes the Apache Web Server by default in both the desktop and server versions of OS X. Sams Publishing and BreakPoint Books were on hand to sell Apache and other web related books to the conference attendees. The books available covered just about any subject, from basic CGI programming to Java Servlets to Apache 2.0. A few other retail vendors filled out the low key expo floor, including Daemon News, which was featuring BSD Mall and Hackerthreads.com. Wednesday, the busiest of the three days, brought Derek Ferguson's talk on Integrating Apache with Microsoft's .Net and a session on the next version of the XML parser Xerces given by Andy Clark. The afternoon sessions included George Schlossnagle's discussion about how to get the best performance from PHP, a talk by Gerald Richter on Embperl, as well as a talk by me, Paul Weinstein, on how to use and run a private certificate authority for authentication with Apache.
ApacheCon 2002: Day 4 Paul Weinstein visited the four day Apache conference in Las Vegas in November and gave his highlights from the final day of the conference. Thursday, the final day of ApacheCon, featured a keynote from Richard Thieme whose speech, "New Ways of Thinking About Security: Open Source Thinking in a Bunged-up World", picked up where Tim O'Reilly left off by reiterating the idea that open source is about more than just code, and in reality is a way of living and thinking. This open source way of thinking is at its fundamental level based on the methods of communication that are commonly used within open source projects. Thieme also noted that those who contribute to and use open source technology have become fluid individuals whose own identity is more modular and less rigid than that of past generations, primarily because of the modular, distributed communication systems that are now commonly used. Just as O'Reilly sees his 'Alpha Geeks' as the early adopters of technology, Thieme sees these early adopters of open source and the open source ethic as a new social network emerging from preexisting boundaries. Because of this, Thieme thinks that security issues from around the world need to be seen in this new distributed world view. He noted that ApacheCon was indeed about a community coming together in a physical location, but really is about sharing secrets, and how the Apache community shares its secrets, or chooses not to, can help those who are charged with building the next generation of security policies and laws. In other words, policies and laws on security, privacy and even intellectual property need to be built around these new emerging communities and boundaries, and so be beneficial, instead of enforcing old political and social boundaries that no longer make sense in the new world of modular, networked communities.
Presentations on Apache for Thursday included Greg Stein's session introducing WebDAV and Apache as well as Rob McCool's presentation on the Stanford University project to deploy machine readable content on the web. Mads Toftum's session on doing URL manipulation using mod_rewrite, Mark Wilcox's session on implementing LDAP, along with presentations on data management in Apache 2.0 by Cliff Woolley and performance tuning Apache by Thomas Wouters, helped round out the afternoon. No doubt the highlight for many of this year's ApacheCon attendees was the Closing Session, where Ken Coar raffled off a number of goodies supplied by the conference vendors including books, AMD processors and other wonderful swag. But most important to those in attendance and to the Apache community at large was the announcement that 2003 will see two ApacheCon conferences: the return of ApacheCon Europe, which will occur in the spring at a location yet to be determined, and ApacheCon US, which will return to Las Vegas in November. Overall most attendees seemed impressed with the return of ApacheCon. While the production of the event was modest compared to previous conferences, the presenters and the presentations were of the same high quality one would expect. Indeed, with so many interesting talks it was easy to find people cutting out of one presentation to hear the end of another, and this report only mentions the more typical Apache topics available for attendees. Most importantly, ApacheCon has shown that it is still The Apache Event for Apache developers and users to come together and discuss everyone's favorite web server. Photos from ApacheCon 2002 Apache 2.0.43 Released Orton, Joe Apache 2.0.43 was released on the 3rd October 2002. This release addresses recent security issues on non-Unix platforms, some minor bugs found in the 2.0.40 release, and adds some new features.
Apache 2.0.43 was released on 3rd October 2002 and is now the latest version of the Apache 2.0 server. The previous release was 2.0.42, released on the 24th September 2002. See what was new in Apache 2.0.42. Apache 2.0.43 is available in source form for compiling on Unix or Windows, for download from the main Apache site or from any mirror download site. This is a security, bug fix and minor upgrade release. Due to security issues, any sites using versions prior to Apache 2.0.43 should upgrade to Apache 2.0.43. Read more about the other security issues that affect Apache 2.0. Fix a cross-site scripting vulnerability in the default error page when using wildcard DNS. Fix the exposure of CGI source when a POST request is sent to a location where both DAV and CGI are enabled. Fix some possible overflows in ab.c which could be exploited by a malicious server. The following bugs were found in Apache 2.0.42 and have been fixed in Apache 2.0.43: The UserDir directive has been fixed to again take a list of user names to enable userdir access for, as per 1.3. Flushing behaviour has been improved to ensure that available response output is flushed when no new output is pending, helping streaming CGIs and other dynamically-generated content. mod_auth_ldap has been fixed to retry connections to the LDAP server if it becomes unavailable.
Fix for a locking problem in mod_ssl's session cache code which could cause infinite loops on some platforms. Fixes for mod_cache to prevent a segfault when attempting to cache some combinations of content (for instance, when using SSI tags which execute CGI scripts), and to correct the CacheMaxStreamingBuffer directive for virtual hosts. The default server root directory in suexec has been fixed to match the default install root. mod_proxy was fixed to not strip WWW-Authenticate headers on 4xx error responses, which prevented server authentication from being performed via the proxy. A new module, mod_logio, has been added which allows logging of the number of bytes sent and received by the server. A -p option has been added to apxs to allow programs to be compiled using this tool. Apache 2.0.40 Released Cox, Mark J Apache 2.0.40 was released on the 9th August 2002. This release addresses recent major security issues on non-Unix platforms, some minor bugs found in the 2.0.39 release, and adds some new features. Apache 2.0.40 was released on 9th August 2002 and is now the latest version of the Apache server. This is the fourth stable release of Apache 2.0, following up on 2.0.39 which was released on 18th June 2002. Read our special feature for more information about the history of Apache 2.0. Apache 2.0.40 is available in source form for compiling on Unix or Windows, for download from the main Apache site or from any mirror download site. This is a security, bug fix and minor upgrade release. Due to security issues, any sites using versions of Apache 2 on Unix prior to Apache 2.0.39 should upgrade to Apache 2.0.40. Sites using any versions of Apache 2 on other platforms should upgrade to 2.0.40.
Certain URIs will bypass security and allow users to invoke or access any file depending on the system configuration. A path-revealing exposure is present in multiview type map negotiation (such as the default error documents) where a module would report the full path of the typemapped .var file when multiple documents or no documents could be served. A path-revealing exposure in cgi/cgid when Apache fails to invoke a script. The modules would report "couldn't create child process /path-to-script/script.pl" revealing the full path of the script. The new features in this release (added since 2.0.39) are: mod_rewrite can now set cookies using the CO extension. Performance improvements for the code that reads request headers. Proxy FTP now works over IPv6. Changes to the internationalized error documents; they are no longer included by default in the sample configuration file. Add a new directive, MaxMemFree. MaxMemFree makes it possible to configure the maximum amount of memory a particular child's allocator will hold on to for reuse. This directive is useful when uncommonly large peaks occur in memory usage. Support the -w flag to keep the Win32 console open on error. Add the ability to enable or disable a filter via an environment variable. Apache on Netware will now pull requests off of the listen queue as fast as winsock will allow without latency introduced by the accept mutex. During installation Apache will preserve existing installation directories. Binaries, the build directory, the headers, and the man pages are all copied. Everything else (the config, htdocs, manual, error, icons, and cgi directories) is not installed if the directories already exist. The bugs fixed in this release include: Fix a long-standing bug in 2.0, CGI scripts were being called with relative paths instead of absolute paths.
Apache 1.3 used absolute paths for everything except for SuExec; this brings back that standard. Restore the ability to specify host names on Listen directives. Accept multiple leading /'s for requests within the DocumentRoot. Fixed a mod_include error case in which no HTTP response was sent to the client if an shtml document contained an unterminated SSI directive. Prevent infinite recursion if an ErrorDocument gets an error. Fix segfault in mod_mem_cache most frequently observed when serving the same file to multiple clients on a multi-processor machine. Various fixes to the experimental module mod_ext_filter including: Look in the main server for filter definitions when running in a vhost if the filter definition is not found in the vhost. Fix a segmentation fault if the content-type was not set, and ignore any content-type parameters when checking if the response should be filtered. Fix infinite loop due to two HTTP_IN filters being present for internally redirected requests. Fixed the Content-Length filter so that HTTP/1.0 requests to CGI scripts would not result in a truncated response. Fix proxy so that it is possible to access ftp: URLs via a proxy chain. Fix perchild to work with apachectl by adding -k support to perchild. Fix the long-standing bug in ab where ab -t10 would loop for 10000 seconds instead of 10 as documented. Also fix an off-by-one-second error. Fixed parsing of strings to longs, which allows HTTPD to deal with larger files correctly. mod_deflate now checks to make sure that 'gzip-only-text/html' is set so that BrowserMatch can be used to control the module. Add a filter_init parameter to the filter registration functions so that a filter can execute arbitrary code before the handlers are invoked. This resolves a problem where mod_include requests would incorrectly return a 304.
A problem with the keepalive enumeration caused problems when mod_dav sends error responses. Various minor fixes to the htpasswd utility. The following platform-specific changes have been made: Solved the reports of .pdf byterange failures on Win32. Support WinNT CGI invocation through ScriptInterpreterSource 'registry' for script interpreter paths and names with non-ascii characters in the executable filepath. Fix WinNT cgi 500 errors when QUERY_ARGS or other strings include extended characters (non US-ASCII) in non-utf8 format. This brings Win32 back into CGI/1.1 compliance, and leaves charset decoding up to the cgi application itself. When deciding on the default address family for listening sockets, make sure we can actually bind to an AF_INET6 socket before deciding that we should default to AF_INET6. This fixes a startup problem on certain levels of OpenUNIX. O'Reilly Open Source convention in San Diego Weinstein, Paul Paul Weinstein visited the five day O'Reilly Open Source Conference in San Diego this week and gave his highlights of the interesting news and events. O'Reilly Open Source Conference: Day 3 Paul Weinstein visited the five day O'Reilly Open Source Conference in San Diego this week and gave his highlights. The first two days of the conference were taken up by tutorials. Tim O'Reilly introduced the first Keynote speaker for this year's Open Source Conference, Lawrence Lessig, as "my favorite keynoter." Lessig, a professor of law at Stanford Law School, is a vigilant defender of freeing content from the growing limitations of copyright law within the United States.
He began by confessing that this would be his second to last keynote and therefore wanted to leave a four-part refrain with the audience: Creativity and innovation always build on the past; The past always tries to control the creativity that builds on it; Free society tries to protect the future by limiting the control of the past; Ours is less and less of a free society. History has shown that creativity and innovation always build on the past. A perfect example of this property of culture, according to Lessig, can be seen in Walt Disney's "Rip, Mix and Burn[ing]" of fairy tale classics in the Twentieth Century. Yet, at the same time the past always tries to control what can be created. Again Lessig cited Disney, or in this case the Walt Disney Corporation, which has successfully lobbied for a number of the 11 total extensions of copyright law, extending the limitations on creative works from 17 years to 95 years. Thus the Walt Disney Corporation has kept others from doing to Mickey Mouse what Disney did to the Brothers Grimm. Worse, according to Lessig, is that technology has helped in the expansion of control that the reworking of copyright law has started. A perfect example is Adobe's E-Reader, which limits the ability to cut-and-paste text, making it difficult for someone to even quote text for a research paper, something that is not only possible with print, but has been legally upheld as a "fair-use." A silver lining would be that within a free society one could take a stand against these abuses. After all, who wants to live with Hollywood's "insane rules being applied to the whole world?" The problem is that applying direct pressure for change within the United States can be difficult. As retiring US Congressman JC Watts described it, "If you're explaining you're losing." Lessig then asked, "What have you done?
How many of you have given the EFF more than you've given to the other side [for music CDs or movies on DVD?]" This last refrain of Lessig's points to exactly why members of our free society should care about the limitations of our laws and technology, "never in our history has so few people controlled so much of our culture." What needs to be done is to "Free Culture" and "Create like it's 1790" when copyrights only extended to a narrow number of years and when copyright was understood to be a limitation on businesses and not a limitation on what individuals could do in creating culture. A perfect stage was now set for the second keynoter of the morning, Richard Stallman, who Tim O'Reilly admitted to butting heads with on occasion, but who had a "very creative way to deal with the problems of today." RMS took right to telling everyone, "Unlike some of you, I am not an open source developer. I'm an activist in the free software movement." In the 1980s RMS was dealing with the death of the free community that he knew in the 70s. What choice did he have while all the operating systems were proprietary? His solution was to start the Free Software Foundation. "This was the only thing I could do," he conceded. RMS sees "a possibility of freedom" if "you make sure all of your software is free." While the strides with GNU/Linux have been great, the "job isn't done till all the software is free." But what does RMS mean when he says the software has to be free? To this he listed four conditions that have to be met: Freedom Zero is the right to be able to run the software any way you want; Freedom One is the ability to understand and change the software; Freedom Two is the ability to share the software, changes or no, with friends; Freedom Three is the ability to help build your community using the software. "Geeks like to think that they can ignore politics, you can leave politics alone, but politics won't leave you alone," RMS noted, echoing Dr.
Lessig, "we have to reject" efforts by politicians such as DRM - Digital "Rights" Management. According to RMS, DRM isn't about rights; it's about theft, theft of our freedoms. RMS then took the rest of his time to poke fun at the image that some people have about his attitude of being "holier than thou." After dressing himself in an outfit appropriate for a holy figure, RMS pronounced himself "Saint iGNUcius of the Church of Emacs" and provided a prayer to bless one's computer. One should "exorcise evil proprietary operating systems"; doing so would put one on the road to sainthood. During lunch on Wednesday Tim O'Reilly took time to ask questions of RealNetworks Chairman and CEO Rob Glaser about Real's announcement that they will be providing parts of the code for their next generation media platform, Helix, to the open source community. Glaser first reviewed the announcement for the audience: Helix is a platform for streaming media. The Helix Community has been created for work on the components for this new platform. The client application source code will be available in 90 days, with the encoder and server source code to come out at the end of 2002. Helix Universal Server, a commercial product from Real, delivers all types of media formats such as Windows Media, mp3, even Ogg Vorbis. When asked "Why Now", Glaser replied that within RealNetworks there has always been strong support for open source, but Real needed to make sure that open sourcing part of their code-base worked such that Real could still provide a value-added business on top of their base technology. Moreover, embracing the open source community helps make sure open standards such as RTP and RTSP are implemented properly.
Glaser continued by discussing the dual-licensing approach of using a GPL-inspired license called the RealNetworks Public Source License along with a Java-style license called the RealNetworks Community Source License, saying, "We studied a lot not just how to connect with the community, but also how to build a licensing model that would allow our commercial partners to build and maintain compatible applications." O'Reilly Open Source Conference: Day 4 Thursday started with two keynotes about the role of open source technology in the world of Bioinformatics. Ewan Birney, of the European Bioinformatics Institute, started by giving a crash course on how Bioinformatics is a fusion of biology, data gathering, and computer science and technology. As an example Birney noted that one of EBI's projects is to provide the Human Genome data for all to see. In doing so EBI uses a combination of open source technologies such as MySQL, Linux, Perl, Python, Apache and mod_perl. While the code developed to run the site is available under a BSD-style license, the greater result is that the 3 Gigabytes of information that details how to make a Human is open to anyone, without restriction. Jim Kent, a research scientist at University of California, Santa Cruz, continued by noting "I don't think you can have science without open source." Kent observed that the practices of science and those of the open source community are virtually the same, "People can't do [reproduce meaningful results] unless they can see your source" and peer review helps generate better science as well as better software.
O'Reilly Open Source Conference: Day 5 Friday, no doubt, was the day that made the conference for many attendees as they saw how open source can assist in the production of movies such as The Lord of the Rings trilogy, heard Bruce Sterling rant about the computer industry and watched Bruce Perens keep himself from being fined half a million dollars for breaking the DMCA - Digital Millennium Copyright Act. Milton Ngan from Weta Digital, the special effects house created by Peter Jackson, helped open the final day by discussing how open source tools are used to produce the Lord of the Rings. First, however, he entertained the audience by providing a preview of the next Lord of the Rings release, The Two Towers. In creating effects for a movie the first step for Weta is to scan in the whole film for digitization, "a process that takes two weeks," according to Ngan. The production system consists of 125 SGI machines running Irix, 200 Linux machines and 25 NT boxes. Rendering an effect completely takes around 20 hours, and the result is then played back on a handful of Macintoshes for review. Once finished it takes another 2 weeks to transfer back to film. The open source tools Weta uses include Perl and MySQL for data storage and manipulation. Ngan also noted "Apache and PHP are used for running [Weta's] Intranet." Using open source tools in such a rugged environment "pushes the boundaries, which helps solidify the tools." Weta Digital indeed tries to give back to the open source community when possible, but Ngan noted that there is little sharing of tools within the Computer Graphics Imagery industry, "everyone has created their own solution." Moreover, while Weta does own the tools it created and New Line Cinema owns the images created by those tools, the focus and dedication of resources is in the post-production work for Lord of the Rings.
If Weta Digital is not selected for any other production work it will simply cease to exist, thus limiting the resources available to prepare their code for release to the community. Bruce Sterling started his talk on "A Contrarian Position on Open Source" by conceding that he was the token novelist, a non-programmer, talking to programmers about how to program, something akin to "a non-miner going down a mine and asking, 'Why don't you take some time to plant something down here and brighten the place up?'" Sterling took an opposing view to the "Cathedral and the Bazaar" metaphor, which contrasts the open source methodology or "bazaar" with the commercial "closed-source cathedral." "It's not really about a bazaar. Open Source is about hanging out with the cool guys - very tribal and very fraternal." Which means the price for using open source software such as Linux is "having to spend time with Linux Geeks." In fact if open source technology is analogous to anything it's "just like in a refugee camp, one puts in a long amount of time for nothing." But then again, what is the alternative? Foreshadowing Bruce Perens' talk, Sterling observed that a computer running Microsoft Windows is more akin to an airport. There are "men with automatic weapons, surveillance cameras all over the place. You can't sob as you kiss your mother goodbye at the airport, because it's all on videotape. Then a security check, assumes you've swallowed dynamite and will kill any one you see. All the while attendants ask you snidely 'Where do you want to go today?' As if they're doing you some sort of favor." The real problem is that "the computer industry wants to be hot and sexy." 'Information wants to be Free' or 'Information is the Economy' are slogans heard all the time. Yet this isn't what computers are about, freeing information or making money. "Computers are about relationships," they are an enabling technology, not an end unto themselves.
In the days before Bruce Perens, who currently works as Senior Strategist and Evangelist for Linux and open source software at Hewlett-Packard, was scheduled to talk, he started making the news with his plan to violate the DMCA by describing how to work around DVD player controls. Since the DMCA prohibits making information available on how to circumvent copyright controls, HP asked Perens not to open himself and HP to litigation. "I care more about this than getting myself fired," Perens stated, "but the fact is that getting myself fired today would hurt Hewlett-Packard's Linux program." With the disclaimer that the talk he was about to present was his own personal opinion and not that of HP, Perens voiced some of the problems he sees in the computer industry. His desire to discuss how to work around DVD controls, such as the 'Zone Coding' constraint systems that limit the geographical region in which a DVD can be viewed, was designed to highlight how the DMCA "has no exception for fair use" and removes the personal choice of allowing someone to "purchase a DVD in England on vacation and watch it at home in America." Perens continued by stating his concerns with Microsoft's Palladium initiative, which "is built on the assumption that the computer user can't be trusted, thus your own computer must prevent you from doing harm" and could be the "end of open computing." After all, how can one run a system akin to Linux when a "chip on the motherboard mediates your access to information" and "all digital content is encrypted for mediation by the chip"? People may not even be able to print out information from a web page for use away from their computer without paying a fee. The "unpleasant sociopolitical implications are that this Supply-Side Thinking that dominates politics today devalues the customer, citizen, individual." Perens then picked up the common theme from those before him. "What Can You Do?" his presentation slide asked.
Since "policy affects all of us and since we as individuals don't get the choice of voting with our wallets," we need to make our voices heard the 'old-fashioned way'. "Become pen pals with your politician - use paper not email, vote" and, probably most importantly, "talk about this to the people around you."

Apache 1.3.26 Released (Cox, Mark J)

Apache 1.3.26 was released on the 18th June 2002. This release addresses a recent security issue, some minor bugs found in the 1.3.24 release, and adds some new features. Apache 1.3.26 was released on 18th June 2002 and is now the latest version of the Apache 1.3 server. The previous release was 1.3.24, released on the 22nd March 2002. See what was new in Apache 1.3.24. Apache 1.3.25 was never released. Apache 1.3.26 is available in source form for compiling on Unix or Windows, for download from the main Apache site or from any mirror download site. This is a security, bug fix and minor upgrade release. Due to security issues, any sites using versions prior to Apache 1.3.26 should upgrade to Apache 1.3.26. Read more about the other security issues that affect Apache 1.3. Fix the chunked encoding security vulnerability. The main new features in 1.3.26 (compared to 1.3.24) are: Add text/xml, application/xhtml+xml, audio/mpeg, and video/quicktime mime types to the mime types magic file. Added a -F flag which causes the supervisor process to no longer fork down and detach and instead stay attached to the tty. This allows integration with daemontools. The following bugs were found in Apache 1.3.24 and have been fixed in Apache 1.3.26: Allow child processes sufficient time for cleanups by making ap_select in reclaim_child_processes more "resistant" to signal interrupts.
In Darwin, place dynamically loaded Apache extensions' public symbols into the global symbol table. This allows dynamically loaded PHP extensions. Fix for a problem in mod_rewrite which would lead to 400 Bad Request responses for rewriting rules which resulted in a local path. Note: This will also reject invalid requests as issued by Netscape-4.x Roaming Profiles (on a DAV-enabled server). Recognize platform-specific root directories (other than leading slash) in mod_rewrite for filename rewrite rules. Disallow anything but whitespace on the request line after the HTTP/x.y protocol string to prevent arbitrary user input from ending up in the access_log and error_log. Also control characters are now escaped. A large number of fixes in mod_proxy including: adding support for dechunking chunked responses, correcting a timeout problem which would force long or slow POST requests to close after 300 seconds, adding "X-Forwarded" headers, dealing correctly with the multiple-cookie header bug, ability to handle unexpected 100-continue responses sent during PUT or POST commands, and a change to tighten up the Server header overwrite bug-fix.

Apache 1.3.24 Released (Cox, Mark J)

Apache 1.3.24 was released on the 22nd March 2002. This release addresses a security flaw on Windows, some minor bugs found in the 1.3.23 release, and adds some new features. Apache 1.3.24 was released on 22nd March 2002 and is now the latest version of the Apache server. The previous release was 1.3.23, released on the 24th January 2002. See what was new in Apache 1.3.23. Apache 1.3.24 is available in source form for compiling on Unix or Windows, for download from the main Apache site or from any mirror download site. This is a security, bug fix and minor upgrade release, with a few new features.
Users should upgrade if they are running on Windows, will be affected by the particular bugs mentioned below, or would like to use any of the new features. Due to security issues, any sites using versions prior to Apache 1.3.22 should upgrade to at least Apache 1.3.22. Read more about all the security issues that affect Apache 1.3. Apache for Win32 before 1.3.24 allows remote attackers to execute arbitrary commands via parameters passed to batch file CGI scripts. More details in Apache Week issue 288. The problem occurs because the input is not properly validated; it is possible to append commands as parameters to the batch file CGI script and have the shell interpreter execute them. The characters % and \r have been added to the dangerous Win32/OS2 characters list, and the command line is now passed to the interpreter double quoted. In addition, Apache now introduces earlier identification of command.com vs cmd.exe, and treats command.com as a 16-bit application. As additional protection in case future CGI argument vulnerabilities are discovered, a new directive CgiCommandArgs off has been added to allow administrators to completely disable the query argument passing mechanism in Apache. A bug was found that could cause invalid hostnames to appear in Apache log files. If a double-reverse lookup was performed (for example for Allow from .example.com) but failed, then a spoofed dns-reverse-address could appear in the logs. Note this bug doesn't give any access to protected resources; it only affects what gets written to the log file. The main new features in 1.3.24 (compared to 1.3.23) are: Add IgnoreCase keyword to the IndexOptions directive to allow filename listings to ignore case. The proxy code read chunks from the backend server in a hardcoded amount of 8192 bytes.
A new directive ProxyIOBufferSize has been added to specify the size of the read buffer from the remote server. Previously the proxy would wait until the response had been delivered to the client completely before closing the backend connection. Now the backend connection is closed as soon as the last byte is read from it, freeing up resources. mod_alias writes a warning to the error log if it fixes up an incomplete redirection target (such as turning /foo into http://host/foo). Since this is a supported operation the message has been demoted so that it will only show up at LogLevel Debug. When using mod_proxy to access FTP sites it was impossible to reach a higher directory than the logged in directory, as combinations of /../ are interpreted by the browser and not sent to the server. This problem affects other proxies as well. The Squid proxy uses a "Squid %2f hack" which has been adapted to work in Apache. By prepending /%2f to the path of your request, you can make the proxy change the FTP starting directory to / instead of starting at the home directory for the logged in user. The main new features that apply to specific platforms are: Provide new logging to help Win32 users debug CGI scripts. When at LogLevel info the cgi command invoked is logged. When at LogLevel debug the environment variables are also logged. Added a logging module for NetWare, mod_log_nw, as NetWare is unable to use the RotateLog utility. Added a -e command line directive for NetWare to force all fatal configuration file errors to the logger screen. This allows Apache to shut down cleanly and completely on an error condition. The following bugs were found in Apache 1.3.23 and have been fixed in Apache 1.3.24: Fix a segfault condition in mod_include which could be triggered by improper termination of conditional directives such as #if. Fix a problem in mod_proxy where the Server header from the backend system would be replaced by one from Apache. This violated RFC2616.
This fix has introduced a further issue which allows modules to override the Server header, but this will be fixed in the next release. There is a problem in mod_proxy where each entry of a duplicated header such as Set-Cookie would overwrite the previous value of the header, resulting in multiple header values (like cookies) going missing. A fix was committed to 1.3.24 but doesn't fix the problem. Fixes to apxs to allow the -S option to contain quotes, and to rebuild apxs when options have been changed. The Location response header, used for external redirects, must be an absolute URI. The Redirect directive tested for that, but RedirectMatch did not and would allow almost anything through. Fix a longstanding bug that errors returned by src/Configure would not be noticed by the top level configure script. That was bad for automated production environments, as errors would pass through unnoticed. mod_proxy would send an HTTP/1.0 request even though it is now compliant with HTTP/1.1. A number of other changes have been made to FTP handling in mod_proxy including properly escaping file names from directory listings, a cleanup to the output HTML, the output of directory listings in ASCII to avoid issues with EBCDIC servers, and the closing of the data and control channels to the server properly. Previous fixes to mod_rewrite in Apache 1.3.23 broke the ability to do random balancing. The following bugs relate to specific platforms: The Win32 port has had the remaining cases of blocking network IO eliminated. A change has been made on TPF to make the ap_open_logs call the same as other platforms and prevent a possible SIGPIPE in standalone_main. Work around a bug in Windows XP that caused data corruption on writes to the network. The support for enabling pthreads-based accept() serialization using the AcceptMutex configuration directive suffered from a serious problem on Solaris platforms as the pthreads library was not being linked into the httpd executable.
This meant stub versions of the mutex functions were used from the C library, which resulted in no serialization being enforced.

Apache 1.3.23 Released (Cox, Mark J)

Apache 1.3.23 was released on the 24th January 2002. This release addresses some minor bugs found in the 1.3.22 release, and adds some new features, including HTTP/1.1 support for mod_proxy. Apache 1.3.23 was released on 24th January 2002 and is now the latest version of the Apache server. The previous release was 1.3.22, released on the 12th October 2001. See what was new in Apache 1.3.22. Apache 1.3.23 is available in source form for compiling on Unix or Windows, for download from the main Apache site or from any mirror download site. This is a bug fix and minor upgrade release, with a few new features. Users should upgrade if they will be affected by the particular bugs mentioned below, or would like to use any of the new features. Due to security issues, any sites using versions prior to Apache 1.3.22 should upgrade to at least Apache 1.3.22. Read more about security issues that affect Apache 1.3. The main new features in 1.3.23 (compared to 1.3.22) are: HTTP/1.1 support has been added to mod_proxy after being backported from the Apache 2.0 updates started last April. The updates include support for Cache-Control, content negotiation using Vary, persistent connection handling, and much more. A new directive, FileETag, allows the format of the ETag to be controlled via runtime directives. Find out more about this new feature.
Addition of a 'filter callback' function to enable modules to intercept the output byte stream for dynamic page caching. The following bugs were found in Apache 1.3.22 and have been fixed in Apache 1.3.23: Fix incorrect Content-Length header in 416, "Range Not Satisfiable" responses. Revert mod_negotiation handling of path_info and query_args to the 1.3.20 behavior. Prevent an Apache module from being loaded or added twice due to duplicate LoadModule or AddModule directives. Add run-time validation of the Group directive, to catch invalid but syntactically correct values. The following bugs relate to specific platforms: Versions of FreeBSD from August 2000 include a feature called "accept filters" which delay the return from accept() until a condition has been met. Apache will now use the "httpready" accept filter rather than "dataready" on FreeBSD after 4.1.1-RELEASE where it works correctly. More details of accept filters are available. Some fixes for NetWare including link problems with mod_vhost_alias, file locking updates to get mod_auth_dbm to work, and a problem where accessing an empty directory which has Options Indexes specified produced an access forbidden message. On HPUX 11, an ENOBUFS ("No buffer space available") error occurs when an accept() cannot complete. This error is now ignored so that child processes don't get incorrectly terminated. Win32 platforms would incorrectly always return forbidden in response to an OPTIONS * request. Unixware 7.0 and later did not have a default locking mechanism defined. This bug was introduced in Apache 1.3.4. A number of fixes for Cygwin including a better default mutex as well as better proxy and DBM support. A bug on Win32 could cause Apache to stop responding to requests for a period of time if the MaxRequestsPerChild directive was set to anything other than 0. MaxRequestsPerChild of 0 is the recommended setting. Win32 will now output an error message if the server hits the ThreadsPerChild limit.
This is useful for administrators to detect when their server is running out of threads to handle requests.

Featured Articles 2001 (Tsan, Min Min)

Our selection of the best featured articles from our weekly newsletters. Read about everything from "Apache and Tomcat" to "Apache Security". Each week Apache Week brings you our pick of the best Apache related articles from around the web. In this special feature we select our favourites from each category. The Developer Shed kicks off the new "Getting More Out Of Apache" series with virtual hosts and Server-Side Includes. In part 2 of "Getting More Out Of Apache", the Developer Shed shows you how to implement basic user authentication and set up access control groups. It also talks about Apache logging capabilities and the powerful URL rewriting module. "Setting up Apache with mySQL, Frontpage 2000 Extensions, and PHP NHF" is a Newbieized Help File (NHF) written by Dallas Engelken for newbies to get Apache up and running with Frontpage support in no time at all. In "Linux for Newbies, part 22", Gene Wilburn stresses the benefits of compiling Apache and any related modules by hand. Instructions are given for removing existing Apache and PHP from one's system before compiling them again from source. By doing this, users control how the packages are built and choose the locations for the various parts. If you prefer to build Apache from source manually, you may be interested in Apacompile, which is basically a set of instructions and examples for compiling Apache and other common modules such as mod_ssl, mod_auth_ldap and mod_php. There are still some configuration samples yet to be completed. For those using Mac OS, here's a straightforward step-by-step tutorial on building Apache 1.3.22 and PHP 4.0 for Mac OS X 10.1. However, the instructions don't include integrating mod_perl or mod_ssl.
The Developer Shed presents step-by-step instructions for building Apache, MySQL, WebDAV and PHP on Mac OS X. All these programs compile and run on Mac OS X due to its BSD-based UNIX core known as Darwin. To avoid confusion, note that the Apache Web server built is not enhanced with mod_ssl. Noel Davis looks at how to overcome an Apache on Mac OS X security issue which only involves those who store files on Mac OS X's HFS+ file system. Three workarounds are available for this problem. Kevin Hemenway unravels the mystery of the built-in Apache web server that comes with Mac OS X in his first article of a new series about serving web pages from a Mac. You'll learn how to start up Apache, access your personal home page, locate Apache's DocumentRoot, and customise the default web page. This is just the appetiser - there is more to come in the next installment when Kevin gets down to the crux of maintaining a full-fledged web site. How does Apache on Windows NT compare to Apache on UNIX or to other web servers such as IIS? Apache Today has the answer. Windows users who are interested in using Apache but are discouraged by the apparent lack of online information about this topic may like to check this out. "A Feather in Your NT Cap" persuades users running Microsoft's Internet Information Server (IIS) on Windows NT to migrate to Apache on NT. It lists the three limitations of Apache's ISAPI implementation, describes two main ways of installation, gives an overview of the configuration, and shows you how to start Apache as an NT service. At WebTechniques.com, Jim Jagielski has a few tips for those who are providing web-hosting services in "Customer Number One". He looks at two methods for Apache that give every customer dedicated server performance and quality guarantees in a shared server environment, as if he or she is the only customer.
The first uses mod_throttle to control various parameters, such as the number of requests or the total bandwidth used on a per server, virtual host, location, directory or user basis. The second allows CGI scripts to execute under their own user and group ID using suExec. He also discusses the pros and cons of running multiple instances of Apache simultaneously. "Save Your Site from Spambots" teaches you how to use mod_rewrite to redirect "spambots", software packages that crawl the Web harvesting e-mail addresses and adding them to bulk e-mail lists, to a specific page that has "special" messages just for them. Since this method uses the content of the User-Agent: HTTP header to identify the "spambots", it won't prevent "spambots" that masquerade as other browsers from scraping e-mail addresses from your web site. Other solutions are presented as well and the one recommended is "spamtraps" - special addresses that are solely used for catching spammers. The author concludes that the best way to combat unwanted bulk e-mail is to immediately report spam to the ISP from which it originates, as many times as it takes until the ISP takes the necessary actions. The administrators at evolt.org are "Using Apache to stop bad robots". In a short article they show how they capture robots that not only ignore the robots.txt file, but deliberately try to index files they are told not to. Morbus Iff develops a "Search Engine Friendly SSI Image Gallery" in his article on evolt.org. The article shows how to create a dynamic image gallery, using only the features built into a core distribution of Apache. WebmasterBase.com looks at the pros and cons of three methods of passing information to your web pages without the use of a query string so that your web site has search engine-friendly URLs.
The methods are the implementation of PATH_INFO, .htaccess error pages, and the ForceType directive, and have been tested using PHP with Apache on Linux but they should also work on other platforms. Information Security Magazine presents an article on improving Apache and a case study on companies that swear by (not at) Apache in its April issue. It starts off by refuting the mindset that running Apache guarantees security, although it readily admits that Apache deserves its reputation for being a secure Web server. Then it provides the steps for installing Apache and mod_ssl, securing the underlying Linux server, and testing Web applications for vulnerabilities. Sys Admin magazine presents Apache::Motd, an Apache module based on the "Message Of The Day" utility found on UNIX systems. It intercepts a user's initial request and displays the contents of the motd file before serving the requested page. Carlos Ramirez, its creator, walks us through the installation and configuration process. Linux Gazette provides three different options to redirect a request to another virtual host running on the same webserver. If you want to distinguish yourself from the boys, the solution is to use mod_rewrite under a Virtual Host container. It also shows you how to achieve the same results using a Perl script or the Redirect directive. "Apache CodeRed Countermeasures with PHP: codeRedKiller!" provides a way to prevent Code Red requests from reaching your Apache Web server by using PHP and bash. Basically it uses a PHP script to record the source IP address of the request and then runs a shell script to set up a filter in your firewall to block any further requests from the same source. You could use a simple shell script to parse your Apache error log to obtain the source IP address instead of using PHP. This article also advises you to ensure that the source IP address is not spoofed.
The drawback is that all other valid requests from the source IP address will be blocked from reaching your web server until you remove the filter. Fancy a role in Episode 2, Attack of the Code Red 2 Worm? No, this is not a new B-grade movie but how you can be a good internet citizen and let people know that their server has been infected by the Worm. One way is by using Apache::CodeRed written by Reuven M. Lerner. In this article, he explains how the module intercepts requests for /default.ida, determines the host name of the HTTP client, sends only one warning e-mail message in a 24-hour period to SecurityFocus and the administrator of that client, and keeps a list of IP addresses to be ignored. Interested in setting up your own Net radio stations? Start then by reading this introduction to mod_mp3, a module that optimises the Apache Web server for streaming MP3s. Although mod_mp3 is still in its infancy, it already supports file-sharing and all the basic webcasting functions, with many more ambitious features in the pipeline. Chris Bush explains the basics of Tomcat configuration and includes instructions for integrating Tomcat with Apache in "Linux as an Application Server - The Tomcat Way". A good read for those interested in supporting Java Servlet 2.2 and JSP 1.1 with Apache Web Server. "JSP Quick-Start Guide" has been updated recently for use with Apache 1.3.22, Tomcat 4.0.1, and mod_webapp, which is the new Apache connector module for Tomcat 4.x. This step-by-step tutorial shows you how to set up and run a JSP-enabled server under Windows. By the end of this, you'll have a basic JSP page working smoothly. This week, it's Apache and Tomcat again as Robert Eksten shows us how to set up Tomcat as an Apache add-on using mod_jk instead of mod_jserv. It is relatively simple as it only installs prebuilt components and the steps do not involve compiling source code.
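The Apache::CodeRed behaviour summarised above (match requests for /default.ida, skip ignored addresses, warn at most once per client in 24 hours) can be sketched in a few lines. This is a rough Python illustration of that logic only, not the Perl module's actual code; the class name, method name and data structures are hypothetical, and sending the e-mail itself is left out.

```python
import time

CODE_RED_PATH = "/default.ida"   # request path probed by the worm, per the article
NOTIFY_INTERVAL = 24 * 60 * 60   # at most one warning per client per 24 hours


class CodeRedTracker:
    """Hypothetical helper: decide whether a request should trigger a warning."""

    def __init__(self, ignored_ips=()):
        self.ignored = set(ignored_ips)  # addresses that never get a warning
        self.last_notified = {}          # ip -> timestamp of the last warning

    def should_notify(self, ip, path, now=None):
        now = time.time() if now is None else now
        if not path.startswith(CODE_RED_PATH):
            return False                 # not a Code Red probe
        if ip in self.ignored:
            return False                 # explicitly ignored client
        last = self.last_notified.get(ip)
        if last is not None and now - last < NOTIFY_INTERVAL:
            return False                 # already warned within the last 24 hours
        self.last_notified[ip] = now
        return True
```

A caller would invoke `should_notify(ip, path)` for each request and send the warning message only when it returns True. Keeping the timestamp per address, rather than a single global timer, is what lets each infected client receive its own warning.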
In "The Apache XML Project: How To Get Read All Over", Software Development magazine walks you through a project that uses Java, Jakarta Tomcat and Cocoon to serve XML documents. Lawrence Teo explains how to set up a web-based archive for a mailing list in Issue 72 of Linux Gazette. He uses Apache as the web server, Hypermail to convert the e-mail messages stored in a UNIX mailbox file to a set of cross-referenced HTML files, and cron to update the web-based archive periodically. He assumes that those three components have been installed on your system, so only the instructions on how to configure them are provided. At LinuxWorld.com, Joshua Drake gives a guide on "How to save an Apache log file in a PostgreSQL database". The article gives a step by step guide to using the pgLOGd program with Apache. Introduction to WML, Apache, and PHP is a good starting point for developing PHP-enhanced WML applications on the Apache Web Server. Instructions are given on configuring Apache to accept and serve WML enabled decks. By the end of this, you will have your first 'simple' wireless page. PHPBuilder takes a look at "using Webalizer to analyze Apache logs". Webalizer is a freely available log analysis tool written in C that is designed for speed; even on a modest machine it can handle tens of thousands of log lines a second. However it can be tricky to get Webalizer installed, so this article takes you step by step through how to get it installed and running. "You Can Get There from Here" part 1 and part 2 show you how to install, configure, and use Squirrelmail on your PHP4 enabled Apache web server. For better security, you can run Squirrelmail on an SSL-enabled Apache web server or implement Apache's basic authentication. "You Can Get There from Here, Part 5" shows you how to install, configure, and use Rolodap on your PHP4 enabled Apache web server. You need to compile PHP4 with LDAP support for this.
In case you hadn't guessed it from the name, Rolodap is an electronic version of the traditional desktop rotary file of cards, usually used for registering contact information. John Lim presents his compilation of 22 tips on "Tuning Apache and PHP for Speed on Unix" in PHP Everywhere. The tips can be applied to Perl and Python too. In "Tuning Your Apache Web Server", Don MacVittie shows us how to configure the directives in the httpd.conf file to achieve maximum performance. Users have to ensure that their hardware can support the volume of connections they are aiming for before starting with the optimisation. As there are no hard and fast rules for tweaking the settings, the best configuration is obtained by trial and error - benchmarking the server after changing the directives each time. Ibrahim F. Haddad explains the results he got from testing the performance of three open-source web servers: Apache, Jigsaw and Tomcat on his experimental Linux cluster platform. He performs four types of tests, each with a different server and on 1, 2, 4, 6, 8, 10, and 12 CPU systems, but only presents three comparison cases in this report: Apache 1.3.14 vs. Apache 2.08a on one CPU, Apache 1.3.14 vs. Apache 2.08a on eight CPUs, and Jigsaw 2.0.1 vs. Tomcat 3.1 on one CPU. His conclusion is that Apache is considerably faster and more stable than the other web servers. Are your Web servers up to the strain of real-world usage? "HTTP Benchmarking" describes a sample benchmarking setup and shows you how to use httperf and Autobench to stress-test your systems. Joe "Zonker" Brockmeier walks you through the process of setting up and running a few benchmark tests against Apache using autobench and httperf in "HTTP Benchmarking, Part 2". The tests are performed on both the Debian x86 and SPARC distributions but will apply to any UNIX-based OS running Apache.
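The trial-and-error approach mentioned above (change a directive, benchmark, compare) comes down to timing a fixed number of requests and comparing the summaries between runs. The following is a minimal Python sketch of such a timing harness. It is not how httperf or Autobench work internally: it issues requests one after another rather than generating concurrent load, and the function name and the `send_request` callable are assumptions made for illustration.

```python
import time


def benchmark(send_request, total_requests):
    """Call send_request() total_requests times and summarise the timings."""
    durations = []
    errors = 0
    started = time.perf_counter()
    for _ in range(total_requests):
        t0 = time.perf_counter()
        try:
            send_request()           # e.g. fetch one URL from the server under test
        except Exception:
            errors += 1              # count failures instead of aborting the run
        durations.append(time.perf_counter() - t0)
    elapsed = time.perf_counter() - started
    return {
        "requests": total_requests,
        "errors": errors,
        "requests_per_second": total_requests / elapsed if elapsed > 0 else float("inf"),
        "mean_seconds": sum(durations) / len(durations) if durations else 0.0,
        "max_seconds": max(durations, default=0.0),
    }
```

In practice `send_request` could wrap a call such as `urllib.request.urlopen(url).read()` against a test server; running the harness before and after each configuration change gives comparable numbers, provided the load and the test machine stay the same.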
In "HTTP Benchmarking, Part 3: Tips and Tweaks", Joe "Zonker" Brockmeier shows you how to tweak the Apache Web server to improve performance. Although he focuses on Linux systems, some of the tips can be applied on other systems as well. In "Performance Tuning by Tweaking Apache Configuration", Stas Bekman demonstrates how to fine-tune the MinSpareServers, MaxSpareServers, StartServers, MaxClients, and MaxRequestsPerChild directives to maximise the usage of your system resources and to ensure good performance. He uses the ApacheBench (ab) utility to benchmark the Apache Web server with around ten different combinations of parameter settings in the tweaking process. Jeffrey Carl gives a few tips on handy tools to use when troubleshooting server problems in "The Web Server First Aid Kit". Its approach can be applied to most Unix and Linux systems but it occasionally refers specifically to the Apache Web Server. Some of the problems it tackles are: figuring out the cause of slow response from a server, unauthorized entry, and network misconfiguration. eWEEK Labs' latest Web server benchmark tests show that Apache 1.3.19 running on Linux displayed a huge 2.5-fold speedup in just two years of development time. Sys Admin magazine describes how to build an affordable load balancing cluster using the Apache HTTP server and the Apache JServ Java application server. It also provides some interesting benchmark test results. Last November (Apache Week issue 224), we mentioned that APR (Apache Portable Run-time) had been spun off into a separate project. In "Aid From APR", Ryan Bloom explains its advantages and illustrates his point by comparing a segment of APR code with the native code. In CNet Builder.com, it's Ryan Bloom again as he talks about how Apache 2.0 is more than a web server as it has the potential to serve any protocol. He reveals the benefits of using a single server for multiple protocols and the way to implement it using Apache 2.0.
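For the MaxClients tuning mentioned in the Stas Bekman summary above, a common rule of thumb (an assumption here, not something stated in the summary) is to keep MaxClients at or below the memory set aside for Apache divided by the size of the largest child process, so that the server never has to swap. A small Python sketch of that arithmetic, with a hypothetical function name and example figures:

```python
def estimate_max_clients(ram_for_apache_mb, per_child_mb):
    """Rule-of-thumb ceiling for MaxClients: available RAM / size of one child.

    Both figures are inputs you measure yourself; the values used in the
    example below are made up for illustration.
    """
    if ram_for_apache_mb <= 0 or per_child_mb <= 0:
        raise ValueError("memory figures must be positive")
    # Round down so the estimate never exceeds the memory budget,
    # but always allow at least one child.
    return max(1, int(ram_for_apache_mb // per_child_mb))


# Example: 400 MB reserved for Apache, children of about 10 MB each.
print(estimate_max_clients(400, 10))
```

The result is only a starting point; benchmarking with the candidate value is still needed to confirm it.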
Ryan Bloom kicks off a new series of columns about Apache 2.0 for O'Reilly Network readers with his first column - "Installing Apache 2.0". This piece proves to be merely a rehash of his previous Apache 2.0 articles except for a mention of mod_tls. In "Migrating from Apache 1.3 to Apache 2.0", Ryan Bloom shares his experience of porting the apache.org web server to Apache 2.0 with O'Reilly ONLamp.com's readers. He gives some tips on which Multiprocessing Module (MPM) to use, implementing filters, and how to solve the problem of IPv6 support. O'Reilly ONLamp.com brings you the latest information about filters for Apache 2.0 in Ryan Bloom's column. This article is just an introduction to the subject: it covers some of the basic concepts of filtered I/O, which is the ability for one module to modify the output of an earlier module, lists three standard filters included in the basic Apache distribution, and explains what filter types are. Meanwhile, "Writing Apache 2.0 Output Filters" gives enough information for a developer to be able to write an output filter from scratch. According to Ryan, developers have improved the interface over the past few releases so that the complex task of writing filters becomes easier. Moving on from output filters, Ryan Bloom explains how to write input filters in his latest article in the Apache 2.0 series. He highlights three differences between input and output filters, covers the ap_get_brigade function, and walks readers through an example input filter in detail. After reading this, you can start writing your own input filters. In Ryan Bloom's swan song for the Apache 2.0 Basics series, he talks about one of the least publicised new features in Apache 2.0, which is allowing one module to call into another module to execute an operation. In Apache 1.3, for two modules to execute the same operation, the feature has to be implemented in both of the modules, making synchronisation of changes a tedious task.
He uses the mod_include and mod_cgi modules to illustrate his points. In "Apache 2.0: The Internals of the New, Improved A PatCHy", Ibrahim F. Haddad gives an overview of Apache 2.0 and shares with us the results of his Apache 2.0.8 performance tests. In conclusion, he highly recommends that current Apache 1.3.x users upgrade to Apache 2.0 once the release version is available. Please refer to "Apache Portable Runtime Project" and multiprocessing modules (MPMs) if you require more information about these two subjects. "Learning PHP: The What's and the Why's" is the first article in a new series that aspires to teach everything about PHP, from the basics of PHP to advanced subjects such as databases and XML support. This introductory piece briefs us on what PHP is, its history, and the reasons for choosing it over other languages. Take a trip down memory lane with Rasmus Lerdorf, creator of PHP, as he guides us through PHP's origin, usage, syntax, and features in "Scripting the Web with PHP". It provides a good overview of all that PHP has to offer with simple examples that illustrate the concepts clearly. The topics covered are the four different PHP tag styles, ways to install PHP, and how PHP handles variables and errors, manipulates strings, connects to relational databases, generates content in formats other than HTML, and manages sessions. He advises that the best way to learn PHP is to use it. While PHP is easy to learn, it is another story when it comes to getting it right. In his three-part article series, Sterling Hughes imparts some advice on how to prevent 21 common mistakes made by PHP programmers. It is worthwhile to read through the list of textbook, serious, and deadly mistakes, and give yourself a pat on the back if you have managed to avoid all of them. "Best Practices: PHP Coding Style" stresses the importance of having a coding standard and sheds some light on the PHP PEAR Project.
Find out more about mod_perl in the first of a series of updated articles by Stas Bekman. "Why mod_perl?" aims to entice you to give it a try by revealing mod_perl's popularity and presenting a few well-known sites that are powered by it. Now that you're hooked, you'll be glad to know that it only takes 30 minutes to get started with mod_perl and here's how to do it. Take23 shows us how to use Apache::PortCorrect (a Perl module) to redirect users from a nonsecure port over to a secure SSL port based on the URL that they are trying to access. This article is for those who are more at home using mod_perl with the Apache Web Server and mod_ssl than setting up a set of mod_rewrite rules to perform the same task. Stas Bekman talks about improving mod_perl performance. He starts off with choosing the right operating system and hardware in part I, compares various benchmarking tools in part II, and now, in part III, continues with code profiling and memory measurement techniques. In "Improving mod_perl Driven Site's Performance - Part IV", Stas Bekman delves into the benefits of using shared memory, and calculates the size of a process' shared memory and the real memory used. Stas Bekman continues with other techniques for saving even more memory in "Improving mod_perl Driven Site's Performance". It does pay to be frugal. In Apache Today, "Improving mod_perl Driven Site's Performance Part VI" is haunted by zombies and ghosts. Of course Stas is referring to "orphan" processes as he explains in technical terms why it is bad to fork subprocesses from mod_perl. The administrator at cgisecurity.com looks at some common fingerprints used in port 80 exploits with a few examples of how each attack signature may be implemented. It covers common malicious requests, commands which may be executed by worms, files which may be requested by attackers, buffer overflows, and hex encoding.
Although it is not meant to be an exhaustive list, it is sufficient to help web server administrators identify attack patterns in their logs, and to add the appropriate rules to their Intrusion Detection Systems (IDS). In "Freeware Security Web Tools", Gary Bahadur talks about a few freeware Linux tools that can be used to perform footprint and vulnerability analysis, the first two phases of a web server security assessment. Among the tools mentioned are Nmap, Netcat (nc), Whisker, Cgichk.pl (a Perl-based scanner), Malice (also a Perl-based scanner), and Md-webscan. In "Safer CGI Scripting", Charles Walker and Larry Bennett cover methods to fix various CGI script vulnerabilities and touch on developing a CGI security strategy. Although the examples are written in Perl and C, they can also be applied to the scripting language of your choice. In PHP DevCenter, Darrell Brogdon looks at security issues relating to PHP when running PHP as either an Apache module or a CGI binary, and the ways to remedy them. PHP, a server-side HTML-embedded scripting language, offers web developers the convenience of generating dynamic page content, and supports a wide range of databases, but PHP programs are vulnerable to security compromises if they are poorly written. "On the Security of PHP, Part 1" aims to minimise this risk by offering some guidelines on secure PHP programming practices. It begins with an overview of PHP, and then examines some of the most common security issues with PHP programs. "On the Security of PHP, Part 2" wraps up this two-parter by showing us how to secure PHP scripts with a combination of safe programming practices and PHP settings. It talks about how to use PHP safe mode, how to avoid the risks posed by files with a .inc extension, how to filter user input, and how to prevent scripts from changing PHP configuration options.
"Avoiding security holes when developing an application - Part 6: CGI scripts" explores a few examples of poorly written Perl scripts which are vulnerable to security compromises. Before delving into the code, it gives an overview of how a web server works and explains server-side includes (SSIs) for Apache. Perl developers are advised to use the "warning" option and the "taint mode" option, and to specify "use strict" at the beginning of their Perl scripts. In the wake of the Code Red worm, Joe "Zonker" Brockmeier warns Unix and Linux administrators running the Apache Web Server not to let their guard down in this tongue-in-cheek but apt piece entitled "Thinking about Security". I'm sure many of you will find his advice on how to stop your boss from embarrassing himself useful. Apache 1.3.22 Released Cox, Mark J Apache 1.3.22 was released on the 12th October 2001. This release addresses some security flaws, fixes minor bugs found in the 1.3.20 release, and adds some minor new features. Version 1.3.21 was not released. Apache 1.3.22 was released on 12th October 2001 and is now the latest version of the Apache server. The previous release was 1.3.20, released on the 22nd May 2001. Version 1.3.21 was never released. See what was new in Apache 1.3.20. Apache 1.3.22 is available in source form for compiling on Unix or Windows, for download from the main Apache site or from any mirror download site. This is a security fix, bug fix and minor upgrade release, with a few new features. Users should upgrade if they will be affected by the security problems, have noticed particular bugs mentioned below, or would like to use any of the new features.
Due to security issues, any sites using versions prior to Apache 1.3.14 on Unix, or all versions on Windows or OS2, should upgrade as soon as possible. A vulnerability was found in the Win32 port of Apache 1.3.20. A client submitting a very long URI could cause a directory listing to be returned rather than the default index page. A 403 Forbidden will now be returned. A vulnerability was found in the split-logfile support program. A request with a specially crafted Host: header could allow any file with a .log extension on the system to be written to. A vulnerability was found when Multiviews are used to negotiate the directory index. In some configurations, requesting a URI with a QUERY_STRING of M=D could return a directory listing rather than the expected index page. The security issues above have been assigned standardized names, CAN- by the Common Vulnerabilities and Exposures project. The main new features in 1.3.22 (compared to 1.3.20) are: The user manual has been updated. As well as a number of small fixes these updates include new translations into French and Japanese, a guide to using Apache httpd on Cygwin, a lexicon of Apache error messages, updated TPF documentation, and a comprehensive guide to using log files. The user manual can now be moved out of the htdocs DocumentRoot during installation by invoking configure with the --manualdir= switch, to allow separation of on-line docs from regular contents. The supplied icons are now also distributed in PNG format. A significant overhaul of the Apache Bench program, ab, has taken place, as first reported in April. The new Apache Bench includes fixes, additional statistics, csv and gnuplot output, and SSL support. New directives have been added to the mod_usertrack module. The first, CookieDomain, can be used to customise the Domain attribute. The patch to add the CookieDomain directive was first submitted over two years ago. Historically mod_usertrack has used the obsolete Netscape cookie syntax.
The new CookieStyle directive allows use of the RFC2109 or RFC2965 syntax instead. The server will now display a warning if line-end comments (#) are found in the configuration file. Not all directives are able to handle comments on the same line. A new directive, AcceptMutex, allows run-time configuration of the mutex type used for accept serialization, currently a compile-time only setting in 1.3. Since different types of mutex have different performance characteristics on different platforms, this directive will allow administrators to tune their Apache server more easily. The current list of possible methods is: uslock, pthread, sysvsem, fcntl, flock, os2sem, tpfcore, none. Not all platforms support all methods. mod_auth has been enhanced to allow access to a document to be controlled based on the owner of the file being served. Require file-owner will only allow files to be served where the authenticated username matches the user that owns the document. Require file-group works in a similar way, checking that the group matches. New features that relate to specific platforms: A new directive, AcceptFilter, has been added to control BSD accept filters at run-time. This should make it easier to move server binaries across different BSD machines without requiring recompilation. Support for accept filters was first added to version 1.3.14; the functionality can postpone the requirement for a child process to handle a new connection until an HTTP request has arrived, therefore increasing the number of connections that a given number of child processes can handle. On Win32 mod_unique_id, mod_mime_magic, and the mod_vhost_alias modules are now enabled. On Win32 the code to allow the server to run under Cygwin has had a number of fixes and updates.
Cygwin support was first added to version 1.3.20. On Windows NT or 2000, the service display names can now be modified by the user (use the service control panel applet). On Win32 a new option, -W, can set up a service dependency. The server will now take advantage of recent improvements to the TPF operating system which include an enhanced system fork and exec, updates to allow non-blocking file descriptors, and an update to shutdown processing. The server has been ported to a new OS, Atheos. The following bugs were found in Apache 1.3.20 and have been fixed in Apache 1.3.22: Under certain circumstances a child may crash due to a bug in mod_include. If a server uses an ErrorDocument for 404 (request not found) errors which points to a server-parsed HTML file which uses a  section, then a request containing %2f will result in a segfault. The segfault is harmless and does not cause a security problem, but is being triggered by the recent IIS worm. The Multiviews functionality has been fixed to prevent mod_negotiation from serving any multiview variant that contains unknown filename extensions. Apache will prefer an installed version of the Expat library over the bundled version. This fixes conflicts when multiple copies of the Expat library get loaded (notably when using mod_perl and XML::Parser::Expat). UnsetEnv now works from the main body of a configuration file. When used as a reverse proxy any headers set by other modules (such as mod_usertrack or mod_securid) now get passed on to the back-end server. Server response headers can now be logged via the proxy. mod_proxy will now pay attention to HTTP headers that specify the request is not to be cached. When a client making a request via mod_proxy died unexpectedly, mod_proxy did not close its connection.
The CacheForceCompletion directive has been fixed. A memory leak has been fixed in the mod_mime_magic module. A Satisfy All option has been added to the default container designed to stop access to .htaccess files. Without this directive, these files could still be fetched if they were within the scope of a Satisfy Any directive. The following bugs relate to specific platforms: A number of fixes for NetWare have been added. These include: enabling long file names in htpasswd and htdigest, protection against ill behaved modules, better handling of abnormal shutdowns, dealing with the limited stack space during server side includes, and recognising special filenames such as proxy:http:// correctly. A shutdown hang could occur on Solaris when using lots of piped TransferLogs and at least one piped ErrorLog. On EBCDIC platforms a bug in the proxy module stopped SSL proxying working. On Win32, mod_unique_id did not guarantee a unique ID due to threading. The Win32 Makefiles are now 100% compatible with the Microsoft Visual C++ compiler versions 5, 6, 7. Cox, Mark J Code Red requests for /default.ida Don't panic if you see requests for the default.ida file in your Apache access logs. These requests are from the Code Red Worm designed to seek out vulnerable IIS servers. We receive a large number of messages from system administrators who see requests for /default.ida in their Apache access logs.
The requests look similar to this: 192.168.2.12 - - [19/Jul/2001:16:55:47 +0100] "GET /default.ida?NNNNNNN NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN NNNN%u9090%u6858%ucbd3%u7801%u9090%u6858%ucbd3%u7801%u9090%u6858%ucbd3% u7801%u9090%u9090%u8190%u00c3%u0003%u8b00%u531b%u53ff%u0078%u0000%u00=a HTTP/1.0" 400 252 - If you are running Apache there is nothing to worry about, these requests are part of the Code Red Worm designed to search out vulnerable IIS servers running on Windows. You can quite happily ignore these requests Other common log entries you might see include: Requests for robots.txt in the root directory. These requests are normally automatically made by robots which will analyse the contents of this file to see what files and directories they are not allowed to access. The format of the robots.txt file is given in the HTML 4 Specification. Requests for favicon.ico in various directories (first seen in April 1999). Microsoft Internet Explorer version 5 and above can display a site-defined icon when a site's URL is displayed in a favourites list. This icon is obtained by asking the site for favicon.ico. If the URL contains slash characters (normally used to represent a directory hierarchy), MSIE 5 will request "favicon.ico" in each parent directory until it finds one or reaches the root. The format of the favicon.ico file is the Microsoft icon format. To see this 'feature' in action, bookmark this page using MSIE. Requests for cmd.exe in various directories. These are usually attempts to exploit various security vulnerabilities that affect Microsoft IIS servers. 
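For administrators who would rather count these entries than read them one by one, the categories above can be sketched as a small log classifier. This is only a minimal sketch: the function names (`classify`, `summarise`) and the labels are our own invention, it assumes access log lines in the Common Log Format shown above, and it matches on file names only, so it is an illustration rather than a replacement for a proper intrusion detection rule set.

```python
import re
from collections import Counter

# Pull the requested path out of the quoted request ("METHOD path PROTOCOL").
REQUEST_RE = re.compile(r'"[A-Z]+ (\S+)')


def classify(line):
    """Return a label for the log entries described above, or None."""
    match = REQUEST_RE.search(line)
    if not match:
        return None
    # Drop the query string and compare case-insensitively.
    path = match.group(1).split("?", 1)[0].lower()
    name = path.rsplit("/", 1)[-1]
    if name == "default.ida":
        return "code-red"          # Code Red probe aimed at IIS
    if name == "cmd.exe":
        return "iis-exploit"       # other probes aimed at IIS
    if path == "/robots.txt":
        return "robot"             # well-behaved crawler
    if name == "favicon.ico":
        return "favicon"           # MSIE bookmark icon lookup
    return None


def summarise(lines):
    """Count how many lines fall into each recognised category."""
    return Counter(label for label in map(classify, lines) if label)
```

Calling `summarise(open("access_log"))` (the file name here is hypothetical) returns a `Counter` keyed by label, which makes it easy to see how much of the noise in a log is Code Red traffic.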
Apache 2 Release Apache 2 was released on the 6th April 2002; we look at the history of development on Apache 2.0 and features to help you use this new release. Apache 2: General Availability Apache 2 was released for General Availability on the 6th April 2002. After many years and several betas the Apache group were proud to announce the general availability of Apache 2.0 on the 6th April 2002. The general availability release was based on Apache 2.0.35. Apache 2: Brief History Plans for Apache 2.0 were discussed back in the summer of 1996. We look at the significant events in the history of Apache 2. The Apache group started discussing plans for Apache 2.0 as far back as the summer of 1996, just after the release of Apache 1.1.1. In July 1996, Apache Week issue 24 covered the first discussions of multithreading. One month later, in August 1996, Apache Week issue 128 looked at how useful filtering would be, and some possible ways of implementing it. Filtering was finally added to Apache 2.0 four years later, in August 2000. In February 1997 the Apache group looked at plans for the server after version 1.2 was released. The plans included a considerable rewrite to include support for multithreading, filters, and OS abstractions to allow for versions of Apache on Windows NT and other systems. Apache Week issue 54 covered these plans and predicted that Apache 2.0 is "likely to take some time". May 1997 saw the group decide that Windows releases would be outside of the main Apache development effort. The aim for 2.0 was to ensure that the same code is used for all operating systems with a set of platform-specific routines to handle anything that varies between operating systems (Apache Week issue 65). In June, Apache 1.2 was finally released and work started on the requirements for the redesign of the core Apache code. Apache Week issue 69 discussed the need for additional processing phases, and the plans for a graphical configuration manager.
All the plans for Apache 2.0 were summarised in February 1998, Apache Week issue 102. The major changes being discussed were multithreading, filtering, new process models, better system configuration, API changes, and changes to the configuration syntax. Some thought was also given to rewriting Apache in C++, but this idea was later dropped. In June 1998 the Apache core developers met for the first time to discuss the organisational structure of the Apache group as well as the plans for Apache 2.0. Apache Week issue 121 covered this meeting. Over a year later, in September 1999, we revisited Apache 2.0 development in a special feature, "Apache 2.0 preview". At this stage a beta was expected in late 1999 or early 2000. In January 2000, Apache Week issue 181, there was discussion within the group about how to deal with feature additions to the stable Apache 1.3. It was decided that all attention should be placed on 2.0 development and that no major new features would be accepted into the Apache 1.3 tree. The first Apache 2.0 alpha was launched at the final session of the ApacheCon 2000 conference in March 2000. A number of ASF members on stage updated the website and copied the distribution files into the correct locations live in front of the audience. At ApacheCon Europe in November 2000, a meeting took place between Ben Laurie (the author of Apache-SSL), Ralf Engelschall (the author of mod_ssl), Mark Cox (Red Hat), and Randy Terbush (Covalent). The meeting was held to decide the fate of SSL support for Apache 2.0, aiming to avoid the current situation of parallel module development for Apache 1.3. The first Apache 2.0 beta was launched at the ApacheCon 2001 conference in March 2001, one year after the first alpha. Between April and November a large number of internal code changes took place, with a few alpha-quality releases.
The second Apache 2.0 beta was released in mid-November 2001. Apache 2: Release History The first Alpha release of Apache 2.0 was released in March 2000, with the first beta just over a year later.
2.0.52, September 2004: Security fix release correcting a new issue introduced in 2.0.51.
2.0.51, September 2004: Security and bug fix release. See Apache Week issue 348.
2.0.50, July 2004: Minor security fix release. Includes new mod_log_forensic module. See Apache Week issue 347.
2.0.49, March 2004: First release under Version 2.0 of the Apache License; includes fix for a denial of service attack on some platforms, and a substantially rewritten version of mod_include. See Apache Week issue 344.
2.0.48, October 2003: Minor bug and security fixes. See Apache Week issue 337.
2.0.47, July 2003: Security fix release; includes performance fix for PROPFIND response handling in mod_dav. See Apache Week issue 331.
2.0.46, May 2003: Security fix release. See Apache Week issue 329.
2.0.45, Apr 2003: Security fix release. See Apache Week issue 324.
2.0.44, Jan 2003: This release included fixes for two security issues affecting Apache on Windows platforms. It also included bug-fixes. See Apache Week issue 319.
2.0.43, Oct 2002: Security fix release. See Apache Week 2.0.43 special feature.
2.0.42, Sep 2002: This is primarily a bug-fix release, including updates to the experimental caching module, the removal of several memory leaks, and fixes for several segfaults, one of which could have been used as a denial-of-service against mod_dav. See Apache Week issue 310.
Sep 2002: Red Hat include Apache 2 in Linux distributions.
2.0.40, Aug 2002: This release fixes a serious vulnerability in Apache 2 on non-Unix platforms such as Windows. See Apache Week 2.0.40 special feature.
2.0.39, Jun 2002: This release fixes a denial of service vulnerability in Apache 2. See Apache Week issue 299.
2.0.36, May 2002: Second stable release (Apache Week issue 285).
2.0.35, Apr 2002: First General Availability release.
2.0.32, Feb 2002: Third beta release (Apache Week issue 284).
2.0.28, Nov 2001: Second beta release.
Nov 2001: Covalent include Apache 2.0.27 alpha in a commercial product.
Aug 2001: IBM include Apache 2.0.18 alpha in a commercial product.
2.0.16, Apr 2001: First beta release.
2.0.14, Mar 2001: Improvements to mod_include and the start of abstracting HTTP specific protocol functions.
2.0.11, Feb 2001: This was the first version to use a new release procedure, where the tree would be tagged and depending on the outcome of testing would later be distributed as an alpha, beta, or stable release. An early prototype of SSL support by Ben Laurie was added (Apache Week issue 235) as was a port of mod_proxy to Apache 2.0.
2.0a9, Dec 2000: Coverage in Apache Week issue 226.
2.0a8, Nov 2000: For this release APR was moved into a separate project. Coverage in Apache Week issue 224.
2.0a7, Oct 2000: For this release, mod_dav was added. During this alpha cycle, RSA encryption was released into the public domain, removing one of the obstacles for including SSL in Apache.
2.0a6, Aug 2000: This alpha saw the first support for filtering (using bucket brigades). Coverage in Apache Week issue 212.
2.0a5, Aug 2000: Coverage in Apache Week issue 211.
2.0a4, Jun 2000: The release of the alpha was covered in Apache Today, slashdot, and Apache Week issue 202.
2.0a3, May 2000: The release of the third alpha was covered in Linux Today, slashdot, and Apache Week issue 197.
2.0a2, Apr 2000: Coverage in Apache Week issue 194.
2.0a1, Mar 2000: The first Apache 2.0 alpha was launched at the final session of the ApacheCon 2000 conference. A number of ASF members on stage updated the website and copied the distribution files into the correct locations live in front of the audience. Announcements were then sent to a number of key sites such as Slashdot and Freshmeat. Coverage in Apache Week issue 190.
Apache 2: In the news We highlight some of the news stories from around the web that mentioned Apache 2. In this section we highlight some of the news stories on the web that mentioned Apache 2. May 2001: CNET reviews Apache 2.0.16 Beta and suggests that administrators interested in upgrading to Apache 2.0 prepare for the stable release by installing the beta on a development machine, then testing the new features and benchmarking its performance in order to speed up the eventual upgrade process. January 2001: Ryan Bloom discusses why the Apache 2.0 beta was delayed. October 2000: Apache 2.0 was still in alpha release when an eagle-eyed subscriber in Apache Week issue 219 noticed that the high-profile Napster web site was running Apache 2.0a6. August 2000: C|Net reported on Apache 2.0 in "Apache Web software on verge of major revision". The article interviews a few of the Apache developers and highlights some of the advantages that version 2.0 will bring. They quote that "The final version should be out by the end of the year" (2000). July 2000: In the same week two projects not related to the Apache group announced that they would be integrated into Apache 2.0, even though this was not planned and didn't happen. Apache Week issue 206 looks at the claims from the TUX web server and BXXP protocol projects. Apache 2: Featured Articles We highlight some of the articles on the web that are of interest to Apache 2 users. In this section we highlight some of the articles on the web that are of interest to Apache 2 users. In Ryan Bloom's swan song for the Apache 2.0 Basics series, he talks about one of the least publicised new features in Apache 2.0 which is allowing one module to call into another module to execute an operation. In Apache 1.3, for two modules to execute the same operation, the feature has to be implemented in both of the modules, making synchronisation of changes a tedious task.
He uses the mod_include and mod_cgi modules to illustrate his points. O'Reilly ONLamp.com brings you the latest information about filters for Apache 2.0 in Ryan Bloom's column. This article is just an introduction to the subject, covering some of the basic concepts of filtered I/O which is the ability for one module to modify the output of an earlier module, listing three standard filters included in the basic Apache distribution, and explaining what filter types are. According to Ryan, developers have improved the interface over the past few releases so that the complex task of writing filters becomes easier. In "Apache 2.0: The Internals of the New, Improved A PatCHy", Ibrahim F. Haddad gives an overview of Apache 2.0 and shares with us the results of his Apache 2.0.8 performance tests. In conclusion, he highly recommends that current Apache 1.3.x users upgrade to Apache 2.0 once the release version is available. Ryan Bloom kicks off a new series of columns about Apache 2.0 for O'Reilly Network readers with his first column - "Installing Apache 2.0". This piece proves to be merely a rehash of his previous Apache 2.0 articles except for a mention of mod_tls. eWEEK Labs tests Apache 2.0.16 Beta and provides a brief review of its features and shortcomings. In CNet Builder.com, Ryan Bloom explains how Apache 2.0 is more than a web server as it has the potential to serve any protocol. He reveals the benefits of using a single server for multiple protocols and the way to implement it using Apache 2.0. In "Filtering I/O in Apache 2.0", Ryan Bloom explains how filtering in Apache 2.0 works, how modules can make use of it, and the basic concepts for writing filters. Ryan Bloom investigates writing an input filter for Apache 2.0 and shows the power of input filters with mod_apachecon as an example. Ryan Bloom tells C|Net Builder.com readers how to download, build, and install the Apache 2.0 alpha releases.
Apache Today gives a concise guide on how to set up and compile Apache 2.0. Apache Today explains some of the new technology that is inside the fourth alpha of Apache 2.0. "Looking at Apache 2.0 alpha 4" takes a detailed look at reliable piped logging and the issues of running CGI scripts from a threaded web server. LinuxPlanet have a feature about Apache, "Using the Apache CVS Repository". The article explains how the Apache developers use a master code repository for the work on Apache 1.3 and 2.0. Anyone interested in keeping up to date with the cutting edge developments of Apache can use the described methods to maintain their own copy of the source tree, whilst easily keeping up with the changes being made by the Apache developers. O'Reilly Open Source convention in San Diego Cox, Mark J Orton, Joe Apache Week visited the five day O'Reilly Open Source Conference in San Diego this week and found an overwhelming source of Apache information. O'Reilly Open Source Conference: Day 1 Apache Week visited the O'Reilly Open Source Conference in San Diego this week and found an overwhelming source of Apache information. Day one comprised a selection of tutorials including several Apache tracks, and a showing of the film "Revolution OS". It is exactly a year ago that we had the pleasure of visiting Monterey California to report on the 4th O'Reilly Open Source software convention (Apache Week issue #208). When we managed to get invited back to San Diego in July 2001 we thought we'd been given the ideal assignment; we get to fly to California in July, avoiding the British rain, and spend a week right on the West Coast with other open source gurus and advocates. In fact with only one direct flight a day from England we were unsurprised to find a large number of delegates on the plane; wearing Penguin badges and snapping pictures of the clear views over Greenland with a variety of digital cameras.
To accommodate feeding over a thousand delegates, the conference had erected a huge tent outside the hotel with views overlooking the harbour. It was there we started off Monday morning with the complimentary breakfast. The conference was split over two buildings, with a 10-15 minute walk between the two. With 16 simultaneous tutorial sessions on the first day and with only two Apache Week staff we found it really hard to choose between the talks. We spoke to other delegates who had been similarly overwhelmed by the choice. Apache Week has reported on the ApacheCon and O'Reilly conferences over the last few years, so this time we wanted to avoid the talks that were copies of ones we've already covered. We decided to mix Apache talks with others that seemed new or interesting. Matt Sergeant gave the first tutorial we visited on his XML application server for Apache, AxKit. AxKit performs a similar function to the Apache Cocoon project, but is written in Perl and C rather than Java. Matt even describes AxKit as "the C version of Cocoon". AxKit was born as a way of collecting together the various Perl XML technologies and using them to deliver the same XML data in different formats. The use of XML allows for the separation of content, presentation, and logical site management. The tutorial focussed on the various Perl XML tools available, the evolution of AxKit, and ways to use the result to power both static and dynamic sites. Matt highlighted some exciting and powerful features of AxKit: the intelligent compression of pages being returned to the client (gzip), the ability to parse and serve OpenOffice files on the fly, and AxPoint which powered his presentation by converting an XML outline to PDF. AxKit allows any number of ways to process the XML for output; from the well-known XSLT (with its steep learning curve) to XPathScript, which has been designed to allow easy dynamic functionality and is also found within Cocoon.
Future plans for AxKit were covered; these included a port to Apache 2.0 and a complete Content Management System.
After the provided lunch we headed over to the Perl for System Administrators talk. The presenter, David Blank-Edelman, played music and danced around the hall to get into the mood for the tutorial. The talk had a heavy bias towards security, giving reasons why administrators should be paranoid, along with numerous stories and anecdotes about hacks and security vulnerabilities. David suggested some best practices that can help protect your scripts; for example, there is no need to run a log analysis script as root. Other areas where users can overlook potential security problems are appending to files and creating temporary files in Perl. Although this talk was primarily about Perl, David made the important point that "a cutting sysadmin is platform agnostic", and his tips applied as much to sysadmin scripts as to CGI programs.
Also that afternoon, Jim Whitehead presented a tutorial on WebDAV and Apache. Jim, the chair of the IETF's WebDAV working group, began by giving a brief overview of authoring over HTTP, and gave examples of how collaborative web authoring can take place using WebDAV.
The current state of client and server support was described, and an insight into some of the future extensions of the DAV protocol was given (including versioning, searching and access control). The talk continued with a detailed description of the DAV protocol, explaining the support for properties and the overwrite prevention mechanisms. The tutorial finished up with a guide to setting up the WebDAV module for Apache, mod_dav, covering the basic operation of the module and the usual configuration issues. Jim noted that Apache 2.0 bundles mod_dav inside the source tree, making it easier to set up than Apache 1.3, where mod_dav must be compiled as an external module.
In the evening we took a coach to a local multiplex cinema for the west coast premiere of the film "Revolution OS" by director J.T.S. Moore. The aim of the film was to document the history of the open source movement from Richard Stallman's founding of the GNU project, through the VA Linux IPO, to events taking place today. The film focussed on the key people responsible for a few of the historical turning points in the movement. Early in the film, Eric Raymond said that "Apache was the killer app[lication]" and was responsible for the mass adoption of the Linux operating system. A number of other key people were interviewed, including Brian Behlendorf from the Apache Project and Michael Tiemann from Red Hat. We were impressed by the balance and accuracy of the film, especially the positive way the people interviewed were portrayed. The film would be interesting to engineers as well as outsiders. At the end of the film the director took questions from the audience, aided by Eric Raymond and Bruce Perens. They explained that the film took two years to make and was planned to be shown in the future at film festivals and other conferences.
O'Reilly Open Source Conference: Day 2
Apache Week visited the second day of tutorials at the O'Reilly Open Source Conference in San Diego, which ended with a lightning talk from Perl creator Larry Wall.
We kicked off the second day much as the first, spending our breakfast trying to decide amongst the 17 simultaneous tutorials. Amongst the sessions we didn't get to see was Ryan Bloom's "Writing an Apache 2.0 Filter", which was given to a small but enthusiastic group of developers.
We hear a lot of positive comments from people using the Python-based Zope application server, so we decided to attend the tutorial "Introduction to Zope" given by Mike Homyack. Mike ran through what Zope is, and its architecture, telling us that "Zope is full Object-Orientated" and "really good at dynamic stuff". Zope has a built in server, z-server, that handles access to the internal content via a number of mechanisms including HTTP, FTP, and DAV. It is usual to let Zope handle all your web site content, but in most situations another server such as Apache, or a reverse proxy such as Squid, is placed in front in order to accelerate any static content. The main zope.org site itself uses Zope together with Apache, using Rewrite rules to proxy and cache requests to a Zope backend. Zope currently has its own license but we were told that there was "motivation to give Zope some license like Python" to make it GPL compatible. Zope is in production use by some major companies including CBS New York.
At the same time as the Zope tutorial, Bruce Momjian gave an introductory tutorial on the PostgreSQL database. Attendees received a complimentary copy of Bruce's book, which the tutorial was based upon. Only a small amount of database expertise was presumed, so this talk was very open to beginners.
The half-day session allowed many chapters of the book to be covered in reasonable detail, starting with the basic architecture of a database, how to input data, modify data, and make simple queries. The talk then progressed to describe the construction of more complex queries, joins, and how to utilize the relational database capabilities of PostgreSQL. Bruce also presented a follow-up tutorial in the afternoon, covering some of the more advanced features.
In the afternoon we visited a talk on "Secure Internet Servers and Firewalls with OpenBSD". Although not directly related to Apache, it was interesting to see how much security had been added into the OpenBSD system by default. OpenBSD ships with an SSL-enabled version of Apache by default.
We were also lucky to catch the second of a pair of tutorials by Mark-Jason Dominus, entitled "Stolen Secrets of the Wizards of the Ivory Tower". In an enigmatic talk, he described a set of Perl programming techniques including memoization and the use of iterators, and drew particular attention to closures and anonymous subroutines. The obscure title alludes to the LISP heritage of many of these ideas.
In the evening Larry Wall gave an entertaining lightning talk on the new features in Perl 6. Larry's talk didn't touch on anything Apache related, so if you are interested read all about it in "The State of the Onion 5" at perl.com.
O'Reilly Open Source Conference: Day 3
Today we discovered why Apache was important to large enterprise customers, how to tune mod_perl, the many ways of XML content management, and much more.
Wednesday started as usual with the complimentary breakfast. With 14 simultaneous talks split across the two hotel blocks we spent most of our breakfast choosing which to visit.
Four of the day's tracks were dedicated to Perl, two to XML, and the remainder split across Tcl/Tk, Mozilla, mod_perl, Java, MySQL, Python, and Emerging Topics. The dedicated Apache track was due to start on Thursday. We noticed that the number of Perl tracks had shrunk slightly this year, with other open-source technology tracks becoming more prominent. In particular we were pleased to see the two XML tracks, something we said was missing from last year.
Before the keynotes of the day a short film was shown, made up of interviews with various conference attendees during the tutorial days. Tim O'Reilly appeared on stage and reminded the packed ballroom that we should "think the Internet" and think of "technologies such as Apache, PHP" and not just Linux.
Fred Baker, previous chair of the IETF, gave his keynote presentation titled "Will the next Internet generation still depend on open source?". He explained that Linux was the only real technology that could threaten Windows, and that successful open source is "all about getting good documentation and predictable quality". He welcomed the involvement of commercial interests in open source: "Once the open source technology has to be used by real people then real companies have to do code freezes and manage the development in a way that makes a quality product". He predicted that in the coming years we'll see more open source projects in partnership with the business world. Open source leads to rapid prototyping and exploratory code, with the business partnerships being able to productise them.
W. Phillip Moore from Morgan Stanley Dean Witter then took the stage to show "an open source success story on Wall Street". He showed why open source was important to their business, allowing them to tailor existing applications to their complex environment with a bit of Perl glue thrown in.
MSDW are an enterprise class business that have decided to slowly migrate from using Sun hardware with Solaris to using commodity hardware and Linux, with Apache as their primary web server. They've also made contributions back to open source, and have been covertly submitting patches back into the community as well as funding open source development. "It all comes down to vendor risk management", he said; with proprietary software "you're placing a bet on the security of that company and the security of their product, a bet you're not always aware you're making". With open source this dependency is removed, and it's possible to get enterprise level support for open source software from a number of vendors.
Also taking place at the convention was the O'Reilly summit on Open Source strategies, aimed at CTOs, CIOs, and CEOs who want to find out how to use open source as a strategic advantage. Although this summit was separate to the main conference we decided to take a look at the opening talk given by Tim O'Reilly, and the subsequent panel discussion with the economist Hal Varian, Brian Behlendorf, and Michael Olson from Sleepycat.
To begin the session, Tim O'Reilly discussed the reasons underlying the success of the Internet and Open Source software, finding many common themes. The highlights were the emphasis on decentralisation, the combination of many small modules into large complex systems, and the ability to easily extend existing technologies, all important to the wide adoption seen in both arenas. Looking at current trends, Tim talked about some emerging projects which may prove key to the Next Generation Internet.
One of the biggest challenges for Open Source and Internet companies is the search for an appropriate business model. The panel discussion which followed the talk gave many interesting insights from those who have been successful in that search.
Brian Behlendorf spoke about the need to identify which intellectual property is released freely, and which is "owned" by the company generating it. All speakers noted that embedded systems would be increasingly important.
After lunch, Apache Software Foundation member Ask Bjoern Hansen gave a talk on how to use mod_perl efficiently. He explained that it is generally preferable to use mod_perl statically compiled into Apache instead of as a dynamic (shared object) module. However, by doing this you end up with a server that has a much larger memory footprint, and since the server spends the majority of its time buffering data to slow clients, this is wasted overhead. The solution presented was to run a separate server that has mod_perl compiled into it behind a reverse proxy. Apache can also be used as this reverse proxy and can serve static content as well as cache the content created by the dedicated Apache+mod_perl server. In this way the memory usage can be decreased and performance increased. The slides from the full presentation are available online.
There were a large number of talks throughout the conference on SOAP and XML-RPC. Matt Sergeant took a step back to examine what all the fuss was about in a short talk renamed "Why SOAP sucks, Why SOAP rocks". He started out by asking why we are using SOAP when we could use HTTP instead, since HTTP already has all the features that are normally needed, and more. Using HTTP natively allows caching and logging, for example. The talk then showed how to do SOAP without SOAP, using mod_perl to control the URL space and using Perl HTTP modules for the transport.
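The "SOAP without SOAP" idea can be illustrated with a short sketch. This is not the Perl and mod_perl code shown in the talk; it is a minimal Python illustration under stated assumptions: the /search URL, the search() helper and the XML element names are all hypothetical. It maps a plain HTTP GET onto a function call and returns the result as XML, so ordinary HTTP caching and logging continue to apply.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlparse, parse_qs
from xml.etree import ElementTree as ET


def search(query):
    """Hypothetical back-end call; returns canned results for illustration."""
    return [f"result for {query} #{i}" for i in (1, 2)]


def results_to_xml(query, results):
    """Serialise the results as a small XML document (returned as bytes)."""
    root = ET.Element("results", {"query": query})
    for item in results:
        ET.SubElement(root, "result").text = item
    return ET.tostring(root, encoding="utf-8")


class SearchHandler(BaseHTTPRequestHandler):
    """Maps GET /search?q=... onto search() and replies with XML."""

    def do_GET(self):
        url = urlparse(self.path)
        if url.path != "/search":  # hypothetical URL space
            self.send_error(404)
            return
        query = parse_qs(url.query).get("q", [""])[0]
        body = results_to_xml(query, search(query))
        self.send_response(200)
        self.send_header("Content-Type", "text/xml; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def make_server(port=8080):
    """Build (but do not start) a server bound to localhost."""
    return HTTPServer(("127.0.0.1", port), SearchHandler)
```

Calling make_server().serve_forever() and then fetching /search?q=apacheweek from a browser or any HTTP client would return the XML document; no SOAP envelope or toolkit is involved on either side.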
The current major advantage of SOAP is that modules such as the Perl SOAP::Lite module exist which allow applications to be developed quickly and easily. There is currently no simple library that would do the equivalent directly over HTTP. Finally we were shown some services that are already doing the equivalent of a SOAP transaction without SOAP, such as the ability to get search results from Google in XML format (for example try http://www.google.com/xml?q=apacheweek). The slides to this talk are available online.
For the remainder of the afternoon we visited the XML track; in particular we were interested in XML application servers. The first session, "XML Content management using XSLT, Schematron and Ant", showed one extensible way of serving XML content to browsers. Following that talk a panel discussion, "XML-based Application Frameworks", took place.
The basic idea of an XML application server is that you create all the content for your site in XML. The use of XML allows the separation of content from presentation, a useful extra abstraction layer. The XML content can come from static files, from a database, or be dynamically generated by scripts. In its simplest form you take your XML content then apply a style-sheet to generate HTML for a browser. Application servers usually perform this style-sheet conversion on the fly, caching the results for speed. XSLT is one language that is used to transform XML data in this way. Tools also exist that will take XML and generate PDF, PostScript, presentations (and more) on the fly.
The most well-known open source XML application server is Apache Cocoon, which relies on Java. Other solutions such as AxKit (C/Perl/mod_perl) and Charlie (C/C++/Perl/mod_perl), and technologies such as Xerces/Xalan (Java), Sablotron (C++), and LibXML/LibXSLT (C), are also available.
Even scripting languages such as PHP now have their own XML solutions, although during his tutorial earlier in the week mod_perl guru Matt Sergeant said that the "PHP XML solutions are not very strong".
When the attendees were asked which application server they were using for their applications, the majority said they were using a system they had developed themselves (home grown) from the underlying technologies. The rest were a pretty even split between the application frameworks listed. However, having such a wide choice of technologies and servers is no bad thing. As one panel member said, "no matter what, if your content is in XML you win".
Brian Ingerson presented a talk on the award-winning Inline module (which had celebrated its first birthday only a few days before the conference). Inline.pm allows programmers to embed code from a variety of programming languages directly inside a Perl script, from C, C++, and assembler through to Java and Python. Brian covered some of the advanced features available when using embedded C, notably caching of compiled object files. A demonstration was given showing some "one-liners" using Inline.pm, including an ASCII Mandelbrot set generator. The talk went on to discuss some of the different ways to use Inline.pm, such as replacing the traditional usage of XS and MakeMaker, and also explained how to extend the module to support new languages.
O'Reilly Open Source Conference: Day 4
Today we heard about Microsoft and the Apache license, found out where Apache httpd 2.0 is, and had a demonstration of mod_perl 2.0.
Wednesday had ended with a night of Mexican food and drink in the conference tent, followed by a party from Stonehenge.
Even with all the free drink and food the night before, by 8.45am on Thursday the ballroom was packed for the much anticipated debate between Craig Mundie of Microsoft and Michael Tiemann of Red Hat. The details of the debate have been covered in a number of other articles. However, we were interested in the comments with relevance to Apache made during the panel discussion. Craig Mundie stated that Microsoft's concern was not about open source but "about the GPL" as it "creates its own closed community". Tim O'Reilly commented that University licenses (like the BSD License and Apache Software License) "give the best balance between freedom and the right to make money". Also on the panel was Apache Software Foundation member Brian Behlendorf, who said the Apache model has worked well to build up momentum. Although the Apache license places no obligations on commercial users, history has shown that the companies involved do re-invest and give back to the community.
With the provocative title "Apache 2.0; where is it?", Ryan Bloom proved a popular start to the Apache track, with over 80 attendees packed in to hear his session. The aim of the talk was to cover what was new in Apache 2.0 but also to answer the question of why Apache 2.0 is taking so long. Ryan explained that since Apache is now so big there are "only three or four people who know 100% of Apache 2.0", and that fortunately he was one of them. The new features of 2.0 were then explained, stopping at Layered IO which is "the Holy Grail" of Apache. Ryan then gave a demonstration of Apache 2.0 acting as a POP3 server to show that it is easy to have Apache serve up other protocols as well as HTTP.
Apache Week asked Ryan if he was correct in using the name "Apache 2.0" throughout his talk, given that the Apache group have a number of other products and that the binary downloads have been renamed to "httpd". Ryan said that the name was officially "Apache httpd 2.0" but hinted that there was talk of changing the name to something other than httpd in the future. To answer the question of Apache 2.0 availability Ryan said that he expected to see a full release "next year."
After attending the PostgreSQL tutorial earlier in the week, we decided to follow up with this talk from Gavin Roy, which gave a practical guide to using PostgreSQL in web applications. Gavin gave an overview of which web platforms could make use of a PostgreSQL database (for instance, PHP and Perl), and gave testimony to the product's reliability and performance in large scale web applications. The talk proceeded to discuss the architecture of systems using a web server together with a PostgreSQL database, covering the advantages and disadvantages of using a single machine or two separate machines. Some tips on optimising performance in a production database were also given, emphasizing the use of database indices, and regularly vacuuming the database.
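The effect of the two tuning tips is easy to see in a small experiment. The sketch below does not use PostgreSQL; it uses Python's built-in sqlite3 module as a stand-in so that it runs without a database server, and the table and index names are invented for illustration. SQLite's VACUUM is only loosely analogous to PostgreSQL's, but the principle is the same: an index lets the planner avoid a full table scan, and vacuuming reclaims the space left by deleted rows.

```python
import sqlite3

# In-memory SQLite database as a stand-in for PostgreSQL (assumption);
# isolation_level=None puts the connection in autocommit mode.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE hits (id INTEGER PRIMARY KEY, url TEXT, bytes INTEGER)")
conn.executemany(
    "INSERT INTO hits (url, bytes) VALUES (?, ?)",
    [(f"/page/{i % 100}", i) for i in range(10_000)],
)


def query_plan(sql, params=()):
    """Return the planner's description of how it would run the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " ".join(row[-1] for row in rows)


lookup = "SELECT bytes FROM hits WHERE url = ?"
plan_before = query_plan(lookup, ("/page/7",))  # no index yet: table scan
conn.execute("CREATE INDEX hits_url_idx ON hits (url)")
plan_after = query_plan(lookup, ("/page/7",))   # planner can now use the index

# Delete half the rows, then reclaim the space they occupied.
conn.execute("DELETE FROM hits WHERE bytes < 5000")
conn.execute("VACUUM")
remaining = conn.execute("SELECT COUNT(*) FROM hits").fetchone()[0]
```

Printing plan_before and plan_after shows the planner switching from scanning the whole table to searching via hits_url_idx. On PostgreSQL the equivalent checks would be made with EXPLAIN, CREATE INDEX and VACUUM from a client such as psql.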
In closing, Gavin briefly covered security, authentication and authorization issues when using Postgres in a web environment.
To end the day we were expecting good things from Doug MacEachern's talk on "mod_perl 2.0". We were not disappointed, as over 50 people packed into the last mod_perl session to hear a heavily technical talk about Apache 2.0 and mod_perl. Doug showed Apache 2.0.22-dev working with both mod_ssl and Perl/mod_perl. This is perhaps the first demonstration of its kind, as mod_ssl is only just becoming usable in the Apache 2.0 tree. He continued by taking a program that communicated entirely using stdin and stdout (in this case an NNTP server) and showing how easy it was to make it function as an Apache protocol handler. This allowed Apache to serve newsgroups to his news reader, whilst still allowing other filters such as SSL and authentication to be included. Future plans for mod_perl 2.0 include the ability to write an MPM completely in Perl, and to continue with the Apache-TestKit, a package not tied to mod_perl that has been designed to test Apache. Doug said that there was still plenty left to do on mod_perl even though it currently seems stable, and that there would be "probably a release of some sort at the end of the summer."
At the same time as the talk on mod_perl, Paul Weinstein was giving his popular introduction to mod_ssl in the Apache track.
The history of mod_ssl for Apache 1.3 was discussed together with some of the decision making process for including mod_ssl in Apache 2.0. The slides to this talk are available online.
In a pair of talks which attracted 60 people into a room designed for 40, the speaker known as "chromatic" described the basics of the Extreme Programming (XP) software development method, and in particular its application in the Open Source world. The first talk gave an introduction to XP, its differences from more traditional software development, and the motivations behind the techniques it uses to promote the development of high quality software. The talk highlighted that the most important aspect of XP is the emphasis on writing unit tests, and also covered the principles of incremental change and pair programming.
The room remained packed into the second half of the session, where chromatic discussed how XP can be used within Open Source software (OSS) development. Some elements of XP are already employed in many OSS projects, for instance the tight feedback loop between users and developers. Many other XP techniques could also be usefully employed, but some, such as pair programming, were considered inappropriate in the majority of Open Source development.
O'Reilly Open Source Conference: Day 5
We found out why enterprise customers like commercial support, learned all about the APR, and visited the exhibition.
Last year, the conference sessions were held over just two days and we were pleased to see they were extended to a third day to fit in more presentations. Friday consisted of the extension of tracks from previous days together with tracks dedicated to PHP, Zope, and Open Source Speech.
After breakfast, Michael Tiemann was the moderator for the morning keynote looking at the "big hairy problems: open source challenges in the enterprise".
The first speaker was from DreamWorks, the animation company behind such epics as Antz, Chicken Run, and now Shrek. He told us how DreamWorks were slowly switching thousands of machines from SGI to Linux, giving them increased performance and value for money. When working on their strategy for adopting Linux they analysed six key factors: performance, scalability, stability, software, support, and transition.
W. Phillip Moore from Morgan Stanley Dean Witter took the stage and built upon his previous keynote. He explained that it was important that enterprise customers have a support number they can call with problems, the ability to get fixes to existing problems, and the ability to get enhancements. He complimented Covalent and Red Hat specifically but said that there was a need to see more companies providing commercial support for open source software: "you need to know there is a 800 number and a staff of people that will be able to solve the problem."
Ryan Bloom gave this talk on APR, the Apache Portable Run-Time, which began with a quick history lesson explaining how Apache 1.3 addressed portability issues, and how APR and Apache 2.0 grew out of that experience. Ryan explained what the initial goals for the library were, and showed how it provides an abstraction layer for commonly used operating system interfaces which has been ported to a range of 50 Unix platforms, BeOS, Windows, and OS/2.
The talk gave a breakdown of the different components which make up APR: from file and network I/O, memory handling, through to some of the more complex interfaces providing threading support. For each component an overview of the API was given, showing how it could be used in applications. Ryan also gave an insight into why various OS interfaces (such as POSIX) cannot be used portably, justifying the need for the abstraction layer which APR provides. To give a more in-depth look at the API, the talk gave a walk-through of a code sample using the threading interface, and took a look at some of the test code present in APR which exercises most of the library's capabilities. Although APR's primary user is the Apache httpd server, the library is also used by a number of other projects such as Subversion.
Paul Weinstein closed off the afternoon with his talk all about private certificate authorities. The session showed the basics of how to create and then use a private certificate authority, then went into the more advanced details. The examples were based around the OpenSSL toolkit, showing which parameters to use on the OpenSSL command line, and how to integrate the certificates into Apache with mod_ssl. Finally the tricky subject of certificate revocation was covered. Around 30 people attended, fewer than at his first presentation. There were a few questions on supporting Microsoft products (both IIS and IE) with certificates created by OpenSSL; a few on keeping track of what has been created and how it all relates (the private and public keys for the CA, client, and server, the CSR files for client and server, where they go, and which file is for which step); and a few on creating a security policy for the CA, covering how to verify information within a company and how strict or informal you can be about managing your CA. The slides to this talk are available online at http://www.weinstein.org/redhat/presentations/
The vendor exhibition area was very popular, with a large number of companies attending. We didn't find much information specific to Apache at the exhibition: NuSphere were giving out MySQL CDs that come with a packaged version of Apache, and Red Hat had some information on their Apache services. However there were plenty of free promotional t-shirts to add to our collection, as well as more of the flashing clear rubber bouncy balls we picked up last year from collab.net. Oh, and let's not forget the "Apache by night" Apache Week postcards of course.
Overall Impressions
Apache Week reflects on the busy five days of the O'Reilly open source conference. We will be back to normal next week and catching up on the Apache news and features from the last few weeks.
Even if you were not interested in any of the other tracks there were plenty of talks and tutorials relevant to Apache users, although a number of them were direct copies or updates of talks given at previous Apache conferences such as ApacheCon 2001.
Apache Week talked to a large number of the attendees of the conference and the overall impression was very positive. One attendee said that "the keynotes alone were worth the trip". We were also particularly impressed by the child care facilities, allowing conference speakers and participants to bring their families and enjoy a mini holiday in San Diego. The night-time activities and the food were also excellent.
The only complaint we heard repeated by a number of attendees was that lunch was not included on Friday, even though there was a full day of sessions.
With 802.11b wireless internet connectivity to most of the conference rooms it was hard to escape from work; and with five intensive days packed with new material we found ourselves tired and in need of a holiday by the end of the week. Next time we'll bring our swimming trunks and sun cream.
Please note that although Apache Week is an O'Reilly Network affiliate, O'Reilly had no editorial control over this review of their conference, even though they did give us free beer. Apache Week will give you our unbiased opinion of all the conferences we attend that have things of interest to Apache users and developers. For more coverage of the rest of the conference visit the O'Reilly Network web site.
New in Apache 1.2
All about what is coming up in Apache 1.2
Coming in 1.2
The next release of Apache will be version 1.2. The big new feature in 1.2 will be support for HTTP/1.1, the new version of the Hypertext Transfer Protocol. Apache 1.2 will fully support this protocol as a Web server (except for the proxy module). HTTP/1.1 is now going through the final stages of approval as an Internet standard, and Apache 1.2 cannot be fully released until it is approved, since there might be changes.
However, HTTP/1.1 is not the only change in Apache 1.2. There will be quite a few other new features, new optional and core modules, and many bug fixes. In this feature, we list all the major changes coming in 1.2. Each change is accompanied by a link to the Apache Week issue where it is reported in more depth.
Better Content Type Negotiation (Apache Week 16 August 1996): To overcome problems with some browsers, Apache's content negotiation algorithm has been updated to better guess what content type the browser wants.
Conditional configuration directives (Apache Week 26 July 1996): Configuration directives can be ignored if the module they are defined in is not compiled in.
Customising for Browsers (Apache Week 09 August 1996): It will be possible to set environment variables based on the browser's user-agent text. This will give CGI and SSI scripts a simple way of customising their output based on features available in the browser.
Easier CGI script debugging (Apache Week 09 August 1996): It will be easier to debug CGI scripts, because Apache can now log the full input and output of the scripts.
Faster persistent connections (Apache Week 02 August 1996): Various changes have been made to increase the speed of persistent connections.
File, Directory and Limit can take regular expressions (Apache Week 02 August 1996): The files and URLs affected by each of these sections can be defined by regular expressions.
Graceful Restarts (Apache Week 28 June 1996): Apache can re-read the config files and re-open log files without terminating transactions in progress.
HTTP/1.1 support (Apache Week 16 August 1996): Apache will be 'unconditionally compliant' with the HTTP/1.1 specification (except for mod_proxy).
Limit access on per-file basis (Apache Week 02 August 1996): A new section, <Files>, can be used to restrict access on a file-by-file basis. It will now be possible to (for example) password protect a single file.
More SSI Commands (Apache Week 09 August 1996): The 'extended SSI' module (XSSI) will replace the current server-side-includes module. This will give a number of powerful new features, such as the ability to set variables and use conditional statements.
Multiple configurable log files (Apache Week 16 August 1996)
More than one log file can be used, with the log format fully customisable. This reduces the need for additional log modules (mod_log_referer or mod_log_agent), and makes it much easier to add customised log files and formats.

PICS module (Apache Week 12 July 1996)
An optional module will be included which can provide PICS labels.

Resource limits for CGI scripts (eg max CPU time) (Apache Week 02 August 1996)
To prevent runaway processes, the resources used by CGI scripts can be limited.

Rewrite module (Apache Week 09 August 1996)
The 'rewrite' module will be included with Apache for the first time. This module can be used to map incoming URLs onto other URLs, using regular expressions.

Setuid CGI execution (Apache Week 09 August 1996)
Apache will support the execution of CGI scripts as users other than the server user. A number of security checks will be built in to try and make this as safe as possible.

Simplified configuration file format (Apache Week 09 August 1996)
The process of configuring Apache for compilation has been simplified.

User Tracking (cookie module) Updates (Apache Week 02 August 1996)
It will be possible to disable the generation of cookies, even when the cookie module is compiled in. Also, an expiry time can be set on the cookies.

A number of bugs from 1.1 have been fixed.
Some of the major ones are:

KeepAlive connection problems on some browsers
If ErrorDocument redirect fails, displays filename
Negotiation module negotiates on proxy requests (eg proxy:") then fails
Scoreboard out of date (shows PID of children that have died)
Problem with ScriptAlias including path info data
<Location> matches directory sections
Support HTTP continuation headers
mod_dir truncates file size
Updates for QNX, OS/2, A/UX, IRIX, AIX compile warnings or system-specific behaviour

Apache 2.0 Preview
A preview of development work on the next generation of Apache, version 2.0

Apache 2.0: The Next Generation

Over the last few months, we've received many queries about why Apache Week had little to report of Apache 1.3 development. Most of the Apache developers have been hard at work writing the next generation of Apache, version 2.0. Ryan Bloom takes time out to summarise the development effort. First published in Apache Week issue 173 (24th September 1999).

It has been about a year since Apache 1.3 was released, and the core Apache members are now working on version 2.0. The new version will be significantly different to the current one, which raises issues such as "Why update Apache at all?" and "What does this update mean for Apache administrators?" We hope to answer those and many other questions in this article and, as the release of 2.0 approaches, provide more up to date information.

It is important to note that presently there is only development code available for 2.0 and that downloading it now is not advised for anybody other than those who are already familiar with the Apache internals. The code in its current state is not guaranteed to compile from day to day or to work on many platforms. Apache Week will announce any upcoming alpha or beta versions and the details of the 2.0 release as soon as they are ready.

Apache 1.3 is a great web server which serves pages for the vast majority of the web, but there are things it can't do.
Firstly, it isn't particularly scalable on some platforms. AIX processes, for example, are very heavy-weight and a small AIX box serving 500 concurrent connections can become so heavily loaded that it can be impossible to telnet to it. In situations like this, using processes is not the right solution: we need a threaded web server.

Apache is renowned for being portable as it works on most POSIX platforms, all versions of Windows, and a couple of mainframes. However, like most good things, portability comes with a price which in this case is ease of maintenance. Apache is reaching the point where porting to additional platforms is becoming more difficult. In order to give Apache the flexibility it needs to survive in the future, this problem must be resolved by making Apache easy to port to new platforms. In addition, Apache will be able to use any specialised APIs, where they are available, to give better performance.

Multiple-Processing Modules (MPM)

The original reason for creating Apache 2.0 was scalability, and the first solution was a hybrid web server; one that has both processes and threads. This solution provides the reliability that comes with not having everything in one process, combined with the scalability that threads provide. The problem with this is that there is no perfect way to map requests to either a thread or a process.

On platforms such as Linux, it is best to have multiple processes each with multiple threads serving the requests so that if a single thread dies, the rest of the server will continue to serve more requests. Other platforms such as Windows don't handle multiple processes well, so one process with multiple threads is required. Older platforms which do not have threads also had to be taken into account. For these platforms, it is necessary to continue with the 1.3 method of pre-forking processes to handle requests.
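The platform-by-platform reasoning above can be sketched as a small decision function. This is only an illustration in Python and not Apache code: the `Platform` fields and the three model names are hypothetical labels for the cases described in the text, not identifiers from the Apache source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Platform:
    """Hypothetical capability flags; not taken from any Apache header."""
    name: str
    has_threads: bool
    handles_many_processes: bool


def choose_model(platform: Platform) -> str:
    """Pick a request-handling model following the reasoning in the text."""
    if not platform.has_threads:
        # Older platforms: keep the 1.3 approach of pre-forked processes.
        return "prefork"
    if not platform.handles_many_processes:
        # Platforms such as Windows: one process holding many threads.
        return "single-process-threaded"
    # Platforms such as Linux: several processes, each with several threads,
    # so a failure in one process does not take down the whole server.
    return "hybrid"


# Example platforms (the flag values are assumptions for illustration).
linux = Platform("Linux", has_threads=True, handles_many_processes=True)
windows = Platform("Windows", has_threads=True, handles_many_processes=False)
legacy = Platform("older Unix without threads", has_threads=False,
                  handles_many_processes=True)

for p in (linux, windows, legacy):
    print(f"{p.name}: {choose_model(p)}")
```

A real server has to weigh far more than two flags, which is part of why there is no perfect mapping; the sketch only captures the three broad cases.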
There are multiple ways to deal with the mapping issue, but the cleanest is to enhance the module features of Apache. Apache 2.0 sees the introduction of 'Multiple-Processing Modules' (MPMs) - modules which determine how requests are mapped to threads or processes. The majority of users will never write an MPM or even know they exist. Each server uses a single MPM, and the correct one for a given platform is determined at compile time.

There are currently five options available for MPMs. Their names will likely change before 2.0 ships, but their behaviours are basically set. All of the MPMs, except possibly the OS/2 MPM, retain the parent/child relationships from Apache 1.3. This means that the parent process will monitor the children and make sure that an adequate number are running.

PREFORK
This MPM mimics the old 1.3 behaviour by forking the desired number of servers at startup and then mapping each request to a process. When all of the processes are busy serving pages, more processes will be forked. This MPM should be used for older platforms, platforms without threads, or as the initial MPM for a new platform.

PMT_PTHREAD
This MPM is based on the PREFORK MPM and begins by forking the desired number of child processes, each of which starts the specified number of threads. When a request comes in, a thread will accept the request and serve the response. If most of the threads in the entire server are busy serving requests, a new child process will be forked. This MPM should be used on platforms that have threads, but which have a memory leak in their implementation. This may also be the proper MPM for platforms with user-land threads, although there has not been enough testing at this point to prove this hypothesis.

DEXTER
This MPM is the next step in the evolution of the hybrid concept. The server starts by forking a static number of processes which will not change during the life of the server. Each process will then create the specified number of threads.
When a request comes in, a thread will accept and answer the request. At the point where a child process decides that too many of its threads are serving requests, more threads will be created. This MPM should be used on most modern platforms capable of supporting threads. It should create the lightest load on the CPU while serving the most requests possible.

WINNT
This MPM is designed for use on Windows NT. Before Apache 2.0 is released, it will also be made to work on Windows 95 and 98 although, just like Apache 1.3, it is unlikely to be as stable as on NT. This MPM creates one child process, which then creates a specified number of threads. When a request comes in, it is mapped to a thread that will serve the request.

OS/2
This MPM is designed for use on OS/2. It is purely threaded, and removes the concept of a parent process altogether. When a request comes in, a thread will serve it properly, unless all of the threads are busy, in which case more threads will be created.

Multi-processing modules are designed to work behind the scenes and do not interfere with requests in any way. In fact, their only function is to map the request to a thread or process. One advantage of this technique is that each MPM can define its own directives. This means that if you are using the PREFORK MPM, you won't be asked how many threads you want per server, or if you are using the WINNT MPM, you won't need to specify the number of processes.

Will Apache 1.3 Modules work?

Modules written for 1.3 will not work with 2.0 without modification. There are many changes which will be documented by the time 2.0 is released. In Apache 1.3, each module uses a table of callback routines and data structures. Instead of using this table to specify which functions to use when processing a request, 2.0 modules will have a new function to register any callbacks needed.
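The registration idea can be sketched in a few lines of Python. This is a toy model rather than the Apache 2.0 API: `HookRegistry`, `register_logging_module` and the phase names are all hypothetical, chosen only to show a module asking for the callbacks it needs instead of filling in a fixed-size table.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Handler = Callable[[dict], None]


class HookRegistry:
    """Toy stand-in for the server core: callbacks are stored per phase."""

    def __init__(self) -> None:
        self._hooks: Dict[str, List[Handler]] = defaultdict(list)

    def register(self, phase: str, callback: Handler) -> None:
        """Called by a module to attach a callback to one request phase."""
        self._hooks[phase].append(callback)

    def run(self, phase: str, request: dict) -> None:
        """Run every callback registered for a phase; unused phases are no-ops."""
        for callback in self._hooks.get(phase, []):
            callback(request)


def register_logging_module(registry: HookRegistry) -> None:
    """The module's single registration function (hypothetical module).

    It registers only the one callback it needs and is unaffected by
    whatever other phases the core defines.
    """

    def log_request(request: dict) -> None:
        request.setdefault("log", []).append(f"served {request['uri']}")

    registry.register("log_transaction", log_request)


registry = HookRegistry()
register_logging_module(registry)

request = {"uri": "/index.html"}
# The core walks its phases in order; the phase names here are illustrative.
for phase in ("translate_name", "check_access", "log_transaction"):
    registry.run(phase, request)

print(request["log"])  # ['served /index.html']
```

Because the module never sees a table of fixed length, the core in this sketch could start running a new phase without the module changing at all.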
In the past, new features have been added to subsequent releases of Apache which required the callback table to be expanded, causing existing modules to break. In 2.0, each module is able to define how many callbacks it wants to use instead of using a statically defined table with a set number of callbacks. If the Apache Group decides to add callbacks in the future, the changes are less likely to affect existing modules.

Many things have been abstracted in Apache 2.0 and there are many new functions available. This means it will no longer be possible to access most of the internals of Apache data structures directly. For example, if a module needs access to the connection in order to send data to the client, it will have to use the provided functions rather than access the socket directly.

The Apache Portable Run-Time (APR)

APR was originally designed as a way to combine code across platforms. There are some sections of code that should be different for different platforms as well as sections of code that can safely be made common across all platforms. Apache on Windows currently uses POSIX functions and types that are non-native and non-optimised for communicating across a network. Replacing these functions and types with their native Windows equivalents has given a significant performance improvement.

For example, spawning CGI processes is very confusing in Apache 1.3 because Unix, Windows, and OS/2 all handle spawning in different ways. By using APR, the logic can be combined for spawning CGI processes, decreasing the number of platform-specific bugs that are introduced later.

APR will make porting Apache to additional platforms easier. With a fully implemented APR layer any platform will be able to run Apache. APR is small and well defined and, once it is fully integrated into Apache, will change very little in the future. Apache has never been well defined for porting purposes as there was too much code to make porting a simple task.
In addition, the code was originally designed for use on Unix, which made porting to non-POSIX platforms very difficult. With APR, all a developer needs to do is implement the APR layer. APR was designed with Windows, Unix, OS/2, and BeOS in mind and is more flexible as a result.

APR acts as the abstraction layer in Apache 2.0. To allow the use of native types for the best performance, APR has unified functions such as sockets into a single type which Apache will then use independently of the platform. The underlying type is invisible to the Apache developer, who is free to write code without worrying about how it will work on multiple platforms.

When, When, When?

Apache 2.0 is a major re-working of Apache that will hopefully result in a web server that can continue to grow and serve the web. As has been traditional with previous Apache releases, the 2.0 upgrade will be made available when it is ready and stable. There is no promised release date although it is hoped that a beta version will be available either late in 1999 or early in 2000.

This article covers some of the major changes in Apache 2.0, such as MPMs, module callbacks, and the abstraction layer. Future editions of Apache Week will report on the progress of Apache 2.0 and highlight any major developments.

ApacheCon 2000 Europe Conference Report
Report from the third Apache conference

Report from ApacheCon Europe 2000

This is a special report covering the ApacheCon 2000 conference in Europe held in London. First published 3rd November 2000.

ApacheCon Europe 2000, the first ApacheCon outside the USA, was held on Apache Week's home ground from October 23rd to October 25th. As promised, Apache Week was there in London to cover the conference.

Early Monday morning at 8 am, we had a brisk walk from the Hilton London Olympia hotel where we were staying to the Olympia Conference Center about three blocks away.
There was no fear of losing our way as shortly after leaving the hotel, we were greeted by a succession of signboards displaying the familiar Apache feather, leading us straight to the conference center. Our first day did not really get off to a good start as during the registration our records were not found in the database. Luckily the organizers were efficient enough to resolve this problem quickly and we were handed our passes and complimentary ApacheCon bags containing three thick manuals of conference proceedings and other goodies.

As the conference package included light breakfast and lunch for all days registered, we all had empty stomachs that morning. We really should have taken the word "light" literally as to our dismay breakfast consisted of only one plate of biscuits per table, and tea or coffee. The only difference was you could keep your dirty cup after you had drunk your coffee or tea. As this was the case, there were plenty of seats for us to take our pick but like the other attendees, we did not stay long for breakfast.

The conference had three main parallel "unthemed" tracks of classes or talks in time slots of one hour, one and a half hours or two hours. There were a total of 42 classes, covering the Apache web server, XML, Java, mod_perl, PHP, and a case study of the real-life implementation of the Apache web server. The classes were spread over three days, including a busy Monday that packed in 21 of the 42 classes. There was also an additional concurrent track of talks by vendors, namely Sun Microsystems, IBM, MyComponents.com, and Oracle, with a sprinkling of BoFs (Birds of a Feather sessions) as well.

As we were approaching the auditorium for the opening session at 9 am, strains of a western tune drifted to our ears and for a split second when we stepped into the room, we thought we were transported back in time to the wild, wild west as the formidable figure of Ken Coar loomed above us on the stage with a cowboy hat on his head.
Later he revealed that the piece of music we heard was "Apache", one of the many western-themed hits by the Shadows who reigned unchallenged as Britain's top band between 1960 and 1963.

After the welcoming speech, Ken Coar proceeded to give an update on the schedule where one talk was cancelled and a few were swapped. This was not good news for those who had already decided on the talks that they were going to attend. For the few unlucky ones, this change caused their chosen talks to be back to back so they had to go through the mind-boggling task of making a choice again. The official number of pre-registered attendees was about 900.

For my first of the seven classes on the first day, I decided to attend "Toward the Semantic Web: a View of XML from Outer Space" given by Stefano Mazzocchi, Cocoon's creator in the Apache Cocoon project. The attendance was so high that even with extra chairs some delegates were left sitting on the floor. This talk gave a clear explanation of the XML model and the "semantic web", covering many of the technologies that the W3C are developing to shape the future of the World Wide Web. Stefano described the ways in which XML can be used to overcome some of the problems inherent in today's Web, and demonstrated how they can be implemented using Cocoon, Xalan, and other Apache projects.

After 2 hours of XML, an hour of Apache 2.0 by Ryan Bloom was my next stop. The major changes in Apache 2.0 are the implementation of MPM (Multiple-Processing Modules), APR (Apache Portable Run-Time) and I/O filtering. No release date was decided for an Apache 2.0 beta, although Ryan promised it would be as soon as possible.

Lunch was served between 12 pm and 2 pm but talks were still being held during these two hours so it was either lunch or class. At 1 pm, I had no choice but to forgo a class as hunger beckoned and I joined one of the two long queues to collect my meal at the reception and bars area.
Seats were limited but as the turnaround time was quick (no one loitered at the lunch tables), everyone managed to find a place at the tables in the end. A bit short on space but at least it worked out well. After a meal that was nothing to shout about, I had just enough time to drop by Sun's Internet Pavilion to check my email before joining the next class at 2 pm.

It was time for a change so I joined a business-oriented talk instead of another technical one. Peter Moulding gave a few useful tips for convincing higher management to use Apache instead of other proprietary web servers in his "Apache in the Real World - Beating the In-house Bias" talk.

Next came another two-hour slot, and for me it was "Introduction to Apache Server" by Rich Bowen. This class was more for users new to Apache so I left halfway to listen to "AxKit - an XML Delivery Toolkit for Apache" presented by Matt Sergeant. AxKit is implemented as a Perl Apache module using mod_perl that provides on-the-fly conversion from XML to a variety of formats, such as HTML and WML for WAP phones. It provides similar functionality to Cocoon.

After attending four classes and missing one due to lunch, there were two more talks to go, three hours in total at an hour and a half each. Sterling Hughes, co-author of the soon-to-be-published-in-November "The PHP Developer's Cookbook", gave a very technical talk on "Extending PHP4" covering the PHP API and compiling a PHP extension in detail. The talk covered the new scripting engine in PHP 4, Zend. Like a traditional interpreter, the old PHP scripting engine would execute scripts while parsing them. The new Zend engine operates using the more efficient model of pre-compiling the script.

The last class of the first day was a highly entertaining and animated talk by Ralf S. Engelschall, author of mod_ssl, mod_rewrite, and much more.
The talk, "Security Solutions with SSL", covered the evolution of mod_ssl, described its features, and gave twelve useful configuration examples. Each of the beautifully presented slides included an amusing quote to lighten up the atmosphere of this heavy subject.

After a long day of exhausting technical classes, it was time for a relaxing night event named "The LongevIT Spa" at the Rock sponsored by IBM WebSphere. Round trip transportation from the Olympia Center was provided. Most delegates had absolutely no idea where the coaches were taking them. The Rock is a newly opened nightclub. There were free cocktails and beers; head, neck and shoulder massage; and two virtual reality simulators that emitted smells too but we were too conservative to give the latter two a try. Despite the free flowing drinks, sushi and loud music, we were desperate for a decent meal at 10 pm so we nipped out for dinner and were back by 11 pm for the coach back to our hotel. In doing so, we missed the raffle and the bag of goodies given away by IBM - a pair of slippers, t-shirt, CD-ROM, etc. We were really tired when we reached the hotel and could barely walk to our room. What a day!

ApacheCon Europe 2000: Day 2

The schedule for the second day was not as punishing as the first day. There were only a total of 12 classes held on this day, with only four to attend alongside three keynotes. There was ample time for lunch and for visiting the exhibition that didn't start until 12 pm.

The first session of the second day was "JCP (Java Community Process) and Apache" presented by George Paolini, Vice President of Technologies and Advocacy. Basically he talked about the role Sun has working with the Apache Software Foundation and the roadmap for the Java 2 platform.

Juggling between Java Application Servers, mod_snake, and mod_perl, I finally dropped the former two and settled on the latter.
In a nutshell, Eric Cholet talked about configuring Apache with Perl using <Perl> sections and @PerlConfig, and configuring mod_perl applications using PerlSetVar and custom configuration directives. The main question is why anyone would write Perl code inside the Apache httpd.conf file. One of the benefits is that in an environment with many virtual hosts, Perl code can be used within httpd.conf to generate suitable values for directives based on some external variables.

Next Dr Kristof Kloeckner, Vice President of Business Integration Development and Director from IBM Hursley Laboratory, enlightened us on how IBM relates to open source both as a contributor and a beneficiary.

Soon it was lunchtime. Only an hour of IBM Management Briefing, "Infrastructure for Web Services", in the Vendor Theatre overlapped with the two-hour lunchtime so there was time to visit the exhibition. Around eighteen companies including IBM, Sun Microsystems, Covalent Technologies, Thawte, Zend Technologies Ltd and Eliad Technologies took part in the trade show. There was a coffee stand in Sun's booth and two romper rooms with pinball machines and two Sega Racing Arcade machines. We picked up more freebies such as a Tomcat cup and t-shirt, cap, and magazines and even tried our hand at a pinball machine but alas, we were not Brooke Shields.

Soon it was time for three more classes within the next four hours before the long awaited guest keynote by Douglas Adams. I filled the next three hours with Tomcat by attending "Migrating Apache JServ Applications to Tomcat" by Craig McClanahan and "Advanced Tomcat Configuration and Performance Tuning" by Costin Manolache, who was a fast speaker and completed his very technical talk in just an hour of his two-hour slot. Then it was an hour of "Improving script and handler performance under mod_perl" by Stas Bekman, who unfortunately had to wrap up his talk quickly as delegates were waiting to enter the room for the final keynote of the day.

"Living in a Virtual World" was the keynote everyone was waiting for. The whole auditorium was filled to the brim and the audience were not let down as Douglas Adams soon had them in stitches with his urban myths and unique perspective about computers. The last event for the second day was the reception serving cocktails and hors d'oeuvres on the Exhibit Floor. "Bop Ad" was definitely the STAR of the day as fans, ASF members and fellow Apache enthusiasts alike queued for his autograph and a free paperback copy of "The Hitchhiker's Guide to the Galaxy". With that, I ended the day and retreated to the haven of my hotel room.

ApacheCon Europe 2000: Day 3

On Wednesday, I was late for the first talk of the day, as I had to check out from my hotel. On this final day of the conference, there were only a total of nine talks running in the three concurrent tracks, two keynotes, a book-signing event, a few vendor presentations by MyComponents.com and Oracle, and not forgetting the closing plenary.

When I reached the center at quarter-past nine, all three talks had started. I planned to attend "Running a Successful Web Hosting Business" by Frank DeChellis, one of the two business-oriented talks in this conference. Peering through the glass panel in the door, I couldn't find an available seat in that class so I sneaked unnoticed into the auditorium instead. This turned out to be a good choice, as the talk "Managing your Web Site with Cocoon" was very well presented by Doug Tidwell. Doug, author of an upcoming book on XSLT, demonstrated how the array of tools written by the Apache XML project (including Cocoon, Xerces, Xalan, and FOP) could be used to perform server-side transformations of XML documents. From a single XML document, HTML, PDF and WML could all be served to the client.

Next came the Oracle keynote titled "Convincing Management to Embrace Open Software Development" by Brian Behlendorf, president of the Apache Software Foundation and cofounder of CollabNet.
He gave a brief definition of open source, and the various licences used such as the Apache Licence, GNU General Public Licence and Mozilla Public Licence. He described how open source software is designed and built using the collective wisdom of a group of developers, with contributions from a large user community. When I heard the word "collective", images of the Borg flashed through my mind. Brian also offered some tips on how to make lawyers less nervous and ended the talk by sharing with the audience a list of free buzzwords: "reduce time to market", "increase margins", "expand public mind share" and "take ownership of your future".

The exhibition hours were only from 12 pm to 6 pm but outsiders were still registering on-site for exhibition passes. I was slightly taken aback when I was stopped and asked to show my pass (it was hidden under my jacket) at the exhibition entrance but I guess they were just being careful.

During the two-hour lunch break from the main talks, presentations by vendors were still in progress. I ate lunch leisurely as most delegates had already taken theirs and no one was waiting for my seat. I had a pleasant conversation with two participants from Germany and the USA during lunch. The latter only heard about ApacheCon after the Orlando event. Instead of waiting for the next ApacheCon in the USA, he persuaded his company to send him to this one. The former was in charge of migrating his Netscape web server to the Apache web server all by himself. Both of them were very satisfied with the quality of this conference and the useful technical details that they managed to absorb from the talks.

Oblivious to time, I missed the Wrox Press book-signing event at 1 pm by Peter Wainwright, author of "Professional Apache", but still, I managed to pick up a cute horsey toy known as "CocoJ" from the Eliad Technologies booth.

At 2 pm, I was off to "mod_perl Version 2.0" given by Doug MacEachern.
Because of the architectural changes in Apache 2.0, particularly the introduction of thread support, mod_perl has been rewritten from scratch. The presentation was served by Apache 2.0 and the development version of mod_perl 2.0, and Doug demonstrated use of some of the more advanced features of Apache 2 which are supported in mod_perl 2, including I/O filtering.

Soon James Davidson took over the stage for the "Guru Keynote" session titled "Jakarta Perspective". This was his personal account of the origins and goals of the Jakarta project. In a spontaneous talk, he reminisced about the history and progress of Tomcat and Ant including an insight into the various obstacles that had to be overcome in getting the ASF and Sun together. In his zest to deliver an up-close and personal look at Tomcat, users unfamiliar with the Jakarta project might have complained that he had neglected to give a clear definition of the Jakarta project. Nevertheless I enjoyed the talk as it provided a glimpse into Tomcat's roots.

For the final 2-hour class of the conference, I attended the talk, "The Backhand Project: Load-Balancing and Monitoring Apache Web Clusters" by Theo Schlossnagle. He clarified the differences between "load balancing" and "high availability", since the two terms are often used interchangeably. Both mod_backhand and mod_log_spread were covered in this talk. Back to back with this class was "WebDAV and Apache" by Greg Stein which other delegates from Apache Week reported was an excellent talk about WebDAV and mod_dav.

The closing session hosted by Ken Coar saw only one third of the attendance of the opening plenary. He announced that there were about 1200 registrants (20 percent more than at ApacheCon 2000 Orlando) with around half attending only the exhibition. With a panel of ASF members on stage, it was time for comments about the conference. The overall feedback was positive.
Some complaints were that the Monday schedule was too tight and the Internet access was slow and not very reliable. One suggestion was to introduce lightning sessions where speakers would talk for five minutes on a subject. Hands-on sessions in the evenings were also suggested. While most attendees came from Europe, there were also some from the USA, Canada, South America and even all the way from Japan. Delegates who attended both ApacheCon conferences this year commented that ApacheCon Europe was definitely better than the previous one held in Orlando. If this is the trend then it is good news as we can expect more improvement in the next ApacheCon, which will be held in Santa Clara, California from April 4th to April 6th. The location for the next ApacheCon to be held outside the USA is yet to be determined, but a hint was dropped about Australia. As in all conferences, there were various technical glitches when presentation laptops froze and batteries ran out, some inexperienced speakers, and not enough seats but these were all minor issues considering the excellent detailed technical knowledge that was imparted by the speakers. An annoying distraction was the occasional ringing of mobile phones during the talks. Perhaps the audience need to be reminded to switch off their cell phones at the start of presentations. My personal opinion is that it is very important to pick suitable talks to attend based on your own requirements, as all of them seemed very interesting from the abstract provided. As soon as you are aware that the talk is not what you expect it to be, you must just walk out and join another talk. This may seem very rude to the speaker but to make the most of the conference, this is the only way. One suggestion is for the planning committee to indicate the level of technical knowledge required for the talk, so delegates can make a better choice depending on their own expertise. 
This conference was most suitable for "technical technical" people who wanted to know in depth about a certain subject and to talk to the authors of various modules but it also catered for higher-level managers and new users. With that, I end my report and hope to see you all at ApacheCon 2001 in Silicon Valley next year!

ApacheCon 2000 Conference Report
Report from the second ever Apache conference

Report from ApacheCon 2000

This is a special report covering the ApacheCon 2000 conference in Florida held in March 2000. First published 10th March 2000.

The conference ran from March 8th to March 10th, at the Caribe Royal Suites in Florida, USA. The hotel is situated very close to the main Orlando attractions and the sessions took place in the conference center of the hotel. In total, just over 1000 people attended the conference and this included a large number of Apache Software Foundation members. At the very first session of the conference, the opening plenary, the previous record for the most Apache developers in the same place at the same time was broken. Apache Week counted 18 developers during the session, 4 more than at ApacheCon 98. While most people came from the US and Canada, there were also a significant number of people from Europe and beyond.

This was the second official Apache conference, the first being held in San Francisco in 1998 with over 500 attendees. Some initial pictures from around the conference are available from the Apache Week site. In addition, personal pictures from attendee Kevin Burton are available.

Ken Coar opened the conference on Wednesday with a plenary session and a song. Joining him on stage was a selection of the current Apache Software Foundation members. Roy Fielding, current president of the ASF, said that the conference provided a unique opportunity to talk to the people who actually write the code.
The floor was opened to questions which mostly revolved around the function of the foundation, the Java and XML Apache projects, and the upcoming 2.0 alpha. Roy said that 2.0 would be available "real soon now" and Ryan Bloom promised that an alpha would be available during the week of the conference, or at least a few days after it. The first keynote speech was from Dr Alfred Z Spector, Senior Technical Strategist with IBM. Dr Spector outlined IBM's contribution to open-source and particularly their work on Apache 2.0 and the Java and XML Apache projects. The increasing importance of modularity, through customisable building blocks and code reuse, was stressed. The conclusions of the talk were that developers need to be given access to libraries of standard components and better tools to utilise them, and that the education system should be changed to put more emphasis on the use and reuse of components. Brian Behlendorf gave an energetic look at the internals of the ASF in his keynote session "State of the Foundation". His talk covered some of the reasons that the ASF was formed, which include protection for the individual contributors against lawsuits and the ability to control the Apache identity. The ASF is the umbrella organisation behind the Apache httpd server as well as a number of other open-source projects such as Jakarta and Apache XML. Brian announced that the ASF had so far received over US$35,000 in donations, which was quite an accomplishment given that it is not publicised that the Foundation accepts donations. The difficulty for the ASF now is working out how to spend the donations. The FreeBSD project was cited as a good model as it gives grants to developers who need additional resources, for example. For the future, a goal for the ASF is to develop a structure to help support new projects, aided by creating a standard framework of developer tools and procedures for running open-source projects. 
On Friday the first keynote was given by the president of the Java Software Group within Sun, Patricia C Sueltz. She talked about how Sun views the open source movement. She said that Sun has made three technology bets in the year 2000: that computers will need to scale massively, that the network stack will need to be interoperable, and that devices will be always on and always connected. Finally, she talked about how Sun views open source, and addressed some of the criticisms of their current approach. She said Sun was committed to working better with open source, and is working to improve its source license. Various pieces of software have been or will be released to open source groups, including the Tomcat servlet engine and the Xerces XML parser. There were four parallel tracks running throughout the conference, with a total of over 40 classes. Unlike the last ApacheCon the tracks were not themed, and some of the classes took place as "Nightschool" events. Over the next few weeks the ApacheCon site will be updated to include links to all the talks and papers that were presented. Popular talks on Wednesday included a series of tutorials on starting with mod_perl, Comanche, a GUI for Apache, and the Cathedral Meets the Bazaar. The mod_perl tutorials continued into Thursday and were joined by talks about XML, HTTP, and APR. The day classes ended with a well received talk on load balancing for Apache using the Backhand module, mod_backhand. Talks on Friday covered a variety of subjects, including XML and XSLT, PHP, and Apache 2.0. There was also a panel discussion about the future of Apache after 2.0, including a variety of ideas for new features for Apache in the future. These included IO layering (such as the ability for the server-side includes module to parse the output of the CGI module), and replaceable configuration engines so that configuration information could be stored in a database instead of a file. 
Around sixteen companies exhibited at the trade show during the conference. Companies present included IBM, Sun Microsystems, LinuxMall, and Covalent Technologies. The exhibition was very popular and reinforced how Apache has built an associated industry. The exhibitors we talked to were very happy with the quality, interest, and response of people that they met at the conference. The final session of the conference consisted of a launch of the first alpha of Apache 2.0. A number of ASF members on stage updated the website and copied the distribution files into the correct locations live in front of the audience. Announcements were then sent to a number of key sites such as Slashdot and Freshmeat. This was followed by a session of questions and answers about the conference. In general most of the attendees seemed to like the conference, with positive reaction to the speakers. An interesting point was that most speakers knew their subjects very well, and although not all were experienced speakers, they were preferred to excellent speakers without detailed technical knowledge. Other comments were that the session lengths were just right and that the conference was good value overall. The main criticisms were the fact that lunch was not included in the conference price, the difficulty obtaining meals since BOFs were scheduled at lunch times, and confusion since the 'nightschool' sessions were not included in the 'full conference' registration. Plans are already underway for the next ApacheCon conference, which will be in London in October this year. The conference will be smaller than the US show and tailored towards the European community. The next ApacheCon conference to be held in the US will probably be in San Jose in 2001. The first meeting of the Apache Software Foundation members took place on the Saturday morning following the conference. 
A total of 27 of the 38 ASF members were present, together with representatives of the conference organising company, Camelot. A secret ballot was held to elect the new board of directors of the ASF as well as to elect a number of new ASF members. ApacheCon 2001 Dublin Cancelled The ApacheCon Europe 2001 conference scheduled for Dublin in October has been cancelled, due to financial difficulties with Camelot Communications. Camelot are the production company who produced all but the very first ApacheCon conference. The Apache Software Foundation today released the following statement: Due to financial considerations beyond our control and unrelated to past ApacheCon conferences, our conference producer has decided that they are unable to produce the upcoming ApacheCon Europe 2001 in Dublin. With only three months left before the conference was scheduled to begin, The Apache Software Foundation has decided that it is in the best interests of attendees to cancel the show now rather than attempt to find another conference organizer for the Dublin event. We had suspected there was a problem with the conference when we were contacted by Peter Moulding, a speaker at ApacheCon 2001 in Santa Clara. Peter said that he had not had his travel refunded and that the conference organisers, Camelot Communications, had called him to tell him they were closing the company. We were unable to get an official response from Camelot or the Apache Software Foundation in time to run the story in issue 254 (13th July 2001). It is disappointing that Camelot is unable to produce the conference; the previous conferences that they have run have been well attended, made a profit, and been highly rewarding for everyone involved. 
The Apache Software Foundation are about to begin evaluating proposals by other conference organisers so that future ApacheCon events will not be affected. More news in Apache Week and on the conference web site as it becomes available. ApacheCon 2001 Conference Report This is a special edition of Apache Week covering the April ApacheCon 2001 conference in Santa Clara. ApacheCon Santa Clara 2001: Day 1 ApacheCon 2001 was held in Santa Clara, California from April 4th to April 6th. As promised, Apache Week was there to cover the conference. The first day didn't get off to a good start as there were no signs in the hotel explaining where the conference registration was, [photo: "registration", 77K jpeg] so we ended up eating a breakfast provided for a different conference in the hotel. This turned out to be a good plan, as the ApacheCon breakfast wasn't nearly as good. Registration was quick and painless but even though conference proceedings were available on a CDROM, the registration bag contained hard copies of all the papers, running to three thick volumes well over 600 pages. Unlike the last ApacheCon there were no free goodies in the bag; last time we got a t-shirt and a pen, this time we just got marketing leaflets from companies sponsoring the event. The schedule showed that ApacheCon had packed over 24 classes into the first day, running from 9am through to after 9pm. First up was the opening plenary presented by Ken Coar, and over 180 people packed the theatre [photo: "ken coar", 59K jpeg], [photo: "packed theatre", 169K jpeg] Ken gave a welcoming speech, details of changes to the schedule, and where to find lunch. Just under 200 proposals for sessions were received for this conference from which just 89 were picked. Sadly attendees we talked to afterwards said the session came across as unplanned and unprofessional for a conference of this type. 
This would have been a good opportunity to introduce the Apache Software Foundation or give a brief overview of the major events since the last conference. We made use of the wireless Internet access available throughout the conference area to catch up on some work before attending the "birds of a feather" (BOF) session on clustered Apache services [photo: "BOF audience", 63K jpeg]. The group behind the Spread toolkit explained how to create reliable distributed clustering systems and showed examples of how Spread can be used within Apache. Apache-SSL has code that makes use of Spread to facilitate a shared session key server, although the toolkit can be used for much more complex tasks such as database replication. Next, Harrie Hazewinkel gave a short but interesting talk on quality of service measurement, using SNMP to monitor and manage Apache. Harrie is the author of the Apache SNMP module, mod_snmp. After the provided lunch, Jon "maddog" Hall from Linux International enlightened us with an entertaining and animated keynote speech [photo: "maddog", 64K jpeg]. He touched on trademark issues where people take advantage of the Linux name to create, for example, "Linux University". These issues are of particular interest to Apache, and the ASF take care to protect the Apache name. With the recent downturn in the technical sector he explained his business plan, which involves combining microcomputing and microbrewing. "When the computer industry is at a low, beer drinking is at a high," he said. By combining both industries into a single course you can make sure you always have a job. The keynote touched on issues to do with classification of machines and the accuracy of his predictions applied to the Internet, and looked at Star Trek technology including communication badges, personal log computers, and female Borg. 
Next we had intended to visit the talk on WebDAV and Apache with Greg Stein, but the small presentation room was overflowing with people, so much so that the talk was repeated later in the week for those that could not fit in the first time. Instead we went to see Giacomo Pati and his talk on Cocoon. When we started developing Apache Week back in 1995 we looked at content-independent ways to store the issues. We actually wrote our own format, in a style similar to the Ventura Publisher markup language. If we were to start again we'd definitely be using XML; in fact we already use XML for parts of Apache Week as well as the "In the news" section of the main apache.org site. We were interested in finding out more information about some of the XML publishing systems available, and this is the goal of the Apache Cocoon project. Doug Tidwell spent some time explaining Cocoon 2.0 and focussed on serving up XML documents. The basic idea is that you write an XML representation of the resource you wish to serve together with an XSL stylesheet that shows how the XML is to be translated. The XSLT process is normally left to the server and is usually cached as the translation may take a significant time. In the future, browsers will be able to do this transformation themselves with the server just providing the XML and XSL files directly. Some browsers attempt to do this now, but support is still limited. Cocoon is able to pick which XSL stylesheet to use to render a page based on things such as the user-agent field. Once you have an XML representation of your data you are not limited to just providing a translation to HTML, and we were shown tools that could convert the XML into other presentation types such as JPG and even the creation of dynamic PDF. For the remainder of the day we decided to attend the talks on security. The first, "PKI with OpenSSL", aimed to show the applications for which OpenSSL can be used. 
OpenSSL is an open-source toolkit that implements SSL as well as many other cryptography and public key protocols. Before September last year the RSA patent prohibited the use of OpenSSL inside the USA. Rodney Thayer explained that OpenSSL can do much more than act as the SSL layer for a secure web server as he went through the various standards as well as commands for general cryptography, certificate processing, and key storage. OpenSSL is now used in a large number of applications and is a product-grade general purpose cryptography tool. The last class of the first day was a highly entertaining and animated talk by Ralf S. Engelschall, author of mod_ssl, mod_rewrite, and much more. The talk, "Security Solutions with SSL", covered the evolution of mod_ssl, described its features, and gave useful configuration examples. Each of the beautifully presented slides included an amusing quote to lighten up the atmosphere of this heavy subject. The future of mod_ssl was discussed including the work currently going on to port it to Apache 2.0, add LDAP CRL handling, and a distributed session cache. mod_ssl will not need EAPI hooks for Apache 2.0, but other EAPI functions may be useful. It is not certain how this effort will fit into the work being done in Apache 2.0 on mod_tls and if we will end up with two SSL solutions like we have with Apache 1.3. When asked about support for Win32 Ralf replied "if you really think that you can run a secure web server on Windows you've not understood security". ApacheCon Santa Clara 2001: Day 2 The second conference day was almost as packed as the first, with 25 talks and additional BOF sessions spanning from 9am until after 8pm. After the free breakfast doughnuts I decided to attend the BOF sessions on using Apache for serving multiple protocols. One of the aims for Apache 2.0 is that the HTTP engine is abstracted, and in particular APR is designed to be a portable layer that can sit beneath all sorts of applications. 
The BOF gave a list of the protocols that have been examined so far including HTTP, FTP, POP, IMAP, IDENTD, and SNMP. It then looked at why you'd want to use Apache to do this when good applications for each of these protocols already exist. The main advantage is that you get a common infrastructure for all your applications so you can use one standard configuration format, one standard way of doing authentication and so on. You can also make use of the extensive tools such as the Rewrite module and SSL across all protocols. The biggest requirement for the project is that the performance for serving HTTP requests should not be affected if you don't use Apache to serve any other protocols. Once discussion moved to POP and IMAP support I was reminded of Jamie Zawinski's law of software envelopment: "Every program attempts to expand until it can read mail. Those programs which cannot so expand are replaced by ones which can." Each time a secure web server receives a connection from a new client it has to establish a new SSL session. This negotiation requires the server to perform a private key operation, usually with a 1024 bit RSA key. This operation is mathematically complex and is therefore time consuming. Hardware accelerators are designed to offload the most complex parts of this operation allowing more new connections to be established every second. Existing hardware units handle anywhere between 75 and 300 of these operations per second using a number of internal processors, and can cost up to US$15,000. The OpenSSL project has recently been incorporating support for various hardware cryptographic accelerator cards. Until recently these accelerators were only supported by commercial secure servers. A number of these hardware vendors were invited along to a special BOF to discuss OpenSSL support and their units. Representatives of nCipher, Rainbow, and GIGI attended and gave short talks about the capabilities of their hardware and how it was supported. 
nCipher stressed that the ability to keep your server's private keys on an external device, and scalability, were more important than performance. Rainbow said that they concentrated on acceleration, having the fastest boards available. Dr Lee Nackman of IBM gave a keynote entitled "Open Source and the Corporation". He said that IBM had an "open source zeal" and had developed internal processes that made working with open source projects less painful. Of course IBM wants to see a return from their investment, and in the case of their substantial contributions to Apache-XML they saw that it would open up new business models for IBM. They see themselves supporting the customer demand for Linux and being able to exploit the emerging technologies. Looking to the future, he predicted an increase in web services and service-orientated web applications such as stock quotes, news, and increased integration with business processes. Soon it was lunchtime, and at this conference the ApacheCon planners had decided not to schedule sessions overlapping with lunch. Instead lunch coincided with the opening of the exhibition hall [photo: "lunch queue", 80K jpeg]. The turnout of exhibitors was disappointing, with under half the number at the last ApacheCon and a distinct lack of giveaways. I failed to find which company was giving away inflatable camels (or in fact why they were doing so) [photo: "apacheweek sign", 61K jpeg], [photo: "exhibitors hall", 98K jpeg], [photo: "exhibitors hall", 97K jpeg]. I skipped most of the afternoon sessions in order to finish off the Apache Week guide to the history of Apache 2.0 and catch up with some sleep. ApacheCon Santa Clara 2001: Day 3 Friday marked the last day of the conference, but the schedule was still packed with exciting talks and keynotes. For the first talk of the day we visited Mark Wilcox who was presenting "Apache and LDAP". 
The talk outlined the role that LDAP can play with Apache, looking at what directory services are, and how to make use of LDAP with Apache and Perl. Mark explained that the aim of a directory service is to provide quick access to hierarchical information in a way that can be distributed and replicated. These services can be useful to Apache for authentication, authorisation, and perhaps even configuration. The HTTP protocol is stateless so user authentication needs to happen on every request. Rather than have every page request do a new database lookup, LDAP services are usually combined with some other system, such as cookies. The Perl::LDAP module provides an easy way to interface to directory services from within Apache. Jon Tigue gave an interesting presentation on extending directory indexes provided by mod_autoindex. By cleaning up the HTML produced by the module with a simple patch, the output from the module can be sent through an XML parser. When used in conjunction with clients that can parse XML this allows things such as the column sorting in the FancyIndexing without any server interaction. After lunch a panel discussion took place about Apache on Windows. Ryan Bloom, William Rowe, Jeff Trawick, and Rich Bowen formed the panel but were greeted by only 20 attendees [photo: "win32 round", 77K jpeg]. The discussion formed around APR and how the implementation of this layer makes Apache 2.0 think that Windows is just another Unix. Even though Apache for Windows is designed to run best on NT (and hence Windows 2000), a substantial proportion of the audience wanted to keep support for Windows 95 and 98 for testing purposes. The closing session hosted by Ken Coar saw only a fraction of the attendance of the opening plenary, but it was getting late on a Friday evening. With a panel of ASF members on stage [photo: "some ASF members", 52K jpeg], it was time for comments about the conference. The overall feedback was positive. 
Some attendees complained of poor Internet access; this was true if you relied on the computers provided, but I found the wireless coverage to be excellent. One suggestion was that there should be fewer sessions in the evenings, leaving them free for more social interaction or BOF sessions. Another suggestion was to have talks that explained (probably in an unbiased way) the commercial products available that interfaced with or were based on Apache. Overall I was very impressed with the conference. A lot of the problems from previous ApacheCon conferences had been addressed and the quality of the presenters was high. It was a shame that more exhibitors had not taken part as it seemed that a number of corners had been cut to save money. The only negative impressions were fairly minor: the food choices were limited (on Friday all the meal choices involved cheese, making it difficult for vegans to find things to eat), the conference was a long way from any other facilities (having a car was essential), and there were no fancy parties. Wireless internet access was available throughout the conference rooms and I found it difficult sometimes to stay focussed on the speaker, missing parts of presentations whilst catching up on email without realising it. With so many interesting talks I couldn't attend all of them and this report gives only a snapshot of the ones I thought would be interesting to me. ApacheCon has a variety of talks aimed at all technical levels, so you should definitely consider attending if you've not been to one before. With that, I end my report and hope to see you all at the next ApacheCon later this year! ApacheCon'98 Conference Report Report from the first ever Apache conference Report from ApacheCon '98 This is a report about the first ever conference dedicated to Apache, held in October 1998 in San Francisco. First published 16th October 1998. The conference ran from October 14th to October 16th, at the San Francisco Hilton in California, USA. 
This is the largest hotel in San Francisco, and is located in the downtown area. In total, just under 500 people registered for the conference. While most people came from the US and Canada, there were also a significant number of people from Europe. For the first conference on Apache, this was a very good attendance, and the exhibitors and sponsors were very happy with the number of people at the conference. In addition, most of the 18 core Apache developers also attended, coming from the US, Canada, Italy, UK and Germany. This article contains some links to pictures taken at the conference. Some additional pictures are also available. The first general session on the 14th started with a keynote speech from author Bruce Sterling (picture). This was not directly related to Apache, but contained Bruce's thoughts on the future of a networked society. This was followed by another keynote, from John Gilmore of the Electronic Frontier Foundation (EFF) (picture). The EFF is an organisation concerned with freedom and liberties in computing and the Internet. He outlined objections to software patents, and covered the problems caused by the US export restrictions on secure encryption. Decisions about what is exportable and what is not exportable from the US are made by government employees, without any ability to appeal. Even worse, government employees can revoke export permission at any time without giving any reason, which could seriously affect businesses who rely on exports. The export restrictions were applied to the NCSA httpd server, where the government demanded that the server remove all "hooks" which could allow encryption to be added, even though there was no actual encryption technology in the server. This is the reason that Apache does not contain any hooks to enable encryption to be added. The first talk on the second day was by John Patrick from IBM (picture). He talked about his view of how the Internet will evolve. 
In the last session, David Filo from Yahoo! showed how Yahoo! has used open source software (picture). They started by using commercial operating systems and home-written web servers, but had problems with vendors not being able to scale to the huge number of hits they soon received. They moved to FreeBSD so they could read and if necessary tweak the operating system code. They also use Apache on most of their servers and find that the majority of the performance limitations come from the application layer software. There were four parallel tracks running throughout the conference, with a total of 55 talks. The tracks were Dynamic Content, Performance, Security and Case Studies. On the Dynamic Content track there were talks about using Java servlets at beginner, advanced, and performance levels. Two talks about PHP showed beginner and advanced techniques. There were also talks on writing Apache modules and mod_perl (although unfortunately the first mod_perl session could not be given by the original presenter and the second advanced session had to be cancelled). The Performance track covered making Apache go faster on Windows and Unix systems, using servlets efficiently, and tweaking Linux and FreeBSD. Also on this track was a presentation on how the Netscape Portable Runtime (NSPR) package (available under Netscape's NPL license) could be integrated with Apache to provide a multi-threading Apache on all Unix platforms as well as NT. There was also a talk about how the Apache development process works and how people with contributions can get involved. On the Security track the talk about the new mod_ssl package was popular. The author presented what mod_ssl does and why it was created from the existing Apache-SSL package. Also on this track was an introduction to SSL and TLS, basic security issues in Apache, NT security, and a panel on public key infrastructure on the Web. The final track had Case Studies from various companies. 
This track also contained a demonstration of various GUI configuration programs for Apache. There are various free and commercial configuration systems in development currently, some of which were demonstrated. This seems to be the start of a more concerted effort to develop a GUI infrastructure within Apache, which will allow multiple front-end implementations. About a dozen companies exhibited at the trade show during the conference (picture). Companies present included IBM, RedHat, C2Net, Sendmail, nCipher, SUSE and O'Reilly. This was a very explicit demonstration of how Apache has built an associated industry, and the exhibitors we talked to were very happy with the quality, interest, and response of people that they met at the conference. The final session was a chance to communicate with the core Apache developers (picture). After introducing each member, there was a short discussion of items of interest to the developers, such as plans for 2.0. This was followed by an open session for questions from the floor. Questions covered a range of topics, from IBM's involvement with the Apache group (they have several people working full time on Apache and will contribute back changes) to a request for Apache to incorporate SSL by storing it on a server outside the US (this cannot happen because then no US citizen could work on any part of Apache as it includes encryption). This was the first conference about Apache, and the first conference ever organised by the Apache Group. The result was a very successful conference, where sponsors, exhibitors and attendees were all happy. The success of the conference means that there will be another ApacheCon in the future, but the location and dates have not yet been decided. As soon as anything is known, it will be announced in Apache Week. Apache 1.2 API Guide For module authors, a comprehensive list of changes to the Apache module API. Introduction Apache 1.2 is now out. 
Here we list all the module API changes compared to the API in Apache 1.1.3. Anyone who has written a module for Apache 1.1.3 or earlier should read this to see if they need to make modifications for it to work with 1.2. In any case, Apache adds many new features from HTTP/1.1, and modules might want to take advantage of them. See also our Guide to Apache 1.2. First published in Apache Week issue 44 (6th December 1996), last updated 6th June 1997. API Changes The module API version is now 19970526. A new phase of request processing is available, to allow modules to process the request headers early on in the request. The functions which handle directives should now return type "const char *" instead of "char *". If this is not done, compiling the module might result in type-mismatch warnings, although it will still work. Directives can now be defined in more than one module at once. Each module is given the chance to handle the directive, and can decline it by returning DECLINE_CMD. This gives other modules the chance to handle the directive. This is used in Apache in mod_auth.c and mod_auth_dbm.c, which both support AuthUserFile, but only handle it if they recognise the file type argument. Directives can now take up to three arguments, and can take optional arguments. The number of arguments is specified in the module's command table, with values such as TAKE2 (for two arguments). The new values are TAKE3 (takes 3 arguments) and TAKE12, TAKE23, TAKE123 and TAKE13 (take a variable number of arguments: 1 or 2, 2 or 3, 1 or 2 or 3, and 1 or 3 respectively). The function called should declare arguments for the maximum number of arguments the directive can take. Arguments not set on the directive will be passed to the function as NULL. Finally, the cmd_parms structure has been updated (this is passed in as the cmd argument to directive handlers). A new 'cmd' element is now available, pointing at the directive's command table definition (command_rec). 
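The interaction between optional arguments and declining a directive can be illustrated with a small stand-alone sketch. It does not use the Apache headers: the sentinel DECLINE_CMD_SKETCH, the handler type take12_handler, the dispatcher functions and the type names "standard" and "dbm" are all assumptions made for illustration, not the real mod_auth or mod_auth_dbm code. It only shows the shape of the logic: a TAKE12-style handler receives NULL for an omitted second argument, and a handler that does not recognise the file type declines so that the next module can try.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for Apache's DECLINE_CMD sentinel; this value is an assumption. */
static const char DECLINE_CMD_SKETCH[] = "declined";

/* Hypothetical TAKE12-style handler: 'type' is NULL when the second
 * argument was left off the directive. Returns NULL on success. */
typedef const char *(*take12_handler)(const char *path, const char *type);

/* Plain-text handler: takes the directive when the type is omitted or is
 * "standard" (an assumed name), and declines anything else. */
static const char *text_auth_user_file(const char *path, const char *type)
{
    assert(path != NULL);
    if (type != NULL && strcmp(type, "standard") != 0)
        return DECLINE_CMD_SKETCH;
    return NULL;
}

/* DBM handler: takes the directive only when the type is "dbm" (assumed). */
static const char *dbm_auth_user_file(const char *path, const char *type)
{
    assert(path != NULL);
    if (type == NULL || strcmp(type, "dbm") != 0)
        return DECLINE_CMD_SKETCH;
    return NULL;
}

/* Offer the directive to each handler in turn. Returns the index of the
 * first handler that did not decline, or -1 if every handler declined. */
static int dispatch_take12(const take12_handler *handlers, size_t count,
                           const char *path, const char *type)
{
    for (size_t i = 0; i < count; i++) {
        const char *result = handlers[i](path, type);
        if (result == NULL || strcmp(result, DECLINE_CMD_SKETCH) != 0)
            return (int)i;
    }
    return -1;
}

/* Convenience wrapper with the two sketch handlers in a fixed order. */
static int dispatch_auth_user_file(const char *path, const char *type)
{
    static const take12_handler handlers[] = {
        text_auth_user_file,
        dbm_auth_user_file
    };
    return dispatch_take12(handlers, 2, path, type);
}
```

Calling dispatch_auth_user_file() with a path and a NULL type selects the first handler, a type of "dbm" selects the second, and an unrecognised type is declined by both.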
Apache now supports the additional OPTIONS and TRACE request methods. Two new defines are available for these methods, M_OPTIONS and M_TRACE. The request_rec's method element could be set to one of these. The handler can send an OPTIONS response using send_http_options() (although it could also decline the request, and let the default handler send the response). Handlers can also set the new allowed request_rec element to enable the creation of a proper Allow HTTP/1.1 header. This is a bitmask with one bit per method, built by shifting 1 left by the value of the appropriate M_ define. For example, to specify that GET and POST (only) are allowed for a particular resource, the following could be used: r->allowed = (1 << M_GET) | (1 << M_POST); The way that a module reads PUT or POST data has been completely changed. This is necessary to support HTTP/1.1, which can send this data in a 'chunked' encoding. Modules can request that they get the data after it has been 'dechunked', or they can get the raw data. Any module which handled PUT or POST data by using the old read_client_block() will need to be modified before it will compile with 1.2. The way to read a request body in 1.2 involves several steps: (1) Call setup_client_block() to prepare to handle the data. The second argument to this function tells Apache how to process the body (if at all). It can be one of: REQUEST_NO_BODY (issue a 413 error if any body is present), REQUEST_CHUNKED_ERROR (issue a 411 error if the body was sent chunked), REQUEST_CHUNKED_DECHUNK (if the body is chunked, process it to remove the chunking), REQUEST_CHUNKED_PASS (pass on the chunks). (2) Call should_client_block() when ready to read the data. This sends a "100 Continue" status to the client (new in HTTP/1.1) and tells the module whether it is ok to read the data. (3) Repeatedly call get_client_block() to get the data (possibly all in one go, but possibly also a bit at a time). An HTTP response can include headers to indicate to the client that this response should not be cached at all. 
In previous versions of Apache, this was done by setting the no_cache element of the request_rec. This also had the effect of always sending the response, even if a "304 Not Modified" response could be returned. Now a new element has been added, no_local_copy. When this is set, a 304 response will never be generated. Setting no_cache will send a response that cannot be cached.

New in the request_rec

no_local_copy and no_cache replace 'no_cache' (type int)
request_time - time request was received (type time_t)
boundary - boundary string for multipart/byteranges (type char *)
range - range header text (type char *)
content_language - deprecated; use the content_languages array instead (array of char *)
allowed - set to allowed methods (returned on Allow: header by send_http_header()) (type int)
byterange - number of byte ranges (type int)
chunked - if sending chunked encoding (type int)
read_length - bytes read so far (type long)
read_body - set by handler; can take REQUEST_NO_BODY, REQUEST_CHUNKED_ERROR, REQUEST_CHUNKED_DECHUNK, REQUEST_CHUNKED_PASS (type int)
clength - real content length (type long)
remaining - bytes left to read (type long)

There are some other elements used internally within Apache. In addition, the existing port element is now an unsigned int rather than a (signed) int.

New in the server_rec

send_buffer_size - sets the TCP send buffer size
addrs - list of addresses for this vhost (type server_addr *)
server_uid and server_gid contain the euid/egid to run suexec wrapper as (types uid_t, gid_t)

The server_rec no longer contains host_addr, host_port or virthost. Instead, the server could be responding to multiple server addresses, so a new array (addrs) is created, each of type server_addr. The server_addr_rec contains the IP address, port and name of the server.

Apache is now compiled with a regular expression library. Modules can use the function calls provided by this library to make use of regular expressions.
Note that on systems which provide a stable and bug-free regular expression library, the one supplied with Apache is not used. The library is available in the src/regex directory of the Apache distribution. The only thing to note when using regular expressions is that regsub() should not be used. This is because it returns a string allocated internally, not using Apache's pool allocation system. A new API function, pregsub(), is provided instead which does the same as regsub(), but allocates space in the pool passed in as an argument.

Resources can be associated with multiple languages. Typically, mod_mime obtains information about which languages a file is in from its extensions, but modules can also set the language of their response. Previously, the language was set as a string called content_language in the request_rec. That is still available for backwards compatibility, but will only hold the last language that mod_mime set. To get all the languages of a file, or to set a response with multiple languages, the new element content_languages should be used instead. This is an array (created using the standard Apache array functions such as make_array()), with each element being a "char *" string containing a language tag. For example, if a module wants to output a response in English and German, it should set content_languages with:

char **new;
r->content_languages = make_array (r->pool, 2, sizeof(char*));
new = (char **)push_array (r->content_languages);
*new = "en";
new = (char **)push_array (r->content_languages);
*new = "de";

The following API functions are new in Apache 1.2, and have not already been mentioned above.

blookc() can be used to look ahead one character in a BUFF* stream.
call_exec() to run sub-programs, possibly as a different user.
clear_table() to empty a table
construct_server() returns a string giving the "hostname:port" for a given hostname and port (:port is omitted if it is 80).
find_last_token() checks whether a given token appears as the last part of a string.
find_token() looks to see if a given token exists in a comma-separated list of tokens
getword_white() available to get a word, skipping white space
is_table_empty() check if a table has any contents (this is a macro)
pregcomp() to compile a regular expression
pregfree() to mark memory used by a compiled regular expression as available.
pregsub() is used after a regular expression match to substitute matching parts.
scan_script_header_err() can be used instead of scan_script_header() to return error information from the headers
send_fd_length() sends a part of an open file.
send_header_field() sends a single header to the client.
set_flag_slot() sets an on/off flag in a module's config (complements existing set_string_slot()).
rflush() can be used when sending a response to force output to be flushed to the client.
table_do() to call a function for each item in a table

API functions that use a port number previously used a signed int and now use an unsigned int.

File descriptors are now passed as long instead of int to functions such as pclosef() and note_cleanups_for_fd().

All the HTTP status codes have been renamed to start with HTTP_, and the new codes from HTTP/1.1 have been added. Macros are now available to check status codes, such as is_HTTP_REDIRECT(status).

Internal Changes

People writing modules might also be interested in how the core Apache code works. This list, provided for information only, is a summary of the major changes to the source code which have not been reported elsewhere (as new features, for example).

CookieLog now handled by mod_log_config
Code to do some of transparent content negotiation (see #define HOLTMAN in mod_negotiation.c)
Configure updated to handle new simpler configuration file format
Date-related functions are now in util_date.c
mod_include calls can_exec() for sub-processes
Modules can be compiled in but inactive.
The compiled in modules are listed in the preloaded_modules[] array, while the active modules are stored in prelinked_modules[].
Modules will be moving into the src/modules directory (only mod_proxy has moved so far)
Proxy code moved to src/modules/proxy directory, within the new modules directory
Regular expression library has been added in src/regex directory
Returns 100 Continue before reading request entity
Scoreboard now contains the name of the vhost processing the request.
The #define names for OS-specific functions have been simplified and made consistent: HAS_GMTOFF is now HAVE_GMTOFF, HAVE_SYS_SELECT_H and HAVE_SYS_RESOURCE_H have been added, and USE_* is used to select preferred options on particular OSes (USE_FCNTL_SERIALIZED_ACCEPT; USE_FLOCK_SERIALIZED_ACCEPT; USE_LONGJMP)
The fd for each listener is stored to allow graceful restarts
To support graceful restarts, scoreboard records a 'generation' number
Various function arguments and return values are declared as const.

Appaloosa Awards 2000

The Appaloosa Awards were announced at the O'Reilly Open Source Conference this week. The winners included ASF members Ryan Bloom, Lars Eilebrecht, Roy Fielding, Doug McEachern, Dirk-Willem Van Guilik on behalf of Apache XML projects, and Rasmus Lerdorf on behalf of the PHP team.

Appaloosa Awards

New to the conference this year were the Appaloosa Awards, designed to reward the people and projects who have had a significant influence on Apache. The voting was open to Apache Week readers for one week and we received just under 3000 votes in total. The awards were announced on Tuesday evening at the conference by ASF member and Apache program co-chair Chuck Murcko. [Photo, Chuck Murcko, jpeg 33k]

The Vision Award was for the best ideas to move Apache forward and was won by Ryan Bloom for Apache 2.0 and Roy Fielding for standards and industry acceptance of Apache.
[Photo, Ryan Bloom, jpeg 29k] [Photo, Roy Fielding, jpeg 30k]

The Evangelism Award, for promoting Apache awareness or acceptance, was won by Lars Eilebrecht and by the Apache XML Projects, for which Dirk-Willem Van Guilik collected the award. [Photo, Dirk-Willem Van Guilik, jpeg 30k]

The Technical Contribution Award went to Doug McEachern for mod_perl and to the PHP Group, on whose behalf Rasmus Lerdorf collected it. [Photo, Doug McEachern, jpeg 47k] [Photo, Rasmus Lerdorf, Sam Ruby, Jim Winstead, jpeg 53k]

All photographs copyright Story Photography 2000

Apache 1.1.1 bugs review

A round-up of all the bugs in 1.1.1

Bugs in 1.1.1

The next version of Apache will be 1.2. This will include a lot of new features, as previewed in our Apache 1.2 article (from issue 29). It will also fix most of the outstanding bugs identified in 1.1.1. In this issue we summarise these bugs, sorting them by affected function.

There are quite a few bugs listed here, but most will not have a serious effect on most setups. Many are restricted to specific operating systems, or to particular configurations and modules. It should be remembered that Apache 1.1.1 is a stable release and most users are unlikely to come across these bugs.

For each bug we have tried to identify its current status in the latest development version of Apache. If the bug is followed by the word FIXED then the bug has been fixed and tested. If the status is VERIFIED then the bug exists but has not yet been fixed (although in many cases a fix will be in progress or undergoing initial testing). If neither word is present, then the bug has not been verified or fixed. We have tried to ensure that only real bugs are listed here, but the Apache group receives quite a few bug reports, many of which relate to incorrectly configured systems or which are caused by the operating system or other software.

These bugs affect the operation of the core server, or are related to low-level networking or operating system interaction.
DNS Failure causes core dump Apache can core dump if it cannot obtain the local hostname from the ServerName directive or from the DNS. FIXED. High Load Problems At startup Apache forks the initial children. If it fails to fork (perhaps because of resource limitations), it immediately tries again, which can make the load situation worse. FIXED. A race condition can cause occasional hung processes on very high load systems. VERIFIED. Memory allocation failure causes core dump The memory allocation return value is not checked which could cause core dumps. FIXED. ErrorDocuments ErrorDocument redirect fails, displays filename. FIXED Docs claim %s in ErrorDocument string prints reason for error - no code to implement this. VERIFIED ErrorDocument displays " in string message. FIXED Executing sub-programs When a sub-program is about to be run, Apache checks for correct permissions, but it does not account for other groups that the current user might be in. Scoreboard Scoreboard sometimes out of date (shows PID of children Domains Starting with Numbers Hostnames starting with a number (e.g. 123.domain.com) are incorrectly treated as IP addresses. VERIFIED. Domain name capitalisation Domain names on allow and deny lines are not compared case-insensitively. VERIFIED. Expires Header Apache is not setting Expires header on 304 responses FIXED Continuation Headers Doesn't support HTTP continuation headers FIXED Keep Alives Netscape Navigator 2 has bugs in its keepalive support, so Apache should turn off keepalives when accessed from Navigator 2. FIXED. The proxy module has been extensively modified since 1.1.1 to correct a large number of problems and omissions. NULL requests logged Report of request "NULL" being logged in access log Missing Hits Reports of access_log missing some hits (possibly related to keepalives) ErrorLog ErrorLog | does not work. VERIFIED. 
Imagemap Module
Long URLs (>100 chars) can cause buffer overflows (possible core dump) VERIFIED

Status Module
Can give the wrong start-up time on some systems
Core dumps on a few systems (OSF, SCO)
Wraps bytes total at 4.2GB FIXED
Transfer bytes per second figures wrong FIXED

Negotiation Module
Language negotiation doesn't work for cgi scripts without extensions, which are in a valid ScriptAlias directory.
Charset negotiation is not implemented. VERIFIED.
Language negotiation doesn't match languages against sub-languages, i.e. it treats en and en-US as completely different languages. FIXED.

Directory Index Module
Core dump on Solaris 2 with empty directories
Truncating file size in listing (e.g. 1.8Mb is displayed as 1Mb) FIXED

Userdir
UserDir cannot handle certain configurations, such as http://10.1.2.3/~* VERIFIED

Includes Module
Possible mod_include bug causing core dumps if SSI include fails due to incorrect .htaccess directive
Current working directory can change while processing includes

These bugs are related to specific operating systems.

A/UX: Linger close fails on A/UX FIXED
AIX: Compile warning for SERVICE_UNAVAILABLE FIXED
Apollo Domain: Some compilation errors on Apollo Domain
Digital Unix/OSF: V4.0 requires -lm because the frexp() function has been removed from libc.so. Incompatible pointer type warning.
IRIX: IRIX kernel fails to notify Apache of dead children FIXED
Linux: File descriptor bug causing SEGV in includes module. FIXED.
NeXTSTEP: support/logresolve.c does not compile because of strdup
OS/2: Simplified code for OS/2 FIXED. OS/2 filesystem is case-independent, can cause URLs to fail to match protection limitations
QNX: Missing prototypes for QNX FIXED.
SCO: Dumps core in status module with a Floating exception when compiled with -DSTATUS on SCO ODT 3.0
SGI: Compile warning in http_bprintf FIXED
Ultrix: Compile error in http_main.c
UnixWare: Configuration updated for UnixWare (needs NEED_LINGER)

Example URLs for status and info
Example URLs for status and info pages (/status and /info) can intercept other URLs (e.g. anything in a directory called /info or /information). FIXED.

ScriptAlias and PATH_INFO problem
Bug in the SCRIPT_NAME passed to CGI where the ScriptAlias directory included some PATH_INFO. FIXED

VHosts
Host: header can override IP virtual hosts to give access to other vhosts' information. VERIFIED.
IP-based Virtual hosts on main IP address but different ports not working. VERIFIED.

Directives with on/off arguments
Directives that take an argument that is either "on" or "off" in fact accepted any argument. FIXED.

Default configuration mime types can conflict with encodings
Default mime.types contains content-types for gz and Z extensions, but should be given as encodings with AddEncoding. FIXED

Port directive
Apache accepts non-numeric Port number. FIXED.

Authoritative misspelt
Spelling of authoritative (as authorative) wrong in auth_anon and auth_msql FIXED.

Finally, a few bug reports cannot be verified or discounted. That is, they may or may not exist, but cannot be reliably reproduced. While they may be Apache bugs, they could also be bugs in the operating system, or problems related to particular load conditions or configurations. Any further information about these possible bugs should be reported on the apache-bugs email address or Web page.

CGIs intermittently fail with 'premature end of file error' on site with 100 vhosts. Occurs even with low load.
Server will not respond after a few days of running. Instead of the 5 processes typically running, there is only one. Server accepts the requests, but never responds. This site makes heavy use of CGIs (>50% of all requests).
Some hits are not logged in the access_log, or logged as "NULL". Using Certificate Revocation Lists Certificate Revocation Lists (CRL) increase the security of Client Authentication Realms by enabling server administrators to block client certificates that have been revoked because they are known to have been compromised. Mike Leach and Tim Starr take a look at how to get CRLs working with mod_ssl and Apache. Feature: Using Certificate Revocation Lists One of the most common kinds of access control for secure web servers is Basic Authentication, in which a login and password are required. Access controls can apply to part or all of a web site. The restricted area is called the "authorization realm." Even though Basic Authentication is the most common kind of access control, it is not the most secure. The most secure kind of access control is Client Authentication. Client Authentication uses client certificates installed in users' web browsers or other client applications (clients) to authenticate users, and only lets clients with the right client certificates into the authorization realm. (In this article, an authorization realm with client authentication will be called a "Client Authentication Realm.") A client certificate is issued by a Certificate Authority (CA). A CA checks whether a client certificate applicant meets the CA's criteria for trustworthiness before issuing the client certificate. The client certificate is good for access to the Client Authentication Realm until its validity expires. After expiration, the user will be blocked. To renew access, the user's trustworthiness must be reaffirmed by the CA before renewal of the client certificate. This checking when client certificates are issued and renewed helps to ensure that valid client certificates are only in the hands of users trusted to get into the Realm. However, a client certificate can be compromised before it expires. 
For example, it can fall into the wrong hands, or the CA may decide that the user it was issued to is not trusted anymore. To reject client certificates which are known to be compromised before expiration, a web server consults a Certificate Revocation List (CRL). A CRL is a list of client certificates that were revoked before they expired. Clients with revoked client certificates will be denied access to a Client Authentication Realm if the revoked client certificates are in the server's CRL. This article explains how to configure Apache+mod_ssl to keep clients with revoked client certificates out of a Client Authentication Realm. Don't forget to make a backup of your configuration files and keys and certificates before trying these examples. This article assumes that you have: Apache+mod_ssl installed on your machine A browser that supports client certificates such as Netscape Navigator or Microsoft Internet Explorer A revoked client certificate installed in the browser The root certificate (rootcert) which signed the client certificate The CRL file which includes the revoked client certificate. The client certificate, rootcert, and CRL file must be issued by a CA. The CA can be a third-party application or service, or OpenSSL (the SSL toolkit on which mod_ssl is based) can be used as a CA. The certificates and CRL must be in the PEM (base64-encoded x509) format required by mod_ssl. The Client Authentication Realm can be either a secure virtual host or a directory. Make sure these directives are in the secure virtual host or directory container for the Realm in httpd.conf: SSLVerifyClient require SSLVerifyDepth 10 After these changes are made and the server is restarted so the changes take effect, clients without client certificates will be kept out of the Client Authentication Realm. Even browsers with client certificates will be denied, unless the rootcert has already been installed on the server. 
Test this by trying to access the Realm with a browser without a client certificate (or with a client certificate with an uninstalled rootcert).

To let a client with a client certificate into the Client Authentication Realm, the rootcert must be installed. This can be done with the SSLCACertificateFile directive (or with SSLCACertificatePath, which will not be covered here). Install the rootcert by adding it to the default SSLCACertificateFile, client-rootcerts.pem. If the rootcert filename is ca.crt, the rootcert can be added with this command:

cat ca.crt >> client-rootcerts.pem

The rootcert can also be made the SSLCACertificateFile instead of client-rootcerts.pem if none of the other rootcerts in the default SSLCACertificateFile are needed.

After the server is restarted again, browsers with client certificates signed by the installed rootcert will be let into the Client Authentication Realm, even if the client certificates are revoked. Revoked client certificates will not be blocked until the CRL is enabled. Test this by accessing the Realm with a browser that has a client certificate that is revoked and signed by the installed rootcert.

Make a CRL directory such as /ServerRoot/crl/. Copy the CRL file (ca.crl) into the CRL directory, then configure the CRL in httpd.conf with either SSLCARevocationFile or SSLCARevocationPath:

With SSLCARevocationFile, put this directive in the secure virtual host container for the Client Authentication Realm:

SSLCARevocationFile /ServerRoot/crl/ca.crl

SSLCARevocationPath requires two steps. First, put this directive in the secure virtual host container for the Client Authentication Realm:

SSLCARevocationPath /ServerRoot/crl/

Next, make a symlink to the CRL file in the CRL directory, with a filename based on a hash of the CRL file:

ln -s ca.crl `openssl crl -hash -noout -in ca.crl`.r0

Every CRL file in the SSLCARevocationPath must have one of these symlinks.

After the web server is restarted, the CRL will be enabled.
Clients with revoked client certificates will not be let into the Client Authentication Realm and will get a browser error message saying that access was denied because the client certificate was revoked. An error message such as this will appear in the /ServerRoot/ssl/error_log: [Thu Aug 31 15:32:47 2000] [error] mod_ssl: Certificate Verification: Error (23): certificate revoked There are a couple of known problems which may come up because of differences between the CRLs issued by CA software and mod_ssl's requirements. One is that CA software may issue CRLs without the required start -----BEGIN X509 CRL----- and end -----END X509 CRL----- lines. Here is an example of a CRL generated with OpenSSL that works with mod_ssl: -----BEGIN X509 CRL----- MIIBmjCCAQMwDQYJKoZIhvcNAQEEBQAwgb0xCzAJBgNVBAYTAlVTMRMwEQYDVQQI EwpDYWxpZm9ybmlhMRAwDgYDVQQHEwdPYWtsYW5kMRYwFAYDVQQKEw1SZWQgSGF0 LCBJbmMuMSIwIAYDVQQLFBlHbG9iYWwgU2VydmljZXMgJiBTdXBwb3J0MR0wGwYD VQQDExRSZWQgSGF0IFRlc3QgUm9vdCBDQTEsMCoGCSqGSIb3DQEJARYdc3Ryb25n aG9sZC1zdXBwb3J0QHJlZGhhdC5jb20XDTAwMTExMzIwNTcyNVoXDTAwMTIxMzIw NTcyNVowFDASAgEBFw0wMDA4MzEyMTE5MTdaMA0GCSqGSIb3DQEBBAUAA4GBAIge X5VaOkNOKn8MrbxFiqpOrH/M9Vocu9oDeQ6EMTeA5xIWBGN53BZ/HUJ1NjS32VDG waM3P6DXud4xKXauVgAXyH6D6xEDBt5GIBTFrWKIDKGOkvRChTUvzObmx9ZVSMMg 5xvAbsaFgJx3RBbznySlqVU4APYE0W2/xL0/8fzM -----END X509 CRL----- Another problem is that CRLs issued by third-party CA software may not have all the fields required by mod_ssl. It may be possible to configure the CA software to issue CRLs with all the required fields. 
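The first of these problems, missing delimiter lines, can be checked mechanically before a CRL is installed. The following is a minimal sketch of such a check, not part of mod_ssl or OpenSSL: the function name crl_has_pem_delimiters() is hypothetical, and the check only confirms that both marker lines are present in the right order, not that the data between them is a valid CRL.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Returns 1 if `text` contains the PEM start line for a CRL followed,
 * somewhere later, by the matching end line; returns 0 otherwise.
 * `text` is the whole CRL file read into a NUL-terminated string.
 */
int crl_has_pem_delimiters(const char *text)
{
    static const char begin_line[] = "-----BEGIN X509 CRL-----";
    static const char end_line[] = "-----END X509 CRL-----";
    const char *start;

    assert(text != NULL);

    start = strstr(text, begin_line);
    if (start == NULL)
        return 0;

    /* Only look for the end marker after the start marker. */
    return strstr(start + strlen(begin_line), end_line) != NULL;
}
```

A CRL that fails this check would need its delimiter lines restored, or to be reissued by the CA, before mod_ssl is pointed at it.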
Use this OpenSSL command to view the CRL: openssl crl -text -noout -in filename Then compare its fields to those in the sample CRL above to see if the same fields are in your CRL: Certificate Revocation List (CRL): Version 1 (0x0) Signature Algorithm: md5WithRSAEncryption Issuer: /C=US/ST=California/L=Oakland/O=Red Hat, Inc./OU=Global Services and Support/CN=Red Hat Test Root CA/Email=stronghold-support@redhat.com Last Update: Nov 13 20:57:25 2000 GMT Next Update: Dec 13 20:57:25 2000 GMT Revoked Certificates: Serial Number: 01 Revocation Date: Aug 31 21:19:17 2000 GMT Signature Algorithm: md5WithRSAEncryption 88:1e:5f:95:5a:3a:43:4e:2a:7f:0c:ad:bc:45:8a:aa:4e:ac: 7f:cc:f5:5a:1c:bb:da:03:79:0e:84:31:37:80:e7:12:16:04: 63:79:dc:16:7f:1d:42:75:36:34:b7:d9:50:c6:c1:a3:37:3f: a0:d7:b9:de:31:29:76:ae:56:00:17:c8:7e:83:eb:11:03:06: de:46:20:14:c5:ad:62:88:0c:a1:8e:92:f4:42:85:35:2f:cc: e6:e6:c7:d6:55:48:c3:20:e7:1b:c0:6e:c6:85:80:9c:77:44: 16:f3:9f:24:a5:a9:55:38:00:f6:04:d1:6d:bf:c4:bd:3f:f1: fc:cc If your CA issues CRLs that do not work with mod_ssl and have fields that differ from those in the example shown above, consult your CA administrator or software vendor to see if it can be reconfigured to issue CRLs with the same fields as above, and, if so, how. CRLs increase the security of Client Authentication Realms by enabling server administrators to block client certificates that have been revoked because they are known to have been compromised. Without CRLs, server administrators would have to wait for the client certificates to expire, or change CA certificates and issue new client certificates to all users who are still trusted to access the Realm. Waiting for the client certificates to expire would risk having untrusted users get into the Realm until expiration, while issuing and installing new client certificates to all users who are still trusted would be a great inconvenience both to server administrators and to users. 
CRLs allow server administrators to avoid this inconvenience by blocking revoked client certificates without affecting unrevoked client certificates.

The authors would like to thank Shari Miller and Simona Nass for their comments on earlier drafts of this article.

DBM User Authentication

With more than a few users, keeping user passwords in a .htpasswd file can get inefficient and slow down page accesses considerably. DBM user files let sites efficiently store many tens of thousands of users (or more) with very quick access. This feature explains what DBM is, and how to use it with Apache.

DBM User Authentication

This week, we explain how to store user authentication information in DBM files for faster access when you have thousands of users.

The feature on User Authentication shows how to restrict pages to selected people. We showed how to use the htpasswd program to create the necessary .htpasswd files, and how to create group files to provide more control over the users. We also said that .htpasswd files and group files like this are not very efficient when a large number of users are involved. This is because these are plain text files and for every request in the authenticated area Apache has to read through the file looking for the user. A much faster way to store the user information is to use files in DBM format. This article explains how to create and manage DBM format user authentication files.

DBM files are a simple and relatively standard method of storing information for quick retrieval. Each item of information stored in a DBM file consists of two parts: a key and a value. If you know the key you can access the value very quickly. The DBM file maintains an 'index' of the keys, each of which points to where the value is stored within the file, and the index is usually arranged such that values can be accessed with the minimum number of file system accesses even for very large numbers of keys.
In practice, on many systems a DBM 'file' is actually stored in two files on the disk. If, for example, a DBM file called 'users' is created, it will actually be stored in files called users.pag and users.dir. If you ever need to rename or delete a DBM from the command line, remember to change both the files, keeping the extensions (.pag and .dir) the same. Some newer versions of DBM only create one file.

Provided the key is known in advance, DBM format files are a very efficient way of accessing information associated with that key. For web user authentication, the key will be the username, and the value will store their (encrypted) password. Looking up usernames and their passwords in a DBM file will be more efficient than using a plain text file when more than a few users are involved. This will be particularly important for sites with lots of users (say, over 10,000) or where there are lots of accesses to authenticated pages.

If you want to use DBM format files with Apache, you will need to make sure it is compiled with DBM support. By default, Apache cannot use DBM files for user authentication, so the optional DBM authentication module needs to be included. Note that this is included in addition to the normal user authentication module (which uses plain text files, as explained in the previous article). It is possible to have support for multiple file formats compiled into Apache at the same time.

To add the DBM authentication module, edit your Configuration file in the Apache src directory. Remove the comment from the line which currently says

# Module dbm_auth_module mod_auth_dbm.o

To remove the comment, delete the # and space character at the left-hand end of the line. Now update the Apache configuration by running ./Configure, then re-make the executable with make.

However, before compiling you might also need to tell Apache where to find the DBM functions. On some systems this is automatic.
On others you will need to add the text -lndbm or -ldbm to the EXTRA_LIBS line in the Configuration file. (Apache 1.2 will attempt to do this automatically if needed, but you might still need to configure it manually in some cases). If you are not sure what your system requires, try leaving it blank and compiling. If at the end of the compilation you see errors about functions such as _dbm_fetch() not being found, try each of these choices in turn. (Remember to re-run ./Configure after changing Configuration).

If you still cannot get it to compile, you might have a system where the DBM library is installed in a non-standard directory, or where there is no DBM library available. You could either contact your system administrator, or download and compile your own copy of the DBM libraries (a good choice might be GDBM: read about it or download it).

For standard (htpasswd) user authentication password files, the program htpasswd is used to add new users and set their passwords. To create and manage DBM format user files another program from the Apache support directory is used. The program is called dbmmanage and is written in perl (so you will need perl on your system, and it will need to have been compiled with support for the same DBM library you compiled into Apache. If you have only just installed DBM on your system you might need to re-compile perl to build in DBM support).

This program can be used to create a new DBM file, add users and passwords to it, change passwords, or delete users. To start by creating a new DBM file and adding a user to it, run the command:

dbmmanage /usr/local/etc/httpd/usersdbm adduser martin hamster

This creates the DBM file /usr/local/etc/httpd/usersdbm (which might actually consist of /usr/local/etc/httpd/usersdbm.dir and /usr/local/etc/httpd/usersdbm.pag), if it does not already exist. It then adds the user 'martin' with password 'hamster'.
This command can be used with other usernames and passwords to add more users, or with an existing username to change that user's password. A user can be deleted from the password file with

dbmmanage /usr/local/etc/httpd/usersdbm delete martin

You can get a list of all the users in the DBM file with

dbmmanage /usr/local/etc/httpd/usersdbm view

Now you have a DBM user authentication file with some users in it, you are ready to create an authenticated area. You can restrict a directory either using a <Directory> section in access.conf or by using a .htaccess file. The feature on user authentication explained how you can set up a basic .htaccess file, using this example:

AuthName "restricted stuff"
AuthType Basic
AuthUserFile /usr/local/etc/httpd/users
require valid-user

To use DBM files, the only change is to replace the AuthUserFile line with

AuthDBMUserFile /usr/local/etc/httpd/usersdbm

This single change tells Apache that the user file is now in a DBM format, rather than plain text. All the rest of the user authentication setup remains the same (so the authentication type is still Basic, and the syntax of require is the same as before).

Each user can be in one or more "groups", and you can restrict access to people just in a specified group. This makes it possible to manage all your users on your site in a single database, and customise the areas that each can access. The use of DBM files for storing group information is particularly efficient because you can use the same file to store both password and group information.

The dbmmanage command can be used to set group information for users. For example, to add the user "martin" to the group "staff", you would use

dbmmanage /usr/local/etc/httpd/users adduser martin hamster staff

You can put a user into multiple groups by listing them, separated by commas.
For example,

    dbmmanage /usr/local/etc/httpd/users adduser martin hamster staff,admin

Note that dbmmanage has to be told the password as well, and there is no way to set or change group information for a user without knowing their password. This means in practice that dbmmanage is not suitable for managing users in groups, and you will have to write your own management scripts. Some help writing perl to manage DBM files is given later in this article.

After creating a user and group file containing details of which users are in which groups, you can restrict access by these groups. For example, to restrict access to an area to only people in the group staff, you could use:

    AuthName "restricted stuff"
    AuthType Basic
    AuthDBMUserFile /usr/local/etc/httpd/users
    AuthDBMGroupFile /usr/local/etc/httpd/users
    require group staff

The supplied dbmmanage script to manage DBM files is adequate for basic editing, but cannot handle advanced use, such as managing group information. It is also command line driven, while a Web interface might be a better choice in many situations. To do either of these things you will have to write programs to manage DBM files yourself. Using perl this is not too difficult.

As a simple example, say you have an existing .htpasswd file and you want to convert it to a DBM file, putting all the users in a specific group. We will introduce the concepts here, and there is a link below to the completed program for you to download. It is written in Perl, which is quick to write and easy to customise, although the principles of DBM use are the same whatever language is used.

The basic way to look in a DBM file is given here. DBM files are opened in Perl as hashes (associative arrays). The "key" is the user name, and the value is the encrypted password and optionally group information.
A simple script to look up all the keys and values in a DBM file is:

    # Open the DBM file as the hash %DBM and print every entry
    dbmopen(%DBM, "/usr/local/etc/httpd/usersdbm", 0644) || die "Cannot open file: $!\n";
    while (($key, $value) = each %DBM) {
        print "key=$key, value=$value\n";
    }
    dbmclose(%DBM);

Note that if the given DBM file does not exist, it will be created. This script will work with both Perl 4 and Perl 5 (although Perl 5 users might prefer to use the new tie facility instead of dbmopen). To look up a known key you would use:

    $key = "martin";
    dbmopen(%DBM, "/usr/local/etc/httpd/usersdbm", 0644) || die "Cannot open file: $!\n";
    $value = $DBM{$key};
    if (!defined($value)) {
        print "$key not stored\n";
    } else {
        print "key=$key, value=$value\n";
    }
    dbmclose(%DBM);

Now we can write a script to convert a htpasswd file into a DBM database, optionally putting each user into one or more groups. The script is htpasswd2dbm.pl, and is used like this:

    cd /usr/local/etc/httpd
    htpasswd2dbm.pl -htpasswd users usersdbm

The -htpasswd option specifies the htpasswd file to be read, and the final argument is the DBM file to create (or add to). To set a group, use the -group argument. For example, to put all the users from this file into the groups admin and staff, use

    htpasswd2dbm.pl -htpasswd users -group admin,staff usersdbm

The program will add users to an existing DBM database, so it can be used to merge multiple htpasswd files. If you give users from different files different groups, you will be able to set up access restrictions on a group-by-group basis, and manage all your users in one database. Note that if there is already a user with the same username in the DBM file, it will be overwritten by the new information.

Group information is stored in a DBM file as part of the value. If no group information is stored, the value associated with a username just consists of the encrypted password. To store group information, the encrypted password is followed by a colon, then a list of groups that the user is in, each separated by a comma.
So a typical value might look like this:

    E7yT67YGht65:admin,staff

A program written in perl can easily extract the group information, for example:

    $value = $DBM{$key};
    ($enc, $groupfield) = split(/:/, $value);
    @groups = split(/,/, $groupfield);

It is also possible to store additional information in the DBM file, by following the groups list with a colon. Apache will ignore any data after a colon following the groups list, so it could be used, for example, to store the real name and contact details for the user, and an expiry date. This could be stored in the DBM like this:

    $DBM{$key} = join(":", $enc, join(",", @groups), $realname, $company, $emailaddr, $expdate);

Keeping all the user information together in a database like this, which Apache can also use for user authentication, can make administering a site with many users simpler.

Dynamic Page Languages

From SSI to CGI via PHP and perl: which language should you use for your dynamic pages?

Feature: Dynamic Page Languages

When choosing how to generate dynamic pages there are several things to consider:

- Performance: dynamic pages require more work on the server, so are less efficient than static files, but some types of dynamic pages are more resource efficient than others.
- Complexity: dynamic features can be generated from relatively simple code built into HTML pages (called "embedded"), through to self-contained programs written in C or perl, using the CGI interface.
- Security: some methods of generating dynamic pages allow you to use a programming or scripting language on your server. If the pages are poorly written, there is a risk of letting users access things on your system that they should not.

Traditionally there were three ways of getting dynamic pages on your site: use "server side includes" (SSI) inside HTML pages, use a scripting language such as Perl or PHP, or use a compiled programming language such as C or Pascal. Both scripts and compiled programs were accessed using "CGI".
But the distinctions are becoming more blurred. SSI as implemented in Apache 1.2 now has variables and conditional execution, making it more like a scripting language, while the PHP scripting language can be embedded into HTML pages. There is even a module to embed perl commands into HTML pages. Also, many scripting languages can be built into Apache as Apache modules, rather than using CGI. This makes executing the scripts much more efficient, since an interpreter does not need to be started for every request.

There are two ways to get the server to run your programs: either embed a script into an HTML document, or create a standalone program which makes use of the CGI interface. Embedded scripts are easier to write but restrict you to the languages available for embedding, while CGI can be used with any language.

The traditional embedded language is "Server-Side Includes" (SSI) but other scripting languages are available which can be embedded. Embedded commands are executed by the server before it serves the page to the client (so serving HTML pages containing embedded commands is slower than serving straight HTML pages). Embedded pages can be processed either by an Apache module or a CGI program. Using a module will be much faster. Languages available for embedded use include SSI, PHP, Perl and NeoScript (of these, SSI is built into Apache by default, while the others require a new module to be compiled in).

The alternative to embedding the commands into HTML is to write self-contained programs. These usually use the CGI, or Common Gateway Interface, to work with the server. The CGI specification says how servers should talk to the script or program and how the script or program formats its reply for use by the server. CGI is not a language itself. If you know the CGI protocol you can write programs for use with a web server in any language.
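The two approaches described above can be sketched in the server configuration. This is only an illustration: the directory paths and the .shtml extension are assumptions, not values taken from this article, and the directives would normally live in srm.conf and access.conf.

```apache
# Sketch only: the paths and the .shtml extension are assumed for illustration.

# Standalone programs: everything under /cgi-bin/ is run as a CGI program.
ScriptAlias /cgi-bin/ /usr/local/etc/httpd/cgi-bin/

# Embedded commands: files ending in .shtml are parsed for
# server-side includes before being sent to the client.
AddType text/html .shtml
AddHandler server-parsed .shtml

# SSI processing must also be permitted for the directory concerned.
<Directory /usr/local/etc/httpd/htdocs>
Options Includes
</Directory>
```

Restricting parsing to one extension keeps ordinary .html files on the faster static path, so only pages that actually contain embedded commands pay the processing cost.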
If you want better performance from your pages (by performance we mean low use of resources, resulting in more pages served more quickly), you should use either a pre-compiled language (such as C) and CGI, or a scripting language which is available as an Apache module. In the case of the perl and python modules, preload scripts or data that will be used often.

If you are thinking of using CGI, you might consider using FastCGI instead. FastCGI is an alternative method of running programs from a server which has several new features and is more efficient than normal CGI. If your CGI is in perl, think about using mod_perl to pre-load the perl scripts (and, where possible, to open database and similar connections when Apache starts and re-use them across multiple requests).

Of course the best performance can be obtained by using static pages instead of dynamic ones. You might consider pre-generating HTML files, rather than serving up dynamic pages, if possible. For example, if your readers access pages from a database, it might be faster to export those pages into HTML every so often, rather than look up the records in the database for every request.

Alternatively (or in addition) consider using a local cache in front of your Apache server. The client would connect to the cache first, and if that page has recently been requested, the cache would return it without calling the server. This sort of local cache is also called a "server accelerator". Your dynamic pages will have to be set up to allow them to be cached though (SSI pages, for example, are usually not cacheable).

Security is a very important consideration when thinking about dynamic pages. All CGI programs, both scripted and compiled, are potentially insecure. You have to be very careful when writing CGI programs, for instance, to ensure that Internet users cannot execute programs on your server or read files they should not have access to.
Another security issue which might be important is related to other local users. For example, you might want to let your customers or colleagues use a dynamic language. But if you let them write CGI programs they could write a program which accesses other people's files (since by default all CGI programs run as the same user). More limited scripted languages (such as SSI) might be safer in this situation.

Finally, here is a reference list of ways of including dynamic pages on your site.

Language | Embedded? | Apache Module? | Description
SSI | Yes | Yes | Traditional "Server Side Includes" allow simple dynamic pages. Apache 1.2 extends SSI to include variables and conditional code. Already part of Apache. Because of the restricted range of commands this can be more secure than other languages, and Apache has the ability to turn off some less secure features.
PHP | Yes | Yes | A more comprehensive embedded language than SSI, with built-in support for many databases (such as mSQL, mySQL and DBM) and page counters.
NeoScript | Yes | Yes | An embedded scripting language based on Tcl.
Meta-HTML | Yes | No | An extended version of SSI.
Python | No | Yes | Python is an interpreted object-orientated language. This module builds the Python interpreter into Apache for better performance than normal CGI.
embedded Perl (ePerl) | Yes | No | Perl is a powerful general purpose interpreted (scripting) language. This module lets you embed arbitrary Perl commands into your HTML.
mod_perl | Yes | Yes | Perl is an advanced interpreted language. This very powerful module integrates Perl into Apache, letting you pre-load Perl scripts, re-use resources across multiple requests, and even write whole Apache modules in Perl. This gives you much more access to and control over the server than CGI programs in Perl (which this module also supports). The ability to write modules in perl makes it possible to extend the server's functionality relatively easily, without the complexity of writing a module in C.
Compiled languages (C, Pascal, Fortran, etc) | No | No | Facilities available depend on language. Usually more efficient than scripted or embedded languages. Has to be written to use CGI protocol, or an equivalent such as FastCGI.
Scripting languages (Perl, Python, shell, etc) | No* | For some languages | Facilities available depend on language. Unless an Apache module is available, has to be written to use CGI protocol. When using CGI, it is less efficient than compiled languages or scripting languages using an Apache module.
Java | No | Yes | Use mod_jserv to call Java "servlets" from Apache.

(Note: * Perl can be embedded if eperl or mod_perl is used.)

It is impossible to recommend the "best" dynamic page language since what is best will depend on your needs. However some general conclusions can be drawn.

- If you do not already know a scripting or programming language, use one of the embedded languages. SSI is probably the simplest, but PHP has some useful extra features.
- If you want a language that is quick to develop in and efficient, use an embedded language such as PHP or embedded perl, or use perl with mod_perl. If you prefer other scripting languages, use one with an Apache module (e.g. python).
- If you already use perl CGI programs, consider moving over to using mod_perl, which will give you much better performance and more control over the server.
- If you want a "full" programming language for arbitrary programs, either use any compiled language (e.g. C) or use perl with the perl module. If you've been put off Perl because of concerns about performance, think again. The module makes it very efficient, and the ease of development and large range of add-on perl modules (packages) make developing applications more convenient.
- To make external CGI programs more efficient, use FastCGI instead of CGI, or write in Perl and use mod_perl.
- For Java programs, use mod_jserv.

The final way to make a top-performance dynamic page is to write an Apache module.
This is complex and requires care to ensure that you do not "leak" resources or affect the rest of the server, but will give the best performance. Modules have to be written in C (although it might be possible to link in other languages). An alternative to writing modules in C is to use mod_perl, which lets you develop Apache modules in perl.

Apache 1.2 Guide

A guide to everything new and changed in Apache 1.2

Major Features

The biggest single change in Apache 1.2 is the support for HTTP/1.1. However there are also major changes to simplify configuration, provide better help, speed up network transfers, log requests to multiple files, switch UID for running CGIs, use regular expressions in various places, make debugging CGI easier, and more.

Apache 1.2 is fully compliant with the new HTTP/1.1 standard (except for the proxy module). Some of the power of HTTP/1.1 support will not be apparent until browsers are available which implement it. The major changes are:

- All possible status values are now defined
- Byte ranges fully implemented for receiving
- Content negotiation by content type, language, charset and encoding
- Content negotiation can return 406 status with a list of possible variants, if none are suitable for the browser's preferences
- Much better cache control with Cache-Control and Vary headers, and use of entity tags (etags)
- New preconditions with If-Match, If-None-Match, If-Range and If-Unmodified-Since request headers
- New request methods OPTIONS and TRACE join the existing GET, PUT, POST etc
- Persistent connections implemented, and internally copes with some known buggy browsers
- Resources can be in multiple languages
- Sends an 'etag' with the response where possible (i.e. if sending a file), which can be used for more efficient caching
- Support for reading and sending 'chunked' encoding
- The default handler can send byte ranges and multipart documents

Configuration process simplified

Configuring Apache is now much easier.
The Configure script automatically identifies the operating system and compiler to use. These can still be set in Configuration if required. Many more operating systems are now supported. A Makefile is created in the support directory.

Better Help, Documentation and Bug Tracking

Various updates provide help: the new -h option lists all the available directives, while -l lists available (compiled) modules. The descriptions of the directives have been updated and expanded. A -v option gives the version number of Apache. The full Apache documentation comes in the distribution, while a new FAQ and comprehensive bug tracking database are available on www.apache.org.

Network Improvements

Persistent connections are now faster, and are used in more cases. Network traffic has been reduced. Persistent connections are not used if the browser appears to be one that has a bug in its implementation.

Graceful Restarts to Avoid Dropping Connections

Apache can be told to re-read configuration files and re-open log files, without dropping connections in progress, as currently happens with a -HUP restart.

Better Logging

The configurable log module is now the default. It can now log each request to multiple log files, each in a different format. There are several extra items which can be logged: filename (%f), notes from other modules (%n), port of request (%p), PID of child handling request (%P), formatted time (%t), time to service request in seconds (%T), URL path requested (%U) and name of server or vhost (%v).

More Control over Files

It is now possible to apply directives to individual files with <Files>, which can appear in access.conf or .htaccess files. Multiple files can be selected using regular expressions (which can also now be used in <Directory> and <Location>).

Running CGIs as Other Users

A helper program (suexec) can be configured to run CGI scripts as other users.
If the CGI is in a public_html directory, it can run as the user whose directory it is in, or a user can be set for each virtual host. Various security checks are performed before running CGI as another user.

More NCSA-Compatibility

Some directives have been updated to be more compatible with the NCSA HTTPd. The Satisfy, RedirectTemp and RedirectPermanent directives are now implemented. AuthUserFile and AuthGroupFile can now take an argument to specify dbm format files. KeepAlive and MaxKeepAliveRequests are NCSA compatible.

Easier CGI Debugging

It is now possible to log the input and output of a CGI script when an error occurs. This will make debugging CGI programs much easier.

More Includes Directives

Server-Side-Includes (SSI) have a number of important new features. Variables can now be set and tested, and regular expressions can be used. Code can be conditional, using if...endif directives.

Content Negotiation Enhanced

Content negotiation has been updated to meet the HTTP/1.1 specification. In addition some special cases are catered for to cope with browsers which currently send incomplete negotiation information.

Better Control over Options

Options can now be set or removed on an individual basis, rather than having to set all the options at once.

More Configurable Authentication

It is now possible to restrict pages by username and password, but to let users from particular domains access the pages without giving a password. This is implemented with the Satisfy directive. Restrictions can be applied to individual files with <Files>, and to files which match a regular expression.

SetUID Execution of CGI Programs

CGI programs can be executed as other users (on a per-virtual-host or per-user-directory basis) if the optional suEXEC code is compiled.

Conditional Modules and Directives

Part of the configuration files can be made conditional, depending on what modules are currently loaded.
The <IfModule>...</IfModule> section surrounds directives which are only processed if a particular module is loaded (or not loaded, if the test is negated). Compiled-in modules can be activated or disabled with ClearModuleList and AddModule.

Preventing Too Many Resources Being Used

New directives can set the total amount of resources that can be used by child processes (such as CGI scripts). This can be used to prevent run-away scripts from taking over the system. The resources which can be limited are: CPU usage, virtual memory usage and number of (sub-) processes. This feature is available on operating systems which implement these restrictions.

Virtual Host can Handle Multiple Addresses and be a Default

Each virtual host can now be configured to handle requests on multiple addresses, by listing the addresses in the <VirtualHost ...> directive. Also a virtual host can be defined to accept requests not handled by any other host (instead of leaving them to the main server configuration).

Can Return HTTP Redirect Permanent, Gone or See Other Status

The Redirect directive has been enhanced to allow for additional response codes. In previous versions, Redirect always returned a "temporary redirect" code. In 1.2, the redirect code can also be "permanent redirect" or "see other", or a resource can be marked as "gone" (permanently removed).

Better and More Robust Performance

The code has been cleaned for easier maintenance and to fix various bugs. Error conditions are dealt with better, including network problems, timeouts and signals. It is better commented. Various performance optimisations have been applied to enhance speeds. Network traffic has been reduced where possible by sending larger blocks of data. Persistent connections are used if possible, even after error statuses.

Major Changes to the Proxy Module

The proxy module has been extensively updated for this release. It is not yet compliant with HTTP/1.1.
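Several of the configuration features described in the sections above (conditional sections, resource limits, multi-address and default virtual hosts, redirect statuses and Satisfy) can be sketched together in a single fragment. This is a hedged illustration only: the module tested, the addresses, hostnames, paths and limit values are all placeholders, and the _default_ address form is assumed to be how the catch-all virtual host is written.

```apache
# Sketch only: module name, addresses, hostnames, paths and limits are
# placeholders chosen for illustration.

# Processed only if mod_status is compiled in and active.
<IfModule mod_status.c>
<Location /server-status>
SetHandler server-status
</Location>
</IfModule>

# Limit child processes such as CGI scripts: soft limit, then maximum.
RLimitCPU 30 60
RLimitNPROC 20 30

# One virtual host answering on two addresses.
<VirtualHost 192.0.2.10 192.0.2.11>
ServerName www.example.com
DocumentRoot /usr/local/etc/httpd/htdocs/example
</VirtualHost>

# Catch-all for addresses not claimed by any other virtual host
# (assumes the _default_ address form).
<VirtualHost _default_>
DocumentRoot /usr/local/etc/httpd/htdocs/default
</VirtualHost>

# Redirects with an explicit status; "gone" takes no target URL.
Redirect permanent /old http://www.example.com/new
Redirect gone /retired

# Password required, except for hosts in a trusted domain.
<Directory /usr/local/etc/httpd/htdocs/private>
AuthName "restricted stuff"
AuthType Basic
AuthUserFile /usr/local/etc/httpd/users
require valid-user
order deny,allow
deny from all
allow from .example.com
Satisfy any
</Directory>
```

With Satisfy any, a request is accepted if either the host restriction or the user restriction is met, which is what lets visitors from the trusted domain in without a password.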
There are many more smaller changes, some of which are listed here:

- BADMMAP compilation directive removed.
- Checks to see if Apache is linked to modules compiled with a previous version of the module API.
- Checks argument to Port directive is a number and not 0.
- Cookies used by the usertrack module are not sent by default, unless enabled by CookieTracking. The initial cookie request is now logged. The CookieLog directive is deprecated.
- Does not flush output after headers (which was a 'hack' to get around a bug in keep-alives in some versions of Netscape. Apache now does not use keep-alives if this version is being used).
- The maximum value of MaxClients has been increased from 150 to 256. Attempts to set a value higher than this will display a warning message.
- Compilation rule to tell IRIX that NIS is running (Rule IRIXNIS=yes).
- Some systems failed to notice when the child Apache processes died, leading to scoreboard entries for dead processes. An explicit check for dead processes is now performed every 60 seconds, and the scoreboard updated if necessary.
- CGI programs can get the port on the remote system in the environment variable REMOTE_PORT and the original URI in REQUEST_URI.
- Error code number not shown in <h1>..</h1> on error page.
- As defined in HTTP/1.1, an empty Accept-Encoding: request header means that no encoding is acceptable (previously it meant any encoding was acceptable).
- Status screen output has been tidied up, and now also lists the server host name servicing the request (the virtual host or main server).
- Responses can be marked as HTTP/1.0 rather than HTTP/1.1 if the force-response-1.0 environment variable is set.
- Access can be denied based on which environment variables are set.
- Return 404 status on POST to bad URL (previously used 405).
- Linux now defaults to shared-memory scoreboard (not available on 1.2 kernels, or Alpha hardware).
- Better error_log messages, including Unix system call error status.
- Modules can be placed in separate directories.
- If a virtual host cannot be configured (hostname cannot be resolved) then Apache continues to start up but disables this virtual host.
- Can now work around bugs in MSIE and Netscape Navigator when serving PDF files, and a bug in Navigator which causes broken images.
- Modules re-ordered to allow rewrite and alias modules to process requests before they are handled by the proxy module (if enabled).
- Preserve query_string information during a redirect.
- If the client connects but does not send a request, log a 408 ("Timed Out") error instead of an OK response (200).

Major Modules Changes

The following modules have been added to this version of Apache. Of these, only mod_browser is compiled in by default. The other modules here are optional, and to use them you need to uncomment the appropriate line in Configuration and re-compile Apache.

API Example Module (mod_example)

This example module can be used to see how Apache processes requests. It is not compiled in by default and should not be used in a "production" server.

FastCGI (mod_fastcgi)

This module implements the FastCGI method of invoking sub-processes, which is faster and more configurable than CGI.
It is available from the FastCGI site and is not part of the Apache distribution.

Set Response Expiry Times (mod_expires)

This module can be used to set 'expiry' times on responses. This can be used to tell caches about the expected life-time of resources, to make caching more efficient or to prevent users seeing out-of-date information.

Set or Remove HTTP Headers (mod_headers)

This module allows individual HTTP headers to be set or removed.

Set Environment Variables based on Browser (mod_browser)

This module can be used to set environment variables based on the 'user agent' that created the request. This could be used to set environment variables based on the capability of the browser.

Rewrite Requested URL (mod_rewrite)

This module provides a generic way of re-writing the incoming request URL based on various aspects of the request.

Cookies module renamed Usertrack (mod_usertrack)

The cookies module (mod_cookies) has been renamed usertrack (mod_usertrack) to prevent confusion over what it does. As in previous releases, this module is not compiled in by default.

Config log module replaces common log (mod_log_config)

The common log module (mod_log_common) has been replaced by the configurable log module (mod_log_config) as the default log module. This module has been enhanced to allow multiple log files, so it can also replace most of the functionality of the mod_log_referer and mod_log_agent modules (although it is not a complete replacement for these modules).

Directive Changes

This section lists the directives which are new in this release, or which have changed their behaviour or syntax. Note that only the modules compiled in by default are covered here, and the directives provided by the new modules are not listed (see the documentation for the module concerned for its directives).

- <Files>... </Files> section applies directives to individual files, or files that match a wildcard or regular expression.
- <IfModule>...</IfModule> makes directives conditional depending on which modules are compiled in
- CustomLog adds a transfer log with a custom format
- MaxKeepAliveRequests sets the number of requests per connection instead of KeepAlive
- RLimitCPU, RLimitMEM and RLimitNProc limit resource usage of sub-processes
- Redirect can take an optional first argument giving the status value to return (one of temp, permanent, seeother, gone or a numeric status). RedirectTemp and RedirectPermanent added for NCSA-compatibility (but Redirect status should be used instead).
- ScriptLog sets a logfile for CGI debug output
- ScriptLogBuffer sets a maximum size for PUT or POST data logged to a ScriptLog file
- ScriptLogLength sets an overall maximum size for a ScriptLog logfile
- SendBufferSize sets the size of the TCP send buffer
- <Location> now only matches full URL segments (<Location /i> does not match URL /info, for example)
- <Location> and <Directory> can match the URL or path (respectively) against a regular expression
- <VirtualHost> can take multiple addresses
- Anonymous_Authorative has been renamed to Anonymous_Authoritative
- AuthDigestFile can take an optional second argument of "standard" (for NCSA compatibility)
- AuthUserFile and AuthGroupFile are now NCSA compatible, with an optional second argument which can be either dbm or standard (dbm is only valid if the optional mod_auth_dbm module is compiled in)
- Auth_MSQL_Authorative has been renamed to Auth_MSQL_Authoritative
- deny has been updated to allow an argument of user-agents followed by a list of user-agents to deny access
- IdentityCheck timeout is now 30 seconds rather than 60
- KeepAlive now takes an "On" or "Off" argument, rather than a number (if a number is used, 0 means Off while any other number means On). If switched on, the default requests per connection is 100. See also MaxKeepAliveRequests.
- Options can set or remove individual options, instead of replacing all the options currently in force
- Timeout defaults to 300 seconds instead of 1200
- TransferLog can now be used more than once in each main server or virtual server
- User and Group can be set inside virtual host sections, and are used when running sub-processes (e.g. CGI) if the server is configured for setuid execution
- In all directives, a backslash character (\) now only escapes quotes or / chars (e.g. XXX "123\"456" gives argument 123"456). Previously \ could escape any character.

Configuration and Support Program Changes

The conf directory contains examples of the four configuration files needed: httpd.conf, srm.conf, access.conf and mime.types. Each of these files has been updated slightly.

httpd.conf: An example BrowserMatch directive is given, which disables keep-alives for browsers which had a buggy implementation.

srm.conf: No changes (except in the sample domain names).

access.conf: An example <Location> section to log attempts to access the phf CGI program is given. phf has a security hole which is actively being exploited, and should immediately be removed. This example shows how to log people trying to access this program, possibly in an attempt to hack your site. The logging is done at the apache.org site, or you can log it locally using a supplied CGI program in the support directory.

mime.types: A type has been added for midi files, and removed for .gz and .Z files (they should be marked as an encoding type, not a media type).

In all files, all domain names have been replaced with names that can never occur on the Internet.

A new CGI program phf_abuse_log.cgi is provided which can log attempts to access the phf CGI program.

The program suexec is provided as C source. If compiled, this can be used with Apache to allow for the execution of programs as users other than the default server user. It makes extensive checks before it runs the CGI as another user to prevent security problems.
Other than these two new programs, there are no functionality changes to the programs in the support directory. The C programs have been updated to prevent compiler warnings on some systems, and the perl dbmmanage now creates passwords with a random 'salt'.

Apache 1.3.6 Guide

A guide to everything new and changed in Apache 1.3.4

New in 1.3.6

This is a guide to all the changes between Apache 1.2 and Apache 1.3.6. For each change, we say which version it was introduced in, so you can also use this feature to upgrade between 1.3.* versions. First published 25th September 1998. Last updated 26th March 1999.

Apache 1.3.6 was released on 25th March 1999 and is now the latest version of the Apache server. The previous release was 1.3.4 (version 1.3.5 was never made publicly available). Apache 1.3.6 is available in source form for compiling on Unix or Windows, in pre-compiled form for many common versions of Unix, and in pre-compiled form as a single-file installer on Windows. All the pre-compiled forms also include full source code. All are available for download from any Apache local download site.

This is a bug fix and minor upgrade release, with a few new features. Users on Unix systems should upgrade to fix various bugs. Users on Windows systems should consider whether to upgrade, because htpasswd files that worked with 1.3.4 and earlier will not work with 1.3.6 unless updated.

The main new features in 1.3.6 (compared to 1.3.4) are:

- Logging can be conditional based on whether an environment variable is set or not (see the CustomLog directive).
- mod_rewrite has much faster DBM and TXT maps through the use of an internal cache.
- Passwords in htpasswd files can be encrypted with MD5 instead of DES. On Windows this allows encrypted passwords for the first time, using the new bin/htpasswd.exe program.
- Access restrictions can be applied to all methods (known and unknown) apart from specific named ones, with the new <LimitExcept...> section.
On Windows, additional Start menu items have been added and the bug where the conf files were not being created has been fixed. On Windows, it is now possible to tell Apache to use the registry to find how to execute CGI scripts based on the file extension, with the new ScriptInterpreterSource directive. New in 1.3.4 There are several new features in 1.3.4 compared to 1.3.3: A default language for documents can be set with the DefaultLanguage directive. Mappings from file extension to handler can be removed with RemoveHandler. The negotiation module has been extensively updated to support the latest version of the HTTP/1.1 specification, to fix various bugs and inefficiencies, and to add some support for the transparent content negotiation RFCs. All the new HTTP/1.1 methods required for WEBDAV (distributed authoring) have been added, so that they can be used by third-party modules to implement the DAV specifications. A default order for fancy directory indexes can be set with IndexDefaultOrder. New options have been added to ./configure: --target sets the executable name, --permute-module sets relative module order, --with-layout sets the directory layout and --shadow has been extended to specify the shadow directory name. There have been a number of important security fixes to Apache on Windows. The most important is that there is much better protection against people trying to access special DOS device names (such as "nul"). In addition, there is better processing of UNC paths, and Makefiles are now provided to allow Apache to be compiled on Windows 95. Apache 1.3.3 and earlier came with three configuration files in the conf directory: httpd.conf, access.conf and srm.conf. This was for purely historic reasons: any directive can appear in any file, and the configuration files can have any filename (although the configuration file defaults to conf/httpd.conf unless overridden with the -f command line option). 
Many people configure Apache using a single file, normally httpd.conf. This can be created by appending the contents of access.conf and srm.conf to httpd.conf, then removing access.conf and srm.conf. Apache 1.3.4 comes with this already done (although the access.conf and srm.conf files will exist containing a comment about why they are now empty). New in 1.3.4 compared to 1.2 There are many new features in Apache 1.3.4 when compared to Apache 1.2. The major features are: Support for Windows NT systems Apache now compiles and runs on Windows NT. It will also work, with slightly less functionality, on Windows 95. The current 1.3.4 release is not as well developed as the Unix version, and will be slower and may include some security problems (although it is much better than earlier 1.3 releases). For now it should be regarded as a "beta" quality release on Windows. See the separate section below on Apache for Windows. Better configuration and building process The Apache source files have been re-organised. Modules have been moved into sub-directories, making it easier to add additional modules. OS specific code has been moved into separate directories. A new command-line way of configuring and installing Apache has been added. The source file re-organisation has made it easier to add third-party modules. They can be dropped into a directory and, with the appropriate configuration command at build time, Apache will create the Makefile for the module and build it. Larger modules can have their own directory, and can integrate more easily into the build process. If modules require additional libraries or command line arguments, they can add the required options themselves during the build process, without the user having to edit the Configuration file. The new way of configuring and building Apache is referred to in the source tree as "APACI". This provides a command-line method of configuring Apache rather than editing the "src/Configuration" file. 
This method also builds a Makefile which can be used to install Apache after it has been built. APACI consists of a new configuration program, called "configure", which should be given details of all the build options such as destination directory, modules to be built and included, compiler to be used, and so on. This is the information previously placed into the "src/Configuration" file. "configure" will use a different directory structure during installation than the normal Apache layout, unless the --compat option is used. Support for dynamic modules Apache now supports loading of additional modules without having to recompile the source. This is referred to as "DSO" or "Dynamic Shared Objects" on Unix, and "DLL" on Windows. This means that a small Apache executable can be created, and other modules added as required. It also lets module developers release or sell modules in binary only form, ready to be loaded into a running Apache. With graceful restarts it is even possible to add or remove modules while Apache is running without any downtime. DSO and DLL functionality is provided by the new module mod_so. Modules can be built ready for dynamic loading with new directives in the src/Configuration file, or using APACI's "configure" script. Using the latter can also automatically build a correct configuration file for loading the dynamic modules. A program is also provided to build modules for dynamic loading without using the Apache source tree. Dynamic modules are supported on these operating systems: Windows, FreeBSD, OpenBSD, NetBSD, Linux, Solaris, SunOS, Digital UNIX, IRIX, HP/UX, UnixWare, AIX, ReliantUnix and generic SVR4 platforms. Better performance There have been considerable internal changes to make Apache perform better than 1.2. 
Some of the more important changes are: the code which merges per-directory configurations (<Directory> sections) is more efficient, IP virtual hosts are looked-up in a hash table, fewer system calls are used when serving static pages, faster adaptation to load spikes, less copying of data when assembling responses for sending to the client, and so on. Better security Public web servers are always open to the risk that someone will try to attack the server. Apache is carefully written to try to eliminate as far as possible the damage that this can cause. The most serious type of attack is where the attacker can gain some kind of unauthorised access to the server system. There are no known ways of doing this with recent versions of Apache. So attackers may decide to use a "denial of service" attack. This is where they know that they cannot get into the system, so instead they try to overload the server to prevent it being used by anyone else. Obviously there is little that can be done when someone decides to attempt to overload the server by sending more and more requests, because those requests are usually indistinguishable from real requests. The load on the server in this case will increase in direct relationship with the speed of the attack. However in Apache 1.2 there were some ways in which the attacker could make the load on the server increase much more rapidly than the speed of the attack. These have been eliminated in 1.3. To help server administrators limit the amount of resources used by attackers, there is now also a series of new directives which can be used to specify limits on the size of each request. The size of the request line, the number of request headers, the size of the request header lines, and the size of any request body can now all be limited. 
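As a rough sketch of the request-size limits just described, the fragment below uses the four directives this guide lists later as added in 1.3.2 (LimitRequestLine, LimitRequestFields, LimitRequestFieldsize and LimitRequestBody). The numeric values are arbitrary assumptions chosen only for illustration, not recommended settings; suitable limits depend on the site.

```apache
# Illustrative values only (assumptions, not recommendations).

# Maximum length of the request line (method, URL and protocol), in bytes
LimitRequestLine      4094

# Maximum number of request header fields accepted in one request
LimitRequestFields    50

# Maximum size of any single request header field, in bytes
LimitRequestFieldsize 4094

# Maximum size of the request body, in bytes (here 100 KB)
LimitRequestBody      102400
```

Requests that exceed any of these limits are refused rather than processed, which is what bounds the resources a single oversized request can consume.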
If the server administrator does not trust users on the server system (for example, if the server is a multi-user system for customers to provide web documents), there were additional potential denial of service attacks available in 1.2. These included putting extra long lines in .htaccess files or creating .htaccess files that were actually special devices. These have been eliminated in 1.3.2. Enhanced virtual host configurations Virtual host handling has been updated. For IP based virtual hosts, finding the virtual host for a given request is faster because the configurations are stored in a hash table. For name-based virtual hosts, the configuration has been made less ambiguous. It is now necessary to explicitly state which IP:port combination will be used for name-based requests, and requests coming in on this IP:port will only get served by virtual hosts defined for that IP:port. See Apache name-based virtual hosts. The order that virtual hosts are used in the configuration file has been reversed from Apache 1.2. Now the virtual hosts listed first in the configuration file have priority over those listed later. To help debug virtual host configurations, the new command line option -S displays how Apache has parsed the virtual host information in the configuration files. The features above are the major changes between 1.2 and 1.3.4. This section lists most of the remaining changes, sorted into some broad categories. As well as new features, 1.3.4 has a lot of bug fixes compared with 1.2.X. 
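The name-based virtual host change described above can be sketched as follows, using the NameVirtualHost directive that this guide lists as new in 1.3.0. The IP address, host names and document root paths are hypothetical placeholders, not values taken from any real configuration.

```apache
# Hypothetical address: name-based requests are only handled on this IP:port
NameVirtualHost 192.0.2.10:80

# Listed first, so it has priority over the sections below it
<VirtualHost 192.0.2.10:80>
    ServerName   www.example.com
    DocumentRoot /usr/local/apache/htdocs/example-com
</VirtualHost>

# Only considered for requests arriving on the same IP:port
<VirtualHost 192.0.2.10:80>
    ServerName   www.example.org
    DocumentRoot /usr/local/apache/htdocs/example-org
</VirtualHost>
```

Starting the server with the -S command line option should then show how these sections have been parsed.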
Configuration parsing: Multiple whitespace characters within quoted strings in configuration files are no longer compressed to a single space (1.3.2) Better error checking in configurations: reports missing closing section directives, reports if directives which are not valid within <VirtualHost> sections are used in a <VirtualHost> section, reports invalid multiple arguments to <Files>, <Directory>, etc (1.3.0) <DirectoryMatch> sections are applied after all <Directory> sections have been applied (1.3.0) Include directive added to read additional configuration files (1.3.0). Extend to allow Include directive in .htaccess and <Directory> sections (1.3.2) Command line options: Add a -t command line option for testing the syntax of the configuration files (does not check .htaccess files) (1.3.1) Add ability to process configuration directives given on the command line. The option -C "directive" gives a directive to process before reading the configuration files, and -c "directive" gives a directive to process after reading the configuration files (1.3.0) New command line option -V displays the options used when compiling Apache (1.3.0) New command line option -S displays the virtual host configuration (1.3.0) The -S option now does not attempt to start the server: it will exit after showing the virtual host configuration (1.3.4) The -h, -l and -L options have changed meaning in 1.3.4. Previously -? gave a list of options, -l gave a list of directives and -h gave a list of modules compiled into the server. In 1.3.4, -h gives a list of options, -l gives a list of modules and -L gives a list of directives (1.3.4) Child processes, CGI and SSI: Does not pass invalid environment variable names to child (CGI) processes. 
Any invalid character in a variable name is replaced with an underscore (1.3.0) REMOTE_HOST environment variable is not set if IP cannot be resolved to a hostname (1.3.0) Add SERVER_SIGNATURE environment variable containing the signature as controlled by ServerSignature directive (1.3.3) Add VARIANTS environment variable from the spelling module containing list of possible matching URLs (1.3.3) Logging and error messages: The default text of a 404 error message changed from "File Not Found" to "Not Found" (1.3.0) In log formats, %a logs the client IP address and %h now logs only the hostname (never an IP address). If no hostname is available for a given IP address, %h logs "-" (1.3.0) In log formats, %v and %p log the server name and port from the configuration files, not the request (1.3.4) In log formats, %V logs the hostname of the request, subject to the setting of UseCanonicalName. This is the same behaviour as %v in 1.3.3 and earlier (1.3.6) Does not log an error about "handler not found" if a handler was found, but declined to serve the request (1.3.1) The Apache parent process will log the reason why a child process dies, if it dies from an unexpected signal (1.3.0) Logs client IP addresses in error_log messages (this was in 1.2, but not in 1.3.0 or 1.3.1. It is restored in 1.3.2) Fix problem where mod_usertrack could corrupt the client hostname in the log files (1.3.1) The reason for "500 Server Error" responses is passed to error documents in the ERROR_NOTES environment variable (1.3.2) Logging can be conditional depending on whether an environment variable is set or not (1.3.6). 
Proxy: More accurate error responses can be returned from the proxy (1.3.6) The proxy module now handles invalid responses from IIS (1.3.2) Proxy module now prompts for FTP username and password, if required, to avoid storing that information in URLs and the access_log (1.3.2) The proxy module now accepted reject requests with URL syntax http://host:/path (1.3.4) Performance: More efficient <Directory> and <DirectoryMatch> section matching (1.3.0) More efficient virtual host matching. Address * behaves like _default_ (1.3.0) More efficient use of network: combines smaller network writes (1.3.0) Faster response to load spikes, by first spawning one new child, then the next second two, then four and so on up to 32 children per second, until there are enough idle servers (1.3.0) Efficient unbuffered CGI. As soon as the CGI stops sending output, whatever it has sent so far is passed on to the client. This replaces the old scheme where output was buffered up to a fixed size, or until the CGI process exited. This also replaces the old "nph-" prefix for getting unbuffered CGI output (which was not compatible with HTTP/1.1 or SSL layers anyway) (1.3.0) Security fixes: Directives to limit size of requests, to avoid denial of service attacks based on sending extra big requests. Eliminate unnecessary processing when handling requests (1.3.2) Avoid denial of service attacks if a configuration file (such as a .htaccess file) is a device file, by refusing to open device files apart from /dev/null which is still valid (1.3.0) Correctly handle over-long lines in configuration files (1.3.0) Fix denial of service attack by sending requests with lots of slashes in them (1.3.0) Deny access to directories if a .htaccess file in that directory cannot be read (1.3.0) Better name-based virtual host support, using new directive NameVirtualHost. This gives the IP:port of interfaces which are used for name-based virtual hosts. 
Requests on this port can only match <VirtualHost> sections defined on that IP:port combination. Also reverse order of matching of <VirtualHost> sections so earlier sections override later ones (1.3.0) Detach from stdin, stdout and stderr after reading configuration files, so Apache can be started via rsh, etc (1.3.0) Directory indexes now dynamically size the width of the filename column (1.3.2). Columns can be sorted (1.3.0) Do not kill connections in progress when a TERM (shutdown) signal is received (1.3.0) Experimental support for passing symbols required by the Apache core through dynamic modules onto libraries loaded by those modules (Rule SHARED_CHAIN). (1.3.2) Expires headers will now be returned for content which is served from sources other than files, if configured with mod_expires (1.3.2) Header files can be included into C++ code (1.3.0) mod_negotiation has been overhauled to bring it up to the latest HTTP/1.1 revision 6 specification and to support some of the transparent content negotiation drafts (1.3.4) mod_negotiation also works around a bug in Lynx where it sends a header saying it understands transparent content negotiation, but it does not (1.3.6) mod_rewrite now correctly sets the HTTP/1.1 Vary: response header if decisions are made based on request headers (1.3.2) mod_rewrite has much faster DBM and TXT maps through the use of caching. (1.3.6) mod_status is now included by default. 
The new directive ExtendedStatus can be used to turn this module on (1.3.2) New script apachectl to start, stop, restart and check the status of Apache (1.3.0) SIGPIPE is no longer reserved for use by the Apache core while sending a response (1.3.6) Support for DES and MD5 encrypted passwords (1.3.6) Support has been added for the HTTP methods defined in the distributed authoring drafts (WEBDAV) (1.3.4) Support has been added for the new Expect: request header, as introduced in HTTP/1.1 revision 5 (1.3.4) The configuration directives are now all given in httpd.conf, and the default access.conf and srm.conf are empty (1.3.4) The PID file is removed when Apache exits (1.3.2) The meta information module (mod_cern_meta) can be configured on a per-directory basis (1.3.0) The status page now shows the "generation" of each child process (1.3.6) Try to avoid problems with eight-bit characters in URLs and configuration files (1.3.1) Use the supplied regex library on all systems, unless explicitly told otherwise (1.3.0) Various year 2000 compliance changes (these are minor changes, in things like log messages) (1.3.0) Major Modules Changes The following modules have been added to this version of Apache. Of these, only mod_setenvif is compiled in by default. The other modules here are optional, and to use them you need to uncomment the appropriate line in Configuration and re-compile Apache. Dynamic loading of modules (mod_so) The mod_dld module from previous releases has been removed and replaced with a much improved module, mod_so. This module supports dynamic loading of modules on most Unix systems and on Windows. This module was added in 1.3.0. Conditionally set environment variables (mod_setenvif) The mod_setenvif module can be used to set environment variables based on headers on the incoming request or other aspects of the request (for example, the client hostname). 
This replaces the mod_browser module which set environment variables based on the User-Agent request header. This module was added in 1.3.0. Fix typos in URLs (mod_speling) This module can be used to correct simple typing errors in requested URLs, based on looking at real directory and file names. This module was added in 1.3.0. Generic unique ID for every request (mod_unique_id) This module generates a unique identifier for every hit. It was added in 1.3.0. Automatically work out MIME type (mod_mime_magic) This module can be used to return a MIME type based on the contents of the file being served. This is similar to the Unix "file" command. Added in 1.3.0. Directory indexing module (mod_autoindex) This new module contains the directory indexing functionality previously provided by mod_dir. See the section on mod_dir below. API Example (mod_example) This module provides example code for module developers. mod_dld replaced by mod_so See section above about mod_so. mod_browser replaced by mod_setenvif See section above about mod_setenvif. mod_dir split into two modules (mod_dir and mod_autoindex) The mod_dir module has been split into two modules. Both are included by default in an Apache build. The new mod_autoindex module supports creating directory indexes. The updated mod_dir now just supports the basic functionality of trailing-slash redirects and DirectoryIndex files. This means that if directory indexes are not required, the large mod_autoindex module does not need to be compiled into Apache. (Updated in 1.3.0) mod_auth_msql removed This module is no longer supplied with Apache, because there are a lot of possible databases and it is not possible to include all database modules into the Apache distribution. (Removed in 1.3.0). New and Updated Ports This section contains summaries of changes for more unusual systems or systems not widely used by the main Apache developers. 
Sometimes these ports are not maintained after their initial inclusion in the Apache source tree. Changes to support the major platforms used by Apache developers (such as FreeBSD, Linux, IRIX and Solaris) are not listed here. Changed the name of the "OS/2" port from "__EMX__" to "OS2" (1.3.2) New port and binaries available for Windows NT (1.3.0) New port to Acorn RISCiX (1.3.0) New port to BeOS (1.3.0) New port to Cyberguard V2 (1.3.4) New port to DRS 6000 (1.3.3) New port to Encore UMAX V (1.3.0) New port to HP UX 11 (1.3.0) New port to Linux with glibc (e.g. RedHat 5) (1.3.0) New port to NCR MP/RSA 3.0 (1.3.0) New port to PowerMAX OS (1.3.4) New port to Rhapsody (Mac OS X) (1.3.2) New port to SCO SV (1.3.0) New port to SONY NEWS-OS (1.3.0) New port to Sequent (1.3.0) New port to Siemens Nixdorf BS2000-OSD (1.3.0) New port to UnixWare 7 (1.3.1) New port to NEC EWS4800 (1.3.2) Recognise UnixWare 7.0.1 (1.3.3) Updated support for ARM Linux (1.3.1) Updated support for LynxOS (1.3.0) Updated support for MPE (1.3.0) Updated support for NCR SVR4 (1.3.1) Updated support for NEXTSTEP (1.3.1) Updated support for QNX 32 bit systems (1.3.1) Changes for Apache on Windows Apache 1.3.0 was the first full release of Apache to support Windows systems. Some of the most important changes since the last 1.3 beta release are listed here. Add support for encrypted passwords (encrypted with the MD5 algorithm). Added bin/htpasswd to create and modify MD5 passwords (1.3.6) Errors from running Apache with -i or -u command line arguments are now displayed on the console rather than sent to the error log (1.3.6) Compile time default for the error log filename is error.log rather than error_log (1.3.6) New directive ScriptInterpreterSource which configures Apache to find a CGI file interpreter via the registry rather than via the #! 
line in the CGI file itself (1.3.6) The Apache executable now contains an icon (1.3.6) The binary installer now creates additional Start menu options for shutting down a running console application and to uninstall the NT Apache service (1.3.6) Remove limit of 64 threads per process (1.3.2) Remove trailing "."s in path components, which are ignored by Windows when accessing files so could be used to bypass security settings (1.3.1) Eliminate directory components consisting of three or more dots (e.g. "...") which can cause security problems (1.3.1) Make IndexIgnore case insensitive because the Windows filesystem is (usually) case insensitive. Set current working directory for CGI scripts (1.3.0) Pass environment variables to CGI scripts (1.3.0) Add ability to gracefully shutdown or restart Apache on Windows 95, without pressing Control-C in the Apache console window (1.3.3) Allow CGI child processes to die properly if the client aborts the connection (1.3.3) Handle paths like D:/ correctly (1.3.3) Handle drive letters in sub-requests properly (1.3.3) A running console version of Apache can be restarted or shutdown with the -k command line option (1.3.3) Makefiles have been added to allow Apache to be built on Windows 95 (1.3.4) Various problems with UNC paths have been fixed (1.3.4) Possible security and denial of service attacks by use of special DOS devices have been removed (1.3.4) Directive Changes This section lists the directives which are new in this release, or which have changed their behaviour or syntax. Note that directives provided by the new modules are not listed (see the documentation for the module concerned for its directives). When upgrading from an earlier version of Apache, check this list to see if any of the directives in your configuration have changed. <DirectoryMatch>, <LocationMatch> and <FilesMatch> can be used to match sections using regular expressions. 
These are equivalent to the <Directory ~ ...> syntax (1.3.0) <IfDefine name>...</IfDefine> sections which are only used if Apache is started with a corresponding -Dname command line option (1.3.1) <LimitExcept method method ...> is the inverse of <Limit>. The contents of LimitExcept only apply if the request method is not listed as an argument. (1.3.6) AddModuleInfo provides additional text in mod_info output (1.3.0) AliasMatch, ScriptAliasMatch and RedirectMatch provide the ability to use regular expressions (1.3.0) AllowCONNECT to allow CONNECT requests on arbitrary ports (for proxying HTTPS requests) (1.3.2) CoreDumpDirectory gives the directory to use to dump core files, after receiving signals which cause core dumps (1.3.0) DefaultLanguage sets a default language for files without a language specified by an extension (1.3.4) ExcessRequestsPerChild Used on Windows systems only ExpiresActive to turn the expires module on or off (1.3.0) ExtendedStatus to turn on or off collected status information for display by mod_status. Off by default. Replaces the previous compile-time rule "Rule=STATUS" (1.3.2) Include specifies arbitrary configuration files to be read when this directive is processed (1.3.0) IndexDefaultOrder sets a default sorting order for fancy directory indexes (1.3.4) LimitRequestBody limits the size of the request message body (1.3.2) LimitRequestFields sets a maximum number of request headers that Apache will accept (1.3.2) LimitRequestFieldsize sets a maximum size of any single request header (1.3.2) LimitRequestLine sets a maximum request-line length that Apache will accept (1.3.2) ListenBacklog can set the size of the TCP backlog (the argument to listen()) (1.3.0) LogLevel sets the detail that will be logged to the error_log file. Possible values are "emerg", "alert", "crit", "error", "warn", "notice", "info" and "debug". The default is error. 
(1.3.0) NameVirtualHost added to support better configuration of name-based virtual hosts (1.3.0) NoProxy in mod_proxy prevents proxying certain addresses (1.3.0) ProxyDomain in mod_proxy adds a domain to unqualified requests (1.3.0) ProxyPassReverse in mod_proxy lets Apache work as a "reverse proxy", i.e. a front-end to multiple servers (1.3.0) ReceiveBufferSize in mod_proxy to control size of the receive buffer (like SendBufferSize) (1.3.0) RemoveHandler in mod_mime removes a mapping between a file extension and a handler name (1.3.4) ScriptInterpreterSource (valid on Windows only) can be used to tell Apache to find CGI interpreters via the registry. If set to "script" it uses the initial #! line from the CGI file, like previous versions. If set to "registry" it uses the registry to map the file extension to the interpreter. The default is "script". (1.3.6) ServerSignature can be used to turn on a "signature" in various automatically generated responses such as error messages. The possible values are "off" which is the default, "on" which uses a signature of the server version and hostname, and "email" which adds the mail address from the ServerAdmin directive (1.3.0) ServerTokens allows the Server: response header to be configured. Possible values are "min" which returns just the Apache version number, "OS" which also returns the operating system type, and "full" which returns the identifiers from any modules which request to be added. The default is "full". (1.3.0) ThreadsPerChild Used on Windows systems only UseCanonicalName is used to determine how Apache creates URLs pointing back to itself. The default value is "on" which means that Apache will use values from the configuration (i.e. ServerName and Port settings). If set to "off", Apache will use the information supplied by the client. (1.3.0). 
The use of this directive is now controlled by the Options override, rather than AuthConfig (1.3.4) <Directory> and <Location> sections defined in a virtual host override corresponding sections defined in the main server, rather than the other way around (1.3.0) <Directory> wildcards (* and ?) now do not match the forward slash character, to be compatible with shell expansions (1.3.0) <Directory>, <Files> and <Location> can now use [...] style wildcards (1.3.0) <Limit> now matches request methods on a case-insensitive basis, as required by the HTTP/1.1 specification (1.3.1) AccessFileName can take more than one filename argument (1.3.0) AuthName argument must be enclosed in double-quotes if it contains whitespace (1.3.0) CheckSpelling is now valid in per-directory locations (.htaccess files and <Directory> sections) (1.3.2) CustomLog can now take an additional argument env=[!]env-var which makes the logging conditional on the named environment variable being set (or, if ! is used before the env-var, unset) (1.3.6) CustomLog formats can contain \t or \n to represent a tab or newline character in the log file (1.3.6) FancyIndexing now no longer unsets any options already set by IndexOptions (from 1.3.2) HostnameLookups defaults to off (1.3.0) HostnameLookups has a new possible argument, double, which ensures that Apache only uses a remote hostname if it passes a double-reverse lookup. This replaces the MAXIMUM_DNS compile time option (1.3.0) IndexOptions has new arguments: NameWidth specifies the width of the filename column in directory indexes (1.3.2). SuppressColumnSorting turns off the links for sorting the output (1.3.0). SuppressHTMLPreamble prevents Apache outputting the start of the HTML response (1.3.0). IconHeight and IconWidth set the size of the icons (1.3.0). 
Options can now be added or removed with leading + or - (like Options) (1.3.3) LocationMatch no longer matches a single slash against multiple slashes in the request URL (1.3.0) RefererIgnore is now case-insensitive (1.3.0) RewriteMap now has two additional map types: "rnd" for random replacements, and "int" to use an internal function to make a replacement (1.3.0) SetenvIf and SetenvIfNoCase can now match an empty field with ^$ (1.3.1) TransferLog: if no log file is defined, Apache will not log requests. Previous versions would always log to the default filename (access_log) (1.3.0) Userdir can disable specific users, or can selectively enable particular users (1.3.0) allow and deny can accept network/netmask and cidr formats. If hostnames are used a double-reverse lookup is always used (1.3.0) allow can be used to allow access based on environment variables, with allow from env=variable. This is useful with the new mod_setenvif directives. The old allow user-agents syntax is no longer valid. (1.3.0) require can now accept TAB characters between arguments (1.3.3) Configuration and Support Program Changes The conf directory contains examples of the four configuration files needed: httpd.conf, srm.conf, access.conf and mime.types. Each of these files has been updated slightly. In 1.3.4 all these files have been merged into the single conf/httpd.conf file. httpd.conf HostnameLookups is set to "off" to reflect the new default. LogLevel set to warn. LogFormat and CustomLog are used instead of TransferLog. ServerSignature is set to "on". srm.conf A <Files .htaccess> section prevents access to .htaccess files. access.conf Apache now defaults to a much more restrictive set of permissions, by specifying AllowOverride none and Options FollowSymLinks in a <Directory /> section. This means that .htaccess files will not be processed unless turned on by another <Directory> section, and all options (except following symbolic links) are turned off. 
This is a much more secure initial configuration. mime.types New types for javascript, mpeg 3, VRML, CSS and XML documents. All currently known MIME types (as registered with the IANA) have been added (1.3.4) New in the support directory are a web benchmark program (ab.c), a script to control the starting and stopping of the Apache server (apachectl), a perl script to compile modules for dynamic loading without using the source tree (apxs.pl), a perl script to resolve IP addresses in log files (logresolve.pl), a script to split logfiles based on virtual hosts (split-logfile), and manual pages for all these programs (1.3.0). The benchmark program has been overhauled and can now output HTML pages (1.3.6). apxs can now pass arbitrary arguments on to the compiler or linker, with -Wc and -Wl respectively (1.3.4). The httpd_monitor program has been removed since status information about Apache can be obtained via mod_status's output. (1.3.0). The manual pages for ab and apachectl have been moved to section 8. (1.3.6). The new option --permute-module allows the relative order of modules to be specified (1.3.4) The default directory layout for make install is now the same as the layout that src/Configure uses. The new --with-layout option can be used to specify a different layout, for example --with-layout=GNU would use the previous default layout for ./configure (1.3.4) The new option --target=name can be used to give the binary a different name than the default "httpd" (1.3.4) The --shadow option has been extended to take an argument which is the name of the shadow directory to create (1.3.4) Upgrade Notes Because of the various changes between 1.3.3 and 1.3.4, when upgrading you should beware of the following things: If you use ./configure to configure and compile Apache, be careful to ensure that you get the directory layout you want. If you previously used --compat, you can omit it. 
If you previously did not use --compat you must give --with-layout=GNU.
If you have scripts which run Apache and use any of the arguments -?, -h, -l or -L, then they must be updated to use the new arguments (-h, -l, -L and -R, respectively).
If you use the -S command line option to show the virtual host configuration and start the server running, you will have to do this in two steps since -S will now exit without starting the server.
If you use UseCanonicalName inside .htaccess files, you must ensure that the Options override is in force rather than the AuthConfig override.
If you used multiviews for content negotiation and relied on the fact that Apache read the variants from the disk in the directory order (rather than, say, alphabetically) you should check that the negotiation still works as expected (Apache now sorts the variants into order before using them, so that negotiation is not dependent on the usually arbitrary directory order of the files). This should not normally be a problem.

The first three items are described in more detail below.

If you configure Apache with ./configure you will have to change the options you use to set the directory layout. If you do not currently use an option to set the directory layout you will have to use an option in 1.3.4 because the default layout has changed.

There are two layouts for directories: the first is the "Apache" layout. This was used in all versions of Apache before 1.3, and in Apache 1.3 it is still used if you use src/Configure to configure and build Apache. The second layout was introduced by ./configure, and is called the "GNU" layout because it is similar to the standard layout used by GNU tools. This created two layouts within Apache 1.3.*: the Apache layout if src/Configure was used, and the GNU layout if ./configure was used (although ./configure could also be told to use the Apache layout with the --compat option).
Unfortunately this created a lot of confusion, and in particular many people thought that the GNU layout was the preferred directory layout for 1.3, because it was the default in ./configure. It is not: the preferred layout is the "Apache" layout, consistent with src/Configure and Apache 1.2. In Apache 1.3.4, the Apache layout becomes the default layout for ./configure. If you have been using the --compat option, then you do not need it anymore. However if you did not use the --compat option (that is, you used the GNU directory layout) then you must now use --with-layout=GNU.

This table summarises the meaning of the directory layout arguments in each version:

Layout option | Meaning in 1.3.3 | Meaning in 1.3.4
None | GNU layout | Apache layout
--compat | Apache layout | Apache layout (but not needed since this is the default)
--with-layout=GNU | Not valid | GNU layout
--with-layout=Apache | Not valid | Apache layout (but not needed since this is the default)

Various command line arguments have changed in meaning. This affects the -h, -l and -L options. This table shows the meanings of these arguments in both versions of Apache.

Option | Meaning in 1.3.3 | Meaning in 1.3.4
-? | List command line options | List command line options (but use -h instead)
-h | List modules | List command line options
-l | List all directives | List modules
-L | Specify location of the core loadable module if built with SHARED_CORE | List all directives
-R | Not used | Specify location of the core loadable module if built with SHARED_CORE

So if you were using -?, change to using -h. Similarly, change from -h to -l, from -l to -L and from -L to -R. Also, the -S option now exits after showing the virtual host configuration, rather than continuing and starting the server.

When upgrading from a 1.2 server to 1.3, the following changes will also be required:

Virtual hosts are matched by looking from the first one downward in the configuration file, rather than from the last one.
So you should consider reversing the order of your virtual host sections. Use the new -S option to check your virtual hosts configuration.
If you use name-based virtual hosts, read the Apache documentation about them carefully. This has changed considerably. If you serve both name-based and IP-based hosts from the same IP:port combination you will need to change your configuration. In all cases you will need to add NameVirtualHost directives for each IP:port on which name-based requests can be received. Again, use the -S option to check your virtual hosts configuration.
Check your AuthName directives (remember to check in .htaccess files as well) for multi-word arguments. If you have any, put quotes around the argument.

Known Bugs

These bugs in 1.3.3 have been fixed in 1.3.4:

Windows-specific Bugs

In some circumstances the configuration files in the conf directory are not installed. This can occur if the computer needs to be rebooted because a system DLL file was updated. For now a work-around is to re-install Apache after the reboot, since the DLL will not need to be installed again.
Requests for filenames containing non-ASCII characters such as accented characters give a "Forbidden" error.
If the ErrorLog directive is removed from the httpd.conf file, Apache will use the built-in default filename for the error log file. This should match the name given on the ErrorLog directive in the distributed httpd.conf file, which was error.log. However it would actually revert to the "Unix" name of error_log. From the next release it will default to error.log.

Other Bugs

The default method of locking between processes on Linux has been changed from flock to fcntl, because of possible instability with flock in some kernel versions.
In Apache 1.3.4, lines in the error log were being preceded by "httpd: ". This will be removed in the next version to avoid breaking any automatic error log analysis programs.
If a CGI returns a Set-Cookie header it was sometimes being duplicated in the response to the client.
If the mod_info module was compiled as a DSO and the relevant lines uncommented in the distributed httpd.conf file, Apache would not start because the mod_info directive appeared before the line which loaded mod_info into the server.
Fix potential buffer overrun problem.
Added support for the standard file layout on Mac OS X (Rhapsody).
apachectl gives an error if the PID file does not exist.
The macro escape_uri was renamed to ap_escape_uri but no backward compatibility was provided from the old name.
Using the mod_speling module where there were lots of possible matching files caused Apache to use more memory than a linear relationship to the amount of data being handled.
It is recommended to use a single configuration file (typically conf/httpd.conf) but mod_info will log a warning message if it cannot read conf/access.conf or conf/srm.conf.
With some browsers, Apache may not send a full response even though the file was updated on disk. This affects browsers which use HTTP/1.1 "etags" to ask servers for later versions of a file. Browsers known to do this are MSIE 4.1 and 5.0beta (older browsers used the modification time of the file). The problem is that Apache did not correctly compare the "etag" in the request with the "etag" of the file on disk (which will be different if the file has been updated).
When using ./configure with --with-layout=GNU the directory layout may be different from the default layout in Apache 1.3.3. This only occurs if the "prefix" includes a directory component named "apache", and results in directories containing unnecessary "httpd" components. This was an effect of a new feature in Apache 1.3.4 which allowed for the executable name of Apache to be changed from "apache".
Compiler options starting with + cannot be used in EXTRA_CFLAGS in src/configuration.
Most compilers use - for compiler options, but HP-UX's C compiler also uses +.
The INSTALL file shows examples of commands to start and stop the server using apachectl. However it assumes that this script is in the sbin directory, but the default is now bin.

HTTP/1.1

HTTP/1.1 is a major revision of the HTTP standard, which defines how browsers, servers and proxies communicate.

The Hypertext Transfer Protocol

From version 1.2, Apache was fully compliant with the new HTTP/1.1 specification. This is the protocol which tells browsers and servers how to communicate, and the features added here determine how Web pages can be accessed. We take a look at what HTTP/1.1 includes and what changes it will bring to browsers and servers. Part of Apache Week issue 28 (16th August 1996).

Hypertext Transfer Protocol (HTTP) defines how Web pages are requested and transmitted across the Internet. Almost all servers and browsers currently use version 1.0 of this protocol, but a major update, version 1.1, has been released. HTTP/1.1 adds a lot of new features to HTTP, which in turn will lead to new capabilities in both servers and browsers. We look at what is new in 1.1 and how it is likely to affect the Web.

HTTP was initially a very simple protocol used to request pages from a server. The browser would connect to the server and send a command like:

    GET /welcome.html

and the server would respond with the contents of the requested file. There were no request headers, no methods other than GET, and the response had to be a HTML document. This protocol was first documented as HTTP/0.9. All current servers are capable of understanding and handling HTTP/0.9 requests, but the protocol is so basic it is not very useful today.

Browsers and servers extended the HTTP protocol from 0.9 with new features such as request headers and additional request methods. The resulting HTTP/1.0 protocol was only officially documented in early 1996 with the release of RFC1945.
Servers and browsers have been using HTTP/1.0 for several years. Even while 1.0 was being documented, the next version was in serious development. This time the specification was developed first. This new version, 1.1, is now available as RFC2068. HTTP/1.1 will include a lot of new features, and will also document for the first time some features already found in servers or browsers.

Knowing how HTTP works is very useful for a server administrator. It lets you check out the operation of your server without having to fire up a browser, and gives you a very useful diagnostic tool to check in detail how the server responds to individual requests.

You can use telnet to emulate how a browser requests documents from a server. With telnet you can connect to the server, issue a request, and see what the server responds with. For example, to get the home page from www.apacheweek.com, you would use:

    % telnet www.apacheweek.com 80
    Connected to www.apacheweek.com.
    GET / HTTP/1.0 [RETURN] [RETURN]

This assumes you are connecting from a Unix system, starting at the command prompt (%) and with a telnet command available. You could also use any other telnet program such as the one in Windows 95. The telnet command and the GET request line are what you type.

The standard port for Web requests is port 80, so we connect to that port number. Once connected we can type in and send a HTTP request, followed by the request headers. In this case, the request is GET / HTTP/1.0. The / is the resource we want to obtain, and the HTTP/1.0 tells the server that this is a HTTP/1.0 request. After entering this line, press RETURN twice: the first ends the request line, and the second marks the end of the optional request headers (in this case, we did not enter any request headers).

The server will respond by sending a number of response headers, followed by the text of the requested document. It is often more convenient to send a 'HEAD' request instead of 'GET'.
This makes the server behave exactly as if it was handling a GET, but it doesn't bother to send the actual document. This makes it much easier to see the response headers, and means you do not have to wait to download the document itself. For example, to see which response headers www.apacheweek.com sends for /, use:

    HEAD / HTTP/1.0

    HTTP/1.0 200 OK
    Date: Fri, 16 Aug 1996 11:48:52 GMT
    Server: Apache/1.1.1 UKWeb/1.0
    Content-type: text/html
    Content-length: 3406
    Last-modified: Fri, 09 Aug 1996 14:21:40 GMT

    Connection closed by foreign host.

The first response line is the status: in this case '200' means the request is okay. The rest are response headers, which give information either about the server or the resource. For example, Server: gives the server version, and Last-Modified: is the last modification date of the file.

New in HTTP/1.1

The basic operation of HTTP/1.1 remains the same as for HTTP/1.0, and the protocol ensures that browsers and servers of different versions can all interoperate correctly. If the browser understands version 1.1, it uses HTTP/1.1 on the request line instead of HTTP/1.0. When the server sees this it knows it can make use of new 1.1 features (if a 1.1 server sees a lower version, it must adjust its response to use that protocol instead).

HTTP/1.1 contains a lot of new facilities. The main ones are: hostname identification, content negotiation, persistent connections, chunked transfers, byte ranges and support for proxies and caches.

Every request sent using HTTP/1.1 must identify the hostname of the request. For example, if the URL http://www.apache.org/ is used, the request must include the fact that the hostname part is 'www.apache.org'. In previous versions of HTTP, the server never knew the hostname used in the URL. Letting the server see the hostname allows the implementation of non-IP virtual hosts.
For example, if two names, www.apache.org and www.someoneelse.com, point to the same IP address, a HTTP/1.1 server can use the hostname it receives to return different content for each request. HTTP/1.0 servers cannot differentiate between these two requests.

The hostname must be passed to the server either as a full URI on the request line, or on the new Host: header. For example, to test how www.apache.org responds to a HTTP/1.1 request, you could send

    GET / HTTP/1.1
    Host: www.apache.org

Note that the HTTP version on the GET request is now 'HTTP/1.1'. If the request does not include the hostname, either in the URI or on the Host: header, the server will respond with an error.

Content Negotiation refers to the ability to have a number of different versions of a single resource. For example, a document might be available in English and French, with each of these available as either HTML or PDF. The possible responses are called representations or variants. There are actually two sorts of content negotiation:

Server-driven Negotiation
Here the server decides (or guesses) on the best representation to send to the browser, based on information the browser provides in the request.

Agent-driven Negotiation
Here the server does not guess on the best representation, but instead returns a list of the representations it has. The browser can then either automatically request one of these, or present a choice to the user.

The first type, server negotiation, has been implemented in Apache since the summer of 1995 and is explained in a special feature from Apache Week issue 25. However, the HTTP/1.1 specification is the first place it is officially documented.

The second type, agent negotiation, is not fully documented. The HTTP/1.1 specification just contains basic definitions of some of the headers to be used, but no details. The details of content negotiation are being specified in an Internet draft.
This draft also expands on how server-driven negotiation works, and defines how caches can perform negotiation on behalf of either the server or the user agent.

Many pages today include inlined documents, usually images but increasingly also sounds and other types such as Shockwave presentations. These pages can be slow to download because each item needs to be requested separately from the server, each on a separate connection. Typically, for each inline document the browser needs to connect to the server, ask for the document, wait for it to be received, and disconnect from the server (although some browsers can do multiple requests in parallel). This can be slow, especially across the Internet when there is a delay involved in each connection and disconnection.

To help make pages with inline documents quicker to download, HTTP/1.1 defines persistent connections where a number of documents can be requested over a single connection, one at a time.

An early implementation of persistent connections was known as keep-alive, and Apache as well as a number of other servers and browsers support this sort of connection. However, persistent connections are first officially documented in HTTP/1.1, and will be implemented slightly differently from keep-alives. For a start, in HTTP/1.1, persistent connections are the default. Unless the browser explicitly tells the server not to use persistent connections, the server should assume that it might be getting multiple requests on a single connection.

Persistent connections are controlled by the Connection header. Unless a Connection: close header is given, the connection will remain open. This can be tested by connecting to www.apache.org and sending a simple request, for example:

    % telnet www.apache.org 80
    HEAD / HTTP/1.1
    Host: www.apache.org

    HTTP/1.1 200 OK
    Server: Apache/1.3.0
    ...

where the connection will remain open for a short period before closing (this is a server-configurable time out).
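The request typed into telnet above can also be assembled in code. The following is a minimal Python sketch, not part of the original article: the function name build_request and its close parameter are invented for illustration. It only builds the request text, with and without a Connection: close header; sending it over a socket is left out.

```python
def build_request(method, path, host, close=False):
    """Build the text of an HTTP/1.1 request (hypothetical helper).

    HTTP/1.1 needs the hostname, sent here on the Host: header.
    The connection is treated as persistent unless the request
    carries a "Connection: close" header.
    """
    lines = [f"{method} {path} HTTP/1.1", f"Host: {host}"]
    if close:
        lines.append("Connection: close")
    # Every line ends with CRLF; an empty line ends the request headers.
    return "\r\n".join(lines) + "\r\n\r\n"
```

Calling build_request("HEAD", "/", "www.apache.org") gives the two lines shown in the telnet session, and passing close=True adds the header that asks the server not to keep the connection open.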
If the same request is sent with a Connection: close header the connection will close immediately after the response headers have been sent.

Normally, when sending back a response the server has to know everything about the response it is about to send before it sends it. For instance, servers should set the Content-Length header on each response to the length of the response itself. This can be difficult for the server to do if the content is dynamically created (e.g. if it is the output of a CGI script). So in practice servers (including Apache) often do not send a Content-Length with dynamic documents.

This has not been a problem with HTTP/1.0, but for persistent connections to work in HTTP/1.1, the Content-Length must be known in advance. The server could find out the length of the output of a CGI script by reading it into memory until the script has finished, then setting the Content-Length and returning the stored content. This might be acceptable for small content, but could be a problem if the CGI produces a lot of output.

One possible way around this is to use the new chunked encoding method. This lets the server send output a bit at a time. Each bit (or chunk) is small enough for its content-length to be known before it is sent. Using chunked encoding will let servers send out dynamic content that is either large or produced slowly without having to disable persistent connections.

In addition, after a chunked-encoded document has been completely sent, additional response headers can be transmitted. This could allow dynamically produced headers to be associated with the document, even if they are not available until after the script (or whatever produced the document) has finished.

Byte ranges allow browsers to request parts of documents. This can be used to continue an interrupted transfer, or to obtain just part of a long document (say, a single page). Byte ranges are implemented by the Range header.
For example, to request just the second 500 bytes of a document, the request would include:

    Range: bytes=500-999

A single request can also ask for more than one range at once (for example, it could ask for the first 500 bytes and the last 500 bytes of a file). When the server replies, it will send back each part in a single response, using MIME multipart encoding to distinguish the parts.

HTTP/1.1 includes a lot of information and new features for people implementing proxies and caches. Until now, the operation of proxies and caches has been largely undocumented. In addition to documenting how they are supposed to work, HTTP/1.1 also includes a range of new features to make implementing proxies and caches easier, and in particular to reduce network traffic by allowing proxies and caches to send more 'conditional' requests and to do transparent content negotiation.

A conditional request is like a normal request, except the sender (the proxy or cache server) includes some information about whether it really needs the document. For example, a proxy or cache can send an entity-tag which identifies a document it already has, and the server only sends back the document if the cache does not already have this document. Conditional requests can also be based on the last-modified time of the document.

There are a lot of other changes between 1.0 and 1.1, including:

More status response codes
New request methods: OPTIONS, TRACE, DELETE, PUT
Digest authentication
Various new headers such as Retry-After: and Max-Forwards:
Definition of the media types message/http and multipart/byteranges

How this will Affect Servers and Browsers

Users of the Web will notice the following major changes when browsers and servers are available which implement HTTP/1.1:

Non-IP virtual Hosts
Virtual hosts can be used without needing additional IP addresses.
Content Negotiation means more content types and better selection
Using content negotiation means that resources can be stored in various formats, and the browser automatically gets the 'best' one (e.g. the correct language). If a best match cannot be determined, the browser or server can offer a list of choices to the user.

Faster Response
Persistent connections will mean that accessing pages with inline or embedded documents should be quicker.

Better handling of interrupted downloads
The ability to request byte ranges will let browsers continue interrupted downloads.

Better Behaviour and Performance from Caches
Caches will be able to use persistent connections to increase performance both when talking to browsers and servers. Use of conditionals and content negotiation will mean caches can identify responses quicker.

Using Apache Imagemaps

Imagemaps are an easy way to provide a graphical front-end. We explain how to use Apache's imagemap module and the Apache extensions to the NCSA map file format.

Imagemaps can provide a graphical interface to a web site. If the mouse is clicked over an imagemap image the co-ordinates of that click are sent to the server. The server can decide what page to return based on the location of the click.

Traditionally, imagemaps have been implemented at the server end with a CGI program (usually called 'imagemap'). This is configured with a map file which lists what regions on the image correspond to what documents to return.

Apache can use CGI imagemaps, but it is more efficient to use the internal imagemap module. This module, compiled in by default, means that the server does not need to run a separate process to handle the image clicks. It is fully upwardly compatible, and also adds some new features.

Both of these approaches implement what are called server-side imagemaps because all the processing happens on the server.
The main problem with server-side imagemaps is that the user does not get any indication of which areas of the image contain links. An extension to HTML allows client-side imagemaps which tell the browser what areas on the image correspond to what documents. The browser can then highlight or show the active areas as desired. It is possible to use both client-side and server-side imagemaps at once, so that the maximum number of browsers are supported.

Older versions of Apache came with an imagemap program in the cgi-src directory. This could be compiled and placed into a CGI directory (typically cgi-bin). The internal imagemap module is faster than using the CGI program and it has replaced all of the functionality.

If you are using the imagemap program, you can easily move over to using the imagemap module. First, ensure that an appropriate AddHandler line is enabled in your srm.conf file (see the following section). Then all you need to do is update the HTML documents that refer to the imagemap program. You will probably be using something similar to this:

    <A HREF="/cgi-bin/imagemap/maps/mapfile"> <IMG SRC="image.gif" ISMAP></a>

You need to first of all rename your mapfile to have a suitable extension (as given on the AddHandler imap-file line, for example, .map) if it does not already have this extension. Then change the HTML like this:

    <A HREF="/maps/mapfile.map"> <IMG SRC="image.gif" ISMAP></a>

Note that the HREF is now simpler because the /cgi-bin/imagemap part is not given.

The imagemap module is a core part of Apache, and is compiled in by default. To use it, you first need to configure the Apache server. You should pick a file extension to use for imagemap configuration files, typically .map. The AddHandler command below should be added to your srm.conf file:

    AddHandler imap-file map

You will need to restart the server after making this change, by sending it a -HUP signal. Now, any request for a file ending in .map will be treated as an imagemap request.
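If many pages still point at the CGI program, the HREF change described above can be applied with a small script. This Python sketch is only an illustration and not part of the original article: the function name rewrite_imagemap_hrefs is invented, and it assumes the CGI program was reached as /cgi-bin/imagemap and that map files are given the .map extension. The map files themselves still have to be renamed on disk.

```python
import re

# Assumed location of the old CGI imagemap program.
CGI_PREFIX = "/cgi-bin/imagemap"


def rewrite_imagemap_hrefs(html, extension=".map"):
    """Point HREFs at the map file itself instead of the CGI program."""
    pattern = re.compile(
        r'(HREF=")' + re.escape(CGI_PREFIX) + r'(/[^"]*)"',
        re.IGNORECASE,
    )

    def fix(match):
        path = match.group(2)
        # Add the extension only if the map file does not already have it.
        if not path.endswith(extension):
            path += extension
        return match.group(1) + path + '"'

    return pattern.sub(fix, html)
```

Run over the first HTML fragment above, this returns the second one; links that do not go through the CGI program are left untouched.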
To actually create an imagemap you need to do two things:

Create a 'map' file which maps areas of the image onto documents
Add the code to an HTML page to tell the browser which image to use and what mapfile.

The map file is a text file containing the information needed for the server to map points on the image onto documents to return (or URLs to redirect to). It can also contain statements to control the behaviour of the imagemap. The imagemap module uses map files in standard NCSA format, with optional extensions.

Areas and positions on the image can be mapped onto documents or URLs with the following commands. All co-ordinates start at the top-left of the image, position (0,0). These statements can be modified to make use of Apache imagemap extensions (such as to give a 'menu text'). This will be covered later.

rect url x1,y1 x2,y2
The rectangle (x1,y1) to (x2,y2).

poly url x1,y1 x2,y2 ....
The polygon formed by the points given.

circle url x1,y1 x2,y2
The circle with its center at (x1,y1) and point (x2,y2) on the circumference.

point url x1,y1
The closest point to the clicked position, if the click is not inside any circle, poly or rect.

The url part of each of these statements is the document to return if the point clicked was inside the respective area (or in the case of 'point', the closest). It can be an absolute URL (starting http://), a URL relative to the document root (starting /), or a relative URL (not starting with a /, and possibly including ../ components to go to parent directories). If the URL is relative, it is taken relative to the directory containing the imagemap configuration file, not the original HTML document (if different). However this can be changed by the base statement, see below.

There are various ways to create the co-ordinates for the map file. One is to do it by hand, using positions obtained by (say) an image editing program.
Alternatively there are various programs available which will let you mark the shapes on an image and then write out the correct statements, such as those listed in Yahoo's Imagemaps category.

The statements which can be used to control the behaviour of the imagemap are:

base [ url | map | referer ]
Use url as the base for any relative URLs within the map file. Alternatively, the word map can be used, which makes URLs relative to the directory containing the map file (this is the default). Alternatively, relative URLs can be made relative to the HTML document which included the imagemap image, with referer. This only works with browsers which support the Referer request header (most modern browsers support this).

default [ url | error | nocontent | referer | menu ]
This tells the server what to do if the point clicked was not inside any rect, poly or circle, and there were no point statements. It can either be a URL, or one of these values:
error: return a 500 Server Error status;
nocontent: return a 204 No Content status, which will cause most browsers to keep the current document;
referer: return the document given by the Referer request header, which will be the HTML document which contained the imagemap;
menu: return a text (HTML) version of the URLs in the map file.
The default is nocontent.

The final part of creating an imagemap is to add suitable HTML code to an HTML document. Images are placed using the code <IMG SRC="...">. To place an imagemap, surround this tag with a <A HREF...> tag which refers to the map file, and include the attribute ISMAP in the <IMG SRC...>. For example:

    <A HREF="/docs/home.map"><IMG SRC="/graphics/image.gif" ISMAP></A>

where /docs/home.map is the URL of the map file, relative to the server's document root.

The ISMAP attribute in the <IMG SRC...> tag tells the browser that this is an imagemap.
When the image is clicked, the browser sends a request for the given HREF URL, followed by the position of the image click, such as:

    GET /docs/home.map?20,35

if the image was clicked at position (20,35).

One of the big problems with imagemaps in the past has been that they do not work with text-only browsers. The imagemap module is written to provide support for text-only browsers, which usually ignore the ISMAP attribute. The imagemap module recognises this and will return a text (HTML) document containing a menu of the possible selections from the map file. In addition, a menu document can be returned if the user of a graphical browser selects a point outside any of the defined areas, if the statement "default menu" is given in the map file.

The type of menu returned can be configured with the ImapMenu directive. This can be placed in a <Directory> or <Location> section, or in a .htaccess file. It takes a single argument which gives the type of menu to return:

none
Do not show a menu.

formatted
Output a formatted document, with a suitable heading and with the map lines shown as <pre> text.

semiformatted
Format the map lines as <pre> text, and also show comment text on other lines (comments start with a hash character, #), but do not output a header.

unformatted
Do not format map lines as <pre> text, and output text from comment lines, but do not output a header.

The semiformatted and unformatted options let you add additional text and mark-up to the map document. The difference between these two is that with semiformatted, the map links are output as <pre> sections, which forces them onto separate lines. The unformatted option does not impose any restrictions, so it is possible to build up a map document with multiple links on a line, for instance.

The links in the menu document correspond to the URLs for each of the areas defined in the map file. The text of the link will be the URL itself.
However this can be replaced with more meaningful text by giving this text as an argument before or after the co-ordinates. For example:

    rect /welcome.html 1,1 20,20 "Welcome to this site"

The imagemap module supports three directives: the first configures the type of menu to return (if any). This is the ImapMenu directive already covered. The other two directives provide alternate ways of setting the base and default actions (see the base and default map configuration statements, above). The corresponding directives are ImapBase and ImapDefault, and they take the same arguments. The directives can be given in <Directory> and <Location> sections, and in .htaccess files.

Say you have an image which contains two areas you want to make active (see the example image, right): a circle, which should lead onto a contents page (contents.html) and a square which gives information about your company (about.html). The basic map file to do this would be:

    circle contents.html 25,25 0,25
    rect about.html 50,0 100,50

This would be included in a HTML document like this:

    <A HREF="/maps/home.map"><IMG SRC="/img/logo.gif" ISMAP></A>

If the user clicks inside the circle or square area, they will get the associated document, relative to the mapfile location. The requested files would be: /maps/contents.html and /maps/about.html. This probably is not what is wanted. The URLs in the map file could be given as relative to the document root, for example:

    circle /contents.html 25,25 0,25
    rect /about.html 50,0 100,50

Alternatively, the base statement could be used to set the base URL, as in:

    base /
    circle contents.html 25,25 0,25
    rect about.html 50,0 100,50

Rather than putting the URL in the map file like this, it might be better to make all the URLs relative to the location of the HTML document containing the imagemap, with:

    base referer

If the user clicks an area outside the circle and the square they will, by default, get a HTML menu of the URLs in the map file.
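How a click is matched against the map file in this example can be sketched in a few lines of Python. This is an illustration of the rules as described in this article, not Apache's own code: the names parse_map and resolve are invented, poly is left out for brevity, and the base and default statements are not handled (a click that matches nothing simply returns None).

```python
import re


def parse_map(text):
    """Parse the rect, circle and point lines of a map file."""
    areas = []
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines, comments and statements this sketch ignores.
        if not fields or fields[0] not in ("rect", "circle", "point"):
            continue
        kind, url = fields[0], fields[1]
        # Keep only "x,y" fields; anything else is treated as menu text.
        coords = [
            tuple(int(n) for n in field.split(","))
            for field in fields[2:]
            if re.fullmatch(r"\d+,\d+", field)
        ]
        areas.append((kind, url, coords))
    return areas


def resolve(areas, x, y):
    """Return the URL for a click at (x, y), or None if nothing matches."""
    nearest = None
    for kind, url, pts in areas:
        if kind == "rect":
            (x1, y1), (x2, y2) = pts
            if min(x1, x2) <= x <= max(x1, x2) and min(y1, y2) <= y <= max(y1, y2):
                return url
        elif kind == "circle":
            # Second point lies on the circumference, so it gives the radius.
            (cx, cy), (px, py) = pts
            radius_sq = (px - cx) ** 2 + (py - cy) ** 2
            if (x - cx) ** 2 + (y - cy) ** 2 <= radius_sq:
                return url
        elif kind == "point":
            px, py = pts[0]
            dist_sq = (x - px) ** 2 + (y - py) ** 2
            if nearest is None or dist_sq < nearest[0]:
                nearest = (dist_sq, url)
    # Points only count when the click is not inside any other shape.
    return nearest[1] if nearest else None
```

With the two-line map file above, a click at (20,35) falls inside the circle and resolves to contents.html, while a click at (75,25) falls inside the square and resolves to about.html.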
Users of non-graphics browsers will also get this HTML menu. To make it more readable, add some descriptions:

    base referer
    circle contents.html 25,25 0,25 "Contents"
    rect about.html 50,0 100,50 "About our company"

which will produce the following map document:

    Contents
    About our company

(In this example the links do not go anywhere.)

The map document produced will just contain these two links. To make it more elaborate, you can either include your own mark-up text (on comment lines), or set the ImapMenu directive to the value formatted. To include your own mark-up, put it on lines which start with a # character:

    base referer
    # <h1>Menu Bar</h1>
    circle contents.html 25,25 0,25 "Contents"
    rect about.html 50,0 100,50 "About our company"
    # Select one of the options above

which produces:

    Menu Bar
    Contents
    About our company
    Select one of the options above

This works because the default value for the ImapMenu option is semiformatted, which outputs comment text (after the # symbol) as part of the map document. For more elaborate formatting, you could include ImapMenu unformatted in your access.conf or .htaccess file, and use, say:

    base referer
    # <h1>Menu Bar</h1>
    # Select an option:
    circle contents.html 25,25 0,25 "Contents"
    # or
    rect about.html 50,0 100,50 "About our company"

which produces:

    Menu Bar
    Select an option: Contents or About our company

Client-side imagemaps move the processing of the co-ordinate information to the browser. The HTML includes the information about the areas on the image and the documents they lead onto. This means the browser can give positive feedback when the mouse is over an active area. This obviously only works in browsers which support it, but it is possible to use a single image as both a server-side and client-side imagemap.
Here is an example image set up as both a server-side and a client-side imagemap:

    <A HREF="/docs/home.map"><IMG SRC="/graphics/image.gif" ISMAP USEMAP="#thismap"></A>
    <MAP NAME="thismap">
    <AREA SHAPE=CIRCLE COORDS="25,25,25" HREF="contents.html">
    <AREA SHAPE=RECT COORDS="50,0,100,50" HREF="about.html">
    </MAP>

Note that the circle here uses the centre point and a radius, rather than a point on the circumference. An example USEMAP imagemap is shown to the right. The format for client-side imagemaps is defined in RFC 1980.

Gathering Visitor Information: Customising Your Logfiles

Apache 1.2 makes it easy to create multiple customised log files so you can record details of who is browsing your site.

Every time a browser hits your site it leaves a trail in your access log. This file is enough to tell you how many hits you received and gives you some basic information about the browser, such as its hostname. But there is a lot more information readily available that you could be gathering. Want to know which browser is most common on your site, or what languages your readers can understand? In Apache 1.2, logging information like this is easy.

First published in Apache Week issue 51 (7th February 1997).

Apache uses the TransferLog directive to create a single log file for storing details of every request. However, Apache's logging capabilities are far more advanced: it can write the log file in any format, it can write multiple log files (each with a different format), and it can send log messages to an external process via a "pipe".

This feature will explain first how to customise the format of your existing log file, then show how to create multiple log files. Finally it will cover how logging works when you have virtual hosts, where you can choose whether to log a virtual host into the main log files or have separate log files for each host.
The traditional format for web log files looks like this:

    jupiter.eu.c2.net - - [03/Feb/1997:00:06:59 +0000] "GET / HTTP/1.0" 200 4571
    jupiter.eu.c2.net - - [03/Feb/1997:00:07:00 +0000] "GET /img/awlogo.gif HTTP/1.0" 200 12706

(There are two lines here, both starting with "jupiter.eu.c2.net". If you see more than two, the lines have been wrapped on the screen.)

This format is called the common log format and is standard across most web servers (although it is not very well documented). There are various tools to analyse data in this format, and it is not too difficult to write custom tools (in, say, perl) to extract the data. But the lack of a common field delimiter makes such tools more complex than necessary and prevents the use of simple Unix programs such as cut.

You can customise this format. There are probably two common reasons for doing this: firstly, to make the format simpler by using a common delimiter character, and secondly to log additional information such as the browser type at the end of each line (placing it at the end means the file can still be analysed by standard log analysis programs).

You customise the format by telling Apache a format to use. Special character sequences are used to represent specific information. For example, the sequence %h will be replaced with the name of the remote host. The common log format is defined like this:

    %h %l %u %t "%r" %>s %b

Additional sequences here are %l (the remote username, if using identd), %u (the HTTP authenticated username, if any), %t (the time in common log format), %r (the request), %>s (the returned status) and %b (the number of bytes in the document served).

Say, for example, you would prefer a file format with a common delimiter character between each field, so that you could use cut or write very simple perl scripts to extract the data. Using the common log format above as a guide, you could use

    %h|%l|%u|%t|%r|%>s|%b

Here the | character is being used as a delimiter.
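To see why a common delimiter simplifies log analysis, here is a rough Python sketch that extracts the fields from both formats. It is an illustration only: the names CLF_PATTERN and parse_clf are made up for this example, the pattern covers only the seven fields of the common log format described above, and the |-delimited sample line is an assumption about what the delimited format would write for the same request.

```python
import re

# One named group per field, in the order %h %l %u %t "%r" %>s %b.
CLF_PATTERN = re.compile(
    r'^(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-)$'
)


def parse_clf(line):
    """Split one common-log-format line into a dict, or return None."""
    match = CLF_PATTERN.match(line.strip())
    return match.groupdict() if match else None


SAMPLE = 'jupiter.eu.c2.net - - [03/Feb/1997:00:06:59 +0000] "GET / HTTP/1.0" 200 4571'
entry = parse_clf(SAMPLE)
print(entry["host"], entry["status"], entry["bytes"])

# Assumed output of the |-delimited format for the same request.
# No pattern is needed here, only a split on the delimiter.
DELIMITED = 'jupiter.eu.c2.net|-|-|[03/Feb/1997:00:06:59 +0000]|GET / HTTP/1.0|200|4571'
fields = DELIMITED.split("|")
print(fields[0], fields[5], fields[6])
```

The split-based version stays simple only while the delimiter never appears inside a field.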
Note that the | delimiter can cause problems if it occurs within a field (which is possible in the %r request field). To set this format for your log file, you use the LogFormat directive. For example:

    LogFormat "%h|%l|%u|%t|%r|%>s|%b"

The % sequences introduced so far let you log various aspects of the request. There are some more sequences (covered below) that log additional aspects of the request. However, one of the most important features of the custom log format is being able to log any of the request headers supplied by the browser. This lets you log things like the user's language preferences, browser type and the page they just came from.

Logging a request header is done using the %{}i sequence. You put the name of the request header between the braces. For example, to log the browser type, you would use

    %{user-agent}i

This information is typically added to the end of the common log format in Apache 1.1.1 (in Apache 1.2, you can put it in a separate log file, which is much more convenient; this is explained later). To add the user-agent information to the end of the common log format, use

    LogFormat "%h %l %u %t \"%r\" %>s %b %{user-agent}i"

If the browser does not send a user-agent, the text "-" will be logged as the user-agent. Otherwise you will get the browser name, such as "Mozilla/3.0Gold (Win95; I)" or "Mozilla/2.0 (compatible; MSIE 3.01; Windows 95)" (the former is Netscape Gold version 3, the latter Microsoft Internet Explorer version 3, pretending to be Netscape 2).

In addition to %{...}i, there is a corresponding sequence %{...}o to log any of the response headers (in these sequences, the i means incoming and the o outgoing headers).

Adding extra fields onto the end of the common log file format can be inconvenient, especially if you already have software which processes the log files in their current format. Luckily, Apache offers a completely customisable log file interface: you can create any number of log files, each in a different format.
It is now almost trivial to add a log file for (say) user-agents or requested languages, without needing to compile in a new module or modify the Apache source code. You can even log all the common log file information into both common log format (for existing analysers) and in a delimited format at the same time!

The interface to all this is via a single, simple directive: CustomLog. This directive takes both a file name to log to, and a custom format. For example, to log user-agents to a file called agent in the logs directory, you would use:

    CustomLog logs/agent "%{user-agent}i"

Other useful log files can also be created. These next two directives create a referrer log and a log of language preferences of your clients:

    CustomLog logs/referer "%{referer}i -> %U"
    CustomLog logs/language "%{accept-language}i"

You can tell the format to only log particular fields if the response status is (or is not) a particular value. For example, to only log the language preference for 200 or 304 statuses, use %200,304{accept-language}i. You can put an exclamation mark (!) straight after the % to reverse the condition (i.e. to only log if the status was not 200 or 304).

The time logged by %t is in common log file format. If you want to use another format, use %{format}t, where format is a date and time format as used by strftime (see man strftime for more information).

In some cases, the request will be handled by an internal redirect (this is common for things like requests satisfied by a DirectoryIndex file). In these cases, the configuration options can apply to either the original response, or the one actually delivered. The characters < and > after the % determine whether to log the original value, or the redirected value. For example, in %s you always want the value of the status actually returned, so %>s is used in the common log file definition.
Each % sequence knows whether it should use the original response or the real response. For example, %r (the request line) uses the original response.

The logging directives TransferLog, LogFormat and CustomLog can be used inside virtual hosts. The way they interact with the logs set up outside the virtual hosts is like this:

    If there are no TransferLog or CustomLog directives inside the virtual host, log requests for this host to the logs defined in the main server. Otherwise log requests to the log files defined in this virtual host and do not use any of the log files defined in the main server.
    If LogFormat is used in a virtual host, the format it defines is used for all TransferLog files defined inside that virtual host. Otherwise the log format defined outside the virtual host is used by the TransferLogs defined inside the host, defaulting to the common log format if no LogFormat is defined in the main server.

Here are all the % sequences allowed in the configurable log format in Apache.

    %b  bytes sent, excluding HTTP headers
    %f  filename
    %h  remote host
    %{Header}i  the contents of Header: header line(s) in the request sent from the client
    %l  remote username (from identd, if supplied)
    %{Note}n  the contents of note "Note" from another module
    %{Header}o  the contents of Header: header line(s) in the reply
    %p  the port the request was served to
    %P  the process ID of the child that serviced the request
    %r  first line of request
    %s  response status. For requests that got internally redirected, this is the status of the original request: use %>s for the returned status
    %t  time, in common log format time format
    %{format}t  the time, in the form given by format, which should be in strftime format
    %T  the time taken to serve the request, in seconds
    %u  remote user (from auth; may be bogus if return status (%s) is 401)
    %U  the URL path requested
    %v  the name of the server (i.e. the virtual host)

Orton, Joe

Web Authoring and HTTP

Joe Orton explains WebDAV, the distributed authoring protocol for HTTP.

Feature: Web Authoring and HTTP

Traditionally, HTTP has only been used for web browsing, not web authoring. In situations where the author of a web site does not have direct access to the file-system which is being served, a protocol such as NFS is used, or a version control system which allows remote access, such as CVS. Alternatively, less privileged authors, who are using a dial-up Internet Service Provider, might be given FTP access to an area on a web server.

Before giving a description of WebDAV, it is useful to give a brief introduction to HTTP itself. The protocol consists of a request, a message sent by the client to the server, followed by a response, the reply to that message sent from the server back to the client.

There are three important elements of an HTTP request: the method, the URI, and the headers. The method describes the type of the request. The HTTP specification, RFC 2616, defines eight different methods, from the familiar GET, to the obscure TRACE. The URI identifies the resource on which the method is intended to operate. Headers provide any extra information about the request that is required.

A syntactically valid (but meaningless) HTTP request and its response are given below. The request uses the FOOBAR method, includes three headers ("Host", "Something" and "Another"), and is targeted at the resource "/sample/uri.html". The response uses the "501 Method Not Implemented" status code, telling the client that the server does not understand the request.

    FOOBAR /sample/uri.html HTTP/1.1
    Host: www.somewhere.com
    Something: else
    Another: header

    HTTP/1.1 501 Method Not Implemented
    Date: Mon, 16 Oct 2000 15:19:09 GMT
    Server: Apache/1.3.12 (Unix) DAV/1.0.2
    Connection: close
    Allow: GET, HEAD, OPTIONS, TRACE
    ...
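The three elements of a request (method, URI and headers) can be pulled apart with a short Python sketch. This is a simplified illustration rather than a conforming HTTP parser: the function name parse_request is made up for this example, and it assumes CRLF line endings, one header per line and no request body.

```python
def parse_request(raw):
    """Split a raw HTTP request head into method, URI, version and headers."""
    lines = raw.split("\r\n")
    # The first line carries the method, the URI and the protocol version.
    method, uri, version = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        if not line:  # a blank line ends the header block
            break
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so store them in lower case.
        headers[name.strip().lower()] = value.strip()
    return method, uri, version, headers


RAW = (
    "FOOBAR /sample/uri.html HTTP/1.1\r\n"
    "Host: www.somewhere.com\r\n"
    "Something: else\r\n"
    "Another: header\r\n"
    "\r\n"
)
method, uri, version, headers = parse_request(RAW)
print(method, uri, version)
print(headers)
```

Nothing in the parsing depends on the method being one the server knows, which is why the meaningless FOOBAR request is still syntactically valid.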
During web browsing, the only HTTP methods that are normally used are GET, to retrieve documents, and POST, to submit form data back to the server. The WebDAV specification, RFC 2518, describes a set of new methods which allow clients to publish documents, and manipulate a remote repository in a variety of ways to meet the needs of web authoring. The methods fall into three groups:

    PROPFIND and PROPPATCH, for querying and manipulating properties.
    LOCK and UNLOCK, for locking purposes.
    MOVE, COPY and MKCOL, for basic repository manipulation.

In addition to the new methods, WebDAV refines the definition of the PUT and DELETE methods, which are already present in the HTTP specification.

The PUT method, as covered in a previous feature article, provides the most basic form of web publishing. This method is used to upload new or changed documents to the server.

WebDAV introduces the concept of a collection of resources to HTTP. A collection is analogous to a directory in traditional file-system terms: it has a name which ends in a /, and is a container for both normal resources, and also other collections. Collections can be created using the MKCOL method, which is similar to creating directories using the mkdir command.

    MKCOL /dav/newcollection/ HTTP/1.0
    Host: test.webdav.org

    HTTP/1.1 201 Created
    Server: Apache/1.3.11 (Unix) DAV/1.0.2
    Content-Type: text/html
    Date: Mon, 16 Oct 2000 09:10:06 GMT
    ...

The last two methods required for basic web authoring are the COPY and MOVE methods. These methods can operate in one of two ways: on a collection resource, they can recurse down an entire tree of resources, or alternatively, they can just operate on a single resource (of any type). The Depth HTTP header is used by the client to indicate which mode of operation is desired for a particular request: Depth: infinity means operate recursively, and Depth: 0 means operate only on a single resource.

WebDAV allows you to define properties on resources.
Two types of properties are used. Live properties, which are defined by the server, store information like the last date on which the document was modified. Dead properties are used by clients as simple data stores. An example of a dead property is the name of the author of the page.

The first method which is used with properties is PROPFIND, which is used to request either all properties available on a document or just a specific set of properties. XML is used in the request body to give the parameters for the PROPFIND request, and also in the response, to list the property names and their values.

The Depth header is also used with PROPFIND requests. It takes the values 0 and infinity as before, meaning in this case "give properties for a single resource only" and "give properties for all resources in this collection and below" respectively. The value 1 is also allowed, which requests properties on a collection resource and its immediate descendants only, without recursing into any child collections.
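As a rough sketch of how a client might assemble such a request, the Python below builds the XML body for a named set of properties and adds the Depth header. It is an illustration under stated assumptions, not code from any WebDAV client: build_propfind and ALLOWED_DEPTHS are names invented for this example, the D: namespace prefix is an arbitrary choice, and nothing is sent over the network.

```python
import xml.etree.ElementTree as ET

DAV_NS = "DAV:"
ALLOWED_DEPTHS = ("0", "1", "infinity")

# Ask ElementTree to write the DAV: namespace with the prefix "D".
ET.register_namespace("D", DAV_NS)


def build_propfind(path, host, prop_names, depth="0"):
    """Return the bytes of a PROPFIND request for the named properties."""
    if depth not in ALLOWED_DEPTHS:
        raise ValueError("Depth must be 0, 1 or infinity")
    # <propfind><prop><name/>...</prop></propfind>, all in the DAV: namespace.
    root = ET.Element(f"{{{DAV_NS}}}propfind")
    prop = ET.SubElement(root, f"{{{DAV_NS}}}prop")
    for name in prop_names:
        ET.SubElement(prop, f"{{{DAV_NS}}}{name}")
    body = ET.tostring(root, encoding="utf-8", xml_declaration=True)
    head = [
        f"PROPFIND {path} HTTP/1.1",
        f"Host: {host}",
        f"Depth: {depth}",
        "Content-Type: text/xml",
        f"Content-Length: {len(body)}",
    ]
    return "\r\n".join(head).encode("ascii") + b"\r\n\r\n" + body


request = build_propfind(
    "/dav/test.html",
    "test.webdav.org",
    ["getlastmodified", "getcontentlength"],
)
print(request.decode("utf-8"))
```

Changing the depth argument to "1" or "infinity" is all that is needed to ask about a collection's children or its whole tree.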
A simple request for the properties "getlastmodified" and "getcontentlength" is given below (the values returned for these properties appear inside the getlastmodified and getcontentlength elements of the response):

    PROPFIND /dav/test.html HTTP/1.1
    Host: test.webdav.org
    Depth: 0
    Content-type: text/xml
    Content-Length: 174

    <?xml version="1.0" encoding="utf-8" ?>
    <propfind xmlns="DAV:">
      <prop>
        <getlastmodified/>
        <getcontentlength/>
      </prop>
    </propfind>

    HTTP/1.1 207 Multi-Status
    Server: Apache/1.3.11 (Unix) DAV/1.0.2
    Content-Type: text/xml; charset="utf-8"
    Date: Fri, 13 Oct 2000 13:51:25 GMT

    <?xml version="1.0" encoding="utf-8"?>
    <D:multistatus xmlns:D="DAV:">
      <D:response xmlns:lp0="DAV:" xmlns:lp1="http://apache.org/dav/props/">
        <D:href>/dav/test.html</D:href>
        <D:propstat>
          <D:prop>
            <lp0:getlastmodified>Fri, 13 Oct 2000 12:51:56 GMT</lp0:getlastmodified>
            <lp0:getcontentlength>105</lp0:getcontentlength>
          </D:prop>
          <D:status>HTTP/1.1 200 OK</D:status>
        </D:propstat>
      </D:response>
    </D:multistatus>

The PROPPATCH method, similarly, uses an XML request body to specify the changes which should be made to a set of properties. PROPPATCH requests are made up of a combination of the following two operations:

    delete a named property
    submit a new value for a named property

A lot of web authoring will involve more than one person working on a site at the same time. Under these circumstances the lost update problem can occur, where two authors download a document and make some changes, then later, both authors upload their changes again, one set overwriting the other.

WebDAV provides a mechanism which can be used to prevent this situation, by allowing authors to lock a document while they are editing it. Once an author has locked a document, they are guaranteed that nobody else will be able to upload changes to the document.

The WebDAV specification makes locking support optional for server implementors.
The level of server support for WebDAV is defined as one of two classes: Class 1, in which all requirements are met for basic web authoring, and Class 2, which extends Class 1 to include locking support.

The mod_dav module adds WebDAV support to an Apache 1.3 server. mod_dav has been under development for two years, and is currently at version 1.0.2. The module has also been integrated into the Apache 2.0 source tree, and is distributed as part of the recent alpha 7 release.

Commercial WebDAV servers are available from Microsoft, Xythos, and Novell, amongst others. The on-line storage market has eagerly embraced WebDAV, with sites like Sharemation, MyDocsOnline, and Driveway all offering access to private or shared WebDAV repositories for free.

Microsoft are providing strong support for WebDAV on the client side: Internet Explorer 5 is provided with "Web Folders", which allow the user to view and manipulate a WebDAV repository inside the web browser. Office 2000 also supports editing web pages in-place using DAV, and makes use of the locking methods to prevent the lost update problem as described above. Microsoft's web publishing package FrontPage, ironically, lacks WebDAV support. Adobe GoLive 5 also supports WebDAV.

There are several Open Source WebDAV projects. cadaver provides a command-line interface similar to the ubiquitous ftp client. For Macintosh users, Goliath has a familiar Finder-like interface. For more information, refer to the list of open source and commercial projects with WebDAV support hosted at the webdav.org site.

Module Soup

Customise Apache to do what you want it to do by adding in extra modules, or remove modules you do not need.

Apache's 'modular' architecture makes it possible for anyone to add new functions to the server. In fact, most of the code that comes as part of the Apache distribution is in the form of modules, and can be removed or replaced.
For example, if the 'asis' function is never needed, the asis module (mod_asis) can be removed, making the server executable smaller and potentially reducing the load on the server host.

There are a large number of modules now written for Apache. Besides those included with the distribution, modules are also written to add functions not already in the code, or to do things which are needed on some sites but are not of widespread use. Some of these modules are written by Apache developers. Most of them, however, are written by other users of Apache who want to adapt its functionality for their needs.

In this article, we will look at a range of Apache modules which can be added to the server. First though, we show how to add a new module.

It is easy to add a module to Apache:

    1. Obtain the module source code file and place it in the Apache src directory
    2. Add the module definition to the Apache 'Configuration'
    3. Re-compile Apache
    4. Install the server executable and re-start the server

So first you need to download the new module. Most modules come as a single source file, called mod_something.c. Place this file in Apache's src directory. If the module comes as more than one file (for example, the PHP/FI module) follow the instructions that come with the module.

Having got the module source, Apache needs to be configured so that it will compile this code. To do this, edit the Configuration file in the src directory, and add a suitable Module line. This will have the format

    Module name_module mod_something.o

The first argument, name_module, must match the name given in the module's source code. Look for the 'module definition' near the end of the file, which will look like this:

    module name_module = { NULL, ... };

The name_module text in the Configuration file must match the name_module text in the module source exactly. The second argument on the Module line is the filename of the module, with the final .c replaced by .o.
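The rule for building the Module line can be sketched as a small Python helper. This is not part of Apache or its Configure script: module_line, MODULE_DEF and the sample source text are hypothetical, and the sketch assumes the module definition has exactly the form shown above, with the name ending in _module.

```python
import re

# Matches a definition of the form "module name_module = {".
MODULE_DEF = re.compile(r"^\s*module\s+(\w+_module)\s*=", re.MULTILINE)


def module_line(filename, source_text):
    """Build the Configuration 'Module' line for one module source file."""
    if not filename.endswith(".c"):
        raise ValueError("expected a .c source file")
    match = MODULE_DEF.search(source_text)
    if match is None:
        raise ValueError("no module definition found")
    # The object file name is the source file name with .c replaced by .o.
    object_name = filename[:-2] + ".o"
    return f"Module {match.group(1)} {object_name}"


# Hypothetical, heavily trimmed module source used only for this example.
SAMPLE_SOURCE = """
/* ... handlers and command table would appear here ... */
module example_module = {
    NULL,
};
"""

print(module_line("mod_example.c", SAMPLE_SOURCE))
# Module example_module mod_example.o
```

Taking the name from the source file, rather than typing it by hand, avoids the mismatch between the two name_module strings that would stop the build.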
After editing Configuration, re-compile Apache by running

    ./Configure
    make

Finally, stop your current server (with kill -TERM pid), install the new httpd executable, and start it running (e.g. ./httpd -d /usr/local/httpd).

If you have not looked at the standard modules which come with Apache, you might be missing some functions you could find useful. In addition, you might be compiling in some things you never use. All the standard Apache modules are listed in the Configuration file. The next release of Apache will come with a few more standard modules, such as a module to rewrite URLs on the fly, and a module to add PICS content-rating labels to responses.

Modules can be found in several different places:

    In the Apache 'src' directory
    In the Apache 'contrib/modules' directory
    In the 'Module Registry'
    Other sites (try a search engine and look for "Apache Module")

To simplify finding modules to do what you want, here is the Apache Week guide to add-on modules by function. These are taken from all the above sources, and are presented as an example of what is available. We cannot guarantee that these modules will do what they say they do, or even that they work with all versions of Apache.

If a module named below is not a link, then that module is distributed with Apache 1.1.1. Otherwise the link will take you to that module (if the link is to a .c or .tar file, save it to a file, else the link goes to an HTML page or FTP directory).

Authentication

There are a whole range of options for different authentication schemes. The usernames and passwords can be stored in flat files (with the standard mod_auth), or in DBM or Berkeley-DB files (with mod_auth_dbm or mod_auth_db respectively). For more complex applications, usernames and passwords can be stored in mSQL, Postgres95 or DBI-compatible databases, using mod_auth_msql, mod_auth_pg95 or mod_auth_dbi.
If passwords cannot be stored in a file or database (perhaps because they are obtained at run-time from another network service), the mod_auth_external.c module lets you call an external program to check whether the given username and password are valid. If your site uses Kerberos, mod_auth_kerb allows Kerberos-based authentication. For LDAP authentication, see mod_auth_ldap.

The mod_auth_anon module can be used to allow an 'anonymous-ftp' style access to authenticated areas, where users give an anonymous username and a real email address as password.

There are also modules to hold authentication information in cookies, and to authenticate against standard /etc/passwd and NIS password services. See the Module Registry.

Blocking Access

mod_block.c blocks access to pages based on the 'referer' field. This can be used to help prevent (for example) your images being used on other people's pages. For more complex cases, mod_rewrite can be used to implement blocking based on arbitrary headers (e.g. referer and user-agent), as well as on the URL itself.

Counters

There are a number of counter modules available, including mod_counter.c and mod_cntr. Some server-side scripting languages, such as PHP/FI, can also provide access counters.

Faster CGI Programs

Perl CGIs can be sped up considerably by using the mod_perl module, which builds a perl interpreter into the Apache executable, and optionally allows scripts to start up when the server starts. Alternatively, the mod_fastcgi module implements FastCGI on Apache, giving much better performance from a CGI-like protocol.

Languages and Internationalisation

The Russian Character Set (RCS) module provides support for Russian character sets, while mod_fontxlate can translate characters in single-byte character sets, for countries with multiple non-standard character sets.

Miscellaneous

mod_speling.c attempts to fix mis-capitalised URLs, by comparing with files and directories in a case-insensitive manner.
A module which makes your ftp archive into web pages is available at mod_conv.tar.gz.

Server-Side Scripting

There are several different modules which allow simple (or not so simple) scripts to be embedded into HTML pages. XSSI is an extended version of standard SSI commands, while PHP and NeoScript are more powerful scripting languages.

Throttling connections

mod_simultaneous.c limits the number of simultaneous accesses to particular directories, which could be a way of implementing limits for images directories. mod_bandwidth provides a similar service. mod_throttle can be used to slow down responses for users who exceed a given "bytes per second" download rate.

URL rewriting

The mod_rewrite module is a powerful (and complex) way of mapping the request URL onto a new URL on the fly, using regular expressions and optionally mapping files in text or DBM format. It can also implement conditional rewrites based on other request headers (e.g. User-Agent).

Converting from NCSA

The differences between Apache and NCSA HTTPd. Also, how to convert an existing NCSA HTTPd installation over to Apache.

The two most popular Web servers according to the Netcraft Survey are Apache and NCSA HTTPd. Both servers are widely used, although according to the server survey Apache is used on over twice as many sites as NCSA, and the market share of NCSA is dropping while Apache's is growing.

This feature is designed to explain the differences between NCSA HTTPd and Apache, so that users of either server can decide if the other meets their requirements better. We then look in detail at the directives changed between NCSA and Apache, which can be used by existing NCSA users if they decide to convert to Apache. It can also act as a guide to converting the other way.

NCSA version 1.3 was the base for Apache development. Initially, Apache was a drop-in replacement for NCSA HTTPd; however, as both have developed, there are now some differences between the two servers.
Since then, much of Apache's code has been considerably rewritten, in particular to allow the functionality to be extended with modules.

This feature explains how the current versions of Apache and NCSA HTTPd differ, what features Apache adds, and those it lacks. This is followed by a detailed list of changes between NCSA and Apache. The versions used for the comparison are Apache 1.3 and NCSA HTTPd 1.5.2.

Perhaps the most important difference between Apache and NCSA is that Apache is extensible via a programming API. This means that the functionality of Apache can be extended almost arbitrarily, via modules. The list of Apache features given here concentrates on the functions provided by the server in its default configuration, or with the addition of modules distributed as part of Apache. However, there are a lot of additional modules which can be added to perform specific tasks. See our feature on additional modules for an idea of the extensibility of Apache.

Unless stated otherwise, the features listed in this section are available with the default server configuration. If a module is marked as optional, then it is part of the official Apache distribution, but not compiled in by default. If a module is described as third party then it is not part of the Apache distribution.
Leaving aside the third-party modules, the main features that Apache supports and NCSA does not are:

    Additional authentication options: anonymous, from a Berkeley DB file, from an mSQL or Postgres95 database
    All directives can appear in any of the configuration files
    Automatically set the mime type of responses based on the file contents (using mod_mime_magic)
    Call a CGI program when a file of a particular mime type is accessed, with the Action directive
    Configurable logging format (with LogFormat) and multiple log files (with CustomLog)
    Correct some typos in URLs with the optional "spelling" module (mod_speling)
    Create a user clickstream log (optional mod_usertrack module)
    Customise CGI environment variables (optional mod_env module)
    Dynamic module loading (optional mod_so module)
    Enhanced server-side includes (SSI)
    Imagemap extensions: internal support (like NCSA 1.5) with additional directives (ImapMenu, ImapBase, ImapDefault)
    Info module which displays the compiled-in modules and current configuration
    Listen on selected addresses and ports (Listen directive)
    Pipe any log file to another process, instead of writing to a file
    Proxy module to provide HTTP and FTP proxying. Can also operate as a "reverse proxy" to load-balance multiple servers.
    Restrict access by URL with <Location> sections, which complement <Directory>. Restrict access by filename with <Files>. All of these can also match against regular expressions.
    Rewrite URLs based on complex criteria (including conditionals), with mod_rewrite
    Server pool tuning with MaxSpareServers and MinSpareServers
    Server-based content negotiation, based either on a file listing the variants, or automatically generated from file extensions
    Set actions for files with particular extensions (SetHandler and AddHandler directives)
    Set environment variables based on any received headers or other information about the request, with SetEnvIf
    Set mime type for all files in a directory with the ForceType directive
    Status module to see the status of the child processes and what request they are currently servicing
    Turn DNS lookups on/off at run time (HostnameLookups directive)
    USER_NAME environment variable set when SSI execs a CGI, giving owner of SSI file
    Use CERN format 'metafiles' to add header info to response (optional mod_cern_meta module)
    Unbuffered CGI output (actually Apache does buffer CGI output for efficient use of the network, but will send output to the client as soon as the CGI is no longer providing more output)
    Year 2000 compliant
    <VirtualHost> sections can contain almost any configuration directive, with no need for <SRMOptions> sections

Apache does not implement these features:

    Kerberos
    Parsing output of CGI for SSI directives
    Authentication against NIS usernames and passwords (although there are third party modules which do this)

Some features that are available in both NCSA and Apache are implemented differently in the servers. The detailed list of changed directives, below, gives more information. This is a summary of the main changes.

.htaccess files restricting by host

.htaccess files written using the examples on the NCSA site which restrict by host, not by user, may not work with Apache. The examples to restrict by host also include the AuthName and AuthType directives, which are only used in user authentication. The fix is to remove any of these directives from .htaccess files which only restrict by hostname (any AuthUserFile and AuthGroupFile directives should also be removed).

DBM User and Group Files

Apache supports DBM user and group files for authentication if the optional DBM module is used (mod_auth_dbm). This is configured by different directives, unlike NCSA which uses the same directives with a second argument to specify DBM format.
(Apache 1.2 will also allow use of the same directive syntax as NCSA) Digest Support Digest authentication can be added to Apache with the optional digest module (mod_digest). FastCGI support This is available for Apache with a third-party module (from the FastCGI site). Non-IP Virtual Hosts In Apache, these are implemented using the normal <VirtualHost> sections. The name to respond to is given in the <VirtualHost> directive or on a ServerAlias directive. The NameVirtualHost directive must be given to specify which interfaces are used for name-based virtual hosts. Apache does not implement <Host>. Log Files Apache logs to the transfer log in the standard common log format. It does not support the LogOptions directive to build user agent and referrer information into the log file. However, the log format can be completely customised with the LogFormat directive and multiple logs can be created with CustomLog. KeepAlive The directives to support keepalive (persistent connections) use a different syntax. Server Pool Both Apache and NCSA 'pre-fork' a pool of servers to handle requests. However, in Apache the main (parent) process does not handle any part of the request. In NCSA, the parent process receives each request then hands it to a suitable child. In Apache, a pool of 'spare' servers is maintained, and the number of servers is configurable. XBITHACK This is a runtime directive in Apache. This section lists all the directives that NCSA supports. For each directive, we say whether that directive exists in Apache, and if it does, whether there is any change in meaning or syntax. Where directives do not exist in Apache, we either give an alternative method of implementing them in Apache, or state that the feature related to that directive is not implemented (if it will be implemented in Apache 1.2, we note it here). Apache does not distinguish between the three configuration files that NCSA HTTPd uses.
That is, in Apache, any directive can appear in any of the configuration files (and in fact it is possible to put all the directives into a single file, if desired). However, this list of directives is split into sections for each of the configuration files, and the directives are listed in the same order as given in the NCSA documentation. Directives valid in NCSA's Server Configuration file (httpd.conf): ServerType: same Port: same User: same Group: same ServerAdmin: same ServerRoot: same ServerName: same StartServers: same (but NCSA does not use the same method for its pool of servers) MaxServers: use MaxClients instead, same syntax and meaning MaxRequestsPerChild: same TimeOut: same, except Apache resets the timeout on sending each time data is written (when sending a file), so this is not an overall timeout. AccessConfig: same ResourceConfig: same TypesConfig: same IdentityCheck: same, except can be set in .htaccess files BindAddress: same syntax. Can however be used with virtual host configurations. See also the new Listen directive for more control over the addresses bound to. <Host>: not valid. Implement non-IP virtual hosting using a normal <VirtualHost> section and NameVirtualHost. <VirtualHost>: same, except Apache does not support the errorlevel argument (it effectively defaults to 'required'). <VirtualHost> can take multiple hosts and IP addresses. <VirtualHost> is used to implement non-IP vhosts (see NCSA Host directive) when combined with NameVirtualHost. Almost all directives are valid within a <VirtualHost> section, so the NCSA <SRMOptions> section is not needed in Apache. <SRMOptions>: not applicable. Apache does not distinguish between the three config files, so directives are valid in all. You can just remove the <SRMOptions> and </SRMOptions> lines. ErrorLog: same, except Apache can log to a pipe (ErrorLog |program) TransferLog: same. Apache can also log to a pipe (i.e. another process) with "TransferLog |program". Log file is in standard 'common log format'.
No LogOptions Combined format to include user agent or referer information, however the log format can be set with the LogFormat directive and multiple log files created with CustomLog. AgentLog: available if mod_log_agent compiled in. Syntax same, except Apache may log to a pipe, AgentLog |program. RefererLog: available if mod_log_referer compiled in. Syntax same, except Apache may log to a pipe, RefererLog |program. RefererIgnore: available if mod_log_referer compiled in. Syntax same. PidFile: same LogDirGroupWriteOK: not implemented. LogDirOtherWriteOK: not implemented. LogOptions: not valid in Apache. To specify formats, use the mod_log_config module and LogFormat instead. For separate agent and referer logs, use mod_log_agent and mod_log_referer modules. KeepAlive: on Apache, argument is the maximum number of requests per connection. Use a value of 0 to disable keepalives. KeepAliveTimeout: same syntax. If not given, Apache defaults to 15, NCSA 10. MaxKeepAliveRequests: not valid. Use KeepAlive instead, except a value of 0 in NCSA means stay alive forever, in Apache it disables keepalives completely. AssumeDigestSupport: not valid (but it doesn't do anything in NCSA anyway) Annotation-Server: not valid Directives valid in NCSA's Resource Configuration file (srm.conf): DocumentRoot: same UserDir: same, except Apache can also use a full path with * to represent the username (e.g. UserDir /home/*/public_html). Also Apache can redirect to a full URL. AccessFileName: same Redirect: same (but the order that Alias and Redirects are applied may be different). Apache can only redirect to a full URL, not a relative URL. RedirectPermanent: same, but "Redirect permanent" is preferred RedirectTemp: same, but "Redirect temp" is preferred Alias: same ScriptAlias: same AddType: same, except Apache can have multiple extensions listed AddEncoding: same, except Apache can have multiple extensions listed DefaultType: same DirectoryIndex: same (Apache can use multiple names, as can HTTPd 1.5).
Apache can list names as URLs relative to the server root. FancyIndexing: same, but IndexOptions preferred DefaultIcon: same ReadmeName: same HeaderName: same AddDescription: same AddIcon: same AddIconByType: same AddIconByEncoding: same IndexIgnore: same IndexOptions: same ErrorDocument: same, except Apache can also output a static string with ErrorDocument "string, or redirect to a full URL. Apache passes on more REDIRECT_xxx env variables (all variables existing at time of the redirect are renamed REDIRECT_variable). But it does not pass on the error message in QUERY_STRING, or REDIRECT_REQUEST (use REDIRECT_URL instead). Apache can put ErrorDocument in .htaccess. Directives valid in NCSA's Access configuration file (access.conf, or .htaccess files where allowed): <Directory>: same Options: same, except Apache also supports MultiViews option (for server-side content negotiation) AllowOverride: same, except Apache does not use Redirect (use FileInfo instead to control Redirects in .htaccess file) AuthName: same, except in Apache any realm name containing spaces must be enclosed in double quotes AuthType: same (basic only); digest is supported by the optional mod_digest AuthUserFile: same, except Apache does not support the second argument (standard, dbm or nis). Use AuthDBMUserFile instead for DBM format (1.2 will implement second arg to AuthUserFile). There are third party modules which implement NIS authentication. AuthGroupFile: same differences as AuthUserFile. AuthDigestFile: same, if optional mod_digest compiled in <Limit>: same. Note that in Apache, the directives valid inside <Limit> can also appear outside, in which case they apply to all methods order: same deny: same (but see allow for note about partial comparisons) allow: same, except Apache applies comparisons against full components only, e.g. bar.edu matches x.bar.edu, but does not match x.foobar.edu. require: same referer: not valid in Apache.
To restrict by referer, or any other request header, use the third-party module mod_rewrite (to be distributed in Apache 1.2). satisfy: same OnDeny: not valid. Can be implemented by specifying an ErrorDocument 401 Other changes: The XBITHACK functionality is configurable at runtime with XBitHack directive All configuration directives can be used in any of the config files Apache does not set the SERVER_ROOT, REMOTE_GROUP or ANNOTATION_SERVER CGI variables
Content Negotiation How to use Apache's content negotiation to transparently serve files in different languages, media types or character sets.
Content Negotiation Explained Content Negotiation is an often overlooked feature of Apache, but correctly used it can let you present documents in different languages and formats based on what the user wants. Apache is one of the few servers that actually implement content negotiation. However there are a few problems caused by browsers which do not do the right thing. We explain how to use negotiation correctly, and why some browsers make this difficult. Content negotiation is a very powerful tool where the browser says what type of information it can accept, and the server decides what (if any) type of information to return. The term type is used very loosely here, because negotiation can apply to several aspects of the information. For example, it can be used to choose the appropriate human language for a document (say, French or German), or to choose the media type that the browser can display (say, GIF or JPEG). In order for the server to deliver the correct representation of the data, the browser must send some information about what it can accept. A browser used on a French-language machine, for instance, should indicate that it can accept data in French (of course, this should also be user-configurable). The most common use of content negotiation at the moment is to select data based on media type. Here, the browser says what sort of data it can display.
For example, when requesting an inline image, the browser could tell the server that it can accept GIF and JPEG images. In fact, the browser might prefer JPEG over GIF images because they are quicker to download, so it can specify this as well. The ability to indicate what content types a browser can accept is particularly important now that plug-ins can extend the browser capabilities. Unfortunately many current browsers don't supply the correct information to the server. To use negotiation, you need two things. Firstly, you need a resource that exists in more than one format (for example, a document in French and German, or an image stored as a GIF and a JPEG), and secondly you need to configure Apache to know that each of these files is actually the same resource. Apache has two methods for doing this: either using a special index file to identify the various versions of the information, or using the MultiViews facility where Apache gets the information it needs from file extensions. The first method involves creating a variants file, usually referred to as a var file. This lists each of the files which contains the same resource, along with details of what representation it is. Any request for this var file causes Apache to return the best file, based on the contents of the var file and the information supplied by the browser. To get Apache to use variant files, first uncomment the following line in srm.conf:
AddHandler type-map var
and restart the server as normal. As an example, say there is a file in English and a file in German containing the same information. The files could be called english.html and german.html (they are both HTML files). So create a var file listing each of these files, and specifying which languages they are in. Create a var file called (say) info.var containing:
URI: english.html
Content-Language: en

URI: german.html
Content-Language: de

This file consists of a series of sections, separated by blank lines.
Each section contains the name of the file (on the URI: line) and header information used in the negotiation. Now, when a request for info.var is received, the server will read the var file and return the best file, based on which languages the browser has said it can accept. Similarly, the var file could be used to select files based on content type (using Content-Type:) or content encoding (using Content-Encoding:), or any combination. The Content-Type: line in a variants file can also give any other content type parameters, such as the subjective quality factor. This will be used in the negotiation when picking the 'best' match. For example, an image available as a JPEG might be regarded as having higher quality than the same image in GIF format. To tell this to the server, the following .var contents could be used:
URI: image.jpg
Content-Type: image/jpeg; qs=0.6

URI: image.gif
Content-Type: image/gif; qs=0.4

Here the qs parameters give the 'source quality' for these two files, in the range 0.000 to 1.000, with the highest value being the most desirable. A browser that indicates it can handle both GIF and JPEG files equally would see the JPEG version rather than the GIF. Using variant files gives complete control over the scope of the negotiation, however it does require the file to be created and maintained for each resource. An alternative interface to the negotiation mechanism is to get Apache to identify the negotiation parameters (language, content type, encoding) from the file extensions. Instead of using a var file, file extensions can be used to identify the content of files. For example, the extension eng could be used on English files, and ger on German files. Then the AddLanguage directive can be used to map these extensions onto the standard language tags. To use this feature, the MultiViews option must first be turned on in the directory, either in access.conf or a .htaccess file. Note that Options All does not turn on multiviews.
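Returning briefly to the variants file with qs values shown earlier, the way the source quality decides between the two images can be sketched in Python. This is only an illustration of the idea, not Apache's implementation: the names parse_type_map, best_variant and IMAGE_VAR are made up for this sketch, it assumes the browser rates every type it accepts equally, and it assumes a source quality of 1.0 when no qs is given.

```python
import re

# The image example from the text, as the contents of a hypothetical image.var
IMAGE_VAR = """\
URI: image.jpg
Content-Type: image/jpeg; qs=0.6

URI: image.gif
Content-Type: image/gif; qs=0.4
"""


def parse_type_map(text):
    """Split var-file text into one dict per blank-line-separated section."""
    variants = []
    for section in re.split(r"\n\s*\n", text.strip()):
        headers = {}
        for line in section.splitlines():
            name, _, value = line.partition(":")
            headers[name.strip().lower()] = value.strip()
        if "uri" not in headers:
            continue  # a section without a URI: line names no file
        media_type, _, params = headers.get("content-type", "").partition(";")
        qs = 1.0  # assumption of this sketch: default when no qs is given
        for param in params.split(";"):
            key, _, val = param.partition("=")
            if key.strip() == "qs":
                qs = float(val)
        variants.append(
            {"uri": headers["uri"], "type": media_type.strip(), "qs": qs}
        )
    return variants


def best_variant(variants, accepted_types):
    """Return the URI with the highest qs among the types the browser accepts."""
    candidates = [v for v in variants if v["type"] in accepted_types]
    if not candidates:
        return None
    return max(candidates, key=lambda v: v["qs"])["uri"]


# A browser that accepts both formats equally gets the JPEG
print(best_variant(parse_type_map(IMAGE_VAR), {"image/jpeg", "image/gif"}))
```

A browser that only lists image/gif would get image.gif from the same file, since the JPEG is never a candidate.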
After enabling multiviews, the directives which map extensions onto representation types can be given. These are AddLanguage, AddEncoding and AddType (content types are also set in the mime.types file). For example:
AddLanguage en .eng
AddLanguage de .ger
AddEncoding x-compress .Z
AddType application/pdf pdf
(the last line is shown as an example only, this is actually set in the mime.types on recent Apache versions). When a request is received, the server looks at all the files in the directory which start with the same filename. So a request for /about/info would cause the server to negotiate between all the files named /about/info.* For each matching file, the server checks its extensions and sets the content type, language and encodings appropriately. For example, a file called info.eng.html would be associated with the language tag en and the content type text/html. The source quality is assumed to be 1.000 for all files (this can actually be set on the mime type, like "text/html;qs=0.5" but this confuses most browsers so is probably best not used). The extensions can be listed in any order, and the request itself can include one or more extensions. For example, the files info.html.eng and info.html.ger could be requested with the URL info.html. This provides an easy way to upgrade a site to use negotiation without having to change existing links. Of course, for negotiation to work browsers must send the correct request information. While most make a reasonable attempt there are some problems. For human languages, browsers should let the user pick what language or languages they are interested in. Recent beta versions of Netscape let the user select one or more languages (see the Options, General Preferences, Languages section). For content-types, the browser should send a list of types it can accept. For example, "text/html, text/plain, image/jpeg, image/gif".
Most browsers also add the catch-all type of "*/*" to indicate that they can accept any content type. The server treats this entry with lower priority than a direct match. Unfortunately, the */* type is sometimes used instead of listing explicitly acceptable types. For example, if the Adobe Acrobat Reader plug-in is installed into Netscape, Netscape should add application/pdf to its acceptable content types. This would let the server transparently send the most appropriate content type (PDF files to suitable browsers, else HTML). Netscape does not send the content types it can accept, instead relying on the */* catch-all. This makes transparent content-negotiation impossible. In addition, most browsers do not indicate a preference for particular types. This should be done by adding a preference factor (q) to the content type. For example, a browser which can accept Acrobat files might prefer them to HTML, so it could send an accept type list which includes text/html;q=0.7, application/pdf;q=0.8. When the server handles the request, it would combine this information with its source quality information (if any) to pick the 'best' content type to return. The new HTTP/1.1 specification defines how content negotiation works for the first time. It also adds some new facilities which are not yet available in any browser or server. This includes the ability for the server to return a list of possible matches if it cannot identify the best one to use. Apache implements the server end of HTTP/1.1 content negotiation.
Publishing Pages with PUT Apache can support publishing pages with PUT, but it requires some work.
Publishing Pages with PUT One of the most common questions we get asked is whether Apache supports web publishing with the PUT method. Netscape Navigator Gold, AOLPress and Amaya all support this method of publishing pages. Technically the answer is yes, Apache supports that method.
However it does not come with any scripts or programs which actually implement the publishing behaviour. This article explains what the PUT method is, how it can be used in Apache, and what is required to support publishing with it. It also gives a basic script to handle publishing, and explains why this script should be used very carefully to prevent security problems. First published in Apache Week issue 59 (4th April 1997). When a browser requests a normal page from a server, it uses the "GET" method. This is the standard way to get back information from a server. The information itself may come from a static page, a CGI program, a server-side include page or any other source handled by the server. By definition it is safe for a browser to obtain a page by GET as many times as it likes - it will never cause any permanent action on the server (such as entering a product order). To perform a permanent action on the server, the "POST" method is used. This method must be handled by a program or script, and the browser should not re-request a POST page without getting the user to confirm it. This POST method is used when a script or program requires a lot of form data input or when the request makes the server perform a real action such as entering an order. The "PUT" method is similar to the POST method in that it can cause information to be updated on the server. The difference is that the POST method is normally handed a script which is explicitly named by the resource (that is, something that already exists), while a PUT request could be directed at a resource which does not (yet) exist. Another difference is that the POST method can be used in response to a form, while the PUT method can only contain a single data item. The PUT method is suited for publishing pages. There is some confusion about whether Apache supports the PUT method. In fact, Apache handles PUT exactly like it handles the POST method. 
That is, it supports it, but in order for it to do anything useful you need to supply a suitable CGI program. This is in contrast to the GET method, which Apache supports internally by sending back files or SSI documents. If you have a script which is capable of handling PUT requests, you can easily configure Apache to support that script. This is done with the Script directive. This specifies a script (i.e. a CGI program) to be run whenever a PUT request is received. For example, if you put your CGI program which handles PUT requests into /cgi-bin/put, you would add the line Script PUT /cgi-bin/put to your srm.conf or access.conf (depending on whether you want your entire contents to be handled by this script, or just a specific subdirectory). Note that you also need to make sure that this script is executable, by either placing it in a ScriptAlias directory, or giving it a suitable extension and turning on CGI execution for that extension. The CGI script has to be able to accept a page sent to it, and look at the request URL to decide where to place the file. If it is successful it should return a status of 201 or 204. The basic operation of a PUT script should be: Check that the request comes from the PUT method Get the file to update or create from PATH_TRANSLATED Read the data (read CONTENT_LENGTH bytes from standard input) Write the data to the file Return a 201 or 204 status. A simplistic script to implement PUT handling like this is available in put1. Among other limitations, this script does not check to see if you are attempting to upload a CGI script or if the destination is a directory. However the main failing is that it implements no security checks, and if you have a secure setup it will not even have permission to update the files. Configuring Apache is the easy part: the hard part is creating a server environment and script which are secure.
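The basic operation of a PUT script listed above can be sketched as follows. This is a minimal Python illustration of those five steps and is not the put1 script mentioned in the text. The function name handle_put is made up for this sketch, the choice of 201 for a new file and 204 for an updated one is an assumption, and the directory check is an addition. Like the simplistic script described above, it performs no security checks at all and should not be installed as it stands.

```python
import os
import sys


def handle_put(environ, body_stream):
    """Store the request body at PATH_TRANSLATED; return (status, message)."""
    # Step 1: check that the request comes from the PUT method
    if environ.get("REQUEST_METHOD") != "PUT":
        return "405 Method Not Allowed", "This script only handles PUT\n"

    # Step 2: get the file to update or create from PATH_TRANSLATED
    target = environ.get("PATH_TRANSLATED")
    if not target:
        return "400 Bad Request", "No PATH_TRANSLATED supplied\n"
    if os.path.isdir(target):
        # Added check (not one of the listed steps): refuse directories
        return "403 Forbidden", "Destination is a directory\n"

    # Step 3: read CONTENT_LENGTH bytes from standard input
    try:
        length = int(environ.get("CONTENT_LENGTH", ""))
    except ValueError:
        return "411 Length Required", "No usable CONTENT_LENGTH\n"
    data = body_stream.read(length)

    # Step 4: write the data to the file
    existed = os.path.exists(target)
    with open(target, "wb") as out:
        out.write(data)

    # Step 5: return a 201 or 204 status (assumption: 201 new, 204 updated)
    if existed:
        return "204 No Content", ""
    return "201 Created", "Created\n"


def main():
    # When run as a CGI program the server supplies the environment and stdin
    status, message = handle_put(os.environ, sys.stdin.buffer)
    sys.stdout.write(f"Status: {status}\r\nContent-Type: text/plain\r\n\r\n{message}")


if __name__ == "__main__":
    main()
```

Keeping the logic in a function that takes the environment and input stream as arguments makes it possible to exercise the script outside the server before it is ever given write access to real content.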
Some of the main security requirements are: Make sure the PUT script can only be run by authorised users Make sure that the script can update only web content files Make sure the authorised users can only update their pages, not other people's pages on the same server The first issue can be addressed by making sure that the script is protected by username and password authentication. The second issue is more complex. To be able to update the files on your server the script must have enough permission to write or create the content files. This in itself is a security risk, since it means if a bug or security hole is found in any of your other CGI programs anyone on the Internet could potentially change any of your files. On most servers the httpd process runs as some relatively unprivileged user, such as "nobody". This user should not own or have write access to any of the files on the server. So the first problem with generating a secure PUT script is determining how the script can get permissions to update files owned by a different user. One way of doing this, new in Apache 1.2 betas, is to use the "suEXEC" code. This allows a script to be run as a different user. This comes with Apache but is not installed by default, because of the security risks it can create if used inappropriately. You need to install it, and arrange it so that the PUT script is executed as the user that owns your web files. In this case, it would be sensible to ensure that this user does not have write access to any other parts of the file system, such as your Apache configuration files or .htaccess files. The final security issue applies if you have multiple content providers (such as different customers) where you cannot trust them not to try to update each other's pages. There are several ways to fix this: If the customers are in different virtual hosts, use the suEXEC mechanism to give each customer a different Unix username and execute the script as that user.
Use a different PUT script for each customer, with individual access authentication for each user, and hard-code the paths that they are allowed to update into the script. Add lots of careful checks into the PUT script to ensure that each REMOTE_USER can only update pages in their area Netscape Navigator Gold, AOLPress and Amaya can publish pages with the PUT method. Assuming you have a PUT script which provides a level of security you are happy with, this section explains how to use these programs to publish pages. Other Web publishing programs should be similar. To publish pages, you need to configure your server as given above. This section shows how to do this in more detail with better user security. First, decide which areas of your document tree you want to allow people to publish to. For this example, we will assume people can publish to any page on the server. You need to add a Script PUT directive into the <Directory> section for the directory where you want to enable PUT uploading, and put the PUT script into a user-authenticated directory. For example
<Directory /usr/local/etc/httpd/htdocs>
Script PUT /cgi-bin-putusers/put.cgi
</Directory>
<Directory /usr/local/etc/httpd/cgi-bin-putusers>
AuthType Basic
AuthName "Authorised PUT Publishers"
AuthUserFile /usr/local/etc/httpd/htpasswd-putusers
Require valid-user
</Directory>
ScriptAlias /cgi-bin-putusers /usr/local/etc/httpd/cgi-bin-putusers
You will have to modify this for your setup. You also need to enter a username and password into the htpasswd-putusers file using htpasswd. Note that there are many other ways to configure user authentication for a PUT script, including using a <Files> section to apply a restriction to just the PUT script, or using <Limit PUT> to limit just the PUT method scripts within your existing cgi-bin directory. With this configuration, all PUT requests will be handled by the named script (/usr/local/etc/httpd/cgi-bin-putusers/put.cgi).
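One of the options listed earlier was to add checks to the PUT script so that each REMOTE_USER can only update pages in their own area. A minimal Python sketch of such a check is shown here. Everything in it is an assumption made for illustration: the function name may_update, the USER_AREAS table and the directory names in it are hypothetical, and a real script would need further checks of its own.

```python
import os

# Hypothetical table mapping each authenticated user to the one
# directory tree they may publish into
USER_AREAS = {
    "alice": "/usr/local/etc/httpd/htdocs/alice",
    "bob": "/usr/local/etc/httpd/htdocs/bob",
}


def may_update(remote_user, target, user_areas=USER_AREAS):
    """Return True only if target lies inside remote_user's own area."""
    area = user_areas.get(remote_user)
    if area is None:
        return False  # unknown or unauthenticated user

    # Resolve "..", symbolic links and so on before comparing paths
    area = os.path.realpath(area)
    target = os.path.realpath(target)

    # Never allow access control files to be replaced
    if os.path.basename(target) == ".htaccess":
        return False

    try:
        inside = os.path.commonpath([area, target]) == area
    except ValueError:
        return False  # paths that cannot be compared are rejected
    return inside and target != area
```

A PUT script could call may_update with the REMOTE_USER and PATH_TRANSLATED values from its environment and refuse the request when it returns False. Comparing whole path components, rather than testing whether one string starts with another, avoids treating a directory such as alice-old as part of alice's area.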
Now all you need to do is author a page then select the publish function. In AOLPress and Amaya, you do File, Save (or File, Save As) and type the full URL of the location to publish the file to (e.g. http://www.my_server.com/first.html). In Navigator Gold, select File, Publish. In the "Upload Files to this location" box, enter the full URL of the page to create. For example, if your server is called www.my_server.com and you want to upload to a file called "first.html" in the document root, you would enter http://www.my_server.com/first.html. Also enter the username and password you created in the htpasswd-putusers file. With AOLPress, select File|Save As, then type the full URL of the page to upload into the "Location" box. There are few scripts available which implement PUT handling securely. For this reason the general recommendation for using publishing functions is to use FTP rather than HTTP where possible. However if you want to implement PUT-based publishing, you might like to start with one of these programs: A PUT program in C designed for the CERN server mod_put Apache module for PUT and DELETE The issues raised in the above section on security apply to these programs as well, so before you use them review the source code, install them in a user-authenticated area, and make sure that when run from the httpd server they only have write permission to the content files you want to be able to update.
Using Server Side Includes Server Side Includes make adding dynamic content to your documents easy. We show how to use SSI on your site, and the extensions that Apache supports.
Using Server Side Includes While standard HTML files are fine for storing pages, it is very useful to be able to create some content dynamically. For example, to add a footer or header to all files, or to insert document information such as last modified times automatically. This can be done with CGI, but that can be complex and requires programming or scripting skills.
For simple dynamic documents there is an alternative: server-side-includes (SSI). SSI lets you embed a number of special 'commands' into the HTML itself. When the server reads an SSI document, it looks for these commands and performs the necessary action. For example, there is an SSI command which inserts the document's last modification time. When the server reads a file with this command in, it replaces the command with the appropriate time. Apache includes a set of SSI commands based on those found in the NCSA server plus various extensions. This is implemented by the includes module (mod_include). By default, the server does not bother looking in HTML files for the SSI commands. This would slow down every access to an HTML file. To use SSI you need to tell Apache which documents contain the SSI commands. One way to do this is to use a special file extension. .shtml is often used, and this can be configured with these directives:
AddHandler server-parsed .shtml
AddType text/html shtml
The AddHandler directive tells Apache to treat every .shtml file as one that can include SSI commands. The AddType directive makes sure that the resulting content is marked as HTML so that the browser displays it properly. An alternative method of telling the server which files include SSI commands is to use the so-called XBitHack. This involves setting the execute bit on HTML files. Any file with a content type of text/html (i.e. an extension .html) and with the execute bit set will be checked for SSI commands. This needs to be turned on with the XBitHack directive. For either method, the server also needs to be configured to allow SSIs. This is done with the Options Includes directive, which can be placed in either the global access.conf or a local .htaccess (although the latter must first be enabled with AllowOverride Options).
Since some SSI commands let the user execute programs which could be a security risk, an alternative option, IncludesNOEXEC, lets SSI commands work except for any which would execute a program. All SSI commands are stored within the HTML in HTML comments. A typical SSI command looks like this:
<!--#flastmod file="this.html" -->
In this case the command is flastmod, which means output the last modified time of the file given. The arguments specify the file "this.html" (which might be the name of the file containing this command). The whole of the command text, including the comment markers <!-- and -->, will be replaced with the result of this command. In general, all commands take the format:
<!--#command arg1="value1" arg2="value2" ... -->
where arg1, arg2, etc are the names of the arguments and value1, value2 etc are the values of those arguments. In the flastmod example, the argument is 'file' and its value is 'this.html'. Often commands can take different argument names. For example, flastmod can be given a URL with the argument virtual, to get the last modified time from the server. For example:
<!--#flastmod virtual="/" -->
to get the last modification time of the home page on the server (this is useful if the page being accessed might have a different file name, for instance). Besides flastmod, there are SSI commands which get the size of a file or URL, the contents of a variable (passed in by the server), the contents of another file or URL, or the result of running a local file. These are documented in the NCSA tutorial on server side includes. When SSI commands are executed, a number of 'environment variables' are set. These include the CGI variables (REMOTE_HOST etc), and some more, such as DOCUMENT_NAME and LAST_MODIFIED. These can be output with the echo command (so a better way of getting the last modification time of the current file would be <!--#echo var="LAST_MODIFIED" -->). Apache extends the standard (NCSA-compatible) SSI language considerably. Some of the extensions include: Variables in commands: Apache allows variables to be used in any SSI commands.
For example, the last modification time of the current document could be obtained with <!--#flastmod file="$DOCUMENT_NAME" --> Setting variables: the set command can be used within the SSI to set variables. Conditionals: SSI commands if, else, elif and endif can be used to include parts of the file based on conditional tests. For example, the $HTTP_USER_AGENT variable could be tested to see the type of browser and different HTML codes output depending on the browser capabilities. Here are some examples of using SSI: Displaying document information The following code puts the document modification time on the page:
Last modified: <!--#echo var="LAST_MODIFIED" -->
Adding a footer to many documents Add an include command that names the shared footer file to the bottom of each of the documents. Hide links from external users Use the if command and the REMOTE_ADDR CGI variable to see if the user is in the local domain:
<!--#if expr="$REMOTE_ADDR = /^1.2.3./" -->
<a href="internal-documents.html">Internal Documents</a>
<!--#endif -->
(Where 1.2.3 is the IP address prefix of the local domain).
Apache and Secure Transactions All about Apache and SSL, including US export restrictions, RSA licensing, ciphers, key escrow, certificates and authorities.
Feature: Apache and Secure Transactions We explain what SSL is, why Apache does not have it built in, and why it is such a complex issue. We examine the restrictive US government rules and commercial interests that together restrict what can be imported and exported from the US and Canada. First published in Apache Week issue 24 (19th July 1996). Last updated 1st September 1998. Most of the information passed across the Internet is not particularly sensitive. In fact, most of it is specifically designed to be as widely read as possible. But some information is sensitive. For example, when ordering from a site via credit card, the credit card number is transmitted across the Internet from the browser to the server. In theory, a third party could intercept this information at some point on the network between the browser and the server.
To prevent this, some form of encryption can be used so that even if someone intercepts the data they cannot decode it back to the original credit card number (or whatever else it was that was encrypted). Obviously both the browser and the server need to use the same encryption method. The most widely implemented encryption system for the Web at present is SSL. SSL stands for Secure Sockets Layer, a protocol developed by Netscape for secure transactions across the Web. It uses a form of public key encryption, where the information can be encoded by the browser using a publicly available public key, but can only be decoded by someone who knows the corresponding private key. Any product can incorporate SSL technology without paying any royalties. Extending Apache to handle SSL is a programming job, made relatively easy by the availability of a free SSL implementation, called SSLeay. However, the US government effectively prevents Apache from doing this.

Although it is the SSL standard that defines how the encryption is applied to Web transactions, the actual encryption itself is performed by a number of cipher algorithms. When an SSL browser and SSL server first communicate they mutually pick a cipher algorithm that both support. Some commonly used ciphers are listed here, giving the cipher, its key size in bits, and a description:

3DES (168 bits): These are well-proven, 168-bit, triple-encryption ciphers. Supported by products based on SSLeay such as Stronghold and SafePassage but not by products from Microsoft or Netscape.

IDEA (128 bits): This cipher uses 128-bit keys but it is not commonly found in web browsers or servers. It is possible, but very slow, to use triple-IDEA with 384 bit keys. In the USA and Europe a license from Ascom AG is required to use these ciphers.

RC4 and RC2 (128 bits): These ciphers use 128-bit keys, which normally offer a high degree of security. Inside the USA a license from RSA is required to use these ciphers.

Export RC4 and RC2 (40 bits): These ciphers use 40-bit keys but are otherwise identical to their equivalent 128-bit versions. Servers and browsers produced by Netscape and Microsoft support these ciphers. Inside the USA a license from RSA is required to use these ciphers.

An interactive tool from Netcraft is available that can query any secure Web site and show which ciphers it supports. Experts agree that 40 bit encryption does not provide an adequate level of safety and there have been several publicised hacks (See C|Net story). A panel of cryptographic experts including Whitfield Diffie, a co-inventor of public key cryptography, issued a report in January 1996 that said a minimum of 75 bits was necessary for "adequate protection against the most serious threats" and 90 bits was necessary to thwart advances in hacking techniques for the next 20 years.

The US Government imposes export restrictions on arms, in a set of rules called ITAR (International Traffic in Arms Regulations). Amongst the restricted arms is "strong" encryption software. (See the EFF archive on ITAR). Software that implements SSL in the US cannot be exported because of these rules (actually, it can be exported to Canada, but no further). SSL enabled software can be exported outside of the US if the software can only encrypt using a maximum of a 40 bit key. Commercial server vendors in the US such as Netscape and Microsoft export secure servers using this weakened 40 bit encryption. Recent legislation allows for registered companies to export software that uses 56 bit keys, but only if they allow the US government to access the data under certain circumstances. This is normally done by allowing a third-party to store or recover the keys - a system referred to as "key escrow". Higher levels of encryption can also be exported to approved financial institutions (primarily banks). The US and other governments are worried that they cannot access information once it has been encrypted.
They would like to be able to decrypt all encrypted data. For some time, the US government has only supported encryption schemes which would allow them to decrypt the encrypted data if necessary, such as the "Clipper" chip. In normal (secure) encryption, the only people that can decrypt the data are the sender and recipient, who between them have the necessary keys. But in key escrow schemes a third-party will also have the ability to decrypt the data (this third-party may be the developer of the encryption product, the US government, or some other "trusted" organisation). Key escrow is also referred to as key storage or key recovery. From January 1997 the US government has been allowing the export of encryption technology up to 56 bits, but only if the exporter agrees to key escrow. This would allow the US government to decrypt any data encrypted with these exported 56 bit systems. Companies which wish to export 56 bit encryption products need to be specially licensed by the US government. Apache is developed by an international team of individuals, using a server in the US. The ITAR rules mean that if the Apache server included SSL it could not be exported outside the US. This would prevent the non-US developers from continuing to work on it, and would stop anyone outside the US from using Apache. A solution to this problem adopted by some free software developers is to run a parallel development effort outside the US. The US development would not contain any SSL or encryption technology, while the non-US version would. The main problem with this arrangement is ensuring the parallel development of the two versions, and it would also require a non-US site to host the development. The problems with the export restrictions of ITAR are not limited to Apache or other free software. Many US corporations are concerned that their competitors in other countries are able to make and sell encryption-enhanced products which they are forbidden to export. (See C|Net report). 
In the meantime, while Apache remains an international software development based on a server in the US, it cannot incorporate SSL. There are patches to link Apache with SSL (using SSLeay), such as mod_ssl and Apache-SSL. These are legally usable for free anywhere in the world, except for the US. The problem with using these patches in the US is not the export regulations (which only apply to export, not import), but rather the sometimes confusing issues of encryption patents and certificate authorities. Commercial servers such as Netscape base their SSL implementations on ciphers that are developed and patented by RSA Data Security in the US. Use of this technology normally requires a license fee inside the US. If Apache-SSL or mod_ssl is imported into the US, then any user would have to arrange to pay the appropriate license for the patented encryption methods which are part of SSLeay (although non-commercial users can use a license-free implementation of RSA, called RSAref). It may be difficult for an individual to license RSA. The alternative to paying the RSA license individually is to buy a commercial version of Apache with SSL for which RSA has already been licensed by the developer. Examples of such products are the Apache module Raven and the web server Stronghold. Stronghold is developed outside the US so it can also be used with full 128-bit encryption outside the US and Canada. Raven is not available outside the US and Canada with 128-bit security. Outside the US, no license fee is required for the use of the RSA methods because they are only patented inside the US and SSLeay uses an independent implementation of the cipher algorithms. This means that outside the US Apache-SSL and mod_ssl can be used for free. Having got a server, the final thing required before it can be used for secure transactions is a certificate.
A server certificate is a piece of digitally-encrypted information that lets the browser know what organisation it is accessing. To prevent people just making up certificates and pretending to be official organisations, certificates can be obtained from a certificate authority, who use their position as a third-party to verify that the organisation using the certificate is who they say they are. Probably the best known authority is Verisign in the US. In fact, early versions of Netscape Navigator (version 1) would only accept certificates from Verisign. Other certificate authorities can be used but unless they are recognised by the browser manufacturers their certificates will either be rejected when a user tries to connect or the user will be given a long sequence of warning screens. An example of this is Thawte, whose certificates are accepted by Navigator version 3 and Internet Explorer version 3.01 but not previous versions of either browser. If the server operator wants their certificates to be accepted transparently by all versions of Netscape and Internet Explorer they will have to get certificates signed by Verisign. To get a certificate from Verisign the server in use must be approved. Most commercial secure servers will have been submitted for approval by their developer, and certificates are available for Stronghold. Verisign will also issue certificates for web servers using the free SSLeay libraries, such as Apache-SSL. To get a secure server based on Apache, first decide on your certificate authority. If you want every browser to connect seamlessly you'll need a certificate from Verisign. If you don't mind that older browsers will have to go through the Netscape security wizard or be unable to connect you could use Thawte. If you are in an Intranet environment you can distribute browsers with your certificate authority already configured so you may wish to issue your own certificate.
Then:

Inside the US and Canada, either:
- Buy a Verisign-accredited, RSA licensed server (such as Stronghold) or add Raven to Apache, and buy a certificate, or
- Download Apache and Apache-SSL or mod_ssl patches, compile, pay RSA license for RSA-patented technology, and buy a certificate or sign own certificate (however RSA may not license RSA to individuals)

Outside the US and Canada, either:
- Buy a Verisign-accredited server from a non-US vendor (e.g. Stronghold) and buy a certificate, or
- Download Apache and Apache-SSL or mod_ssl patches, compile, and buy a certificate or sign own certificate

What the Web Server Surveys Reveal

ApacheWeek has often reported on the success of the Apache Web Server as shown by the E-Soft Web Server and Netcraft surveys, and how they have consistently shown Apache to be the most popular and more widely deployed server than all the others combined. In this 200th issue of ApacheWeek, we look behind the headline figures of those surveys with an in-depth analysis of which Apache versions are being used and how long it takes the Apache community to adopt new releases. Although both surveys show the total number of sites using Apache, the E-Soft survey figures also reveal some interesting facts about which versions of Apache are in use, and that take up of newer releases is not immediate. Plotting the number of sites using 1.3.x versions month for month from release date indicates migration from older versions is slow. As a percentage of Apache powered sites, in the case of almost all versions, their use continues to remain constant for a few months even after a new release.
Take into account the number of sites using Apache is increasing every month; and the actual number of sites using older releases continues to rise for anything up to three months after a new release becomes available. Graph 1: Individual release take up It wasn't until April this year, with Apache 1.3.9 released 9 months earlier, that the use of a single 1.3 version exceeded that of older 1.1 and 1.2 versions. Even today, only 6% of sites are using the most recent release, 1.3.12, and over 25% of sites are still powered by older Apache versions from the 1.0, 1.1, and 1.2 generations. Graph 2: Apache releases in use, May 2000 One of the most interesting findings from the survey is to see how new releases may influence the take-up of Apache as a server. Looking at the monthly increase in the number of sites powered by the server, some of the largest rises follow particular release dates. The month following the release of Apache 1.3.3 (released on October 9 1998) saw one of the highest monthly increases in use. Apache 1.3.3 was a minor upgrade to Apache 1.3.2, but fixed one quite important problem; various error responses, such as "404 Not Found" displayed the full path to the missing file. Other problem fixes included the spelling module - which in 1.3.2 did not return the list of possible matches when more than one file is similar to the requested URL - and a problem where missing .htaccess files could result in a "Forbidden" response. Some platform specific bug fixes - including the Windows zombie processes problem - were also fixed. Graph 3: Monthly increase in sites powered by Apache Apache 1.3.12, the most current version, has also seen a huge increase in use in the month following its release. This addressed security issues raised by a CERT advisory on cross-site scripting which wasn't specific to Apache and had wide reaching consequences for anyone who uses or writes scripts for web servers. 
Patches were quickly made available for the previous version (1.3.11) followed shortly afterwards with the release of 1.3.12 at the end of February. Once again, it was shown that the contributors to open source projects can respond as efficiently as commercial developers to major security issues. The surveys can't tell us whether the increases are attributable to upgraders or new adopters, and it is purely speculative as to whether the rapid provision of a security fix to a problem contributed to the migration from other servers to Apache. However, the E-Soft Survey shows there was an increase of 76,000 sites using Apache in March 2000, and 36,000 sites using 1.3.12. What cannot be disputed is the phenomenal success of the Apache web server, now with a share of the server software market that commercial vendors only dream of. Whichever version is in use, it's all part of the ever-growing Apache community which Apache Week will continue to support. First published in Apache Week issue 35 (4th October 1996).

Hints and Tips

Apache Week regularly contains information about how to get the most out of the Apache server. To save you having to wade through all the past issues, here is a summary of the hints and tips we've carried, plus a few more for good measure. If you are planning on upgrading to 1.3, read our Guide to Apache 1.3. See our feature on Content Negotiation. As it implies, the <Directory> directive only applies to directories.

Restricting access to particular files

The <Location> directive can be used to restrict access based on the request URL. So it can be applied to individual files.
For example, to prevent access to the file /prices/internal.html by anyone outside 'domain.com', you could use

    <Location /prices/internal.html>
    order deny,allow
    deny from all
    allow from .domain.com
    </Location>

The NCSA tutorial on .htaccess files shows an example .htaccess file like this:

    AuthUserFile /dev/null
    AuthGroupFile /dev/null
    AuthName EnterPassword
    AuthType Basic
    <Limit GET>
    order deny,allow
    deny from all
    allow from .my.domain
    </Limit>

This is designed to restrict access based on browser address, and not require any user authentication. The problem is that Apache will ask for user authentication, which fails because none has been set up. Apache does this because of the Auth* directives, which are unnecessary. The fix is to remove the Auth* lines.

There are a number of things which can be done to tune the performance of the server. One quick and effective thing to try is to reduce the number of .htaccess files it tries to access on every request. Whenever Apache handles a request, it processes .htaccess files which determine access authorisation, and can set other options (e.g. AddType). It checks and processes .htaccess files in the same directory as the file it is serving, and also in all the parent directories. For instance, if you request the URL /docs/about.html and your document root is /usr/local/etc/httpd/htdocs, Apache tries to process .htaccess files in all these directories:

    /
    /usr
    /usr/local
    /usr/local/etc
    /usr/local/etc/httpd
    /usr/local/etc/httpd/htdocs
    /usr/local/etc/httpd/htdocs/docs

Normally, there will be no .htaccess files above the document root, but Apache still needs to check the filesystem to make sure. This can be eliminated by using the trick that if the AllowOverride option is set to None, Apache doesn't bother checking for .htaccess files. So set AllowOverride to None for directory /, and turn AllowOverride back on for whatever settings are really needed for the directory /usr/local/etc/httpd/htdocs.
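As a rough illustration of the lookup described above, here is a minimal Python sketch that lists the directories in which a server following this rule would look for .htaccess files. The function name htaccess_dirs is hypothetical and is not part of Apache; the sketch only models the directory walk, not Apache's actual implementation or the effect of AllowOverride.

```python
import posixpath


def htaccess_dirs(document_root, url_path):
    """Return the directories searched for .htaccess files, outermost first.

    Hypothetical helper for illustration only: it maps the URL onto the
    filesystem and then walks from the file's directory up to "/".
    """
    # Map the request URL onto a file below the document root.
    file_path = posixpath.join(document_root, url_path.lstrip("/"))
    directory = posixpath.dirname(posixpath.normpath(file_path))

    dirs = []
    while True:
        dirs.append(directory)
        parent = posixpath.dirname(directory)
        if parent == directory:  # reached the filesystem root
            break
        directory = parent
    return list(reversed(dirs))


if __name__ == "__main__":
    for d in htaccess_dirs("/usr/local/etc/httpd/htdocs", "/docs/about.html"):
        print(d)
```

Every directory in the returned list costs a filesystem check on each request, which is why switching the check off above the document root can help.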
For example, the following code in access.conf would speed up Apache:

    <Directory />
    AllowOverride None
    </Directory>
    <Directory /usr/local/etc/httpd/htdocs>
    AllowOverride All
    </Directory>

The second directory section turns AllowOverride back on, so that .htaccess files are processed again. The 'All' can be replaced with whatever level of configurability is wanted. If you have web documents in different directories besides the document root, you will need to turn on .htaccess files in them as well (if desired). For instance, if you are using UserDir to allow access to files in home directories, you will need to set a suitable AllowOverride (and possibly other restrictions) with something like:

    <Directory /home/*/public_html>
    AllowOverride FileInfo Indexes
    Options IncludesNOEXEC
    </Directory>

Sending the parent Apache process a USR1 signal will make it close the current log files, and re-open them, without losing any connections currently in progress. This should be used instead of a HUP signal in any log rotation script. The script should first move the current log files to new names (the logs are still open at this stage). Then it should send a USR1 signal to the parent Apache process. The parent will tell the child processes to exit when they have finished processing their current requests, and will open the log files for newly created children (since the old files have been renamed, the opened files will be newly created). As the old children finish their current requests they will close their handles to the (old) log files, and exit. When all the children are dead you can safely process the old log files (for example, by compressing them). Since you cannot know for definite when the old children have all died, the best way to do this is to make your log rotation script sleep for a while after sending the USR1 signal. An alternative way to implement log rotation is to get Apache to send log messages to a program of your choice via a pipe.
This program can then decide how and when to rotate the log files. A program which may be useful for doing this is available as cronolog (not part of Apache).

Apache comes with and uses three different config files (the srm.conf, access.conf and httpd.conf files). However it treats them all identically. So all the configuration could take place in a single file - httpd.conf (which is the first one read). This file should include the directives

    AccessConfig /dev/null
    ResourceConfig /dev/null

to prevent it complaining about the missing srm.conf and access.conf files.

CGI programs always run as the same user that owns the Apache server process. This is set with the User directive in the config file, and is typically a normal user such as 'www', or the 'nobody' user. In most cases, this is fine, since CGI scripts should run with few privileges to limit any potential malicious damage to the system. However, in some cases it would be nice to be able to run CGI programs as other users. For example: On a virtual host system, with multiple customers, CGIs could run as the customer's user, to let them read and write to the customer's files. On other systems with multiple users, CGIs in home directories could run as that user. The ability to run CGI programs as other users is referred to as 'running setuid', after the Unix filesystem ability to run a program as another user. The biggest problem with having a setuid CGI facility on a web server is security. The server has to be very careful to ensure that the program running setuid cannot be invoked to do malicious damage to the system. Having setuid programs on a system can be dangerous, particularly if you do not trust all the other users on the system (which would be the case with both the examples above). The risk is that other users could run the setuid program manually (from the command line) and give it an environment or command arguments that make it perform undesired activities.
The risks of setuid programs are well-known to Unix system administrators, but a lot of web administrators do not have so much experience of Unix or setuid security programming. The suEXEC program included with Apache provides one method of running CGI programs as other users.

There are some files that should probably never be served up to the user: files called .htaccess, .htpasswd, *.pl, *~ and so on. This can be done by preventing access to these files using a <Files> section. For example

    <Files .htaccess>
    order allow,deny
    deny from all
    </Files>

An easy way to add new capabilities to the server without too much programming is to use some sort of "parsed HTML". This is a souped-up version of server side includes which lets you use variables, conditionals, loops and so on. Like SSI, these scripts get parsed on the server so they work with all browsers. There are several implementations of HTML scripting now available: SSI (part of Apache); NeoScript (linked to Apache by a module); PHP.

New directives have been added to force all the files in a particular directory to be processed by a given handler, or to be returned with a particular type. To set a handler, use SetHandler, and to set a MIME type use ForceType. Note that these directives force the given type or handler to be applied to all files in the section, irrespective of the usual extension mapping rules. For example, a download directory could use "ForceType application/octet-stream" in a .htaccess file to make the browser save the files, rather than try and display them. Or, all files in a directory could be treated as CGI programs with

    <Directory /usr/local/etc/httpd/cgi-bin>
    SetHandler cgi-script
    </Directory>

Quick Questions

How can I get my server to listen to more than one IP address, or more than one port? Use the "Listen" directive.

How can I create extra virtual hosts without using extra IP addresses? Use the "name-based virtual hosts" as specified in the HTTP/1.1 spec.
While this does not work with all browsers at present, the directives ServerPath and ServerAlias can be used to make your site work gracefully for older browsers as well.

Can I let users access protected areas 'anonymously' (like anonymous ftp)? Yes, use the anonymous authentication module, mod_auth_anon.

How can I convert the requested URL into some other format? The rewrite module provides a powerful means for translating URLs into other URLs or filenames.

How do I set up Apache to handle PUT (or DELETE) requests for my authoring program? Use "Script PUT cgi-script" to call a CGI program to implement the PUT request.

Can I implement NCSA's 'Satisfy' function? Yes, this is now in 1.2.

Why does Apache 'lock-up' when accessed from Netscape 2? This might be due to a bug in Netscape when using Keep-Alives. The work-around is to turn off keep-alives on the server. In Apache 1.2, use the following directive:

    BrowserMatch Mozilla/2 nokeepalive

Using User Authentication

Restrict your documents to people with a valid username and password.

There are two ways of restricting access to documents: either by the hostname of the browser being used, or by asking for a username and password. The former can be used to, for example, restrict documents to use within a company. However if the people who are allowed to access the documents are widely dispersed, or the server administrator needs to be able to control access on an individual basis, it is possible to require a username and password before being allowed access to a document. This is called user authentication. Setting up user authentication takes two steps: firstly, you create a file containing the usernames and passwords. Secondly, you tell the server what resources are to be protected and which users are allowed (after entering a valid password) to access them. A list of users and passwords needs to be created in a file. For security reasons, this file should not be under the document root.
The examples here will assume you want to use a file called users in your server root at /usr/local/etc/httpd. The file will consist of a list of usernames and a password for each. The format is similar to the standard Unix password file, with the username and password being separated by a colon. However you cannot just type in the usernames and passwords because the passwords are stored in an encrypted format. The program htpasswd is used to create a user file and to add or modify users. htpasswd is a C program that is supplied in the support directory of the Apache distribution. If it is not already compiled, you will need to compile it first. Run make htpasswd in the support directory to compile it (you might need to modify the Makefile first, since any configuration you did when compiling the server itself is not available to this makefile). After compilation, you can either leave the htpasswd binary where it is, or move it to a directory on your path (e.g. /usr/local/bin). In the former case, you will need to remember to give the full pathname to run it. The examples here will assume that it is installed somewhere on your path. To create a new user file and add the username "martin" with the password "hampster" to the file /usr/local/etc/httpd/users:

    htpasswd -c /usr/local/etc/httpd/users martin

The -c argument tells htpasswd to create a new users file. When you run this command, you will be prompted to enter a password for martin, and confirm it by entering it again. Other users can be added to the existing file in the same way, except that the -c argument is not needed. The same command can also be used to modify the password of an existing user. After adding a few users, the /usr/local/etc/httpd/users file might look like this:

    martin:WrU808BHQai36
    jane:iABCQFQs40E8M
    art:FAdHN3W753sSU

The first field is the username, and the second field is the encrypted password. To get the server to use the usernames and passwords in this file, you need to configure a realm.
This is a section of your site that is to be restricted to some or all of the users listed in this file. This is typically done on a per-directory basis, with a directory (and all its subdirectories) being protected (Apache 1.2 and later also let you protect individual files). The directives to create the protected area can be placed in a .htaccess file in the directory concerned, or in a <Directory> section in the access.conf file. To allow a directory to be restricted within a .htaccess file, you first need to ensure that the access.conf file allows user authentication to be set up in a .htaccess file. This is controlled by the AuthConfig override. The access.conf file should include AllowOverride AuthConfig to allow the authentication directives to be used in a .htaccess file. To restrict a directory to any user listed in the users file just created, you should create a .htaccess file containing:

    AuthName "restricted stuff"
    AuthType Basic
    AuthUserFile /usr/local/etc/httpd/users
    require valid-user

The first directive, AuthName, specifies a realm name for this protection. Once a user has entered a valid username and password, any other resources within the same realm name can be accessed with the same username and password. This can be used to create two areas which share the same username and password. The AuthType directive tells the server what protocol is to be used for authentication. At the moment, Basic is the only method available. However a new method, Digest, is about to be standardised, and once browsers start to implement it, digest authentication will provide more security than the basic authentication. AuthUserFile tells the server the location of the user file created by htpasswd. A similar directive, AuthGroupFile, can be used to tell the server the location of a groups file (see below). These four directives between them tell the server where to find the usernames and passwords and what authentication protocol to use.
The server now knows that this resource is restricted to valid users. The final stage is to tell the server which usernames from the file are valid for particular access methods. This is done with the require directive. In this example, the argument valid-user tells the server that any username in the users file can be used. But it could be configured to allow only certain users in:

    require user martin jane

would only allow users martin and jane access (after they entered a correct password). If user art (or any other user) tried to access this directory - even with the correct password - they would be denied. This is useful to restrict different areas of your server to different people with the same users file. If a user is allowed to access the different areas, they only have to remember a single password. Note that if the realm name differs in the different areas, the user will have to re-enter their password.

If you want to allow only selected users from the users file in to a particular area, you can list all the allowed usernames on the require line. However this means you are building username information into your .htaccess files, and might not be convenient if there are a lot of users. Fortunately there is a way round this, using a group file. This operates in a similar way to standard Unix groups: any particular user can be a member of any number of groups. You can then use the require line to restrict users to one or more particular groups. For example, you could create a group called staff containing users who are allowed to access internal pages. To restrict access to just users in the staff group, you would use

    require group staff

Multiple groups can be listed, and require user can also be given, in which case any user in any of the listed groups, or any user listed explicitly, can access the resource.
For example

    require group staff admin
    require user adminuser

which would allow any user in group staff or group admin, or the user adminuser, to access this resource after entering a valid password. A group file consists of lines giving a group name followed by a space-separated list of users in that group. For example:

    staff:martin jane
    admin:art adminuser

The AuthGroupFile directive is used to tell the server the location of the group file. Note that the maximum line length within the group file is about 8000 characters (actually 8kB). If you have more users in a group than will fit within that line length, you can have more than one line with the same group name within the file.

Using htpasswd to create a text list of users, and maintaining a list of groups in a plain text file is relatively easy. However if the number of users becomes large, the server has a lot of processing to do to find a user's group and password details. This processing has to be done for every request inside the protected area (even though the user only enters their password once, the server has to re-authenticate them on every request). This can be slow with a lot of users, and adds to the server load. Much faster access is possible using DBM format files. This allows the server to do a very quick lookup of names, without having to read through a large text file. However managing DBM files is more complex. Apache Week will cover the use of DBM authentication in a future issue. While Apache by default can only access user details in plain text files, various add-on modules are available to allow user details to be stored in databases. Besides DBM format (available with the mod_auth_dbm module), user and group lists can be stored in DB format files (with mod_auth_db). Or full databases can be used, such as mSQL (with mod_auth_msql), Postgres95 (mod_auth_pg95) or any DBI-compatible database (mod_auth_dbi).
It is also possible to have an arbitrary external program check whether the given username and password are valid (this could be used to write an interface to check against any other database or authentication service). Modules are also available to check against the system password file, or to use a Kerberos system. See the feature on Adding Modules for more information.

In the example .htaccess file above, the require directive is not given inside a <Limit> section. This is valid in Apache, and means it applies to all request methods. In other servers and most example .htaccess files, the require directive is given inside a <Limit> section, such as this:

    <Limit GET POST PUT>
    require valid-user
    </Limit>

In Apache it is better to omit the <Limit> and </Limit> lines, to ensure that the protection applies to all methods. However, this format can be used to limit particular methods. For example, to limit just the POST method, use

    AuthName "restrict posting"
    AuthType Basic
    AuthUserFile /usr/local/etc/httpd/users
    <Limit POST>
    require group staff
    </Limit>

Now only members of the group staff will be allowed to POST. Other users (unauthenticated) can use other methods, such as GET. This could be used to allow a CGI program to be accessed by anyone, while only authorised users can POST information to it.

It is possible to use both username and hostname restrictions at the same time. Normally Apache will require that both restrictions are satisfied, that is, that the user comes from an allowed host or domain name and that they supply a valid username and password. However the Satisfy any directive can be used in the .htaccess file or in a <Directory>, <Location> or <Files> section. When this directive is given, anyone coming from the allowed domains will be given access without having to enter a username and password. All other users (from the "denied" domains) will be prompted for a username and password.

The method used in HTTP for user authentication is quite simple.
Since HTTP is a stateless protocol - that is, the server does not remember any information about a request once it has finished - the browser needs to resend the username and password on each request. Here is how it works.

On the first access to an authenticated resource, the server will return a 401 status ("Unauthorized") and include a WWW-Authenticate response header. This will contain the authentication scheme to use (at the moment, only Basic is allowed) and the realm name. The browser should then ask the user to enter a username and password. It then requests the same resource again, this time including an Authorization header which contains the scheme name ("Basic") and the username and password entered.

The server checks the username and password, and if they are valid, returns the page. If the password is not valid for that user, or the user is not allowed access because they are not listed on a require user line or in a suitable group, the server returns a 401 status as before. The browser can then ask the user to retry their username and password.

Assuming the username and password were valid, the user might next request another resource which is protected. In this case, the server would respond with a 401 status, and the browser could send the request again with the user and password details. However this would be slow, so instead the browser sends the Authorization header on subsequent requests. Note that the browser must ensure that it only sends the username and password with further requests to the same server (it would be insecure to send those details if the user moved onto a different server).

The browser needs to remember the username and password entered, so it can send them with future requests to the same server. Note that this can cause problems when testing authentication, since the browser remembers the first username and password that works. It can be difficult to force the browser to ask for a new username and password.
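The exchange described above can be sketched in a few lines of Python. In the Basic scheme the browser joins the username and password with a colon and base64-encodes the result; this is an encoding, not encryption. The function names and the realm used below are assumptions for illustration only.

```python
import base64


def challenge(realm):
    """Header line a server sends with a 401 status to ask for credentials."""
    return f'WWW-Authenticate: Basic realm="{realm}"'


def build_basic_header(username, password):
    """Value of the Authorization header the browser sends on each request."""
    raw = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


def parse_basic_header(value):
    """Recover (username, password) from an Authorization header value."""
    scheme, _, token = value.partition(" ")
    if scheme.lower() != "basic":
        raise ValueError("not a Basic Authorization header")
    decoded = base64.b64decode(token).decode("utf-8")
    # Only the first colon separates the fields; a password may contain colons.
    username, _, password = decoded.partition(":")
    return username, password


print(challenge("Staff Area"))          # hypothetical realm name
header = build_basic_header("martin", "secret")
print("Authorization:", header)
print(parse_basic_header(header))
```

Because anyone who sees the header can decode it as easily as the server does, the sketch also shows why Basic authentication offers no protection against interception.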
While authentication does allow resources to be restricted to particular users, there are potential security issues. Some of these are:

- Care must be taken to ensure that the resource is restricted against all methods. Use of <Limit GET>, for instance, leaves POST and other request methods unprotected.
- The username and password are stored in a plain text file. While the password is encrypted, it is not completely safe against decryption, so the file should not be accessible to other users on the system. More importantly, it should not be placed under the document root where users from other sites could access it.
- The username and password are only as secure as any username/password system, in that end-users should not tell others their password, or write it down, or make it easily guessable.
- The Basic authentication scheme transmits passwords across the Internet unencrypted, so they could be intercepted. The Digest method, see below, is intended to address this issue.

The Digest Authentication scheme will make the sending of passwords across the Internet more secure. Rather than sending the password itself, the browser sends a one-way digest computed from it, which the server can check against its own record for that user. It works exactly the same as Basic authentication as far as the end-user and server administrator are concerned. The use of Digest authentication will depend on whether browser authors write it into their products. Apache can already do Digest authentication, when compiled with the mod_digest module (supplied with the Apache distribution).

For more information about how user authentication works on the Internet, see the HTTP/1.0 and HTTP/1.1 documents, available from the Apache Week links page. Also available there is a link to the draft Digest Authentication specification. For basic information about setting up user authentication, see the NCSA Tutorial (most of which also applies to Apache).
For modules which allow usernames, groups and passwords to be stored in database format files, or databases themselves, see this Apache Week feature on Adding Modules.

Using Virtual Hosts
How to obtain and set up virtual hosts.

Feature: Using Virtual Hosts

One of the most important facilities in Apache is its ability to run 'Virtual Hosts'. This is now the essential way to run multiple web services - each with different host names and URLs - which appear to be completely separate sites. This is widely used by ISPs, hosting sites and content providers who need to manage multiple sites but do not want to buy a new machine for each one. In this issue we explain how to go about setting up a virtual host on your machine, what you need to do to get the hostname working, and how to configure Apache.

There are two types of virtual hosts: IP-based and non-IP-based. The former is where each virtual host has its own IP address. You will need a new IP address for each virtual host you want to set up, either from your existing allocation or by obtaining more from your service provider. Once you have extra IP addresses, you tell your machine to handle them. On some operating systems, you can give a single ethernet interface multiple addresses (typically with an ifconfig alias command). On other systems you will have to have a different physical interface for each IP address (typically by buying extra ethernet cards).

IP addresses are a resource that costs money and are increasingly difficult to get hold of, so modern browsers can now also use 'non-IP' virtual hosts. This lets you use the same IP address for multiple host names. When the server receives an incoming Web connection it does not know the hostname that was used in the URL. However, the new HTTP/1.1 specification adds a facility where the browser must tell the server the hostname it is using, on the Host: header.
If an older browser connects to a non-IP virtual host, it will not send the Host: header, so the server will have to respond with a list of possible virtual hosts. Apache provides some help for configuring a site for both old and new browsers.

Having selected an IP address, the next stage is to update the DNS so that browsers can convert the hostname into the right address. The DNS is the system that every machine connected to the Internet uses to find the IP address of host names. If your hostname is not in the DNS, no-one will be able to connect to your server (except by the unfriendly IP address).

If the virtual host name you are going to use is under your existing domain, you can just add the record into your own DNS server. If the virtual host name is in someone else's domain, you will need to get them to add it to their DNS server files. In some cases, you will want to use a domain not yet used on the internet, in which case you will have to apply for the domain name from the InterNIC and set up the primary and secondary DNS servers for it, before adding the entry for your virtual host.

In any of these cases, the entry you need to add to the DNS is an address record (an A record) pointing at the appropriate IP address. For example, say you want the domain www.my-dom.com to access your host with IP address 10.1.2.3: you will need to add the following line to the DNS zone file for my-dom.com:

    www A 10.1.2.3

Now users can enter http://www.my-dom.com/ as a URL in their browsers and get to your web server. However it will return the same information as if the machine's original hostname had been used. So the final stage is to tell Apache how to respond differently to the different addresses.

Configuring Apache for virtual hosts is a two stage process. Firstly, it needs to be told which IP addresses (and ports) to listen to for incoming web connections. By default Apache listens to port 80 on all IP addresses of the local machine, and this is often sufficient.
If you have a more complex requirement, such as listening on various port numbers, or only to specific IP addresses, the BindAddress or Listen directives can be used. Secondly, having accepted an incoming web connection, the server needs to be configured to handle the request differently depending on what virtual host it was addressed to. This usually involves configuring Apache to use a different DocumentRoot.

If you are happy for Apache to listen to all local IP addresses on the port specified by the Port directive, you can skip this section. However there are some cases where you will want to use the directives explained here:

- If you have many IP addresses on the machine but only want to run a web server on some of them
- If one or more of your virtual hosts is on a different port
- If you want to run multiple copies of the Apache server serving different virtual hosts

There are two ways of telling Apache what addresses and ports to listen to: either you use the BindAddress directive to specify a single address or port, or you use the Listen directive to specify any number of addresses or ports. For example, if you run your main server on IP address 10.1.2.3 port 80, and a virtual host on IP 10.1.2.4 port 8000, you would use:

    Listen 10.1.2.3:80
    Listen 10.1.2.4:8000

Listen and BindAddress are documented on the Apache site.

Having got Apache to listen to the appropriate IP addresses and ports, the final stage is to configure the server to behave differently for requests on each of the different addresses. This is done using <VirtualHost> sections in the configuration files, normally in httpd.conf. A typical (but minimal) virtual host configuration looks like this:

    <VirtualHost 10.1.2.3>
    DocumentRoot /www/vhost1
    ServerName www.my-dom.com
    </VirtualHost>

This should be placed in the httpd.conf file. You would replace the text '10.1.2.3' with one of your virtual host IP addresses.
If you want to specify a port as well, follow the IP address with a colon and the port number (eg '10.1.2.4:8000'). If omitted, the port defaults to 80.

If no <VirtualHost> sections are given in the configuration files, Apache will treat requests arriving on the different addresses and ports identically. In terms of setting up virtual hosts, we call the default behaviour the 'main server' configuration. Unless overridden by <VirtualHost> sections, the main server behaviour will be inherited by all the virtual hosts. When configuring virtual hosts, you need to decide what changes need to be made in each of the virtual host configurations.

Any directives inside a <VirtualHost> section apply to just that virtual host. The directives either override the configuration given in the main server, or supplement it, depending on the directive. For example, the DocumentRoot directive in a <VirtualHost> section overrides the main server's DocumentRoot, while AddType supplements the main server's mime types.

Now, when a request arrives, Apache uses the IP address and port it arrived on to find a matching virtual host configuration. If no virtual host matches the address and port, it is handled by the main server configuration. If it does match a virtual host address, Apache will use the configuration of that virtual server to handle the request. For the example above, the server configuration used will be the same as the main server, except that the DocumentRoot will be /www/vhost1, and the ServerName will be www.my-dom.com.

Directives commonly set in <VirtualHost> sections are DocumentRoot, ServerName, ErrorLog and TransferLog. Directives that deal with handling requests and resources are valid inside <VirtualHost> sections. However some directives are not valid inside <VirtualHost> sections, including BindAddress, StartServers, Listen, Group and User. You can have as many <VirtualHost> sections as you want.
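The matching and inheritance rules just described can be modelled with a short Python sketch. This is a simplified illustration rather than Apache's implementation: it covers only directives that override the main server (such as DocumentRoot), not ones that supplement it (such as AddType), and the main server values shown are assumptions invented for the example.

```python
# Hypothetical main server settings; only the vhost entry comes from the article.
MAIN_SERVER = {
    "DocumentRoot": "/www/main",
    "ServerName": "main.example.com",
    "ErrorLog": "logs/error_log",
}

# Virtual hosts keyed by the (IP address, port) the request arrived on.
VIRTUAL_HOSTS = {
    ("10.1.2.3", 80): {
        "DocumentRoot": "/www/vhost1",
        "ServerName": "www.my-dom.com",
    },
}


def effective_config(ip, port):
    """Return the settings used for a request that arrived on ip:port."""
    config = dict(MAIN_SERVER)                    # inherit everything from the main server
    overrides = VIRTUAL_HOSTS.get((ip, port), {})  # no match: main server handles it
    config.update(overrides)                      # vhost directives replace inherited ones
    return config


print(effective_config("10.1.2.3", 80))
print(effective_config("10.1.2.9", 80))
```

In this model a request to 10.1.2.3 port 80 picks up the virtual host's DocumentRoot and ServerName but still inherits the main server's ErrorLog, while a request to any other address falls through to the main server unchanged.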
You can choose to leave one or more of your virtual hosts being handled by the main server, or have a <VirtualHost> for every available address and port, and leave the main server with no requests to handle.

Non-IP virtual hosts are configured in a very similar way. The IP address that the requests will arrive on is given in the <VirtualHost> directive, and the host name is put in the ServerName directive. The difference is that there will (usually) be more than one <VirtualHost> section handling the same IP address. In order for Apache to know whether a request arriving on a particular IP address is supposed to be a name-based request, the NameVirtualHost directive is used to tell Apache the IP addresses for name-based requests. A virtual host can handle more than one non-IP hostname by using the ServerAlias directive, in addition to the ServerName.

Apache and the Year 2000
How will Apache and the web in general cope with the year 2000?

The Year 2000 Problem

The year 2000 is predicted to bring chaos to software which is unable to handle dates beyond 1999. The question is what effect the change of century will have on the Internet, Web and Apache in particular. This feature shows what the risks are. First published in Apache Week issue 56 (23 June 1997).

The theory of the year 2000 problem is that many older programs use only two digits for the date, such as "97" or "06". This might be part of the internal storage, input fields, output display, or network communication protocol. If a program does use a two digit date, it might either not accept year 2000 dates such as "02", or it might make incorrect comparisons (thinking that 02 is earlier than 97, because it assumes that 02 is 1902). There are some areas where two digit years are widely used - for example, on credit card expiry dates - and the software which handles these dates will have to be capable of knowing that smaller values for the date are really in the 21st century.
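To make the comparison problem concrete, here is a small Python sketch contrasting the naive "always 19xx" reading of a two-digit year with a windowed reading, where values below a pivot are taken to be in the 21st century. The function names and the pivot of 70 are assumptions chosen for illustration; real software picks whatever window suits its data.

```python
def naive_year(two_digit):
    """Old-style interpretation: every two-digit year is in the 1900s."""
    return 1900 + two_digit


def windowed_year(two_digit, pivot=70):
    """Windowed interpretation: values below the pivot are 20xx, the rest 19xx."""
    if not 0 <= two_digit <= 99:
        raise ValueError("expected a two-digit year between 0 and 99")
    century = 2000 if two_digit < pivot else 1900
    return century + two_digit


# The comparison from the text: is "02" earlier than "97"?
print(naive_year(2) < naive_year(97))        # naive reading says 1902 < 1997
print(windowed_year(2) < windowed_year(97))  # windowed reading compares 2002 and 1997
```

A fixed window only postpones the ambiguity, which is why formats that carry all four digits of the year are the more durable fix.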
There are three things which can affect how Apache treats year 2000 issues:

- Apache code itself
- The HTTP and other protocols that Apache implements
- The underlying operating system

The Apache code internally never stores years as two digits - it processes dates and times as standard Unix time epochs (the number of seconds since 1st January 1970). When it outputs the year (e.g. to the log file) it writes years as four digits.

The HTTP protocol may be more troublesome. It allows for three different date formats in requests and responses, one of which uses a two-digit year. Dates are used on every response, in fields such as "Date", "Last-Modified" and "Expires", and requests can contain dates in the "If-Modified-Since" and similar fields. The date formats listed in HTTP/1.1 and HTTP/1.0 are:

    Sun, 06 Nov 1994 08:49:37 GMT (defined in RFC 822 as updated by RFC 1123)
    Sunday, 06-Nov-94 08:49:37 GMT (defined in RFC 850 and RFC 1036)
    Sun Nov 6 08:49:37 1994 (as defined in ANSI C's asctime() format)

The first format is the only one that HTTP/1.1 servers are allowed to generate, and Apache uses it. This format includes a four-digit year. However to be compatible with older browsers and servers, Apache recognizes the other formats. The main problem will be older applications which generate RFC 850 format dates - these only have a two-digit year field. RFC 850 format was used in early web servers and browsers, and its replacement with RFC 1123 format in the early 1990s was not fully documented until HTTP/1.0 was published in 1996. However if Apache sees this format and the two-digit year is less than 70 (which would otherwise mean a year before 1970), it assumes that the first two digits of the four digit year are "20" rather than "19".

The final area which affects Apache's ability to handle dates is the underlying operating system. If the OS has problems with dates past year 2000, Apache will as well. Most Unix systems store dates internally as 32 bit integers which contain the number of seconds since 1st January 1970.
This allows dates up to the year 2038 to be stored. For dates past 2038, the OS will have to be updated to store dates in larger fields (for example, as a 64 bit value). There may also be problems before 2038 with OS calls which accept or return year numbers. For example, many date functions use a structure called tm which contains a field tm_year. This field holds the number of years since 1900, so for example the year 2002 will be stored as 102. This should not be a problem, provided that the OS and applications do not assume that the tm_year value is always a two-digit value representing a year between 1900 and 1999. All modern operating systems should be ok.

Apache in the News 2000
All the important news stories about Apache from the year 2000

Apache in the News 2000

Since becoming the #1 Web server, Apache has featured in a number of reviews and articles. Here are the ones for the year 2000. If you have seen a story about Apache on the Web or in the press let us know so that we can include it here.

InfoWorld.com, "Brian Behlendorf: Apache co-founder talks about open source" "the fact that we don't have a multibillion-dollar marketing organization means that, sure, Microsoft is going to be able to claim things or do things that we can't, but that hasn't hurt us so far." InfoWorld.com, "Apache founders hit Vegas in search of cash" "Behlendorf said the ASF may need to look for a little cash to keep up with the demands that developing the leading Web server requires" Apache Week, "Report from ApacheCon Europe 2000" "As in all conferences, there were various technical glitches when presentation laptops froze and batteries ran out, some inexperienced speakers, and not enough seats but these were all minor issues considering the excellent detailed technical knowledge that was imparted by the speakers." Apache Today, "Apache Guide: ApacheCon Europe" "Last week, I was in London for ApacheCon 2000.
In a break from my usual subjects, this will be a brief overview of the conference, touching on the highlights and some of the things that were talked about there." NetworkWorldFusion, "Tips on pitching Apache to the big wigs" "Apache cares about trademarks and it's helped us maintain a pretty good product," Behlendorf said. NetworkWorldFusion, "IBM pitches its open source side" "IBM Tuesday set out its open source agenda at ApacheCon Europe 2000. The message seemed to boil down to the notion that in a networked world, open source is good and IBM not only knows that but embraces the open-source programming community." NetworkWorldFusion, "Sun says Java moving towards full open source" "Sun is moving toward making its Java technology fully open source, a company executive said Tuesday, addressing an audience of programmers here at the ApacheCon Europe 2000." Network Computing, "The 10 Most Important Products of the Decade" "...Apache Web Server earns its place for changing the rules on the server side. The future of Apache hinges on its ability to function as an e-commerce server. If the past five years are any indication, Apache Web Server will deliver the whole shopping cart--and probably sooner than its competitors do." InfoWorld, "E-business innovators" "By general acclaim, it has done more to stimulate Web development -- and therefore e-commerce -- than any other Web-based server." Edd Dumbill's Weblog (O'Reilly), "Dynamics of the Apache XML Project" Edd Dumbill, editor of XML.com, writes about the "Dynamics of the Apache Group" in his Weblog. The focus of the article is on news that the Apache XML project could create another parser, and it looks at the internal dynamics of the group members and some of the conflicts. "IBM and Lotus in particular are responsible for the XML parser, Xerces, and the XSLT processor, Xalan. Sun also play a significant part in Apache's Java projects.
Though nobody has suggested that Apache is in any way in the sway of these organizations as a consequence of their donations, it seems inevitable that the corporate and hacker cultures may well clash. This weekend seems a good example of this." Qube Corner, "AOLserver faster than Apache?" Qube Quorner reveal that Apache 1.3.12 comes second to AOLserver 3.0 in terms of requests/second and transfer speeds. Benchmarks do not give a true picture of the speed of a web server, since they provide an environment unlike the real use of the software. Commercial software is often tuned to perform well in benchmarks, so a good performance simply indicates that the software works well for that benchmark, not that it has good real-world performance. News Alert, "US Toyota and Lexus dealers adopt Apache technology" Over the last week, there have been a large number of stories about Internet Appliances for both home and business use. An increasing number of these units are now being run on open source platforms such as Linux. Dell have announced that Toyota in the US are to be equipped with Dell PowerApp.web servers to provide customised content to their dealer network. C|Net News.com, "IBM donates Net communications technology" As reported by C|Net, the Apache Software Foundation has received technology from IBM which will help developers create services using an open, vendor-neutral process. IBM's Java-built Simple Object Access Protocol (SOAP) will be contributed to the open source Apache XML project. The system provides a simple method of using XML to send messages and access web services across distributed networks. "We want to move at Internet speed and respond to the needs of the developer community by making it available to the open-source community," said Marie Wieck, IBM's director of e-markets infrastructure. "It's valuable to further adoption."
CNet Investor, "Apache Software Foundation join Java committee" CNet Investor reported that Sun Microsystems have set up two executive committees to oversee their Java Community Process(SM) community-based Java technology development programmes. The first committee will oversee the Java technologies for the desktop/server space and the other will oversee the Java technologies for the consumer/embedded space. "As is evident by the depth, diversity and strength of the JCP program's Executive Committee members, the future of Java technology specifications is in capable and caring hands," said George Paolini, vice president of Java Community Development at Sun Microsystems, Inc. ZD Net, "Red Hat Leads The Way To IA-64 Itanium Linux" Red Hat Inc. this week released public alpha code of a full version of Linux for Intel's new IA-64 Itanium processor. The release of the software combined with the release of Intel's "Itanium Processor Microarchitecture Reference" gives developers access to all the information they need to start working on Itanium development. "On May 17, Red Hat Inc. released an alpha version of a complete IA-64 Linux distribution to developers. This edition, built within the Trillian Project, is the first alpha public code release of a full IA-64 Linux from kernel to drivers to such popular applications as Apache." ZDNet, "Picking The Right Web Server Is Key" ZDNet examine web server platforms in their article, "Picking the Right Server is Key". They compare Windows 2000 Advanced Server, Netware 5.1, Red Hat Linux using Apache, Solaris using iPlanet, and Solaris using Apache. "There are other compelling reasons to choose Linux/Apache. For one thing, you'll never find a back door, as with the recent IIS debacle, in open-source code. And it's getting so easy to install that the hardcore Linux gurus are grumbling about dumbing down."
SecuritySpace.com, "April Web Server Survey" If you are a regular reader of Apache Week you'll know that Apache has been the top web server in all the probe-based web surveys for some time, now with over 60% market share. The April survey from E-Soft also gives some other interesting statistics for modules in use; the most popular being the PHP scripting language in use on 29% of Apache sites. "The Apache module report documents the market share of Apache, internet's most popular web server, for a variety of add-on modules. Since most add on modules modify the web server "signature" that is returned on each web page, we are able to see who's using PHP, perl, SSL mods, language converters, language mods, etc." Userland, "Scripting News / Manila" UserLand hosts an interesting open forum about commercial software, which originally started as an email discussion between Dave Winer and Brian Behlendorf. In Dave's own comments he picks out some of the discussion and his own point of view, accusing Apache of being boring. "Apache is like MS-DOS. Lots of people use it, we do too. But where's the Lotus 1-2-3? Apache is boring! Where's the revolution for writers and thinkers?" Linux Today, "VNU Net: Apache Server Commentary [Book Review]" A short review of the new book "Apache Server Commentary" is available. The book is aimed at developers and contains source code listings of the Apache server. "This is one in a series of books which sets out to give an insight into the various Open Source products currently on the market. It is aimed at those who either want to write extension modules to Apache or customise the underlying code. In fact, Apache Server Commentary appears to be little more than a reference guide for those who already understand the concept of Apache and just want help on specific modules. It certainly isn't the architectural document I was expecting." 
InformationWeek.com, "Open Source Moves To The Mainstream" The article discusses the secure server survey from E-Soft which shows Apache with 63% market share but notes that the "battle over E-commerce territory has been a little more difficult for open source, perhaps an indication that security-minded companies prefer to use commercial products". "One of the leading open-source success stories is the Apache Web server, which for many sites is the backbone of Web applications. Apache is a flagship open-source project, continually developed by a self-selected group of coordinated volunteer programmers. It costs nothing to use. As of March, Apache is deployed on more than 7.8 million domains, or some 60% of Internet Web sites." INRIA, "Elliptic Curve Discrete Logarithms: ECC2K-108 - SOLVED!" Apache Week reported in issue 180 on the attempt to solve the Elliptic Curve Challenge from Certicom. The solution was found at the end of March, and the Apache Software Foundation will receive a donation of US$8000 from the prize. "The biggest public-key crypto crack ever has just finished! Certicom have confirmed that the solution is correct." Linux Magazine, "Brian Behlendorf on the Apache name" Linux Magazine have an interview with Brian Behlendorf, one of the initial Apache group founders. In addition to talking about the founding and success of Apache, Brian explains that the Apache name never meant "A patchy server"; instead it "just sort of connoted: 'Take no prisoners. Be kind of aggressive and kick some ass.'" "While there would still be a World Wide Web without the Apache Web server, pundits have suggested that it would belong to Microsoft. Since drawing up the plan for the Apache project in 1993, Apache Software Foundation President Brian Behlendorf has helped lead the volunteer development team that proved that you can take on Microsoft and win -- just so long as you change the rules."
Linux Magazine, "A Conversation With the Man Behind the Animal Books" The article discusses the evolving open source industry and pays particular attention to Apache. "I think Apache plays an enormously important role here. Because it has dominant market share, it keeps the Internet open. I think it's more important for Apache to have dominant market share than for Linux. If Linux is dominant too, that's better, but I'd hate to see us lose Apache. That's a really important battleground." ZD Net - EWeek, "Solaris 8 weds reliability to must-have upgrades" PC Week mention Apache being bundled with Solaris in Solaris 8 weds reliability to must-have upgrades. "Apache Web server is also bundled with Solaris 8, but neither PC Week Labs nor Sun recommends its use in high-transaction environments." Slashdot, "Reflections On ApacheCon 2000" ASF member Jim Jagielski gives his personal opinion of ApacheCon 2000 in "Reflections on ApacheCon 2000". "It's been a week now since ApacheCon 2000 ended. There's been some discussion over the events, with the release of Apache 2.0a being the main topic of conversation. But AC2K was more than just the venue that 2.0a was announced. It was an important and noteworthy conference in it's own right." NetWorldFusion, "The Netware Version Of Apache" The NetWare version of Apache is examined in a Network World Fusion Newsletter. Over the past few years Novell have shipped a couple of different Web servers with NetWare, but now Apache is available for this system. "The NetWare version of Apache 1.3 is still in the "experimental" stage, and it (so far) only runs on NetWare 5 or 5.1. Nevertheless, if you support a major Web site and ... if you want to take advantage of the hundreds of Web server applications available (also for free) for Apache - it would be worth your effort to download and test the new Apache in your environment." 
Apache Week, "Report from ApacheCon 2000" " In total, just over 1000 people attended the conference and this included a large number of Apache Software Foundation members. At the very first session of the conference, the opening plenary, the previous record for the most Apache developers in the same place at the same time was broken." Melbourne Linux Users Group Inc, "ApacheCon 2000" The Melbourne Linux Users Group posted a number of pictures from the conference. "The ApacheCon show was very well done. The exhibit floor featured many cool companies and the keynote and PHP presentations I attended were very informative. Here are some pics of the event." Open Source IT, "The Buzz At Apache Conference: World Domination" ApacheCon 2000 is still in the news as Open Source IT reports on ApacheCon 2000 in "The Buzz at Apache Conference: World Domination". "More than 1,000 Apache developers and users gathered at ApacheCon 2000 in Orlando last week to discuss -- among other things -- the progress the Apache Web server is making towards World Domination." O'Reilly, "ApacheCon 2000: Day One, Day Two, DayThree" O'Reilly published a detailed report on each day of the conference; Wednesday, Thursday, and Friday. "The conference is being held at the Caribe Royale Resort Suites, which despite a strong conference turnout, is mainly inhabited by lots of parents and their young children, due to the proximity to Disney World." LinuxPlanet, "ApacheCon: Fuelling The Web Revolution" The article gives a brief overview of the conference and highlights one of the popular talks on open source from IBM. "ApacheCon is the yearly convention dedicated to Apache and Apache products. There are over 1,000 visitors this year, and the show creators were sitting around saying things to me like, "Wow, this is going so mainstream so fast." God, I hope so. It'd be a terrible thing for something that has captured 60 percent of the Internet Web-server market share to not be mainstream." 
Wired.com News, "A Patchy Start: Apache's Strong" The article examines why Apache is not as well known as other projects such as Linux and finds that the companies providing support and services based on Apache are not as visible. "Apache is the Web's most widely used and -- outside of the Nerd Zone -- its most unknown application. It has achieved dominance in a crucial market that Microsoft and Netscape have struggled mightily to conquer. Both companies have invested massive amounts of money and programming skills into server software programs -- and yet it's Apache, a freeware application, that is installed on just over half of all publicly accessible Web servers." Security levels Cox, Mark J A quick summary of security levels that Apache Week apply to Apache web server vulnerabilities Apache Week rates the impact of each security flaw that affects the Apache web server. We've chosen a rating scale quite similar to those used by other major vendors in order to be consistent. Basically the goal of the rating system is to answer the question "How worried should I be about this vulnerability?". Note that the rating chosen for each flaw is the worst possible case across all architectures. In the past for example we've had flaws that have a Critical impact on some BSD architectures, whilst no real impact on others. To determine the exact impact of a particular vulnerability on your own systems you will still need to read the security advisories to find out more about the flaw. We use the following descriptions to decide on the impact rating to give each vulnerability: A vulnerability rated with a Critical impact is one which could potentially be exploited by a remote attacker to get Apache to execute arbitrary code (either as the user the server is running as, or root).
These are the sorts of vulnerabilities that could be exploited automatically by worms. A vulnerability rated as Important impact is one which could result in the compromise of data or availability of the server. For the Apache web server this includes issues that allow an easy remote denial of service (something that is out of proportion to the attack or with a lasting consequence), access to arbitrary files outside of the document root, or access to files that should be otherwise prevented by limits or authentication. A vulnerability is likely to be rated as Moderate if there is significant mitigation to make the issue less of an impact. This might be because the flaw does not affect likely configurations, or it is a configuration that isn't widely used, or where a remote user must be authenticated in order to exploit the issue. Flaws that allow Apache to serve directory listings instead of index files are included here, as are flaws that might crash an Apache child process in Apache 1.3. All other security flaws are classed as a Low impact. This rating is used for issues that are believed to be extremely hard to exploit, or where an exploit gives minimal consequences. Vendor patches to Apache 1.3 Cox, Mark J We take a peek inside ten popular vendor distributions of Apache 1.3 to find out what has been added We decided to take a look at what custom patches vendors add to the versions of Apache 1.3 they ship. The Apache Software Foundation would rather that vendors of Apache didn't add any third-party modifications to Apache at all - it adds to brand confusion. You might think you are getting a copy of the Apache web server but you're actually getting something that is based on the Apache web server.
There are hundreds of distributions and hundreds of vendors so in order to make this manageable we started out by looking at just Linux vendors that have publicised security updates for Apache in the first few months of 2003 to the bugtraq mailing list. Where a vendor has multiple versions of products we tried to look at the most recent version of Apache 1.3 (since most vendors do not yet ship Apache 2). Our survey consisted of Conectiva, Debian, EnGarde, Gentoo, Mandrake, OpenPKG, Red Hat, SCO, SuSE, and Trustix. At the time of the survey, not all the Linux vendors were shipping Apache 1.3.27. Several shipped older versions for which they had backported security fixes. Mandrake, Debian, and Conectiva included Apache 1.3.26 with backported patches for , , and . SuSE included Apache 1.3.23 with backported security fixes for only and . SuSE also add a backported patch for mod_proxy (). All the vendors shipped with EAPI, the interface that links Apache to mod_ssl, and most bundled some selection of extra modules. All the vendors shipped a custom httpd.conf file or made patches to the default file. Examining the configuration file changes was outside the scope of this survey since these are things that can be easily changed by the user. All the vendors except OpenPKG and SuSE pointed the magic mime types file at the system /etc/mime.types file, with many adding additional types using AddType directives in httpd.conf. SysV init is a standard process used by Linux distributions to control which software the init command launches or shuts off on a given runlevel. SysV init scripts sometimes get confused with the apachectl command, which provides similar functionality. All the vendors except OpenPKG included custom init scripts or patches with their Apache packages. All the vendors provided patches to help build Apache on their particular Linux distribution and to customise it to their environment.
Conectiva, Gentoo, and Mandrake added a serverroot configuration option and then used that to help build Apache. Most vendors patched apxs and changed file and directory locations. Debian, Gentoo, Mandrake, Red Hat, and SuSE added dbm patches to ensure that the files created for dbm-based authentication from Perl tools like dbmmanage are in a format that Apache can understand. Conectiva, Debian, EnGarde, Gentoo, Mandrake, Red Hat, and SCO all included a patch for , a vulnerability in htpasswd and htdigest that could allow local users to overwrite arbitrary files via a symlink attack. This vulnerability is not yet fixed in Apache, as it's tricky to get right cross-platform. The vendors patching this themselves only have to worry about the Linux architecture so can add a specific fix. Altering the server version string can help users determine that they are running a vendor-modified version of Apache. It can also help the vendor track market share through surveys like those from Netcraft. Four of the distributions had patches to make sure that they added a customised string to the server version string. These distributions were quite well behaved and did not add their customised string if the ServerTokens directive is set to 'product only' or 'minimum'. Debian GNU/arch (Gentoo/Linux) (Red-Hat/Linux) (Trustix Secure Linux/Linux) Conectiva and SCO were a little more invasive, with Conectiva adding (Conectiva/Linux) to the server version string no matter what the ServerTokens directive was set to. SCO did a similar thing, with their extra string giving the version of an acceleration patch they add. Finally, Mandrake changed the base product name altogether, renaming from Apache to Apache-AdvancedExtranetServer. In Apache 1.3, a compile-time constant defines the maximum possible number of server processes, defaulting to 256. 
Only three vendors changed this default: Debian set it to 512 processes via a build-time define, EnGarde patched it to 1024, and SuSE set it to 2048 via a define. Debian, Mandrake, SuSE, and SCO build Apache with Large File Support (LFS), so that on 32-bit systems Apache can use files larger than 2 gigabytes - this is particularly useful for log files. Enabling LFS does slightly change the Apache 1.3 binary module ABI, which can cause problems if using binary modules built against a different version of Apache. After taking account of all the patches and modifications above, we're left with only four vendors that add additional patches. SuSE added: A patch to change the ap_set_content_length API function to accept a length of type off_t instead of long, to improve the support for Large Files mentioned above. Gentoo added: A patch to make the regexp library work with Large File Support on 32-bit systems. This is a modification that affects the ABI. A patch to fix a segmentation fault when using a custom response in a module. () A patch to fix a problem when using server-parsed HTML with suexec where an <!--#exec tag with a cmd attribute contains more than one word. (Debian bug 47951) A patch to allow SSL environment variables to be accessible when using mod_ssl and suExec. (similar to ) A patch to cause Apache to not run if user or group directives are found within a VirtualHost but suExec is not configured correctly. (Debian bug 21525) Debian added the same patches as Gentoo and additionally: A fix for a htdigest buffer overflow if arguments passed to it are too long. This is only a security issue if htdigest is used setuid. Changes to ApacheBench to support round-robin DNS. SCO added: A patch to mod_proxy needed for mod_backhand. A patch to add a new API function, ap_call_execute, needed by the old mod-frontpage-VR module. The "Accelerating Apache" performance patches from SGI.
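The 2 gigabyte limit behind the Large File Support changes above comes from storing a size in a signed 32-bit integer, which is what a C long is on the 32-bit systems discussed here. The following is a minimal Python sketch of that arithmetic only; it does not use any Apache code, and the helper names as_32bit_long and as_64bit_off_t are hypothetical, chosen purely for illustration.

```python
import ctypes

TWO_GIB = 2**31  # 2 gigabytes, in bytes


def as_32bit_long(n):
    """Return n as a signed 32-bit C integer would hold it (wraps on overflow)."""
    return ctypes.c_int32(n).value


def as_64bit_off_t(n):
    """Return n as a signed 64-bit integer would hold it (an LFS-style off_t)."""
    return ctypes.c_int64(n).value


# Largest length a signed 32-bit value can represent: one byte short of 2 GiB.
largest_ok = as_32bit_long(TWO_GIB - 1)

# One byte more no longer fits and wraps around to a negative number.
wrapped = as_32bit_long(TWO_GIB)

# A 64-bit type holds the same length without trouble.
wide = as_64bit_off_t(TWO_GIB)

print(largest_ok, wrapped, wide)
```

This is also why widening a function parameter from long to off_t changes the binary interface: callers and callee must agree on the size of the value being passed.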
The "Accelerating Apache" performance patches were first submitted to the Apache Group by SGI in 1999. We reported that they were designed to improve the performance of Apache when measured specifically by the SPECweb96 benchmark. The patches were named after the ten fold increase in speed they gave over regular Apache on a dual processor SGI IRIX machine. Some of the patches were folded in to Apache in 2000, but other parts were rejected by the Apache developers. The Accelerating Apache project was dropped by SGI in February 2001. In March 2003 a vulnerability was found in the Oracle modifications to mod_dav. This was not the first security hole that has been introduced by third party modifications to Apache by vendors. However our own research based on issues listed in the CVE dictionary shows that the majority of these vulnerabilities are due to poor configuration defaults rather than patches for new functionality that went wrong: CVE Type of Issue Severity Affected Remote attacker can run arbitrary commands High Oracle Remote attacker can run arbitrary commands High SCO (briefly) Remote attacker can run arbitrary commands High IBM Remote attacker can see files in /usr/doc Low SuSE Linux Remote attacker can see files in /perl Medium Mandrake Linux Remote attacker can read and write any file in docroot High SuSE Linux Remote attacker can obtain the source to CGI scripts Medium SuSE Linux Remote attacker can read .htaccess files Medium Cobalt Remote attacker can see files in /usr/doc Low Debian Linux What we found in our survey was that no two of the ten vendors were alike; some vendors like OpenPKG made only the expected build and configuration changes, whilst others made fairly substantial changes including affecting the ABI. ABI changes mean that you can't reliably take a module precompiled for one distribution and start using it on another. Third party modifications to Apache have been known to cause bugs and security issues. 
This is often frustrating for the Apache Software Foundation who end up receiving all the bug reports for issues that don't even exist in the official Apache releases. This is one of the reasons why the Apache Software Foundation insists that when vendors make modifications to Apache they change the name of their version so it is not confused with official Apache releases. One thing that impressed us was how easy it was to identify the changes that the vendors had made. In almost all cases the vendor's source package contained a pristine copy of Apache along with one or more patch files for the various changes. Working out what those changes did and where they came from was another issue though; vendors could do a much better job of labelling the origin of, and reason for, each of the patches they make. Apache 2.0.44 Released Orton, Joe Apache 2.0.44 was released on the 21st January 2003. This release addresses recent security issues in Apache 2.0.43 Apache 2.0.44 was released on 21st January 2003 and is now the latest version of the Apache 2.0 server. The previous release was 2.0.43, released on the 3rd October 2002. See what was new in Apache 2.0.43. Apache 2.0.44 is available for download. This is a security, bug fix and minor upgrade release. Due to security issues, any sites using versions prior to Apache 2.0.44 on Windows should upgrade to Apache 2.0.44. Read more about the other security issues that affect Apache 2.0. Apache was vulnerable to a denial of service attack via a request for an MS-DOS device name on Windows 9x and Me. Apache allowed arbitrary code execution via a crafted POST request containing an MS-DOS device name on Windows 9x and Me. Apache could be forced to serve unexpected files on Windows platforms by appending illegal characters such as '<' to the request URL.
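To make the three Windows issues above more concrete, here is a hedged Python sketch of the kind of check a site operator might apply to request paths in front of an unpatched server. It is not the fix that went into Apache 2.0.44. The set of reserved device names and the set of illegal filename characters are assumptions based on common Windows conventions, and the function name is hypothetical.

```python
# Classic MS-DOS device names (assumed set, for illustration only).
DOS_DEVICES = (
    {"CON", "PRN", "AUX", "NUL"}
    | {f"COM{i}" for i in range(1, 10)}
    | {f"LPT{i}" for i in range(1, 10)}
)

# Characters that are not valid in Windows file names (assumed set).
ILLEGAL_CHARS = set('<>:"|?*')


def is_suspicious_windows_path(url_path):
    """Return True if any path segment names a DOS device or holds an illegal character."""
    for segment in url_path.split("/"):
        if not segment:
            continue  # skip empty segments produced by leading or doubled slashes
        if any(ch in ILLEGAL_CHARS for ch in segment):
            return True
        # Device names match regardless of case or file extension, e.g. "aux" or "con.txt".
        stem = segment.split(".", 1)[0].rstrip(" ").upper()
        if stem in DOS_DEVICES:
            return True
    return False


for path in ["/docs/index.html", "/cgi-bin/aux", "/index.html<", "/console/index.html"]:
    print(path, is_suspicious_windows_path(path))
```

A check like this is only a stopgap; upgrading to a fixed release remains the advice given in the announcement.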
The following bugs were found in Apache 2.0.43 and have been fixed in Apache 2.0.44: Allow escaping % sign in CustomLog format strings mod_setenvif: fix BrowserMatchNoCase for non-regex patterns. Return appropriate MIME response headers for negotiated responses from a body embedded in a type-map Prevent 416 "Range not satisfiable" response in place of a redirect Prevent files being left open for the duration of a keepalive connection, which could cause a "Too many open files" error mod_ssl: several fixes for memory handling and leaks mod_proxy: fix invalid Content-Length from pages fetched during server-side include processing. LDAP modules: ensure correct load order in httpd.conf (); fix compatibility with Netscape LDAP libraries; fix Win32 build mod_deflate: fix a memory leak when compressing dynamic content; always emit Vary headers mod_isapi: fix several compatibility problems (, ), and fix bug which caused invalid responses or log entries () CGI modules: fix streaming output from "nph-" scripts, for example CGI::IRC (); fix construction of command line from query strings (), handle environment variables which contain newlines in mod_cgid (); terminate CGI scripts when connection is dropped () Caching modules: many bug fixes (including ), and an HTTP compliance fix () Add an --enable-v4-mapped configure option to allow or disallow connections from IPv4-mapped addresses to IPv6 addresses, on applicable platforms (, ) Add IndexOptions IgnoreCase option to mod_autoindex () Add EnableSendfile directive to disable use of sendfile() when necessary (for instance when serving an NFS share) Add ProxyBadHeader directive to dictate handling of invalid HTTP responses headers Add SERVER_ADDR keyword to mod_setenvif, to represent the server IP address for a particular request Performance improvements Add -S command-line option to httpd, equivalent to -t -DDUMP_VHOSTS Apache Related Links This document contains a set of pointers of interest to people using or developing with 
Apache. From here, you can link to all the relevant standard definitions, documentation on most aspects of using Apache, module information, and even some links to how Apache is reported by the media. Organisations W3C who maintain W3 standards development Apache project page Document Access HTTP is the protocol for transferring Web pages. Current version is 1.1, which is now an RFC on the standards track. It replaces the widely implemented 1.0. Note: this is not related to Apache version numbers! HTTP 0.9 (of historical interest only) HTTP 1.0 [RFC1945] (or in HTML PS format) HTTP 1.1 [RFC2616] Use and interpretation of HTTP version numbers [RFC2145] Basic and Digest Access Authentication [RFC2617] PEP: an Extension Mechanism for HTTP [Internet Draft] Transparent Content Negotiation [RFC2295] and Remote Variant Selection Algorithm 1.0 [RFC2296] See also: other HTTP Internet drafts, the W3C HTTP specifications Uniform Resource Identifiers or Names (URI, URN) are the generic names for Uniform Resource Locators (URLs), used to identify resources on the WWW and Internet. Uniform Resource Identifiers (URI): Generic Syntax [RFC2396] A Trivial Convention for using HTTP in URN Resolution [RFC2169] URN Syntax [RFC2141] Uniform Resource Locators [RFC1738] Relative Uniform Resource Locators [RFC1808] Cookies let you maintain state with the client, or track 'clickstreams'. HTTP State Management Mechanism [RFC2109] Internet Draft intended to replace RFC2109 Netscape's Original Cookie specification (no longer available) Content Hypertext Markup Language is the markup language used to write Web hypertext pages. Current widely used version is 2.0, often with extensions. Version 3.2 summarises the current practice. HTML 3.2 W3C Reference Specification, more information HTML 4.0 more information Cascading Style Sheets (CSS) W3C Recommendation, more information.
Internationalization of HTML [RFC2070] Hypertext Markup Language 2.0 [RFC1866] HTML Tables [RFC1942 experimental] Netscape extensions to HTML 2.0 and HTML 3.0 Microsoft HTML, DHTML and CSS information See also: HTML Internet drafts, W3C HTML specifications CGI is the common gateway interface, which specifies how web servers can call external applications (scripts, programs or other gateways). CGI information and tutorials (NCSA) CGI specification CGI provides a simple way of running programs on the server when a request is received. However, CGI programs can be inefficient because they need to be started each time a request is made. There are various ways of creating more efficient dynamic responses. JServ module for Java programs mod_perl for efficient Perl scripts and modules FastCGI: a faster version of CGI (Apache module available) Server-Side Includes are a way of writing commands into normal HTML files. When the HTML file is served to the user, the SSI commands are parsed and executed. Apache implements standard SSI, or you can use an alternate module for more advanced SSI implementations Using Server Side Includes Apache Week feature Dynamic Page Languages Apache Week feature Apache SSI commands NCSA tutorial PHP: a full programming language, available as CGI or Apache module NeoScript: scripting language module Meta-HTML: scripting language CGI ePerl: CGI which allows perl to be embedded into HTML Imagemaps come in several flavours: old-style NCSA cgi-bin program, new Apache imagemap module and client-side imagemaps.
Using Imagemaps Apache Week feature Apache imagemap module NCSA imagemap cgi-bin program: NCSA imagemap tutorial Client-side imagemaps [RFC1980] RFC1766 Language Tags (Specification of tags to identify content language) RFC1700 IANA Assigned Numbers (IANA allocates MIME types, character set identifiers) RFC2279 UTF-8, a transformation format of ISO 10646 (An expanded character set compatible with US-ASCII) RFC2046 MIME Media Types (MIME types are used to identify content type) RFC2083 PNG Specification (A portable, lossless, compressed format for graphics) All RFCs Inclusion of a link from this document to an external site does not imply endorsement by Apache Week or Red Hat, who cannot be held responsible for the contents of the remote site. Lists of resources may not be exhaustive. ApacheCon 2002 Las Vegas Weinstein, Paul Paul Weinstein visited the Las Vegas ApacheCon in November 2002 and gives his highlights of the interesting news and events ApacheCon 2002: Day 2 Paul Weinstein visited the four day Apache conference in Las Vegas in November and gives his highlights. The first day of the conference was taken up by tutorials; the presentations started on the second day. Some 500 miles and 19 months after the last conference on the state of the world for Apache, developers and users gathered in Las Vegas to converse again about the world's most popular web server. After a day of tutorials, Ken Coar, Apache Software Foundation member and Conference Chair, introduced this year's conference to the over 300 attendees. The conference included 60 presentations, 16 Birds of a Feather, 3 keynotes, and free access to the Comdex convention floor. After a brief break, Ken Coar introduced Tim O'Reilly, Founder and President of O'Reilly and Associates and his topic "Watching the Alpha Geeks." O'Reilly opened with a quote from Sci-Fi writer William Gibson, "The future is here, it's just not evenly distributed yet."
saying that Gibson describes exactly how one can understand the ever evolving world of computer technology. O'Reilly's premise is that the evolution of technology follows a simple pattern that can be seen with the adoption and evolution of the personal computer: Hackers such as those who formed the famous Homebrew Computer Club started tinkering and developing computers for personal use as they pushed the technological envelope; These explorations evolved into businesses such as Apple and Microsoft as entrepreneurs start to make the new technology easier for ordinary users; As dominant players emerge that integrate the new technology into a platform such as the Wintel platform where barriers can be raised to keep other entrepreneurs from integrating into the new platform or a healthy ecosystem of corporations can evolve to help the new platform develop; And finally the hackers and entrepreneurs turn their attention to new areas, looking for new frontiers such as that of the Internet and its growth into a new computing platform. O'Reilly moved on to what he sees going on now within the world of hackers and the next group of entrepreneurs, with the growing world of wireless networks, web services and the open source world. So why then have companies struggled with trying to bring the wireless world to the public or struggled to build a model around open source software? Because according to O'Reilly, these companies are still trapped thinking in the old model of cheap hardware and proprietary software that defined the growth of the PC world, and that just as companies such as IBM had to shift from their world of mainframes and other proprietary hardware, the business leaders of today need to change their point of reference in order to fare better in these new, emerging worlds. 
Most importantly, O'Reilly noted, the programmers who build and define these emerging technologies are designing the architecture of the next iteration of the computing world. This, O'Reilly feels, is where the world of Apache can help: by showing what models work in the evolving computer industry, namely adhering to standards and building a small but robust application with a modular design. In other words, what the hackers and programmers have succeeded in doing with the Apache server and related projects, and how they have done it, shows exactly what can and does work in the technological world of tomorrow. The schedule of sessions about Apache on Tuesday included a talk by Mark Cox on Revealing Apache Security Secrets, Jim Jagielski's talk on Migrating to Apache 2.0, a presentation on the new Proxy module for Apache 2.0 by Graham Leggett, along with Theo Schlossnagle and George Schlossnagle who put together a session on deploying scalable network architectures. The evening ended with a welcoming reception giving food and drinks for attendees to enjoy while they socialized and viewed the exhibit floor. ApacheCon 2002: Day 3 Paul Weinstein visited the four day Apache conference in Las Vegas in November and gave his highlights from the third day of the conference. Wednesday's late morning keynote featured John Fowler, CTO of Software for Sun, whose speech "Sun and Open Source: A Bright Future" allowed Fowler to discuss Sun's commitment to Open Standards and the Open Source community. Fowler noted that since Sun's founding over two decades ago, the use of open standards and community participation has been of major importance. Fowler believes that since the founding of Sun there has been an overall shift within the computer industry from developing and selling new technology to that of building solutions that implement open standards.
This shift is allowing technology that might originate from competing vendors to work together, providing an overall solution a customer can use, instead of having various vendor components that might solve one problem or another, but overall don't communicate or work together. Moreover, Fowler believes that the Apache project is a perfect example of open standards at work since the server is widely used and of such a benefit because of what standards it implements and how it handles those implementations. In relation to the open source community at large, Fowler noted the major contributions Sun has made not only to Apache and related projects such as Tomcat, but also in non-Apache related projects such as the Gnome desktop and OpenOffice.org. Fowler feels that the work Sun has done with projects such as Apache has fundamentally changed how Sun operates, noting that open source communities can magnify the impact of a software project, not just in how many developers contribute or what is contributed but also in actual deployment of a project's technical solutions, because of the overall openness of the community. A number of large and small companies shared their unique view of Apache and the open source world on the expo floor during the three days of talks. AMD and Covalent took the most advantage of the conference by announcing a co-development project that includes Red Hat to port the Apache code base from the 32-bit architecture that allows it to run on the most commonly found x86 microprocessors to the 64-bit architecture that AMD is developing for its Opteron line of processors. To help highlight John Fowler's speech the Sun booth was dedicated to the various open source projects, both Apache and non-Apache, as well as exhibiting the versatility of its Java programming language, again in conjunction with the Apache server as well as on its own.
Apple highlighted its Apple Developer Connection, which assists developers in deploying desktop and server systems based on Apple's Macintosh OS X platform. Apple of course has a number of web and network related tools available and includes the Apache Web Server by default in both the desktop and server versions of OS X. Sams Publishing and BreakPoint Books were on hand to sell Apache and other web related books for the conference attendees. The books available covered just about any subject, from basic CGI programming to Java Servlets to Apache 2.0. A few other retail vendors filled out the low key expo floor including Daemon News which was featuring BSD Mall and Hackerthreads.com. Wednesday, the busiest of the three days, brought Derek Ferguson's talk on Integrating Apache with Microsoft's .Net and a session on the next version of the XML parser Xerces given by Andy Clark. The afternoon sessions included George Schlossnagle's discussion about how to get the best performance from PHP, a talk by Gerald Richter on Embperl as well as talk by me, Paul Weinstein, on how to use and run a private certificate authority for authentication with Apache. ApacheCon 2002: Day 4 Paul Weinstein visited the four day Apache conference in Las Vegas in November and gave his highlights from the final day of the conference. Thursday, the final day for ApacheCon featured a keynote from Richard Thieme whose speech, "New Ways of Thinking About Security: Open Source Thinking in a Bunged-up World" picked up where Tim O'Reilly left off by reiterating the idea that open source is more than just about code, but in reality is a way of living and thinking. This open source way of thinking is at its fundamental level based on the methods of communication that are commonly used within open source projects. 
Thieme also noted that these projects, and more importantly those who contribute to and use open source technology, have become fluid individuals whose own identity is more modular and less rigid than that of past generations, primarily because of the modular, distributed communication systems that are now commonly used. Just as O'Reilly sees his 'Alpha Geeks' as the early adopters of technology, Thieme sees these early adopters of open source and the open source ethic as a new social network emerging from preexisting boundaries. Because of this, Thieme thinks that security issues from around the world need to be seen in this new distributed world view. He noted that ApacheCon was indeed about a community coming together in a physical location, but is really about sharing secrets, and that how the Apache community shares its secrets, or chooses not to, can help those who are charged with building the next generation of security policies and laws. In other words, policies on security, privacy and even intellectual property need to be built around these newly emerging communities and boundaries, rather than enforcing old political and social boundaries that no longer make sense in the new modular world of networked communities. Presentations on Apache for Thursday included Greg Stein's session introducing WebDAV and Apache as well as Rob McCool's presentation on the Stanford University's project to deploy machine readable content on the web. Mads Toftum's session on doing URL manipulation using mod_rewrite, Mark Wilcox's session on implementing LDAP along with presentations on data management in Apache 2.0 by Cliff Woolley and performance tuning Apache by Thomas Wouters helped round out the afternoon.
No doubt the highlight for many of this year's ApacheCon attendees was the Closing Session where Ken Coar raffled off a number of goodies supplied by the conference vendors including books, AMD processors and other wonderful swag. But most important to those in attendance and to the Apache community at large was the announcement that 2003 will see two ApacheCon conferences: the return of ApacheCon Europe, which will occur in the spring at a location yet to be determined, and ApacheCon US, which will return to Las Vegas in November. Overall most attendees seemed impressed with the return of ApacheCon. While the production of the event was modest compared to previous conferences, the presenters and the presentations were of the same high quality one would expect. Indeed, with so many interesting talks it was easy to find people cutting out of one presentation to hear the end of another, and this report only mentions the more typical Apache topics available for attendees. Most importantly, ApacheCon has shown that it is still The Apache Event for Apache developers and users to come together and discuss everyone's favorite web server. Photos from ApacheCon 2002 Apache 2.0.43 Released Orton, Joe Apache 2.0.43 was released on the 3rd October 2002. This release addresses recent security issues on non-Unix platforms, some minor bugs found in the 2.0.40 release, and adds some new features. Apache 2.0.43 was released on 3rd October 2002 and is now the latest version of the Apache 2.0 server. The previous release was 2.0.42, released on the 24th September 2002. See what was new in Apache 2.0.42.
Apache 2.0.43 is available in source form for compiling on Unix or Windows, for download from the main Apache site or from any mirror download site. This is a security, bug fix and minor upgrade release. Due to security issues, any sites using versions prior to Apache 2.0.43 should upgrade to Apache 2.0.43. Read more about the other security issues that affect Apache 2.0. Fix the security vulnerability regarding a cross-site scripting vulnerability in the default error page when using wildcard DNS. Fix the exposure of CGI source when a POST request is sent to a location where both DAV and CGI are enabled. Fix the security vulnerability regarding some possible overflows in ab.c which could be exploited by a malicious server. The following bugs were found in Apache 2.0.42 and have been fixed in Apache 2.0.43: The UserDir directive has been fixed to again take a list of user names to enable userdir access for, as per 1.3. Flushing behaviour has been improved, to ensure that available response output is flushed when no new output is pending, helping streaming CGIs and other dynamically-generated content. mod_auth_ldap has been fixed to retry connections to the LDAP server if it becomes unavailable. Fix for a locking problem in mod_ssl's session cache code which could cause infinite loops on some platforms. Fixes for mod_cache to prevent a segfault when attempting to cache some combinations of content (for instance, when using SSI tags which execute CGI scripts), and to correct the CacheMaxStreamingBuffer directive for virtual hosts. The default server root directory in suexec has been fixed to match the default install root. mod_proxy was fixed to not strip WWW-Authenticate headers on 4xx error responses, which prevented server authentication from being performed via the proxy. A new module, mod_logio, has been added which allows logging of the number of bytes sent and received by the server. A -p option has been added to apxs to allow programs to be compiled using this tool.
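As an illustration of what the byte counts from mod_logio can be used for, here is a small Python sketch that totals them from an access log. It assumes a log format that ends with the two mod_logio fields, %I (bytes received) and %O (bytes sent), as in the combined format with those two fields appended. The sample log lines, addresses and the function name io_totals are invented for the example and are not taken from the release notes.

```python
def io_totals(log_lines):
    """Sum bytes received and bytes sent, assuming each line ends with '%I %O'."""
    total_in = 0
    total_out = 0
    for line in log_lines:
        # Split off the last two whitespace-separated fields only, so quoted
        # fields earlier in the line (request, referer, agent) are left alone.
        fields = line.rsplit(None, 2)
        if len(fields) != 3:
            continue  # too short to carry both counters
        _, received, sent = fields
        if received.isdigit() and sent.isdigit():
            total_in += int(received)
            total_out += int(sent)
    return total_in, total_out


# Hypothetical sample lines in the assumed format.
sample = [
    '192.0.2.1 - - [03/Oct/2002:10:00:00 +0000] "GET / HTTP/1.1" 200 1456 "-" "ExampleAgent/1.0" 312 1690',
    '192.0.2.7 - - [03/Oct/2002:10:00:05 +0000] "POST /form HTTP/1.1" 302 - "-" "ExampleAgent/1.0" 845 410',
]

print(io_totals(sample))
```

Lines that do not end in two numeric fields are skipped rather than guessed at, so the sketch can be run over a log that mixes formats without miscounting.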
Apache 2.0.40 Released
Cox, Mark J

Apache 2.0.40 was released on the 9th August 2002. This release addresses recent major security issues on non-Unix platforms, some minor bugs found in the 2.0.39 release, and adds some new features.

Apache 2.0.40 was released on 9th August 2002 and is now the latest version of the Apache server. This is the fourth stable release of Apache 2.0, following up on 2.0.39 which was released on 18th June 2002. Read our special feature for more information about the history of Apache 2.0.

Apache 2.0.40 is available in source form for compiling on Unix or Windows, for download from the main Apache site or from any mirror download site.

This is a security, bug fix and minor upgrade release. Due to security issues, any sites using versions of Apache 2 on Unix prior to Apache 2.0.39 should upgrade to Apache 2.0.40. Sites using any versions of Apache 2 on other platforms should upgrade to 2.0.40.

Certain URIs will bypass security and allow users to invoke or access any file depending on the system configuration.
A path-revealing exposure is present in multiview type map negotiation (such as the default error documents) where a module would report the full path of the typemapped .var file when multiple documents or no documents could be served.
A path-revealing exposure in cgi/cgid when Apache fails to invoke a script. The modules would report "couldn't create child process /path-to-script/script.pl" revealing the full path of the script.
The new features in this release (added since 2.0.39) are:
mod_rewrite can now set cookies using the CO extension.
Performance improvements for the code that reads request headers.
Proxy FTP now works over IPv6.
Changes to the internationalized error documents; they are no longer included by default in the sample configuration file.
Add a new directive, MaxMemFree, which sets the maximum amount of memory a particular child's allocator will hold on to for reuse. This directive is useful when uncommon large peaks occur in memory usage.
Support the -w flag to keep the Win32 console open on error.
Add the ability to enable or disable a filter via an environment variable.
Apache on NetWare will now pull requests off of the listen queue as fast as winsock will allow, without latency introduced by the accept mutex.
During installation Apache will preserve existing installation directories. Binaries, the build directory, the headers, and the man pages are all copied. Everything else (the config, htdocs, manual, error, icons, and cgi directories) is not installed if the directories already exist.

The bugs fixed in this release include:
Fix a long-standing bug in 2.0 where CGI scripts were being called with relative paths instead of absolute paths. Apache 1.3 used absolute paths for everything except for SuExec; this brings back that standard.
Restore the ability to specify host names on Listen directives.
Accept multiple leading /'s for requests within the DocumentRoot.
Fixed a mod_include error case in which no HTTP response was sent to the client if an shtml document contained an unterminated SSI directive.
Prevent infinite recursion if an ErrorDocument gets an error.
Fix segfault in mod_mem_cache most frequently observed when serving the same file to multiple clients on a multi-processor machine.
Various fixes to the experimental module mod_ext_filter including: look in the main server for filter definitions when running in a vhost if the filter definition is not found in the vhost; fix a segmentation fault if the content-type was not set; and ignore any content-type parameters when checking if the response should be filtered.
Fix infinite loop due to two HTTP_IN filters being present for internally redirected requests.
Fixed the Content-Length filter so that HTTP/1.0 requests to CGI scripts would not result in a truncated response.
Fix proxy so that it is possible to access ftp: URLs via a proxy chain.
Fix perchild to work with apachectl by adding -k support to perchild.
Fix the long-standing bug in ab where ab -t10 would loop for 10000 seconds instead of 10 as documented. Also fix an off-by-one-second error.
Fixed parsing of strings to longs, which allows HTTPD to deal with larger files correctly.
mod_deflate now checks to make sure that 'gzip-only-text/html' is set so that BrowserMatch can be used to control the module.
Add a filter_init parameter to the filter registration functions so that a filter can execute arbitrary code before the handlers are invoked. This resolves a problem where mod_include requests would incorrectly return a 304.
An error in the keepalive enumeration caused problems when mod_dav sent error responses.
Various minor fixes to the htpasswd utility.

The following platform-specific changes have been made:
Solved the reports of .pdf byterange failures on Win32.
Support WinNT CGI invocation through ScriptInterpreterSource 'registry' for script interpreter paths and names with non-ascii characters in the executable filepath.
Fix WinNT cgi 500 errors when QUERY_ARGS or other strings include extended characters (non US-ASCII) in non-utf8 format. This brings Win32 back into CGI/1.1 compliance, and leaves charset decoding up to the cgi application itself.
When deciding on the default address family for listening sockets, make sure we can actually bind to an AF_INET6 socket before deciding that we should default to AF_INET6. This fixes a startup problem on certain levels of OpenUNIX.

O'Reilly Open Source convention in San Diego
Weinstein, Paul

Paul Weinstein visited the five day O'Reilly Open Source Conference in San Diego this week and gave his highlights of the interesting news and events.

O'Reilly Open Source Conference: Day 3

The first two days of the conference were taken up by tutorials. Tim O'Reilly introduced the first keynote speaker for this year's Open Source Conference, Lawrence Lessig, as "my favorite keynoter." Lessig, a professor of law at Stanford Law School, is a vigilant defender of freeing content from the growing limitations of copyright law within the United States. He began by confessing that this would be his second to last keynote and that he therefore wanted to leave a four-part refrain with the audience:
Creativity and innovation always builds on the past
The past always tries to control the creativity that builds on it
Free society tries to protect the future by limiting the control of the past
Ours is less and less of a free society

It seems history has shown that creativity and innovation always build on the past. A perfect example of this property of culture, according to Lessig, can be seen in Walt Disney's "Rip, Mix and Burn[ing]" of fairy tale classics in the Twentieth Century.
Yet, at the same time, the past always tries to control what can be created. Again Lessig cited Disney, or in this case the Walt Disney Corporation, which has successfully lobbied for a number of the 11 total extensions of copyright law, extending the limitations on creative works from 17 years to 95 years. Thus the Walt Disney Corporation has kept others from doing to Mickey Mouse what Disney did to the Brothers Grimm.

Worse, according to Lessig, is that technology has helped in the expansion of control that the reworking of copyright law has started. A perfect example is Adobe's E-Reader, which limits the ability to cut-and-paste text, making it difficult for someone to even quote text for a research paper, something that is not only possible with print, but has been legally upheld as "fair use."

A silver lining would be that within a free society one could take a stand against these abuses. After all, who wants to live with Hollywood's "insane rules being applied to the whole world?" The problem is that applying direct pressure for change within the United States can be difficult. As retiring US Congressman JC Watts described it, "If you're explaining you're losing." Lessig then asked, "What have you done? How many of you have given the EFF more than you've given to the other side [for music CDs or movies on DVD?]"

This last refrain of Lessig's points to exactly why members of our free society should care about the limitations of our laws and technology: "never in our history has so few people controlled so much of our culture." What needs to be done is to "Free Culture" and "Create like it's 1790", when copyrights only extended to a narrow number of years and when copyright was understood to be a limitation on businesses and not a limitation on what individuals could do in creating culture.
A perfect stage was now set for the second keynoter of the morning, Richard Stallman (RMS), whom Tim O'Reilly admitted to butting heads with on occasion, but who had a "very creative way to deal with the problems of today." RMS took right to telling everyone, "Unlike some of you, I am not an open source developer. I'm an activist in the free software movement."

In the 1980s RMS was dealing with the death of the free community that he knew in the 70s. What choice did he have while all the operating systems were proprietary? His solution was to start the Free Software Foundation. "This was the only thing I could do," he conceded. RMS sees "a possibility of freedom" if "you make sure all of your software is free." While the strides with GNU/Linux have been great, the "job isn't done till all the software is free."

But what does RMS mean when he says the software has to be free? To this he listed four conditions that have to be met:
Freedom Zero is the right to be able to run the software any way you want
Freedom One is the ability to understand and change the software
Freedom Two is the ability to share the software, changes or no, with friends
Freedom Three is the ability to help build your community using the software

"Geeks like to think that they can ignore politics, you can leave politics alone, but politics won't leave you alone," RMS noted, echoing Dr. Lessig. "We have to reject" efforts by politicians such as DRM - Digital "Rights" Management. According to RMS, DRM isn't about rights; it's about theft, theft of our freedoms.

RMS then took the rest of his time to poke fun at the image that some people have about his attitude of being "holier than thou." After dressing himself in an outfit appropriate for a holy figure, RMS pronounced himself "Saint iGNUcius of the Church of Emacs" and provided a prayer to bless one's computer. One should "exorcise evil proprietary operating systems"; doing so would put one on the road to sainthood.
During lunch on Wednesday Tim O'Reilly took time to ask questions of RealNetworks Chairman and CEO Rob Glaser about Real's announcement that they will be providing parts of the code for their next generation media platform, Helix, to the open source community. Glaser first reviewed the announcement for the audience:
Helix is a platform for streaming media
Helix Community has been created for work on the components for this new platform
The client application source code will be available in 90 days with the encoder and server source code to come out at the end of 2002
Helix Universal Server, a commercial product from Real, delivers all types of media formats such as Windows Media, mp3, even Ogg Vorbis

When asked "Why now?" Glaser replied that within RealNetworks there has always been strong support for open source, but Real needed to make sure that open sourcing part of their code-base worked such that Real could still provide a value-added business on top of their base technology. Moreover, embracing the open source community helps make sure open standards such as RTP and RTSP are implemented properly.

Glaser continued by discussing the dual-licensing approach of using a GPL-inspired license called the RealNetworks Public Source License along with a Java-style license called the RealNetworks Community Source License, saying, "We studied a lot not just how to connect with the community, but also how to build a licensing model that would allow our commercial partners to build and maintain compatible applications."

O'Reilly Open Source Conference: Day 4

Thursday started with two keynotes about the role of open source technology in the world of Bioinformatics. Ewan Birney, of the European Bioinformatics Institute, started by giving a crash course on how Bioinformatics is a fusion of biology, data gathering, and computer science and technology. As an example Birney noted that one of EBI's projects is to provide the Human Genome data for all to see.
In doing so EBI uses a combination of open source technologies such as MySQL, Linux, Perl, Python, Apache and mod_perl. While the code developed to run the site is available under a BSD-style license, the greater result is that the 3 Gigabytes of information that details how to make a human is open to anyone, without restriction.

Jim Kent, a research scientist at University of California, Santa Cruz, continued by noting "I don't think you can have science without open source." Kent observed that the practices of science and those of the open source community are virtually the same: "People can't do [reproduce meaningful results] unless they can see your source" and peer review helps generate better science as well as better software.

O'Reilly Open Source Conference: Day 5

Friday, no doubt, was the day that made the conference for many attendees as they saw how open source can assist in the production of movies such as The Lord of the Rings trilogy, heard Bruce Sterling rant about the computer industry and watched Bruce Perens keep himself from being fined half a million dollars for breaking the DMCA - Digital Millennium Copyright Act.

Milton Ngan from Weta Digital, the special effects house created by Peter Jackson, helped open the final day by discussing how open source tools are used to produce The Lord of the Rings. First, however, he entertained the audience by providing a preview of the next Lord of the Rings release, The Two Towers.

In creating effects for a movie the first step for Weta is scanning in the whole film for digitization, "a process that takes two weeks," according to Ngan. The production system consists of 125 SGI machines running Irix, 200 Linux machines and 25 NT boxes. Rendering an effect completely takes around 20 hours, and the result is then played back on a handful of Macintoshes for review. Once finished it takes another 2 weeks to transfer back to film. The open source tools Weta uses include Perl and MySQL for data storage and manipulation.
Ngan also noted "Apache and PHP are used for running [Weta's] Intranet." Using open source tools in such a rugged environment "pushes the boundaries, which helps solidify the tools."

Weta Digital indeed tries to give back to the open source community when possible, but Ngan noted that there is little sharing of tools within the Computer Graphics Imagery industry: "everyone has created their own solution." Moreover, while Weta does own the tools it created and New Line Cinema owns the images created by those tools, the focus and dedication of resources is on the post-production work for Lord of the Rings. If Weta Digital is not selected for any other production work it will simply cease to exist, thus limiting the resources available to prepare their code for release to the community.

Bruce Sterling started his talk on "A Contrarian Position on Open Source" by conceding that he was the token novelist, a non-programmer, talking to programmers about how to program, something akin to "a non-miner going down a mine and asking, 'Why don't you take some time to plant something down here and brighten the place up?'"

Sterling took an opposing view to the "Cathedral and the Bazaar" metaphor, which relates the open source methodology, or "bazaar", to the commercial "closed-source cathedral." "It's not really about a bazaar. Open Source is about hanging out with the cool guys - very tribal and very fraternal." This means the price for using open source software such as Linux is "having to spend time with Linux Geeks." In fact if open source technology is analogous to anything it's "just like in a refugee camp, one puts in a long amount of time for nothing."

But then again, what is the alternative? Foreshadowing Bruce Perens' talk, Sterling observed that a computer running Microsoft Windows is more akin to an airport. There are "men with automatic weapons, surveillance cameras all over the place.
You can't sob as you kiss your mother goodbye at the airport, because it's all on videotape. Then a security check assumes you've swallowed dynamite and will kill anyone you see. All the while attendants ask you snidely 'Where do you want to go today?' As if they're doing you some sort of favor."

The real problem is that "the computer industry wants to be hot and sexy." 'Information wants to be Free' or 'Information is the Economy' are slogans heard all the time. Yet this isn't what computers are about, freeing information or making money. "Computers are about relationships"; they are an enabling technology, not an end unto themselves.

Days before Bruce Perens, who currently works as a Senior Strategist and Evangelist of Linux and open source software with Hewlett-Packard, was scheduled to talk, he started making the news with his plan to violate the DMCA by describing how to work around DVD player controls. Since the DMCA prohibits making information available on how to circumvent copyright controls, HP asked Perens to take a pass on opening himself and HP to litigation. "I care more about this than getting myself fired," Perens stated, "but the fact is that getting myself fired today would hurt Hewlett-Packard's Linux program."

With the disclaimer that the talk he was about to present was his own personal opinion and not that of HP, Perens voiced some of the problems he sees in the computer industry. His desire to discuss how to work around DVD controls such as the 'Zone Coding' constraint systems, which limit the geographical region a DVD can be viewed in, was designed to highlight how the DMCA "has no exception for fair use" and removes the personal choice of allowing someone to "purchase a DVD in England on vacation and watch it at home in America."
Perens continued by stating his concerns with Microsoft's Palladium initiative, which "is built on the assumption that the computer user can't be trusted, thus your own computer must prevent you from doing harm" and could be the "end of open computing." After all, how can one run a system akin to Linux when a "chip on the motherboard mediates your access to information" and "all digital content is encrypted for mediation by the chip"? People may not even be able to print out information from a web page for use away from their computer without paying a fee. The "unpleasant sociopolitical implications are that this Supply-Side Thinking that dominates politics today devalues the customer, citizen, individual."

Perens then picked up the common theme from those before him. "What Can You Do?" his presentation slide asked. Since "policy affects all of us and since we as individuals don't get the choice of voting with our wallets," we need to make our voice heard the 'old-fashioned way'. "Become pen pals with your politician - use paper not email, vote" and, probably most importantly, "talk about this to the people around you."

Apache 1.3.26 Released
Cox, Mark J

Apache 1.3.26 was released on the 18th June 2002. This release addresses a recent security issue, some minor bugs found in the 1.3.24 release, and adds some new features.

Apache 1.3.26 was released on 18th June 2002 and is now the latest version of the Apache 1.3 server. The previous release was 1.3.24, released on the 22nd March 2002. See what was new in Apache 1.3.24. Apache 1.3.25 was never released.

Apache 1.3.26 is available in source form for compiling on Unix or Windows, for download from the main Apache site or from any mirror download site.

This is a security, bug fix and minor upgrade release.
Due to security issues, any sites using versions prior to Apache 1.3.26 should upgrade to Apache 1.3.26. Read more about the other security issues that affect Apache 1.3.

Fix the chunked encoding security vulnerability.

The main new features in 1.3.26 (compared to 1.3.24) are:
Add text/xml, application/xhtml+xml, audio/mpeg, and video/quicktime mime types to the mime types magic file.
Added a -F flag which causes the supervisor process to no longer fork down and detach and instead stay attached to the tty. This allows integration with daemontools.

The following bugs were found in Apache 1.3.24 and have been fixed in Apache 1.3.26:
Allow child processes sufficient time for cleanups by making ap_select in reclaim_child_processes more "resistant" to signal interrupts.
In Darwin, place dynamically loaded Apache extensions' public symbols into the global symbol table. This allows dynamically loaded PHP extensions.
Fix for a problem in mod_rewrite which would lead to 400 Bad Request responses for rewriting rules which resulted in a local path. Note: This will also reject invalid requests as issued by Netscape-4.x Roaming Profiles (on a DAV-enabled server).
Recognize platform-specific root directories (other than leading slash) in mod_rewrite for filename rewrite rules.
Disallow anything but whitespace on the request line after the HTTP/x.y protocol string to prevent arbitrary user input from ending up in the access_log and error_log. Also, control characters are now escaped.
A large number of fixes in mod_proxy including: adding support for dechunking chunked responses, correcting a timeout problem which would force long or slow POST requests to close after 300 seconds, adding "X-Forwarded" headers, dealing correctly with the multiple-cookie header bug, the ability to handle unexpected 100-continue responses sent during PUT or POST commands, and a change to tighten up the Server header overwrite bug-fix.
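The request-line fix described above combines two ideas: reject a request line that carries anything other than whitespace after the HTTP/x.y protocol string, and escape control characters before text reaches a log file. The Python sketch below illustrates both under simplifying assumptions. It is not Apache's parser, the names parse_request_line and escape_for_log are hypothetical, and the \xhh escape style is chosen here for illustration rather than taken from Apache's log format.

```python
import re

# Method, request target and protocol version, followed by optional
# trailing spaces or tabs and nothing else.
REQUEST_LINE = re.compile(r"^(\S+) (\S+) (HTTP/\d+\.\d+)[ \t]*$")


def parse_request_line(line: str):
    """Return (method, target, protocol), or None when anything other
    than whitespace follows the HTTP/x.y protocol string."""
    match = REQUEST_LINE.match(line.rstrip("\r\n"))
    return match.groups() if match else None


def escape_for_log(text: str) -> str:
    """Replace ASCII control characters with \\xhh escapes so that a
    client cannot inject line breaks or terminal codes into a log."""
    return "".join(
        f"\\x{ord(ch):02x}" if ord(ch) < 0x20 or ord(ch) == 0x7F else ch
        for ch in text
    )
```

A well-formed line such as "GET /index.html HTTP/1.0" parses into its three parts, while the same line with extra text appended after the protocol string is rejected.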
Apache 1.3.24 Released
Cox, Mark J

Apache 1.3.24 was released on the 22nd March 2002. This release addresses a security flaw on Windows, some minor bugs found in the 1.3.23 release, and adds some new features.

Apache 1.3.24 was released on 22nd March 2002 and is now the latest version of the Apache server. The previous release was 1.3.23, released on the 24th January 2002. See what was new in Apache 1.3.23.

Apache 1.3.24 is available in source form for compiling on Unix or Windows, for download from the main Apache site or from any mirror download site.

This is a security, bug fix and minor upgrade release, with a few new features. Users should upgrade if they are running on Windows, will be affected by the particular bugs mentioned below, or would like to use any of the new features. Due to security issues, any sites using versions prior to Apache 1.3.22 should upgrade to at least Apache 1.3.22. Read more about all the security issues that affect Apache 1.3.

Apache for Win32 before 1.3.24 allows remote attackers to execute arbitrary commands via parameters passed to batch file CGI scripts. More details are in Apache Week issue 288. The problem occurs because the input is not properly validated; it is possible to append commands as parameters to the batch file CGI script and have the shell interpreter execute them.
The characters % and \r have been added to the dangerous Win32/OS2 characters list, and the command line is now passed to the interpreter double quoted.
In addition Apache now introduces earlier identification of command.com vs cmd.exe, and treats command.com as a 16-bit application.
As additional protection in case future CGI argument vulnerabilities are discovered, a new directive CgiCommandArgs off has been added to allow administrators to completely disable the query argument passing mechanism in Apache.
A bug was found that could cause invalid hostnames to appear in Apache log files. If a double-reverse lookup was performed (for example for Allow from .example.com) but failed, then a spoofed dns-reverse-address could appear in the logs. Note this bug doesn't give any access to protected resources; it only affects what gets written to the log file.

The main new features in 1.3.24 (compared to 1.3.23) are:
Add IgnoreCase keyword to the IndexOptions directive to allow filename listings to ignore case.
The proxy code read chunks from the backend server in a hardcoded amount of 8192 bytes. A new directive ProxyIOBufferSize has been added to specify the size of the read buffer from the remote server.
Previously the proxy would wait until the response had been delivered to the client completely before closing the backend connection. Now the backend connection is closed as soon as the last byte is read from it, freeing up resources.
mod_alias writes a warning to the error log if it fixes up an incomplete redirection target (such as turning /foo into http://host/foo). Since this is a supported operation the message has been demoted so that it will only show up at LogLevel Debug.
When using mod_proxy to access FTP sites it was impossible to reach a higher directory than the logged in directory, as combinations of /../ are interpreted by the browser and not sent to the server. This problem affects other proxies as well. The Squid proxy uses a "Squid %2f hack" which has been adapted to work in Apache.
By prepending /%2f to the path of your request, you can make the proxy change the FTP starting directory to / instead of starting at the home directory for the logged in user.

The main new features that apply to specific platforms are:
Provide new logging to help Win32 users debug CGI scripts. When at LogLevel info the cgi command invoked is logged. When at LogLevel debug the environment variables are also logged.
Added a logging module for NetWare, mod_log_nw, as NetWare is unable to use the RotateLog utility.
Added a -e command line directive for NetWare to force all fatal configuration file errors to the logger screen. This allows Apache to shut down cleanly and completely on an error condition.

The following bugs were found in Apache 1.3.23 and have been fixed in Apache 1.3.24:
Fix a segfault condition in mod_include which could be triggered by improper termination of conditional directives such as #if.
Fix a problem in mod_proxy where the Server header from the backend system would be replaced by one from Apache. This violated RFC2616. This fix has introduced a further issue which allows modules to override the Server header, but this will be fixed in the next release.
There is a problem in mod_proxy where each entry of a duplicated header such as Set-Cookie would overwrite the previous value of the header, resulting in multiple header values (like cookies) going missing. A fix was committed to 1.3.24 but doesn't fix the problem.
Fixes to apxs to allow the -S option to contain quotes, and to rebuild apxs when options have been changed.
The Location response header, used for external redirects, must be an absolute URI. The Redirect directive tested for that, but RedirectMatch did not and would allow almost anything through.
Fix a longstanding bug that errors returned by src/Configure would not be noticed by the top level configure script.
That was bad for automated production environments, as errors would pass through unnoticed.
mod_proxy would send a HTTP/1.0 request even though it is now compliant with HTTP/1.1.
A number of other changes have been made to FTP handling in mod_proxy including properly escaping file names from directory listings, a cleanup to the output HTML, the output of directory listings in ASCII to avoid issues with EBCDIC servers, and the closing of the data and control channels to the server properly.
Previous fixes to mod_rewrite in Apache 1.3.23 broke the ability to do random balancing.

The following bugs relate to specific platforms:
The Win32 port has had the remaining cases of blocking network IO eliminated.
A change has been made on TPF to make the ap_open_logs call the same as other platforms and prevent a possible SIGPIPE in standalone_main.
Work around a bug in Windows XP that caused data corruption on writes to the network.
The support for enabling pthreads-based accept() serialization using the AcceptMutex configuration directive suffered from a serious problem on Solaris platforms as the pthreads library was not being linked into the httpd executable. This meant stub versions of the mutex functions were used from the C library, which resulted in no serialization being enforced.

Apache 1.3.23 Released
Cox, Mark J

Apache 1.3.23 was released on the 24th January 2002. This release addresses some minor bugs found in the 1.3.22 release, and adds some new features, including HTTP/1.1 support for mod_proxy.

Apache 1.3.23 was released on 24th January 2002 and is now the latest version of the Apache server. The previous release was 1.3.22, released on the 12th October 2001. See what was new in Apache 1.3.22.
Apache 1.3.23 is available in source form for compiling on Unix or Windows, for download from the main Apache site or from any mirror download site.

This is a bug fix and minor upgrade release, with a few new features. Users should upgrade if they will be affected by the particular bugs mentioned below, or would like to use any of the new features. Due to security issues, any sites using versions prior to Apache 1.3.22 should upgrade to at least Apache 1.3.22. Read more about security issues that affect Apache 1.3.

The main new features in 1.3.23 (compared to 1.3.22) are:
HTTP/1.1 support has been added to mod_proxy after being backported from the Apache 2.0 updates started last April. The updates include support for Cache-Control, content negotiation using Vary, persistent connection handling, and much more.
A new directive, FileETag, allows the format of the ETag to be controlled via runtime directives. Find out more about this new feature.
Addition of a 'filter callback' function to enable modules to intercept the output byte stream for dynamic page caching.

The following bugs were found in Apache 1.3.22 and have been fixed in Apache 1.3.23:
Fix incorrect Content-Length header in 416 "Range Not Satisfiable" responses.
Revert mod_negotiation handling of path_info and query_args to the 1.3.20 behavior.
Prevent an Apache module from being loaded or added twice due to duplicate LoadModule or AddModule directives.
Add run-time validation of the Group directive, to catch invalid but syntactically correct values.

The following bugs relate to specific platforms:
Versions of FreeBSD from August 2000 include a feature called "accept filters" which delay the return from accept() until a condition has been met. Apache will now use the "httpready" accept filter rather than "dataready" on FreeBSD after 4.1.1-RELEASE where it works correctly. More details of accept filters are available.
Some fixes for NetWare including link problems with mod_vhost_alias, file locking updates to get mod_auth_dbm to work, and a problem when accessing an empty directory which has option indexes specified producing an access forbidden message.
On HPUX 11, an "ENOBUFS, No buffer space available" error occurs when an accept() cannot complete. This error is now ignored so that child processes don't get incorrectly terminated.
Win32 platforms would incorrectly always return forbidden in response to an OPTIONS * request.
Unixware 7.0 and later did not have a default locking mechanism defined. This bug was introduced in Apache 1.3.4.
A number of fixes for Cygwin including a better default mutex as well as better proxy and DBM support.
A bug on Win32 could cause Apache to stop responding to requests for a period of time if the MaxRequestsPerChild directive was set to anything other than 0. MaxRequestsPerChild of 0 is the recommended setting.
Win32 will now output an error message if the server hits the ThreadsPerChild limit. This is useful for administrators to detect when their server is running out of threads to handle requests.

Featured Articles 2001
Tsan, Min Min

Our selection of the best featured articles from our weekly newsletters. Read about everything from "Apache and Tomcat" to "Apache Security".

Each week Apache Week brings you our pick of the best Apache related articles from around the web. In this special feature we select our favourites from each category.

The Developer Shed kicks off the new "Getting More Out Of Apache" series with virtual hosts and Server-Side Includes.
In part 2 of "Getting More Out Of Apache", the Developer Shed shows you how to implement basic user authentication and set up access control groups. It also talks about Apache logging capabilities and the powerful URL rewriting module.
"Setting up Apache with mySQL, Frontpage 2000 Extensions, and PHP NHF" is a Newbieized Help File (NHF) written by Dallas Engelken for newbies to get Apache up and running with Frontpage support in no time at all.
In "Linux for Newbies, part 22", Gene Wilburn stresses the benefits of compiling Apache and any related modules by hand. Instructions are given for removing existing Apache and PHP from one's system before compiling them again from source. By doing this, users control how the packages are built and choose the locations for the various parts.
If you prefer to build Apache from source manually, you may be interested in Apacompile, which is a set of instructions and examples for compiling Apache and other common modules such as mod_ssl, mod_auth_ldap and mod_php. There are still some configuration samples yet to be completed.
For those using Mac OS, here's a straightforward step-by-step tutorial on building Apache 1.3.22 and PHP 4.0 for Mac OS X 10.1. However, the instructions don't include integrating mod_perl or mod_ssl.
The Developer Shed presents step-by-step instructions for building Apache, MySQL, WebDAV and PHP on Mac OS X. All these programs compile and run on Mac OS X due to its BSD-based UNIX core known as Darwin. To avoid confusion, the Apache Web server built is not enhanced with mod_ssl.
Noel Davis looks at how to overcome an Apache on Mac OS X security issue which only affects those who store files on Mac OS X's HFS+ file system. Three workarounds are available for this problem.
Kevin Hemenway unravels the mystery of the built-in Apache web server that comes with Mac OS X in his first article of a new series about serving web pages from a Mac. You'll learn how to start up Apache, access your personal home page, locate Apache's DocumentRoot, and customise the default web page. This is just the appetiser - there is more to come in the next installment when Kevin gets down to the crux of maintaining a full-fledged web site.
Apache on Windows NT, how does it compare to Apache on UNIX or other web servers such as IIS? Apache Today has the answer. Windows users who are interested in using Apache but are discouraged by the apparent lack of online information about this topic may like to check this out. "A Feather in Your NT Cap" persuades users running Microsoft's Internet Information Server (IIS) on Windows NT to migrate to Apache on NT. It lists the three limitations of Apache's ISAPI implementation, describes two main ways of installation, gives an overview of the configuration, and shows you how to start Apache as an NT service. At WebTechniques.com, Jim Jagielski has a few tips for those who are providing web-hosting services in "Customer Number One". He looks at two methods for Apache on how to provide every customer with dedicated server performance and quality guarantees in a shared server environment as if he or she is the only customer. The first uses mod_throttle to control various parameters, such as the number of requests or the total bandwidth used on a per server, virtual host, location, directory or user basis. The second allows CGI scripts to execute under its own user and group ID using suExec. He also discusses the pros and cons of running multiple instances of Apache simultaneously. "Save Your Site from Spambots" teaches you how to use mod_rewrite to redirect "spambots", software packages that crawl the Web harvesting e-mail addresses and adding them to bulk e-mail lists, to a specific page that has "special" messages just for them. Since this method uses the content of the User-Agent: HTTP header to identify the "spambots", it won't prevent "spambots" that masquerade as other browsers from scraping e-mail addresses from your web site. Other solutions are presented as well and the one recommended is "spamtraps" - special addresses that are solely used for catching spammers. 
The author concludes that the best way to combat unwanted bulk e-mail is to immediately report spam to the ISP from which it originates as many times as it takes until the ISP takes the necessary actions. The administrators at evolt.org are "Using Apache to stop bad robots". In a short article they show how they capture robots that not only ignore the robots.txt file, but deliberately try to index files they are told not to. Morbus Iff develops a "Search Engine Friendly SSI Image Gallery" in his article on evolt.org. The article shows how to create a dynamic image gallery, using only the features built into a core distribution of Apache. WebmasterBase.com looks at the pros and cons of three methods of passing information to your web pages without the use of a query string so that your web site has search engine-friendly URLs. The methods are the implementation of PATH_INFO, .htaccess error pages, and the ForceType directive, and have been tested using PHP with Apache on Linux but they should also work on other platforms. Information Security Magazine presents an article on improving Apache and a case study on companies that swear by (not at) Apache in its April issue. It starts off by refuting the mindset that running Apache guarantees security although it readily admits that Apache deserves its reputation for being a secure Web server. Then it provides the steps for installing Apache and mod_ssl, securing the underlying Linux server, and testing Web applications for vulnerabilities. Sys Admin magazine presents Apache::Motd, an Apache module based on the "Message Of The Day" utility found on UNIX systems. It intercepts a user's initial request and displays the contents of the motd file before serving the requested page. Carlos Ramirez, its creator, walks us through the installation and configuration process. Linux Gazette provides three different options to redirect a request to another virtual host running on the same webserver.
If you want to distinguish yourself from the boys, the solution is to use mod_rewrite under a Virtual Host container. It also shows you how to achieve the same results using a Perl script or the Redirect directive. "Apache CodeRed Countermeasures with PHP: codeRedKiller!" provides a solution on how to prevent Code Red requests from reaching your Apache Web server by using PHP and bash. Basically it uses a PHP script to record the source IP address of the request and then runs a shell script to set up a filter in your firewall to block any further requests from the same source. You could use a simple shell script to parse your Apache error log to obtain the source IP address instead of using PHP. This article also advises you to ensure that the source IP address is not spoofed. The drawback is that all other valid requests from the source IP address will be stopped from reaching your web server permanently until you remove the filter. Fancy a role in Episode 2, Attack of the Code Red 2 Worm? No, this is not a new B-grade movie but how you can be a good internet citizen and let people know that their server has been infected by the Worm. One way is by using Apache::CodeRed written by Reuven M. Lerner. In this article, he explains how the module intercepts requests for /default.ida, determines the host name of the HTTP client, sends only one warning e-mail message in a 24-hour period to SecurityFocus and the administrator of that client, and keeps a list of IP addresses to be ignored. Interested in setting up your own Net radio stations? Start then by reading this introduction to mod_mp3, a module that optimises the Apache Web server for streaming MP3s. Although mod_mp3 is still in its infancy, it already supports file-sharing and all the basic webcasting functions, with many more ambitious features in the pipeline. 
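The warning policy summarised above for Apache::CodeRed (react only to requests for /default.ida, skip addresses on an ignore list, and send at most one warning per client in a 24-hour period) can be sketched in a few lines. This is an illustrative Python sketch, not the module's own Perl code; the class name WarningThrottle, the method should_warn, and the example addresses are hypothetical, and actually sending the e-mail is left out.

```python
import time

# "only one warning e-mail message in a 24-hour period", expressed in seconds
WARNING_INTERVAL = 24 * 60 * 60


class WarningThrottle:
    """Decide whether a warning should be sent for a given client (hypothetical helper)."""

    def __init__(self, ignored=(), interval=WARNING_INTERVAL):
        self.ignored = set(ignored)   # IP addresses that must never trigger a warning
        self.interval = interval
        self.last_warned = {}         # IP address -> time of the last warning

    def should_warn(self, ip, path, now=None):
        if now is None:
            now = time.time()
        # Only requests for /default.ida (with or without a query string) count.
        if path.split("?", 1)[0] != "/default.ida":
            return False
        if ip in self.ignored:
            return False
        last = self.last_warned.get(ip)
        if last is not None and now - last < self.interval:
            return False              # already warned within the interval
        self.last_warned[ip] = now
        return True


throttle = WarningThrottle(ignored={"127.0.0.1"})
print(throttle.should_warn("192.0.2.10", "/default.ida?NNNN", now=0))     # first request
print(throttle.should_warn("192.0.2.10", "/default.ida?NNNN", now=3600))  # one hour later
```

Passing the time in explicitly, as in the usage lines, keeps the policy easy to test; a real handler would simply omit the now argument.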
Chris Bush explains the basics of Tomcat configuration and includes instructions for integrating Tomcat with Apache in "Linux as an Application Server - The Tomcat Way". A good read for those interested in supporting Java Servlet 2.2 and JSP 1.1 with Apache Web Server. "JSP Quick-Start Guide" has been updated recently for use with Apache 1.3.22, Tomcat 4.0.1, and mod_webapp which is the new Apache connector module for Tomcat 4.x. This step-by-step tutorial shows you how to set up and run a JSP-enabled server under Windows. By the end of this, you'll have a basic JSP page working smoothly. This week, it's Apache and Tomcat again as Robert Eksten shows us how to set up Tomcat as an Apache add-on using mod_jk instead of mod_jserv. It is relatively simple as it only installs prebuilt components and the steps do not involve compiling source code. In "The Apache XML Project: How To Get Read All Over", Software Development magazine walks you through a project that uses Java, Jakarta Tomcat and Cocoon to serve XML documents. Lawrence Teo explains how to set up a web-based archive for a mailing list in Issue 72 of Linux Gazette. He uses Apache as the web server, Hypermail to convert the e-mail messages stored in a UNIX mailbox file to a set of cross-referenced HTML files, and cron to update the web-based archive periodically. He assumes that those three components have been installed on your system so only the instructions on how to configure them are provided. At LinuxWorld.com, Joshua Drake gives a guide on "How to save an Apache log file in a PostgreSQL database". The article gives a step by step guide to using the pgLOGd program with Apache. Introduction to WML, Apache, and PHP is a good starting point for developing PHP-enhanced WML applications on the Apache Web Server. Instructions are given on configuring Apache to accept and serve WML enabled decks. By the end of this, you will have your first 'simple' wireless page. 
PHPBuilder takes a look at "using Webalizer to analyze Apache logs". Webalizer is a freely available log analysis tool written in C that is designed for speed; even on a modest machine it can handle tens of thousands of log lines a second. However it can be tricky to get Webalizer installed, so this article takes you step by step through how to get it installed and running. "You Can Get There from Here" part 1 and part 2 show you how to install, configure, and use Squirrelmail on your PHP4 enabled Apache web server. For better security, you can run Squirrelmail on an SSL-enabled Apache web server or implement Apache's basic authentication. "You Can Get There from Here, Part 5" shows you how to install, configure, and use Rolodap on your PHP4 enabled Apache web server. You need to compile PHP4 with LDAP support for this. In case you hadn't guessed it from the name, Rolodap is an electronic version of the traditional desktop rotary file of cards, usually used for registering contact information. John Lim presents his compilation of 22 tips on "Tuning Apache and PHP for Speed on Unix" in PHP Everywhere. The tips can be applied to Perl and Python too. In "Tuning Your Apache Web Server", Don MacVittie shows us how to configure the directives in the httpd.conf file to achieve maximum performance. Users have to ensure that their hardware can support the volume of connections they are aiming for, before starting with the optimisation. As there are no hard and fast rules for tweaking the settings, the best configuration is obtained by trial and error - benchmarking the server after changing the directives each time. Ibrahim F. Haddad explains the results he got for testing the performance of three open-source web servers: Apache, Jigsaw and Tomcat on his experimental Linux cluster platform. He performs four types of tests, each with a different server and on 1, 2, 4, 6, 8, 10, and 12 CPU systems but only presents three comparison cases: Apache 1.3.14 vs.
Apache 2.08a on one CPU, Apache 1.3.14 vs. Apache 2.08a on eight CPUs and Jigsaw 2.0.1 vs. Tomcat 3.1 on one CPU in this report. His conclusion is that Apache is considerably faster and more stable than the other web servers. Are your Web servers up to the strain of real-world usage? "HTTP Benchmarking" describes a sample benchmarking setup and shows you how to use httperf and Autobench to stress-test your systems. Joe "Zonker" Brockmeier walks you through the process of setting up and running a few benchmark tests against Apache using autobench and httperf in "HTTP Benchmarking, Part 2". The tests are performed on both the Debian x86 and SPARC distributions but will apply to any UNIX-based OS running Apache. In "HTTP Benchmarking, Part 3: Tips and Tweaks", Joe "Zonker" Brockmeier shows you how to tweak the Apache Web server to improve performance. Although he focuses on Linux systems, some of the tips can be applied on other systems as well. In "Performance Tuning by Tweaking Apache Configuration", Stas Bekman demonstrates how to fine-tune the MinSpareServers, MaxSpareServers, StartServers, MaxClients, and MaxRequestsPerChild directives to maximise the usage of your system resources and to ensure good performance. He uses the ApacheBench (ab) utility to benchmark the Apache Web server with around ten different combinations of parameter settings in the tweaking process. Jeffrey Carl gives a few tips on handy tools to use when troubleshooting server problems in "The Web Server First Aid Kit". Its approach can be applied to most Unix and Linux systems but it occasionally refers specifically to the Apache Web Server. Some of the problems it tackles are: figuring out the cause of slow response from server, unauthorized entry, and network misconfiguration. eWEEK Labs' latest Web server benchmark tests show that Apache 1.3.19 running on Linux displayed a huge 2.5 factor speedup in just two years of development time. 
Sys Admin magazine describes how to build an affordable load balancing cluster using the Apache HTTP server and the Apache JServ Java application server. It also provides some interesting benchmark test results. Last November (Apache Week issue 224), we mentioned that APR (Apache Portable Run-time) has been spun off into a separate project. In "Aid From APR", Ryan Bloom explains its advantages and illustrates his point by comparing an APR segment of code with the native code. In CNet Builder.com, it's Ryan Bloom again as he talks about how Apache 2.0 is more than a web server as it has the potential to serve any protocol. He reveals the benefits of using a single server for multiple protocols and the way to implement it using Apache 2.0. Ryan Bloom kicks off a new series of columns about Apache 2.0 for O'Reilly Network readers with his first column - "Installing Apache 2.0". This piece proves to be merely a rehash of his previous Apache 2.0 articles except for a mention of mod_tls. In "Migrating from Apache 1.3 to Apache 2.0", Ryan Bloom shares his experience of porting the apache.org web server to Apache 2.0 with O'Reilly ONLamp.com's readers. He gives some tips on which Multiprocessing Module (MPM) to use, implementing filters, and how to solve the problem of IPv6 support. O'Reilly ONLamp.com brings you the latest information about filters for Apache 2.0 in Ryan Bloom's column. This article is just an introduction to the subject, covering some of the basic concepts of filtered I/O which is the ability for one module to modify the output of an earlier module, listing three standard filters included in the basic Apache distribution, and explaining what filter types are. Meanwhile, "Writing Apache 2.0 Output Filters" gives enough information for a developer to be able to write an output filter from scratch. According to Ryan, developers have improved the interface over the past few releases so that the complex task of writing filters becomes easier.
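The basic idea of filtered I/O, where one stage modifies the output produced by an earlier stage, can be illustrated outside Apache. What follows is a conceptual Python sketch only: it is not the Apache 2.0 filter API (which is written in C), and every name in it (content_generator, uppercase_filter, byte_count_filter, run_chain) is made up for illustration.

```python
def content_generator():
    """Stand-in for a module that produces the response body in chunks."""
    yield "hello, "
    yield "world\n"


def uppercase_filter(chunks):
    """A filter that rewrites each chunk produced by the earlier stage."""
    for chunk in chunks:
        yield chunk.upper()


def byte_count_filter(chunks, stats):
    """A filter that passes data through unchanged while counting bytes."""
    for chunk in chunks:
        stats["bytes"] = stats.get("bytes", 0) + len(chunk.encode("utf-8"))
        yield chunk


def run_chain(source, filters):
    """Feed the source through each filter in order and collect the output."""
    stream = source
    for output_filter in filters:
        stream = output_filter(stream)
    return "".join(stream)


stats = {}
body = run_chain(
    content_generator(),
    [uppercase_filter, lambda chunks: byte_count_filter(chunks, stats)],
)
print(body, end="")      # HELLO, WORLD
print(stats["bytes"])    # 13
```

Each filter sees only the chunks handed to it by the previous stage, which is why the content generator needs no knowledge of what happens to its output afterwards.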
Moving on from output filters, Ryan Bloom explains about writing input filters in his latest article in the Apache 2.0 series. He highlights three differences between input and output filters, covers the ap_get_brigade function, and walks readers through an example input filter in detail. After reading this, you can start writing your own input filters. In Ryan Bloom's swan song for the Apache 2.0 Basics series, he talks about one of the least publicised new features in Apache 2.0 which is allowing one module to call into another module to execute an operation. In Apache 1.3, for two modules to execute the same operation, the feature has to be implemented in both of the modules, making synchronisation of changes a tedious task. He uses the mod_include and mod_cgi modules to illustrate his points. In "Apache 2.0: The Internals of the New, Improved A PatCHy", Ibrahim F. Haddad gives an overview of Apache 2.0 and shares with us the results of his Apache 2.0.8 performance tests. In conclusion, he highly recommends that current Apache 1.3.x users upgrade to Apache 2.0 once the release version is available. Please refer to "Apache Portable Runtime Project" and multiprocessing modules (MPMs) if you require more information about these two subjects. "Learning PHP: The What's and the Why's" is the first article in a new series that aspires to teach everything about PHP, beginning with the basics of PHP to advanced subjects such as databases and XML support. This introductory piece briefs us on what PHP is, its history, and the reasons for choosing it over other languages. Make a trip down memory lane with Rasmus Lerdorf, creator of PHP as he guides us through PHP's origin, usage, syntax, and features in "Scripting the Web with PHP". It provides a good overview on all that PHP has to offer with simple examples that illustrate the concepts clearly. 
The topics covered are the four different PHP tag styles, ways to install PHP, how PHP handles variables and errors, manipulates strings, connects to relational databases, generates content in formats other than HTML, and manages sessions. He advises that the best way to learn PHP is to use it. While PHP is easy to learn, it is another story when it comes to getting it right. In his three part article series, Sterling Hughes imparts some advice on how to prevent 21 common mistakes made by PHP programmers. It is worthwhile to read through the list of textbook, serious, and deadly mistakes, and give yourself a pat on the back if you have managed to avoid all of them. "Best Practices: PHP Coding Style" stresses the importance of having coding standards and sheds some light on the PHP PEAR Project. Find out more about mod_perl in the first of a series of updated articles by Stas Bekman. "Why mod_perl?" intends to entice you to give it a try by revealing mod_perl's popularity and presenting a few well-known sites that are powered by it. Now that you're hooked, you'll be glad to know that it only takes 30 minutes to get started with mod_perl and here's how to do it. Take23 shows us how to use Apache::PortCorrect (a Perl module) to redirect users from a nonsecure port over to a secure SSL port based on the URL that they are trying to access. This article is for those who are more at home using mod_perl with the Apache Web Server and mod_ssl than setting up a set of mod_rewrite rules to perform the same task. Stas Bekman talks about improving mod_perl performance. He starts off with choosing the right operating system and hardware in part I, comparing various benchmarking tools in part II and now in part III, he continues with code profiling and memory measurement techniques. In "Improving mod_perl Driven Site's Performance - Part IV", Stas Bekman delves into the benefits of using shared memory, and calculates the size of a process' shared memory and the real memory used.
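The arithmetic behind this kind of estimate is straightforward: memory shared between the server processes is counted once, and only the unshared part of each child is counted per process. The following is a minimal Python sketch with made-up figures; the function names and the numbers are illustrative assumptions and are not taken from the article.

```python
def unshared_kb(process_size_kb, shared_kb):
    """Memory one child really costs: its total size minus the shared part."""
    if shared_kb > process_size_kb:
        raise ValueError("shared memory cannot exceed the process size")
    return process_size_kb - shared_kb


def total_real_kb(process_size_kb, shared_kb, children):
    """Real memory used by all children: shared pages once, plus each unshared part."""
    return shared_kb + children * unshared_kb(process_size_kb, shared_kb)


# Hypothetical figures: ten 10,000 KB children, 4,000 KB of each shared.
size_kb, shared, children = 10_000, 4_000, 10

print(unshared_kb(size_kb, shared))              # 6000
print(total_real_kb(size_kb, shared, children))  # 64000
print(size_kb * children)                        # 100000, the naive estimate
```

The gap between the last two figures is the saving that sharing provides, and it grows with the number of children.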
Stas Bekman continues with other techniques on saving even more memory in "Improving mod_perl Driven Site's Performance". It does pay to be frugal. In Apache Today, "Improving mod_perl Driven Site's Performance Part VI" is haunted by zombies and ghosts. Of course Stas is referring to "orphan" processes as he explains in technical terms why it is bad to fork subprocesses from mod_perl. The administrator at cgisecurity.com looks at some common fingerprints used in port 80 exploits with a few examples on how each attack signature may be implemented. It covers common malicious requests, commands which may be executed by worms, files which may be requested by attackers, buffer overflows, and hex encoding. Although it is not meant to be an exhaustive list, it is sufficient to help web server administrators identify attack patterns in their logs, and to add the appropriate rules to their Intrusion Detection Systems (IDS). In "Freeware Security Web Tools", Gary Bahadur talks about a few freeware Linux tools that can be used to perform footprint and vulnerability analysis, the first two phases of a web server security assessment. Among the tools mentioned are Nmap, Netcat (nc), Whisker, Cgichk.pl (a Perl-based scanner), Malice (also a Perl-based scanner), and Md-webscan. In "Safer CGI Scripting", Charles Walker and Larry Bennett cover methods to fix various CGI script vulnerabilities and touch on developing a CGI security strategy. Although the examples are written in Perl and C, they can also be applied to the scripting language of your choice. In PHP DevCenter, Darrell Brogdon looks at security issues relating to PHP when running PHP as either an Apache module or a CGI binary, and the ways to remedy them. PHP, a server-side HTML-embedded scripting language, offers web developers the convenience of generating dynamic page content, and supports a wide range of databases but PHP programs are vulnerable to security compromises if they are poorly written.
"On the Security of PHP, Part 1" aims to minimise this risk by offering some guidelines on secure PHP programming practices. It begins with an overview of PHP, and then examines some of the most common security issues with PHP programs. "On the Security of PHP, Part 2" wraps up this two-parter by showing us how to secure PHP scripts with a combination of safe programming practices and PHP settings. It talks about how to use PHP safe mode, how to avoid the risks posed by files with a .inc extension, how to filter user input, and how to prevent scripts from changing PHP configuration options. "Avoiding security holes when developing an application - Part 6: CGI scripts" explores a few examples of poorly written Perl scripts which are vulnerable to security compromises. Before delving into the code, it gives an overview of how a web server works and explains server-side includes (SSIs) for Apache. Perl developers are advised to use the "warning" option, "taint mode" option, and to specify "use strict" at the beginning of their Perl scripts. In the wake of the Code Red worm, Joe "Zonker" Brockmeier warns Unix and Linux administrators running the Apache Web Server not to let their guard down in this tongue-in-cheek but apt piece entitled "Thinking about Security". I'm sure many of you will find his advice on how to stop your boss from embarrassing himself useful. Apache 1.3.22 Released Cox, Mark J Apache 1.3.22 was released on the 12th October 2001. This release addresses some security flaws, fixes minor bugs found in the 1.3.20 release, and adds some minor new features. Version 1.3.21 was not released.
Apache 1.3.22 was released on 12th October 2001 and is now the latest version of the Apache server. The previous release was 1.3.20, released on the 22nd May 2001. Version 1.3.21 was never released. See what was new in Apache 1.3.20. Apache 1.3.22 is available in source form for compiling on Unix or Windows, for download from the main Apache site or from any mirror download site. This is a security fix, bug fix and minor upgrade release, with a few new features. Users should upgrade if they will be affected by the security problems, have noticed particular bugs mentioned below, or would like to use any of the new features. Due to security issues, any sites using versions prior to Apache 1.3.14 on Unix, or all versions on Windows or OS/2, should upgrade as soon as possible. A vulnerability was found in the Win32 port of Apache 1.3.20. A client submitting a very long URI could cause a directory listing to be returned rather than the default index page. A 403 Forbidden will now be returned. A vulnerability was found in the split-logfile support program. A request with a specially crafted Host: header could allow any file with a .log extension on the system to be written to. A vulnerability was found when Multiviews are used to negotiate the directory index. In some configurations, requesting a URI with a QUERY_STRING of M=D could return a directory listing rather than the expected index page. The security issues above have been assigned standardized names, CAN- by the Common Vulnerabilities and Exposures project. The main new features in 1.3.22 (compared to 1.3.20) are: The user manual has been updated.
As well as a number of small fixes these updates include new translations into French and Japanese, a guide to using Apache httpd on Cygwin, a lexicon of Apache error messages, updated TPF documentation, and a comprehensive guide to using log files. The user manual can now be moved out of the htdocs DocumentRoot during installation by invoking configure with the --manualdir= switch, to allow separation of on-line docs from regular contents. The supplied icons are now also distributed in PNG format. A significant overhaul of the Apache Bench program, ab, has taken place, as first reported in April. The new Apache Bench includes fixes, additional statistics, csv and gnuplot output, and SSL support. New directives have been added to the mod_usertrack module. The first, CookieDomain, can be used to customise the Domain attribute. The patch to add the CookieDomain directive was first submitted over two years ago. Historically mod_usertrack has used the obsolete Netscape cookie syntax. The new CookieStyle directive allows use of the RFC2109 or RFC2965 syntax instead. The server will now display a warning if line-end comments (#) are found in the configuration file. Not all directives are able to handle comments on the same line. A new directive, AcceptMutex, allows run-time configuration of the mutex type used for accept serialization, currently a compile-time only setting in 1.3. Since different types of mutex have different performance characteristics on different platforms, this directive will allow administrators to tune their Apache server more easily. The current list of possible methods is: uslock, pthread, sysvsem, fcntl, flock, os2sem, tpfcore, none. Not all platforms support all methods. mod_auth has been enhanced to allow access to a document to be controlled based on the owner of the file being served. Require file-owner will only allow files to be served where the authenticated username matches the user that owns the document.
Require file-group works in a similar way, checking that the group matches. New features that relate to specific platforms: A new directive, AcceptFilter, has been added to control BSD accept filters at run-time. This should make it easier to move server binaries across different BSD machines without requiring recompilation. Support for accept filters was first added to version 1.3.14; the functionality can postpone the requirement for a child process to handle a new connection until an HTTP request has arrived, therefore increasing the number of connections that a given number of child processes can handle. On Win32 the mod_unique_id, mod_mime_magic, and mod_vhost_alias modules are now enabled. On Win32 the code to allow the server to run under Cygwin has had a number of fixes and updates. Cygwin support was first added to version 1.3.20. On Windows NT or 2000, the service display names can now be modified by the user (use the service control panel applet). On Win32 a new option, -W, can set up a service dependency. The server will now take advantage of recent improvements to the TPF operating system which include an enhanced system fork and exec, updates to allow non-blocking file descriptors, and an update to shutdown processing. The server has been ported to a new OS, Atheos. The following bugs were found in Apache 1.3.20 and have been fixed in Apache 1.3.22: Under certain circumstances a child may crash due to a bug in mod_include. If a server uses an ErrorDocument for 404 (request not found) errors which points to a server-parsed HTML file which uses a  section, then a request containing %2f will result in a segfault. The segfault is harmless and does not cause a security problem, but is being triggered by the recent IIS worm. The Multiviews functionality has been fixed to prevent mod_negotiation from serving any multiview variant that contains unknown filename extensions. Apache will prefer an installed version of the Expat library over the bundled version.
This fixes conflicts when multiple copies of the Expat library get loaded (notably when using mod_perl and XML::Parser::Expat). UnsetEnv now works from the main body of a configuration file. When used as a reverse proxy any headers set by other modules (such as mod_usertrack or mod_securid) now get passed on to the back-end server. Server response headers can now be logged via the proxy. mod_proxy will now pay attention to HTTP headers that specify the request is not to be cached. When a client making a request via mod_proxy died unexpectedly, mod_proxy did not close its connection. The CacheForceCompletion directive has been fixed. A memory leak has been fixed in the mod_mime_magic module. A Satisfy All option has been added to the default container designed to stop access to .htaccess files. Without this directive, these files could still be fetched if they were within the scope of a Satisfy Any directive. The following bugs relate to specific platforms: A number of fixes for NetWare have been added. These include: enabling long file names in htpasswd and htdigest, protection against ill behaved modules, better handling of abnormal shutdowns, dealing with the limited stack space during server side includes, and recognising special filenames such as proxy:http:// correctly. A shutdown hang could occur on Solaris when using lots of piped TransferLogs and at least one piped ErrorLog. On EBCDIC platforms a bug in the proxy module stopped SSL proxying working. On Win32, mod_unique_id did not guarantee a unique ID due to threading. The Win32 Makefiles are now 100% compatible with the Microsoft Visual C++ compiler versions 5, 6, and 7. Cox, Mark J Code Red requests for /default.ida Don't panic if you see requests for the default.ida file in your Apache access logs. These requests are from the Code Red Worm designed to seek out vulnerable IIS servers.
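Requests of this kind are easy to pick out of an access log mechanically. As a minimal sketch, assuming the log is in the Common Log Format, the following Python counts requests for /default.ida per client host. The function name code_red_hosts and the shortened sample lines are illustrative only; real Code Red requests carry a much longer query string.

```python
import re
from collections import Counter

# Assumed layout (Common Log Format): host ident user [time] "request" status bytes
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+)'
)


def code_red_hosts(lines):
    """Count requests for /default.ida per client host."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # skip lines that are not in the assumed format
        parts = match.group("request").split()
        # A well-formed request line is "METHOD URI PROTOCOL".
        if len(parts) >= 2 and parts[1].split("?", 1)[0] == "/default.ida":
            hits[match.group("host")] += 1
    return hits


# Shortened, made-up sample entries
sample = [
    '192.168.2.12 - - [19/Jul/2001:16:55:47 +0100] "GET /default.ida?NNNN%u9090=a HTTP/1.0" 400 252',
    '192.168.2.12 - - [19/Jul/2001:16:56:02 +0100] "GET /default.ida?NNNN%u9090=a HTTP/1.0" 400 252',
    '10.0.0.7 - - [19/Jul/2001:16:57:10 +0100] "GET /index.html HTTP/1.0" 200 1043',
]

for host, count in code_red_hosts(sample).items():
    print(host, count)   # 192.168.2.12 2
```

To run it against a real log, pass an open file object instead of the sample list.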
We receive a large number of messages from system administrators who see requests for /default.ida in their Apache access logs. The requests look similar to this:
192.168.2.12 - - [19/Jul/2001:16:55:47 +0100] "GET /default.ida?NNNNNNN NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN NNNN%u9090%u6858%ucbd3%u7801%u9090%u6858%ucbd3%u7801%u9090%u6858%ucbd3% u7801%u9090%u9090%u8190%u00c3%u0003%u8b00%u531b%u53ff%u0078%u0000%u00=a HTTP/1.0" 400 252 -
If you are running Apache there is nothing to worry about: these requests are part of the Code Red Worm designed to search out vulnerable IIS servers running on Windows. You can quite happily ignore these requests. Other common log entries you might see include: Requests for robots.txt in the root directory. These requests are normally automatically made by robots which will analyse the contents of this file to see what files and directories they are not allowed to access. The format of the robots.txt file is given in the HTML 4 Specification. Requests for favicon.ico in various directories (first seen in April 1999). Microsoft Internet Explorer version 5 and above can display a site-defined icon when a site's URL is displayed in a favourites list. This icon is obtained by asking the site for favicon.ico. If the URL contains slash characters (normally used to represent a directory hierarchy), MSIE 5 will request "favicon.ico" in each parent directory until it finds one or reaches the root. The format of the favicon.ico file is the Microsoft icon format.
To see this 'feature' in action, bookmark this page using MSIE. Requests for cmd.exe in various directories. These are usually attempts to exploit various security vulnerabilities that affect Microsoft IIS servers. Apache 2 Release Apache 2 was released on the 6th April 2002; we look at the history of development on Apache 2.0 and features to help you use this new release Apache 2: General Availability Apache 2 was released for General Availability on the 6th April 2002 After many years and several betas the Apache group were proud to announce the general availability of Apache 2.0 on the 6th April 2002. The general availability release was based on Apache 2.0.35. Apache 2: Brief History Plans for Apache 2.0 were discussed back in the summer of 1996. We look at the significant events in the history of Apache 2 The Apache group started discussing plans for Apache 2.0 as far back as the summer of 1996, just after the release of Apache 1.1.1. In July 1996, Apache Week issue 24, covered the first discussions of multithreading. One month later, in August 1996, Apache Week issue 128 looked at how useful filtering would be, and some possible ways of implementing it. Filtering was finally added to Apache 2.0 four years later, in August 2000. In February 1997 the Apache group looked at plans for the server after version 1.2 was released. The plans included a considerable rewrite to include support for multithreading, filters, and OS abstractions to allow for versions of Apache on Windows NT and other systems. Apache Week issue 54 covered these plans and predicted that Apache 2.0 is "likely to take some time" May 1997 saw the group decide that Windows releases would be outside of the main Apache development effort. The aim for 2.0 was to ensure that the same code is used for all operating systems with a set of platform-specific routines to handle anything that varies between operating systems. (Apache Week issue 65) . 
In June 1997, Apache 1.2 was finally released and work started on the requirements for the redesign of the core Apache code. Apache Week issue 69 discussed the need for additional processing phases, and the plans for a graphical configuration manager. All the plans for Apache 2.0 were summarised in February 1998, in Apache Week issue 102. The major changes being discussed were multithreading, filtering, new process models, better system configuration, API changes, and changes to the configuration syntax. Some thought was also given to rewriting Apache in C++, but this idea was later dropped. In June 1998 the Apache core developers met for the first time to discuss the organisational structure of the Apache group as well as the plans for Apache 2.0. Apache Week issue 121 covered this meeting. Over a year later, in September 1999, we revisited Apache 2.0 development in a special feature, "Apache 2.0 preview". At this stage a beta was expected in late 1999 or early 2000. In January 2000 (Apache Week issue 181) there was discussion within the group about how to deal with feature additions to the stable Apache 1.3. It was decided that all attention should be placed on 2.0 development and that no major new features would be accepted into the Apache 1.3 tree. The first Apache 2.0 alpha was launched at the final session of the ApacheCon 2000 conference in March 2000. A number of ASF members on stage updated the website and copied the distribution files into the correct locations live in front of the audience. At ApacheCon Europe in November 2000, a meeting took place between Ben Laurie (the author of Apache-SSL), Ralf Engelschall (the author of mod_ssl), Mark Cox (Red Hat), and Randy Terbush (Covalent). The meeting was held to decide the fate of SSL support for Apache 2.0, aiming to avoid the current situation of parallel module development for Apache 1.3. The first Apache 2.0 beta was launched at the ApacheCon 2001 conference in March 2001, one year after the first alpha.
Between April and November a large number of internal code changes took place, with a few alpha-quality releases. The second Apache 2.0 beta was released in mid November 2001.

Apache 2: Release History
The first alpha release of Apache 2.0 was in March 2000, with the first beta just over a year later.

2.0.52, September 2004: Security fix release correcting a new issue introduced in 2.0.51.
2.0.51, September 2004: Security and bug fix release. See Apache Week issue 348.
2.0.50, July 2004: Minor security fix release. Includes new mod_log_forensic module. See Apache Week issue 347.
2.0.49, March 2004: First release under Version 2.0 of the Apache License; includes fix for a denial of service attack on some platforms, and a substantially rewritten version of mod_include. See Apache Week issue 344.
2.0.48, October 2003: Minor bug and security fixes. See Apache Week issue 337.
2.0.47, July 2003: Security fix release; includes performance fix for PROPFIND response handling in mod_dav. See Apache Week issue 331.
2.0.46, May 2003: Security fix release. See Apache Week issue 329.
2.0.45, Apr 2003: Security fix release. See Apache Week issue 324.
2.0.44, Jan 2003: This release included fixes for two security issues affecting Apache on Windows platforms. It also included bug fixes. See Apache Week issue 319.
2.0.43, Oct 2002: Security fix release. See Apache Week 2.0.43 special feature.
2.0.42, Sep 2002: This is primarily a bug-fix release, including updates to the experimental caching module, the removal of several memory leaks, and fixes for several segfaults, one of which could have been used as a denial-of-service against mod_dav. See Apache Week issue 310.
Sep 2002: Red Hat include Apache 2 in Linux distributions.
2.0.40, Aug 2002: This release fixes a serious vulnerability in Apache 2 on non-Unix platforms such as Windows. See Apache Week 2.0.40 special feature.
2.0.39, Jun 2002: This release fixes a denial of service vulnerability in Apache 2. See Apache Week issue 299.
2.0.36, May 2002: Second stable release (Apache Week issue 285).
2.0.35, Apr 2002: First General Availability release.
2.0.32, Feb 2002: Third beta release (Apache Week issue 284).
2.0.28, Nov 2001: Second beta release.
Nov 2001: Covalent include Apache 2.0.27 alpha in a commercial product.
Aug 2001: IBM include Apache 2.0.18 alpha in a commercial product.
2.0.16, Apr 2001: First beta release.
2.0.14, Mar 2001: Improvements to mod_include and the start of abstracting HTTP specific protocol functions.
2.0.11, Feb 2001: This was the first version to use a new release procedure, where the tree would be tagged and, depending on the outcome of testing, would later be distributed as an alpha, beta, or stable release. An early prototype of SSL support by Ben Laurie was added (Apache Week issue 235), as was a port of mod_proxy to Apache 2.0.
2.0a9, Dec 2000: Coverage in Apache Week issue 226.
2.0a8, Nov 2000: For this release APR was moved into a separate project. Coverage in Apache Week issue 224.
2.0a7, Oct 2000: For this release, mod_dav was added. During this alpha cycle, RSA encryption was released into the public domain, removing one of the obstacles for including SSL in Apache.
2.0a6, Aug 2000: This alpha saw the first support for filtering (using bucket brigades). Coverage in Apache Week issue 212.
2.0a5, Aug 2000: Coverage in Apache Week issue 211.
2.0a4, Jun 2000: The release of the alpha was covered in Apache Today, slashdot, and Apache Week issue 202.
2.0a3, May 2000: The release of the third alpha was covered in Linux Today, slashdot, and Apache Week issue 197.
2.0a2, Apr 2000: Coverage in Apache Week issue 194.
2.0a1, Mar 2000: The first Apache 2.0 alpha was launched at the final session of the ApacheCon 2000 conference. A number of ASF members on stage updated the website and copied the distribution files into the correct locations live in front of the audience. Announcements were then sent to a number of key sites such as Slashdot and Freshmeat. Coverage in Apache Week issue 190.

Apache 2: In the news
We highlight some of the news stories from around the web that mentioned Apache 2.

May 2001: CNET reviews Apache 2.0.16 Beta and suggests that administrators interested in upgrading to Apache 2.0 prepare for the stable release by installing the beta on a development machine, then testing the new features and benchmarking its performance, in order to speed up the eventual upgrade process.
January 2001: Ryan Bloom discusses why the Apache 2.0 beta was delayed.
October 2000: Apache 2.0 was still in alpha release when an eagle-eyed subscriber in Apache Week issue 219 noticed that the high-profile Napster web site was running Apache 2.0a6.
August 2000: C|Net reported on Apache 2.0 in "Apache Web software on verge of major revision". The article interviews a few of the Apache developers and highlights some of the advantages that version 2.0 will bring. They quote that "The final version should be out by the end of the year" (2000).
July 2000: In the same week two projects not related to the Apache group announced that they would be integrated into Apache 2.0, even though this was not planned and didn't happen. Apache Week issue 206 looks at the claims from the TUX web server and BXXP protocol projects.

Apache 2: Featured Articles
We highlight some of the articles on the web that are of interest to Apache 2 users.

In Ryan Bloom's swan song for the Apache 2.0 Basics series, he talks about one of the least publicised new features in Apache 2.0, which is allowing one module to call into another module to execute an operation. In Apache 1.3, for two modules to execute the same operation, the feature has to be implemented in both of the modules, making synchronisation of changes a tedious task.
He uses the mod_include and mod_cgi modules to illustrate his points. O'Reilly ONLamp.com brings you the latest information about filters for Apache 2.0 in Ryan Bloom's column. This article is just an introduction to the subject, covering some of the basic concepts of filtered I/O which is the ability for one module to modify the output of an earlier module, listing three standard filters included in the basic Apache distribution, and explaining what filter types are. According to Ryan, developers have improved the interface over the past few releases so that the complex task of writing filters becomes easier. In "Apache 2.0: The Internals of the New, Improved A PatCHy", Ibrahim F. Haddad gives an overview of Apache 2.0 and shares with us the results of his Apache 2.08 performance tests. In conclusion, he highly recommends that current Apache 1.3.x users upgrade to Apache 2.0 once the release version is available. Ryan Bloom kicks off a new series of columns about Apache 2.0 for O'Reilly Network readers with his first column - "Installing Apache 2.0". This piece proves to be merely a rehash of his previous Apache 2.0 articles except for a mention of mod_tls. eWEEK Labs tests Apache 2.0.16 Beta and provides a brief review about its features and shortcomings. In CNet Builder.com, Ryan Bloom explains how Apache 2.0 is more than a web server as it has the potential to serve any protocol. He reveals the benefits of using a single server for multiple protocols and the way to implement it using Apache 2.0. In "Filtering I/O in Apache 2.0", Ryan Bloom explains how filtering in Apache 2.0 works, how modules can make use of it, and the basic concepts for writing filters. Ryan Bloom investigates writing an input filter for Apache 2.0 and shows the power of input filters with mod_apachecon as an example. Ryan Bloom tells C|Net Builder.com readers how to download, build, and install the Apache 2.0 alpha releases. 
Apache Today gives a concise guide on how to set up and compile Apache 2.0. Apache Today explains some of the new technology that is inside the fourth alpha of Apache 2.0. "Looking at Apache 2.0 alpha 4" takes a detailed look at reliable piped logging and the issues of running CGI scripts from a threaded web server. LinuxPlanet have a feature about Apache, "Using the Apache CVS Repository". The article explains how the Apache developers use a master code repository for the work on Apache 1.3 and 2.0. Anyone interested in keeping up to date with the cutting edge developments of Apache can use the described methods to maintain their own copy of the source tree, whilst easily keeping up with the changes being made by the Apache developers.
O'Reilly Open Source convention in San Diego
Cox, Mark J Orton, Joe
Apache Week visited the five-day O'Reilly Open Source Conference in San Diego this week and found an overwhelming source of Apache information.
O'Reilly Open Source Conference: Day 1
Apache Week visited the O'Reilly Open Source Conference in San Diego this week and found an overwhelming source of Apache information. Day one consisted of a selection of tutorials including several Apache tracks, and a showing of the film "Revolution OS".
It is exactly a year since we had the pleasure of visiting Monterey, California to report on the 4th O'Reilly Open Source software convention (Apache Week issue #208). When we managed to get invited back to San Diego in July 2001 we thought we'd been given the ideal assignment; we get to fly to California in July, avoiding the British rain, and spend a week right on the West Coast with other open source gurus and advocates. In fact, with only one direct flight a day from England, we were unsurprised to find a large number of delegates on the plane, wearing Penguin badges and snapping pictures of the clear views over Greenland with a variety of digital cameras.
To accommodate feeding over a thousand delegates, the conference had erected a huge tent outside the hotel with views overlooking the harbour. It was there we started off Monday morning with the complimentary breakfast. The conference was split over two buildings, with a 10-15 minute walk between the two. With 16 simultaneous tutorial sessions on the first day and with only two Apache Week staff we found it really hard to choose between the talks. We spoke to other delegates who had been similarly overwhelmed by the choice. Apache Week has reported on the ApacheCon and O'Reilly conferences over the last few years, so this time we wanted to avoid the talks that were copies of ones we've already covered. We decided to mix Apache talks with others that seemed new or interesting. Matt Sergeant gave the first tutorial we visited on his XML application server for Apache, AxKit. AxKit performs a similar function to the Apache Cocoon project, but is written in Perl and C rather than Java. Matt even describes AxKit as "the C version of Cocoon". AxKit was born as a way of collecting together the various Perl XML technologies and using them to deliver the same XML data in different formats. The use of XML allows for the separation of content, presentation, and logical site management. The tutorial focussed on the various Perl XML tools available, the evolution of AxKit, and ways to use the result to power both static and dynamic sites. Matt highlighted some exciting and powerful features of AxKit: the intelligent compression of pages being returned to the client (gzip), the ability to parse and serve OpenOffice files on the fly, and AxPoint, which powered his presentation by converting an XML outline to PDF. AxKit allows any number of ways to process the XML for output, from the well-known XSLT (which has a steep learning curve) to XPathScript, which has been designed to allow easy dynamic functionality and is also found within Cocoon.
Future plans for AxKit were covered; these included a port to Apache 2.0 and a complete Content Management System. After the provided lunch we headed over to the Perl for System Administrators talk. The presenter, David Blank-Edelman, played music and danced around the hall to get into the mood for the tutorial. The talk had a heavy bias towards security, giving reasons why administrators should be paranoid and numerous stories and anecdotes about hacks and security vulnerabilities. David suggested some best practices that can help protect your scripts; for example there is no need to run a log analysis script as root. Other areas where users can overlook potential security problems are when appending to files, or creating temporary files in Perl. Although this talk was primarily about Perl, David made the important point that "a cutting sysadmin is platform agnostic", and his tips applied as much to sysadmin scripts as to CGI programs. Also that afternoon, Jim Whitehead presented a tutorial on WebDAV and Apache. Jim, the chair of the IETF's WebDAV working group, began by giving a brief overview of authoring over HTTP, and gave examples of how collaborative web authoring can take place using WebDAV.
The current state of client and server support was described, and an insight into some of the future extensions of the DAV protocol was given (including versioning, searching and access control). The talk continued with a detailed description of the DAV protocol, explaining the support for properties, and the overwrite prevention mechanisms. The tutorial finished up with a guide to setting up the WebDAV module for Apache, mod_dav, covering the basic operation of the module and the usual configuration issues. Jim noted that Apache 2.0 bundles mod_dav inside the source tree, making it easier to set up than Apache 1.3, where mod_dav must be compiled as an external module. In the evening we took a coach to a local multiplex cinema for the west coast premiere of the film "Revolution OS" by director J.T.S. Moore. The aim of the film was to document the history of the open source movement from Richard Stallman's founding of the GNU project, through the VA Linux IPO, to events taking place today. The film focussed on the key people responsible for a few of the historical turning points in the movement. Early into the film, Eric Raymond said that "Apache was the killer app[lication]" and was responsible for the mass adoption of the Linux operating system. A number of other key people were interviewed including Brian Behlendorf from the Apache Project and Michael Tiemann from Red Hat. We were impressed at the balance and accuracy of the film, especially the positive way the people interviewed were portrayed. The film would be interesting to engineers as well as outsiders. At the end of the film the director took questions from the audience aided by Eric Raymond and Bruce Perens. They explained that the film took two years to make and was planned to be shown in the future at film festivals and other conferences.
O'Reilly Open Source Conference: Day 2
Apache Week visited the second day of tutorials at the O'Reilly Open Source Conference in San Diego, which ended with a lightning talk from Perl creator Larry Wall.
We kicked off the second day much as the first, spending our breakfast trying to decide amongst the 17 simultaneous tutorials. Amongst the sessions we didn't get to see was Ryan Bloom's "Writing an Apache 2.0 Filter", which was given to a small but enthusiastic group of developers. We hear a lot of positive comments from people using the Python-based Zope application server, so we decided to attend the tutorial "Introduction to Zope" given by Mike Homyack. Mike ran through what Zope is, and its architecture, telling us that "Zope is full Object-Orientated" and "really good at dynamic stuff". Zope has a built-in server, z-server, that handles access to the internal content via a number of mechanisms including HTTP, FTP, and DAV. It is usual to let Zope handle all your web site content, but in most situations another server such as Apache or a reverse proxy such as Squid is placed in front in order to accelerate any static content. The main zope.org site itself uses Zope together with Apache, using Rewrite rules to proxy and cache requests to a Zope backend. Zope currently has its own license but we were told that there was "motivation to give Zope some license like Python" to make it GPL compatible. Zope is in production use by some major companies including CBS New York. At the same time as the Zope tutorial, Bruce Momjian gave an introductory tutorial on the PostgreSQL database. Attendees received a complimentary copy of Bruce's book, which the tutorial was based upon. Only a small amount of database expertise was presumed so this talk was very open to beginners.
The half-day session allowed many chapters of the book to be covered in reasonable detail, starting with the basic architecture of a database, how to input data, modify data, and make simple queries. The talk then progressed to describe the construction of more complex queries, joins, and how to utilize the relational database capabilities of PostgreSQL. Bruce also presented a follow-up tutorial in the afternoon, covering some of the more advanced features. In the afternoon we visited a talk on "Secure Internet Servers and Firewalls with OpenBSD". Although not directly related to Apache, it was interesting to see how much security had been added into the OpenBSD system by default. OpenBSD ships with an SSL-enabled version of Apache by default. We were also lucky to catch the second of a pair of tutorials by Mark-Jason Dominus, entitled "Stolen Secrets of the Wizards of the Ivory Tower". In an enigmatic talk, he described a set of Perl programming techniques, including memoization and the use of iterators, and drew particular attention to closures and anonymous subroutines. The obscure title alludes to the LISP heritage of many of these ideas. In the evening Larry Wall gave an entertaining lightning talk on the new features in Perl 6. Larry's talk didn't touch on anything Apache related, so if you are interested read all about it in "The State of the Onion 5" at perl.com.
O'Reilly Open Source Conference: Day 3
Apache Week visited the O'Reilly Open Source Conference in San Diego last week and found an overwhelming source of Apache information. Today we discovered why Apache was important to large enterprise customers, how to tune mod_perl, the many ways of XML content management, and much more.
Wednesday started as usual with the complimentary breakfast. With 14 simultaneous talks split across the two hotel blocks we spent most of our breakfast choosing which to visit.
Four of the day's tracks were dedicated to Perl, two to XML, and the remainder split across Tcl/Tk, Mozilla, mod_perl, Java, MySQL, Python, and Emerging Topics. The dedicated Apache track was due to start on Thursday. We noticed that the number of Perl tracks had shrunk slightly this year, with other open-source technology tracks becoming more prominent. In particular we were pleased to see the two XML tracks, something we said was missing last year. Before the keynotes of the day a short film was shown, made up of interviews with various conference attendees during the tutorial days. Tim O'Reilly appeared on stage and reminded the packed ballroom that we should "think the Internet" and think of "technologies such as Apache, PHP" and not just Linux. Fred Baker, previous chair of the IETF, gave his keynote presentation titled "Will the next Internet generation still depend on open source?". He explained that Linux was the only real technology that could threaten Windows, and that successful open source is "all about getting good documentation and predictable quality". He welcomed the involvement of commercial interests in open source: "Once the open source technology has to be used by real people then real companies have to do code freezes and manage the development in a way that makes a quality product". He predicted that in the coming years we'll see more open source projects in partnership with the business world. Open source leads to rapid prototyping and exploratory code, with the business partnerships being able to productise them. W. Phillip Moore from Morgan Stanley Dean Witter then took the stage to show "an open source success story on Wall Street". He showed why open source was important to their business, allowing them to tailor existing applications to their complex environment with a bit of Perl glue thrown in.
MSDW are an enterprise class business that have decided to slowly migrate from using Sun hardware with Solaris to using commodity hardware and Linux, with Apache as their primary web server. They've also made contributions back to open source, and have been covertly submitting patches back into the community as well as funding open source development. "It all comes down to vendor risk management", he said, with proprietary software "you're placing a bet on the security of that company and the security of their product, a bet you're not always aware you're making". With open source this dependency is removed and it's possible to get enterprise level support for open source software from a number of vendors. Also taking place at the convention was the O'Reilly summit on Open Source strategies, aimed at CTOs, CIOs, and CEOs who want to find out how to use open source as a strategic advantage. Although this summit was separate to the main conference we decided to take a look at the opening talk given by Tim O'Reilly, and the subsequent panel discussion with the economist Hal Varian, Brian Behlendorf, and Michael Olsen from Sleepycat. To begin the session, Tim O'Reilly discussed the reasons underlying the success of the Internet and Open Source software, finding many common themes. The highlights were the emphasis on decentralisation, the combination of many small modules into large complex systems, and the ability to easily extend existing technologies - all important to the wide adoption seen in both arenas. By looking at current trends, Tim talked about some emerging projects which may prove key to the Next Generation Internet. One of the biggest challenges for Open Source and Internet companies is the search for an appropriate business model. The panel discussion which followed the talk gave many interesting insights from those who have been successful in that search. 
Brian Behlendorf spoke about the need to identify which intellectual property is released freely, and which is "owned" by the company generating it. All speakers noted that embedded systems would be increasingly important. After lunch, Apache Software Foundation member Ask Bjoern Hansen gave a talk on how to use mod_perl in an efficient way. He explained that it is generally preferable to use mod_perl statically compiled into Apache instead of as a dynamic (shared object) module. However, by doing this you end up with a server that has a much larger memory footprint, and since the majority of the time the server is dealing with buffering data to slow clients, this is wasted overhead. The solution presented was to run a separate server that has mod_perl compiled into it behind a reverse proxy. Apache can also be used as this reverse proxy and can serve static content as well as cache the content created by the dedicated Apache+mod_perl server. In this way the memory usage can be decreased and performance increased. The slides from the full presentation are available online. There were a large number of talks throughout the conference on SOAP and XML-RPC. Matt Sergeant took a step back to examine what all the fuss was about in a short talk renamed "Why SOAP sucks, Why SOAP rocks". He started out by asking why we are using SOAP when we could use HTTP instead, since HTTP already has all the features that are normally needed, and more. Using HTTP natively allows caching and logging, for example. The talk then showed how to do SOAP without SOAP, using mod_perl to control the URL space and using Perl HTTP modules for the transport.
The current major advantage of SOAP is that modules such as the Perl SOAP::Lite module exist which allow applications to be developed quickly and easily. There currently is no simple library that would do the equivalent directly over HTTP. Finally we were shown some services that are already doing the equivalent of a SOAP transaction without SOAP, such as the ability to get search results from Google in XML format (for example try http://www.google.com/xml?q=apacheweek). The slides for this talk are available online. For the remainder of the afternoon we visited the XML track; in particular we were interested in XML application servers. The first session, "XML Content management using XSLT, Schematron and Ant", showed one extensible way of serving XML content to browsers. Following that talk a panel discussion, "XML-based Application Frameworks", took place. The basic idea of an XML application server is that you create all the content for your site in XML. The use of XML allows the separation of content from presentation, a useful extra abstraction layer. The XML content can come from static files, from a database, or be dynamically generated content from scripts. In its simplest form you take your XML content then apply a style-sheet to generate HTML for a browser. Application servers usually perform this style-sheet conversion on the fly, caching the results for speed. XSLT is one language that is used to transform XML data in this way. Tools also exist that will take XML and generate PDF, PostScript, presentations (and more) on the fly. The most well-known open source XML application server is Apache Cocoon, which relies on Java. Other solutions such as AxKit (C/Perl/mod_perl) and Charlie (C/C++/Perl/mod_perl), and technologies such as Xerces/Xalan (Java), Sablotron (C++), and LibXML/LibXSLT (C), are also available.
Even scripting languages such as PHP now have their own XML solutions, although during his tutorial earlier in the week mod_perl guru Matt Sergeant said that the "PHP XML solutions are not very strong". When the attendees were asked which application server they were using for their applications, the majority said they were using a system they developed themselves (home grown) from the underlying technologies. The rest were a pretty even split between the application frameworks listed. However, having such a wide choice of technologies and servers is no bad thing. As one panel member said, "no matter what, if your content is in XML you win". Brian Ingerson presented a talk on the award-winning Inline module (which only celebrated its 1st birthday a few days before the conference). Inline.pm allows programmers to embed code from a variety of programming languages directly inside a Perl script, from C, C++, and assembler through to Java and Python. Brian covered some of the advanced features available when using embedded C, notably caching of compiled object files. A demonstration was given showing some "one-liners" using Inline.pm, including an ASCII Mandelbrot set generator. The talk went on to discuss some of the different ways to use Inline.pm, such as replacing the traditional usage of XS and MakeMaker, and also explained how to extend the module to support new languages.
O'Reilly Open Source Conference: Day 4
Apache Week visited the O'Reilly Open Source Conference in San Diego last week and found an overwhelming source of Apache information. Today we learned about Microsoft and the Apache license, found out where Apache httpd 2.0 is, and had a demonstration of mod_perl 2.0.
Wednesday had ended with a night of Mexican food and drink in the conference tent, followed by a party from Stonehenge.
Even with all the free drink and food the night before, by 8.45am on Thursday the ballroom was packed for the much anticipated debate between Craig Mundie of Microsoft and Michael Tiemann of Red Hat. The details of the debate have been covered in a number of other articles. However, we were interested in the comments with relevance to Apache made during the panel discussion. Craig Mundie stated that Microsoft's concern was not about open source but "about the GPL" as it "creates its own closed community". Tim O'Reilly commented that University licenses (like the BSD License and Apache Software License) "give the best balance between freedom and the right to make money". Also on the panel was Apache Software Foundation member Brian Behlendorf, who said the Apache model has worked well to build up momentum. Although with the Apache license there are no obligations placed on commercial users, history has shown that the companies involved do re-invest and give back to the community.

Our notes from the panel:
Mundie: there are aspects of the open source movement and of the free software movement, and the press has been confused. Microsoft's comments are about the free software movement; open source isn't the issue. Microsoft is trying to learn from the open source community at large, and since May has tried to get its developers involved in community programs.
Tiemann: it is better to be open than to seem open. We want Microsoft to do what is right. It's like the alternative minimum tax; it's a proprietary software license.
Behlendorf: important while building community; companies build Apache and momentum. There are no obligations, but the companies do need to re-invest, to build it back up [look for transcript].
Mundie: give back other things, such as standards, not just code. "Packaging is more important" (binaries etc).
Mundie: "we are in the business of licensing IP". Behlendorf: "like DNS".
Mundie: "our concern about GPL is it creates its own closed community".
O'Reilly: a university license like the Apache Software License gives the "best balance between freedom and the right to make money".

With the provocative title "Apache 2.0; where is it?",
Ryan Bloom proved a popular start to the Apache track, with over 80 attendees packed in to hear his session. The aim of the talk was to cover what was new in Apache 2.0 but also answer the question of why Apache 2.0 is taking so long. Ryan explained that since Apache is now so big there are "only three or four people who know 100% of Apache 2.0", and that fortunately he was one of them. The new features of 2.0 were then explained, stopping at Layered IO which is "the Holy Grail" of Apache. Ryan then gave a demonstration of Apache 2.0 acting as a POP3 server to show that it is easy to have Apache serve up other protocols as well as HTTP.

Apache Week asked Ryan if he was correct in using the name "Apache 2.0" throughout his talk given that the Apache group have a number of other products and that the binary downloads have been renamed to "httpd". Ryan said that the name was officially "Apache httpd 2.0" but hinted that there was talk of changing the name to something other than httpd in the future. To answer the question of Apache 2.0 availability Ryan said that he expected to see a full release "next year."

After attending the PostgreSQL tutorial on Monday, we decided to follow up with this talk from Gavin Roy, which gave a practical guide to using PostgreSQL in web applications. Gavin gave an overview of which web platforms could make use of a PostgreSQL database (for instance, PHP and Perl), and gave testimony to the product's reliability and performance in large scale web applications. The talk proceeded to discuss the architecture of systems using a web server together with a PostgreSQL database, covering the advantages and disadvantages of using a single machine or two separate machines. Some tips on optimising performance in a production database were also given, emphasizing the use of database indices, and regularly vacuuming the database.
In closing, Gavin briefly covered security, authentication and authorization issues when using Postgres in a web environment.

Notes on other sessions:
Intro to OpenRPC (http://www.blackperl.com/OSCON/openrpc/): explains why not XMLRPC or SOAP. OpenRPC is an abstraction of XMLRPC that separates out the transport layer, allowing transport over non-HTTP protocols (SMTP, etc).
"XSLT and scripting languages": we've already covered XSLT. It is primarily a language for transforming between XML vocabularies, and is integrated with .NET, Apache, IE, Java, etc. XSLT is not a scripting language; it is a programming language. In a way, PHP was optimized for a task and is therefore popular. XSLT makes input-based recursion easy.

To end the day we were expecting good things from Doug MacEachern's talk on "mod_perl 2.0". We were not disappointed as over 50 people packed into the last mod_perl session to hear a heavy technical talk about Apache 2.0 and mod_perl. Doug showed Apache 2.0.22-dev working with both mod_ssl and Perl/mod_perl. This is perhaps the first demonstration of its kind, as mod_ssl is only just becoming usable in the Apache 2.0 tree. He continued and took a program that communicated entirely using stdin and stdout (in this case an NNTP server) and showed how easy it was to make this function as an Apache protocol handler. This allowed Apache to serve newsgroups to his news reader, whilst still allowing other filters to be included such as SSL and authentication. Future plans for mod_perl 2.0 include the ability to write an MPM completely in Perl, and to continue with the Apache-TestKit, a package not tied to mod_perl that has been designed to test Apache. Doug said that there was still plenty left to do on mod_perl even though it currently seems stable and that there would be "probably a release of some sort at the end of the summer."

At the same time as the talk on mod_perl, Paul Weinstein was giving his popular introduction to mod_ssl in the Apache track.
The history of mod_ssl for Apache 1.3 was discussed together with some of the decision making process for including mod_ssl in Apache 2.0. The slides to this talk are available online.

In a pair of talks which attracted 60 people into a room designed for 40, the speaker known as "chromatic" described the basics of the Extreme Programming (XP) software development method, and in particular its application in the Open Source world. The first talk gave an introduction to XP, its differences from more traditional software development, and the motivations behind the techniques it uses to promote the development of high quality software. The talk highlighted that the most important aspect of XP is the emphasis on writing unit tests, and also covered the principles of incremental change and pair programming. The room remained packed into the second half of the session, where chromatic discussed how XP can be used within Open Source software (OSS) development. Some elements of XP are already employed in many OSS projects, for instance the tight feedback loop between users and developers. Many other XP techniques could also be usefully employed, but some, such as pair programming, were considered inappropriate in the majority of Open Source development.

O'Reilly Open Source Conference: Day 5

Apache Week visited the O'Reilly Open Source Conference in San Diego last week and found an overwhelming source of Apache information. We found out why enterprise customers like commercial support, learned all about the APR, and visited the exhibition.

Last year, the conference sessions were held over just two days and we were pleased to see they were extended to a third day to fit in more presentations. Friday consisted of the extension of tracks from previous days together with tracks dedicated to PHP, Zope, and Open Source Speech. After breakfast, Michael Tiemann was the moderator for the morning keynote looking at the "big hairy problems: open source challenges in the enterprise".
The first speaker was from DreamWorks, the animation company behind such epics as Antz, Chicken Run, and now Shrek. He told us how DreamWorks were slowly switching thousands of machines from SGI to Linux, giving them increased performance and value for money. When working on their strategy for adopting Linux they analysed six key factors: performance, scalability, stability, software, support, and transition. W. Phillip Moore from Morgan Stanley Dean Witter took the stage and built upon his previous keynote. He explained that it was important that enterprise customers have a support number they can call with problems, the ability to get fixes to existing problems, and the ability to get enhancements. He complimented Covalent and Red Hat specifically but said that there was a need to see more companies providing commercial support for open source software: "you need to know there is a 800 number and a staff of people that will be able to solve the problem."

Notes from the keynote: the thing about open source is the low barriers to entry, said Tim O. Embrace new technology to make.. "using animation to tell stories". Shrek: 30% of desktops (200) on Linux, 50% of the renderfarm (1000+) on Linux.
Notes on the "XML" sessions (not exciting for Apache Week): spent the morning at Pervasive XML. The infoset is to XML as number is to numeral. The PCDATA for 23, 23.0, and 2.3e1 is not the same, so a schema is needed; the Post Schema Validation InfoSet (PSVI) says they are the same, as it knows about the objects. Formatting objects: Rendering XML.

Ryan Bloom gave this talk on APR, the Apache Portable Run-Time, which began with a quick history lesson explaining how Apache 1.3 addressed portability issues, and how APR and Apache 2.0 grew out of that experience. Ryan explained what the initial goals for the library were, and showed how it provides an abstraction layer for commonly used operating system interfaces which has been ported to a range of 50 Unix platforms, BeOS, Windows, and OS/2.
The talk gave a breakdown of the different components which make up APR: from file and network I/O and memory handling through to some of the more complex interfaces providing threading support. For each component an overview of the API was given, showing how it could be used in applications. Ryan also gave an insight into why various OS interfaces (such as POSIX) cannot be used portably, justifying the need for the abstraction layer which APR provides. To give a more in-depth look at the API, the talk gave a walk-through of a code sample using the threading interface, and took a look at some of the test code present in APR which exercises most of the library's capabilities. Although APR's primary user is the Apache httpd server, the library is also used by a number of other projects such as Subversion.

Paul Weinstein closed off the afternoon with his talk all about private certificate authorities. The session showed the basics of how to create and then use a private certificate authority, then went into the more advanced details. The examples were based around the OpenSSL toolkit, showing which parameters to use on the OpenSSL command line, and how to integrate the certificates into Apache with mod_ssl. Finally the tricky subject of certificate revocation was covered. The slides to this talk are available online. Attendance was around 30 people or so, fewer than at the frs presentation.
There were a few questions on supporting Microsoft products (both IIS and IE) with the certificates created by OpenSSL, a few on keeping track of what has been created and how it all relates (i.e. keeping track of the private and public keys for the CA, client, and server, and the CSR files for client and server: where they go and which file is for which step), and a few on creating a security policy for the CA (how to verify information within a company, and how strict or informal you can be about managing your CA). http://www.weinstein.org/redhat/presentations/

The vendor exhibition area was very popular with a large number of companies attending. We didn't find much information specific to Apache at the exhibition: NuSphere were giving out MySQL CDs that come with a packaged version of Apache, and Red Hat had some information on their Apache services. However there were plenty of free promotional t-shirts to add to our collection, as well as more of the flashing clear rubber bouncy balls we picked up last year from collab.net. Oh, and let's not forget the "Apache by night" Apache Week postcards of course.

Overall Impressions

Apache Week reflects on the busy five days of the O'Reilly open source conference. We will be back to normal next week and catching up on the Apache news and features from the last few weeks.

Even if you were not interested in any of the other tracks there were plenty of talks and tutorials relevant to Apache users, although a number of them were direct copies or updates of talks given at previous Apache conferences such as ApacheCon 2001. Apache Week talked to a large number of the attendees of the conference and the overall impression was very positive. One attendee said that "the keynotes alone were worth the trip". We were also particularly impressed by the child care facilities, allowing conference speakers and participants to bring their families and enjoy a mini holiday in San Diego. The night-time activities and the food were also excellent.
The only complaint we heard repeated by a number of attendees was that lunch was not included on Friday, even though there was a full day of sessions. With 802.11b wireless internet connectivity to most of the conference rooms it was hard to escape from work; and with five intensive days packed with new material we found ourselves tired and in need of a holiday by the end of the week. Next time we'll bring our swimming trunks and sun cream.

Please note that although Apache Week is an O'Reilly Network affiliate, O'Reilly had no editorial control over this review of their conference, even though they did give us free beer. Apache Week will give you our unbiased opinion of all the conferences we attend that have things of interest to Apache users and developers. For more coverage of the rest of the conference visit the O'Reilly Network web site.

Larry Wall quotes I don't want to lose: on debuggers, "I'm an -insert print statement- guy"; on security, "For some definition of security, Perl 6 will be secure".
Note: Linux on S390 could become strategic, said MSDW, who use Covalent.

New in Apache 1.2

All about what is coming up in Apache 1.2

Coming in 1.2

The next release of Apache will be version 1.2. The big new feature in 1.2 will be support for HTTP/1.1, the new version of the Hypertext Transfer Protocol. Apache 1.2 will fully support this protocol as a Web server (except for the proxy module). HTTP/1.1 is now going through the final stages of approval as an Internet standard, and Apache 1.2 cannot be fully released until it is approved, since there might be changes.

However, HTTP/1.1 is not the only change in Apache 1.2. There will be quite a few other new features, new optional and core modules, and many bug fixes. In this feature, we list all the major changes coming in 1.2. Each change is accompanied by a link to the Apache Week issue where it is reported in more depth.
Better Content Type Negotiation (Apache Week 16 August 1996)
To overcome problems with some browsers, Apache's content negotiation algorithm has been updated to better guess what content type the browser wants.

Conditional configuration directives (Apache Week 26 July 1996)
Configuration directives can be ignored if the module they are defined in is not compiled in.

Customising for Browsers (Apache Week 09 August 1996)
It will be possible to set environment variables based on the browser's user-agent text. This will give CGI and SSI scripts a simple way of customising their output based on features available in the browser.

Easier CGI script debugging (Apache Week 09 August 1996)
It will be easier to debug CGI scripts, because Apache can now log the full input and output of the scripts.

Faster persistent connections (Apache Week 02 August 1996)
Various changes have been made to increase the speed of persistent connections.

File, Directory and Limit can take regular expressions (Apache Week 02 August 1996)
The files and URLs affected by each of these sections can be defined by regular expressions.

Graceful Restarts (Apache Week 28 June 1996)
Apache can re-read the config files and re-open log files without terminating transactions in progress.

HTTP/1.1 support (Apache Week 16 August 1996)
Apache will be 'unconditionally compliant' with the HTTP/1.1 specification (except for mod_proxy).

Limit access on per-file basis (Apache Week 02 August 1996)
A new section, <File>, can be used to restrict access on a file-by-file basis. It will now be possible to (for example) password protect a single file.

More SSI Commands (Apache Week 09 August 1996)
The 'extended SSI' module (XSSI) will replace the current server-side-includes module. This will give a number of powerful new features, such as the ability to set variables and use conditional statements.
Multiple configurable log files (Apache Week 16 August 1996)
More than one log file can be used, with the log format fully customisable. This reduces the need for additional log modules (mod_log_referer or mod_log_agent), and makes it much easier to add customised log files and formats.

PICS module (Apache Week 12 July 1996)
An optional module will be included which can provide PICS labels.

Resource limits for CGI scripts (eg max CPU time) (Apache Week 02 August 1996)
To prevent runaway processes, the resources used by CGI scripts can be limited.

Rewrite module (Apache Week 09 August 1996)
The 'rewrite' module will be included with Apache for the first time. This module can be used to map incoming URLs onto other URLs, using regular expressions.

Setuid CGI execution (Apache Week 09 August 1996)
Apache will support the execution of CGI scripts as users other than the server user. A number of security checks will be built in to try and make this as safe as possible.

Simplified configuration file format (Apache Week 09 August 1996)
The process of configuring Apache for compilation has been simplified.

User Tracking (cookie module) Updates (Apache Week 02 August 1996)
It will be possible to disable the generation of cookies, even when the cookie module is compiled in. Also, an expiry time can be set on the cookies.

A number of bugs from 1.1 have been fixed.
Some of the major ones are:
KeepAlive connection problems on some browsers
If ErrorDocument redirect fails, displays filename
Negotiation module negotiates on proxy requests (eg proxy:") then fails
Scoreboard out of date (shows PID of children that have died)
Problem with ScriptAlias including path info data
<Location> matches directory sections
Support HTTP continuation headers
mod_dir truncates file size
Updates for QNX, OS/2, A/UX, IRIX, AIX: compile warnings or system-specific behaviour

Apache 2.0 Preview

A preview to development work on the next generation of Apache, version 2.0

Apache 2.0: The Next Generation

Over the last few months, we've received many queries about why Apache Week had little to report of Apache 1.3 development. Most of the Apache developers have been hard at work writing the next generation of Apache, version 2.0. Ryan Bloom takes time out to summarise the development effort. First published in Apache Week issue 173 (24th September 1999).

It has been about a year since Apache 1.3 was released, and the core Apache members are now working on version 2.0. The new version will be significantly different to the current one, which raises issues such as "Why update Apache at all?" and "What does this update mean for Apache administrators?" We hope to answer those and many other questions in this article and, as the release of 2.0 approaches, provide more up to date information.

It is important to note that presently there is only development code available for 2.0 and that downloading it now is not advised for anybody other than those who are already familiar with the Apache internals. The code in its current state is not guaranteed to compile from day to day or to work on many platforms. Apache Week will announce any upcoming alpha or beta versions and the details of the 2.0 release as soon as they are ready.

Apache 1.3 is a great web server which serves pages for the vast majority of the web, but there are things it can't do.
Firstly, it isn't particularly scalable on some platforms. AIX processes, for example, are very heavy-weight and a small AIX box serving 500 concurrent connections can become so heavily loaded that it can be impossible to telnet to it. In situations like this, using processes is not the right solution: we need a threaded web server.

Apache is renowned for being portable as it works on most POSIX platforms, all versions of Windows, and a couple of mainframes. However, like most good things, portability comes with a price, which in this case is ease of maintenance. Apache is reaching the point where porting to additional platforms is becoming more difficult. In order to give Apache the flexibility it needs to survive in the future, this problem must be resolved by making Apache easy to port to new platforms. In addition, Apache will be able to use any specialised APIs, where they are available, to give better performance.

Multiple-Processing Modules (MPM)

The original reason for creating Apache 2.0 was scalability, and the first solution was a hybrid web server: one that has both processes and threads. This solution provides the reliability that comes with not having everything in one process, combined with the scalability that threads provide. The problem with this is that there is no perfect way to map requests to either a thread or a process. On platforms such as Linux, it is best to have multiple processes each with multiple threads serving the requests so that if a single thread dies, the rest of the server will continue to serve more requests. Other platforms such as Windows don't handle multiple processes well, so one process with multiple threads is required. Older platforms which do not have threads also had to be taken into account. For these platforms, it is necessary to continue with the 1.3 method of pre-forking processes to handle requests.
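The hybrid arrangement described above, several processes each running several threads that all accept from one shared listening socket, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it is Unix-only (it relies on `os.fork`), the process and thread counts are arbitrary demo values, and none of the function names come from Apache, which is written in C and manages its children quite differently.

```python
import os
import signal
import socket
import threading

NUM_PROCESSES = 2          # assumed demo values, not Apache defaults
THREADS_PER_PROCESS = 3


def serve_connections(listener):
    """Thread body: accept connections and report which process and thread answered."""
    while True:
        conn, _ = listener.accept()
        with conn:
            conn.recv(1024)  # read (and ignore) the request
            reply = "served by pid %d, %s" % (
                os.getpid(), threading.current_thread().name)
            conn.sendall(reply.encode())


def run_child(listener):
    """Child process: start a fixed set of threads, then wait to be terminated."""
    signal.signal(signal.SIGTERM, signal.SIG_DFL)  # make sure SIGTERM ends the child
    for _ in range(THREADS_PER_PROCESS):
        threading.Thread(target=serve_connections, args=(listener,),
                         daemon=True).start()
    signal.pause()  # sleep until the parent sends SIGTERM


def start_server():
    """Parent: open one listening socket and fork the children that share it."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    listener.listen(16)
    children = []
    for _ in range(NUM_PROCESSES):
        pid = os.fork()
        if pid == 0:
            try:
                run_child(listener)
            finally:
                os._exit(0)  # a child must never return into the parent's code
        children.append(pid)
    return listener, children


def stop_server(listener, children):
    """Parent: terminate and reap every child, then close the socket."""
    for pid in children:
        os.kill(pid, signal.SIGTERM)
        os.waitpid(pid, 0)
    listener.close()


def send_request(port):
    """Tiny client used to exercise the server."""
    with socket.create_connection(("127.0.0.1", port), timeout=5) as client:
        client.sendall(b"GET / HTTP/1.0\r\n\r\n")
        return client.recv(1024).decode()
```

A caller would use `start_server()`, read the port from `listener.getsockname()[1]`, issue a few `send_request(port)` calls, and finish with `stop_server(listener, children)`. Setting `THREADS_PER_PROCESS` to 1 gives something close to the pre-forking model, while a single process with many threads resembles the arrangement described for Windows.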
There are multiple ways to deal with the mapping issue, but the cleanest is to enhance the module features of Apache. Apache 2.0 sees the introduction of 'Multiple-Processing Modules' (MPMs) - modules which determine how requests are mapped to threads or processes. The majority of users will never write an MPM or even know they exist. Each server uses a single MPM, and the correct one for a given platform is determined at compile time.

There are currently five options available for MPMs. Their names will likely change before 2.0 ships, but their behaviours are basically set. All of the MPMs, except possibly the OS/2 MPM, retain the parent/child relationships from Apache 1.3. This means that the parent process will monitor the children and make sure that an adequate number are running.

PREFORK
This MPM mimics the old 1.3 behaviour by forking the desired number of servers at startup and then mapping each request to a process. When all of the processes are busy serving pages, more processes will be forked. This MPM should be used for older platforms, platforms without threads, or as the initial MPM for a new platform.

PMT_PTHREAD
This MPM is based on the PREFORK MPM and begins by forking the desired number of child processes, each of which starts the specified number of threads. When a request comes in, a thread will accept the request and serve the response. If most of the threads in the entire server are busy serving requests, a new child process will be forked. This MPM should be used on platforms that have threads, but which have a memory leak in their implementation. This may also be the proper MPM for platforms with user-land threads, although there has not been enough testing at this point to prove this hypothesis.

DEXTER
This MPM is the next step in the evolution of the hybrid concept. The server starts by forking a static number of processes which will not change during the life of the server. Each process will then create the specified number of threads.
When a request comes in, a thread will accept and answer the request. At the point where a child process decides that too many of its threads are serving requests, more threads will be created. This MPM should be used on most modern platforms capable of supporting threads. It should create the lightest load on the CPU while serving the most requests possible.

WINNT
This MPM is designed for use on Windows NT. Before Apache 2.0 is released, it will also be made to work on Windows 95 and 98 although, just like Apache 1.3, it is unlikely to be as stable as on NT. This MPM creates one child process, which then creates a specified number of threads. When a request comes in it is mapped to a thread that will serve the request.

OS/2
This MPM is designed for use on OS/2. It is purely threaded, and removes the concept of a parent process altogether. When a request comes in, a thread will serve it properly, unless all of the threads are busy, in which case more threads will be created.

Multi-processing modules are designed to work behind the scenes and do not interfere with requests in any way. In fact, an MPM's only function is to map the request to a thread or process. One advantage of this technique is that each MPM can define its own directives. This means that if you are using a PREFORK MPM, you won't be asked how many threads you want per server, or if you are using the WINNT MPM, you won't need to specify the number of processes.

Will Apache 1.3 Modules work?

Modules written for 1.3 will not work with 2.0 without modification. There are many changes which will be documented by the time 2.0 is released. In Apache 1.3, each module uses a table of callback routines and data structures. Instead of using this table to specify which functions to use when processing a request, 2.0 modules will have a new function to register any callbacks needed.
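The difference between a fixed callback table and a registration function can be illustrated with a small Python sketch. Everything here is hypothetical: the class, the phase names, and the module are invented for the example, and the real Apache 2.0 interface is a C API that this does not attempt to reproduce.

```python
from collections import defaultdict


class HookRegistry:
    """Holds callbacks per processing phase; phases are plain string names here."""

    def __init__(self):
        self._hooks = defaultdict(list)

    def register(self, phase, callback):
        """Attach a callback to a phase; any number may be registered."""
        self._hooks[phase].append(callback)

    def run(self, phase, request):
        """Call each callback for the phase until one returns a response."""
        for callback in self._hooks[phase]:
            response = callback(request)
            if response is not None:
                return response
        return None


def hello_handler(request):
    """Answers only the URL it cares about and declines everything else."""
    if request == "/hello":
        return "Hello from a module"
    return None


def register_hello_module(registry):
    """The module's single entry point: it registers only the callbacks it needs."""
    registry.register("handler", hello_handler)
```

Because the module names only the phases it is interested in, the server could later introduce a new phase (say, a hypothetical "logging" phase) without the module needing to change, which is the property the registration approach is meant to provide.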
In the past, new features have been added to subsequent releases of Apache which required the callback table to be expanded causing existing modules to break. In 2.0, each module is able to define how many callbacks it wants to use instead of using a statically defined table with a set number of callbacks. If the Apache Group decides to add callbacks in the future, the changes are less likely to affect existing modules. Many things have been abstracted in Apache 2.0 and there are many new functions available. This means it will no longer be possible to access most of the internals of Apache data structures directly. For example, if a module needs access to the connection in order to send data to the client, it will have to use the provided functions rather than access the socket directly. The Apache Portable Run-Time (APR) APR was originally designed as a way to combine code across platforms. There are some sections of code that should be different for different platforms as well as sections of code that can safely be made common across all platforms. Apache on Windows currently uses POSIX functions and types that are non-native and non-optimised for communicating across a network. By replacing these functions and types with the Windows native equivalent there has been a significant performance improvement. For example, spawning CGI processes is very confusing in Apache 1.3 because Unix, Windows, and OS/2 all handle spawning in different ways. By using APR, the logic can be combined for spawning CGI processes, decreasing the number of platform-specific bugs that are introduced later. APR will make porting Apache to additional platforms easier. With a fully implemented APR layer any platform will be able to run Apache. APR is small and well defined and once it is fully integrated into Apache, will change very little in the future. Apache has never been well defined for porting purposes as there was too much code to make porting a simple task. 
In addition, the code was originally designed for use on Unix, which made porting to non-POSIX platforms very difficult. With APR, all a developer needs to do is implement the APR layer. APR was designed with Windows, Unix, OS/2, and BeOS in mind and is more flexible as a result.

APR acts as the abstraction layer in Apache 2.0. To allow the use of native types for the best performance, APR has unified functions such as sockets into a single type which Apache will then use independently of the platform. The underlying type is invisible to the Apache developer, who is free to write code without worrying about how it will work on multiple platforms.

When, When, When?

Apache 2.0 is a major re-working of Apache that will hopefully result in a web server that can continue to grow and serve the web. As has been traditional with previous Apache releases, the 2.0 upgrade will be made available when it is ready and stable. There is no promised release date although it is hoped that a beta version will be available either late in 1999 or early in 2000.

This article covers some of the major changes in Apache 2.0, such as MPMs, module callbacks, and the abstraction layer. Future editions of Apache Week will report on the progress of Apache 2.0 and highlight any major developments.

ApacheCon 2000 Europe Conference Report

Report from the third Apache conference

Report from ApacheCon Europe 2000

This is a special report covering the ApacheCon 2000 conference in Europe held in London. First published 3rd November 2000.

ApacheCon Europe 2000, the first ApacheCon outside the USA, was held on Apache Week's home ground from October 23rd to October 25th. As promised, Apache Week was there in London to cover the conference. Early Monday morning at 8 am, we had a brisk walk from the Hilton London Olympia hotel where we were staying to the Olympia Conference Center about three blocks away.
There was no fear of losing our way as shortly after leaving the hotel, we were greeted by a succession of signboards displaying the familiar Apache feather, leading us straight to the conference center. Our first day did not really get off to a good start as during the registration our records were not found in the database. Luckily the organizers were efficient enough to resolve this problem quickly and we were handed our passes and complimentary ApacheCon bags containing three thick manuals of conference proceedings and other goodies. As the conference package included light breakfast and lunch for all days registered, we all had empty stomachs that morning. We really should have taken the word "light" literally as to our dismay breakfast consisted of only one plate of biscuits per table, and tea or coffee. The only difference was you could keep your dirty cup after you had drunk your coffee or tea. As this was the case, there were plenty of seats for us to take our pick but like the other attendees, we did not stay long for breakfast. The conference had three main parallel "unthemed" tracks of classes or talks with one hour, one and a half hours or two hours time slots. There were a total of 42 classes, covering the Apache web server, XML, Java, mod_perl, PHP, and a case study of the real-life implementation of the Apache web server. The classes were spread over three days, including a busy Monday that packed in 21 of the 42 classes. There was also an additional concurrent track of talks by vendors namely, Sun Microsystems, IBM, MyComponents.com, and Oracle, with a sprinkling of BoFs (Bird of a Feather sessions) as well. As we were approaching the auditorium for the opening session at 9 am, strains of a western tune drifted to our ears and for a split second when we stepped into the room, we thought we were transported back in time to the wild, wild west as the formidable figure of Ken Coar loomed above us on the stage with a cowboy hat on his head. 
Later he revealed that the piece of music we heard was "Apache", one of the many western-themed hits by the Shadows who reigned unchallenged as Britain's top band between 1960 and 1963. After the welcoming speech, Ken Coar proceeded to give an update on the schedule where one talk was cancelled and a few were swapped. This was not good news for those who had already decided on the talks that they were going to attend. For the few unlucky ones, this change caused their chosen talks to clash, so they had to go through the mind-boggling task of making a choice again. The official number of pre-registered attendees was about 900.

For my first of the seven classes on the first day, I decided to attend "Toward the Semantic Web: a View of XML from Outer Space" given by Stefano Mazzocchi, creator of the Apache Cocoon project. The attendance was so high that even with extra chairs some delegates were left sitting on the floor. This talk gave a clear explanation of the XML model and the "semantic web", covering many of the technologies that the W3C are developing to shape the future of the World Wide Web. Stefano described the ways in which XML can be used to overcome some of the problems inherent in today's Web, and demonstrated how they can be implemented using Cocoon, Xalan, and other Apache projects.

After 2 hours of XML, an hour of Apache 2.0 by Ryan Bloom was my next stop. The major changes in Apache 2.0 are the implementation of MPM (Multiple-Processing Modules), APR (Apache Portable Run-Time) and I/O filtering. No release date was decided for an Apache 2.0 beta, although Ryan promised it would be as soon as possible.

Lunch was served between 12 pm and 2 pm but talks were still being held during these two hours so it was either lunch or class. At 1 pm, I had no choice but to forgo a class as hunger beckoned and I joined one of the two long queues to collect my meal at the reception and bars area.
Seats were limited but as the turnaround time was quick (no one loitered at the lunch tables), everyone managed to find a place at the tables in the end. A bit short on space but at least it worked out well. After a meal that was nothing to shout about, I had just enough time to drop by Sun's Internet Pavilion to check my email before joining the next class at 2 pm. It was time for a change so I joined a business-oriented talk instead of another technical one. Peter Moulding gave a few useful tips for convincing higher management to use Apache instead of proprietary web servers in his "Apache in the Real World - Beating the In-house Bias" talk. After this came another two-hour class, which for me was "Introduction to Apache Server" by Rich Bowen. This class was more for users new to Apache so I left halfway to listen to "AxKit - an XML Delivery Toolkit for Apache" presented by Matt Sergeant. AxKit is implemented as a Perl Apache module using mod_perl that provides on-the-fly conversion from XML to a variety of formats, such as HTML and WML for WAP phones. It provides similar functionality to Cocoon. After attending four classes and missing one due to lunch, I had two more talks to go, an hour and a half each for three hours in total. Sterling Hughes, co-author of "The PHP Developer's Cookbook", due to be published in November, gave a very technical talk on "Extending PHP4" covering the PHP API and compiling a PHP extension in detail. The talk covered the new scripting engine in PHP 4, Zend. Like a traditional interpreter, the old PHP scripting engine would execute scripts while parsing them. The new Zend engine operates using the more efficient model of pre-compiling the script. The last class of the first day was a highly entertaining and animated talk by Ralf S. Engelschall, author of mod_ssl, mod_rewrite, and much more. 
The talk, "Security Solutions with SSL", covered the evolution of mod_ssl, described its features, and gave twelve useful configuration examples. Each of the beautifully presented slides included an amusing quote to lighten up the atmosphere of this heavy subject. After a long day of exhausting technical classes, it was time for a relaxing night event named "The LongevIT Spa" at the Rock sponsored by IBM WebSphere. Round trip transportation from the Olympia Center was provided. Most delegates had absolutely no idea where the coaches were taking them. The Rock is a newly opened nightclub. There were free cocktails and beers; head, neck and shoulder massages; and two virtual reality simulators that emitted smells too but we were too conservative to give the latter two a try. Despite the free-flowing drinks, sushi and loud music, we were desperate for a decent meal at 10 pm so we nipped out for dinner and were back by 11 pm for the coach back to our hotel. In doing so, we missed the raffle and the bag of goodies given away by IBM: a pair of slippers, a t-shirt, a CD-ROM and so on. We were really tired when we reached the hotel and could barely walk to our room. What a day! ApacheCon Europe 2000: Day 2 The schedule for the second day was not as punishing as the first day. Only 12 classes were held on this day, which left just four to attend alongside three keynotes. There was ample time for lunch and for visiting the exhibition that didn't start until 12 pm. The first session of the second day was "JCP (Java Community Process) and Apache" presented by George Paolini, Vice President of Technologies and Advocacy. Basically he talked about the role Sun has in working with the Apache Software Foundation and the roadmap for the Java 2 platform. Juggling between Java Application Servers, mod_snake, and mod_perl, I finally dropped the first two and settled on the last. 
In a nutshell, Eric Cholet talked about configuring Apache with Perl using <Perl> sections and @PerlConfig, and configuring mod_perl applications using PerlSetVar and custom configuration directives. The main question is why anyone would write Perl code inside Apache's httpd.conf file. One of the benefits is that in an environment with many virtual hosts, Perl code can be used within httpd.conf to generate suitable values for directives based on some external variables. Next Dr Kristof Kloeckner, Vice President of Business Integration Development and Director from IBM Hursley Laboratory, enlightened us on how IBM relates to open source both as a contributor and a beneficiary. Soon it was lunchtime. Only an hour of the IBM Management Briefing, "Infrastructure for Web Services", in the Vendor Theatre overlapped with the two-hour lunch break so there was time to visit the exhibition. Around eighteen companies including IBM, Sun Microsystems, Covalent Technologies, Thawte, Zend Technologies Ltd, and Eliad Technologies took part in the trade show. There was a coffee stand in Sun's booth and two romper rooms with pinball machines and two Sega Racing Arcade machines. We picked up more freebies such as a Tomcat cup and t-shirt, cap, and magazines and even tried our hand at a pinball machine but alas, we were not Brooke Shields. Soon it was time for three more classes within the next four hours before the long awaited guest keynote by Douglas Adams. I filled the next three hours with Tomcat by attending "Migrating Apache JServ Applications to Tomcat" by Craig McClanahan and "Advanced Tomcat Configuration and Performance Tuning" by Costin Manolache, who was a fast speaker and completed his very technical talk in just an hour of his two-hour slot. Then it was an hour of "Improving script and handler performance under mod_perl" by Stas Bekman, who unfortunately had to wrap up his talk quickly as delegates were waiting to enter the room for the final keynote of the day. 
"Living in a Virtual World" was the keynote everyone was waiting for. The whole auditorium was filled to the brim and the audience were not let down as Douglas Adams soon had them in stitches with his urban myths and unique perspective on computers. The last event for the second day was the reception serving cocktails and hors d'oeuvres on the Exhibit Floor. "Bop Ad" was definitely the STAR of the day as fans, ASF members and fellow Apache enthusiasts alike queued for his autograph and a free paperback copy of "The Hitchhiker's Guide to the Galaxy". With that, I ended the day and retreated to the haven of my hotel room. ApacheCon Europe 2000: Day 3 On Wednesday, I was late for the first talk of the day, as I had to check out from my hotel. On this final day of the conference, there were only a total of nine talks running in the three concurrent tracks, two keynotes, a book-signing event, a few vendor presentations by MyComponents.com and Oracle, and not forgetting the closing plenary. When I reached the center at quarter-past nine, all three talks had started. I planned to attend "Running a Successful Web Hosting Business" by Frank DeChellis, one of the two business-oriented talks in this conference. Peering through the glass panel in the door, I couldn't find an available seat in that class so I sneaked unnoticed into the auditorium instead. This turned out to be a good choice, as the talk "Managing your Web Site with Cocoon" was very well presented by Doug Tidwell. Doug, author of an upcoming book on XSLT, demonstrated how the array of tools written by the Apache XML project (including Cocoon, Xerces, Xalan, and FOP) could be used to perform server-side transformations of XML documents. From a single XML document, HTML, PDF and WML could all be served to the client. Next came the Oracle keynote titled "Convincing Management to Embrace Open Software Development" by Brian Behlendorf, president of the Apache Software Foundation and cofounder of CollabNet. 
He gave a brief definition of open source, and the various licences used such as the Apache Licence, GNU General Public Licence and Mozilla Public Licence. He described how open source software is designed and built using the collective wisdom of a group of developers, with contributions from a large user community. When I heard the word "collective", images of the Borg flashed through my mind. Brian also offered some tips on how to make lawyers less nervous and ended the talk by sharing a list of free buzzwords: "reduce time to market", "increase margins", "expand public mind share" and "take ownership of your future", with the audience. The exhibition hours were only from 12 pm to 6 pm but outsiders were still registering on-site for exhibition passes. I was slightly taken aback when I was stopped and asked to show my pass (it was hidden under my jacket) at the exhibition entrance but I guess they were just being careful. During the two-hour lunch break from the main talks, presentations by vendors were still in progress. I ate lunch leisurely as most delegates had already taken theirs and no one was waiting for my seat. I had a pleasant conversation with two participants from Germany and the USA during lunch. The latter only heard about ApacheCon after the Orlando event. Instead of waiting for the next ApacheCon in the USA, he persuaded his company to send him to this one. The former was in charge of migrating his Netscape web server to the Apache web server all by himself. Both of them were very satisfied with the quality of this conference and the useful technical details that they managed to absorb from the talks. Oblivious to time, I missed the Wrox Press book-signing event at 1 pm by Peter Wainwright, author of "Professional Apache" but still, I managed to pick up a cute horsey toy known as "CocoJ" from Eliad Technologies booth. At 2 pm, I was off to "mod_perl Version 2.0" given by Doug MacEachern. 
Because of the architectural changes in Apache 2.0, particularly the introduction of thread support, mod_perl has been rewritten from scratch. The presentation was served by Apache 2.0 and the development version of mod_perl 2.0, and Doug demonstrated use of some of the more advanced features of Apache 2 which are supported in mod_perl 2, including I/O filtering. Soon James Davidson took over the stage for the "Guru Keynote" session titled "Jakarta Perspective". This was his personal account of the origins and goals of the Jakarta project. In a spontaneous talk, he reminisced about the history and progress of Tomcat and Ant including an insight into the various obstacles that had to be overcome in getting the ASF and Sun together. In his zest to deliver an up-close and personal look at Tomcat, users unfamiliar with the Jakarta project might have complained that he had neglected to give a clear definition of the Jakarta project. Nevertheless I enjoyed the talk as it provided a glimpse into Tomcat's roots. For the final 2-hour class of the conference, I attended the talk, "The Backhand Project: Load-Balancing and Monitoring Apache Web Clusters" by Theo Schlossnagle. He clarified the differences between "load balancing" and "high availability" since they are often used interchangeably to mean both. Both mod_backhand and mod_log_spread were covered in this talk. Back to back with this class was "WebDAV and Apache" by Greg Stein which other delegates from Apache Week reported was an excellent talk about WebDAV and mod_dav. The closing session hosted by Ken Coar saw only one third of the attendance of the opening plenary. He announced that there were about 1200 registrants (20 percent more than at ApacheCon 2000 Orlando) with around half attending only the exhibition. With a panel of ASF members on stage, it was time for comments about the conference. The overall feedback was positive. 
Some complaints were that the Monday schedule was too tight and the Internet access was slow and not very reliable. One suggestion was to introduce lightning sessions where speakers would talk for five minutes on a subject. Hands-on sessions in the evenings were also suggested. While most attendees came from Europe, there were also some from the USA, Canada, South America and even all the way from Japan. Delegates who attended both ApacheCon conferences this year commented that ApacheCon Europe was definitely better than the previous one held in Orlando. If this is the trend then it is good news as we can expect more improvement in the next ApacheCon, which will be held in Santa Clara, California from April 4th to April 6th. The location for the next ApacheCon to be held outside the USA is yet to be determined, but a hint was dropped about Australia. As in all conferences, there were various technical glitches when presentation laptops froze and batteries ran out, some inexperienced speakers, and not enough seats but these were all minor issues considering the excellent detailed technical knowledge that was imparted by the speakers. An annoying distraction was the occasional ringing of mobile phones during the talks. Perhaps the audience need to be reminded to switch off their cell phones at the start of presentations. My personal opinion is that it is very important to pick suitable talks to attend based on your own requirements, as all of them seemed very interesting from the abstract provided. As soon as you are aware that the talk is not what you expect it to be, you must just walk out and join another talk. This may seem very rude to the speaker but to make the most of the conference, this is the only way. One suggestion is for the planning committee to indicate the level of technical knowledge required for the talk, so delegates can make a better choice depending on their own expertise. 
This conference was most suitable for "technical technical" people who wanted to know in depth about a certain subject and to talk to the authors of various modules but it also catered for higher-level managers and new users. With that, I end my report and hope to see you all at ApacheCon 2001 in Silicon Valley next year! ApacheCon 2000 Conference Report Report from the second ever Apache conference Report from ApacheCon 2000 This is a special report covering the ApacheCon 2000 conference in Florida held in March 2000. First published 10th March 2000. The conference ran from March 8th to March 10th, at the Caribe Royal Suites in Florida, USA. The hotel is situated very close to the main Orlando attractions and the sessions took place in the conference center of the hotel. In total, just over 1000 people attended the conference and this included a large number of Apache Software Foundation members. At the very first session of the conference, the opening plenary, the previous record for the most Apache developers in the same place at the same time was broken. Apache Week counted 18 developers during the session, 4 more than at ApacheCon 98. While most people came from the US and Canada, there were also a significant number of people from Europe and beyond. This was the second official Apache conference, the first being held in San Francisco in 1998 with over 500 attendees. Some initial pictures from around the conference are available from the Apache Week site. In addition, personal pictures from attendee Kevin Burton are available. Ken Coar opened the conference on Wednesday with a plenary session and a song. Joining him on stage was a selection of the current Apache Software Foundation members. Roy Fielding, current president of the ASF, said that the conference provided a unique opportunity to talk to the people who actually write the code. 
The floor was opened to questions which mostly revolved around the function of the foundation, the Java and XML Apache projects, and the upcoming 2.0 alpha. Roy said that 2.0 would be available "real soon now" and Ryan Bloom promised that an alpha would be available during the week of the conference, or at least a few days after it. The first keynote speech was from Dr Alfred Z Spector, Senior Technical Strategist with IBM. Dr Spector outlined IBM's contribution to open-source and particularly their work on Apache 2.0 and the Java and XML Apache projects. The increasing importance of modularity, through customisable building blocks and code reuse, was stressed. The conclusions of the talk were that developers need to be given access to libraries of standard components and better tools to utilise them, and that the education system should be changed to put more emphasis on the use and reuse of components. Brian Behlendorf gave an energetic look at the internals of the ASF in his keynote session "State of the Foundation". His talk covered some of the reasons that the ASF was formed, which include protection for the individual contributors against lawsuits and the ability to control the Apache identity. The ASF is the umbrella organisation behind the Apache httpd server as well as a number of other open-source projects such as Jakarta and Apache XML. Brian announced that the ASF had so far received over US$35,000 in donations which was quite an accomplishment given that it is not publicised that the Foundation accepts donations. The difficulty for the ASF now is working out how to spend the donations. The FreeBSD project was cited as a good model as it gives grants to developers who need additional resources, for example. For the future a goal for the ASF is to develop a structure to help support new projects, aided by creating a standard framework of developer tools and procedures for running open-source projects. 
On Friday the first keynote was given by the president of the Java Software Group within Sun, Patricia C Sueltz. She talked about how Sun views the open source movement. She said that Sun has made three technology bets in the year 2000: that computers will need to scale massively, that the network stack will need to be interoperable, and that devices will be always on and always connected. Finally, she talked about how Sun views open source, and addressed some of the criticisms of their current approach. She said Sun was committed to working better with open source, and is working to improve its source license. Various pieces of software have been or will be released to open source groups, including the Tomcat servlet engine and the Xerces XML parser. There were four parallel tracks running throughout the conference, with a total of over 40 classes. Unlike the last ApacheCon the tracks were not themed, and some of the classes took place as "Nightschool" events. Over the next few weeks the ApacheCon site will be updated to include links to all the talks and papers that were presented. Popular talks on Wednesday included a series of tutorials on starting with mod_perl, Comanche (a GUI for Apache), and the Cathedral Meets the Bazaar. The mod_perl tutorials continued into Thursday and were joined by talks about XML, HTTP, and APR. The day classes ended with a well received talk on load balancing for Apache using the Backhand module, mod_backhand. Talks on Friday covered a variety of subjects, including XML and XSLT, PHP, and Apache 2.0. There was also a panel discussion about the future of Apache after 2.0, including a variety of ideas for new features. These included IO layering (such as the ability for the server-side includes module to parse the output of the CGI module), and replaceable configuration engines so that configuration information could be stored in a database instead of a file. 
Around sixteen companies exhibited at the trade show during the conference. Companies present included IBM, Sun Microsystems, LinuxMall, and Covalent Technologies. The exhibition was very popular and reinforced how Apache has built an associated industry. The exhibitors we talked to were very happy with the quality, interest, and response of people that they met at the conference. The final session of the conference consisted of a launch of the first alpha of Apache 2.0. A number of ASF members on stage updated the website and copied the distribution files into the correct locations live in front of the audience. Announcements were then sent to a number of key sites such as Slashdot and Freshmeat. This was followed by a session of questions and answers about the conference. In general most of the attendees seemed to like the conference, with positive reaction to the speakers. An interesting point was that most speakers knew their subjects very well, and although not all were experienced speakers, they were preferred to excellent speakers without detailed technical knowledge. Other comments were that the session lengths were just right and that the conference was good value overall. The main criticisms were the fact that lunch was not included in the conference price, the difficulty obtaining meals since BOFs were scheduled at lunch times, and confusion since the 'nightschool' sessions were not included in the 'full conference' registration. Plans are already underway for the next ApacheCon conference, which will be in London in October this year. The conference will be smaller than the US show and tailored towards the European community. The next ApacheCon conference to be held in the US will probably be in San Jose in 2001. The first meeting of the Apache Software Foundation members took place on the Saturday morning following the conference. 
A total of 27 of the 38 ASF members were present, together with representatives of the conference organising company, Camelot. A secret ballot was held to elect the new board of directors of the ASF as well as to elect a number of new ASF members. ApacheCon 2001 Dublin Cancelled The ApacheCon Europe 2001 conference scheduled for Dublin in October has been cancelled, due to financial difficulties with Camelot Communications. Camelot are the production company who produced all but the very first ApacheCon conferences. The Apache Software Foundation today released the following statement: Due to financial considerations beyond our control and unrelated to past ApacheCon conferences, our conference producer has decided that they are unable to produce the upcoming ApacheCon Europe 2001 in Dublin. With only three months left before the conference was scheduled to begin, The Apache Software Foundation has decided that it is in the best interests of attendees to cancel the show now rather than attempt to find another conference organizer for the Dublin event. We had suspected there was a problem with the conference when we were contacted by Peter Moulding, a speaker at ApacheCon 2001 in Santa Clara. Peter said that he had not had his travel refunded and that the conference organisers, Camelot Communications, had called him to tell him they were closing the company. We were unable to get an official response from Camelot or the Apache Software Foundation in time to run the story in issue 254 (13th July 2001). It is disappointing that Camelot is unable to produce the conference; the previous conferences that they have run have been well attended, made a profit, and been highly rewarding for everyone involved. 
The Apache Software Foundation are about to begin evaluating proposals by other conference organisers so that future ApacheCon events will not be affected. More news in Apache Week and on the conference web site as it becomes available. ApacheCon 2001 Conference Report This is a special edition of Apache Week covering the April ApacheCon 2001 conference in Santa Clara. ApacheCon Santa Clara 2001: Day 1 ApacheCon 2001 was held in Santa Clara, California from April 4th to April 6th. As promised, Apache Week was there to cover the conference. The first day didn't get off to a good start as there were no signs in the hotel explaining where the conference registration was, so we ended up eating a breakfast provided for a different conference in the hotel. This turned out to be a good plan, as the ApacheCon breakfast wasn't nearly as good. Registration was quick and painless but even though conference proceedings were available on a CD-ROM, the registration bag contained hard copies of all the papers, running to three thick volumes of well over 600 pages. Unlike the last ApacheCon there were no free goodies in the bag; last time we got a t-shirt and a pen, this time we just got marketing leaflets from companies sponsoring the event. The schedule showed that ApacheCon had packed over 24 classes into the first day, running from 9am through to after 9pm. First up was the opening plenary presented by Ken Coar, and over 180 people packed the theatre. Ken gave a welcoming speech, details of changes to the schedule, and directions to lunch. Just under 200 proposals for sessions were received for this conference, from which just 89 were picked. Sadly attendees we talked to afterwards said the session came across as unplanned and unprofessional for a conference of this type. 
This would have been a good opportunity to introduce the Apache Software Foundation or give a brief overview of the major events since the last conference. We made use of the wireless Internet access available throughout the conference area to catch up on some work before attending the "birds of a feather" (BOF) session on clustered Apache services. The group behind the Spread toolkit explained how to create reliable distributed clustering systems and showed examples of how Spread can be used within Apache. Apache-SSL has code that makes use of Spread to facilitate a shared session key server, although the toolkit can be used for much more complex tasks such as database replication. Next, Harrie Hazewinkel gave a short but interesting talk on quality of service measurement, using SNMP to monitor and manage Apache. Harrie is the author of the Apache SNMP module, mod_snmp. After the provided lunch, Jon "maddog" Hall from Linux International enlightened us with an entertaining and animated keynote speech. He touched on trademark issues where people take advantage of the Linux name to create, for example, "Linux University". These issues are of particular interest to Apache, and the ASF take care to protect the Apache name. With the recent downturn in the technical sector he explained his business plan which involves combining microcomputing and microbrewing. "When the computer industry is at a low, beer drinking is at a high," he said. By combining both industries into a single course you can make sure you always have a job. The keynote also touched on the classification of machines and the accuracy of his predictions applied to the Internet, and took a look at Star Trek technology including communication badges, personal log computers, and female Borg. 
Next we had intended to visit the talk on WebDAV and Apache with Greg Stein, but the small presentation room was overflowing with people, so much so that the talk was repeated later in the week for those that could not fit in the first time. Instead we went to see Giacomo Pati and his talk on Cocoon. When we started developing Apache Week back in 1995 we looked at content-independent ways to store the issues. We actually wrote our own format, in a style similar to the Ventura publisher markup language. If we were to start again we'd definitely be using XML; in fact we already use XML for parts of Apache Week as well as the "In the news" section of the main apache.org site. We were interested in finding out more information about some of the XML publishing systems available, and this is the goal of the Apache Cocoon project. Doug Tidwell spent some time explaining Cocoon 2.0 and focussed on serving up XML documents. The basic idea is that you write an XML representation of the resource you wish to serve together with an XSL stylesheet that shows how the XML is to be translated. The XSLT processing is normally left to the server and the result is usually cached as the translation may take a significant time. In the future, browsers will be able to do this transformation themselves with the server just providing the XML and XSL files directly. Some browsers attempt to do this now, but support is still limited. Cocoon is able to pick which XSL stylesheet to use to render a page based on things such as the user-agent field. Once you have an XML representation of your data you are not limited to just providing a translation to HTML, and we were shown tools that could convert the XML into other presentation types such as JPG and even the creation of dynamic PDF. For the remainder of the day we decided to attend the talks on security. The first, "PKI with OpenSSL", aimed to show the applications for which OpenSSL can be used. 
OpenSSL is an open-source toolkit that implements SSL as well as many other cryptography and public key protocols. Before September last year the RSA patent prohibited the use of OpenSSL inside the USA. Rodney Thayer explained that OpenSSL can do much more than act as the SSL layer for a secure web server as he went through the various standards as well as commands for general cryptography, certificate processing, and key storage. OpenSSL is now used in a large number of applications and is a product-grade general purpose cryptography tool. The last class of the first day was a highly entertaining and animated talk by Ralf S. Engelschall, author of mod_ssl, mod_rewrite, and much more. The talk, "Security Solutions with SSL", covered the evolution of mod_ssl, described its features, and gave useful configuration examples. Each of the beautifully presented slides included an amusing quote to lighten up the atmosphere of this heavy subject. The future of mod_ssl was discussed including the work currently going on to port it to Apache 2.0, add LDAP CRL handling, and a distributed session cache. mod_ssl will not need EAPI hooks for Apache 2.0, but other EAPI functions may be useful. It is not certain how this effort will fit into the work being done in Apache 2.0 on mod_tls and if we will end up with two SSL solutions like we have with Apache 1.3. When asked about support for Win32 Ralf replied "if you really think that you can run a secure web server on Windows you've not understood security". ApacheCon Santa Clara 2001: Day 2 The second conference day was almost as packed as the first, with 25 talks and additional BOF sessions spanning from 9am until after 8pm. After the free breakfast doughnuts I decided to attend the BOF sessions on using Apache for serving multiple protocols. One of the aims for Apache 2.0 is that the HTTP engine is abstracted, and in particular APR is designed to be a portable layer that can sit beneath all sorts of applications. 
The BOF gave a list of the protocols that have been examined so far including HTTP, FTP, POP, IMAP, IDENTD, and SNMP. It then looked at why you'd want to use Apache to do this when good applications for each of these protocols already exist. The main advantage is that you get a common infrastructure for all your applications so you can use one standard configuration format, one standard way of doing authentication and so on. You can also make use of the extensive tools such as the Rewrite module and SSL across all protocols. The biggest requirement for the project is that the performance for serving HTTP requests should not be affected if you don't use Apache to serve any other protocols. Once discussion moved to POP and IMAP support I was reminded of Jamie Zawinski's law of software envelopment: "Every program attempts to expand until it can read mail. Those programs which cannot so expand are replaced by ones which can." Each time a secure web server receives a connection from a new client it has to establish a new SSL session. This negotiation requires the server to perform a private key operation, usually with a 1024 bit RSA key. This operation is mathematically complex and is therefore time consuming. Hardware accelerators are designed to offload the most complex parts of this operation allowing more new connections to be established every second. Existing hardware units handle anywhere between 75 and 300 of these operations per second using a number of internal processors, and can cost up to US$15,000. The OpenSSL project has recently been incorporating support for various hardware cryptographic accelerator cards. Until recently these accelerators were only supported by commercial secure servers. A number of these hardware vendors were invited along to a special BOF to discuss OpenSSL support and their units. Representatives of nCipher, Rainbow, and GIGI attended and gave short talks about the capabilities of their hardware and how it was supported. 
nCipher stressed that the ability to keep your server's private keys on an external device, and scalability, were more important than performance. Rainbow said that they concentrated on acceleration, having the fastest boards available. Dr Lee Nackman of IBM gave a keynote entitled "Open Source and the Corporation". He said that IBM had an "open source zeal" and had developed internal processes that made working with open source projects less painful. Of course IBM wants to see a return from their investment, and in the case of their substantial contributions to Apache-XML they saw that it would open up new business models for IBM. They see themselves supporting the customer demand for Linux and being able to exploit the emerging technologies. Looking to the future, he predicted an increase in web services and service-orientated web applications such as stock quotes, news, and increased integration with business processes. Soon it was lunchtime, and at this conference the ApacheCon planners had decided not to schedule sessions overlapping with lunch. Instead lunch coincided with the opening of the exhibition hall [photo: "lunch queue", 80K jpeg]. The turnout of exhibitors was disappointing, with under half the number at the last ApacheCon and a distinct lack of giveaways. I failed to find which company was giving away inflatable camels (or in fact why they were doing so) [photo: "apacheweek sign", 61K jpeg], [photo: "exhibitors hall", 98K jpeg], [photo: "exhibitors hall", 97K jpeg]. I skipped most of the afternoon sessions in order to finish off the Apache Week guide to the history of Apache 2.0 and catch up with some sleep. ApacheCon Santa Clara 2001: Day 3 Friday marked the last day of the conference, but the schedule was still packed with exciting talks and keynotes. For the first talk of the day we visited Mark Wilcox who was presenting "Apache and LDAP". 
The talk outlined the role that LDAP can play with Apache, looking at what directory services are, and how to make use of LDAP with Apache and Perl. Mark explained that the aim of a directory service is to provide quick access to hierarchical information in a way that can be distributed and replicated. These services can be useful to Apache for authentication, authorisation, and perhaps even configuration. The HTTP protocol is stateless so user authentication needs to happen on every request. Rather than have every page request do a new database lookup, LDAP services are usually combined with some other system, such as cookies. The Perl::LDAP module provides an easy way to interface to directory services from within Apache. Jon Tigue gave an interesting presentation on extending directory indexes provided by mod_autoindex. By cleaning up the HTML produced by the module with a simple patch, the output from the module can be sent through an XML parser. When used in conjunction with clients that can parse XML this allows things such as the column sorting in the FancyIndexing without any server interaction. After lunch a panel discussion took place about Apache on Windows. Ryan Bloom, William Rowe, Jeff Trawick, and Rich Bowen formed the panel but were greeted by only 20 attendees [photo: "win32 round", 77K jpeg]. The discussion formed around APR and how the implementation of this layer makes Apache 2.0 think that Windows is just another Unix. Even though Apache for Windows is designed to run best on NT (and hence Windows 2000), a substantial proportion of the audience wanted to keep support for Windows 95 and 98 for testing purposes. The closing session hosted by Ken Coar saw only a fraction of the attendance of the opening plenary, but it was getting late on a Friday evening. With a panel of ASF members on stage [photo: "some ASF members", 52K jpeg], it was time for comments about the conference. The overall feedback was positive. 
Some attendees complained of poor Internet access; this was true if you relied on the computers provided, but I found the wireless coverage to be excellent. One suggestion was that there should be fewer sessions in the evenings, leaving them free for more social interaction or BOF sessions. Another suggestion was to have talks that explained (probably in an unbiased way) the commercial products available that interfaced with or were based on Apache. Overall I was very impressed with the conference. A lot of the problems from previous ApacheCon conferences had been addressed and the quality of the presenters was high. It was a shame that more exhibitors had not taken part as it seemed that a number of corners had been cut to save money. The only negative impressions were fairly minor: the food choices were limited (on Friday all the meal choices involved cheese, making it difficult for vegans to find things to eat), the conference was a long way from any other facilities (having a car was essential), and there were no fancy parties. Wireless internet access was available throughout the conference rooms and I sometimes found it difficult to stay focussed on the speaker, missing parts of presentations whilst catching up on email without realising it. With so many interesting talks I couldn't attend all of them and this report gives only a snapshot of the ones I thought would be interesting to me. ApacheCon has a variety of talks aimed at all technical levels, so you should definitely consider attending if you've not been to one before. With that, I end my report and hope to see you all at the next ApacheCon later this year! ApacheCon'98 Conference Report Report from the first ever Apache conference Report from ApacheCon '98 This is a report about the first ever conference dedicated to Apache, held in October 1998 in San Francisco. First published 16th October 1998. The conference ran from October 14th to October 16th, at the San Francisco Hilton in California, USA. 
This is the largest hotel in San Francisco, and is located in the downtown area. In total, just under 500 people registered for the conference. While most people came from the US and Canada, there were also a significant number of people from Europe. For the first conference on Apache, this was a very good attendance, and the exhibitors and sponsors were very happy with the number of people at the conference. In addition, most of the 18 core Apache developers also attended, coming from the US, Canada, Italy, UK and Germany. This article contains some links to pictures taken at the conference. Some additional pictures are also available. The first general session on the 14th started with a keynote speech from author Bruce Sterling (picture). This was not directly related to Apache, but contained Bruce's thoughts on the future of a networked society. This was followed by another keynote, from John Gilmore of the Electronic Frontier Foundation (EFF) (picture). The EFF is an organisation concerned with freedom and liberties in computing and the Internet. He outlined objections to software patents, and covered the problems caused by the US export restrictions on secure encryption. Decisions about what is exportable and what is not exportable from the US are made by government employees, without any ability to appeal. Even worse, government employees can revoke export permission at any time without giving any reason, which could seriously affect businesses who rely on exports. The export restrictions were applied to the NCSA httpd server, where the government demanded that the server remove all "hooks" which could allow encryption to be added, even though there was no actual encryption technology in the server. This is the reason that Apache does not contain any hooks to enable encryption to be added. The first talk on the second day was by John Patrick from IBM (picture). He talked about his view of how the Internet will evolve. 
In the last session, David Filo from Yahoo! showed how Yahoo! has used open source software (picture). They started by using commercial operating systems and home-written web servers, but had problems with vendors not being able to scale to the huge number of hits they soon received. They moved to FreeBSD so they could read and if necessary tweak the operating system code. They also use Apache on most of their servers and find that the majority of the performance limitations come from the application layer software. There were four parallel tracks running throughout the conference, with a total of 55 talks. The tracks were Dynamic Content, Performance, Security and Case Studies. On the Dynamic Content track there were talks about using Java servlets at beginner, advanced, and performance levels. Two talks about PHP showed beginner and advanced techniques. There were also talks on writing Apache modules and mod_perl (although unfortunately the first mod_perl session could not be given by the original presenter and the second advanced session had to be cancelled). The Performance track covered making Apache go faster on Windows and Unix systems, using servlets efficiently, and tweaking Linux and FreeBSD. Also on this track was a presentation on how the Netscape Portable Runtime (NSPR) package (available under Netscape's NPL license) could be integrated with Apache to provide a multi-threading Apache on all Unix platforms as well as NT. There was also a talk about how the Apache development process works and how people with contributions can get involved. On the Security track the talk about the new mod_ssl package was popular. The author presented what mod_ssl does and why it was created from the existing Apache-SSL package. Also on this track was an introduction to SSL and TLS, basic security issues in Apache, NT security, and a panel on public key infrastructure on the Web. The final track had Case Studies from various companies. 
This track also contained a demonstration of various GUI configuration programs for Apache. There are various free and commercial configuration systems in development currently, some of which were demonstrated. This seems to be the start of a more concerted effort to develop a GUI infrastructure within Apache, which will allow multiple front-end implementations. About a dozen companies exhibited at the trade show during the conference (picture). Companies present included IBM, RedHat, C2Net, Sendmail, nCipher, SUSE and O'Reilly. This was a very explicit demonstration of how Apache has built an associated industry, and the exhibitors we talked to were very happy with the quality, interest, and response of people that they met at the conference. The final session was a chance to communicate with the core Apache developers (picture). After introducing each member, there was a short discussion of items of interest to the developers, such as plans for 2.0. This was followed by an open session for questions from the floor. Questions covered a range of topics, from IBM's involvement with the Apache group (they have several people working full time on Apache and will contribute back changes) to a request for Apache to incorporate SSL by storing it on a server outside the US (this cannot happen because then no US citizen could work on any part of Apache as it includes encryption). This was the first conference about Apache, and the first conference ever organised by the Apache Group. The result was a very successful conference, where sponsors, exhibitors and attendees were all happy. The success of the conference means that there will be another ApacheCon in the future, but the location and dates have not yet been decided. As soon as anything is known, it will be announced in Apache Week. Apache 1.2 API Guide For module authors, a comprehensive list of changes to the Apache module API. Introduction Apache 1.2 is now out. 
Here we list all the module API changes compared to the API in Apache 1.1.3. Anyone who has written a module for Apache 1.1.3 or earlier should read this to see if they need to make modifications for it to work with 1.2. In any case, Apache adds many new features from HTTP/1.1, and modules might want to take advantage of them. See also our Guide to Apache 1.2. First published in Apache Week issue 44 (6th December 1996), last updated 6th June 1997. API Changes The module API version is now 19970526. A new phase of request processing is available, to allow modules to process the request headers early on in the request. The functions which handle directives should now return type "const char *" instead of "char *". If this is not done, compiling the module might result in type-mismatch warnings, although it will still work. Directives can now be defined in more than one module at once. Each module is given the chance to handle the directive, and can decline it by returning DECLINE_CMD. This gives other modules the chance to handle the directive. This is used in Apache in mod_auth.c and mod_auth_dbm.c, which both support AuthUserFile, but only handle it if they recognise the file type argument. Directives can now take up to three arguments, and can take optional arguments. The number of arguments is specified in the module's command table, with values such as TAKE2 (for two arguments). Possible values are now: TAKE3 (takes 3 arguments); TAKE12, TAKE23, TAKE123, TAKE13 (take a variable number of arguments: 1 or 2, 2 or 3, 1 or 2 or 3, 1 or 3 respectively). The function called should declare arguments for the maximum number of arguments the directive can take. Arguments not set on the directive will be passed to the function as NULL. Finally, the cmd_parms structure has been updated (this is passed in as the cmd argument to directive handlers). A new 'cmd' element is now available, pointing at the directive's command table definition (command_rec). 
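The optional-argument convention described above can be illustrated with a minimal, self-contained sketch. It is not taken from the Apache source: the demo_auth_config struct, the set_demo_auth_file() function and the "DemoAuthFile" directive name are all hypothetical, and the real cmd_parms and per-directory configuration parameters are replaced by a plain struct pointer so that the fragment compiles without the Apache headers. It only shows the shape of a handler for a directive taking one or two arguments: parameters are declared for the maximum number of arguments, the return type is "const char *", and an argument omitted from the directive arrives as NULL.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-module settings; not part of Apache. */
typedef struct {
    const char *file;
    const char *type;
} demo_auth_config;

/*
 * Shaped like a handler for a TAKE12 directive: it declares parameters for
 * the maximum of two arguments and returns "const char *" (NULL on success,
 * otherwise an error message). The conf pointer stands in for the real
 * cmd_parms and configuration arguments, which are omitted here.
 */
static const char *set_demo_auth_file(demo_auth_config *conf,
                                      const char *file, const char *type)
{
    assert(conf != NULL);

    if (file == NULL || strlen(file) == 0)
        return "DemoAuthFile requires a file name";

    conf->file = file;
    /* An argument that was not given on the directive arrives as NULL. */
    conf->type = (type != NULL) ? type : "standard";
    return NULL;
}
```

Called with a file name and NULL, the sketch falls back to the assumed default type "standard"; called with both arguments, it stores the type it was given.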
Apache now supports the additional OPTIONS and TRACE request methods. Two new defines are available for these methods, M_OPTIONS and M_TRACE. The request_rec's method element could be set to one of these. The handler can send an OPTIONS response using send_http_options() (although it could also decline the request, and let the default handler send the response). Handlers can also set the new allowed request_rec element to enable the creation of a proper Allow HTTP/1.1 header. This is done by setting one bit per permitted method, shifting 1 left by the value of the appropriate M_ define. For example, to specify that GET and POST (only) are allowed for a particular resource, the following could be used: r->allowed = (1 << M_GET) | (1 << M_POST); The way that a module reads PUT or POST data has been completely changed. This is necessary to support HTTP/1.1, which can send this data in a 'chunked' encoding. Modules can request that they get the data after it has been 'dechunked', or they can get the raw data. Any module which handled PUT or POST data by using the old read_client_block() will need to be modified before it will compile with 1.2. The way to read a request body in 1.2 involves several steps: Call setup_client_block() to prepare to handle the data. The second argument to this function tells Apache how to process the body (if at all). It can be one of: REQUEST_NO_BODY (issue a 413 error if any body is present), REQUEST_CHUNKED_ERROR (issue a 411 error if the body was sent chunked), REQUEST_CHUNKED_DECHUNK (if body is chunked, process to remove the chunking), REQUEST_CHUNKED_PASS (pass on the chunks). Call should_client_block() when ready to read the data. This sends a "100 Continue" status to the client (new in HTTP/1.1) and tells the module whether it is ok to read the data. Repeatedly call get_client_block() to get the data (possibly all in one go, but possibly also a bit at a time). An HTTP response can include headers to indicate to the client that this response should not be cached at all. 
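The allowed bitmask described earlier in this section can be sketched in isolation as follows. The method numbers in the enum are stand-ins chosen for illustration (a real module takes the M_ defines from Apache's headers), and method_bit(), method_allowed() and get_and_post_only() are hypothetical helper names, not Apache API functions. The point is simply that each method occupies one bit, set by shifting 1 left by the method number.

```c
#include <assert.h>

/* Stand-in method numbers for illustration only; a real module uses the
 * M_ defines supplied by Apache's headers. */
enum { M_GET = 0, M_PUT = 1, M_POST = 2, M_DELETE = 3 };

/* Return the single bit that represents one method in an "allowed" mask. */
static int method_bit(int method)
{
    assert(method >= 0 && method < 31);
    return 1 << method;
}

/* Non-zero if the given method is present in the mask. */
static int method_allowed(int allowed, int method)
{
    return (allowed & method_bit(method)) != 0;
}

/* Build the mask for a resource that accepts GET and POST only. */
static int get_and_post_only(void)
{
    return method_bit(M_GET) | method_bit(M_POST);
}
```

With these assumed numbers the GET-and-POST mask is 5 (bits 0 and 2), and testing the mask for PUT yields zero.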
In previous versions of Apache, a response was marked as uncacheable by setting the no_cache element of the request_rec. This also had the effect of always sending the response, even if a "304 Not Modified" response could be returned. Now a new element has been added, no_local_copy. When this is set, a 304 response will never be generated. Setting no_cache will send a response that cannot be cached. New in the request_rec: no_local_copy and no_cache replace 'no_cache' (type int); request_time - time request was received (type time_t); boundary - boundary string for multipart/byteranges (type char *); range - range header text (type char *); content_language is deprecated, use the content_languages array instead (array of char *); allowed - set to allowed methods (returned on Allow: header by send_http_header()) (type int); byterange - number of byte ranges (type int); chunked - if sending chunked encoding (type int); read_length - bytes read so far (type long); read_body - set by handler, can take REQUEST_NO_BODY, REQUEST_CHUNKED_ERROR, REQUEST_CHUNKED_DECHUNK or REQUEST_CHUNKED_PASS (type int); clength - real content length (type long); remaining - bytes left to read (type long). There are some other elements used internally within Apache. In addition, the existing port element is now an unsigned int rather than a (signed) int. New in the server_rec: send_buffer_size - sets the TCP send buffer size; addrs - list of addresses for this vhost (type server_addr_rec *); server_uid and server_gid - contain the euid/egid to run the suexec wrapper as (types uid_t, gid_t). The server_rec no longer contains host_addr, host_port or virthost. Instead, the server could be responding to multiple server addresses, so a new list (addrs) is created, with each entry of type server_addr_rec. The server_addr_rec contains the IP address, port and name of the server. Apache is now compiled with a regular expression library. Modules can use the function calls provided by this library to make use of regular expressions. 
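As a rough illustration of using a regular expression library from C, the sketch below assumes the library offers the POSIX regcomp(), regexec() and regfree() interface from <regex.h>; the text here does not name the calls, so treat that as an assumption. The matches() helper is hypothetical and is not an Apache function, and the sketch does not use Apache's pool allocation.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/*
 * Return 1 if text matches the extended regular expression in pattern,
 * 0 if it does not match, and -1 if the pattern fails to compile.
 */
static int matches(const char *pattern, const char *text)
{
    regex_t re;
    int rc;

    assert(pattern != NULL && text != NULL);

    /* REG_NOSUB: only a yes/no answer is wanted, no sub-match offsets. */
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;

    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);                 /* release the compiled pattern */
    return rc == 0 ? 1 : 0;
}
```

The compiled pattern is freed on every call, which keeps the sketch simple; a module that matches the same pattern on every request would more likely compile it once at configuration time.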
Note that on systems which provide a stable and bug-free regular expression library of their own, the one supplied with Apache is not used. The library is available in the src/regex directory of the Apache distribution. The only thing to note when using regular expressions is that regsub() should not be used. This is because it returns a string allocated internally, not using Apache's pool allocation system. A new API function, pregsub(), is provided instead, which does the same as regsub() but allocates space in the pool passed in as an argument. Resources can be associated with multiple languages. Typically, mod_mime obtains information about which languages a file is in from its extensions, but modules can also set the language of their response. Previously, the language was set as a string called content_language in the request_rec. That is still available for backwards compatibility, but will only hold the last language that mod_mime set. To get all the languages of a file, or to set a response with multiple languages, the new element content_languages should be used instead. This is an array (created using the standard Apache array functions such as make_array()), with each element being a "char *" string containing a language tag. For example, if a module wants to output a response in English and German, it should set content_languages with: char **new; r->content_languages = make_array (r->pool, 2, sizeof(char*)); new = (char **)push_array (r->content_languages); *new = "en"; new = (char **)push_array (r->content_languages); *new = "de"; The following API functions are new in Apache 1.2, and have not already been mentioned above. blookc() can be used to look ahead one character in a BUFF* stream. call_exec() to run sub-programs, possibly as a different user. clear_table() to empty a table. construct_server() returns a string giving the "hostname:port" for a given hostname and port (:port is omitted if it is 80). 
find_last_token() checks whether a given token appears as the last part of a string. find_token() checks whether a given token exists in a comma-separated list of tokens. getword_white() is available to get a word, skipping white space. is_table_empty() checks if a table has any contents (this is a macro). pregcomp() to compile a regular expression. pregfree() to free the memory used by a compiled regular expression. pregsub() is used after a regular expression match to substitute matching parts. scan_script_header_err() can be used instead of scan_script_header() to return error information from the headers. send_fd_length() sends a part of an open file. send_header_field() sends a single header to the client. set_flag_slot() sets an on/off flag in a module's config (complements existing set_string_slot()). rflush() can be used when sending a response to force output to be flushed to the client. table_do() to call a function for each item in a table. API functions that use a port number previously used a signed int and now use an unsigned int. File descriptors are now passed as long instead of int to functions such as pclosef() and note_cleanups_for_fd(). All the HTTP status codes have been renamed to start with HTTP_, and the new codes from HTTP/1.1 have been added. Macros are now available to check status codes, such as is_HTTP_REDIRECT(status). Internal Changes People writing modules might also be interested in how the core Apache code works. This list, provided for information only, is a summary of the major changes to the source code which have not been reported elsewhere (as new features, for example). CookieLog is now handled by mod_log_config. Code to do some of transparent content negotiation (see #define HOLTMAN in mod_negotiation.c). Configure updated to handle the new simpler configuration file format. Date-related functions are now in util_date.c. mod_include calls can_exec() for sub-processes. Modules can be compiled in but inactive. 
The compiled-in modules are listed in the preloaded_modules[] array, while the active modules are stored in prelinked_modules[]. Modules will be moving into the src/modules directory (only mod_proxy has moved so far). Proxy code moved to the src/modules/proxy directory, within the new modules directory. Regular expression library has been added in the src/regex directory. Returns 100 Continue before reading request entity. Scoreboard now contains the name of the vhost processing the request. The #define names for OS-specific functions have been simplified and made consistent: HAS_GMTOFF is now HAVE_GMTOFF, HAVE_SYS_SELECT_H and HAVE_SYS_RESOURCE_H have been added, and USE_* is used to select preferred options on particular OSes (USE_FCNTL_SERIALIZED_ACCEPT; USE_FLOCK_SERIALIZED_ACCEPT; USE_LONGJMP). The fd for each listener is stored to allow graceful restarts. To support graceful restarts, scoreboard records a 'generation' number. Various function arguments and return values are declared as const. Appaloosa Awards 2000 The Appaloosa Awards were announced at the O'Reilly Open Source Conference this week. The winners included ASF members Ryan Bloom, Lars Eilebrecht, Roy Fielding, Doug MacEachern, Dirk-Willem van Gulik on behalf of Apache XML projects, and Rasmus Lerdorf on behalf of the PHP team. Appaloosa Awards New to the conference this year were the Appaloosa Awards, designed to reward the people and projects who have had a significant influence on Apache. The voting was open to Apache Week readers for one week and we received just under 3000 votes in total. The awards were announced on Tuesday evening at the conference by ASF member and Apache program co-chair Chuck Murcko. [Photo, Chuck Murcko, jpeg 33k] The Vision Award was for the best ideas to move Apache forward and was won by Ryan Bloom for Apache 2.0 and Roy Fielding for standards and industry acceptance of Apache. 
[Photo, Ryan Bloom, jpeg 29k] [Photo, Roy Fielding, jpeg 30k] The Evangelism Award was for promoting Apache awareness or acceptance and was won by Lars Eilebrecht, and collected by Dirk-Willem van Gulik for the Apache XML Projects. [Photo, Dirk-Willem van Gulik, jpeg 30k] The Technical Contribution Award went to Doug MacEachern for mod_perl and Rasmus Lerdorf collected on behalf of the PHP Group. [Photo, Doug MacEachern, jpeg 47k] [Photo, Rasmus Lerdorf, Sam Ruby, Jim Winstead, jpeg 53k] All photographs copyright Story Photography 2000 Apache 1.1.1 bugs review A round-up of all the bugs in 1.1.1 Bugs in 1.1.1 The next version of Apache will be 1.2. This will include a lot of new features, as previewed in our Apache 1.2 article (from issue 29). It will also fix most of the outstanding bugs identified in 1.1.1. In this issue we summarise these bugs, sorting them by affected function. There are quite a few bugs listed here, but most will not have a serious effect on most setups. Many are restricted to specific operating systems, or to particular configurations and modules. It should be remembered that Apache 1.1.1 is a stable release and most users are unlikely to come across these bugs. For each bug we have tried to identify its current status in the latest development version of Apache. If the bug is followed by the word FIXED then the bug has been fixed and tested. If the status is VERIFIED then the bug exists but has not yet been fixed (although in many cases a fix will be in progress or undergoing initial testing). If neither word is present, then the bug has not been verified or fixed. We have tried to ensure that only real bugs are listed here, but the Apache group receives quite a few bug reports, many of which relate to incorrectly configured systems or which are caused by the operating system or other software. These bugs affect the operation of the core server, or are related to low-level networking or operating system interaction. 
DNS Failure causes core dump Apache can core dump if it cannot obtain the local hostname from the ServerName directive or from the DNS. FIXED. High Load Problems At startup Apache forks the initial children. If it fails to fork (perhaps because of resource limitations), it immediately tries again, which can make the load situation worse. FIXED. A race condition can cause occasional hung processes on very high load systems. VERIFIED. Memory allocation failure causes core dump The memory allocation return value is not checked which could cause core dumps. FIXED. ErrorDocuments ErrorDocument redirect fails, displays filename. FIXED Docs claim %s in ErrorDocument string prints reason for error - no code to implement this. VERIFIED ErrorDocument displays " in string message. FIXED Executing sub-programs When a sub-program is about to be run, Apache checks for correct permissions, but it does not account for other groups that the current user might be in. Scoreboard Scoreboard sometimes out of date (shows PID of children Domains Starting with Numbers Hostnames starting with a number (e.g. 123.domain.com) are incorrectly treated as IP addresses. VERIFIED. Domain name capitalisation Domain names on allow and deny lines are not compared case-insensitively. VERIFIED. Expires Header Apache is not setting Expires header on 304 responses FIXED Continuation Headers Doesn't support HTTP continuation headers FIXED Keep Alives Netscape Navigator 2 has bugs in its keepalive support, so Apache should turn off keepalives when accessed from Navigator 2. FIXED. The proxy module has been extensively modified since 1.1.1 to correct a large number of problems and omissions. NULL requests logged Report of request "NULL" being logged in access log Missing Hits Reports of access_log missing some hits (possibly related to keepalives) ErrorLog ErrorLog | does not work. VERIFIED. 
Imagemap Module Long URLs (>100 chars) can cause buffer overflows (possible core dump) VERIFIED Status Module Can give wrong start-up time on some systems Core dumps on a few systems (OSF, SCO) Wraps bytes total at 4.2GB FIXED Transfer bytes per second figures wrong FIXED Negotiation Module Language negotiation doesn't work for cgi scripts without extensions, which are in a valid ScriptAlias directory. Charset negotiation is not implemented. VERIFIED. Language negotiation doesn't match languages against sub-languages, i.e. it treats en and en-US as completely different languages. FIXED. Directory Index Module Core dump on Solaris 2 with empty directories Truncating file size in listing (e.g. 1.8Mb is displayed as 1Mb) FIXED Userdir UserDir cannot handle certain configurations, such as http://10.1.2.3/~* VERIFIED Includes Module Possible mod_include bug causing core dumps if SSI include fails due to incorrect .htaccess directive Current working directory can change while processing includes These bugs are related to specific operating systems. A/UX: Linger close fails on A/UX FIXED AIX: Compile warning for SERVICE_UNAVAILABLE FIXED Apollo Domain: Some compilation errors on Apollo Domain Digital Unix/OSF: V4.0 requires -lm because the frexp() function has been removed from libc.so. Incompatible pointer type warning. IRIX: IRIX kernel fails to notify Apache of dead children FIXED Linux: File descriptor bug causing SEGV in includes module. FIXED. NeXTSTEP: support/logresolve.c does not compile because of strdup OS/2: Simplified code for OS/2 FIXED. OS/2 filesystem is case-independent, can cause URLs to fail to match protection limitations QNX: Missing prototypes for QNX FIXED. 
SCO: Dumps core in status module with a Floating exception when compiled with -DSTATUS on SCO ODT 3.0 SGI: Compile warning in http_bprintf FIXED Ultrix: Compile error in http_main.c UnixWare: Configuration updated for UnixWare (needs NEED_LINGER) Example URLs for status and info Example URLs for status and info pages (/status and /info) can intercept other URLs (e.g. anything in a directory called /info or /information). FIXED. ScriptAlias and PATH_INFO problem Bug in the SCRIPT_NAME passed to CGI where the ScriptAlias directory included some PATH_INFO. FIXED VHosts Host: header can override IP virtual hosts to give access to other vhosts' information. VERIFIED. IP-based Virtual hosts on main IP address but different ports not working. VERIFIED. Directives with on/off arguments Directives that take an argument that is either "on" or "off" in fact accepted any argument. FIXED. Default configuration mime types can conflict with encodings Default mime.types contains content-types for gz and Z extensions, but these should be given as encodings with AddEncoding. FIXED Port directive Apache accepts non-numeric Port number. FIXED. Authoritative misspelt Spelling of authoritative (as authorative) wrong in auth_anon and auth_msql FIXED. Finally, a few bug reports cannot be verified or discounted. That is, they may or may not exist, but cannot be reliably reproduced. While they may be Apache bugs, they could also be bugs in the operating system, or problems related to particular load conditions or configurations. Any further information about these possible bugs should be reported on the apache-bugs email address or Web page. CGIs intermittently fail with 'premature end of file error' on site with 100 vhosts. Occurs even with low load. Server will not respond after a few days of running. Instead of the 5 processes typically running, there is only one. Server accepts the requests, but never responds. This site makes heavy use of CGIs (>50% of all requests). 
Some hits are not logged in the access_log, or logged as "NULL". Using Certificate Revocation Lists Certificate Revocation Lists (CRL) increase the security of Client Authentication Realms by enabling server administrators to block client certificates that have been revoked because they are known to have been compromised. Mike Leach and Tim Starr take a look at how to get CRLs working with mod_ssl and Apache. Feature: Using Certificate Revocation Lists One of the most common kinds of access control for secure web servers is Basic Authentication, in which a login and password are required. Access controls can apply to part or all of a web site. The restricted area is called the "authorization realm." Even though Basic Authentication is the most common kind of access control, it is not the most secure. The most secure kind of access control is Client Authentication. Client Authentication uses client certificates installed in users' web browsers or other client applications (clients) to authenticate users, and only lets clients with the right client certificates into the authorization realm. (In this article, an authorization realm with client authentication will be called a "Client Authentication Realm.") A client certificate is issued by a Certificate Authority (CA). A CA checks whether a client certificate applicant meets the CA's criteria for trustworthiness before issuing the client certificate. The client certificate is good for access to the Client Authentication Realm until its validity expires. After expiration, the user will be blocked. To renew access, the user's trustworthiness must be reaffirmed by the CA before renewal of the client certificate. This checking when client certificates are issued and renewed helps to ensure that valid client certificates are only in the hands of users trusted to get into the Realm. However, a client certificate can be compromised before it expires. 
For example, it can fall into the wrong hands, or the CA may decide that the user it was issued to is not trusted anymore. To reject client certificates which are known to be compromised before expiration, a web server consults a Certificate Revocation List (CRL). A CRL is a list of client certificates that were revoked before they expired. Clients with revoked client certificates will be denied access to a Client Authentication Realm if the revoked client certificates are in the server's CRL. This article explains how to configure Apache+mod_ssl to keep clients with revoked client certificates out of a Client Authentication Realm. Don't forget to make a backup of your configuration files and keys and certificates before trying these examples. This article assumes that you have: Apache+mod_ssl installed on your machine A browser that supports client certificates such as Netscape Navigator or Microsoft Internet Explorer A revoked client certificate installed in the browser The root certificate (rootcert) which signed the client certificate The CRL file which includes the revoked client certificate. The client certificate, rootcert, and CRL file must be issued by a CA. The CA can be a third-party application or service, or OpenSSL (the SSL toolkit on which mod_ssl is based) can be used as a CA. The certificates and CRL must be in the PEM (base64-encoded x509) format required by mod_ssl. The Client Authentication Realm can be either a secure virtual host or a directory. Make sure these directives are in the secure virtual host or directory container for the Realm in httpd.conf: SSLVerifyClient require SSLVerifyDepth 10 After these changes are made and the server is restarted so the changes take effect, clients without client certificates will be kept out of the Client Authentication Realm. Even browsers with client certificates will be denied, unless the rootcert has already been installed on the server. 
Test this by trying to access the Realm with a browser without a client certificate (or with a client certificate with an uninstalled rootcert).

To let a client with a client certificate into the Client Authentication Realm, the rootcert must be installed. This can be done with the SSLCACertificateFile directive (or with SSLCACertificatePath, which will not be covered here). Install the rootcert by adding it to the default SSLCACertificateFile, client-rootcerts.pem. If the rootcert filename is ca.crt, the rootcert can be added with this command:

    cat ca.crt >> client-rootcerts.pem

The rootcert can also be made the SSLCACertificateFile instead of client-rootcerts.pem if none of the other rootcerts in the default SSLCACertificateFile are needed. After the server is restarted again, browsers with client certificates signed by the installed rootcert will be let into the Client Authentication Realm, even if the client certificates are revoked. Revoked client certificates will not be blocked until the CRL is enabled. Test this by accessing the Realm with a browser that has a client certificate that is revoked and signed by the installed rootcert.

Make a CRL directory such as /ServerRoot/crl/. Copy the CRL file (ca.crl) into the CRL directory, then configure the CRL in httpd.conf with either SSLCARevocationFile or SSLCARevocationPath.

With SSLCARevocationFile, put this directive in the secure virtual host container for the Client Authentication Realm:

    SSLCARevocationFile /ServerRoot/crl/ca.crl

SSLCARevocationPath requires two steps. First, put this directive in the secure virtual host container for the Client Authentication Realm:

    SSLCARevocationPath /ServerRoot/crl/

Next, make a symlink to the CRL file in the CRL directory, with a filename based on a hash of the CRL file:

    ln -s ca.crl `openssl crl -hash -noout -in ca.crl`.r0

Every CRL file in the SSLCARevocationPath must have one of these symlinks. After the web server is restarted, the CRL will be enabled.
Clients with revoked client certificates will not be let into the Client Authentication Realm and will get a browser error message saying that access was denied because the client certificate was revoked. An error message such as this will appear in the /ServerRoot/ssl/error_log: [Thu Aug 31 15:32:47 2000] [error] mod_ssl: Certificate Verification: Error (23): certificate revoked There are a couple of known problems which may come up because of differences between the CRLs issued by CA software and mod_ssl's requirements. One is that CA software may issue CRLs without the required start -----BEGIN X509 CRL----- and end -----END X509 CRL----- lines. Here is an example of a CRL generated with OpenSSL that works with mod_ssl: -----BEGIN X509 CRL----- MIIBmjCCAQMwDQYJKoZIhvcNAQEEBQAwgb0xCzAJBgNVBAYTAlVTMRMwEQYDVQQI EwpDYWxpZm9ybmlhMRAwDgYDVQQHEwdPYWtsYW5kMRYwFAYDVQQKEw1SZWQgSGF0 LCBJbmMuMSIwIAYDVQQLFBlHbG9iYWwgU2VydmljZXMgJiBTdXBwb3J0MR0wGwYD VQQDExRSZWQgSGF0IFRlc3QgUm9vdCBDQTEsMCoGCSqGSIb3DQEJARYdc3Ryb25n aG9sZC1zdXBwb3J0QHJlZGhhdC5jb20XDTAwMTExMzIwNTcyNVoXDTAwMTIxMzIw NTcyNVowFDASAgEBFw0wMDA4MzEyMTE5MTdaMA0GCSqGSIb3DQEBBAUAA4GBAIge X5VaOkNOKn8MrbxFiqpOrH/M9Vocu9oDeQ6EMTeA5xIWBGN53BZ/HUJ1NjS32VDG waM3P6DXud4xKXauVgAXyH6D6xEDBt5GIBTFrWKIDKGOkvRChTUvzObmx9ZVSMMg 5xvAbsaFgJx3RBbznySlqVU4APYE0W2/xL0/8fzM -----END X509 CRL----- Another problem is that CRLs issued by third-party CA software may not have all the fields required by mod_ssl. It may be possible to configure the CA software to issue CRLs with all the required fields. 
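The first of these problems, missing start and end lines, can be checked before a CRL file is handed to the server. The following is a minimal Python sketch and is not part of mod_ssl or OpenSSL. The function name has_crl_armor is hypothetical, and the check only looks at the two delimiter lines quoted above, not at the fields inside the CRL.

```python
# Delimiter lines that the article says must surround the CRL data.
BEGIN_LINE = "-----BEGIN X509 CRL-----"
END_LINE = "-----END X509 CRL-----"


def has_crl_armor(pem_text):
    """Return True if the text starts and ends with the X509 CRL delimiter lines.

    Hypothetical helper: it checks only the delimiters, not the CRL contents.
    """
    # Ignore blank lines and surrounding whitespace.
    lines = [line.strip() for line in pem_text.splitlines() if line.strip()]
    if len(lines) < 3:
        # Need a BEGIN line, at least one line of data, and an END line.
        return False
    return lines[0] == BEGIN_LINE and lines[-1] == END_LINE
```

Reading ca.crl into a string and passing it to has_crl_armor gives a quick yes or no answer. A file that fails the check would need its delimiter lines restored, or the CA software reconfigured, before the CRL is used.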
Use this OpenSSL command to view the CRL: openssl crl -text -noout -in filename Then compare its fields to those in the sample CRL above to see if the same fields are in your CRL: Certificate Revocation List (CRL): Version 1 (0x0) Signature Algorithm: md5WithRSAEncryption Issuer: /C=US/ST=California/L=Oakland/O=Red Hat, Inc./OU=Global Services and Support/CN=Red Hat Test Root CA/Email=stronghold-support@redhat.com Last Update: Nov 13 20:57:25 2000 GMT Next Update: Dec 13 20:57:25 2000 GMT Revoked Certificates: Serial Number: 01 Revocation Date: Aug 31 21:19:17 2000 GMT Signature Algorithm: md5WithRSAEncryption 88:1e:5f:95:5a:3a:43:4e:2a:7f:0c:ad:bc:45:8a:aa:4e:ac: 7f:cc:f5:5a:1c:bb:da:03:79:0e:84:31:37:80:e7:12:16:04: 63:79:dc:16:7f:1d:42:75:36:34:b7:d9:50:c6:c1:a3:37:3f: a0:d7:b9:de:31:29:76:ae:56:00:17:c8:7e:83:eb:11:03:06: de:46:20:14:c5:ad:62:88:0c:a1:8e:92:f4:42:85:35:2f:cc: e6:e6:c7:d6:55:48:c3:20:e7:1b:c0:6e:c6:85:80:9c:77:44: 16:f3:9f:24:a5:a9:55:38:00:f6:04:d1:6d:bf:c4:bd:3f:f1: fc:cc If your CA issues CRLs that do not work with mod_ssl and have fields that differ from those in the example shown above, consult your CA administrator or software vendor to see if it can be reconfigured to issue CRLs with the same fields as above, and, if so, how. CRLs increase the security of Client Authentication Realms by enabling server administrators to block client certificates that have been revoked because they are known to have been compromised. Without CRLs, server administrators would have to wait for the client certificates to expire, or change CA certificates and issue new client certificates to all users who are still trusted to access the Realm. Waiting for the client certificates to expire would risk having untrusted users get into the Realm until expiration, while issuing and installing new client certificates to all users who are still trusted would be a great inconvenience both to server administrators and to users. 
CRLs allow server administrators to avoid this inconvenience by blocking revoked client certificates without affecting unrevoked client certificates. The authors would like to thank Shari Miller and Simona Nass for their comments on earlier drafts of this article.

DBM User Authentication
With more than a few users, keeping user passwords in a .htpasswd file can get inefficient and slow down page accesses considerably. DBM user files let sites efficiently store many tens of thousands of users (or more) with very quick access. This feature explains what DBM is, and how to use it with Apache.

DBM User Authentication
This week, we explain how to store user authentication information in DBM files for faster access when you have thousands of users.

The feature on User Authentication shows how to restrict pages to selected people. We showed how to use the htpasswd program to create the necessary .htpasswd files, and how to create group files to provide more control over the users. We also said that .htpasswd files and group files like this are not very efficient when a large number of users are involved. This is because these are plain text files, and for every request in the authenticated area Apache has to read through the file looking for the user. A much faster way to store the user information is to use files in DBM format. This article explains how to create and manage DBM format user authentication files.

DBM files are a simple and relatively standard method of storing information for quick retrieval. Each item of information stored in a DBM file consists of two parts: a key and a value. If you know the key you can access the value very quickly. The DBM file maintains an 'index' of the keys, each of which points to where the value is stored within the file, and the index is usually arranged such that values can be accessed with the minimum number of file system accesses even for very large numbers of keys.
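The key and value idea can be tried out with a short Python sketch. This is an illustration only: it uses Python's standard dbm.dumb module, whose on-disk format is specific to Python and is an assumption made for portability, so Apache cannot read the files it creates. The function names store_user and lookup_user are hypothetical, and the password string is a placeholder rather than a real hash.

```python
import dbm.dumb
import os
import tempfile


def store_user(db_path, username, password_hash):
    """Store one username -> value pair, creating the database if needed."""
    with dbm.dumb.open(db_path, "c") as db:
        db[username] = password_hash


def lookup_user(db_path, username):
    """Fetch the value stored under a username, or None if it is absent."""
    with dbm.dumb.open(db_path, "r") as db:
        value = db.get(username)
    # Values come back as bytes; decode them for display.
    return value.decode("utf-8") if value is not None else None


def demo():
    """Create a throwaway database and look up one known key."""
    path = os.path.join(tempfile.mkdtemp(), "users")
    store_user(path, "martin", "E7yT67YGht65")
    return lookup_user(path, "martin")
```

The lookup goes straight to the entry for the requested key instead of scanning every record, which is the property that makes DBM attractive for large user lists.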
In practice, on many systems a DBM 'file' is actually stored in two files on the disk. If, for example, a DBM file called 'users' is created, it will actually be stored in files called users.pag and users.dir. If you ever need to rename or delete a DBM from the command line, remember to change both the files, keeping the extensions (.pag and .dir) the same. Some newer versions of DBM only create one file.

Provided the key is known in advance, DBM format files are a very efficient way of accessing information associated with that key. For web user authentication, the key will be the username, and the value will store their (encrypted) password. Looking up usernames and their passwords in a DBM file will be more efficient than using a plain text file when more than a few users are involved. This will be particularly important for sites with lots of users (say, over 10,000) or where there are lots of accesses to authenticated pages.

If you want to use DBM format files with Apache, you will need to make sure it is compiled with DBM support. By default, Apache cannot use DBM files for user authentication, so the optional DBM authentication module needs to be included. Note that this is included in addition to the normal user authentication module (which uses plain text files, as explained in the previous article). It is possible to have support for multiple file formats compiled into Apache at the same time.

To add the DBM authentication module, edit your Configuration file in the Apache src directory. Remove the comment from the line which currently says

    # Module dbm_auth_module mod_auth_dbm.o

To remove the comment, delete the # and space character at the left-hand end of the line. Now update the Apache configuration by running ./Configure, then re-make the executable with make.

However, before compiling you might also need to tell Apache where to find the DBM functions. On some systems this is automatic.
On others you will need to add the text -lndbm or -ldbm to the EXTRA_LIBS line in the Configuration file. (Apache 1.2 will attempt to do this automatically if needed, but you might still need to configure it manually in some cases). If you are not sure what your system requires, try leaving it blank and compiling. If at the end of the compilation you see errors about functions such as _dbm_fetch() not being found, try each of these choices in turn. (Remember to re-run ./Configure after changing Configuration). If you still cannot get it to compile, you might have a system where the DBM library is installed in a non-standard directory, or where there is no DBM library available. You could either contact your system administrator, or download and compile your own copy of the DBM libraries (a good choice might be GDBM).

For standard (htpasswd) user authentication password files, the program htpasswd is used to add new users and set their passwords. To create and manage DBM format user files another program from the Apache support directory is used. The program is called dbmmanage and is written in perl (so you will need perl on your system, and it will need to have been compiled with support for the same DBM library you compiled into Apache. If you have only just installed DBM on your system you might need to re-compile perl to build in DBM support).

This program can be used to create a new DBM file, add users and passwords to it, change passwords, or delete users. To start by creating a new DBM file and adding a user to it, run the command:

    dbmmanage /usr/local/etc/httpd/usersdbm adduser martin hamster

This creates the DBM file /usr/local/etc/httpd/usersdbm (which might actually consist of /usr/local/etc/httpd/usersdbm.dir and /usr/local/etc/httpd/usersdbm.pag), if it does not already exist. It then adds the user 'martin' with password 'hamster'.
This command can be used with other usernames and passwords to add more users, or with an existing username to change that user's password. A user can be deleted from the password file with

    dbmmanage /usr/local/etc/httpd/usersdbm delete martin

You can get a list of all the users in the DBM file with

    dbmmanage /usr/local/etc/httpd/usersdbm view

Now that you have a DBM user authentication file with some users in it, you are ready to create an authenticated area. You can restrict a directory either using a <Directory> section in access.conf or by using a .htaccess file. The feature on user authentication explained how you can set up a basic .htaccess file, using this example:

    AuthName "restricted stuff"
    AuthType Basic
    AuthUserFile /usr/local/etc/httpd/users
    require valid-user

To use DBM files, the only change is to replace the AuthUserFile line with

    AuthDBMUserFile /usr/local/etc/httpd/usersdbm

This single change tells Apache that the user file is now in DBM format, rather than plain text. All the rest of the user authentication setup remains the same (so the authentication type is still Basic, and the syntax of require is the same as before).

Each user can be in one or more "groups", and you can restrict access to people just in a specified group. This makes it possible to manage all the users on your site in a single database, and customise the areas that each can access. The use of DBM files for storing group information is particularly efficient because you can use the same file to store both password and group information.

The dbmmanage command can be used to set group information for users. For example, to add the user "martin" to the group "staff", you would use

    dbmmanage /usr/local/etc/httpd/users adduser martin hamster staff

You can put a user into multiple groups by listing them, separated by commas.
For example, dbmmanage /usr/local/etc/httpd/users adduser martin hamster staff,admin Note that dbmmanage has to be told the password as well, and there is no way to set or change group information for a user without knowing their password. This means in practice that dbmmanage is not suitable for managing users in groups, and you will have to write your own management scripts. Some help writing perl to manage DBM files is given later in this article. After creating a user and group file containing details of which users are in which groups, you can restrict access by these groups. For example, to restrict access to an area to only people in the group staff, you could use: AuthName "restricted stuff" AuthType Basic AuthDBMUserFile /usr/local/etc/httpd/users AuthDBMGroupFile /usr/local/etc/httpd/users require group staff The supplied dbmmanage script to manage DBM files is adequate for basic editing, but cannot handle advanced use, such as managing group information. It is also command line driven, while a Web interface might be a better choice in many situations. To do either of these things you will have to write programs to manage DBM files yourself. Using perl this is not too difficult. As a simple example, say you have an existing .htpasswd file and you want to convert it to a DBM file, putting all the users in a specific group. We will introduce the concepts here, and there is a link below to the completed program for you to download. It will be written in Perl which is quick to write and easy to customise, although the principles of DBM use are the same whatever language is used. The basic way to look in a DBM file is given here. DBM files are opened in Perl as 'hashed arrays'. The "key" is the user name, and the value is the encrypted password and optionally group information. 
A simple script to look up all the keys and values in a DBM is:

    # Open the DBM file (created if it does not exist) as the hash %DBM
    dbmopen(%DBM, "/usr/local/etc/httpd/usersdbm", 0644) || die "Cannot open file: $!\n";
    # Walk through every key/value pair
    while (($key, $value) = each %DBM) {
        print "key=$key, value=$value\n";
    }
    dbmclose(%DBM);

Note that if the given DBM file does not exist, it will be created. This script will work with both perl 4 and perl 5 (although Perl 5 users might prefer to use the new tie facility instead of dbmopen). To look up a known key you would use:

    $key = "martin";
    dbmopen(%DBM, "/usr/local/etc/httpd/usersdbm", 0644) || die "Cannot open file: $!\n";
    # Fetch the value stored under this username, if any
    $value = $DBM{$key};
    if (!defined($value)) {
        print "$key not stored\n";
    } else {
        print "key=$key, value=$value\n";
    }
    dbmclose(%DBM);

Now we can write a script to convert a htpasswd file into a DBM database, optionally putting each user into one or more groups. The script is htpasswd2dbm.pl, and is used like this:

    cd /usr/local/etc/httpd
    htpasswd2dbm.pl -htpasswd users usersdbm

The -htpasswd option specifies the htpasswd file to be read, and the final argument is the DBM file to create (or add to). To set a group, use the -group argument. For example, to put all the users from this file into the groups admin and staff, use

    htpasswd2dbm.pl -htpasswd users -group admin,staff usersdbm

The program will add users to an existing DBM database, so it can be used to merge multiple htpasswd files. If you give users from different files different groups, you will be able to set up access restrictions on a group-by-group basis, and manage all your users in one database. Note that if there is already a user with the same username in the DBM file it will be overwritten by the new information.

Group information is stored in a DBM file as part of the value. If no group information is stored, the value associated with a username just consists of the encrypted password. To store group information, the encrypted password is followed by a colon, then a list of groups that the user is in, each separated by a comma.
So a typical value might look like this:

    E7yT67YGht65:admin,staff

A program written in perl can easily extract the group information, for example:

    $value = $DBM{$key};
    # First field is the encrypted password, second is the group list
    ($enc, $groupfield) = split(/:/, $value);
    @groups = split(/,/, $groupfield);

It is also possible to store additional information in the DBM file, by following the groups list with a colon and the extra data. Apache will ignore any data after a colon following the groups list, so it could be used, for example, to store the real name and contact details for the user, and an expiry date. This could be stored in the DBM like this:

    $DBM{$key} = join(":", $enc, join(",", @groups), $realname, $company, $emailaddr, $expdate);

Keeping all the user information together in a database like this, which Apache can also use for user authentication, can make administering a site with many users simpler.

Dynamic Page Languages
From SSI to CGI via PHP and perl: which language should you use for your dynamic pages?

Feature: Dynamic Page Languages
When choosing how to generate dynamic pages there are several things to consider:

Performance: dynamic pages require more work on the server, so are less efficient than static files, but some types of dynamic pages are more resource efficient than others.
Complexity: dynamic features can be generated from relatively simple code built into HTML pages (called "embedded"), through to self-contained programs written in C or perl, using the CGI interface.
Security: some methods of generating dynamic pages allow you to use a programming or scripting language on your server. If the pages are poorly written, there is a risk of letting users access things on your system that they should not.

Traditionally there were three ways of getting dynamic pages on your site: use "server side includes" (SSI) inside HTML pages, use a scripting language such as Perl or PHP, or use a compiled programming language such as C or Pascal. Both scripts and compiled programs were accessed using "CGI".
But the distinctions are becoming more blurred. SSI as implemented in Apache 1.2 now has variables and conditional execution, making it more like a scripting language, while the PHP scripting language can be embedded into HTML pages. There is even a module to embed perl commands into HTML pages. Also, many scripting languages can be built into Apache as Apache modules, rather than using CGI. This makes executing the scripts much more efficient, since an interpreter does not need to be started for every request.

There are two ways to get the server to run your programs: either embed a script into an HTML document, or create a standalone program which makes use of the CGI interface. Embedded scripts are easier to write but restrict you to the languages available for embedding, while CGI can be used with any language.

The traditional embedded language is "Server-Side Includes" (SSI) but other scripting languages are available which can be embedded. Embedded commands are executed by the server before it serves the page to the client (so serving HTML pages containing embedded commands is slower than serving straight HTML pages). Embedded pages can be processed either by an Apache module or a CGI program. Using a module will be much faster. Languages available for embedded use include SSI, PHP, Perl and NeoScript (of these, SSI is built into Apache by default, while the others require a new module to be compiled in).

The alternative to embedding the commands into HTML is to write self-contained programs. These usually use the CGI, or Common Gateway Interface, to work with the server. The CGI specification says how servers should talk to the script or program and how the script or program formats its reply for use by the server. CGI is not a language itself. If you know the CGI protocol you can write programs for use with a web server in any language.
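To make the CGI exchange concrete, here is a minimal sketch of a CGI program in Python. It assumes the usual CGI conventions: the server passes request details in environment variables such as QUERY_STRING, and the program replies on standard output with header lines, a blank line, and then the body. The function name build_response and the "name" query parameter are hypothetical choices for this illustration.

```python
import html
import os
import sys
from urllib.parse import parse_qs


def build_response(environ):
    """Build a CGI reply: header line, blank line, then an HTML body."""
    # The query string arrives in an environment variable set by the server.
    query = parse_qs(environ.get("QUERY_STRING", ""))
    name = query.get("name", ["world"])[0]
    # Escape user-supplied text before placing it in the page.
    body = "<html><body><p>Hello, %s</p></body></html>\n" % html.escape(name)
    return "Content-Type: text/html\n\n" + body


if __name__ == "__main__":
    # The server reads the reply from the program's standard output.
    sys.stdout.write(build_response(os.environ))
```

The same exchange could be written in C, perl or a shell script. Only the environment variables and the layout of the reply are fixed by CGI, not the language.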
If you want better performance from your pages (by performance we mean low use of resources, resulting in more pages served more quickly), you should use either a pre-compiled language (such as C) and CGI, or a scripting language which is available as an Apache module. In the case of the perl and python modules, preload scripts or data that will be used often.

If you are thinking of using CGI, you might consider using FastCGI instead. FastCGI is an alternative method of running programs from a server which has several new features and is more efficient than normal CGI. If your CGI is in perl, think about using mod_perl to pre-load the perl scripts (and, where possible, to open database and similar connections when Apache starts and re-use them across multiple requests).

Of course the best performance can be obtained by using static pages instead of dynamic ones. You might consider pre-generating HTML files, rather than serving up dynamic pages, if possible. For example, if your readers access pages from a database, it might be faster to export those pages into HTML every so often, rather than look up the records in the database for every request.

Alternatively (or in addition) consider using a local cache in front of your Apache server. The client would connect to the cache first, and if that page has already recently been requested, the cache would return it without calling the server. This sort of local cache is also called a "server accelerator". Your dynamic pages will have to be set up to allow them to be cached though (SSI pages, for example, are usually not cacheable).

Security is a very important consideration when thinking about dynamic pages. All CGI programs, both scripted and compiled, are potentially insecure. You have to be very careful when writing CGI programs, for instance, to ensure that Internet users cannot execute programs on your server or read files they should not have access to.
Another security issue which might be important is related to other local users. For example, you might want to let your customers or colleagues use a dynamic language. But if you let them write CGI programs they could write a program which accesses other people's files (since by default all CGI programs run as the same user). More limited scripted languages (such as SSI) might be safer in this situation.

Finally, here is a reference list of ways of including dynamic pages on your site.

SSI (Embedded: Yes. Apache module: Yes.) Traditional "Server Side Includes" allow simple dynamic pages. Apache 1.2 extends SSI to include variables and conditional code. Already part of Apache. Because of the restricted range of commands this can be more secure than other languages, and Apache has the ability to turn off some less secure features.
PHP (Embedded: Yes. Apache module: Yes.) A more comprehensive embedded language than SSI, with built-in support for many databases (such as mSQL, MySQL, DBM) and page counters.
NeoScript (Embedded: Yes. Apache module: Yes.) An embedded scripting language based on Tcl.
Meta-HTML (Embedded: Yes. Apache module: No.) An extended version of SSI.
Python (Embedded: No. Apache module: Yes.) Python is an interpreted object-orientated language. This module builds the Python interpreter into Apache for better performance than normal CGI.
embedded Perl (ePerl) (Embedded: Yes. Apache module: No.) Perl is a powerful general purpose interpreted (scripting) language. This module lets you embed arbitrary Perl commands into your HTML.
mod_perl (Embedded: Yes. Apache module: Yes.) Perl is an advanced interpreted language. This very powerful module integrates Perl into Apache, letting you pre-load Perl scripts, re-use resources across multiple requests, and even write whole Apache modules in Perl. This gives you much more access to and control over the server than CGI programs in Perl (which this module also supports). The ability to write modules in perl makes it possible to extend the server's functionality relatively easily, without the complexity of writing a module in C.
Compiled languages (C, Pascal, Fortran, etc) (Embedded: No. Apache module: No.) Facilities available depend on language. Usually more efficient than scripted or embedded languages. Has to be written to use CGI protocol, or an equivalent such as FastCGI.
Scripting languages (Perl, Python, shell, etc) (Embedded: No*. Apache module: for some languages.) Facilities available depend on language. Unless an Apache module is available, has to be written to use CGI protocol. When using CGI, this is less efficient than compiled languages or scripting languages using an Apache module. (Note: * Perl can be embedded if eperl or mod_perl is used).
Java (Embedded: No. Apache module: Yes.) Use mod_jserv to call Java "servlets" from Apache.

It is impossible to recommend the "best" dynamic page language since what is best will depend on your needs. However some general conclusions can be drawn.

If you do not already know a scripting or programming language, use one of the embedded languages. SSI is probably the simplest, but PHP has some useful extra features.
If you want a language that is quick to develop in and efficient, use an embedded language such as PHP or embedded perl, or use perl with mod_perl. If you prefer other scripting languages, use one with an Apache module (e.g. python).
If you already use perl CGI programs, consider moving over to using mod_perl, which will give you much better performance and more control over the server.
If you want a "full" programming language for arbitrary programs, either use any compiled language (e.g. C) or use perl with the perl module. If you've been put off Perl because of concerns about performance, think again. The module makes it very efficient, and the ease of development and large range of add-on perl modules (packages) make developing applications more convenient.
To make external CGI programs more efficient, use FastCGI instead of CGI, or write in Perl and use mod_perl.
For Java programs, use mod_jserv.

The final way to make a top-performance dynamic page is to write an Apache module.
This is complex and requires care to ensure that you do not "leak" resources or affect the rest of the server, but will give the best performance. Modules have to be written in C (although it might be possible to link in other languages). An alternative to writing modules in C is to use mod_perl, which lets you develop Apache modules in perl.

Apache 1.2 Guide
A guide to everything new and changed in Apache 1.2

Major Features
The biggest single change in Apache 1.2 is the support for HTTP/1.1. However there are also major changes to simplify configuration, provide better help, speed up network transfers, log requests to multiple files, switch UID for running CGIs, use regular expressions in various places, make debugging CGI easier, and more.

Apache 1.2 is fully compliant with the new HTTP/1.1 standard (except for the proxy module). Some of the power of HTTP/1.1 support will not be apparent until browsers are available which implement it. The major changes are:

All possible status values are now defined
Byte ranges fully implemented for receiving
Content negotiation by content type, language, charset and encoding
Content negotiation can return 406 status with a list of possible variants, if none are suitable for the browser's preferences
Much better cache control with Cache-Control and Vary headers, and use of entity tags (etags)
New preconditions with If-Match, If-None-Match, If-Range, If-Unmodified-Since request headers
New request methods OPTIONS and TRACE join the existing GET, PUT, POST etc
Persistent connections implemented, and internally copes with some known buggy browsers
Resources can be in multiple languages
Sends an 'etag' with the response where possible (i.e. if sending a file), which can be used for more efficient caching
Support for reading and sending 'chunked' encoding
The default handler can send byte ranges and multipart documents

Configuration process simplified
Configuring Apache is now much easier.
The Configure script automatically identifies the operating system and compiler to use. These can still be set in Configuration if required. Many more operating systems are now supported. A Makefile is created in the support directory.

Better Help, Documentation and Bug Tracking
Various updates to provide help: the new -h option lists all the available directives, while -l lists available (compiled) modules. The descriptions of the directives have been updated and expanded. A -v option gives the version number of Apache. The full Apache documentation comes in the distribution, while a new FAQ and comprehensive bug tracking database are available on www.apache.org.

Network Improvements
Persistent connections are now faster, and are used in more cases. Network traffic has been reduced. Persistent connections are not used if the browser appears to be one that has a bug in its implementation.

Graceful Restarts to Avoid Dropping Connections
Apache can be told to re-read configuration files and re-open log files, without dropping connections in progress, as currently happens with a -HUP restart.

Better Logging
The configurable log module is now the default. It can now log each request to multiple log files, each in a different format. There are several extra items which can be logged: filename (%f), notes from other modules (%n), port of request (%p), PID of child handling request (%P), formatted time (%t), time to service request in seconds (%T), URL path requested (%U) and name of server or vhost (%v).

More Control over Files
It is now possible to apply directives to individual files with <Files>, which can appear in access.conf or .htaccess files. Multiple files can be selected using regular expressions (which can also now be used in <Directory> and <Location>).

Running CGIs as Other Users
A helper program (suexec) can be configured to run CGI scripts as other users.
If the CGI is in a public_html directory, it can run as the user whose directory it is in, or a user can be set for each virtual host. Various security checks are performed before running CGI as another user. More NCSA-Compatibility Some directives have been updated to be more compatible with the NCSA HTTPd. The Satisfy, RedirectTemp and RedirectPermanent directives are now implemented. AuthUserFile and AuthGroupFile can now take an argument to specify dbm format files. KeepAlive and MaxKeepAliveRequests are NCSA compatible. Easier CGI Debugging It is now possible to log the input and output of a CGI script when an error occurs. This will make debugging CGI programs much easier. More Includes Directives Server-Side-Includes (SSI) have a number of important new features. Variables can now be set and tested, and regular expressions can be used. Code can be conditional, using if...endif directives. Content Negotiation Enhanced Content negotiation has been updated to meet the HTTP/1.1 specification. In addition some special cases are catered for to cope with browsers which currently send incomplete negotiation information. Better Control over Options Options can now be set or removed on an individual basis, rather than having to set all the options at once. More Configurable Authentication It is now possible to restrict pages by username and password, but to let users from particular domains access the pages without giving a password. This is implemented with the Satisfy directive. Restrictions can be applied to individual files with <Files>, and to files which match a regular expression. SetUID Execution of CGI Programs CGI programs can be executed as other users (on a per-virtual-host or per-user-directory basis) if the optional suEXEC code is compiled. Conditional Modules and Directives Part of the configuration files can be made conditional, depending on what modules are currently loaded.
The <IfModule>...</IfModule> section surrounds directives which are only executed if a particular module is loaded (or not, if the test is negated). Compiled-in modules can be activated or disabled with ClearModuleList and AddModule. Preventing Too Many Resources Being Used New directives can set the total amount of resources that can be used by child processes (such as CGI scripts). This can be used to prevent run-away scripts from taking over the system. The resources which can be limited are: CPU usage, virtual memory usage and number of (sub-) processes. This feature is available on operating systems which implement these restrictions. Virtual Host can Handle Multiple Addresses and be a Default Each virtual host can now be configured to handle requests on multiple addresses, by listing the addresses in the <VirtualHost ...> directive. Also a virtual host can be defined to accept requests not handled by any other host (instead of leaving them to the main server configuration). Can Return HTTP Redirect Permanent, Gone or See Other Status The Redirect directive has been enhanced to allow for additional response codes. The current Redirect directive always returns a "temporary redirect" code. In 1.2, the redirect code can also be "permanent redirect" or "see other", or a resource can be marked as "gone" (permanently removed). Better and More Robust Performance The code has been cleaned for easier maintenance and to fix various bugs. Error conditions are dealt with better, including network problems, timeouts and signals. It is better commented. Various performance optimisations have been applied to enhance speeds. Network traffic has been reduced where possible by sending larger blocks of data. Persistent connections are used if possible, even after error statuses. Major Changes to the Proxy Module The proxy module has been extensively updated for this release. It is not yet compliant with HTTP/1.1.
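The enhanced Redirect behaviour described earlier in this section can be illustrated with a small sketch. It is not Apache code: the function name `build_redirect` and the table `STATUS_CODES` are hypothetical, and the only facts relied on are the Redirect status keywords (temp, permanent, seeother, gone) and the standard HTTP status numbers they correspond to.

```python
from typing import Dict, Optional, Tuple

# Redirect status keywords mapped to standard HTTP status codes.
STATUS_CODES = {"temp": 302, "permanent": 301, "seeother": 303, "gone": 410}


def build_redirect(status: str, target: Optional[str] = None) -> Tuple[int, Dict[str, str]]:
    """Return (status code, response headers) for a redirect keyword.

    Hypothetical helper for illustration only; it does not mirror
    Apache's internal implementation.
    """
    code = STATUS_CODES[status]  # raises KeyError for an unknown keyword
    if code == 410:
        # A resource that is gone has no new location to point at.
        if target is not None:
            raise ValueError("a 'gone' resource has no new location")
        return code, {}
    if target is None:
        raise ValueError("a redirect needs a target URL")
    return code, {"Location": target}
```

For example, `build_redirect("permanent", "http://www.example.com/new")` yields a 301 status with a Location header, while `build_redirect("gone")` yields a 410 with no Location header.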
There are many more smaller changes, some of which are listed here: BADMMAP compilation directive removed Checks to see if Apache is linked to modules compiled with a previous version of the module API. Checks argument to Port directive is a number and not 0. Cookies used by the usertrack module are not sent by default, unless enabled by CookieTracking. The initial cookie request is now logged. The CookieLog directive is deprecated. Does not flush output after headers (which was a 'hack' to get around a bug in keep-alives in some versions of Netscape. Apache now does not use keep-alives if this version is being used) The maximum value of MaxClients has been increased from 150 to 256. Attempts to set a value higher than this will display a warning message. Compilation rule to tell IRIX that NIS is running (Rule IRIXNIS=yes) Some systems failed to notice when the child Apache processes died, leading to scoreboard entries for dead processes. An explicit check for dead processes is now performed every 60 seconds, and the scoreboard updated if necessary. CGI programs can get the port on the remote system in the environment variable REMOTE_PORT and the original URI in REQUEST_URI.
Error code number not shown in <h1>..</h1> on error page As defined in HTTP/1.1, an empty Accept-Encoding: request header means that no encoding is acceptable (previously it meant any encoding was acceptable) Status screen output has been tidied up, and now also lists the server host name servicing the request (the virtual host or main server) Responses can be marked as HTTP/1.0 rather than HTTP/1.1 if the force-response-1.0 environment variable is set Access can be denied based on which environment variables are set Return 404 status on POST to bad URL (previously used 405) Linux now defaults to shared-memory scoreboard (not available on 1.2 kernels, or Alpha hardware) Better error_log messages, including Unix system call error status Modules can be placed in separate directories If a virtual host cannot be configured (hostname cannot be resolved) then Apache continues to start up but disables this virtual host. Can now work around bugs in MSIE and Netscape Navigator when serving PDF files, and a bug in Navigator which causes broken images. Modules re-ordered to allow rewrite and alias modules to process requests before they are handled by the proxy module (if enabled). Preserve query_string information during a redirect. If the client connects but does not send a request, log a 408 ("Timed Out") error instead of an OK response (200). Major Modules Changes The following modules have been added to this version of Apache. Of these, only mod_browser is compiled in by default. The other modules here are optional, and to use them you need to uncomment the appropriate line in Configuration and re-compile Apache. API Example Module (mod_example) This example module can be used to see how Apache processes requests. It is not compiled in by default and should not be used in a "production" server. FastCGI (mod_fastcgi) This module implements the FastCGI method of invoking sub-processes, which is faster and more configurable than CGI.
It is available from the FastCGI site and is not part of the Apache distribution. Set Response Expiry Times (mod_expires) This module can be used to set 'expiry' times on responses. This can be used to tell caches about the expected life-time of resources, to make caching more efficient or to prevent users seeing out-of-date information. Set or Remove HTTP Headers (mod_headers) This module allows individual HTTP headers to be set or removed. Set Environment Variables based on Browser (mod_browser) This module can be used to set environment variables based on the 'user agent' that created the request. This could be used to set environment variables based on the capability of the browser. Rewrite Requested URL (mod_rewrite) This module provides a generic way of re-writing the incoming request URL based on various aspects of the request. Cookies module renamed Usertrack (mod_usertrack) The cookies module (mod_cookies) has been renamed usertrack (mod_usertrack) to prevent confusion over what it does. As in previous releases, this module is not compiled in by default. Config log module replaces common log (mod_log_config) The common log module (mod_log_common) has been replaced by the configurable log module (mod_log_config) as the default log module. This module has been enhanced to allow multiple log files, so it can also replace most of the functionality of the mod_log_referer and mod_log_agent modules (although it is not a complete replacement for these modules). Directive Changes This section lists the directives which are new in this release, or which have changed their behaviour or syntax. Note that only the modules compiled in by default are covered here, and the directives provided by the new modules are not listed (see the documentation for the module concerned for its directives). <Files>... </Files> section applies directives to individual files, or files that match a wildcard or regular expression.
<IfModule>...</IfModule> makes directives conditional depending on which modules are compiled in CustomLog adds a transfer log with a custom format MaxKeepAliveRequests sets the number of requests per connection instead of KeepAlive RLimitCPU, RLimitMEM and RLimitNProc limit resource usage of sub-processes Redirect can take an optional first argument giving the status value to return (one of temp, permanent, seeother, gone or a numeric status). RedirectTemp and RedirectPermanent added for NCSA-compatibility (but Redirect status should be used instead). ScriptLog sets a logfile for CGI debug output ScriptLogBuffer sets a maximum size for PUT or POST data logged to a ScriptLog file ScriptLogLength sets an overall maximum size for a ScriptLog logfile SendBufferSize sets the size of the TCP send buffer <Location> now only matches full URL segments (<Location /i> does not match URL /info, for example) <Location> and <Directory> can match the URL or path (respectively) against a regular expression <VirtualHost> can take multiple addresses Anonymous_Authorative has been renamed to Anonymous_Authoritative AuthDigestFile can take an optional second argument of "standard" (for NCSA compatibility) AuthUserFile and AuthGroupFile are now NCSA compatible, with an optional second argument which can be either dbm or standard (dbm is only valid if the optional mod_auth_dbm module is compiled in) Auth_MSQL_Authorative has been renamed to Auth_MSQL_Authoritative deny has been updated to allow an argument of user-agents followed by a list of user-agents to deny access IdentityCheck timeout now 30 seconds rather than 60 KeepAlive now takes an "On" or "Off" argument, rather than a number (if a number is used, 0 means Off while any other number means On). If switched on, the default requests per connection is 100. See also MaxKeepAliveRequests.
Options can set or remove individual options, instead of replacing all the options currently in force Timeout defaults to 300 seconds instead of 1200 TransferLog can now be used more than once in each main server or virtual server User and Group can be set inside virtual host sections, and are used when running sub-processes (e.g. CGI) if the server is configured for setuid execution In all directives, a backslash character (\) now only escapes quotes or / chars (e.g. XXX "123\"456" gives argument 123"456). Previously \ could escape any character. Configuration and Support Program Changes The conf directory contains examples of the four configuration files needed: httpd.conf, srm.conf, access.conf and mime.types. Each of these files has been updated slightly. httpd.conf An example BrowserMatch directive is given, which disables keep-alives for browsers which had a buggy implementation. srm.conf No changes (except in the sample domain names) access.conf An example <Location> section to log attempts to access the phf CGI program is given. phf has a security hole which is actively being exploited, and should immediately be removed. This example shows how to log people trying to access this program, possibly in an attempt to hack your site. The logging is done at the apache.org site, or you can log it locally using a supplied CGI program in the support directory. mime.types A type has been added for midi files, and removed for .gz and .Z files (they should be marked as an encoding type, not a media type). In all files, all domain names have been replaced with names that can never occur on the Internet. A new CGI program phf_abuse_log.cgi is provided which can log attempts to access the phf CGI program. The program suexec is provided as C source. If compiled, this can be used with Apache to allow for the execution of programs as users other than the default server user. It makes extensive checks before it runs the CGI as another user to prevent security problems.
Other than these two new programs, there are no functionality changes to the programs in the support directory. The C programs have been updated to prevent compiler warnings on some systems, and the Perl dbmmanage now creates passwords with a random 'salt'. Apache 1.3.6 Guide A guide to everything new and changed in Apache 1.3.4 New in 1.3.6 This is a guide to all the changes between Apache 1.2 and Apache 1.3.6. For each change, we say which version it was introduced in, so you can also use this feature to upgrade between 1.3.* versions. First published 25th September 1998. Last updated 26th March 1999. Apache 1.3.6 was released on 25th March 1999 and is now the latest version of the Apache server. The previous release was 1.3.4 (version 1.3.5 was never made publicly available). Apache 1.3.6 is available in source form for compiling on Unix or Windows, in pre-compiled form for many common versions of Unix, and in pre-compiled form as a single-file installer on Windows. All the pre-compiled forms also include full source code. All are available for download from any Apache local download site. This is a bug fix and minor upgrade release, with a few new features. Users on Unix systems should upgrade to fix various bugs. Users on Windows systems should consider whether to upgrade, because htpasswd files that worked with 1.3.4 and earlier will not work with 1.3.6 unless updated. The main new features in 1.3.6 (compared to 1.3.4) are: Logging can be conditional based on whether an environment variable is set or not (see the CustomLog directive). mod_rewrite has much faster DBM and TXT maps through the use of an internal cache. Passwords in htpasswd files can be encrypted with MD5 instead of DES. On Windows this allows encrypted passwords for the first time, using the new bin/htpasswd.exe program. Access restrictions can be applied to all methods (known and unknown) apart from specific named ones, with the new <LimitExcept...> section.
On Windows, additional Start menu items have been added and the bug where the conf files were not being created has been fixed. On Windows, it is now possible to tell Apache to use the registry to find how to execute CGI scripts based on the file extension, with the new ScriptInterpreterSource directive. New in 1.3.4 There are several new features in 1.3.4 compared to 1.3.3: A default language for documents can be set with the DefaultLanguage directive. Mappings from file extension to handler can be removed with RemoveHandler. The negotiation module has been extensively updated to support the latest version of the HTTP/1.1 specification, to fix various bugs and inefficiencies, and to add some support for the transparent content negotiation RFCs. All the new HTTP/1.1 methods required for WEBDAV (distributed authoring) have been added, so that they can be used by third-party modules to implement the DAV specifications. A default order for fancy directory indexes can be set with IndexDefaultOrder. New options have been added to ./configure: --target sets the executable name, --permute-module sets relative module order, --with-layout sets the directory layout and --shadow has been extended to specify the shadow directory name. There have been a number of important security fixes to Apache on Windows. The most important is that there is much better protection against people trying to access special DOS device names (such as "nul"). In addition, there is better processing of UNC paths, and Makefiles are now provided to allow Apache to be compiled on Windows 95. Apache 1.3.3 and earlier came with three configuration files in the conf directory: httpd.conf, access.conf and srm.conf. This was for purely historic reasons: any directive can appear in any file, and the configuration files can have any filename (although the configuration file defaults to conf/httpd.conf unless overridden with the -f command line option).
Many people configure Apache using a single file, normally httpd.conf. This can be created by appending the contents of access.conf and srm.conf to httpd.conf, then removing access.conf and srm.conf. Apache 1.3.4 comes with this already done (although the access.conf and srm.conf files will exist containing a comment about why they are now empty). New in 1.3.4 compared to 1.2 There are many new features in Apache 1.3.4 when compared to Apache 1.2. The major features are: Support for Windows NT systems Apache now compiles and runs on Windows NT. It will also work, with slightly less functionality, on Windows 95. The current 1.3.4 release is not as well developed as the Unix version, and will be slower and may include some security problems (although it is much better than earlier 1.3 releases). For now it should be regarded as a "beta" quality release on Windows. See the separate section below on Apache for Windows. Better configuration and building process The Apache source files have been re-organised. Modules have been moved into sub-directories, making it easier to add additional modules. OS specific code has been moved into separate directories. A new command-line way of configuring and installing Apache has been added. The source file re-organisation has made it easier to add third-party modules. They can be dropped into a directory and, with the appropriate configuration command at build time, Apache will create the Makefile for the module and build it. Larger modules can have their own directory, and can integrate more easily into the build process. If modules require additional libraries or command line arguments, they can add the required options themselves during the build process, without the user having to edit the Configuration file. The new way of configuring and building Apache is referred to in the source tree as "APACI". This provides a command-line method of configuring Apache rather than editing the "src/Configuration" file.
This method also builds a Makefile which can be used to install Apache after it has been built. APACI consists of a new configuration program, called "configure", which should be given details of all the build options such as the destination directory, modules to be built and included, compiler to be used, and so on. This is the information previously placed into the "src/Configuration" file. "configure" will use a different directory structure during installation than the normal Apache layout, unless the --compat option is used. Support for dynamic modules Apache now supports loading of additional modules without having to recompile the source. This is referred to as "DSO" or "Dynamic Shared Objects" on Unix, and "DLL" on Windows. This means that a small Apache executable can be created, and other modules added as required. It also lets module developers release or sell modules in binary only form, ready to be loaded into a running Apache. With graceful restarts it is even possible to add or remove modules while Apache is running without any downtime. DSO and DLL functionality is provided by the new module mod_so. Modules can be built ready for dynamic loading with new directives in the src/Configuration file, or using APACI's "configure" script. Using the latter can also automatically build a correct configuration file for loading the dynamic modules. A program is also provided to build modules for dynamic loading without using the Apache source tree. Dynamic modules are supported on these operating systems: Windows, FreeBSD, OpenBSD, NetBSD, Linux, Solaris, SunOS, Digital UNIX, IRIX, HP/UX, UnixWare, AIX, ReliantUnix and generic SVR4 platforms. Better performance There have been considerable internal changes to make Apache perform better than 1.2.
Some of the more important changes are: the code which merges per-directory configurations (<Directory> sections) is more efficient, IP virtual hosts are looked up in a hash table, fewer system calls are used when serving static pages, faster adaptation to load spikes, less copying of data when assembling responses for sending to the client, and so on. Better security Public web servers are always open to the risk that someone will try to attack the server. Apache is carefully written to try to eliminate as far as possible the damage that this can cause. The most serious type of attack is where the attacker can gain some kind of unauthorised access to the server system. There are no known ways of doing this with recent versions of Apache. So attackers may decide to use a "denial of service" attack. This is where they know that they cannot get into the system, so instead they try to overload the server to prevent it being used by anyone else. Obviously there is little that can be done when someone decides to attempt to overload the server by sending more and more requests, because those requests are usually indistinguishable from real requests. The load on the server in this case will increase in direct relationship with the speed of the attack. However in Apache 1.2 there were some ways in which the attacker could make the load on the server increase much more rapidly than the speed of the attack. These have been eliminated in 1.3. To help server administrators limit the amount of resources used by attackers, there are now also a series of new directives which can be used to specify limits on the size of each request. The size of the request line, the number of request headers, the size of the request header lines, and the size of any request body can now all be limited.
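The four limits just listed can be pictured with a short sketch. This is only an illustration of the idea, not how Apache parses requests: the class `RequestLimits`, its field names, the default values and the function `check_request` are all hypothetical. In Apache itself the limits are set with the LimitRequestLine, LimitRequestFields, LimitRequestFieldsize and LimitRequestBody directives.

```python
from dataclasses import dataclass


@dataclass
class RequestLimits:
    """Hypothetical limits; names and defaults are illustrative only."""
    max_request_line: int = 8190   # bytes allowed in the request line
    max_header_count: int = 100    # number of request header lines allowed
    max_header_line: int = 8190    # bytes allowed in any one header line
    max_body: int = 0              # bytes allowed in the body; 0 = no limit


def check_request(raw: bytes, limits: RequestLimits) -> str:
    """Return "ok", or a short reason why the request would be refused."""
    # Split the header block from the body at the first blank line.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.split(b"\r\n")
    request_line, header_lines = lines[0], lines[1:]
    if len(request_line) > limits.max_request_line:
        return "request line too long"
    if len(header_lines) > limits.max_header_count:
        return "too many headers"
    if any(len(h) > limits.max_header_line for h in header_lines):
        return "header line too long"
    if limits.max_body and len(body) > limits.max_body:
        return "body too large"
    return "ok"
```

Checking the cheap properties first (line lengths and header count) means an oversized request can be refused before any further processing is spent on it, which is the point of the limits.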
If the server administrator does not trust users on the server system (for example, if the server is a multi-user system for customers to provide web documents), there were additional potential denial of service attacks available in 1.2. These included putting extra long lines in .htaccess files or creating .htaccess files that were actually special devices. These have been eliminated in 1.3.2. Enhanced virtual host configurations Virtual host handling has been updated. For IP based virtual hosts, finding the virtual host for a given request is faster because the configurations are stored in a hash table. For name-based virtual hosts, the configuration has been made less ambiguous. It is now necessary to explicitly state which IP:port combination will be used for name-based requests, and requests coming in on this IP:port will only get served by virtual hosts defined for that IP:port. See Apache name-based virtual hosts. The order that virtual hosts are used in the configuration file has been reversed from Apache 1.2. Now the virtual hosts listed first in the configuration file have priority over those listed later. To help debug virtual host configurations, the new command line option -S displays how Apache has parsed the virtual host information in the configuration files. The features above are the major changes between 1.2 and 1.3.4. This section lists most of the remaining changes, sorted into some broad categories. As well as new features, 1.3.4 has a lot of bug fixes compared with 1.2.X. 
Configuration parsing: Multiple whitespace characters within quoted strings in configuration files are no longer compressed to a single space (1.3.2) Better error checking in configurations: reports missing closing section directives, reports if directives which are not valid within <VirtualHost> sections are used in a <VirtualHost> section, reports invalid multiple arguments to <Files>, <Directory>, etc (1.3.0) <DirectoryMatch> sections are applied after all <Directory> sections have been applied (1.3.0) Include directive added to read additional configuration files (1.3.0). Extended to allow the Include directive in .htaccess and <Directory> sections (1.3.2) Command line options: Add a -t command line option for testing the syntax of the configuration files (does not check .htaccess files) (1.3.1) Add ability to process configuration directives given on the command line. The option -C "directive" gives a directive to process before reading the configuration files, and -c "directive" gives a directive to process after reading the configuration files (1.3.0) New command line option -V displays the options used when compiling Apache (1.3.0) New command line option -S displays the virtual host configuration (1.3.0) The -S option now does not attempt to start the server: it will exit after showing the virtual host configuration (1.3.4) The -h, -l and -L options have changed meaning in 1.3.4. Previously -? gave a list of options, -l gave a list of directives and -h gave a list of modules compiled into the server. In 1.3.4, -h gives a list of options, -l gives a list of modules and -L gives a list of directives (1.3.4) Child processes, CGI and SSI: Does not pass invalid environment variable names to child (CGI) processes.
Any invalid character in a variable name is replaced with an underscore (1.3.0) The REMOTE_HOST environment variable is not set if the IP cannot be resolved to a hostname (1.3.0) Add SERVER_SIGNATURE environment variable containing the signature as controlled by the ServerSignature directive (1.3.3) Add VARIANTS environment variable from the spelling module containing a list of possible matching URLs (1.3.3) Logging and error messages: The default text of a 404 error message changed from "File Not Found" to "Not Found" (1.3.0) In log formats, %a logs the client IP address and %h now logs only the hostname (never an IP address). If no hostname is available for a given IP address, %h logs "-" (1.3.0) In log formats, %v and %p log the server name and port from the configuration files, not the request (1.3.4) In log formats, %V logs the hostname of the request, subject to the setting of UseCanonicalName. This is the same behaviour as %v in 1.3.3 and earlier (1.3.6) Does not log an error about "handler not found" if a handler was found, but declined to serve the request (1.3.1) The Apache parent process will log the reason why a child process dies, if it dies from an unexpected signal (1.3.0) Logs client IP addresses in error_log messages (this was in 1.2, but not in 1.3.0 or 1.3.1. It is restored in 1.3.2) Fix problem where mod_usertrack could corrupt the client hostname in the log files (1.3.1) The reason for "500 Server Error" responses is passed to error documents in the ERROR_NOTES environment variable (1.3.2) Logging can be conditional depending on whether an environment variable is set or not (1.3.6).
Proxy: More accurate error responses can be returned from the proxy (1.3.6) The proxy module now handles invalid responses from IIS (1.3.2) Proxy module now prompts for FTP username and password, if required, to avoid storing that information in URLs and the access_log (1.3.2) The proxy module now accepted reject requests with URL syntax http://host:/path (1.3.4) Performance: More efficient <Directory> and <DirectoryMatch> section matching (1.3.0) More efficient virtual host matching. Address * behaves like _default_ (1.3.0) More efficient use of network: combines smaller network writes (1.3.0) Faster response to load spikes, by first spawning one new child, then the next second two, then four and so on up to 32 children per second, until there are enough idle servers (1.3.0) Efficient unbuffered CGI. As soon as the CGI stops sending output, whatever it has produced so far is sent to the client. This replaces the old scheme where output was buffered up to a fixed size, or until the CGI process exited. This also replaces the old "nph-" prefix for getting unbuffered CGI output (which was not compatible with HTTP/1.1 or SSL layers anyway) (1.3.0) Security fixes: Directives to limit size of requests, to avoid denial of service attacks based on sending extra big requests. Eliminate unnecessary processing when handling requests (1.3.2) Avoid denial of service attacks if a configuration file (such as a .htaccess file) is a device file, by refusing to open device files apart from /dev/null which is still valid (1.3.0) Correctly handle over-long lines in configuration files (1.3.0) Fix denial of service attack by sending requests with lots of slashes in them (1.3.0) Deny access to directories if a .htaccess file in that directory cannot be read (1.3.0) Better name-based virtual host support, using new directive NameVirtualHost. This gives the IP:port of interfaces which are used for name-based virtual hosts.
Requests on this port can only match <VirtualHost> sections defined on that IP:port combination. Also reverse order of matching of <VirtualHost> sections so earlier sections override later ones (1.3.0) Detach from stdin, stdout and stderr after reading configuration files, so Apache can be started via rsh, etc (1.3.0) Directory indexes now dynamically size the width of the filename column (1.3.2). Columns can be sorted (1.3.0) Do not kill connections in progress when a TERM (shutdown) signal is received (1.3.0) Experimental support for passing symbols required by the Apache core through dynamic modules onto libraries loaded by those modules (Rule SHARED_CHAIN). (1.3.2) Expires headers will now be returned for content which is served from sources other than files, if configured with mod_expires (1.3.2) Header files can be included into C++ code (1.3.0) mod_negotiation has been overhauled to bring it up to the latest HTTP/1.1 revision 6 specification and to support some of the transparent content negotiation drafts (1.3.4) mod_negotiation also works around a bug in Lynx where it sends a header saying it understands transparent content negotiation, but it does not (1.3.6) mod_rewrite now correctly sets the HTTP/1.1 Vary: response header if decisions are made based on request headers (1.3.2) mod_rewrite has much faster DBM and TXT maps through the use of caching. (1.3.6) mod_status is now included by default.
The new directive ExtendedStatus can be used to turn this module on (1.3.2) New script apachectl to start, stop, restart and check the status of Apache (1.3.0) SIGPIPE is no longer reserved for use by the Apache core while sending a response (1.3.6) Support for DES and MD5 encrypted passwords (1.3.6) Support has been added for the HTTP methods defined in the distributed authoring drafts (WEBDAV) (1.3.4) Support has been added for the new Expect: request header, as introduced in HTTP/1.1 revision 5 (1.3.4) The configuration directives are now all given in httpd.conf, and the default access.conf and srm.conf are empty (1.3.4) The PID file is removed when Apache exits (1.3.2) The meta information module (mod_cern_meta) can be configured on a per-directory basis (1.3.0) The status page now shows the "generation" of each child process (1.3.6) Try to avoid problems with eight-bit characters in URLs and configuration files (1.3.1) Use the supplied regex library on all systems, unless explicitly told otherwise (1.3.0) Various year 2000 compliance changes (these are minor changes, in things like log messages) (1.3.0) Major Modules Changes The following modules have been added to this version of Apache. Of these, only mod_setenvif is compiled in by default. The other modules here are optional, and to use them you need to uncomment the appropriate line in Configuration and re-compile Apache. Dynamic loading of modules (mod_so) The mod_dld module from previous releases has been removed and replaced with a much improved module, mod_so. This module supports dynamic loading of modules on most Unix systems and on Windows. This module was added in 1.3.0. Conditionally set environment variables (mod_setenvif) The mod_setenvif module can be used to set environment variables based on headers on the incoming request or other aspects of the request (for example, the client hostname).
This replaces the mod_browser module which set environment variables based on the User-Agent request header. This module was added in 1.3.0. Fix typos in URLs (mod_speling) This module can be used to correct simple typing errors in requested URLs, based on looking at real directory and file names. This module was added in 1.3.0. Generic unique ID for every request (mod_unique_id) This module generates a unique identifier for every hit. It was added in 1.3.0. Automatically work out MIME type (mod_mime_magic) This module can be used to return a MIME type based on the contents of the file being served. This is similar to the Unix "file" command. Added to 1.3.0. Directory indexing module (mod_autoindex) This new module contains the directory indexing functionality previously provided by mod_dir. See the section on mod_dir below. API Example (mod_example) This module provides example code for module developers. mod_dld replaced by mod_so See section above about mod_so. mod_browser replaced by mod_setenvif See section above about mod_setenvif. mod_dir split into two modules (mod_dir and mod_autoindex) The mod_dir module has been split into two modules. Both are included by default in an Apache build. The new mod_autoindex module supports creating directory indexes. The updated mod_dir now just supports the basic functionality of trailing-slash redirects and DirectoryIndex files. This means that if directory indexes are not required, the large mod_autoindex module does not need to be compiled into Apache. (Updated in 1.3.0) mod_auth_msql removed This module is no longer supplied with Apache, because there are a lot of possible databases and it is not possible to include all database modules into the Apache distribution. (Removed in 1.3.0). New and Updated Ports This section contains summaries of changes for more unusual systems or systems not widely used by the main Apache developers.
Sometimes these ports are not maintained after their initial inclusion in the Apache source tree. Changes to support the major platforms used by Apache developers (such as FreeBSD, Linux, IRIX and Solaris) are not listed here. Changed the name of the "OS/2" port from "__EMX__" to "OS2" (1.3.2) New port and binaries available for Windows NT (1.3.0) New port to Acorn RISCiX (1.3.0) New port to BeOS (1.3.0) New port to Cyberguard V2 (1.3.4) New port to DRS 6000 (1.3.3) New port to Encore UMAX V (1.3.0) New port to HP UX 11 (1.3.0) New port to Linux with glibc (e.g. RedHat 5) (1.3.0) New port to NCR MP/RSA 3.0 (1.3.0) New port to PowerMAX OS (1.3.4) New port to Rhapsody (Mac OS X) (1.3.2) New port to SCO SV (1.3.0) New port to SONY NEWS-OS (1.3.0) New port to Sequent (1.3.0) New port to Siemens Nixdorf BS2000-OSD (1.3.0) New port to UnixWare 7 (1.3.1) New port to NEC EWS4800 (1.3.2) Recognise UnixWare 7.0.1 (1.3.3) Updated support for ARM Linux (1.3.1) Updated support for LynxOS (1.3.0) Updated support for MPE (1.3.0) Updated support for NCR SVR4 (1.3.1) Updated support for NEXTSTEP (1.3.1) Updated support for QNX 32 bit systems (1.3.1) Changes for Apache on Windows Apache 1.3.0 was the first full release of Apache to support Windows systems. Some of the most important changes since the last 1.3 beta release are listed here. Add support for encrypted passwords (encrypted with the MD5 algorithm). Added bin/htpasswd to create and modify MD5 passwords (1.3.6) Errors from running Apache with -i or -u command line arguments are now displayed on the console rather than sent to the error log (1.3.6) Compile time default for the error log filename is error.log rather than error_log (1.3.6) New directive ScriptInterpreterSource which configures Apache to find a CGI file interpreter via the registry rather than via the #!
line in the CGI file itself (1.3.6) The Apache executable now contains an icon (1.3.6) The binary installer now creates additional Start menu options for shutting down a running console application and to uninstall the NT Apache service (1.3.6) Remove limit of 64 threads per process (1.3.2) Remove trailing "."s in path components, which are ignored by Windows when accessing files so could be used to bypass security settings (1.3.1) Eliminate directory components consisting of three or more dots (e.g. "...") which can cause security problems (1.3.1) Make IndexIgnore case insensitive because the Windows filesystem is (usually) case insensitive. Set current working directory for CGI scripts (1.3.0) Pass environment variables to CGI scripts (1.3.0) Add ability to gracefully shutdown or restart Apache on Windows 95, without pressing Control-C in the Apache console window (1.3.3) Allow CGI child processes to die properly if the client aborts the connection (1.3.3) Handle paths like D:/ correctly (1.3.3) Handle drive letters in sub-requests properly (1.3.3) A running console version of Apache can be restarted or shutdown with the -k command line option (1.3.3) Makefiles have been added to allow Apache to be built on Windows 95 (1.3.4) Various problems with UNC paths have been fixed (1.3.4) Possible security and denial of service attacks by use of special DOS devices have been removed (1.3.4) Directive Changes This section lists the directives which are new in this release, or which have changed their behaviour or syntax. Note that directives provided by the new modules are not listed (see the documentation for the module concerned for its directives). When upgrading from an earlier version of Apache, check this list to see if any of the directives in your configuration have changed. <DirectoryMatch>, <LocationMatch> and <FilesMatch> can be used to match sections using regular expressions.
These are equivalent to the <Directory ~ ...> syntax (1.3.0) <IfDefine name>...</IfDefine> sections are only used if Apache is started with a corresponding -Dname command line option (1.3.1) <LimitExcept method method ...> is the inverse of <Limit>. The contents of <LimitExcept> only apply if the request method is not listed as an argument. (1.3.6) AddModuleInfo provides additional text in mod_info output (1.3.0) AliasMatch, ScriptAliasMatch and RedirectMatch provide the ability to use regular expressions (1.3.0) AllowCONNECT to allow CONNECT requests on arbitrary ports (for proxying HTTPS requests) (1.3.2) CoreDumpDirectory gives the directory to use to dump core files, after receiving signals which cause core dumps (1.3.0) DefaultLanguage sets a default language for files without a language specified by an extension (1.3.4) ExcessRequestsPerChild Used on Windows systems only ExpiresActive to turn the expires module on or off (1.3.0) ExtendedStatus to turn on or off collected status information for display by mod_status. Off by default. Replaces the previous compile-time rule "Rule=STATUS" (1.3.2) Include specifies arbitrary configuration files to be read when this directive is processed (1.3.0) IndexDefaultOrder sets a default sorting order for fancy directory indexes (1.3.4) LimitRequestBody limits the size of the request message body (1.3.2) LimitRequestFields sets a maximum number of request headers that Apache will accept (1.3.2) LimitRequestFieldsize sets a maximum size of any single request header (1.3.2) LimitRequestLine sets a maximum request-line length that Apache will accept (1.3.2) ListenBacklog can set the size of the TCP backlog (the argument to listen()) (1.3.0) LogLevel sets the detail that will be logged to the error_log file. Possible values are "emerg", "alert", "crit", "error", "warn", "notice", "info" and "debug". The default is error.
(1.3.0) NameVirtualHost added to support better configuration of name-based virtual hosts (1.3.0) NoProxy in mod_proxy prevents proxying certain addresses (1.3.0) ProxyDomain in mod_proxy adds a domain to unqualified requests (1.3.0) ProxyPassReverse in mod_proxy lets Apache work as a "reverse proxy", i.e. a front-end to multiple servers (1.3.0) ReceiveBufferSize in mod_proxy to control size of the receive buffer (like SendBufferSize) (1.3.0) RemoveHandler in mod_mime removes a mapping between a file extension and a handler name (1.3.4) ScriptInterpreterSource (valid on Windows only) can be used to tell Apache to find CGI interpreters via the registry. If set to "script" it uses the initial #! line from the CGI file, like previous versions. If set to "registry" it uses the registry to map the file extension to the interpreter. The default is "script". (1.3.6) ServerSignature can be used to turn on a "signature" in various automatically generated responses such as error messages. The possible values are "off" which is the default, "on" which uses a signature of the server version and hostname, and "email" which adds the mail address from the ServerAdmin directive (1.3.0) ServerTokens allows the Server: response header to be configured. Possible values are "min" which returns just the Apache version number, "OS" which also returns the operating system type, and "full" which returns the identifiers from any modules which request to be added. The default is "full". (1.3.0) ThreadsPerChild Used on Windows systems only UseCanonicalName is used to determine how Apache creates URLs pointing back to itself. The default value is "on" which means that Apache will use values from the configuration (i.e. ServerName and Port settings). If set to "off", Apache will use the information supplied by the client. (1.3.0).
The use of this directive is now controlled by the Options override, rather than AuthConfig (1.3.4) <Directory> and <Location> sections defined in a virtual host override corresponding sections defined in the main server, rather than the other way around (1.3.0) <Directory> wildcards (* and ?) now do not match the forward slash character, to be compatible with shell expansions (1.3.0) <Directory>, <Files> and <Location> can now use [...] style wildcards (1.3.0) <Limit> now matches request methods on a case-insensitive basis, as required by the HTTP/1.1 specification (1.3.1) AccessFileName can take more than one filename argument (1.3.0) AuthName argument must be enclosed in double-quotes if it contains whitespace (1.3.0) CheckSpelling is now valid in per-directory locations (.htaccess files and <Directory> sections) (1.3.2) CustomLog can now take an additional argument env=[!]env-var which makes the logging conditional on the named environment variable being set (or, if ! is used before the env-var, unset) (1.3.6) CustomLog formats can contain \t or \n to represent a tab or newline character in the log file (1.3.6) FancyIndexing no longer unsets any options already set by IndexOptions (from 1.3.2) HostnameLookups defaults to off (1.3.0) HostnameLookups has a new possible argument, double, which ensures that Apache only uses a remote hostname if it passes a double-reverse lookup. This replaces the MAXIMUM_DNS compile time option (1.3.0) IndexOptions has new arguments: NameWidth specifies the width of the filename column in directory indexes (1.3.2). SuppressColumnSorting turns off the links for sorting the output (1.3.0). SuppressHTMLPreamble prevents Apache outputting the start of the HTML response (1.3.0). IconHeight and IconWidth set the size of the icons (1.3.0).
Options can now be added or removed with leading + or - (like Options) (1.3.3) LocationMatch no longer matches a single slash against multiple slashes in the request URL (1.3.0) RefererIgnore is now case-insensitive (1.3.0) RewriteMap now has two additional map types: "rnd" for random replacements, and "int" to use an internal function to make a replacement (1.3.0) SetEnvIf and SetEnvIfNoCase can now match an empty field with ^$ (1.3.1) TransferLog: if no log file is defined, Apache will not log requests. Previous versions would always log to the default filename (access_log) (1.3.0) UserDir can disable specific users, or can selectively enable particular users (1.3.0) allow and deny can accept network/netmask and CIDR formats. If hostnames are used a double-reverse lookup is always used (1.3.0) allow can be used to allow access based on environment variables, with allow from env=variable. This is useful with the new mod_setenvif directives. The old allow user-agents syntax is no longer valid. (1.3.0) require can now accept TAB characters between arguments (1.3.3) Configuration and Support Program Changes The conf directory contains examples of the four configuration files needed: httpd.conf, srm.conf, access.conf and mime.types. Each of these files has been updated slightly. In 1.3.4 all these files have been merged into the single conf/httpd.conf file. httpd.conf HostnameLookups is set to "off" to reflect the new default. LogLevel set to warn. LogFormat and CustomLog are used instead of TransferLog. ServerSignature is set to "on". srm.conf A <Files .htaccess> section prevents access to .htaccess files. access.conf Apache now defaults to a much more restrictive set of permissions, by specifying AllowOverride none and Options FollowSymLinks in a <Directory /> section. This means that .htaccess files will not be processed unless turned on by another <Directory> section, and all options (except following symbolic links) are turned off.
This is a much more secure initial configuration. mime.types New types for javascript, mpeg 3, VRML, CSS and XML documents. All currently known MIME types (as registered with the IANA) have been added (1.3.4) New in the support directory are a web benchmark program (ab.c), a script to control the starting and stopping of the Apache server (apachectl), a perl script to compile modules for dynamic loading without using the source tree (apxs.pl), a perl script to resolve IP addresses in log files (logresolve.pl), a script to split logfiles based on virtual hosts (split-logfile), and manual pages for all these programs (1.3.0). The benchmark program has been overhauled and can now output HTML pages (1.3.6). apxs can now pass arbitrary arguments on to the compiler or linker, with -Wc and -Wl respectively (1.3.4). The httpd_monitor program has been removed since status information about Apache can be obtained via mod_status's output. (1.3.0). The manual pages for ab and apachectl have been moved to section 8. (1.3.6). The new option --permute-module allows the relative order of modules to be specified (1.3.4) The default directory layout for make install is now the same as the layout that src/Configure uses. The new --with-layout option can be used to specify a different layout, for example --with-layout=GNU would use the previous default layout for ./configure (1.3.4) The new option --target=name can be used to give the binary a different name than the default "httpd" (1.3.4) The --shadow option has been extended to take an argument which is the name of the shadow directory to create (1.3.4) Upgrade Notes Because of the various changes between 1.3.3 and 1.3.4, when upgrading you should beware of the following things: If you use ./configure to configure and compile Apache, be careful to ensure that you get the directory layout you want. If you previously used --compat, you can omit it. 
If you previously did not use --compat you must give --with-layout=GNU. If you have any scripts which run Apache and use any of the arguments -?, -h, -l or -L, then they must be updated to use the new arguments (-h, -l, -L and -R, respectively). If you use the -S command line option to show the virtual host configuration and start the server running, you will have to do this in two steps since -S will now exit without starting the server. If you use UseCanonicalName inside .htaccess files, you must ensure that the Options override is in force rather than the AuthConfig override. If you used multiviews for content negotiation and relied on the fact that Apache read the variants from the disk in the directory order (rather than, say, alphabetically) you should check that the negotiation still works as expected (Apache now sorts the variants into order before using them, so that negotiation is not dependent on the usually arbitrary directory order of the files). This should not normally be a problem. The first three items are described in more detail below. If you configure Apache with ./configure you will have to change the options you use to set the directory layout. If you do not currently use an option to set the directory layout you will have to use an option in 1.3.4 because the default layout has changed. There are two layouts for directories: the first is the "Apache" layout. This was used in all versions of Apache before 1.3, and in Apache 1.3 it is still used if you use src/Configure to configure and build Apache. The second layout was introduced by ./configure, and is called the "GNU" layout because it is similar to the standard layout used by GNU tools. This created two layouts within Apache 1.3.*: the Apache layout if src/Configure was used, and the GNU layout if ./configure was used (although ./configure could also be told to use the Apache layout with the --compat option).
Unfortunately this created a lot of confusion, and in particular many people thought that the GNU layout was the preferred directory layout for 1.3, because it was the default in ./configure. It is not: the preferred layout is the "Apache" layout, consistent with src/Configure and Apache 1.2. In Apache 1.3.4, the Apache layout becomes the default layout for ./configure. If you have been using the --compat option, then you do not need it anymore. However if you did not use the --compat option (that is, you used the GNU directory layout) then you must now use --with-layout=GNU. This table summarises the meaning of the directory layout arguments in each version:

Layout option | Meaning in 1.3.3 | Meaning in 1.3.4
None | GNU layout | Apache layout
--compat | Apache layout | Apache layout (but not needed since this is the default)
--with-layout=GNU | Not valid | GNU layout
--with-layout=Apache | Not valid | Apache layout (but not needed since this is the default)

Various command line arguments have changed in meaning. This affects the -h, -l and -L options. This table shows the meanings of these arguments in both versions of Apache.

Option | Meaning in 1.3.3 | Meaning in 1.3.4
-? | List command line options | List command line options (but use -h instead)
-h | List modules | List command line options
-l | List all directives | List modules
-L | Specify location of the core loadable module if built with SHARED_CORE | List all directives
-R | Not used | Specify location of the core loadable module if built with SHARED_CORE

So if you were using -?, change to using -h. Similarly, change from -h to -l, from -l to -L and from -L to -R. Also, the -S option now exits after showing the virtual host configuration, rather than continuing and starting the server. When upgrading from a 1.2 server to 1.3, the following changes will also be required: Virtual hosts are matched by looking from the first one downward in the configuration file, rather than from the last one.
So you should consider reversing the order of your virtual host sections. Use the new -S option to check your virtual hosts configuration. If you use name-based virtual hosts, read the Apache documentation about them carefully. This has changed considerably. If you serve both name-based and IP-based hosts from the same IP:port combination you will need to change your configuration. In all cases you will need to add NameVirtualHost directives for each IP:port on which name-based requests can be received. Again, use the -S option to check your virtual hosts configuration. Check your AuthName directives (remember to check in .htaccess files as well) for multi-word arguments. If you have any, put quotes around the argument. Known Bugs These bugs in 1.3.3 have been fixed in 1.3.4: Windows-specific Bugs In some circumstances the configuration files in the conf directory are not installed. This can occur if the computer needs to be rebooted because a system DLL file was updated. For now a work-around is to re-install Apache after the reboot, since the DLL will not need to be installed again. Requests for filenames containing non-ASCII characters such as accented characters give a "Forbidden" error. If the ErrorLog directive is removed from the httpd.conf file, Apache will use the built-in default filename for the error log file. This should match the name given on the ErrorLog directive in the distributed httpd.conf file, which was error.log. However it would actually revert to the "Unix" name of error_log. From the next release it will default to error.log. Other Bugs The default method of locking between processes on Linux has been changed from flock to fcntl, because of possible instability with flock in some kernel versions. In Apache 1.3.4, lines in the error log were being preceded by "httpd: ". This will be removed in the next version to avoid breaking any automatic error log analysis programs.
If a CGI returns a Set-Cookie header it was sometimes being duplicated in the response to the client. If the mod_info module was compiled as a DSO and the relevant lines uncommented in the distributed httpd.conf file, Apache would not start because the mod_info directive appeared before the line which loaded mod_info into the server. Fix potential buffer overrun problem. Added support for the standard file layout on Mac OS X (Rhapsody). apachectl gives an error if the PID file does not exist. The macro escape_uri was renamed to ap_escape_uri but no backward compatibility was provided from the old name. Using the mod_speling module where there were lots of possible matching files caused Apache's memory use to grow faster than linearly with the amount of data being handled. It is recommended to use a single configuration file (typically conf/httpd.conf) but mod_info will log a warning message if it cannot read conf/access.conf or conf/srm.conf. With some browsers, Apache may not send a full response even though the file was updated on disk. This affects browsers which use HTTP/1.1 "etags" to ask servers for later versions of a file. Browsers known to do this are MSIE 4.1 and 5.0beta (older browsers used the modification time of the file). The problem is that Apache did not correctly compare the "etag" in the request with the "etag" of the file on disk (which will be different if the file has been updated). When using ./configure with the --with-layout=GNU option the directory layout may be different from the default layout in Apache 1.3.3. This only occurs if the "prefix" includes a directory component named "apache", and results in directories containing unnecessary "httpd" components. This was an effect of a new feature in Apache 1.3.4 which allowed for the executable name of Apache to be changed from "httpd". Compiler options starting with + cannot be used in EXTRA_CFLAGS in src/Configuration.
Most compilers use - for compiler options, but HP-UX's C compiler also uses +. The INSTALL file shows examples of commands to start and stop the server using apachectl. However it assumes that this script is in the sbin directory, but the default is now bin.

HTTP/1.1

HTTP/1.1 is a major revision of the HTTP standard, which defines how browsers, servers and proxies communicate.

The Hypertext Transfer Protocol

From version 1.2, Apache is fully compliant with the new HTTP/1.1 specification. This is the protocol which tells browsers and servers how to communicate, and the features added here determine how Web pages can be accessed. We take a look at what HTTP/1.1 includes and what changes it will bring to browsers and servers. Part of Apache Week issue 28 (16th August 1996).

Hypertext Transfer Protocol (HTTP) defines how Web pages are requested and transmitted across the Internet. Almost all servers and browsers currently use version 1.0 of this protocol, but a major update, version 1.1, has been released. HTTP/1.1 adds a lot of new features to HTTP, which in turn will lead to new capabilities in both servers and browsers. We look at what is new in 1.1 and how it is likely to affect the Web.

HTTP was initially a very simple protocol used to request pages from a server. The browser would connect to the server and send a command like:

    GET /welcome.html

and the server would respond with the contents of the requested file. There were no request headers, no methods other than GET, and the response had to be a HTML document. This protocol was first documented as HTTP/0.9. All current servers are capable of understanding and handling HTTP/0.9 requests, but the protocol is so basic it is not very useful today. Browsers and servers extended the HTTP protocol from 0.9 with new features such as request headers and additional request methods. The resulting HTTP/1.0 protocol was only officially documented in early 1996 with the release of RFC1945.
Servers and browsers have been using HTTP/1.0 for several years. Even while 1.0 was being documented, the next version was in serious development. This time the specification was developed first. This new version, 1.1, is now available as RFC2068. HTTP/1.1 will include a lot of new features, and will also document for the first time some features already found in servers or browsers.

Knowing how HTTP works is very useful for a server administrator. It lets you check out the operation of your server without having to fire up a browser, and gives you a very useful diagnostic tool to check in detail how the server responds to individual requests. You can use telnet to emulate how a browser requests documents from a server. With telnet you can connect to the server, issue a request, and see what the server responds with. For example, to get the home page from www.apacheweek.com, you would use:

    % telnet www.apacheweek.com 80
    Connected to www.apacheweek.com.
    GET / HTTP/1.0 [RETURN]
    [RETURN]

This assumes you are connecting from a Unix system, starting at the command prompt (%) and with a telnet command available. You could also use any other telnet program such as the one in Windows 95. The text in bold is what you type. The standard port for Web requests is port 80, so we connect to that port number. Once connected we can type in and send a HTTP request, followed by the request headers. In this case, the request is GET / HTTP/1.0. The / is the resource we want to obtain, and the HTTP/1.0 tells the server that this is a HTTP/1.0 request. After entering this line, press RETURN twice - the first ends the request line, and the second marks the end of the optional request headers (in this case, we did not enter any request headers). The server will respond by sending a number of response headers, followed by the text of the requested document. It is often more convenient to send a 'HEAD' request instead of 'GET'.
This makes the server behave exactly as if it was handling a GET, but it doesn't bother to send the actual document. This makes it much easier to see the response headers, and means you do not have to wait to download the document itself. For example, to see what response headers www.apacheweek.com sends for /, use:

    HEAD / HTTP/1.0

    HTTP/1.0 200 OK
    Date: Fri, 16 Aug 1996 11:48:52 GMT
    Server: Apache/1.1.1 UKWeb/1.0
    Content-type: text/html
    Content-length: 3406
    Last-modified: Fri, 09 Aug 1996 14:21:40 GMT

    Connection closed by foreign host.

The first response line is the status - in this case '200' means the request is okay. The rest are response headers, which give information either about the server or the resource. For example, Server: gives the server version, and Last-Modified: is the last modification date of the file.

New in HTTP/1.1

The basic operation of HTTP/1.1 remains the same as for HTTP/1.0, and the protocol ensures that browsers and servers of different versions can all interoperate correctly. If the browser understands version 1.1, it uses HTTP/1.1 on the request line instead of HTTP/1.0. When the server sees this it knows it can make use of new 1.1 features (if a 1.1 server sees a lower version, it must adjust its response to use that protocol instead). HTTP/1.1 contains a lot of new facilities. The main ones are: hostname identification, content negotiation, persistent connections, chunked transfers, byte ranges and support for proxies and caches.

Every request sent using HTTP/1.1 must identify the hostname of the request. For example, if the URL http://www.apache.org/ is used, the request must include the fact that the hostname part is 'www.apache.org'. In previous versions of HTTP, the server never knew the hostname used in the URL. Letting the server see the hostname allows the implementation of non-IP virtual hosts.
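As a rough sketch of why the hostname matters, a server could choose a document root by looking at the Host: header. Everything named here is an assumption made for illustration: the second hostname, the directory paths and the function names are hypothetical, and this is not how Apache itself implements virtual hosts.

```python
# Hypothetical mapping of hostnames to document roots (illustrative only).
VHOSTS = {
    "www.apache.org": "/www/apache",
    "www.example.com": "/www/example",
}


def parse_host(request_text):
    """Return the lower-cased hostname from the Host: header, or None if absent."""
    lines = request_text.split("\r\n")
    for line in lines[1:]:      # skip the request line
        if line == "":          # a blank line ends the request headers
            break
        name, _, value = line.partition(":")
        if name.strip().lower() == "host":
            # Drop an optional :port suffix before comparing names.
            return value.strip().split(":")[0].lower()
    return None


def select_docroot(request_text):
    """Pick a document root by hostname; None means no host matched."""
    return VHOSTS.get(parse_host(request_text))
```

Two requests arriving on the same IP address can then be answered from different directories, which is the whole point of sending the hostname.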
For example, if two names, www.apache.org and www.someoneelse.com, point to the same IP address, a HTTP/1.1 server can use the hostname it receives to return different content for each request. HTTP/1.0 servers cannot differentiate between these two requests. The hostname must be passed to the server either as a full URI on the request line, or on the new Host: header. For example, to test how www.apache.org responds to a HTTP/1.1 request, you could send

    GET / HTTP/1.1
    Host: www.apache.org

Note that the HTTP version on the GET request is now 'HTTP/1.1'. If the request does not include the hostname, either in the URI or on the Host: header, the server will respond with an error.

Content Negotiation refers to the ability to have a number of different versions of a single resource. For example, a document might be available in English and French, with each of these available as either HTML or PDF. The possible responses are called representations or variants. There are actually two sorts of content negotiation:

Server-driven Negotiation: Here the server decides (or guesses) on the best representation to send to the browser, based on information the browser provides in the request.

Agent-driven Negotiation: Here the server does not guess on the best representation, but instead returns a list of the representations it has. The browser can then either automatically request one of these, or present a choice to the user.

The first type, server negotiation, has been implemented in Apache since the summer of 1995 and is explained in a special feature from Apache Week issue 25. However, the HTTP/1.1 specification is the first place it is officially documented. The second type, agent negotiation, is not fully documented. The HTTP/1.1 specification just contains basic definitions of some of the headers to be used, but no details. The details of content negotiation are being specified in an Internet draft.
This draft also expands on how server-driven negotiation works, and defines how caches can perform negotiation on behalf of either the server or the user agent.

Many pages today include inlined documents, usually images but increasingly also sounds and other types such as Shockwave presentations. These pages can be slow to download because each item needs to be requested separately from the server, each on a separate connection. Typically, for each inline document the browser needs to connect to the server, ask for the document, wait for it to be received, and disconnect from the server (although some browsers can do multiple requests in parallel). This can be slow, especially across the Internet when there is a delay involved in each connection and disconnection. To help make pages with inline documents quicker to download, HTTP/1.1 defines persistent connections where a number of documents can be requested over a single connection, one at a time. An early implementation of persistent connections was known as keep-alive, and Apache as well as a number of other servers and browsers support this sort of connection. However, persistent connections are first officially documented in HTTP/1.1, and will be implemented slightly differently from keep-alives. For a start, in HTTP/1.1, persistent connections are the default. Unless the browser explicitly tells the server not to use persistent connections, the server should assume that it might be getting multiple requests on a single connection. Persistent connections are controlled by the Connection header. Unless a Connection: close header is given, the connection will remain open. This can be tested by connecting to www.apache.org and sending a simple request, for example:

    % telnet www.apache.org 80
    HEAD / HTTP/1.1
    Host: www.apache.org

    HTTP/1.1 200 OK
    Server: Apache/1.3.0
    ...

where the connection will remain open for a short period before closing (this is a server-configurable time out).
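The default described here can be sketched as a small decision function. This is only an illustration of the rule, not Apache's code: the function name is made up, and the branch for older requests assumes the earlier keep-alive convention of an explicit Connection: keep-alive header.

```python
def is_persistent(http_version, headers):
    """Decide whether the connection should stay open after this response.

    http_version: a string such as "HTTP/1.1"
    headers: dict mapping request header names to values
    """
    # Header names are case-insensitive, so normalise them while searching.
    connection = ""
    for name, value in headers.items():
        if name.lower() == "connection":
            connection = value.strip().lower()

    if http_version == "HTTP/1.1":
        # Persistent by default unless the client explicitly asks to close.
        return connection != "close"
    # Assumption: older clients must opt in with the keep-alive convention.
    return connection == "keep-alive"
```

For the telnet request shown, `is_persistent("HTTP/1.1", {"Host": "www.apache.org"})` is true, so the server keeps the connection open until its time out expires.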
If the same request is sent with a Connection: close header the connection will close immediately after the response headers have been sent.

Normally, when sending back a response the server has to know everything about the response it is about to send before it sends it. For instance, servers should set the Content-Length header on each response to the length of the response itself. This can be difficult for the server to do if the content is dynamically created (e.g. if it is the output of a CGI script). So in practice servers (including Apache) often do not send a Content-Length with dynamic documents. This has not been a problem with HTTP/1.0, but for persistent connections to work in HTTP/1.1, the Content-Length must be known in advance. The server could find out the length of the output of a CGI script by reading it into memory until the script has finished, then setting the Content-Length and returning the stored content. This might be acceptable for small content, but could be a problem if the CGI produces a lot of output.

One possible way around this is to use the new chunked encoding method. This lets the server send output a bit at a time. Each bit (or chunk) is small enough for its content-length to be known before it is sent. Using chunked encoding will let servers send out dynamic content that is either large or produced slowly without having to disable persistent connections. In addition, after a chunked-encoded document has been completely sent, additional response headers can be transmitted. This could allow dynamically produced headers to be associated with the document, even if they are not available until after the script (or whatever produced the document) has finished.

Byte ranges allow browsers to request parts of documents. This can be used to continue an interrupted transfer, or to obtain just part of a long document (say, a single page). Byte ranges are implemented by the Range header.
For example, to request just the second 500 bytes of a document, the request would include:

    Range: bytes=500-999

A single request can also ask for more than one range at once (for example, it could ask for the first 500 bytes and the last 500 bytes of a file). When the server replies, it will send back each part in a single response, using MIME multipart encoding to distinguish the parts.

HTTP/1.1 includes a lot of information and new features for people implementing proxies and caches. Until now, the operation of proxies and caches has been largely undocumented. In addition to documenting how they are supposed to work, HTTP/1.1 also includes a range of new features to make implementing proxies and caches easier, and in particular to reduce network traffic by allowing proxies and caches to send more 'conditional' requests and to do transparent content negotiation.

A conditional request is like a normal request, except the sender (the proxy or cache server) includes some information about whether it really needs the document. For example, a proxy or cache can send an entity-tag which identifies a document it already has, and the server only sends back the document if the cache does not already have the current version. Conditional requests can also be based on the last-modified time of the document.

There are a lot of other changes between 1.0 and 1.1, including

    More status response codes
    New request methods: OPTIONS, TRACE, DELETE, PUT
    Digest authentication
    Various new headers such as Retry-After: and Max-Forwards:
    Definition of the media types message/http and multipart/byteranges

How this will Affect Servers and Browsers

Users of the Web will notice the following major changes when browsers and servers are available which implement HTTP/1.1:

Non-IP virtual Hosts
    Virtual hosts can be used without needing additional IP addresses.
Content Negotiation means more content types and better selection
    Using content negotiation means that resources can be stored in various formats, and the browser automatically gets the 'best' one (e.g. the correct language). If a best match cannot be determined, the browser or server can offer a list of choices to the user.

Faster Response
    Persistent connections will mean that accessing pages with inline or embedded documents should be quicker.

Better handling of interrupted downloads
    The ability to request byte ranges will let browsers continue interrupted downloads.

Better Behaviour and Performance from Caches
    Caches will be able to use persistent connections to increase performance both when talking to browsers and servers. Use of conditionals and content negotiation will mean caches can identify responses quicker.

Using Apache Imagemaps

Imagemaps are an easy way to provide a graphical front-end. We explain how to use Apache's imagemap module and the Apache extensions to the NCSA map file format.

Imagemaps can provide a graphical interface to a web site. If the mouse is clicked over an imagemap image the co-ordinates of that click are sent to the server. The server can decide what page to return based on the location of the click.

Traditionally, imagemaps have been implemented at the server end with a CGI program (usually called 'imagemap'). This is configured with a map file which lists what regions on the image correspond to what documents to return. Apache can use CGI imagemaps, but it is more efficient to use the internal imagemap module. This module, compiled in by default, means that the server does not need to run a separate process to handle the image clicks. It is fully upwardly compatible, and also adds some new features.

Both of these approaches implement what are called server-side imagemaps because all the processing happens on the server.
The main problem with server-side imagemaps is that the user does not get any indication of which areas of the image contain links. An extension to HTML allows client-side imagemaps which tell the browser what areas on the image correspond to what documents. The browser can then highlight or show the active areas as desired. It is possible to use both client-side and server-side imagemaps at once, so that the maximum number of browsers are supported.

Older versions of Apache came with an imagemap program in the cgi-src directory. This could be compiled and placed into a CGI directory (typically cgi-bin). The internal imagemap module is faster than using the CGI program and it has replaced all of the functionality.

If you are using the imagemap program, you can easily move over to using the imagemap module. First, ensure that an appropriate AddHandler line is enabled in your srm.conf file (see the following section). Then all you need to do is update the HTML documents that refer to the imagemap program. You will probably be using something similar to this:

    <A HREF="/cgi-bin/imagemap/maps/mapfile"> <IMG SRC="image.gif" ISMAP></a>

You need to first of all rename your mapfile to have a suitable extension (as given on the AddHandler imap-file line, for example, .map) if it does not already have this extension. Then change the HTML like this:

    <A HREF="/maps/mapfile.map"> <IMG SRC="image.gif" ISMAP></a>

Note that the HREF is now simpler because the /cgi-bin/imagemap part is not given.

The imagemap module is a core part of Apache, and is compiled in by default. To use it, you first need to configure the Apache server. You should pick a file extension to use for imagemap configuration files, typically .map. The AddHandler command below should be added to your srm.conf file:

    AddHandler imap-file map

You will need to restart the server after making this change, by sending it a -HUP signal. Now, any request for a file ending in .map will be treated as an imagemap request.
To actually create an imagemap you need to do two things:

    Create a 'map' file which maps areas of the image onto documents
    Add the code to an HTML page to tell the browser which image to use and what mapfile.

The map file is a text file containing the information needed for the server to map points on the image onto documents to return (or URLs to redirect to). It can also contain statements to control the behaviour of the imagemap. The imagemap module uses map files in standard NCSA format, with optional extensions.

Areas and positions on the image can be mapped onto documents or URLs with the following commands. All co-ordinates start at the top-left of the image, position (0,0). These statements can be modified to make use of Apache imagemap extensions (such as to give a 'menu text'). This will be covered later.

rect url x1,y1 x2,y2
    The rectangle (x1,y1) to (x2,y2).

poly url x1,y1 x2,y2 ....
    The polygon formed by the points given.

circle url x1,y1 x2,y2
    The circle with its center at (x1,y1) and point (x2,y2) on the circumference.

point url x1,y1
    The closest point to the clicked position, if the click is not inside any circle, poly or rect.

The url part of each of these statements is the document to return if the point clicked was inside the respective area (or in the case of 'point', the closest). It can be either an absolute URL (starting http://), or a URL relative to the document root (starting /), or a relative URL (not starting with a /, and possibly including ../ components to go to parent directories). If the URL is relative, it is taken relative to the directory containing the imagemap configuration file, not the original HTML document (if different). However this can be changed by the base statement, see below.

There are various ways to create the co-ordinates for the map file. One is to do it by hand, using positions obtained by (say) an image editing program.
Alternatively there are various programs available which will let you mark the shapes on an image and then write out the correct statements, such as those listed in Yahoo's Imagemaps category.

The statements which can be used to control the behaviour of the imagemap are:

base [ url | map | referer ]
    Use url as the base for any relative URLs within the map file. Alternatively, the word map can be used, which makes URLs relative to the directory containing the map file (this is the default). Alternatively, relative URLs can be made relative to the HTML document which included the imagemap image, with referer. This only works with browsers which support the Referer request header (most modern browsers support this).

default [ url | error | nocontent | referer | menu ]
    This tells the server what to do if the point clicked was not inside any rect, poly or circle, and there were no point statements. It can either be a URL, or one of these values:

        error: return a 500 Server Error status;
        nocontent: return a 204 No Content status, which will cause most browsers to keep the current document;
        referer: return the document given by the Referer request header, which will be the HTML document which contained the imagemap;
        menu: return a text (HTML) version of the URLs in the map file.

    The default is nocontent.

The final part of creating an imagemap is to add suitable HTML code to an HTML document. Images are placed using the code <IMG SRC="...">. To place an imagemap, surround this tag with a <A HREF...> tag which refers to the map file, and include the attribute ISMAP in the <IMG SRC...>. For example:

    <A HREF="/docs/home.map"><IMG SRC="/graphics/image.gif" ISMAP></A>

where docs/home.map is the URL of the map file, relative to the server's document root. The ISMAP attribute in the <IMG SRC...> tag tells the browser that this is an imagemap.
When the image is clicked, the browser sends a request for the given HREF URL, followed by the position of the image click, such as:

    GET /docs/home.map?20,35

if the image was clicked at position (20,35).

One of the big problems with imagemaps in the past has been that they do not work with text-only browsers. The imagemap module is written to provide support for text-only browsers, which usually ignore the ISMAP attribute. The imagemap module recognises this and will return a text (HTML) document containing a menu of the possible selections from the map file. In addition, a menu document can be returned if the user of a graphical browser selects a point outside any of the defined areas, if the statement "default menu" is given in the map file.

The type of menu returned can be configured with the ImapMenu directive. This can be placed in a <Directory> or <Location> section, or in a .htaccess file. It takes a single argument which gives the type of menu to return:

none
    Do not show a menu.

formatted
    Output a formatted document, with a suitable heading and with the map lines shown as <pre> text.

semiformatted
    Format the map lines as <pre> text, and also show comment text on other lines (comments start with a hash character, #), but do not output a header.

unformatted
    Do not format map lines as <pre> text, and output text from comment lines, but do not output a header.

The semiformatted and unformatted options let you add additional text and mark-up to the map document. The difference between these two is that with semiformatted, the map links are output as <pre> sections, which forces them onto separate lines. The unformatted option does not impose any restrictions, so it is possible to build up a map document with multiple links on a line, for instance.

The links in the menu document correspond to the URLs for each of the areas defined in the map file. The text of the link will be the URL itself.
However this can be replaced with more meaningful text by giving this text as an argument before or after the co-ordinates. For example:

    rect /welcome.html 1,1 20,20 "Welcome to this site"

The imagemap module supports three directives: the first configures the type of menu to return (if any). This is the ImapMenu directive already covered. The other two directives provide alternate ways of setting the base and default actions (see the base and default map configuration statements, above). The corresponding directives are ImapBase and ImapDefault, and they take the same arguments. The directives can be given in <Directory> and <Location> sections, and in .htaccess files.

Say you have an image which contains two areas you want to make active (see the example image, right): a circle, which should lead onto a contents page (contents.html) and a square which gives information about your company (about.html). The basic map file to do this would be:

    circle contents.html 25,25 0,25
    rect about.html 50,0 100,50

This would be included in a HTML document like this:

    <A HREF="/maps/home.map"><IMG SRC="/img/logo.gif" ISMAP></A>

If the user clicks inside the circle or square area, they will get the associated document, relative to the mapfile location. The requested files would be: /maps/contents.html and /maps/about.html. This probably is not what is wanted. The URLs in the map file could be given as relative to the document root, for example:

    circle /contents.html 25,25 0,25
    rect /about.html 50,0 100,50

Alternatively, the base statement could be used to set the base URL, as in:

    base /
    circle contents.html 25,25 0,25
    rect about.html 50,0 100,50

Rather than putting the URL in the map file like this, it might be better to make all the URLs relative to the location of the HTML document containing the imagemap, with

    base referer

If the user clicks an area outside the circle and the square they will, by default, get a HTML menu of the URLs in the map file.
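To illustrate how a click is matched against the example map file, here is a rough sketch in Python. It is not the imagemap module's own code: the names EXAMPLE_MAP and resolve_click are hypothetical, only the rect and circle statements are handled, and treating points that lie exactly on a boundary as inside is an assumption.

```python
import math

# The two statements from the example map file, as (shape, url, points).
EXAMPLE_MAP = [
    ("circle", "contents.html", [(25, 25), (0, 25)]),
    ("rect", "about.html", [(50, 0), (100, 50)]),
]


def resolve_click(entries, x, y):
    """Return the URL of the first area containing (x, y), or None."""
    for shape, url, points in entries:
        if shape == "rect":
            (x1, y1), (x2, y2) = points
            # Accept the corners in either order.
            if min(x1, x2) <= x <= max(x1, x2) and min(y1, y2) <= y <= max(y1, y2):
                return url
        elif shape == "circle":
            (cx, cy), (px, py) = points
            # The second point lies on the circumference, so it fixes the radius.
            radius = math.hypot(px - cx, py - cy)
            if math.hypot(x - cx, y - cy) <= radius:
                return url
    # No area matched: this is where the default action would apply.
    return None
```

With this map, resolve_click(EXAMPLE_MAP, 20, 35) falls inside the circle and gives contents.html, while a click at (120,80) matches neither area and returns None.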
Users of non-graphics browsers will also get this menu. To make it more readable, add some descriptions:

    base referer
    circle contents.html 25,25 0,25 "Contents"
    rect about.html 50,0 100,50 "About our company"

which will produce the following map document:

    Contents
    About our company

(In this example the links do not go anywhere).

The map document produced will just contain these two links. To make it more elaborate, you can either include your own mark-up text (on comment lines), or set the ImapMenu directive to the value formatted. To include your own mark-up, put it on lines which start with a # character:

    base referer
    # <h1>Menu Bar</h1>
    circle contents.html 25,25 0,25 "Contents"
    rect about.html 50,0 100,50 "About our company"
    # Select one of the options above

which produces:

    Menu Bar
    Contents
    About our company
    Select one of the options above

This works because the default value for the ImapMenu option is semiformatted, which outputs comment text (after the # symbol) as part of the map document. For more elaborate formatting, you could include ImapMenu unformatted in your access.conf or .htaccess file, and use, say:

    base referer
    # <h1>Menu Bar</h1>
    # Select an option:
    circle contents.html 25,25 0,25 "Contents"
    # or
    rect about.html 50,0 100,50 "About our company"

which produces:

    Menu Bar
    Select an option: Contents or About our company

Client-side imagemaps move the processing of the co-ordinate information to the browser. The HTML includes the information about the areas on the image and the documents they lead onto. This means the browsers can give positive feedback when the mouse is over an active area. This obviously only works in browsers which support it, but it is possible to use a single image as both a server-side and client-side imagemap.
Here is an example image set up for both server- and client-side imagemaps:

    <A HREF="/docs/home.map"><IMG SRC="/graphics/image.gif" ISMAP USEMAP="#thismap"></A>
    <MAP NAME="thismap">
    <AREA SHAPE=CIRCLE COORDS="25,25,25" HREF="contents.html">
    <AREA SHAPE=RECT COORDS="50,0,100,50" HREF="about.html">
    </MAP>

Note that the circle here uses the centre point and a radius, rather than a point on the circumference. An example USEMAP imagemap is shown to the right. The format for client-side imagemaps is defined in RFC1980.

Gathering Visitor Information: Customising Your Logfiles

Apache 1.2 makes it easy to create multiple customised log files so you can record details of who is browsing your site.

Every time a browser hits your site it leaves a trail in your access log. This file is enough to tell you how many hits you received and gives you some basic information about the browser, such as their hostname. But there is a lot more information readily available that you could be gathering. Want to know which browser is most common on your site, or what languages your readers can understand? In Apache 1.2 logging information like this is easy.

First published in Apache Week issue 51 (7th February 1997).

Apache uses the TransferLog command to create a single log file for storing details of every request. However Apache's logging capabilities are far more advanced: it can write the log file in any format, it can write multiple log files (each with a different format), and it can send log messages to an external process via a "pipe".

This feature will explain first how to customise the format of your existing log file, then show how to create multiple log files. Finally it will cover how logging works when you have virtual hosts, where you can choose whether to log a virtual host into the main log files or have separate log files for each host.
The traditional format for web log files looks like this:

    jupiter.eu.c2.net - - [03/Feb/1997:00:06:59 +0000] "GET / HTTP/1.0" 200 4571
    jupiter.eu.c2.net - - [03/Feb/1997:00:07:00 +0000] "GET /img/awlogo.gif HTTP/1.0" 200 12706

(There are two lines here, both starting with "jupiter.eu.c2.net". If you see more than two, the lines have been wrapped on the screen).

This format is called the common log format and is standard across most web servers (although it is not very well documented). There are various tools to analyse data in this format, and it is not too difficult to write custom tools (in, say, perl) to extract the data. But the lack of a common field delimiter makes such tools more complex than necessary and prevents the use of simple Unix programs such as cut.

You can customise this format. There are probably two common reasons for doing this: firstly, to make the format simpler by using a common delimiter character, and secondly to log additional information such as the browser type at the end of each line (placing it at the end means the file can still be analysed by standard log analysis programs).

You customise the format by telling Apache a format to use. Special character sequences are used to represent specific information. For example, the sequence %h will be replaced with the name of the remote host. The common log format is defined like this:

    %h %l %u %t "%r" %>s %b

Additional sequences here are %l (the remote username, if using identd), %u (the HTTP authenticated username, if any), %t (the time in common-log format), %r (the request), %s (the returned status) and %b (the number of bytes in the document served).

Say, for example, you would prefer a file format with a common delimiter character between each field, so that you could use cut or write very simple perl scripts to extract the data. Using the common log format above as a guide, you could use

    %h|%l|%u|%t|%r|%>s|%b

Here the | character is being used as a delimiter.
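As an illustration of how simple the extraction becomes, the following Python sketch splits one line written in this delimited format into named fields. The field names in FIELDS and the function name parse_delimited_line are hypothetical labels chosen for this example, and the sample line is the first common log entry shown earlier rewritten with | separators. Rejecting lines with an unexpected number of fields is an assumption of the sketch, not something Apache does.

```python
# Hypothetical labels for the seven sequences %h %l %u %t %r %>s %b.
FIELDS = ["host", "ident", "user", "time", "request", "status", "bytes"]


def parse_delimited_line(line):
    """Split one |-delimited log line into a dict keyed by field name."""
    parts = line.rstrip("\n").split("|")
    if len(parts) != len(FIELDS):
        # Assumption: treat any other field count as a malformed line.
        raise ValueError("expected %d fields, got %d" % (len(FIELDS), len(parts)))
    return dict(zip(FIELDS, parts))


sample = "jupiter.eu.c2.net|-|-|[03/Feb/1997:00:06:59 +0000]|GET / HTTP/1.0|200|4571"
record = parse_delimited_line(sample)
```

The same first field could be pulled out on the command line with cut -d'|' -f1.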
Note that the | delimiter can cause problems if it occurs within a field (which is possible in the %r request field). To set this format for your log file, you use the LogFormat directive. For example

    LogFormat "%h|%l|%u|%t|%r|%>s|%b"

The % sequences introduced so far let you log various aspects of the request. There are some more sequences (covered below) that log additional aspects of the request. However one of the most important features of the custom log format is being able to log any of the request headers supplied by the browser. This lets you log things like the user's language preferences, browser type and the page they just came from.

Logging a request header is done using the %{}i sequence. You put the name of the request header between the braces. For example, to log the browser type, you would use

    %{user-agent}i

This information is typically added to the end of the common log format in Apache 1.1.1 (in Apache 1.2, you can put it in a separate log file, which is much more convenient. This is explained later). To add the user-agent information to the end of the common log format, use

    LogFormat "%h %l %u %t \"%r\" %>s %b %{user-agent}i"

If the browser does not send a user-agent, the text "-" will be logged as the user-agent. Otherwise you will get the browser name, such as "Mozilla/3.0Gold (Win95; I)" or "Mozilla/2.0 (compatible; MSIE 3.01; Windows 95)" (the former is Netscape Gold version 3, the latter Microsoft Internet Explorer version 3, pretending to be Netscape 2).

In addition to %{...}i, there is a corresponding sequence %{...}o to log any of the response headers (in these sequences, the i means incoming and the o outgoing headers).

Adding extra fields onto the end of the common log file format can be inconvenient, especially if you already have software which processes the log files in their current format. Luckily, Apache offers a completely customisable log file interface: you can create any number of log files each in a different format.
It is now almost trivial to add a log file for (say) user-agents or requested languages, without needing to compile in a new module or modify the Apache source code. You can even log all the common log file information into both common log format (for existing analysers) and in a delimited format at the same time!

The interface to all this is via a single, simple directive: CustomLog. This directive takes both a file name to log to, and a custom format. For example, to log user-agents to a file called agent in the logs directory, you would use:

    CustomLog logs/agent "%{user-agent}i"

Other useful log files can also be created. These next two directives create a referrer log and a log of language preferences of your clients:

    CustomLog logs/referer "%{referer}i -> %U"
    CustomLog logs/language "%{accept-language}i"

You can tell the format to only log particular fields if the response status is (or is not) a particular value. For example, to only log the language preference for 200 or 304 statuses, use %200,304{accept-language}i. You can put an exclamation mark (!) straight after the % to reverse the condition (i.e. to only log if the status was not 200 or 304).

The time logged by %t is in common log file format. If you want to use another format, use %{format}t, where format is a date and time format as used by strftime (see man strftime for more information).

In some cases, the request will be handled by an internal redirect (this is common for things like requests satisfied by a DirectoryIndex file). In these cases, the configuration options can apply to either the original response, or the one actually delivered. The characters < and > after the % determine whether to log the original value, or the redirected value. For example, in %s you always want the value of the status actually returned, so %>s is used in the common log file definition.
Each % sequence knows whether it should use the original response or the real response - for example, %r (the request line) uses the original response.

The logging directives, TransferLog, LogFormat and CustomLog can be used inside virtual hosts. The way they interact with the logs set up outside the virtual hosts is like this:

    If there are no TransferLog or CustomLog directives inside the virtual host, log requests for this host to the logs defined in the main server. Otherwise log requests to the log files defined in this virtual host and do not use any of the log files defined in the main server.
    If LogFormat is used in a virtual host, the format it defines is used for all TransferLog files defined inside that virtual host. Otherwise the log format defined outside the virtual host is used by the TransferLogs defined inside the host, defaulting to the common log format if no LogFormat is defined in the main server.

Here are all the % sequences allowed in the configurable log format in Apache.

    %b          bytes sent, excluding HTTP headers
    %f          filename
    %h          remote host
    %{Header}i  The contents of Header: header line(s) in the request sent from the client
    %l          remote username (from identd, if supplied)
    %{Note}n    The contents of note "Note" from another module
    %{Header}o  The contents of Header: header line(s) in the reply
    %p          the port the request was served to
    %P          the process ID of the child that serviced the request
    %r          first line of request
    %s          response status. For requests that got internally redirected, this is status of the original request: use %>s for the returned status
    %t          time, in common log format time format
    %{format}t  The time, in the form given by format, which should be in strftime format
    %T          the time taken to serve the request, in seconds
    %u          remote user (from auth; may be bogus if return status (%s) is 401)
    %U          the URL path requested
    %v          the name of the server (i.e.
the virtual host)

Orton, Joe

Web Authoring and HTTP

Joe Orton explains WebDAV, the distributed authoring protocol for HTTP.

Traditionally, HTTP has only been used for web browsing, not web authoring. In situations where the author of a web site does not have direct access to the file-system which is being served, a protocol is used such as NFS, or a version control system which allows remote access, such as CVS. Alternatively, less privileged authors, who are using a dial-up Internet Service Provider, might be given FTP access to an area on a web server.

Before giving a description of WebDAV, it is useful to give a brief introduction to HTTP itself. The protocol consists of a request: a request message sent by the client to the server, followed by a response: the reply to the message, sent from the server back to the client. There are three important elements of an HTTP request: the method, the URI, and the headers. The method describes the type of the request. The HTTP specification, RFC 2616, defines eight different methods, from the familiar GET, to the obscure TRACE. The URI identifies the resource on which the method is intended to operate. Headers provide any extra information about the request that is required.

A syntactically valid (but meaningless) HTTP request and the response is given below. It uses the FOOBAR method, includes three headers "Host", "Something", and "Another", and is targeted at the resource "/sample/uri.html". The response uses the "501 Method Not Implemented" status code, telling the client that the server does not understand the request.

    FOOBAR /sample/uri.html HTTP/1.1
    Host: www.somewhere.com
    Something: else
    Another: header

    HTTP/1.1 501 Method Not Implemented
    Date: Mon, 16 Oct 2000 15:19:09 GMT
    Server: Apache/1.3.12 (Unix) DAV/1.0.2
    Connection: close
    Allow: GET, HEAD, OPTIONS, TRACE
    ...
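The three elements of a request can be pulled apart with a few lines of code. The sketch below, in Python, is a simplified illustration rather than a complete HTTP parser: the function name parse_request is hypothetical, and it assumes a well-formed request with CRLF line endings and no folded header lines.

```python
def parse_request(raw):
    """Split a raw HTTP request into (method, uri, version, headers)."""
    # The header section ends at the first blank line.
    head = raw.split("\r\n\r\n", 1)[0]
    lines = head.split("\r\n")

    # The request line carries the method, the URI and the protocol version.
    method, uri, version = lines[0].split(" ", 2)

    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so store them in lower case.
        headers[name.strip().lower()] = value.strip()
    return method, uri, version, headers


raw_request = (
    "FOOBAR /sample/uri.html HTTP/1.1\r\n"
    "Host: www.somewhere.com\r\n"
    "Something: else\r\n"
    "Another: header\r\n"
    "\r\n"
)
method, uri, version, headers = parse_request(raw_request)
```

Applied to the FOOBAR request above, this yields the method FOOBAR, the URI /sample/uri.html and a dictionary holding the three headers.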
During web browsing, the only HTTP methods that are normally used are GET, to retrieve documents, and POST, to submit form data back to the server. The WebDAV specification, RFC 2518, describes a set of new methods which allow clients to publish documents, and manipulate a remote repository in a variety of ways to meet the needs of web authoring. The methods fall into three groups:

    PROPFIND and PROPPATCH; for querying and manipulating properties.
    LOCK and UNLOCK; for locking purposes.
    MOVE, COPY and MKCOL; for basic repository manipulation.

In addition to the new methods, WebDAV refines the definition of the PUT and DELETE methods, which are already present in the HTTP specification.

The PUT method, as covered in a previous feature article, provides the most basic form of web publishing. This method is used to upload new or changed documents to the server.

WebDAV introduces the concept of a collection of resources to HTTP. A collection is analogous to a directory in traditional file-system terms: it has a name which ends in a /, and is a container for both normal resources, and also other collections. Collections can be created using the MKCOL method, which is similar to creating directories using the mkdir command.

    MKCOL /dav/newcollection/ HTTP/1.0
    Host: test.webdav.org

    HTTP/1.1 201 Created
    Server: Apache/1.3.11 (Unix) DAV/1.0.2
    Content-Type: text/html
    Date: Mon, 16 Oct 2000 09:10:06 GMT
    ...

The last two methods required for basic web authoring are the COPY and MOVE methods. These methods can operate in one of two ways: on a collection resource, they can recurse down an entire tree of resources, or alternatively, they can just operate on a single resource (of any type). The Depth HTTP header is used by the client to indicate which mode of operation is desired for a particular request; Depth: infinity meaning operate recursively, and Depth: 0 meaning operate only on a single resource.

WebDAV allows you to define properties on resources.
Two types of properties are used: live properties, which are defined by the server, store information like the last date on which the document was modified. Dead properties are used by clients as simple data stores. An example of a dead property is the name of the author of the page.

The first method which is used with properties is PROPFIND: used to simply request all properties available on a document, or alternatively, just a specific set of properties. XML is used in the request body to give the parameters for the PROPFIND request, and also in the response, to list the property names and their values. The Depth header is also used with PROPFIND requests: taking the values 0 and infinity as before, meaning in this case "give properties for a single resource only", or "give properties for all resources in this collection and below" respectively. The value 1 is also allowed, which requests properties on a collection resource, and its immediate descendants only, without recursing into any child collections.
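As a rough sketch of how a client might assemble such a request, the Python code below builds the XML body and the accompanying headers using only the standard library. The helper names build_propfind_body and propfind_headers are hypothetical, the "D" namespace prefix is an arbitrary choice, and the code only prepares the request data; it does not contact a server.

```python
import xml.etree.ElementTree as ET

DAV_NS = "DAV:"  # the XML namespace used by WebDAV elements


def build_propfind_body(property_names):
    """Return a PROPFIND request body asking for the named properties."""
    ET.register_namespace("D", DAV_NS)
    root = ET.Element("{%s}propfind" % DAV_NS)
    prop = ET.SubElement(root, "{%s}prop" % DAV_NS)
    for name in property_names:
        ET.SubElement(prop, "{%s}%s" % (DAV_NS, name))
    # Serialise to bytes so the Content-Length can be computed exactly.
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)


def propfind_headers(host, depth, body):
    """Return the request headers for a PROPFIND with the given Depth."""
    if depth not in ("0", "1", "infinity"):
        raise ValueError("Depth must be 0, 1 or infinity")
    return {
        "Host": host,
        "Depth": depth,
        "Content-Type": "text/xml",
        "Content-Length": str(len(body)),
    }


body = build_propfind_body(["getlastmodified", "getcontentlength"])
headers = propfind_headers("test.webdav.org", "0", body)
```

Computing Content-Length from the serialised bytes, rather than hard-coding it, keeps the header correct whatever set of properties is requested.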
A simple request for the properties "getlastmodified" and "getcontentlength" is given below:

    PROPFIND /dav/test.html HTTP/1.1
    Host: test.webdav.org
    Depth: 0
    Content-type: text/xml
    Content-Length: 174

    <?xml version="1.0" encoding="utf-8" ?>
    <propfind xmlns="DAV:">
      <prop>
        <getlastmodified/>
        <getcontentlength/>
      </prop>
    </propfind>

    HTTP/1.1 207 Multi-Status
    Server: Apache/1.3.11 (Unix) DAV/1.0.2
    Content-Type: text/xml; charset="utf-8"
    Date: Fri, 13 Oct 2000 13:51:25 GMT

    <?xml version="1.0" encoding="utf-8"?>
    <D:multistatus xmlns:D="DAV:">
      <D:response xmlns:lp0="DAV:" xmlns:lp1="http://apache.org/dav/props/">
        <D:href>/dav/test.html</D:href>
        <D:propstat>
          <D:prop>
            <lp0:getlastmodified>Fri, 13 Oct 2000 12:51:56 GMT</lp0:getlastmodified>
            <lp0:getcontentlength>105</lp0:getcontentlength>
          </D:prop>
          <D:status>HTTP/1.1 200 OK</D:status>
        </D:propstat>
      </D:response>
    </D:multistatus>

The PROPPATCH method, similarly, uses an XML request body to specify the changes which should be made to a set of properties. PROPPATCH requests are made up of a combination of the following two operations:

    delete a named property
    submit a new value for a named property

A lot of web authoring will involve more than one person working on a site at the same time. Under these circumstances the lost update problem can occur, where two authors download a document and make some changes, then later, both authors upload their changes again, one set overwriting the other. WebDAV provides a mechanism which can be used to prevent this situation, by allowing authors to lock a document while they are editing it. Once an author has locked a document, they are guaranteed that nobody else will be able to upload changes to the document.

The WebDAV specification makes locking support optional for server implementors.
The level of server support for WebDAV is defined as one of two classes: Class 1, in which all requirements are met for basic web authoring, and Class 2, which extends Class 1 to include locking support. The mod_dav module adds WebDAV support to an Apache 1.3 server. mod_dav has been under development for two years, and is currently at version 1.0.2. The module has also been integrated into the Apache 2.0 source tree, and is distributed as part of the recent alpha 7 release. Commercial WebDAV servers are available from Microsoft, Xythos, and Novell, amongst others. The on-line storage market has eagerly embraced WebDAV, with sites like Sharemation, MyDocsOnline, and Driveway all offering access to private or shared WebDAV repositories for free. Microsoft are providing strong support for WebDAV on the client side: Internet Explorer 5 is provided with "Web Folders", which allow the user to view and manipulate a WebDAV repository inside the web browser. Office 2000 also supports editing web pages in-place using DAV, and makes use of the locking methods to prevent the lost update problem described above. Microsoft's web publishing package FrontPage, ironically, lacks WebDAV support. Adobe GoLive 5 also supports WebDAV. There are several Open Source WebDAV projects. cadaver provides a command-line interface similar to the ubiquitous ftp client. For Macintosh users, Goliath has a familiar Finder-like interface. For more information, refer to the list of open source and commercial projects with WebDAV support hosted at the webdav.org site.

Module Soup
Customise Apache to do what you want it to do by adding in extra modules, or removing modules you do not need.

Apache's 'modular' architecture makes it possible for anyone to add new functions to the server. In fact, most of the code that comes as part of the Apache distribution is in the form of modules, and can be removed or replaced.
For example, if the 'asis' function is never needed, the asis module (mod_asis) can be removed, making the server executable smaller and potentially reducing the load on the server host. There are a large number of modules now written for Apache. Besides those included with the distribution, modules are also written to add functions not already in the code, or to do things which are needed on some sites but are not of widespread use. Some of these modules are written by Apache developers. Most of them, however, are written by other users of Apache who want to adapt its functionality for their needs. In this article, we will look at a range of Apache modules which can be added to the server. First though, we show how to add a new module.

It is easy to add a module to Apache:
  Obtain the module source code file and place it in the Apache src directory
  Add the module definition to the Apache 'Configuration'
  Re-compile Apache
  Install the server executable and re-start the server

So first you need to download the new module. Most modules come as a single source file, called mod_something.c. Place this file in Apache's src directory. If the module comes as more than one file (for example, the PHP/FI module) follow the instructions that come with the module. Having got the module source, Apache needs to be configured so that it will compile this code. To do this, edit the Configuration file in the src directory, and add a suitable Module line. This will have the format

  Module name_module mod_something.o

The first argument, name_module, must match the name given in the module's source code - look for the 'module definition' near the end of the file, which will look like this:

  module name_module = { NULL, ... };

The name_module text in the Configuration file must match the name_module text in the module source exactly. The second argument on the Module line is the filename of the module, with the final .c replaced by .o.
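The rule for building the Module line can be sketched in a few lines of Python. This is only an illustration: the helper name module_line is hypothetical, and the regular expression assumes the module definition looks exactly like the form shown above (the word module, the name, then "= {"); module sources that use extra keywords in the definition would need a looser pattern.

```python
import re

# Matches a definition of the form: module name_module = { ...
MODULE_DEF = re.compile(r"\bmodule\s+(\w+)\s*=\s*\{")


def module_line(source_text, source_filename):
    """Build the Configuration 'Module' line for a single-file module.

    Hypothetical helper: takes the text of mod_something.c and its filename.
    """
    match = MODULE_DEF.search(source_text)
    if match is None:
        raise ValueError("no module definition found in source")
    if not source_filename.endswith(".c"):
        raise ValueError("expected a .c source file")
    # Second argument: the filename with the final .c replaced by .o
    object_name = source_filename[:-2] + ".o"
    return "Module %s %s" % (match.group(1), object_name)


if __name__ == "__main__":
    example_source = "module speling_module = {\n    NULL,\n};\n"
    print(module_line(example_source, "mod_speling.c"))
```

Reading the name out of the source, rather than typing it by hand, avoids the mismatch between the Configuration file and the module definition that the text warns about.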
After editing Configuration, re-compile Apache by running

  ./Configure
  make

Finally, stop your current server (with kill -TERM pid), install the new httpd executable, and start it running (e.g. ./httpd -d /usr/local/httpd). If you have not looked at the standard modules which come with Apache, you might be missing some functions you could find useful. In addition, you might be compiling in some things you never use. All the standard Apache modules are listed in the Configuration file. The next release of Apache will come with a few more standard modules, such as a module to rewrite URLs on the fly, and a module to add PICS content-rating labels to responses.

Modules can be found in several different places:
  In the Apache 'src' directory
  In the Apache 'contrib/modules' directory
  In the 'Module Registry'
  Other sites (try a search engine and look for "Apache Module")

To simplify finding modules to do what you want, here is the Apache Week guide to add-on modules by function. These are taken from all the above sources, and are presented as an example of what is available. We cannot guarantee that these modules will do what they say they do, or even that they work with all versions of Apache. If a module named below is not a link, then that module is distributed with Apache 1.1.1. Otherwise the link will take you to that module (if the link is to a .c or .tar file, save it to a file, else the link goes to an HTML page or FTP directory).

Authentication
There is a whole range of options for different authentication schemes. The usernames and passwords can be stored in flat files (with the standard mod_auth), or in DBM or Berkeley-DB files (with mod_auth_dbm or mod_auth_db respectively). For more complex applications, usernames and passwords can be stored in mSQL, Postgres95 or DBI-compatible databases, using mod_auth_msql, mod_auth_pg95 or mod_auth_dbi.
If passwords cannot be stored in a file or database (perhaps because they are obtained at run-time from another network service), the mod_auth_external.c module lets you call an external program to check whether the given username and password are valid. If your site uses Kerberos, mod_auth_kerb allows Kerberos-based authentication. For LDAP authentication, see mod_auth_ldap. The mod_auth_anon module can be used to allow an 'anonymous-ftp' style access to authenticated areas, where users give an anonymous username and a real email address as password. There are also modules to hold authentication information in cookies, and to authenticate against standard /etc/passwd and NIS password services. See the Module Registry.

Blocking Access
mod_block.c blocks access to pages based on the 'referer' field. This can be used to help prevent (for example) your images being used on other people's pages. For more complex cases, mod_rewrite can be used to implement blocking based on arbitrary headers (e.g. referer and user-agent), as well as on the URL itself.

Counters
There are a number of counter modules available, including mod_counter.c and mod_cntr. Some server-side scripting languages, such as PHP/FI, can also provide access counters.

Faster CGI Programs
Perl CGIs can be sped up considerably by using the mod_perl module, which builds a Perl interpreter into the Apache executable and optionally allows scripts to start up when the server starts. Alternatively, the mod_fastcgi module implements FastCGI on Apache, giving much better performance from a CGI-like protocol.

Languages and Internationalisation
The Russian Character Set (RCS) module provides support for Russian character sets, while mod_fontxlate can translate characters in single-byte character sets, for countries with multiple non-standard character sets.

Miscellaneous
mod_speling.c attempts to fix mis-capitalised URLs, by comparing with files and directories in a case-insensitive manner.
A module which makes your ftp archive into web pages is available at mod_conv.tar.gz.

Server-Side Scripting
There are several different modules which allow simple (or not so simple) scripts to be embedded into HTML pages. XSSI is an extended version of standard SSI commands, while PHP and NeoScript are more powerful scripting languages.

Throttling connections
mod_simultaneous.c limits the number of simultaneous accesses to particular directories, which could be a way of implementing limits for images directories. mod_bandwidth provides a similar service. mod_throttle can be used to slow down responses for users who exceed a given "bytes per second" download rate.

URL rewriting
The mod_rewrite module is a powerful (and complex) way of mapping the request URL onto a new URL on the fly, using regular expressions and optionally mapping files in text or DBM format. It can also implement conditional rewrites based on other request headers (e.g. User-Agent).

Converting from NCSA
The differences between Apache and NCSA HTTPd. Also, how to convert an existing NCSA HTTPd installation over to Apache.

The two most popular Web servers according to the Netcraft Survey are Apache and NCSA HTTPd. Both servers are widely used, although according to the server survey Apache is used on over twice as many sites as NCSA, and the market share of NCSA is dropping while Apache's is growing. This feature is designed to explain the differences between NCSA HTTPd and Apache, so that users of either server can decide if the other meets their requirements better. We then look in detail at the directives changed between NCSA and Apache, which can be used by existing NCSA users if they decide to convert to Apache. It can also act as a guide to converting the other way. NCSA version 1.3 was the base for Apache development. Initially, Apache was a drop-in replacement for the NCSA HTTPd; however, as both have developed there are now some differences between the two servers.
Since then, much of Apache's code has been considerably rewritten, in particular to allow the functionality to be extended with modules. This feature explains how the current versions of Apache and NCSA HTTPd differ, what features Apache adds, and those it lacks. This is followed by a detailed list of changes between NCSA and Apache. The versions used for the comparison are Apache 1.3 and NCSA HTTPd 1.5.2. Perhaps the most important difference between Apache and NCSA is that Apache is extensible via a programming API. This means that the functionality of Apache can be extended almost arbitrarily, via modules. The list of Apache features given here concentrates on the functions provided by the server in its default configuration, or with the addition of modules distributed as part of Apache. However there are a lot of additional modules which can be added to perform specific tasks. See our feature on additional modules for an idea of the extensibility of Apache. Unless stated otherwise, the features listed in this section are available with the default server configuration. If the module is marked as optional, then it is part of the official Apache distribution, but not compiled in by default. If a module is described as third party then it is not part of the Apache distribution.
Leaving aside the third-party modules, the main features that Apache supports and NCSA does not are:
  Additional authentication options: anonymous, from a Berkeley DB file, from an mSQL or Postgres95 database
  All directives can appear in any of the configuration files
  Automatically set the MIME type of responses based on the file contents (using mod_mime_magic)
  Call a CGI program when a file of a particular MIME type is accessed, with the Action directive
  Configurable logging format (with LogFormat) and multiple log files (with CustomLog)
  Correct some typos in URLs with the optional "spelling" module (mod_speling)
  Create a user clickstream log (optional mod_usertrack module)
  Customise CGI environment variables (optional mod_env module)
  Dynamic module loading (optional mod_so module)
  Enhanced server-side includes (SSI)
  Imagemap extensions - internal support (like NCSA 1.5) with additional directives (ImapMenu, ImapBase, ImapDefault)
  Info module which displays the compiled-in modules and current configuration
  Listen on selected addresses and ports (Listen directive)
  Pipe any log file to another process, instead of writing to a file
  Proxy module to provide HTTP and FTP proxying. Can also operate as a "reverse proxy" to load-balance multiple servers.
  Restrict access by URL with <Location> sections, which complement <Directory>. Restrict access by filename with <Files>. All of these can also match against regular expressions.
  Rewrite URLs based on complex criteria (including conditionals), with mod_rewrite
  Server pool tuning with MaxSpareServers and MinSpareServers
  Server-based content negotiation, based either on a file listing the variants, or automatically generated from file extensions
  Set actions for files with particular extensions (SetHandler and AddHandler directives)
  Set environment variables based on any received headers or other information about the request, with SetEnvIf
  Set MIME type for all files in a directory with the ForceType directive
  Status module to see the status of the child processes and what request they are currently servicing
  Turn DNS lookups on/off at run time (HostnameLookups directive)
  USER_NAME environment variable set when SSI execs a CGI, giving owner of SSI file
  Use CERN format 'metafiles' to add header info to response (optional mod_cern_meta module)
  Unbuffered CGI output (actually Apache does buffer CGI output for efficient use of the network, but will send output to the client as soon as the CGI is no longer providing more output)
  Year 2000 compliant
  <VirtualHost> sections can contain almost any configuration directive, with no need for <SRMOptions> sections

Apache does not implement these features:
  Kerberos
  Parsing output of CGI for SSI directives
  Authentication against NIS usernames and passwords (although there are third party modules which do this)

Some features that are available in both NCSA and Apache are implemented differently in the servers. The detailed list of changed directives, below, gives more information. This is a summary of the main changes.

.htaccess files restricting by host
.htaccess files written using the examples on the NCSA site which restrict by host, not by user, may not work with Apache. The examples to restrict by host also include the AuthName and AuthType directives, which are only used in user authentication. The fix is to remove any of these commands from .htaccess files which only restrict by hostname (any AuthUserFile and AuthGroupFile directives should also be removed).

DBM User and Group Files
Apache supports DBM user and group files for authentication if the optional DBM module is used (mod_auth_dbm). This is configured by different directives, unlike NCSA which uses the same directives with a second argument to specify DBM format.
(Apache 1.2 will also allow use of the same directive syntax as NCSA.)

Digest Support
Digest authentication can be added to Apache with the optional digest module (mod_digest).

FastCGI support
This is available for Apache with a third-party module (from the FastCGI site).

Non-IP Virtual Hosts
In Apache, these are implemented using the normal <VirtualHost> sections. The name to respond to is given in the <VirtualHost> directive or on a ServerAlias directive. The NameVirtualHost directive must be given to specify which interfaces are used for name-based virtual hosts. Apache does not implement <Host>.

Log Files
Apache logs to the transfer log in the standard common log format. It does not support the LogOptions directive to build user agent and referrer information into the log file. However, the log format can be completely customised with the LogFormat directive and multiple logs can be created with CustomLog.

KeepAlive
The directives to support keepalive (persistent connections) use a different syntax.

Server Pool
Both Apache and NCSA 'pre-fork' a pool of servers to handle requests. However, in Apache the main (parent) process does not handle any part of the request. In NCSA, the parent process receives each request then hands it to a suitable child. In Apache, a pool of 'spare' servers is maintained, and the number of servers is configurable.

XBITHACK
This is a runtime directive in Apache.

This section lists all the directives that NCSA supports. For each directive, we say whether that directive exists in Apache, and if it does, whether there is any change in meaning or syntax. Where directives do not exist in Apache, we either give an alternative method of implementing it in Apache, or state that the feature related to that directive is not implemented (if it will be implemented in Apache 1.2, we note it here). Apache does not distinguish between the three configuration files that NCSA HTTPd uses.
That is, in Apache, any directive can appear in any of the configuration files (and in fact it is possible to put all the directives into a single file, if desired). However, this list of directives is split into sections for each of the configuration files, and the directives are listed in the same order as given in the NCSA documentation.

Directives valid in NCSA's Server Configuration file (httpd.conf):
  ServerType: same
  Port: same
  User: same
  Group: same
  ServerAdmin: same
  ServerRoot: same
  ServerName: same
  StartServers: same (but NCSA does not use the same method for its pool of servers)
  MaxServers: use MaxClients instead, same syntax and meaning
  MaxRequestsPerChild: same
  TimeOut: same, except Apache resets the timeout on sending each time data is written (when sending a file), so this is not an overall timeout.
  AccessConfig: same
  ResourceConfig: same
  TypesConfig: same
  IdentityCheck: same, except can be set in .htaccess files
  BindAddress: same syntax. Can however be used with virtual host configurations. See also new Listen directive for more control over addresses bound to
  <Host>: not valid. Implement non-IP virtual hosting using normal <VirtualHost> section and NameVirtualHost
  <VirtualHost>: same, except Apache does not support the errorlevel argument (it effectively defaults to 'required'). <VirtualHost> can take multiple hosts and IP addresses. <VirtualHost> is used to implement non-IP vhosts (see NCSA Host directive) when combined with NameVirtualHost. Almost all directives are valid within a <VirtualHost> section, so the NCSA <SRMOptions> section is not needed in Apache.
  <SRMOptions>: not applicable. Apache does not distinguish between the three config files, so directives are valid in all. You can just remove the <SRMOptions> and </SRMOptions> lines.
  ErrorLog: same, except Apache can log to a pipe (ErrorLog |program)
  TransferLog: same. Apache can also log to a pipe (i.e. another process) with "TransferLog |program". Log file is in standard 'common log format'.
  No LogOptions Combined format to include user agent or referer information, however the log format can be set with LogFormat directive and multiple log files created with CustomLog
  AgentLog: available if mod_log_agent compiled in. Syntax same, except Apache may log to a pipe, AgentLog |program.
  RefererLog: available if mod_log_referer compiled in. Syntax same, except Apache may log to a pipe, RefererLog |program.
  RefererIgnore: available if mod_log_referer compiled in. Syntax same.
  PidFile: same
  LogDirGroupWriteOK: not implemented.
  LogDirOtherWriteOK: not implemented.
  LogOptions: not valid in Apache. To specify formats, use the mod_log_config module and LogFormat instead. For separate agent and referer logs, use mod_log_agent and mod_log_referer modules.
  KeepAlive: on Apache, argument is the maximum number of requests per connection. Use a value of 0 to disable keepalives.
  KeepAliveTimeout: same syntax. If not given, Apache defaults to 15, NCSA 10
  MaxKeepAliveRequests: not valid. Use KeepAlive instead, except a value of 0 in NCSA means stay alive forever, in Apache it disables keepalives completely
  AssumeDigestSupport: not valid (but it doesn't do anything in NCSA anyway)
  Annotation-Server: not valid

Directives valid in NCSA's Resource Configuration file (srm.conf):
  DocumentRoot: same
  UserDir: same, except Apache can also use a full path with * to represent username (e.g. UserDir /home/*/public_html). Also Apache can redirect to a full URL.
  AccessFileName: same
  Redirect: same (but the order that Alias and Redirects are applied may be different). Apache can only redirect to a full URL, not a relative URL.
  RedirectPermanent: same, but "Redirect permanent" is preferred
  RedirectTemp: same, but "Redirect temp" is preferred
  Alias: same
  ScriptAlias: same
  AddType: same, except Apache can have multiple extensions listed
  AddEncoding: same, except Apache can have multiple extensions listed
  DefaultType: same
  DirectoryIndex: same (Apache can use multiple names, as can HTTPd 1.5).
  Apache can list names as URLs relative to the server root.
  FancyIndexing: same, but IndexOptions preferred
  DefaultIcon: same
  ReadmeName: same
  HeaderName: same
  AddDescription: same
  AddIcon: same
  AddIconByType: same
  AddIconByEncoding: same
  IndexIgnore: same
  IndexOptions: same
  ErrorDocument: same, except Apache can also output a static string with ErrorDocument "string, or redirect to a full URL. Apache passes on more REDIRECT_xxx env variables (all variables existing at time of the redirect are renamed REDIRECT_variable). But it does not pass on the error message in QUERY_STRING, or REDIRECT_REQUEST (use REDIRECT_URL instead). Apache can put ErrorDocument in .htaccess.

Directives valid in NCSA's Access configuration file (access.conf, or .htaccess files where allowed):
  <Directory>: same
  Options: same, except Apache also supports MultiViews option (for server-side content negotiation)
  AllowOverride: same, except Apache does not use Redirect (use FileInfo instead to control Redirects in .htaccess file)
  AuthName: same, except in Apache any realm name containing spaces must be enclosed in double quotes
  AuthType: same (basic only); digest is supported by the optional mod_digest
  AuthUserFile: same, except Apache does not support the second argument (standard, dbm or nis). Use AuthDBMUserFile instead for DBM format (1.2 will implement second arg to AuthUserFile). There are third party modules which implement NIS authentication.
  AuthGroupFile: same differences as AuthUserFile.
  AuthDigestFile: same, if optional mod_digest compiled in
  <Limit>: same. Note that in Apache, the directives valid inside <Limit> can also appear outside, in which case they apply to all methods
  order: same
  deny: same (but see allow for note about partial comparisons)
  allow: same, except Apache applies comparisons against full components only, e.g. bar.edu matches x.bar.edu, but does not match x.foobar.edu.
  require: same
  referer: not valid in Apache.
  To restrict by referer, or any other request header, use third party module mod_rewrite (to be distributed in Apache 1.2)
  satisfy: same
  OnDeny: not valid. Can be implemented by specifying an ErrorDocument 401

Other changes:
  The XBITHACK functionality is configurable at runtime with XBitHack directive
  All configuration directives can be used in any of the config files
  Apache does not set the SERVER_ROOT, REMOTE_GROUP or ANNOTATION_SERVER CGI variables

Content Negotiation
How to use Apache's content negotiation to transparently serve files in different languages, media types or character sets.

Content Negotiation Explained
Content Negotiation is an often overlooked feature of Apache, but correctly used it can let you present documents in different languages and formats based on what the user wants. Apache is one of the few servers that actually implements content negotiation. However there are a few problems caused by browsers which do not do the right thing. We explain how to use negotiation correctly, and why some browsers make this difficult. Content negotiation is a very powerful tool where the browser says what type of information it can accept, and the server decides what (if any) type of information to return. The term type is used very loosely here, because negotiation can apply to several aspects of the information. For example, it can be used to choose the appropriate human language for a document (say, French or German), or to choose the media type that the browser can display (say, GIF or JPEG). In order for the server to deliver the correct representation of the data, the browser must send some information about what it can accept. A browser used on a French-language machine, for instance, should indicate that it can accept data in French (of course, this should also be user-configurable). The most common use of content negotiation at the moment is to select data based on media type. Here, the browser says what sort of data it can display.
For example, when requesting an inline image, the browser could tell the server that it can accept GIF and JPEG images. In fact, the browser might prefer JPEG over GIF images because they are quicker to download, so it can specify this as well. The ability to indicate what content types a browser can accept is particularly important now that plug-ins can extend the browser capabilities. Unfortunately many current browsers don't supply the correct information to the server.

To use negotiation, you need two things. Firstly, you need a resource that exists in more than one format (for example, a document in French and German, or an image stored as a GIF and a JPEG), and secondly you need to configure Apache to know that each of these files is actually the same resource. Apache has two methods for doing this: either using a special index file to identify the various versions of the information, or using the MultiViews facility where Apache gets the information it needs from file extensions.

The first method involves creating a variants file, usually referred to as a var file. This lists each of the files which contains the same resource, along with details of what representation it is. Any request for this var file causes Apache to return the best file, based on the contents of the var file and the information supplied by the browser. To get Apache to use variant files, first uncomment the following line in srm.conf:

  AddHandler type-map var

and restart the server as normal. As an example, say there is a file in English and a file in German containing the same information. The files could be called english.html and german.html (they are both HTML files). So create a var file listing each of these files, and specifying which languages they are in. Create a var file called (say) info.var containing:

  URI: english.html
  Content-Language: en

  URI: german.html
  Content-Language: de

This file consists of a series of sections, separated by blank lines.
Each section contains the name of the file (on the URI: line) and header information used in the negotiation. Now, when a request for info.var is received, the server will read the var file and return the best file, based on which languages the browser has said it can accept. Similarly, the var file could be used to select files based on content type (using Content-Type:) or content encoding (using Content-Encoding:), or any combination. The Content-Type: line in a variants file can also give any other content type parameters, such as the subjective quality factor. This will be used in the negotiation when picking the 'best' match. For example, an image available as a JPEG might be regarded as having higher quality than the same image in GIF format. To tell this to the server, the following .var contents could be used:

  URI: image.jpg
  Content-Type: image/jpeg; qs=0.6

  URI: image.gif
  Content-Type: image/gif; qs=0.4

Here the qs parameters give the 'source quality' for these two files, in the range 0.000 to 1.000, with the highest value being the most desirable. A browser that indicates it can handle both GIF and JPEG files equally would see the JPEG version rather than the GIF. Using variant files gives complete control over the scope of the negotiation; however, it does require the file to be created and maintained for each resource.

An alternative interface to the negotiation mechanism is to get Apache to identify the negotiation parameters (language, content type, encoding) from the file extensions. Instead of using a var file, file extensions can be used to identify the content of files. For example, the extension eng could be used on English files, and ger on German files. Then the AddLanguage directive can be used to map these extensions onto the standard language tags. To use this feature, the MultiViews option must first be turned on in the directory, either in access.conf or a .htaccess file. Note that Options All does not turn on multiviews.
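To make the variants-file format concrete, here is a small Python sketch that reads a var file's sections and picks the variant with the highest source quality among the content types a browser accepts. It is a simplified illustration, not Apache's actual negotiation algorithm: the function names are hypothetical, browser preference factors are ignored, and a missing qs is assumed to mean 1.0.

```python
import re


def parse_var(text):
    """Split var-file text into sections; each is a dict of header -> value."""
    variants = []
    for block in re.split(r"\n\s*\n", text.strip()):
        entry = {}
        for line in block.splitlines():
            key, sep, value = line.partition(":")
            if sep:
                entry[key.strip().lower()] = value.strip()
        if entry:
            variants.append(entry)
    return variants


def media_type(variant):
    """Content type without parameters, e.g. 'image/jpeg'."""
    return variant.get("content-type", "").split(";")[0].strip()


def source_quality(variant):
    """The qs parameter of Content-Type; assumed to be 1.0 when absent."""
    for param in variant.get("content-type", "").split(";")[1:]:
        name, _, value = param.partition("=")
        if name.strip() == "qs":
            return float(value)
    return 1.0


def pick_variant(variants, accepted_types):
    """Return the URI of the acceptable variant with the highest qs, or None."""
    candidates = [v for v in variants if media_type(v) in accepted_types]
    if not candidates:
        return None
    return max(candidates, key=source_quality)["uri"]


IMAGE_VAR = """URI: image.jpg
Content-Type: image/jpeg; qs=0.6

URI: image.gif
Content-Type: image/gif; qs=0.4
"""

if __name__ == "__main__":
    variants = parse_var(IMAGE_VAR)
    print(pick_variant(variants, {"image/gif", "image/jpeg"}))
    print(pick_variant(variants, {"image/gif"}))
```

With the image example from the text, a browser accepting both types gets image.jpg, while one accepting only GIF gets image.gif.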
After enabling multiviews, the directives which map extensions onto representation types can be given. These are AddLanguage, AddEncoding and AddType (content types are also set in the mime.types file). For example:

  AddLanguage en .eng
  AddLanguage de .ger
  AddEncoding x-compress .Z
  AddType application/pdf pdf

(the last line is shown as an example only; this is actually set in the mime.types file on recent Apache versions). When a request is received, the server looks at all the files in the directory which start with the same filename. So a request for /about/info would cause the server to negotiate between all the files named /about/info.* For each matching file, the server checks its extensions and sets the content type, language and encodings appropriately. For example, a file called info.eng.html would be associated with the language tag en and the content type text/html. The source quality is assumed to be 1.000 for all files (this can actually be set on the mime type, like "text/html;qs=0.5" but this confuses most browsers so is probably best not used). The extensions can be listed in any order, and the request itself can include one or more extensions. For example, the files info.html.eng and info.html.ger could be requested with the URL info.html. This provides an easy way to upgrade a site to use negotiation without having to change existing links.

Of course, for negotiation to work browsers must send the correct request information. While most make a reasonable attempt there are some problems. For human languages, browsers should let the user pick what language or languages they are interested in. Recent beta versions of Netscape let the user select one or more languages (see the Options, General Preferences, Languages section). For content-types, the browser should send a list of types it can accept. For example, "text/html, text/plain, image/jpeg, image/gif".
Most browsers also add the catch-all type of "*/*" to indicate that they can accept any content type. The server treats this entry with lower priority than a direct match. Unfortunately, the */* type is sometimes used instead of listing explicitly acceptable types. For example, if the Adobe Acrobat Reader plug-in is installed into Netscape, Netscape should add application/pdf to its acceptable content types. This would let the server transparently send the most appropriate content type (PDF files to suitable browsers, else HTML). Netscape does not send the content types it can accept, instead relying on the */* catch-all. This makes transparent content-negotiation impossible. In addition, most browsers do not indicate a preference for particular types. This should be done by adding a preference factor (q) to the content type. For example, a browser which can accept Acrobat files might prefer them to HTML, so it could send an accept type list which includes text/html;q=0.7, application/pdf;q=0.8. When the server handles the request, it would combine this information with its source quality information (if any) to pick the 'best' content type to return. The new HTTP/1.1 specification defines how content negotiation works for the first time. It also adds some new facilities which are not yet available in any browser or server. This includes the ability for the server to return a list of possible matches if it cannot identify the best one to use. Apache implements the server end of HTTP/1.1 content negotiation.

Publishing Pages with PUT
Apache can support publishing pages with PUT, but it requires some work.

One of the most common questions we get asked is whether Apache supports web publishing with the PUT method. Netscape Navigator Gold, AOLPress and Amaya all support this method of publishing pages. Technically the answer is yes, Apache supports that method.
However it does not come with any scripts or programs which actually implement the publishing behaviour. This article explains what the PUT method is, how it can be used in Apache, and what is required to support publishing with it. It also gives a basic script to handle publishing, and explains why this script should be used very carefully to prevent security problems.

First published in Apache Week issue 59 (4th April 1997).

When a browser requests a normal page from a server, it uses the "GET" method. This is the standard way to get back information from a server. The information itself may come from a static page, a CGI program, a server-side include page or any other source handled by the server. By definition it is safe for a browser to obtain a page by GET as many times as it likes - it will never cause any permanent action on the server (such as entering a product order).

To perform a permanent action on the server, the "POST" method is used. This method must be handled by a program or script, and the browser should not re-request a POST page without getting the user to confirm it. The POST method is used when a script or program requires a lot of form data input or when the request makes the server perform a real action such as entering an order.

The "PUT" method is similar to the POST method in that it can cause information to be updated on the server. The difference is that a POST request is normally handed to a script which is explicitly named by the resource (that is, something that already exists), while a PUT request could be directed at a resource which does not (yet) exist. Another difference is that the POST method can be used in response to a form, while the PUT method can only contain a single data item. The PUT method is suited for publishing pages.

There is some confusion about whether Apache supports the PUT method. In fact, Apache handles PUT exactly like it handles the POST method.
That is, it supports it, but in order for it to do anything useful you need to supply a suitable CGI program. This is in contrast to the GET method, which Apache supports internally by sending back files or SSI documents.

If you have a script which is capable of handling PUT requests, you can easily configure Apache to support that script. This is done with the Script directive. This specifies a script (i.e. a CGI program) to be run whenever a PUT request is received. For example, if you put your CGI program which handles PUT requests into /cgi-bin/put, you would add this line

    Script PUT /cgi-bin/put

to your srm.conf or access.conf (depending on whether you want your entire contents to be handled by this script, or just a specific subdirectory). Note that you also need to make sure that this script is executable, by either placing it in a ScriptAlias directory, or giving it a suitable extension and turning on CGI execution for that extension.

The CGI script has to be able to accept a page sent to it, and look at the request URL to decide where to place the file. If it is successful it should return a status of 201 or 204. The basic operation of a PUT script should be:

    1. Check that the request comes from the PUT method
    2. Get the file to update or create from PATH_TRANSLATED
    3. Read the data (read CONTENT_LENGTH bytes from standard input)
    4. Write the data to the file
    5. Return a 201 or 204 status.

A simplistic script to implement PUT handling like this is available in put1. Among other limitations, this script does not check to see if you are attempting to upload a CGI script or if the destination is a directory. However the main failing is that it implements no security checks, and if you have a secure setup it will not even have permission to update the files.

Configuring Apache is the easy part: the hard part is creating a server environment and script which are secure.
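As an illustration of the five steps listed above, here is a minimal sketch in Python. It is not the put1 script mentioned in the text. The function name handle_put and the extra error statuses are choices made for this example, as is returning 201 for a new file and 204 for an updated one. Like the simplistic script described above, it performs no authorisation checks, so it should not be installed as it stands.

```python
import io
import os
import tempfile


def handle_put(environ, stdin):
    """Follow the five steps listed in the text and return (status, message).

    `environ` is a mapping of CGI variables and `stdin` a binary stream; both
    are passed in so the function can be tried without a web server.
    """
    # 1. Check that the request comes from the PUT method.
    if environ.get("REQUEST_METHOD") != "PUT":
        return "405 Method Not Allowed", "this script only handles PUT"

    # 2. Get the file to update or create from PATH_TRANSLATED.
    target = environ.get("PATH_TRANSLATED")
    if not target:
        return "400 Bad Request", "no PATH_TRANSLATED supplied"
    # One of the checks the text says a simplistic script leaves out.
    if os.path.isdir(target):
        return "403 Forbidden", "destination is a directory"

    # 3. Read CONTENT_LENGTH bytes from standard input.
    try:
        length = int(environ.get("CONTENT_LENGTH", ""))
    except ValueError:
        return "411 Length Required", "CONTENT_LENGTH missing or invalid"
    data = stdin.read(length)

    # 4. Write the data to the file, noting whether it existed before.
    existed = os.path.exists(target)
    with open(target, "wb") as out:
        out.write(data)

    # 5. Return 204 for an update, 201 for a newly created page.
    if existed:
        return "204 No Content", "updated"
    return "201 Created", "created"


# Try it against a temporary directory standing in for the document root.
with tempfile.TemporaryDirectory() as docroot:
    body = b"<h1>Hello</h1>"
    env = {
        "REQUEST_METHOD": "PUT",
        "PATH_TRANSLATED": os.path.join(docroot, "first.html"),
        "CONTENT_LENGTH": str(len(body)),
    }
    print(handle_put(env, io.BytesIO(body))[0])  # first upload creates the page
    print(handle_put(env, io.BytesIO(body))[0])  # second upload replaces it
```

When run as a real CGI program, the script would take its variables from os.environ, read from standard input, and report the result to the server with a "Status:" header.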
Some of the main security requirements are:

    - Make sure the PUT script can only be run by authorised users
    - Make sure that the script can update only web content files
    - Make sure the authorised users can only update their pages, not other people's pages on the same server

The first issue can be addressed by making sure that the script is protected by username and password authentication.

The second issue is more complex. To be able to update the files on your server the script must have enough permission to write or create the content files. This in itself is a security risk, since it means if a bug or security hole is found in any of your other CGI programs anyone on the Internet could potentially change any of your files. On most servers the httpd process runs as some relatively unprivileged user, such as "nobody". This user should not own or have write access to any of the files on the server. So the first problem with generating a secure PUT script is determining how the script can get permissions to update files owned by a different user.

One way of doing this, new in Apache 1.2 betas, is to use the "suEXEC" code. This allows a script to be run as a different user. This comes with Apache but is not installed by default, because of the security risks it can create if used inappropriately. You need to install it, and arrange it so that the PUT script is executed as the user that owns your web files. In this case, it would be sensible to ensure that this user does not have write access to any other parts of the file system, such as your Apache configuration files or .htaccess files.

The final security issue applies if you have multiple content providers (such as different customers) where you cannot trust them not to try to update each other's pages. There are several ways to fix this:

    - If the customers are in different virtual hosts, use the suEXEC mechanism to give each customer a different Unix username and execute the script as that user.
    - Use a different PUT script for each customer, with individual access authentication for each user, and hard-code the paths that they are allowed to update into the script.
    - Add lots of careful checks into the PUT script to ensure that each REMOTE_USER can only update pages in their area.

Netscape Navigator Gold, AOLPress and Amaya can publish pages with the PUT method. Assuming you have a PUT script which provides a level of security you are happy with, this section explains how to use these programs to publish pages. Other Web publishing programs should be similar.

To publish pages, you need to configure your server as given above. This section shows how to do this in more detail with better user security. First, decide which areas of your document tree you want to allow people to publish to. For this example, we will assume people can publish to any page on the server. You need to add a Script PUT directive into the <Directory> section for the directory where you want to enable PUT uploading, and put the PUT script into a user-authenticated directory. For example:

    <Directory /usr/local/etc/httpd/htdocs>
    Script PUT /cgi-bin-putusers/put.cgi
    </Directory>

    <Directory /usr/local/etc/httpd/cgi-bin-putusers>
    AuthType Basic
    AuthName "Authorised PUT Publishers"
    AuthUserFile /usr/local/etc/httpd/htpasswd-putusers
    Require valid-user
    </Directory>

    ScriptAlias /cgi-bin-putusers /usr/local/etc/httpd/cgi-bin-putusers

You will have to modify this for your setup. You also need to enter a username and password into the htpasswd-putusers file using htpasswd. Note that there are many other ways to configure user authentication for a PUT script, including using a <Files> section to apply a restriction to just the PUT script, or using <Limit PUT> to limit just the PUT method scripts within your existing cgi-bin directory.

With this configuration, all PUT requests will be handled by the named script (/usr/local/etc/httpd/cgi-bin-putusers/put.cgi).
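The "careful checks" mentioned in the list of options above might look something like the following Python sketch, which a PUT script could call before writing anything. It assumes the server has already authenticated the user and set REMOTE_USER. The USER_AREAS table, the user names in it and the function name may_update are hypothetical. The refusal of names beginning with ".ht" is an extra precaution suggested by the earlier remark about .htaccess files, not something the text prescribes.

```python
import os

# Hypothetical mapping from authenticated user name to the directory that
# user may publish into; a real site would load this from its own records.
USER_AREAS = {
    "alice": "/usr/local/etc/httpd/htdocs/alice",
    "bob": "/usr/local/etc/httpd/htdocs/bob",
}


def may_update(remote_user, path_translated, areas=USER_AREAS):
    """Return True only if the target file lies inside the user's own area."""
    area = areas.get(remote_user)
    if area is None:
        return False  # unknown users may not publish anywhere
    # Resolve ".." components (and symbolic links, where the files exist)
    # before comparing, so "alice/../bob/page.html" is not mistaken for
    # a page in alice's area.
    area = os.path.realpath(area)
    target = os.path.realpath(path_translated)
    if target == area:
        return False  # the area itself is a directory, not a page
    # Extra precaution (an assumption of this sketch): never touch
    # .htaccess or .htpasswd style files.
    if os.path.basename(target).startswith(".ht"):
        return False
    return os.path.commonpath([area, target]) == area


base = "/usr/local/etc/httpd/htdocs"
print(may_update("alice", base + "/alice/first.html"))
print(may_update("alice", base + "/alice/../bob/first.html"))
```

A check like this only complements the file system permissions discussed above; it does not replace them.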
Now all you need to do is author a page then select the publish function. In AOLPress and Amaya, choose File, Save (or File, Save As) and type the full URL of the location to publish the file to (e.g. http://www.my_server.com/first.html); in AOLPress the URL goes into the "Location" box. In Navigator Gold, select File, Publish. In the "Upload Files to this location" box, enter the full URL of the page to create. For example, if your server is called www.my_server.com and you want to upload to a file called "first.html" in the document root, you would enter http://www.my_server.com/first.html. Also enter the username and password you created in the htpasswd-putusers file.

There are few scripts available which implement PUT handling securely. For this reason the general recommendation for using publishing functions is to use FTP rather than HTTP where possible. However if you want to implement PUT-based publishing, you might like to start with one of these programs:

    - A PUT program in C designed for the CERN server
    - mod_put, an Apache module for PUT and DELETE

The issues raised in the above section on security apply to these programs as well, so before you use them review the source code, install them in a user-authenticated area, and make sure that when run from the httpd server they only have write permission to the content files you want to be able to update.

Using Server Side Includes

Server Side Includes make adding dynamic content to your documents easy. We show how to use SSI on your site, and the extensions that Apache supports.

While standard HTML files are fine for storing pages, it is very useful to be able to create some content dynamically. For example, to add a footer or header to all files, or to insert document information such as last modified times automatically. This can be done with CGI, but that can be complex and requires programming or scripting skills.
For simple dynamic documents there is an alternative: server-side includes (SSI).

SSI lets you embed a number of special 'commands' into the HTML itself. When the server reads an SSI document, it looks for these commands and performs the necessary action. For example, there is an SSI command which inserts the document's last modification time. When the server reads a file with this command in, it replaces the command with the appropriate time.

Apache includes a set of SSI commands based on those found in the NCSA server plus various extensions. This is implemented by the includes module (mod_include).

By default, the server does not bother looking in HTML files for the SSI commands. This would slow down every access to an HTML file. To use SSI you need to tell Apache which documents contain the SSI commands.

One way to do this is to use a special file extension. .shtml is often used, and this can be configured with these directives:

    AddHandler server-parsed .shtml
    AddType text/html shtml

The AddHandler directive tells Apache to treat every .shtml file as one that can include SSI commands. The AddType directive makes sure that the resulting content is marked as HTML so that the browser displays it properly.

An alternative method of telling the server which files include SSI commands is to use the so-called XBitHack. This involves setting the execute bit on HTML files. Any file with a content type of text/html (i.e. an extension .html) and with the execute bit set will be checked for SSI commands. This needs to be turned on with the XBitHack directive.

For either method, the server also needs to be configured to allow SSIs. This is done with the Options Includes directive, which can be placed in either the global access.conf or a local .htaccess (although the latter must first be enabled with AllowOverride Options).
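The two ways of marking files for SSI processing described above come down to a simple test, sketched here in Python purely for illustration. The function name is_parsed_for_ssi and its parameters are invented for this example, and the choice to look only at the owner's execute bit is an assumption of the sketch rather than something stated in the text.

```python
import os
import stat
import tempfile


def is_parsed_for_ssi(path, xbithack=False, parsed_extensions=(".shtml",)):
    """Decide whether a file would be scanned for SSI commands.

    Models the two methods in the text: a special extension, or (when
    `xbithack` is true) an .html file with the execute bit set.
    """
    if path.endswith(tuple(parsed_extensions)):
        return True
    if xbithack and path.endswith(".html"):
        mode = os.stat(path).st_mode
        # Assumption: only the owner's execute bit is examined.
        return bool(mode & stat.S_IXUSR)
    return False


# Try it on a temporary file (Unix-style permissions assumed).
with tempfile.TemporaryDirectory() as docroot:
    page = os.path.join(docroot, "page.html")
    open(page, "w").close()
    os.chmod(page, 0o644)
    print(is_parsed_for_ssi(page, xbithack=True))   # execute bit not set
    os.chmod(page, 0o744)
    print(is_parsed_for_ssi(page, xbithack=True))   # execute bit set
```

In practice the equivalent of the second chmod is simply running "chmod +x" on each HTML file that contains SSI commands.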
Since some SSI commands let the user execute programs, which could be a security risk, an alternative option, IncludesNOEXEC, lets SSI commands work except for any which would execute a program.

All SSI commands are stored within the HTML in HTML comments. A typical SSI command looks like this:

    <!--#flastmod file="this.html" -->

In this case the command is flastmod, which means output the last modified time of the file given. The argument specifies the file "this.html" (which might be the name of the file containing this command). The whole of the command text, including the comment markers <!-- and -->, will be replaced with the result of this command.

In general, all commands take the format:

    <!--#command arg1="value1" arg2="value2" ... -->

where arg1, arg2, etc are the names of the arguments and value1, value2 etc are the values of those arguments. In the flastmod example, the argument is 'file' and its value is 'this.html'. Often commands can take different argument names. For example, flastmod can be given a URL with the argument virtual, to get the last modified time from the server. For example:

    <!--#flastmod virtual="/" -->

to get the last modification time of the home page on the server (this is useful if the page being accessed might have a different file name, for instance).

Besides flastmod, there are SSI commands which get the size of a file or URL, the contents of a variable (passed in by the server), the contents of another file or URL, or the result of running a local file. These are documented in the NCSA tutorial on server side includes.

When SSI commands are executed, a number of 'environment variables' are set. These include the CGI variables (REMOTE_HOST etc), and some more, such as DOCUMENT_NAME and LAST_MODIFIED. These can be output with the echo command (so a better way of getting the last modification time of the current file would be <!--#echo var="LAST_MODIFIED" -->).

Apache extends the standard (NCSA-compatible) SSI language considerably. Some of the extensions include:

    - Variables in commands: Apache allows variables to be used in any SSI commands. For example, the last modification time of the current document could be obtained with <!--#flastmod file="$DOCUMENT_NAME" -->
    - Setting variables: the set command can be used within the SSI to set variables.
    - Conditionals: SSI commands if, else, elif and endif can be used to include parts of the file based on conditional tests. For example, the $HTTP_USER_AGENT variable could be tested to see the type of browser and different HTML codes output depending on the browser capabilities.

Here are some examples of using SSI:

Displaying document information

The following code puts the document modification time on the page:

    Last modified: <!--#echo var="LAST_MODIFIED" -->

Adding a footer to many documents

Add the following text to the bottom of each of the documents (here assuming, for example, that the footer is kept in a file called footer.html in the same directory):

    <!--#include file="footer.html" -->

Hide links from external users

Use the if command and the REMOTE_ADDR CGI variable to see if the user is in the local domain:

    <!--#if expr="$REMOTE_ADDR = /^1.2.3./" -->
    <a href="internal-documents.html">Internal Documents</a>
    <!--#endif -->

(Where 1.2.3 is the IP address prefix of the local domain).

Apache and Secure Transactions

All about Apache and SSL, including US export restrictions, RSA licensing, ciphers, key escrow, certificates and authorities.

We explain what SSL is, why Apache does not have it built in, and why it is such a complex issue. We examine the restrictive US government rules and commercial interests that together restrict what can be imported and exported from the US and Canada.

First published in Apache Week issue 24 (19th July 1996). Last updated 1st September 1998.

Most of the information passed across the Internet is not particularly sensitive. In fact, most of it is specifically designed to be as widely read as possible. But some information is sensitive. For example, when ordering from a site via credit card, the credit card number is transmitted across the Internet from the browser to the server. In theory, a third party could intercept this information at some point on the network between the browser and the server.
To prevent this, some form of encryption can be used so that even if someone intercepts the data they cannot decode it back to the original credit card number (or whatever else it was that was encrypted). Obviously both the browser and the server need to use the same encryption method.

The most widely implemented encryption system for the Web at present is SSL. SSL stands for Secure Socket Layer, a protocol developed by Netscape for secure transactions across the Web. It uses a form of public key encryption, where the information can be encoded by the browser using a publicly available public key, but can only be decoded by someone who knows the corresponding private key.

Any product can incorporate SSL technology without paying any royalties. Extending Apache to handle SSL is a programming job, made relatively easy by the availability of a free SSL implementation, called SSLeay. However, the US government effectively prevents Apache from doing this.

Although it is the SSL standard that defines how the encryption is applied to Web transactions, the actual encryption itself is performed by a number of cipher algorithms. When an SSL browser and SSL server first communicate they mutually pick a cipher algorithm that both support. Some commonly used ciphers are listed in this table:

    CIPHER | BITS | DESCRIPTION
    3DES | 168 | These are well-proven, 168-bit, triple-encryption ciphers. Supported by products based on SSLeay such as Stronghold and SafePassage but not by products from Microsoft or Netscape.
    IDEA | 128 | This cipher uses 128-bit keys but it is not commonly found in web browsers or servers. It is possible, but very slow, to use triple-IDEA with 384 bit keys. In the USA and Europe a license from Ascom AG is required to use these ciphers.
    RC4 and RC2 | 128 | These ciphers use 128-bit keys, which normally offer a high degree of security. Inside the USA a license from RSA is required to use these ciphers.
    Export RC4 and RC2 | 40 | These ciphers use 40-bit keys but are otherwise identical to their equivalent 128-bit versions. Servers and browsers produced by Netscape and Microsoft support these ciphers. Inside the USA a license from RSA is required to use these ciphers.

An interactive tool from Netcraft is available that can query any secure Web site and show which ciphers it supports.

Experts agree that 40 bit encryption does not provide an adequate level of safety and there have been several publicised hacks (See C|Net story). A panel of cryptographic experts including Whitfield Diffie, a co-inventor of public key cryptography, issued a report in January 1996 that said a minimum of 75 bits was necessary for "adequate protection against the most serious threats" and 90 bits was necessary to thwart advances in hacking techniques for the next 20 years.

The US Government imposes export restrictions on arms, in a set of rules called ITAR (International Traffic in Arms Regulations). Amongst the restricted arms is "strong" encryption software. (See the EFF archive on ITAR). Software that implements SSL in the US cannot be exported because of these rules (actually, it can be exported to Canada, but no further).

SSL enabled software can be exported outside of the US if the software can only encrypt using a maximum of a 40 bit key. Commercial server vendors in the US such as Netscape and Microsoft export secure servers using this weakened 40 bit encryption. Recent legislation allows for registered companies to export software that uses 56 bit keys, but only if they allow the US government to access the data under certain circumstances. This is normally done by allowing a third-party to store or recover the keys - a system referred to as "key escrow". Higher levels of encryption can also be exported to approved financial institutions (primarily banks).

The US and other governments are worried that they cannot access information once it has been encrypted.
They would like to be able to decrypt all encrypted data. For some time, the US government has only supported encryption schemes which would allow them to decrypt the encrypted data if necessary, such as the "Clipper" chip. In normal (secure) encryption, the only people that can decrypt the data are the sender and recipient, who between them have the necessary keys. But in key escrow schemes a third-party will also have the ability to decrypt the data (this third-party may be the developer of the encryption product, the US government, or some other "trusted" organisation). Key escrow is also referred to as key storage or key recovery. From January 1997 the US government has been allowing the export of encryption technology up to 56 bits, but only if the exporter agrees to key escrow. This would allow the US government to decrypt any data encrypted with these exported 56 bit systems. Companies which wish to export 56 bit encryption products need to be specially licensed by the US government. Apache is developed by an international team of individuals, using a server in the US. The ITAR rules mean that if the Apache server included SSL it could not be exported outside the US. This would prevent the non-US developers from continuing to work on it, and would stop anyone outside the US from using Apache. A solution to this problem adopted by some free software developers is to run a parallel development effort outside the US. The US development would not contain any SSL or encryption technology, while the non-US version would. The main problem with this arrangement is ensuring the parallel development of the two versions, and it would also require a non-US site to host the development. The problems with the export restrictions of ITAR are not limited to Apache or other free software. Many US corporations are concerned that their competitors in other countries are able to make and sell encryption-enhanced products which they are forbidden to export. (See C|Net report). 
In the meantime, while Apache remains an international software development based on a server in the US, it cannot incorporate SSL.

There are patches to link Apache with SSL (using SSLeay), such as mod_ssl and Apache-SSL. These are legally usable for free anywhere in the world, except for the US. The problem with using these versions in the US is not the export regulations (which only apply to export, not import), but rather the sometimes confusing issues of encryption patents and certificate authorities.

Commercial servers such as Netscape base their SSL implementations on ciphers that are developed and patented by RSA Data Security in the US. Use of this technology normally requires a license fee inside the US. If Apache-SSL or mod_ssl is imported into the US, then any user would have to arrange to pay the appropriate license for the patented encryption methods which are part of SSLeay (although non-commercial users can use a license-free implementation of RSA, called RSAref). It may be difficult for an individual to license RSA.

The alternative to paying the RSA license individually is to buy a commercial version of Apache with SSL for which RSA has already been licensed by the developer. Examples of such products are the Apache module Raven and the web server Stronghold. Stronghold is developed outside the US so it can also be used with full 128-bit encryption outside the US and Canada. Raven is not available outside the US and Canada with 128-bit security.

Outside the US, no license fee is required for the use of the RSA methods because they are only patented inside the US and SSLeay uses an independent implementation of the cipher algorithms. This means that outside the US Apache-SSL and mod_ssl can be used for free.

Having got a server, the final thing required before it can be used for secure transactions is a certificate.
A server certificate is a piece of digitally-encrypted information that lets the browser know what organisation it is accessing. To prevent people just making up certificates and pretending to be official organisations, certificates can be obtained from a certificate authority, who use their position as a third-party to verify that the organisation using the certificate is who they say they are.

Probably the best known authority is Verisign in the US. In fact, early versions of Netscape Navigator (version 1) would only accept certificates from Verisign. Other certificate authorities can be used but unless they are recognised by the browser manufacturers they will either be rejected when a user tries to connect or the user will be given a long sequence of warning screens. An example of this is Thawte, whose certificates are accepted by Navigator version 3 and Internet Explorer version 3.01 but not previous versions of either browser. If the server operator wants their certificates to be accepted transparently by all versions of Netscape and Internet Explorer they will have to get certificates signed by Verisign.

To get a certificate from Verisign the server in use must be approved. Most commercial secure servers will have been submitted for approval by their developer, and certificates are available for Stronghold. Verisign will also issue certificates for web servers using the free SSLeay libraries, such as Apache-SSL.

To get a secure server based on Apache, first decide on your certificate authority. If you want every browser to connect seamlessly you'll need a certificate from Verisign. If you don't mind that older browsers will have to go through the Netscape security wizard or be unable to connect you could use Thawte. If you are in an Intranet environment you can distribute browsers with your certificate authority already configured so you may wish to issue your own certificate.
Then:

Inside the US and Canada, either

    - Buy a Verisign-accredited, RSA licensed server (such as Stronghold) or add Raven to Apache, and buy a certificate, or
    - Download Apache and Apache-SSL or mod_ssl patches, compile, pay RSA license for RSA-patented technology, and buy a certificate or sign own certificate (however RSA may not license RSA to individuals)

Outside the US and Canada, either

    - Buy a Verisign-accredited server from a non-US vendor (e.g. Stronghold) and buy a certificate, or
    - Download Apache and Apache-SSL or mod_ssl patches, compile, and buy a certificate or sign own certificate

What the Web Server Surveys Reveal

We look behind the headline figures of two popular web server surveys with an in-depth analysis of which Apache versions are being used and how long it takes the Apache community to adopt new releases.

ApacheWeek has often reported on the success of the Apache Web Server as shown by the E-Soft Web Server and Netcraft surveys, and how they have consistently shown Apache to be the most popular and more widely deployed server than all the others combined. In this 200th issue of ApacheWeek, we look behind the headline figures of those surveys with an in-depth analysis of which Apache versions are being used and how long it takes the Apache community to adopt new releases.

Although both surveys show the total number of sites using Apache, the E-Soft survey figures also reveal some interesting facts about which versions of Apache are in use, and that take up of newer releases is not immediate.

Plotting the number of sites using 1.3.x versions month for month from release date indicates migration from older versions is slow. As a percentage of Apache powered sites, in the case of almost all versions, their use continues to remain constant for a few months even after a new release.
Taking into account that the number of sites using Apache is increasing every month, the actual number of sites using older releases continues to rise for anything up to three months after a new release becomes available.

Graph 1: Individual release take up

It wasn't until April this year, with Apache 1.3.9 released 9 months earlier, that the use of a single 1.3 version exceeded that of older 1.1 and 1.2 versions. Even today, only 6% of sites are using the most recent release, 1.3.12, and over 25% of sites are still powered by older Apache versions from the 1.0, 1.1, and 1.2 generations.

Graph 2: Apache releases in use, May 2000

One of the most interesting findings from the survey is to see how new releases may influence the take-up of Apache as a server. Looking at the monthly increase in the number of sites powered by the server, some of the largest rises follow particular release dates.

The month following the release of Apache 1.3.3 (released on October 9 1998) saw one of the highest monthly increases in use. Apache 1.3.3 was a minor upgrade to Apache 1.3.2, but fixed one quite important problem: various error responses, such as "404 Not Found", displayed the full path to the missing file. Other problem fixes included the spelling module - which in 1.3.2 did not return the list of possible matches when more than one file is similar to the requested URL - and a problem where missing .htaccess files could result in a "Forbidden" response. Some platform specific bug fixes - including one for the Windows zombie processes problem - were also included.

Graph 3: Monthly increase in sites powered by Apache

Apache 1.3.12, the most current version, has also seen a huge increase in use in the month following its release. This addressed security issues raised by a CERT advisory on cross-site scripting which wasn't specific to Apache and had wide reaching consequences for anyone who uses or writes scripts for web servers.
Patches were quickly made available for the previous version (1.3.11), followed shortly afterwards by the release of 1.3.12 at the end of February. Once again, it was shown that the contributors to open source projects can respond as efficiently as commercial developers to major security issues.

The surveys can't tell us whether the increases are attributable to upgraders or new adopters, and it is purely speculative as to whether the rapid provision of a security fix to a problem contributed to the migration from other servers to Apache. However, the E-Soft Survey shows there was an increase of 76,000 sites using Apache in March 2000, and 36,000 sites using 1.3.12.

What cannot be disputed is the phenomenal success of the Apache web server, now with a share of the server software market that commercial vendors only dream of. Whichever version is in use, it's all part of the ever-growing Apache community which Apache Week will continue to support.

First published in Apache Week issue 35 (4th October 1996).

Hints and Tips

Apache Week regularly contains information about how to get the most out of the Apache server. To save you having to wade through all the past issues, here is a summary of the hints and tips we've carried, plus a few more for good measure.

If you are planning on upgrading to 1.3, read our Guide to Apache 1.3. See our feature on Content Negotiation.

Restricting access to particular files

As it implies, the <Directory> directive only applies to directories. The <Location> directive can be used to restrict access based on the request URL. So it can be applied to individual files.
For example, to prevent access to the file /prices/internal.html by anyone outside 'domain.com', you could use

    <Location /prices/internal.html>
    order deny,allow
    deny from all
    allow from .domain.com
    </Location>

The NCSA tutorial on .htaccess files shows an example .htaccess file like this:

    AuthUserFile /dev/null
    AuthGroupFile /dev/null
    AuthName EnterPassword
    AuthType Basic
    <Limit GET>
    order deny,allow
    deny from all
    allow from .my.domain
    </Limit>

This is designed to restrict access based on browser address, and not require any user authentication. The problem is that Apache will ask for user authentication, which fails because none has been set up. Apache does this because of the Auth* directives, which are unnecessary. The fix is to remove the Auth* lines.

There are a number of things which can be done to tune the performance of the server. One quick and effective thing to try is to reduce the number of .htaccess files it tries to access on every request.

Whenever Apache handles a request, it processes .htaccess files which determine access authorisation, and can set other options (e.g. AddType). It checks and processes .htaccess files in the same directory as the file it is serving, and also in all the parent directories. For instance, if you request the URL /docs/about.html and your document root is /usr/local/etc/httpd/htdocs, Apache tries to process .htaccess files in all these directories:

    /
    /usr
    /usr/local
    /usr/local/etc
    /usr/local/etc/httpd
    /usr/local/etc/httpd/htdocs
    /usr/local/etc/httpd/htdocs/docs

Normally, there will be no .htaccess files above the document root, but Apache still needs to check the filesystem to make sure. This can be eliminated by using the trick that if the AllowOverride option is set to None, Apache doesn't bother checking for .htaccess files. So set AllowOverride to None for directory /, and turn AllowOverride back on for whatever settings are really needed for the directory /usr/local/etc/httpd/htdocs.
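To make the saving concrete, the following Python sketch lists the directories that would be searched for a given file, before and after the change. It is a simplified model of the behaviour described above, not Apache's code: the function name htaccess_dirs and its enabled_under parameter are invented for this illustration, and the model assumes overrides are switched off everywhere except under one directory.

```python
import posixpath


def htaccess_dirs(file_path, enabled_under="/"):
    """List the directories searched for .htaccess when serving `file_path`.

    `enabled_under` models the configuration: .htaccess processing is assumed
    to be switched off (AllowOverride None) everywhere except in that
    directory and below it. The default "/" means it is on everywhere.
    """
    # Walk from the file's directory up to the root.
    directory = posixpath.dirname(file_path)
    dirs = []
    while True:
        dirs.append(directory)
        if directory == "/":
            break
        directory = posixpath.dirname(directory)
    dirs.reverse()
    # Keep only the directories at or below the enabled directory.
    prefix = enabled_under.rstrip("/") + "/"
    return [d for d in dirs if (d.rstrip("/") + "/").startswith(prefix)]


path = "/usr/local/etc/httpd/htdocs/docs/about.html"
print(len(htaccess_dirs(path)), "lookups with overrides allowed everywhere")
print(htaccess_dirs(path, enabled_under="/usr/local/etc/httpd/htdocs"))
```

With overrides allowed everywhere the model reports the seven directories listed above; restricting them to the document root leaves only the two directories at and below it.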
For example, the following code in access.conf would speed up Apache:

    <Directory />
    AllowOverride None
    </Directory>
    <Directory /usr/local/etc/httpd/htdocs>
    AllowOverride All
    </Directory>

The second directory section turns AllowOverride back on, so that .htaccess files are processed again. The 'All' can be replaced with whatever level of configurability is wanted. If you have web documents in directories other than the document root, you will need to turn on .htaccess files in them as well (if desired). For instance, if you are using UserDir to allow access to files in home directories, you will need to set a suitable AllowOverride (and possibly other restrictions) with something like:

    <Directory /home/*/public_html>
    AllowOverride FileInfo Indexes IncludesNOEXEC
    </Directory>

Sending the parent Apache process a USR1 signal will make it close the current log files and re-open them, without losing any connections currently in progress. This should be used instead of a HUP signal in any log rotation script. The script should first move the current log files to new names (the logs are still open at this stage). Then it should send a USR1 signal to the parent Apache process. The parent will tell the child processes to die when they have finished processing their current requests, and will open the log files for newly created children (since the old files have been renamed, the opened files will be newly created). As the old children finish their current requests they will close their handles to the (old) log files, and exit. When all the children are dead you can safely process the old log files (for example, by compressing them). Since you cannot know for certain when the old children have all died, the best way to do this is to make your log rotation script sleep for a while after sending the USR1 signal.

An alternative way to implement log rotation is to get Apache to send log messages to a program of your choice via a pipe.
This program can then decide how and when to rotate the log files. A program which may be useful for doing this is available as cronolog (not part of Apache).

Apache comes with and uses three different config files (the srm.conf, access.conf and httpd.conf files). However it treats them all identically, so all the configuration could take place in a single file - httpd.conf (which is the first one read). This file should include the directives

    AccessConfig /dev/null
    ResourceConfig /dev/null

to prevent it complaining about the missing srm.conf and access.conf files.

CGI programs always run as the same user that owns the Apache server process. This is set with the User directive in the config file, and is typically a normal user such as 'www', or the 'nobody' user. In most cases, this is fine, since CGI scripts should run with few privileges to limit any potential malicious damage to the system. However, in some cases it would be nice to be able to run CGI programs as other users. For example:

- On a virtual host system, with multiple customers, CGIs could run as the customer's user, to let them read and write to the customer's files.
- On other systems with multiple users, CGIs in home directories could run as that user.

The ability to run CGI programs as other users is referred to as 'running setuid', after the Unix filesystem ability to run a program as another user. The biggest problem with having a setuid CGI facility on a web server is security. The server has to be very careful to ensure that the program running setuid cannot be invoked to do malicious damage to the system. Having setuid programs on a system can be dangerous, particularly if you do not trust all the other users on the system (which would be the case with both the examples above). The risk is that other users could run the setuid program manually (from the command line) and give it an environment or command arguments that make it perform undesired activities.
The risks of setuid programs are well-known to Unix system administrators, but a lot of web administrators do not have as much experience of Unix or setuid security programming. The suEXEC program included with Apache provides one method of running CGI programs as other users.

There are some files that should probably never be served up to the user: files called .htaccess, .htpasswd, *.pl, *~ and so on. This can be done by preventing access to these files using a <Files> section. For example

    <Files .htaccess>
    order allow,deny
    deny from all
    </Files>

An easy way to add new capabilities to the server without too much programming is to use some sort of "parsed HTML". This is a souped-up version of server side includes which lets you use variables, conditionals, loops and so on. Like SSI, these scripts get parsed on the server so they work with all browsers. There are several implementations of HTML scripting now available: SSI (part of Apache); NeoScript (linked to Apache by a module); PHP.

New directives have been added to force all the files in a particular directory to be processed by a given handler, or to be returned with a particular type. To set a handler, use SetHandler, and to set a MIME type use ForceType. Note that these directives force the given type or handler to be applied to all files in the section, irrespective of the usual extension mapping rules. For example, a download directory could use "ForceType application/octet-stream" in a .htaccess file to make the browser save the files, rather than try to display them. Or, all files in a directory could be treated as CGI programs with

    <Directory /usr/local/etc/httpd/cgi-bin>
    SetHandler cgi-script
    </Directory>

Quick Questions

How can I get my server to listen to more than one IP address, or more than one port?
Use the "Listen" directive.

How can I create extra virtual hosts without using extra IP addresses?
Use "name-based virtual hosts" as specified in the HTTP/1.1 spec.
While this does not work with all browsers at present, the directives ServerPath and ServerAlias can be used to make your site work gracefully for older browsers as well.

Can I let users access protected areas 'anonymously' (like anonymous ftp)?
Yes, use the anonymous authentication module, mod_auth_anon.

How can I convert the requested URL into some other format?
The rewrite module provides a powerful means for translating URLs into other URLs or filenames.

How do I set up Apache to handle PUT (or DELETE) requests for my authoring program?
Use "Script PUT cgi-script" to call a CGI program to implement the PUT request.

Can I implement NCSA's 'Satisfy' function?
Yes, this is now in 1.2.

Why does Apache 'lock-up' when accessed from Netscape 2?
This might be due to a bug in Netscape when using Keep-Alives. The work-around is to turn off keep-alives on the server. In Apache 1.2, use the following directive:

    BrowserMatch Mozilla/2 nokeepalive

Using User Authentication

Restrict your documents to people with a valid username and password.

There are two ways of restricting access to documents: either by the hostname of the browser being used, or by asking for a username and password. The former can be used to, for example, restrict documents to use within a company. However if the people who are allowed to access the documents are widely dispersed, or the server administrator needs to be able to control access on an individual basis, it is possible to require a username and password before being allowed access to a document. This is called user authentication.

Setting up user authentication takes two steps: firstly, you create a file containing the usernames and passwords. Secondly, you tell the server what resources are to be protected and which users are allowed (after entering a valid password) to access them.

A list of users and passwords needs to be created in a file. For security reasons, this file should not be under the document root.
The examples here will assume you want to use a file called users in your server root at /usr/local/etc/httpd. The file will consist of a list of usernames and a password for each. The format is similar to the standard Unix password file, with the username and password being separated by a colon. However you cannot just type in the usernames and passwords because the passwords are stored in an encrypted format. The program htpasswd is used to create a user file and to add or modify users.

htpasswd is a C program that is supplied in the support directory of the Apache distribution. If it is not already compiled, you will need to compile it first. Run make htpasswd in the support directory to compile it (you might need to modify the Makefile first, since any configuration you did when compiling the server itself is not available to this makefile). After compilation, you can either leave the htpasswd binary where it is, or move it to a directory on your path (e.g. /usr/local/bin). In the former case, you will need to remember to give the full pathname to run it. The examples here will assume that it is installed somewhere on your path.

To create a new user file and add the username "martin" with the password "hampster" to the file /usr/local/etc/httpd/users:

    htpasswd -c /usr/local/etc/httpd/users martin

The -c argument tells htpasswd to create a new users file. When you run this command, you will be prompted to enter a password for martin, and confirm it by entering it again. Other users can be added to the existing file in the same way, except that the -c argument is not needed. The same command can also be used to modify the password of an existing user.

After adding a few users, the /usr/local/etc/httpd/users file might look like this:

    martin:WrU808BHQai36
    jane:iABCQFQs40E8M
    art:FAdHN3W753sSU

The first field is the username, and the second field is the encrypted password.

To get the server to use the usernames and passwords in this file, you need to configure a realm.
This is a section of your site that is to be restricted to some or all of the users listed in this file. This is typically done on a per-directory basis, with a directory (and all its subdirectories) being protected (Apache 1.2 and later also let you protect individual files). The directives to create the protected area can be placed in a .htaccess file in the directory concerned, or in a <Directory> section in the access.conf file.

To allow a directory to be restricted within a .htaccess file, you first need to ensure that the access.conf file allows user authentication to be set up in a .htaccess file. This is controlled by the AuthConfig override. The access.conf file should include

    AllowOverride AuthConfig

to allow the authentication directives to be used in a .htaccess file.

To restrict a directory to any user listed in the users file just created, you should create a .htaccess file containing:

    AuthName "restricted stuff"
    AuthType Basic
    AuthUserFile /usr/local/etc/httpd/users
    require valid-user

The first directive, AuthName, specifies a realm name for this protection. Once a user has entered a valid username and password, any other resources within the same realm name can be accessed with the same username and password. This can be used to create two areas which share the same username and password.

The AuthType directive tells the server what protocol is to be used for authentication. At the moment, Basic is the only method available. However a new method, Digest, is about to be standardised, and once browsers start to implement it, digest authentication will provide more security than basic authentication.

AuthUserFile tells the server the location of the user file created by htpasswd. A similar directive, AuthGroupFile, can be used to tell the server the location of a groups file (see below).

Between them, these directives tell the server where to find the usernames and passwords and what authentication protocol to use.
The server now knows that this resource is restricted to valid users. The final stage is to tell the server which usernames from the file are valid for particular access methods. This is done with the require directive. In this example, the argument valid-user tells the server that any username in the users file can be used. But it could be configured to allow only certain users in:

    require user martin jane

would only allow users martin and jane access (after they entered a correct password). If user art (or any other user) tried to access this directory - even with the correct password - they would be denied. This is useful to restrict different areas of your server to different people with the same users file. If a user is allowed to access the different areas, they only have to remember a single password. Note that if the realm name differs in the different areas, the user will have to re-enter their password.

If you want to allow only selected users from the users file in to a particular area, you can list all the allowed usernames on the require line. However this means you are building username information into your .htaccess files, and might not be convenient if there are a lot of users. Fortunately there is a way round this, using a group file. This operates in a similar way to standard Unix groups: any particular user can be a member of any number of groups. You can then use the require line to restrict users to one or more particular groups. For example, you could create a group called staff containing users who are allowed to access internal pages. To restrict access to just users in the staff group, you would use

    require group staff

Multiple groups can be listed, and require user can also be given, in which case any user in any of the listed groups, or any user listed explicitly, can access the resource.
For example

    require group staff admin
    require user adminuser

would allow any user in group staff or group admin, or the user adminuser, to access this resource after entering a valid password.

A group file consists of lines giving a group name followed by a space-separated list of users in that group. For example:

    staff:martin jane
    admin:art adminuser

The AuthGroupFile directive is used to tell the server the location of the group file. Note that the maximum line length within the group file is about 8000 characters (actually 8kB). If you have more users in a group than will fit within that line length, you can have more than one line with the same group name within the file.

Using htpasswd to create a text list of users, and maintaining a list of groups in a plain text file, is relatively easy. However if the number of users becomes large, the server has a lot of processing to do to find a user's group and password details. This processing has to be done for every request inside the protected area (even though the user only enters their password once, the server has to re-authenticate them on every request). This can be slow with a lot of users, and adds to the server load. Much faster access is possible using DBM format files. This allows the server to do a very quick lookup of names, without having to read through a large text file. However managing DBM files is more complex. Apache Week will cover the use of DBM authentication in a future issue.

While Apache by default can only access user details in plain text files, various add-on modules are available to allow user details to be stored in databases. Besides DBM format (available with the mod_auth_dbm module), user and group lists can be stored in DB format files (with mod_auth_db). Or full databases can be used, such as mSQL (with mod_auth_msql), Postgres95 (mod_auth_pg95) or any DBI-compatible database (mod_auth_dbi).
It is also possible to have an arbitrary external program check whether the given username and password are valid (this could be used to write an interface to check against any other database or authentication service). Modules are also available to check against the system password file, or to use a Kerberos system. See the feature on Adding Modules for more information.

In the example .htaccess file above, the require directive is not given inside a <Limit> section. This is valid in Apache, and means it applies to all request methods. In other servers and most example .htaccess files, the require directive is given inside a <Limit> section, such as this:

    <Limit GET POST PUT>
    require valid-user
    </Limit>

In Apache it is better to omit the <Limit> and </Limit> lines, to ensure that the protection applies to all methods. However, this format can be used to limit particular methods. For example, to limit just the POST method, use

    AuthName "restrict posting"
    AuthType Basic
    AuthUserFile /usr/local/etc/httpd/users
    <Limit POST>
    require group staff
    </Limit>

Now only members of the group staff will be allowed to POST. Other users (unauthenticated) can use other methods, such as GET. This could be used to allow a CGI program to be accessed by anyone, while only authorised users can POST information to it.

It is possible to use both username and hostname restrictions at the same time. Normally Apache will require that both restrictions are satisfied, that is, that the user comes from an allowed host or domain name and that they supply a valid username and password. However the Satisfy any directive can be used in the .htaccess file or in a <Directory>, <Location> or <Files> section. When this directive is given, anyone coming from the allowed domains will be given access without having to enter a username and password. All other users (from the "denied" domains) will be prompted for a username and password.

The method used in HTTP for user authentication is quite simple.
Since HTTP is a stateless protocol - that is, the server does not remember any information about a request once it has finished - the browser needs to resend the username and password on each request. Here is how it works.

On the first access to an authenticated resource, the server will return a 401 status ("Unauthorized") and include a WWW-Authenticate response header. This will contain the authentication scheme to use (at the moment, only Basic is allowed) and the realm name. The browser should then ask the user to enter a username and password. It then requests the same resource again, this time including an Authorization header which contains the scheme name ("Basic") and the username and password entered.

The server checks the username and password, and if they are valid, returns the page. If the password is not valid for that user, or the user is not allowed access because they are not listed on a require user line or in a suitable group, the server returns a 401 status as before. The browser can then ask the user to retry their username and password.

Assuming the username and password were valid, the user might next request another resource which is protected. In this case, the server would respond with a 401 status, and the browser could send the request again with the user and password details. However this would be slow, so instead the browser sends the Authorization header on subsequent requests. Note that the browser must ensure that it only sends the username and password with further requests to the same server (it would be insecure to send those details if the user moved onto a different server).

The browser needs to remember the username and password entered, so it can send them with future requests to the same server. Note that this can cause problems when testing authentication, since the browser remembers the first username and password that works. It can be difficult to force the browser to ask for a new username and password.
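The Authorization header used in this exchange can be sketched in Python. In the Basic scheme the browser joins the username and password with a colon and base64-encodes the result; that encoding detail comes from the HTTP specification rather than from the description above. The function names are hypothetical, and the credentials are the example user created earlier with htpasswd.

```python
import base64


def basic_authorization(username, password):
    """Build the value of an Authorization header for the Basic scheme."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8"))
    return "Basic " + token.decode("ascii")


def parse_basic_authorization(value):
    """Recover (username, password) from a Basic Authorization header value."""
    scheme, _, token = value.partition(" ")
    if scheme.lower() != "basic":
        raise ValueError("not a Basic authorization header")
    decoded = base64.b64decode(token).decode("utf-8")
    # Only the first colon separates the fields; a password may contain colons.
    username, _, password = decoded.partition(":")
    return username, password


header = basic_authorization("martin", "hampster")
print(header)
print(parse_basic_authorization(header))
```

Because base64 is an encoding and not encryption, anyone who can read the header can recover the password, which is why the browser must be careful about which server it sends the header to.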
While authentication does allow resources to be restricted to particular users, there are potential security issues. Some of these are:

- Care must be taken to ensure that the resource is restricted against all methods. Use of <Limit GET>, for instance, leaves POST and other request methods unprotected.
- The username and password are stored in a plain text file. While the password is encrypted, it is not completely safe against decryption, so the file should not be accessible to other users on the system. More importantly, it should not be placed under the document root where users from other sites could access it.
- The username and password are only as secure as any username/password system, in that end-users should not tell others their password, or write it down, or make it easily guessable.
- The Basic authentication scheme transmits passwords across the Internet unencrypted, so they could be intercepted. The Digest method, see below, is intended to address this issue.

The Digest Authentication scheme will make the sending of passwords across the Internet more secure. It effectively encrypts the password before it is sent such that the server can decrypt it. It works exactly the same as Basic authentication as far as the end-user and server administrator are concerned. The use of Digest authentication will depend on whether browser authors write it into their products. Apache can already do Digest authentication, when compiled with the mod_digest module (supplied with the Apache distribution).

For more information about how user authentication works on the Internet, see the HTTP/1.0 and HTTP/1.1 documents, available from the Apache Week links page. Also available there is a link to the draft Digest Authentication specification. For basic information about setting up user authentication, see the NCSA Tutorial (most of which also applies to Apache).
For modules which allow usernames, groups and passwords to be stored in database format files, or databases themselves, see this Apache Week feature on Adding Modules.

Using Virtual Hosts

How to obtain and set up virtual hosts.

Feature: Using Virtual Hosts

One of the most important facilities in Apache is its ability to run 'Virtual Hosts'. This is now the essential way to run multiple web services - each with different host names and URLs - which appear to be completely separate sites. This is widely used by ISPs, hosting sites and content providers who need to manage multiple sites but do not want to buy a new machine for each one. In this issue we explain how to go about setting up a virtual host on your machine, what you need to do to get the hostname working, and how to configure Apache.

There are two types of virtual hosts: IP-based and non-IP-based. The former is where each virtual host has its own IP address. You will need a new IP address for each virtual host you want to set up, either from your existing allocation or by obtaining more from your service provider. Once you have extra IP addresses, you tell your machine to handle them. On some operating systems, you can give a single ethernet interface multiple addresses (typically with an ifconfig alias command). On other systems you will have to have a different physical interface for each IP address (typically by buying extra ethernet cards).

IP addresses are a resource that costs money and are increasingly difficult to get hold of, so modern browsers can now also use 'non-IP' virtual hosts. This lets you use the same IP address for multiple host names. When the server receives an incoming Web connection it does not know the hostname that was used in the URL. However, the new HTTP/1.1 specification adds a facility where the browser must tell the server the hostname it is using, in the Host: header.
If an older browser connects to a non-IP virtual host, it will not send the Host: header, so the server will have to respond with a list of possible virtual hosts. Apache provides some help for configuring a site for both old and new browsers.

Having selected an IP address, the next stage is to update the DNS so that browsers can convert the hostname into the right address. The DNS is the system that every machine connected to the Internet uses to find the IP address of host names. If your hostname is not in the DNS, no-one will be able to connect to your server (except by the unfriendly IP address).

If the virtual host name you are going to use is under your existing domain, you can just add the record into your own DNS server. If the virtual host name is in someone else's domain, you will need to get them to add it to their DNS server files. In some cases, you will want to use a domain not yet used on the internet, in which case you will have to apply for the domain name from the InterNIC and set up the primary and secondary DNS servers for it, before adding the entry for your virtual host.

In any of these cases, the entry you need to add to the DNS is an address record (an A record) pointing at the appropriate IP address. For example, say you want the domain www.my-dom.com to access your host with IP address 10.1.2.3: you will need to add the following line to the DNS zone file for my-dom.com:

    www A 10.1.2.3

Now users can enter http://www.my-dom.com/ as a URL in their browsers and get to your web server. However it will return the same information as if the machine's original hostname had been used. So the final stage is to tell Apache how to respond differently to the different addresses.

Configuring Apache for virtual hosts is a two stage process. Firstly, it needs to be told which IP addresses (and ports) to listen to for incoming web connections. By default Apache listens to port 80 on all IP addresses of the local machine, and this is often sufficient.
If you have a more complex requirement, such as listening on various port numbers, or only to specific IP addresses, the BindAddress or Listen directives can be used. Secondly, having accepted an incoming web connection, the server needs to be configured to handle the request differently depending on what virtual host it was addressed to. This usually involves configuring Apache to use a different DocumentRoot.

If you are happy for Apache to listen to all local IP addresses on the port specified by the Port directive, you can skip this section. However there are some cases where you will want to use the directives explained here:

- If you have many IP addresses on the machine but only want to run a web server on some of them
- If one or more of your virtual hosts is on a different port
- If you want to run multiple copies of the Apache server serving different virtual hosts

There are two ways of telling Apache what addresses and ports to listen to: either you use the BindAddress directive to specify a single address or port, or you use the Listen directive to specify any number of addresses or ports. For example, if you run your main server on IP address 10.1.2.3 port 80, and a virtual host on IP 10.1.2.4 port 8000, you would use:

    Listen 10.1.2.3:80
    Listen 10.1.2.4:8000

Listen and BindAddress are documented on the Apache site.

Having got Apache to listen to the appropriate IP addresses and ports, the final stage is to configure the server to behave differently for requests on each of the different addresses. This is done using <VirtualHost> sections in the configuration files, normally in httpd.conf. A typical (but minimal) virtual host configuration looks like this:

    <VirtualHost 10.1.2.3>
    DocumentRoot /www/vhost1
    ServerName www.my-dom.com
    </VirtualHost>

This should be placed in the httpd.conf file. You would replace the text '10.1.2.3' with one of your virtual host IP addresses.
If you want to specify a port as well, follow the IP address with a colon and the port number (e.g. '10.1.2.4:8000'). If omitted, the port defaults to 80.

If no <VirtualHost> sections are given in the configuration files, Apache will treat requests from the different addresses and ports identically. In terms of setting up virtual hosts, we call the default behaviour the 'main server' configuration. Unless overridden by <VirtualHost> sections, the main server behaviour will be inherited by all the virtual hosts. When configuring virtual hosts, you need to decide what changes need to be made in each of the virtual host configurations.

Any directives inside a <VirtualHost> section apply to just that virtual host. The directives either override the configuration given in the main server, or supplement it, depending on the directive. For example, the DocumentRoot directive in a <VirtualHost> section overrides the main server's DocumentRoot, while AddType supplements the main server's MIME types.

Now, when a request arrives, Apache uses the IP address and port it arrived on to find a matching virtual host configuration. If no virtual host matches the address and port, it is handled by the main server configuration. If it does match a virtual host address, Apache will use the configuration of that virtual server to handle the request. For the example above, the server configuration used will be the same as the main server, except that the DocumentRoot will be /www/vhost1, and the ServerName will be www.my-dom.com.

Directives commonly set in <VirtualHost> sections are DocumentRoot, ServerName, ErrorLog and TransferLog. Directives that deal with handling requests and resources are valid inside <VirtualHost> sections. However some directives are not valid inside <VirtualHost> sections, including BindAddress, StartServers, Listen, Group and User.

You can have as many <VirtualHost> sections as you want.
You can choose to leave one or more of your virtual hosts being handled by the main server, or have a <VirtualHost> for every available address and port, and leave the main server with no requests to handle.

Non-IP virtual hosts are configured in a very similar way. The IP address that the requests will arrive on is given in the <VirtualHost> directive, and the host name is put in the ServerName directive. The difference is that there will (usually) be more than one <VirtualHost> section handling the same IP address. In order for Apache to know whether a request arriving on a particular IP address is supposed to be a name-based request, the NameVirtualHost directive is used to tell Apache the IP addresses for name-based requests. A virtual host can handle more than one non-IP hostname by using the ServerAlias directive, in addition to the ServerName.

Apache and the Year 2000

How will Apache and the web in general cope with the year 2000?

The Year 2000 Problem

The year 2000 is predicted to bring chaos to software which is unable to handle dates beyond 1999. The question is what effect the change of century will have on the Internet, Web and Apache in particular. This feature shows what the risks are.

First published in Apache Week issue 56 (23 June 1997).

The theory of the year 2000 problem is that many older programs use only two digits for the year, such as "97" or "06". This might be part of the internal storage, input fields, output display, or network communication protocol. If a program does use a two digit year, it might either not accept year 2000 dates such as "02", or it might make incorrect comparisons (thinking that 02 is earlier than 97, because it assumes that 02 is 1902). There are some areas where two digit years are widely used - for example, on credit card expiry dates - and the software which handles these dates will have to be capable of knowing that smaller values for the year are really in the 21st century.
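The incorrect comparison described above, and one common way of avoiding it, can be sketched in Python. This is only an illustration: the function names are hypothetical, and the pivot value of 70 is an assumption made for this sketch, not a rule that every program follows.

```python
def naive_is_earlier(yy_a, yy_b):
    """Compare two-digit years directly, as a non-compliant program might."""
    return yy_a < yy_b


def expand_year(yy, pivot=70):
    """Expand a two-digit year with a fixed window.

    Values below the pivot are taken to be 20xx, the rest 19xx.
    The pivot of 70 is an assumption chosen for illustration.
    """
    if not 0 <= yy <= 99:
        raise ValueError("expected a two-digit year")
    return 2000 + yy if yy < pivot else 1900 + yy


# Direct comparison wrongly reports "02" as earlier than "97".
print(naive_is_earlier(2, 97))
# After expansion, 2002 is correctly seen as later than 1997.
print(expand_year(2) < expand_year(97))
```

A fixed window like this only postpones the ambiguity, so storing the full four-digit year remains the more robust choice where the format allows it.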
There are three things which can affect how Apache treats year 2000 issues:

- Apache code itself
- The HTTP and other protocols that Apache implements
- The underlying operating system

The Apache code internally never stores years as two digits - it processes dates and times as standard Unix epoch times (the number of seconds since 1st January 1970). When it outputs the year (e.g. to the log file) it writes years as four digits.

The HTTP protocol may be more troublesome. It allows for three different date formats in requests and responses, one of which uses a two-digit year. Dates are used on every response, in fields such as "Date", "Last-Modified" and "Expires", and requests can contain dates in the "If-Modified-Since" and similar fields. The date formats listed in HTTP/1.1 and HTTP/1.0 are:

    Sun, 06 Nov 1994 08:49:37 GMT (defined in RFC 822 as updated by RFC 1123)
    Sunday, 06-Nov-94 08:49:37 GMT (defined in RFC 850 and RFC 1036)
    Sun Nov 6 08:49:37 1994 (as defined in ANSI C's asctime() format)

The first format is the only one that HTTP/1.1 servers are allowed to generate, and Apache uses it. This format includes a four-digit year. However to be compatible with older browsers and servers, Apache recognizes the other formats. The main problem will be older applications which generate RFC 850 format dates - these only have a two-digit year field. RFC 850 format was used in early web servers and browsers, and its replacement by the RFC 1123 format in the early 1990s was not fully documented until HTTP/1.0 was published in 1996. However if Apache sees this format and the two-digit year is less than 70 (that is, it would otherwise be read as a year before 1970), it assumes that the first two digits of the four digit year are "20" rather than "19".

The final area which affects Apache's ability to handle dates is the underlying operating system. If the OS has problems with dates past year 2000, Apache will as well. Most Unix systems store dates internally as 32 bit integers which contain the number of seconds since 1st January 1970.
This allows dates up to the year 2038 to be stored. For dates past 2038, the OS will have to be updated to store dates in larger fields (for example, as a 64 bit value). There may also be problems before 2038 with OS calls which accept or return year numbers. For example, many date functions use a structure called tm which contains a field tm_year. This field holds the number of years since 1900, so for example the year 2002 will be stored as 102. This should not be a problem, provided that the OS and applications do not assume that the tm_year value is always a two-digit year between 1900 and 1999. All modern operating systems should be ok.
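The two limits mentioned in this feature can be checked with a short Python sketch. It assumes the 32 bit counter is signed, which the text does not state, and the function names are hypothetical. Python's own time structures hold the full year, so the tm_year offset of the C structure is reproduced by hand here.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
MAX_INT32 = 2**31 - 1  # largest value of a signed 32 bit counter (assumed signed)

# The last moment a signed 32 bit count of seconds since the epoch can represent.
last_moment = EPOCH + timedelta(seconds=MAX_INT32)
print(last_moment.isoformat())  # 2038-01-19T03:14:07+00:00


def calendar_year(tm_year):
    """Correct use of a C-style tm_year value: add 1900."""
    return 1900 + tm_year


def buggy_year_string(tm_year):
    """The mistaken assumption: treat tm_year as a two-digit year after '19'."""
    return "19%d" % tm_year


print(calendar_year(102))      # 2002
print(buggy_year_string(102))  # 19102
```

The second function shows the kind of assumption the feature warns about: it produces sensible output for every year up to 1999 and only fails once tm_year reaches 100.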