Apache Server Survival Guide
This chapter covers a powerful feature of the Apache server: virtual hosts. Virtual hosting allows a single instance of the Apache Web server to run several Web sites, each accessible by its own domain name. Apache was one of the first HTTP servers to provide support for building virtual sites. While servers from NCSA and others also provide virtual-site support, Apache provides better performance and has more features than the others.
At first glance, the main advantage of a virtual site seems mostly cosmetic: It allows many Web sites to be addressed by their own domain name on a single shared machine. However, this cosmetic advantage has several positive effects on the administration of your site and how others use it. Virtual hosts are frequently created with the intent to do the following:
Because most Web sites don't generate enough traffic to exhaust the resources of a single machine, it is desirable from an administrative point of view to allow a single server to masquerade and behave as several different machines. Instead of dedicating hardware and money to each hosted site, a few server configuration commands produce the same result: a virtual site. Because the cost of setting up a Web server is shared among several sites, and there is only one machine to configure and maintain, the time required to configure and manage Web sites is reduced dramatically.
Virtual hosts have the positive side effect of making Web sites portable. When a site is virtual, it's easy to move it to a different Web server on the same network or somewhere else. It becomes a matter of transferring the site's HTML pages to the new machine and modifying the site's Domain Name System (DNS) information to point to the new server. To cover the time it takes for DNS updates to propagate, simply create a redirection on the old server. This keeps traffic flowing without interruption, which is important for sites that depend on traffic to generate business.
Historically, when you wanted a site hosted using your domain, your only viable option was to purchase (or lease) a computer and have it configured as a Web server. On top of that came the expense of managing the server. These costs, easily thousands of dollars, motivated Internet Service Providers (ISPs) to consider additional ways of supporting multiple Web sites on one host, which resulted in a few early solutions, such as the "home page" approach.
The home page approach was one early way to support multiple sites on a single Web server. This approach differentiates each site by adding extra address information, such as the path to a user's home page.
The home page approach creates addresses that look like this:
http://www.isp.dom/~name
or, worse yet, uniform resource locators (URLs) that list some path to a directory on the server tree:
http://www.isp.dom/name
The home page approach is an appropriate way to serve local users' pages. But when it comes to serving corporate information that is going to be accessed more frequently and by a large number of viewers, this solution creates ugly addresses that are hard to remember, long to type, prone to user error, and not very professional looking.
The terms virtual host, virtual site, and multihomed server are generally used interchangeably to describe a server that supports multiple domain names. To make matters easier to understand, I find it best to limit each term to its narrower meaning: to create a virtual site, you need to configure a virtual host; for the virtual host to work, you may have to create a multihomed server. The three terms are related, but they are not synonyms. How you build a multihomed server depends on which version of Apache you are using and on some other technical issues. "So what's the difference?" you ask. Semantics, but clarity here will help in describing the process.
A multihomed computer is one that can answer to multiple IP addresses. A computer that is accessible by multiple names (such as mailhost.foo.com and www.foo.com) that resolve to the same IP address is not a multihomed computer.
Aliasing, a capability provided by DNS through CNAME (canonical name) resource records or by listing multiple machine names after an IP address in the /etc/hosts file, is just a convenience for people accessing a networked resource. In general, people have a hard time remembering names, and some names, such as www or ftp, are typically standardized for machines that host services with the same name. When resources use these traditional names, such as www.apple.com, mailhost.apple.com, or ftp.apple.com, users only need to remember the domain name. A multihomed computer needs more than that: it must answer to two or more different IP addresses, such as 1.2.3.4 and 1.2.3.5. IP addresses are assigned by your Internet service provider when you sign up with them.
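To make the distinction concrete, here is a minimal Python sketch that parses /etc/hosts-style text. It assumes the common layout of an IP address followed by a canonical name and any aliases; the sample entries (server.foo.com and the 1.2.3.4 address) are hypothetical. However many names appear on the line, they all map to one address, which is why aliasing alone does not make a machine multihomed.

```python
# Minimal sketch: parse /etc/hosts-style text and show that aliases
# (several names on one line) all resolve to a single IP address.
# The sample entries and addresses below are hypothetical.

def parse_hosts(text: str) -> dict[str, str]:
    """Map every hostname and alias to the IP address on its line."""
    table: dict[str, str] = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            table[name] = ip
    return table

SAMPLE = """
# IP address    canonical name     aliases
1.2.3.4         server.foo.com     www.foo.com mailhost.foo.com
"""

hosts = parse_hosts(SAMPLE)
print(hosts["www.foo.com"], hosts["mailhost.foo.com"])  # 1.2.3.4 1.2.3.4
print(len(set(hosts.values())))  # 1 distinct address: not multihomed
```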
A virtual site is a Web site that resides on a server with other Web sites. Each Web site is accessible by its own name and shares all the hardware resources with other virtual sites. Although all requests are answered by the same HTTP server process, different home pages are returned for each site depending on the name or IP address used to access the information.
Another networking issue that you will have to address before you can multihome is DNS, which translates between machine names and IP addresses. While computers like to address each other using numbers, people prefer names; DNS translates names into numbers and numbers into names. Chances are that if you have a connection to the Internet, you are running a name server. If you are not, someone else is running it for you, and you'll need to coordinate with that network administrator to implement any addition or change to the DNS.
A virtual site is configured by the <VirtualHost> directive, which allows you to override several configuration directives based on the site's name or IP address. This allows you to specify a different DocumentRoot directory for each virtual site, so a single instance of the HTTP server can return different Web sites for each of the virtual sites it hosts.
Right from the beginning, the Apache server provided the necessary infrastructure to allow server-side support for virtual hosts. The only requirement was that the server needed to answer to different IP addresses. This technique, which still works reliably, requires the creation of a multihomed server. Multihomed servers have a few downsides, including a built-in limit: The number of virtual hosts that can be supported depends on how many IP addresses you have available. Each virtual host requires its own unique IP address. This requirement made it easy for an ISP to swallow up scarce IP addresses with only a few customers.
In the current incarnation of the server, the Apache Group introduced a non-IP-intensive solution. It relies on an extension to the HTTP protocol in which the browser reports the name of the server being accessed (the www.apple.com portion of the URL) in a Host header, along with the resource to retrieve, so the Web server can determine which site the request is for. Previously, for a URL such as http://www.apple.com/index.html, the browser sent only /index.html, with the www.apple.com portion removed, making it impossible for the server to determine the name of the host as known by the user. This mechanism gave Apache the infrastructure to support virtual sites without a multihomed computer, making it possible to host an almost unlimited number of virtual hosts easily.
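The difference between the two kinds of request can be sketched in Python. The hypothetical requested_host() helper below pulls the Host header out of a raw request; it is a simplified illustration, not how Apache parses requests, and it ignores details such as repeated headers.

```python
from typing import Optional

# Hypothetical helper: find the Host header in a raw HTTP request.
def requested_host(raw_request: str) -> Optional[str]:
    """Return the Host header's value, or None if the client sent none."""
    head = raw_request.split("\r\n\r\n", 1)[0]
    for line in head.split("\r\n")[1:]:  # skip the request line
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "host":
            return value.strip()
    return None

# An older browser sends only the path: the host name is lost.
old = "GET /index.html HTTP/1.0\r\n\r\n"
# An HTTP/1.1 browser also reports which host the user typed.
new = "GET /index.html HTTP/1.1\r\nHost: www.apple.com\r\n\r\n"

print(requested_host(old))  # None
print(requested_host(new))  # www.apple.com
```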
The <VirtualHost> and </VirtualHost> section tags are used to group server configuration directives that only apply to a particular virtual host. Any directive that is allowed in a virtual host context can be used. When the server receives a request directed toward a virtual host, the configuration directives enclosed between the <VirtualHost> and </VirtualHost> section tags override directives found elsewhere. <VirtualHost> sections are found in the httpd.conf file.
You specify the host to which a <VirtualHost> section applies by giving either the virtual host's IP address or its fully qualified domain name. A minimal virtual host only needs to override the DocumentRoot directive:
<VirtualHost www.company.com>
    DocumentRoot /www/docs/company
</VirtualHost>
The <VirtualHost> declaration for non-IP-intensive virtual hosts, based on the HTTP/1.1 feature previously described, is almost identical to the original IP-intensive version, except that you supply a machine name that resolves to the same IP address as the real host. The server decides what to serve based on the machine name used to access the resource.
This approach only works with browsers that comply with the HTTP/1.1 specification, which is still a draft. Hopefully this feature will become part of the final specification and be more widely implemented.
For IP-intensive virtual hosts, each IP address identified by a <VirtualHost> directive must be unique, and, for it to work, the computer must be configured to accept multiple IP addresses. This configuration, unlike that of the new non-IP virtual hosts, works regardless of the browser.
The <VirtualHost> section allows configuration of a virtual host as if it were a simple single-homed server; it allows you to implement customized server configuration settings on a per-site basis. Each site can tailor the server to its own needs.
A minimal virtual host configuration only needs the DocumentRoot directive. You can further customize the virtual server with the ServerName directive, which sets the name the server uses when responding to requests. You can include many of the directives that can be specified in the server configuration file (httpd.conf) or in the resource configuration file (srm.conf), which gives you nearly complete control over the configuration and behavior of the virtual server. You can even log each virtual host's transactions to its own log files. The following configuration presents a more complete and typical example:
#### Virtual machine configuration
# BindAddress (addresses the server listens on; * means all of them)
BindAddress *
<VirtualHost www.company.com>
#
# Server name returned to users connected to this host
ServerName www.company.com
#
# WebMaster for this host
# IF YOU USE A VIRTUAL DOMAIN FOR MAIL, REMEMBER TO
# CONFIGURE SENDMAIL TO ACCEPT THIS DOMAIN IN YOUR SENDMAIL.CF!!!
#
ServerAdmin webmaster@company.com
#
# The document root for this virtual host
DocumentRoot /usr/local/etc/htdocs/company.htmld
#
# The location of the error log for the virtual host
ErrorLog /usr/local/etc/apache/logs/company/error_log
#
# The location of the access log for the virtual host
TransferLog /usr/local/etc/apache/logs/company/access_log
...
</VirtualHost>
#### End Virtual Machine configuration
Configuration using the <VirtualHost> directive is, as you can see, pretty straightforward. However, some mechanics must be resolved before a server can multihome: namely, how to configure the machine so that it answers to different IP addresses.
The BindAddress directive allows you to further refine where the server will listen for connections. By default, the server listens for requests on all the IP addresses supported on the machine (BindAddress *) at the port specified by the Port directive. However, this may not be what you want; you may want to limit which addresses the server answers on. BindAddress lets you specify the address to listen on, either as an IP address or as a fully qualified hostname. This specifies an IP address:
BindAddress 1.2.3.4
Or this specifies the fully qualified name of the machine:
BindAddress www.foo.com
To have the server listen to all IP addresses accepted by the host, use BindAddress *.
BindAddress * is also the default value if BindAddress is not specified.
Note that the server will only listen for requests on the port specified by the Port directive, which defaults to 80, the standard TCP port for HTTP servers.
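At the socket level, BindAddress * corresponds to binding to the wildcard address, while a specific address restricts the listener to that one interface address. The following Python sketch illustrates the distinction with the standard socket module; listen_on() is a hypothetical helper, not Apache code, and it binds to port 0 (any free port) so it can run without root privileges.

```python
import socket

def listen_on(address: str, port: int) -> socket.socket:
    """Open a listening TCP socket; '*' means every local address."""
    host = "" if address == "*" else address  # "" binds the wildcard (INADDR_ANY)
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind((host, port))
    sock.listen(5)
    return sock

# Port 0 asks the OS for any free port so the sketch runs unprivileged;
# a real server would use the Port directive's value, normally 80.
everywhere = listen_on("*", 0)
loopback_only = listen_on("127.0.0.1", 0)
print(everywhere.getsockname()[0])     # 0.0.0.0
print(loopback_only.getsockname()[0])  # 127.0.0.1
everywhere.close()
loopback_only.close()
```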
The Listen directive is new in Apache 1.1. It has functionality similar to BindAddress, and it also allows you to specify additional ports on which the server will listen for requests. This feature is useful in implementing internal servers that contain information you don't want others to see by default when they connect to your address. For example, you may want to make special information about your network available through the Web. Instead of running a separate server process to serve this information, you can have Apache listen on multiple ports, such as port 8080.
This feature is used in the upcoming version of Stronghold, a commercial secure version of Apache that implements both an SSL and a standard server using one process. Instead of running two server processes (one for the secure server and another for the nonsecure one), you can have one process respond appropriately depending on the port used for the connection.
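The idea of one process answering on several ports can be illustrated with Python's selectors module. In this sketch the role names public and internal are hypothetical stand-ins for ports 80 and 8080 (or for Stronghold's standard and SSL ports); the process waits on both listeners at once and reports which port a connection arrived on. It illustrates the technique only; it is not how Apache or Stronghold is implemented.

```python
import selectors
import socket

def open_listener(port: int) -> socket.socket:
    """Listen on the loopback address; port 0 lets the OS pick a free port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind(("127.0.0.1", port))
    sock.listen(5)
    sock.setblocking(False)
    return sock

def serve_once(listeners: dict[str, socket.socket], timeout: float = 5.0) -> str:
    """Accept one connection on whichever listener is ready; return its role."""
    sel = selectors.DefaultSelector()
    try:
        for role, sock in listeners.items():
            sel.register(sock, selectors.EVENT_READ, data=role)
        events = sel.select(timeout)
        if not events:
            raise TimeoutError("no connection arrived")
        key, _mask = events[0]
        conn, _addr = key.fileobj.accept()
        conn.close()  # a real server would read and answer the request here
        return key.data
    finally:
        sel.close()

# One process, two ports: "public" stands in for port 80 and "internal"
# for port 8080 (hypothetical roles; port 0 keeps the sketch unprivileged).
listeners = {"public": open_listener(0), "internal": open_listener(0)}
port = listeners["internal"].getsockname()[1]
client = socket.create_connection(("127.0.0.1", port))
print(serve_once(listeners))  # internal
client.close()
for sock in listeners.values():
    sock.close()
```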
BindAddress and Listen don't implement virtual hosts. They simply tell the server where to listen for requests, building the list of addresses and ports on which the server accepts connections. If Apache is not configured to listen on a particular address and port, a virtual host there won't be accessible.
This section does not apply if you are using Apache 1.1 non-IP-intensive virtual hosts. If you are running Apache 1.1 or later, you may want to skip it, because Apache 1.1 offers a much easier way to build a virtual host that doesn't require multihoming. (Refer to "Apache Non-IP Virtual Hosts" later in this chapter.)
For a multihomed Web server to work, the host computer needs to be able to respond to multiple IP addresses. Traditionally, multihomed hosts were the only computers on a network that did this: because they bridged two separate networks, they needed to answer to two different addresses. These computers were usually fitted with two network cards, which gave the computer two different network addresses, and each network knew the multihomed host by a different name. On the Web, multihomed computers don't connect two networks, but they still need to answer to multiple IP addresses, one for each virtual site.
Most modern unices (the plural of UNIX) provide the software tools that let you assign multiple IP addresses to a single interface. If your system does not include a version of ifconfig that supports the alias option, don't despair. Table 4.1 lists the major systems and how you can achieve multiple IP support on each.
Table 4.1. Multiple IP support by operating system.
Operating System | Multiple IP Support |
AIX 4.1 | Built-in: ifconfig alias |
BSDI | Built-in: ifconfig alias |
Digital OSF/1 | Built-in: ifconfig alias |
Digital UNIX | Built-in: ifconfig alias |
FreeBSD | Built-in: ifconfig alias |
HPUX 10.x | Needs ifconfig alias patch |
HPUX 9.x | Needs virtual interface (VIF) patch |
IRIX 5.3 | Needs SGI's ifconfig alias patch |
Linux | Needs ifconfig alias patch |
NeXTSTEP 3.3 | Use PPP Interfaces to provide virtual interfaces |
Solaris 2.3 and better | Built-in: ifconfig logical units |
SunOS 4.x | Needs VIF patch |
SunOS 5.3 and better | Built-in: ifconfig logical units |
Ultrix | Needs VIF patch |
If your system is not listed in Table 4.1, you might want to try the PPP approach described later in this chapter. Alternatively, you can opt for Apache 1.1's non-IP-intensive virtual hosts, which remove the need to create a multihomed server altogether.
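You can assign multiple IP addresses to a single interface by using one of the following methods: the alias option of ifconfig, Solaris logical units, the VIF patch, or PPP interfaces.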
The alias option of ifconfig is the easiest way to implement a virtual host. The ifconfig command is used to set or display configuration values of a network interface. Not all vendors support the alias option.
You can make your Ethernet interface answer to an additional IP by issuing the following command:
ifconfig interface IP alias
Depending on your operating system, interface can be en0 or le0, so check the ifconfig man page to determine the appropriate name for your Ethernet interface. Set the IP parameter to the IP address you assigned to the virtual host in DNS. That's it! The interface will now accept packets destined for the virtual site.
Some operating systems in the preceding list don't offer the alias option by default, but patches are available from the vendor or the Net. Each patch distribution includes information on how to apply it, as well as any other specifics on its use. I have included some of these patches on the CD, but you may want to check for updates on them at the following locations:
Solaris supports virtual interfaces in the form of logical units. You can have up to 255 logical units per network interface. To add a new virtual host to the primary interface, just type the following:
ifconfig le0:logicalunit 204.95.222.200 up
Replace logicalunit with a number from 1 to 255. You can check currently used interfaces by issuing netstat -ia.
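If you add logical units over time, picking the next free number by hand gets error-prone. The hypothetical next_logical_unit() helper below, a Python sketch, takes the interface names you have already collected (for example, by reading the output of netstat -ia) and returns the first unused unit in the 1 to 255 range described above. It does not run netstat or parse its output for you.

```python
import re

def next_logical_unit(interface_names: list[str], base: str = "le0") -> str:
    """Return the next unused logical interface name, such as 'le0:3'."""
    pattern = re.compile(rf"^{re.escape(base)}:(\d+)$")
    used = {int(m.group(1)) for name in interface_names
            if (m := pattern.match(name))}
    for unit in range(1, 256):  # logical units 1..255, as described above
        if unit not in used:
            return f"{base}:{unit}"
    raise RuntimeError(f"all 255 logical units on {base} are in use")

# Interface names as they might appear in `netstat -ia` output (hypothetical).
names = ["lo0", "le0", "le0:1", "le0:2", "le0:4"]
unit = next_logical_unit(names)
print(f"ifconfig {unit} 204.95.222.200 up")  # ifconfig le0:3 204.95.222.200 up
```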
The VIF patch was one of the first solutions available on the Internet to enable a single interface to handle multiple IP addresses. The VIF patch was originally developed by John Ioannidis in 1991 as part of a project related to his Ph.D. thesis at Columbia University. The patch was pared down in 1994 to allow a single interface to handle multiple IP addresses under SunOS. The patch was further enhanced for SunOS 4.1.x by Chuck Smoko and Bob Baggerman. Two separate Ultrix 4.3a ports have been developed, one by John Hascall and the other by Phil Brandenberger.
The VIF patch creates a virtual interface that, to the networking software, looks like a hardware interface. The VIF interface allows you to assign an address as well as to configure network interface parameters with ifconfig.
After the virtual interface has been configured, you may have to add entries to the routing table using the route command, although some implementations do the right thing without this step. For more information, refer to the instructions for your operating system. The VIF allows requests to any of the host's addresses, real or virtual, to be answered by the real interface. Packets leaving the host carry the real or virtual IP address of the interface they are sent from.
Detailed instructions on how to build the patch and incorporate it into the kernel, and how to use VIF for each specific architecture, are included in the source. You can obtain the latest version of the VIF patch from these repositories:
If your UNIX vendor didn't provide you with the kernel sources, then PPP software (used for establishing dial-up networking connections) may be your only viable way to create a multihomed host. NeXTSTEP, the operating system I use at accessLINK, doesn't come with the source to its Mach kernel, so I developed this workaround. The result is nearly equivalent to the VIF patch, and its effects are easily removed from a system. This solution should be very portable across various UNIX systems.
PPP provides networking interfaces, which you can configure much in the same way as VIF interfaces, with these advantages:
The PPP in your distribution may already have one or two PPP interfaces available named ppp0 or ppp1. You can check the number of interfaces your vendor provides by executing the following command:
% netstat -ia
PPP interfaces will be listed as ppp0 through pppX. If you only need one or two virtual machines, you may be able to use your preinstalled version of PPP.
If you need additional interfaces, you will have to obtain the source code to the basic PPP software plus the system-specific changes for your operating system. You will also have to edit one of the header files to add the additional interfaces.
You can find the latest release of the source code for the PPP software at ftp://dcssoft.anu.edu.au/pub/ppp and ftp://ftp.merit.edu/pub/ppp.
After you configure the software for your particular machine, change the number of interfaces to match your needs. In the pppd.h file, found in the pppd distribution, find the line that contains the following definition:
#define NUM_PPP 2
The number to the right of NUM_PPP (in this case, 2) defines how many PPP interfaces are compiled into the software. Change it to the number of interfaces you think you'll need, plus a few more. I changed mine to add 25 interfaces.
In the NeXTSTEP-specific distribution of the PPP software, which is based on ppp-2.2, NUM_PPP may also be defined in additional files. Use grep to find them and change all instances to match the number of interfaces you want.
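The "change all instances" step can be scripted. This Python sketch, assuming the definitions are written in the usual #define NUM_PPP n form, rewrites every such line in a file's text and reports how many it changed; set_num_ppp() is a hypothetical helper. You would read each file that grep finds, pass its contents through the function, and write the result back (after keeping a backup).

```python
import re

# Matches lines such as "#define NUM_PPP 2", capturing everything before the number.
NUM_PPP_LINE = re.compile(r"^([ \t]*#[ \t]*define[ \t]+NUM_PPP[ \t]+)\d+",
                          re.MULTILINE)

def set_num_ppp(source: str, count: int) -> tuple[str, int]:
    """Rewrite every NUM_PPP definition in source; return (new_text, changes)."""
    return NUM_PPP_LINE.subn(lambda m: f"{m.group(1)}{count}", source)

header = "/* excerpt, hypothetical */\n#define NUM_PPP 2\n"
new_text, changed = set_num_ppp(header, 25)
print(changed)                   # 1
print(new_text.splitlines()[1])  # #define NUM_PPP 25
```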
You should then be able to build a customized version of your kernel, or a kernel loader, that adds support for PPP. Read the installation instructions for your particular system. Once it is installed, you are ready to configure the interfaces.
By default, PPP interfaces are named ppp0, ppp1, ppp2, and so on. It is a good idea to set up the interfaces right after the machine boots. This way you can be confident that the Web server will be accepting requests for all sites at boot time.
I suggest putting your configuration entries into your rc.local or, if you have many entries, creating your own rc.ppp.
To assign an address to an interface, I use the following commands. Your version of UNIX may provide additional or different options to these commands, so check your system's man pages before issuing any of them:
ifconfig ppp0 1.2.3.4 up netmask 255.255.255.0
route add host 1.2.3.4 1.2.3.4 0
arp -s 1.2.3.4 00:00:00:00:00:00 pub
In the previous example, I added a virtual host at address 1.2.3.4. Replace 1.2.3.4 with the address of your virtual host, 255.255.255.0 with the netmask used in your network, and 00:00:00:00:00:00 with the Ethernet address of your networking card.
Make sure you update your DNS to include the virtual machine. Otherwise, clients won't be able to find the new virtual address!
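The previous commands do the following: ifconfig assigns the virtual address and netmask to the ppp0 interface and brings the interface up; route add installs a host route for the virtual address, using the address itself as the gateway, with a metric of 0; and arp -s adds a published (proxy) ARP entry, so this machine answers ARP requests for the virtual address with its own Ethernet address and receives the packets sent to the virtual host.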
You can verify that the interfaces provided by PPP were incorporated into the kernel by issuing this:
/usr/etc/netstat -ia
You should see a number of pppn interface names listed. You can then test whether the interfaces and the DNS are working properly; you will need to test from a machine other than the Web server. A limitation of PPP virtual hosts is that a virtual host is unreachable from the server that hosts it. However, that is not really a problem because your server is dedicated, right? To test that the virtual host is running, simply ping it. You should get a response like the following:
hydrogen:4# /etc/ping 1.2.3.4
PING helium: 56 data bytes
64 bytes from 204.95.222.2: icmp_seq=0. time=4. ms
64 bytes from 204.95.222.2: icmp_seq=1. time=2. ms
64 bytes from 204.95.222.2: icmp_seq=2. time=2. ms
64 bytes from 204.95.222.2: icmp_seq=3. time=2. ms
64 bytes from 204.95.222.2: icmp_seq=4. time=2. ms
64 bytes from 204.95.222.2: icmp_seq=5. time=2. ms
Replace 1.2.3.4 with the virtual IP address that you assigned earlier.
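When you host many virtual sites, the rc.ppp file suggested earlier can be generated instead of typed. The Python sketch below emits the same three commands used above (ifconfig, route, and arp) for each virtual address, one PPP interface per address, and rejects malformed IP addresses and Ethernet addresses that do not have six octets. rc_ppp_lines() and its first_unit parameter are hypothetical; the sample addresses are placeholders, and the exact command syntax should still be checked against your system's man pages.

```python
import ipaddress
import re

# Six colon-separated hex octets, e.g. 08:00:20:01:02:03.
MAC = re.compile(r"[0-9a-fA-F]{1,2}(:[0-9a-fA-F]{1,2}){5}")

def rc_ppp_lines(addresses: list[str], netmask: str, ether: str,
                 first_unit: int = 0) -> list[str]:
    """Return ifconfig/route/arp commands, one PPP interface per address."""
    if not MAC.fullmatch(ether):
        raise ValueError(f"not a six-octet Ethernet address: {ether}")
    ipaddress.IPv4Address(netmask)  # raises ValueError if malformed
    lines: list[str] = []
    for unit, addr in enumerate(addresses, start=first_unit):
        ipaddress.IPv4Address(addr)  # reject typos before they reach rc.ppp
        lines += [
            f"ifconfig ppp{unit} {addr} up netmask {netmask}",
            f"route add host {addr} {addr} 0",
            f"arp -s {addr} {ether} pub",
        ]
    return lines

# Hypothetical addresses and card address; substitute your own.
for line in rc_ppp_lines(["1.2.3.4", "1.2.3.5"], "255.255.255.0",
                         "08:00:20:01:02:03"):
    print(line)
```

If ppp0 is already used for a dial-up link, pass a higher first_unit so the generated commands start at a free interface.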
Starting with version 1.1, Apache added support for virtual hosts without the need for additional IP addresses. This greatly simplifies the management process of the server since it removes the hardest configuration problem from the list: how to get a machine to answer to multiple IP addresses.
Apache's implementation of non-IP virtual hosts depends on the client browser to provide information about the server being accessed. At the time of this writing, many browsers, including Netscape 2.0 or later and Microsoft Internet Explorer 2.0 or later, support this feature.
The great advantage of non-IP-intensive virtual hosts is that there is effectively no limit to the number of virtual sites you can host. Before, the number of virtual hosts you could support depended on how many IP addresses you had available. Now, a single IP address can support an unlimited number of sites without relying on software, such as PPP or kernel patches, to get the machine to multihome.
The new modifications to the <VirtualHost> section look almost identical to the old version:
<VirtualHost www.company.com>
    ServerName www.company.com
    ServerAlias company.com *.company.com
    DocumentRoot /usr/local/htdocs/company
</VirtualHost>
Add as many <VirtualHost> sections as you need (one per virtual site) to your httpd.conf configuration file and restart the server.
Additional directives can still be placed in the section. All you need to do to get the virtual server to work is add a DNS entry for www.company.com that points to your server's real address. Unlike IP-intensive virtual hosts, the IP of the virtual sites is the same as that of the real host.
A new directive, ServerAlias, allows you to reference a virtual host section configuration by other names. You must, of course, have DNS entries for each alias that point to the same IP address for this feature to work.
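Wildcard aliases such as *.company.com, used in the earlier example, can be approximated with shell-style pattern matching. This Python sketch uses fnmatch.fnmatchcase; matches_vhost() is a hypothetical helper, and Apache's own matching rules may differ in details (for instance, here * also matches across dots).

```python
from fnmatch import fnmatchcase

def matches_vhost(host: str, server_name: str, aliases: list[str]) -> bool:
    """True if a Host header value names this virtual host (approximation)."""
    host = host.split(":", 1)[0].lower()  # drop any :port suffix
    if host == server_name.lower():
        return True
    return any(fnmatchcase(host, alias.lower()) for alias in aliases)

aliases = ["company.com", "*.company.com"]
print(matches_vhost("www.company.com", "www.company.com", aliases))  # True
print(matches_vhost("ftp.company.com", "www.company.com", aliases))  # True
print(matches_vhost("company.com:80", "www.company.com", aliases))   # True
print(matches_vhost("www.other.com", "www.company.com", aliases))    # False
```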
Client browsers that do not support non-IP virtual hosts will get the server's main page. A fully qualified link from that page to your virtual site's material will let these clients reach it; however, if the link points to the same URL they just requested, you'll create an endless loop. With careful use of the new ServerPath directive, you can direct this traffic to the right place.
The ServerPath directive should be placed in the VirtualHost section. The directive looks like this:
<VirtualHost www.company.com>
    ServerName www.company.com
    DocumentRoot /usr/local/htdocs/company
    ServerPath /company
</VirtualHost>
With the preceding ServerPath directive, requests from any browser for paths beginning with /company are handled by the www.company.com virtual host, whose documents live in /usr/local/htdocs/company. Your HTML documents should contain only relative links; this will make your files accessible from any browser.
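The order in which a server might pick a virtual host can be summarized in a short Python model: clients that send a Host header are matched by name, clients that do not are matched by ServerPath prefix, and anything else falls through to the main server. This is a simplified sketch under those assumptions, not Apache's implementation; VHost and choose_vhost() are hypothetical names, aliases here are matched exactly, and the model only selects a host without mapping the path to a file.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class VHost:  # hypothetical record of one <VirtualHost> section
    server_name: str
    document_root: str
    server_path: Optional[str] = None                 # ServerPath, if any
    aliases: List[str] = field(default_factory=list)  # exact ServerAlias names

def choose_vhost(host: Optional[str], path: str,
                 vhosts: List[VHost], main: VHost) -> VHost:
    """Pick the server that should answer a request (simplified model)."""
    if host:  # HTTP/1.1 client: match on the Host header
        name = host.split(":", 1)[0].lower()
        for v in vhosts:
            names = [v.server_name.lower()] + [a.lower() for a in v.aliases]
            if name in names:
                return v
        return main
    # Older client without Host: fall back to ServerPath prefixes.
    for v in vhosts:
        prefix = v.server_path
        if prefix and (path == prefix or path.startswith(prefix.rstrip("/") + "/")):
            return v
    return main

company = VHost("www.company.com", "/usr/local/htdocs/company", "/company")
main = VHost("www.isp.dom", "/usr/local/htdocs")  # hypothetical main server
for host, path in [("www.company.com", "/index.html"),
                   (None, "/company/index.html"),
                   (None, "/index.html")]:
    print(choose_vhost(host, path, [company], main).document_root)
```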
With the ability to control most aspects of the server behavior on a per-virtual-host basis, multihoming is very powerful indeed. The downside of multihoming is that the load processed by a single machine is increased. One machine is going to handle all the requests for all the sites you host in it. If these requests require several connections per second, performance may degrade below acceptable levels depending on the supporting network and computer hardware. However, there are ways of eliminating these bottlenecks and obtaining the administrative benefits of a multihomed Web server. In "Increasing Performance," I discuss how to distribute user load between several mirrored Web servers.
The advantages of multihoming greatly outweigh any downsides. If your traffic load is heavy, you're probably interested in distributing the load, and an infrastructure of multihomed distributed servers may be your best plan. By making all your servers clones of each other, you have in fact created a fault-tolerant information service: should one machine go down, access to the site is unaffected, because the other servers handle requests for the non-operational server. This is good because it may reduce the urgency of a disaster situation and because it will allow you to maintain a high-performance site at a low cost.
Using a single server to host various Web sites reduces the cost of the hardware, maximizes the use of computer and network hardware, and at the same time reduces the number of administrative chores. All this occurs because there is only one machine to set up, one machine to troubleshoot, one machine to upgrade, one machine to monitor, and one machine to back up.
One is a good number. The most valuable resource of a Web site is the time of the webmaster, so reducing his workload is a good thing. If you are running more than one HTTP server hosting different content, you may want to evaluate whether this feature can make your life easier as well.