HTML 4 For Dummies - Bonus Chapters

Common Gateway Interface (CGI)


If you decide to skip this geek speak section on CGI, be sure to catch the final CGI-related heading -- How to Cheat at CGI. That section has nongeek information that you just may want to know!

Building HTML forms and handling user interaction through Web pages requires action on both sides of the client-server connection. In the book, we concentrate mainly on the client side, except for the clickable image map files that we describe in Chapter 14.

In this extra CD-ROM chapter, we move across the network connection from the client side to the server side and describe the Common Gateway Interface (CGI). A CGI program lets Web pages communicate with applications on the server to provide customized information or to build interactive exchanges between clients and servers.

Along the way, we show you the history and foundations of CGI and cover the basic details of its design and use. We also introduce you to the issues surrounding your choice of a scripting or programming language for CGI, and we give you the chance to read through some interesting example programs. Because you get these programs -- and the code to run them -- in another section of the CD, we hope that you end up using them on your Web pages.

If you read the previous paragraphs carefully, you’ve noticed one important element of CGI programs: They run on the Web server, not on the client side (where you run your browser to access Web-based information). You can’t use our CGI programs on your own machine unless it can run Web server software (usually called an HTTP daemon, or more simply, httpd) at the same time that it runs your browser.

On Windows NT, Macintoshes, and UNIX machines, running both a Web server and a browser is quite doable; on Windows 3.x and Windows 95 computers, it’s a bit trickier. For the latter machines, Quarterdeck’s inexpensive WebServer software package fits the bill nicely: It supports CGI Version 1.1 and enables you to work both sides of the client/server street with ease. For more information on this product, visit Quarterdeck’s Web site.

A "Common Gateway" is not a revolving door!

Gateway scripts, or programs, add the capability for true interaction between browsers and servers across the Web. This powerful capability is limited only by your imagination and the tools at hand. Gateway scripts supply the underlying functions that enable you to perform searches on Web documents or databases, that provide the capacity to accept and process forms data, and that deliver the intelligence necessary to customize Web pages based on user input.

If you build Web pages, you must manage this interaction across the Web; that is, you must build the front-end information that users see and interact with as well as the back-end programming that accepts, interprets, and responds to user input and information. This coordination does require some effort and some programming, but if you’re willing to take the time to learn CGI, you’ll be limited only by the amount of time and energy you have for programming your Web pages. For a different take on the same idea, you should also check out Extra 3 on the CD, which covers Dynamic HTML.

Describing CGI programs

Whether you call your server-side work a program or a script doesn’t matter -- in fact, this distinction usually just describes the tools you use to build one. Scripts get built with scripting tools or languages, and programs get built with programming languages. Like a script read by an actor, a scripting language creates a set of actions and activities that must be performed in the prescribed order each time a script is executed; that’s why scripts are often said to be interpreted.

Programming languages, on the other hand, usually are transformed into a special executable form by a special program called a compiler that takes the programming language statements and turns them into equivalent computer instructions -- that’s why programs are often said to be executed. Because both approaches work equally well with the Web’s Common Gateway Interface, we simply refer to them as CGI programs for convenience. You can call them whatever you want!

CGI is the method that UNIX-based CERN and NCSA Web servers use to mediate interaction between servers and programs. Because UNIX was the original Web platform, it sets the model (yet again) for how the Web handles user interaction. Although you can find other platforms that support Web servers -- such as Windows NT and the Macintosh -- they, too, must follow the standard set for the CERN and NCSA implementations of httpd. Whether or not they conform to the CERN or NCSA models, all Web servers must provide CGI capabilities or a similar set of functions to match what CGI can do.

What's going on in a CGI program?

You can think of a CGI program as an extension of the core WWW server services. In fact, CGI programs are like worker bees that do the dirty work on behalf of the server. The server serves as an intermediary between the client and the CGI program. It’s good to be the queen bee and make all those workers do their things at your behest!

The server invokes CGI programs based on information that the browser provides (as in the <FORM> tag, where the ACTION attribute supplies a URL for the particular program that services the form). The browser request sets the stage for a series of information hand-offs and exchanges:

This method is a rather disjointed way to have a conversation over the Web, but it does allow information to move both ways. The real beauty of CGI programs is that they extend a simple WWW server in every conceivable direction and make that server’s services more valuable. As you see in Extra 3, Dynamic HTML provides quite similar functionality, but does all of its work on the client side rather than relying on client-server and server-client “conversations” to do its job.
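To make the hand-off a little more concrete, here is a minimal sketch of a CGI program. We wrote it in Python, which is not one of the languages on the CD, and the function name build_response and the page text are our own inventions for illustration. The only CGI conventions it leans on are the REQUEST_METHOD environment variable and the rule that a program's output starts with a Content-type header followed by a blank line.

```python
#!/usr/bin/env python3
"""Minimal CGI sketch: the server runs this program and relays its output."""
import html
import os
import sys


def build_response(environ):
    """Return the complete text a CGI program hands back to the server.

    `environ` is a mapping of CGI environment variables. This helper is
    hypothetical; it is not part of any library.
    """
    method = environ.get("REQUEST_METHOD", "GET")
    body = (
        "<HTML><HEAD><TITLE>Hello from CGI</TITLE></HEAD>\n"
        "<BODY><P>This page was built on the fly for a "
        + html.escape(method)
        + " request.</P></BODY></HTML>\n"
    )
    # A header line, then a blank line, then the document itself.
    return "Content-type: text/html\n\n" + body


if __name__ == "__main__":
    # The server supplies the environment; whatever we write to standard
    # output travels back through the server to the browser.
    sys.stdout.write(build_response(os.environ))
```

The server starts the program, the program writes a header and a page to its standard output, and the server passes that output along to the browser.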

What’s in CGI input?

A request for a CGI program is encoded in HTML in a basic form as shown in this example:

<A HREF=""> Silly Quote</A>

The URL declaration says to execute the CGI program on the WWW server from the hal-bin directory. This request has no additional input data to pass to the CGI program. (The clue is that no ? is appended to the URL -- see the explanation of the question mark argument later in this chapter.) The result of the CGI program is a Web page created on-the-fly and returned to the browser.

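The information gathered by an HTML form or requested by a user (with a search request or other information query) passes to CGI programs in one of two ways:

* As an appendage to the URL that names the CGI program (the “extended URL” approach)
* As a stream of data delivered to the program’s standard input (the input-stream approach)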

In the next section, you see how forms create special formats for information intended for use in CGI programs. Also, you see how those programs use and deliver that information.

Short and sweet: the "extended URL" approach

Most search engines use what’s called a document-based query to obtain information from users. This query consists of nothing more than special characters appended to the end of the search engine’s URL. Document-based queries are intended to solicit search terms or key words from a browser and then deliver them to a CGI program that uses them to search a database or a collection of files. This simplicity is what makes document-based queries so good for soliciting small amounts of input from users, and it’s why you see them in so many Web pages.

Document-based queries depend on three ingredients for their successful operation:

Here’s how the process actually works:

* The <ISINDEX> tag in the <HEAD> of the document causes the browser to supply a search widget that allows the user to enter keywords. A widget is a generic bit of software that performs a particular task; in this case, the widget packages and sends search requests. These keywords are bundled into an HTTP request and passed to the named CGI program. If the CGI program finds no arguments appended to the URL, it returns a default page that delivers the search widget to the browser. This sometimes happens with the first search request because a complete search widget may not be included on every Web page.

Every part of this operation depends on the others: The browser activates the <ISINDEX> tag that allows the query to be requested and entered. Then the browser appends the query string to the URL and passes on the query as an argument to the search program. The search program uses the query value as the focus of its search operation and returns the search results to the browser via another custom-built HTML document.
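The search program's first job is to pull the keywords back out of the query string. Here is a small sketch of that step in Python, assuming the usual arrangement in which the browser joins the keywords with plus signs and the server hands them to the program in the QUERY_STRING environment variable. The function name keywords_from_query is hypothetical.

```python
from urllib.parse import unquote


def keywords_from_query(query_string):
    """Split an <ISINDEX>-style query string into decoded keywords."""
    if not query_string:
        # No arguments: the program should send back the search page instead.
        return []
    # Keywords are joined with "+"; each one may contain %XX escapes.
    return [unquote(word) for word in query_string.split("+") if word]


print(keywords_from_query("web+servers"))       # ['web', 'servers']
print(keywords_from_query("C%2B%2B+compiler"))  # ['C++', 'compiler']
```

An empty result is the cue to return the default page with the search widget rather than run a search.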

Long-winded and thorough: the input-stream approach

As you build an HTML form, you have important definitions to make, including the assignment of names and associated values to your variables or selections. When users fill out HTML forms, they actually instruct the browser to build a list of associated name-value pairs for each selection made or for each field they fill in.

Name=value pairs take the form

name1=value1&name2=value2

The equal sign (=) separates the name of the field from its associated value. The ampersand (&) separates the end of the value’s string from the next item of text information in a completed form. For <SELECT> statements where MULTIPLE choices are allowed, the resulting list has multiple name=value pairs where the name remains the same, but the value assignment changes for each value chosen.
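To see what the browser builds, here is a short Python sketch that encodes a made-up set of form selections the same way. The field names and values are hypothetical; urlencode comes from Python's standard urllib.parse module.

```python
from urllib.parse import urlencode

# Hypothetical form contents: one text field, plus a <SELECT MULTIPLE>
# list named "topping" with two choices picked.
fields = [
    ("name", "Jane Doe"),
    ("topping", "cheese"),
    ("topping", "olives"),
]

encoded = urlencode(fields)
print(encoded)  # name=Jane+Doe&topping=cheese&topping=olives
```

Notice that the name topping appears twice, once for each value chosen, and that the space in the text field has become a plus sign.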

Reading through forms information delivered to a CGI program’s standard input (stdin) is a matter of checking certain key environment variables (which we cover in the next section) and then parsing (separating into individual words or units of information) the input data. This reading consists of separating name=value pairs and using the names, with their associated values, to guide subsequent processing. The easiest way to do this, from a programming perspective, is to first parse and split out name=value pairs by looking for the ending ampersand (&), and then divide these pairs into their name and value parts by looking for the equal sign (=).

Here are some Perl code fragments that you can use to parse a form’s input data. (They assume that, in keeping with our recommendation, you use METHOD="POST" for passing data.)

# Read the input stream from the standard input device
# (STDIN) into the buffer variable $buffer, using the
# environment variable CONTENT_LENGTH to know how much
# data to read
read(STDIN, $buffer, $ENV{'CONTENT_LENGTH'});
# Split the name=value pairs on '&'
@pairs = split(/&/, $buffer);
# Go through the pairs and determine the name and value
# for each named form field
foreach $pair (@pairs) {
    # Split name from value on the first "="
    ($name, $value) = split(/=/, $pair, 2);
    # Translate URL syntax of + for blanks
    $value =~ tr/+/ /;
    # Substitute hexadecimal escapes with their normal equivalents
    $value =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C",hex($1))/eg;
    # Deposit the value in the %FORM hash, keyed by name
    $FORM{$name} = $value;
}

Handling environment variables

As part of the CGI environment, the httpd server’s software version and configuration are of interest, as are the multiple variables associated with the server. You can use the following shell program to produce a complete listing of such information; this program is a valuable testing tool when installing or modifying a Web server.

echo Content-type: text/plain
echo
echo CGI/1.1 test script report:
echo argc is $#. argv is "$*".

This UNIX shell script is widely distributed around the Net. We found this version in the NCSA hoohoo collection.

Running this script on a Web server (in this case, the NCSA Web server hoohoo) produces the following output:

CGI/1.1 test script report:
argc is 0. argv is .
HTTP_ACCEPT = application/,
application/msword, application/,
image/gif, image/x-xbitmap, image/jpeg, image/pjpeg,
HTTP_USER_AGENT = Mozilla/2.0
(compatible; MSIE 3.02; Win32)
SCRIPT_NAME = /cgi-bin/test-cgi

Each capitalized name to the left of an equal sign is an environment variable that CGI sets, and the text to the right is that variable’s value. These variables are always available for use in your programs.

Two environment variables are especially worthy of note:

The environment variables also identify other items of potential interest, including the name of the remote host and its corresponding IP address, the request method used, and the types of data that the browser can accept. As you become more proficient in building CGI programs, you may find further uses for many of these values.
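If you'd like to peek at a few of these values from your own program, a sketch along the following lines should do. It's written in Python, and the helper name report and the choice of variables are ours; the variable names themselves are standard CGI/1.1 names.

```python
import os

# Standard CGI/1.1 variable names; which ones to show is our choice.
INTERESTING = (
    "REQUEST_METHOD",
    "REMOTE_HOST",
    "REMOTE_ADDR",
    "HTTP_ACCEPT",
    "SCRIPT_NAME",
)


def report(environ):
    """Return 'NAME = value' lines in the style of the test script's output."""
    # A variable the server did not set shows up with an empty value.
    return [name + " = " + environ.get(name, "") for name in INTERESTING]


if __name__ == "__main__":
    print("Content-type: text/plain")
    print()
    print("\n".join(report(os.environ)))
```

Passing the environment in as an argument, rather than reading os.environ inside the function, makes it easy to try the function out with a hand-built dictionary before you install it on a server.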

Forming up: input-handling programs

When you create a form in HTML, each input field has an associated unique NAME. Filling out the form usually associates one or more values with each name. As shipped from the browser to the Web server (and on to the CGI program the URL targets), the form data is a stream of bytes, consisting of name=value pairs separated by ampersand characters (&).

Each of these name=value pairs is URL-encoded, which means that spaces are changed into plus signs (+) and some characters are encoded as a percent sign (%) followed by two hexadecimal digits. Decoding this data is what caused the interesting translation contortions in our Perl code sample earlier in this chapter.
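For comparison, here is a sketch of the same decoding job in Python, where the standard library's parse_qs function handles the contortions. It assumes METHOD="POST", so the data arrives on standard input and CONTENT_LENGTH tells the program how much to read. The function name decode_form is hypothetical.

```python
import os
import sys
from urllib.parse import parse_qs


def decode_form(raw):
    """Turn URL-encoded form data into a dict mapping names to value lists."""
    # parse_qs splits on "&" and "=", turns "+" into spaces, and expands
    # %XX escapes; a repeated name collects all of its values in one list.
    return parse_qs(raw, keep_blank_values=True)


if __name__ == "__main__":
    # With METHOD="POST", CONTENT_LENGTH says how many bytes to read.
    length = int(os.environ.get("CONTENT_LENGTH") or 0)
    form = decode_form(sys.stdin.buffer.read(length).decode("ascii"))
    print("Content-type: text/plain")
    print()
    for name, values in form.items():
        print(name, "=", ", ".join(values))
```

Because every name maps to a list, a <SELECT> with MULTIPLE choices needs no special handling: decode_form("topping=cheese&topping=olives") returns {'topping': ['cheese', 'olives']}.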

If you visit the NCSA CGI archive, you can find links to a number of input-handling code libraries that can help you build forms.

The common input-handling code libraries include

Bourne Shell: The AA Archie Gateway contains calls to sed and awk to convert a GET form data string into separate environment variables. (AA-1.2.tar.Z)

C: The default scripts for NCSA httpd include C routines and example programs for translating the query string into various structures. (ncsa-default.tar.Z)

Perl: The Perl CGI-lib contains a group of useful Perl routines to decode and manage forms data. (

TCL: The TCL argument processor includes a set of TCL routines to retrieve forms data and insert it into associated TCL variables. (tcl-proc-args.tar.Z)

Most of the work of reading and organizing forms information is already widely and publicly available in one form or fashion, which is great for you. This simplifies your programming efforts because you can concentrate on writing the code that interprets the input and builds the appropriate HTML document that’s returned to your reader as a response.

Coding CGI

In this section, we examine three different CGI programs. We include their code and a great deal of associated information on the CD. Each of these programs is available in three versions: AppleScript (for use on an Apple httpd server), Perl (for use on any system that supports a Perl interpreter/compiler), and C (for use on any system that supports a C compiler).

Ladies and gentlemen: Choose your weapons!

Before we launch you into these excellent examples, we’d like to encourage you to use them in your own HTML documents and CGI programs (we’re giving away the license to this code, so you can use it without restrictions). We first conclude the discussion portion of this chapter with an investigation of why we chose to implement each of these programs in three forms and how you can choose a suitable language to write your CGI programs.

You can build your CGI programs with just about any programming or scripting language that your Web server supports. Nothing can stop you from ignoring all options that we cover here and using something completely different.

Nevertheless, we can think of good reasons why you should consider using these options and equally good reasons why you should ignore other options. We cheerfully concede that you’ll probably find as many opinions on this subject as there are CGI programmers, but we ask that you consider carefully before deciding on a CGI language that you’re likely to spend considerable time and effort learning and using.

For example, we include the NCSA test-cgi script earlier in this chapter. It’s written for the basic Bourne shell (a command language common on many UNIX systems), which makes an adequate scripting language for many uses. Nevertheless, we don’t think that UNIX shells are suitable for heavy-duty CGI programming, because they mix UNIX system commands freely within their own syntax.

The problem with CGI programming under UNIX is that it depends on the standard input (stdin) and standard output (stdout) devices as the methods for moving data between Web servers and browsers. Each new UNIX process automatically creates its own stdin and stdout; sometimes UNIX shells can get confused regarding where their input comes from and where their output goes. This confusion can be a side effect of running system commands or of spawning tasks (that is, having one running program start up another program to perform a specific task and report its results back to the original program when it finishes). Whatever the cause, this confusion can lose the input or output for CGI programs, which is the main reason why we don’t recommend shell scripts of any kind for heavier-duty CGI applications (for example, forms-processing versus query-handling).

On the plus side, Perl offers straightforward access to UNIX system calls and capabilities within a tightly structured environment. Perl includes the positive features of languages like C, Pascal, awk, sed, and even Basic, and it offers powerful string-handling and output-management functions. Perl is emerging as the favorite of many Web programmers (and is certainly our favorite CGI gurus’ language of choice). Best of all, Perl implementations are already available for UNIX, DOS, Windows NT, Macintosh OS, and the Amiga, with numerous other implementations under way. We’ve had excellent luck moving Perl from one platform to another with only small changes.

We include C because it’s a powerful programming language and remains a tool of choice in the UNIX environment. What features and functions it doesn’t offer as built-ins are readily available in the form of system APIs (Application Programming Interfaces -- the set of routines used to invoke system functions and other kinds of prepackaged functionality within a program) and code libraries. C is also portable (barring the use of system APIs, which can change from one system to another). At least one version of C is available for just about every platform, and multiple implementations are available for popular platforms and operating systems. We’re especially fond of GNU C and the related GNU tools from the Free Software Foundation that Richard Stallman pioneered.

If you use a Macintosh as a Web server, AppleScript is pretty much your only option. Even so, it has proven to be a worthwhile tool for building CGI programs and is widely used in the Macintosh Web community. If you’re a real Macophile, be sure to consult Extra 8 for some excellent pointers on Macintosh tools and technologies for the Web.

Whatever language you choose for your CGI programs, be sure that it provides good string-handling capabilities and offers reasonable output controls. Because you’ll be reading and interpreting byte stream input and creating HTML documents galore, look for these important capabilities. Also, we recommend that you pick a language that is already widely used in the Web community. You’ll likely find lots of related modules, libraries, and code widgets that may save you programming time and make your job easier. But hey -- it’s your choice!

We’ve just barely scratched the surface of CGI as a topic. For more information, consult your favorite search engine and use “CGI” or “CGI script” as your search string. You’ll turn up tons of useful references. Also, one author of this book has cowritten two other books that are largely devoted to CGI programming: CGI Bible and World Wide Web Programming Secrets with Perl and CGI, both by Ed Tittel, Mark Gaither, Sebastian Hassinger, and Mike Erwin, and both available from IDG Books Worldwide. Although you can also find other books on this subject, you won’t find any others that your authors like more!

In the following sections, we examine three different CGI programs. Their code and a great deal of associated information is on the CD. Each program is available in three versions: AppleScript (for use on an Apple httpd server), Perl (for use on any system that supports a Perl interpreter/compiler), and C (for use on any system that supports a C compiler).

Example 1: What time is it?

This short Perl program accesses the system time on the server and writes an HTML page with the current time to the user’s screen. This program has filenames beginning with “time” in the CGI subdirectory on this CD.
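The program on the CD is the real thing; what follows is only our own rough sketch of the same idea in Python, so you can see how little is involved. The function name time_page and the wording of the page are hypothetical.

```python
import time


def time_page(now=None):
    """Build a CGI response that reports the server's local time.

    `now` is a time in seconds since the epoch; leave it out to use the
    current time.
    """
    stamp = time.strftime("%H:%M:%S on %Y-%m-%d", time.localtime(now))
    return (
        "Content-type: text/html\n\n"
        "<HTML><HEAD><TITLE>Server Time</TITLE></HEAD>\n"
        "<BODY><P>The server's clock reads " + stamp + ".</P></BODY></HTML>\n"
    )


if __name__ == "__main__":
    print(time_page(), end="")
```

Remember that the time reported is the server's, which may be in a different time zone from the person looking at the page.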

Example 2: Counting page visits

This AppleScript program establishes a counter that tracks the number of times a page is visited. This kind of tool can provide useful statistics for individuals or organizations curious about how much traffic their pages actually receive. This program has filenames beginning with “counter” in the CGI subdirectory on this CD.
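The heart of any counter is a number that survives from one request to the next. As a rough illustration (again in Python, and not a translation of the AppleScript on the CD), this sketch keeps the count in a small text file. The function name bump_counter is hypothetical, and the sketch does no file locking, so two visits arriving at the same instant could be counted as one.

```python
import os


def bump_counter(path):
    """Read the count stored in `path`, add one, save it, and return it."""
    count = 0
    if os.path.exists(path):
        with open(path) as handle:
            text = handle.read().strip()
        # Ignore a file that holds anything other than a plain number.
        if text.isdigit():
            count = int(text)
    count += 1
    with open(path, "w") as handle:
        handle.write(str(count) + "\n")
    return count
```

A CGI program would call bump_counter with the name of a file that the Web server is allowed to write to, and then include the returned number in the page it sends back.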

Example 3: Decoding clickable map coordinates

This C program can distinguish whether the right or left side of a graphic is selected (as well as defining a default for cases when the graphic isn’t selected at all). It shows how a script handles the definitions inside an image map file. This program has filenames beginning with “ismapper” in the compressed CGI files available from the FTP subdirectory on this CD.
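Here is a simplified sketch of the left-or-right decision, written in Python. It assumes that the browser appends the click position to the URL as ?x,y (which is how ISMAP images report clicks) and that you know the width of the graphic in pixels. Unlike the program on the CD, it doesn't read an image map file; it simply splits the graphic down the middle. The function name side_clicked is hypothetical.

```python
def side_clicked(query_string, image_width):
    """Classify an ISMAP click as 'left', 'right', or 'default'."""
    try:
        x_text, y_text = (query_string or "").split(",")
        x, y = int(x_text), int(y_text)
    except ValueError:
        # No coordinates (or malformed ones): fall back to the default.
        return "default"
    if x < 0 or x >= image_width or y < 0:
        # A point outside the graphic also gets the default treatment.
        return "default"
    return "left" if x < image_width / 2 else "right"


print(side_clicked("10,20", 200))   # left
print(side_clicked("150,20", 200))  # right
print(side_clicked("", 200))        # default
```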

Installing and using CGIs

Most Web servers are configured to look for CGI programs in a particular directory that’s under the server’s control. Normal users (including you) probably won’t be able to copy CGI scripts into this directory without obtaining help from their friendly neighborhood systems administrator or webmaster. Before you can install and use any CGI program on a particular server, you’ll want to talk to one of these individuals (if not both), tell them what you want to do, and ask for their help and advice in making it happen.

Don’t be hurt or surprised if this process takes some time: Systems administrators and webmasters tend to be chronically busy people. You may have to wait a while to get their attention and then discuss your needs with them. Consider the following as you interact with your webmaster:

As the sample CGI programs in this chapter illustrate, there are as many ways to skin the proverbial CGI as there are ideas about how to implement one. We sincerely hope that you can use the tools that we include on this CD, where you can find C, Perl, and AppleScript versions for all three programs. In Chapter 15 of the book, we extend our coverage of server-side Web activities as we investigate search engines, Webcrawlers, and other interesting server-side services.

How to cheat at CGI

If you didn’t get a warm fuzzy feeling while reading about CGI, you’re not alone. CGI can be difficult, especially if you’re not a programmer and don’t really want to be. But before you throw out the virtual CGI baby with the cyber-bathwater, let us show you how to cheat at CGI!

The secret is to use a forms-designing application to create your forms and the background CGIs automatically. O’Reilly PolyForm is one great application for this. Answer a few simple questions, click a few buttons, follow the instructions of the semi-intelligent program wizard, and poof! -- out comes a form and matching CGI application.

Unfortunately, PolyForm is only available for Windows computers, and we couldn’t find any worthwhile Mac or UNIX software that provides similar features.


Webmaster: Natanya Pitts, LANWrights
Revised -- January 16, 1998