Common Gateway Interface (CGI)

By the way, if you are going to skip this geek-speak section on CGI, be sure to catch the final CGI-related heading, How to Cheat at CGI: it's got non-geek information you just may want to know about!

Building HTML forms and handling user interaction through Web pages requires action on both sides of the client-server connection. So far, this book has concentrated mainly on the client side, except for the clickable image map files that we describe in Chapter 16.

In this chapter, we move across the network connection from the client to the server side and describe the Common Gateway Interface (CGI). The CGI lets Web pages communicate with programs on the server to either provide customized information or build interactive exchanges between clients and servers.

Along the way, we show you the history and foundations of CGI and cover the basic details of its design and use. We also introduce you to the issues surrounding your choice of a scripting or programming language for CGI, and we give you the chance to read through some interesting example programs. Because you're getting these programs — and the code to run them — as part of this book (the code is on the CD), we hope that you end up using them as part of your Web pages.

If you've been reading carefully, you've already noticed one important thing about CGI programs: They run on the Web server, not on the client side (where you run your browser to access Web-based information). This means that you can't use our CGI programs on your own machine unless it can run Web server software (usually called an HTTP daemon, or more simply, httpd) at the same time that it runs your browser.

On Windows NT, Macintoshes, and UNIX machines, running both a Web server and a browser is quite doable; on Windows 3.x and Windows 95 computers, it's a bit trickier. For the latter machines, Quarterdeck's $99 WebServer software package fits the bill quite nicely: It supports the latest version of CGI (1.1) and lets you work both sides of the client/server street with ease. For more information on this product, visit Quarterdeck's Web site at

http://www.qdeck.com/

The "Common Gateway" Is NOT a Revolving Door!

Gateway scripts, or programs, add the capability for true interaction between browsers and servers across the Web. This is a powerful capability that's limited only by your imagination and the tools at hand. Gateway scripts supply the underlying functions that let you perform searches on Web documents or databases, that provide the capacity to accept and process forms data, and that deliver the intelligence necessary to customize Web pages based on user input.

If you build Web pages, you must manage this interaction across the Web: That is, you must build the front-end information that users see and interact with as well as the back-end programming that accepts, interprets, and responds to user input and information. This coordination does require some effort and some programming, but if you're willing to take the time to learn CGI, you'll be limited only by the amount of time and energy you have for programming your Web pages.

Describing CGI programs

It doesn't really matter whether you call your server-side work a program or a script — in fact, this distinction is often used to describe the tools you use to build one. That is, scripts get built with scripting tools or languages, and programs get built with programming languages. Like a script read by an actor, a scripting language creates a set of actions and activities that must be performed in the prescribed order each time a script is executed; that's why scripts are often said to be interpreted. Programming languages, on the other hand, usually are transformed into a special executable form by a special program called a compiler that takes the programming language statements and turns them into equivalent computer instructions — that's why programs are often said to be executed. Because both approaches work equally well with the Web's Common Gateway Interface, for convenience, we simply refer to them as CGI programs. You can call them whatever you want!

CGI is the method that UNIX-based CERN and NCSA Web servers use to mediate interaction between servers and programs. Because UNIX was the original Web platform, it sets the model (yet again) for how the Web handles user interaction. Although you can find other platforms that support Web servers — like Windows NT and the Macintosh — they, too, must follow the standard set for the CERN and NCSA implementations of httpd. Whether or not they conform to the CERN or NCSA models, all Web servers must provide CGI capabilities or a similar set of functions to match what CGI can do.

What's going on in a CGI program?

You can think of a CGI program as an extension to the core WWW server services. In fact, CGI programs are like worker bees that do the dirty work on behalf of the server. The server serves as an intermediary between the client and the CGI program. It's good to be the queen bee and make all those workers do their things at your behest!

The server invokes CGI programs based on information that the browser provides (as in the <FORM> tag, where the ACTION attribute supplies a URL for the particular program that services the form). The browser request sets the stage for a series of information hand-offs and exchanges:

* The browser makes a request to a server for a URL, which actually contains the name of a CGI program for the server to run.

* The server fields the URL request, figures out that it points to a CGI program (usually by parsing the filename and its extension or through the directory where the file resides), and fires off the CGI program.

* The CGI program performs whatever actions it's been built to supply, based on input from the browser request. These actions can include obtaining the date and time from the server's underlying system, accumulating a counter for each visit to a Web page, searching a database, and so on.

* The CGI program takes the results of its actions and returns the proper data back to the server; often the program formats these results as a Web page for delivery back to the server (if the Content-Type is text/html).

* The server accepts the results from the CGI program and passes them to the browser, which renders them for display to the user. If this exchange is part of an ongoing, interactive Web session, these results can include additional forms tags to accept further user input, along with a URL for this or another CGI program. Thus, the cycle begins anew.
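The hand-offs above can be sketched as a tiny CGI program. This is only a minimal sketch in Bourne shell, not one of the programs from the CD: the function name time_page is our own invention, and we assume the server sets the GATEWAY_INTERFACE environment variable when it runs a CGI program (as CGI/1.1 servers do).

```shell
#!/bin/sh
# Minimal CGI sketch: report the server's date and time as a Web page.
# time_page is a hypothetical name chosen for this illustration.

time_page() {
    # The header tells the server (and the browser) what kind of data follows
    echo "Content-type: text/html"
    # A blank line separates the header from the body
    echo
    echo "<HTML><HEAD><TITLE>Server Time</TITLE></HEAD>"
    echo "<BODY><P>The server's clock reads: $(date)</P></BODY></HTML>"
}

# Run only when a Web server invokes this script as a CGI program
if [ -n "${GATEWAY_INTERFACE:-}" ]; then
    time_page
fi
```

Everything the program writes to its standard output goes back to the server, which forwards it to the browser; the Content-type line is what lets the browser treat the result as a Web page.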

This is a kind of disjointed way to have a conversation over the Web, but it does allow information to move both ways. The real beauty of CGI programs is that they extend a simple WWW server in every conceivable direction and make that server's services more valuable.

What's in CGI Input?

A request for a CGI program is encoded in HTML in a basic form as shown in this example:

<A HREF="http://www.hal.com/hal-bin/silly_quote.pl">Silly Quote</A>

The URL declaration says to execute the silly_quote.pl CGI program on the www.hal.com WWW server from the hal-bin directory. This request has no additional input data to pass to the CGI program. (The clue is that no ? is appended to the URL — see the explanation of the question mark argument later in this chapter.) The result of the CGI program is a Web page created on-the-fly and returned to the browser.

The information gathered by an HTML form or requested by a user (with a search request or other information query) passes to CGI programs in one of two ways:

* As an appendage to a CGI program's URL (most commonly, for WAIS requests or other short information searches). This way uses the METHOD="GET" option.

* As a stream of bytes through the UNIX default standard input device (stdin) in response to the ACTION setting for an HTML <FORM> tag. This way is best used with the METHOD="POST" option.

In the next section, you see how forms create special formats for information that is intended for use in CGI programs. Also, you see how those programs use and deliver that information.

Short and sweet: the "extended URL" approach

Most search engines use what's called a document-based query to obtain information from users. This query consists of nothing more than special characters appended to the end of the search engine's URL. Document-based queries are intended to solicit search terms or key words from a browser and then deliver them to a CGI program that uses them to search a database or a collection of files. Their simplicity is what makes document-based queries so good for soliciting small amounts of input from users and why you see them in so many Web pages.

Document-based queries depend on three ingredients for their successful operation:

* The <ISINDEX> tag within the <HEAD> section of an HTML document enables searching of the document by the browser.

* A special URL format is generated by appending the contents of the query to the URL, with the search terms placed at the end and preceded by a question mark.

* Special arguments in your underlying CGI program.

Here's how the process actually works:

* The <ISINDEX> tag in the <HEAD> of the document causes the browser to supply a search widget that allows the user to enter keywords (a widget is a generic bit of software that performs a particular task; in this case, the widget packages and sends search requests). These keywords are then bundled into an HTTP request and passed to the corresponding CGI program. If the CGI program finds that no arguments are appended to the URL, it returns a default page that includes the search widget to the browser. This sometimes happens with the first search request because the complete search widget may not be included on every Web page.

* At the prompt, the reader enters a string they want to search for and presses Enter (or otherwise causes the string to be shipped to the CGI program for handling).

* The browser calls the same URL as before, except it appends the search string following a question mark. So, if the search engine program's URL is

http://www.HTML4d.com/cgi-bin/searchit

and the string to be searched for is "tether," then the new URL becomes

http://www.HTML4d.com/cgi-bin/searchit?tether

* The server receives the URL exactly as formatted and passes it to the searchit program, with the string after the question mark passed as an argument to searchit.

* This time, the program performs an actual search and returns the results as another HTML page (instead of the default prompt page that was sent the first time).

Every part of this operation depends on the others: The browser activates the <ISINDEX> tag that allows the query to be requested and entered. Then the browser appends the query string to the URL and passes on the query as an argument to the search program. The search program uses the query value as the focus of its search operation and returns the search results to the browser via another custom-built HTML document.
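A program like searchit might be organized along the following lines. Treat this as a hedged sketch rather than the real thing: search_page and INDEX_FILE are hypothetical names, the "database" is just a text file searched with grep, and we assume the server hands the search terms to the program as command-line arguments (as CGI/1.1 servers do for queries that contain no equal sign).

```shell
#!/bin/sh
# Sketch of a document-based query handler in the spirit of searchit.
# search_page and INDEX_FILE are hypothetical names for this illustration.

INDEX_FILE=${INDEX_FILE:-index.txt}

search_page() {
    echo "Content-type: text/html"
    echo
    if [ $# -eq 0 ]; then
        # No search terms yet: return the default page with the search prompt
        echo "<HTML><HEAD><TITLE>Search</TITLE><ISINDEX></HEAD>"
        echo "<BODY><P>Enter a search term.</P></BODY></HTML>"
    else
        # Terms arrive as arguments; this sketch searches for the first one
        echo "<HTML><HEAD><TITLE>Results</TITLE><ISINDEX></HEAD><BODY><PRE>"
        grep -i -F -- "$1" "$INDEX_FILE" || echo "No matches found."
        echo "</PRE></BODY></HTML>"
    fi
}

# Run only when a Web server invokes this script as a CGI program
if [ -n "${GATEWAY_INTERFACE:-}" ]; then
    search_page "$@"
fi
```

The sketch copies matching lines into the page as they are; a production program would also escape characters such as < and & before sending them to the browser.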

Long-winded and thorough: the input-stream approach

As you build an HTML form, you have important definitions to make, including the assignment of names and associated values to your variables or selections. When users fill out HTML forms, they're actually instructing the browser to build a list of associated name-value pairs for each selection made or for each field that's filled in.

Name=value pairs take the form

name=value&

The equal sign (=) separates the name of the field from its associated value. The ampersand (&) separates the end of the value's string from the next item of text information in a completed form. For <SELECT> statements where MULTIPLE choices are allowed, the resulting list has multiple name=value pairs where the name remains the same, but the value assignment changes for each value chosen.

Reading through forms information delivered to a CGI program's standard input (stdin) is a matter of checking certain key environment variables (which we cover in the next section) and then parsing (separating into individual words or units of information) the input data. This reading consists of separating name=value pairs and using the names, with their associated values, to guide subsequent processing. The easiest way to do this, from a programming perspective, is to first parse and split out name=value pairs by looking for the ending ampersand (&), and then divide these pairs into their name and value parts by looking for the equal sign (=).

Here are some Perl code fragments that you can use to parse a form's input data. (They assume that, in keeping with our recommendation, you use METHOD="POST" for passing data.)

# Read the input stream from the standard input device
# (STDIN) into the buffer variable $buffer, using the
# environment variable CONTENT_LENGTH to know how much
# data to read
read(STDIN, $buffer, $ENV{'CONTENT_LENGTH'});

# Split the name=value pairs on '&'
@pairs = split(/&/, $buffer);

# Go through the pairs and determine the name and value
# for each named form field
foreach $pair (@pairs) {
    # Split name from value on the first "="
    ($name, $value) = split(/=/, $pair, 2);

    # Translate URL syntax of + for blanks
    $value =~ tr/+/ /;

    # Substitute hexadecimal escapes with their normal equivalents
    $value =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg;

    # Deposit the value in the %FORM hash, keyed by name
    $FORM{$name} = $value;
}

Handling environment variables

As part of the CGI environment, the httpd server's software version and configuration are of interest, as are the multiple variables associated with the server. You can use the following shell program to produce a complete listing of such information; it is a valuable testing tool when installing or modifying a Web server.

#!/bin/sh
# Report the CGI environment as a plain-text page.
# Each value is quoted so that characters such as * in a
# variable are printed as-is rather than expanded by the shell.
echo Content-type: text/plain
echo
echo CGI/1.1 test script report:
echo
echo argc is $#. argv is "$*".
echo
echo "SERVER_SOFTWARE = $SERVER_SOFTWARE"
echo "SERVER_NAME = $SERVER_NAME"
echo "GATEWAY_INTERFACE = $GATEWAY_INTERFACE"
echo "SERVER_PROTOCOL = $SERVER_PROTOCOL"
echo "SERVER_PORT = $SERVER_PORT"
echo "REQUEST_METHOD = $REQUEST_METHOD"
echo "HTTP_ACCEPT = $HTTP_ACCEPT"
echo "PATH_INFO = $PATH_INFO"
echo "PATH_TRANSLATED = $PATH_TRANSLATED"
echo "SCRIPT_NAME = $SCRIPT_NAME"
echo "QUERY_STRING = $QUERY_STRING"
echo "REMOTE_HOST = $REMOTE_HOST"
echo "REMOTE_ADDR = $REMOTE_ADDR"
echo "REMOTE_USER = $REMOTE_USER"
echo "AUTH_TYPE = $AUTH_TYPE"
echo "CONTENT_TYPE = $CONTENT_TYPE"
echo "CONTENT_LENGTH = $CONTENT_LENGTH"

This UNIX shell script is widely distributed around the Net. We found this version in the NCSA hoohoo collection at

http://hoohoo.ncsa.uiuc.edu/cgi-bin/test-cgi

Running this script on a Web server (in this case, the NCSA Web server hoohoo) produces the following output:

CGI/1.1 test script report:

argc is 0. argv is .

SERVER_SOFTWARE = NCSA/1.4b2
SERVER_NAME = hoohoo.ncsa.uiuc.edu
GATEWAY_INTERFACE = CGI/1.1
SERVER_PROTOCOL = HTTP/1.0
SERVER_PORT = 80
REQUEST_METHOD = GET
HTTP_ACCEPT = */*, image/gif, image/x-xbitmap, image/jpeg
PATH_INFO =
PATH_TRANSLATED =
SCRIPT_NAME = /cgi-bin/test-cgi
QUERY_STRING =
REMOTE_HOST = etittel.zilker.net
REMOTE_ADDR = 198.252.182.167
REMOTE_USER =
AUTH_TYPE =
CONTENT_TYPE =
CONTENT_LENGTH = n

Each capitalized name (to the left of the equal sign) is an environment variable that CGI sets, shown here with its value. These variables are always available for use in your programs.

Two environment variables are especially worthy of note:

* The QUERY_STRING variable is associated with the GET method of information-passing (as is common with search commands) and must be parsed for such queries or information requests.

* The CONTENT_LENGTH variable is associated with the POST method of information-passing (as is recommended for forms or other lengthier types of input data). The browser computes this length while assembling the forms data to deliver to the server, and the server hands it to the CGI program in CONTENT_LENGTH; the variable then tells the program how many bytes of input it has to read from the standard input device.
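Here is one way a program might use these two variables together. It's a sketch under stated assumptions, not code from the CD: read_form_data is a hypothetical helper name, and we assume a missing REQUEST_METHOD should be treated as GET.

```shell
#!/bin/sh
# Sketch: fetch raw form data whichever way the server delivered it.
# read_form_data is a hypothetical helper name for this illustration.

read_form_data() {
    if [ "${REQUEST_METHOD:-GET}" = "POST" ]; then
        # POST: read exactly CONTENT_LENGTH bytes from standard input
        dd bs=1 count="${CONTENT_LENGTH:-0}" 2>/dev/null
    else
        # GET: the data is whatever followed the ? in the URL
        printf '%s' "${QUERY_STRING:-}"
    fi
}
```

Reading exactly CONTENT_LENGTH bytes matters because the server is not obliged to signal end-of-file after the form data, so a program that simply reads until the input stops may wait forever.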

The environment variables also identify other items of potential interest, including the name of the remote host and its corresponding IP address, the request method used, and the types of data that the browser can accept. As you become more proficient in building CGI programs, you'll find further uses for many of these values.

Forming Up: Input-Handling Programs

When you create a form in HTML, each input field has an associated unique NAME. Filling out the form usually associates one or more values with each name. As shipped from the browser to the Web server (and on to the CGI program the URL targets), the form data is a stream of bytes, consisting of name=value pairs separated by ampersand characters (&).

Each of these name=value pairs is URL-encoded, which means that spaces are changed into plus signs (+) and some characters are encoded into hexadecimal. Decoding these URLs is what caused the interesting translation contortions in our Perl code sample in the preceding section.
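To make the encoding concrete, here is a sketch that decodes a form data string using only standard UNIX tools. The names url_decode and list_fields are hypothetical, and the sketch assumes plain ASCII input; it is an illustration, not one of the libraries listed in this chapter.

```shell
#!/bin/sh
# Sketch: decode URL-encoded form data with standard tools.
# url_decode and list_fields are hypothetical helper names.

url_decode() {
    # Turn + into a space first, then replace each %XX escape with the
    # character whose hexadecimal code is XX
    printf '%s\n' "$1" | awk '
    BEGIN { hex = "0123456789abcdef" }
    {
        gsub(/\+/, " ")
        out = ""
        while (match($0, /%[0-9A-Fa-f][0-9A-Fa-f]/)) {
            hi = index(hex, tolower(substr($0, RSTART + 1, 1))) - 1
            lo = index(hex, tolower(substr($0, RSTART + 2, 1))) - 1
            out = out substr($0, 1, RSTART - 1) sprintf("%c", hi * 16 + lo)
            $0 = substr($0, RSTART + 3)
        }
        print out $0
    }'
}

list_fields() {
    # Split on & first, then on the first = in each pair
    printf '%s\n' "$1" | tr '&' '\n' | while IFS= read -r pair; do
        [ -n "$pair" ] || continue
        name=${pair%%=*}
        case $pair in
            *=*) value=${pair#*=} ;;
            *)   value= ;;
        esac
        printf '%s: %s\n' "$(url_decode "$name")" "$(url_decode "$value")"
    done
}
```

For example, list_fields 'name=Ed+Tittel&topic=CGI%2FPerl' prints "name: Ed Tittel" and "topic: CGI/Perl" on separate lines. The plus signs are translated before the hexadecimal escapes so that an encoded plus sign (%2B) survives as a real plus.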

If you visit the NCSA CGI archive, you can find links to a number of input-handling code libraries that can help you build forms. You can find all this information at

ftp://ftp.ncsa.uiuc.edu/Web/httpd/Unix/ncsa_httpd/cgi/

* Bourne Shell: The AA Archie Gateway, which contains calls to sed and awk to convert a GET form data string into separate environment variables. (AA-1.2.tar.Z)

* C: The default scripts for NCSA httpd, including C routines and example programs for translating the query string into various structures. (ncsa-default.tar.Z)

* Perl: The Perl CGI-lib contains a group of useful Perl routines to decode and manage forms data. (cgi-lib.pl.Z)

* TCL: The TCL argument processor includes a set of TCL routines to retrieve forms data and insert it into associated TCL variables. (tcl-proc-args.tar.Z)

What this means to you, gentle reader, is that most of the work of reading in and organizing forms information is already widely and publicly available in one form or fashion. This simplifies your programming efforts because you can concentrate on writing the code that interprets the input and builds the appropriate HTML document that's returned to your reader as a response.

Coding CGI

In this section, we examine three different CGI programs. We've included their code and a great deal of associated information on the CD that comes with this book. Each of these programs is available in three versions: AppleScript (for use on an Apple httpd server), Perl (for use on any system that supports a Perl interpreter/compiler), and C (for use on any system that supports a C compiler).

Ladies and gentlemen: Choose your weapons!

Before we launch you into these excellent examples, we'd like to encourage you to use them in your own HTML documents and CGI programs (we're giving the license to this code away, so you can use it without restrictions). We want to conclude the discussion portion of this chapter with an investigation of why we chose to implement each of these programs in three forms and how you can choose a suitable language to write your CGI programs.

It's quite true that you can build your CGI programs with just about any programming or scripting language that your Web server supports. Nothing can stop you from ignoring all options that we cover here and using something completely different.

Nevertheless, we can think of good reasons why you should consider using these options and equally good reasons why you should ignore other options. We cheerfully concede that there are probably as many opinions on this subject as there are CGI programmers, but we'd like you to consider carefully before deciding on a CGI language that you're likely to spend considerable time and effort learning and using.

For example, we included the NCSA test-cgi script earlier in this chapter. It's written for the Bourne shell (note the #!/bin/sh on its first line), a command language that is common on UNIX systems and makes an adequate scripting language for many uses. Nevertheless, we don't think that UNIX shells are suitable for heavy-duty CGI programming because they mix UNIX system commands freely within their own syntax.

The problem with CGI programming under UNIX is that it depends on the standard input (stdin) and standard output (stdout) devices as the methods for moving data between Web servers and CGI programs. Each new UNIX process automatically gets its own stdin and stdout; sometimes UNIX shells can get confused regarding where their input is coming from and where their output is going. This confusion can be a side effect of spawning tasks (or having one running program start up another program to perform a specific task and report its results back to the original program when it's finished) or running system commands. Whatever the cause, this confusion can lose the input or output for CGI programs. That's the main reason why we don't recommend shell scripts of any kind for heavier-duty CGI applications (for example, forms-processing versus query-handling).
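One concrete way this can happen is sketched here. It's an illustration of the general hazard, not a diagnosis of any particular server: noisy_helper is a hypothetical stand-in for any command that happens to read its standard input, and the other two function names are likewise our own.

```shell
#!/bin/sh
# Sketch: keep a child command from swallowing the form data on stdin.
# noisy_helper stands in for any command that reads standard input.

noisy_helper() {
    cat >/dev/null            # reads everything available on its stdin
}

careless_read() {
    noisy_helper              # shares the script's stdin and drains it
    cat                       # nothing is left for the script to read
}

careful_read() {
    noisy_helper </dev/null   # give the helper an empty stdin of its own
    cat                       # the form data is still waiting
}
```

Feeding the same form data to both functions shows the difference: careless_read prints nothing because the helper consumed the input first, while careful_read prints the data intact.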

On the plus side, Perl offers straightforward access to UNIX system calls and capabilities within a tightly structured environment. Perl includes the positive features of languages like C, Pascal, awk, sed, and even Basic, and it offers powerful string-handling and output-management functions. Perl is emerging as the favorite of many Web programmers (and is certainly our favorite CGI gurus' language of choice). Best of all, Perl implementations are already available for UNIX, DOS, Windows NT, the Macintosh OS, and the Amiga, with numerous other implementations under way. We've had excellent luck moving Perl from one platform to another with only small changes.

We include C because it's a powerful programming language and remains a tool of choice in the UNIX environment. What features and functions it doesn't offer as built-ins are readily available in the form of system APIs (Application Programming Interfaces: the sets of routines used to invoke system functions and other kinds of prepackaged functionality within a program) and code libraries. C is also portable (barring the use of system APIs, which can change from one system to another). At least one version of C is available for just about every platform, and multiple implementations are available for popular platforms and operating systems. We're especially fond of GNU C and the related GNU tools from the Free Software Foundation that Richard Stallman pioneered.

If you use the Macintosh as a Web server, AppleScript is pretty much your only option. Even so, it has proven to be a worthwhile tool for building CGI programs and is widely used in the Macintosh Web community. If you're a real Mac-o-phile, be sure to consult Chapter 21 for some excellent pointers on Macintosh tools and technologies for the Web.

Whatever language you choose for your CGI programs, be sure that it provides good string-handling capabilities and offers reasonable output controls. Because you'll be reading and interpreting byte stream input and creating HTML documents galore, look for these important capabilities. Also, we recommend that you pick a language that is already widely used in the Web community. You'll likely find lots of related modules, libraries, and code widgets that may save you programming time and make your job easier. But hey — it's your choice!

We've just barely scratched the surface of CGI as a topic. For more information, consult your favorite search engine and use "CGI" or "CGI script" as your search string. This turns up tons of useful references. Also, one of the authors of this book has cowritten two other books that are largely devoted to CGI programming. They are CGI Bible and World Wide Web Programming Secrets with Perl and CGI, both by Ed Tittel, Mark Gaither, Sebastian Hassinger, and Mike Erwin (both from IDG Books Worldwide, Inc., Programmers Press). Although you can also find other books on this subject, you won't find any others that your authors like more!

Example 1: What time is it?

This short Perl program accesses the system time on the server and writes an HTML page with the current time to the user's screen. This program has filenames beginning with "time" in the CGI subdirectory on the book's CD.

Example 2: Counting page visits

This AppleScript program establishes a counter that tracks the number of times a page is visited. This kind of tool can provide useful statistics for individuals or organizations curious about how much traffic their pages actually receive. This program has filenames beginning with "counter" in the CGI subdirectory on the book's CD.
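The idea behind such a counter fits in a few lines. The following Bourne shell sketch is not the AppleScript program from the CD; bump_counter, counter_page, and COUNT_FILE are hypothetical names, and the sketch assumes the server is allowed to write to the count file.

```shell
#!/bin/sh
# Sketch of a page-visit counter; not the AppleScript program from the CD.
# bump_counter, counter_page, and COUNT_FILE are hypothetical names.

COUNT_FILE=${COUNT_FILE:-counter.txt}

bump_counter() {
    count=0
    # Reuse the stored count if the file exists and is readable
    if [ -r "$COUNT_FILE" ]; then
        read -r count < "$COUNT_FILE" || true
    fi
    # Treat a missing, empty, or non-numeric value as zero
    case $count in
        ''|*[!0-9]*) count=0 ;;
    esac
    count=$((count + 1))
    echo "$count" > "$COUNT_FILE"
    echo "$count"
}

counter_page() {
    echo "Content-type: text/html"
    echo
    echo "<HTML><BODY><P>Visits so far: $(bump_counter)</P></BODY></HTML>"
}

# Run only when a Web server invokes this script as a CGI program
if [ -n "${GATEWAY_INTERFACE:-}" ]; then
    counter_page
fi
```

Note that the sketch does no file locking, so two visits that arrive at the same instant could be counted as one; a busier site would want something sturdier.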

Example 3: Decoding clickable map coordinates

This C program can distinguish whether the right or left side of a graphic is selected (as well as defining a default to handle when the graphic isn't selected at all). It shows how a script handles the definitions inside an image map file. This program has filenames beginning with "ismapper" in the CGI subdirectory on the book's CD.
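The left-or-right decision can be sketched briefly. When a user clicks an image marked with ISMAP, the browser appends the click position to the URL as ?x,y, so the coordinates reach the program as a string such as 10,50. The sketch below is in Bourne shell rather than C and is not the program from the CD; which_side is a hypothetical name, and the 200-pixel default width is an assumption made purely for illustration.

```shell
#!/bin/sh
# Sketch: decide which half of an image was clicked.
# which_side and the 200-pixel default width are hypothetical.

which_side() {
    x=${1%%,*}          # text before the first comma
    width=${2:-200}     # assumed image width in pixels

    # Without a comma there are no coordinates, so use the default
    case $1 in
        *,*) ;;
        *) echo "default"; return 0 ;;
    esac
    # The x coordinate must be a plain number
    case $x in
        ''|*[!0-9]*) echo "default"; return 0 ;;
    esac

    if [ "$x" -lt $((width / 2)) ]; then
        echo "left"
    else
        echo "right"
    fi
}
```

A CGI program would typically call this as which_side "$QUERY_STRING" and then return a different page (or a redirect) for each of the three answers.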

Installing and Using CGIs

Most Web servers are configured to look for CGI programs in a particular directory that's under the server's control. Normal users (including you) probably won't be able to copy CGI scripts into this directory without obtaining help from their friendly neighborhood systems administrator or Webmaster. Before you can install and use any CGI program on a particular server, you want to talk to one of these individuals (if not both), tell them what you're trying to do, and ask for their help and advice in making it happen.

Don't be hurt or surprised if this process takes some time: Systems administrators and Webmasters tend to be chronically busy people. You may have to wait a while to get their attention and then discuss your needs with them. Make your initial approach by e-mail or a phone call and briefly explain what you want to do. (For example: "I want to include a counter CGI on my home page to track visitors.") Among other things, you may discover that they already have a CGI available for that very purpose on your server, and they'll simply tell you how to include its reference in the CGI invocation of your home page. (You may not have to use our code at all!)

Also, most of these individuals are responsible for the safe and proper operation of the server and/or the Web site where you want to run a CGI program. Don't be offended if you hear them say, "Well, I need to look it over first to make sure that it's okay before you can use it." That's because they're only trying to make sure that you're not planning on introducing software that could cripple or compromise their system. So, please, don't take it personally when this happens. These people are just doing their jobs, and asking for a review shows you that they care about what happens on their (and your) server. This is actually a good thing!

Finally, the systems administrator or Webmaster can show you how to install and use CGIs more quickly and efficiently than if you try to figure it out for yourself. So, be ready to wait your turn to get some time and be ready to listen and learn when your turn comes. You won't be disappointed!

As the sample CGI programs in this chapter illustrate, there are as many ways to skin the proverbial CGI as there are ideas and approaches about how to implement them. We sincerely hope that you can use these paltry tools we've included with this chapter and that you investigate the contents of the CD that comes with the book. On it, you can find C, Perl, and AppleScript versions for all three programs. In the next chapter, we extend our coverage of server-side Web activities as we investigate search engines, Webcrawlers, and other interesting server-side services.

How to Cheat at CGI

If you didn't get a warm fuzzy feeling while reading about CGI, you are not alone. CGI can be difficult, especially if you are not a programmer and don't really want to be. But before you throw out the virtual CGI baby with the cyber-bathwater, let us show you how to cheat at CGI!

The secret is to use a forms designing application to create your forms and the background CGIs automatically. O'Reilly's PolyForm is one of the best of these types of applications. Answer a few simple questions, click a few buttons, follow the instructions of the semi-intelligent wizard, and poof! — out comes a form and matching CGI application.

Find out more about PolyForm by visiting:

http://polyform.ora.com/

Unfortunately, PolyForm is only available for Windows computers, and we were unable to find any worthwhile Mac or UNIX software that provides similar features.


E-Mail: HTML for Dummies at html4dum@lanw.com

URL: http://www.lanw.com/html4dum/h4d3e/extras/ch18sec1.htm 
Text - Copyright © 1995, 1996, 1997 Ed Tittel & Stephen N. James. 
For Dummies, the Dummies Man logo and Dummies Press are trademarks or registered trademarks of Wiley Publishing, Inc. Used with Permission.
Web Layout - Copyright © 1997, LANWrights
Revised -- May, 2002 [MCB]