If you decide to skip this geek speak section on CGI, be sure to catch the final CGI-related heading -- How to Cheat at CGI. That section has nongeek information that you just may want to know!
Building HTML forms and handling user interaction through Web pages requires action on both sides of the client-server connection. In the book, we concentrate mainly on the client side, except for the clickable image map files that we describe in Chapter 14.
In this extra CD-ROM chapter, we move across the network connection from the client to the server-side and describe the Common Gateway Interface (CGI). A CGI program lets Web pages communicate with applications on the server to provide customized information or to build interactive exchanges between clients and servers.
Along the way, we show you the history and foundations of CGI and cover the basic details of its design and use. We also introduce you to the issues surrounding your choice of a scripting or programming language for CGI, and we give you the chance to read through some interesting example programs. Because you get these programs -- and the code to run them -- in another section of the CD, we hope that you end up using them on your Web pages.
If you read the previous paragraphs carefully, you've noticed one important element of CGI programs: They run on the Web server, not on the client side (where you run your browser to access Web-based information). You can't use our CGI programs on your own machine unless it can run Web server software (usually called an HTTP daemon, or more simply, httpd) at the same time that it runs your browser.
On Windows NT, Macintoshes, and UNIX machines, running both a Web server and a browser is quite doable; on Windows 3.x and Windows 95 computers, it's a bit trickier. For the latter machines, Quarterdeck's inexpensive WebServer software package fits the bill nicely: It supports CGI Version 1.1 and enables you to work both sides of the client/server street with ease. For more information on this product, visit Quarterdeck's Web site at www.qdeck.com.
Gateway scripts, or programs, add the capability for true interaction between browsers and servers across the Web. This powerful capability is limited only by your imagination and the tools at hand. Gateway scripts supply the underlying functions that enable you to perform searches on Web documents or databases, that provide the capacity to accept and process forms data, and that deliver the intelligence necessary to customize Web pages based on user input.
If you build Web pages, you must manage this interaction across the Web; that is, you must build the front-end information that users see and interact with as well as the back-end programming that accepts, interprets, and responds to user input and information. This coordination does require some effort and some programming, but if you're willing to take the time to learn CGI, you'll be limited only by the amount of time and energy you have for programming your Web pages. For a different take on the same idea, you should also check out Extra 3 on the CD, which covers Dynamic HTML.
Whether you call your server-side work a program or a script doesn't matter -- in fact, this distinction is often used only to describe the tools you use to build one. Scripts get built with scripting tools or languages, and programs get built with programming languages. Like a script read by an actor, a scripting language creates a set of actions and activities that must be performed in the prescribed order each time a script is executed; that's why scripts are often said to be interpreted.
Programming languages, on the other hand, usually are transformed into a special executable form by a special program called a compiler that takes the programming language statements and turns them into equivalent computer instructions -- that's why programs are often said to be executed. Because both approaches work equally well with the Web's Common Gateway Interface, we simply refer to them as CGI programs for convenience. You can call them whatever you want!
CGI is the method that UNIX-based CERN and NCSA Web servers use to mediate interaction between servers and programs. Because UNIX was the original Web platform, it sets the model (yet again) for how the Web handles user interaction. Although you can find other platforms that support Web servers -- such as Windows NT and the Macintosh -- they, too, must follow the standard set for the CERN and NCSA implementations of httpd. Whether or not they conform to the CERN or NCSA models, all Web servers must provide CGI capabilities or a similar set of functions to match what CGI can do.
You can think of a CGI program as an extension of the core WWW server services. In fact, CGI programs are like worker bees that do the dirty work on behalf of the server. The server serves as an intermediary between the client and the CGI program. It's good to be the queen bee and make all those workers do their things at your behest!
The server invokes CGI programs based on information that the browser provides (as in the <FORM> tag, where the ACTION attribute supplies a URL for the particular program that services the form). The browser request sets the stage for a series of information hand-offs and exchanges:
The browser makes a request to a server for a URL, which actually contains the name of a CGI program for the server to run.
The server fields the URL request, figures out that it points to a CGI program (usually by parsing the filename and its extension or through the directory where the file resides), and fires off the CGI program.
The CGI program performs whatever actions it's been built to supply, based on input from the browser request. These actions can include obtaining the date and time from the server's underlying system, accumulating a counter for each visit to a Web page, searching a database, and so on.
The CGI program takes the results of its actions and returns the proper data back to the server; often the program formats these results as a Web page for delivery back to the server (if the Content-Type is text/html).
The server accepts the results from the CGI program and passes them to the browser, which renders them for display to the user. If this exchange is part of an ongoing, interactive Web session, these results can include additional forms tags to accept further user input, along with a URL for this or another CGI program. Thus, the cycle begins anew.
This method is a rather disjointed way to have a conversation over the Web, but it does allow information to move both ways. The real beauty of CGI programs is that they extend a simple WWW server in every conceivable direction and make that server's services more valuable. As you see in Extra 3, Dynamic HTML provides quite similar functionality, but does all of its work on the client side rather than relying on client-server and server-client conversations to do its job.
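The part of the exchange where the CGI program builds its results and hands them back to the server can be sketched in a few lines. This is only a minimal illustration in Python, not one of the programs on the CD: the function name build_cgi_response and the page wording are our own assumptions. It shows the one thing every CGI program must get right, which is a Content-Type line, then an empty line, then the document itself.

```python
from datetime import datetime


def build_cgi_response(now: datetime) -> str:
    """Return a complete CGI response: header, blank line, then an HTML page.

    The function name and page text are illustrative assumptions.
    """
    body = (
        "<HTML><HEAD><TITLE>Server Time</TITLE></HEAD>\n"
        f"<BODY><P>The server time is {now:%Y-%m-%d %H:%M:%S}.</P></BODY></HTML>\n"
    )
    # The Content-Type header says what kind of data follows;
    # the empty line after it marks the end of the headers.
    return "Content-Type: text/html\n\n" + body


if __name__ == "__main__":
    # A CGI program returns its results by writing them to standard output.
    print(build_cgi_response(datetime.now()), end="")
```

The server reads whatever the program writes to standard output and passes it along to the browser, so the program never talks to the browser directly.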
A request for a CGI program is encoded in HTML in a basic form: a link whose URL names the program, such as http://www.hal.com/hal-bin/silly-quote.pl.
The URL declaration says to execute the silly-quote.pl CGI program on the www.hal.com WWW server from the hal-bin directory. This request has no additional input data to pass to the CGI program. (The clue is that no ? is appended to the URL -- see the explanation of the question mark argument later in this chapter.) The result of the CGI program is a Web page created on-the-fly and returned to the browser.
The information gathered by an HTML form or requested by a user (with a search request or other information query) passes to CGI programs in one of two ways:
As an appendage to a CGI program's URL (most commonly, for WAIS requests or other short information searches). This way uses the METHOD="GET" option.
As a stream of bytes through the UNIX default standard input device (stdin) in response to the ACTION setting for an HTML <FORM> tag. This way is best used with the METHOD="POST" option.
In the next section, you see how forms create special formats for information intended for use in CGI programs. Also, you see how those programs use and deliver that information.
Most search engines use what's called a document-based query to obtain information from users. This query consists of nothing more than special characters appended to the end of the search engine's URL. Document-based queries are intended to solicit search terms or key words from a browser and then deliver them to a CGI program that uses them to search a database or a collection of files. This simplicity is what makes document-based queries so good for soliciting small amounts of input from users, and it's why you see them in so many Web pages.
Document-based queries depend on three ingredients for their successful operation:
The <ISINDEX> tag within the <HEAD> section of an HTML document enables searching of the document by the browser.
A special URL format is generated by adding the query text to the end of the URL, set off from it by a question mark.
Special argument handling in your underlying CGI program processes the query.
Here's how the process actually works:
The <ISINDEX> tag in the <HEAD> of the document causes the browser to supply a search widget that allows the user to enter keywords. A widget is a generic bit of software that performs a particular task; in this case, the widget packages and sends search requests. These keywords are bundled into an HTTP request and passed to the named CGI program. If the CGI program finds no arguments appended to the URL, it returns a default page that delivers the search widget to the browser. This sometimes happens with the first search request because a complete search widget may not be included on every Web page.
At the prompt, users enter a string they want to search for and press Enter (or otherwise cause the string to be shipped to the CGI program).
The browser calls the same URL as before, except that it appends the search string following a question mark. So, if the search engine program is named searchit and the string to be searched for is tether, then the new URL ends in searchit?tether.
The server receives the URL exactly as formatted and passes it to the searchit program, with the string after the question mark passed as an argument to searchit.
This time, the program performs an actual search and returns the results as another HTML page (instead of the default prompt page that was sent the first time).
Every part of this operation depends on the others: The browser activates the <ISINDEX> tag that allows the query to be requested and entered. Then the browser appends the query string to the URL and passes on the query as an argument to the search program. The search program uses the query value as the focus of its search operation and returns the search results to the browser via another custom-built HTML document.
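Both halves of this exchange can be sketched briefly. The following Python is only an illustration under our own assumptions: the function names are invented, and the /cgi-bin/searchit path is a hypothetical location for the searchit program. The first function does what the browser does (appends the search words after a question mark), and the second does what the program does (pulls the words back out, or finds none and knows to send the default search page).

```python
from urllib.parse import quote_plus, unquote_plus


def build_query_url(program_url: str, search_text: str) -> str:
    """Append the search words to the program's URL after a question mark."""
    words = search_text.split()
    # Words are joined with plus signs; unsafe characters become %XX codes.
    return program_url + "?" + "+".join(quote_plus(word) for word in words)


def parse_query_args(url: str) -> list:
    """Return the search words that follow the question mark, if any."""
    _, separator, query = url.partition("?")
    if not separator or not query:
        # No arguments: the program would return the default search page.
        return []
    return [unquote_plus(word) for word in query.split("+")]


if __name__ == "__main__":
    # "/cgi-bin/searchit" is a hypothetical path used only for illustration.
    url = build_query_url("/cgi-bin/searchit", "tether")
    print(url)
    print(parse_query_args(url))
```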
As you build an HTML form, you have important definitions to make, including the assignment of names and associated values to your variables or selections. When users fill out HTML forms, they actually instruct the browser to build a list of associated name-value pairs for each selection made or for each field they fill in.
Name=value pairs take the form name=value&name=value, with one pair for each field in the form.
The equal sign (=) separates the name of the field from its associated value. The ampersand (&) separates the end of one value's string from the next item of text information in a completed form. For <SELECT> statements where MULTIPLE choices are allowed, the resulting list has multiple name=value pairs where the name remains the same, but the value assignment changes for each value chosen.
Reading through forms information delivered to a CGI program's standard input (stdin) is a matter of checking certain key environment variables (which we cover in the next section) and then parsing (separating into individual words or units of information) the input data. This reading consists of separating name=value pairs and using the names, with their associated values, to guide subsequent processing. The easiest way to do this, from a programming perspective, is to first parse and split out name=value pairs by looking for the ending ampersand (&), and then divide these pairs into their name and value parts by looking for the equal sign (=).
Here are some Perl code fragments that you can use to parse a form's input data. (They assume that, in keeping with our recommendation, you use METHOD="POST" for passing data.)
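The same two-step approach (split on ampersands, then split each pair at the equal sign) can also be sketched in Python. Treat this as an illustration rather than a translation of any particular Perl routine: the function name parse_form_data and the sample field names are our own assumptions. Each name maps to a list of values, so a <SELECT> with MULTIPLE choices keeps every value that was chosen.

```python
from urllib.parse import unquote_plus


def parse_form_data(data: str) -> dict:
    """Split 'name=value&name=value' input into a dict of name -> list of values."""
    fields = {}
    for pair in data.split("&"):  # each pair ends at an ampersand
        if not pair:
            continue  # skip empty pieces, such as the one after a trailing '&'
        # Divide the pair into name and value at the first equal sign.
        name, _, value = pair.partition("=")
        # Decode only after splitting, so an encoded '&' or '=' inside a
        # value cannot be mistaken for a separator.
        fields.setdefault(unquote_plus(name), []).append(unquote_plus(value))
    return fields


if __name__ == "__main__":
    # Sample data with hypothetical field names.
    print(parse_form_data("name=Jane+Doe&topping=ham&topping=olives"))
```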
As part of the CGI environment, the httpd servers software version and configuration are of interest, as are the multiple variables associated with the server. You can use the following shell program to produce a complete listing of such information; this program is a valuable testing tool when installing or modifying a Web server.
#!/bin/sh
# Report the CGI environment as plain text. The empty echo after the
# Content-type line ends the header block, as CGI requires.
# Values are quoted so characters such as * print as-is instead of expanding.
echo Content-type: text/plain
echo
echo CGI/1.1 test script report:
echo "argc is $#. argv is $*."
echo "SERVER_SOFTWARE = $SERVER_SOFTWARE"
echo "SERVER_NAME = $SERVER_NAME"
echo "GATEWAY_INTERFACE = $GATEWAY_INTERFACE"
echo "SERVER_PROTOCOL = $SERVER_PROTOCOL"
echo "SERVER_PORT = $SERVER_PORT"
echo "REQUEST_METHOD = $REQUEST_METHOD"
echo "HTTP_ACCEPT = $HTTP_ACCEPT"
echo "HTTP_USER_AGENT = $HTTP_USER_AGENT"
echo "PATH_INFO = $PATH_INFO"
echo "PATH_TRANSLATED = $PATH_TRANSLATED"
echo "SCRIPT_NAME = $SCRIPT_NAME"
echo "QUERY_STRING = $QUERY_STRING"
echo "REMOTE_HOST = $REMOTE_HOST"
echo "REMOTE_ADDR = $REMOTE_ADDR"
echo "REMOTE_USER = $REMOTE_USER"
echo "CONTENT_TYPE = $CONTENT_TYPE"
echo "CONTENT_LENGTH = $CONTENT_LENGTH"
This UNIX shell script is widely distributed around the Net. We found this version in the NCSA hoohoo collection.
Running this script on a Web server (in this case, the NCSA Web server hoohoo) produces the following output:
CGI/1.1 test script report:
argc is 0. argv is .
SERVER_SOFTWARE = NCSA/1.5.2
SERVER_NAME = hoohoo.ncsa.uiuc.edu
GATEWAY_INTERFACE = CGI/1.1
SERVER_PROTOCOL = HTTP/1.0
SERVER_PORT = 80
REQUEST_METHOD = GET
HTTP_ACCEPT = application/vnd.ms-excel, image/gif, image/x-xbitmap, image/jpeg, image/pjpeg,
HTTP_USER_AGENT = Mozilla/2.0 (compatible; MSIE 3.02; Win32)
SCRIPT_NAME = /cgi-bin/test-cgi
REMOTE_HOST = max2-54.ip.realtime.net
REMOTE_ADDR = 18.104.22.168
Each capitalized name (to the left of an equal sign) is an environment variable that CGI sets, shown with the value it held for this request. These variables are always available for use in your programs.
Two environment variables are especially worthy of note:
The QUERY_STRING variable is associated with the GET method of information-passing (as is common with search commands) and must be parsed for such queries or information requests.
The CONTENT_LENGTH variable is associated with the POST method of information-passing (as is recommended for forms or other lengthier types of input data). The browser computes this value while assembling forms data to deliver to the server; the variable then tells the CGI program how much input data it must read.
The environment variables also identify other items of potential interest, including the name of the remote host and its corresponding IP address, the request method used, and the types of data that the browser can accept. As you become more proficient in building CGI programs, you may find further uses for many of these values.
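How a program chooses between QUERY_STRING and CONTENT_LENGTH can be sketched as follows. This is a minimal Python illustration, and the function name read_form_data is our own assumption. It checks REQUEST_METHOD first: for POST it reads exactly CONTENT_LENGTH bytes from standard input, and otherwise it takes whatever the server placed in QUERY_STRING.

```python
import io
import os
import sys


def read_form_data(environ, stdin) -> str:
    """Return the raw, still-encoded form data for a GET or POST request.

    environ is a mapping of CGI environment variables; stdin is a binary
    stream. Both are parameters so the function is easy to try out.
    """
    method = environ.get("REQUEST_METHOD", "GET").upper()
    if method == "POST":
        # Read only as many bytes as CONTENT_LENGTH announces; the input
        # stream is not guaranteed to signal end-of-file afterward.
        length = int(environ.get("CONTENT_LENGTH") or 0)
        return stdin.read(length).decode("ascii", errors="replace")
    # For GET, the server has already split the query text off the URL.
    return environ.get("QUERY_STRING", "")


if __name__ == "__main__":
    # In a real CGI program, the live environment and stdin are used.
    print(read_form_data(os.environ, sys.stdin.buffer))
```

Passing the environment and input stream in as arguments is just a design convenience: it lets you test the function with made-up values before installing anything on a server.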
When you create a form in HTML, each input field has an associated unique NAME. Filling out the form usually associates one or more values with each name. As shipped from the browser to the Web server (and on to the CGI program the URL targets), the form data is a stream of bytes, consisting of name=value pairs separated by ampersand characters (&).
Each of these name=value pairs is URL-encoded, which means that spaces are changed into plus signs (+) and some characters are encoded into hexadecimal. Decoding these URLs is what caused the interesting translation contortions in our Perl code sample in the preceding section.
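To see what URL encoding does to a value, it helps to run one through the process in both directions. The sketch below uses Python's standard urllib.parse module; the field names and values are made-up samples, and the two helper function names are our own. Note how the space becomes a plus sign while the percent sign and ampersand inside the value become hexadecimal codes.

```python
from urllib.parse import parse_qs, urlencode


def encode_fields(fields) -> str:
    """URL-encode a list of (name, value) pairs the way a browser does for a form."""
    return urlencode(fields)


def decode_fields(encoded: str) -> dict:
    """Reverse the encoding, returning each name with a list of its values."""
    return parse_qs(encoded)


if __name__ == "__main__":
    # Hypothetical sample fields.
    sample = [("name", "Jane Doe"), ("comment", "50% off & more")]
    encoded = encode_fields(sample)
    print(encoded)  # name=Jane+Doe&comment=50%25+off+%26+more
    print(decode_fields(encoded))
```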
If you visit the NCSA CGI archive, you can find links to a number of input-handling code libraries that can help you build forms.
The common input-handling code libraries include
Bourne Shell: The AA Archie Gateway contains calls to sed and awk to convert a GET form data string into separate environmental variables. (AA-1.2.tar.Z)
C: The default scripts for NCSA httpd include C routines and example programs for translating the query string into various structures. (ncsa-default.tar.Z)
Perl: The Perl CGI-lib contains a group of useful Perl routines to decode and manage forms data. (cgi-lib.pl.Z)
TCL: The TCL argument processor includes a set of TCL routines to retrieve forms data and insert it into associated TCL variables. (tcl-proc-args.tar.Z)
Most of the work of reading and organizing forms information is already widely and publicly available in one form or fashion, which is great for you. This simplifies your programming efforts because you can concentrate on writing the code that interprets the input and builds the appropriate HTML document that's returned to your reader as a response.
In this section, we examine three different CGI programs. We include their code and a great deal of associated information on the CD. Each of these programs is available in three versions: AppleScript (for use on an Apple httpd server), Perl (for use on any system that supports a Perl interpreter/compiler), and C (for use on any system that supports a C compiler).
Before we launch you into these excellent examples, we'd like to encourage you to use them in your own HTML documents and CGI programs (we're giving away the license to this code, so you can use it without restrictions). We first conclude the discussion portion of this chapter with an investigation of why we chose to implement each of these programs in three forms and how you can choose a suitable language to write your CGI programs.
You can build your CGI programs with just about any programming or scripting language that your Web server supports. Nothing can stop you from ignoring all options that we cover here and using something completely different.
Nevertheless, we can think of good reasons why you should consider using these options and equally good reasons why you should ignore other options. We cheerfully concede that you'll probably find as many opinions on this subject as there are CGI programmers, but we ask that you consider carefully before deciding on a CGI language that you're likely to spend considerable time and effort learning and using.
For example, we include the NCSA test-cgi script earlier in this chapter. It's written for the basic Bourne shell (sh, a command language common on UNIX systems), which makes an adequate scripting language for many uses. Nevertheless, we don't think that UNIX shells are suitable for heavy-duty CGI programming, because they mix UNIX system commands freely within their own syntax.
The problem with CGI programming under UNIX is that it depends on the standard input (stdin) and standard output (stdout) devices as the methods for moving data between Web servers and browsers. Each new UNIX process automatically creates its own stdin and stdout; a UNIX shell can sometimes get confused about where its input comes from and where its output goes. This confusion can be a side effect of running system commands or of spawning tasks (that is, having one running program start up another program to perform a specific task and report its results back to the original program when it finishes). Whatever the cause, this confusion can lose the input or output for CGI programs, which is the main reason why we don't recommend shell scripts of any kind for heavier-duty CGI applications (for example, forms-processing versus query-handling).
On the plus side, Perl offers straightforward access to UNIX system calls and capabilities within a tightly structured environment. Perl includes the positive features of languages like C, Pascal, awk, sed, and even Basic, and it offers powerful string-handling and output-management functions. Perl is emerging as the favorite of many Web programmers (and is certainly our favorite CGI guru's language of choice). Best of all, Perl implementations are already available for UNIX, DOS, Windows NT, Macintosh OS, and the Amiga, with numerous other implementations under way. We've had excellent luck moving Perl from one platform to another with only small changes.
We include C because it's a powerful programming language and remains a tool of choice in the UNIX environment. What features and functions it doesn't offer as built-ins are readily available in the form of system APIs (Application Programming Interfaces -- the set of routines used to invoke system functions and other kinds of prepackaged functionality within a program) and code libraries. C is also portable (barring the use of system APIs, which can change from one system to another). One version of C is available for just about every platform, and multiple implementations are available for popular platforms and operating systems. We're especially fond of GNU C and the related GNU Tools from the Free Software Foundation that Richard Stallman pioneered.
If you use a Macintosh as a Web server, AppleScript is pretty much your only option. Even so, it has proven to be a worthwhile tool for building CGI programs and is widely used in the Macintosh Web community. If you're a real Macophile, be sure to consult Extra 8 for some excellent pointers on Macintosh tools and technologies for the Web.
Whatever language you choose for your CGI programs, be sure that it provides good string-handling capabilities and offers reasonable output controls. Because you'll be reading and interpreting byte stream input and creating HTML documents galore, look for these important capabilities. Also, we recommend that you pick a language that is already widely used in the Web community. You'll likely find lots of related modules, libraries, and code widgets that may save you programming time and make your job easier. But hey -- it's your choice!
We've just barely scratched the surface of CGI as a topic. For more information, consult your favorite search engine and use CGI or CGI script as your search string. You'll turn up tons of useful references. Also, one author of this book has cowritten two other books that are largely devoted to CGI programming: CGI Bible and World Wide Web Programming Secrets with Perl and CGI, both by Ed Tittel, Mark Gaither, Sebastian Hassinger, and Mike Erwin, and both available from IDG Books Worldwide. Although you can also find other books on this subject, you won't find any others that your authors like more!
In the following sections, we examine three different CGI programs. Their code and a great deal of associated information is on the CD. Each program is available in three versions: AppleScript (for use on an Apple httpd server), Perl (for use on any system that supports a Perl interpreter/compiler), and C (for use on any system that supports a C compiler).
This short Perl program accesses the system time on the server and writes an HTML page with the current time to the user's screen. This program has filenames beginning with time in the CGI subdirectory on this CD.
This AppleScript program establishes a counter that tracks the number of times a page is visited. This kind of tool can provide useful statistics for individuals or organizations curious about how much traffic their pages actually receive. This program has filenames beginning with counter in the CGI subdirectory on this CD.
This C program can distinguish whether the right or left side of a graphic is selected (as well as defining a default to handle when the graphic isn't selected at all). It shows how a script handles the definitions inside an image map file. This program has filenames beginning with ismapper in the compressed CGI files available from the FTP subdirectory on this CD.
Most Web servers are configured to look for CGI programs in a particular directory that's under the server's control. Normal users (including you) probably won't be able to copy CGI scripts into this directory without obtaining help from their friendly neighborhood systems administrator or webmaster. Before you can install and use any CGI program on a particular server, you want to talk to one of these individuals (if not both), tell them what you want to do, and ask for their help and advice in making it happen.
Don't be hurt or surprised if this process takes some time: Systems administrators and webmasters tend to be chronically busy people. You may have to wait a while to get their attention and then discuss your needs with them. Consider the following as you interact with your webmaster:
Make your initial approach to your webmaster by e-mail or a phone call and briefly explain what you want to do. (For example, say something like "I want to include a counter CGI on my home page to track visitors.") Among other things, you may discover that they already have a CGI available for that very purpose on your server, and they'll simply tell you how to include a reference to it in your home page. (You may not have to use our code at all!)
Most webmasters are responsible for the safe and proper operation of the server and/or the Web site on which you want to run a CGI program. Don't be offended if you hear them say, "Well, I need to look it over first to make sure that it's okay before you can use it." They're only trying to make sure that you're not going to introduce software that could cripple or compromise their system. So, don't take it personally when this happens -- webmasters are just doing their jobs, and asking for a review shows you that they care about what happens on their (and your) server, which is actually a good thing!
Finally, the systems administrator or webmaster can show you how to install and use CGIs more quickly and efficiently than if you try to figure it out for yourself. Be ready to wait your turn to get some time with this person and be ready to listen and learn when your turn comes. You won't be disappointed!
As the sample CGI programs in this chapter illustrate, you'll find as many ways to skin the proverbial CGI as there are ideas and approaches about how to implement them. We sincerely hope that you can use the tools that we include on this CD, where you can find C, Perl, and AppleScript versions for all three programs. In Chapter 15 of the book, we extend our coverage of server-side Web activities as we investigate search engines, Webcrawlers, and other interesting server-side services.
If you didn't get a warm fuzzy feeling while reading about CGI, you're not alone. CGI can be difficult, especially if you're not a programmer and don't really want to be. But before you throw out the virtual CGI baby with the cyber-bathwater, let us show you how to cheat at CGI!
The secret is to use a forms-designing application to create your forms and the background CGIs automatically. O'Reilly PolyForm is one great application for this. Answer a few simple questions, click a few buttons, follow the instructions of the semi-intelligent program wizard, and poof! -- out comes a form and matching CGI application.
Find out more about PolyForm by visiting polyform.ora.com. Unfortunately, PolyForm is only available for Windows computers, and we couldn't find any worthwhile Mac or UNIX software that provides similar features.
E-mail: HTML For Dummies
Webmaster: Natanya Pitts, LANWrights
Revised -- January 16, 1998