Web Forms and CGIs: Making Web Pages Interactive

by Chris Savage and Linda Lee

Network Notes #19
ISSN 1201-4338
Information Technology Services
National Library of Canada

December, 1995


Introduction

The World Wide Web (WWW) continues to sustain unparalleled growth in the volume of information published and in its total number of users. The reasons for its growth are complex. End users are drawn to the WWW because it integrates multimedia data types such as text, images, sound and, to a lesser extent, video. Information producers find the WWW appealing because of its immense potential audience and its technological capabilities for publishing. They view the WWW as a synthesis of publishing media, combining the sophisticated layout of print documents and the diversity of multimedia formats with the mass-market distribution of radio and television broadcasting. Yet the WWW is clearly more than a broadcasting technology. It is interactive, permitting bi-directional communication between information publishers and end users. For this reason, it is really both a mass-market broadcasting technology and a communication technology. This issue of Network Notes addresses the interactive aspect of the WWW: forms embedded in WWW documents gather information, and accompanying executable programs, or CGI scripts, process queries and respond to end-user requests.

Defining HTML forms

Forms are interactive, dynamic documents that permit users to enter data, check off preferences, select options, ask questions and provide comments. Printed forms are simple to design and use; blank space is provided for users to add comments or check off options. However, the authors cannot control how users complete the forms except by encouraging a type of response with clearly written instructions. In this respect, electronic forms are different. Electronic forms created in the Hypertext Markup Language (HTML) can compel users to complete the forms in a particular way by refusing alternative submissions. Options can be limited to one from a selection of many or extended to include multiple choices. Space can be reserved for comments to be entered, and field-size limits can be easily defined. Compared to their print counterparts, HTML forms have more powerful functionality, can be simpler to use when they are well designed, and are more responsive to user needs because they return immediate results. Best of all, most of the functionality can be gained by using simple HTML tagging. This means it requires little labour to create powerful, responsive and sophisticated forms in HTML for use on the WWW.

Forms: What good are they?

A popular use of forms is to enable users to search a collection of documents. Search forms can be simple full-text searches on instances of words, or elaborate multi-fielded searches using Boolean operators, limiters, and wild card truncation. Other popular uses of forms are soliciting reference questions, conducting surveys, gathering comments, placing orders, registering for services, subscribing to mail lists, and reporting faults and errors. In libraries, forms can also be used to request acquisitions and interlibrary loans, to check circulation status and borrower information, and to act as an interface with the OPAC.

The cycle of a form action -- from simple document transfer to CGI script

The most common action on the WWW is a simple document transfer:

  1. The client-browser requests a document
  2. The WWW server sends a corresponding packet of HTML code and binary multimedia data
  3. The browser reassembles the data packets, and displays the formatted document.
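
The three steps above can be sketched with Python's standard library. This is only an illustration under stated assumptions: the page content, the `DocumentHandler` class, the `fetch_once` function and the loopback address are all invented for the example, and only an HTML document (no binary multimedia data) is transferred.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical document held by the server.
PAGE = b"<html><body><h1>Network Notes</h1></body></html>"


class DocumentHandler(BaseHTTPRequestHandler):
    """Step 2: answer every GET request with the same small HTML document."""

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(PAGE)))
        self.end_headers()
        self.wfile.write(PAGE)

    def log_message(self, format, *args):
        pass  # keep the example quiet


def fetch_once():
    """Start a throwaway server, request one document, return what arrived."""
    server = HTTPServer(("127.0.0.1", 0), DocumentHandler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        # Step 1: the client requests a document.
        conn = http.client.HTTPConnection("127.0.0.1", server.server_port, timeout=5)
        conn.request("GET", "/index.html")
        response = conn.getresponse()
        # Step 3: the client reads the complete body, ready for display.
        body = response.read()
        conn.close()
        return response.status, response.getheader("Content-Type"), body
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    status, content_type, body = fetch_once()
    print(status, content_type, body.decode("ascii"))
```

A real browser would go on to parse and render the HTML; the sketch stops once the bytes have arrived.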

The complete cycle of a form action is more complex. Forms are HTML documents, written like any Web page with regular HTML code. This means the first half of a form action-cycle is a simple document transfer as the form is retrieved from the server and displayed on the browser. But forms are used to gather information from the user, so unlike ordinary HTML documents, which are static pages of fixed information, forms permit users to input data and resubmit the document to the server. Therefore, the second half of the form action cycle is:

  1. The user enters data into the form
  2. The browser sends the completed form to the server
  3. The server passes the form's contents to an external program
  4. The program processes the data
  5. The program returns a result to the server
  6. The server relays the results to the browser.

Until the completed form is received by the server, the only code the form action uses is HTML and Hypertext Transfer Protocol (HTTP), the underlying WWW server protocol for handling documents. However, once the server receives the completed form, it passes the data for processing to a separate executable program called a Common Gateway Interface (CGI) "script". This script or program either processes the data itself or transfers the data to another program for processing. The significant point here is that the CGI script represents a call to another programming language or network protocol. The CGI script can be extremely powerful and complex, and is as flexible as its programming language. Thus, CGI scripts represent an escape hatch from the limitations of the HTML and HTTP specifications, enabling other types of information systems with more powerful programming languages to connect to the WWW.
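
Steps 2 and 3 of the cycle above, in which the browser sends the completed form and the server hands it to the program, might look roughly like the following Python sketch. The field names, the script path `/cgi-bin/register` and the two helper functions are assumptions made for illustration. The environment variable names are those defined by the CGI convention for a form submitted with the POST method.

```python
from urllib.parse import urlencode


def encode_form(fields):
    """Step 2: encode the form the way a browser does, as name=value
    pairs joined by '&', with spaces and punctuation escaped."""
    return urlencode(fields).encode("ascii")


def cgi_handoff(body, script_path="/cgi-bin/register"):
    """Step 3: environment variables a server sets before starting the
    script for a POST. The script then reads `body` from standard input;
    CONTENT_LENGTH tells it how many bytes to read."""
    return {
        "REQUEST_METHOD": "POST",
        "SCRIPT_NAME": script_path,  # hypothetical script location
        "CONTENT_TYPE": "application/x-www-form-urlencoded",
        "CONTENT_LENGTH": str(len(body)),
    }


# Hypothetical form contents entered by the user.
body = encode_form({"NAME": "Ralph", "COMMENTS": "Great site!"})
environ = cgi_handoff(body)

print(body.decode("ascii"))
for key, value in environ.items():
    print("%s=%s" % (key, value))
```

Everything up to this point is plain HTTP; what the program does with the encoded pairs is entirely up to its author.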

To each form its CGI script

It is critical to note that without a CGI script an HTML form is useless. The WWW server using HTTP alone cannot understand the contents of the form. Therefore a CGI script is required to supplement the limited capabilities of HTTP and interpret the form's contents. However, a specific CGI script must be used because each is specifically created for a particular form. This is because the form contains variable and value fields, for example, the variable "NAME:" and the value "Ralph". The CGI script collects the variable and matching value fields, then processes the data in a predetermined way, such as registering the user "Ralph" in a database. Depending on the purpose of the CGI script, the server may or may not return a result to the browser. In most cases, a message is sent to notify the browser that the form was received and processed. Using the given example, the CGI script may be designed to collect the data, write each field into a database of subscribers, create a new HTML document with the value of the form's variable "NAME:" inserted into a string of text, and then send the document back to the user. For instance, after updating the database, the CGI script may generate a document called "register.html" that the browser displays as "Hello Ralph, you are now registered in our records."
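
The "Ralph" example can be sketched as a small CGI-style program in Python. This is a sketch under assumptions, not a production script: the field is assumed to be named `NAME`, an in-memory list stands in for the database of subscribers, and `run_cgi` is a hypothetical function that takes the environment and input/output streams as arguments so it can be tried without a Web server.

```python
import html
import io
from urllib.parse import parse_qs

subscribers = []  # stand-in for the database of subscribers


def run_cgi(environ, stdin, stdout):
    """Read the form, record the NAME value, and write an HTML reply."""
    if environ.get("REQUEST_METHOD") == "POST":
        # Form data arrives on standard input. The example assumes ASCII
        # data, so characters and bytes coincide.
        length = int(environ.get("CONTENT_LENGTH") or 0)
        raw = stdin.read(length)
    else:
        # A form submitted with GET arrives in the query string instead.
        raw = environ.get("QUERY_STRING", "")

    fields = parse_qs(raw)  # {"NAME": ["Ralph"], ...}
    name = fields.get("NAME", [""])[0].strip()

    if name:
        subscribers.append(name)
        # Escape the value so user input cannot inject HTML into the reply.
        message = ("Hello %s, you are now registered in our records."
                   % html.escape(name))
    else:
        message = "Please go back and fill in the NAME field."

    # A CGI reply is a header block, a blank line, then the document.
    stdout.write("Content-Type: text/html\r\n\r\n")
    stdout.write("<html><body><p>%s</p></body></html>\n" % message)


out = io.StringIO()
run_cgi({"REQUEST_METHOD": "POST", "CONTENT_LENGTH": "10"},
        io.StringIO("NAME=Ralph"), out)
print(out.getvalue())
```

Under a real server the same function would be called with `os.environ`, `sys.stdin` and `sys.stdout`.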

Creating forms and CGI scripts

Creating HTML forms is relatively simple; however, developing CGI scripts can be a laborious endeavour. The most difficult part of creating HTML forms is selecting the content. If the forms already exist in printed format, the content selection process is significantly simplified. For this reason, it is wise to adapt well-conceived, test-proven printed forms to the electronic format whenever possible. Because of the distinct properties of the two media, there will be some differences in their utility and success, but, in most cases, printed forms function well in electronic format.

Once the content of a form is decided, the actual HTML coding is reasonably straightforward. Numerous HTML instruction books and documents on the WWW assist with the HTML markup of forms. But, before the form is marked up in HTML, the accompanying CGI script should be sketched out. The form and CGI script are interdependent; the CGI expects to receive values associated with the form's variable names and, in turn, execute an appropriate command. Therefore, the form's design will affect the CGI script and vice versa. Although creating a form is simple, writing the CGI script is more complex. Designing CGI scripts crosses over into the domain of computer programming and brings with it the related responsibilities of testing, debugging and dissecting security holes that accompany every programming venture. If the CGI developer is not a skilled programmer, it is advisable to borrow test-proven CGI scripts from public archive sites rather than develop a custom CGI script. Many standard forms, such as simple search forms, requests for feedback, reference questions and online registration are widely distributed with matching CGI scripts. Frequently, these form/CGI script pairs can be implemented with little or no editing, sparing developers the frustration involved in developing a custom HTML form and matching CGI script. Yet, there are some advantages to developing custom CGI scripts. The most significant is that they can be designed to query or input data directly into an existing database, using the established fields and formulae. Custom CGI scripts can also be written to comply with site-specific security procedures and operate more efficiently with system resources.
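
Because the form and the script must agree on variable names, one way to catch mismatches early is to compare the names in the form's markup with the names the script reads. The Python sketch below assumes a hypothetical registration form and a hypothetical set of expected names; neither comes from a real site.

```python
from html.parser import HTMLParser

# Hypothetical form markup.
FORM = """
<form method="POST" action="/cgi-bin/register">
  Name: <input type="text" name="NAME" size="30" maxlength="60">
  <select name="TOPIC">
    <option>Acquisitions</option>
    <option>Interlibrary loans</option>
  </select>
  <textarea name="COMMENTS" rows="4" cols="40"></textarea>
  <input type="submit" value="Send">
</form>
"""

# Names the hypothetical CGI script expects to receive.
EXPECTED_BY_SCRIPT = {"NAME", "TOPIC", "COMMENTS"}


class FieldCollector(HTMLParser):
    """Collect the name attribute of every form control in a document."""

    def __init__(self):
        super().__init__()
        self.names = set()

    def handle_starttag(self, tag, attrs):
        if tag in ("input", "select", "textarea"):
            name = dict(attrs).get("name")
            if name:  # the submit button has no name and sends nothing
                self.names.add(name)


def compare(form_html, expected):
    """Return (names the script wants but the form lacks,
    names the form sends but the script ignores)."""
    collector = FieldCollector()
    collector.feed(form_html)
    return expected - collector.names, collector.names - expected


missing, unused = compare(FORM, EXPECTED_BY_SCRIPT)
print("missing from form:", sorted(missing))
print("ignored by script:", sorted(unused))
```

Running such a check whenever either the form or the script changes keeps the pair in step.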

The CGI specification permits a script to be written in any programming language, provided the host system can execute it. CGI scripts can be written in compiled languages such as C, C++, Pascal, Visual Basic, Fortran, and interpreted languages such as Perl, AWK, sed, TCL, DOS batch, or Unix Bourne shell. The first decision a CGI developer will make is whether to use a compiled or interpreted programming language. How do these differ? The program source code written in a compiled language must be fully compiled, or converted, into the operating system's native language before it can be run. Once compiled, two versions of the same program exist: the source code that is useless by itself, and the compiled program that executes directly in the operating system. Compiled programs are intricately tied to the operating system; thus, the same source code must be compiled separately for each type of platform. Scripts written in interpreted languages, however, can be used across multiple platforms, as long as the computer has an interpreter to convert the script line-by-line into the operating system's native language. Unlike compiled programs, the interpreted script is never saved as an executable program. It is interpreted into the native language of the operating system each time it is run. This extra translation step causes interpreted scripts to execute more slowly than compiled programs. Yet, balanced against this detraction, interpreted languages are simpler to learn than compiled languages and interpreted scripts are easier to debug than compiled programs. Also, a script can be stopped at any time, edited and retested with little effort. For these reasons, the current trend in CGI programming is to use interpreted languages; of these, the most popular is Perl. Freely available, Perl runs in the Unix environment (still the dominant operating system for WWW servers), but has also been ported to Windows NT, DOS, VMS and Mac environments. It handles strings of text particularly well and borrows many of the strengths of C and Bourne shell. Several books and documents on the WWW are devoted to teaching Perl, as well as the newsgroup comp.lang.perl for discussing Perl-related issues.

Words of caution

There is some concern about the number of WWW browsers that cannot display and process HTML forms. The most popular and advanced browsers, such as Netscape and Mosaic, support forms, but older versions and some text-based browsers do not. Just as publishers cannot survive without knowing the needs and capabilities of their audiences, WWW developers need to know the technological profiles of their end users. If most of the target audience is using old text-based browsers with slow connections, alternative strategies need to be developed to ensure that these users receive comparable levels of service. One solution is to provide e-mail addresses for users to send in comments, rather than use a comments request form via a CGI script.

Other concerns regarding the use of CGIs are: system resource allocation (scripts use the server's CPU, so running too many CGIs at once can decrease the server's performance), threats to system security, data encryption and user authentication. Consequently, CGI scripts need to be thoroughly evaluated and tested before they are implemented, and policies for acceptable uses should be established.
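
As one illustration of the kind of checking a script can do before it acts on form data, the Python sketch below caps the size of a submission and accepts only a narrow set of characters in a name field. The 4096-byte ceiling, the permitted character set and both function names are assumptions chosen for the example; an actual site policy would set its own limits.

```python
import re

MAX_BODY_BYTES = 4096  # assumed ceiling on the size of one submission

# Assumed rule: a name starts with a letter and may contain letters,
# spaces, periods, apostrophes and hyphens, up to 60 characters in all.
NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z .'-]{0,59}")


def body_size_ok(environ):
    """Reject submissions whose declared size is missing, malformed or too large."""
    try:
        length = int(environ.get("CONTENT_LENGTH", "0"))
    except ValueError:
        return False
    return 0 <= length <= MAX_BODY_BYTES


def clean_name(value):
    """Return the trimmed name if every character is permitted, else None.

    Checking against an allowed set, rather than hunting for known bad
    characters, keeps shell and markup metacharacters out of anything
    the script later builds from the value."""
    value = value.strip()
    if NAME_PATTERN.fullmatch(value):
        return value
    return None


print(body_size_ok({"CONTENT_LENGTH": "33"}))
print(clean_name(" Ralph "))
print(clean_name("Ralph; rm -rf /"))
```

Checks like these address only part of the concerns listed above; encryption and user authentication need separate measures.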

Providing forms for users to search a collection of documents is becoming commonplace. Forms can assist users in locating specific information; however, this should not diminish the importance of developing a logical, browsable web structure. People have unique information searching techniques. While many prefer to use indexes and search forms, others would rather browse the entire collection. Using forms to query a search engine enhances subject access, but it does not satisfy the various information searching behaviours. CGI forms should be considered a convenient rather than an exclusive pathway to finding information on the Web.

Related sources for forms, CGI programming, and Perl

There is much current information about forms and CGI programming, both on the Internet and in bookstores. The following are some suggested starting points:

Forms and CGI

HTML & CGI Unleashed (1995) by John December, Mark Ginsburg, and other contributors.

Indianapolis, IN: Sams.Net Publishing. 830 pages + CD-ROM, $61.95 CAN.

This book provides an in-depth coverage of HTML markup and CGI programming with a special emphasis on Perl.

The Common Gateway Interface -- http://hoohoo.ncsa.uiuc.edu/cgi/overview.html

NCSA's documentation on CGI. It includes an overview of fill-out forms.

CGI tutorial -- http://agora.leeds.ac.uk/nik/Cgi/start.html

Excellent introduction and tutorial on CGI. Companion to the Perl tutorial. (See below)

Perl

Learning Perl (November 1993) by Randal L. Schwartz, Foreword by Larry Wall.

Sebastopol, CA: O'Reilly & Associates, Inc. 274 pages, $24.95 US.

Good introductory, hands-on tutorial designed to help the reader write useful Perl scripts as quickly as possible.

Teach Yourself Perl in 21 Days (November 1994) by David Till.

Indianapolis, IN: Sams Publishing. 700 pages, $29.99 US.

This tutorial and reference book starts with the Perl basics and progresses to advanced features. Comprehensive in scope, it is directed to an audience who need not have had previous programming experience.

Introduction to Perl -- http://www.khoros.unm.edu/staff/neilb/perl/introduction/home.html

Perl tutorial -- http://agora.leeds.ac.uk/nik/Perl/start.html

Excellent Perl tutorial that focuses on CGI script creation. Companion to the CGI tutorial. (See above)

CGI Archives

perlWWW -- http://www.oac.uci.edu/indiv/ehood/perlWWW/

An index of Perl programs and libraries related to the World Wide Web.

Perl scripts area -- http://www.metronet.com/perlinfo/scripts/

Indexed collection of Perl scripts.

C library for CGI programming -- http://sunsite.unc.edu/boutell/cgic/cgic.html


Canada Copyright. The National Library of Canada. (Revised: 1997-07-30).