Chapter 4 -- Understanding Basic CGI Elements
by Bill Schongar
CONTENTS
CGI Behind the Scenes
Environment Variables: Information for the Taking
Server-Specific Environment Variables
Request-Specific Environment Variables
Client-Specific Environment Variables
Scripts to Check Environment Variables
Dealing with URL-Encoded Information
Encoding
Decoding (Parsing) Routines
Use Your Header
Non-Parsed Headers
Content-Type Header
Location Header
Status Header
Returning Output to the Users
STDOUT
File-Based Output
Using CGI programs is somewhat like ordering a pizza and having
it delivered: you call, someone makes it, and then someone sends
the pizza to your place. With CGI, you send a request, the server
processes it, and you get back the results. The whole goal is
that someone (or something) else is supposed to take care of processing
the information that you send: Do you want extra cheese? Pepperoni
and/or sausage? Anchovies? All the instructions and conditions
you send have to be considered as part of the whole operation;
otherwise, you have no use for what gets delivered to you.
Whether giving instructions to a pizza place or sending a registration
form through CGI, the process is the same: You initiate a conversation
to tell someone, or something, what you want done. The main difference,
however, is that the pizza place normally doesn't keep you on
hold while someone makes your pizza.
The information you send as part of your request to whomever (or
whatever) processes it determines the output. To make sure that
you're understood, you have to communicate clearly and pass on
information that makes sense to the receiving end. The basic elements
of CGI that hold the information and keep track of what format
it's in are available to help you with that process. In Chapter
3, "Designing CGI Applications," you learned how to
plan your application; the chapter also introduced you to some
of the basic CGI elements involved in that planning process. In
this chapter, you look at some more specifics of those elements
and a few others, including
Understanding environment variables
Retrieving data from environment variables
Parsing information
Formatting for output
CGI Behind the Scenes
The Common Gateway Interface, or CGI, is really nothing more than
a standard communication method that makes sure that information
between the client and the server gets sent in an understandable
manner. Imagine that everyone in the world, regardless of language,
used a standard form. It could be a form used for job applicants,
a vacation request, a pizza order, or a grocery list; the actual
purpose wouldn't matter. What would matter is that anyone who
looked at that form would recognize it and could understand what
data was contained on it. You wouldn't have to be able to pick
out the word name in 35 different languages to be able
to find another person's name on that form. Although the language
would vary, you would know that the name goes in a specific box,
and you could pick out that box. If the form had a common format,
language wouldn't be as much of a barrier.
Virtually Satisfy Your Appetite
Often, the first implementations of a neat or useful server technology come in a funny form. Knowing that the "average" programmer lives on pizza and soda, is it any surprise that CGI is alive and well both for delivering food and for checking whether the machine downstairs has any cold sodas left?
Although it won't curb your appetite, you can see one of the earliest (and still coolest) hybrids of CGI and pizza at http://www.ecst.csuchico.edu/~pizza/, home of the original Internet Pizza Server.
If it makes you hungry, don't worry: it has links to places such as Little Caesars Virtual Pizza Page, where you not only can see the pizza, but get it delivered by the closest Little Caesars franchise.
If you need a drink to go with your pizza and live somewhere near Rochester, New York, check out the Coke machine(s) in the Computer Science House at the Rochester Institute of Technology. Maybe you can convince a resident to buy you a drink over the modem. Find out how (and why) at http://www.csh.rit.edu/proj/drink.html.
In the case of CGI, the common format is outlined by a sequence
of steps: the server receives a request, the script is executed,
the script reads in the data, the script processes the data, and
the script sends back output. At each step, elements that have
been set (or can be set) by that particular step are in use.
The first step, when a request is sent through CGI, involves the
server doing all the front-end work in gathering data for you.
This step takes care of two things at once:
It puts the information into predefined holding areas.
It formats the data.
All you have to do is look in the storage areas, pick and choose
what you want, and use it in your program. First, you need to
know what information is stored where, and at that point, you
encounter environment variables.
Environment Variables: Information for the Taking
When the Common Gateway Interface gathers information for you,
the amount of information it gathers is extensive: not only the
information that's directly related to your application, but information
about the current state of the session environment, such as who's
executing the program, where they're doing it from, and how they're
doing it. In fact, more than a dozen distinct pieces of environment
information are available every time a CGI application executes.
To store all this information, the CGI functions of the server
place it into the environment variables of the process that runs
your script, where your program (and anything it launches) can
read it. Just like you might have a PATH
or a HOME environment variable, now you have environment
variables telling you what script is being executed and from where
it's being executed.
So what gets set and why? Each one of the many pieces of information
has its own purpose, and it may or may not be used by your application.
So to make sure that the server doesn't skip anything that might
be of use, it records all the information it can get its hands
on. You've already been briefly introduced to a variety of environment
variables in the previous chapter, but they can be broken down
further so that you can look at what part of the process they
assist in. Three distinct sets of environment variables exist,
if grouped by purpose. The first one of these groups is called
server-specific variables.
Server-Specific Environment Variables
When it records information, the server starts with itself. Server-specific
variables, summarized in table 4.1, record information such as
the port the server is running on, the name of the server software,
the protocol being used to process requests, and the version of
the CGI specification the server conforms to.
Table 4.1 Server-Specific Environment Variables
Variable           Purpose
GATEWAY_INTERFACE  CGI version that the server complies with. Example: CGI/1.1.
SERVER_NAME        Server's IP address or host name. Example: www.yourhost.com.
SERVER_PORT        Port on the server that received the HTTP request. Example: 80 (most servers).
SERVER_PROTOCOL    Name and version of the protocol being used by the server to process requests. Example: HTTP/1.0.
SERVER_SOFTWARE    Name (and, normally, version) of the server software being run. Example: Purveyor / v1.1 Windows NT.
In general, the information provided by the server-specific environment
variables isn't going to be of much use to your application because
it almost always is the same. The real exception to the rule takes
place when you have a script that can be accessed by multiple
servers or by a server that supports virtual addressing (one server
responding to multiple IP addresses). For instance, if your server
is of a commercial nature, you might have one machine running
several virtual servers on different IP addresses to provide for
a unique server name for each customer.
Each server-specific variable was outlined in Chapter 3,
"Designing CGI Applications," but for reference, the following
is part of the output of a Perl script, echo.pl, that simply
echoes the content of the environment variables in the order you
examine them in this chapter. In this case, the output shows the
variables as they appear when echo.pl is run on a Windows NT
system. You'll see the code for echo.pl a little later in the
chapter, after you examine the different variables.
Gateway Interface:CGI/1.1
Server Protocol:HTTP/1.0
Server Name:bills.aimtech.com
Server Port:80
Server Software:Purveyor / v1.1 Windows NT
After the server has the chance to describe itself to your program,
it moves on to the meat of the information: the components directly
related to the user's request.
Request-Specific Environment Variables
Unlike the information about the server, which rarely changes,
the information for each request is dynamic, varying not only
by which script is called but also by data sent and the user who
sent it. At one point or another, all this information may be
of use to a script you write, but three basic environment variables
are always important to any script: REQUEST_METHOD, CONTENT_LENGTH,
and QUERY_STRING. The latter two are used in different
situations:
CONTENT_LENGTH is useful to POST requests
for determining input size.
QUERY_STRING is the data passed when a GET
request is used.
The combination of variables tells you how the request was sent,
determines how much information was available, and can provide
you with the information itself. Unless your script accepts no
input, you'll be using these three variables quite a bit. Table
4.2 outlines these variables, as well as the other request-specific
environment variables.
Table 4.2 Request-Specific Environment Variables
Variable         Purpose
AUTH_TYPE        Authentication scheme used by the server (NULL if no authentication is present). Example: Basic.
CONTENT_FILE     File used to pass data to a CGI program (Windows HTTPd/WinCGI only). Example: c:\temp\324513.dat.
CONTENT_LENGTH   Number of bytes passed to standard input (STDIN) as content from a POST request. Example: 9.
CONTENT_TYPE     Type of data being sent to the server. Example: text/plain.
OUTPUT_FILE      File name to be used as the location for expected output (Windows HTTPd/WinCGI only). Example: c:\temp\132984.dat.
PATH_INFO        Additional relative path information passed to the server after the script name, but before any query data. Example: /scripts/forms.
PATH_TRANSLATED  Same information as PATH_INFO, but with virtual paths translated into absolute directory information. Example: /users/webserver/scripts/forms.
QUERY_STRING     Data passed as part of the URL, made up of anything after the ? in the URL. Example: part1=hi&part2=there.
REMOTE_ADDR      End user's IP address. Example: 127.0.0.1.
REMOTE_USER      User name, if authorization was used. Example: jen.
REQUEST_LINE     The full HTTP request line provided to the server. (Availability varies by server.) Example: GET /ssi2.htm HTTP/1.0.
REQUEST_METHOD   Specifies whether data for the HTTP request was sent as part of the URL (GET) or directly to STDIN (POST).
SCRIPT_NAME      Name of the CGI script being run. Example: echo.cgi.
Of all these variables, REQUEST_METHOD, QUERY_STRING,
CONTENT_LENGTH, and PATH_INFO are the most commonly
used environment variables. They determine how you get your information,
what it is, and where to get it, and they pass on locations that
may be needed for processing that data. The following sections
look at them in rough order of how often they're
used.
REQUEST_METHOD
When you try to determine how data has been sent to your application,
the method of the request is the first thing you need to identify.
If you're using a form, you can choose which data-sending method
is used; if you're using a direct link such as
<a href="/scripts/myscript.pl?data">, your script is invoked
with the GET method.
Identifying REQUEST_METHOD is necessary for any application
except one type: a program that requires no input. If your application
is a random-link generator or a link to output a dynamically generated
file that doesn't depend on what the user inputs, you don't need
to know whether it was sent via GET or POST
because your program doesn't require any input. It might want
to read the other environment variables, but no input data exists
to be parsed, just a semi-fixed output; the end result doesn't
depend on any data from users, just their action of executing
it.
Assuming that your CGI application is like many, though, getting
the data from the link or user is the next thing on your list
of processes. Then you need either QUERY_STRING or CONTENT_LENGTH.
NOTE
Other possible selections are available for the REQUEST_METHOD value besides just GET and POST, including DELETE, HEAD, LINK, and UNLINK. The use of these other values isn't as common, but in
case you do encounter them, you'll want to provide a fall-back case for dealing with these other methods, as discussed in Chapter 3 "Designing CGI Applications."
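A fall-back case of this kind can be sketched in a few lines of sh, the shell used for Listing 4.2 later in this chapter. Treat it only as an illustration: the function name describe_method and the messages it prints are made up for this example, and a real script would branch to its own input-handling code rather than print a description.

```shell
#!/bin/sh
# Hypothetical helper: report where a script should look for its input,
# based on REQUEST_METHOD. Unknown or missing methods hit the default case.
describe_method() {
    case "${REQUEST_METHOD:-}" in
        GET)  echo "read QUERY_STRING" ;;
        POST) echo "read CONTENT_LENGTH bytes from STDIN" ;;
        *)    echo "unsupported method: ${REQUEST_METHOD:-none}" ;;
    esac
}
```

Calling describe_method with REQUEST_METHOD set to HEAD, or not set at all, falls through to the last branch, which is the natural place to return an error to the user instead of guessing.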
QUERY_STRING
The data that's passed when using the GET method is normally
designed to be somewhat limited in size, because QUERY_STRING
holds all of it in the environment space of the server. When your
application receives this data, it comes URL encoded. That means
it's in the form of ordered pairs of information, with an equal
sign (=) tying together two elements of a pair, an ampersand (&)
tying pairs together, a plus (+) sign taking the place of spaces,
and special characters that have been encoded as hexadecimal values.
A sample from a form with multiple named elements might produce
a full request that looks like this:
http://server.host.com/script.pl?field1=data1&field2=data2+more+data+from+field2&field3=data3
The part that comprises QUERY_STRING is automatically
chopped to include only that information after the question mark
(?). So, for that request, the QUERY_STRING
would be as follows:
field1=data1&field2=data2+more+data+from+field2&field3=data3
Interpreting this URL-encoded information is easy and just requires
a parsing routine in your script to break up these pairs. In "Dealing
with URL-Encoded Information" later in this chapter, you'll
see how parsing can be done easily, and become a little more familiar
with URL encoding.
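To get a feel for the pair structure before any decoding happens, here is a minimal sh sketch that breaks a query string at the & and = signs. The name split_pairs and the "name -> value" output format are inventions for this example, not part of any library.

```shell
#!/bin/sh
# Hypothetical helper: print each name/value pair of a query string on its
# own line as "name -> value". The values are left URL encoded; turning the
# + and %XX sequences back into text is a separate step.
# The body runs in a subshell, ( ... ), so the IFS and globbing changes
# don't leak into the rest of the script.
split_pairs() (
    IFS='&'                     # pairs are tied together with &
    set -f                      # keep * and ? in the data from matching files
    for pair in $1; do
        [ -n "$pair" ] || continue
        case "$pair" in
            *=*) name=${pair%%=*}       # text before the first =
                 value=${pair#*=} ;;    # text after the first =
            *)   name=$pair             # a name with no = has no value
                 value= ;;
        esac
        printf '%s -> %s\n' "$name" "$value"
    done
)
```

In a GET script you would call it as split_pairs "$QUERY_STRING". For the sample request above, it prints one line each for field1, field2, and field3, with the plus signs in field2 still in place.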
CONTENT_LENGTH
When the POST method is used, CONTENT_LENGTH
is set to the number of URL-encoded bytes being sent to the standard
input (STDIN) stream. This value is essential to your application
because the server isn't required to send an end of file (EOF)
at the end of the input stream. If you were to look for EOF
in your script, you could just continue to wait, never knowing
when you were supposed to stop processing, unless you put other
checks in place. If you
use CONTENT_LENGTH, an application can loop until the
number of bytes has been read and then stop gracefully. The formatting
that will be read from the STDIN block follows the same URL-encoding
methods of ordered pairs and character replacement as QUERY_STRING
and can be parsed the same way.
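Reading a fixed number of bytes can be sketched in sh with the standard dd utility. The function name read_post_body is made up for this example, and the sketch assumes that a missing or non-numeric CONTENT_LENGTH should be treated as "no input."

```shell
#!/bin/sh
# Hypothetical helper: copy exactly CONTENT_LENGTH bytes from STDIN to STDOUT
# and then stop, instead of waiting for an end of file that may never come.
read_post_body() {
    case "${CONTENT_LENGTH:-}" in
        ''|*[!0-9]*) return 1 ;;    # missing or not a number: read nothing
    esac
    # bs=1 reads one byte at a time, so count is an exact byte count
    dd bs=1 count="$CONTENT_LENGTH" 2>/dev/null
}
```

A POST script could capture the data with body=$(read_post_body) and then hand it to the same parsing routine used for QUERY_STRING.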
NOTE
When considering what method (GET or POST) is best suited to your application, consider the amount of data being passed. GET relies on passing all data through QUERY_STRING and thus can be limited in size. For large
amounts of data, the STDIN buffer has a virtually unlimited capacity and makes a much better choice.
PATH_INFO
Another thing that you can include in the URL sent to the server
is path information. If you place this data after the script but
before the query string, your application can use this additional
information to access files in alternate locations.
For instance, if you have a script that might need to search in
either /docs/november or /docs/december, you can pass in the different
paths, and the server automatically knows the location of these
files relative to the root data directory for your server. So
if you use the URL http://www.xyz.com/scripts/search.cgi/docs/december?value=abc,
the PATH_INFO would be /docs/december. The companion
variable PATH_TRANSLATED can give you the actual path
to the files based on PATH_INFO, instead of just the
relative path. So /docs/december might translate on your server
as /users/webserver/marketing/docs/december. Using this variable
saves you the work of having to figure out the path for yourself.
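As a small illustration in sh, a search script might use PATH_TRANSLATED when the URL carried extra path information and fall back to a default directory when it didn't. The function name pick_search_dir and the idea of a caller-supplied default are assumptions made for this sketch.

```shell
#!/bin/sh
# Hypothetical helper: decide which directory a search script should look in.
# $1 is a default directory, used when the URL carried no extra path.
pick_search_dir() {
    if [ -n "${PATH_TRANSLATED:-}" ]; then
        printf '%s\n' "$PATH_TRANSLATED"    # the server already did the mapping
    else
        printf '%s\n' "$1"
    fi
}
```

With the example URL above, pick_search_dir would print whatever absolute path your server reports for /docs/december.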
Other Variables
In addition to the primary variables, some other data could come
in quite handy in your application. Looking at each individual
environment variable is a good idea because you'll become familiar
with just what purpose the variable is designed for, as well as
what other purpose you could find for it.
You'll automatically know where a user is calling you from because
REMOTE_ADDR provides his or her IP address. In case your
script forgot, you can see what its name is (by using SCRIPT_NAME).
Path information can be passed to your program to reference data
files in alternate locations, and you can see the full request line
that led someone to the script (by using REQUEST_LINE). Whether
you use the information is up to you, but it's there for the taking.
Client-Specific Environment Variables
Last but not least is information that comes from the software
from which the user accessed the script. To identify these pieces
of information uniquely, the variables are all prefixed with HTTP_.
This information gives you background details about the type of
software the user used, where he or she accessed it, and so on.
Table 4.3 shows three of the most commonly used client-specific
variables: HTTP_ACCEPT, HTTP_REFERER, and HTTP_USER_AGENT.
Table 4.3 Common Client-Specific (HTTP_) Environment Variables
HTTP_ Variable  Purpose
ACCEPT          Lists the kinds of data (MIME types) that the client accepts in response to this request
REFERER         Identifies the URL of the document that gave the link to the current document
USER_AGENT      Identifies the client software, normally including version information
The formats of these HTTP header variables look like the following:
HTTP_ACCEPT:*/*,image/gif,image/x-xbitmap
HTTP_REFERER:http://server.host.com/previous.html
HTTP_USER_AGENT:Mozilla/1.1N (Windows, I 32-bit)
These variables open up some interesting possibilities. For instance,
certain browsers support special formatting (tables, backgrounds,
and so on) that you might want to take advantage of to make your
output look its best. You can use the HTTP_USER_AGENT
value, for example, to determine whether your script has been
accessed using one of those browsers, and modify the output accordingly.
However, because some browsers accessing your script may not set
the HTTP_USER_AGENT field to a value you're expecting,
make sure that you include a default case that will apply if you
can't isolate what type of browser is being used.
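The default case might look like the following sh sketch. The function name pick_layout is made up, and treating every agent that starts with Mozilla/ as capable of the enhanced formatting is an assumption made purely for illustration; you would substitute whatever browsers you have actually checked.

```shell
#!/bin/sh
# Hypothetical helper: choose an output style from HTTP_USER_AGENT.
# Anything unrecognized, including a missing value, gets the plain layout.
pick_layout() {
    case "${HTTP_USER_AGENT:-}" in
        Mozilla/*) echo "enhanced" ;;   # assumed to handle tables, backgrounds
        *)         echo "plain" ;;      # the default case
    esac
}
```

Because the last pattern matches everything, a browser that sends an unexpected HTTP_USER_AGENT (or none at all) still gets usable output.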
In addition to the variables listed in table 4.3 are other HTTP
environment variables, but you're much less likely to run into
browsers that set these fields with any regularity until newer
browsers integrate them and people then migrate to the newer browsers.
For reference, though, table 4.4 shows some other client-specific
environment variables that you may want to examine.
Table 4.4 Additional Client-Specific (HTTP_) Environment Variables
HTTP_ Variable     Purpose
ACCEPT_ENCODING    Lists what types of encoding schemes are supported by the client
ACCEPT_LANGUAGE    Identifies the ISO code for the language that the client is looking to receive
AUTHORIZATION      Identifies verified users
CHARGE_TO          Sets up automatic billing (for future use)
FROM               Lists the client's e-mail address
IF_MODIFIED_SINCE  Accompanies GET request to return data only if the document is newer than the date specified
PRAGMA             Sets up server directives or proxies for future use
NOTE
Not every browser fills out the same HTTP_ variables. If you make your application dependent on any, you can run into problems. Be sure to verify support of HTTP_ environment variables for the browsers you're concerned about.
If you want to know for sure whether a specific browser sets certain
HTTP_ headers (because new versions and new browsers
are always released), you can find out in two ways.
First, you can look at the survey of browsers located at http://www.halcyon.com/htbin/browser-survey.
This list shows a large number of browsers, ordered by name and
version, with an output page for each that shows the headers that
they send.
If you have some browser that didn't make it onto that list, or
you want to make your own survey, you need to write a script that
checks the HTTP_ headers you're interested in. The script
itself doesn't have to be complex; it just echoes back the environment
variables that you're interested in. You then access that script
with the browsers you want to check.
The next section provides two short examples for performing these
checks: one in Perl (version 4) and one as a UNIX sh script.
You also can use these scripts to check any environment variable
you're interested in just by changing the variable names that
are used.
Scripts to Check Environment Variables
Without too much work, you can write your own simple scripts to
check for the existence of specific environment variables. Because
all the environment variables are read into a program in the same
way, it doesn't matter whether you're checking for a server-specific
variable, a request-set variable, or even a client-set variable; the
methodology is the same.
The scripts in listings 4.1 and 4.2 are simple cases for checking
whatever variables you're interested in. They demonstrate how
similar the functions are in two different scripting languages.
Listing 4.1 Checking Variables with a Perl 4 Script
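#!/bin/perl
# A generic environment variable checker
# The header must end with a truly blank line: no spaces around the \n's
print "Content-Type: text/plain\n\n";
# Elements of %ENV are looked up with braces, not parentheses
print "Browser Software: $ENV{'HTTP_USER_AGENT'}\n";
print "\n";
print "Originating Page: $ENV{'HTTP_REFERER'}\n";
#... and so on...
print "\n";
# Or cycle through every variable, sorted by name
# (plain lines rather than HTML tags, to match text/plain)
foreach $var (sort keys %ENV) {
    print STDOUT "$var: $ENV{$var}\n";
}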
TIP
If you want to see all the environment variables, you can let Perl cycle through them for you, rather than have to identify each one uniquely. Not only is the code smaller, but you don't run the risk of mistyping or forgetting a variable you may have
wanted to know about.
Listing 4.2 Checking Variables with a sh Script
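#!/bin/sh
# The output below is plain text, so say so in the header
echo "Content-Type: text/plain"
echo
# Quote the values so that characters such as * aren't expanded by the shell
echo "Browser Software: $HTTP_USER_AGENT"
echo "Originating Page: $HTTP_REFERER"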
Dealing with URL-Encoded Information
After you find all the data you want and are ready to let your
program do some interpretation and processing, you need to take
the information and break it up into manageable parts first. To
do that, you need to know how the data is formatted.
Encoding
As you learned previously, data is formatted in ordered pairs,
regardless of where it goes: QUERY_STRING or STDIN.
The benefit is that this pairing and replacement, called URL
encoding, allows you to use a common routine to evaluate this
data regardless of the method. All you have to be aware of are
the reserved characters that are used as part of URL encoding
and the format that's used to pass values representing those reserved
characters for literal use (see table 4.5).
Table 4.5 Reserved Characters Used in Encoding
Character  Name          Purpose
+          Plus sign     Takes the place of spaces in the data
=          Equal sign    Joins named fields and their values
&          Ampersand     Strings together joined pairs
%          Percent sign  Denotes hexadecimal value to follow
Suppose that you want to send a plus sign as part of the data.
Sending the literal character, like any reserved character, is
out of the question. Instead, you send the hexadecimal value (the
reason for having the % sign as a reserved value). To
be passed correctly, a hexadecimal value is always formatted as
%XX, where XX represents the hexadecimal value
of a specific ASCII character. For instance, the value for the
plus sign is passed as %2b. In the parsing routine, you
need to check for the % sign and its two following digits
and then use the functionality of your scripting language to change
it back into a literal value for use.
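The check for the % sign and its two following digits can be sketched even in plain sh. This is an illustration rather than a production routine: the function name url_decode is made up, the variables it uses are global (plain sh has no local variables), and it handles one value at a time, so the pairs are assumed to have been split apart already.

```shell
#!/bin/sh
# Hypothetical helper: decode one URL-encoded value.
#   +    becomes a space
#   %XX  becomes the character whose hexadecimal value is XX
# Anything else, including a stray % that isn't followed by two hex digits,
# is copied through unchanged.
url_decode() {
    rest=$1
    out=
    while [ -n "$rest" ]; do
        case "$rest" in
            +*)
                out="$out "
                rest=${rest#?} ;;
            %[0-9A-Fa-f][0-9A-Fa-f]*)
                hex=${rest#?}                   # drop the % sign
                tail=${hex#??}                  # whatever follows the two digits
                hex=${hex%"$tail"}              # keep only the two hex digits
                oct=$(printf '%03o' "0x$hex")   # printf's %b escape wants octal
                char=$(printf '%bx' "\\0$oct")  # the x protects a decoded newline
                out="$out${char%x}"
                rest=$tail ;;
            *)
                tail=${rest#?}
                first=${rest%"$tail"}           # the first character on its own
                out="$out$first"
                rest=$tail ;;
        esac
    done
    printf '%s\n' "$out"
}
```

For example, url_decode 'data2+more+data+from+field2' turns the plus signs back into spaces, and a %2b in the input comes back as a literal plus sign.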
Now before you worry about just how you're expected to deal with
all this on the client side, you should know one thing: It's done
automatically; you don't have to do a thing. The only possible
exception is if you're creating an explicit link to a CGI program,
such as the following:
<a href="/scripts/myscript.cgi?value1=abcde&value2=more+info">
Here, because you're setting up exactly what gets passed, you
have to do the formatting yourself. This isn't too common, but
occasionally you may want to use it for a dynamic process that
can't take advantage of server-side includes.
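If a script builds such links itself, it needs the reverse of the decoding step. The following sh sketch encodes one value at a time; the name url_encode is made up, and the sketch assumes plain ASCII data and a conservative choice of which characters to leave alone (letters, digits, period, underscore, and hyphen).

```shell
#!/bin/sh
# Hypothetical helper: URL encode one value for use after the ? in a link.
# Letters, digits, ".", "_", and "-" pass through, a space becomes +, and
# every other character becomes %XX. Assumes ASCII input.
url_encode() {
    rest=$1
    out=
    while [ -n "$rest" ]; do
        tail=${rest#?}              # everything after the first character
        c=${rest%"$tail"}           # the first character on its own
        rest=$tail
        case "$c" in
            [A-Za-z0-9._-]) out="$out$c" ;;
            ' ')            out="$out+" ;;
            # a leading ' tells printf to use the character's numeric value
            *)              out="$out$(printf '%%%02X' "'$c")" ;;
        esac
    done
    printf '%s\n' "$out"
}
```

Each name and each value is encoded separately, and the = and & signs that join them are added afterward, so that they aren't encoded along with the data.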
Decoding (Parsing) Routines
Rather than do all the work of creating your parsing routine from
scratch, you can use one of the multitudes of scripts available
for general use in almost every language. The authors of these
libraries and routines save you the work, which is always a benefit.
One of the most widely used libraries for Perl takes care of this
work for you: cgi-lib.pl by Steven Brenner. This library allows
you to take an otherwise tedious task and make it simple. For
instance, reading and parsing the input becomes as simple as
require 'cgi-lib.pl';
&ReadParse(*input);
You now have values in the associative array %input that
you can bend to your will. Behind the scenes, all the ordered
pairs have been broken down in the subroutine ReadParse,
and each individual pair has been stored in %input under its
own name, with its appropriate data value assigned
to it. To get a better understanding of just how the code is working,
look at the ReadParse source itself (in listing 4.3).
Like many well-written pieces of code, it's already been commented
by its creator, but in some places additional comments have been
added off to the side for further clarification.
Listing 4.3 cgi-lib Source Code
# Source for CGI-LIB.PL, by Steven Brenner:
# ReadParse
# Reads in GET or POST data, converts it to unescaped text, and
# puts one key=value in each member of the list "@in"
# Also creates key/value pairs in %in, using '\0' to separate
# multiple selections
# If a variable-glob parameter (e.g., *cgi_input) is passed to
# ReadParse, information is stored there, rather than in $in, @in,
# and %in.
sub ReadParse {
  local (*in) = @_ if @_;
  local ($i, $loc, $key, $val);
  # Read in text                         #Checks the data-sending method
  if ($ENV{'REQUEST_METHOD'} eq "GET") {
    $in = $ENV{'QUERY_STRING'};
  } elsif ($ENV{'REQUEST_METHOD'} eq "POST") {
    read(STDIN,$in,$ENV{'CONTENT_LENGTH'});  #Reads in CONTENT_LENGTH bytes of STDIN
  }
  @in = split(/&/,$in);                  #Splits ordered pairs at the "&" sign
  foreach $i (0 .. $#in) {               #Processes ordered pairs
    # Convert plus's to spaces
    $in[$i] =~ s/\+/ /g;
    # Split into key and value.
    ($key, $val) = split(/=/,$in[$i],2); # splits on the first =.
    # Convert %XX from hex numbers to alphanumeric
    $key =~ s/%(..)/pack("c",hex($1))/ge;
    $val =~ s/%(..)/pack("c",hex($1))/ge;
    # Associate key and value
    $in{$key} .= "\0" if (defined($in{$key})); # \0 is the multiple
    $in{$key} .= $val;                         # separator
  }
  return 1; # just for fun
}
If you want to use cgi-lib.pl, the library has been provided
for you on the CD-ROM accompanying this book.
Many other libraries are available in almost every scripting language.
They're already in use by countless other users, so you even remove
testing time from your already busy schedule by making use of
code that already has the functionality you were looking to create.
You can find these libraries with a search on CGI Library in
most search engines, or use the list provided in Chapter 3 "Designing
CGI Applications."
CAUTION
If you're planning to write your own parsing routine, be very careful in how you do it. Things such as limited buffer sizes and open-ended functions that could be used to execute things such as Perl eval statements can let someone into your
system. When in doubt, don't let it through.
Use Your Header
Just like header information accompanies the incoming data, a
header must let the server and client know what kind of information
is being sent back. Called a response header, it can be
one of three different types: content-type, location, or status.
NOTE
When using headers in your script, you must separate the header from the body (if any) of your response with a blank line to make sure that it's interpreted correctly. Otherwise, you end up with a somewhat ordered mess instead of a correctly returned document.
Non-Parsed Headers
In certain cases, you may not want your application to rely on
the server to process your program's response. Whether due to
overhead or some special response that's easier to do outside
your server's interpretation, the decision to use non-parsed header
(NPH) files places a little more work on your shoulders.
To function as an NPH return, the output data from your program
must contain a complete HTTP response. That means providing the
HTTP version and status code, the general header, response header,
entity header, and entity body. What does all that mean in plain
English? Well, look at an example of NPH output, taken from the
original NCSA CGI documentation, and add some comments to it:
HTTP/1.0 200 OK             #HTTP Version, Status Code
Server: NCSA/1.0a6          #Response Header
Content-type: text/plain    #Entity Header
                            #Blank line ends the headers
Text goes here...           #Entity Body
As you can see, it's pretty straightforward once the terms are
cleared up. All the client really wants to know is what protocol
the response conforms to, if there's a status message it needs
to concern itself with (such as errors), what type of data it's
receiving, and what the data is. The main restriction is that
the file name of the CGI application must begin with nph-
to specify that the server shouldn't parse the return.
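A script that produces this kind of complete response might look like the following sh sketch. The function name nph_reply is made up for the example, and the use of SERVER_SOFTWARE to fill in the Server line is an assumption; the file holding the script would also need a name that begins with nph- for the server to pass its output through untouched.

```shell
#!/bin/sh
# Hypothetical NPH helper: write a complete HTTP response, status line and all.
# HTTP header lines end in a carriage return plus line feed, hence the \r\n.
nph_reply() {
    printf 'HTTP/1.0 200 OK\r\n'                            # HTTP version, status code
    printf 'Server: %s\r\n' "${SERVER_SOFTWARE:-unknown}"   # identifies the server software
    printf 'Content-Type: text/plain\r\n'                   # what kind of data follows
    printf '\r\n'                                           # blank line ends the headers
    printf '%s\n' "$1"                                      # the body itself
}
```

Nothing here is added or checked by the server, so any mistake in the status line or headers goes straight to the client.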
There are no hard-and-fast rules as to when it's right to use
NPH output. If it works for what you want to do, your server load
is normally very high, and you feel like using it, that could
make it a good candidate. For the most part, however, the CGI
libraries and application samples that you'll come across prefer
to place a little of that work on the server.
Content-Type Header
The most common response header is content-type, which tells the
client software to expect some data of a specific type, based
on the supported MIME (Multipurpose Internet Mail Extensions) types.
These types are covered in detail later in Chapter 10, "Using
MIME with CGI," and are outlined in table 4.6. One of the
more common content types to be returned in your CGI application
is text/html, meaning that you're sending back an HTML document,
so it should be interpreted as one, with all tags and other elements
converted for display.
Table 4.6 Common MIME Types
Type         Category
application  Application data, such as a compressed file
audio        Audio data, such as RealAudio
image        Image data, such as a counter
text         Text-based information, including plain and HTML
video        Video data (MPEG, AVI, QuickTime)
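Putting the content-type header to work takes only a few lines. In this sh sketch, the function name send_html is made up, and the text passed in is assumed to contain no characters that are special in HTML, because nothing is escaped.

```shell
#!/bin/sh
# Hypothetical helper: send a tiny HTML page. $1 is used as the page text and
# is assumed to contain no HTML special characters (nothing is escaped here).
send_html() {
    printf 'Content-Type: text/html\n'
    printf '\n'                                 # blank line: end of the header
    printf '<HTML><BODY><P>%s</P></BODY></HTML>\n' "$1"
}
```

The empty printf is the important part: without that blank line, the server can't tell where the header stops and the document begins.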
Location Header
If you were to create a random link program, you probably wouldn't
want the results to come back as an HTML page with a URL link
that says, "Click here to go to the link that has been selected
randomly." You would want users to make one selection that
says "Random Link," and automatically be taken to that
link after selecting it. The same would hold true for some search
programs in which you might have only one possible match, or if
you have a page to return if a function fails. In any of these
cases, your best bet would be to use a location header.
As the name implies, the location header specifies that the data
you're returning is a pointer to another location, normally a
full URL. It's in the format of Location: http://server.host.com/document.
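In sh, a redirect is about as short as a CGI response can get. The function name send_redirect is made up for this sketch, and the caller is assumed to pass in a complete URL.

```shell
#!/bin/sh
# Hypothetical helper: point the client at another document.
# $1 is the destination, normally a full URL.
send_redirect() {
    printf 'Location: %s\n' "$1"
    printf '\n'                     # blank line: end of the header, no body
}
```

A random-link script would pick one URL from its list and finish with a single call such as send_redirect "$chosen_url", where chosen_url is whatever variable holds the pick.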
TIP
A number of browsers support enhanced HTML formatting commands that you may want to take advantage of, but the commands may create formatting problems if the user doesn't have a browser with those particular enhancements. You can use
HTTP_USER_AGENT to determine the type of browser client, and then redirect the user to an appropriately formatted page with the Location: header.
Status Header
The status header is the basic element for use in returning
error codes. If you don't have specific pages to be used when
returning an error, you can just use the built-in codes to let
the server send back the error message to be interpreted by the
client. Table 4.7 lists some common status codes.
Table 4.7 Some Common Status Codes
Code  Result               Description
200   OK                   The request was carried out with no problems.
202   Accepted             The request has been accepted but is still being processed.
301   Moved                The document has been moved to a new location.
302   Found                The document is on the server but at a different location.
400   Bad Request          The request's syntax was bad.
401   Unauthorized         The server has restrictions on the document (see AUTH_TYPE).
403   Forbidden            The request was forbidden, due to access rights or other reasons.
404   Not Found            The request couldn't find a match.
500   Internal Error       The server unexpectedly failed to carry out the request (or your Perl script is missing a ;).
503   Service Unavailable  The server can't process any more requests now, usually because it is overloaded.
NOTE
One of the most frustrating things to encounter when writing a CGI script in Perl is getting an error page instead of your expected output. Most servers report a script that fails to compile as 500 Internal Error, although depending on the server you may see 404 Not Found instead. When you encounter this, make a habit of double-checking your script for missing or misplaced semicolons (;), which Perl uses to terminate a statement. Just one missing piece of punctuation can drive you crazy.
For a more complete list of status codes, see http://www.w3.org/hypertext/WWW/Protocols/HTTP/HTRESP.html.
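To show what returning one of these codes might look like, here is a minimal sh sketch. The function name send_status is hypothetical, and the plain-text body is just one way to give the user something to read.

```shell
#!/bin/sh
# Hypothetical helper: send a Status header plus a short plain-text body.
send_status() {
    code=$1
    reason=$2
    printf 'Status: %s %s\n' "$code" "$reason"
    # The second \n produces the blank line that ends the headers.
    printf 'Content-type: text/plain\n\n'
    printf '%s %s\n' "$code" "$reason"
}

send_status 404 "Not Found"
```

If you don't send a body of your own, the server or the client typically supplies its own message for the code.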
Returning Output to the Users
After all the work you've done getting the data, interpreting
it, processing it, and deciding what type of information you're
going to send back, all that's left to do is send it. To do that,
you'll need three things: a header, content, and a way to output
it to the user.
You already know about headers, such as content-type, for specifying
what kind of information you're returning. The data that your
program sends back can be anything; you simply send it after
the header and the blank line that ends the header, and the rest
is taken care of. The only remaining item
is determining how the user will get the data back.
STDOUT
Just like you can read data sent to the standard input stream,
you can send information back out through the standard output
(STDOUT) stream to the waiting server. By default, your programming
language of choice probably makes this process easy. Just pretend
that you're going to print something directly to the screen, which
is normally what STDOUT is, and the server takes care of the rest
for you.
For instance, if you send back a header telling the server and
client to expect HTML code or text, just send it back as standard
text, as follows:
Perl:
print "Content-type: text/html\n";
print "\n";    # The blank line separates the content from its header.
print "<h1>Hello World.</h1>\n";
sh:
echo 'Content-type: text/html'
echo ''    # The blank line separates the content from its header.
echo '<h1>Hello World.</h1>'
Suppose that your program outputs records from mailing-list requests
or just a plain old log of who used the script. File output is
accomplished by redirecting statements like the preceding ones.
Perl uses file handles (open(MY_FILE, ">>/home/file1.txt")),
whereas sh scripts can do a number of things by using
> and >> redirection:
Perl:
print MY_FILE "Hello World.\n";
sh:
echo 'Hello World' >> myfile
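For the "plain old log" case, a sketch in sh might look like the following. The log path and the function name log_request are hypothetical; REMOTE_HOST and SCRIPT_NAME are the standard CGI environment variables, and the sketch falls back to "unknown" when the server hasn't set them.

```shell
#!/bin/sh
# Hypothetical log location; a real script needs a path the server can write to.
LOG_FILE="${LOG_FILE:-/tmp/cgi_usage.log}"

# Append one line per request: the remote host, then the script that was called.
log_request() {
    printf '%s %s\n' "${REMOTE_HOST:-unknown}" "${SCRIPT_NAME:-unknown}" >> "$LOG_FILE"
}

log_request
```

Using >> rather than > matters here: > would overwrite the log on every request, whereas >> adds to the end of it.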
Whatever output method you choose, whether it's a pointer to data
somewhere else or data you send back yourself, once you send it
to STDOUT, the rest of the work is done for you. The server and
the client negotiate the connection and translation work to get
what you sent to the client into the right place and in the form
you specified.
File-Based Output
In certain instances, the result of a CGI program's execution
is just the location of a file or the creation of an output data
file. The latter of these occurs when the server has set the OUTPUT_FILE
environment variable, which means that a server such as Win HTTPd
is expecting to go out to a specific file name and read everything
from there, rather than from STDOUT.
There's no real "trick" to dealing with these situations,
unless you want to create the output file and then perform some
subsequent operation on it. As soon as the file is there, the
server reads it as the response and passes it along to the client. So be
sure not to copy something to the final OUTPUT_FILE name
until it's ready to be received by the server.
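One way to follow that advice is to build the response under a scratch name and rename it only when it is complete. The sh sketch below illustrates the ordering only; the ".tmp" suffix, the fallback path, and the function name write_response are hypothetical, and it assumes that the server expects the same header-plus-body layout in the file that it would read from STDOUT.

```shell
#!/bin/sh
# Assumption: the server names the response file in OUTPUT_FILE.
# The ".tmp" suffix and the fallback path are hypothetical choices.
write_response() {
    target=$1
    scratch="$target.tmp"
    {
        printf 'Content-type: text/html\n\n'
        printf '<h1>Hello World.</h1>\n'
    } > "$scratch"
    # Rename only after the scratch file is complete.
    mv "$scratch" "$target"
}

write_response "${OUTPUT_FILE:-/tmp/cgi_output.html}"
```

Any further processing you want to do on the output can happen on the scratch file before the final rename.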