



Chapter 15 -- Generating HTML Documents in Real Time





by Jeffry Dwight


CONTENTS


Static HTML

Real-Time HTML


Benefits of Real-Time HTML

Methods of Generating Real-Time HTML


Near Real-Time HTML

Server Performance Considerations




HyperText Markup Language (HTML) lets you publish text and graphics
in a platform-independent way. Using HTML, you can easily, via
embedded links, weave a world full of sites together.

In this chapter, you examine static and dynamic HTML, concentrating
on the latter. Dynamic, or real-time, HTML extends the viability
of the Web far beyond its original conception.

You learn what makes real-time HTML tick and how to produce it
in a variety of ways. Specifically, this chapter provides

An overview of static HTML
The benefits of real-time HTML
Methods of generating real-time HTML, including scheduled
jobs, regular CGI or SSI, Client Pull, and Server Push
An example of dynamically redirecting a browser to a static
HTML page appropriate for that browser
An example of Server Push
Server performance considerations


Static HTML

Need to review the complete works of Mark Twain? Want to find
the address of a manufacturer in Taiwan? Need the phone number
for the White House? Ever wondered how to spell floccinaucinihilipilification,
or what it means? (Yes, that's a real word. You won't find it
in any dictionary except the Oxford English Dictionary, though,
so put away your Webster's Collegiate.)

The answers are only as far away as your favorite search engine.
These types of references are perfectly suited to the Web. They
seldom, if ever, need revision; after they're written and thrown
on a page, other sites can establish links to them, search engines
can catalog them, and you can find them today, tomorrow, next
week, or next year. Because the markup language used to create
these pages is HTML and the content of the pages is static (relatively
unchanging), such pages are called static HTML.

But what if you want to know the stock prices, not 10 hours ago
or 10 days ago, but right now? What if you want to know the arrival
time of American Airlines Flight 101? What if you need to know
the ambient temperature in Brisbane as of 30 seconds ago?

In these cases, static documents just won't do. Not even if a
diligent, never-sleeping Webmaster does his level best to keep
the documents updated. For these sorts of applications, you need
real-time, or dynamic, HTML.

Real-Time HTML

All CGI-generated HTML is technically "real-time" in
that it's generated on the fly, right when it's needed. In data
processing circles, however, the term refers more to the data
itself than the production thereof.

Therefore, a CGI program that talks to a hardware port and retrieves
the current temperature and then generates HTML to report it would
be considered real-time. A CGI program that looks up your birthday
in a database wouldn't.

In this chapter, I don't worry too much about the technical definitions.
I call all CGI programs that produce time-sensitive or user-sensitive
output "real-time." This includes uses such as the following:

Current temperature
Quote of the day
Current time and date
Network or server statistics
Election returns
Package delivery status
Stock market data
Animations and other special effects
Page-hit count for a home page
Browser-specific pages

Benefits of Real-Time HTML

The prime, and most immediately apparent, benefit of real-time
HTML is that the information is fresh. Getting the stock market
report from yesterday's closing is one thing; finding the value
of a specific stock right this minute is something else altogether.
The information has different value to the consumer. People pay
for up-to-the-minute information.

Another, somewhat less obvious, benefit is that real-time HTML
can make your pages seem livelier. For example, in the next chapter
you examine a page counter and a random-quote generator. You can
put them together on a page to produce output like this:

First visit: "You're the 1st visitor to be amused by
this page."
Second visit: "You're the 2nd visitor to be flabbergasted
by this page."
Third visit: "You're the 3rd visitor to be terrified
by this page."


And so on. Granted, this particular example is rather trivial.
Many readers may not even notice that the wording changes each
time, and those who do won't have their lives, careers, or religion
changed by it. But this example should give you an idea of the
sorts of pages you can make by using real-time document generation.

Methods of Generating Real-Time HTML

The following are the four main methods of generating dynamic
pages:

Scheduled Jobs
Regular CGI or SSI
Client Pull
Server Push


In the following sections, you tackle them in order.

Scheduled Jobs

A scheduled job is a batch file, shell script, or other
program that runs at a regular interval. These jobs usually run
in the background (that is, invisibly and independent of the foreground
task) and may run once a month, once a day, or once a minute. The
interval is up to you. A special case is the program that runs
continuously (called a daemon in the UNIX world, and a
service in the Windows NT world), spending most of its
time asleep, and waking up only periodically to accomplish some
task. Usually, though, background jobs are scheduled. They run
at the appointed time, do their jobs, and quit, only to repeat
at the next scheduled time.

The method of scheduling varies from operating system to operating
system. In UNIX, you find the cron utility most appropriate.
Under Windows NT, the AT command makes the most sense.

Scheduled jobs are useful for information that changes infrequently
but regularly. A quote-of-the-day program is probably the best
example. You don't need to invoke a CGI program to retrieve or
regenerate a page that changes only once a day. It's far better
to write a program that updates your HTML at midnight and then
let the page get retrieved normally.
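
As a rough illustration of the scheduled-job approach, the following sketch rewrites a static page with the day's quote. The function names, the page layout, and the idea of cycling through a fixed list are assumptions made for this example, not part of any program on the CD-ROM. The quote text is assumed to be safe to place in HTML as-is.

```c
#include <assert.h>
#include <stdio.h>

/* Rewrite the static page at `path` so that it shows `quote`.
   Returns 0 on success, -1 if the file cannot be written.
   (Hypothetical helper; the quote is assumed to be HTML-safe.) */
int write_quote_page(const char *path, const char *quote)
{
    FILE *fp;

    assert(path != NULL && quote != NULL);

    fp = fopen(path, "w");
    if (fp == NULL)
        return -1;

    fprintf(fp,
            "<html>\n"
            "<head><title>Quote of the Day</title></head>\n"
            "<body>\n"
            "<h1>Quote of the Day</h1>\n"
            "%s\n"
            "</body>\n"
            "</html>\n",
            quote);

    /* fclose reports write errors that were deferred by buffering */
    return (fclose(fp) == 0) ? 0 : -1;
}

/* Pick a quote by cycling through the list, one entry per day. */
const char *quote_for_day(const char *const quotes[], int count,
                          long day_number)
{
    assert(count > 0 && day_number >= 0);
    return quotes[day_number % count];
}
```

A scheduled program might pass `(long)(time(NULL) / 86400)` from `<time.h>` as the day number and call `write_quote_page` shortly after midnight. Under UNIX, a crontab line such as `0 0 * * * /usr/local/bin/qotd` (a hypothetical path) would run it once a day.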

Regular CGI or SSI

For page counters and similar programs, either CGI or SSI (see
the next chapter for examples of SSI) makes the most sense. The
kind of information being generated is what drives your choice.
Because a page count changes only when a page is retrieved, updating
it then makes sense. A scheduled job is clearly inadequate for
up-to-the-moment data, and the remaining methods (Client Pull and
Server Push) are inappropriate because you don't want a continuous
update.
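
To make the "update it when the page is retrieved" idea concrete, here is a minimal sketch of a counter that a CGI program could call on each hit. The function names and the one-number-per-file storage format are assumptions for illustration; the next chapter's counter may work differently. Note that the sketch does no file locking, so two simultaneous hits could lose a count.

```c
#include <assert.h>
#include <stdio.h>

/* Read the count stored in `path`, add one, write it back, and
   return the new value. A missing or unreadable file counts as
   zero. Returns -1 if the new value cannot be saved.
   Note: no file locking, so simultaneous hits can lose counts. */
long bump_counter(const char *path)
{
    FILE *fp;
    long count = 0;

    assert(path != NULL);

    fp = fopen(path, "r");
    if (fp != NULL) {
        if (fscanf(fp, "%ld", &count) != 1 || count < 0)
            count = 0;
        fclose(fp);
    }

    count++;

    fp = fopen(path, "w");
    if (fp == NULL)
        return -1;
    fprintf(fp, "%ld\n", count);
    if (fclose(fp) != 0)
        return -1;

    return count;
}

/* Emit a minimal CGI response that reports the new count. */
void print_counter_page(FILE *out, long count)
{
    fprintf(out,
            "Content-type: text/html\n\n"
            "<html><body>\n"
            "You are visitor number %ld.\n"
            "</body></html>\n",
            count);
}
```

A CGI program's `main` would simply call `print_counter_page(stdout, bump_counter("hits.txt"))`, where `hits.txt` is a hypothetical file the server process is allowed to write.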

A trivial, but nonetheless useful, example of using CGI to provide
dynamic HTML is a CGI program that redirects the browser to a
static page appropriate for that browser. For this example, assume
that you want to provide different pages for each of the following
browser types: Netscape, Microsoft Internet Explorer, and Lynx.
Any browser that can't be identified as one of these three gets
redirected to a generic page.

ByAgent is a complete working sample of using CGI to provide a
dynamic response. You should be able to compile it for any platform.
You can find the source, plus sample HTML files and a compiled
executable for the 32-bit Windows NT/Windows 95 environment, on
the CD-ROM accompanying this book.

Compile the code (shown in listing 15.6) and name it byagent.exe.
Put the compiled executable in your CGI-BIN directory. If you're
using a 32-bit Windows environment, you can skip the compile step
and just copy byagent.exe from the CD-ROM to your CGI-BIN directory.

To test this program, you need to create a number of static HTML
files. The first will be used to demonstrate the others. Call
it default.htm.


Listing 15.1  default.htm: HTML to Demonstrate ByAgent





<html>
<head><title>ByAgent</title></head>
<body>
<h1>ByAgent Test Page</h1>
This page demonstrates the ByAgent CGI program. Click
<a href="/cgi-bin/byagent.exe?">here</a> to test.
</body>
</html>




As you can see, this code is fairly straightforward. If your cgi-bin
directory is called something else, correct the link in the preceding
code.

Now you can create four individual pages: one for Netscape, called
netscape.html (see listing 15.2); one for Lynx, called lynx.html
(see listing 15.3); one for Microsoft Internet Explorer, called
msie.html (see listing 15.4); and one for everyone else, called
generic.html (see listing 15.5).


Listing 15.2  netscape.html: Target Page for Netscape
Browsers




<html>
<head><title>ByAgent</title></head>
<body>
<h1>ByAgent</h1>
Congratulations! You got to this page because your browser identified
itself as a Netscape (or compatible) browser.
</body>
</html>





Listing 15.3  lynx.html: Target Page for Lynx Browsers





<html>
<head><title>ByAgent</title></head>
<body>
<h1>ByAgent</h1>
Congratulations! You got to this page because your browser identified itself
as a Lynx (or compatible) browser.
</body>
</html>





Listing 15.4  msie.html: Target Page for MSIE Browsers





<html>
<head><title>ByAgent</title></head>
<body>
<h1>ByAgent</h1>
Congratulations! You got to this page because your browser identified itself
as a Microsoft Internet Explorer (or compatible) browser.
</body>
</html>





Listing 15.5  generic.html: Target Page for Generic
Browsers




<html>
<head><title>ByAgent</title></head>
<body>
<h1>ByAgent</h1>
Congratulations! You got to this page because your browser identified itself
as something other than Netscape, Lynx, or Microsoft Internet Explorer.
</body>
</html>




Put these files together in a directory, and load default.htm
into your browser. Click the test link. You should see the page
corresponding to your browser. Listing 15.6 shows the actual code
to accomplish the redirection.


Listing 15.6  byagent.c: Source Code for ByAgent
CGI Program




// BYAGENT.C
// This program demonstrates how to redirect
// a browser to a page that matches the browser.
// It depends on the browser's self-identification,
// so a browser that lies can get the wrong page.
// In general, most programs that claim to be
// "Mozilla" are either Netscape, fully compatible
// with Netscape, or Microsoft Internet Explorer.
// The special case of MSIE can be identified
// because although it says "Mozilla," it also
// says "MSIE."

#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void) {

    // First declare our variables.
    // We'll use three pointers, two character
    // arrays, and an index. The pointers are
    // UserAgent, a pointer to the CGI environment
    // variable HTTP_USER_AGENT; Referer, a pointer
    // to the CGI environment variable
    // HTTP_REFERER; and p, a generic pointer
    // used for string manipulation. szAgent holds
    // a lower-case copy of the user agent, and
    // szNewPage is where we build the URL of the
    // page to which the browser gets redirected.

    char *UserAgent;
    char *Referer;
    char *p;
    char szAgent[256];
    char szNewPage[128];
    size_t i;

    // Turn buffering off for stdout
    setvbuf(stdout,NULL,_IONBF,0);

    // Get the HTTP_REFERER, so we know our directory
    Referer = getenv("HTTP_REFERER");

    // Get the user-agent, so we know which pagename to
    // supply
    UserAgent = getenv("HTTP_USER_AGENT");

    // If either user agent or http referer not available,
    // or the referer is too long for our buffer, die here
    if (Referer == NULL || UserAgent == NULL ||
        strlen(Referer) >= sizeof(szNewPage)) {
        printf("Content-type: text/html\n\n"
               "<html>\n"
               "<head><title>ByAgent</title></head>\n"
               "<body>\n"
               "<h1>Pick your browser</h1>\n"
               "ByAgent could not find a usable "
               "HTTP_REFERER or HTTP_USER_AGENT "
               "environment variable. "
               "Please pick your browser from this list:\n"
               "<ul>\n"
               "<li><a href=\"generic.html\">Generic</a>\n"
               "<li><a href=\"lynx.html\">Lynx</a>\n"
               "<li><a href=\"msie.html\">Microsoft</a>\n"
               "<li><a href=\"netscape.html\">Netscape</a>\n"
               "</ul>\n"
               "</body>\n"
               "</html>"
              );
        return 0;
    }

    // This program assumes that the browser-specific pages
    // are in the same directory as the page calling this
    // program. Therefore, we'll use the HTTP_REFERER to
    // get our URL, then strip the HTTP_REFERER's page
    // name, and add the proper browser-specific page name
    // to the end.

    // First, copy the HTTP_REFERER value to szNewPage, so
    // we have something to work on. The length was checked
    // above, so the copy cannot overflow the buffer.
    strcpy(szNewPage,Referer);

    // Find the last forward slash in the URL. This is
    // the separator between the directory and the page
    // name.
    p = strrchr(szNewPage,'/');

    // If we found no forward slash, assume some sort of
    // weird server and hope a relative path will work by
    // chopping off the entire URL.
    if (p==NULL) p = szNewPage;

    // Mark the end of the string, so we can concatenate
    // to it from that point on.
    *p = '\0';

    // Make a lower-case copy of the user agent so we can
    // do simple searches. We copy rather than convert in
    // place so the environment string stays untouched;
    // anything past the end of szAgent is ignored.
    for (i = 0; UserAgent[i] != '\0' && i < sizeof(szAgent) - 1; i++)
        szAgent[i] = (char) tolower((unsigned char) UserAgent[i]);
    szAgent[i] = '\0';

    // We are now ready to output a redirection header.
    // This header tells the browser to go elsewhere
    // for its next page. A redirection header is
    // nothing more than "Location: " followed by an
    // URL. No content type is needed, because no
    // document body follows; the header is terminated
    // by a blank line (two newlines).

    // If user agent is Microsoft Internet Explorer,
    // redirect to msie.html
    if (strstr(szAgent,"msie")) {
        printf("Location: %s/msie.html\n\n",szNewPage);
        return 0;
    }

    // If user agent is Lynx,
    // redirect to lynx.html
    if (strstr(szAgent,"lynx")) {
        printf("Location: %s/lynx.html\n\n",szNewPage);
        return 0;
    }

    // If user agent is Netscape,
    // redirect to netscape.html
    if (strstr(szAgent,"mozilla")) {
        printf("Location: %s/netscape.html\n\n",szNewPage);
        return 0;
    }

    // If none of the above,
    // use generic.html
    printf("Location: %s/generic.html\n\n",szNewPage);
    return 0;
}




As you can see, the preceding code is fairly simple. The comments
far outweigh the lines of code. The only tricky bits to this program
are remembering to format the redirection header correctly, and
remembering that Microsoft Internet Explorer appears to be "Mozilla"
(Netscape) if you don't look carefully.

In your own program, you may want to incorporate some mechanism
to allow the secondary pages to live in a different directory,
or even on a different server, just by changing the Location information.
You may also consider generating the correct HTML on the fly rather
than redirect the browser to an existing static page. Now that
you know how to identify the browser and do redirection, your
imagination is the only limit.
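
One way to let the secondary pages live elsewhere is to separate the browser test from the URL construction, as in the sketch below. The function names and the `base_url` parameter are hypothetical; the matching rules follow the same msie, lynx, mozilla order as ByAgent.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Return 1 if `needle` (already lower-case) occurs in `haystack`,
   ignoring the case of `haystack`; otherwise return 0. */
static int contains_nocase(const char *haystack, const char *needle)
{
    size_t n = strlen(needle);

    if (n == 0)
        return 1;
    for (; *haystack != '\0'; haystack++) {
        size_t i = 0;
        while (i < n && haystack[i] != '\0' &&
               tolower((unsigned char) haystack[i]) == needle[i])
            i++;
        if (i == n)
            return 1;
    }
    return 0;
}

/* Map a User-Agent string to a page name. MSIE is tested before
   "mozilla" because MSIE also identifies itself as Mozilla. */
const char *page_for_agent(const char *user_agent)
{
    if (user_agent == NULL)                     return "generic.html";
    if (contains_nocase(user_agent, "msie"))    return "msie.html";
    if (contains_nocase(user_agent, "lynx"))    return "lynx.html";
    if (contains_nocase(user_agent, "mozilla")) return "netscape.html";
    return "generic.html";
}

/* Build "Location: <base_url>/<page>" plus the blank line that ends
   the header. Returns 0 on success, -1 if `out` is too small. */
int build_location(char *out, size_t size, const char *base_url,
                   const char *user_agent)
{
    int n;

    assert(out != NULL && base_url != NULL);
    n = snprintf(out, size, "Location: %s/%s\n\n",
                 base_url, page_for_agent(user_agent));
    return (n >= 0 && (size_t) n < size) ? 0 : -1;
}
```

Because the base URL is now a parameter, pointing it at another directory or another server is a one-line change in the caller, and the HTTP_REFERER is no longer needed at all.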

Client Pull

Several other browsers now support Client Pull, a Netscape enhancement,
but you should be careful when writing your HTML to include options
for browsers that can't deal with it.

In typical browsing, a user clicks a link and retrieves a document.
With Client Pull, that document comes back with extra instructions: directives
to reload the page or to go to another URL altogether.

Client Pull works via the <META HTTP-EQUIV>
tag, which must be part of the HTML header (that is, before any
text or graphics are displayed). When the browser sees the <META>
tag, it interprets the contents as an HTTP header. Because HTTP
headers already support automatic refresh and redirection, not
much magic is involved at all. Normally, the server or CGI program
is responsible for sending the HTTP headers. Netscape's clever
idea was to allow additional HTTP headers inside a document.
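
Since the <META> tag only stands in for an HTTP header, a CGI program can presumably send the same instruction as a real header instead. The sketch below assumes the browser honors a Refresh header exactly as it honors the <META> equivalent; the function names are hypothetical. It also rejects URLs that contain line breaks, because those would corrupt the header block.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Format a Refresh header line into `out`. With a NULL `url` the
   browser reloads the current page; otherwise it moves to `url`.
   Returns 0 on success, -1 if the URL contains a line break (which
   would corrupt the headers) or `out` is too small. */
int format_refresh_header(char *out, size_t size, int seconds,
                          const char *url)
{
    int n;

    assert(out != NULL && seconds >= 0);

    if (url != NULL && strpbrk(url, "\r\n") != NULL)
        return -1;

    if (url != NULL)
        n = snprintf(out, size, "Refresh: %d; URL=%s\n", seconds, url);
    else
        n = snprintf(out, size, "Refresh: %d\n", seconds);

    return (n >= 0 && (size_t) n < size) ? 0 : -1;
}

/* Print a complete CGI header block: content type, the Refresh
   line, and the blank line that ends the headers. */
int print_refresh_headers(FILE *stream, int seconds, const char *url)
{
    char line[512];

    if (format_refresh_header(line, sizeof line, seconds, url) != 0)
        return -1;
    fprintf(stream, "Content-type: text/html\n%s\n", line);
    return 0;
}
```

A CGI program would call `print_refresh_headers(stdout, 10, NULL)` before writing its HTML, which should have the same effect as placing the <META> tag in the document header.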

Say you have a Web page that reports election returns. A background
process of some sort reads the precinct numbers from a Reuters
connection (why not?) and once every 10 seconds rewrites your
Web page with the current data. The client can hit the reload
button every 10 seconds to see the new data, but you want to make
that process automatic. Listing 15.7 shows how to do it.


Listing 15.7  default.htm: Demonstration of Client
Pull




<html>
<head>
<META HTTP-EQUIV="Refresh" CONTENT="10">
<title>Election Returns</title>
</head>
<body>
<h1>Election Returns</h1>
This document refreshes itself once every ten seconds. Sit back and watch!
...
</body>
</html>




Note the <META HTTP-EQUIV> line. This
line causes the browser to refresh the page once every 10 seconds.
Of course, for this example to be useful, you need to have some
other process updating the page in the background, but this example
works: it will reload the page once every 10 seconds.

Why once every 10 seconds? Because each time it fetches the document,
the browser sees the instruction to load it again 10 seconds later.
The instruction is a "one-shot" instruction. It doesn't
tell the browser to load the page every 10 seconds from now until
doomsday; it just says to load the page again 10 seconds from
now.

You also can use Client Pull to redirect the browser to another
page. In listing 15.8, the browser goes to http://www.microsoft.com/
after 5 seconds.


Listing 15.8  takeride.htm: Take a Ride to Microsoft
with Client Pull




<html>
<head>
<META HTTP-EQUIV="Refresh" CONTENT="5; URL=http://www.microsoft.com/">
<title>Take a Ride</title>
</head>
<body>
<h1>Take a Ride to Microsoft</h1>
This page takes you to Microsoft's Web server in five seconds.
<p>
If your browser doesn't support META commands, click
<a href="http://www.microsoft.com/">here</a> to go there manually.
</body>
</html>




This example uses the URL= syntax to tell the browser
to go to the specified URL. The delay is set to 5 seconds. Note
also that text is included to explain what's going on, and a manual
link is included for people who have browsers that don't support
Client Pull.



TIP


You can set the refresh delay to zero. This tells the browser to go to the designated URL (or, if no URL is specified, to reload the current page) as soon as it possibly can. You can create crude animations this way.




You can set up a chain of redirection, too. In the simplest configuration,
this chain would be two files that refer to each other, as listing
15.9 shows.


Listing 15.9  page1.html and page2.html: Two Pages
that Refer to Each Other





page1.html:
<html>
<head>
<META HTTP-EQUIV="Refresh" CONTENT="1; URL=http://www.myserver.com/page2.html">
<title>Page One</title>
</head>
<body>
<h1>Page One</h1>
This page takes you to Page Two.
</body>
</html>

page2.html:
<html>
<head>
<META HTTP-EQUIV="Refresh" CONTENT="1; URL=http://www.myserver.com/page1.html">
<title>Page Two</title>
</head>
<body>
<h1>Page Two</h1>
This page takes you to Page One.
</body>
</html>




When the user first loads page1.html, he or she gets to see page1.html
for one second. Then the browser fetches page2.html. Page2.html
sticks around for one second and then switches back to page1.html.
This process continues until the user goes elsewhere or shuts
down his or her browser.
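
The same idea extends to a cycle of any length, which is one way to build the crude animations mentioned earlier. The sketch below computes which page comes next and formats the corresponding <META> tag. The function names are hypothetical, and the page1.html, page2.html naming scheme is carried over from listing 15.9.

```c
#include <assert.h>
#include <stdio.h>

/* Pages are numbered 1..count; the last page points back to the
   first, which closes the loop. */
int next_page(int current, int count)
{
    assert(count > 0 && current >= 1 && current <= count);
    return (current % count) + 1;
}

/* Format the META tag for page `current` of a `count`-page cycle.
   `base_url` is the fully qualified directory URL without a
   trailing slash. Returns 0 on success, -1 if `out` is too small. */
int format_chain_meta(char *out, size_t size, const char *base_url,
                      int current, int count, int delay_seconds)
{
    int n;

    assert(out != NULL && base_url != NULL && delay_seconds >= 0);
    n = snprintf(out, size,
                 "<META HTTP-EQUIV=\"Refresh\" "
                 "CONTENT=\"%d; URL=%s/page%d.html\">",
                 delay_seconds, base_url, next_page(current, count));
    return (n >= 0 && (size_t) n < size) ? 0 : -1;
}
```

With a count of 2 and a delay of 1, the tags produced for pages 1 and 2 match the ones in listing 15.9.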



TIP


The <META> tag requires a fully qualified URL for redirection; that is, you must include the http://machine.domain/ part of the URL. Relative URLs don't work because your browser, just like the server, is stateless at this level. The browser doesn't remember where it got the redirection instruction from, so a relative URL is meaningless here.


Also, you're not limited to redirecting to a page of static HTML text. Your URL can point to an audio clip or a video file.



Server Push

Server Push works with more browsers than does Client Pull, but
it's still limited. If you use this technique, be aware that some
users can't see your splendid achievements.

The Server Push method relies on a variant of the MIME type multipart/mixed
called multipart/x-mixed-replace. Like the standard multipart/mixed,
this MIME type can contain an arbitrary number of segments, each
of which can be almost any type of information. You accomplish
Server Push by outputting continuous data using this MIME type,
thus keeping the connection to the browser open and continuously
refreshing the browser's display.

Server Push isn't a browser trick; you need to write a CGI program
that outputs the correct HTTP headers, MIME headers, and data.
Server Push isn't for the faint-hearted. To pull it off, you need
to understand and use just about every CGI trick in the book.

A Server Push continues until the client clicks the Stop button
or until the CGI program outputs a termination sequence. Because
the connection is left open all the time, a Server Push is more
efficient than a Client Pull. On the other hand, your CGI program
is running continuously, consuming bandwidth on the network pipe
and resources on the server.

In a standard multipart/mixed document, the headers and data would
look something like listing 15.10.


Listing 15.10  Example of Multipart/Mixed Headers





Content-type: multipart/mixed;boundary=BoundaryString

--BoundaryString
Content-type: text/plain

Some text for part one.

--BoundaryString
Content-type: text/plain

Some text for part two.
--BoundaryString--




The boundary is an arbitrary string of characters used
to demarcate the sections of the multipart document. You use whatever
you specify on the first header for the remainder of the document.
In this example, BoundaryString is the boundary marker.




NOTE


The blank lines in listing 15.10 aren't there to make the text more readable; they're part of the headers. Your program will fail if you don't follow this syntax exactly!






Each section of the document begins with two dashes and the boundary
marker on a line by itself. Immediately thereafter, you must specify
the content type for that section. Like a normal header, the content
type is followed by one blank line. You then output the content
for that section. The last section is terminated by a standard
boundary marker with two dashes at the beginning and end
of the line.
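
The rules above can be captured in three small output routines. This is only a sketch with hypothetical function names; it writes the line break that precedes each boundary but omits the extra blank line that listing 15.10 shows after part one.

```c
#include <assert.h>
#include <stdio.h>

/* Top-level header naming the multipart subtype and the boundary,
   followed by the blank line that ends the header. */
void write_multipart_header(FILE *out, const char *subtype,
                            const char *boundary)
{
    assert(out != NULL && subtype != NULL && boundary != NULL);
    fprintf(out, "Content-type: multipart/%s;boundary=%s\n\n",
            subtype, boundary);
}

/* One section: two dashes and the boundary on a line by itself,
   the section's content type, a blank line, then the content. */
void write_part(FILE *out, const char *boundary, const char *type,
                const char *text)
{
    assert(out != NULL && boundary != NULL);
    assert(type != NULL && text != NULL);
    fprintf(out, "--%s\nContent-type: %s\n\n%s\n", boundary, type, text);
}

/* Final marker: two dashes before and after the boundary. */
void write_terminator(FILE *out, const char *boundary)
{
    assert(out != NULL && boundary != NULL);
    fprintf(out, "--%s--\n", boundary);
}
```

Calling `write_multipart_header(stdout, "mixed", "BoundaryString")`, then `write_part` twice, then `write_terminator` produces a document with the same structure as listing 15.10.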

Server Push uses the same general format but takes advantage of
the MIME type multipart/x-mixed-replace. The x
means that the MIME type is still experimental; the replace
means that each section should replace the previous one rather
than be appended to it. Here's how the preceding example looks
using multipart/x-mixed-replace:



Content-type: multipart/x-mixed-replace;boundary=BoundaryString

--BoundaryString
Content-type: text/plain

Original text.

--BoundaryString
Content-type: text/plain

This text replaces the original text.

--BoundaryString--



In a typical Server Push scenario, the CGI program sends the first
header and first data block, and then leaves the connection open.
Because the browser hasn't seen a terminating sequence yet, it
knows to wait around for the next block. When the CGI program
is ready, it sends the next block, which the browser dutifully
uses to replace the first block. The browser then waits again
for more information.

This process can continue indefinitely, which is how the Server
Push animations you've seen are accomplished. The individual sections
can be any MIME format. Although the example in this chapter uses
text/plain for clarity, you may well choose to use image/jpeg
instead in your program. The data in the block would then be binary
image data. Each block you send would be a frame of the animation.

ServPush is a complete working sample of a Server Push program.
You should be able to compile it for any platform. You can find
the source, plus a compiled executable for the 32-bit Windows
NT/Windows 95 environment, on the CD-ROM accompanying this book.

Compile the code for ServPush (see listing 15.11) and name it
servpush.exe. Put the compiled executable in your cgi-bin directory,
and test it with <a href="/cgi-bin/servpush.exe?">Test
Server Push</a>.


Listing 15.11  servpush.c: Demonstration of Server
Push




// SERVPUSH.C
// This program demonstrates SERVER PUSH of text
// strings. It outputs a header, followed by 10
// strings. Each output is an x-mixed-replace
// section. Each section replaces the previous
// one on the user's browser.
//
// Long printf lines in this listing have been broken
// for clarity.
//
// Sleep() comes from <windows.h>. On other platforms,
// include <unistd.h> and call sleep() instead.

#include <windows.h>
#include <stdio.h>

int main(void) {
    // First declare our variables. We'll use "x"
    // as a loop counter. We'll use an array of
    // pointers, called *pushes[], to hold 10 strings.
    // These strings will get pushed down the pipe,
    // one at a time, during the operation of our
    // program.
    int x;
    const char *pushes[10] = {
        "Did you know this was possible?",
        "Did you know this was <i>possible</i>?",
        "Did you know this was <b>possible?</b>",
        "<font size=+1>Did you know this was "
        "possible?</font>",
        "<font size=+2>Did you know this was "
        "<i>possible?</i></font>",
        "<font size=+3>Did you know this was "
        "<b>possible?</b></font>",
        "<font size=+4>Did you know this was "
        "possible?</font>",
        "<font size=+5><i>DID YOU KNOW THIS WAS "
        "POSSIBLE?</i></font>",
        "<font size=+6><b>DID YOU KNOW THIS WAS "
        "POSSIBLE?</b></font>",
        "<b><i>Now you do!</i></b>"
    };

    // Turn buffering off for stdout
    setvbuf(stdout,NULL,_IONBF,0);

    // Output the main HTTP header
    // Our boundary string will be "BoundaryString"
    // Note that like all headers, it must be
    // terminated with a blank line (the \n\n at
    // the end).
    printf("Content-type: "
           "multipart/x-mixed-replace;"
           "boundary=BoundaryString\n\n");

    // Output the first section header
    // Each section header consists of two dashes,
    // the arbitrary boundary string, a newline character,
    // the content type for this section, and TWO newlines.
    printf("--BoundaryString\n"
           "Content-type: text/html\n\n");

    // Output a line to describe what we're doing
    printf("<h1>Server Push Demonstration</h1>\n");

    // Loop through the 10 strings
    for (x = 0; x < 10; x++) {
        // Output the section header first
        printf("\n--BoundaryString\n"
               "Content-type: text/html\n\n");
        // Flush output, just to be safe
        fflush(stdout);
        // Wait to let the browser display last section
        Sleep(1500);
        // Output data for this section
        printf("Special Edition: Using CGI<br>"
               "Server Push demonstration. "
               "Push %i:<br>%s\n"
               ,x+1, pushes[x]);
        // Flush again
        fflush(stdout);
    }

    // All done, so output the terminator.
    // The trailing two dashes let the browser know that
    // there will be no more parts in this multipart
    // document.
    printf("\n--BoundaryString--\n\n");

    return 0;
}


Now that you see how it's done, you should be able to make your
own programs. If you want to push graphics instead of text, change
the MIME header for the individual sections, and output binary
data. (See Chapter 3, "Designing CGI Applications,"
for details about raw versus cooked mode; you need to tell the
operating system to switch the STDOUT output mode to binary if
you're going to send binary data.)
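
A minimal sketch of pushing one binary frame might look like the following. The function names are hypothetical. The `_setmode` call is the Microsoft C runtime's way of switching a stream to binary mode and is compiled only on Windows; other platforms are assumed not to need it.

```c
#include <assert.h>
#include <stdio.h>

#ifdef _WIN32
#include <fcntl.h>
#include <io.h>
#endif

/* Switch stdout to binary mode where the C runtime distinguishes
   text from binary streams; elsewhere this does nothing. */
void set_stdout_binary(void)
{
#ifdef _WIN32
    _setmode(_fileno(stdout), _O_BINARY);
#endif
}

/* Send one frame as a multipart section: boundary line, content
   type, blank line, raw data, newline. Returns 0 on success,
   -1 if the data could not be written in full. */
int push_frame(FILE *out, const char *boundary, const char *mime_type,
               const void *data, size_t length)
{
    assert(out != NULL && boundary != NULL && mime_type != NULL);
    assert(data != NULL || length == 0);

    fprintf(out, "--%s\nContent-type: %s\n\n", boundary, mime_type);
    if (length > 0 && fwrite(data, 1, length, out) != length)
        return -1;
    fprintf(out, "\n");

    /* Flush so the browser receives the frame right away */
    return (fflush(out) == 0) ? 0 : -1;
}
```

A slide-show program would call `set_stdout_binary()` once, print the multipart/x-mixed-replace header, and then call `push_frame(stdout, "BoundaryString", "image/jpeg", ...)` for each image, pausing between frames.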

Interestingly, you can use Server Push to create animated inline
graphics in an otherwise static document. To do so, first create
your static document. Include an <img> tag, with
the source pointing to a CGI Server Push program instead of a
graphics file. For example, say that you've written a Server Push
program called photos.exe, which outputs a slide show of your
family album. Here's how you can incorporate a dynamic slide show
into your HTML:



<html>
<head><title>In-Line Push</title></head>
<body>
<h1>In-Line Push</h1>
This page of otherwise ordinary HTML includes a link to a server push program.
Sit back and watch the show:
<p>
<img src="/cgi-bin/photos.exe?">
</body>
</html>



Near Real-Time HTML

As you saw earlier in this chapter in the section "Methods
of Generating Real-Time HTML," not everything needs to be
generated on the fly. Documents that are updated regularly and
served as static documents are often called near real-time,
because the information is fresh but the document itself is static.
Often, CGI is used to update the document (rather than create
the document in real time). This allows the document to reflect
changes immediately, but avoids the overhead of running a CGI
program every time a browser fetches the document.

MHonArc (pronounced monarch) is a good example of providing
near real-time content. This freeware Perl 5 program (available
from http://www.oac.uci.edu/indiv/ehood/mhonarc.doc.html)
provides e-mail archival to HTML, with full indexing, thread linking,
and support for embedded MIME types. Although the HTML pages themselves
are already composed and retrieved normally, they can be updated
in the background. You can schedule the MHonArc program to run
at regular times, or it can be triggered by the arrival of new
mail. Although the code is highly UNIX-centric (and therefore not
particularly useful on other platforms), you can examine the source
for ideas and techniques.

List maintenance also benefits from near real-time HTML. Lists
of favorite links or FTP directory listings don't change very
often, but you want them up-to-date at all times. A database with
a real-time CGI program to retrieve and format information may
be overkill here. A more efficient method is to have a CGI program
that updates the list as new information is added, or a scheduled
job that updates the list from a central database at regular intervals.

The SFF-NET (http://www.greyware.com/sff/) uses a combination
of CGI, SSI, and static documents to provide up-to-the-moment
lists without running a CGI program every time. When visitors
want to propose a new link for one of the lists on the SFF-NET,
they fill out an online form that invokes a standard CGI program.
The CGI program validates the information, adds the words "not
validated yet," and appends it, in proper HTML format, to a
text file. Users never see this file directly; instead, when they
browse a list of links, they see a static HTML page that uses
the SSI include-file function to pull in the text file. The new links
(in the text file) show up in the list right next to the existing
links. This provides real-time updating of the overall list without
touching the main HTML page. The site administrator then looks
at the text file of new links at his leisure, and moves new links
from the text file to the HTML file.
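
The append step of such a scheme could be sketched as follows. This is not the SFF-NET's actual code; the function names, the list-item layout, and the file name in the usage note are assumptions. The sketch escapes HTML special characters so that a submitted title cannot inject markup into the list.

```c
#include <assert.h>
#include <stdio.h>

/* Write `text` to `out`, replacing the characters that are special
   in HTML so that submitted text cannot inject markup. */
static void write_escaped(FILE *out, const char *text)
{
    for (; *text != '\0'; text++) {
        switch (*text) {
        case '<':  fputs("&lt;", out);   break;
        case '>':  fputs("&gt;", out);   break;
        case '&':  fputs("&amp;", out);  break;
        case '"':  fputs("&quot;", out); break;
        default:   fputc(*text, out);    break;
        }
    }
}

/* Append one proposed link to the pending-links file at `path`.
   Returns 0 on success, -1 if the file cannot be opened or written. */
int append_pending_link(const char *path, const char *url,
                        const char *title)
{
    FILE *fp;

    assert(path != NULL && url != NULL && title != NULL);

    fp = fopen(path, "a");
    if (fp == NULL)
        return -1;

    fputs("<li><a href=\"", fp);
    write_escaped(fp, url);
    fputs("\">", fp);
    write_escaped(fp, title);
    fputs("</a> (not validated yet)\n", fp);

    return (fclose(fp) == 0) ? 0 : -1;
}
```

The static list page would then pull the file in with an SSI directive along the lines of `<!--#include file="newlinks.txt" -->`, where `newlinks.txt` is a hypothetical name and the exact directive syntax depends on your server.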

Server Performance Considerations

Dynamic HTML can be a lot of fun and can be extraordinarily useful
at times. However, it doesn't come without cost.

The first consideration is for caching proxy servers. If your
page includes a page count or a random quotation or a Server Push
animation, it can't be cached. Defeating caching isn't necessarily
an evil (you wouldn't want your up-to-the-second stock market quotes
to be cached, for instance), but it can create unnecessary network
traffic.

If you visit the Usenet groups regularly, you see a recurring
theme of experienced old hackers venting their spleens at newbies
who chew up bandwidth for no reasonable purpose. The range of
opinion you find goes from calm, rational argumentation, to wild,
impassioned screeds. Some go so far as to say that any
CGI program is evil and that page-hit counters are the devil's
own spawn.

In a book with the sole purpose of teaching you how to write your
own CGI scripts, you won't find much support for the extremists.
The network is there to be used. Like any limited resource, it
should be used wisely rather than wastefully. The problem is in
determining what's wise. If you keep your high-traffic pages static,
you'll make everyone except the true Internet curmudgeons happy.
Of course, if you're a Java developer, all bets are off. The new
ways of using the Web are completely incompatible with caching
from the start.

The second thing to consider is that CGI programs tax the Web
server. For each retrieval that calls a CGI program, the server
must prepare environment variables, validate security, launch
the script, and pipe the results back to the caller. If a hundred
scripts are executing simultaneously, the server may become overburdened.
Even if the server has sufficient resources to cope, the overall
server throughput will suffer.

Server Push puts more of a strain on the system than almost any
other type of dynamic HTML, because the script continues executing
(consuming processor cycles and memory) theoretically forever.
Just a few of these scripts running at the same time can bring
an otherwise capable server to its knees. They have a high level
of traffic and resource consumption for relatively little gain.

There are no hard and fast rules. As with any system, you must
balance performance against cost, and capacity against throughput.










