 Dynamic Pages and Search

You know all about the advantages that dynamically generated web sites
offer. But if you want your site to be indexed by search engines, you have
to keep in mind how search engines work. This article covers some search
engine basics and gives you guidelines for making your dynamic web sites
search-engine-friendly.
By Tobias Ratschiller on September 28th, 2001.

The problem
Whether e-commerce applications, web-based schedule planners, or
personalized portals: dynamic sites are often generated for one specific
user. Web applications, for example, often assign a session ID to identify
a user unambiguously. A URL might then look as follows:
http://www.foo.com/script.php?ID=b6ac8ca8e453cdc43e6078abf044cdb5
This makes it possible to recognize users across separate pages and, for
instance, to show their shopping cart in an online shop. For a search
engine it does not make much sense to index the contents of such a page:
usually the session expires after a certain time span, or the content of
the page can no longer be retrieved.
For this reason many search engines, as a matter of principle, do not
index pages whose address (URL) looks "dynamic". These include, for
example, addresses that contain "cgi-bin", "pl", "?" or "&". A few search
engines simply drop the parameters ("?ID") and request the bare page
("script.php").
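The filtering heuristic described above can be sketched in PHP as follows. This is only an illustration of the markers the text lists: the function name looks_dynamic is hypothetical, "pl" is interpreted here as the ".pl" file extension, and real crawlers apply their own rules, which may differ.

```php
<?php
// Hypothetical helper: returns true if a URL contains one of the markers
// the article lists ("cgi-bin", "pl", "?" or "&").
// Assumption: "pl" means a path that ends in the ".pl" extension.
function looks_dynamic(string $url): bool
{
    // A query string or parameter separator anywhere in the URL.
    if (strpos($url, '?') !== false || strpos($url, '&') !== false) {
        return true;
    }
    // A CGI directory anywhere in the URL.
    if (strpos($url, 'cgi-bin') !== false) {
        return true;
    }
    // A Perl script extension at the end of the path.
    $path = (string) parse_url($url, PHP_URL_PATH);
    return substr($path, -3) === '.pl';
}
```

With this sketch, looks_dynamic('http://www.foo.com/script.php?ID=b6ac8ca8') is true because of the "?", while looks_dynamic('http://www.foo.com/script/PHP/') is false.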
This perfectly understandable behaviour leads to one problem, though:
many larger sites generate their pages dynamically, for example from a
database. These pages obviously should be indexed by search engines.
But, as already described, URLs that carry parameters cause problems.
Fooling robots
The robots of search engines, however, are ordinary HTTP clients and
cannot see how a page is created on the server side. And with PHP almost
anything can be generated that a web server can send to a client. To make
a search robot index a dynamically generated page, it is enough to make it
believe that the page is static. Instead of the ending "php" for a
PHP-generated page you use an ending like "html", for example. The URL of
your example script now looks as follows:
http://www.foo.com/script.html?category=php.
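One way to have Apache run files with an "html" ending through PHP is an AddType line in the server configuration or in a .htaccess file. This is a minimal sketch, assuming PHP is loaded as an Apache module and registered under the type application/x-httpd-php, and that .htaccess overrides are permitted; other PHP setups need different directives.

```apache
# Assumption: PHP runs as an Apache module (mod_php).
# Treat files ending in .html as PHP scripts.
AddType application/x-httpd-php .html
```

Every .html file covered by this directive is then passed through the PHP interpreter, including pages that contain no PHP at all.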
If a search engine requests a page without these parameters, a standard
page should come up. This works well for pages that do not need any
parameters. Sometimes, though, the parameters really do determine the
content: an article from the category "PHP" is completely different from
an article from the category "Perl", so the parameter "category" is very
important.
So the developer has to find another way to pass parameters. The
following URL, for example, simulates a static HTML page:
http://www.foo.com/script.html/PHP/. To the robot this looks like a normal
directory structure: the path component of this URL is /script.html/PHP/.
The web server, however, executes "script.html". The parameter "PHP" is
then extracted manually from the path information ($PATH_INFO). A more
elegant way: Apache can assign a MIME type to the file directly. You
simply call the file "script" (without an ending) and use Apache's
"ForceType" directive to assign the type application/x-httpd-php to it.
The URL of the script is now http://www.foo.com/script/PHP/, and the
parameter is again available from the path. All search engines index such
pages without problems, because they can no longer be told apart from
static HTML pages.
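The path-based approach can be sketched as follows. The helper names path_segments and category_from_path, the category list, and the default value are assumptions made for illustration; the .htaccess fragment in the opening comment shows one plausible way to apply the directive to a file named "script" and is not taken from the article. The fallback mirrors the earlier advice that a request without usable parameters should still produce a standard page.

```php
<?php
// Assumed .htaccess next to a file named "script" (no ending):
//   <Files script>
//       ForceType application/x-httpd-php
//   </Files>

// Hypothetical helper: split the path information ("/PHP/" for the URL
// /script/PHP/) into its non-empty segments.
function path_segments(string $pathInfo): array
{
    $parts = explode('/', trim($pathInfo, '/'));
    // Drop empty segments produced by an empty path or doubled slashes.
    return array_values(array_filter($parts, 'strlen'));
}

// Hypothetical helper: use the first segment as the category and fall
// back to a default when it is missing or not a known category.
function category_from_path(string $pathInfo, array $known, string $default): string
{
    $first = path_segments($pathInfo)[0] ?? '';
    return in_array($first, $known, true) ? $first : $default;
}

// The category list and default are placeholders for this sketch.
echo category_from_path($_SERVER['PATH_INFO'] ?? '', ['PHP', 'Perl'], 'PHP'), "\n";
```

Checking the segment against a list of known categories keeps arbitrary path text out of later database queries or file lookups.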
Making magic with mod_rewrite
With mod_rewrite it is possible to do without manual handling of the
path information. Ralf S. Engelschall's mod_rewrite rewrites URLs on the
fly; because regular expressions can be used in the rewrite rules (the
instructions according to which URLs are rewritten), almost anything
imaginable can be done. Further information can be found in the
documentation at http://www.apache.org/docs/mod/mod_rewrite.html.
Please note that this module is not compiled into Apache by default; you
have to pass the following option to the configure script to compile
mod_rewrite as well:
For our purposes a few simple rewrite rules are sufficient. First the
rewrite engine has to be switched on. To do this, put the following
configuration directive into a .htaccess file:
RewriteEngine on
The following rule transforms all URLs of the form news<id>.html into
shownews.php?id=<id>. So news01.html becomes shownews.php?id=01:
RewriteRule ^news(.*)\.html$ shownews.php?id=$1
Your script can access the variable $id as usual. The user's browser does
not notice the change: for the browser the file is still called
news01.html.
Another example:
RewriteRule ^(.*)\.html$ shownews.php?id=$1
This line transforms URLs like foo.html into shownews.php?id=foo.
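Because the pattern (.*) matches any string, the script behind the rule receives whatever text appeared in the requested file name. The following sketch shows how shownews.php might check that value before using it. It reads the parameter from $_GET rather than from a global $id, and the function name clean_id as well as the set of allowed characters are assumptions, not part of the article's method.

```php
<?php
// Hypothetical validation for the id that the rewrite rule passes on.
// Assumption: valid ids consist of letters, digits, "-" and "_" only.
function clean_id($raw): ?string
{
    if (!is_string($raw) || !preg_match('/\A[A-Za-z0-9_-]+\z/', $raw)) {
        return null;   // missing, not a string, or other characters
    }
    return $raw;
}

$id = clean_id($_GET['id'] ?? null);
if ($id === null) {
    // Malformed or missing id: answer as a missing static page would.
    http_response_code(404);
    echo "Not found\n";
} else {
    echo "Showing news item ", htmlspecialchars($id), "\n";
}
```

Answering malformed ids with a 404 status keeps the illusion of static pages intact for robots, which treat the URL like any other missing file.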
With a few tricks it is possible to make spiders and robots believe that
they have found static pages, which they then index in the usual way. The
methods presented in this article can easily be integrated into your own
scripts, and with suitable adaptation they also work with other
server-side scripting languages.

About the author: Tobias Ratschiller is a new media consultant based
in Italy. He runs http://phpWizard.net.


