Perl and the Internet (Unix Power Tools, 3rd Edition)
41.13. Perl and the Internet
Because Perl supports Berkeley sockets, all kinds of networking tasks
can be automated with Perl. Below are some common idioms to show you
what is possible with Perl and a little elbow grease.
41.13.1. Be Your Own Web Browser with LWP
The suite of classes that handles all aspects of HTTP is
collectively known as LWP (short for libwww-perl). If your Perl
installation doesn't currently have LWP, you can
easily install it with the CPAN
module (Section 41.11) like this:
# perl -MCPAN -e 'install Bundle::LWP'
If you also included an X widget library such as Tk, you could create
a graphic web browser in Perl (an example of this comes with the Perl
Tk library). However, you don't need all of that if
you simply want to grab a file from a web server:
use LWP::Simple;
my $url = "http://slashdot.org/slashdot.rdf";
getstore($url, "s.rdf");
This example grabs the Rich Site Summary file from the popular tech
news portal, Slashdot, and saves it to a local file called
s.rdf. In fact, you don't even
need to bother with a full-fledged script:
$ perl -MLWP::Simple -e 'getstore("http://slashdot.org/slashdot.rdf", "s.rdf")'
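Neither version above checks whether the download worked. getstore returns the HTTP status code of the request, and LWP::Simple also exports is_success to test it. A minimal sketch follows; the helper name fetch_to_file and the fallback filename download.out are made up for illustration:

```perl
use strict;
use warnings;
use LWP::Simple qw(getstore is_success);

# Fetch $url into $file and return the HTTP status code from getstore.
# (fetch_to_file is a hypothetical helper name, not part of LWP.)
sub fetch_to_file {
    my ($url, $file) = @_;
    my $status = getstore($url, $file);
    warn "Fetch of $url failed with status $status\n"
        unless is_success($status);
    return $status;
}

# Only fetch when a URL is supplied on the command line.
if (@ARGV) {
    my ($url, $file) = @ARGV;
    $file = "download.out" unless defined $file;    # assumed fallback name
    my $status = fetch_to_file($url, $file);
    print "Saved $url to $file\n" if is_success($status);
}
```

Run it as, for example, perl fetch.pl http://slashdot.org/slashdot.rdf s.rdf (fetch.pl being whatever name you save the script under).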
Sometimes you want to process a web page to extract information from
it. Here, the script extracts and reports the title of the page whose
URL is given on the command line:
use LWP::Simple;
use HTML::TokeParser;
$url = $ARGV[0] || 'http://www.oreilly.com';
$content = get($url);
die "Can't fetch page: halting\n" unless $content;
$parser = HTML::TokeParser->new(\$content);
$parser->get_tag("title");
$title = $parser->get_token;
print $title->[1], "\n" if $title;
After bringing in the library to fetch the web page (LWP::Simple) and
the one that can parse HTML (HTML::TokeParser), the command line is
inspected for a user-supplied URL. If one isn't
there, a default URL is used. The get function,
imported implicitly from LWP::Simple, attempts to fetch the URL. If
it succeeds, the whole page is kept in memory in the scalar
$content. If the fetch fails, get returns undef, so
$content is false and the script halts. If
there's something to parse, a reference to the
content is passed into the HTML::TokeParser object constructor.
HTML::TokeParser deconstructs a page into individual HTML elements.
Although this isn't the way most people think of
HTML, it does make it easier for both computers and programmers to
process web pages. Since nearly every web page has only one
<title> tag, the parser is instructed to
ignore all tokens until it finds the opening
<title> tag. The actual title string is a
text string and fetching that piece requires getting the next token.
The method get_token returns an array reference whose
length depends on the kind of token returned (see the
HTML::TokeParser manpage for details). For a text token, the text
itself is the second element, which is why the script prints $title->[1].
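To experiment with the parser without touching the network, you can hand it an HTML string directly. The sketch below uses a different call from the script above: instead of get_token, it uses HTML::TokeParser's get_trimmed_text method, which collects the text up to a given closing tag and trims the surrounding whitespace. The helper name extract_title and the sample page are made up for illustration:

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Return the text of the first <title> element in an HTML string,
# or undef if there is none. (extract_title is a hypothetical helper.)
sub extract_title {
    my ($html) = @_;
    my $parser = HTML::TokeParser->new(\$html) or return;
    # Skip tokens until the opening <title> tag; undef means no such tag.
    $parser->get_tag("title") or return;
    # Collect the text up to the closing </title>, whitespace trimmed.
    return $parser->get_trimmed_text("/title");
}

my $page = "<html><head><title> Example Page </title></head>"
         . "<body><p>Hi</p></body></html>";
my $title = extract_title($page);
print "$title\n" if defined $title;
```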
One important word of caution: these scripts are very simple web
crawlers, and if you plan to be grabbing a lot of pages from a web
server you don't own, you should do more research
into how to build polite web robots. See
O'Reilly's Perl &
LWP.
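As a starting point for that research, libwww-perl includes LWP::RobotUA, a user agent that consults a site's robots.txt and pauses between requests to the same host. The sketch below is only an outline of its use; the agent name and contact address are placeholders you would replace with your own:

```perl
use strict;
use warnings;
use LWP::RobotUA;

# Agent name and contact address are placeholders; supply your own.
my $ua = LWP::RobotUA->new("example-fetcher/0.1", 'you@example.com');
$ua->delay(1);    # minutes to wait between requests to the same host

# Fetch each URL named on the command line and report its status line.
# With no arguments the script makes no requests at all.
for my $url (@ARGV) {
    my $response = $ua->get($url);
    print "$url: ", $response->status_line, "\n";
}
```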
41.13.2. Sending Mail with Mail::Sendmail
Often, you may find it necessary to send an email reminder from a
Perl script. You could do this with sockets only, handling the whole
SMTP protocol in your code, but why bother? Someone has already done
this for you. In fact, there are several SMTP modules on CPAN, but
the easiest one to use for simple text messages is Mail::Sendmail.
Here's an example:
use Mail::Sendmail;
my %mail = (
    Subject => "About your disk quota",
    To      => 'jane@hostname.com, fred@hostname.com',
    From    => 'admin@hostname.com',
    Message => "You've exceeded your disk quota",
    smtp    => "smtp-mailhost.hostname.com",
);
sendmail(%mail) or die "error: $Mail::Sendmail::error";
print "done\a\n";
Since most readers will be familiar with the way email works, this
module should be fairly easy to adapt to your own use. The one field
that may not be immediately clear is smtp. This
field should be set to the hostname or IP address of a machine that
will accept SMTP relay requests from the machine on which your script
is running. With the proliferation of email viruses of mass
destruction, mail administrators don't usually allow
their machines to be used by unknown parties. Talk to your local
system administrator to find a suitable SMTP host for your needs.
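While you are still finding out which SMTP host to use, it helps to be able to run a script without sending anything. The sketch below builds the message hash in a helper and only calls sendmail when an SMTP_HOST environment variable is set. Both the variable and the helper name build_quota_mail are assumptions made for this example, not features of Mail::Sendmail. The addresses are single-quoted so that Perl does not try to interpolate @hostname as an array:

```perl
use strict;
use warnings;
use Mail::Sendmail;

# Build the hash that sendmail() expects.
# (build_quota_mail is a hypothetical helper name.)
sub build_quota_mail {
    my ($smtp_host, @recipients) = @_;
    return (
        Subject => "About your disk quota",
        To      => join(", ", @recipients),
        From    => 'admin@hostname.com',
        Message => "You've exceeded your disk quota",
        smtp    => $smtp_host,
    );
}

my %mail = build_quota_mail($ENV{SMTP_HOST} || "localhost",
                            'jane@hostname.com', 'fred@hostname.com');

# Send only when SMTP_HOST is set (an assumed convention for this sketch);
# otherwise just report what would have been sent.
if ($ENV{SMTP_HOST}) {
    sendmail(%mail) or die "error: $Mail::Sendmail::error";
}
else {
    print "Would send to: $mail{To} via $mail{smtp}\n";
}
```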
41.13.3. CGI Teaser
What Perl chapter would be complete without some mention of CGI? The
Common Gateway Interface is a standard by which web servers, like
Apache, allow external programs to interact with web clients. The
details of CGI can be found in
O'Reilly's CGI Programming with Perl, but the code below uses the
venerable CGI module to create a simple form and display the results
after the user has hit the submit button. You will need to look through
your local web server's configuration files to see
where such a script must be placed in order to work.
Unfortunately, that information is very system-dependent.
use CGI;
$cgi = CGI->new;
$name = $cgi->param("usrname");
print
$cgi->header, $cgi->start_html,
$cgi->h1("My First CGI Program");
if( $name ){
print $cgi->p("Hello, " . $cgi->escapeHTML($name));  # escape user input before echoing it
}
print
$cgi->start_form,
$cgi->p("What's your name: "), $cgi->textfield(-name => "usrname"),
$cgi->submit, $cgi->end_form,
$cgi->end_html;
CGI scripts are unlike other scripts with which you are probably more
familiar, because the same program runs both before and after the user
submits the form and must work out which case it is in.
When the user first accesses this page,
$name will be empty and a blank form with a text
box will be displayed. When the user enters something into that
text box and presses the form's submit button, the script runs
again, and this time the values of the
form are available through the CGI method param.
Here, the desired value is stored under the key
usrname. If this value is populated, a simple
message is displayed before showing the form again.
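You can try both cases from the command line, without a web server, because CGI->new accepts a hash reference of parameters in place of a real request. The following sketch relies on that; the helper name greeting_for and the sample name Jane are made up for illustration:

```perl
use strict;
use warnings;
use CGI;

# Build the greeting for one request; returns an empty string when
# no name has been submitted. (greeting_for is a hypothetical helper.)
sub greeting_for {
    my ($cgi) = @_;
    my $name = $cgi->param("usrname");
    return "" unless defined $name && length $name;
    # Escape the user's input so it cannot inject markup into the page.
    return "Hello, " . $cgi->escapeHTML($name);
}

# Simulate a first visit (no parameters) and a submitted form.
my @requests = (CGI->new({}), CGI->new({ usrname => "Jane" }));
for my $request (@requests) {
    my $greeting = greeting_for($request);
    my $line = length($greeting) ? $greeting : "(blank form only)";
    print "$line\n";
}
```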
Now you have nearly all the tools necessary to create your own
Internet search engine. I leave the details of creating a massive
data storage and retrieval system needed to catalog millions of web
pages as an exercise for the reader.
-- JJ
Copyright © 2003 O'Reilly & Associates. All rights reserved.