Chapter 24
Developing Content and Protocol Handlers
by Mike Fletcher
CONTENTS
What Are Protocol and Content Handlers?
MIME Types
Getting Java to Load New Handlers
Creating a Protocol Handler
Design
The fingerConnection Source
Handler Source
Using the Handler
Creating a Content Handler
Design
Content Handler Skeleton
The tabStreamTokenizer Class
The getContent Method
Using the Content Handler
Summary
Java's URL class gives applets and applications easy
access to the World Wide Web using the HTTP protocol. This is
fine and dandy if you can get the information you need into a
format that a Web server or CGI script can access. However, wouldn't
it be nice if your code could talk directly to the server application
without going through an intermediary CGI script or some sort
of proxy? Wouldn't you like your Java-based Web browser to be
able to display your wonderful new image format? This is where
protocol and content handlers come in.
What Are Protocol and Content Handlers?
Handlers are classes that extend the capabilities of the
standard URL class. A protocol handler provides
a reference to a java.io.InputStream object (and a java.io.OutputStream
object, where appropriate) that retrieves the content of a URL.
Content handlers take an InputStream for a given MIME
type and convert it into a Java object of the appropriate type.
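As a rough illustration of that division of labor, the sketch below reads a small temporary file through a file: URL. The protocol handler built into the runtime supplies the InputStream, and the connection reports the MIME type by which a content handler would be chosen. The class name HandlerRolesDemo, its helper methods, and the file contents are made up for this example.

```java
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;

class HandlerRolesDemo {
    // Reads every byte that the protocol handler's stream delivers.
    static String readAll( URL u ) throws IOException {
        URLConnection con = u.openConnection( );
        StringBuffer sb = new StringBuffer( );
        try ( InputStream in = con.getInputStream( ) ) {
            int c;
            while( (c = in.read( )) != -1 ) {
                sb.append( (char) c );
            }
        }
        return sb.toString( );
    }

    // Creates a small ASCII file so the example needs no network access.
    static URL makeSampleUrl( ) throws IOException {
        File f = File.createTempFile( "handler-demo", ".txt" );
        f.deleteOnExit( );
        try ( FileWriter w = new FileWriter( f ) ) {
            w.write( "hello handlers" );
        }
        return f.toURI( ).toURL( );
    }

    public static void main( String[] args ) throws IOException {
        URL u = makeSampleUrl( );
        // The MIME type is what a content handler is selected by.
        System.out.println( "type: " + u.openConnection( ).getContentType( ) );
        System.out.println( "data: " + readAll( u ) );
    }
}
```

The stream carries raw bytes only; turning those bytes into a String, an Image, or some other object is the content handler's job.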
MIME Types
MIME (Multipurpose Internet Mail Extensions) is the Internet standard
for specifying the type of content a resource contains. As you
may have guessed from the name, it originally was proposed for
the context of enclosing nontextual components in Internet e-mail.
MIME allows different platforms (PCs, Macintoshes, UNIX workstations,
and others) to exchange multimedia content in a common format.
The MIME standard, described in RFC 1521, defines an extra set
of headers similar to those on Internet e-mail. The headers describe
attributes such as the method of encoding the content and the
MIME content type. MIME types are written as type/subtype,
where type is a general category such as text
or image and subtype is a more specific description
of the format such as html or jpeg. For example,
when a Web browser contacts an HTTP daemon to retrieve an HTML
file, the daemon's response looks something like this:
Content-type: text/html

<HEAD><TITLE>Document moved</TITLE></HEAD>
<BODY><H1>Document moved</H1>
The Web browser parses the Content-type: header and sees
that the data is text/html, that is, an HTML document. If it were
a GIF image file, the header would have been Content-type:
image/gif.
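The header parsing described above might be sketched as follows. This is only an illustration: the class MimeTypeDemo and its parseContentType() method are hypothetical names, and the parsing is deliberately simpler than a real browser's.

```java
class MimeTypeDemo {
    // Splits a header such as "Content-type: text/html" into { type, subtype }.
    // Returns null if the line is not a well-formed Content-type header.
    static String[] parseContentType( String headerLine ) {
        int colon = headerLine.indexOf( ':' );
        if( colon < 0 ) return null;
        String name = headerLine.substring( 0, colon ).trim( );
        if( !name.equalsIgnoreCase( "Content-type" ) ) return null;
        String value = headerLine.substring( colon + 1 ).trim( );
        // Drop optional parameters such as "; charset=..."
        int semi = value.indexOf( ';' );
        if( semi >= 0 ) value = value.substring( 0, semi ).trim( );
        int slash = value.indexOf( '/' );
        if( slash <= 0 || slash == value.length( ) - 1 ) return null;
        return new String[] {
            value.substring( 0, slash ).toLowerCase( ),
            value.substring( slash + 1 ).toLowerCase( )
        };
    }

    public static void main( String[] args ) {
        String[] t = parseContentType( "Content-type: text/html" );
        System.out.println( "type=" + t[0] + " subtype=" + t[1] );
    }
}
```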
IANA (Internet Assigned Numbers Authority), the group that maintains
the lists of assigned protocol numbers and the like, is responsible
for registering new content types. A current copy of the official
MIME types is available from ftp://ftp.isi.edu/in-notes/iana/assignments/media-types/.
This site also has specifications or pointers to specifications
for each type.
Getting Java to Load New Handlers
The exact procedure for loading a protocol or content handler
depends on the Java implementation. The following instructions
are based on Sun's Java Developers Kit and should work for any
implementation derived from Sun's. If you have problems, check
the documentation for your particular Java version.
In the JDK implementation, the URL class and helpers
look for classes in the sun.net.www package. Protocol
handlers should be in a package called sun.net.www.protocol.ProtocolName,
where ProtocolName is the name of the protocol (such
as ftp or http). The handler class itself should
be named Handler. For example, the full name of the HTTP
protocol handler class, provided by Sun with the JDK, is sun.net.www.protocol.http.Handler.
To load your new protocol handler, you must construct a directory
structure corresponding to the package names and add the directory
to your CLASSPATH environment variable. Assume that you
have a handler for a protocol (let's call it the foo protocol) and
that your Java library directory is .../java/lib/ (...\java\lib\
on Windows machines). You must take the following steps to load
the foo protocol:
1. Make directories .../java/lib/sun, .../java/lib/sun/net,
   and so on. The last directory should be named like this:
   .../java/lib/sun/net/www/protocol/foo
2. Place your Handler.java file in the last directory
   (that is, it should be named like this:
   .../java/lib/sun/net/www/protocol/foo/Handler.java).
3. Compile the Handler.java file.
If you place the netClass.zip file containing the network
classes (located on the CD-ROM that accompanies this book) in
your CLASSPATH, the example handlers should load correctly.
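Where the package-based lookup is not available or not convenient, one alternative (not the procedure this chapter follows) is to hand a handler directly to the java.net.URL constructor that accepts a URLStreamHandler. The sketch below assumes a made-up foo protocol; FooHandler and ExplicitHandlerDemo are hypothetical names, and the handler serves a fixed string rather than contacting a server.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;
import java.net.URLStreamHandler;

// Hypothetical handler for the made-up "foo" protocol.
class FooHandler extends URLStreamHandler {
    @Override
    protected URLConnection openConnection( URL u ) {
        return new URLConnection( u ) {
            public void connect( ) {
                // Nothing to connect to in this sketch
            }
            public InputStream getInputStream( ) {
                String text = "foo content for " + getURL( ).getHost( );
                return new ByteArrayInputStream( text.getBytes( ) );
            }
        };
    }
}

class ExplicitHandlerDemo {
    static String fetch( String spec ) throws IOException {
        // Supplying the handler here bypasses the package lookup.
        URL u = new URL( null, spec, new FooHandler( ) );
        StringBuffer sb = new StringBuffer( );
        try ( InputStream in = u.openStream( ) ) {
            int c;
            while( (c = in.read( )) != -1 ) {
                sb.append( (char) c );
            }
        }
        return sb.toString( );
    }

    public static void main( String[] args ) throws IOException {
        System.out.println( fetch( "foo://example.com/anything" ) );
    }
}
```

A handler supplied this way applies only to URLs constructed with it, so it suits experiments better than application-wide installation.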
Creating a Protocol Handler
Let's start extending Java with a handler for the finger
protocol. The finger protocol is defined in RFC 1288, which
superseded the original RFC 742.
The server listens on TCP port 79. It expects either the user
name for which you want information followed by ASCII carriage
return and linefeed characters, or (if you want information on
all users currently logged in) just the carriage return and linefeed
characters. The information is returned as ASCII text in a system-dependent
format (although most UNIX variants give similar information).
We will use an existing class (fingerClient) to handle
contacting the finger server and concentrate on developing
the protocol handler.
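For readers without the CD-ROM, the exchange just described can be sketched as follows. This is not the fingerClient class used in the chapter; FingerQuery and its methods are hypothetical names chosen for the illustration, and error handling is left to the caller.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.Socket;

class FingerQuery {
    static final int FINGER_PORT = 79;

    // Builds the request: the user name (possibly empty) followed by CR LF.
    static String requestLine( String user ) {
        return (user == null ? "" : user) + "\r\n";
    }

    // Sends the request to the host and returns the reply as text.
    static String query( String host, String user ) throws IOException {
        try ( Socket s = new Socket( host, FINGER_PORT ) ) {
            OutputStream out = s.getOutputStream( );
            out.write( requestLine( user ).getBytes( "US-ASCII" ) );
            out.flush( );
            StringBuffer reply = new StringBuffer( );
            InputStream in = s.getInputStream( );
            int c;
            // The server closes the connection when it has sent everything
            while( (c = in.read( )) != -1 ) {
                reply.append( (char) c );
            }
            return reply.toString( );
        }
    }

    public static void main( String[] args ) throws IOException {
        // Usage: java FingerQuery host [user]
        if( args.length > 0 ) {
            System.out.print( query( args[0], args.length > 1 ? args[1] : "" ) );
        }
    }
}
```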
Design
The first decision we must make is how to structure URLs for our
protocol. We'll imitate the HTTP URL and specify that finger
URLs should be of the following format:
finger://host/user
In this syntax, host is the host to contact and user
is an optional user to ask for information about. If the user
name is omitted, we will return information about all users.
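To make the URL format concrete, here is a small sketch that splits such a URL into its host and user parts. It uses java.net.URI rather than java.net.URL so that it runs without any finger handler installed; FingerUrlParts is a hypothetical name, and falling back to localhost for a missing host anticipates a choice the handler makes later in this chapter.

```java
import java.net.URI;
import java.net.URISyntaxException;

class FingerUrlParts {
    // Returns { host, user } for a URL of the form finger://host/user.
    static String[] parse( String spec ) throws URISyntaxException {
        URI uri = new URI( spec );
        String host = uri.getHost( );
        if( host == null || host.equals( "" ) ) {
            host = "localhost";
        }
        // The path is "/user"; an empty or bare "/" path means all users
        String path = uri.getPath( );
        String user = (path != null && path.length( ) > 1)
            ? path.substring( 1 ) : "";
        return new String[] { host, user };
    }

    public static void main( String[] args ) throws URISyntaxException {
        String[] parts = parse( "finger://example.com/mike" );
        System.out.println( "host=" + parts[0] + " user=" + parts[1] );
    }
}
```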
Because we already have a fingerClient class written,
we need to write only the subclasses of URLStreamHandler
and URLConnection. Our stream handler will use the client
object to retrieve the information and will format it using HTML.
The handler will write the content into a StringBuffer, which will
be used to create a StringBufferInputStream. The fingerConnection,
a subclass of URLConnection, will take this stream and
implement the getInputStream() and getContent()
methods.
In our implementation, the protocol handler object does all the
work of retrieving the remote content; the connection object simply
hands back the data from the stream provided. Usually, the connection
object would retrieve the content itself: the openConnection()
method would open a connection to the remote location, and
getInputStream() would return a stream to read the contents.
In our case, the protocol is very simple (compared to something
as complex as FTP) and we can handle everything in the protocol
handler.
The fingerConnection Source
The source for the fingerConnection class should go in
the same file as the Handler class. The constructor saves
the InputStream passed and calls the URLConnection
constructor. It also clears the URLConnection doOutput flag to
indicate that nothing can be written to the connection. Listing 24.1
contains the source for this class.
Listing 24.1. The fingerConnection class.
class fingerConnection extends URLConnection {
    InputStream in;

    fingerConnection( URL u, InputStream in ) {
        super( u );
        this.in = in;
        // Read-only connection: nothing can be written to it
        this.setDoOutput( false );
    }

    public void connect( ) {
        // Nothing to do: the handler has already fetched the data
    }

    public InputStream getInputStream( ) throws IOException {
        return in;
    }

    public Object getContent( ) throws IOException {
        // Accumulate the whole stream into one String
        StringBuffer retval = new StringBuffer( );
        int nbytes;
        byte buf[] = new byte[ 1024 ];
        try {
            while( (nbytes = in.read( buf, 0, 1024 )) != -1 ) {
                // hibyte 0: treat each byte as one ASCII character
                retval.append( new String( buf, 0, 0, nbytes ) );
            }
        } catch( Exception e ) {
            System.err.println(
                "fingerConnection::getContent: Exception\n" + e );
            e.printStackTrace( System.err );
        }
        return retval.toString( );
    }
}
Handler Source
First, let's rough out the skeleton of the Handler.java
file. We need the package statement so that our classes are compiled
into the package where the runtime handler will look for them.
We also import the fingerClient class here. The outline
of the class is shown in Listing 24.2.
Listing 24.2. Protocol handler skeleton.
package sun.net.www.protocol.finger;
import java.io.*;
import java.net.*;
import sun.net.www.protocol.finger.fingerClient;
// fingerConnection source goes here
public class Handler extends URLStreamHandler {
// openConnection() Method
}
The openConnection() Method
Now we'll develop the method responsible for returning an appropriate
URLConnection object to retrieve a given URL. The method
starts out by allocating a StringBuffer to hold our return
data. We also will parse out the host name and user name from
the URL argument. If the host was omitted, we default
to localhost. The code for openConnection()
is given in Listings 24.3 through 24.6.
Listing 24.3. The openConnection() method: parsing the URL.
public synchronized URLConnection openConnection( URL u ) {
    StringBuffer sb = new StringBuffer( );
    String host = u.getHost( );
    // Strip the leading "/" from the path; an empty path means all users
    String file = u.getFile( );
    String user = (file.length( ) > 1) ? file.substring( 1 ) : "";
    if( host == null || host.equals( "" ) ) {
        host = "localhost";
    }
Next, the method writes an HTML header into the buffer (see Listing
24.4). This allows a Java-based Web browser to display the finger
information in a nice-looking format.
Listing 24.4. The openConnection() method: writing the HTML header.
    sb.append( "<HTML><head>\n");
    sb.append( "<title>Fingering " );
    sb.append( (user.equals("") ? "everyone" : user) );
    sb.append( "@" + host );
    sb.append( "</title></head>\n" );
    sb.append( "<body>\n" );
    sb.append( "<pre>\n" );
We'll then use the fingerClient class to get the information
into a String and then append it to our buffer. If there
is an error while getting the finger information, we
will put the error message from the exception into the buffer
instead (see Listing 24.5).
Listing 24.5. The openConnection() method: retrieving the finger information.
    try {
        String info = null;
        info = (new fingerClient( host, user )).getInfo( );
        sb.append( info );
    } catch( Exception e ) {
        sb.append( "Error fingering: " + e );
    }
Finally, we'll close off the open HTML tags and create a fingerConnection
object that will be returned to the caller (see Listing 24.6).
Listing 24.6. The openConnection() method: finishing the HTML and returning a fingerConnection object.
    sb.append( "\n</pre></body>\n</html>\n" );
    return new fingerConnection( u,
        (new StringBufferInputStream( sb.toString( ) ) ) );
}
Using the Handler
Once all the code is compiled and in the right locations, load
the urlFetcher applet provided on the CD-ROM that accompanies
this book and enter a finger URL. If everything loads
right, you should see something like Figure 24.1. If you get an
error that says something along the lines of BAD URL "finger://...":
unknown protocol, check that you have your CLASSPATH
set correctly.
Figure 24.1 : The urlFetcher applet displaying a finger URL.
Creating a Content Handler
The content handler example presented in this section is for the
MIME type text/tab-separated-values. If you have ever used a spreadsheet
or database program, this type will be familiar. Many applications
can import and export data in an ASCII text file, where each column
of data in a row is separated by a tab character (\t).
The first line is interpreted as the names of the fields, and
the remaining lines are the actual data.
Design
Our first design decision is to figure out what type of Java object
or objects to use to map the tab-separated values. Because this
is a textual content, some sort of String object would
seem to be the best solution. The spreadsheet characteristics
of rows and columns of data can be represented by arrays. Putting
these two facts together gives us a data type of String[][],
or an array of arrays of String objects. The first array
is an array of String[] objects, each representing one
row of data. Each of these arrays consists of a String
for each cell of the data.
We'll also need to have some way of breaking the input stream
into separate fields. We will make a subclass of java.io.StreamTokenizer
to handle this task. The StreamTokenizer class provides
methods for breaking an InputStream into individual tokens.
You may want to browse through the entry for StreamTokenizer
in the API reference if you are not familiar with it.
Content Handler Skeleton
Content handlers are implemented by subclassing the java.net.ContentHandler
class. These subclasses are responsible for implementing a getContent()
method. We'll start with the skeleton of the class and then import
the networking and I/O packages as well as the java.util.Vector
class. We also will define the skeleton for our tabStreamTokenizer
class. Listing 24.7 shows the skeleton for this content handler.
Listing 24.7. Content handler skeleton.
/*
 * Handler for text/tab-separated-values MIME type.
 */
// This needs to go in this package for JDK-derived
// Java implementations
package sun.net.www.content.text;

// All imports must come before the first class declaration
import java.net.*;
import java.io.*;
import java.util.Vector;

class tabStreamTokenizer extends StreamTokenizer {
    public static final int TT_TAB = '\t';
    // Constructor
}

public
class tab_separated_values extends ContentHandler {
    // getContent method
}
The tabStreamTokenizer Class
We will first define the class that breaks the input into the
separate fields. Most of the functionality we need is provided
by the StreamTokenizer class, so we only have to define
a constructor that specifies the character classes needed to get
the behavior we want. For the purposes of this content handler,
there are three types of tokens: TT_WORD tokens, which
represent fields; TT_EOL tokens, which signal the end
of a line (that is, the end of a row of data); and TT_EOF
tokens, which signal the end of the input file. Tab characters
(TT_TAB) only separate the fields and are skipped as whitespace.
Because this class is relatively simple, it is presented in its
entirety in Listing 24.8.
Listing 24.8. The tabStreamTokenizer class.
class tabStreamTokenizer extends StreamTokenizer {
    public static final int TT_TAB = '\t';

    tabStreamTokenizer( InputStream in ) {
        super( in );
        // Undo parseNumbers() and whitespaceChars(0, ' ')
        ordinaryChars( '0', '9' );
        ordinaryChar( '.' );
        ordinaryChar( '-' );
        ordinaryChars( 0, ' ' );
        // Everything but TT_EOL and TT_TAB is a word
        wordChars( 0, ('\t'-1) );
        wordChars( ('\t'+1), 255 );
        // Tabs only separate fields, so skip them as whitespace;
        // TT_EOL is returned as a token of its own.
        whitespaceChars( TT_TAB, TT_TAB );
        ordinaryChar( TT_EOL );
    }
}
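To see which tokens such a configuration produces, the sketch below runs a comparable tokenizer over an in-memory string and lists the token types. It is an illustration rather than the chapter's class: it reads from a Reader and starts from resetSyntax() instead of undoing the defaults one by one, and TabTokenDemo and its methods are hypothetical names.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StreamTokenizer;
import java.io.StringReader;

class TabTokenDemo {
    // Configures a tokenizer along the lines of Listing 24.8.
    static StreamTokenizer tabTokenizer( Reader r ) {
        StreamTokenizer st = new StreamTokenizer( r );
        st.resetSyntax( );                 // every character starts as ordinary
        st.wordChars( 0, '\t' - 1 );       // everything below TAB is a word char
        st.wordChars( '\t' + 1, 255 );     // everything above TAB is a word char
        st.whitespaceChars( '\t', '\t' );  // tabs separate fields
        st.ordinaryChar( '\n' );           // newline comes back as its own token
        return st;
    }

    // Lists the tokens found in the text, for example "WORD(a) EOL".
    static String describe( String text ) throws IOException {
        StreamTokenizer st = tabTokenizer( new StringReader( text ) );
        StringBuffer out = new StringBuffer( );
        while( st.nextToken( ) != StreamTokenizer.TT_EOF ) {
            if( st.ttype == StreamTokenizer.TT_WORD ) {
                out.append( "WORD(" + st.sval + ") " );
            } else if( st.ttype == StreamTokenizer.TT_EOL ) {
                out.append( "EOL " );
            }
        }
        return out.toString( ).trim( );
    }

    public static void main( String[] args ) throws IOException {
        System.out.println( describe( "name\tage\nMike\t30\n" ) );
    }
}
```

One consequence of treating tabs as whitespace is that two tabs in a row collapse into a single separator, so an empty field is not reported as a token.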
The getContent Method
Subclasses of ContentHandler must provide an implementation
of getContent() that returns a reference to an Object.
The method takes as its parameter a URLConnection object
from which the class can obtain an InputStream to read
the resource's data.
The getContent Skeleton
First, we'll define the overall structure and method variables.
We need a flag (which we'll call done) to signal when
we've read all the field names from the first line of text. The
number of fields (columns) in each row of data will be determined
by the number of fields in the first line of text and will be
kept in an int variable called numFields. We
also will declare another integer, index, for use while
inserting the rows of data into a String[].
We need some method of holding an arbitrary number of objects
because we cannot tell the number of data rows in advance. To
do this, we'll use the java.util.Vector object, which
we'll call lines, to keep each String[]. Finally,
we will declare an instance of our tabStreamTokenizer,
using the getInputStream() method from the URLConnection
passed as an argument to the constructor. Listing 24.9 shows the
skeleton code for the getContent() method.
Listing 24.9. The getContent() skeleton.
public Object getContent( URLConnection con )
    throws IOException
{
    boolean done = false;
    int numFields = 0;
    int index = 0;
    Vector lines = new Vector();
    tabStreamTokenizer in =
        new tabStreamTokenizer( con.getInputStream( ) );
    // Read in the first line of data (Listings 24.10 and 24.11)
    // Read in the rest of the file (Listing 24.12)
    // Stuff all data into a String[][] (Listing 24.13)
}
Reading the First Line
The first line of the file will tell us the number of fields and
the names of the fields in each row for the rest of the file.
Because we don't know beforehand how many fields there are, we'll
keep each field in a Vector called firstLine.
Each TT_WORD token that the tokenizer returns is the
name of one field. We know we are done once it returns a TT_EOL
token and can set the done flag to true. We
will use a switch statement on the ttype member
of our tabStreamTokenizer to decide what action to take.
This is done in the code in Listing 24.10.
Listing 24.10. Reading the first line of data.
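Vector firstLine = new Vector( );
while( !done && in.nextToken( ) != StreamTokenizer.TT_EOF ) {
    switch( in.ttype ) {
    // Case labels must be compile-time constants, so the
    // constants are named through the class, not through in
    case StreamTokenizer.TT_WORD:
        firstLine.addElement( in.sval );
        numFields++;
        break;
    case StreamTokenizer.TT_EOL:
        done = true;
        break;
    }
}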
Now that we have the first line in memory, we need to build an
array of String objects from those stored in the Vector.
To accomplish this, we'll first allocate the array to the size
just determined. Then we will use the copyInto() method
to transfer the strings into the array just allocated. Finally,
we'll insert the array into lines (see Listing 24.11).
Listing 24.11. Copying field names into an array.
// Copy first line into array
String curLine[] = new String[ numFields ];
firstLine.copyInto( curLine );
lines.addElement( curLine );
Read the Rest of the File
Before reading the remaining data, we have to allocate a new array
to hold the next row. Then we loop until encountering the end
of the file, signified by TT_EOF. Each time we retrieve
a TT_WORD, we insert the String into curLine
and increment index.
The end of the line lets us know when a row of data is done, at
which time we will add the current line to the lines Vector.
Then we will allocate a new String[] to hold the next
line and set index back to zero (to insert the next item
starting at the first element of the array). The code to implement
this is given in Listing 24.12.
Listing 24.12. Reading the rest of the data.
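curLine = new String[ numFields ];
while( in.nextToken( ) != StreamTokenizer.TT_EOF ) {
    switch( in.ttype ) {
    case StreamTokenizer.TT_WORD:
        // Ignore fields beyond the number named in the first line
        if( index < numFields ) {
            curLine[ index++ ] = in.sval;
        }
        break;
    case StreamTokenizer.TT_EOL:
        lines.addElement( curLine );
        curLine = new String[ numFields ];
        index = 0;
        break;
    }
}
// Keep a final row that is not terminated by a newline
if( index > 0 ) {
    lines.addElement( curLine );
}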
Stuff All Data into String[][]
At this point in the code, all the data has been read in. All
that remains is to copy the data from lines into an array
of arrays of String, as shown in Listing 24.13.
Listing 24.13. Returning TSV data as String[][].
String retval[][] = new String[ lines.size() ][];
lines.copyInto( retval );
return retval;
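The steps in Listings 24.10 through 24.13 can be tried outside a content handler with a sketch like the following, which parses tab-separated text from any Reader. It is a simplified variant rather than the chapter's code: each row is sized by its own field count instead of by numFields, and TsvParseDemo is a hypothetical name.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.util.Vector;

class TsvParseDemo {
    // Copies the collected fields of one row into a String[].
    static String[] toRow( Vector<String> fields ) {
        String[] row = new String[ fields.size( ) ];
        fields.copyInto( row );
        return row;
    }

    static String[][] parse( Reader r ) throws IOException {
        StreamTokenizer in = new StreamTokenizer( r );
        in.resetSyntax( );
        in.wordChars( 0, '\t' - 1 );
        in.wordChars( '\t' + 1, 255 );
        in.whitespaceChars( '\t', '\t' );  // tabs separate fields
        in.ordinaryChar( '\n' );           // newline ends a row

        Vector<String[]> lines = new Vector<String[]>( );
        Vector<String> fields = new Vector<String>( );
        while( in.nextToken( ) != StreamTokenizer.TT_EOF ) {
            if( in.ttype == StreamTokenizer.TT_WORD ) {
                fields.addElement( in.sval );
            } else if( in.ttype == StreamTokenizer.TT_EOL ) {
                lines.addElement( toRow( fields ) );
                fields = new Vector<String>( );
            }
        }
        // Keep a final row that has no trailing newline
        if( !fields.isEmpty( ) ) {
            lines.addElement( toRow( fields ) );
        }
        String[][] result = new String[ lines.size( ) ][];
        lines.copyInto( result );
        return result;
    }

    public static void main( String[] args ) throws IOException {
        String[][] data = parse( new StringReader( "name\tage\nMike\t30\n" ) );
        for( int i = 0; i < data.length; i++ ) {
            System.out.println( String.join( " | ", data[i] ) );
        }
    }
}
```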
Using the Content Handler
To show how the content handler works, we'll modify the urlFetcher
applet used earlier in this chapter to demonstrate the finger
protocol handler. We'll change it to use the getContent()
method to retrieve the contents of a resource rather than reading
the data from the stream returned by getInputStream().
We'll show the changes to the doFetch() method of the
urlFetcher applet necessary to determine what type of
Object was returned and display it correctly. The first
change is to call the getContent() method and get an
Object back rather than getting an InputStream.
Listing 24.14 shows this change.
Listing 24.14. Modified urlFetcher.doFetch() code: call getContent() to get an Object.
try {
    boolean displayed = false;
    URLConnection con = target.openConnection();
    Object obj = con.getContent( );
Next come tests using the instanceof operator. We handle
String objects and arrays of String objects
by placing the text into the TextArea. Arrays are printed
item by item. If the object is a subclass of InputStream,
we read the data from the stream and display it. Image
content is just noted as being an Image. For any other
content type, we simply throw our hands up and remark that we
cannot display the content (because the urlFetcher applet
is not a full-fledged Web browser). The code to do this is shown
in Listing 24.15.
Listing 24.15. Modified urlFetcher.doFetch() code: determine the type of the Object and display it.
    if( obj instanceof String ) {
        contentArea.setText( (String) obj );
        displayed = true;
    }
    if( obj instanceof String[] ) {
        String array[] = (String []) obj;
        StringBuffer buf = new StringBuffer( );
        for( int i = 0; i < array.length; i++ )
            buf.append( "item " + i + ": " + array[i] + "\n" );
        contentArea.setText( buf.toString( ) );
        displayed = true;
    }
    if( obj instanceof String[][] ) {
        String array[][] = (String [][]) obj;
        StringBuffer buf = new StringBuffer( );
        for( int i = 0; i < array.length; i++ ) {
            buf.append( "Row " + i + ":\n\t" );
            for( int j = 0; j < array[i].length; j++ )
                buf.append( "item " + j + ": "
                    + array[i][j] + "\t" );
            buf.append( "\n" );
        }
        contentArea.setText( buf.toString() );
        displayed = true;
    }
    if( obj instanceof Image ) {
        contentArea.setText( "Image" );
        displayed = true;
    }
    if( obj instanceof InputStream ) {
        int c;
        StringBuffer buf = new StringBuffer( );
        while( (c = ((InputStream) obj).read( )) != -1 )
            buf.append( (char) c );
        contentArea.setText( buf.toString( ) );
        displayed = true;
    }
    if( !displayed ) {
        contentArea.setText( "Don't know how to display "
            + obj.getClass().getName( ) );
    }
    // Same code to display content type and length
} catch( IOException e ) {
    showStatus( "Error fetching \"" + target + "\": " + e );
    return;
}
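The same instanceof dispatch can be exercised without an applet. The sketch below is a hypothetical helper, not part of urlFetcher: ContentDescriber.describe() returns the text that would be displayed, which makes the type tests easy to try from the command line.

```java
class ContentDescriber {
    // Returns a printable description of an object returned by getContent().
    static String describe( Object obj ) {
        if( obj == null ) {
            return "No content";
        }
        if( obj instanceof String ) {
            return (String) obj;
        }
        if( obj instanceof String[] ) {
            String[] array = (String[]) obj;
            StringBuffer buf = new StringBuffer( );
            for( int i = 0; i < array.length; i++ )
                buf.append( "item " + i + ": " + array[i] + "\n" );
            return buf.toString( );
        }
        if( obj instanceof String[][] ) {
            String[][] array = (String[][]) obj;
            StringBuffer buf = new StringBuffer( );
            for( int i = 0; i < array.length; i++ ) {
                buf.append( "Row " + i + ":" );
                for( int j = 0; j < array[i].length; j++ )
                    buf.append( " [" + array[i][j] + "]" );
                buf.append( "\n" );
            }
            return buf.toString( );
        }
        return "Don't know how to display " + obj.getClass( ).getName( );
    }

    public static void main( String[] args ) {
        String[][] table = { { "name", "age" }, { "Mike", "30" } };
        System.out.print( describe( table ) );
    }
}
```

Because a String[][] is not an instance of String[], the order of the two array tests does not matter.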
The complete modified applet source is on the CD-ROM that accompanies
this book as urlFetcher_Mod.java in the tsvContentHandler
directory. Figure 24.2 shows what the applet will look like when
displaying text/tab-separated-values. The file displayed in the
figure is included on the CD-ROM as example.tsv.
Figure 24.2 : The urlFetcher_Mod applet.
Most HTTP daemons should return the correct content type for files
ending in .tsv. Many Web browsers have a menu option that shows
you information such as the content type about a URL (for example,
the View | Document Info option in Netscape Navigator). You can
use this feature to see what MIME type the sample data is being
returned as. If the data does not show up as text/tab-separated-values,
try one of the following things:
Ask your Webmaster to look at the MIME configuration file
for your HTTP daemon. The Webmaster will either be able to tell
you the proper file suffix or modify the daemon to return the
proper type.
If you can install CGI scripts on your Web server, you may
want to look at the sample script tsv.sh on the CD-ROM that
accompanies this book; it returns the content handler example
data with the proper content type.
Summary
After reading this chapter, you should have an understanding of
how Java can be extended fairly easily to deal with new application
protocols and data formats. You now know what classes you have
to derive your handlers from (URLConnection and URLStreamHandler
for protocol handlers, ContentHandler for content handlers)
and how to get Java to load the new handler classes.
Copyright 1998
EarthWeb Inc., All rights reserved.
Copyright 1998 Macmillan Computer Publishing. All rights reserved.