Encoding "Binary" Files into ASCII (Unix Power Tools, 3rd Edition)
21.12. Encoding "Binary" Files into ASCII
Email transport systems were
originally designed to transmit characters with a seven-bit
encoding -- like ASCII. This meant they could
send messages with plain English text but not
"binary" text, such as program
files or graphics (or non-English text!), that used all of an
eight-bit byte. Usenet (Section 1.21),
the newsgroup system,
was transmitted like email and had the same seven-bit limitations.
The solution -- which is still used today -- is to
encode eight-bit text into characters that use
only the seven low bits.
The first popular solution on Unix-type systems was
uuencoding.
That method is mostly obsolete now (though you'll
still find it used sometimes); it's been replaced by
MIME encoding. The next two sections cover both of
those -- though we recommend avoiding
uuencode like the plague.
21.12.1. uuencoding
The
uuencode utility encodes eight-bit data into a
seven-bit representation for sending via email or on Usenet. The
recipient can use uudecode to restore the original
data. Unfortunately, there are several different and incompatible
versions of these two utilities. Also, uuencoded data
doesn't travel well through all mail
gateways -- partly because uuencoding is sensitive to changes in
whitespace (space and TAB) characters, and some gateways munge
(change or corrupt) whitespace. So if you're
encoding text for transmission, use MIME instead
of uuencode whenever you can.
To create an ASCII version of a binary file, use
the uuencode utility. For instance,
a compressed file (Section 15.6)
is definitely eight-bit; it needs encoding.
A uuencoded file (there's an example later in this
article) starts with a begin line that gives the
file's name; this name comes from the last argument
you give the uuencode utility as it encodes a
file. To make uuencode read a file directly, give
that file's pathname as the first argument, before the name. uuencode
writes the encoded file to its standard output. For example, to
encode the file emacs.tar.gz from your
~/tarfiles directory and store it in a file
named emacs.tar.gz.uu:
% uuencode ~/tarfiles/emacs.tar.gz emacs.tar.gz > emacs.tar.gz.uu
You can then insert emacs.tar.gz.uu into a mail
message and send it to someone. Of course, the
ASCII-only encoding takes more space than the
original binary format. The encoded file will be about one-third
larger.[64]
[64]If so, why bother
gzipping? Why not forget about both
gzip and uuencode? Well, you
can't. Remember that tar files
are binary files to start with, even if every file in the archive is
an ASCII text file. You'd need to
uuencode a file before mailing it, anyway, so
you'd still pay the 33 percent size penalty that
uuencode incurs. Using gzip
minimizes the damage.
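To see where the size penalty comes from, here's a rough sketch in shell arithmetic. It assumes the traditional uuencode layout, in which each full line carries 45 input bytes as 60 characters plus a length character and a newline, and it ignores the begin and end lines. The function name estimate_uu_size is ours; it isn't part of any utility.

```shell
# estimate_uu_size BYTES: approximate size of the uuencoded body.
# Assumption: traditional layout, 45 input bytes per full line.
estimate_uu_size() {
    bytes=$1
    full_lines=$(( bytes / 45 ))
    rest=$(( bytes % 45 ))
    # A full line is 1 length character + 60 encoded characters + 1 newline.
    size=$(( full_lines * 62 ))
    if [ "$rest" -gt 0 ]; then
        # A partial last line: length character, 4 characters for every
        # 3 input bytes (rounded up), and a newline.
        size=$(( size + 1 + ( (rest + 2) / 3 ) * 4 + 1 ))
    fi
    echo "$size"
}

estimate_uu_size 4500
```

Under these assumptions a 4500-byte file comes out at 6200 bytes. The four-characters-for-three-bytes expansion accounts for the one-third figure; the per-line overhead adds a little more.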
If you'd rather, you can combine the steps above
into one pipeline. Given only one command-line argument (the name of
the file for the begin line),
uuencode will read its standard input. Instead of
creating ~/tarfiles/emacs.tar.gz, making a
second uuencoded file, then mailing that file, you can give
tar the
"filename" - (a dash) so it writes to its
standard output. That feeds the archive down the pipe:[65]
[65]With GNU tar, you can use tar czf -
emacs | uuencode .... That's not the point
of this example, though. We're just showing how to
uuencode some arbitrary data.
mail Section 1.21
% tar cf - emacs | gzip | uuencode emacs.tar.gz | \
mail -s "uuencoded emacs file" whoever@wherever.com
What happens when you receive a uuencoded, compressed
tar file? The same thing, in reverse.
You'll get a mail message that looks something like
this:
From: you@whichever.ie
To: whoever@wherever.com
Subject: uuencoded emacs file

begin 644 emacs.tar.gz
M+DQ0"D%L;"!O9B!T:&5S92!P<F]B;&5M<R!C86X@8F4@<V]L=F5D(&)Y(")L
M:6YK<RPB(&$@;65C:&%N:7-M('=H:6-H"F%L;&]W<R!A(&9I;&4@=&\@:&%V
M92!T=V\@;W(@;6]R92!N86UE<RX@(%5.25@@<')O=FED97,@='=O(&1I9F9E
M<F5N= IK:6YD<R!O9B!L:6YK<SH*+DQS($(*+DQI"EQF0DAA<F0@;&EN:W-<
...
end
So you save the message in a file, complete with headers.
Let's say you call this file
mailstuff. How do you get the original files
back? Use the following sequence of commands:
% uudecode mailstuff
% gunzip emacs.tar.gz
% tar xf emacs.tar
The uudecode command
searches through the file, skipping From:, etc.,
until it sees its special begin line; it decodes
the rest of the file (until the corresponding end
line) and creates the file emacs.tar.gz. Then
gunzip recreates your original
tar file, and tar xf extracts
the individual files from the archive.
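Because uudecode creates whatever file the begin line names, it can be worth looking at that name before you decode. Here's a small sketch of two helpers, both with names we made up: uu_target_name prints the name from the first begin line, and unpack_uu_mail does the three decoding steps as one pipeline. The second one assumes your uudecode accepts the POSIX -o option and that /dev/stdout works on your system; not every old version does.

```shell
# uu_target_name FILE: print the filename on the first "begin" line.
uu_target_name() {
    sed -n 's/^begin [0-7][0-7]* //p' "$1" | head -n 1
}

# unpack_uu_mail FILE: decode, uncompress, and extract in one pipeline.
# Assumption: uudecode supports -o (POSIX); no intermediate files are made.
unpack_uu_mail() {
    uudecode -o /dev/stdout "$1" | gunzip | tar xf -
}
```

You'd run uu_target_name mailstuff first, then unpack_uu_mail mailstuff if the name looks sane.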
Again, though, you'll be better off using
MIME encoding whenever you can.
21.12.2. MIME Encoding
When MIME
(Multipurpose
Internet Mail Extensions) was designed in the early 1990s, one main
goal was robust email communications. That meant coming up with a
mail encoding scheme that would work on all platforms and get through
all mail transmission paths.
Some text is "mostly
ASCII": for instance,
it's in a language like German or French that uses
many ASCII characters plus some eight-bit
characters (characters with an octal value greater than 177). The
MIME standard allows that text to be minimally
encoded in a way that it can be read fairly well without decoding:
the quoted-printable encoding. Other text is
full binary -- either not designed for humans to read, or so far
from ASCII that an ASCII
representation would be pointless. In that case,
you'll want to use the
base64 encoding.
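If you aren't sure which kind of text you have, you can count the eight-bit bytes. This is only a sketch: the function name count_8bit is ours, and how many eight-bit bytes is "too many" for quoted-printable is a judgment call, not something the MIME standard decides for you.

```shell
# count_8bit FILE: print how many bytes in FILE have a value above octal 177.
count_8bit() {
    # Delete every seven-bit byte; count what is left.
    # LC_ALL=C makes tr work on bytes, not multibyte characters.
    LC_ALL=C tr -d '\000-\177' < "$1" | wc -c | tr -d ' '
}
```

Compare the result with the file's total size (wc -c). A handful of eight-bit bytes in a long message suggests quoted-printable; a large share of them suggests base64.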
Go to http://examples.oreilly.com/upt3 for more information on:
mimencode, mailto
Most modern email programs automatically
MIME-encode files. Unfortunately, some
aren't too smart about it. The
Metamail
utilities come with a utility called mimencode
(also named mmencode) for encoding and decoding
MIME formats. Another Metamail utility,
mailto, encodes and sends MIME
messages directly -- but let's use
mimencode, partly because of the extra control it
gives you.
By default, mimencode reads text from standard
input, uses a base64 encoding, and writes the encoded text to
standard output. If you add the -q option,
mimencode uses quoted-printable encoding instead.
Unlike uuencoded messages, which contain the filename in the message
body, MIME-encoded messages need information in
the message header (the lines
"To:",
"From:", etc.).
Most versions of the
mail utility (all but some older ones)
don't let you supply your own message header. So
let's do it directly: create a mail header with
cat > (Section 11.2), create a mail body with
mimencode, and send it using a common system mail
transfer agent,
sendmail. (You could automate this with a
script, of course, but we're just demonstrating.)
The MIME standard header formats are still
evolving; we'll use a simple set of header fields
that should do the job. Here's the setup.
Let's do it first in three steps, using temporary
files:
$ cat > header
From: jpeek@oreilly.com
To: jpeek@jpeek.com
Subject: base64-encoded smallfile
MIME-Version: 1.0
Content-Type: application/octet-stream; name="smallfile.tar.gz"
Content-Transfer-Encoding: base64

CTRL-d
$ tar cf - smallfile | gzip | mimencode > body
$ cat header body | /usr/lib/sendmail -t
The cat > command lets me create the
header file by typing it in at the terminal; I
could have used a text editor instead. One important note:
the header must end with a blank line. The
second command creates the body file. The third
command uses cat to output the header, then the
body; the message we've built is piped to
sendmail, whose -t option tells
it to read the addresses from the message header. You should get a
message something like this:
Date: Wed, 22 Nov 2000 11:46:53 -0700
Message-Id: <200011221846.LAA18155@oreilly.com>
From: jpeek@oreilly.com
To: jpeek@jpeek.com
Subject: base64-encoded smallfile
MIME-Version: 1.0
Content-Type: application/octet-stream; name="smallfile.tar.gz"
Content-Transfer-Encoding: base64

H4sIACj6GzoAA+1Z21YbRxb1c39FWcvBMIMu3A0IBWxDzMTYDuBgrxU/lKSSVHF3V6erGiGv
rPn22edU3wRIecrMPLgfEGpVV53LPvtcOktcW6au3dnZ2mrZcfTkb7g6G53O7vb2k06ns7G3
06HPzt7uDn/Sra1N/L+32dnd29ve3tjD+s3Nna0novN3CHP/yqyTqRBPfk+U+rpknUnlf0Oc
...
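The three steps can also be folded into one function that needs no temporary files. This is a sketch, not a finished mailer: the name make_mime_message and the ENCODER variable are hypothetical, and the header fields are just the simple set used in this article. The here-document ends with an empty line, which supplies the blank line that must close the header.

```shell
# make_mime_message FROM TO FILE: write a complete message to standard output.
# ENCODER (hypothetical) names the base64 encoder; it defaults to mimencode.
make_mime_message() {
    from=$1
    to=$2
    file=$3
    name=$(basename "$file")
    # Header fields, followed by the blank line that ends the header.
    cat <<EOF
From: $from
To: $to
Subject: base64-encoded $name
MIME-Version: 1.0
Content-Type: application/octet-stream; name="$name"
Content-Transfer-Encoding: base64

EOF
    # Message body: the file, base64-encoded.
    ${ENCODER:-mimencode} < "$file"
}
```

To send the result, pipe it to the mail transfer agent, as in make_mime_message jpeek@oreilly.com jpeek@jpeek.com smallfile.tar.gz | /usr/lib/sendmail -t. If you don't have Metamail, setting ENCODER=base64 should work on systems with the GNU coreutils base64 command.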
Your mail client may be able to extract that file directly. You also
can use mimencode -u. But
mimencode doesn't know about mail
headers, so you should strip off the header first. The behead (Section 21.5) script
can do that. For instance, if you've saved the mail
message in a file msg:
$ behead msg | mimencode -u > smallfile.tar.gz
Extract (Section 39.2) smallfile.tar.gz and
compare it to your original smallfile (maybe
with cmp). They should be identical.
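You can run the same kind of check before you mail anything. The sketch below uses the base64 command from GNU coreutils in place of mimencode (an assumption about your system; the -d option means "decode" there), and the function name roundtrip_ok is ours.

```shell
# roundtrip_ok FILE: encode FILE as base64, decode it again, and
# compare the result with the original. Exit status 0 means identical.
roundtrip_ok() {
    base64 < "$1" | base64 -d | cmp -s - "$1"
}
```

For instance, roundtrip_ok smallfile.tar.gz && echo identical prints identical only if nothing was lost along the way.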
If you're planning to
do this often, it's important to understand how to
form an email header and body properly. For more information, see
relevant Internet RFCs (standards documents) and
O'Reilly's Programming
Internet Email by David Wood.
--JP and ML
Copyright © 2003 O'Reilly & Associates. All rights reserved.