Compressing Files to Save Space (Unix Power Tools, 3rd Edition)
15.6. Compressing Files to Save Space
gzip is a fast and
efficient compression program distributed by the
GNU project. The basic function of
gzip is to take a file
filename, compress it, save the compressed
version as filename.gz, and remove the original,
uncompressed file. The original file is removed only if
gzip is successful; it is very difficult to delete
a file accidentally in this manner. Of course, being
GNU software, gzip has more
options than you want to think about, and many aspects of its
behavior can be modified using command-line options.
First, let's say that we have a large file named
garbage.txt:
rutabaga% ls -l garbage.txt*
-rw-r--r-- 1 mdw hack 312996 Nov 17 21:44 garbage.txt
If we compress this file using gzip, it replaces
garbage.txt with the compressed file
garbage.txt.gz. We end up with the following:
rutabaga% gzip garbage.txt
rutabaga% ls -l garbage.txt*
-rw-r--r-- 1 mdw hack 103441 Nov 17 21:48 garbage.txt.gz
Note that garbage.txt is removed when
gzip completes.
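The replace-on-success behavior is easy to check for yourself. The following is a minimal sketch rather than part of the original example: the scratch directory created with mktemp and the sample data generated with seq are assumptions made for illustration.

```shell
#!/bin/sh
# Sketch: gzip replaces a file with its compressed .gz version.
# The scratch directory and the generated data are illustrative assumptions.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Build a sample text file to stand in for garbage.txt.
seq 1 50000 > garbage.txt
before=$(wc -c < garbage.txt | tr -d ' ')

gzip garbage.txt

# Only garbage.txt.gz should be listed now.
ls -l garbage.txt*
after=$(wc -c < garbage.txt.gz | tr -d ' ')
echo "uncompressed: $before bytes, compressed: $after bytes"
# The scratch directory is left in place; remove "$workdir" when done.
```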
You can give gzip a list of filenames; it
compresses each file in the list, storing each with a
.gz extension. (Unlike the
zip program for Unix and MS-DOS
systems, gzip will not, by default, compress
several files into a single .gz archive.
That's what tar is for; see
Section 15.7.)
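As a rough illustration of the one-file-per-argument behavior, the sketch below compresses three files in a single command. The filenames alpha.txt, beta.txt, and gamma.txt are hypothetical, as is the scratch directory.

```shell
#!/bin/sh
# Sketch: gzip given several filenames produces one .gz file per input.
# The filenames and scratch directory are hypothetical.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

for name in alpha beta gamma; do
    seq 1 2000 > "$name.txt"
done

gzip alpha.txt beta.txt gamma.txt

# Count the resulting .gz files; there is no combined archive.
set -- *.gz
count=$#
echo "compressed files: $count"
ls
```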
Go to http://examples.oreilly.com/upt3 for more information on: gzip
How efficiently a file is compressed depends upon its format and
contents. For example, many audio and graphics file formats (such as
MP3 and JPEG) are already well
compressed, and gzip will have little or no effect
upon such files. Files that compress well usually include plain-text
files and binary files such as executables and libraries. You can get
information on a gzipped file using
gzip -l. For example:
rutabaga% gzip -l garbage.txt.gz
compressed uncompr. ratio uncompressed_name
103115 312996 67.0% garbage.txt
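If you want to use the listing in a script, the columns can be picked apart with awk. This sketch assumes the four-column layout shown above (compressed size, uncompressed size, ratio, name) on the second line of output. The sample file and scratch directory are made up for the example.

```shell
#!/bin/sh
# Sketch: read the uncompressed size out of "gzip -l" output.
# Assumes the header line is followed by one line of four columns.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

seq 1 50000 > garbage.txt
before=$(wc -c < garbage.txt | tr -d ' ')
gzip garbage.txt

gzip -l garbage.txt.gz

# Column 2 of the second line is the uncompressed size.
listed=$(gzip -l garbage.txt.gz | awk 'NR == 2 { print $2 }')
echo "original size: $before, size reported by gzip -l: $listed"
```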
To get our
original file back from the compressed version, we use
gunzip, as in:
rutabaga% gunzip garbage.txt.gz
rutabaga% ls -l garbage.txt
-rw-r--r-- 1 mdw hack 312996 Nov 17 21:44 garbage.txt
which is identical to the original file. Note that when you
gunzip a file, the compressed version is removed
once the uncompression is complete.
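One way to convince yourself that the round trip is lossless is to keep a copy and compare it with cmp. This is only a sketch; reference.txt is a hypothetical name for the copy, and the data is generated just for the test.

```shell
#!/bin/sh
# Sketch: compress, uncompress, and compare against a saved copy.
# reference.txt is a hypothetical name for the untouched copy.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

seq 1 50000 > garbage.txt
cp garbage.txt reference.txt

gzip garbage.txt          # leaves garbage.txt.gz
gunzip garbage.txt.gz     # leaves garbage.txt again

# cmp -s is silent and reports only through its exit status.
if cmp -s garbage.txt reference.txt; then
    result="identical"
else
    result="different"
fi
echo "round trip: $result"
```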
gzip stores the name of the original, uncompressed
file in the compressed version. As a result, the name of the
compressed file need not matter: when the file is uncompressed, it
can be restored under its original name.
To
uncompress a file to its original filename, use the
-N option with gunzip. To see the
value of this option, consider the following sequence of commands:
rutabaga% gzip garbage.txt
rutabaga% mv garbage.txt.gz rubbish.txt.gz
If we were to gunzip
rubbish.txt.gz at this
point, the uncompressed file would be named
rubbish.txt, after the new (compressed)
filename. However, with the -N option, we get the
following:
rutabaga% gunzip -N rubbish.txt.gz
rutabaga% ls -l garbage.txt
-rw-r--r-- 1 mdw hack 312996 Nov 17 21:44 garbage.txt
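The same sequence can be run as a script. This is a sketch under the assumption that your gunzip supports -N, as described above. The scratch directory and generated data are for illustration only.

```shell
#!/bin/sh
# Sketch: gunzip -N restores the name stored inside the compressed file,
# not the name derived from the (renamed) .gz file.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

seq 1 50000 > garbage.txt
gzip garbage.txt
mv garbage.txt.gz rubbish.txt.gz

gunzip -N rubbish.txt.gz

# The directory should now hold garbage.txt, not rubbish.txt.
ls -l
```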
gzip and gunzip can also
compress or uncompress data from standard input and output. If
gzip is given no filenames to compress, it
attempts to compress data read from standard input. Likewise, if you
use the -c option with
gunzip, it writes uncompressed data to standard
output. For example, you could pipe the output of a command to
gzip to compress the output stream and save it to
a file in one step, as in:
rutabaga% ls -laR $HOME | gzip > filelist.gz
This will produce a recursive directory listing of your home
directory and save it in the compressed file
filelist.gz. You can display the contents of
this file with the command:
rutabaga% gunzip -c filelist.gz | less
This will uncompress filelist.gz and pipe the
output to the less (Section 12.3) command. When you use
gunzip -c, the file on disk
remains compressed.
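Here is a small, self-contained variation on the pipeline above. Instead of your home directory it lists a tiny hypothetical directory named tree, so that it runs quickly and leaves your own files alone.

```shell
#!/bin/sh
# Sketch: compress a command's output through a pipe, then read it back
# with gunzip -c. The "tree" directory and its files are hypothetical.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

mkdir tree
touch tree/one tree/two

# gzip has no filename argument, so it compresses standard input.
ls -laR tree | gzip > filelist.gz

# gunzip -c writes to standard output; filelist.gz stays on disk.
first_lines=$(gunzip -c filelist.gz | head -n 5)
echo "$first_lines"
ls -l filelist.gz
```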
The gzcat command (installed as zcat on many systems, including Linux) is
identical to gunzip -c. You can
think of this as a version of cat for compressed
files. Some systems, including Linux, even have a version of the
pager less for compressed files:
zless.
When compressing files, you can use one of the options
-1, -2, through -9
to specify the speed and quality of the compression used.
-1 (also --fast)
specifies the fastest method, which compresses the files less
compactly, while -9 (also --best)
uses the slowest, but best compression method. If you
don't specify one of these options, the default is
-6. None of these options has any bearing on how you
use gunzip; gunzip can
uncompress the file no matter what speed option you use.
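To see what the levels do on your own data, you can compress the same file twice and compare sizes. The sketch below uses gzip -c, which writes the compressed data to standard output and leaves the input file alone. The names sample.txt, fast.gz, and best.gz are hypothetical, and the sizes you get will depend on the data.

```shell
#!/bin/sh
# Sketch: compare -1 and -9 on the same input, then confirm that
# gunzip handles both without being told which level was used.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

seq 1 50000 > sample.txt

gzip -1 -c sample.txt > fast.gz
gzip -9 -c sample.txt > best.gz

size_fast=$(wc -c < fast.gz | tr -d ' ')
size_best=$(wc -c < best.gz | tr -d ' ')
echo "-1: $size_fast bytes, -9: $size_best bytes"

# Both files should uncompress to the original contents.
if gunzip -c fast.gz | cmp -s - sample.txt &&
   gunzip -c best.gz | cmp -s - sample.txt; then
    same="yes"
else
    same="no"
fi
echo "both uncompress to the original: $same"
```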
Go to http://examples.oreilly.com/upt3 for more information on: bzip, bzip2
Another compression/decompression program has emerged to take the
lead from gzip.
bzip2 is the new kid on the block and sports even
better compression (on average about 10 to 20% better than
gzip), at the expense of longer compression times.
You cannot use bunzip2 to
uncompress files compressed with gzip and vice
versa. Since you cannot expect everybody to have
bunzip2 installed on their machine, you might want
to confine yourself to gzip for the time being if
you want to send the compressed file to somebody else (or, as many
archives do, provide both gzip- and
bzip2-compressed versions of the file). However,
it pays to have bzip2 installed, because more and
more FTP servers now provide
bzip2-compressed packages to conserve disk space
and, more importantly these days, bandwidth. You can recognize
bzip2-compressed files from their typical
.bz2 file name extension.
While the command-line options of bzip2 are not
exactly the same as those of gzip, the basic ones
described in this section (-c and -1 through -9) work in much
the same way. gzip's -l and -N options have no
bzip2 equivalent, and older versions of bzip2 lack
--best and --fast. For more
information, see the bzip2 manual page.
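Basic use of bzip2 mirrors gzip. The following sketch assumes bzip2 and bunzip2 may or may not be installed, so it checks first. The file names sample.txt and reference.txt are hypothetical.

```shell
#!/bin/sh
# Sketch: compress with bzip2 and verify the result with bunzip2 -c.
# Skips the test if bzip2 is not installed.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

seq 1 50000 > sample.txt
status="bzip2 not installed"

if command -v bzip2 >/dev/null 2>&1; then
    cp sample.txt reference.txt
    bzip2 sample.txt                  # replaces it with sample.txt.bz2
    ls -l sample.txt*
    if bunzip2 -c sample.txt.bz2 | cmp -s - reference.txt; then
        status="round trip ok"
    else
        status="round trip failed"
    fi
fi
echo "$status"
```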
The bottom line is that you should use
gzip/gunzip or
bzip2/bunzip2 for your
compression needs. If you encounter a file with the extension
.Z, it was probably produced by
compress, and gunzip can
uncompress it for you.
[These days, the only real use for
compress -- if you have gzip
and bzip2 -- is for creating compressed images
needed by some embedded hardware, such as older Cisco IOS images.
-- DJPH]
-- MW, MKD, and LK
Copyright © 2003 O'Reilly & Associates. All rights reserved.