Finding File Types (Unix Power Tools, 3rd Edition)
12.6. Finding File Types
Many different kinds of files live on
the typical Unix system: database files, executable files, regular
text files, files for applications like StarOffice,
tar files, mail messages, directories, font files,
and so on.
You often want to make sure you have the right
"kind" of file before doing
something with it. For example, you'd like to read the file
tar. But before typing more
tar, you'd like to know whether this file
is your set of notes on carbon-based sludge or the
tar executable. If you guess wrong,
the consequences might be unpleasant: sending the
tar executable to your screen might screw up your
terminal settings, log you off, or do any number of other
nasty things.
Go to http://examples.oreilly.com/upt3 for more information on: file
The file utility tells you what sort of file
something is. It's fairly self-explanatory:
% file /bin/sh
/bin/sh: sparc demand paged executable
% file 2650
2650: [nt]roff, tbl, or eqn input text
% file 0001,v
0001,v: ascii text
% file foo.sh
foo.sh: shell commands
file is actually quite clever though it
isn't always correct -- some versions are better
than others. It doesn't just tell you if
something's binary or text; it looks at the
beginning of the file and tries to figure out what
kind of data it holds. So, for example, you see that file
2650 is an nroff (Section 45.12) file and
foo.sh is a shell script. It
isn't quite clever enough to figure out that
0001,v is an RCS (Section 39.5) archive,
but it does know that it's a plain ASCII text file.
Many versions of file can be customized to
recognize additional file types. The file
/etc/magic tells file how to
recognize different kinds of files. [My Linux system has the
file command from
ftp://ftp.astron.com/pub/file/, which uses a
multiple-database format. It's updated fairly often
to understand new file formats. -- JP]
The magic file is capable of a lot (and should be capable of even
more), but we'll satisfy ourselves with an
introductory explanation. Our goal will be to teach
file to recognize RCS archives.
Each entry in /etc/magic has four fields:
offset data-type value file-type
These are as follows:
offset
The offset into the file at which file will try
to find something. If you're looking for something
right at the beginning of the file, the offset should be
0. (This is usually what you want.)
data-type
The type of test to make. Use string for text
comparisons, byte for byte comparisons,
short for two-byte comparisons, and
long for four-byte comparisons.
value
The value you want to find. For string comparisons, any text string
will do; you can use the standard Unix escape sequences (such as
\n for newline). For numeric comparisons (byte,
short, long), this field should be a number, expressed as a C
constant (e.g., 0x77 for the hexadecimal byte 77).
file-type
The string that file will print if this test
succeeds.
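To make the field layout concrete, here is a minimal Python sketch of how one such entry could be split into its four fields. It is an illustration only, not how file itself is implemented: the names parse_entry and parse_c_int are made up for this example, the sample entry is hypothetical, and the sketch assumes the value field contains no embedded spaces.

```python
def parse_c_int(text):
    """Convert a C-style integer constant (0x77, 0167, 119) to an int."""
    lowered = text.lower()
    if lowered.startswith("0x"):
        return int(lowered, 16)      # hexadecimal
    if len(lowered) > 1 and lowered.startswith("0"):
        return int(lowered, 8)       # octal, as in C
    return int(lowered, 10)          # plain decimal


def parse_entry(line):
    """Split one entry into (offset, data_type, value, file_type)."""
    # The first three fields are separated by whitespace; the rest of
    # the line, spaces included, is the description to be printed.
    offset, data_type, value, file_type = line.split(None, 3)
    if data_type in ("byte", "short", "long"):
        value = parse_c_int(value)   # numeric tests compare numbers
    return int(offset), data_type, value, file_type.strip()


# A made-up entry, used only to show the field layout.
print(parse_entry("0 byte 0x77 hypothetical format"))
```

Splitting at most three times keeps a multiword description such as "hypothetical format" together as the fourth field.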
So, we know that RCS archives begin with the word
head. This word is right at the beginning of the
file (offset 0). Since we obviously want a string comparison, we make
the following addition to /etc/magic:
0 string head RCS archive
This says, "The file is an RCS archive if you find
the string head at an offset of 0 bytes from the
beginning of the file." Does it work?
% file RCS/0001,v
RCS/0001,v: RCS archive
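The comparison this entry asks for can be sketched in a few lines of Python. This is a rough model of the test, not the real file program; the function names starts_with_at and classify are invented here, and the sample byte strings are made up rather than taken from a real archive.

```python
def starts_with_at(data, offset, text):
    """Return True if the bytes of `text` appear in `data` at `offset`."""
    wanted = text.encode("ascii")
    return data[offset:offset + len(wanted)] == wanted


def classify(data):
    # Mirrors the single entry "0 string head RCS archive".
    if starts_with_at(data, 0, "head"):
        return "RCS archive"
    return "unknown"


# Made-up sample contents, not taken from a real archive.
rcs_like = b"head\t1.1;\naccess;\n"
plain = b"Notes on carbon-based sludge\n"
print(classify(rcs_like))
print(classify(plain))
```

Under this model, any file whose first four characters are head (a text file starting with "header", say) would also be reported as an RCS archive, so a test this short can misidentify files.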
As I said, the tests can be much more complicated, particularly if
you're working with binary files. To recognize
simple text files, this is all you need to know.
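For binary files, the numeric data types (byte, short, long) compare a number stored at the offset instead of a text string. A minimal Python sketch of such a test follows. It assumes unsigned values in the machine's native byte order, which is an assumption of this example and not something your version of file is guaranteed to do; numeric_test and FORMATS are hypothetical names.

```python
import struct

# struct codes for the numeric data types; "=" means native byte order
# with standard sizes (1, 2, and 4 bytes).
FORMATS = {"byte": "=B", "short": "=H", "long": "=I"}


def numeric_test(data, offset, data_type, expected):
    """Compare the unsigned number stored at `offset` with `expected`."""
    fmt = FORMATS[data_type]
    size = struct.calcsize(fmt)
    chunk = data[offset:offset + size]
    if len(chunk) < size:
        return False  # file too short to hold a value of this size
    (found,) = struct.unpack(fmt, chunk)
    return found == expected


# Made-up four-byte header, built in native byte order for the demo.
sample = struct.pack("=I", 0x12345678) + b"payload"
print(numeric_test(sample, 0, "long", 0x12345678))
print(numeric_test(sample, 0, "byte", 0x77))
```

The length check matters because a short file may not contain enough bytes at the offset to hold the value being tested.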
-- ML
Copyright © 2003 O'Reilly & Associates. All rights reserved.