Graphics File Formats FAQ (Part 4 of 4): Tips and Tricks of the Trade
Section - 3. How can I identify the format of a graphics file?

Top Document: Graphics File Formats FAQ (Part 4 of 4): Tips and Tricks of the Trade
Previous Document: 2. How can I determine the byte-order of a system at run-time?
Next Document: 4. What are the format identifiers of some popular file formats?
When writing any type of file or data stream reader it is very important
to implement some sort of method for verifying that the input data is in
the format you expect. Here are a few methods:

1) Trust the user of your program to always supply the correct data,
thereby freeing you from the tedious task of writing any type of format
identification routines. Choose this method and you will provide solid
proof that contradicts the popular claim that users are inherently far
more stupid than programmers.

2) Read the file extension or descriptor. A GIF file will always have the
extension .GIF, right? Targa files .TGA, yes?  And TIFF files will have an
extension of .TIF or a descriptor of TIFF. So no problem?

Well, for the most part, this is true. This method certainly isn't
bulletproof, however. Your reader will occasionally be fed the odd batch
of mislabeled files ("I thought they were PCX files!"), or files with
mangled extensions (.TAR rather than .TGA, or .JFI rather than .JPG) that
your reader knows how to read but won't, because it doesn't recognize the
extension. File extensions also won't usually tell you the revision of the
file format you are reading (and some revisions amount to an almost
entirely new format). Some of the more common extensions (such as .IMG
and .PIC) are shared by more than one file format. And last of all, data
streams have no file extensions or descriptors to read at all.

3) Read the file and attempt to recognize the format by specific patterns
in the data. Most file formats contain some sort of identifying pattern of
data that is identical in all files. In some cases this pattern gives an
indication of the revision of the format (such as GIF87a and GIF89a) or
the endianness of the data format.
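
As a minimal sketch of method 3, assuming Python and only two formats:
a GIF file starts with the six ASCII bytes GIF87a or GIF89a, and a TIFF
file starts with II (little-endian) or MM (big-endian) followed by the
number 42 stored in that byte order. The function name sniff_header is
hypothetical and not taken from any library.

```python
import struct

def sniff_header(data):
    """Return (format, detail) from the leading bytes, or None.

    Illustrative only: recognizes GIF and TIFF and nothing else.
    """
    if data[:6] in (b"GIF87a", b"GIF89a"):
        # The last three signature bytes carry the revision.
        return ("GIF", data[3:6].decode("ascii"))
    if len(data) >= 4 and data[:2] in (b"II", b"MM"):
        # The first two bytes say how to read the 16-bit value after them.
        order = "<" if data[:2] == b"II" else ">"
        (magic,) = struct.unpack(order + "H", data[2:4])
        if magic == 42:
            detail = "little-endian" if order == "<" else "big-endian"
            return ("TIFF", detail)
    return None
```

Here the same few bytes identify the format and also report the revision
(GIF) or the byte order (TIFF), which is the extra information a file
extension cannot supply.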

Nothing is easy, however. Not all formats contain such identifiers (such
as PCX). And those that do don't necessarily put them at the beginning of
the file. This means that if the data arrives as a stream, you may have to
read (and buffer) most or all of it before you can determine the format.
Of course, not all graphics formats are suitable to be read as a data
stream anyway.
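
One example of an identifier that is not at the start of the file is the
Targa 2.0 footer, whose signature TRUEVISION-XFILE. (followed by a zero
byte) occupies the last 18 bytes of the file. The sketch below, in
Python, shows why a stream must be buffered in that case. The function
name stream_is_new_targa is hypothetical, and original-format Targa
files have no footer, so a False result does not rule out Targa.

```python
import io

# 18-byte signature that ends a TGA 2.0 file footer.
TGA_FOOTER_SIGNATURE = b"TRUEVISION-XFILE.\x00"

def stream_is_new_targa(stream):
    """Buffer the whole stream, then test the trailing signature.

    Returns (matched, data) so the caller can still decode the bytes
    that had to be consumed to make the decision.
    """
    data = stream.read()  # nothing can be decided until the end arrives
    return data.endswith(TGA_FOOTER_SIGNATURE), data

# Usage with an in-memory stand-in for a real stream:
fake = io.BytesIO(b"\x00" * 26 + TGA_FOOTER_SIGNATURE)
matched, buffered = stream_is_new_targa(fake)
```

Returning the buffered data along with the verdict matters because a
stream, unlike a file, cannot be rewound and read again.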

Your best bet for a method of format detection is a combination of methods
two and three. First believe the file extension or descriptor, read some
data, and check for identifying data patterns. If this test fails, then
attempt to recognize all other known patterns.
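
The combination of methods two and three might be sketched like this in
Python. The tables SIGNATURES and EXTENSIONS and the functions matches
and identify are hypothetical names for illustration, and only GIF and
TIFF are covered. A real reader would carry many more entries.

```python
import os

# Hypothetical tables: leading byte patterns and extensions per format.
SIGNATURES = {
    "GIF": [b"GIF87a", b"GIF89a"],
    "TIFF": [b"II*\x00", b"MM\x00*"],
}
EXTENSIONS = {".gif": "GIF", ".tif": "TIFF", ".tiff": "TIFF"}

def matches(fmt, header):
    """True if header starts with any known pattern for fmt."""
    return any(header.startswith(sig) for sig in SIGNATURES[fmt])

def identify(filename, header):
    """Believe the extension first, verify it, then try everything else."""
    ext = os.path.splitext(filename)[1].lower()
    guess = EXTENSIONS.get(ext)
    if guess is not None and matches(guess, header):
        return guess
    # The extension was missing, unknown, or wrong: test all other formats.
    for fmt in SIGNATURES:
        if fmt != guess and matches(fmt, header):
            return fmt
    return None
```

For instance, identify("scan.tif", b"GIF89a") returns "GIF": the
extension is tried first, fails verification, and the fallback search
finds the real format.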

Run-time file format identification is a black art at best.

Send corrections/additions to the FAQ Maintainer:
jdm@ora.com (James D. Murray)





Last Update March 27 2014 @ 02:11 PM