Top Document: Graphics File Formats FAQ (Part 4 of 4): Tips and Tricks of the Trade
Previous Document: 2. How can I determine the byte-order of a system at run-time?
Next Document: 4. What are the format identifiers of some popular file formats?

When writing any type of file or data stream reader, it is very important to implement some method of verifying that the input data is in the format you expect. Here are a few methods:

1) Trust the user of your program to always supply the correct data, thereby freeing you from the tedious task of writing any type of format identification routines. Choose this method and you will provide solid proof that contradicts the popular claim that users are inherently far more stupid than programmers.

2) Read the file extension or descriptor. A GIF file will always have the extension .GIF, right? Targa files .TGA, yes? And TIFF files will have an extension of .TIF or a descriptor of TIFF. So no problem?

Well, for the most part, this is true. This method certainly isn't bulletproof, however. Your reader will occasionally be fed the odd batch of mislabeled files ("I thought they were PCX files!"). Or files with unrecognized, mangled extensions (.TAR rather than .TGA, or .JFI rather than .JPG) that your reader knows how to read, but won't read because it doesn't recognize the extensions. File extensions also won't usually tell you the revision of the file format you are reading (with some revisions creating an almost entirely new format). And more than one file format shares the more common file extensions (such as .IMG and .PIC). And last of all, data streams have no file extensions or descriptors to read at all.

3) Read the file and attempt to recognize the format by specific patterns in the data. Most file formats contain some sort of identifying pattern of data that is identical in all files.
In some cases this pattern gives an indication of the revision of the format (such as GIF87a and GIF89a) or the endianness of the data format.

Nothing is easy, however. Not all formats contain such identifiers (PCX, for example). And those that do don't necessarily put the identifier at the beginning of the file. This means that if the data arrives as a stream, you may have to read (and buffer) most or all of the data before you can determine the format. Of course, not all graphics formats are suitable to be read as a data stream anyway.

Your best bet for a method of format detection is a combination of methods two and three. First believe the file extension or descriptor, read some data, and check for the identifying data pattern that the extension implies. If this test fails, then attempt to recognize all other known patterns.

Run-time file format identification is a black art at best.

Send corrections/additions to the FAQ Maintainer: jdm@ora.com (James D. Murray)
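The combination of methods two and three recommended in this answer might be sketched as follows. This is only an illustration, not a complete identifier: the names SIGNATURES, EXTENSION_HINTS, identify_header, and identify_file are hypothetical, the signature table covers just a few formats whose identifiers sit at offset 0, and formats that keep their identifier elsewhere (or have none) would need extra handling.

```python
import os

# A few well-known signatures found at the very start of a file.
# Illustrative only; a real reader would know many more formats.
SIGNATURES = [
    ("GIF87a", b"GIF87a"),
    ("GIF89a", b"GIF89a"),
    ("TIFF (little-endian)", b"II*\x00"),
    ("TIFF (big-endian)", b"MM\x00*"),
    ("JPEG", b"\xff\xd8\xff"),
]

# Hypothetical table: the format family each extension suggests.
EXTENSION_HINTS = {
    ".gif": "GIF",
    ".tif": "TIFF",
    ".tiff": "TIFF",
    ".jpg": "JPEG",
    ".jpeg": "JPEG",
}


def identify_header(header, extension=""):
    """Return a format name for the leading bytes, or None if unknown."""
    hint = EXTENSION_HINTS.get(extension.lower())

    # Method two first: believe the extension and test only the
    # signatures belonging to the format it suggests.
    if hint is not None:
        for name, magic in SIGNATURES:
            if name.startswith(hint) and header.startswith(magic):
                return name

    # The extension was missing, unknown, or wrong: fall back to
    # method three and try every signature we know.
    for name, magic in SIGNATURES:
        if header.startswith(magic):
            return name
    return None


def identify_file(path):
    """Identify a file on disk from its extension and first few bytes."""
    extension = os.path.splitext(path)[1]
    with open(path, "rb") as f:
        header = f.read(8)  # long enough for every signature above
    return identify_header(header, extension)
```

With this sketch, a GIF file mislabeled with a .TGA extension is still reported as GIF, because the fallback pass ignores the extension: identify_header(b"GIF89a\x01\x00", ".tga") returns "GIF89a". Since the signatures in this small table never overlap, the extension-first pass changes only the order of the tests, not the answer; it starts to matter once a table includes short or ambiguous patterns.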
Last Update March 27 2014 @ 02:11 PM