Graphics File Formats FAQ (Part 4 of 4): Tips and Tricks of the Trade
Section - 1. What's this business about endianness?

So you've been pulling your hair out trying to discover why your elegant
and perfect-beyond-reproach code, running on your Macintosh or Sun, is
reading garbage from PCX and TGA files. Or perhaps your MS-DOS or Windows
application just can't seem to make heads or tails out of that Sun Raster
file. And, to make matters even more mysterious, it seems your most
illustrious creation will read some TIFF files, but not others.

As was hinted at in the previous section, just reading the header of a
graphics file one field at a time is not enough to ensure that data is
always read correctly (not enough for portable code, anyway). In addition
to structure, we must also consider the endianness of the file's data and
the endianness of the architecture our code is running on.

Here are some baseline rules to follow:

  1) Graphics files typically use a fixed byte-ordering scheme. For example, 
     PCX and TGA files are always little-endian; Sun Raster and Macintosh
     PICT are always big-endian.
  2) Graphics files that may contain data using either byte-ordering scheme
     (for example TIFF) will have an identifier that indicates the
     endianness of the data.
  3) ASCII-based graphics files (such as DXF and most 3D object files),
     have no endianness and are always read in the same way on any system.
  4) Most CPUs use a fixed byte-ordering scheme. For example, the 80486
     is little-endian and the 68040 is big-endian.
  5) You can test the endianness of a system in software.
  6) Some systems are neither big- nor little-endian (the PDP-11, for
     example, stores 32-bit values in a mixed order); these middle-endian
     systems may cause such byte-order detection tests to return
     erroneous results.
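
Rule 2 can be sketched in code. A TIFF file begins with a two-byte
identifier: "II" (49h 49h) for little-endian data or "MM" (4Dh 4Dh) for
big-endian data. The following is only a minimal sketch; the ByteOrder
enum and the DetectTiffByteOrder() name are made up for this example and
are not part of the FAQ's code or of any TIFF library.

```c
#include <stdio.h>

/* Hypothetical names, used only for this illustration. */
enum ByteOrder { ORDER_UNKNOWN = 0, ORDER_LITTLE = 1, ORDER_BIG = 2 };

/* Read the first two bytes of a TIFF stream and report its byte order.
   The stream is assumed to be positioned at the start of the file. */
enum ByteOrder DetectTiffByteOrder(FILE *fp)
{
    int b0 = fgetc(fp);
    int b1 = (b0 == EOF) ? EOF : fgetc(fp);

    if (b0 == EOF || b1 == EOF)
        return ORDER_UNKNOWN;           /* file is too short */
    if (b0 == 0x49 && b1 == 0x49)       /* "II" */
        return ORDER_LITTLE;
    if (b0 == 0x4D && b1 == 0x4D)       /* "MM" */
        return ORDER_BIG;
    return ORDER_UNKNOWN;               /* not a TIFF identifier */
}
```

A reader would call this once, then use the result to decide how every
later multi-byte field in the file is assembled.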

Now we know that using fread() on a big-endian system to read data from a
file that was originally written in little-endian order will return
incorrect data. Actually, the data is correct, but the bytes that make up
the data are arranged in the wrong order. If we read the 16-bit value
1234h from a little-endian file, the bytes 34h 12h are copied into memory
as-is, and a big-endian system then interprets them as the value 3412h.
What we need is a swap function to change the resulting position of the
bytes:

  WORD SwapTwoBytes(WORD w)
  {
      register WORD tmp;
      tmp =  (w & 0x00FF);
      tmp = ((w & 0xFF00) >> 0x08) | (tmp << 0x08);
      return(tmp);
  }
  
Now we can read a two-byte header value and swap the bytes as such:

  fread(&Header.Height, sizeof(Header.Height), 1, fp);
  Header.Height = SwapTwoBytes(Header.Height);

But what about four-byte values? The value 12345678h would be stored as
78563412h. What we need is a swap function to handle four-byte values:

  DWORD SwapFourBytes(DWORD dw)
  {
      register DWORD tmp;
      tmp =  (dw & 0x000000FF);
      tmp = ((dw & 0x0000FF00) >> 0x08) | (tmp << 0x08);
      tmp = ((dw & 0x00FF0000) >> 0x10) | (tmp << 0x08);
      tmp = ((dw & 0xFF000000) >> 0x18) | (tmp << 0x08);
      return(tmp);
  }
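
The swap functions above rely on WORD and DWORD, which are never defined
in this section. As a sketch only, assuming WORD is a 16-bit unsigned
type and DWORD a 32-bit unsigned type, they could be declared with the
C99 fixed-width types, and each swap could be written as a single
expression that produces the same result:

```c
#include <stdint.h>

typedef uint16_t WORD;    /* assumption: 16-bit unsigned */
typedef uint32_t DWORD;   /* assumption: 32-bit unsigned */

/* Exchange the two bytes of a 16-bit value: 1234h becomes 3412h. */
WORD SwapTwoBytes(WORD w)
{
    return (WORD)(((w & 0x00FFu) << 8) | ((w & 0xFF00u) >> 8));
}

/* Reverse the four bytes of a 32-bit value: 12345678h becomes 78563412h. */
DWORD SwapFourBytes(DWORD dw)
{
    return ((dw & 0x000000FFu) << 24) |
           ((dw & 0x0000FF00u) <<  8) |
           ((dw & 0x00FF0000u) >>  8) |
           ((dw & 0xFF000000u) >> 24);
}
```

Swapping twice returns the original value, which makes a handy sanity
check when porting these routines to a new compiler.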

But how do we know when to swap and when not to swap? We always know the
byte-order of a graphics file that we are reading, but how do we check
the endianness of the system we are running on? Using the C language,
we might use preprocessor switches to cause a conditional compile based on
a system definition flag:

  #define MSDOS     1
  #define WINDOWS   2
  #define MACINTOSH 3
  #define AMIGA     4
  #define SUNUNIX   5
  
  #define SYSTEM    MSDOS
  
  #if SYSTEM == MSDOS
    /* Little-endian code here */
  #elif SYSTEM == WINDOWS
    /* Little-endian code here */
  #elif SYSTEM == MACINTOSH
    /* Big-endian code here */
  #elif SYSTEM == AMIGA
    /* Big-endian code here */
  #elif SYSTEM == SUNUNIX
    /* Big-endian code here */
  #else
  #error Unknown SYSTEM definition
  #endif

My reaction to the above code was *YUCK!* (and I hope yours was too!).  A
snarl of fread(), fwrite(), SwapTwoBytes(), and SwapFourBytes() functions
laced between preprocessor statements is hardly elegant code, although
sometimes it is our best choice. Fortunately, this is not one of those
times.

What we first need is a set of functions to read and write file data
using the byte-ordering scheme of the data. This effectively combines the
read/write and swap operations into one set of functions. Consider the
following:

  WORD GetBigWord(FILE *fp)
  {
      register WORD w;
      w =  (WORD) (fgetc(fp) & 0xFF);
      w = ((WORD) (fgetc(fp) & 0xFF)) | (w << 0x08);
      return(w);
  }
  
  WORD GetLittleWord(FILE *fp)
  {
      register WORD w;
      w =  (WORD) (fgetc(fp) & 0xFF);
      w |= ((WORD) (fgetc(fp) & 0xFF) << 0x08);
      return(w);
  }
  
  DWORD GetBigDoubleWord(FILE *fp)
  {
      register DWORD dw;
      dw =  (DWORD) (fgetc(fp) & 0xFF);
      dw = ((DWORD) (fgetc(fp) & 0xFF)) | (dw << 0x08);
      dw = ((DWORD) (fgetc(fp) & 0xFF)) | (dw << 0x08);
      dw = ((DWORD) (fgetc(fp) & 0xFF)) | (dw << 0x08);
      return(dw);
  }
  
  DWORD GetLittleDoubleWord(FILE *fp)
  {
      register DWORD dw;
      dw =  (DWORD) (fgetc(fp) & 0xFF);
      dw |= ((DWORD) (fgetc(fp) & 0xFF) << 0x08);
      dw |= ((DWORD) (fgetc(fp) & 0xFF) << 0x10);
      dw |= ((DWORD) (fgetc(fp) & 0xFF) << 0x18);
      return(dw);
  }
  
  void PutBigWord(WORD w, FILE *fp)
  {
      fputc((w >> 0x08) & 0xFF, fp);
      fputc(w & 0xFF, fp);
  }
  
  void PutLittleWord(WORD w, FILE *fp)
  {
      fputc(w & 0xFF, fp);
      fputc((w >> 0x08) & 0xFF, fp);
  }
  
  void PutBigDoubleWord(DWORD dw, FILE *fp)
  {
      fputc((dw >> 0x18) & 0xFF, fp);
      fputc((dw >> 0x10) & 0xFF, fp);
      fputc((dw >> 0x08) & 0xFF, fp);
      fputc(dw & 0xFF, fp);
  }
  
  void PutLittleDoubleWord(DWORD dw, FILE *fp)
  {
      fputc(dw & 0xFF, fp);
      fputc((dw >> 0x08) & 0xFF, fp);
      fputc((dw >> 0x10) & 0xFF, fp);
      fputc((dw >> 0x18) & 0xFF, fp);
  }
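
One thing to keep in mind: the Get functions above do not check for
end-of-file. fgetc() returns EOF on a short read, and masking that with
0xFF silently yields FFh. Where this matters, an error-checked variant
might look like the sketch below. TryGetLittleWord() is a hypothetical
name, and WORD is assumed to be a 16-bit unsigned type.

```c
#include <stdio.h>
#include <stdint.h>

typedef uint16_t WORD;    /* assumption: 16-bit unsigned */

/* Hypothetical helper: read a little-endian 16-bit value into *out.
   Returns 1 on success, or 0 (leaving *out untouched) on a short read. */
int TryGetLittleWord(FILE *fp, WORD *out)
{
    int lo = fgetc(fp);                       /* low byte comes first */
    int hi = (lo == EOF) ? EOF : fgetc(fp);   /* then the high byte   */

    if (lo == EOF || hi == EOF)
        return 0;
    *out = (WORD)((unsigned)lo | ((unsigned)hi << 8));
    return 1;
}
```

The same pattern extends to the big-endian and four-byte readers.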

If we were reading a little-endian file on a big-endian system (or vice
versa), the previous code:

  fread(&Header.Height, sizeof(Header.Height), 1, fp);
  Header.Height = SwapTwoBytes(Header.Height);

would be replaced by:
  
  Header.Height = GetLittleWord(fp);

The code to write the same value to a file would be changed from:

  Header.Height = SwapTwoBytes(Header.Height);
  fwrite(&Header.Height, sizeof(Header.Height), 1, fp);

to the slightly more readable:

  PutLittleWord(Header.Height, fp);

Note that these functions are the same regardless of the endianness of a
system. For example, GetLittleWord() will always read a two-byte value
from a little-endian file regardless of the endianness of the system;
PutBigDoubleWord() will always write a four-byte big-endian value, and so
forth.
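
A quick way to convince yourself of this is a round trip through a
scratch file. The sketch below repeats three of the functions so that it
stands alone; RoundTrip() is a hypothetical test helper, and WORD is
assumed to be a 16-bit unsigned type.

```c
#include <stdio.h>
#include <stdint.h>

typedef uint16_t WORD;    /* assumption: 16-bit unsigned */

/* Same logic as the functions shown earlier in this section. */
WORD GetBigWord(FILE *fp)
{
    WORD w = (WORD)(fgetc(fp) & 0xFF);
    return (WORD)((w << 8) | (fgetc(fp) & 0xFF));
}

WORD GetLittleWord(FILE *fp)
{
    WORD w = (WORD)(fgetc(fp) & 0xFF);
    return (WORD)(w | ((fgetc(fp) & 0xFF) << 8));
}

void PutLittleWord(WORD w, FILE *fp)
{
    fputc(w & 0xFF, fp);
    fputc((w >> 8) & 0xFF, fp);
}

/* Hypothetical helper: write w in little-endian order to a temporary
   file, then read it back with the big- or little-endian reader.
   Returns 0 if the temporary file cannot be created. */
WORD RoundTrip(WORD w, int readBig)
{
    FILE *fp = tmpfile();
    WORD result;

    if (fp == NULL)
        return 0;
    PutLittleWord(w, fp);
    rewind(fp);
    result = readBig ? GetBigWord(fp) : GetLittleWord(fp);
    fclose(fp);
    return result;
}
```

On any host, RoundTrip(0x1234, 0) gives back 1234h, while reading the
same bytes with the wrong reader, RoundTrip(0x1234, 1), gives 3412h.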

Send corrections/additions to the FAQ Maintainer:
jdm@ora.com (James D. Murray)

Last Update March 27 2014 @ 02:11 PM
