Search the FAQ Archives

3 - A - B - C - D - E - F - G - H - I - J - K - L - M
N - O - P - Q - R - S - T - U - V - W - X - Y - Z
faqs.org - Internet FAQ Archives

comp.compression Frequently Asked Questions (part 2/3)
Section - [75] Introduction to JPEG

( Part1 - Part2 - Part3 - Single Page )
[ Usenet FAQs | Web FAQs | Documents | RFC Index | Counties ]


Top Document: comp.compression Frequently Asked Questions (part 2/3)
Previous Document: [74] Introduction to JBIG
Next Document: [76] What is Vector Quantization?
See reader questions & answers on this topic! - Help others by sharing your knowledge
Here is a brief overview of the inner workings of JPEG, plus some references
for more detailed information, written by Tom Lane <tgl@sss.pgh.pa.us>.
Please read item 19 in part 1 first.

JPEG works on either full-color or gray-scale images; it does not handle
bilevel (black and white) images, at least not well.  It doesn't handle
colormapped images either; you have to pre-expand those into an unmapped
full-color representation.  JPEG works best on "continuous tone" images.
Images with many sudden jumps in color values will not compress well.

There are a lot of parameters to the JPEG compression process.  By adjusting
the parameters, you can trade off compressed image size against reconstructed
image quality over a *very* wide range.  You can get image quality ranging
from op-art (at 100x smaller than the original 24-bit image) to quite
indistinguishable from the source (at about 3x smaller).  Usually the
threshold of visible difference from the source image is somewhere around 10x
to 20x smaller than the original, ie, 1 to 2 bits per pixel for color images.
Grayscale images do not compress as much.  In fact, for comparable visual
quality, a grayscale image needs perhaps 25% less space than a color image;
certainly not the 66% less that you might naively expect.

JPEG defines a "baseline" lossy algorithm, plus optional extensions for
progressive and hierarchical coding.  There is also a separate lossless
compression mode; this typically gives about 2:1 compression, ie, about 12
bits per color pixel.  Most currently available JPEG hardware and software
handles only the baseline mode.


Here's the outline of the baseline compression algorithm:

1. Transform the image into a suitable color space.  This is a no-op for
grayscale, but for color images you generally want to transform RGB into a
luminance/chrominance color space (YCbCr, YUV, etc).  The luminance component
is grayscale and the other two axes are color information.  The reason for
doing this is that you can afford to lose a lot more information in the
chrominance components than you can in the luminance component: the human eye
is not as sensitive to high-frequency chroma info as it is to high-frequency
luminance.  (See any TV system for precedents.)  You don't have to change the
color space if you don't want to, since the remainder of the algorithm works
on each color component independently, and doesn't care just what the data
is.  However, compression will be less since you will have to code all the
components at luminance quality.  Note that colorspace transformation is
slightly lossy due to roundoff error, but the amount of error is much smaller
than what we typically introduce later on.

2. (Optional) Downsample each component by averaging together groups of
pixels.  The luminance component is left at full resolution, while the chroma
components are often reduced 2:1 horizontally and either 2:1 or 1:1 (no
change) vertically.  In JPEG-speak these alternatives are usually called 2h2v
and 2h1v sampling, but you may also see the terms "411" and "422" sampling.
This step immediately reduces the data volume by one-half or one-third.
In numerical terms it is highly lossy, but for most images it has almost no
impact on perceived quality, because of the eye's poorer resolution for chroma
info.  Note that downsampling is not applicable to grayscale data; this is one
reason color images are more compressible than grayscale.

3. Group the pixel values for each component into 8x8 blocks.  Transform each
8x8 block through a discrete cosine transform (DCT).  The DCT is a relative of
the Fourier transform and likewise gives a frequency map, with 8x8 components.
Thus you now have numbers representing the average value in each block and
successively higher-frequency changes within the block.  The motivation for
doing this is that you can now throw away high-frequency information without
affecting low-frequency information.  (The DCT transform itself is reversible
except for roundoff error.)  See question 25 for fast DCT algorithms.

4. In each block, divide each of the 64 frequency components by a separate
"quantization coefficient", and round the results to integers.  This is the
fundamental information-losing step.  The larger the quantization
coefficients, the more data is discarded.  Note that even the minimum possible
quantization coefficient, 1, loses some info, because the exact DCT outputs
are typically not integers.  Higher frequencies are always quantized less
accurately (given larger coefficients) than lower, since they are less visible
to the eye.  Also, the luminance data is typically quantized more accurately
than the chroma data, by using separate 64-element quantization tables.
Tuning the quantization tables for best results is something of a black art,
and is an active research area.  Most existing encoders use simple linear
scaling of the example tables given in the JPEG standard, using a single
user-specified "quality" setting to determine the scaling multiplier.  This
works fairly well for midrange qualities (not too far from the sample tables
themselves) but is quite nonoptimal at very high or low quality settings.

5. Encode the reduced coefficients using either Huffman or arithmetic coding.
(Strictly speaking, baseline JPEG only allows Huffman coding; arithmetic
coding is an optional extension.)   Notice that this step is lossless, so it
doesn't affect image quality.  The arithmetic coding option uses Q-coding;
it is identical to the coder used in JBIG (see question 74).  Be aware that
Q-coding is patented.  Most existing implementations support only the Huffman
mode, so as to avoid license fees.  The arithmetic mode offers maybe 5 or 10%
better compression, which isn't enough to justify paying fees.

6. Tack on appropriate headers, etc, and output the result.  In a normal
"interchange" JPEG file, all of the compression parameters are included
in the headers so that the decompressor can reverse the process.  These
parameters include the quantization tables and the Huffman coding tables.
For specialized applications, the spec permits those tables to be omitted
from the file; this saves several hundred bytes of overhead, but it means
that the decompressor must know a-priori what tables the compressor used.
Omitting the tables is safe only in closed systems.


The decompression algorithm reverses this process.  The decompressor
multiplies the reduced coefficients by the quantization table entries to
produce approximate DCT coefficients.  Since these are only approximate,
the reconstructed pixel values are also approximate, but if the design
has done what it's supposed to do, the errors won't be highly visible.
A high-quality decompressor will typically add some smoothing steps to
reduce pixel-to-pixel discontinuities.

The JPEG standard does not specify the exact behavior of compressors and
decompressors, so there's some room for creative implementation.  In
particular, implementations can trade off speed against image quality by
choosing more accurate or faster-but-less-accurate approximations to the
DCT.  Similar tradeoffs exist for the downsampling/upsampling and colorspace
conversion steps.  (The spec does include some minimum accuracy requirements
for the DCT step, but these are widely ignored, and are not too meaningful
anyway in the absence of accuracy requirements for the other lossy steps.)


Extensions:

The progressive mode is intended to support real-time transmission of images.
It allows the DCT coefficients to be sent piecemeal in multiple "scans" of
the image.  With each scan, the decoder can produce a higher-quality
rendition of the image.  Thus a low-quality preview can be sent very quickly,
then refined as time allows.  The total space needed is roughly the same as
for a baseline JPEG image of the same final quality.  (In fact, it can be
somewhat *less* if a custom Huffman table is used for each scan, because the
Huffman codes can be optimized over a smaller, more uniform population of
data than appears in a baseline image's single scan.)  The decoder must do
essentially a full JPEG decode cycle for each scan: inverse DCT, upsample,
and color conversion must all be done again, not to mention any color
quantization for 8-bit displays.  So this scheme is useful only with fast
decoders or slow transmission lines.  Up until 1995, progressive JPEG was a
rare bird, but its use is now spreading as software decoders have become fast
enough to make it useful with modem-speed data transmission.

The hierarchical mode represents an image at multiple resolutions.  For
example, one could provide 512x512, 1024x1024, and 2048x2048 versions of the
image.  The higher-resolution images are coded as differences from the next
smaller image, and thus require many fewer bits than they would if stored
independently.  (However, the total number of bits will be greater than that
needed to store just the highest-resolution frame in baseline form.)
The individual frames in a hierarchical sequence can be coded progressively
if desired.  Hierarchical mode is not widely supported at present.

Part 3 of the JPEG standard, approved at the end of 1995, introduces several
new extensions.  The one most likely to become popular is variable
quantization, which allows the quantization table to be scaled to different
levels in different parts of the image.  In this way the "more critical"
parts of the image can be coded at higher quality than the "less critical"
parts.  A signaling code can be inserted at any DCT block boundary to set a
new scaling factor.

Another Part 3 extension is selective refinement.  This feature permits a
scan in a progressive sequence, or a refinement frame of a hierarchical
sequence, to cover only part of the total image area.  This is an
alternative way of solving the variable-quality problem.  My (tgl's) guess
is that this will not get widely implemented, with variable quantization
proving a more popular approach, but I've been wrong before.

The third major extension added by Part 3 is a "tiling" concept that allows
an image to be built up as a composite of JPEG frames, which may have
different sizes, resolutions, quality settings, even colorspaces.  (For
example, a color image that occupies a small part of a mostly-grayscale page
could be represented as a separate frame, without having to store the whole
page in color.)  Again, there's some overlap in functionality with variable
quantization and selective refinement.  The general case of arbitrary tiles
is rather complex and is unlikely to be widely implemented.  In the simplest
case all the tiles are the same size and use similar quality settings.
This case may become popular even if the general tiling mechanism doesn't,
because it surmounts the 64K-pixel-on-a-side image size limitation that was
(not very foresightedly) built into the basic JPEG standard.  The individual
frames are still restricted to 64K for compatibility reasons, but the total
size of a tiled JPEG image can be up to 2^32 pixels on a side.

Lossless JPEG:

The separate lossless mode does not use DCT, since roundoff errors prevent a
DCT calculation from being lossless.  For the same reason, one would not
normally use colorspace conversion or downsampling, although these are
permitted by the standard.  The lossless mode simply codes the difference
between each pixel and the "predicted" value for the pixel.  The predicted
value is a simple function of the already-transmitted pixels just above and
to the left of the current one (for example, their average; 8 different
predictor functions are permitted).  The sequence of differences is encoded
using the same back end (Huffman or arithmetic) used in the lossy mode.

Lossless JPEG with the Huffman back end is certainly not a state-of-the-art
lossless compression method, and wasn't even when it was introduced.  The
arithmetic-coding back end may make it competitive, but you're probably best
off looking at other methods if you need only lossless compression.

The main reason for providing a lossless option is that it makes a good
adjunct to the hierarchical mode: the final scan in a hierarchical sequence
can be a lossless coding of the remaining differences, to achieve overall
losslessness.  This isn't quite as useful as it may at first appear, because
exact losslessness is not guaranteed unless the encoder and decoder have
identical IDCT implementations (ie, identical roundoff errors).  And you
can't use downsampling or colorspace conversion either if you want true
losslessness.  But in some applications the combination is useful.


References:

For a good technical introduction to JPEG, see:
	Wallace, Gregory K.  "The JPEG Still Picture Compression Standard",
	Communications of the ACM, April 1991 (vol. 34 no. 4), pp. 30-44.
(Adjacent articles in that issue discuss MPEG motion picture compression,
applications of JPEG, and related topics.)  If you don't have the CACM issue
handy, a PostScript file containing a revised version of this article is
available at ftp://ftp.uu.net/graphics/jpeg/wallace.ps.gz.  This file
(actually a preprint for a later article in IEEE Trans. Consum. Elect.)
omits the sample images that appeared in CACM, but it includes corrections
and some added material.  Note: the Wallace article is copyright ACM and
IEEE, and it may not be used for commercial purposes.

An alternative, more leisurely explanation of JPEG can be found in "The Data
Compression Book" by Mark Nelson ([Nel 1991], see question 7).  This book
provides excellent introductions to many data compression methods including
JPEG, plus sample source code in C.  The JPEG-related source code is far from
industrial-strength, but it's a pretty good learning tool.

An excellent textbook about JPEG is "JPEG Still Image Data Compression
Standard" by William B. Pennebaker and Joan L. Mitchell.  Published by Van
Nostrand Reinhold, 1993, ISBN 0-442-01272-1.  650 pages, price US$59.95.
(VNR will accept credit card orders at 800/842-3636, or get your local
bookstore to order it.)  This book includes the complete text of the ISO
JPEG standards, DIS 10918-1 and draft DIS 10918-2.  Review by Tom Lane:
"This is by far the most complete exposition of JPEG in existence.  It's
written by two people who know what they are talking about: both served on
the ISO JPEG standards committee.  If you want to know how JPEG works or
why it works that way, this is the book to have."

There are a number of errors in the first printing of the Pennebaker and
Mitchell book.  An errata list is available at
ftp://ftp.uu.net/graphics/jpeg/pm.errata.gz.  At last report, all known
errors were fixed in the second printing.

The official specification of JPEG is not currently available on-line, and
is not likely ever to be available for free because of ISO and ITU copyright
restrictions.  You can order it from your national standards agency as ISO
standards IS 10918-1, 10918-2, 10918-3, or as ITU-T standards T.81, T.83,
T.84.  See ftp://ftp.uu.net/graphics/jpeg/jpeg.documents.gz for more info.
NOTE: buying the Pennebaker and Mitchell textbook is a much better deal
than purchasing the standard directly: it's cheaper and includes a lot of
useful explanatory material along with the full draft text of the spec.
The book unfortunately doesn't include Part 3 of the spec, but if you need
Part 3, buy the book and just that part and you'll still be ahead.


User Contributions:

Comment about this article, ask questions, or add new information about this topic:

CAPTCHA




Top Document: comp.compression Frequently Asked Questions (part 2/3)
Previous Document: [74] Introduction to JBIG
Next Document: [76] What is Vector Quantization?

Part1 - Part2 - Part3 - Single Page

[ Usenet FAQs | Web FAQs | Documents | RFC Index ]

Send corrections/additions to the FAQ Maintainer:
jloup@gzip.OmitThis.org





Last Update March 27 2014 @ 02:11 PM