Search the FAQ Archives

3 - A - B - C - D - E - F - G - H - I - J - K - L - M
N - O - P - Q - R - S - T - U - V - W - X - Y - Z - Internet FAQ Archives

comp.compression Frequently Asked Questions (part 2/3)
Section - [78] The Burrows-Wheeler block sorting algorithm (long)

( Part1 - Part2 - Part3 - Single Page )
[ Usenet FAQs | Web FAQs | Documents | RFC Index | Sex offenders ]

Top Document: comp.compression Frequently Asked Questions (part 2/3)
Previous Document: [77] Introduction to Fractal compression (long)
See reader questions & answers on this topic! - Help others by sharing your knowledge

A high-quality implementation of the Burrows-Wheeler
block-sorting-based lossless compression algorithm is available at

Mark Nelson wrote an excellent article "Data Compression with the
Burrows-Wheeler Transform" for Dr. Dobb's Journal, September 1996.  A copy
of the article is at

Another introduction written by Sampo Syreeni <>:

    The Burrows-Wheeler block sorting compression algorithm is described in
"A Block-sorting Lossless Data Compression Algorithm" by M. Burrows and D.J.
Wheeler, dated in May 10, 1994. A postscript copy of this paper has been made
available by Digital on the Systems Research Center (SRC) FTP site at

    The method was originally discovered by one of the authors (Wheeler) back
in 1983, but has not been published before. As such, the method is fairly new
and hasn't yet gained popularity.

    The method described in the original paper is really a composite of three
different algorithms: the block sorting main engine (a lossless, very slightly
expansive preprocessor), the move-to-front coder (a byte-for-byte simple,
fast, locally adaptive noncompressive coder) and a simple statistical
compressor (first order Huffman is mentioned as a candidate) eventually doing
the compression. Of these three methods only the first two are discussed here
as they are what constitutes the heart of the algorithm. These two algorithms
combined form a completely reversible (lossless) transformation that - with
typical input - skews the first order symbol distributions to make the data
more compressible with simple methods. Intuitively speaking, the method
transforms slack in the higher order probabilities of the input block (thus
making them more even, whitening them) to slack in the lower order statistics.
This effect is what is seen in the histogram of the resulting symbol data.

    The block sorting preprocessor operates purely on a block basis. One way
to understand the idea is to think of the input block arranged as a circular
array where, for every symbol, the succeeding symbols are used as a predictor.
This predictor is then used to group the symbols with similar right neighbors
together. This predictor is realized (conceptually) as a two phase process.
The first phase forms all cyclic shifts of the input block whose size is
usually a power of two. Note here that the original string is always present
intact on some row of the resulting matrix. If the block length is n then
there exist n unique rotations of the original string (to the left). These
rotations are now viewed as the rows of an N x N matrix of symbols. The second
phase consists of sorting this resulting conceptual matrix. This phase results
in the rows coming into order based on their first few symbols. If there is
some commonly repeated string in the input block (the original paper gives
"the" as an example), the sorting phase brings all those rotations that have a
part of this string as the row start very close to each other. The preceding
symbol in this common string is then found in the last column of the sorted
matrix. This way common strings result in short bursts of just a few distinct
characters being formed in the last column of the matrix. The last column is
what is then output from the second phase. One further bit of information is
derived from the input data. This is an integer with enough bits to tell the
size of the input string (that is, log_2(n)). The number is used to note the
row position into which the original input block got in the sorting algorithm.
This integer always results in expansion of the data, but is necessary for us
to be able successfully decompress the string. The absolute amount of overhead
increases as the logarithm of the input block size so its percentage of the
output data becomes negligible with useful block sizes anyway.

    The characteristics of the transformation process make the output from the
sort ideal for certain kinds of further manipulation. The extreme local
fluctuations in the first order statistics of the output string lead one to
use a transformation that boosts and flattens the local fluttering of the
statistics. The best example (and, of course, the one given in the original
paper) is move-to-front coding. This coder codes a symbol as the number of
distinct symbols seen since the symbol's last occurrence. Basically this means
that the coder outputs the index of an input symbol in a dynamic LIFO stack
and then updates the stack by moving the symbol to the top. This is easy and
efficient to implement and results in fast local adaptation. As just a few
common symbols will (locally) govern the input to the coder, these symbols
will be kept on the top of the stack and thus the output will mainly consist
of low numbers. This makes it highly susceptible to first order statistical
compression methods which are, in case, easy and efficient to implement.

    The transform matrix described above would require enormous amounts of
storage space and would not result in a usable algorithm as such. The method
can, however, be realized very efficiently by suffix and quick sort methods.
Thus the whole transformation together with the eventual simple compression
engine is extremely fast but still achieves impressive compression on typical
input data. When implemented well, the speeds achieved can be in the order of
pure LZ and the compression ratios can still approach state-of-the-art Markov
modeling coders. The engine also responds well to increasing block sizes -
the longer the input block, the more space there is for the patterns to form
and the more similar input strings there will be in it. This results in almost
monotonously increasing compression ratios even as the block length goes well
into the megabyte range.
    The decompression cascade is basically just the compression cascade
backwards. More logic is needed to reverse the main sorting stage, however.
This logic involves reasoning around the order of the first the last column of
the conceptual coding matrix. The reader is referred to the original paper for
an in depth treatment of the subject. The original paper also contains a more
thorough discussion of why the method works and how to implement it.

    And now a little demonstration. The original block to be compressed is
chosen to be the (rather pathological) string "good, jolly good". This was
taken as an example because it has high redundancy and it is exactly 16 bytes
long. The first picture shows the cyclic shifts (rotations) of the input
string. The second shows the matrix after sorting. Note that the last column
now has many double characters in it. Note also that the original string has
been placed into the 6th row now. The third picture shows the output for this
input block. The index integer has been packed to a full byte although 4 bits
would suffice in this case (log_2(16)=4). The fourth and fifth pictures show
the transformed string after move-to-front-coding. The sixth picture shows the
statistical distribution of the characters in the output string. Notice the
disproportionately large amount of ones and zeros, even with a very short
string like this. This is the output that is then routed through the simple
statistical encoder. It should compress very well, as the distribution of the
characters in the input block is now very uneven.

      0 1 2 3 4 5 6 7 8 9 A B C D E F        0 1 2 3 4 5 6 7 8 9 A B C D E F
      -------------------------------        -------------------------------
  0 | g o o d ,   j o l l y   g o o d    0 |   g o o d g o o d ,   j o l l y
  1 | o o d ,   j o l l y   g o o d g    1 |   j o l l y   g o o d g o o d ,
  2 | o d ,   j o l l y   g o o d g o    2 | ,   j o l l y   g o o d g o o d
  3 | d ,   j o l l y   g o o d g o o    3 | d ,   j o l l y   g o o d g o o
  4 | ,   j o l l y   g o o d g o o d    4 | d g o o d ,   j o l l y   g o o
  5 |   j o l l y   g o o d g o o d ,    5 | g o o d ,   j o l l y   g o o d
  6 | j o l l y   g o o d g o o d ,      6 | g o o d g o o d ,   j o l l y
  7 | o l l y   g o o d g o o d ,   j    7 | j o l l y   g o o d g o o d ,
  8 | l l y   g o o d g o o d ,   j o    8 | l l y   g o o d g o o d ,   j o
  9 | l y   g o o d g o o d ,   j o l    9 | l y   g o o d g o o d ,   j o l
  A | y   g o o d g o o d ,   j o l l    A | o d ,   j o l l y   g o o d g o
  B |   g o o d g o o d ,   j o l l y    B | o d g o o d ,   j o l l y   g o
  C | g o o d g o o d ,   j o l l y      C | o l l y   g o o d g o o d ,   j
  D | o o d g o o d ,   j o l l y   g    D | o o d ,   j o l l y   g o o d g
  E | o d g o o d ,   j o l l y   g o    E | o o d g o o d ,   j o l l y   g
  F | d g o o d ,   j o l l y   g o o    F | y   g o o d g o o d ,   j o l l

             1. The shifts                    2. In lexicographic order

         "y,dood  oloojggl",5                   1,113,1,0,112,110,0,3,5

     3. The output from block sort          4. After move-to-front-coding

                                             00: 4;  01: 3;  03: 1;  05: 1;
          79,2D,66,72,0,1,24,0,              24: 1;  2D: 1;  66: 1;  6E: 1;
          1,71,1,0,70,6E,0,3,5               70: 1;  71: 1;  72: 1;  79: 1

           5. In hexadecimal                      6. The statistics


          End of part 2 of the comp.compression faq.

User Contributions:

Report this comment as inappropriate
May 23, 2019 @ 4:04 am
Hello, my name is Jim and I was just looking your website
over and thought I would message you on your contact form and offer
some help. I really like your site but I noticed you weren_t getting a
lot of traffic and your Alexa ranking isn_t as strong as it could be.

Fortunately, I may have an answer for you. I can get you 1,000_s of
visitors looking at ready to buy your product, service or
sign up for an offer and fast. Our advertising network of over 9000
websites provides a low cost and effective online marketing solutions
that actually works. I can help your business get more online quality
traffic by advertising your business on websites that are targeted to
your specific market. The Internet is vast but you don_t have to spend
huge amounts of cash to jump start your business. I can get you 5,000
highly targeted visitors directly to your website for as little as
$29.00 for a 30 day trial run.

If you would like to talk personally and have specific questions, call
me @ 480-331-6775 from 9am to 5pm MST. Also check out the short video
here and see how everything works.

Best Regards,
James Anthony

Comment about this article, ask questions, or add new information about this topic:


Top Document: comp.compression Frequently Asked Questions (part 2/3)
Previous Document: [77] Introduction to Fractal compression (long)

Part1 - Part2 - Part3 - Single Page

[ Usenet FAQs | Web FAQs | Documents | RFC Index ]

Send corrections/additions to the FAQ Maintainer:

Last Update March 27 2014 @ 02:11 PM