Search the FAQ Archives

3 - A - B - C - D - E - F - G - H - I - J - K - L - M
N - O - P - Q - R - S - T - U - V - W - X - Y - Z
faqs.org - Internet FAQ Archives

comp.compression Frequently Asked Questions (part 2/3)
Section - [78] The Burrows-Wheeler block sorting algorithm (long)

( Part1 - Part2 - Part3 - Single Page )
[ Usenet FAQs | Web FAQs | Documents | RFC Index | Houses ]


Top Document: comp.compression Frequently Asked Questions (part 2/3)
Previous Document: [77] Introduction to Fractal compression (long)
See reader questions & answers on this topic! - Help others by sharing your knowledge

A high-quality implementation of the Burrows-Wheeler
block-sorting-based lossless compression algorithm is available at
http://www.cs.man.ac.uk/arch/people/j-seward/bzip-0.21.tar.gz

Mark Nelson wrote an excellent article "Data Compression with the
Burrows-Wheeler Transform" for Dr. Dobb's Journal, September 1996.  A copy
of the article is at http://www.dogma.net/markn/articles/bwt/bwt.htm

Another introduction written by Sampo Syreeni <tmaaedu@nexus.edu.lahti.fi>:

    The Burrows-Wheeler block sorting compression algorithm is described in
"A Block-sorting Lossless Data Compression Algorithm" by M. Burrows and D.J.
Wheeler, dated in May 10, 1994. A postscript copy of this paper has been made
available by Digital on the Systems Research Center (SRC) FTP site at
ftp://ftp.digital.com/pub/DEC/SRC/research-reports/SRC-124.ps.Z

    The method was originally discovered by one of the authors (Wheeler) back
in 1983, but has not been published before. As such, the method is fairly new
and hasn't yet gained popularity.

    The method described in the original paper is really a composite of three
different algorithms: the block sorting main engine (a lossless, very slightly
expansive preprocessor), the move-to-front coder (a byte-for-byte simple,
fast, locally adaptive noncompressive coder) and a simple statistical
compressor (first order Huffman is mentioned as a candidate) eventually doing
the compression. Of these three methods only the first two are discussed here
as they are what constitutes the heart of the algorithm. These two algorithms
combined form a completely reversible (lossless) transformation that - with
typical input - skews the first order symbol distributions to make the data
more compressible with simple methods. Intuitively speaking, the method
transforms slack in the higher order probabilities of the input block (thus
making them more even, whitening them) to slack in the lower order statistics.
This effect is what is seen in the histogram of the resulting symbol data.

    The block sorting preprocessor operates purely on a block basis. One way
to understand the idea is to think of the input block arranged as a circular
array where, for every symbol, the succeeding symbols are used as a predictor.
This predictor is then used to group the symbols with similar right neighbors
together. This predictor is realized (conceptually) as a two phase process.
The first phase forms all cyclic shifts of the input block whose size is
usually a power of two. Note here that the original string is always present
intact on some row of the resulting matrix. If the block length is n then
there exist n unique rotations of the original string (to the left). These
rotations are now viewed as the rows of an N x N matrix of symbols. The second
phase consists of sorting this resulting conceptual matrix. This phase results
in the rows coming into order based on their first few symbols. If there is
some commonly repeated string in the input block (the original paper gives
"the" as an example), the sorting phase brings all those rotations that have a
part of this string as the row start very close to each other. The preceding
symbol in this common string is then found in the last column of the sorted
matrix. This way common strings result in short bursts of just a few distinct
characters being formed in the last column of the matrix. The last column is
what is then output from the second phase. One further bit of information is
derived from the input data. This is an integer with enough bits to tell the
size of the input string (that is, log_2(n)). The number is used to note the
row position into which the original input block got in the sorting algorithm.
This integer always results in expansion of the data, but is necessary for us
to be able successfully decompress the string. The absolute amount of overhead
increases as the logarithm of the input block size so its percentage of the
output data becomes negligible with useful block sizes anyway.

    The characteristics of the transformation process make the output from the
sort ideal for certain kinds of further manipulation. The extreme local
fluctuations in the first order statistics of the output string lead one to
use a transformation that boosts and flattens the local fluttering of the
statistics. The best example (and, of course, the one given in the original
paper) is move-to-front coding. This coder codes a symbol as the number of
distinct symbols seen since the symbol's last occurrence. Basically this means
that the coder outputs the index of an input symbol in a dynamic LIFO stack
and then updates the stack by moving the symbol to the top. This is easy and
efficient to implement and results in fast local adaptation. As just a few
common symbols will (locally) govern the input to the coder, these symbols
will be kept on the top of the stack and thus the output will mainly consist
of low numbers. This makes it highly susceptible to first order statistical
compression methods which are, in case, easy and efficient to implement.

    The transform matrix described above would require enormous amounts of
storage space and would not result in a usable algorithm as such. The method
can, however, be realized very efficiently by suffix and quick sort methods.
Thus the whole transformation together with the eventual simple compression
engine is extremely fast but still achieves impressive compression on typical
input data. When implemented well, the speeds achieved can be in the order of
pure LZ and the compression ratios can still approach state-of-the-art Markov
modeling coders. The engine also responds well to increasing block sizes -
the longer the input block, the more space there is for the patterns to form
and the more similar input strings there will be in it. This results in almost
monotonously increasing compression ratios even as the block length goes well
into the megabyte range.
    The decompression cascade is basically just the compression cascade
backwards. More logic is needed to reverse the main sorting stage, however.
This logic involves reasoning around the order of the first the last column of
the conceptual coding matrix. The reader is referred to the original paper for
an in depth treatment of the subject. The original paper also contains a more
thorough discussion of why the method works and how to implement it.

    And now a little demonstration. The original block to be compressed is
chosen to be the (rather pathological) string "good, jolly good". This was
taken as an example because it has high redundancy and it is exactly 16 bytes
long. The first picture shows the cyclic shifts (rotations) of the input
string. The second shows the matrix after sorting. Note that the last column
now has many double characters in it. Note also that the original string has
been placed into the 6th row now. The third picture shows the output for this
input block. The index integer has been packed to a full byte although 4 bits
would suffice in this case (log_2(16)=4). The fourth and fifth pictures show
the transformed string after move-to-front-coding. The sixth picture shows the
statistical distribution of the characters in the output string. Notice the
disproportionately large amount of ones and zeros, even with a very short
string like this. This is the output that is then routed through the simple
statistical encoder. It should compress very well, as the distribution of the
characters in the input block is now very uneven.


      0 1 2 3 4 5 6 7 8 9 A B C D E F        0 1 2 3 4 5 6 7 8 9 A B C D E F
      -------------------------------        -------------------------------
  0 | g o o d ,   j o l l y   g o o d    0 |   g o o d g o o d ,   j o l l y
  1 | o o d ,   j o l l y   g o o d g    1 |   j o l l y   g o o d g o o d ,
  2 | o d ,   j o l l y   g o o d g o    2 | ,   j o l l y   g o o d g o o d
  3 | d ,   j o l l y   g o o d g o o    3 | d ,   j o l l y   g o o d g o o
  4 | ,   j o l l y   g o o d g o o d    4 | d g o o d ,   j o l l y   g o o
  5 |   j o l l y   g o o d g o o d ,    5 | g o o d ,   j o l l y   g o o d
  6 | j o l l y   g o o d g o o d ,      6 | g o o d g o o d ,   j o l l y
  7 | o l l y   g o o d g o o d ,   j    7 | j o l l y   g o o d g o o d ,
  8 | l l y   g o o d g o o d ,   j o    8 | l l y   g o o d g o o d ,   j o
  9 | l y   g o o d g o o d ,   j o l    9 | l y   g o o d g o o d ,   j o l
  A | y   g o o d g o o d ,   j o l l    A | o d ,   j o l l y   g o o d g o
  B |   g o o d g o o d ,   j o l l y    B | o d g o o d ,   j o l l y   g o
  C | g o o d g o o d ,   j o l l y      C | o l l y   g o o d g o o d ,   j
  D | o o d g o o d ,   j o l l y   g    D | o o d ,   j o l l y   g o o d g
  E | o d g o o d ,   j o l l y   g o    E | o o d g o o d ,   j o l l y   g
  F | d g o o d ,   j o l l y   g o o    F | y   g o o d g o o d ,   j o l l

             1. The shifts                    2. In lexicographic order


                                                121,45,102,114,0,1,36,0,
         "y,dood  oloojggl",5                   1,113,1,0,112,110,0,3,5

     3. The output from block sort          4. After move-to-front-coding


                                             00: 4;  01: 3;  03: 1;  05: 1;
          79,2D,66,72,0,1,24,0,              24: 1;  2D: 1;  66: 1;  6E: 1;
          1,71,1,0,70,6E,0,3,5               70: 1;  71: 1;  72: 1;  79: 1

           5. In hexadecimal                      6. The statistics

------------------------------------------------------------------------------

          End of part 2 of the comp.compression faq.

User Contributions:

1
Report this comment as inappropriate
Sep 23, 2019 @ 12:00 am
The Major Advantages of uniform dating

online dating services is not a newest platform but it has taken a sudden rise after the advancement in the technologies. This has revolutionised the way singles meet. With many struggles of dating in today sphere, More people these days are turning their heads towards the internet.

Even for arranged your marriage, Older dating and stuff like that, People are changing themselves and adopting the platform with all open heart and mind. Beside several advantages these platforms possess, Location and religion are the factors that set it apart. You can find your life partner from everywhere and out of any religion. These religions and castes barely matters when it comes to online dating sites.

really fast, Easy and convenient: At the first, Dating platforms might seem to be a daunting process, [url=https://charmdatescamreviews.wordpress.com/2018/03/19/charmdate-review-why-should-i-date-ukrainian-girls-how-charmdate-protects-me/]online dating ukraine[/url] But in real it is a simple and efficient process to register and connect with people all around the globe. The only thing you need to do is creating an eye catchy and appealing profile mentioning all needed details about you, Your hobbies and your fascinates. Its speedy and convenient access makes it a must have platform for those busy corporate out there.

Less pressing: This is one of the best platforms especially for the people who are shy or nervous as they can connect with people they find interesting via chats unless they get familiar enough with them to either have a verbal talk or start with dating in person. It gives you relaxed atmosphere, Where you can take out efficient time to think to understand and what you want to say to proceed ahead with the conversation.

Meet lots more people: This platform gives user a numerous choices to select from and also it is possible that you can connect with many people altogether and you can find sometimes a person who can be a great friend and a person who can be eligible to be your partner in the future time. This platform will allow you to pick the best one out of many with whom you think you can share your interests with.

add on a Deeper Level: This online site help you know a person back to front. The only appearance you have of your mate is his profile picture, Else you have to know it with the help of the chat you have with him. this can help you evaluate a person behind his face, numerous experts judge who these persons are truly are. Such dating platforms leave you unbiased to be love someone you share similar interests with.

Full Disclosure: The online dating sites allow you to specify whatever is your expectation and intention, Right right from the start so that you can find people looking for the same things and interests as of yours. The major benefit of all such platform is that it helps preventing misunderstandings and disappointments.

fee: Last but not the cheapest, Cost saving is the most appealing benefit of online dating because real life dates are expensive. You need to hangout with your partner every weekend and you further have to spend money either on food or night-life or both.

These platforms therefore gives you enable you to get to know the person well in advance through these online portals and thereafter you should spend money on real dates. different, On the very first meet even you must spend money with no surety that you will like the person as a partner or not.
2
Report this comment as inappropriate
Sep 19, 2021 @ 12:12 pm
Hey ... Im looking a man..
I love sex. Here are my erotic photos - is.gd/bPOmnQ
3
Report this comment as inappropriate
Sep 27, 2021 @ 2:14 pm
Hello !! Im looking a lover...
I dream of hard sex! Write me - tinyurl.com/yjefe8mm
4
Report this comment as inappropriate
Sep 30, 2021 @ 5:17 pm
Hey baby.. my name is Kelly...
I love oral sex! Write me - is.gd/sTu2T8
5
Report this comment as inappropriate
Oct 3, 2021 @ 5:05 am
Fill out the form and win a FREE $500 or $1000 voucher! - tinyurl.com/ygvk79dz
6
Report this comment as inappropriate
Oct 4, 2021 @ 6:18 pm
Hello dear!!! Im looking a man!
I dream of hard sex! Write me - is.gd/gPWpcf
7
Report this comment as inappropriate
Oct 7, 2021 @ 6:18 pm
Hi baby... my name Diane!!!
I love sex. Here are my erotic photos - tinyurl.com/yghegmr2
8
Report this comment as inappropriate
Oct 10, 2021 @ 7:07 am
Hey dear.. Im looking a man...
I want sex! Write me - is.gd/MvyEhY
9
Report this comment as inappropriate
Oct 19, 2021 @ 7:07 am
hallo baby!! Im looking a lover!!
I dream of hard sex! Write me - tinyurl.com/yjs64g36
10
Report this comment as inappropriate
Oct 27, 2021 @ 3:15 pm
Hello dear! my name is Evelyn..
If you want to meet me, I'm here - tinyurl.com/ydun3hp3
11
Report this comment as inappropriate
Nov 5, 2021 @ 9:09 am
hey baby. my name Lauren!!!
I love oral sex! Write me - tinyurl.com/ygqxenap
12
Report this comment as inappropriate
Nov 14, 2021 @ 10:22 pm
hallo baby.. my name Mary!
Do you want to see a beautiful female body? Here are my erotic photos - tinyurl.com/yhtxq5lg
13
Report this comment as inappropriate
Dec 19, 2021 @ 11:23 pm
Hi dear!! Im looking a man!
I want sex! Here are my photos - is.gd/M7qBFG
14
Report this comment as inappropriate
Dec 27, 2021 @ 2:02 am
hallo !! my name Megan!
I love oral sex! Write me - chilp.it/d870a00
15
Report this comment as inappropriate
Jan 12, 2022 @ 10:10 am
Hey baby!!! Im looking a man!
I love sex. Write me - u.to/j-bKGw
16
Report this comment as inappropriate
Jan 22, 2022 @ 1:13 pm
I can agree with Judyhb in this topic. Her name may really be Evelyn, but she choosed another nickname out of shame. Kind of sad, Evelyn is a nice name..
Also, I want sex! Don't have any photos though :/

Comment about this article, ask questions, or add new information about this topic:




Top Document: comp.compression Frequently Asked Questions (part 2/3)
Previous Document: [77] Introduction to Fractal compression (long)

Part1 - Part2 - Part3 - Single Page

[ Usenet FAQs | Web FAQs | Documents | RFC Index ]

Send corrections/additions to the FAQ Maintainer:
jloup@gzip.OmitThis.org





Last Update March 27 2014 @ 02:11 PM