LZ77 compression

from The Free On-line Dictionary of Computing (8 July 2008)
LZ77 compression

   The first {algorithm} to use the {Lempel-Ziv} {substitutional
   compression} schemes, proposed in 1977.  LZ77 compression
   keeps track of the last n bytes of data seen, and when a
   phrase is encountered that has already been seen, it outputs a
   pair of values corresponding to the position of the phrase in
   the previously-seen buffer of data, and the length of the
   phrase.  In effect the compressor moves a fixed-size "window"
   over the data (generally referred to as a "sliding window"),
   with the position part of the (position, length) pair
   referring to the position of the phrase within the window.

   The most commonly used {algorithms} are derived from the
   {LZSS} scheme described by James Storer and Thomas Szymanski
   in 1982.  In this the compressor maintains a window of size N
   bytes and a "lookahead buffer", the contents of which it tries
   to find a match for in the window:

    while (lookAheadBuffer not empty)
    {
        get a pointer (position, match) to the longest match in
        the window for the lookahead buffer;

        if (length > MINIMUM_MATCH_LENGTH)
        {
          output a (position, length) pair;
          shift the window length characters along;
        }
        else
        {
          output the first character in the lookahead buffer;
          shift the window 1 character along;
        }
     }

   Decompression is simple and fast: whenever a (POSITION,
   LENGTH) pair is encountered, go to that POSITION in the window
   and copy LENGTH bytes to the output.

   Sliding-window-based schemes can be simplified by numbering
   the input text characters mod N, in effect creating a circular
   buffer.  The sliding window approach automatically creates the
   {LRU} effect which must be done explicitly in {LZ78} schemes.
   Variants of this method apply additional compression to the
   output of the LZSS compressor, which include a simple
   variable-length code ({LZB}), dynamic {Huffman coding}
   ({LZH}), and {Shannon-Fano} coding ({ZIP} 1.x), all of which
   result in a certain degree of improvement over the basic
   scheme, especially when the data are rather random and the
   LZSS compressor has little effect.  An algorithm was developed
   which combines the ideas behind LZ77 and LZ78 to produce a
   hybrid called {LZFG}.  LZFG uses the standard sliding window,
   but stores the data in a modified {trie} data structure and
   produces as output the position of the text in the trie.
   Since LZFG only inserts complete *phrases* into the
   dictionary, it should run faster than other LZ77-based
   compressors.

   All popular archivers ({arj}, {lha}, {zip}, {zoo}) are
   variations on LZ77.

   [comp.compression {FAQ}].

   (1995-04-07)