xdelta3.c - platform/external/xdelta3 - Git at Google

 /* xdelta 3 - delta compression tools and library
  * Copyright (C) 2001, 2003, 2004, 2005, 2006, 2007.  Joshua P. MacDonald
  *
  *  This program is free software; you can redistribute it and/or modify
  *  it under the terms of the GNU General Public License as published by
  *  the Free Software Foundation; either version 2 of the License, or
  *  (at your option) any later version.
  *
  *  This program is distributed in the hope that it will be useful,
  *  but WITHOUT ANY WARRANTY; without even the implied warranty of
  *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
  *  GNU General Public License for more details.
  *
  *  You should have received a copy of the GNU General Public License
  *  along with this program; if not, write to the Free Software
  *  Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA

    -------------------------------------------------------------------

 			       Xdelta 3

    The goal of this library is to to implement both the (stand-alone)
    data-compression and delta-compression aspects of VCDIFF encoding, and
    to support a programming interface that works like Zlib
    (http://www.gzip.org/zlib.html). See RFC3284: The VCDIFF Generic
    Differencing and Compression Data Format.

    VCDIFF is a unified encoding that combines data-compression and
    delta-encoding ("differencing").

    VCDIFF has a detailed byte-code instruction set with many features.
    The instruction format supports an immediate size operand for small
    COPYs and ADDs (e.g., under 18 bytes).  There are also instruction
    "modes", which are used to compress COPY addresses by using two
    address caches.  An instruction mode refers to slots in the NEAR
    and SAME caches for recent addresses.  NEAR remembers the
    previous 4 (by default) COPY addresses, and SAME catches
    frequent re-uses of the same address using a 3-way (by default)
    256-entry associative cache of [ADDR mod 256], the encoded byte.
    A hit in the NEAR/SAME cache requires 0/1 ADDR bytes.

    VCDIFF has a default instruction table, but an alternate
    instruction tables may themselves be be delta-compressed and
    included in the encoding header.  This allows even more freedom.
    There are 9 instruction modes in the default code table, 4 near, 3
    same, VCD_SELF (absolute encoding) and VCD_HERE (relative to the
    current position).

    ----------------------------------------------------------------------

   			      Algorithms

    Aside from the details of encoding and decoding, there are a bunch
    of algorithms needed.

    1. STRING-MATCH.  A two-level fingerprinting approach is used.  A
    single loop computes the two checksums -- small and large -- at
    successive offsets in the TARGET file.  The large checksum is more
    accurate and is used to discover SOURCE matches, which are
    potentially very long.  The small checksum is used to discover
    copies within the TARGET.  Small matching, which is more expensive,
    usually dominates the large STRING-MATCH costs in this code - the
    more exhaustive the search, the better the results.  Either of the
    two string-matching mechanisms may be disabled.

    2. INSTRUCTION SELECTION.  The IOPT buffer here represents a queue
    used to store overlapping copy instructions.  There are two possible
    optimizations that go beyond a greedy search.  Both of these fall
    into the category of "non-greedy matching" optimizations.

    The first optimization stems from backward SOURCE-COPY matching.
    When a new SOURCE-COPY instruction covers a previous instruction in
    the target completely, it is erased from the queue.  Randal Burns
    originally analyzed these algorithms and did a lot of related work
    (\cite the 1.5-pass algorithm).

    The second optimization comes by the encoding of common very-small
    COPY and ADD instructions, for which there are special DOUBLE-code
    instructions, which code two instructions in a single byte.

    The cost of bad instruction-selection overhead is relatively high
    for data-compression, relative to delta-compression, so this second
    optimization is fairly important.  With "lazy" matching (the name
    used in Zlib for a similar optimization), the string-match
    algorithm searches after a match for potential overlapping copy
    instructions.  In Xdelta and by default, VCDIFF, the minimum match
    size is 4 bytes, whereas Zlib searches with a 3-byte minimum.  This
    feature, combined with double instructions, provides a nice
    challenge.  Search in this file for "black magic", a heuristic.

    3. STREAM ALIGNMENT.  Stream alignment is needed to compress large
    inputs in constant space.  See xd3_srcwin_move_point().

    4. WINDOW SELECTION.  When the IOPT buffer flushes, in the first call
    to xd3_iopt_finish_encoding containing any kind of copy instruction,
    the parameters of the source window must be decided: the offset into
    the source and the length of the window.  Since the IOPT buffer is
    finite, the program may be forced to fix these values before knowing
    the best offset/length.

    5. SECONDARY COMPRESSION.  VCDIFF supports a secondary encoding to
    be applied to the individual sections of the data format, which are
    ADDRess, INSTruction, and DATA.  Several secondary compressor
    variations are implemented here, although none is standardized yet.

    One is an adaptive huffman algorithm -- the FGK algorithm (Faller,
    Gallager, and Knuth, 1985).  This compressor is extremely slow.

    The other is a simple static Huffman routine, which is the base
    case of a semi-adaptive scheme published by D.J. Wheeler and first
    widely used in bzip2 (by Julian Seward).  This is a very
    interesting algorithm, originally published in nearly cryptic form
    by D.J. Wheeler. !!!NOTE!!! Because these are not standardized,
    secondary compression remains off by default.
    ftp://ftp.cl.cam.ac.uk/users/djw3/bred3.{c,ps}
    --------------------------------------------------------------------

 			    Other Features

    1. USER CONVENIENCE

    For user convenience, it is essential to recognize Gzip-compressed
    files and automatically Gzip-decompress them prior to
    delta-compression (or else no delta-compression will be achieved
    unless the user manually decompresses the inputs).  The compressed
    represention competes with Xdelta, and this must be hidden from the
    command-line user interface.  The Xdelta-1.x encoding was simple, not
    compressed itself, so Xdelta-1.x uses Zlib internally to compress the
    representation.

    This implementation supports external compression, which implements
    the necessary fork() and pipe() mechanics.  There is a tricky step
    involved to support automatic detection of a compressed input in a
    non-seekable input.  First you read a bit of the input to detect
    magic headers.  When a compressed format is recognized, exec() the
    external compression program and create a second child process to
    copy the original input stream. [Footnote: There is a difficulty
    related to using Gzip externally. It is not possible to decompress
    and recompress a Gzip file transparently.  If FILE.GZ had a
    cryptographic signature, then, after: (1) Gzip-decompression, (2)
    Xdelta-encoding, (3) Gzip-compression the signature could be
    broken.  The only way to solve this problem is to guess at Gzip's
    compression level or control it by other means.  I recommend that
    specific implementations of any compression scheme store
    information needed to exactly re-compress the input, that way
    external compression is transparent - however, this won't happen
    here until it has stabilized.]

    2. APPLICATION-HEADER

    This feature was introduced in RFC3284.  It allows any application
    to include a header within the VCDIFF file format.  This allows
    general inter-application data exchange with support for
    application-specific extensions to communicate metadata.

    3. VCDIFF CHECKSUM

    An optional checksum value is included with each window, which can
    be used to validate the final result.  This verifies the correct source
    file was used for decompression as well as the obvious advantage:
    checking the implementation (and underlying) correctness.

    4. LIGHT WEIGHT

    The code makes efforts to avoid copying data more than necessary.
    The code delays many initialization tasks until the first use, it
    optimizes for identical (perfectly matching) inputs.  It does not
    compute any checksums until the first lookup misses.  Memory usage
    is reduced.  String-matching is templatized (by slightly gross use
    of CPP) to hard-code alternative compile-time defaults.  The code
    has few outside dependencies.
    ----------------------------------------------------------------------

 		The default rfc3284 instruction table:
 		    (see RFC for the explanation)

            TYPE      SIZE     MODE    TYPE     SIZE     MODE     INDEX
    --------------------------------------------------------------------
        1.  Run         0        0     Noop       0        0        0
        2.  Add    0, [1,17]     0     Noop       0        0      [1,18]
        3.  Copy   0, [4,18]     0     Noop       0        0     [19,34]
        4.  Copy   0, [4,18]     1     Noop       0        0     [35,50]
        5.  Copy   0, [4,18]     2     Noop       0        0     [51,66]
        6.  Copy   0, [4,18]     3     Noop       0        0     [67,82]
        7.  Copy   0, [4,18]     4     Noop       0        0     [83,98]
        8.  Copy   0, [4,18]     5     Noop       0        0     [99,114]
        9.  Copy   0, [4,18]     6     Noop       0        0    [115,130]
       10.  Copy   0, [4,18]     7     Noop       0        0    [131,146]
       11.  Copy   0, [4,18]     8     Noop       0        0    [147,162]
       12.  Add       [1,4]      0     Copy     [4,6]      0    [163,174]
       13.  Add       [1,4]      0     Copy     [4,6]      1    [175,186]
       14.  Add       [1,4]      0     Copy     [4,6]      2    [187,198]
       15.  Add       [1,4]      0     Copy     [4,6]      3    [199,210]
       16.  Add       [1,4]      0     Copy     [4,6]      4    [211,222]
       17.  Add       [1,4]      0     Copy     [4,6]      5    [223,234]
       18.  Add       [1,4]      0     Copy       4        6    [235,238]
       19.  Add       [1,4]      0     Copy       4        7    [239,242]
       20.  Add       [1,4]      0     Copy       4        8    [243,246]
       21.  Copy        4      [0,8]   Add        1        0    [247,255]
    --------------------------------------------------------------------

 		     Reading the source: Overview

    This file includes itself in several passes to macro-expand certain
    sections with variable forms.  Just read ahead, there's only a
    little confusion.  I know this sounds ugly, but hard-coding some of
    the string-matching parameters results in a 10-15% increase in
    string-match performance.  The only time this hurts is when you have
    unbalanced #if/endifs.

    A single compilation unit tames the Makefile.  In short, this is to
    allow the above-described hack without an explodingMakefile.  The
    single compilation unit includes the core library features,
    configurable string-match templates, optional main() command-line
    tool, misc optional features, and a regression test.  Features are
    controled with CPP #defines, see Makefile.am.

    The initial __XDELTA3_C_HEADER_PASS__ starts first, the _INLINE_ and
    _TEMPLATE_ sections follow.  Easy stuff first, hard stuff last.

    Optional features include:

      xdelta3-main.h     The command-line interface, external compression
                         support, POSIX-specific, info & VCDIFF-debug tools.
      xdelta3-second.h   The common secondary compression routines.
      xdelta3-decoder.h  All decoding routines.
      xdelta3-djw.h      The semi-adaptive huffman secondary encoder.
      xdelta3-fgk.h      The adaptive huffman secondary encoder.
      xdelta3-test.h     The unit test covers major algorithms,
                         encoding and decoding.  There are single-bit
                         error decoding tests.  There are 32/64-bit file size
                         boundary tests.  There are command-line tests.
                         There are compression tests.  There are external
                         compression tests.  There are string-matching tests.
 			There should be more tests...

    Additional headers include:

      xdelta3.h          The public header file.
      xdelta3-cfgs.h     The default settings for default, built-in
                         encoders.  These are hard-coded at
                         compile-time.  There is also a single
                         soft-coded string matcher for experimenting
                         with arbitrary values.
      xdelta3-list.h     A cyclic list template

    Misc little debug utilities:

      badcopy.c          Randomly modifies an input file based on two
                         parameters: (1) the probability that a byte in
                         the file is replaced with a pseudo-random value,
                         and (2) the mean change size.  Changes are
                         generated using an expoential distribution
                         which approximates the expected error_prob
 			distribution.
    --------------------------------------------------------------------

    This file itself is unusually large.  I hope to defend this layout
    with lots of comments.  Everything in this file is related to
    encoding and decoding.  I like it all together - the template stuff
    is just a hack. */

 #ifndef __XDELTA3_C_HEADER_PASS__
 #define __XDELTA3_C_HEADER_PASS__

 #include <errno.h>
 #include <string.h>

 #include "xdelta3.h"

 /***********************************************************************
  STATIC CONFIGURATION
  ***********************************************************************/

 #ifndef XD3_MAIN                  /* the main application */
 #define XD3_MAIN 0
 #endif

 #ifndef VCDIFF_TOOLS
 #define VCDIFF_TOOLS XD3_MAIN
 #endif

 #ifndef SECONDARY_FGK             /* one from the algorithm preservation department: */
 #define SECONDARY_FGK 0           /* adaptive Huffman routines */
 #endif

 #ifndef SECONDARY_DJW             /* semi-adaptive/static Huffman for the eventual */
 #define SECONDARY_DJW 0           /* standardization, off by default until such time. */
 #endif

 #ifndef GENERIC_ENCODE_TABLES    /* These three are the RFC-spec'd app-specific */
 #define GENERIC_ENCODE_TABLES 0  /* code features.  This is tested but not recommended */
 #endif  			 /* unless there's a real application. */
 #ifndef GENERIC_ENCODE_TABLES_COMPUTE
 #define GENERIC_ENCODE_TABLES_COMPUTE 0
 #endif
 #ifndef GENERIC_ENCODE_TABLES_COMPUTE_PRINT
 #define GENERIC_ENCODE_TABLES_COMPUTE_PRINT 0
 #endif

 #if XD3_ENCODER
 #define IF_ENCODER(x) x
 #else
 #define IF_ENCODER(x)
 #endif

 /***********************************************************************/

 typedef enum {

   /* header indicator bits */
   VCD_SECONDARY  = (1 << 0),  /* uses secondary compressor */
   VCD_CODETABLE  = (1 << 1),  /* supplies code table data */
   VCD_APPHEADER  = (1 << 2),  /* supplies application data */
   VCD_INVHDR     = ~7U,

   /* window indicator bits */
   VCD_SOURCE     = (1 << 0),  /* copy window in source file */
   VCD_TARGET     = (1 << 1),  /* copy window in target file */
   VCD_ADLER32    = (1 << 2),  /* has adler32 checksum */
   VCD_INVWIN     = ~7U,

   VCD_SRCORTGT   = VCD_SOURCE | VCD_TARGET,

   /* delta indicator bits */
   VCD_DATACOMP   = (1 << 0),
   VCD_INSTCOMP   = (1 << 1),
   VCD_ADDRCOMP   = (1 << 2),
   VCD_INVDEL     = ~0x7U,

 } xd3_indicator;

 typedef enum {
   VCD_DJW_ID    = 1,
   VCD_FGK_ID    = 16, /* Note: these are not standard IANA-allocated IDs! */
 } xd3_secondary_ids;

 typedef enum {
   SEC_NOFLAGS     = 0,

   /* Note: SEC_COUNT_FREQS Not implemented (to eliminate 1st Huffman pass) */
   SEC_COUNT_FREQS = (1 << 0),
 } xd3_secondary_flags;

 typedef enum {
   DATA_SECTION, /* These indicate which section to the secondary
                  * compressor. */
   INST_SECTION, /* The header section is not compressed, therefore not
                  * listed here. */
   ADDR_SECTION,
 } xd3_section_type;

 typedef enum
 {
   XD3_NOOP = 0,
   XD3_ADD  = 1,
   XD3_RUN  = 2,
   XD3_CPY  = 3, /* XD3_CPY rtypes are represented as (XD3_CPY +
                  * copy-mode value) */
 } xd3_rtype;

 /***********************************************************************/

 #include "xdelta3-list.h"

 XD3_MAKELIST(xd3_rlist, xd3_rinst, link);

 /***********************************************************************/

 #define SECONDARY_MIN_SAVINGS 2  /* Secondary compression has to save
 				    at least this many bytes. */
 #define SECONDARY_MIN_INPUT   10 /* Secondary compression needs at
 				    least this many bytes. */

 #define VCDIFF_MAGIC1  0xd6  /* 1st file byte */
 #define VCDIFF_MAGIC2  0xc3  /* 2nd file byte */
 #define VCDIFF_MAGIC3  0xc4  /* 3rd file byte */
 #define VCDIFF_VERSION 0x00  /* 4th file byte */

 #define VCD_SELF       0     /* 1st address mode */
 #define VCD_HERE       1     /* 2nd address mode */

 #define CODE_TABLE_STRING_SIZE (6 * 256) /* Should fit a code table string. */
 #define CODE_TABLE_VCDIFF_SIZE (6 * 256) /* Should fit a compressed code
 					  * table string */

 #define SECONDARY_ANY (SECONDARY_DJW || SECONDARY_FGK)

 #define ALPHABET_SIZE      256  /* Used in test code--size of the secondary
 				 * compressor alphabet. */

 #define HASH_PERMUTE       1    /* The input is permuted by random nums */
 #define ADLER_LARGE_CKSUM  1    /* Adler checksum vs. RK checksum */

 #define HASH_CKOFFSET      1U   /* Table entries distinguish "no-entry" from
 				 * offset 0 using this offset. */

 #define MIN_SMALL_LOOK    2U    /* Match-optimization stuff. */
 #define MIN_LARGE_LOOK    2U
 #define MIN_MATCH_OFFSET  1U
 #define MAX_MATCH_SPLIT   18U   /* VCDIFF code table: 18 is the default limit
 				 * for direct-coded ADD sizes */

 #define LEAST_MATCH_INCR  0   /* The least number of bytes an overlapping
 			       * match must beat the preceding match by.  This
 			       * is a bias for the lazy match optimization.  A
 			       * non-zero value means that an adjacent match
 			       * has to be better by more than the step
 			       * between them.  0. */

 #define MIN_MATCH         4U  /* VCDIFF code table: MIN_MATCH=4 */
 #define MIN_ADD           1U  /* 1 */
 #define MIN_RUN           8U  /* The shortest run, if it is shorter than this
 			       * an immediate add/copy will be just as good.
 			       * ADD1/COPY6 = 1I+1D+1A bytes, RUN18 =
 			       * 1I+1D+1A. */

 #define MAX_MODES         9  /* Maximum number of nodes used for
 			      * compression--does not limit decompression. */

 #define ENC_SECTS         4  /* Number of separate output sections. */

 #define HDR_TAIL(s)  ((s)->enc_tails[0])
 #define DATA_TAIL(s) ((s)->enc_tails[1])
 #define INST_TAIL(s) ((s)->enc_tails[2])
 #define ADDR_TAIL(s) ((s)->enc_tails[3])

 #define HDR_HEAD(s)  ((s)->enc_heads[0])
 #define DATA_HEAD(s) ((s)->enc_heads[1])
 #define INST_HEAD(s) ((s)->enc_heads[2])
 #define ADDR_HEAD(s) ((s)->enc_heads[3])

 #define SIZEOF_ARRAY(x) (sizeof(x) / sizeof(x[0]))

 #define TOTAL_MODES(x) (2+(x)->acache.s_same+(x)->acache.s_near)

 /* Template instances. */
 #if XD3_BUILD_SLOW
 #define IF_BUILD_SLOW(x) x
 #else
 #define IF_BUILD_SLOW(x)
 #endif
 #if XD3_BUILD_FAST
 #define IF_BUILD_FAST(x) x
 #else
 #define IF_BUILD_FAST(x)
 #endif
 #if XD3_BUILD_FASTER
 #define IF_BUILD_FASTER(x) x
 #else
 #define IF_BUILD_FASTER(x)
 #endif
 #if XD3_BUILD_FASTEST
 #define IF_BUILD_FASTEST(x) x
 #else
 #define IF_BUILD_FASTEST(x)
 #endif
 #if XD3_BUILD_SOFT
 #define IF_BUILD_SOFT(x) x
 #else
 #define IF_BUILD_SOFT(x)
 #endif
 #if XD3_BUILD_DEFAULT
 #define IF_BUILD_DEFAULT(x) x
 #else
 #define IF_BUILD_DEFAULT(x)
 #endif

 /* Consume N bytes of input, only used by the decoder. */
 #define DECODE_INPUT(n)             \
   do {                              \
   stream->total_in += (xoff_t) (n); \
   stream->avail_in -= (n);          \
   stream->next_in  += (n);          \
   } while (0)

 /* Update the run-length state */
 #define NEXTRUN(c) do { if ((c) == run_c) { run_l += 1; } \
   else { run_c = (c); run_l = 1; } } while (0)

 /* This CPP-conditional stuff can be cleaned up... */
 #if XD3_DEBUG
 #define IF_DEBUG(x) x
 #else
 #define IF_DEBUG(x)
 #endif
 #if XD3_DEBUG > 1
 #define IF_DEBUG1(x) x
 #else
 #define IF_DEBUG1(x)
 #endif
 #if XD3_DEBUG > 2
 #define IF_DEBUG2(x) x
 #else
 #define IF_DEBUG2(x)
 #endif
 #if REGRESSION_TEST
 #define IF_REGRESSION(x) x
 #else
 #define IF_REGRESSION(x)
 #endif

 /***********************************************************************/

 #if XD3_ENCODER
 static void*       xd3_alloc0 (xd3_stream *stream,
 			       usize_t      elts,
 			       usize_t      size);


 static xd3_output* xd3_alloc_output (xd3_stream *stream,
 				     xd3_output *old_output);

 static int         xd3_alloc_iopt (xd3_stream *stream, int elts);

 static void        xd3_free_output (xd3_stream *stream,
 				    xd3_output *output);

 static int         xd3_emit_byte (xd3_stream  *stream,
 				  xd3_output **outputp,
 				  uint8_t      code);

 static int         xd3_emit_bytes (xd3_stream     *stream,
 				   xd3_output    **outputp,
 				   const uint8_t  *base,
 				   usize_t          size);

 static int         xd3_emit_double (xd3_stream *stream, xd3_rinst *first,
 				    xd3_rinst *second, usize_t code);
 static int         xd3_emit_single (xd3_stream *stream, xd3_rinst *single,
 				    usize_t code);

 static usize_t      xd3_sizeof_output (xd3_output *output);
 static void        xd3_encode_reset (xd3_stream *stream);

 static int         xd3_source_match_setup (xd3_stream *stream, xoff_t srcpos);
 static int         xd3_source_extend_match (xd3_stream *stream);
 static int         xd3_srcwin_setup (xd3_stream *stream);
 static usize_t     xd3_iopt_last_matched (xd3_stream *stream);
 static int         xd3_emit_uint32_t (xd3_stream *stream, xd3_output **output,
 				      uint32_t num);

 static usize_t xd3_smatch (xd3_stream *stream,
 			   usize_t base,
 			   usize_t scksum,
 			   usize_t *match_offset);
 static int xd3_string_match_init (xd3_stream *stream);
 static uint32_t xd3_scksum (uint32_t *state, const uint8_t *seg, const int ln);
 static int xd3_comprun (const uint8_t *seg, int slook, uint8_t *run_cp);
 static int xd3_srcwin_move_point (xd3_stream *stream,
 				  usize_t *next_move_point);

 static int xd3_emit_run (xd3_stream *stream, usize_t pos,
 			 usize_t size, uint8_t run_c);
 static usize_t xd3_checksum_hash (const xd3_hash_cfg *cfg,
 				  const usize_t cksum);
 static xoff_t xd3_source_cksum_offset(xd3_stream *stream, usize_t low);
 static void xd3_scksum_insert (xd3_stream *stream,
 			       usize_t inx,
 			       usize_t scksum,
 			       usize_t pos);


 #if XD3_DEBUG
 static void xd3_verify_run_state (xd3_stream    *stream,
 				  const uint8_t *inp,
 				  int            x_run_l,
 				  uint8_t        x_run_c);
 static void xd3_verify_large_state (xd3_stream *stream,
 				    const uint8_t *inp,
 				    uint32_t x_cksum);
 static void xd3_verify_small_state (xd3_stream    *stream,
 				    const uint8_t *inp,
 				    uint32_t          x_cksum);

 #endif /* XD3_DEBUG */
 #endif /* XD3_ENCODER */

 static int         xd3_decode_allocate (xd3_stream *stream, usize_t size,
 					uint8_t **copied1, usize_t *alloc1);

 static void        xd3_compute_code_table_string (const xd3_dinst *code_table,
 						  uint8_t *str);
 static void*       xd3_alloc (xd3_stream *stream, usize_t elts, usize_t size);
 static void        xd3_free  (xd3_stream *stream, void *ptr);

 static int         xd3_read_uint32_t (xd3_stream *stream, const uint8_t **inpp,
 				      const uint8_t *max, uint32_t *valp);

 #if REGRESSION_TEST
 static int         xd3_selftest      (void);
 #endif

 /***********************************************************************/

 #define UINT32_OFLOW_MASK 0xfe000000U
 #define UINT64_OFLOW_MASK 0xfe00000000000000ULL

 #ifndef UINT32_MAX
 #define UINT32_MAX 4294967295U
 #endif

 #ifndef UINT64_MAX
 #define UINT64_MAX 18446744073709551615ULL
 #endif

 #if SIZEOF_USIZE_T == 4
 #define USIZE_T_MAX        UINT32_MAX
 #define xd3_decode_size   xd3_decode_uint32_t
 #define xd3_emit_size     xd3_emit_uint32_t
 #define xd3_sizeof_size   xd3_sizeof_uint32_t
 #define xd3_read_size     xd3_read_uint32_t
 #elif SIZEOF_USIZE_T == 8
 #define USIZE_T_MAX        UINT64_MAX
 #define xd3_decode_size   xd3_decode_uint64_t
 #define xd3_emit_size     xd3_emit_uint64_t
 #define xd3_sizeof_size   xd3_sizeof_uint64_t
 #define xd3_read_size     xd3_read_uint64_t
 #endif

 #if SIZEOF_XOFF_T == 4
 #define XOFF_T_MAX        UINT32_MAX
 #define xd3_decode_offset xd3_decode_uint32_t
 #define xd3_emit_offset   xd3_emit_uint32_t
 #elif SIZEOF_XOFF_T == 8
 #define XOFF_T_MAX        UINT64_MAX
 #define xd3_decode_offset xd3_decode_uint64_t
 #define xd3_emit_offset   xd3_emit_uint64_t
 #endif

 #define USIZE_T_OVERFLOW(a,b) ((USIZE_T_MAX - (usize_t) (a)) < (usize_t) (b))
 #define XOFF_T_OVERFLOW(a,b) ((XOFF_T_MAX - (xoff_t) (a)) < (xoff_t) (b))

 const char* xd3_strerror (int ret)
 {
   switch (ret)
     {
     case XD3_INPUT: return "XD3_INPUT";
     case XD3_OUTPUT: return "XD3_OUTPUT";
     case XD3_GETSRCBLK: return "XD3_GETSRCBLK";
     case XD3_GOTHEADER: return "XD3_GOTHEADER";
     case XD3_WINSTART: return "XD3_WINSTART";
     case XD3_WINFINISH: return "XD3_WINFINISH";
     case XD3_TOOFARBACK: return "XD3_TOOFARBACK";
     case XD3_INTERNAL: return "XD3_INTERNAL";
     case XD3_INVALID_INPUT: return "XD3_INVALID_INPUT";
     }
   return NULL;
 }

 /***********************************************************************/

 #define xd3_sec_data(s) ((s)->sec_stream_d)
 #define xd3_sec_inst(s) ((s)->sec_stream_i)
 #define xd3_sec_addr(s) ((s)->sec_stream_a)

 struct _xd3_sec_type
 {
   int         id;
   const char *name;
   xd3_secondary_flags flags;

   /* xd3_sec_stream is opaque to the generic code */
   xd3_sec_stream* (*alloc)   (xd3_stream     *stream);
   void            (*destroy) (xd3_stream     *stream,
 			      xd3_sec_stream *sec);
   void            (*init)    (xd3_sec_stream *sec);
   int             (*decode)  (xd3_stream     *stream,
 			      xd3_sec_stream *sec_stream,
 			      const uint8_t **input,
 			      const uint8_t  *input_end,
 			      uint8_t       **output,
 			      const uint8_t  *output_end);
 #if XD3_ENCODER
   int             (*encode)  (xd3_stream     *stream,
 			      xd3_sec_stream *sec_stream,
 			      xd3_output     *input,
 			      xd3_output     *output,
 			      xd3_sec_cfg    *cfg);
 #endif
 };

 #define BIT_STATE_ENCODE_INIT { 0, 1 }
 #define BIT_STATE_DECODE_INIT { 0, 0x100 }

 typedef struct _bit_state bit_state;
 struct _bit_state
 {
   usize_t cur_byte;
   usize_t cur_mask;
 };

 #if SECONDARY_ANY == 0
 #define IF_SEC(x)
 #define IF_NSEC(x) x
 #else /* yuck */
 #define IF_SEC(x) x
 #define IF_NSEC(x)
 static int
 xd3_decode_secondary (xd3_stream      *stream,
 		      xd3_desect      *sect,
 		      xd3_sec_stream **sec_streamp);
 #if XD3_ENCODER
 static int
 xd3_encode_secondary (xd3_stream      *stream,
 		      xd3_output     **head,
 		      xd3_output     **tail,
 		      xd3_sec_stream **sec_streamp,
 		      xd3_sec_cfg     *cfg,
 		      int             *did_it);
 #endif
 #endif /* SECONDARY_ANY */

 #if SECONDARY_FGK
 static const xd3_sec_type fgk_sec_type;
 #define IF_FGK(x) x
 #define FGK_CASE(s) \
   s->sec_type = & fgk_sec_type; \
   break;
 #else
 #define IF_FGK(x)
 #define FGK_CASE(s) \
   s->msg = "unavailable secondary compressor: FGK Adaptive Huffman"; \
   return XD3_INTERNAL;
 #endif

 #if SECONDARY_DJW
 static const xd3_sec_type djw_sec_type;
 #define IF_DJW(x) x
 #define DJW_CASE(s) \
   s->sec_type = & djw_sec_type; \
   break;
 #else
 #define IF_DJW(x)
 #define DJW_CASE(s) \
   s->msg = "unavailable secondary compressor: DJW Static Huffman"; \
   return XD3_INTERNAL;
 #endif

 /***********************************************************************/

 #include "xdelta3-hash.h"

 /* Process template passes - this includes xdelta3.c several times. */
 #define __XDELTA3_C_TEMPLATE_PASS__
 #include "xdelta3-cfgs.h"
 #undef __XDELTA3_C_TEMPLATE_PASS__

 /* Process the inline pass. */
 #define __XDELTA3_C_INLINE_PASS__
 #include "xdelta3.c"
 #undef __XDELTA3_C_INLINE_PASS__

 /* Secondary compression */
 #if SECONDARY_ANY
 #include "xdelta3-second.h"
 #endif

 #if SECONDARY_FGK
 #include "xdelta3-fgk.h"
 static const xd3_sec_type fgk_sec_type =
 {
   VCD_FGK_ID,
   "FGK Adaptive Huffman",
   SEC_NOFLAGS,
   (xd3_sec_stream* (*)())  fgk_alloc,
   (void (*)())             fgk_destroy,
   (void (*)())             fgk_init,
   (int (*)())              xd3_decode_fgk,
   IF_ENCODER((int (*)())   xd3_encode_fgk)
 };
 #endif

 #if SECONDARY_DJW
 #include "xdelta3-djw.h"
 static const xd3_sec_type djw_sec_type =
 {
   VCD_DJW_ID,
   "Static Huffman",
   SEC_COUNT_FREQS,
   (xd3_sec_stream* (*)())  djw_alloc,
   (void (*)())             djw_destroy,
   (void (*)())             djw_init,
   (int (*)())              xd3_decode_huff,
   IF_ENCODER((int (*)())   xd3_encode_huff)
 };
 #endif

 #if XD3_MAIN || PYTHON_MODULE || SWIG_MODULE || NOT_MAIN
 #include "xdelta3-main.h"
 #endif

 #if REGRESSION_TEST
 #include "xdelta3-test.h"
 #endif

 #if PYTHON_MODULE
 #include "xdelta3-python.h"
 #endif

 #endif /* __XDELTA3_C_HEADER_PASS__ */
 #ifdef __XDELTA3_C_INLINE_PASS__

 /****************************************************************
  Instruction tables
  *****************************************************************/

 /* The following code implements a parametrized description of the
  * code table given above for a few reasons.  It is not necessary for
  * implementing the standard, to support compression with variable
  * tables, so an implementation is only required to know the default
  * code table to begin decompression.  (If the encoder uses an
  * alternate table, the table is included in compressed form inside
  * the VCDIFF file.)
  *
  * Before adding variable-table support there were two functions which
  * were hard-coded to the default table above.
  * xd3_compute_default_table() would create the default table by
  * filling a 256-elt array of xd3_dinst values.  The corresponding
  * function, xd3_choose_instruction(), would choose an instruction
  * based on the hard-coded parameters of the default code table.
  *
  * Notes: The parametrized code table description here only generates
  * tables of a certain regularity similar to the default table by
  * allowing to vary the distribution of single- and
  * double-instructions and change the number of near and same copy
  * modes.  More exotic tables are only possible by extending this
  * code.
  *
  * For performance reasons, both the parametrized and non-parametrized
  * versions of xd3_choose_instruction remain.  The parametrized
  * version is only needed for testing multi-table decoding support.
  * If ever multi-table encoding is required, this can be optimized by
  * compiling static functions for each table.
  */

 /* The XD3_CHOOSE_INSTRUCTION calls xd3_choose_instruction with the
  * table description when GENERIC_ENCODE_TABLES are in use.  The
  * IF_GENCODETBL macro enables generic-code-table specific code. */
 #if GENERIC_ENCODE_TABLES
 #define XD3_CHOOSE_INSTRUCTION(stream,prev,inst) xd3_choose_instruction (stream->code_table_desc, prev, inst)
 #define IF_GENCODETBL(x) x
 #else
 #define XD3_CHOOSE_INSTRUCTION(stream,prev,inst) xd3_choose_instruction (prev, inst)
 #define IF_GENCODETBL(x)
 #endif

 /* This structure maintains information needed by
  * xd3_choose_instruction to compute the code for a double instruction
  * by first indexing an array of code_table_sizes by copy mode, then
  * using (offset + (muliplier * X)) */
 struct _xd3_code_table_sizes {
   uint8_t cpy_max;
   uint8_t offset;
   uint8_t mult;
 };

 /* This contains a complete description of a code table. */
 struct _xd3_code_table_desc
 {
   /* Assumes a single RUN instruction */
   /* Assumes that MIN_MATCH is 4 */

   uint8_t add_sizes;            /* Number of immediate-size single adds (default 17) */
   uint8_t near_modes;           /* Number of near copy modes (default 4) */
   uint8_t same_modes;           /* Number of same copy modes (default 3) */
   uint8_t cpy_sizes;            /* Number of immediate-size single copies (default 15) */

   uint8_t addcopy_add_max;      /* Maximum add size for an add-copy double instruction,
 				   all modes (default 4) */
   uint8_t addcopy_near_cpy_max; /* Maximum cpy size for an add-copy double instruction,
 				   up through VCD_NEAR modes (default 6) */
   uint8_t addcopy_same_cpy_max; /* Maximum cpy size for an add-copy double instruction,
 				   VCD_SAME modes (default 4) */

   uint8_t copyadd_add_max;      /* Maximum add size for a copy-add double instruction,
 				   all modes (default 1) */
   uint8_t copyadd_near_cpy_max; /* Maximum cpy size for a copy-add double instruction,
 				   up through VCD_NEAR modes (default 4) */
   uint8_t copyadd_same_cpy_max; /* Maximum cpy size for a copy-add double instruction,
 				   VCD_SAME modes (default 4) */

   xd3_code_table_sizes addcopy_max_sizes[MAX_MODES];
   xd3_code_table_sizes copyadd_max_sizes[MAX_MODES];
 };

 /* The rfc3284 code table is represented: */
 static const xd3_code_table_desc __rfc3284_code_table_desc = {
   17, /* add sizes */
   4,  /* near modes */
   3,  /* same modes */
   15, /* copy sizes */

   4,  /* add-copy max add */
   6,  /* add-copy max cpy, near */
   4,  /* add-copy max cpy, same */

   1,  /* copy-add max add */
   4,  /* copy-add max cpy, near */
   4,  /* copy-add max cpy, same */

   /* addcopy */
   { {6,163,3},{6,175,3},{6,187,3},{6,199,3},{6,211,3},{6,223,3},{4,235,1},{4,239,1},{4,243,1} },
   /* copyadd */
   { {4,247,1},{4,248,1},{4,249,1},{4,250,1},{4,251,1},{4,252,1},{4,253,1},{4,254,1},{4,255,1} },
 };

 #if GENERIC_ENCODE_TABLES
 /* An alternate code table for testing (5 near, 0 same):
  *
  *         TYPE      SIZE     MODE    TYPE     SIZE     MODE     INDEX
  *        ---------------------------------------------------------------
  *     1.  Run         0        0     Noop       0        0        0
  *     2.  Add    0, [1,23]     0     Noop       0        0      [1,24]
  *     3.  Copy   0, [4,20]     0     Noop       0        0     [25,42]
  *     4.  Copy   0, [4,20]     1     Noop       0        0     [43,60]
  *     5.  Copy   0, [4,20]     2     Noop       0        0     [61,78]
  *     6.  Copy   0, [4,20]     3     Noop       0        0     [79,96]
  *     7.  Copy   0, [4,20]     4     Noop       0        0     [97,114]
  *     8.  Copy   0, [4,20]     5     Noop       0        0    [115,132]
  *     9.  Copy   0, [4,20]     6     Noop       0        0    [133,150]
  *    10.  Add       [1,4]      0     Copy     [4,6]      0    [151,162]
  *    11.  Add       [1,4]      0     Copy     [4,6]      1    [163,174]
  *    12.  Add       [1,4]      0     Copy     [4,6]      2    [175,186]
  *    13.  Add       [1,4]      0     Copy     [4,6]      3    [187,198]
  *    14.  Add       [1,4]      0     Copy     [4,6]      4    [199,210]
  *    15.  Add       [1,4]      0     Copy     [4,6]      5    [211,222]
  *    16.  Add       [1,4]      0     Copy     [4,6]      6    [223,234]
  *    17.  Copy        4      [0,6]   Add      [1,3]      0    [235,255]
  *        --------------------------------------------------------------- */
 static const xd3_code_table_desc __alternate_code_table_desc = {
   23, /* add sizes */
   5,  /* near modes */
   0,  /* same modes */
   17, /* copy sizes */

   4,  /* add-copy max add */
   6,  /* add-copy max cpy, near */
   0,  /* add-copy max cpy, same */

   3,  /* copy-add max add */
   4,  /* copy-add max cpy, near */
   0,  /* copy-add max cpy, same */

   /* addcopy */
   { {6,151,3},{6,163,3},{6,175,3},{6,187,3},{6,199,3},{6,211,3},{6,223,3},{0,0,0},{0,0,0} },
   /* copyadd */
   { {4,235,1},{4,238,1},{4,241,1},{4,244,1},{4,247,1},{4,250,1},{4,253,1},{0,0,0},{0,0,0} },
 };
 #endif

 /* Computes code table entries of TBL using the specified description. */
 static void
 xd3_build_code_table (const xd3_code_table_desc *desc, xd3_dinst *tbl)
 {
   usize_t size1, size2, mode;
   usize_t cpy_modes = 2 + desc->near_modes + desc->same_modes;
   xd3_dinst *d = tbl;

   (d++)->type1 = XD3_RUN;
   (d++)->type1 = XD3_ADD;

   for (size1 = 1; size1 <= desc->add_sizes; size1 += 1, d += 1)
     {
       d->type1 = XD3_ADD;
       d->size1 = size1;
     }

   for (mode = 0; mode < cpy_modes; mode += 1)
     {
       (d++)->type1 = XD3_CPY + mode;

       for (size1 = MIN_MATCH; size1 < MIN_MATCH + desc->cpy_sizes; size1 += 1, d += 1)
 	{
 	  d->type1 = XD3_CPY + mode;
 	  d->size1 = size1;
 	}
     }

   for (mode = 0; mode < cpy_modes; mode += 1)
     {
       for (size1 = 1; size1 <= desc->addcopy_add_max; size1 += 1)
 	{
 	  usize_t max = (mode < 2U + desc->near_modes) ?
 	    desc->addcopy_near_cpy_max :
 	    desc->addcopy_same_cpy_max;

 	  for (size2 = MIN_MATCH; size2 <= max; size2 += 1, d += 1)
 	    {
 	      d->type1 = XD3_ADD;
 	      d->size1 = size1;
 	      d->type2 = XD3_CPY + mode;
 	      d->size2 = size2;
 	    }
 	}
     }

   for (mode = 0; mode < cpy_modes; mode += 1)
     {
       usize_t max = (mode < 2U + desc->near_modes) ?
 	desc->copyadd_near_cpy_max :
 	desc->copyadd_same_cpy_max;

       for (size1 = MIN_MATCH; size1 <= max; size1 += 1)
 	{
 	  for (size2 = 1; size2 <= desc->copyadd_add_max; size2 += 1, d += 1)
 	    {
 	      d->type1 = XD3_CPY + mode;
 	      d->size1 = size1;
 	      d->type2 = XD3_ADD;
 	      d->size2 = size2;
 	    }
 	}
     }

   XD3_ASSERT (d - tbl == 256);
 }

 /* This function generates the static default code table. */
 static const xd3_dinst*
 xd3_rfc3284_code_table (void)
 {
   static xd3_dinst __rfc3284_code_table[256];

   if (__rfc3284_code_table[0].type1 != XD3_RUN)
     {
       xd3_build_code_table (& __rfc3284_code_table_desc, __rfc3284_code_table);
     }

   return __rfc3284_code_table;
 }

 #if XD3_ENCODER
 #if GENERIC_ENCODE_TABLES
 /* This function generates the alternate code table. */
 static const xd3_dinst*
 xd3_alternate_code_table (void)
 {
   static xd3_dinst __alternate_code_table[256];

   if (__alternate_code_table[0].type1 != XD3_RUN)
     {
       xd3_build_code_table (& __alternate_code_table_desc, __alternate_code_table);
     }

   return __alternate_code_table;
 }

 /* This function computes the ideal second instruction INST based on
  * preceding instruction PREV.  If it is possible to issue a double
  * instruction based on this pair it sets PREV->code2, otherwise it
  * sets INST->code1. */
 static void
 xd3_choose_instruction (const xd3_code_table_desc *desc, xd3_rinst *prev, xd3_rinst *inst)
 {
   switch (inst->type)
     {
     case XD3_RUN:
       /* The 0th instruction is RUN */
       inst->code1 = 0;
       break;

     case XD3_ADD:

       if (inst->size > desc->add_sizes)
 	{
 	  /* The first instruction is non-immediate ADD */
 	  inst->code1 = 1;
 	}
       else
 	{
 	  /* The following ADD_SIZES instructions are immediate ADDs */
 	  inst->code1 = 1 + inst->size;

 	  /* Now check for a possible COPY-ADD double instruction */
 	  if (prev != NULL)
 	    {
 	      int prev_mode = prev->type - XD3_CPY;

 	      /* If previous is a copy.  Note: as long as the previous
 	       * is not a RUN instruction, it should be a copy because
 	       * it cannot be an add.  This check is more clear. */
 	      if (prev_mode >= 0 && inst->size <= desc->copyadd_add_max)
 		{
 		  const xd3_code_table_sizes *sizes = & desc->copyadd_max_sizes[prev_mode];

 		  /* This check and the inst->size-<= above are == in
 		     the default table. */
 		  if (prev->size <= sizes->cpy_max)
 		    {
 		      /* The second and third exprs are 0 in the
 			 default table. */
 		      prev->code2 = sizes->offset +
 			(sizes->mult * (prev->size - MIN_MATCH)) +
 			(inst->size - MIN_ADD);
 		    }
 		}
 	    }
 	}
       break;

     default:
       {
 	int mode = inst->type - XD3_CPY;

 	/* The large copy instruction is offset by the run, large add,
 	 * and immediate adds, then multipled by the number of
 	 * immediate copies plus one (the large copy) (i.e., if there
 	 * are 15 immediate copy instructions then there are 16 copy
 	 * instructions per mode). */
 	inst->code1 = 2 + desc->add_sizes + (1 + desc->cpy_sizes) * mode;

 	/* Now if the copy is short enough for an immediate instruction. */
 	if (inst->size < MIN_MATCH + desc->cpy_sizes &&
 	    /* TODO: there needs to be a more comprehensive test for this
 	     * boundary condition, merge is now exercising code in which
 	     * size < MIN_MATCH is possible and it's unclear if the above
 	     * size < (MIN_MATCH + cpy_sizes) should be a <= from inspection
 	     * of the default table version below. */
 	    inst->size >= MIN_MATCH)
 	  {
 	    inst->code1 += inst->size + 1 - MIN_MATCH;

 	    /* Now check for a possible ADD-COPY double instruction. */
 	    if ( (prev != NULL) &&
 		 (prev->type == XD3_ADD) &&
 		 (prev->size <= desc->addcopy_add_max) )
 	      {
 		const xd3_code_table_sizes *sizes = & desc->addcopy_max_sizes[mode];

 		if (inst->size <= sizes->cpy_max)
 		  {
 		    prev->code2 = sizes->offset +
 		      (sizes->mult * (prev->size - MIN_ADD)) +
 		      (inst->size - MIN_MATCH);
 		  }
 	      }
 	  }
       }
     }
 }
 #else /* GENERIC_ENCODE_TABLES */

 /* This version of xd3_choose_instruction is hard-coded for the default
    table. */
 static void
 xd3_choose_instruction (xd3_rinst *prev, xd3_rinst *inst)
 {
   switch (inst->type)
     {
     case XD3_RUN:
       inst->code1 = 0;
       break;

     case XD3_ADD:
       inst->code1 = 1;

       if (inst->size <= 17)
 	{
 	  inst->code1 += inst->size;

 	  if ( (inst->size == 1) &&
 	       (prev != NULL) &&
 	       (prev->size == 4) &&
 	       (prev->type >= XD3_CPY) )
 	    {
 	      prev->code2 = 247 + (prev->type - XD3_CPY);
 	    }
 	}

       break;

     default:
       {
 	int mode = inst->type - XD3_CPY;

 	XD3_ASSERT (inst->type >= XD3_CPY && inst->type < 12);

 	inst->code1 = 19 + 16 * mode;

 	if (inst->size <= 18 && inst->size >= 4)
 	  {
 	    inst->code1 += inst->size - 3;

 	    if ( (prev != NULL) &&
 		 (prev->type == XD3_ADD) &&
 		 (prev->size <= 4) )
 	      {
 		if ( (inst->size <= 6) &&
 		     (mode       <= 5) )
 		  {
 		    prev->code2 = 163 + (mode * 12) + (3 * (prev->size - 1)) + (inst->size - 4);

 		    XD3_ASSERT (prev->code2 <= 234);
 		  }
 		else if ( (inst->size == 4) &&
 			  (mode       >= 6) )
 		  {
 		    prev->code2 = 235 + ((mode - 6) * 4) + (prev->size - 1);

 		    XD3_ASSERT (prev->code2 <= 246);
 		  }
 	      }
 	  }

 	XD3_ASSERT (inst->code1 <= 162);
       }
       break;
     }
 }
 #endif /* GENERIC_ENCODE_TABLES */

 /***********************************************************************
  Instruction table encoder/decoder
  ***********************************************************************/

 #if GENERIC_ENCODE_TABLES
 #if GENERIC_ENCODE_TABLES_COMPUTE == 0

 /* In this case, we hard-code the result of
  * compute_code_table_encoding for each alternate code table,
  * presuming that saves time/space.  This has been 131 bytes, but
  * secondary compression was turned off. */
 static const uint8_t __alternate_code_table_compressed[178] =
 {0xd6,0xc3,0xc4,0x00,0x00,0x01,0x8a,0x6f,0x40,0x81,0x27,0x8c,0x00,0x00,0x4a,0x4a,0x0d,0x02,0x01,0x03,
 0x01,0x03,0x00,0x01,0x00,0x00,0x01,0x02,0x03,0x04,0x05,0x06,0x07,0x08,0x09,0x0a,0x0b,0x0c,0x0d,0x0e,
 0x0f,0x10,0x11,0x12,0x13,0x14,0x15,0x16,0x17,0x00,0x01,0x01,0x01,0x02,0x02,0x02,0x03,0x03,0x03,0x04,
 0x04,0x04,0x04,0x00,0x04,0x05,0x06,0x01,0x02,0x03,0x00,0x01,0x02,0x03,0x04,0x05,0x06,0x05,0x05,0x05,
 0x06,0x06,0x06,0x00,0x01,0x02,0x03,0x04,0x05,0x06,0x00,0x02,0x00,0x18,0x13,0x63,0x00,0x1b,0x00,0x54,
 0x00,0x15,0x23,0x6f,0x00,0x28,0x13,0x54,0x00,0x15,0x01,0x1a,0x31,0x23,0x6c,0x0d,0x23,0x48,0x00,0x15,
 0x93,0x6f,0x00,0x28,0x04,0x23,0x51,0x04,0x32,0x00,0x2b,0x00,0x12,0x00,0x12,0x00,0x12,0x00,0x12,0x00,
 0x12,0x00,0x12,0x53,0x57,0x9c,0x07,0x43,0x6f,0x00,0x34,0x00,0x0c,0x00,0x0c,0x00,0x0c,0x00,0x0c,0x00,
 0x0c,0x00,0x0c,0x00,0x15,0x00,0x82,0x6f,0x00,0x15,0x12,0x0c,0x00,0x03,0x03,0x00,0x06,0x00,};

 static int
 xd3_compute_alternate_table_encoding (xd3_stream *stream, const uint8_t **data, usize_t *size)
 {
   (*data) = __alternate_code_table_compressed;
   (*size) = sizeof (__alternate_code_table_compressed);
   return 0;
 }

 #else

 /* The alternate code table will be computed and stored here. */
 static uint8_t __alternate_code_table_compressed[CODE_TABLE_VCDIFF_SIZE];
 static usize_t  __alternate_code_table_compressed_size;

 /* This function generates a delta describing the code table for
  * encoding within a VCDIFF file.  This function is NOT thread safe
  * because it is only intended that this function is used to generate
  * statically-compiled strings. */
 int xd3_compute_code_table_encoding (xd3_stream *in_stream,
 				     const xd3_dinst *code_table,
 				     uint8_t *comp_string,
 				     usize_t *comp_string_size)
 {
   /* TODO: use xd3_encode_memory() */
   uint8_t dflt_string[CODE_TABLE_STRING_SIZE];
   uint8_t code_string[CODE_TABLE_STRING_SIZE];
   xd3_stream stream;
   xd3_source source;
   xd3_config config;
   int ret;

   memset (& source, 0, sizeof (source));

   xd3_compute_code_table_string (xd3_rfc3284_code_table (), dflt_string);
   xd3_compute_code_table_string (code_table, code_string);

   /* Use DJW secondary compression if it is on by default.  This saves
    * about 20 bytes. */
   xd3_init_config (& config, XD3_FLUSH | (SECONDARY_DJW ? XD3_SEC_DJW : 0));

   /* Be exhaustive. */
   config.sprevsz = 1<<11;
   config.srcwin_maxsz = CODE_TABLE_STRING_SIZE;

   config.smatch_cfg = XD3_SMATCH_SOFT;
   config.smatcher_soft.large_look    = 4;
   config.smatcher_soft.large_step    = 1;
   config.smatcher_soft.small_look    = 4;
   config.smatcher_soft.small_chain   = CODE_TABLE_STRING_SIZE;
   config.smatcher_soft.small_lchain  = CODE_TABLE_STRING_SIZE;
   config.smatcher_soft.max_lazy      = CODE_TABLE_STRING_SIZE;
   config.smatcher_soft.long_enough   = CODE_TABLE_STRING_SIZE;

   if ((ret = xd3_config_stream (& stream, & config))) { goto fail; }

   source.size     = CODE_TABLE_STRING_SIZE;
   source.blksize  = CODE_TABLE_STRING_SIZE;
   source.onblk    = CODE_TABLE_STRING_SIZE;
   source.name     = "";
   source.curblk   = dflt_string;
   source.curblkno = 0;

   if ((ret = xd3_set_source (& stream, & source))) { goto fail; }

   if ((ret = xd3_encode_stream (& stream, code_string, CODE_TABLE_STRING_SIZE,
 				comp_string, comp_string_size, CODE_TABLE_VCDIFF_SIZE))) { goto fail; }

  fail:

   in_stream->msg = stream.msg;
   xd3_free_stream (& stream);
   return ret;
 }

 /* Compute a delta between alternate and rfc3284 tables.  As soon as
  * another alternate table is added, this code should become generic.
  * For now there is only one alternate table for testing. */
 static int
 xd3_compute_alternate_table_encoding (xd3_stream *stream, const uint8_t **data, usize_t *size)
 {
   int ret;

   if (__alternate_code_table_compressed[0] == 0)
     {
       if ((ret = xd3_compute_code_table_encoding (stream, xd3_alternate_code_table (),
 						  __alternate_code_table_compressed,
 						  & __alternate_code_table_compressed_size)))
 	{
 	  return ret;
 	}

       /* During development of a new code table, enable this variable to print
        * the new static contents and determine its size.  At run time the
        * table will be filled in appropriately, but at least it should have
        * the proper size beforehand. */
 #if GENERIC_ENCODE_TABLES_COMPUTE_PRINT
       {
 	int i;

 	DP(RINT, "\nstatic const usize_t __alternate_code_table_compressed_size = %u;\n",
 		 __alternate_code_table_compressed_size);

 	DP(RINT, "static const uint8_t __alternate_code_table_compressed[%u] =\n{",
 		 __alternate_code_table_compressed_size);

 	for (i = 0; i < __alternate_code_table_compressed_size; i += 1)
 	  {
 	    DP(RINT, "0x%02x,", __alternate_code_table_compressed[i]);
 	    if ((i % 20) == 19) { DP(RINT, "\n"); }
 	  }

 	DP(RINT, "};\n");
       }
 #endif
     }

   (*data) = __alternate_code_table_compressed;
   (*size) = __alternate_code_table_compressed_size;

   return 0;
 }
 #endif /* GENERIC_ENCODE_TABLES_COMPUTE != 0 */
 #endif /* GENERIC_ENCODE_TABLES */

 #endif /* XD3_ENCODER */

 /* This function generates the 1536-byte string specified in sections 5.4 and
  * 7 of rfc3284, which is used to represent a code table within a VCDIFF
  * file. */
 void xd3_compute_code_table_string (const xd3_dinst *code_table, uint8_t *str)
 {
   int i, s;

   XD3_ASSERT (CODE_TABLE_STRING_SIZE == 6 * 256);

   for (s = 0; s < 6; s += 1)
     {
       for (i = 0; i < 256; i += 1)
 	{
 	  switch (s)
 	    {
 	    case 0: *str++ = (code_table[i].type1 >= XD3_CPY ? XD3_CPY : code_table[i].type1); break;
 	    case 1: *str++ = (code_table[i].type2 >= XD3_CPY ? XD3_CPY : code_table[i].type2); break;
 	    case 2: *str++ = (code_table[i].size1); break;
 	    case 3: *str++ = (code_table[i].size2); break;
 	    case 4: *str++ = (code_table[i].type1 >= XD3_CPY ? code_table[i].type1 - XD3_CPY : 0); break;
 	    case 5: *str++ = (code_table[i].type2 >= XD3_CPY ? code_table[i].type2 - XD3_CPY : 0); break;
 	    }
 	}
     }
 }

 /* This function translates the code table string into the internal representation.  The
  * stream's near and same-modes should already be set. */
 static int
 xd3_apply_table_string (xd3_stream *stream, const uint8_t *code_string)
 {
   int i, s;
   int modes = TOTAL_MODES (stream);
   xd3_dinst *code_table;

   if ((code_table = stream->code_table_alloc =
        (xd3_dinst*) xd3_alloc (stream, sizeof (xd3_dinst), 256)) == NULL)
     {
       return ENOMEM;
     }

   for (s = 0; s < 6; s += 1)
     {
       for (i = 0; i < 256; i += 1)
 	{
 	  switch (s)
 	    {
 	    case 0:
 	      if (*code_string > XD3_CPY)
 		{
 		  stream->msg = "invalid code-table opcode";
 		  return XD3_INTERNAL;
 		}
 	      code_table[i].type1 = *code_string++;
 	      break;
 	    case 1:
 	      if (*code_string > XD3_CPY)
 		{
 		  stream->msg = "invalid code-table opcode";
 		  return XD3_INTERNAL;
 		}
 	      code_table[i].type2 = *code_string++;
 	      break;
 	    case 2:
 	      if (*code_string != 0 && code_table[i].type1 == XD3_NOOP)
 		{
 		  stream->msg = "invalid code-table size";
 		  return XD3_INTERNAL;
 		}
 	      code_table[i].size1 = *code_string++;
 	      break;
 	    case 3:
 	      if (*code_string != 0 && code_table[i].type2 == XD3_NOOP)
 		{
 		  stream->msg = "invalid code-table size";
 		  return XD3_INTERNAL;
 		}
 	      code_table[i].size2 = *code_string++;
 	      break;
 	    case 4:
 	      if (*code_string >= modes)
 		{
 		  stream->msg = "invalid code-table mode";
 		  return XD3_INTERNAL;
 		}
 	      if (*code_string != 0 && code_table[i].type1 != XD3_CPY)
 		{
 		  stream->msg = "invalid code-table mode";
 		  return XD3_INTERNAL;
 		}
 	      code_table[i].type1 += *code_string++;
 	      break;
 	    case 5:
 	      if (*code_string >= modes)
 		{
 		  stream->msg = "invalid code-table mode";
 		  return XD3_INTERNAL;
 		}
 	      if (*code_string != 0 && code_table[i].type2 != XD3_CPY)
 		{
 		  stream->msg = "invalid code-table mode";
 		  return XD3_INTERNAL;
 		}
 	      code_table[i].type2 += *code_string++;
 	      break;
 	    }
 	}
     }

   stream->code_table = code_table;
   return 0;
 }

 /* This function applies a code table delta and returns an actual code table. */
 static int
 xd3_apply_table_encoding (xd3_stream *in_stream, const uint8_t *data, usize_t size)
 {
   uint8_t dflt_string[CODE_TABLE_STRING_SIZE];
   uint8_t code_string[CODE_TABLE_STRING_SIZE];
   usize_t code_size;
   xd3_stream stream;
   xd3_source source;
   int ret;

   /* The default code table string can be cached if alternate code tables ever become
    * popular. */
   xd3_compute_code_table_string (xd3_rfc3284_code_table (), dflt_string);

   source.size     = CODE_TABLE_STRING_SIZE;
   source.blksize  = CODE_TABLE_STRING_SIZE;
   source.onblk    = CODE_TABLE_STRING_SIZE;
   source.name     = "rfc3284 code table";
   source.curblk   = dflt_string;
   source.curblkno = 0;

   if ((ret = xd3_config_stream (& stream, NULL)) ||
       (ret = xd3_set_source (& stream, & source)) ||
       (ret = xd3_decode_stream (& stream, data, size, code_string, & code_size, sizeof (code_string))))
     {
       in_stream->msg = stream.msg;
       goto fail;
     }

   if (code_size != sizeof (code_string))
     {
       stream.msg = "corrupt code-table encoding";
       ret = XD3_INTERNAL;
       goto fail;
     }

   if ((ret = xd3_apply_table_string (in_stream, code_string))) { goto fail; }

  fail:

   xd3_free_stream (& stream);
   return ret;
 }

 /***********************************************************************/

 static inline void
 xd3_swap_uint8p (uint8_t** p1, uint8_t** p2)
 {
   uint8_t *t = (*p1);
   (*p1) = (*p2);
   (*p2) = t;
 }

 static inline void
 xd3_swap_usize_t (usize_t* p1, usize_t* p2)
 {
   usize_t t = (*p1);
   (*p1) = (*p2);
   (*p2) = t;
 }

 /* It's not constant time, but it computes the log. */
 static int
 xd3_check_pow2 (usize_t value, usize_t *logof)
 {
   usize_t x = 1;
   usize_t nolog;
   if (logof == NULL) {
     logof = &nolog;
   }

   *logof = 0;

   for (; x != 0; x <<= 1, *logof += 1)
     {
       if (x == value)
 	{
 	  return 0;
 	}
     }

   return XD3_INTERNAL;
 }

 static usize_t
 xd3_pow2_roundup (usize_t x)
 {
   usize_t i = 1;
   while (x > i) {
     i <<= 1;
   }
   return i;
 }

 static usize_t
 xd3_round_blksize (usize_t sz, usize_t blksz)
 {
   usize_t mod = sz & (blksz-1);

   XD3_ASSERT (xd3_check_pow2 (blksz, NULL) == 0);

   return mod ? (sz + (blksz - mod)) : sz;
 }

 /***********************************************************************
  Adler32 stream function: code copied from Zlib, defined in RFC1950
  ***********************************************************************/

 #define A32_BASE 65521L /* Largest prime smaller than 2^16 */
 #define A32_NMAX 5552   /* NMAX is the largest n such that 255n(n+1)/2 + (n+1)(BASE-1) <= 2^32-1 */

 #define A32_DO1(buf,i)  {s1 += buf[i]; s2 += s1;}
 #define A32_DO2(buf,i)  A32_DO1(buf,i); A32_DO1(buf,i+1);
 #define A32_DO4(buf,i)  A32_DO2(buf,i); A32_DO2(buf,i+2);
 #define A32_DO8(buf,i)  A32_DO4(buf,i); A32_DO4(buf,i+4);
 #define A32_DO16(buf)   A32_DO8(buf,0); A32_DO8(buf,8);

 static unsigned long adler32 (unsigned long adler, const uint8_t *buf, usize_t len)
 {
     unsigned long s1 = adler & 0xffff;
     unsigned long s2 = (adler >> 16) & 0xffff;
     int k;

     while (len > 0)
       {
         k    = (len < A32_NMAX) ? len : A32_NMAX;
         len -= k;

 	while (k >= 16)
 	  {
 	    A32_DO16(buf);
 	    buf += 16;
             k -= 16;
 	  }

 	if (k != 0)
 	  {
 	    do
 	      {
 		s1 += *buf++;
 		s2 += s1;
 	      }
 	    while (--k);
 	  }

         s1 %= A32_BASE;
         s2 %= A32_BASE;
     }

     return (s2 << 16) | s1;
 }

 /***********************************************************************
  Run-length function
  ***********************************************************************/

 #if XD3_ENCODER
 static int
 xd3_comprun (const uint8_t *seg, int slook, uint8_t *run_cp)
 {
   int i;
   int     run_l = 0;
   uint8_t run_c = 0;

   for (i = 0; i < slook; i += 1)
     {
       NEXTRUN(seg[i]);
     }

   (*run_cp) = run_c;

   return run_l;
 }
 #endif

 /***********************************************************************
  Basic encoder/decoder functions
  ***********************************************************************/

 static inline int
 xd3_decode_byte (xd3_stream *stream, usize_t *val)
 {
   if (stream->avail_in == 0)
     {
       stream->msg = "further input required";
       return XD3_INPUT;
     }

   (*val) = stream->next_in[0];

   DECODE_INPUT (1);
   return 0;
 }

 static inline int
 xd3_decode_bytes (xd3_stream *stream, uint8_t *buf, usize_t *pos, usize_t size)
 {
   usize_t want;
   usize_t take;

   /* Note: The case where (*pos == size) happens when a zero-length appheader or code
    * table is transmitted, but there is nothing in the standard against that. */

   while (*pos < size)
     {
       if (stream->avail_in == 0)
 	{
 	  stream->msg = "further input required";
 	  return XD3_INPUT;
 	}

       want = size - *pos;
       take = min (want, stream->avail_in);

       memcpy (buf + *pos, stream->next_in, take);

       DECODE_INPUT (take);
       (*pos) += take;
     }

   return 0;
 }

 #if XD3_ENCODER
 static inline int
 xd3_emit_byte (xd3_stream  *stream,
 	       xd3_output **outputp,
 	       uint8_t      code)
 {
   xd3_output *output = (*outputp);

   if (output->next == output->avail)
     {
       xd3_output *aoutput;

       if ((aoutput = xd3_alloc_output (stream, output)) == NULL)
 	{
 	  return ENOMEM;
 	}

       output = (*outputp) = aoutput;
     }

   output->base[output->next++] = code;

   return 0;
 }

 static inline int
 xd3_emit_bytes (xd3_stream     *stream,
 		xd3_output    **outputp,
 		const uint8_t  *base,
 		usize_t          size)
 {
   xd3_output *output = (*outputp);

   do
     {
       usize_t take;

       if (output->next == output->avail)
 	{
 	  xd3_output *aoutput;

 	  if ((aoutput = xd3_alloc_output (stream, output)) == NULL)
 	    {
 	      return ENOMEM;
 	    }

 	  output = (*outputp) = aoutput;
 	}

       take = min (output->avail - output->next, size);

       memcpy (output->base + output->next, base, take);

       output->next += take;
       size -= take;
       base += take;
     }
   while (size > 0);

   return 0;
 }
 #endif /* XD3_ENCODER */

 /*********************************************************************
  Integer encoder/decoder functions
  **********************************************************************/

 #define DECODE_INTEGER_TYPE(PART,OFLOW)                                \
   while (stream->avail_in != 0)                                        \
     {                                                                  \
       usize_t next = stream->next_in[0];                               \
                                                                        \
       DECODE_INPUT(1);                                                 \
                                                                        \
       if (PART & OFLOW)                                                \
 	{                                                              \
 	  stream->msg = "overflow in decode_integer";                  \
 	  return XD3_INVALID_INPUT;                                    \
 	}                                                              \
                                                                        \
       PART = (PART << 7) | (next & 127);                               \
                                                                        \
       if ((next & 128) == 0)                                           \
 	{                                                              \
 	  (*val) = PART;                                               \
 	  PART = 0;                                                    \
 	  return 0;                                                    \
 	}                                                              \
     }                                                                  \
                                                                        \
   stream->msg = "further input required";                              \
   return XD3_INPUT

 #define READ_INTEGER_TYPE(TYPE, OFLOW)                                 \
   TYPE val = 0;                                                        \
   const uint8_t *inp = (*inpp);                                        \
   usize_t next;                                                        \
                                                                        \
   do                                                                   \
     {                                                                  \
       if (inp == max)                                                  \
 	{                                                              \
 	  stream->msg = "end-of-input in read_integer";                \
 	  return XD3_INVALID_INPUT;                                    \
 	}                                                              \
                                                                        \
       if (val & OFLOW)                                                 \
 	{                                                              \
 	  stream->msg = "overflow in read_intger";                     \
 	  return XD3_INVALID_INPUT;                                    \
 	}                                                              \
                                                                        \
       next = (*inp++);                                                 \
       val  = (val << 7) | (next & 127);                                \
     }                                                                  \
   while (next & 128);                                                  \
                                                                        \
   (*valp) = val;                                                       \
   (*inpp) = inp;                                                       \
                                                                        \
   return 0

 #define EMIT_INTEGER_TYPE()                                            \
   /* max 64-bit value in base-7 encoding is 9.1 bytes */               \
   uint8_t buf[10];                                                     \
   usize_t  bufi = 10;                                                  \
                                                                        \
   XD3_ASSERT (num >= 0);                                               \
                                                                        \
   /* This loop performs division and turns on all MSBs. */             \
   do                                                                   \
     {                                                                  \
       buf[--bufi] = (num & 127) | 128;                                 \
       num >>= 7;                                                       \
     }                                                                  \
   while (num != 0);                                                    \
                                                                        \
   /* Turn off MSB of the last byte. */                                 \
   buf[9] &= 127;                                                       \
                                                                        \
   XD3_ASSERT (bufi >= 0);                                              \
                                                                        \
   return xd3_emit_bytes (stream, output, buf + bufi, 10 - bufi)

 #define IF_SIZEOF32(x) if (num < (1U   << (7 * (x)))) return (x);
 #define IF_SIZEOF64(x) if (num < (1ULL << (7 * (x)))) return (x);

 #if USE_UINT32
 static inline uint32_t
 xd3_sizeof_uint32_t (uint32_t num)
 {
   IF_SIZEOF32(1);
   IF_SIZEOF32(2);
   IF_SIZEOF32(3);
   IF_SIZEOF32(4);
   return 5;
 }

 static inline int
 xd3_decode_uint32_t (xd3_stream *stream, uint32_t *val)
 { DECODE_INTEGER_TYPE (stream->dec_32part, UINT32_OFLOW_MASK); }

 static inline int
 xd3_read_uint32_t (xd3_stream *stream, const uint8_t **inpp,
 		   const uint8_t *max, uint32_t *valp)
 { READ_INTEGER_TYPE (uint32_t, UINT32_OFLOW_MASK); }

 #if XD3_ENCODER
 static inline int
 xd3_emit_uint32_t (xd3_stream *stream, xd3_output **output, uint32_t num)
 { EMIT_INTEGER_TYPE (); }
 #endif
 #endif

 #if USE_UINT64
 static inline int
 xd3_decode_uint64_t (xd3_stream *stream, uint64_t *val)
 { DECODE_INTEGER_TYPE (stream->dec_64part, UINT64_OFLOW_MASK); }

 #if XD3_ENCODER
 static inline int
 xd3_emit_uint64_t (xd3_stream *stream, xd3_output **output, uint64_t num)
 { EMIT_INTEGER_TYPE (); }
 #endif

 /* These are tested but not used */
 #if REGRESSION_TEST
 static int
 xd3_read_uint64_t (xd3_stream *stream, const uint8_t **inpp,
 		   const uint8_t *max, uint64_t *valp)
 { READ_INTEGER_TYPE (uint64_t, UINT64_OFLOW_MASK); }

 static uint32_t
 xd3_sizeof_uint64_t (uint64_t num)
 {
   IF_SIZEOF64(1);
   IF_SIZEOF64(2);
   IF_SIZEOF64(3);
   IF_SIZEOF64(4);
   IF_SIZEOF64(5);
   IF_SIZEOF64(6);
   IF_SIZEOF64(7);
   IF_SIZEOF64(8);
   IF_SIZEOF64(9);

   return 10;
 }
 #endif

 #endif

 /***********************************************************************
  Address cache stuff
  ***********************************************************************/

 static int
 xd3_alloc_cache (xd3_stream *stream)
 {
   if (stream->acache.near_array != NULL)
     {
       xd3_free (stream, stream->acache.near_array);
     }

   if (stream->acache.same_array != NULL)
     {
       xd3_free (stream, stream->acache.same_array);
     }

   if (((stream->acache.s_near > 0) &&
        (stream->acache.near_array = (usize_t*)
 	xd3_alloc (stream, stream->acache.s_near, sizeof (usize_t)))
        == NULL) ||
       ((stream->acache.s_same > 0) &&
        (stream->acache.same_array = (usize_t*)
 	xd3_alloc (stream, stream->acache.s_same * 256, sizeof (usize_t)))
        == NULL))
     {
       return ENOMEM;
     }

   return 0;
 }

 void
 xd3_init_cache (xd3_addr_cache* acache)
 {
   if (acache->s_near > 0)
     {
       memset (acache->near_array, 0, acache->s_near * sizeof (usize_t));
       acache->next_slot = 0;
     }

   if (acache->s_same > 0)
     {
       memset (acache->same_array, 0, acache->s_same * 256 * sizeof (usize_t));
     }
 }

 static void
 xd3_update_cache (xd3_addr_cache* acache, usize_t addr)
 {
   if (acache->s_near > 0)
     {
       acache->near_array[acache->next_slot] = addr;
       acache->next_slot = (acache->next_slot + 1) % acache->s_near;
     }

   if (acache->s_same > 0)
     {
       acache->same_array[addr % (acache->s_same*256)] = addr;
     }
 }

 #if XD3_ENCODER
 /* OPT: this gets called a lot, can it be optimized? */
 static int
 xd3_encode_address (xd3_stream *stream, usize_t addr, usize_t here, uint8_t* mode)
 {
   usize_t d, bestd;
   usize_t i, bestm, ret;
   xd3_addr_cache* acache = & stream->acache;

 #define SMALLEST_INT(x) do { if (((x) & ~127) == 0) { goto good; } } while (0)

   /* Attempt to find the address mode that yields the smallest integer value
    * for "d", the encoded address value, thereby minimizing the encoded size
    * of the address. */
   bestd = addr;
   bestm = VCD_SELF;

   XD3_ASSERT (addr < here);

   SMALLEST_INT (bestd);

   if ((d = here-addr) < bestd)
     {
       bestd = d;
       bestm = VCD_HERE;

       SMALLEST_INT (bestd);
     }

   for (i = 0; i < acache->s_near; i += 1)
     {
       d = addr - acache->near_array[i];

       if (d >= 0 && d < bestd)
 	{
 	  bestd = d;
 	  bestm = i+2; /* 2 counts the VCD_SELF, VCD_HERE modes */

 	  SMALLEST_INT (bestd);
 	}
     }

   if (acache->s_same > 0 && acache->same_array[d = addr%(acache->s_same*256)] == addr)
     {
       bestd = d%256;
       bestm = acache->s_near + 2 + d/256; /* 2 + s_near offsets past the VCD_NEAR modes */

       if ((ret = xd3_emit_byte (stream, & ADDR_TAIL (stream), bestd))) { return ret; }
     }
   else
     {
     good:

       if ((ret = xd3_emit_size (stream, & ADDR_TAIL (stream), bestd))) { return ret; }
     }

   xd3_update_cache (acache, addr);

   (*mode) += bestm;

   return 0;
 }
 #endif

 static int
 xd3_decode_address (xd3_stream *stream, usize_t here,
 		    usize_t mode, const uint8_t **inpp,
 		    const uint8_t *max, uint32_t *valp)
 {
   int ret;
   usize_t same_start = 2 + stream->acache.s_near;

   if (mode < same_start)
     {
       if ((ret = xd3_read_size (stream, inpp, max, valp))) { return ret; }

       switch (mode)
 	{
 	case VCD_SELF:
 	  break;
 	case VCD_HERE:
 	  (*valp) = here - (*valp);
 	  break;
 	default:
 	  (*valp) += stream->acache.near_array[mode - 2];
 	  break;
 	}
     }
   else
     {
       if (*inpp == max)
 	{
 	  stream->msg = "address underflow";
 	  return XD3_INVALID_INPUT;
 	}

       mode -= same_start;

       (*valp) = stream->acache.same_array[mode*256 + (**inpp)];

       (*inpp) += 1;
     }

   xd3_update_cache (& stream->acache, *valp);

   return 0;
 }

 /***********************************************************************
  Alloc/free
 ***********************************************************************/

 static void*
 __xd3_alloc_func (void* opaque, usize_t items, usize_t size)
 {
   return malloc (items * size);
 }

 static void
 __xd3_free_func (void* opaque, void* address)
 {
   free (address);
 }

 static void*
 xd3_alloc (xd3_stream *stream,
 	   usize_t      elts,
 	   usize_t      size)
 {
   void *a = stream->alloc (stream->opaque, elts, size);

   if (a != NULL)
     {
       IF_DEBUG (stream->alloc_cnt += 1);
       IF_DEBUG2 (DP(RINT "[stream %p malloc] size %u ptr %p\n",
 		    stream, elts * size, a));
     }
   else
     {
       stream->msg = "out of memory";
     }

   return a;
 }

 static void
 xd3_free (xd3_stream *stream,
 	  void       *ptr)
 {
   if (ptr != NULL)
     {
       IF_DEBUG (stream->free_cnt += 1);
       XD3_ASSERT (stream->free_cnt <= stream->alloc_cnt);
       IF_DEBUG2 (DP(RINT "[stream %p free] %p\n",
 		    stream, ptr));
       stream->free (stream->opaque, ptr);
     }
 }

 #if XD3_ENCODER
 static void*
 xd3_alloc0 (xd3_stream *stream,
 	    usize_t      elts,
 	    usize_t      size)
 {
   void *a = xd3_alloc (stream, elts, size);

   if (a != NULL)
     {
       memset (a, 0, elts * size);
     }

   return a;
 }

 static xd3_output*
 xd3_alloc_output (xd3_stream *stream,
 		  xd3_output *old_output)
 {
   xd3_output *output;
   uint8_t    *base;

   if (stream->enc_free != NULL)
     {
       output = stream->enc_free;
       stream->enc_free = output->next_page;
     }
   else
     {
       if ((output = (xd3_output*) xd3_alloc (stream, 1, sizeof (xd3_output))) == NULL)
 	{
 	  return NULL;
 	}

       if ((base = (uint8_t*) xd3_alloc (stream, XD3_ALLOCSIZE, sizeof (uint8_t))) == NULL)
 	{
 	  xd3_free (stream, output);
 	  return NULL;
 	}

       output->base  = base;
       output->avail = XD3_ALLOCSIZE;
     }

   output->next = 0;

   if (old_output)
     {
       old_output->next_page = output;
     }

   output->next_page = NULL;

   return output;
 }

 static usize_t
 xd3_sizeof_output (xd3_output *output)
 {
   usize_t s = 0;

   for (; output; output = output->next_page)
     {
       s += output->next;
     }

   return s;
 }

 static void
 xd3_freelist_output (xd3_stream *stream,
 		     xd3_output *output)
 {
   xd3_output *tmp;

   while (output)
     {
       tmp    = output;
       output = output->next_page;

       tmp->next = 0;
       tmp->next_page = stream->enc_free;
       stream->enc_free = tmp;
     }
 }

 static void
 xd3_free_output (xd3_stream *stream,
 		 xd3_output *output)
 {
   xd3_output *next;

  again:
   if (output == NULL)
     {
       return;
     }

   next = output->next_page;

   xd3_free (stream, output->base);
   xd3_free (stream, output);

   output = next;
   goto again;
 }
 #endif /* XD3_ENCODER */

 void
 xd3_free_stream (xd3_stream *stream)
 {
   xd3_iopt_buflist *blist = stream->iopt_alloc;

   while (blist != NULL)
     {
       xd3_iopt_buflist *tmp = blist;
       blist = blist->next;
       xd3_free (stream, tmp->buffer);
       xd3_free (stream, tmp);
     }

   xd3_free (stream, stream->large_table);
   xd3_free (stream, stream->small_table);
   xd3_free (stream, stream->small_prev);

 #if XD3_ENCODER
   {
     int i;
     for (i = 0; i < ENC_SECTS; i += 1)
       {
 	xd3_free_output (stream, stream->enc_heads[i]);
       }
     xd3_free_output (stream, stream->enc_free);
   }
 #endif

   xd3_free (stream, stream->acache.near_array);
   xd3_free (stream, stream->acache.same_array);

   xd3_free (stream, stream->inst_sect.copied1);
   xd3_free (stream, stream->addr_sect.copied1);
   xd3_free (stream, stream->data_sect.copied1);

   xd3_free (stream, stream->dec_buffer);
   xd3_free (stream, (uint8_t*) stream->dec_lastwin);

   xd3_free (stream, stream->buf_in);
   xd3_free (stream, stream->dec_appheader);
   xd3_free (stream, stream->dec_codetbl);
   xd3_free (stream, stream->code_table_alloc);

 #if SECONDARY_ANY
   xd3_free (stream, stream->inst_sect.copied2);
   xd3_free (stream, stream->addr_sect.copied2);
   xd3_free (stream, stream->data_sect.copied2);

   if (stream->sec_type != NULL)
     {
       stream->sec_type->destroy (stream, stream->sec_stream_d);
       stream->sec_type->destroy (stream, stream->sec_stream_i);
       stream->sec_type->destroy (stream, stream->sec_stream_a);
     }
 #endif

   xd3_free (stream, stream->whole_target.adds);
   xd3_free (stream, stream->whole_target.inst);
   xd3_free (stream, stream->whole_target.wininfo);

   XD3_ASSERT (stream->alloc_cnt == stream->free_cnt);

   memset (stream, 0, sizeof (xd3_stream));
 }

 #if (XD3_DEBUG > 1 || VCDIFF_TOOLS)
 static const char*
 xd3_rtype_to_string (xd3_rtype type, int print_mode)
 {
   switch (type)
     {
     case XD3_NOOP:
       return "NOOP ";
     case XD3_RUN:
       return "RUN  ";
     case XD3_ADD:
       return "ADD  ";
     default: break;
     }
   if (! print_mode)
     {
       return "CPY  ";
     }
   switch (type)
     {
     case XD3_CPY + 0: return "CPY_0";
     case XD3_CPY + 1: return "CPY_1";
     case XD3_CPY + 2: return "CPY_2";
     case XD3_CPY + 3: return "CPY_3";
     case XD3_CPY + 4: return "CPY_4";
     case XD3_CPY + 5: return "CPY_5";
     case XD3_CPY + 6: return "CPY_6";
     case XD3_CPY + 7: return "CPY_7";
     case XD3_CPY + 8: return "CPY_8";
     case XD3_CPY + 9: return "CPY_9";
     default:          return "CPY>9";
     }
 }
 #endif

 /****************************************************************
  Stream configuration
  ******************************************************************/

 int
 xd3_config_stream(xd3_stream *stream,
 		  xd3_config *config)
 {
   int ret;
   xd3_config defcfg;
   xd3_smatcher *smatcher = &stream->smatcher;

   if (config == NULL)
     {
       config = & defcfg;
       memset (config, 0, sizeof (*config));
     }

   /* Initial setup: no error checks yet */
   memset (stream, 0, sizeof (*stream));

   stream->winsize = config->winsize ? config->winsize : XD3_DEFAULT_WINSIZE;
   stream->sprevsz = config->sprevsz ? config->sprevsz : XD3_DEFAULT_SPREVSZ;
   stream->srcwin_maxsz = config->srcwin_maxsz ?
     config->srcwin_maxsz : XD3_DEFAULT_SRCWINSZ;

   if (config->iopt_size == 0)
     {
       stream->iopt_size = XD3_ALLOCSIZE / sizeof(xd3_rinst);
       stream->iopt_unlimited = 1;
     }
   else
     {
       stream->iopt_size = config->iopt_size;
     }

   stream->getblk    = config->getblk;
   stream->alloc     = config->alloc ? config->alloc : __xd3_alloc_func;
   stream->free      = config->freef ? config->freef : __xd3_free_func;
   stream->opaque    = config->opaque;
   stream->flags     = config->flags;

   /* Secondary setup. */
   stream->sec_data  = config->sec_data;
   stream->sec_inst  = config->sec_inst;
   stream->sec_addr  = config->sec_addr;

   stream->sec_data.data_type = DATA_SECTION;
   stream->sec_inst.data_type = INST_SECTION;
   stream->sec_addr.data_type = ADDR_SECTION;

   /* Check static sizes. */
   if (sizeof (usize_t) != SIZEOF_USIZE_T ||
       sizeof (xoff_t) != SIZEOF_XOFF_T ||
       (ret = xd3_check_pow2(XD3_ALLOCSIZE, NULL)))
     {
       stream->msg = "incorrect compilation: wrong integer sizes";
       return XD3_INTERNAL;
     }

   /* Check/set secondary compressor. */
   switch (stream->flags & XD3_SEC_TYPE)
     {
     case 0:
       if (stream->flags & XD3_SEC_NOALL)
 	{
 	  stream->msg = "XD3_SEC flags require a secondary compressor type";
 	  return XD3_INTERNAL;
 	}
       break;
     case XD3_SEC_FGK:
       FGK_CASE (stream);
     case XD3_SEC_DJW:
       DJW_CASE (stream);
     default:
       stream->msg = "too many secondary compressor types set";
       return XD3_INTERNAL;
     }

   /* Check/set encoder code table. */
   switch (stream->flags & XD3_ALT_CODE_TABLE) {
   case 0:
     stream->code_table_desc = & __rfc3284_code_table_desc;
     stream->code_table_func = xd3_rfc3284_code_table;
     break;
 #if GENERIC_ENCODE_TABLES
   case XD3_ALT_CODE_TABLE:
     stream->code_table_desc = & __alternate_code_table_desc;
     stream->code_table_func = xd3_alternate_code_table;
     stream->comp_table_func = xd3_compute_alternate_table_encoding;
     break;
 #endif
   default:
     stream->msg = "alternate code table support was not compiled";
     return XD3_INTERNAL;
   }

   /* Check sprevsz */
   if (smatcher->small_chain == 1 &&
       smatcher->small_lchain == 1)
     {
       stream->sprevsz = 0;
     }
   else
     {
       if ((ret = xd3_check_pow2 (stream->sprevsz, NULL)))
 	{
 	  stream->msg = "sprevsz is required to be a power of two";
 	  return XD3_INTERNAL;
 	}

       stream->sprevmask = stream->sprevsz - 1;
     }

   /* Default scanner settings. */
 #if XD3_ENCODER
   switch (config->smatch_cfg)
     {
       IF_BUILD_SOFT(case XD3_SMATCH_SOFT:
       {
 	*smatcher = config->smatcher_soft;
 	smatcher->string_match = __smatcher_soft.string_match;
 	smatcher->name = __smatcher_soft.name;
 	if (smatcher->large_look  < MIN_MATCH ||
 	    smatcher->large_step  < 1         ||
 	    smatcher->small_look  < MIN_MATCH)
 	  {
 	    stream->msg = "invalid soft string-match config";
 	    return XD3_INVALID;
 	  }
 	break;
       })

       IF_BUILD_DEFAULT(case XD3_SMATCH_DEFAULT:
 		    *smatcher = __smatcher_default;
 		    break;)
       IF_BUILD_SLOW(case XD3_SMATCH_SLOW:
 		    *smatcher = __smatcher_slow;
 		    break;)
       IF_BUILD_FASTEST(case XD3_SMATCH_FASTEST:
 		    *smatcher = __smatcher_fastest;
 		    break;)
       IF_BUILD_FASTER(case XD3_SMATCH_FASTER:
 		    *smatcher = __smatcher_faster;
 		    break;)
       IF_BUILD_FAST(case XD3_SMATCH_FAST:
 		    *smatcher = __smatcher_fast;
 		    break;)
     default:
       stream->msg = "invalid string match config type";
       return XD3_INTERNAL;
     }

   if (config->smatch_cfg == XD3_SMATCH_DEFAULT &&
       (stream->flags & XD3_COMPLEVEL_MASK) != 0)
     {
       int level = (stream->flags & XD3_COMPLEVEL_MASK) >> XD3_COMPLEVEL_SHIFT;

       switch (level)
 	{
 	case 1:
 	  IF_BUILD_FASTEST(*smatcher = __smatcher_fastest;
 			   break;)
 	case 2:
 	  IF_BUILD_FASTER(*smatcher = __smatcher_faster;
 			   break;)
 	case 3: case 4: case 5:
 	  IF_BUILD_FAST(*smatcher = __smatcher_fast;
 			break;)
 	case 6:
 	  IF_BUILD_DEFAULT(*smatcher = __smatcher_default;
 			   break;)
 	default:
 	  IF_BUILD_SLOW(*smatcher = __smatcher_slow;
 			break;)
 	  IF_BUILD_DEFAULT(*smatcher = __smatcher_default;
 			   break;)
 	  IF_BUILD_FAST(*smatcher = __smatcher_fast;
 			break;)
 	  IF_BUILD_FASTER(*smatcher = __smatcher_faster;
 			break;)
 	  IF_BUILD_FASTEST(*smatcher = __smatcher_fastest;
 			   break;)
 	}
     }
 #endif

   return 0;
 }

 /*****************************************************************
  Getblk interface
  ***********************************************************/

 /* This function interfaces with the client getblk function, checks
  * its results, etc. */
 static int
 xd3_getblk (xd3_stream *stream, xoff_t blkno)
 {
   int ret;
   xd3_source *source = stream->src;

   if (source->curblk == NULL ||
       blkno != source->curblkno)
     {
       if (blkno >= source->blocks)
 	{
 	  stream->msg = "source file too short";
 	  return XD3_INTERNAL;
 	}

       XD3_ASSERT (source->curblk != NULL || blkno != source->curblkno);

       source->getblkno = blkno;

       if (stream->getblk == NULL)
 	{
 	  stream->msg = "getblk source input";
 	  return XD3_GETSRCBLK;
 	}
       else if ((ret = stream->getblk (stream, source, blkno)) != 0)
 	{
 	  stream->msg = "getblk failed";
 	  return ret;
 	}

       XD3_ASSERT (source->curblk != NULL);
     }

   if (source->onblk != (blkno == source->blocks - 1 ?
 			source->onlastblk : source->blksize))
     {
       stream->msg = "getblk returned short block";
       return XD3_INTERNAL;
     }

   return 0;
 }

 /***********************************************************
  Stream open/close
  ***************************************************************/

 int
 xd3_set_source (xd3_stream *stream,
 		xd3_source *src)
 {
   xoff_t blk_num;
   usize_t tail_size, shiftby;

   IF_DEBUG1 (DP(RINT "[set source] size %"Q"u\n", src->size));

   if (src == NULL || src->size < stream->smatcher.large_look) { return 0; }

   stream->src  = src;

   // If src->blksize is a power-of-two, xd3_blksize_div() will use
   // shift and mask rather than divide.  Check that here.
   if (xd3_check_pow2 (src->blksize, &shiftby) == 0)
     {
       src->shiftby = shiftby;
       src->maskby = (1 << shiftby) - 1;
     }
   else if (src->size <= src->blksize)
     {
       int x = xd3_pow2_roundup (src->blksize);
       xd3_check_pow2 (x, &shiftby);
       src->shiftby = shiftby;
       src->maskby = (1 << shiftby) - 1;
     }
   else
     {
       src->shiftby = 0;
       src->maskby = 0;
     }

   xd3_blksize_div (src->size, src, &blk_num, &tail_size);
   src->blocks  = blk_num + (tail_size > 0);
   src->onlastblk = xd3_bytes_on_srcblk (src, src->blocks - 1);
   src->srclen  = 0;
   src->srcbase = 0;

   return 0;
 }

 void
 xd3_abort_stream (xd3_stream *stream)
 {
   stream->dec_state = DEC_ABORTED;
   stream->enc_state = ENC_ABORTED;
 }

 int
 xd3_close_stream (xd3_stream *stream)
 {
   if (stream->enc_state != 0 && stream->enc_state != ENC_ABORTED)
     {
       if (stream->buf_leftover != NULL)
 	{
 	  stream->msg = "encoding is incomplete";
 	  return XD3_INTERNAL;
 	}

       if (stream->enc_state == ENC_POSTWIN)
 	{
 #if XD3_ENCODER
 	  xd3_encode_reset (stream);
 #endif
 	  stream->current_window += 1;
 	  stream->enc_state = ENC_INPUT;
 	}

       /* If encoding, should be ready for more input but not actually
 	 have any. */
       if (stream->enc_state != ENC_INPUT || stream->avail_in != 0)
 	{
 	  stream->msg = "encoding is incomplete";
 	  return XD3_INTERNAL;
 	}
     }
   else
     {
       switch (stream->dec_state)
 	{
 	case DEC_VCHEAD:
 	case DEC_WININD:
 	  /* TODO: Address the zero-byte ambiguity.  Does the encoder
 	   * emit a window or not?  If so, then catch an error here.
 	   * If not, need another routine to say
 	   * decode_at_least_one_if_empty. */
 	case DEC_ABORTED:
 	  break;
 	default:
 	  /* If decoding, should be ready for the next window. */
 	  stream->msg = "EOF in decode";
 	  return XD3_INTERNAL;
 	}
     }

   return 0;
 }

 /**************************************************************
  Application header
  ****************************************************************/

 int
 xd3_get_appheader (xd3_stream  *stream,
 		   uint8_t    **data,
 		   usize_t      *size)
 {
   if (stream->dec_state < DEC_WININD)
     {
       stream->msg = "application header not available";
       return XD3_INTERNAL;
     }

   (*data) = stream->dec_appheader;
   (*size) = stream->dec_appheadsz;
   return 0;
 }

 /**********************************************************
  Decoder stuff
  *************************************************/

 #include "xdelta3-decode.h"

 /****************************************************************
  Encoder stuff
  *****************************************************************/

 #if XD3_ENCODER
 void
 xd3_set_appheader (xd3_stream    *stream,
 		   const uint8_t *data,
 		   usize_t         size)
 {
   stream->enc_appheader = data;
   stream->enc_appheadsz = size;
 }

 #if XD3_DEBUG
 static int
 xd3_iopt_check (xd3_stream *stream)
 {
   usize_t ul = xd3_rlist_length (& stream->iopt_used);
   usize_t fl = xd3_rlist_length (& stream->iopt_free);

   return (ul + fl + (stream->iout ? 1 : 0)) == stream->iopt_size;
 }
 #endif

 static xd3_rinst*
 xd3_iopt_free (xd3_stream *stream, xd3_rinst *i)
 {
   xd3_rinst *n = xd3_rlist_remove (i);
   xd3_rlist_push_back (& stream->iopt_free, i);
   return n;
 }

 static void
 xd3_iopt_free_nonadd (xd3_stream *stream, xd3_rinst *i)
 {
   if (i->type != XD3_ADD)
     {
       xd3_rlist_push_back (& stream->iopt_free, i);
     }
 }

 /* When an instruction is ready to flush from the iopt buffer, this
  * function is called to produce an encoding.  It writes the
  * instruction plus size, address, and data to the various encoding
  * sections. */
 static int
 xd3_iopt_finish_encoding (xd3_stream *stream, xd3_rinst *inst)
 {
   int ret;

   /* Check for input overflow. */
   XD3_ASSERT (inst->pos + inst->size <= stream->avail_in);

   switch (inst->type)
     {
     case XD3_CPY:
       {
 	/* the address may have an offset if there is a source window. */
 	usize_t addr;
 	xd3_source *src = stream->src;

 	if (src != NULL)
 	  {
 	    /* If there is a source copy, the source must have its
 	     * source window decided before we can encode.  This can
 	     * be bad -- we have to make this decision even if no
 	     * source matches have been found. */
 	    if (stream->srcwin_decided == 0)
 	      {
 		if ((ret = xd3_srcwin_setup (stream))) { return ret; }
 	      }

 	    /* xtra field indicates the copy is from the source */
 	    if (inst->xtra)
 	      {
 		XD3_ASSERT (inst->addr >= src->srcbase);
 		XD3_ASSERT (inst->addr + inst->size <= src->srcbase + src->srclen);
 		addr = (inst->addr - src->srcbase);
 		stream->n_scpy += 1;
 		stream->l_scpy += inst->size;
 	      }
 	    else
 	      {
 		/* with source window: target copy address is offset by taroff. */
 		addr = stream->taroff + (usize_t) inst->addr;
 		stream->n_tcpy += 1;
 		stream->l_tcpy += inst->size;
 	      }
 	  }
 	else
 	  {
 	    addr = (usize_t) inst->addr;
 	    stream->n_tcpy += 1;
 	    stream->l_tcpy += inst->size;
 	  }

 	/* Note: used to assert inst->size >= MIN_MATCH, but not true
 	 * for merge operations & identical match heuristics. */
 	/* the "here" position is always offset by taroff */
 	if ((ret = xd3_encode_address (stream, addr, inst->pos + stream->taroff,
 				       & inst->type)))
 	  {
 	    return ret;
 	  }

 	IF_DEBUG1 ({
 	  static int cnt;
 	  DP(RINT "[iopt copy:%d] pos %"Q"u-%"Q"u addr %"Q"u-%"Q"u size %u\n",
 		   cnt++,
 		   stream->total_in + inst->pos,
 		   stream->total_in + inst->pos + inst->size,
 		   inst->addr, inst->addr + inst->size, inst->size);
 	});
 	break;
       }
     case XD3_RUN:
       {
 	XD3_ASSERT (inst->size >= MIN_MATCH);

 	if ((ret = xd3_emit_byte (stream, & DATA_TAIL (stream), inst->xtra))) { return ret; }

 	stream->n_run += 1;
 	stream->l_run += inst->size;

 	IF_DEBUG1 ({
 	  static int cnt;
 	  DP(RINT "[iopt run:%d] pos %"Q"u size %u\n", cnt++, stream->total_in + inst->pos, inst->size);
 	});
 	break;
       }
     case XD3_ADD:
       {
 	if ((ret = xd3_emit_bytes (stream, & DATA_TAIL (stream),
 				   stream->next_in + inst->pos, inst->size))) { return ret; }

 	stream->n_add += 1;
 	stream->l_add += inst->size;

 	IF_DEBUG1 ({
 	  static int cnt;
 	  DP(RINT "[iopt add:%d] pos %"Q"u size %u\n", cnt++, stream->total_in + inst->pos, inst->size);
 	});

 	break;
       }
     }

   /* This is the only place stream->unencoded_offset is incremented. */
   XD3_ASSERT (stream->unencoded_offset == inst->pos);
   stream->unencoded_offset += inst->size;

   inst->code2 = 0;

   XD3_CHOOSE_INSTRUCTION (stream, stream->iout, inst);

   if (stream->iout != NULL)
     {
       if (stream->iout->code2 != 0)
 	{
 	  if ((ret = xd3_emit_double (stream, stream->iout, inst, stream->iout->code2))) { return ret; }

 	  xd3_iopt_free_nonadd (stream, stream->iout);
 	  xd3_iopt_free_nonadd (stream, inst);
 	  stream->iout = NULL;
 	  return 0;
 	}
       else
 	{
 	  if ((ret = xd3_emit_single (stream, stream->iout, stream->iout->code1))) { return ret; }

 	  xd3_iopt_free_nonadd (stream, stream->iout);
 	}
     }

   stream->iout = inst;

   return 0;
 }

 /* This possibly encodes an add instruction, iadd, which must remain
  * on the stack until the following call to
  * xd3_iopt_finish_encoding. */
 static int
 xd3_iopt_add (xd3_stream *stream, usize_t pos, xd3_rinst *iadd)
 {
   int ret;
   usize_t off = stream->unencoded_offset;

   if (pos > off)
     {
       iadd->type = XD3_ADD;
       iadd->pos  = off;
       iadd->size = pos - off;

       if ((ret = xd3_iopt_finish_encoding (stream, iadd))) { return ret; }
     }

   return 0;
 }

 /* This function calls xd3_iopt_finish_encoding to finish encoding an
  * instruction, and it may also produce an add instruction for an
  * unmatched region. */
 static int
 xd3_iopt_add_encoding (xd3_stream *stream, xd3_rinst *inst)
 {
   int ret;
   xd3_rinst iadd;

   if ((ret = xd3_iopt_add (stream, inst->pos, & iadd))) { return ret; }

   if ((ret = xd3_iopt_finish_encoding (stream, inst))) { return ret; }

   return 0;
 }

 /* Generates a final add instruction to encode the remaining input. */
 static int
 xd3_iopt_add_finalize (xd3_stream *stream)
 {
   int ret;
   xd3_rinst iadd;

   if ((ret = xd3_iopt_add (stream, stream->avail_in, & iadd))) { return ret; }

   if (stream->iout)
     {
       if ((ret = xd3_emit_single (stream, stream->iout, stream->iout->code1))) { return ret; }

       xd3_iopt_free_nonadd (stream, stream->iout);
       stream->iout = NULL;
     }

   return 0;
 }

 /* Compact the instruction buffer by choosing the best non-overlapping
  * instructions when lazy string-matching.  There are no ADDs in the
  * iopt buffer because those are synthesized in xd3_iopt_add_encoding
  * and during xd3_iopt_add_finalize. */
 static int
 xd3_iopt_flush_instructions (xd3_stream *stream, int force)
 {
   xd3_rinst *r1 = xd3_rlist_front (& stream->iopt_used);
   xd3_rinst *r2;
   xd3_rinst *r3;
   usize_t r1end;
   usize_t r2end;
   usize_t r2off;
   usize_t r2moff;
   usize_t gap;
   usize_t flushed;
   int ret;

   XD3_ASSERT (xd3_iopt_check (stream));

   /* Note: once tried to skip this step if it's possible to assert
    * there are no overlapping instructions.  Doesn't work because
    * xd3_opt_erase leaves overlapping instructions. */
   while (! xd3_rlist_end (& stream->iopt_used, r1) &&
 	 ! xd3_rlist_end (& stream->iopt_used, r2 = xd3_rlist_next (r1)))
     {
       r1end = r1->pos + r1->size;

       /* If the instructions do not overlap, continue. */
       if (r1end <= r2->pos)
 	{
 	  r1 = r2;
 	  continue;
 	}

       r2end = r2->pos + r2->size;

       /* The min_match adjustments prevent this. */
       XD3_ASSERT (r2end > (r1end + LEAST_MATCH_INCR));

       /* If r3 is available... */
       if (! xd3_rlist_end (& stream->iopt_used, r3 = xd3_rlist_next (r2)))
 	{
 	  /* If r3 starts before r1 finishes or just about, r2 is irrelevant */
 	  if (r3->pos <= r1end + 1)
 	    {
 	      xd3_iopt_free (stream, r2);
 	      continue;
 	    }
 	}
       else if (! force)
 	{
 	  /* Unless force, end the loop when r3 is not available. */
 	  break;
 	}

       r2off  = r2->pos - r1->pos;
       r2moff = r2end - r1end;
       gap    = r2end - r1->pos;

       /* If the two matches overlap almost entirely, choose the better match
        * and discard the other.  The else branch can still create inefficient
        * copies, e.g., a 4-byte copy that takes 4 bytes to encode, which
        * xd3_smatch() wouldn't allow by its crude efficiency check.  However,
        * in this case there are adjacent copies which mean the add would cost
        * one extra byte.  Allow the inefficiency here. */
       if (gap < 2*MIN_MATCH || r2moff <= 2 || r2off <= 2)
 	{
 	  /* Only one match should be used, choose the longer one. */
 	  if (r1->size < r2->size)
 	    {
 	      xd3_iopt_free (stream, r1);
 	      r1 = r2;
 	    }
 	  else
 	    {
 	      /* We are guaranteed that r1 does not overlap now, so advance past r2 */
 	      r1 = xd3_iopt_free (stream, r2);
 	    }
 	  continue;
 	}
       else
 	{
 	  /* Shorten one of the instructions -- could be optimized
 	   * based on the address cache. */
 	  usize_t average;
 	  usize_t newsize;
 	  usize_t adjust1;

 	  XD3_ASSERT (r1end > r2->pos && r2end > r1->pos);

 	  /* Try to balance the length of both instructions, but avoid
 	   * making both longer than MAX_MATCH_SPLIT . */
 	  average = gap / 2;
 	  newsize = min (MAX_MATCH_SPLIT, gap - average);

 	  /* Should be possible to simplify this code. */
 	  if (newsize > r1->size)
 	    {
 	      /* shorten r2 */
 	      adjust1 = r1end - r2->pos;
 	    }
 	  else if (newsize > r2->size)
 	    {
 	      /* shorten r1 */
 	      adjust1 = r1end - r2->pos;

 	      XD3_ASSERT (r1->size > adjust1);

 	      r1->size -= adjust1;

 	      /* don't shorten r2 */
 	      adjust1 = 0;
 	    }
 	  else
 	    {
 	      /* shorten r1 */
 	      adjust1 = r1->size - newsize;

 	      if (r2->pos > r1end - adjust1)
 		{
 		  adjust1 -= r2->pos - (r1end - adjust1);
 		}

 	      XD3_ASSERT (r1->size > adjust1);

 	      r1->size -= adjust1;

 	      /* shorten r2 */
 	      XD3_ASSERT (r1->pos + r1->size >= r2->pos);

 	      adjust1 = r1->pos + r1->size - r2->pos;
 	    }

 	  /* Fallthrough above if-else, shorten r2 */
 	  XD3_ASSERT (r2->size > adjust1);

 	  r2->size -= adjust1;
 	  r2->pos  += adjust1;
 	  r2->addr += adjust1;

 	  XD3_ASSERT (r1->size >= MIN_MATCH);
 	  XD3_ASSERT (r2->size >= MIN_MATCH);

 	  r1 = r2;
 	}
     }

   XD3_ASSERT (xd3_iopt_check (stream));

   /* If forcing, pick instructions until the list is empty, otherwise
    * this empties 50% of the queue. */
   for (flushed = 0; ! xd3_rlist_empty (& stream->iopt_used); )
     {
       xd3_rinst *renc = xd3_rlist_pop_front (& stream->iopt_used);
       if ((ret = xd3_iopt_add_encoding (stream, renc)))
 	{
 	  return ret;
 	}

       if (! force)
 	{
 	  if (++flushed > stream->iopt_size / 2)
 	    {
 	      break;
 	    }

 	  /* If there are only two instructions remaining, break,
 	   * because they were not optimized.  This means there were
 	   * more than 50% eliminated by the loop above. */
  	  r1 = xd3_rlist_front (& stream->iopt_used);
  	  if (xd3_rlist_end(& stream->iopt_used, r1) ||
  	      xd3_rlist_end(& stream->iopt_used, r2 = xd3_rlist_next (r1)) ||
  	      xd3_rlist_end(& stream->iopt_used, r3 = xd3_rlist_next (r2)))
  	    {
  	      break;
  	    }
 	}
     }

   XD3_ASSERT (xd3_iopt_check (stream));

   XD3_ASSERT (!force || xd3_rlist_length (& stream->iopt_used) == 0);

   return 0;
 }

 static int
 xd3_iopt_get_slot (xd3_stream *stream, xd3_rinst** iptr)
 {
   xd3_rinst *i;
   int ret;

   if (xd3_rlist_empty (& stream->iopt_free))
     {
       if (stream->iopt_unlimited)
 	{
 	  int elts = XD3_ALLOCSIZE / sizeof(xd3_rinst);

 	  if ((ret = xd3_alloc_iopt (stream, elts)))
 	    {
 	      return ret;
 	    }

 	  stream->iopt_size += elts;
 	}
       else
 	{
 	  if ((ret = xd3_iopt_flush_instructions (stream, 0))) { return ret; }

 	  XD3_ASSERT (! xd3_rlist_empty (& stream->iopt_free));
 	}
     }

   i = xd3_rlist_pop_back (& stream->iopt_free);

   xd3_rlist_push_back (& stream->iopt_used, i);

   (*iptr) = i;

   ++stream->i_slots_used;

   return 0;
 }

 /* A copy is about to be emitted that extends backwards to POS,
  * therefore it may completely cover some existing instructions in the
  * buffer.  If an instruction is completely covered by this new match,
  * erase it.  If the new instruction is covered by the previous one,
  * return 1 to skip it. */
 static void
 xd3_iopt_erase (xd3_stream *stream, usize_t pos, usize_t size)
 {
   while (! xd3_rlist_empty (& stream->iopt_used))
     {
       xd3_rinst *r = xd3_rlist_back (& stream->iopt_used);

       /* Verify that greedy is working.  The previous instruction
        * should end before the new one begins. */
       XD3_ASSERT ((stream->flags & XD3_BEGREEDY) == 0 || (r->pos + r->size <= pos));
       /* Verify that min_match is working.  The previous instruction
        * should end before the new one ends. */
       XD3_ASSERT ((stream->flags & XD3_BEGREEDY) != 0 || (r->pos + r->size < pos + size));

       /* See if the last instruction starts before the new
        * instruction.  If so, there is nothing to erase. */
       if (r->pos < pos)
 	{
 	  return;
 	}

       /* Otherwise, the new instruction covers the old one, delete it
 	 and repeat. */
       xd3_rlist_remove (r);
       xd3_rlist_push_back (& stream->iopt_free, r);
       --stream->i_slots_used;
     }
 }

 /* This function tells the last matched input position. */
 static usize_t
 xd3_iopt_last_matched (xd3_stream *stream)
 {
   xd3_rinst *r;

   if (xd3_rlist_empty (& stream->iopt_used))
     {
       return 0;
     }

   r = xd3_rlist_back (& stream->iopt_used);

   return r->pos + r->size;
 }

 /*********************************************************
  Emit routines
  ***********************************************************/

 static int
 xd3_emit_single (xd3_stream *stream, xd3_rinst *single, usize_t code)
 {
   int has_size = stream->code_table[code].size1 == 0;
   int ret;

   IF_DEBUG1 (DP(RINT "[emit1] %u %s (%u) code %u\n",
 	       single->pos,
 		xd3_rtype_to_string ((xd3_rtype) single->type, 0),
 	       single->size,
 	       code));

   if ((ret = xd3_emit_byte (stream, & INST_TAIL (stream), code)))
     {
       return ret;
     }

   if (has_size)
     {
       if ((ret = xd3_emit_size (stream, & INST_TAIL (stream), single->size)))
         {
           return ret;
         }
     }

   return 0;
 }

 static int
 xd3_emit_double (xd3_stream *stream, xd3_rinst *first,
                  xd3_rinst *second, usize_t code)
 {
   int ret;

   /* All double instructions use fixed sizes, so all we need to do is
    * output the instruction code, no sizes. */
   XD3_ASSERT (stream->code_table[code].size1 != 0 &&
 	      stream->code_table[code].size2 != 0);

   if ((ret = xd3_emit_byte (stream, & INST_TAIL (stream), code)))
     {
       return ret;
     }

   IF_DEBUG1 (DP(RINT "[emit2]: %u %s (%u) %s (%u) code %u\n",
 	       first->pos,
 		xd3_rtype_to_string ((xd3_rtype) first->type, 0),
 	       first->size,
 		xd3_rtype_to_string ((xd3_rtype) second->type, 0),
 	       second->size,
 	       code));

   return 0;
 }

 /* This enters a potential run instruction into the iopt buffer.  The
  * position argument is relative to the target window. */
 static int
 xd3_emit_run (xd3_stream *stream, usize_t pos, usize_t size, uint8_t run_c)
 {
   xd3_rinst* ri;
   int ret;

   if ((ret = xd3_iopt_get_slot (stream, & ri))) { return ret; }

   ri->type = XD3_RUN;
   ri->xtra = run_c;
   ri->pos  = pos;
   ri->size = size;

   return 0;
 }

 /* This enters a potential copy instruction into the iopt buffer.  The
  * position argument is relative to the target window.. */
 int
 xd3_found_match (xd3_stream *stream, usize_t pos,
 		 usize_t size, xoff_t addr, int is_source)
 {
   xd3_rinst* ri;
   int ret;

   if ((ret = xd3_iopt_get_slot (stream, & ri))) { return ret; }

   ri->type = XD3_CPY;
   ri->xtra = is_source;
   ri->pos  = pos;
   ri->size = size;
   ri->addr = addr;

   return 0;
 }

 static int
 xd3_emit_hdr (xd3_stream *stream)
 {
   int  ret;
   int  use_secondary = stream->sec_type != NULL;
   int  use_adler32   = stream->flags & (XD3_ADLER32 | XD3_ADLER32_RECODE);
   int  vcd_source    = xd3_encoder_used_source (stream);
   usize_t win_ind = 0;
   usize_t del_ind = 0;
   usize_t enc_len;
   usize_t tgt_len;
   usize_t data_len;
   usize_t inst_len;
   usize_t addr_len;

   if (stream->current_window == 0)
     {
       usize_t hdr_ind = 0;
       int use_appheader  = stream->enc_appheader != NULL;
       int use_gencodetbl = GENERIC_ENCODE_TABLES &&
 	(stream->code_table_desc != & __rfc3284_code_table_desc);

       if (use_secondary)  { hdr_ind |= VCD_SECONDARY; }
       if (use_gencodetbl) { hdr_ind |= VCD_CODETABLE; }
       if (use_appheader)  { hdr_ind |= VCD_APPHEADER; }

       if ((ret = xd3_emit_byte (stream, & HDR_TAIL (stream),
 				VCDIFF_MAGIC1)) != 0 ||
 	  (ret = xd3_emit_byte (stream, & HDR_TAIL (stream),
 				VCDIFF_MAGIC2)) != 0 ||
 	  (ret = xd3_emit_byte (stream, & HDR_TAIL (stream),
 				VCDIFF_MAGIC3)) != 0 ||
 	  (ret = xd3_emit_byte (stream, & HDR_TAIL (stream),
 				VCDIFF_VERSION)) != 0 ||
 	  (ret = xd3_emit_byte (stream, & HDR_TAIL (stream), hdr_ind)) != 0)
 	{
 	  return ret;
 	}

       /* Secondary compressor ID */
 #if SECONDARY_ANY
       if (use_secondary &&
 	  (ret = xd3_emit_byte (stream, & HDR_TAIL (stream),
 				stream->sec_type->id)))
 	{
 	  return ret;
 	}
 #endif

       /* Compressed code table */
       if (use_gencodetbl)
 	{
 	  usize_t code_table_size;
 	  const uint8_t *code_table_data;

 	  if ((ret = stream->comp_table_func (stream, & code_table_data,
 					      & code_table_size)))
 	    {
 	      return ret;
 	    }

 	  if ((ret = xd3_emit_size (stream, & HDR_TAIL (stream),
 				    code_table_size + 2)) ||
 	      (ret = xd3_emit_byte (stream, & HDR_TAIL (stream),
 				    stream->code_table_desc->near_modes)) ||
 	      (ret = xd3_emit_byte (stream, & HDR_TAIL (stream),
 				    stream->code_table_desc->same_modes)) ||
 	      (ret = xd3_emit_bytes (stream, & HDR_TAIL (stream),
 				     code_table_data, code_table_size)))
 	    {
 	      return ret;
 	    }
 	}

       /* Application header */
       if (use_appheader)
 	{
 	  if ((ret = xd3_emit_size (stream, & HDR_TAIL (stream),
 				    stream->enc_appheadsz)) ||
 	      (ret = xd3_emit_bytes (stream, & HDR_TAIL (stream),
 				     stream->enc_appheader,
 				     stream->enc_appheadsz)))
 	    {
 	      return ret;
 	    }
 	}
     }

   /* try to compress this window */
 #if SECONDARY_ANY
   if (use_secondary)
     {
       int data_sec = 0;
       int inst_sec = 0;
       int addr_sec = 0;

 #     define ENCODE_SECONDARY_SECTION(UPPER,LOWER) \
              ((stream->flags & XD3_SEC_NO ## UPPER) == 0 && \
               (ret = xd3_encode_secondary (stream, \
 					   & UPPER ## _HEAD (stream), \
 					   & UPPER ## _TAIL (stream), \
 					& xd3_sec_ ## LOWER (stream), \
 				        & stream->sec_ ## LOWER, \
 					   & LOWER ## _sec)))

       if (ENCODE_SECONDARY_SECTION (DATA, data) ||
 	  ENCODE_SECONDARY_SECTION (INST, inst) ||
 	  ENCODE_SECONDARY_SECTION (ADDR, addr))
 	{
 	  return ret;
 	}

       del_ind |= (data_sec ? VCD_DATACOMP : 0);
       del_ind |= (inst_sec ? VCD_INSTCOMP : 0);
       del_ind |= (addr_sec ? VCD_ADDRCOMP : 0);
     }
 #endif

   /* if (vcd_target) { win_ind |= VCD_TARGET; } */
   if (vcd_source)  { win_ind |= VCD_SOURCE; }
   if (use_adler32) { win_ind |= VCD_ADLER32; }

   /* window indicator */
   if ((ret = xd3_emit_byte (stream, & HDR_TAIL (stream), win_ind)))
     {
       return ret;
     }

   /* source window */
   if (vcd_source)
     {
       /* or (vcd_target) { ... } */
       if ((ret = xd3_emit_size (stream, & HDR_TAIL (stream),
 				stream->src->srclen)) ||
 	  (ret = xd3_emit_offset (stream, & HDR_TAIL (stream),
 				  stream->src->srcbase))) { return ret; }
     }

   tgt_len  = stream->avail_in;
   data_len = xd3_sizeof_output (DATA_HEAD (stream));
   inst_len = xd3_sizeof_output (INST_HEAD (stream));
   addr_len = xd3_sizeof_output (ADDR_HEAD (stream));

   /* The enc_len field is a redundency for future extensions.*/
   enc_len = (1 + (xd3_sizeof_size (tgt_len) +
 		  xd3_sizeof_size (data_len) +
 		  xd3_sizeof_size (inst_len) +
 		  xd3_sizeof_size (addr_len)) +
 	     data_len +
 	     inst_len +
 	     addr_len +
 	     (use_adler32 ? 4 : 0));

   if ((ret = xd3_emit_size (stream, & HDR_TAIL (stream), enc_len)) ||
       (ret = xd3_emit_size (stream, & HDR_TAIL (stream), tgt_len)) ||
       (ret = xd3_emit_byte (stream, & HDR_TAIL (stream), del_ind)) ||
       (ret = xd3_emit_size (stream, & HDR_TAIL (stream), data_len)) ||
       (ret = xd3_emit_size (stream, & HDR_TAIL (stream), inst_len)) ||
       (ret = xd3_emit_size (stream, & HDR_TAIL (stream), addr_len)))
     {
       return ret;
     }

   if (use_adler32)
     {
       uint8_t  send[4];
       uint32_t a32;

       if (stream->flags & XD3_ADLER32)
 	{
 	  a32 = adler32 (1L, stream->next_in, stream->avail_in);
 	}
       else
 	{
 	  a32 = stream->recode_adler32;
 	}

       send[0] = (a32 >> 24);
       send[1] = (a32 >> 16);
       send[2] = (a32 >> 8);
       send[3] = (a32 & 0xff);

       if ((ret = xd3_emit_bytes (stream, & HDR_TAIL (stream), send, 4)))
 	{
 	  return ret;
 	}
     }

   return 0;
 }

 /****************************************************************
  Encode routines
  ****************************************************************/

 static int
 xd3_encode_buffer_leftover (xd3_stream *stream)
 {
   usize_t take;
   usize_t room;

   /* Allocate the buffer. */
   if (stream->buf_in == NULL &&
       (stream->buf_in = (uint8_t*) xd3_alloc (stream, stream->winsize, 1)) == NULL)
     {
       return ENOMEM;
     }

   /* Take leftover input first. */
   if (stream->buf_leftover != NULL)
     {
       XD3_ASSERT (stream->buf_avail == 0);
       XD3_ASSERT (stream->buf_leftavail < stream->winsize);

       IF_DEBUG1 (DP(RINT "[leftover] previous %u avail %u\n", stream->buf_leftavail, stream->avail_in));

       memcpy (stream->buf_in, stream->buf_leftover, stream->buf_leftavail);

       stream->buf_leftover = NULL;
       stream->buf_avail    = stream->buf_leftavail;
     }

   /* Copy into the buffer. */
   room = stream->winsize - stream->buf_avail;
   take = min (room, stream->avail_in);

   memcpy (stream->buf_in + stream->buf_avail, stream->next_in, take);

   stream->buf_avail += take;

   if (take < stream->avail_in)
     {
       /* Buffer is full */
       stream->buf_leftover  = stream->next_in  + take;
       stream->buf_leftavail = stream->avail_in - take;

       IF_DEBUG1 (DP(RINT "[leftover] take %u remaining %u\n", take, stream->buf_leftavail));
     }
   else if ((stream->buf_avail < stream->winsize) && !(stream->flags & XD3_FLUSH))
     {
       /* Buffer has space */
       IF_DEBUG1 (DP(RINT "[leftover] %u emptied\n", take));
       return XD3_INPUT;
     }

   /* Use the buffer: */
   stream->next_in   = stream->buf_in;
   stream->avail_in  = stream->buf_avail;
   stream->buf_avail = 0;

   return 0;
 }

 /* Allocates one block of xd3_rlist elements */
 static int
 xd3_alloc_iopt (xd3_stream *stream, int elts)
 {
   int i;
   xd3_iopt_buflist* last =
     (xd3_iopt_buflist*) xd3_alloc (stream, sizeof (xd3_iopt_buflist), 1);

   if (last == NULL ||
       (last->buffer = (xd3_rinst*) xd3_alloc (stream, sizeof (xd3_rinst), elts)) == NULL)
     {
       return ENOMEM;
     }

   last->next = stream->iopt_alloc;
   stream->iopt_alloc = last;

   for (i = 0; i < elts; i += 1)
     {
       xd3_rlist_push_back (& stream->iopt_free, & last->buffer[i]);
     }

   return 0;
 }

 /* This function allocates all memory initially used by the encoder. */
 static int
 xd3_encode_init (xd3_stream *stream, int full_init)
 {
   int i;

   if (full_init)
     {
       int large_comp = (stream->src != NULL);
       int small_comp = ! (stream->flags & XD3_NOCOMPRESS);

       /* Memory allocations for checksum tables are delayed until
        * xd3_string_match_init in the first call to string_match--that way
        * identical or short inputs require no table allocation. */
       if (large_comp)
 	{
 	  usize_t hash_values = (stream->srcwin_maxsz / stream->smatcher.large_step);

 	  xd3_size_hashtable (stream,
 			      hash_values,
 			      & stream->large_hash);
 	}

       if (small_comp)
 	{
 	  /* TODO: This is under devel: used to have min(sprevsz) here, which sort
 	   * of makes sense, but observed fast performance w/ larger tables, which
 	   * also sort of makes sense. @@@ */
 	  usize_t hash_values = stream->winsize;

 	  xd3_size_hashtable (stream,
 			      hash_values,
 			      & stream->small_hash);
 	}
     }

   /* data buffers */
   for (i = 0; i < ENC_SECTS; i += 1)
     {
       if ((stream->enc_heads[i] =
 	   stream->enc_tails[i] =
 	   xd3_alloc_output (stream, NULL)) == NULL)
 	{
 	  return ENOMEM;
 	}
     }

   /* iopt buffer */
   xd3_rlist_init (& stream->iopt_used);
   xd3_rlist_init (& stream->iopt_free);

   if (xd3_alloc_iopt (stream, stream->iopt_size) != 0) { goto fail; }

   XD3_ASSERT (xd3_rlist_length (& stream->iopt_free) == stream->iopt_size);
   XD3_ASSERT (xd3_rlist_length (& stream->iopt_used) == 0);

   /* address cache, code table */
   stream->acache.s_near = stream->code_table_desc->near_modes;
   stream->acache.s_same = stream->code_table_desc->same_modes;
   stream->code_table    = stream->code_table_func ();

   return xd3_alloc_cache (stream);

  fail:

   return ENOMEM;
 }

 int
 xd3_encode_init_full (xd3_stream *stream)
 {
   return xd3_encode_init (stream, 1);
 }

 int
 xd3_encode_init_partial (xd3_stream *stream)
 {
   return xd3_encode_init (stream, 0);
 }

 /* Called after the ENC_POSTOUT state, this puts the output buffers
  * back into separate lists and re-initializes some variables.  (The
  * output lists were spliced together during the ENC_FLUSH state.) */
 static void
 xd3_encode_reset (xd3_stream *stream)
 {
   int i;
   xd3_output *olist;

   stream->avail_in     = 0;
   stream->small_reset  = 1;
   stream->i_slots_used = 0;

   if (stream->src != NULL)
     {
       stream->src->srcbase   = 0;
       stream->src->srclen    = 0;
       stream->srcwin_decided = 0;
       stream->match_minaddr  = 0;
       stream->match_maxaddr  = 0;
       stream->taroff         = 0;
     }

   /* Reset output chains. */
   olist = stream->enc_heads[0];

   for (i = 0; i < ENC_SECTS; i += 1)
     {
       XD3_ASSERT (olist != NULL);

       stream->enc_heads[i] = olist;
       stream->enc_tails[i] = olist;
       olist = olist->next_page;

       stream->enc_heads[i]->next = 0;
       stream->enc_heads[i]->next_page = NULL;

       stream->enc_tails[i]->next_page = NULL;
       stream->enc_tails[i] = stream->enc_heads[i];
     }

   xd3_freelist_output (stream, olist);
 }

 /* The main encoding routine. */
 int
 xd3_encode_input (xd3_stream *stream)
 {
   int ret, i;

   if (stream->dec_state != 0)
     {
       stream->msg = "encoder/decoder transition";
       return XD3_INTERNAL;
     }

   switch (stream->enc_state)
     {
     case ENC_INIT:
       /* Only reached on first time through: memory setup. */
       if ((ret = xd3_encode_init_full (stream))) { return ret; }

       stream->enc_state = ENC_INPUT;

     case ENC_INPUT:

       /* If there is no input yet, just return.  This checks for
        * next_in == NULL, not avail_in == 0 since zero bytes is a
        * valid input.  There is an assertion in xd3_avail_input() that
        * next_in != NULL for this reason.  By returning right away we
        * avoid creating an input buffer before the caller has supplied
        * its first data.  It is possible for xd3_avail_input to be
        * called both before and after the first call to
        * xd3_encode_input(). */
       if (stream->next_in == NULL)
 	{
 	  return XD3_INPUT;
 	}

     enc_flush:
       /* See if we should buffer the input: either if there is already
        * a leftover buffer, or if the input is short of winsize
        * without flush.  The label at this point is reached by a goto
        * below, when there is leftover input after postout. */
       if ((stream->buf_leftover != NULL) ||
 	  (stream->avail_in < stream->winsize && ! (stream->flags & XD3_FLUSH)))
 	{
 	  if ((ret = xd3_encode_buffer_leftover (stream))) { return ret; }
 	}

       /* Initalize the address cache before each window. */
       xd3_init_cache (& stream->acache);

       stream->input_position    = 0;
       stream->min_match = MIN_MATCH;
       stream->unencoded_offset = 0;

       stream->enc_state = ENC_SEARCH;

       IF_DEBUG1 (DP(RINT "[WINSTART:%"Q"u] input bytes %u offset %"Q"u\n",
 		    stream->current_window, stream->avail_in,
 		    stream->total_in));
       return XD3_WINSTART;

     case ENC_SEARCH:

       /* Reentrant matching. */
       if (stream->src != NULL)
 	{
 	  switch (stream->match_state)
 	    {
 	    case MATCH_TARGET:
 	      /* Try matching forward at the start of the target.
 	       * This is entered the first time through, to check for
 	       * a perfect match, and whenever there is a source match
 	       * that extends to the end of the previous window.  The
 	       * match_srcpos field is initially zero and later set
 	       * during xd3_source_extend_match. */

 	      if (stream->avail_in > 0)
 		{
 		  /* This call can't fail because the source window is
 		     unrestricted. */
 		  ret = xd3_source_match_setup (stream, stream->match_srcpos);
 		  XD3_ASSERT (ret == 0);
 		  stream->match_state = MATCH_FORWARD;
 		}
 	      else
 		{
 		  stream->match_state = MATCH_SEARCHING;
 		  stream->match_fwd = 0;
 		}
 	      XD3_ASSERT (stream->match_fwd == 0);

 	    case MATCH_FORWARD:
 	    case MATCH_BACKWARD:
 	      if (stream->avail_in != 0)
 		{
 		  if ((ret = xd3_source_extend_match (stream)) != 0)
 		    {
 		      return ret;
 		    }

 		  /* The search has to make forward progress here
 		   * or else it can get stuck in a match-backward
 		   * (getsrcblk) then match-forward (getsrcblk),
 		   * find insufficient match length, then repeat
 		   * exactly the same search.
 		   */
 		  stream->input_position += stream->match_fwd;
 		}

 	    case MATCH_SEARCHING:
 	      /* Continue string matching.  (It's possible that the
 	       * initial match continued through the entire input, in
 	       * which case we're still in MATCH_FORWARD and should
 	       * remain so for the next input window.) */
 	      break;
 	    }
 	}

       /* String matching... */
       if (stream->avail_in != 0 &&
 	  (ret = stream->smatcher.string_match (stream)))
 	{
 	  return ret;
 	}

       stream->enc_state = ENC_INSTR;

     case ENC_INSTR:
       /* Note: Jump here to encode VCDIFF deltas w/o using this
        * string-matching code. */

       /* Flush the instrution buffer, then possibly add one more
        * instruction, then emit the header. */
       if ((ret = xd3_iopt_flush_instructions (stream, 1)) ||
           (ret = xd3_iopt_add_finalize (stream)))
 	{
 	  return ret;
 	}

       stream->enc_state = ENC_FLUSH;

     case ENC_FLUSH:
       /* Note: main_recode_func() bypasses string-matching by setting
        * ENC_FLUSH. */
       if ((ret = xd3_emit_hdr (stream)))
 	{
 	  return ret;
 	}

       /* Begin output. */
       stream->enc_current = HDR_HEAD (stream);

       /* Chain all the outputs together.  After doing this, it looks
        * as if there is only one section.  The other enc_heads are set
        * to NULL to avoid freeing them more than once. */
        for (i = 1; i < ENC_SECTS; i += 1)
 	{
 	  stream->enc_tails[i-1]->next_page = stream->enc_heads[i];
 	  stream->enc_heads[i] = NULL;
 	}

     enc_output:

       stream->enc_state  = ENC_POSTOUT;
       stream->next_out   = stream->enc_current->base;
       stream->avail_out  = stream->enc_current->next;
       stream->total_out += (xoff_t) stream->avail_out;

       /* If there is any output in this buffer, return it, otherwise
        * fall through to handle the next buffer or finish the window
        * after all buffers have been output. */
       if (stream->avail_out > 0)
 	{
 	  /* This is the only place xd3_encode returns XD3_OUTPUT */
 	  return XD3_OUTPUT;
 	}

     case ENC_POSTOUT:

       if (stream->avail_out != 0)
 	{
 	  stream->msg = "missed call to consume output";
 	  return XD3_INTERNAL;
 	}

       /* Continue outputting one buffer at a time, until the next is NULL. */
       if ((stream->enc_current = stream->enc_current->next_page) != NULL)
 	{
 	  goto enc_output;
 	}

       stream->total_in += (xoff_t) stream->avail_in;
       stream->enc_state = ENC_POSTWIN;

       IF_DEBUG1 (DP(RINT "[WINFINISH:%"Q"u] in=%"Q"u\n",
 		    stream->current_window,
 		    stream->total_in));
       return XD3_WINFINISH;

     case ENC_POSTWIN:

       xd3_encode_reset (stream);

       stream->current_window += 1;
       stream->enc_state = ENC_INPUT;

       /* If there is leftover input to flush, repeat. */
       if ((stream->buf_leftover != NULL) && (stream->flags & XD3_FLUSH))
 	{
 	  goto enc_flush;
 	}

       /* Ready for more input. */
       return XD3_INPUT;

     default:
       stream->msg = "invalid state";
       return XD3_INTERNAL;
     }
 }
 #endif /* XD3_ENCODER */

 /*****************************************************************
  Client convenience functions
  ******************************************************************/

 static int
 xd3_process_stream (int            is_encode,
 		    xd3_stream    *stream,
 		    int          (*func) (xd3_stream *),
 		    int            close_stream,
 		    const uint8_t *input,
 		    usize_t        input_size,
 		    uint8_t       *output,
 		    usize_t       *output_size,
 		    usize_t        output_size_max)
 {
   usize_t ipos = 0;
   usize_t n = min(stream->winsize, input_size);

   (*output_size) = 0;

   stream->flags |= XD3_FLUSH;

   xd3_avail_input (stream, input + ipos, n);
   ipos += n;

   for (;;)
     {
       int ret;
       switch((ret = func (stream)))
 	{
 	case XD3_OUTPUT: { /* memcpy below */ break; }
 	case XD3_INPUT: {
 	  n = min(stream->winsize, input_size - ipos);
 	  if (n == 0) {
 	    goto done;
 	  }
 	  xd3_avail_input (stream, input + ipos, n);
 	  ipos += n;
 	  continue;
 	}
 	case XD3_GOTHEADER: { /* ignore */ continue; }
 	case XD3_WINSTART: { /* ignore */ continue; }
 	case XD3_WINFINISH: { /* ignore */ continue; }
 	case XD3_GETSRCBLK:
 	  {
 	    stream->msg = "stream requires source input";
 	    return XD3_INTERNAL;
 	  }
 	case 0:
 	  {
 	    /* xd3_encode_input/xd3_decode_input never return 0 */
 	    stream->msg = "invalid return: 0";
 	    return XD3_INTERNAL;
 	  }
 	default:
 	  return ret;
 	}

       if (*output_size + stream->avail_out > output_size_max)
 	{
 	  stream->msg = "insufficient output space";
 	  return ENOSPC;
 	}

       memcpy (output + *output_size, stream->next_out, stream->avail_out);

       *output_size += stream->avail_out;

       xd3_consume_output (stream);
     }
  done:
   return (close_stream == 0) ? 0 : xd3_close_stream (stream);
 }

 static int
 xd3_process_memory (int            is_encode,
 		    int          (*func) (xd3_stream *),
 		    int            close_stream,
 		    const uint8_t *input,
 		    usize_t        input_size,
 		    const uint8_t *source,
 		    usize_t        source_size,
 		    uint8_t       *output,
 		    usize_t       *output_size,
 		    usize_t        output_size_max,
 		    int            flags) {
   xd3_stream stream;
   xd3_config config;
   xd3_source src;
   int ret;

   memset (& stream, 0, sizeof (stream));
   memset (& config, 0, sizeof (config));

   if (input == NULL || output == NULL) {
     stream.msg = "invalid input/output buffer";
     ret = XD3_INTERNAL;
     goto exit;
   }

   config.flags = flags;

   if (is_encode)
     {
       config.srcwin_maxsz = source_size;
       config.winsize = min(input_size, (usize_t) XD3_DEFAULT_WINSIZE);
       config.iopt_size = min(input_size / 32, XD3_DEFAULT_IOPT_SIZE);
       config.iopt_size = max(config.iopt_size, 128U);
       config.sprevsz = xd3_pow2_roundup (config.winsize);
     }

   if ((ret = xd3_config_stream (&stream, &config)) != 0)
     {
       goto exit;
     }

   if (source != NULL)
     {
       memset (& src, 0, sizeof (src));
       src.size = source_size;

       src.blksize = source_size;
       src.onblk = source_size;
       src.curblk = source;
       src.curblkno = 0;

       if ((ret = xd3_set_source (&stream, &src)) != 0)
 	{
 	  goto exit;
 	}
     }

   if ((ret = xd3_process_stream (is_encode,
 				 & stream,
 				 func, 1,
 				 input, input_size,
 				 output,
 				 output_size,
 				 output_size_max)) != 0)
     {
       goto exit;
     }

  exit:
   if (ret != 0) {
     IF_DEBUG1 (DP(RINT "process_memory: %d: %s", ret, stream.msg));
   }
   xd3_free_stream(&stream);
   return ret;
 }

 int
 xd3_decode_stream (xd3_stream    *stream,
 		   const uint8_t *input,
 		   usize_t        input_size,
 		   uint8_t       *output,
 		   usize_t       *output_size,
 		   usize_t        output_size_max)
 {
   return xd3_process_stream (0, stream, & xd3_decode_input, 1,
 			     input, input_size,
 			     output, output_size, output_size_max);
 }

 int
 xd3_decode_memory (const uint8_t *input,
 		   usize_t        input_size,
 		   const uint8_t *source,
 		   usize_t        source_size,
 		   uint8_t       *output,
 		   usize_t       *output_size,
 		   usize_t        output_size_max,
 		   int            flags) {
   return xd3_process_memory (0, & xd3_decode_input, 1,
 			     input, input_size,
 			     source, source_size,
 			     output, output_size, output_size_max,
 			     flags);
 }


 #if XD3_ENCODER
 int
 xd3_encode_stream (xd3_stream    *stream,
 		   const uint8_t *input,
 		   usize_t         input_size,
 		   uint8_t       *output,
 		   usize_t        *output_size,
 		   usize_t         output_size_max)
 {
   return xd3_process_stream (1, stream, & xd3_encode_input, 1,
 			     input, input_size,
 			     output, output_size, output_size_max);
 }

 int
 xd3_encode_memory (const uint8_t *input,
 		   usize_t        input_size,
 		   const uint8_t *source,
 		   usize_t        source_size,
 		   uint8_t       *output,
 		   usize_t        *output_size,
 		   usize_t        output_size_max,
 		   int            flags) {
   return xd3_process_memory (1, & xd3_encode_input, 1,
 			     input, input_size,
 			     source, source_size,
 			     output, output_size, output_size_max,
 			     flags);
 }
 #endif


 /*************************************************************
  String matching helpers
  *************************************************************/

 #if XD3_ENCODER
 /* Do the initial xd3_string_match() checksum table setup.
  * Allocations are delayed until first use to avoid allocation
  * sometimes (e.g., perfect matches, zero-length inputs). */
 static int
 xd3_string_match_init (xd3_stream *stream)
 {
   const int DO_SMALL = ! (stream->flags & XD3_NOCOMPRESS);
   const int DO_LARGE = (stream->src != NULL);

   if (DO_LARGE && stream->large_table == NULL)
     {
       if ((stream->large_table =
 	   (usize_t*) xd3_alloc0 (stream, stream->large_hash.size, sizeof (usize_t))) == NULL)
 	{
 	  return ENOMEM;
 	}
     }

   if (DO_SMALL)
     {
       /* Subsequent calls can return immediately after checking reset. */
       if (stream->small_table != NULL)
 	{
 	  /* The target hash table is reinitialized once per window. */
 	  /* TODO: This would not have to be reinitialized if absolute
 	   * offsets were being stored. */
 	  if (stream->small_reset)
 	    {
 	      stream->small_reset = 0;
 	      memset (stream->small_table, 0,
 		      sizeof (usize_t) * stream->small_hash.size);
 	    }

 	  return 0;
 	}

       if ((stream->small_table =
 	   (usize_t*) xd3_alloc0 (stream,
 				  stream->small_hash.size,
 				  sizeof (usize_t))) == NULL)
 	{
 	  return ENOMEM;
 	}

       /* If there is a previous table needed. */
       if (stream->smatcher.small_lchain > 1 ||
 	  stream->smatcher.small_chain > 1)
 	{
 	  if ((stream->small_prev =
 	       (xd3_slist*) xd3_alloc (stream,
 				       stream->sprevsz,
 				       sizeof (xd3_slist))) == NULL)
 	    {
 	      return ENOMEM;
 	    }
 	}
     }

   return 0;
 }

 #if XD3_USE_LARGEFILE64
 /* This function handles the 32/64bit ambiguity -- file positions are 64bit
  * but the hash table for source-offsets is 32bit. */
 static xoff_t
 xd3_source_cksum_offset(xd3_stream *stream, usize_t low)
 {
   xoff_t scp = stream->srcwin_cksum_pos;
   xoff_t s0 = scp >> 32;

   usize_t sr = (usize_t) scp;

   if (s0 == 0) {
     return low;
   }

   /* This should not be >= because srcwin_cksum_pos is the next
    * position to index. */
   if (low > sr) {
     return (--s0 << 32) | low;
   }

   return (s0 << 32) | low;
 }
 #else
 static xoff_t
 xd3_source_cksum_offset(xd3_stream *stream, usize_t low)
 {
   return (xoff_t) low;
 }
 #endif

 /* This function sets up the stream->src fields srcbase, srclen.  The
  * call is delayed until these values are needed to encode a copy
  * address.  At this point the decision has to be made. */
 static int
 xd3_srcwin_setup (xd3_stream *stream)
 {
   xd3_source *src = stream->src;
   xoff_t length, x;

   /* Check the undecided state. */
   XD3_ASSERT (src->srclen == 0 && src->srcbase == 0);

   /* Avoid repeating this call. */
   stream->srcwin_decided = 1;

   /* If the stream is flushing, then the iopt buffer was able to
    * contain the complete encoding.  If no copies were issued no
    * source window is actually needed.  This prevents the VCDIFF
    * header from including source base/len.  xd3_emit_hdr checks for
    * srclen == 0. */
   if (stream->enc_state == ENC_INSTR && stream->match_maxaddr == 0)
     {
       goto done;
     }

   /* Check for overflow, srclen is usize_t - this can't happen unless
    * XD3_DEFAULT_SRCBACK and related parameters are extreme - should
    * use smaller windows. */
   length = stream->match_maxaddr - stream->match_minaddr;

   x = (xoff_t) USIZE_T_MAX;
   if (length > x)
     {
       stream->msg = "source window length overflow (not 64bit)";
       return XD3_INTERNAL;
     }

   /* If ENC_INSTR, then we know the exact source window to use because
    * no more copies can be issued. */
   if (stream->enc_state == ENC_INSTR)
     {
       src->srcbase = stream->match_minaddr;
       src->srclen  = (usize_t) length;
       XD3_ASSERT (src->srclen);
       goto done;
     }

   /* Otherwise, we have to make a guess.  More copies may still be
    * issued, but we have to decide the source window base and length
    * now.  */
   src->srcbase = stream->match_minaddr;
   src->srclen  = max ((usize_t) length, stream->avail_in + (stream->avail_in >> 2));
   if (src->size < src->srcbase + (xoff_t) src->srclen)
     {
       /* Could reduce srcbase, as well. */
       src->srclen = src->size - src->srcbase;
     }

   XD3_ASSERT (src->srclen);
  done:
   /* Set the taroff.  This convenience variable is used even when
      stream->src == NULL. */
   stream->taroff = src->srclen;
   return 0;
 }

 /* Sets the bounding region for a newly discovered source match, prior
  * to calling xd3_source_extend_match().  This sets the match_maxfwd,
  * match_maxback variables.  Note: srcpos is an absolute position
  * (xoff_t) but the match_maxfwd, match_maxback variables are usize_t.
  * Returns 0 if the setup succeeds, or 1 if the source position lies
  * outside an already-decided srcbase/srclen window. */
 static int
 xd3_source_match_setup (xd3_stream *stream, xoff_t srcpos)
 {
   xd3_source *src = stream->src;
   usize_t greedy_or_not;

   stream->match_maxback = 0;
   stream->match_maxfwd  = 0;
   stream->match_back    = 0;
   stream->match_fwd     = 0;

   /* This avoids a non-blocking endless loop caused by scanning
    * backwards across a block boundary, only to find not enough
    * matching bytes to beat the current min_match due to a better lazy
    * target match: the re-entry to xd3_string_match() repeats the same
    * long match because the input position hasn't changed.  TODO: if
    * ever duplicates are added to the source hash table, this logic
    * won't suffice to avoid loops.  See testing/regtest.cc's
    * TestNonBlockingProgress test! */
   if (srcpos != 0 && srcpos == stream->match_last_srcpos)
     {
       goto bad;
     }

   /* Going backwards, the 1.5-pass algorithm allows some
    * already-matched input may be covered by a longer source match.
    * The greedy algorithm does not allow this. */
   if (stream->flags & XD3_BEGREEDY)
     {
       /* The greedy algorithm allows backward matching to the last
 	 matched position. */
       greedy_or_not = xd3_iopt_last_matched (stream);
     }
   else
     {
       /* The 1.5-pass algorithm allows backward matching to go back as
        * far as the unencoded offset, which is updated as instructions
        * pass out of the iopt buffer.  If this (default) is chosen, it
        * means xd3_iopt_erase may be called to eliminate instructions
        * when a covering source match is found. */
       greedy_or_not = stream->unencoded_offset;
     }

   /* Backward target match limit. */
   XD3_ASSERT (stream->input_position >= greedy_or_not);
   stream->match_maxback = stream->input_position - greedy_or_not;

   /* Forward target match limit. */
   XD3_ASSERT (stream->avail_in > stream->input_position);
   stream->match_maxfwd = stream->avail_in - stream->input_position;

   /* Now we take the source position into account.  It depends whether
    * the srclen/srcbase have been decided yet. */
   if (stream->srcwin_decided == 0)
     {
       /* Unrestricted case: the match can cover the entire source,
        * 0--src->size.  We compare the usize_t
        * match_maxfwd/match_maxback against the xoff_t
        * src->size/srcpos values and take the min. */
       xoff_t srcavail;

       if (srcpos < (xoff_t) stream->match_maxback)
 	{
 	  stream->match_maxback = srcpos;
 	}

       srcavail = src->size - srcpos;
       if (srcavail < (xoff_t) stream->match_maxfwd)
 	{
 	  stream->match_maxfwd = srcavail;
 	}

       goto good;
     }

   /* Decided some source window. */
   XD3_ASSERT (src->srclen > 0);

   /* Restricted case: fail if the srcpos lies outside the source window */
   if ((srcpos < src->srcbase) || (srcpos > (src->srcbase + (xoff_t) src->srclen)))
     {
       goto bad;
     }
   else
     {
       usize_t srcavail;

       srcavail = (usize_t) (srcpos - src->srcbase);
       if (srcavail < stream->match_maxback)
 	{
 	  stream->match_maxback = srcavail;
 	}

       srcavail = (usize_t) (src->srcbase + (xoff_t) src->srclen - srcpos);
       if (srcavail < stream->match_maxfwd)	{
 	  stream->match_maxfwd = srcavail;
 	}

       goto good;
     }

  good:
   stream->match_state  = MATCH_BACKWARD;
   stream->match_srcpos = srcpos;
   stream->match_last_srcpos = srcpos;
   return 0;

  bad:
   stream->match_state  = MATCH_SEARCHING;
   return 1;
 }

 /* This code is experimental, and I'm having trouble benchmarking
  * it reliably. */
 #if 0
 static inline int
 xd3_forward_match(const uint8_t *s1c, const uint8_t *s2c, size_t n)
 {
   size_t i = 0;
 #if UNALIGNED_OK
   size_t nint = n / sizeof(int);

   if (nint >> 3)
     {
       size_t j = 0;
       const int *s1 = (const int*)s1c;
       const int *s2 = (const int*)s2c;
       size_t nint_8 = nint - 8;

       while (i <= nint_8 &&
 	     s1[i++] == s2[j++] &&
 	     s1[i++] == s2[j++] &&
 	     s1[i++] == s2[j++] &&
 	     s1[i++] == s2[j++] &&
 	     s1[i++] == s2[j++] &&
 	     s1[i++] == s2[j++] &&
 	     s1[i++] == s2[j++] &&
 	     s1[i++] == s2[j++]) { }

       i = (i - 1) * sizeof(int);
     }
 #endif

   while (i < n && s1c[i] == s2c[i])
     {
       i++;
     }
   return i;
 }
 #else
 static inline usize_t
 xd3_forward_match(const uint8_t *s1c,
 		  const uint8_t *s2c,
 		  usize_t n) {
   usize_t i = 0;
   while (i < n && s1c[i] == s2c[i])
     {
       i++;
     }
   return i;
 }
 #endif


 /* This function expands the source match backward and forward.  It is
  * reentrant, since xd3_getblk may return XD3_GETSRCBLK, so most
  * variables are kept in xd3_stream.  There are two callers of this
  * function, the string_matching routine when a checksum match is
  * discovered, and xd3_encode_input whenever a continuing (or initial)
  * match is suspected.  The two callers do different things with the
  * input_position, thus this function leaves that variable untouched.
  * If a match is taken the resulting stream->match_fwd is left
  * non-zero. */
 static int
 xd3_source_extend_match (xd3_stream *stream)
 {
   int ret;
   xd3_source *src = stream->src;
   xoff_t matchoff;  /* matchoff is the current right/left-boundary of
 		       the source match being tested. */
   usize_t streamoff; /* streamoff is the current right/left-boundary
 			of the input match being tested. */
   xoff_t tryblk;    /* tryblk, tryoff are the block, offset position
 		       of matchoff */
   usize_t tryoff;
   usize_t tryrem;    /* tryrem is the number of matchable bytes */
   usize_t matched;

   XD3_ASSERT (src != NULL);

   /* Does it make sense to compute backward match AFTER forward match? */
   if (stream->match_state == MATCH_BACKWARD)
     {
       /* Note: this code is practically duplicated below, substituting
        * match_fwd/match_back and direction.  Consolidate? */
       matchoff  = stream->match_srcpos - stream->match_back;
       streamoff = stream->input_position - stream->match_back;
       xd3_blksize_div (matchoff, src, &tryblk, &tryoff);

       /* this loops backward over source blocks */
       while (stream->match_back < stream->match_maxback)
 	{
 	  /* see if we're backing across a source block boundary */
 	  if (tryoff == 0)
 	    {
 	      tryoff  = src->blksize;
 	      tryblk -= 1;
 	    }

 	  if ((ret = xd3_getblk (stream, tryblk)))
 	    {
 	      /* if search went too far back, continue forward. */
 	      if (ret == XD3_TOOFARBACK)
 		{
 		  break;
 		}

 	      /* could be a XD3_GETSRCBLK failure. */
 	      return ret;
 	    }

 	  /* TODO: This code can be optimized similar to xd3_match_forward() */
 	  for (tryrem = min (tryoff, stream->match_maxback -
 			     stream->match_back);
 	       tryrem != 0;
 	       tryrem -= 1, stream->match_back += 1)
 	    {
 	      if (src->curblk[tryoff-1] != stream->next_in[streamoff-1])
 		{
 		  goto doneback;
 		}

 	      tryoff    -= 1;
 	      streamoff -= 1;
 	    }
 	}

     doneback:
       stream->match_state = MATCH_FORWARD;
     }

   XD3_ASSERT (stream->match_state == MATCH_FORWARD);

   matchoff  = stream->match_srcpos + stream->match_fwd;
   streamoff = stream->input_position + stream->match_fwd;
   xd3_blksize_div (matchoff, src, & tryblk, & tryoff);

   /* Note: practically the same code as backwards case above: same comments */
   while (stream->match_fwd < stream->match_maxfwd)
     {
       if (tryoff == src->blksize)
 	{
 	  tryoff  = 0;
 	  tryblk += 1;
 	}

       if ((ret = xd3_getblk (stream, tryblk)))
 	{
 	  /* if search went too far back, continue forward. */
 	  if (ret == XD3_TOOFARBACK)
 	    {
 	      break;
 	    }

 	  /* could be a XD3_GETSRCBLK failure. */
 	  return ret;
 	}

       tryrem = min(stream->match_maxfwd - stream->match_fwd,
 		   src->blksize - tryoff);

       matched = xd3_forward_match(src->curblk + tryoff,
 				  stream->next_in + streamoff,
 				  tryrem);
       tryoff += matched;
       streamoff += matched;
       stream->match_fwd += matched;

       if (tryrem != matched)
 	{
 	  break;
 	}
     }

   stream->match_state = MATCH_SEARCHING;

   /* If the match ends short of the last instruction end, we probably
    * don't want it.  There is the possibility that a copy ends short
    * of the last copy but also goes further back, in which case we
    * might want it.  This code does not implement such: if so we would
    * need more complicated xd3_iopt_erase logic. */
   if (stream->match_fwd < stream->min_match)
     {
       stream->match_fwd = 0;
     }
   else
     {
       usize_t total  = stream->match_fwd + stream->match_back;

       /* Correct the variables to remove match_back from the equation. */
       usize_t target_position = stream->input_position - stream->match_back;
       usize_t match_length   = stream->match_back      + stream->match_fwd;
       xoff_t match_position  = stream->match_srcpos    - stream->match_back;
       xoff_t match_end       = stream->match_srcpos    + stream->match_fwd;

       /* At this point we may have to erase any iopt-buffer
        * instructions that are fully covered by a backward-extending
        * copy. */
       if (stream->match_back > 0)
 	{
 	  xd3_iopt_erase (stream, target_position, total);
 	}

       stream->match_back = 0;

       /* Update ranges.  The first source match occurs with both
 	 values set to 0. */
       if (stream->match_maxaddr == 0 ||
 	  match_position < stream->match_minaddr)
 	{
 	  stream->match_minaddr = match_position;
 	}

       if (match_end > stream->match_maxaddr)
 	{
 	  /* Note: per-window */
 	  stream->match_maxaddr = match_end;
 	}

       if (match_end > stream->maxsrcaddr)
 	{
 	  /* Note: across windows */
 	  stream->maxsrcaddr = match_end;
 	}

       IF_DEBUG1 ({
 	static int x = 0;
 	DP(RINT "[source match:%d] <inp %"Q"u %"Q"u>  <src %"Q"u %"Q"u> (%s) [ %u bytes ]\n",
 	   x++,
 	   stream->total_in + target_position,
 	   stream->total_in + target_position + match_length,
 	   match_position,
 	   match_position + match_length,
 	   (stream->total_in + target_position == match_position) ? "same" : "diff",
 	   match_length);
       });

       if ((ret = xd3_found_match (stream,
 				  /* decoder position */ target_position,
 				  /* length */ match_length,
 				  /* address */ match_position,
 				  /* is_source */ 1)))
 	{
 	  return ret;
 	}

       /* If the match ends with the available input: */
       if (target_position + match_length == stream->avail_in)
 	{
 	  /* Setup continuing match for the next window. */
 	  stream->match_state  = MATCH_TARGET;
 	  stream->match_srcpos = match_end;
 	}
     }

   return 0;
 }

 /* Update the small hash.  Values in the small_table are offset by
  * HASH_CKOFFSET (1) to distinguish empty buckets from real offsets. */
 static void
 xd3_scksum_insert (xd3_stream *stream,
 		   usize_t inx,
 		   usize_t scksum,
 		   usize_t pos)
 {
   /* If we are maintaining previous duplicates. */
   if (stream->small_prev)
     {
       usize_t    last_pos = stream->small_table[inx];
       xd3_slist *pos_list = & stream->small_prev[pos & stream->sprevmask];

       /* Note last_pos is offset by HASH_CKOFFSET. */
       pos_list->last_pos = last_pos;
     }

   /* Enter the new position into the hash bucket. */
   stream->small_table[inx] = pos + HASH_CKOFFSET;
 }

 #if XD3_DEBUG
 static int
 xd3_check_smatch (const uint8_t *ref0, const uint8_t *inp0,
 		  const uint8_t *inp_max, usize_t cmp_len)
 {
   usize_t i;

   for (i = 0; i < cmp_len; i += 1)
     {
       XD3_ASSERT (ref0[i] == inp0[i]);
     }

   if (inp0 + cmp_len < inp_max)
     {
       XD3_ASSERT (inp0[i] != ref0[i]);
     }

   return 1;
 }
 #endif /* XD3_DEBUG */

 /* When the hash table indicates a possible small string match, it
  * calls this routine to find the best match.  The first matching
  * position is taken from the small_table, HASH_CKOFFSET is subtracted
  * to get the actual position.  After checking that match, if previous
  * linked lists are in use (because stream->smatcher.small_chain > 1),
  * previous matches are tested searching for the longest match.  If
  * (stream->min_match > MIN_MATCH) then a lazy match is in effect.
  */
 static usize_t
 xd3_smatch (xd3_stream *stream,
 	    usize_t base,
 	    usize_t scksum,
 	    usize_t *match_offset)
 {
   usize_t cmp_len;
   usize_t match_length = 0;
   usize_t chain = (stream->min_match == MIN_MATCH ?
                    stream->smatcher.small_chain :
                    stream->smatcher.small_lchain);
   const uint8_t *inp_max = stream->next_in + stream->avail_in;
   const uint8_t *inp;
   const uint8_t *ref;

   SMALL_HASH_DEBUG1 (stream, stream->next_in + stream->input_position);

   XD3_ASSERT (stream->min_match + stream->input_position <= stream->avail_in);

   base -= HASH_CKOFFSET;

  again:

   IF_DEBUG2 (DP(RINT "smatch at base=%u inp=%u cksum=%u\n", base,
                 stream->input_position, scksum));

   /* For small matches, we can always go to the end-of-input because
    * the matching position must be less than the input position. */
   XD3_ASSERT (base < stream->input_position);

   ref = stream->next_in + base;
   inp = stream->next_in + stream->input_position;

   SMALL_HASH_DEBUG2 (stream, ref);

   /* Expand potential match forward. */
   while (inp < inp_max && *inp == *ref)
     {
       ++inp;
       ++ref;
     }

   cmp_len = inp - (stream->next_in + stream->input_position);

   /* Verify correctness */
   XD3_ASSERT (xd3_check_smatch (stream->next_in + base,
 				stream->next_in + stream->input_position,
 				inp_max, cmp_len));

   /* Update longest match */
   if (cmp_len > match_length)
     {
       ( match_length) = cmp_len;
       (*match_offset) = base;

       /* Stop if we match the entire input or have a long_enough match. */
       if (inp == inp_max || cmp_len >= stream->smatcher.long_enough)
 	{
 	  goto done;
 	}
     }

   /* If we have not reached the chain limit, see if there is another
      previous position. */
   while (--chain != 0)
     {
       /* Calculate the previous offset. */
       usize_t prev_pos = stream->small_prev[base & stream->sprevmask].last_pos;
       usize_t diff_pos;

        if (prev_pos == 0)
  	{
  	  break;
  	}

       prev_pos -= HASH_CKOFFSET;

       if (prev_pos > base)
         {
           break;
         }

       base = prev_pos;

       XD3_ASSERT (stream->input_position > base);
       diff_pos = stream->input_position - base;

       /* Stop searching if we go beyond sprevsz, since those entries
        * are for unrelated checksum entries. */
       if (diff_pos & ~stream->sprevmask)
         {
           break;
         }

       goto again;
     }

  done:
   /* Crude efficiency test: if the match is very short and very far back, it's
    * unlikely to help, but the exact calculation requires knowing the state of
    * the address cache and adjacent instructions, which we can't do here.
    * Rather than encode a probably inefficient copy here and check it later
    * (which complicates the code a lot), do this:
    */
   if (match_length == 4 && stream->input_position - (*match_offset) >= 1<<14)
     {
       /* It probably takes >2 bytes to encode an address >= 2^14 from here */
       return 0;
     }
   if (match_length == 5 && stream->input_position - (*match_offset) >= 1<<21)
     {
       /* It probably takes >3 bytes to encode an address >= 2^21 from here */
       return 0;
     }

   /* It's unlikely that a window is large enough for the (match_length == 6 &&
    * address >= 2^28) check */
   return match_length;
 }

 #if XD3_DEBUG
 static void
 xd3_verify_small_state (xd3_stream    *stream,
 			const uint8_t *inp,
 			uint32_t          x_cksum)
 {
   uint32_t state;
   uint32_t cksum = xd3_scksum (&state, inp, stream->smatcher.small_look);

   XD3_ASSERT (cksum == x_cksum);
 }

 static void
 xd3_verify_large_state (xd3_stream    *stream,
 			const uint8_t *inp,
 			uint32_t          x_cksum)
 {
   uint32_t cksum = xd3_lcksum (inp, stream->smatcher.large_look);
   XD3_ASSERT (cksum == x_cksum);
 }
 static void
 xd3_verify_run_state (xd3_stream    *stream,
 		      const uint8_t *inp,
 		      int            x_run_l,
 		      uint8_t        x_run_c)
 {
   int     slook = stream->smatcher.small_look;
   uint8_t run_c;
   int     run_l = xd3_comprun (inp, slook, &run_c);

   XD3_ASSERT (run_l == 0 || run_c == x_run_c);
   XD3_ASSERT (x_run_l > slook || run_l == x_run_l);
 }
 #endif /* XD3_DEBUG */

 /* This function computes more source checksums to advance the window.
  * Called at every entrance to the string-match loop and each time
  * stream->input_position reaches the value returned as
  * *next_move_point.  NB: this is one of the most expensive functions
  * in this code and also the most critical for good compression.
  *
  * TODO: really would like a good test for this logic. how?
  * Update: testing/regtest.cc has some basic tests, more would be nice.
  * TODO: optimize the inner loop
  */
 static int
 xd3_srcwin_move_point (xd3_stream *stream, usize_t *next_move_point)
 {
   xoff_t logical_input_cksum_pos;

   XD3_ASSERT(stream->srcwin_cksum_pos <= stream->src->size);
   if (stream->srcwin_cksum_pos == stream->src->size)
     {
       *next_move_point = USIZE_T_MAX;
       return 0;
     }

   /* Begin by advancing at twice the input rate, up to half the
    * maximum window size. */
   logical_input_cksum_pos = min((stream->total_in + stream->input_position) * 2,
 				(stream->total_in + stream->input_position) +
 				  (stream->srcwin_maxsz / 2));

   /* If srcwin_cksum_pos is already greater, wait until the difference
    * is met. */
   if (stream->srcwin_cksum_pos > logical_input_cksum_pos)
     {
       *next_move_point = stream->input_position +
 	(usize_t)(stream->srcwin_cksum_pos - logical_input_cksum_pos);
       return 0;
     }

   /* A long match may have extended past srcwin_cksum_pos.  Don't
    * start checksumming already-matched source data. */
   if (stream->maxsrcaddr > stream->srcwin_cksum_pos)
     {
       stream->srcwin_cksum_pos = stream->maxsrcaddr;
     }

   if (logical_input_cksum_pos < stream->srcwin_cksum_pos)
     {
       logical_input_cksum_pos = stream->srcwin_cksum_pos;
     }

   /* Advance at least one source block.  With the command-line
    * defaults this means:
    *
    * if (src->size <= srcwin_maxsz), index the entire source at once
    * using the position of the first non-match.  This is good for
    * small inputs, especially when the content may have moved anywhere
    * in the file (e.g., tar files).
    *
    * if (src->size > srcwin_maxsz), index at least one block (which
    * the command-line sets to 1/32 of srcwin_maxsz) ahead of the
    * logical position.  This is good for different reasons: when a
    * long match spanning several source blocks is encountered, this
    * avoids computing checksums for those blocks.  If the data can
    * move anywhere, this is bad.
    */
   logical_input_cksum_pos += stream->src->blksize;

   IF_DEBUG1 (DP(RINT "[srcwin_move_point] T=%"Q"u S=%"Q"u/%"Q"u\n",
 	       stream->total_in + stream->input_position,
 	       stream->srcwin_cksum_pos,
 	       logical_input_cksum_pos));

   while (stream->srcwin_cksum_pos < logical_input_cksum_pos &&
 	 stream->srcwin_cksum_pos < stream->src->size)
     {
       xoff_t  blkno;
       xoff_t  blkbaseoffset;
       usize_t blkrem;
       ssize_t oldpos;
       ssize_t blkpos;
       int ret;
       xd3_blksize_div (stream->srcwin_cksum_pos,
 		       stream->src, &blkno, &blkrem);
       oldpos = blkrem;
       blkpos = xd3_bytes_on_srcblk_fast (stream->src, blkno);

       if (oldpos + stream->smatcher.large_look > (usize_t) blkpos)
 	{
 	  stream->srcwin_cksum_pos = (blkno + 1) * stream->src->blksize;
 	  continue;
 	}

       if ((ret = xd3_getblk (stream, blkno)))
 	{
 	  /* TOOFARBACK should never occur here, since we read forward. */
 	  if (ret == XD3_TOOFARBACK)
 	    {
  	      ret = XD3_INTERNAL;
 	    }
 	  return ret;
 	}

       /* This inserts checksums for the entire block, in reverse,
        * starting from the end of the block.  This logic does not test
        * stream->srcwin_cksum_pos because it always advances it to the
        * start of the next block.
        *
        * oldpos is the srcwin_cksum_pos within this block.  blkpos is
        * the number of bytes available.  Each iteration inspects
        * large_look bytes then steps back large_step bytes.  The
        * if-stmt above ensures at least one large_look of data. */
       blkpos -= stream->smatcher.large_look;
       blkbaseoffset = stream->src->blksize * blkno;

       do
 	{
 	  uint32_t cksum = xd3_lcksum (stream->src->curblk + blkpos,
 				       stream->smatcher.large_look);
 	  usize_t hval = xd3_checksum_hash (& stream->large_hash, cksum);

 	  stream->large_table[hval] =
 	    (usize_t) (blkbaseoffset +
 		       (xoff_t)(blkpos + HASH_CKOFFSET));

 	  IF_DEBUG (stream->large_ckcnt += 1);

 	  blkpos -= stream->smatcher.large_step;
 	}
       while (blkpos >= oldpos);

       stream->srcwin_cksum_pos = (blkno + 1) * stream->src->blksize;
     }

   if (stream->srcwin_cksum_pos >= stream->src->size)
     {
       /* This invariant is needed for xd3_source_cksum_offset() */
       stream->srcwin_cksum_pos = stream->src->size;
       *next_move_point = USIZE_T_MAX;
       return 0;
     }

   /* How long until this function should be called again. */
   XD3_ASSERT(stream->srcwin_cksum_pos >= logical_input_cksum_pos);
   *next_move_point = stream->input_position + 1 +
     (usize_t)(stream->srcwin_cksum_pos - logical_input_cksum_pos);
   return 0;
 }

 #endif /* XD3_ENCODER */

 /********************************************************************
  TEMPLATE pass
  *********************************************************************/

 #endif /* __XDELTA3_C_INLINE_PASS__ */
 #ifdef __XDELTA3_C_TEMPLATE_PASS__

 #if XD3_ENCODER

 /********************************************************************
  Templates
  *******************************************************************/

 /* Template macros */
 #define XD3_TEMPLATE(x)      XD3_TEMPLATE2(x,TEMPLATE)
 #define XD3_TEMPLATE2(x,n)   XD3_TEMPLATE3(x,n)
 #define XD3_TEMPLATE3(x,n)   x ## n
 #define XD3_STRINGIFY(x)     XD3_STRINGIFY2(x)
 #define XD3_STRINGIFY2(x)    #x

 static int XD3_TEMPLATE(xd3_string_match_) (xd3_stream *stream);

 static const xd3_smatcher XD3_TEMPLATE(__smatcher_) =
 {
   XD3_STRINGIFY(TEMPLATE),
   XD3_TEMPLATE(xd3_string_match_),
 #if SOFTCFG == 1
   0, 0, 0, 0, 0, 0, 0
 #else
   LLOOK, LSTEP, SLOOK, SCHAIN, SLCHAIN, MAXLAZY, LONGENOUGH
 #endif
 };

 static int
 XD3_TEMPLATE(xd3_string_match_) (xd3_stream *stream)
 {
   const int      DO_SMALL = ! (stream->flags & XD3_NOCOMPRESS);
   const int      DO_LARGE = (stream->src != NULL);
   const int      DO_RUN   = (1);

   const uint8_t *inp;
   uint32_t       scksum = 0;
   uint32_t       scksum_state;
   uint32_t       lcksum = 0;
   usize_t         sinx;
   usize_t         linx;
   uint8_t        run_c;
   size_t          run_l;
   int            ret;
   usize_t         match_length;
   usize_t         match_offset = 0;
   usize_t         next_move_point;

   /* If there will be no compression due to settings or short input,
    * skip it entirely. */
   if (! (DO_SMALL || DO_LARGE || DO_RUN) ||
       stream->input_position + SLOOK > stream->avail_in) { goto loopnomore; }

   if ((ret = xd3_string_match_init (stream))) { return ret; }

   /* The restartloop label is reached when the incremental loop state
    * needs to be reset. */
  restartloop:

   /* If there is not enough input remaining for any kind of match,
      skip it. */
   if (stream->input_position + SLOOK > stream->avail_in) { goto loopnomore; }

   /* Now reset the incremental loop state: */

   /* The min_match variable is updated to avoid matching the same lazy
    * match over and over again.  For example, if you find a (small)
    * match of length 9 at one position, you will likely find a match
    * of length 8 at the next position. */
   if (xd3_iopt_last_matched (stream) > stream->input_position)
     {
       stream->min_match = max(MIN_MATCH,
 			      1 + xd3_iopt_last_matched(stream) -
 			      stream->input_position);
     }
   else
     {
       stream->min_match = MIN_MATCH;
     }

   /* The current input byte. */
   inp = stream->next_in + stream->input_position;

   /* Small match state. */
   if (DO_SMALL)
     {
       scksum = xd3_scksum (&scksum_state, inp, SLOOK);
     }

   /* Run state. */
   if (DO_RUN)
     {
       run_l = xd3_comprun (inp, SLOOK, & run_c);
     }

   /* Large match state.  We continue the loop even after not enough
    * bytes for LLOOK remain, so always check stream->input_position in
    * DO_LARGE code. */
   if (DO_LARGE && (stream->input_position + LLOOK <= stream->avail_in))
     {
       /* Source window: next_move_point is the point that
        * stream->input_position must reach before computing more
        * source checksum. */
       if ((ret = xd3_srcwin_move_point (stream, & next_move_point)))
 	{
 	  return ret;
 	}

       lcksum = xd3_lcksum (inp, LLOOK);
     }

   /* TRYLAZYLEN: True if a certain length match should be followed by
    * lazy search.  This checks that LEN is shorter than MAXLAZY and
    * that there is enough leftover data to consider lazy matching.
    * "Enough" is set to 2 since the next match will start at the next
    * offset, it must match two extra characters. */
 #define TRYLAZYLEN(LEN,POS,MAX) ((MAXLAZY) > 0 && (LEN) < (MAXLAZY) \
 				 && (POS) + (LEN) <= (MAX) - 2)

   /* HANDLELAZY: This statement is called each time an instruciton is
    * emitted (three cases).  If the instruction is large enough, the
    * loop is restarted, otherwise lazy matching may ensue. */
 #define HANDLELAZY(mlen) \
   if (TRYLAZYLEN ((mlen), (stream->input_position), (stream->avail_in))) \
     { stream->min_match = (mlen) + LEAST_MATCH_INCR; goto updateone; } \
   else \
     { stream->input_position += (mlen); goto restartloop; }

   /* Now loop over one input byte at a time until a match is found... */
   for (;; inp += 1, stream->input_position += 1)
     {
       /* Now we try three kinds of string match in order of expense:
        * run, large match, small match. */

       /* Expand the start of a RUN.  The test for (run_l == SLOOK)
        * avoids repeating this check when we pass through a run area
        * performing lazy matching.  The run is only expanded once when
        * the min_match is first reached.  If lazy matching is
        * performed, the run_l variable will remain inconsistent until
        * the first non-running input character is reached, at which
        * time the run_l may then again grow to SLOOK. */
       if (DO_RUN && run_l == SLOOK)
 	{
 	  usize_t max_len = stream->avail_in - stream->input_position;

 	  IF_DEBUG (xd3_verify_run_state (stream, inp, run_l, run_c));

 	  while (run_l < max_len && inp[run_l] == run_c) { run_l += 1; }

 	  /* Output a RUN instruction. */
 	  if (run_l >= stream->min_match && run_l >= MIN_RUN)
 	    {
 	      if ((ret = xd3_emit_run (stream, stream->input_position,
 				       run_l, run_c))) { return ret; }

 	      HANDLELAZY (run_l);
 	    }
 	}

       /* If there is enough input remaining. */
       if (DO_LARGE && (stream->input_position + LLOOK <= stream->avail_in))
 	{
 	  if ((stream->input_position >= next_move_point) &&
 	      (ret = xd3_srcwin_move_point (stream, & next_move_point)))
 	    {
 	      return ret;
 	    }

 	  linx = xd3_checksum_hash (& stream->large_hash, lcksum);

 	  IF_DEBUG (xd3_verify_large_state (stream, inp, lcksum));

 	  if (stream->large_table[linx] != 0)
 	    {
 	      /* the match_setup will fail if the source window has
 	       * been decided and the match lies outside it.  You
 	       * could consider forcing a window at this point to
 	       * permit a new source window. */
 	      xoff_t adj_offset =
 		xd3_source_cksum_offset(stream,
 					stream->large_table[linx] -
 					HASH_CKOFFSET);
 	      if (xd3_source_match_setup (stream, adj_offset) == 0)
 		{
 		  if ((ret = xd3_source_extend_match (stream)))
 		    {
 		      return ret;
 		    }

 		  /* Update stream position.  match_fwd is zero if no
 		   * match. */
 		  if (stream->match_fwd > 0)
 		    {
 		      HANDLELAZY (stream->match_fwd);
 		    }
 		}
 	    }
 	}

       /* Small matches. */
       if (DO_SMALL)
 	{
 	  sinx = xd3_checksum_hash (& stream->small_hash, scksum);

 	  /* Verify incremental state in debugging mode. */
 	  IF_DEBUG (xd3_verify_small_state (stream, inp, scksum));

 	  /* Search for the longest match */
 	  if (stream->small_table[sinx] != 0)
 	    {
 	      match_length = xd3_smatch (stream,
 					 stream->small_table[sinx],
 					 scksum,
 					 & match_offset);
 	    }
 	  else
 	    {
 	      match_length = 0;
 	    }

 	  /* Insert a hash for this string. */
 	  xd3_scksum_insert (stream, sinx, scksum, stream->input_position);

 	  /* Maybe output a COPY instruction */
 	  if (match_length >= stream->min_match)
 	    {
 	      IF_DEBUG1 ({
 		static int x = 0;
 		DP(RINT "[target match:%d] <inp %u %u>  <cpy %u %u> "
 		   "(-%d) [ %u bytes ]\n",
 		   x++,
 		   stream->input_position,
 		   stream->input_position + match_length,
 		   match_offset,
 		   match_offset + match_length,
 		   stream->input_position - match_offset,
 		   match_length);
 	      });

 	      if ((ret = xd3_found_match (stream,
 					  /* decoder position */
 					  stream->input_position,
 					  /* length */ match_length,
 					  /* address */ match_offset,
 					  /* is_source */ 0)))
 		{
 		  return ret;
 		}

 	      /* Copy instruction. */
 	      HANDLELAZY (match_length);
 	    }
 	}

       /* The logic above prevents excess work during lazy matching by
        * increasing min_match to avoid smaller matches.  Each time we
        * advance stream->input_position by one, the minimum match
        * shortens as well.  */
       if (stream->min_match > MIN_MATCH)
 	{
 	  stream->min_match -= 1;
 	}

     updateone:

       /* See if there are no more incremental cksums to compute. */
       if (stream->input_position + SLOOK == stream->avail_in)
 	{
 	  goto loopnomore;
 	}

       /* Compute next RUN, CKSUM */
       if (DO_RUN) { NEXTRUN (inp[SLOOK]); }
       if (DO_SMALL)
 	{
 	  scksum = xd3_small_cksum_update (&scksum_state, inp, SLOOK);
 	}
       if (DO_LARGE && (stream->input_position + LLOOK < stream->avail_in))
 	{
 	  lcksum = xd3_large_cksum_update (lcksum, inp, LLOOK);
 	}
     }

  loopnomore:
   return 0;
 }
 #endif /* XD3_ENCODER */
 #endif /* __XDELTA3_C_TEMPLATE_PASS__ */