Hufint

De Lillois Fractale Wiki
Aller à la navigation Aller à la recherche

Purpose

A Hufint is a variable-length-coded integer, with an optimal compression perspective.

Hufint stands for Huffman integer, or Huffamn coded integer. It is related to the general Huffman coding method.

The contract of this coding tool is:

  • allow to encode any positive integer (including 0) a a set of bits
  • allow a non-ambiguous decoding algorithm
  • use less bits for smaller integer
  • allow non-ambiguous encoding/decoding for arrays of integers.

Use

Hufint are used in the Matscape projects, wherever

  • integer values are likely to be small (example: number of child in a chess tree)
  • integer values are allowed to be large (but rather seldom)
  • integer statistical distribution is decreasing
  • encoding size is critical
  • speed for math operations on the values is not critical

Anastomochess is a project where hufints might help a lot.

Optimality

It may be shown (but the demonstration is left to a brave subject) that the Hufint coding is optimal when integer densities are distributed as the inverse of the logarithm of their values...

P(n) = k/log(n)

It may also be tested experimentally.

Hufint slice and base

The Hufint slice is the slice size for the Hufint bitset representation.

2slice-1 is then called Hufint base.

Matscape uses Hufint-6 representations (Hufint slice = 6, Hufint base = 32).

  • numbers from 0 to base-1 (31) are represented on 6 bits
  • numbers from base (32) to base2-1 (1023) are represented on 12 bits
  • numbers from base2 (1024) to base3-1 (32767) are represented on 18 bits
  • etc...

Algorithm

The algorithm is rather starightforward but not documented here.

At least two public functions (in a java lib) are defined:

int f(BitSet) /* conversion of bitSet into a positive integer - decoding */

BitSet f(int) /* conversion of a positive integer into bitSet - encoding */ 

Similar functions are available for arrays of integers.
Various other convenience functions aredefined for reporting and checking.