Joseph K. Myers

Thursday, June 27, 2002

Primer, Primed

Prime numbers are very unpredictable, and provide a serious challenge to compression algorithms.

The function of the primer tools is to reduce prime number lists to the minimum necessary for storage. The list 2, 3, 5, 7 would become 2, 1, 2, 2; recovery would add each number in turn to the previous--the second number becomes 3 (1 + 2), the third becomes 5 (3 + 2), etc.

A recommended method for storage of these lists is simply one prime per line.

Prime number generators:

I recommend primegen by D. J. Bernstein (see http://cr.yp.to/primegen.html)

Prime number lists:

The 82,025 prime numbers from 2 - 2^20 are listed in primer format in ../math/primes-M.gz.

Usage:

primes ... | primed

primer < input

For example, you would read the prime numbers in primes-M with this command:

primer < primes-M

Performance:

Although it is a lack-luster process in terms of complexity, the space-saving benefits of primer are very real. The file primes-M would ordinarily consume 566,684 bytes; after primed's difference-reduction, the size is 209,127, and only 49,067 after gzip -9. Note that further compression is enhanced _after_ the primer process, rather than before. Mere gzip -9 on the original file is 64.8% (566,684 in, 199,119 out); afterwards, the ratio is 76.5%, (209,127 in, 49,067 out).

Note that the overall compression in this example was 91.3% (100* (1 - out/in)).

Download:

primer.tar.gz (388 bytes)

http://www.myersdaily.org/joseph/unix/primer.txt