An Experiment In Data Compression

There are a finite number of compression algorithms. One strategy is to catalog differences in consecutive values and compress a dictionary of the differences.

A simple experiment to perform in Rust or C++20, is as follows.

1. Read the file as an array of unsigned int (16 bits each.) To do this, it is important to understand encoding. I avoid Microsoft's Unicode encoding. An alternative is to read the file in 8-bit bytes, and concatenate every two-byte pair into an array of unsigned int. The numeric value represented is not the purpose. Declare a large enough array space in RAM to accommodate some max_size value.

2. delta_encode the array. There is standardized code to do this. Python3 has a function in module numpy.

3. Golomb-encode the resulting dictionary or array. There is standardized code to do this. This falls into the category of run-length encoding.

Viewing pixels from a raw photo format or an mp3 as unsigned int is the innovation suggested. 

4. Write the result in a file with a ".gcf" compression extension.

5. If the file is already compressed, Golomb-decode first, then delta-decode, dropping any null byte at the end to pad an odd non-pair byte.

6. Save as original file, deleting ".gcf"

This is my attempt to be novel at the compression game.

Comments

Popular posts from this blog

A Question About Erasthmus' Sieve

An Improvement To The Three Second Hold Rule

Notice of corrupted results: Vigenere may yet be found to be a "group."