An Experiment In Data Compression
There are a finite number of compression algorithms. One strategy is to catalog differences in consecutive values and compress a dictionary of the differences.
A simple experiment to perform in Rust or C++20, is as follows.
1. Read the file as an array of unsigned int (16 bits each.) To do this, it is important to understand encoding. I avoid Microsoft's Unicode encoding. An alternative is to read the file in 8-bit bytes, and concatenate every two-byte pair into an array of unsigned int. The numeric value represented is not the purpose. Declare a large enough array space in RAM to accommodate some max_size value.
2. delta_encode the array. There is standardized code to do this. Python3 has a function in module numpy.
3. Golomb-encode the resulting dictionary or array. There is standardized code to do this. This falls into the category of run-length encoding.
Viewing pixels from a raw photo format or an mp3 as unsigned int is the innovation suggested.
4. Write the result in a file with a ".gcf" compression extension.
5. If the file is already compressed, Golomb-decode first, then delta-decode, dropping any null byte at the end to pad an odd non-pair byte.
6. Save as original file, deleting ".gcf"
This is my attempt to be novel at the compression game.
Comments
Post a Comment