So like mp3, gzip and zstd? Why would you use an LLM for compression??
The research specifically looked at lossless algorithms, so things like gzip rather than mp3.
“For example, the 70-billion parameter Chinchilla model impressively compressed data to 8.3% of its original size, significantly outperforming gzip and LZMA2, which managed 32.3% and 23% respectively.”
However, they do say it's not especially practical at the moment, since gzip is a tiny executable compared to the many gigabytes of the LLM's dataset.
Do you need the dataset to do the compression? Is the trained model not effective on its own?
Well, from the article a dataset is required, but it doesn't always have to be the heavier one.
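To make the mechanics a bit more concrete, here's a rough sketch of the idea (my own toy code, not anything from the paper): the model supplies a probability for each next symbol, and an arithmetic coder would spend roughly -log2 p bits on it, so whoever decompresses needs the exact same model to replay those probabilities. A character bigram stands in for the LLM here, and the actual entropy coder is skipped in favour of just reporting the ideal code length.

```python
# Toy sketch (mine, not the paper's code): the model gives p(next char | context),
# and an ideal coder would spend about -log2 p bits per character.
import math
from collections import Counter, defaultdict

def train_bigram(text):
    """Count character bigrams; this toy model plays the LLM's role."""
    counts = defaultdict(Counter)
    for prev, cur in zip(text, text[1:]):
        counts[prev][cur] += 1
    return counts

def ideal_bits(model, text, alphabet_size=256):
    """Sum of -log2 p(cur | prev) with add-one smoothing over a byte alphabet."""
    bits = 8.0  # first character sent verbatim
    for prev, cur in zip(text, text[1:]):
        ctx = model.get(prev, Counter())
        p = (ctx[cur] + 1) / (sum(ctx.values()) + alphabet_size)
        bits += -math.log2(p)
    return bits

text = "the quick brown fox jumps over the lazy dog. " * 200
model = train_bigram(text)  # the paper uses a pre-trained model; fitting on the input is a shortcut here
print(f"raw bits:   {8 * len(text)}")
print(f"model bits: {ideal_bits(model, text):.0f}  (ideal; coder overhead and model size ignored)")
```

On repetitive text the model's ideal bit count comes out far below the raw 8 bits per character, and that gap is the whole trick. The catch from the article is that the gigabytes behind the real model have to be available on both ends.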
Though it doesn't solve the speed issue: the LLM takes far more time to do the compression.
gzip can compress 1 GB of text in less than a minute on a CPU, while an LLM with 3.2 million parameters requires an hour to compress the same data.
I wonder how consistent the decompression is and how much information gets lost in the process.
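Since the scheme is lossless, nothing should be lost as long as both ends run the identical model: the decoder just replays the same deterministic predictions. Here's a toy round trip (again my own sketch, with a simple rank transform standing in for a real arithmetic coder, and a bigram standing in for the LLM) showing exact recovery:

```python
# Toy round trip: encoder and decoder share one deterministic model,
# so the input is recovered exactly. A real scheme would entropy-code the ranks.
from collections import Counter, defaultdict

def ranks_for_context(model, prev, alphabet):
    """Characters ordered from most to least likely after `prev` (ties broken alphabetically)."""
    ctx = model.get(prev, Counter())
    return sorted(alphabet, key=lambda c: (-ctx[c], c))

def encode(model, text, alphabet):
    codes = [alphabet.index(text[0])]
    for prev, cur in zip(text, text[1:]):
        codes.append(ranks_for_context(model, prev, alphabet).index(cur))
    return codes

def decode(model, codes, alphabet):
    chars = [alphabet[codes[0]]]
    for code in codes[1:]:
        chars.append(ranks_for_context(model, chars[-1], alphabet)[code])
    return "".join(chars)

text = "hello hello hello world"
alphabet = sorted(set(text))
model = defaultdict(Counter)
for prev, cur in zip(text, text[1:]):
    model[prev][cur] += 1

codes = encode(model, text, alphabet)
assert decode(model, codes, alphabet) == text  # exact, lossless round trip
print(codes)  # a good model maps most symbols to small ranks, which compress well
```

The assert passes because decompression is just the model run in reverse order of the same predictions; the only way to lose information is for the two ends to use different models.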