MachineLearning

Language-model-based compression for Python source using n-grams + arithmetic coding (~33% better than zlib on Flask) [P]

I’ve been experimenting with language-model-based compression for source code, using a simple n-gram model combined with arithmetic coding. I’ll make a repo soon. The setup is straightforward: tokenize Python source, estimate P(x_t | x_{t−n+1:t−1})…
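The modeling step above (tokenize, then estimate P(x_t | x_{t−n+1:t−1}) from counts) can be sketched roughly like this — a minimal illustration, not the actual implementation: it uses the stdlib `tokenize` module and add-one smoothing, and the class/function names (`NGramModel`, `tokens_of`) are my own placeholders:

```python
import io
import tokenize
from collections import Counter, defaultdict

def tokens_of(source: str):
    """Lex Python source into a list of token strings via the stdlib tokenizer."""
    return [
        tok.string
        for tok in tokenize.generate_tokens(io.StringIO(source).readline)
        if tok.string  # drop zero-width tokens like DEDENT / ENDMARKER
    ]

class NGramModel:
    """Order-n model: estimates P(x_t | x_{t-n+1:t-1}) with add-one smoothing."""

    def __init__(self, n: int = 3):
        self.n = n
        self.counts = defaultdict(Counter)  # context tuple -> next-token counts
        self.vocab = set()

    def train(self, source: str):
        toks = tokens_of(source)
        self.vocab.update(toks)
        for i in range(len(toks)):
            ctx = tuple(toks[max(0, i - self.n + 1):i])
            self.counts[ctx][toks[i]] += 1

    def prob(self, ctx, tok) -> float:
        """Smoothed conditional probability of `tok` given a context tuple."""
        c = self.counts[tuple(ctx)]
        total = sum(c.values())
        v = len(self.vocab)
        return (c[tok] + 1) / (total + v)  # add-one (Laplace) smoothing

model = NGramModel(n=3)
model.train("def f(x):\n    return x + 1\n")
```

These conditional probabilities are exactly what an arithmetic coder consumes: each token narrows the current interval by its P(x_t | context), so a better model means shorter codes.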