How Gzip and K-Nearest Neighbors Can Outperform Deep Learning Models
Thank you, Christoph, for this very good explanation.
In particular, I love how you make a concerted effort to also give the intuition behind the points you presented. All too often, it's just "well, the math says X" - but many people have a hard time understanding what X really means, what its implications are, and so on.
I love this connection between compression and prediction; it gives another lens for understanding how these models work.
Shameless plug, but I covered the Gzip + kNN paper and the "Language Modeling Is Compression" paper in two of my articles.
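For anyone curious, the core of the Gzip + kNN idea fits in a few lines: use a compressor to estimate similarity via Normalized Compression Distance (NCD), then classify with a nearest-neighbor vote. The sketch below is a minimal illustration, not the paper's exact implementation; the toy training texts and labels are made up for the example.

```python
import gzip

def clen(s: str) -> int:
    # Length of the gzip-compressed bytes: a rough proxy for information content
    return len(gzip.compress(s.encode("utf-8")))

def ncd(a: str, b: str) -> float:
    # Normalized Compression Distance: if b shares structure with a,
    # compressing them together adds little, so the distance is small
    ca, cb = clen(a), clen(b)
    cab = clen(a + " " + b)
    return (cab - min(ca, cb)) / max(ca, cb)

def knn_predict(query: str, train: list[tuple[str, str]], k: int = 1) -> str:
    # Rank training texts by NCD to the query, then majority-vote the top k labels
    ranked = sorted(train, key=lambda pair: ncd(query, pair[0]))
    top_labels = [label for _, label in ranked[:k]]
    return max(set(top_labels), key=top_labels.count)

# Toy data purely for illustration
train = [
    ("the stock market rallied on strong earnings", "finance"),
    ("central bank raises interest rates again", "finance"),
    ("the team won the championship game last night", "sports"),
    ("star striker scores a hat-trick in the final", "sports"),
]
print(knn_predict("interest rates and earnings move the market", train))
```

No training, no parameters: the compressor does all the similarity work, which is exactly what makes the result so striking.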