On the Predictive Power of Representation Dispersion in Language Models
arXiv:2506.24106v2 Announce Type: replace
Abstract: We show that a language model’s ability to predict text is tightly linked to the breadth of its embedding space: models that spread their contextual representations more widely tend to achieve lower …
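To make "breadth of the embedding space" concrete, here is a minimal sketch of one plausible dispersion measure: the average pairwise cosine distance among a model's contextual hidden states. The truncated abstract does not specify the paper's exact metric, so this particular choice of statistic, and the synthetic "hidden states" used below, are illustrative assumptions.

```python
import numpy as np

def representation_dispersion(hidden_states: np.ndarray) -> float:
    """Average pairwise cosine distance among contextual representations.

    hidden_states: (n, d) array, one row per context embedding.
    Returns a scalar in [0, 2]; larger values mean the representations
    are spread more widely on the unit sphere.
    """
    # L2-normalize each representation (guard against zero vectors).
    norms = np.linalg.norm(hidden_states, axis=1, keepdims=True)
    unit = hidden_states / np.clip(norms, 1e-12, None)
    # Cosine similarity matrix; average distance over distinct pairs.
    sim = unit @ unit.T
    iu = np.triu_indices(len(unit), k=1)
    return float(np.mean(1.0 - sim[iu]))

# Toy comparison with synthetic embeddings (not real model activations):
# random Gaussian vectors are widely dispersed, while vectors clustered
# around a single direction are nearly collapsed.
rng = np.random.default_rng(0)
spread = representation_dispersion(rng.normal(size=(64, 32)))
collapsed = representation_dispersion(np.ones((64, 32)) + 0.01 * rng.normal(size=(64, 32)))
assert spread > collapsed
```

Under the abstract's claim, a model whose contexts score higher on such a dispersion statistic would tend to predict text better; the sketch only shows how the statistic itself could be computed.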