Teaching Metric Distance to Discrete Autoregressive Language Models
arXiv:2503.02379v5 Announce Type: replace
Abstract: Large language models (LLMs) operate as autoregressive predictors over discrete token vocabularies, a formulation that has enabled their adaptation far beyond natural language to vision, robotics, an…