Next-Scale Autoregressive Models for Text-to-Motion Generation
arXiv:2604.03799v1 Announce Type: new
Abstract: Autoregressive (AR) models offer stable and efficient training, but standard next-token prediction is not well aligned with the temporal structure required for text-conditioned motion generation. We intr…