Temporal Contrastive Decoding: A Training-Free Method for Large Audio-Language Models
arXiv:2604.15383v1 Announce Type: cross
Abstract: Large audio-language models (LALMs) generalize across speech, sound, and music, but unified decoders can exhibit a temporal smoothing bias: transient acoustic cues may be underutilized in favor …