cs.CL, cs.SD, eess.AS

Kanade: A Simple Disentangled Tokenizer for Spoken Language Modeling

arXiv:2602.00594v2 Announce Type: replace
Abstract: A good language model starts with a good tokenizer. Tokenization is especially important for speech modeling, which must handle continuous signals that mix linguistic and non-linguistic information. …