Kanade: A Simple Disentangled Tokenizer for Spoken Language Modeling
arXiv:2602.00594v2 Announce Type: replace
Abstract: A good language model starts with a good tokenizer. Tokenization is especially important for speech modeling, which must handle continuous signals that mix linguistic and non-linguistic information. …