I can’t believe text normalization is so underdiscussed in streaming text-to-speech [D]
Kinda surprises me how little discussion there is about mistakes in streaming TTS models. People look for natural readers, high voice quality, expressive speech. And that's not where most models look dumb. They fail when you give them basic…