cs.CL

English to Central Kurdish Speech Translation: Corpus Creation, Evaluation, and Orthographic Standardization

arXiv:2604.00613v1
Abstract: We present KUTED, a speech-to-text translation (S2TT) dataset for Central Kurdish, derived from TED and TEDx talks. The corpus comprises 91,000 sentence pairs, including 170 hours of English audio, 1.65 …