Pashto Common Voice: Building the First Open Speech Corpus for a 60-Million-Speaker Low-Resource Language
arXiv:2603.27021v1
Abstract: We present the Pashto Common Voice corpus, the first large-scale, openly licensed speech resource for Pashto, a language that has over 60 million native speakers yet remains largely absent from open speech technology….