
BaltiVoice: A Speech Corpus and Fine-tuned Whisper ASR System for the Balti Language
Researchers have released BaltiVoice, the first public automatic speech recognition (ASR) dataset and model for Balti, a Tibetic language with roughly 100,000 speakers in Pakistan. By fine-tuning OpenAI's Whisper on 16.8 hours of validated audio, the team reduced the word error rate (WER) from 182% for the zero-shot baseline to 30%, showing how a modest language-specific corpus can unlock speech AI for underserved communities. The open release on Hugging Face signals growing momentum in democratizing ASR beyond high-resource languages, though the remaining 30% error rate underscores the gap between fine-tuned frontier models and production-ready systems in low-resource settings.
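
A zero-shot WER of 182% can look like a typo, but WER is not capped at 100%: it divides the number of word-level substitutions, deletions, and insertions by the length of the reference transcript, so a model that emits many spurious words can exceed it. The sketch below is a minimal illustration of that standard definition, not the scoring code used for BaltiVoice; the function name `word_error_rate` and the placeholder tokens are assumptions made for the example.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Illustrative sketch of the standard WER definition; tokenization here is
    plain whitespace splitting, which is an assumption of this example.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Placeholder tokens, not real Balti text.
print(word_error_rate("a b c d", "a b c"))  # one deletion over four words
print(word_error_rate("a b", "x y z w"))    # two substitutions plus two insertions
```

In the second call the hypothesis is twice as long as the reference and shares no words with it, so the four edits over two reference words give a WER of 200%. Output of this kind, far longer than the reference, is one way a baseline can score above 100%.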

























