Citation: J. Rahman Saurav, S. Amin, S. Kibria and M. Shahidur Rahman, “Bangla Speech Recognition for Voice Search,” 2018 International Conference on Bangla Speech and Language Processing (ICBSLP), Sylhet, 2018, pp. 1-4, doi: 10.1109/ICBSLP.2018.8554944.
Abstract: In this work, several Gaussian Mixture Model-Hidden Markov Model (GMM-HMM) based and Deep Neural Network-Hidden Markov Model (DNN-HMM) based models are analyzed for Bangla speech recognition, with the aim of building a voice search module for the search engine pipilika. A small corpus of 9 hours of speech recordings from 49 different speakers, covering a vocabulary of 500 unique words, was prepared for this work. The lowest Word Error Rate (WER) was 3.96% for the GMM-HMM based model and 5.30% for the DNN-HMM based model. To the best of our knowledge, this is the lowest WER reported for Bangla speech recognition at this vocabulary size.
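
For readers unfamiliar with the metric, WER is conventionally defined as the word-level edit distance (substitutions, deletions, and insertions) between the recognizer output and the reference transcript, divided by the number of reference words. The abstract does not say which scoring tool the authors used, so the following is only a minimal sketch of that standard definition. The function name `word_error_rate` and the example strings are hypothetical and are not taken from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative sketch of the conventional WER definition; not the
    scoring code used in the paper.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion (reference word missing)
                curr[j - 1] + 1,                  # insertion (extra hypothesis word)
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one substitution and one deletion over four
# reference words.
example_wer = word_error_rate("open the news page", "open a news")
```

Under this definition, a WER of 3.96% corresponds to roughly four word errors per hundred reference words. Because insertions count as errors, WER can exceed 100% when the hypothesis is much longer than the reference.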