Continuous Bengali Speech Recognition Based On Deep Neural Network

Citation: M. A. A. Amin, M. T. Islam, S. Kibria and M. S. Rahman, “Continuous Bengali Speech Recognition Based On Deep Neural Network,” 2019 International Conference on Electrical, Computer and Communication Engineering (ECCE), Cox’s Bazar, Bangladesh, 2019, pp. 1-6, doi: 10.1109/ECACE.2019.8679341.

Abstract: Nowadays, deep learning is one of the most reliable approaches to acoustic modeling in speech recognition. Working with a language like Bengali, which is not resource-rich in terms of parallel data (i.e., speech with aligned text), is a challenging problem. Moreover, many deep learning approaches report improved performance on Bengali without benchmarking on a specific corpus, so the reported results are biased. In this paper, DNN-HMM and GMM-HMM based models, implemented in the Kaldi toolkit, are used for continuous Bengali speech recognition and benchmarked on a standard, publicly available corpus called SHRUTI. Previously, the best word error rate (WER) achieved on SHRUTI was 15%, using a CMU Sphinx based GMM-HMM. This study shows that Kaldi-based feature extraction recipes with DNN-HMM and GMM-HMM acoustic models achieve WERs of 0.92% and 2.02%, respectively. Another finding of this study is that the WERs of the two models are very close because the corpus is small.
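
For readers unfamiliar with the metric, WER is conventionally defined as the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and the recognizer's hypothesis, divided by the number of reference words. The following is a minimal sketch of that conventional definition, not the scoring code used in the paper; the function name `word_error_rate` and the sample sentences are illustrative assumptions.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Conventional WER: word-level edit distance / number of reference words.

    Hypothetical helper for illustration; not taken from the paper or from Kaldi.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Made-up example: one substitution and one deletion against 4 reference words.
print(word_error_rate("a b c d", "a x c"))
```

Under this definition, a WER of 0.92% corresponds to fewer than one word error per hundred reference words.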