Citation: M. M. Hasan, M. A. Islam, S. Kibria and M. S. Rahman, “Towards Lexicon-free Bangla Automatic Speech Recognition System,” 2019 International Conference on Bangla Speech and Language Processing (ICBSLP), Sylhet, Bangladesh, 2019, pp. 1-6, doi: 10.1109/ICBSLP47725.2019.201544.
Abstract: This article presents a lexicon-free Automatic Speech Recognition (ASR) system for the Bangla language and investigates an open-source large Bangla ASR corpus, which proved by OpenSLR. The model has been trained using improved MFCC acoustic features with a deep LSTM as an acoustic model. We have tried two types of decoding techniques in the decoding or the last part of the ASR; one is using a joint decoder of Connectionist Temporal Classification (CTC) and a statistical Language Model (LM) for beam decoding, and another is CTC based greedy decoding. We have trained and investigated the performance of our ASR with non-augmented speech as an input. The achieved results are outstanding compares to the results obtained from past researches that have used the End-to-End approaches for Bangla ASR. On the test dataset, our End-to-End system has obtained different results using two distinct decoders. The obtained results are 39.61% WER and 18.50% CER using the greedy de-coder and 27.89% WER and 12.31% CER, which are a little bit improved results, using the beam decoder. This achievement is state of the art for continuous Bangla ASR.