An encoder-decoder based grapheme-to-phoneme converter for Bangla speech synthesis

Arif Ahmad, Mohammad Reza Selim, Muhammed Zafar Iqbal, Mohammad Shahidur Rahman, An encoder-decoder based grapheme-to-phoneme converter for Bangla speech synthesis, Acoustical Science and Technology, 2019, Volume 40, Issue 6, Pages 374-381, Released November 01, 2019, Online ISSN 1347-5177, Print ISSN 1346-3969.



This paper proposes an encoder-decoder based sequence-to-sequence model for grapheme-to-phoneme (G2P) conversion in Bangla (exonym: Bengali). G2P models are key components of speech recognition and speech synthesis systems because they describe how words are pronounced. Traditional rule-based models do not perform well in unseen contexts. We propose adopting a neural machine translation (NMT) model to solve the G2P problem, and we build our model as a recurrent neural network (RNN) with gated recurrent units (GRU). In contrast to joint-sequence based G2P models, our encoder-decoder based model does not require explicit grapheme-to-phoneme alignment, which is not straightforward to perform. We trained our model on a pronunciation dictionary of approximately 135,000 entries and obtained a word error rate (WER) of 12.49%, a significant improvement over the existing rule-based and machine-learning based Bangla G2P models.
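
To make the reported metric concrete, the following is a minimal sketch of how a word error rate for G2P output could be computed. It assumes the common G2P convention that a word counts as an error whenever its predicted phoneme sequence differs from the reference in any position; the abstract does not state the exact scoring procedure. The function name `g2p_word_error_rate` and the phoneme symbols are hypothetical placeholders, not taken from the paper or its dictionary.

```python
from typing import List, Sequence


def g2p_word_error_rate(
    references: Sequence[Sequence[str]],
    predictions: Sequence[Sequence[str]],
) -> float:
    """Fraction of words whose predicted phoneme sequence is not an exact match.

    Assumption: one reference pronunciation per word, exact-match scoring.
    """
    if len(references) != len(predictions):
        raise ValueError("references and predictions must have the same length")
    if not references:
        raise ValueError("at least one word is required")

    # A word is wrong if any phoneme differs or the sequence lengths differ.
    errors = sum(
        1 for ref, hyp in zip(references, predictions) if list(ref) != list(hyp)
    )
    return errors / len(references)


# Placeholder phoneme symbols, for illustration only (not real Bangla data).
reference_phonemes: List[List[str]] = [["p1", "p2"], ["p3"], ["p4", "p5", "p6"], ["p7"]]
predicted_phonemes: List[List[str]] = [["p1", "p2"], ["p3"], ["p4", "p5"], ["p7"]]

wer = g2p_word_error_rate(reference_phonemes, predicted_phonemes)
print(f"WER: {wer:.2%}")
```

Under this definition a single wrong phoneme makes the whole word an error, so the word-level rate is always at least as high as a phoneme-level error rate on the same data.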