A Sequence-to-Sequence Pronunciation Model for Bangla Speech Synthesis


A. Ahmad, M. Raihan Hussain, M. Reza Selim, M. Zafar Iqbal and M. Shahidur Rahman, “A Sequence-to-Sequence Pronunciation Model for Bangla Speech Synthesis,” 2018 International Conference on Bangla Speech and Language Processing (ICBSLP), Sylhet, 2018, pp. 1-4.



Extracting pronunciation from written text is necessary in many application areas, especially text-to-speech synthesis. Bangla is not a fully phonetic language: its orthography does not always map directly to pronunciation. The main difficulty is the “schwa deletion” problem, along with several ambiguous letters and conjuncts. Rule-based approaches cannot completely solve this problem. In this paper, we propose adopting an encoder-decoder neural machine translation (NMT) model to determine the pronunciations of Bangla words. We cast pronunciation prediction as a sequence-to-sequence problem and used two Gated Recurrent Unit recurrent neural networks (GRU-RNNs) in our model. We fed the model two types of input data: in one configuration we used “raw” words, and in the other we used “pre-processed” words (normalized by hand-written rules). Both experiments showed promising results, and the resulting models can be used in practical applications.
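
To make the sequence-to-sequence framing concrete, the following is a minimal sketch of how one word and its pronunciation might be turned into the integer sequences that an encoder and a decoder consume. It is not the authors' implementation. The special tokens, the helper names `build_vocab` and `encode`, and the phoneme symbols in the sample pair are all assumptions made for illustration; the abstract does not specify the vocabulary layout or the phoneme inventory.

```python
# Hypothetical special tokens; the abstract does not describe the actual vocabulary.
PAD, SOS, EOS = "<pad>", "<sos>", "<eos>"


def build_vocab(sequences):
    """Assign an integer id to every distinct symbol, reserving 0-2 for special tokens."""
    vocab = {PAD: 0, SOS: 1, EOS: 2}
    for seq in sequences:
        for symbol in seq:
            vocab.setdefault(symbol, len(vocab))
    return vocab


def encode(seq, vocab, add_sos=False):
    """Convert symbols to ids, appending EOS (and prepending SOS for decoder targets)."""
    ids = [vocab[symbol] for symbol in seq]
    if add_sos:
        ids = [vocab[SOS]] + ids
    return ids + [vocab[EOS]]


# One illustrative (word, pronunciation) pair. The phoneme symbols are an
# assumed notation for this sketch, not the paper's phoneme set.
pairs = [("কলম", "k O l o m")]

# Source side: one token per written character (grapheme).
sources = [list(word) for word, _ in pairs]
# Target side: one token per space-separated phoneme symbol.
targets = [pron.split() for _, pron in pairs]

src_vocab = build_vocab(sources)
tgt_vocab = build_vocab(targets)

src_ids = encode(sources[0], src_vocab)                # encoder input
tgt_ids = encode(targets[0], tgt_vocab, add_sos=True)  # decoder target

print(src_ids)
print(tgt_ids)
```

In this sample the source has three graphemes while the target has five phoneme symbols, because the inherent vowels are not written. Input and output lengths therefore differ, which is the situation an encoder-decoder model is designed to handle.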