Main Article Content

DNN-based Multilingual Acoustic Modeling for Four Ethiopian Languages


Solomon Teferra Abate
Martha Yifiru Tachbelie
Tanja Schultz

Abstract

In this paper, we present the results of experiments conducted on multilingual acoustic modeling in the development of an Automatic Speech Recognition (ASR) system using speech data of phonetically much related Ethiopian languages (Amharic, Tigrigna, Oromo and Wolaytta) with multilingual (ML) mix and multitask approaches. The use of speech data from only phonetically much related languages brought improvement over results reported in a previous work that used 26 languages (including the four languages). A maximum Word Error Rate (WER) reduction from 25.03% (in the previous work) to 21.52% has been achieved for Wolaytta, which is a relative WER reduction of 14.02%. As a result of using multilingual acoustic modeling for the development of an automatic speech recognition (ASR) system, a relative WER reduction of up to 7.36% (a WER reduction from 23.23% to 21.52%) has been achieved over a monolingual ASR. Compared to the ML mix, the multitask approach brought a better performance improvement (a relative WERs reduction of up to 5.9%). Experiments have also been conducted using Amharic and Tigrigna in a pair and Oromo and Wolaytta in another pair. The results of the experiments showed that languages with a relatively better language resources for lexical and language modeling (Amharic and Tigrigna) benefited from the use of speech data from only two languages. Generally, the findings show that the use of speech corpora of phonetically related languages with the multitask multilingual modeling approach for the development of ASR systems for less-resourced languages is a promising solution.


Journal Identifiers


eISSN: 2520-7997
print ISSN: 0379-2897