Amharic Speech Recognition Using Joint Transformer and Connectionist Temporal Classification with Character-Based and Sub-word-Based Acoustic and Language Models
Abstract
Sequence-to-sequence attention-based models have gained considerable attention in recent times for automatic speech recognition (ASR). The transformer architecture has been extensively employed for a variety of sequence-to-sequence transformation problems, including machine translation and ASR. This architecture avoids the sequential computation used in recurrent neural networks, which leads to an improved iteration rate during the training phase. Connectionist temporal classification (CTC), on the other hand, is widely employed to accelerate the convergence of the sequence-to-sequence model by explicitly learning a better alignment between the input speech features and the output label sequence. Amharic, a Semitic language spoken by 57.5 million people in Ethiopia, is a morphologically rich language that poses a challenge for continuous speech recognition because a root word can be conjugated and inflected into thousands of surface forms to reflect subject, object, tense, and quantity. In this research, connectionist temporal classification is integrated with the transformer for continuous Amharic speech recognition. A suitable acoustic modeling unit for an Amharic speech recognition system is also investigated by comparing character-based and sub-word-based models. The results show a best character error rate of 8.04% for the character-based model with a character-level language model (LM) and a best word error rate of 22.31% for the sub-word-based model with a sub-word-level LM.
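To make the CTC alignment idea in the abstract concrete, the following is a minimal, hedged sketch of greedy CTC decoding: CTC lets the model emit one label (or a blank) per input frame, and the decoder recovers the label sequence by merging consecutive duplicates and dropping blanks. The blank symbol and the frame sequence below are illustrative assumptions, not data from the paper.

```python
# Illustrative sketch of greedy CTC collapsing (not the paper's implementation).
BLANK = "_"  # assumed blank symbol used by CTC to separate repeated labels

def ctc_greedy_collapse(frame_labels):
    """Collapse a per-frame label sequence into an output string:
    merge runs of identical labels, then remove blank symbols."""
    out = []
    prev = None
    for lab in frame_labels:
        if lab != prev and lab != BLANK:
            out.append(lab)
        prev = lab
    return "".join(out)

# Hypothetical per-frame output: "hh_e_ll_llo" collapses to "hello";
# the blank between the two "ll" runs preserves the double letter.
print(ctc_greedy_collapse(list("hh_e_ll_llo")))
```

In the joint model described here, this per-frame alignment objective is trained alongside the transformer's attention-based decoder, which is what speeds up convergence.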