
Amharic Language Visual Speech Recognition using Hybrid Features


Zelalem Tamrie

Abstract

Lip motion reading is the process of recognizing spoken words from a video, with or without an audio signal, by observing the movement of the speaker's lips. In previous studies, accuracy was limited because appropriate image enhancement methods were not applied and because of the algorithms used for feature extraction and feature vector generation. In the present study, we propose an automatic visual speech recognition approach based on machine learning and computer vision techniques for Amharic lip motion reading. The objective of the study is to improve existing Amharic lip motion reading and the performance of speech recognition systems operating in noisy environments. We collected videos of Amharic speech by recording directly with mobile devices. In this study, 14 Amharic words that are frequently spoken by patients or health professionals in hospitals were recorded. In total, 1260 video samples were used (945 for training and 315 for testing our proposed model). To extract features, we employed Convolutional Neural Networks (CNN), Histogram of Oriented Gradients (HOG), and their combination. These features were fed to a random forest classifier, independently and in combination, to recognize the spoken word. Each feature set was evaluated using precision, recall, and F1-score to measure the performance of our model and to compare its accuracy with previous related work. Our model achieves 66.03%, 75.24%, and 76.51% accuracy with HOG, CNN, and combined features (random forest), respectively.
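The sketch below illustrates, in broad strokes, the kind of hybrid-feature pipeline the abstract describes: HOG and CNN descriptors are extracted from cropped lip-region frames, concatenated, and passed to a random forest, with precision, recall, and F1-score reported on a held-out split. It is not the authors' implementation; the CNN architecture, the 64×64 frame size, the library choices (scikit-image, TensorFlow, scikit-learn), and the placeholder random data are all assumptions made for illustration.

```python
# Minimal sketch of a hybrid HOG + CNN feature pipeline with a random forest
# classifier, assuming cropped 64x64 grayscale lip frames. Architecture,
# hyperparameters, and the placeholder data are illustrative, not the paper's.
import numpy as np
import tensorflow as tf
from skimage.feature import hog
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report

NUM_CLASSES = 14          # 14 Amharic words, as stated in the abstract
FRAME_SHAPE = (64, 64)    # assumed size of the cropped lip region

def hog_features(gray_frame):
    """HOG descriptor for one grayscale lip frame."""
    return hog(gray_frame, orientations=9, pixels_per_cell=(8, 8),
               cells_per_block=(2, 2), feature_vector=True)

# Assumed small CNN used purely as a feature extractor (its last dense layer
# serves as the embedding; it is not trained in this sketch).
cnn = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(*FRAME_SHAPE, 1)),
    tf.keras.layers.Conv2D(16, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(32, 3, activation="relu"),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(64, activation="relu"),
])

def cnn_features(gray_frame):
    """CNN embedding for one grayscale lip frame."""
    x = gray_frame[np.newaxis, ..., np.newaxis].astype("float32") / 255.0
    return cnn(x, training=False).numpy().ravel()

def hybrid_features(gray_frame):
    """Concatenate HOG and CNN descriptors into one hybrid feature vector."""
    return np.concatenate([hog_features(gray_frame), cnn_features(gray_frame)])

if __name__ == "__main__":
    # Placeholder arrays standing in for the 945 training / 315 test lip frames.
    rng = np.random.default_rng(0)
    train_frames = rng.integers(0, 256, size=(945, *FRAME_SHAPE), dtype=np.uint8)
    test_frames = rng.integers(0, 256, size=(315, *FRAME_SHAPE), dtype=np.uint8)
    y_train = rng.integers(0, NUM_CLASSES, size=945)
    y_test = rng.integers(0, NUM_CLASSES, size=315)

    X_train = np.stack([hybrid_features(f) for f in train_frames])
    X_test = np.stack([hybrid_features(f) for f in test_frames])

    clf = RandomForestClassifier(n_estimators=200, random_state=0)
    clf.fit(X_train, y_train)

    # Per-class precision, recall, and F1-score, as used in the evaluation.
    print(classification_report(y_test, clf.predict(X_test), zero_division=0))
```

Replacing the random placeholder arrays with real lip-region crops (and training the CNN on the training split before using it as an extractor) would give the HOG-only, CNN-only, and combined-feature comparisons reported in the abstract.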

