Amharic Language Visual Speech Recognition using Hybrid Features
Abstract
Lip motion reading is the process of recognizing spoken words from a video, with or without an audio signal, by observing the motion of the speaker's lips. In previous studies, accuracy was limited because appropriate image enhancement methods were not applied and because of the algorithms used for feature extraction and feature vector generation. In the present study, we propose an automatic visual speech recognition approach for Amharic lip motion reading based on machine learning and computer vision techniques. The objective of the study is to improve on existing Amharic lip motion reading and on the performance of speech recognition systems operating in noisy environments. We collected videos of Amharic speech by recording directly with mobile devices. In this study, 14 Amharic words frequently spoken by patients or health professionals in hospitals were recorded. A total of 1260 video samples were used (945 for training and 315 for testing our proposed model). Features were extracted using Convolutional Neural Networks (CNN), Histogram of Oriented Gradients (HOG), and their combination. These features were fed to a random forest classifier, both independently and in combination, to recognize the spoken word. Each feature set was evaluated using precision, recall, and F1-score metrics to measure the performance of our model and to compare its accuracy with previous related work. Our system achieves 66.03%, 75.24%, and 76.51% accuracy with HOG, CNN, and combined features (random forest), respectively.
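The hybrid-feature pipeline summarized above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the HOG and CNN extractors are stubbed placeholders (the paper does not specify their configurations here), and the data is synthetic; only the structure — concatenating the two feature vectors per clip and classifying with a random forest — follows the abstract.

```python
# Sketch of the hybrid HOG + CNN feature pipeline with a random forest
# classifier, as described in the abstract. Extractors are placeholders;
# the dataset is synthetic (14 word classes, as in the study).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

def hog_features(clip):
    # Placeholder for Histogram of Oriented Gradients extraction.
    return clip.reshape(-1)[:128]

def cnn_features(clip):
    # Placeholder for a CNN embedding of the lip region.
    return clip.reshape(-1)[128:256]

# Toy dataset: 140 "lip-region clips", 10 per word class.
clips = rng.normal(size=(140, 16, 16))
labels = np.repeat(np.arange(14), 10)

# Concatenate the two feature vectors for each clip (the "combined" setting).
X = np.stack([np.concatenate([hog_features(c), cnn_features(c)])
              for c in clips])

X_tr, X_te, y_tr, y_te = train_test_split(
    X, labels, test_size=0.25, stratify=labels, random_state=0)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_tr, y_tr)
pred = clf.predict(X_te)
print(X.shape, len(pred))
```

Running HOG-only or CNN-only variants, as in the paper's comparison, amounts to passing a single feature vector instead of the concatenation.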