Building a named entity recognition model for Ethiopian languages: a comparative analysis of composite feature embedding
Abstract
In this study, we propose a deep learning NER model that effectively represents word tokens through a composite feature embedding design, and we conduct a comparative analysis against existing models for Ethiopian languages. Word vectors learned for all tokens with an unsupervised algorithm are merged with a set of language-independent features developed specifically for this purpose. These combined features are then fed into a neural network model to predict word classes. Empirical results on the Ethiopian language datasets demonstrate that incorporating character-level word embeddings along with other features in BiLSTM-CRF models yields state-of-the-art performance. Beyond demonstrating the model's ability to generalize across languages, our evaluation achieved accuracy rates of 92.88% and 82.35% on the AM_NER and Oro_NER datasets, respectively.
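The composite feature embedding described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's exact design: the dimensions, the hashed character n-gram encoder (standing in for a learned character-level embedding), and the particular surface features are all illustrative assumptions, and the resulting token vectors would then be fed into a BiLSTM-CRF tagger.

```python
import hashlib
import numpy as np

def char_ngram_vector(word, dim=16, n=3):
    """Hashed character n-gram average: a simple stand-in for a learned
    character-level word embedding (illustrative, not the paper's encoder)."""
    vec = np.zeros(dim)
    padded = f"#{word}#"  # boundary markers so prefixes/suffixes are captured
    grams = [padded[i:i + n] for i in range(max(1, len(padded) - n + 1))]
    for g in grams:
        h = int(hashlib.md5(g.encode("utf-8")).hexdigest(), 16)
        vec[h % dim] += 1.0
    return vec / len(grams)

def shape_features(word):
    """Language-independent surface features (hypothetical choices)."""
    return np.array([
        float(word[:1].isupper()),              # starts with uppercase
        float(word.isdigit()),                  # all digits
        float(any(c.isdigit() for c in word)),  # contains a digit
        float(len(word)) / 20.0,                # normalized word length
    ])

def composite_embedding(word, word_vectors, word_dim=32):
    """Concatenate the unsupervised word vector, the character-level
    vector, and the surface features into one token representation."""
    wv = word_vectors.get(word.lower(), np.zeros(word_dim))
    return np.concatenate([wv, char_ngram_vector(word), shape_features(word)])

# Toy lookup table standing in for unsupervised pretrained vectors.
rng = np.random.default_rng(0)
word_vectors = {w: rng.normal(size=32) for w in ["addis", "ababa", "visited"]}

emb = composite_embedding("Addis", word_vectors)
print(emb.shape)  # (32 + 16 + 4,) -> (52,)
```

Each token's 52-dimensional vector (under these assumed sizes) would form one time step of the sequence passed to the BiLSTM layer, with the CRF layer decoding the most likely tag sequence over the whole sentence.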