Towards the sense disambiguation of Afan Oromo words using hybrid approach (unsupervised machine learning and rule based)
Abstract
This study investigates Word Sense Disambiguation (WSD) for Afan Oromo. WSD is a Natural Language Processing task whose goal is to identify the sense in which an ambiguous word is used in a particular context: a word may have multiple senses, and the problem is to determine which one is appropriate in a given context. The study presents a WSD strategy that combines an unsupervised approach, which exploits sense information in a corpus, with manually crafted rules. The idea behind the approach is to overcome the bottleneck of scarce training data. The context of a given word is captured using term co-occurrences within a defined window of words. Similar contexts of an ambiguous word are then clustered using hierarchical and partitional clustering, with each cluster representing a unique sense. The ambiguous words have between two and five senses. The optimal window sizes for extracting semantic contexts were one and two words to the left and right of the ambiguous word. The results show that WSD achieves an accuracy of 56.2% with unsupervised machine learning and 65.5% with the hybrid approach. This suggests that integrating deep linguistic knowledge with machine learning improves disambiguation accuracy. The result is encouraging given the low resource requirements of the approach. However, further experiments that extend this work with different approaches are needed to achieve better performance.
Keywords: Afan Oromo, Ambiguous Word, Hybrid, Rule Based, Word Sense Disambiguation
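
The window-based context extraction summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the placeholder tokens, and the use of cosine similarity to compare contexts are all assumptions made for illustration. The sketch collects the words within a fixed window on either side of each occurrence of an ambiguous word and turns each context into a co-occurrence count vector, which a clustering algorithm could then group into senses.

```python
from collections import Counter
from math import sqrt


def context_windows(tokens, target, window=2):
    """Return, for each occurrence of `target`, the words found up to
    `window` positions to its left and right (the target itself excluded)."""
    contexts = []
    for i, tok in enumerate(tokens):
        if tok == target:
            left = tokens[max(0, i - window):i]
            right = tokens[i + 1:i + 1 + window]
            contexts.append(left + right)
    return contexts


def cooccurrence_vector(context):
    """Represent one context as a bag of co-occurring word counts."""
    return Counter(context)


def cosine(a, b):
    """Cosine similarity between two sparse count vectors.
    (An assumed similarity measure; the abstract does not name one.)"""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Placeholder tokens; "X" stands in for an ambiguous word.
tokens = "a b X c d e X f".split()

contexts_w2 = context_windows(tokens, "X", window=2)
contexts_w1 = context_windows(tokens, "X", window=1)
vectors = [cooccurrence_vector(c) for c in contexts_w2]
similarity = cosine(vectors[0], vectors[1])
```

Contexts with high pairwise similarity would fall into the same cluster, and each resulting cluster would be read as one sense of the ambiguous word. The `window` parameter corresponds to the window sizes of one and two words reported as optimal.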