
Twitter sentiment analysis for Hausa abbreviations and acronyms


Habeeba Ibraheem Abdullahi
Muhammad Aminu Ahmad
Khalid Haruna

Abstract

Sentiment analysis, also known as opinion mining, is the use of natural language processing to identify, extract and organize sentiment from user-generated text in social networks, blogs or product reviews. Hausa is one of the most widely spoken languages in Africa and one of the three major Nigerian languages, so investigating it can have significant influence on social, economic, business, political and even educational services and settings. Some of these Hausa texts contain abbreviations and acronyms, which is a challenge to researchers: such comments are unstructured and need normalization before the text can be understood further, and sentiment analysis work on Hausa abbreviations and acronyms is scarce. An abbreviation is a shortened form of a word, while an acronym is an abbreviation formed from the initial letters of other words and pronounced as a word. This research aims to develop an improved Hausa sentiment dataset that enhances sentiment analysis of text containing abbreviations and acronyms. This is achieved by adapting the approach to Hausa sentiment analysis based on the Multinomial Naïve Bayes (MNB) and Logistic Regression algorithms, using the count vectorizer along with Python libraries for NLP. The research affirms that the improved dataset with abbreviations and acronyms outperforms the plain Hausa dataset by 4% in accuracy using Multinomial Naïve Bayes. The results show that, in addition to the normal preprocessing techniques applied to social media streams, understanding, interpreting and resolving ambiguity in the usage of abbreviations and acronyms leads to improved accuracy of the algorithms, as evidenced by the experimental results.
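The normalization idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the lookup table, the toy training sentences and the names `ABBREVIATIONS`, `normalize` and `TinyMultinomialNB` are all assumptions introduced here for illustration, and the small classifier is a hand-written stand-in for a library Multinomial Naïve Bayes trained on count-vectorizer features. The sketch expands abbreviations before counting tokens, so that a shortened form and its full form contribute to the same counts.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical lookup table. These entries are illustrative placeholders,
# not taken from the dataset described in the abstract.
ABBREVIATIONS = {
    "wlh": "wallahi",
    "ngd": "na gode",
}

def normalize(text, table=ABBREVIATIONS):
    """Lowercase, tokenize, and expand any token found in the lookup table."""
    tokens = re.findall(r"\w+", text.lower())
    expanded = []
    for tok in tokens:
        # An expansion may span several words, so split it back into tokens.
        expanded.extend(table.get(tok, tok).split())
    return expanded

class TinyMultinomialNB:
    """Multinomial Naive Bayes over token counts with add-one smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)       # documents per class
        self.token_counts = defaultdict(Counter)  # token counts per class
        self.vocab = set()
        for tokens, label in zip(docs, labels):
            self.token_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, tokens):
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.token_counts[label]
            total = sum(counts.values())
            score = math.log(n_docs / total_docs)  # log prior
            for tok in tokens:
                if tok in self.vocab:  # ignore tokens never seen in training
                    score += math.log((counts[tok] + 1) / (total + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Toy training data (assumed for illustration only).
train_texts = ["na gode sosai", "ba kyau"]
train_labels = ["positive", "negative"]
model = TinyMultinomialNB().fit([normalize(t) for t in train_texts], train_labels)

# The abbreviated input is expanded before classification.
print(normalize("Wlh, ngd!"))           # ['wallahi', 'na', 'gode']
print(model.predict(normalize("ngd")))  # positive
```

Without the expansion step, the token `ngd` would be absent from the training vocabulary and would contribute nothing to the score, which is the kind of information loss that normalization is meant to prevent.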


Journal Identifiers


eISSN: 1597-6343
print ISSN: 2756-391X