A corpus-based survey of four electronic Swahili–English bilingual dictionaries
Abstract
In this article we survey four electronic bilingual dictionaries for the language pair Swahili–English. Aided by a data-driven morphological analyzer and part-of-speech tagger, we quantify the coverage of the dictionaries on large monolingual corpora of Swahili. In a second series of experiments, we investigate how useful the dictionaries are as a tool for developing a machine translation system by evaluating their bilingual coverage on the parallel SAWA corpus. At the same time, we attempt to consolidate the dictionaries into a unified lexicographic database and compare its coverage to that of its component parts.
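To make the notion of coverage concrete, the sketch below computes token coverage (the share of running words whose lemma is a dictionary headword) and type coverage (the share of distinct lemmas found) for individual dictionaries and for their union, mirroring the idea of comparing a consolidated database with its component parts. It is an illustration under stated assumptions, not the authors' procedure: it assumes the corpus has already been lemmatized (for example, by a morphological analyzer), and the function name `coverage`, the toy lemma list, and the two headword sets are hypothetical.

```python
from collections import Counter


def coverage(lemmas, headwords):
    """Return (token_coverage, type_coverage) of `headwords` over `lemmas`.

    Token coverage counts every running word; type coverage counts each
    distinct lemma once. `lemmas` is assumed to be pre-lemmatized text.
    """
    counts = Counter(lemmas)
    if not counts:
        raise ValueError("corpus contains no lemmas")
    total_tokens = sum(counts.values())
    covered_tokens = sum(n for lemma, n in counts.items() if lemma in headwords)
    covered_types = sum(1 for lemma in counts if lemma in headwords)
    return covered_tokens / total_tokens, covered_types / len(counts)


# Hypothetical toy corpus: lemmas as a morphological analyzer might output them.
corpus_lemmas = ["kitabu", "soma", "mtoto", "soma", "shule", "kitabu", "soma"]

# Hypothetical headword sets standing in for two of the surveyed dictionaries.
dict_a = {"kitabu", "soma"}
dict_b = {"mtoto", "soma"}

# The set union plays the role of a (naively) consolidated dictionary.
for name, headwords in [("A", dict_a), ("B", dict_b), ("A+B", dict_a | dict_b)]:
    tok, typ = coverage(corpus_lemmas, headwords)
    print(f"{name}: token coverage {tok:.2f}, type coverage {typ:.2f}")
```

On this toy data the union covers 6 of 7 tokens and 3 of 4 lemma types, more than either set alone, since "kitabu" and "mtoto" each appear in only one of them. Bilingual coverage on a parallel corpus would additionally require checking that a dictionary translation actually appears on the aligned English side, which this sketch does not attempt.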
Keywords: lexicography, evaluation, morphology, lemmatization, parallel corpora, machine learning, machine translation, Swahili (Kiswahili), English