Tokenization rules for the disjunctively written verbal segment of Northern Sotho
Abstract
This article describes the tokenization rules required to analyse the disjunctively written verbal segment of Northern Sotho correctly. The purpose of a tokenizer based on these rules is to isolate verbal segments from running text before they are analysed. The disjunctive elements of the verbal segment that are discussed in this article, and for which generic tokenization rules are proposed, are the following: subject and object concords, the potential marker, negative markers, tense markers and aspect prefixes. The position of each element in a sequence of pre-verbal elements is determined, and the collocation restrictions that apply to certain elements are described and incorporated into the tokenization rules. The rules described in this article have already been implemented in a prototype tokenizer that is currently being tested.