Spoken language corpora for the nine official African languages of South Africa


Jens Allwood
AP Hendrikse

Abstract

In this paper we outline a corpus planning project that aims to develop linguistic resources, specifically spoken language corpora, for the nine official African languages of South Africa. We first address issues such as spoken vs. written language, register vs. activity, and normative vs. non-normative approaches to corpus planning. We then outline the design of a spoken language corpus for these languages, considering representativity and sampling (urban–rural, dialects, gender, social class and activities), transcription standards and conventions, and the problems arising from the widespread loans, code-switching and other forms of language mixing characteristic of spoken language. Finally, we summarise the current status of the project and plans for the future.

Southern African Linguistics and Applied Language Studies 2003, 21(4): 189–201

Journal Identifiers


eISSN: 1727-9461
print ISSN: 1607-3614