
Lexical and sub-lexical frequencies in isiXhosa-medium children’s stories


Robyn Berghoff

Abstract

Visual word recognition in developing readers is affected by several factors, many of which are important in the design of teaching and testing materials. One well-studied factor is word frequency: more frequent words are recognised more quickly than less frequent words. Another is the presence of consonant sequences: words containing a consonant sequence (e.g. ‘tsh’) are recognised more slowly than words of equivalent letter and phoneme length that contain no such sequence. For isiXhosa, there are currently no publicly available word frequency norms that are based on children’s literacy materials and are thus appropriate for the testing of developing readers. Publicly available information on the frequencies of particular consonant sequences, which would be valuable in the design of classroom materials, is also lacking. To address these gaps, this paper reports on the creation of a corpus of isiXhosa texts aimed at children (321 texts, 125,447 tokens, and 31,216 types), which was subsequently processed to derive (i) word frequency measures, (ii) descriptive statistics about the characteristics of the most frequent words, and (iii) information on sublexical frequency, with a focus on two-, three-, and four-consonant sequences. The findings of the article are relevant to scholars and practitioners working on literacy development and language processing in isiXhosa.
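The kind of processing described in the abstract can be illustrated with a minimal Python sketch. It is not the author's pipeline: the tokenisation rule, the treatment of a consonant sequence as a maximal run of non-vowel letters, and all function names (`tokenize`, `word_frequencies`, `consonant_sequences`, `sequence_frequencies`, `per_million`) are assumptions made for illustration only.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return runs of the letters a-z as tokens.

    Assumption: punctuation and digits are simply dropped; a real corpus
    pipeline may treat hyphens, apostrophes, or names differently.
    """
    return re.findall(r"[a-z]+", text.lower())


def word_frequencies(tokens):
    """Count how often each word form (type) occurs among the tokens."""
    return Counter(tokens)


def consonant_sequences(word, min_len=2, max_len=4):
    """Return maximal runs of consonant letters of length min_len..max_len.

    Assumption: a consonant is any letter other than a, e, i, o, u, and
    sequences are measured in letters, not phonemes.
    """
    runs = re.findall(r"[^aeiou]+", word)
    return [run for run in runs if min_len <= len(run) <= max_len]


def sequence_frequencies(tokens):
    """Count consonant sequences over all tokens (token-weighted counts)."""
    counts = Counter()
    for token in tokens:
        counts.update(consonant_sequences(token))
    return counts


def per_million(count, total_tokens):
    """Normalise a raw count to occurrences per million tokens."""
    return count * 1_000_000 / total_tokens


if __name__ == "__main__":
    # Toy input for illustration only; not drawn from the corpus.
    sample = "Umntwana uya esikolweni. Umntwana uyafunda."
    tokens = tokenize(sample)
    words = word_frequencies(tokens)
    print(len(tokens), "tokens,", len(words), "types")
    print(words.most_common(3))
    print(sequence_frequencies(tokens).most_common())
```

Counting sequences per token weights each sequence by how often its host words occur; counting over types instead (iterating over the keys of `word_frequencies`) would give each word form equal weight. Which of these the paper reports is not stated in the abstract.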


Journal Identifiers


eISSN: 1727-9461
print ISSN: 1607-3614