Lexical and sub-lexical frequencies in isiXhosa-medium children’s stories
Abstract
Visual word recognition in developing readers is affected by several factors, many of which matter for the design of teaching and testing materials. One well-studied factor is word frequency: more frequent words are recognised more quickly than less frequent words. Another is the presence of consonant sequences: words containing a consonant sequence (e.g. ‘tsh’) are recognised more slowly than words of equivalent letter and phoneme length that contain no such sequence. For isiXhosa, there are currently no publicly available word frequency norms based on children’s literacy materials, and thus none appropriate for testing developing readers. Publicly available information on the frequencies of particular consonant sequences, which would be valuable in the design of classroom materials, is also lacking. To address these gaps, this paper reports on the creation of a corpus of isiXhosa texts aimed at children (321 texts, 125,447 tokens, and 31,216 types), which was then processed to derive (i) word frequency measures, (ii) descriptive statistics on the characteristics of the most frequent words, and (iii) information on sublexical frequency, with a focus on two-, three-, and four-consonant sequences. The findings are relevant to scholars and practitioners working on literacy development and language processing in isiXhosa.
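As a rough illustration of the kind of processing described above, the following minimal sketch counts word frequencies and the frequencies of two- to four-letter consonant sequences. It is not the procedure used in the paper, which the abstract does not specify. The function names, the lowercase a-z tokenisation, the treatment of a, e, i, o and u as the only vowels, the counting of orthographic letters rather than phonemes, and the weighting of each sequence by token frequency are all assumptions made for the example.

```python
import re
from collections import Counter

# Assumption: tokens are runs of the letters a-z after lowercasing.
TOKEN_RE = re.compile(r"[a-z]+")
# Assumption: every letter other than a, e, i, o, u counts as a consonant.
CONSONANT_RUN_RE = re.compile(r"[b-df-hj-np-tv-z]+")


def word_frequencies(texts):
    """Return a Counter mapping each word type to its token count."""
    counts = Counter()
    for text in texts:
        counts.update(TOKEN_RE.findall(text.lower()))
    return counts


def consonant_sequence_frequencies(word_counts, min_len=2, max_len=4):
    """Count maximal consonant-letter runs of min_len..max_len letters.

    Each run is weighted by the token frequency of the word it occurs in.
    """
    seq_counts = Counter()
    for word, freq in word_counts.items():
        for run in CONSONANT_RUN_RE.findall(word):
            if min_len <= len(run) <= max_len:
                seq_counts[run] += freq
    return seq_counts


if __name__ == "__main__":
    # Toy input for illustration only, not drawn from the corpus.
    sample_texts = ["umntwana utshilo", "umntwana"]
    freqs = word_frequencies(sample_texts)
    print("tokens:", sum(freqs.values()), "types:", len(freqs))
    print(consonant_sequence_frequencies(freqs).most_common())
```

Weighting each sequence by token frequency reflects how often a reader would meet it in running text. Counting each word type once instead would give a type-based measure, and either choice could be defensible depending on the purpose.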