Penalizing unknown words’ emissions in HMM POS tagger based on Malay affix morphemes

H. Mohamed; N. Omar; M.J.A. Aziz

doi:10.4314/jfas.v9i3s.36

Published:

Jan 22, 2018

Keywords:

Malay POS tagger; morpheme-based; HMM

Issue

Vol. 9 No. 3S (2017): Special Issue

Section

Articles

The copyright belongs to the journal.

Abstract

The challenge in unsupervised Hidden Markov Model (HMM) training for a POS tagger is that the training depends on an untagged corpus; the only supervised data limiting the possible tags of words is a dictionary. Therefore, training cannot properly map possible tags. The exact morphemes of prefixes, suffixes and circumfixes in the agglutinative Malay language are examined to assign probable tags to unknown words based on linguistically meaningful affixes, using a morpheme-based POS guessing algorithm. The algorithm has been integrated into the Viterbi algorithm, which uses HMM-trained parameters to tag new sentences. In the experiment, the tagger handles unknown words in three ways: first, with character-based prediction; next, with the morpheme-based POS guessing algorithm; lastly, with a combination of the first and second.
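The idea of penalizing unknown words' emissions inside Viterbi decoding can be illustrated with a minimal Python sketch. The abstract does not give the affix inventory, the penalty value, or the exact scoring rule, so everything here is an assumption made for illustration: the tiny affix tables, the `guess_tags` and `emission` helpers, the uniform fallback for unknown words, and the `penalty` factor are all hypothetical and are not the authors' implementation.

```python
# Hypothetical affix-to-tag tables (illustrative only, not taken from the paper).
PREFIX_TAGS = {"mem": {"VERB"}, "ber": {"VERB"}}
SUFFIX_TAGS = {"kan": {"VERB"}, "an": {"NOUN"}}
CIRCUMFIX_TAGS = {("ke", "an"): {"NOUN"}}


def guess_tags(word):
    """Return the tags suggested by the word's affixes (empty set if none match)."""
    tags = set()
    for pre, t in PREFIX_TAGS.items():
        if word.startswith(pre) and len(word) > len(pre) + 2:
            tags |= t
    for suf, t in SUFFIX_TAGS.items():
        if word.endswith(suf) and len(word) > len(suf) + 2:
            tags |= t
    for (pre, suf), t in CIRCUMFIX_TAGS.items():
        if (word.startswith(pre) and word.endswith(suf)
                and len(word) > len(pre) + len(suf) + 2):
            tags |= t
    return tags


def emission(word, tag, emit, tags, penalty=0.01):
    """Emission score; unknown words get a uniform score, penalized for
    tags that the affix guesser does not support (assumed scheme)."""
    if any(word in emit[t] for t in tags):
        return emit[tag].get(word, 0.0)  # known word: use trained emissions
    base = 1.0 / len(tags)
    guessed = guess_tags(word)
    if guessed and tag not in guessed:
        return base * penalty
    return base


def viterbi(words, tags, start, trans, emit, penalty=0.01):
    """Standard Viterbi decoding using the penalized emission scores."""
    # Each column maps tag -> (best path score ending in tag, previous tag).
    best = [{t: (start[t] * emission(words[0], t, emit, tags, penalty), None)
             for t in tags}]
    for w in words[1:]:
        col = {}
        for t in tags:
            prev, score = max(((p, best[-1][p][0] * trans[p][t]) for p in tags),
                              key=lambda x: x[1])
            col[t] = (score * emission(w, t, emit, tags, penalty), prev)
        best.append(col)
    # Follow back-pointers from the best final tag.
    tag = max(tags, key=lambda t: best[-1][t][0])
    path = [tag]
    for col in reversed(best[1:]):
        tag = col[tag][1]
        path.append(tag)
    return path[::-1]


# Toy HMM parameters (made up for the example).
TAGS = ["NOUN", "VERB"]
START = {"NOUN": 0.6, "VERB": 0.4}
TRANS = {"NOUN": {"NOUN": 0.3, "VERB": 0.7},
         "VERB": {"NOUN": 0.8, "VERB": 0.2}}
EMIT = {"NOUN": {"saya": 0.5, "buku": 0.5}, "VERB": {"baca": 1.0}}

if __name__ == "__main__":
    # "membaca" is absent from EMIT, so it is treated as an unknown word.
    print(viterbi(["saya", "membaca", "buku"], TAGS, START, TRANS, EMIT))
```

In this sketch the penalty only scales down tags that contradict the affix evidence; words with no recognized affix fall back to a uniform score, which is one of several plausible design choices.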

Journal of Fundamental and Applied Sciences