Multiple protein-domain conservation architecture as a non-deterministic confounder of linear B cell epitopes
Abstract
Epitope prediction is a critical step in diagnostic and vaccine discovery. Although several parameters for epitope discovery exist, the field remains inconclusive and in need of new complementary or stand-alone tools. Multiple protein-domain conservation architecture (MPDCA), as used here, refers to homologous motifs revealed by multiple sequence alignment across strain variants of the same protein, apart from the conserved domains (CD) present within the same superfamily. Unpublished data suggest that MPDCA might be a confounder of epitopes, which warrants investigating it as a predictor of the same. MPDCA is easy to determine, which makes it appealing for protein analysis and specifically for epitope discovery. This study aimed to validate MPDCA as a predictive confounder of epitopes. Two sets of surface viral glycoproteins were used: gp120 of human immunodeficiency virus type 1 (HIV-1) and the gp1,2 preprotein of Ebola virus (EBOV). These were selected because their CD architecture has been widely studied and their sequences are available in public databases and well annotated. Within each set, the MPDCAs among three different virus strains were compared to epitopes predicted by established tools (BepiPred and DiscoTope). Of the predicted linear epitopes (LE), 4/6 (66.7%) confounded with MPDCA, with 3/6 (50%) of these MPDCAs confounding with the predicted LE at identities of > 50%, compared with just 3/6 (50%) of the discontinuous epitopes (DE), which confounded with MPDCA at identities of < 50%. MPDCA is a non-deterministic confounder of linear B cell epitopes: the two evidently co-occur, but there is no causal relationship between them. Therefore, MPDCA cannot be used reliably as an additional parameter to predict linear or non-linear B cell epitopes.
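The comparison described above, conserved motifs from a multiple sequence alignment set against predicted epitope positions, can be illustrated with a minimal Python sketch. It is not the authors' procedure: the abstract does not define how "identity" was computed, so the overlap fraction below is an assumed stand-in. The function names, the minimum motif length of 3, the toy aligned sequences, and the epitope coordinates are all hypothetical and chosen only for illustration.

```python
# Illustrative sketch only: toy data, hypothetical names, assumed overlap metric.

def conserved_columns(aligned_seqs):
    """Return 0-based alignment columns where every sequence has the same non-gap residue."""
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("aligned sequences must have equal length")
    first = aligned_seqs[0]
    return {
        i for i in range(length)
        if first[i] != "-" and all(s[i] == first[i] for s in aligned_seqs)
    }

def conserved_runs(columns, min_len=3):
    """Group conserved columns into contiguous runs (start, end inclusive) of at least min_len."""
    runs = []
    start = prev = None
    for c in sorted(columns):
        if start is None:
            start = prev = c
        elif c == prev + 1:
            prev = c
        else:
            if prev - start + 1 >= min_len:
                runs.append((start, prev))
            start = prev = c
    if start is not None and prev - start + 1 >= min_len:
        runs.append((start, prev))
    return runs

def overlap_fraction(epitope, runs):
    """Fraction of epitope positions (start, end inclusive) that fall inside any conserved run."""
    e_start, e_end = epitope
    covered = sum(
        1 for p in range(e_start, e_end + 1)
        if any(a <= p <= b for a, b in runs)
    )
    return covered / (e_end - e_start + 1)

# Toy alignment of three hypothetical strain variants (not real gp120 or GP sequences).
toy_alignment = ["MKVLAAGT-", "MKVLSAGTR", "MKVLTAGTQ"]
toy_runs = conserved_runs(conserved_columns(toy_alignment))

# Hypothetical predicted linear epitope, given in alignment-column coordinates.
toy_epitope = (2, 5)
toy_overlap = overlap_fraction(toy_epitope, toy_runs)
print(toy_runs, toy_overlap)
```

In practice the epitope coordinates would come from a predictor's output and would need to be mapped from sequence positions to alignment columns before such a comparison; that mapping is omitted here.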