CGB - Universidad Mayor

Efficient extraction of knowledge from biological data requires structured
vocabularies that define biological terms unambiguously. This paper proposes
descriptions and definitions to disambiguate the term 'single-exon gene'.
Eukaryotic single-exon genes (SEGs) have been defined as genes that have no
introns in their protein-coding sequences. They have been studied not only to
determine their origin and evolution but also because their expression has been
linked to several types of human cancer and to neurological and developmental
disorders, and many exhibit tissue-specific transcription. Unfortunately, the
term 'SEG' is rife with ambiguity, leading to biological misinterpretations. The
classic definition makes no distinction between SEGs that harbor introns in
their untranslated regions (UTRs) and those that do not. This distinction
matters because introns in UTRs affect transcriptional regulation and
post-transcriptional processing of the mRNA. In addition, recent
whole-transcriptome shotgun sequencing has led to the discovery of many
single-exon mRNAs that arise from alternative splicing of multi-exon genes;
these single-exon isoforms are being confused with SEGs despite their clearly
different origin. The continuing expansion of RNA-seq datasets makes it
imperative to distinguish the different SEG types before annotation errors
become indelibly propagated in biological databases. This paper develops a
structured vocabulary for their disambiguation, which allows a major
reassessment of their evolutionary trajectories, regulation, RNA processing and
transport, and provides the opportunity to improve the detection of gene
associations with disorders including cancers and neurological and
developmental diseases.
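To make the three-way distinction concrete, the Python sketch below classifies a transcript from its exon coordinates and coding-sequence (CDS) boundaries. It is an illustration only, not the vocabulary proposed in the paper: the category names, the Transcript structure, and the rule that a transcript with an uninterrupted CDS counts as an isoform whenever another transcript of the same gene has an intron inside its CDS are all assumptions made for this example.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class CodingStructure(Enum):
    # Hypothetical category names for illustration; not the paper's terms.
    MULTI_EXON_CDS = "CDS interrupted by at least one intron (not a SEG)"
    ISOFORM_OF_MULTI_EXON_GENE = "uninterrupted CDS, but the gene has spliced-CDS transcripts"
    SEG_WITH_UTR_INTRONS = "uninterrupted CDS, intron(s) only in UTRs"
    SEG_WITHOUT_INTRONS = "no introns anywhere in the transcript"


@dataclass(frozen=True)
class Transcript:
    # Exons as inclusive (start, end) genomic coordinates, sorted ascending.
    exons: tuple[tuple[int, int], ...]
    # Inclusive (start, end) of the CDS; assumed to lie within exons.
    cds: tuple[int, int]

    def introns(self) -> list[tuple[int, int]]:
        """Return the gaps between consecutive exons."""
        return [(a_end + 1, b_start - 1)
                for (_, a_end), (b_start, _) in zip(self.exons, self.exons[1:])]

    def cds_is_interrupted(self) -> bool:
        """True if any intron lies strictly between the CDS start and end."""
        cds_start, cds_end = self.cds
        return any(cds_start < i_start and i_end < cds_end
                   for i_start, i_end in self.introns())


def classify(transcript: Transcript, gene: list[Transcript]) -> CodingStructure:
    """Classify one transcript in the context of all transcripts of its gene."""
    if transcript.cds_is_interrupted():
        return CodingStructure.MULTI_EXON_CDS
    # Assumption: an unspliced-CDS isoform of a multi-exon gene is not a SEG.
    if any(t.cds_is_interrupted() for t in gene):
        return CodingStructure.ISOFORM_OF_MULTI_EXON_GENE
    if transcript.introns():
        # Introns exist, but none falls inside the CDS, so they are in UTRs.
        return CodingStructure.SEG_WITH_UTR_INTRONS
    return CodingStructure.SEG_WITHOUT_INTRONS


if __name__ == "__main__":
    intronless = Transcript(exons=((100, 900),), cds=(150, 850))
    utr_intron = Transcript(exons=((100, 200), (300, 900)), cds=(350, 850))
    spliced = Transcript(exons=((100, 400), (500, 900)), cds=(150, 850))
    print(classify(intronless, [intronless]).name)        # SEG_WITHOUT_INTRONS
    print(classify(utr_intron, [utr_intron]).name)        # SEG_WITH_UTR_INTRONS
    print(classify(intronless, [intronless, spliced]).name)  # ISOFORM_OF_MULTI_EXON_GENE
```

Classifying each transcript against its whole gene, rather than in isolation, is what separates a single-exon isoform from a true SEG: the two can have identical exon structure, and only the gene context reveals the difference in origin described above.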
