Efficient extraction of knowledge from biological data requires the development
of structured vocabularies to unambiguously define biological terms. This paper
proposes descriptions and definitions to disambiguate the term 'single-exon
gene'. Eukaryotic Single-Exon Genes (SEGs) have been defined as genes that do
not have introns in their protein-coding sequences. They have been studied not
only to determine their origin and evolution but also because their expression
has been linked to several types of human cancer and to neurological and
developmental disorders, and because many exhibit tissue-specific
transcription. Unfortunately, the
term 'SEG' is rife with ambiguity, leading to biological misinterpretations. In
the classic definition, no distinction is made between SEGs that harbor introns
in their untranslated regions (UTRs) and those that do not. This distinction
matters because the presence of introns in UTRs affects
transcriptional regulation and post-transcriptional processing of the mRNA. In
addition, recent whole-transcriptome shotgun sequencing (RNA-seq) has led to the
discovery of many single-exon mRNAs that arise from alternative splicing of
multi-exon genes. These single-exon isoforms are being confused with SEGs
despite their clearly different origin. The rapid growth of RNA-seq
datasets makes it imperative to distinguish the different SEG types before
annotation errors become indelibly propagated in biological databases. This
paper develops a structured vocabulary for their disambiguation. The vocabulary
allows a major reassessment of their evolutionary trajectories, regulation, and
RNA processing and transport, and provides the opportunity to improve the
detection of gene associations with disorders, including cancers and
neurological and developmental diseases.