Characterizing exons and introns by regularity of nucleotide strings.
Academic Article
Overview
abstract
BACKGROUND: Translation of nucleotides into a numeric form has been approached in many ways and has allowed researchers to investigate the properties of protein-coding and noncoding sequences. Typically, more pronounced long-range correlations and increased regularity have been found in intron-containing genes and in non-transcribed regulatory DNA sequences than in cDNA sequences or intron-less genes. Regularity is assessed with spectral tools defined on the numerical translates. In the most popular approaches to numerical translation, the resulting spectra depend on the assignment of numerical values to nucleotides. Our contribution is to propose and illustrate a spectrum that remains invariant to the translation rules used in traditional approaches.

RESULTS: We outline a methodology for representing sequences of DNA nucleotides as numeric matrices in order to analytically investigate important structural characteristics of DNA. This representation allows us to compute the 2-dimensional wavelet transformation and to assess the regularity of the sequence via the slope of the wavelet spectrum. In addition to computing a global slope measure for a sequence, we can apply our methodology to overlapping sections of nucleotides to obtain an "evolutionary slope." To illustrate our methodology, we analyzed 376 gene sequences from the first chromosome of the honeybee.

CONCLUSION: For the genes analyzed, we find that introns are significantly more regular (they lead to more negative spectral slopes) than exons, which agrees with results from the literature in which regularity is measured on "DNA walks". However, unlike DNA walks, where the nucleotides are assigned numerical values according to nucleotide characteristics (purine-pyrimidine, weak-strong hydrogen bonds, keto-amino, etc.) or other spatial assignments, the proposed spectral tool is invariant to the assignment of numerical values to nucleotides. Thus, ambiguity in the numerical translation of nucleotides is eliminated.
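To make the assignment dependence concrete, here is a minimal Python sketch of the traditional DNA-walk approach that the abstract contrasts with; it is not the matrix-based 2-dimensional method proposed in the article, whose construction the abstract does not specify. The sketch makes several assumptions of its own: steps of +1/-1, a 1-dimensional Haar wavelet, a least-squares fit of log2 energy against resolution level, and the exclusion of the coarsest levels (`min_level=3`). All function and constant names are hypothetical.

```python
import math
import random

# Two conventional +1/-1 translation rules named in the abstract.
PURINE_PYRIMIDINE = {"A": 1, "G": 1, "C": -1, "T": -1}
WEAK_STRONG = {"A": 1, "T": 1, "C": -1, "G": -1}  # weak (A, T) vs strong (C, G)


def dna_walk(seq, rule):
    """Cumulative sum of the numeric steps assigned to each nucleotide."""
    walk, position = [], 0
    for base in seq:
        position += rule[base]
        walk.append(position)
    return walk


def haar_level_energies(signal):
    """Mean squared Haar detail coefficient per level, finest level first.

    The signal length must be a power of two.
    """
    n = len(signal)
    if n < 2 or n & (n - 1):
        raise ValueError("signal length must be a power of two (>= 2)")
    approx = [float(x) for x in signal]
    root2 = math.sqrt(2.0)
    energies = []
    while len(approx) > 1:
        evens, odds = approx[0::2], approx[1::2]
        detail = [(a - b) / root2 for a, b in zip(evens, odds)]
        approx = [(a + b) / root2 for a, b in zip(evens, odds)]
        energies.append(sum(d * d for d in detail) / len(detail))
    return energies


def spectral_slope(energies, min_level=3):
    """Least-squares slope of log2(energy) against level j (larger j = finer).

    Level j holds 2**j coefficients; levels below min_level are skipped
    because they average too few coefficients to be stable.
    """
    top = len(energies) - 1
    points = [(top - i, math.log2(e))
              for i, e in enumerate(energies)
              if top - i >= min_level and e > 0]
    if len(points) < 2:
        raise ValueError("need at least two usable levels")
    mean_j = sum(j for j, _ in points) / len(points)
    mean_y = sum(y for _, y in points) / len(points)
    num = sum((j - mean_j) * (y - mean_y) for j, y in points)
    den = sum((j - mean_j) ** 2 for j, _ in points)
    return num / den


# Synthetic sequence for illustration only (not honeybee data).
random.seed(0)
sequence = "".join(random.choice("ACGT") for _ in range(1024))
for name, rule in (("purine-pyrimidine", PURINE_PYRIMIDINE),
                   ("weak-strong", WEAK_STRONG)):
    slope = spectral_slope(haar_level_energies(dna_walk(sequence, rule)))
    print(f"{name}: slope = {slope:.3f}")
```

The same nucleotide string yields a different walk, and therefore generally a different slope estimate, under each rule. This is the ambiguity that an assignment-invariant spectrum is meant to remove.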