Abstract
The distributional structure of DNA sequences probably arises from the accumulation of many successive stochastic events such as nucleotide deletions, insertions, substitutions and elongations [1, 2, 3, 4, 5, 6, 7]. Although the existence of long-range correlations in non-coding portions of DNA sequences is well established [8, 9, 10, 11], first-order Markov chains might well capture aspects of their nucleotide distributions [12]. Here we propose a hidden Markov model, based on the coupling of an urn process with a Markov chain, to approximate the distributional structure of primitive DNA sequences. Then, by supposing that a bacterial DNA sequence can be derived from uniformly distributed mutations of some primitive DNA, we use the model to explain and predict some distributional properties of bacterial DNA sequences. The distributional properties intrinsic to the model are compared with statistical estimates from 1049 bacterial DNA sequences. In particular, the proposed model provides another possible theoretical explanation for Chargaff’s second parity rule for short oligonucleotides [13, 14].
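As an illustration of the property mentioned last, Chargaff's second parity rule says that, within a single DNA strand, a short oligonucleotide and its reverse complement occur with approximately equal frequency. The following minimal Python sketch measures how far a sequence departs from that rule for words of length k. It is not the model proposed here; the function names (`reverse_complement`, `kmer_frequencies`, `parity_gap`) are hypothetical, and the input is assumed to be an uppercase string over A, C, G and T.

```python
from collections import Counter

# Assumption: sequences are uppercase strings over the alphabet A, C, G, T.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(word: str) -> str:
    """Return the reverse complement of an oligonucleotide."""
    return word.translate(COMPLEMENT)[::-1]


def kmer_frequencies(seq: str, k: int) -> dict:
    """Relative frequencies of all overlapping words of length k in seq."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def parity_gap(seq: str, k: int) -> float:
    """Largest absolute difference between the frequency of a k-mer and
    the frequency of its reverse complement (0.0 means exact parity)."""
    freq = kmer_frequencies(seq, k)
    words = set(freq) | {reverse_complement(w) for w in freq}
    return max(
        (abs(freq.get(w, 0.0) - freq.get(reverse_complement(w), 0.0)) for w in words),
        default=0.0,
    )
```

For a real genome, `parity_gap(sequence, k)` would be expected to be small for short k if the rule holds; how small counts as agreement is a statistical question that this sketch does not address.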
Cite this article
Sobottka, M., Hart, A. On the nucleotide distribution in bacterial DNA sequences. Nat Prec (2010). https://doi.org/10.1038/npre.2010.5245.1
DOI: https://doi.org/10.1038/npre.2010.5245.1