Some Statistical Problems in the Assessment of Inhomogeneities of DNA Sequence Data
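This article studies two classes of statistical problems concerning anomalies in the distribution of biochemical markers in DNA sequences, proposes new statistical methods for assessing whether markers are excessively clustered, overdispersed, or too regular, and applies them to the physical map data of Escherichia coli.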
Abstract The fields of molecular genetics and medicine are accumulating DNA and protein sequence data at an accelerating rate. Discovering and interpreting sequence patterns can contribute to understanding molecular mechanisms and evolutionary processes. This article considers two types of statistical problems in these contexts. (1) Identifying anomalies in the distribution of a specified biochemical marker along a DNA string: new statistical methods are set forth for assessing excessive clustering, overdispersion, and excessive regularity of the marker along the sequence, with applications to the physical map data of the bacterium Escherichia coli. (2) Assembling cloned DNA segments: some results and statistical problems are described. Sections 2 and 3 of the article present helpful background material on DNA organization and inheritance.