
Markovian Structures in Biological Sequence Alignments

Journal of the American Statistical Association · 1999
Cited by 22
ABS 4

Summary

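This paper decomposes the hidden Markov model into an insertion component and a deletion component. Combining these with the block-based motif model and a Bayesian model selection criterion, it proposes the PROBE method for multiple sequence alignment and validates its accuracy on a GTPase family.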

Abstract

The alignment of multiple homologous biopolymer sequences is crucial in research on protein modeling and engineering, molecular evolution, and prediction in terms of both gene function and gene product structure. In this article we provide a coherent view of the two recent models used for multiple sequence alignment, the hidden Markov model (HMM) and the block-based motif model, to develop a set of new algorithms that have both the sensitivity of the block-based model and the flexibility of the HMM. In particular, we decompose the standard HMM into two components: the insertion component, which is captured by the so-called “propagation model,” and the deletion component, which is described by a deletion vector. Such a decomposition serves as a basis for rational compromise between biological specificity and model flexibility. Furthermore, we introduce a Bayesian model selection criterion that, in combination with the propagation model, genetic algorithm, and other computational aspects, forms the core of PROBE, a multiple alignment and database search methodology. The application of our method to a GTPase family of protein sequences yields an alignment that is confirmed by comparison with known tertiary structures.
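To make the idea of ordered motif blocks separated by insertions of arbitrary length more concrete, here is a minimal sketch in Python. It is not the PROBE algorithm, and the abstract does not specify any of its details. The sketch assumes that each block is an ungapped position-specific score table, that gaps between blocks are free, and that every block is present (so there is no deletion vector and no Bayesian model selection). The names `Block`, `block_score`, and `place_blocks`, and the default mismatch score, are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: one dict of residue scores per motif column.
Block = List[Dict[str, float]]

NEG_INF = float("-inf")


def block_score(seq: str, start: int, block: Block, default: float = -1.0) -> float:
    """Score an ungapped block placed at seq[start : start + len(block)].

    Residues missing from a column get the (assumed) default score.
    """
    return sum(col.get(seq[start + j], default) for j, col in enumerate(block))


def place_blocks(seq: str, blocks: List[Block]) -> Tuple[float, List[int]]:
    """Place all blocks in order, without overlap, maximizing the total score.

    best[k][i] is the best score for the first k blocks placed inside seq[:i].
    Residues between blocks are treated as unscored insertions.
    """
    n, num_blocks = len(seq), len(blocks)
    best = [[NEG_INF] * (n + 1) for _ in range(num_blocks + 1)]
    best[0] = [0.0] * (n + 1)
    took = [[False] * (n + 1) for _ in range(num_blocks + 1)]

    for k in range(1, num_blocks + 1):
        width = len(blocks[k - 1])
        for i in range(1, n + 1):
            # Option 1: residue i-1 is an insertion after block k.
            best[k][i] = best[k][i - 1]
            # Option 2: block k ends exactly at position i.
            if i >= width and best[k - 1][i - width] > NEG_INF:
                cand = best[k - 1][i - width] + block_score(seq, i - width, blocks[k - 1])
                if cand > best[k][i]:
                    best[k][i] = cand
                    took[k][i] = True

    if best[num_blocks][n] == NEG_INF:
        raise ValueError("sequence is too short to hold all blocks")

    # Trace back the start position of each block, last block first.
    starts: List[int] = []
    i = n
    for k in range(num_blocks, 0, -1):
        while not took[k][i]:
            i -= 1
        i -= len(blocks[k - 1])
        starts.append(i)
    starts.reverse()
    return best[num_blocks][n], starts


if __name__ == "__main__":
    # Two toy two-column motifs: "GK" followed later by "DG".
    toy_blocks = [
        [{"G": 2.0}, {"K": 2.0}],
        [{"D": 2.0}, {"G": 2.0}],
    ]
    print(place_blocks("AAGKAAADGA", toy_blocks))  # (8.0, [2, 7])
```

In this toy setting the insertion lengths fall out of the block start positions. A deletion component in the spirit of the abstract would additionally let individual blocks be skipped for a given sequence, which this sketch deliberately leaves out.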

Computational biology · Sequence alignment · Hidden Markov model · Protein modeling