Sempervirens: A Fast Reconstruction Algorithm for Noisy and Incomplete Binary Matrix Representations of Trees
提出Sempervirens算法,用于从含大量错误和缺失的单细胞测序数据中快速重建细胞谱系树(系统发育树),速度比现有方法快三个数量级以上,且精度相当。
Applications such as reconstructing cell lineage trees (represented as phylogenetic trees) from single-cell sequencing data require reconstructing a [Formula: see text]-matrix that has many errors and missing entries. We introduce Sempervirens, a very fast matrix reconstruction algorithm for noisy and incomplete matrix representations of phylogenetic trees. Sempervirens uses an iterative maximum-likelihood approach to determine the topology tree represented by the corrupted data. We show that Sempervirens is at least three orders of magnitude faster than other methods on thousand by thousand matrices, with the speed gap widening with larger matrices. We also show that Sempervirens matches state-of-the-art methods in reconstruction accuracy. The speed of Sempervirens enables it to be tractably applied to reconstructing much larger matrices than those that other methods can reconstruct. In addition to experimental results, we justify the algorithm with a mathematical treatment of its subprocedures. History: Accepted by J. Paul Brooks, Area Editor for Applications in Biology, Medicine, & Healthcare. Funding: This work was supported by the Office of Science of the Department of Energy [Contract DE-AC02-05CH11231]. Supplemental Material: The software that supports the findings of this study is available within the paper and its Supplemental Information ( https://pubsonline.informs.org/doi/suppl/10.1287/ijoc.2023.0373 ) as well as from the IJOC GitHub software repository ( https://github.com/INFORMSJoC/2023.0373 ). The complete IJOC Software and Data Repository is available at https://informsjoc.github.io/ .