Seriation using tree-penalized path length
提出一种混合序列化方法tpPL,结合旅行商问题与最优叶排序,通过树惩罚路径长度平衡两者优势,在44个合成数据集上验证其克服了两种方法的弱点。
dissimilarity matrix, data seriation methods produce a linear ordering of the objects, putting similar objects nearby in the ordering. One may visualize the reordered dissimilarity matrix with a heat map and thus understand the structure of the data, while still displaying the full matrix of dissimilarities. Good orderings produce heat maps that are easy to read and allow for clear interpretation. We consider two popular seriation methods, minimizing path length by solving the Traveling Salesman Problem (TSP), and Optimal Leaf Ordering (OLO), which minimizes path length among all orderings consistent with a given tree structure. Learning from the strengths and weaknesses of the two methods, we introduce a new hybrid seriation method, tree-penalized Path Length (tpPL). The objective is a linear combination of path length and the extent of violations of the tree structure, with a parameter that transitions the optimal paths smoothly from TSP to OLO. We present a detailed study over 44 synthetic datasets which are designed to bring out the strengths and weaknesses of the three methods, finding that the hybrid nature of tpPL enables it to overcome the weaknesses of TSP and OLO.