Yicong Jiang and Zheng Tracy Ke’s contribution to the Discussion of ‘Root and community inference on the latent growth process of a network’ by Crane and Xu
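This contribution applies the root-inference method for dynamic networks proposed by Crane and Xu to citation networks, identifying the root papers among the papers associated with a keyword, and suggests extending the model to handle node heterogeneity and multiple roots.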
We congratulate the authors on an excellent paper! Crane and Xu (2021) proposed novel methods for finding ‘root nodes’ from a single snapshot of a dynamic network process, with several interesting real-data examples. We now consider a new application: finding ‘root papers’ in a citation network.
The MADStat dataset (Ji et al., 2022; Ke et al., 2023) consists of the BibTeX and citation information of over 83K papers, which we use to construct paper citation networks. Given a keyword (e.g. ‘Lasso’), let V0 be the set of papers whose titles contain this keyword, and let V be the set of papers that are either citers or citees of papers in V0 (we only count the citations recorded in MADStat). We then build a symmetric network on V, with an edge between two papers i and j if either i cites j or j cites i; if the network is disconnected, we restrict it to its giant component. The networks for two keywords, Lasso and Bayesian, are shown in Figure 1.
We apply the method in Crane and Xu (2021) to each network to obtain the posterior probability of each node being a root node. The six papers with the highest posterior root probability in each network are listed in Table 1. In the Lasso network, Tibshirani (1996) is ranked first. In the Bayesian network, Gelfand and Smith (1990) is ranked first. The results are meaningful and motivate a new application of the proposed method.
Figure 1. The Lasso network (left graph) and the Bayesian network (right graph); only the 30 highest-degree nodes are shown. The table on the right provides the summary statistics, where d_max, d_min, and d̄ are the maximum, minimum, and average degrees, respectively.
Table 1. The top 6 papers with the highest posterior root probability in the Lasso network (top) and the Bayesian network (bottom), respectively.
We also suggest some extensions of Crane and Xu (2021). First, the PAPER model is built on the Erdős–Rényi model and does not model degree heterogeneity among nodes.
The Erdős–Rényi model can be generalized to accommodate degree heterogeneity [such as a degree-corrected block model (DCBM) with K = 1; see Jin et al. (2022)]. It would be interesting to see whether the PAPER model can be generalized similarly. Second, in the case of multiple roots, we may run community detection first and then apply the algorithm to each community separately. There are fast community detection algorithms [e.g. Jin et al. (2022); Jiang and Ke (2023)] equipped with data-driven choices of the number of communities (Jin et al., 2023). Combining them with the current algorithm would help reduce computational costs and avoid the randomness caused by forest partition. We hope these ideas are beneficial. Congratulations to the authors again on their remarkable work!
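
The keyword-network construction described above (V0, V, the symmetric edge rule, and the restriction to the giant component) can be sketched as follows. This is a minimal illustration rather than the code used for the analysis: the function name `build_keyword_network`, the input formats (a dictionary of titles and a list of citer/citee pairs), the case-insensitive keyword match, and the choice to keep V0 itself inside V are all assumptions, and the toy papers are made up.

```python
from collections import defaultdict, deque


def build_keyword_network(titles, citations, keyword):
    """Return adjacency sets for the giant component of a keyword network.

    titles:    dict mapping paper id -> title (hypothetical input format)
    citations: list of (citer, citee) pairs (hypothetical input format)
    """
    kw = keyword.lower()
    # V0: papers whose titles contain the keyword.
    # Assumption: the match is case-insensitive.
    v0 = {p for p, title in titles.items() if kw in title.lower()}

    # V: citers or citees of papers in V0.
    # Assumption: papers in V0 are themselves kept in V.
    v = set(v0)
    for citer, citee in citations:
        if citer in v0:
            v.add(citee)
        if citee in v0:
            v.add(citer)

    # Symmetric network on V: an edge whenever one paper cites the other.
    adj = defaultdict(set)
    for citer, citee in citations:
        if citer in v and citee in v and citer != citee:
            adj[citer].add(citee)
            adj[citee].add(citer)

    # Restrict to the giant (largest connected) component via breadth-first search.
    seen, giant = set(), set()
    for start in v:
        if start in seen:
            continue
        component, queue = {start}, deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            for neighbour in adj[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    component.add(neighbour)
                    queue.append(neighbour)
        if len(component) > len(giant):
            giant = component
    return {p: adj[p] & giant for p in giant}


# Toy data (made up): papers "a" and "b" mention the keyword in their titles.
titles = {
    "a": "Toy lasso paper one",
    "b": "Toy Lasso paper two",
    "c": "Toy regression paper",
    "d": "Toy sampling paper",
    "e": "Toy unrelated paper",
}
citations = [("b", "a"), ("c", "a"), ("e", "d")]
net = build_keyword_network(titles, citations, "lasso")
print(sorted(net))
```

In the toy data, paper "c" enters V because it cites a keyword paper, while "d" and "e" are excluded because neither cites nor is cited by a paper in V0. The resulting adjacency sets could then be passed to a root-inference routine, which is not shown here.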