Poorbita Kundu’s and Johannes Schmidt-Hieber’s contribution to the Discussion of ‘Statistical exploration of the manifold hypothesis’ by Whiteley et al.

Journal of the Royal Statistical Society. Series B: Statistical Methodology · 2026

Cited by 0 · Top 4% among articles in the same journal and year

ABS 4

Poorbita Kundu
J. Schmidt-Hieber (corresponding author)

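Summary

This article is a discussion of the paper by Whiteley et al. on the manifold hypothesis. It compares the latent metric model with functional principal component analysis and explores directions such as estimating the mean, using kernel smoothing techniques, and incorporating online learning.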

Abstract

The authors have done an admirable job by combining new methodology, meaningful statistical theory and applications all within one article. The work is both deep and accessible.

As a first discussion point, let us relate the proposed latent metric model (LMM) to functional PCA. This comparison can serve as a basis to borrow strength and to advance the understanding of both approaches, or even to unify them. The latent metric model assumes that we observe $Y_{ij} = X_j(Z_i) + E_{ij}$ for $i = 1, \dots, n$ and $j = 1, \dots, p$, where $Z_1, \dots, Z_n \sim \mu$ are i.i.d. and $\mu$ is a Borel probability measure defined on a metric space $\mathcal{Z}$, $X_1, \dots, X_p : \mathcal{Z} \to \mathbb{R}$ are random functions (assumed to be independent in A4), $E_{ij}$ is the noise and standard assumptions are imposed. In functional PCA (fPCA), one observes random time points $T_{ij} \in [a,b]$ and $Y_{ij} = X_j(T_{ij}) + E_{ij}$, with similar assumptions on the random functions $X_j : [a,b] \to \mathbb{R}$ and the noise variables. The case of dense functional data refers to $n \to \infty$ (e.g. Li & Hsing, 2010). When the observation grid is fixed, as is common in dense functional data settings, we have $T_{ij} = T_i$ for all $j$, that is, all subjects are observed on a common set of time points. The asymptotic framework adopted in Theorem 1 of the paper under discussion is also closely aligned with that commonly used in the dense functional data literature, cf. Section 3.2 in Hall et al. (2006).

The goal of fPCA is to recover the eigenfunctions and eigenvalues of the covariance operator $(s,t) \mapsto \operatorname{Cov}(X(s), X(t))$, viewed as a map $[a,b]^2 \to \mathbb{R}$. The strategy is to use smoothing techniques. Convergence rates were obtained in Hall et al. (2006), Theorems 1–2, and Li and Hsing (2010), Corollary 3.7. In functional data analysis, it is moreover common to estimate the mean function (Hall et al., 2006). The key difference between the two models seems to be that in the latent metric model the $Z_i$ are latent and unobserved. Even stronger, it is an aim in this model to recover the structure of the latent domain $\mathcal{Z}$ and of $\phi(\mathcal{Z})$ (Zhang and Wang, 2016).
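
The fPCA recipe described above (estimate the mean function, then recover eigenvalues and eigenfunctions of the covariance operator) can be illustrated on simulated dense data observed on a common grid. The following is only a minimal sketch under stated assumptions: the function names, the mean function, the two eigenfunctions and all parameter values are hypothetical, and the smoothing step used in the cited literature is replaced here by the raw sample covariance on the grid.

```python
import numpy as np


def simulate_dense_fda(n_grid=200, n_subjects=500, noise_sd=0.1, seed=0):
    """Toy dense functional data on a common grid (all choices are illustrative)."""
    rng = np.random.default_rng(seed)
    t = np.linspace(0.0, 1.0, n_grid)  # common time points in [a, b] = [0, 1]
    # Two orthonormal functions in L2[0, 1], used as true eigenfunctions.
    phi = np.stack([np.sqrt(2.0) * np.sin(2.0 * np.pi * t),
                    np.sqrt(2.0) * np.cos(2.0 * np.pi * t)])
    lam = np.array([4.0, 1.0])  # true eigenvalues of the covariance operator
    mean = t ** 2  # hypothetical mean function
    scores = rng.normal(size=(n_subjects, 2)) * np.sqrt(lam)
    X = mean + scores @ phi  # one row per subject, one column per time point
    Y = X + noise_sd * rng.normal(size=X.shape)  # additive observation noise
    return t, Y, mean, phi, lam


def fpca_common_grid(t, Y, n_components=2):
    """Unsmoothed fPCA: eigen-decompose the sample covariance on the grid."""
    dt = t[1] - t[0]
    mean_hat = Y.mean(axis=0)  # pointwise estimate of the mean function
    centered = Y - mean_hat
    cov_hat = centered.T @ centered / (Y.shape[0] - 1)  # (n_grid, n_grid)
    evals, evecs = np.linalg.eigh(cov_hat)  # eigenvalues in ascending order
    order = np.argsort(evals)[::-1][:n_components]
    # Rescale by the grid spacing so that matrix eigenpairs approximate
    # eigenpairs of the integral operator (Riemann-sum approximation).
    eigvals = evals[order] * dt
    eigfuns = evecs[:, order].T / np.sqrt(dt)
    return mean_hat, eigvals, eigfuns


t, Y, mean, phi, lam = simulate_dense_fda()
mean_hat, eigvals, eigfuns = fpca_common_grid(t, Y)
print("estimated eigenvalues:", eigvals)
print("true eigenvalues:     ", lam)
```

Estimated eigenfunctions are only identified up to sign, so any comparison with the truth should use the absolute value of the inner product.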

This comparison raises the question of whether it is possible to estimate the mean $\mathbb{E}\big[p^{-1}\sum_{j=1}^{p} X_j(z)\big]$ in the LMM. Moreover, for any $i, i' \in \{1, \dots, n\}$, […] with $f$ the mean correlation kernel defined in Equation (2) of the paper. By imposing suitable assumptions on $f$, one could infer from this quantity whether $Z_i$ and $Z_{i'}$ are close. This information could then potentially be used to apply kernel smoothing techniques such as those in fPCA. Furthermore, it would be interesting to consider a combined model in which parts of the $Z_i$ are observed.

Another fruitful direction would be to consider an online/streaming version of the proposed method. That the data arrive in an online fashion seems natural from an applied viewpoint, and there is growing interest in streaming PCA; see e.g. Jain et al. (2016).

While it is standard in the statistics literature to assume additive noise, this seems less present in modern ML problems. For instance, Chen et al. (2022) argue that for image classification, the main source of noise is random deformations of the inputs. This would rather suggest adding noise in the latent space.
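
The idea of inferring closeness of $Z_i$ and $Z_{i'}$ from a kernel-related quantity can be illustrated with a toy simulation. The sketch below rests on assumptions that are not confirmed by the text above: it takes the mean correlation kernel to be $f(z,z') = p^{-1}\sum_{j=1}^{p}\mathbb{E}[X_j(z)X_j(z')]$, uses the normalized inner product $p^{-1}\langle Y_i, Y_{i'}\rangle$ as the quantity of interest, and assumes mean-zero noise independent of everything else. The latent space (a circle), the random functions and all names are hypothetical; in this toy model $f(z,z') = \cos(\theta - \theta')$, where $\theta$ and $\theta'$ are the angles of $z$ and $z'$.

```python
import numpy as np


def simulate_toy_lmm(n=50, p=5000, noise_sd=0.5, seed=0):
    """Toy latent metric model with latent points on a circle (illustrative)."""
    rng = np.random.default_rng(seed)
    theta = rng.uniform(0.0, 2.0 * np.pi, size=n)  # latent Z_i, as angles
    a = rng.normal(size=p)
    b = rng.normal(size=p)
    # X_j(z) = a_j cos(theta) + b_j sin(theta), so E[X_j(z) X_j(z')] = cos(theta - theta').
    X = np.outer(np.cos(theta), a) + np.outer(np.sin(theta), b)  # shape (n, p)
    Y = X + noise_sd * rng.normal(size=(n, p))  # additive noise E_ij
    return theta, Y


def normalized_gram(Y):
    """Matrix with entries p^{-1} <Y_i, Y_i'>."""
    return Y @ Y.T / Y.shape[1]


def toy_kernel(theta):
    """Mean correlation kernel of the toy model evaluated at all pairs."""
    return np.cos(theta[:, None] - theta[None, :])


theta, Y = simulate_toy_lmm()
G = normalized_gram(Y)
F = toy_kernel(theta)
off_diag = ~np.eye(len(theta), dtype=bool)
max_err = np.max(np.abs(G - F)[off_diag])
print("largest off-diagonal gap between Gram matrix and kernel:", max_err)
```

In this toy setting a large off-diagonal Gram entry indicates a small angular distance between the two latent points. The diagonal entries are inflated by the noise variance, which is why the comparison is restricted to pairs with $i \neq i'$.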

Manifold hypothesis · Functional principal component analysis · Latent metric model · Statistical theory