David J. Olive's contribution to the Discussion of “On optimal linear prediction” by I. Helland
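Discusses the behavior of the k-component PLS estimator under assumption H_m, notes that this assumption is strong, and gives the relationship between the OLS and PLS estimators, together with a model selection result in which the importance of the predictors decreases.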
It was interesting to see how the $k$-component PLS estimator $\hat{\boldsymbol{\beta}}_{k\mathrm{PLS}}$ behaves under assumption $H_m$ for $m<p$: $\boldsymbol{\beta}=\sum_{j=1}^p \gamma_j \boldsymbol{d}_j$ with exactly $m$ nonzero terms. Some simple results hold for model selection estimators when the number of predictors $p$ is fixed. Several methods, including PLS, use $p$ linear combinations $\boldsymbol{\eta}_1^T\boldsymbol{x},\dots,\boldsymbol{\eta}_p^T\boldsymbol{x}$. Performing the ordinary least squares (OLS) regression of $Y$ on $\left(\hat{\boldsymbol{\eta}}_1^T\boldsymbol{x},\hat{\boldsymbol{\eta}}_2^T\boldsymbol{x},\dots,\hat{\boldsymbol{\eta}}_k^T\boldsymbol{x}\right)$ and a constant gives the $k$-component estimator. The PLS literature often assumes (a1): $Y\mid\boldsymbol{x}=\alpha+\boldsymbol{x}^T\boldsymbol{\beta}_{k\mathrm{PLS}}+e$ for some $k$. If $Y\mid\boldsymbol{x}=\alpha+\boldsymbol{x}^T\boldsymbol{\beta}+e$, then under mild regularity conditions, $\boldsymbol{\beta}=\boldsymbol{\beta}_{\mathrm{OLS}}$. Hence assumption (a1) forces $\boldsymbol{\beta}_{k\mathrm{PLS}}=\boldsymbol{\beta}_{\mathrm{OLS}}$. For $k=1$, (a1) forces $\boldsymbol{\beta}_{\mathrm{OLS}}=\boldsymbol{\beta}_{1\mathrm{PLS}}$ to be an eigenvector of the covariance matrix $\mathrm{Cov}(\boldsymbol{x})=\boldsymbol{\Sigma}_{\boldsymbol{x}}$. Assume instead that the cases $(\boldsymbol{x}_i,Y_i)$ are iid with $E(Y)=\mu_Y$ and $E(\boldsymbol{x})=\boldsymbol{\mu}_{\boldsymbol{x}}$.
Let $\mathrm{Cov}(\boldsymbol{x},Y)=\boldsymbol{\Sigma}_{\boldsymbol{x}Y}$. Then $\boldsymbol{\beta}=\boldsymbol{\beta}_{\mathrm{OLS}}=\boldsymbol{\Sigma}_{\boldsymbol{x}}^{-1}\boldsymbol{\Sigma}_{\boldsymbol{x}Y}$, $\boldsymbol{\beta}_{1\mathrm{PLS}}=\theta\boldsymbol{\Sigma}_{\boldsymbol{x}Y}$ where $\theta$ is a constant, and $\boldsymbol{\Sigma}_{\boldsymbol{x}Y}=\boldsymbol{\Sigma}_{\boldsymbol{x}}\boldsymbol{\beta}$, even when heterogeneity is present. Since $\boldsymbol{\Sigma}_{\boldsymbol{x}}$ is positive definite, $\boldsymbol{\beta}=\mathbf{0}$ iff $\boldsymbol{\Sigma}_{\boldsymbol{x}Y}=\mathbf{0}$. This hypothesis can be tested by applying a one sample test to $\boldsymbol{v}_i=(\boldsymbol{x}_i-\overline{\boldsymbol{x}})(Y_i-\overline{Y})$ for $i=1,\dots,n$. See Olive and Zhang (2025), Olive et al. (2025), and Abid et al. (2025). Let an OLS working model for $\hat{\boldsymbol{\beta}}_{k\mathrm{PLS}}$ be $Y_i=\alpha_k+\theta_{k1}W_{1i}+\cdots+\theta_{kk}W_{ki}+e_{ki}$ where $W_{ji}=\boldsymbol{x}_i^T\hat{\boldsymbol{\Sigma}}_{\boldsymbol{x}}^{j-1}\hat{\boldsymbol{\Sigma}}_{\boldsymbol{x}Y}$ with $\hat{\boldsymbol{\Sigma}}_{\boldsymbol{x}}^{0}=\boldsymbol{\Sigma}_{\boldsymbol{x}}^{0}=\boldsymbol{I}_p$.
Then $\hat{\boldsymbol{\beta}}_{k\mathrm{PLS}}=\left(\sum_{j=1}^k\hat{\theta}_{kj}\hat{\boldsymbol{\Sigma}}_{\boldsymbol{x}}^{j-1}\right)\hat{\boldsymbol{\Sigma}}_{\boldsymbol{x}Y}$ and $\boldsymbol{\beta}_{k\mathrm{PLS}}=\left(\sum_{j=1}^k\theta_{kj}\boldsymbol{\Sigma}_{\boldsymbol{x}}^{j-1}\right)\boldsymbol{\Sigma}_{\boldsymbol{x}Y}$ (under iid cases) using $Y=\alpha_{k\mathrm{PLS}}+\boldsymbol{x}^T\boldsymbol{\beta}_{k\mathrm{PLS}}+e_k$. This result suggests that the $\boldsymbol{\beta}_{k\mathrm{PLS}}$ are typically different for each $k=1,\dots,p$, but $\boldsymbol{\beta}_{k\mathrm{PLS}}=\boldsymbol{\beta}_{q\mathrm{PLS}}$ for $k\le q\le p$ if $\theta_{pj}=0$ for $k+1\le j\le p$. The assumption $H_m$ appears to be strong since $\boldsymbol{\beta}\in\mathbb{R}^p$ is a much weaker assumption than $\boldsymbol{\beta}\in\mathbb{R}^m$ where $1\le m<p$. The model selection result of Equation (1) implies that $H_m$ is approximately true in that $\theta_{pj}\approx 0$ for $k^{\ast}+1\le j\le p$ for some $m=k^{\ast}<p$. Hence the predictors $W_j$ are “weak” or “almost immaterial” for $m+1\le j\le p$.
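The working model above can be checked numerically. The following is a minimal sketch, not the author's code: it builds $W_{ji}=\boldsymbol{x}_i^T\hat{\boldsymbol{\Sigma}}_{\boldsymbol{x}}^{j-1}\hat{\boldsymbol{\Sigma}}_{\boldsymbol{x}Y}$, regresses $Y$ on a constant and $W_1,\dots,W_k$ by OLS, and maps $\hat{\theta}_{k1},\dots,\hat{\theta}_{kk}$ back to a coefficient vector for $\boldsymbol{x}$. The function names (`sample_moments`, `kpls_working_model`, `ols_slopes`), the $n-1$ divisor in the sample covariances, and the simulated data are all assumptions made for illustration.

```python
import numpy as np


def sample_moments(X, y):
    """Return (Sigma_x_hat, Sigma_xY_hat), using the n - 1 divisor (an assumption)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    n = X.shape[0]
    return Xc.T @ Xc / (n - 1), Xc.T @ yc / (n - 1)


def kpls_working_model(X, y, k):
    """Coefficient vector implied by the k-term OLS working model.

    Regresses Y on a constant and W_j = x^T Sigma_x_hat^(j-1) Sigma_xY_hat
    for j = 1, ..., k, then returns (beta_hat, theta_hat).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    S, sxy = sample_moments(X, y)
    # Columns of K are Sigma_x_hat^(j-1) Sigma_xY_hat, j = 1, ..., k.
    cols = [sxy]
    for _ in range(k - 1):
        cols.append(S @ cols[-1])
    K = np.column_stack(cols)
    W = X @ K  # n x k matrix of working predictors W_ji
    design = np.column_stack([np.ones(len(y)), W])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    theta_hat = coef[1:]  # drop the intercept alpha_k
    return K @ theta_hat, theta_hat


def ols_slopes(X, y):
    """OLS slope vector from regressing Y on a constant and x."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1:]


# Simulated iid cases (hypothetical data, fixed seed for reproducibility).
rng = np.random.default_rng(0)
n, p = 500, 3
X = rng.normal(size=(n, p)) * np.array([1.0, 2.0, 3.0])
y = 2.0 + X @ np.array([1.0, -1.0, 0.5]) + rng.normal(size=n)

for k in range(1, p + 1):
    beta_k, theta_k = kpls_working_model(X, y, k)
    print(k, np.round(beta_k, 3), np.round(theta_k, 4))
print("OLS", np.round(ols_slopes(X, y), 3))
```

With $k=p$ the working predictors span the same space as the original predictors whenever the $p$ vectors $\hat{\boldsymbol{\Sigma}}_{\boldsymbol{x}}^{j-1}\hat{\boldsymbol{\Sigma}}_{\boldsymbol{x}Y}$ are linearly independent, so the last printed coefficient vector should agree with the OLS slopes up to numerical error. Inspecting $\hat{\theta}_{pj}$ for larger $j$ gives a rough sense of which $W_j$ contribute little. These vectors become nearly collinear as $p$ grows, so this direct construction is intended only for small illustrative examples.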