A power analysis for model-X knockoffs with ℓp-regularized statistics

Annals of Statistics · 2023

Cited by: 5

ABS rating: 4*

Asaf Weinstein
Weijie Su
Małgorzata Bogdan
Rina Foygel Barber
Emmanuel J. Candès

Summary

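Studies the statistical power of variable selection procedures that rank features with ℓp-regularized estimators and control the false discovery rate using model-X knockoffs, derives exact asymptotic predictions, and compares the performance of the knockoff versions of the Lasso and the thresholded Lasso.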

Abstract

The variable selection properties of procedures that use penalized-likelihood estimates are a central topic in the study of high-dimensional linear regression. Existing literature emphasizes how well such procedures rank the variables, as reflected in the receiver operating characteristic curve or in prediction performance. In particular, recent works have harnessed the modern theory of approximate message passing (AMP) to obtain, in a specific setting, exact asymptotic predictions of the Type I/Type II error tradeoff for selection procedures that rely on ℓp-regularized estimators. In practice, however, effective ranking alone is often insufficient, because some calibration of the Type I error is required. In this work, we study theoretically the power of selection procedures that likewise rank the features by the magnitude of an ℓp-regularized estimator, but additionally use model-X knockoffs to control the false discovery rate in the realistic situation where no prior information about the signal is available. To analyze the power of the resulting procedure, we extend existing results in AMP theory to handle the pairing between the original variables and their knockoffs, and use this extension to derive exact asymptotic predictions for power. We apply the general results to compare the power of the knockoff versions of Lasso and thresholded-Lasso selection, and demonstrate that, in the i.i.d. covariate setting under consideration, tuning by cross-validation on the augmented design matrix is nearly optimal. We further show how the techniques also allow one to analyze the Type S error, and a corresponding notion of power, when selections are supplemented with a decision on the sign of the coefficient.
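To make the procedure concrete, here is a minimal NumPy sketch of a knockoff version of Lasso selection in an i.i.d. Gaussian design. It is not the authors' code. The fixed penalty `lam` (the paper instead discusses cross-validation on the augmented design), the plain coordinate-descent solver, the coefficient-difference statistic W_j = |b_j| - |b̃_j|, the knockoff+ threshold, the simple sign-error count, and all function names (`lasso_cd`, `knockoff_plus_threshold`, `knockoff_lasso_select`, `run_demo`) are illustrative assumptions. Because the covariates are assumed independent, an independent copy of X is a valid model-X knockoff matrix; correlated designs would need a proper knockoff sampler.

```python
import numpy as np


def soft_threshold(z, lam):
    """Soft-thresholding operator S(z, lam) = sign(z) * max(|z| - lam, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)


def lasso_cd(A, y, lam, n_iter=200, tol=1e-8):
    """Minimize 0.5 * ||y - A b||^2 + lam * ||b||_1 by cyclic coordinate descent."""
    A = np.asfortranarray(A)                 # column-major for fast column access
    m = A.shape[1]
    b = np.zeros(m)
    r = np.asarray(y, dtype=float).copy()    # running residual y - A b
    col_sq = (A ** 2).sum(axis=0)
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(m):
            if col_sq[j] == 0.0:
                continue
            old = b[j]
            rho = A[:, j] @ r + col_sq[j] * old   # A_j^T (residual with feature j removed)
            b[j] = soft_threshold(rho, lam) / col_sq[j]
            if b[j] != old:
                r -= A[:, j] * (b[j] - old)
                max_change = max(max_change, abs(b[j] - old))
        if max_change < tol:
            break
    return b


def knockoff_plus_threshold(W, q):
    """Smallest t in {|W_j| : W_j != 0} with (1 + #{W_j <= -t}) / max(1, #{W_j >= t}) <= q."""
    for t in np.sort(np.abs(W[W != 0])):
        fdp_hat = (1 + np.sum(W <= -t)) / max(1, np.sum(W >= t))
        if fdp_hat <= q:
            return t
    return np.inf                            # nothing can be selected at level q


def knockoff_lasso_select(X, y, lam, q, rng):
    """Knockoff filter with Lasso statistics, assuming entries of X are i.i.d. N(0, 1/n)."""
    n, p = X.shape
    X_ko = rng.normal(0.0, 1.0 / np.sqrt(n), size=(n, p))  # independent copy = valid knockoffs here
    b = lasso_cd(np.hstack([X, X_ko]), y, lam)             # Lasso on the augmented design
    W = np.abs(b[:p]) - np.abs(b[p:])        # large positive W_j favors the original feature
    T = knockoff_plus_threshold(W, q)
    selected = np.flatnonzero(W >= T)
    return selected, np.sign(b[selected])    # selections plus a sign decision for each


def run_demo(seed=0, n=300, p=200, k=20, amp=4.0, lam=1.5, q=0.1):
    """Simulate one data set; return (FDP, power, number of sign errors among true signals)."""
    rng = np.random.default_rng(seed)
    X = rng.normal(0.0, 1.0 / np.sqrt(n), size=(n, p))
    beta = np.zeros(p)
    beta[:k] = amp                           # the first k features are the true signals
    y = X @ beta + rng.normal(size=n)        # unit-variance Gaussian noise
    selected, signs = knockoff_lasso_select(X, y, lam, q, rng)
    is_true = selected < k
    n_true = int(np.sum(is_true))
    fdp = (selected.size - n_true) / max(1, selected.size)
    power = n_true / k
    sign_errors = int(np.sum(is_true & (signs != np.sign(beta[selected]))))
    return fdp, power, sign_errors


if __name__ == "__main__":
    print(run_demo())
```

Running the script prints one simulated (FDP, power, sign-error count) triple. Averaging these over many seeds would give Monte Carlo estimates of FDR and power, which could then be set against asymptotic predictions of the kind the paper derives; the particular values of `n`, `p`, `k`, `amp`, and `lam` here are arbitrary choices for illustration.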

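Keywords: high-dimensional linear regression · variable selection · false discovery rate control · approximate message passing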
