Reconciling model-X and doubly robust approaches to conditional independence testing
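We study the properties of model-X methods when the conditional distribution is learned in sample, and find that the distilled conditional randomization test (dCRT) is doubly robust and asymptotically equivalent to the generalized covariance measure (GCM) test, which is optimal against partially linear alternatives. Simulations show that the two tests perform similarly, though the dCRT has slightly better Type-I error control and slightly lower power in small samples or when the response is discrete. Post-lasso based test statistics can markedly improve Type-I error control for both methods.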
Model-X approaches to testing conditional independence between a predictor and an outcome variable given a vector of covariates usually assume exact knowledge of the conditional distribution of the predictor given the covariates. Nevertheless, model-X methodologies are often deployed with this conditional distribution learned in sample. We investigate the consequences of this choice through the lens of the distilled conditional randomization test (dCRT). We find that Type-I error control is still possible, but only if the mean of the outcome variable given the covariates is estimated well enough. This demonstrates that the dCRT is doubly robust, and motivates a comparison to the generalized covariance measure (GCM) test, another doubly robust conditional independence test. We prove that these two tests are asymptotically equivalent, and show that the GCM test is optimal against (generalized) partially linear alternatives by leveraging semiparametric efficiency theory. In an extensive simulation study, we compare the dCRT to the GCM test. These two tests have broadly similar Type-I error and power, though the dCRT can have somewhat better Type-I error control but somewhat worse power in small samples or when the response is discrete. We also find that post-lasso based test statistics (as compared to lasso based statistics) can dramatically improve Type-I error control for both methods.
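To make the comparison concrete, the following is a minimal sketch of the two tests, not the authors' implementation. It rests on several simplifying assumptions that are not taken from the abstract: both conditional means are fitted in sample by ordinary least squares (standing in for the lasso or post-lasso fits), the predictor given the covariates is modeled as Gaussian with constant variance, and the function names `ols_fit_predict`, `gcm_test`, and `dcrt_test` are hypothetical. Here `X` is the predictor, `Y` the outcome, and `Z` the covariate matrix.

```python
import math
import numpy as np


def ols_fit_predict(Z, v):
    """Least-squares fit of v on [1, Z]; returns in-sample fitted values."""
    A = np.column_stack([np.ones(len(v)), Z])
    coef, *_ = np.linalg.lstsq(A, v, rcond=None)
    return A @ coef


def gcm_test(X, Y, Z):
    """GCM-style test: normalized mean of the product of the two residuals."""
    rx = X - ols_fit_predict(Z, X)  # residual of predictor given covariates
    ry = Y - ols_fit_predict(Z, Y)  # residual of outcome given covariates
    R = rx * ry
    T = math.sqrt(len(R)) * R.mean() / R.std()
    # Two-sided p-value from the standard normal limit.
    p = math.erfc(abs(T) / math.sqrt(2))
    return T, p


def dcrt_test(X, Y, Z, n_resamples=500, seed=0):
    """dCRT-style test with the law of X given Z learned in sample.

    Assumption (for illustration only): X | Z ~ N(mu(Z), sigma^2).
    """
    rng = np.random.default_rng(seed)
    mu_x = ols_fit_predict(Z, X)
    sigma = (X - mu_x).std()
    ry = Y - ols_fit_predict(Z, Y)  # "distilled" outcome residual
    t_obs = abs(np.mean((X - mu_x) * ry))
    # Resample X from the fitted conditional law; each row is one resample.
    X_tilde = mu_x + sigma * rng.standard_normal((n_resamples, len(X)))
    t_null = np.abs(((X_tilde - mu_x) * ry).mean(axis=1))
    # Finite-sample-valid randomization p-value.
    p = (1 + np.sum(t_null >= t_obs)) / (n_resamples + 1)
    return t_obs, p
```

Both functions use the same product of residuals, which gives some intuition for why the two tests might behave similarly: in this sketch the GCM version calibrates the statistic with a normal approximation, while the dCRT version calibrates it by resampling the predictor from its fitted conditional distribution.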