Feature Screening for Ultrahigh Dimensional Categorical Data With Applications

Journal of Business & Economic Statistics · 2013
Cited by 74
Renmin University of China AABS 4

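Summary

For ultrahigh dimensional data in which both the response and the covariates are categorical, the paper proposes a feature screening method based on the Pearson chi-square statistic. The method can directly detect important interaction effects, and its effectiveness is verified through simulations and empirical analyses.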

Abstract

Ultrahigh dimensional data with both categorical responses and categorical covariates are frequently encountered in the analysis of big data, for which feature screening has become an indispensable statistical tool. We propose a Pearson chi-square based feature screening procedure for a categorical response with ultrahigh dimensional categorical covariates. The proposed procedure can be applied directly to detect important interaction effects. We further show that the proposed procedure possesses the screening consistency property in the terminology of Fan and Lv (2008). We investigate the finite sample performance of the proposed procedure by Monte Carlo simulation studies, and illustrate the proposed method with two empirical datasets.
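
The idea of marginal screening with a Pearson chi-square statistic can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `pearson_chi2` and `screen`, the toy data, and the choice to keep a fixed number `d` of covariates are all assumptions made for illustration, and the paper's own statistic may be normalized or thresholded differently.

```python
from collections import Counter
import random


def pearson_chi2(x, y):
    """Pearson chi-square statistic for two equal-length categorical sequences."""
    n = len(x)
    joint = Counter(zip(x, y))   # observed cell counts
    count_x = Counter(x)         # marginal counts of the covariate
    count_y = Counter(y)         # marginal counts of the response
    stat = 0.0
    for a, n_a in count_x.items():
        for b, n_b in count_y.items():
            expected = n_a * n_b / n          # count expected under independence
            observed = joint.get((a, b), 0)
            stat += (observed - expected) ** 2 / expected
    return stat


def screen(columns, y, d):
    """Rank covariates by their marginal statistic and keep the top d (hypothetical rule)."""
    scores = [pearson_chi2(col, y) for col in columns]
    ranked = sorted(range(len(columns)), key=lambda k: scores[k], reverse=True)
    return ranked[:d], scores


# Toy data (assumed): 200 observations, 50 three-level covariates, binary response.
rng = random.Random(0)
n, p = 200, 50
y = [rng.randint(0, 1) for _ in range(n)]
columns = [[rng.randint(0, 2) for _ in range(n)] for _ in range(p)]
# Make covariate 7 informative: it agrees with y about 90% of the time.
columns[7] = [yi if rng.random() < 0.9 else 1 - yi for yi in y]

selected, scores = screen(columns, y, d=5)
print(selected)
```

Under the same assumptions, one plausible way to screen an interaction is to treat a pair of covariates as a single categorical variable, for example `pearson_chi2(list(zip(columns[j], columns[k])), y)`. Covariates with more categories tend to produce larger raw statistics, so a practical version would likely need some adjustment before comparing covariates with different numbers of levels.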

ultrahigh dimensional categorical data · feature screening · Pearson chi-square · interaction effect detection