USING MODEL SELECTION ALGORITHMS TO OBTAIN RELIABLE COEFFICIENT ESTIMATES
综述了常见模型选择算法,通过蒙特卡洛实验说明它们在排除相关变量和保留无关变量之间的权衡,并指出没有一种算法在所有情况下最优。
Abstract This review surveys a number of common model selection algorithms (MSAs), discusses how they relate to each other and identifies factors that explain their relative performances. At the heart of MSA performance is the trade‐off between type I and type II errors. Some relevant variables will be mistakenly excluded, and some irrelevant variables will be retained by chance. A successful MSA will find the optimal trade‐off between the two types of errors for a given data environment. Whether a given MSA will be successful in a given environment depends on the relative costs of these two types of errors. We use Monte Carlo experimentation to illustrate these issues. We confirm that no MSA does best in all circumstances. Even the worst MSA in terms of overall performance – the strategy of including all candidate variables – sometimes performs best (viz., when all candidate variables are relevant). We also show how (1) the ratio of relevant to total candidate variables and (2) data‐generating process noise affect relative MSA performance. Finally, we discuss a number of issues complicating the task of MSAs in producing reliable coefficient estimates.