Density Estimation With Confidence Sets Exemplified by Superclusters and Voids in the Galaxies
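Summary: A method is proposed that yields both a point estimate and a confidence set for a semiparametric density. A nonparametric goodness-of-fit test based on the sample spacings determines the range of the smoothing parameter, and cross-validation selects the point estimate. Using galaxy velocity data as an example, the density estimates are shown to vary from smooth to rough, with the number of modes ranging from three to seven.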
Abstract

A method is presented for forming both a point estimate and a confidence set of semiparametric densities. The final product is a three-dimensional figure that displays a selection of density estimates for a plausible range of smoothing parameters. The boundaries of the smoothing parameter are determined by a nonparametric goodness-of-fit test that is based on the sample spacings. For each value of the smoothing parameter our estimator is selected by choosing the normal mixture that maximizes a function of the sample spacings. A point estimate is selected from this confidence set by using the method of cross-validation. An algorithm to find the mixing distribution that maximizes the spacings functional is presented. These methods are illustrated with a data set from the astronomy literature. The measurements are velocities at which galaxies in the Corona Borealis region are moving away from our galaxy. If the galaxies are clustered, the velocity density will be multimodal, with clusters corresponding to modes. Natural candidates for examining the distribution of the data are finite normal mixtures and histograms. The shortcomings of these methods become apparent from the analysis of these data. By finding a confidence set of densities, a set of estimates is obtained, ranging from smooth to rough; the number of modes ranges from three to seven. The confidence set of densities is further substantiated by performing nonparametric tests for the number of modes.

Key Words: Cross-validation; Normal mixtures; Spacings; Vertex exchange method
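
To make the idea of "a function of the sample spacings" concrete, the following is a minimal sketch of one common spacings criterion, the mean log of the scaled spacings of the fitted distribution function evaluated at the ordered sample. It is an illustration under stated assumptions, not the functional or the algorithm of the paper: the exact form used there may differ, the helper names (`normal_cdf`, `mixture_cdf`, `mean_log_spacings`) are hypothetical, the smoothing parameter is taken to be a common component standard deviation `h`, and the data are made-up numbers rather than the Corona Borealis velocities.

```python
import math


def normal_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def mixture_cdf(x, means, weights, h):
    """CDF of a normal mixture whose components share the scale h (assumed)."""
    return sum(w * normal_cdf((x - m) / h) for m, w in zip(means, weights))


def mean_log_spacings(data, means, weights, h):
    """Mean of log((n + 1) * D_i), where D_i are the spacings of the
    fitted CDF at the ordered data, padded with 0 and 1 at the ends.

    Larger values indicate that the fitted CDF spreads the data more
    evenly over (0, 1). The value is never positive, because the
    spacings sum to one. Returns -inf if any spacing is zero.
    """
    xs = sorted(data)
    u = [0.0] + [mixture_cdf(x, means, weights, h) for x in xs] + [1.0]
    n = len(xs)
    total = 0.0
    for lo, hi in zip(u, u[1:]):
        d = hi - lo
        if d <= 0.0:
            return float("-inf")
        total += math.log((n + 1) * d)
    return total / (n + 1)


# Made-up toy sample with three loose groups (NOT the galaxy data).
toy = [9.2, 9.5, 9.8, 19.5, 20.1, 20.8, 21.3, 22.0, 32.5, 33.1]

# A three-component candidate versus a single badly scaled component.
good = mean_log_spacings(toy, [9.5, 20.7, 32.8], [0.3, 0.5, 0.2], h=1.0)
bad = mean_log_spacings(toy, [20.0], [1.0], h=1.0)
print(good, bad)
```

In this sketch, a candidate mixture would be preferred at a given `h` when it attains a larger value of the criterion; searching over mixing distributions, which the abstract attributes to a dedicated algorithm, is not attempted here.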