Multivariate Stratified Sampling by Optimization
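Proposes a stratified sampling method based on optimal cluster analysis. Strata boundaries are determined with an integer optimization model and a subgradient algorithm; the method effectively reduces within-stratum variance and standard error and is suited to microeconomic simulation studies that require high-precision samples.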
An important, recurring problem in statistics is the determination of strata boundaries for use in stratified sampling. This paper describes a practical method for stratifying a population of observations based on optimal cluster analysis. The goal of stratification is to construct a partition such that observations within a stratum are homogeneous, as measured by the within-cluster variances of the attributes deemed important, while observations in different strata are heterogeneous. The problem is formulated as a deterministic optimization model with integer variables and is solved by means of a subgradient method. Computational tests on several examples show that the within-strata variances, and thus the accompanying standard errors, can be substantially reduced. Since the proposed model strives to minimize standard error, it is applicable to situations where a precise sample is essential, for example, microeconomic simulation studies.
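To make the objective concrete, the following is a minimal sketch of what "homogeneous within strata" means in terms of the within-strata sum of squares. It is not the integer optimization model or the subgradient method described above: as a stand-in it uses Lloyd's k-means heuristic with a deterministic start, and the toy data, the function names (`lloyd_stratify`, `within_ss`), and the choice of two strata are all assumptions made for illustration.

```python
from statistics import fmean


def sq_dist(a, b):
    """Squared Euclidean distance between two attribute vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points):
    """Attribute-wise mean of a non-empty list of points."""
    return tuple(fmean(col) for col in zip(*points))


def within_ss(points, labels, k):
    """Sum over strata of squared distances to the stratum centroid."""
    total = 0.0
    for h in range(k):
        members = [p for p, lab in zip(points, labels) if lab == h]
        if members:
            c = centroid(members)
            total += sum(sq_dist(p, c) for p in members)
    return total


def lloyd_stratify(points, k, max_iters=100):
    """Heuristic stratification (Lloyd's k-means), NOT the paper's method.

    Assumption for illustration: the first k points serve as initial centers.
    """
    centers = list(points[:k])
    labels = None
    for _ in range(max_iters):
        # Assign every observation to its nearest center.
        new_labels = [
            min(range(k), key=lambda h: sq_dist(p, centers[h])) for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
        # Recompute each center as the mean of its stratum.
        for h in range(k):
            members = [p for p, lab in zip(points, labels) if lab == h]
            if members:
                centers[h] = centroid(members)
    return labels


# Hypothetical population with two attributes per observation.
points = [(1.0, 2.0), (9.0, 8.0), (1.5, 1.8), (8.5, 9.0), (0.8, 2.2), (9.2, 8.4)]

labels = lloyd_stratify(points, k=2)
pooled = within_ss(points, [0] * len(points), 1)  # no stratification
stratified = within_ss(points, labels, 2)         # two strata

print("labels:", labels)
print("pooled SS:", pooled, "stratified SS:", stratified)
```

On this toy population the two strata separate the low-valued and high-valued observations, so the stratified sum of squares is smaller than the pooled one. A heuristic of this kind only finds a local optimum, which is one reason an exact optimization formulation such as the one proposed here is of interest.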