Predictive Subdata Selection for Computer Models

Journal of Computational and Graphical Statistics · 2022

Cited by 4

ABS 3

Ming-Chung Chang (corresponding author)

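Summary

For large-scale computer models, the paper proposes a subdata selection method based on expected improvement optimization. The method uses the geometry of the input feature region together with output value information to select a subdataset that improves prediction accuracy while reducing computational cost.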

Abstract

An explosion in the availability of rich data from technological advances is hindering efforts at statistical analysis due to constraints on time and memory storage, regardless of whether researchers employ simple methods (e.g., linear regression) or complex models (e.g., Gaussian processes). Recent approaches to overcoming these limits include information-based optimal subdata selection and Latin hypercube subagging. In the current study, we develop a novel subdata selection method for large-scale computer models based on expected improvement optimization. Numerical and empirical analyses using real-world data are used to select subdata from which to derive accurate predictions. During the optimization procedure, the proposed scheme employs the geometry of the input feature region as well as information related to output values. The data points associated with the largest improvement in prediction accuracy are combined in the construction of a subdataset that can be used to formulate predictions with affordable computing time. Supplementary materials for this article, including proofs of theorems and additional numerical results, are available online.
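
As a rough illustration of the "expected improvement" idea named in the abstract, the sketch below scores candidate points with the classical closed-form expected improvement used in Bayesian optimization (minimization under a Gaussian predictive distribution) and keeps the top-scoring ones. This is an assumption made for illustration only: the paper defines improvement in terms of prediction accuracy and also uses the geometry of the input region, and neither is reproduced here. The function names `expected_improvement` and `select_top_k` and the `(index, mu, sigma)` candidate layout are hypothetical.

```python
import math


def expected_improvement(mu, sigma, best):
    """Classical EI for minimization when the prediction is N(mu, sigma^2).

    NOTE: this is the textbook Bayesian-optimization criterion, used here as
    a stand-in; it is not the criterion defined in the paper.
    """
    if sigma <= 0.0:
        # No predictive uncertainty: improvement is deterministic.
        return max(best - mu, 0.0)
    z = (best - mu) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)  # standard normal density
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))         # standard normal CDF
    return (best - mu) * cdf + sigma * pdf


def select_top_k(candidates, best, k):
    """Return the indices of the k candidates with the largest EI.

    candidates: iterable of (index, mu, sigma) tuples (hypothetical layout).
    """
    scored = sorted(
        candidates,
        key=lambda c: expected_improvement(c[1], c[2], best),
        reverse=True,
    )
    return [c[0] for c in scored[:k]]
```

In a subdata-selection setting, `mu` and `sigma` would come from a surrogate model such as a Gaussian process fitted to the points selected so far, and the loop of fitting, scoring, and adding points would repeat until the subdataset reaches the desired size.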

Computer science · Machine learning · Data mining · Statistical modeling · Optimization algorithms
