A Tidy Framework and Infrastructure to Systematically Assemble Spatio-temporal Indexes from Multivariate Data

Journal of Computational and Graphical Statistics · 2024
Cited by 1
ABS 3

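Summary

The paper proposes a modular data pipeline framework for constructing and evaluating multivariate spatio-temporal indexes. The framework supports parameter tuning, step substitution, and uncertainty calculation, and its use is demonstrated through three case studies: drought indexes, the Global Gender Gap Index, and the Standardized Precipitation Index.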

Abstract

Indexes are useful for summarizing multivariate information into single metrics for monitoring, communicating, and decision-making. While most work has focused on defining new indexes for specific purposes, more attention needs to be directed towards making it possible to understand index behavior in different data conditions, and to determine how their structure affects their values and the variability therein. Here we discuss a modular data pipeline recommendation to assemble indexes. It is universally applicable to index computation and allows investigation of index behavior as part of the development procedure. One can compute indexes with different parameter choices, adjust steps in the index definition by adding, removing, and swapping them to experiment with various index designs, calculate uncertainty measures, and assess indexes’ robustness. The paper presents three examples to illustrate the usage of the pipeline framework: comparison of two different indexes designed to monitor the spatio-temporal distribution of drought in Queensland, Australia; the effect of dimension reduction choices on the Global Gender Gap Index (GGGI) on countries’ ranking; and how to calculate bootstrap confidence intervals for the Standardized Precipitation Index (SPI). The methods are supported by a new R package, called tidyindex. Supplemental materials for the article are available online.
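
To make the idea of a modular index pipeline concrete, the following is a minimal sketch in Python, not the tidyindex API (which is an R package and is not reproduced here). It assumes an index can be written as an ordered list of steps, shows one step being swapped for another, and computes a percentile bootstrap interval for the most recent index value. All function names, the rainfall values, and the use of a plain z-score in place of the distribution fit used by the real SPI are illustrative assumptions.

```python
import random
import statistics
from typing import Callable, List, Sequence, Tuple

# A step maps a series to a series; an index is an ordered list of steps.
Step = Callable[[List[float]], List[float]]


def rolling_sum(window: int) -> Step:
    """Temporal aggregation step: trailing sum over `window` periods."""
    def step(xs: List[float]) -> List[float]:
        return [sum(xs[i - window + 1:i + 1]) for i in range(window - 1, len(xs))]
    return step


def z_score(xs: List[float]) -> List[float]:
    """Normalising step (assumed stand-in for the SPI distribution fit)."""
    mu, sd = statistics.fmean(xs), statistics.pstdev(xs)
    return [(x - mu) / sd for x in xs]


def min_max(xs: List[float]) -> List[float]:
    """Alternative rescaling step that can be swapped in for z_score."""
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs]


def run_pipeline(xs: Sequence[float], steps: Sequence[Step]) -> List[float]:
    """Apply each step in order to the input series."""
    out = list(xs)
    for step in steps:
        out = step(out)
    return out


def bootstrap_ci(xs: Sequence[float], n_boot: int = 500, level: float = 0.9,
                 seed: int = 0) -> Tuple[float, float]:
    """Percentile bootstrap interval for the standardised last value.

    The reference sample is resampled with replacement, the mean and
    standard deviation are re-estimated, and the last observation is
    standardised against each resample.
    """
    rng = random.Random(seed)
    data, target = list(xs), xs[-1]
    estimates = []
    for _ in range(n_boot):
        sample = rng.choices(data, k=len(data))
        sd = statistics.pstdev(sample)
        if sd > 0:  # skip degenerate resamples with no spread
            estimates.append((target - statistics.fmean(sample)) / sd)
    estimates.sort()
    alpha = (1 - level) / 2
    lo_idx = int(alpha * len(estimates))
    hi_idx = max(lo_idx, int((1 - alpha) * len(estimates)) - 1)
    return estimates[lo_idx], estimates[hi_idx]


# Hypothetical monthly rainfall totals (mm); not data from the paper.
rain = [12.0, 30.5, 8.2, 0.0, 45.1, 60.3, 22.4, 5.0, 3.1, 18.7, 40.2, 55.6,
        10.0, 28.3, 9.9, 1.2, 50.0, 62.1, 20.0, 4.4, 2.0, 15.5, 38.8, 51.0]

aggregated = run_pipeline(rain, [rolling_sum(3)])
index_a = run_pipeline(rain, [rolling_sum(3), z_score])   # one index design
index_b = run_pipeline(rain, [rolling_sum(3), min_max])   # same design, one step swapped
ci = bootstrap_ci(aggregated)
```

Because each step has the same series-in, series-out shape, changing the window length or replacing `z_score` with `min_max` only changes the list passed to `run_pipeline`, which is the kind of experimentation the abstract describes.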

Multivariate statistics · Data science · Econometrics · Machine learning · Spatio-temporal analysis