A Cost-Effective Data Sampling Strategy by Unifying Online Data With Offline Data

IEEE Transactions on Engineering Management · 2025
Cited by 0
ABS 3

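Summary

The paper proposes the DFO<sup>2</sup> model, which dynamically fuses low-cost online data with costly but accurate offline data. Sampling decisions are optimized with a nonlinear contextual bandit method, and experiments on healthcare and power systems show that the approach balances cost against accuracy.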

Abstract

Data collected from different channels that describe the condition of the same object may differ in reliability and sampling cost. For example, online sensor tracking data for an object may be less reliable but much cheaper than offline field inspection data. This creates an opportunity to fuse multi-channel data so that the condition of an object can be monitored accurately at low cost. In this paper, we formulate a model for Dynamically Fusing easily obtained Online data with costly Offline sampled data (abbreviated as DFO<sup>2</sup>). In DFO<sup>2</sup>, we consider a dynamic decision process in which a decision-maker observes online data and then decides whether to acquire a new piece of offline data. Offline data is costly to acquire, but it is accurate and yields a reward by correcting errors in the online data. A nonlinear contextual bandit method is then proposed to estimate the expected reward of offline sampling decisions, and an offline sampling policy is obtained by maximizing the expected reward. Theoretical analysis indicates that DFO<sup>2</sup> achieves a sub-linear regret bound, which means that the time-averaged reward of DFO<sup>2</sup> asymptotically approaches that of the optimal policy. To demonstrate the wide applicability of DFO<sup>2</sup>, experiments are performed in two distinct domains: healthcare and power systems. Results show that DFO<sup>2</sup> trades off sampling cost against information accuracy better than the benchmarks. Comparative experiments under different conditions also demonstrate the robustness of the method. Overall, this paper provides a practical framework for unifying multi-channel data to realize cost-effective monitoring.
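
The decision loop described in the abstract (observe online data, decide whether to buy an offline sample, learn from the observed reward) can be illustrated with a toy sketch. This is not the paper's DFO<sup>2</sup> algorithm: a piecewise-constant (binned) reward estimate with an upper-confidence bonus stands in for the nonlinear contextual bandit, and the class name `BinnedUCBSampler`, the scalar context in [0, 1], the cost of 0.5, and the simulated quadratic error model are all hypothetical choices made only for illustration.

```python
import math
import random


class BinnedUCBSampler:
    """Toy policy: buy an offline sample when the optimistic reward estimate is positive.

    Assumption (not from the paper): the context is one number in [0, 1],
    and the reward of sampling is (error corrected) - (sampling cost).
    """

    def __init__(self, n_bins=10, cost=0.5, bonus=1.0):
        self.n_bins = n_bins
        self.cost = cost
        self.bonus = bonus
        self.counts = [0] * n_bins   # offline samples bought per context bin
        self.sums = [0.0] * n_bins   # summed net reward per context bin
        self.t = 0                   # number of decisions made so far

    def _bin(self, context):
        # Map a context in [0, 1] to a bin index; binning makes the
        # reward estimate nonlinear (piecewise constant) in the context.
        return min(int(context * self.n_bins), self.n_bins - 1)

    def decide(self, context):
        """Return True to acquire offline data for this context."""
        self.t += 1
        b = self._bin(context)
        if self.counts[b] == 0:
            return True  # nothing known about this bin yet: explore
        mean = self.sums[b] / self.counts[b]
        width = self.bonus * math.sqrt(math.log(self.t + 1) / self.counts[b])
        # Skipping has reward 0, so sample only if the upper bound beats 0.
        return mean + width > 0.0

    def update(self, context, corrected_error):
        """Record the net reward observed after buying an offline sample."""
        b = self._bin(context)
        self.counts[b] += 1
        self.sums[b] += corrected_error - self.cost


def simulate(steps=5000, seed=0):
    """Run the toy policy on synthetic data and report sampling rates."""
    rng = random.Random(seed)
    policy = BinnedUCBSampler()
    seen = {"low": 0, "high": 0}
    bought = {"low": 0, "high": 0}
    for _ in range(steps):
        context = rng.random()
        # Hypothetical error model: online error grows quadratically
        # with the context, so offline checks pay off only when it is high.
        error = abs(rng.gauss(0.0, 2.0 * context ** 2))
        sampled = policy.decide(context)
        if sampled:
            policy.update(context, error)  # reward is seen only when sampling
        group = "low" if context < 0.3 else "high" if context > 0.8 else None
        if group is not None:
            seen[group] += 1
            bought[group] += int(sampled)
    return {g: bought[g] / seen[g] for g in seen}
```

Calling `simulate()` returns the fraction of rounds in which offline data was bought for low-context and high-context observations. Under the assumed error model, the policy should learn to spend the sampling budget mostly where the online data is least reliable, which is the cost-versus-accuracy trade-off the abstract describes.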

Data fusion · Online learning · Decision optimization · Cost-benefit analysis