CatBoost-enhanced EWMA chart: monitoring high-dimensional categorical data streams
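Using historical out-of-control data, an EWMA control chart is built on a CatBoost model to efficiently monitor high-dimensional categorical data streams; simulations show that it outperforms existing methods, and a gene sequence case study confirms its practical value.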
With the rapid development of modern sensor technology, high-dimensional categorical data streams arise frequently, posing a great challenge to traditional statistical process control (SPC) tools. In this study, by making full use of the information provided by historical out-of-control (OC) data, we construct a Phase II EWMA control scheme based on the in-control (IC) state probabilities estimated by gradient boosting with categorical features support (CatBoost). Comprehensive simulations examine the performance of the proposed control chart under various scenarios relative to several existing multivariate control charts. The simulation results indicate that the proposed control chart is more efficient than its competitors across numerous categorical data settings. In addition, we illustrate the practicality and efficacy of the proposed control chart through a case study involving gene sequences.
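To make the idea of an EWMA chart over classifier probabilities concrete, the following is a minimal sketch and not the charting statistic defined in the paper. It assumes that each Phase II observation has already been scored by a fitted classifier (for example a CatBoost model trained on historical IC and OC data) to give an estimated IC probability, and that a one-sided lower control limit is appropriate because low IC probabilities point to an OC state. The function name `ewma_chart`, the parameters `lam`, `z0`, and `lower_limit`, and all numeric values are hypothetical choices made for illustration.

```python
from typing import Iterable, List, Optional, Tuple


def ewma_chart(
    ic_probs: Iterable[float],
    lam: float = 0.1,
    z0: float = 1.0,
    lower_limit: float = 0.5,
) -> Tuple[List[float], Optional[int]]:
    """Run a one-sided (lower) EWMA chart over a stream of IC probabilities.

    ic_probs    : estimated probability that each observation is in control
                  (assumed to come from a fitted classifier, not shown here)
    lam         : EWMA smoothing weight in (0, 1]
    z0          : starting value of the EWMA statistic (assumed IC level)
    lower_limit : illustrative control limit; in practice it would be
                  calibrated to a target in-control run length

    Returns the EWMA path and the 1-based index of the first signal,
    or None if the chart never signals.
    """
    if not 0.0 < lam <= 1.0:
        raise ValueError("lam must lie in (0, 1]")

    z = z0
    path: List[float] = []
    first_signal: Optional[int] = None
    for t, p in enumerate(ic_probs, start=1):
        # Standard EWMA recursion: z_t = lam * p_t + (1 - lam) * z_{t-1}
        z = lam * p + (1.0 - lam) * z
        path.append(z)
        # Signal the first time the smoothed IC probability drops below the limit
        if first_signal is None and z < lower_limit:
            first_signal = t
    return path, first_signal


if __name__ == "__main__":
    # Made-up stream: five IC-looking scores followed by ten OC-looking scores
    stream = [0.9] * 5 + [0.2] * 10
    path, signal_at = ewma_chart(stream, lam=0.2, z0=0.9, lower_limit=0.6)
    print("first signal at observation:", signal_at)
```

A smaller `lam` gives more weight to past observations and tends to detect small persistent shifts sooner, while a larger `lam` reacts faster to abrupt shifts; the control limit would normally be set by simulation rather than fixed by hand as it is here.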