Varying Naïve Bayes Models With Applications to Classification of Chinese Text Documents

Journal of Business & Economic Statistics · 2014
Cited by 7
Renmin University of China (RUC) AABS: 4

Summary

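Traditional classification methods cannot handle problems with time-varying structures. To address this, the paper proposes a varying naïve Bayes model that allows the classification rule to change over time. Parameters are estimated by kernel smoothing, features are selected with a BIC-type criterion, and the method's effectiveness is verified on a dataset from the Changchun Mayor Public Hotline.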

Abstract

Document classification is an area of great importance for which many classification methods have been developed. However, most of these methods cannot generate time-dependent classification rules. Thus, they are not the best choices for problems with time-varying structures. To address this problem, we propose a varying naïve Bayes model, which is a natural extension of the naïve Bayes model that allows for a time-dependent classification rule. The method of kernel smoothing is developed for parameter estimation and a BIC-type criterion is invented for feature selection. Asymptotic theory is developed and numerical studies are conducted. Finally, the proposed method is demonstrated on a real dataset, which was generated by the Mayor Public Hotline of Changchun, the capital city of Jilin Province in Northeast China.
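To make the idea of a time-dependent classification rule concrete, the following is a minimal sketch of a kernel-smoothed naïve Bayes classifier. It is an illustration, not the authors' estimator: it assumes binary word-presence features, a Gaussian kernel, a fixed bandwidth `h`, and a small smoothing constant `eps`, and it omits the BIC-type feature selection entirely. All function names are hypothetical.

```python
import math

def kernel_weights(times, u, h):
    """Gaussian kernel weights centered at time u with bandwidth h (assumed kernel)."""
    return [math.exp(-0.5 * ((t - u) / h) ** 2) for t in times]

def fit_at(u, X, y, times, n_classes, h=0.1, eps=1e-3):
    """Estimate class log-priors and word probabilities locally at time u."""
    w = kernel_weights(times, u, h)
    p = len(X[0])
    total = sum(w)
    log_prior, theta = [], []
    for k in range(n_classes):
        # Keep only the kernel weight of documents that belong to class k.
        wk = [wi if yi == k else 0.0 for wi, yi in zip(w, y)]
        sk = sum(wk)
        log_prior.append(math.log((sk + eps) / (total + n_classes * eps)))
        # Weighted share of class-k documents containing word j;
        # eps keeps every probability strictly inside (0, 1).
        theta.append([
            (sum(wi * row[j] for wi, row in zip(wk, X)) + eps) / (sk + 2 * eps)
            for j in range(p)
        ])
    return log_prior, theta

def predict_at(u, x_new, X, y, times, n_classes, h=0.1):
    """Classify a binary feature vector x_new observed at time u."""
    log_prior, theta = fit_at(u, X, y, times, n_classes, h)
    scores = []
    for k in range(n_classes):
        s = log_prior[k]
        for xj, th in zip(x_new, theta[k]):
            s += math.log(th) if xj else math.log(1.0 - th)
        scores.append(s)
    return max(range(n_classes), key=scores.__getitem__)

# Toy data (synthetic): one keyword whose meaning flips halfway through the period.
times = [i / 199 for i in range(200)]
y = [i % 2 for i in range(200)]
X = [[yi if t < 0.5 else 1 - yi] for t, yi in zip(times, y)]

early = predict_at(0.1, [1], X, y, times, n_classes=2)
late = predict_at(0.9, [1], X, y, times, n_classes=2)
```

On this toy data the same keyword points to class 1 early in the period and to class 0 late in it, which a single time-invariant naïve Bayes fit could not represent. In practice the bandwidth would need to be chosen from the data rather than fixed.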

Keywords: time-varying; naïve Bayes model; Chinese text classification; kernel smoothing; feature selection