When Costs Are Unequal and Unknown: A Subtree Grafting Approach for Unbalanced Data Classification
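To address the high misclassification rate of the minority class caused by unbalanced data in binary classification, this paper proposes a subtree grafting (STG) method that balances the accuracy of the two classes when misclassification costs are unknown, and validates its effectiveness on a bank data set and other data sets.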
In binary classification, a decision tree learned from unbalanced data typically misclassifies a high proportion of minority-class cases. Assigning different misclassification costs to the two classes can address this problem, but usually at the expense of accuracy on the majority class. This trade-off can be particularly hazardous when the costs cannot be specified precisely. When the costs are unknown or difficult to determine, decision makers may prefer a classifier with more balanced accuracy across both classes over a standard or cost-sensitively learned one. For tree learning, this research therefore proposes a new tree induction approach called subtree grafting (STG). Testing STG on a real bank data set and several other data sets, we find that it provides a successful compromise between standard and cost-sensitive trees.
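The trade-off described above can be illustrated with a small sketch. It does not implement STG, whose details the abstract does not give. Instead, it assumes a classifier that outputs a score for the minority class and applies the standard decision-theoretic cost-sensitive rule: predict the minority class when `p * c_fn > (1 - p) * c_fp`, i.e. when `p > 1 / (1 + cost_ratio)`. It then reports per-class accuracy and balanced accuracy (the mean of the two). The helper names (`cost_sensitive_threshold`, `predict`, `class_accuracies`, `balanced_accuracy`) and the toy labels and scores are hypothetical, chosen only for illustration.

```python
from typing import List, Sequence, Tuple


def cost_sensitive_threshold(cost_ratio: float) -> float:
    """Threshold on the minority-class score when a missed minority case
    costs `cost_ratio` times as much as a false alarm (c_fn / c_fp)."""
    # Predict minority when p * c_fn > (1 - p) * c_fp  <=>  p > 1 / (1 + c_fn / c_fp).
    return 1.0 / (1.0 + cost_ratio)


def predict(scores: Sequence[float], cost_ratio: float) -> List[int]:
    """Label 1 (minority) if the score exceeds the cost-sensitive threshold."""
    t = cost_sensitive_threshold(cost_ratio)
    return [1 if p > t else 0 for p in scores]


def class_accuracies(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (majority-class accuracy, minority-class accuracy); class 1 is the minority."""
    maj = [yp == 0 for yt, yp in zip(y_true, y_pred) if yt == 0]
    mino = [yp == 1 for yt, yp in zip(y_true, y_pred) if yt == 1]
    return sum(maj) / len(maj), sum(mino) / len(mino)


def balanced_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Unweighted mean of the two per-class accuracies."""
    maj_acc, min_acc = class_accuracies(y_true, y_pred)
    return (maj_acc + min_acc) / 2


# Hypothetical toy data: 8 majority (0) and 2 minority (1) cases with made-up scores.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
scores = [0.05, 0.12, 0.18, 0.22, 0.28, 0.33, 0.42, 0.62, 0.38, 0.71]

if __name__ == "__main__":
    for ratio in (1.0, 2.0, 4.0):
        y_pred = predict(scores, ratio)
        maj_acc, min_acc = class_accuracies(y_true, y_pred)
        print(f"cost ratio {ratio}: majority {maj_acc:.3f}, minority {min_acc:.3f}, "
              f"balanced {balanced_accuracy(y_true, y_pred):.3f}")
```

On this toy data, equal costs (ratio 1) give 0.875 accuracy on the majority class but only 0.5 on the minority class. A ratio of 4 reverses this (0.375 versus 1.0). An intermediate ratio of 2 yields the most balanced result. Picking that intermediate value, however, requires knowing the costs, which is exactly the information the abstract assumes is unavailable.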