When data are high-dimensional and imbalanced, the resulting classifiers tend to overfit and to sacrifice accuracy on the minority class. To reduce classifier complexity and improve the minority-class recognition rate, a cost-sensitive random forest algorithm is proposed. Built on the random forest framework, the algorithm uses Bagging to balance the data and introduces two costs, a misclassification cost and a test cost, into both the attribute-splitting metric and the evaluation function of the base classifiers. The test cost is determined by the correlation between the splitting attribute and the minority class, so that the base decision trees lean toward the minority class during model construction. Compared with the standard random forest and with a random forest that introduces only the misclassification cost, the random forest with both costs achieves higher classification accuracy and has a marked advantage in recognizing the minority class.
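The abstract does not give the formulas for the splitting metric or the test cost, so the following is only a minimal sketch under assumed definitions, not the authors' method. All function names are hypothetical. The sketch assumes binary features, a test cost of one minus the absolute Pearson correlation between the attribute and the minority-class indicator, and a split score equal to the reduction in misclassification cost divided by one plus the test cost.

```python
import math
import random
from collections import Counter


def balanced_bootstrap(labels, minority, rng):
    """Draw a bootstrap sample with as many majority as minority indices."""
    minority_idx = [i for i, y in enumerate(labels) if y == minority]
    majority_idx = [i for i, y in enumerate(labels) if y != minority]
    n = len(minority_idx)
    return ([rng.choice(minority_idx) for _ in range(n)]
            + [rng.choice(majority_idx) for _ in range(n)])


def misclassification_cost(labels, cost):
    """Lowest total cost of assigning one label to every sample in a node.

    cost[t][p] is the cost of predicting p when the true class is t.
    """
    if not labels:
        return 0.0
    counts = Counter(labels)
    return min(sum(counts[t] * cost[t][p] for t in counts) for p in cost)


def attribute_test_cost(feature, labels, minority):
    """Assumed test cost: 1 - |Pearson correlation| with the minority indicator."""
    indicator = [1.0 if y == minority else 0.0 for y in labels]
    n = len(labels)
    mean_x = sum(feature) / n
    mean_m = sum(indicator) / n
    cov = sum((x - mean_x) * (m - mean_m) for x, m in zip(feature, indicator))
    var_x = sum((x - mean_x) ** 2 for x in feature)
    var_m = sum((m - mean_m) ** 2 for m in indicator)
    if var_x == 0 or var_m == 0:
        return 1.0  # a constant attribute carries no information: maximal cost
    return 1.0 - abs(cov) / math.sqrt(var_x * var_m)


def split_score(feature, labels, minority, cost):
    """Assumed metric: misclassification-cost reduction, discounted by test cost."""
    left = [y for x, y in zip(feature, labels) if x == 0]
    right = [y for x, y in zip(feature, labels) if x == 1]
    reduction = (misclassification_cost(labels, cost)
                 - misclassification_cost(left, cost)
                 - misclassification_cost(right, cost))
    return reduction / (1.0 + attribute_test_cost(feature, labels, minority))


# Toy data: class 1 is the minority; missing it costs 5, a false alarm costs 1.
labels = [1, 1, 0, 0, 0, 0, 0, 0]
cost = {0: {0: 0, 1: 1}, 1: {0: 5, 1: 0}}
informative = [1, 1, 0, 0, 0, 0, 0, 0]    # matches the minority class exactly
uninformative = [1, 0, 1, 0, 1, 0, 1, 0]  # uncorrelated with the minority class

print(split_score(informative, labels, 1, cost))
print(split_score(uninformative, labels, 1, cost))
print(balanced_bootstrap(labels, 1, random.Random(0)))
```

Under these assumptions, an attribute that tracks the minority class is cheap to test and scores higher, which is one way a base tree could be made to lean toward the minority class. A full forest would apply `split_score` to a random subset of attributes at each node of a tree grown on one `balanced_bootstrap` sample.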