In recent years, class imbalance has become a research hotspot in artificial intelligence, machine learning, and data mining, and many practical and effective methods have been proposed. Recent studies show, however, that class imbalance is not harmful in every classification task. On tasks where it is harmless, applying a class imbalance learning algorithm rarely improves classification performance and may even degrade it, while potentially increasing training time substantially. To address this problem, a harmfulness pre-evaluation strategy is proposed. The strategy uses leave-one-out cross-validation (LOOCV) to test classification performance on the training set and, from the result, computes a new index called the harmfulness measure (HM), which quantifies the degree of harm and thereby guides the choice of learning algorithm. Experiments on 8 class-imbalanced data sets show that the strategy is effective and feasible.
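
The abstract does not state the base classifier or the formula for HM, so the following is only a minimal sketch of the LOOCV step under stated assumptions. It uses a plain k-nearest-neighbour classifier as a stand-in learner, and `recall_gap` is a hypothetical proxy for harmfulness (the shortfall of minority-class recall relative to majority-class recall), not the paper's HM. The names `loocv_class_recalls` and `recall_gap` are illustrative.

```python
from collections import Counter


def loocv_class_recalls(X, y, k=1):
    """Leave-one-out CV on the training set with a simple k-NN classifier.

    k-NN is an assumed stand-in; the abstract does not name the learner.
    Returns a dict mapping each class label to its LOOCV recall.
    """
    hits = Counter()
    totals = Counter(y)
    for i, (xi, yi) in enumerate(zip(X, y)):
        # Squared Euclidean distance from the held-out sample to all others.
        neighbours = sorted(
            (sum((a - b) ** 2 for a, b in zip(xi, xj)), yj)
            for j, (xj, yj) in enumerate(zip(X, y))
            if j != i
        )
        votes = Counter(label for _, label in neighbours[:k])
        predicted = votes.most_common(1)[0][0]
        if predicted == yi:
            hits[yi] += 1
    return {label: hits[label] / totals[label] for label in totals}


def recall_gap(recalls, minority, majority):
    """Hypothetical harmfulness proxy (NOT the paper's HM).

    Measures how much worse the minority class is recognised than the
    majority class, clipped at zero.
    """
    return max(0.0, recalls[majority] - recalls[minority])


# Toy data: 6 majority samples (label 0) and 2 minority samples (label 1).
separable_X = [(0.0,), (0.1,), (0.2,), (0.3,), (0.4,), (0.5,), (5.0,), (5.1,)]
overlapping_X = [(0.0,), (0.1,), (0.2,), (0.3,), (0.4,), (0.5,), (0.15,), (0.35,)]
labels = [0, 0, 0, 0, 0, 0, 1, 1]

separable_recalls = loocv_class_recalls(separable_X, labels)
overlapping_recalls = loocv_class_recalls(overlapping_X, labels)
```

In this sketch, a gap near zero would suggest that the imbalance does little harm and a standard learner may suffice, while a large gap would suggest that a class imbalance learning algorithm is worth its extra training cost. Any decision threshold would have to come from the paper itself.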