Discretization of continuous attributes plays an important role in data mining, machine learning and artificial intelligence. To this end, a heuristic discretization technique based on class-attribute relevance is proposed. The technique defines a new discretization criterion that selects the best cut points according to the characteristics of the data itself, overcoming shortcomings of current state-of-the-art top-down discretization methods. Building on the variable precision rough set model from rough set theory, a new inconsistency measure is proposed that effectively controls the information loss caused by discretization while allowing the data an appropriate degree of classification error. Experimental results and statistical analysis show that the proposed technique significantly improves the learning accuracy of the J4.8 decision tree and SVM classifiers.
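The abstract does not spell out the new criterion or the exact form of the inconsistency measure, so the sketch below is only an illustration of the general scheme under stated assumptions. It stands in the published CAIM (class-attribute interdependence maximization) criterion for the paper's class-attribute relevance criterion, and uses a simple variable-precision test for inconsistency: an interval counts as consistent when its majority class covers at least a fraction `beta` of its objects, so `1 - beta` is the tolerated misclassification rate. Top-down splitting adds the best-scoring cut until the inconsistency falls within a tolerance. The function names and the `beta` and `max_inconsistency` parameters are hypothetical.

```python
import bisect
from collections import Counter
from typing import List, Sequence


def interval_ids(values: Sequence[float], cuts: List[float]) -> List[int]:
    # Interval index of each value; cuts are sorted ascending.
    return [bisect.bisect_right(cuts, v) for v in values]


def class_counts(values, labels, cuts) -> List[Counter]:
    # One Counter of class labels per interval (rows of the quanta matrix).
    counts = [Counter() for _ in range(len(cuts) + 1)]
    for r, y in zip(interval_ids(values, cuts), labels):
        counts[r][y] += 1
    return counts


def caim(values, labels, cuts) -> float:
    # CAIM stand-in: mean over non-empty intervals of max_r^2 / M_r, where
    # max_r is the largest class count in interval r and M_r is its size.
    rows = [c for c in class_counts(values, labels, cuts) if c]
    return sum(max(c.values()) ** 2 / sum(c.values()) for c in rows) / len(rows)


def beta_inconsistency(values, labels, cuts, beta: float) -> float:
    # Variable-precision inconsistency (assumed form): fraction of objects in
    # intervals whose majority-class proportion is below beta.
    bad = 0
    for c in class_counts(values, labels, cuts):
        size = sum(c.values())
        if size and max(c.values()) / size < beta:
            bad += size
    return bad / len(values)


def discretize(values, labels, beta: float = 0.8,
               max_inconsistency: float = 0.0) -> List[float]:
    # Top-down greedy splitting: candidate cuts are midpoints between adjacent
    # distinct values; keep adding the cut that maximizes the criterion until
    # the beta-inconsistency is within tolerance or no candidates remain.
    distinct = sorted(set(values))
    candidates = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    cuts: List[float] = []
    while candidates and beta_inconsistency(values, labels, cuts, beta) > max_inconsistency:
        best = max(candidates, key=lambda t: caim(values, labels, sorted(cuts + [t])))
        candidates.remove(best)
        cuts = sorted(cuts + [best])
    return cuts


if __name__ == "__main__":
    xs = [1.0, 1.5, 2.0, 2.5, 6.0, 6.5, 7.0, 7.5]
    ys = ["a"] * 4 + ["b"] * 4
    print(discretize(xs, ys))  # [4.25]
```

Lowering `beta` below 1 lets an interval keep a few minority-class objects instead of splitting further, which is the trade-off the abstract describes between information loss and tolerated classification error. With `beta=1.0` the loop only stops once every interval is pure or candidates run out.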