The traditional K-means clustering algorithm handles only numerical data. This paper extends K-means to the categorical data domain and proposes a clustering method for categorical data. A new distance measure is defined that uses the data distribution information associated with each value of each categorical attribute and combines the vertical and horizontal distributions of the data to evaluate the dissimilarity between a data object and a cluster. The method can uncover the intrinsic relationships between different values of the same attribute and effectively measures the dissimilarity between objects. The proposed algorithm is validated on data sets from the UCI repository, and the experimental results show that it achieves good clustering performance.
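The abstract does not give the formula for the new distance measure, so the following is only a minimal sketch of the general idea of running a K-means-style loop on categorical data. It swaps in a simple frequency-based dissimilarity (one minus the relative frequency of the object's value within the cluster, summed over attributes), which is an assumption and not the paper's combined vertical/horizontal measure. The function names `cluster_value_freqs`, `dissimilarity`, and `categorical_kmeans`, and the choice to initialise from the first k distinct objects, are hypothetical.

```python
from collections import Counter


def cluster_value_freqs(rows):
    """Per-attribute relative frequency of each categorical value in one cluster."""
    n = len(rows)
    n_attrs = len(rows[0])
    return [
        {v: c / n for v, c in Counter(r[j] for r in rows).items()}
        for j in range(n_attrs)
    ]


def dissimilarity(obj, freqs):
    """Assumed stand-in measure: sum over attributes of
    (1 - relative frequency of the object's value within the cluster)."""
    return sum(1.0 - f.get(v, 0.0) for v, f in zip(obj, freqs))


def categorical_kmeans(data, k, max_iter=20):
    """K-means-style loop where each cluster is summarised by value frequencies
    instead of a numeric mean. `data` is a list of tuples of categorical values."""
    # Deterministic start (an assumption): the first k distinct objects
    # act as singleton clusters.
    seeds = list(dict.fromkeys(data))[:k]
    if len(seeds) < k:
        raise ValueError("need at least k distinct objects")
    freqs = [cluster_value_freqs([s]) for s in seeds]

    labels = None
    for _ in range(max_iter):
        # Assignment step: each object goes to the least dissimilar cluster.
        new_labels = [
            min(range(k), key=lambda c: dissimilarity(x, freqs[c]))
            for x in data
        ]
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
        # Update step: recompute the frequency summary of each cluster.
        for c in range(k):
            members = [x for x, lab in zip(data, labels) if lab == c]
            if members:  # keep the old summary if a cluster empties out
                freqs[c] = cluster_value_freqs(members)
    return labels
```

Called as `categorical_kmeans(data, 2)` on a list of tuples such as `("red", "small", "round")`, the function returns one cluster index per object. Summarising a cluster by its value frequencies rather than a single mode lets every value seen in the cluster contribute to the dissimilarity, which is in the spirit of using per-value distribution information, though the measure defined in the paper may differ substantially.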