To address two shortcomings of the k-means algorithm, namely that the number of clusters must be known in advance and that the initial cluster centers are difficult to choose, an improved k-means clustering algorithm is proposed. First, the silhouette coefficient is introduced. The silhouette coefficient of every object in the clusters obtained for different values of K is computed, and this determines the optimal number of clusters, Kopt, for a data set whose class information is not known in advance. Next, agglomerative hierarchical clustering is used to obtain the distribution of the data set and thereby determine the initial cluster centers. Finally, the traditional k-means method completes the clustering. Theoretical analysis shows that the proposed algorithm has moderate computational complexity. Experimental results on the IRIS test data set show that the algorithm reasonably distinguishes different types of clusters, effectively identifies outliers, and produces result clusters with low entropy.
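The abstract names the three stages (silhouette-based choice of Kopt, agglomerative hierarchical clustering for the initial centers, then ordinary k-means) without giving their details. The following is a minimal pure-Python sketch of one way to chain them. It is not the authors' implementation, and it rests on assumptions the abstract does not state. It uses Euclidean distance and average linkage. Each candidate K is scored by the mean silhouette s(i) = (b(i) - a(i)) / max(a(i), b(i)) of the hierarchy cut at K clusters, with K ranging from 2 to a hypothetical `k_max` parameter. Entropy is measured as the size-weighted class entropy of each cluster. All function names are hypothetical. Outlier detection is not sketched because the abstract does not say how it is done.

```python
import math
from collections import Counter

def euclidean(p, q):
    """Euclidean distance between two equal-length points (assumed metric)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def centroid(points, idx):
    """Coordinate-wise mean of the points whose indices are in idx."""
    return [sum(points[i][d] for i in idx) / len(idx) for d in range(len(points[0]))]

def to_labels(n, clusters):
    """Turn a list of index lists into one cluster label per point."""
    return [next(c for c, members in enumerate(clusters) if i in members) for i in range(n)]

def silhouette(points, labels):
    """Mean silhouette s(i) = (b - a) / max(a, b) over all points; singletons score 0."""
    clusters = {}
    for i, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(i)
    total = 0.0
    for i, lab in enumerate(labels):
        own = clusters[lab]
        if len(own) == 1:  # singleton clusters contribute s(i) = 0
            continue
        a = sum(euclidean(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
        b = min(sum(euclidean(points[i], points[j]) for j in members) / len(members)
                for other, members in clusters.items() if other != lab)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)

def average_linkage_partitions(points, k_max):
    """Bottom-up clustering (average linkage); returns {k: clusters} for k = 2..k_max."""
    dist = [[euclidean(p, q) for q in points] for p in points]
    clusters = [[i] for i in range(len(points))]
    partitions = {}
    while len(clusters) > 1:
        if len(clusters) <= k_max:
            partitions[len(clusters)] = [list(c) for c in clusters]
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = sum(dist[i][j] for i in clusters[x] for j in clusters[y])
                d /= len(clusters[x]) * len(clusters[y])  # mean pairwise distance
                if best is None or d < best[0]:
                    best = (d, x, y)
        _, x, y = best
        clusters[x] = clusters[x] + clusters[y]  # merge the closest pair
        del clusters[y]  # y > x, so index x stays valid
    return partitions

def kmeans(points, centers, max_iter=100):
    """Plain Lloyd iterations starting from the given initial centers."""
    centers = [list(c) for c in centers]
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(len(centers)), key=lambda c: euclidean(p, centers[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        for c in range(len(centers)):
            members = [i for i, lab in enumerate(labels) if lab == c]
            if members:  # keep the old center if a cluster empties out
                centers[c] = centroid(points, members)
    return labels, centers

def improved_kmeans(points, k_max=6):
    """Pick Kopt by silhouette, then seed k-means with the hierarchical-cluster centroids."""
    partitions = average_linkage_partitions(points, k_max)
    scores = {k: silhouette(points, to_labels(len(points), cl)) for k, cl in partitions.items()}
    k_opt = max(scores, key=scores.get)
    init = [centroid(points, members) for members in partitions[k_opt]]
    labels, centers = kmeans(points, init)
    return k_opt, labels, centers

def weighted_entropy(pred, truth):
    """Size-weighted mean entropy (bits) of the true classes in each cluster; 0 means pure."""
    total = 0.0
    for c in set(pred):
        counts = Counter(t for p, t in zip(pred, truth) if p == c)
        size = sum(counts.values())
        h = -sum(m / size * math.log2(m / size) for m in counts.values())
        total += size / len(pred) * h
    return total

if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11),
            (11, 10), (11, 11), (20, 0), (20, 1), (21, 0), (21, 1)]
    k_opt, labels, _ = improved_kmeans(data, k_max=5)
    print(k_opt, weighted_entropy(labels, [0] * 4 + [1] * 4 + [2] * 4))  # 3 0.0
```

On the three well-separated toy blobs, the silhouette peaks at K = 3 and the blob centroids become the initial centers, so k-means converges after one reassignment pass and every cluster is pure. The naive linkage loop costs O(n³). That is tolerable for a data set the size of IRIS (150 objects), but larger inputs would call for a priority-queue or library implementation of hierarchical clustering.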