Density-based clustering is one family of clustering methods. Its main advantages are that it can discover clusters of arbitrary shape and that it is insensitive to noisy data. This paper proposes a new clustering algorithm, CGDSPT (Clustering based on Grid-Density and Spatial Partition Tree), built on grid density and a spatial partition tree. Its novelty is that it divides the data space into cells of equal volume, defines concepts such as density and cluster in terms of those cells, and builds a spatial index structure over the cells based on spatial partitioning (the spatial partition tree) in order to cluster the data. CGDSPT retains the above advantages of density-based clustering and has linear time complexity, which makes it suitable for mining large-scale data. Theoretical analysis and experimental results also confirm these advantages.
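The abstract does not give the details of CGDSPT, so the following is only a minimal sketch of the general grid-density idea it describes: hash points into equal-sized cells, mark cells as dense, and merge adjacent dense cells into clusters. It is not the authors' algorithm. In particular, a plain hash map stands in for the spatial partition tree, and the names `grid_density_cluster`, `cell_size`, and `min_pts`, the density rule (a point count per cell), and the adjacency rule (cells touching on a face or corner) are all assumptions made for illustration.

```python
from collections import defaultdict, deque
from itertools import product
import math


def grid_density_cluster(points, cell_size, min_pts):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise.

    Hypothetical helper: a generic grid-density sketch, not CGDSPT itself.
    """
    # Step 1: hash every point into its equal-sized grid cell.
    cells = defaultdict(list)
    for idx, p in enumerate(points):
        key = tuple(math.floor(x / cell_size) for x in p)
        cells[key].append(idx)

    # Step 2 (assumed density rule): a cell is dense when it holds
    # at least min_pts points.
    dense = {key for key, members in cells.items() if len(members) >= min_pts}

    # Step 3: join adjacent dense cells into clusters with breadth-first search.
    # Assumed adjacency: cells whose indices differ by at most 1 on every axis.
    dim = len(points[0]) if points else 0
    offsets = [o for o in product((-1, 0, 1), repeat=dim) if any(o)]
    cell_label = {}
    next_label = 0
    for start in sorted(dense):
        if start in cell_label:
            continue
        cell_label[start] = next_label
        queue = deque([start])
        while queue:
            cur = queue.popleft()
            for off in offsets:
                nb = tuple(c + d for c, d in zip(cur, off))
                if nb in dense and nb not in cell_label:
                    cell_label[nb] = next_label
                    queue.append(nb)
        next_label += 1

    # Step 4: points that fall in non-dense cells are treated as noise.
    labels = [-1] * len(points)
    for key, members in cells.items():
        label = cell_label.get(key, -1)
        for idx in members:
            labels[idx] = label
    return labels


if __name__ == "__main__":
    sample = [(0.1, 0.1), (0.2, 0.3), (1.1, 0.2), (1.3, 0.4),
              (5.1, 5.1), (5.2, 5.3), (9.5, 0.5)]
    print(grid_density_cluster(sample, cell_size=1.0, min_pts=2))
```

In this sketch the assignment of points to cells is a single pass over the data, which is the source of the near-linear behaviour that grid-based methods usually aim for. The neighbour scan visits 3^d - 1 offsets per dense cell, so it is only practical in low dimensions; an index such as the spatial partition tree named in the abstract is presumably meant to organise that cell lookup more efficiently.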