The distribution of data differs from one data set to another, and these differences often have a large effect on frequent pattern mining algorithms. Combining existing algorithms and choosing the mining strategy according to the characteristics of the data set may therefore yield a new algorithm that is more robust. This paper first proposes NDFS, a simple depth-first search algorithm based on the FP-tree, and briefly analyzes its behavior on different data sets. Building on that analysis, the paper then combines NDFS with the classical FP-growth algorithm to obtain SAFP, an adaptive algorithm that dynamically switches strategies during mining according to the local characteristics of the space being mined. Experiments show that, across different data sets, SAFP matches or outperforms the best of the original algorithms and is therefore robust.
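For readers unfamiliar with the FP-tree that both NDFS and FP-growth operate on, the following is a minimal sketch of the standard two-pass FP-tree construction. It is an illustration only and does not reproduce the NDFS or SAFP algorithms, whose details are not given in this abstract. The names `FPNode`, `build_fp_tree`, and `item_support`, and the tie-breaking rule by item name, are assumptions made for this example.

```python
from collections import Counter


class FPNode:
    """One node of an FP-tree: an item, its count, and child links."""

    def __init__(self, item, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def build_fp_tree(transactions, min_support):
    """Build an FP-tree; return (root, header), where header maps item -> nodes."""
    # First pass: count the support of every item (once per transaction).
    support = Counter(item for t in transactions for item in set(t))
    frequent = {i: c for i, c in support.items() if c >= min_support}

    root = FPNode(None)
    header = {item: [] for item in frequent}

    # Second pass: insert each transaction with infrequent items dropped and
    # the rest sorted by descending support (ties broken by item name, an
    # assumption made here to keep the tree deterministic).
    for t in transactions:
        items = sorted((i for i in set(t) if i in frequent),
                       key=lambda i: (-frequent[i], i))
        node = root
        for item in items:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, parent=node)
                node.children[item] = child
                header[item].append(child)
            child.count += 1
            node = child
    return root, header


def item_support(header, item):
    """Support of a single item, recovered by summing its nodes' counts."""
    return sum(n.count for n in header.get(item, []))
```

Because transactions that share a prefix of frequent items share a path in the tree, dense data sets compress into a small tree while sparse ones produce many branches. This difference in tree shape is one plausible reason a mining strategy that works well on one data set may work poorly on another.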