Abstract:
In recent years, improving the performance of mining frequent patterns in massive uncertain datasets has become an active research topic. Most traditional algorithms for mining frequent patterns consider only a single factor of data items-any of expectation, probability, or weight, while for those algorithms that consider both probability and weight, it is difficult to balance execution efficiency when big data are involved. Therefore, we propose a Spark framework-based algorithm for mining frequent patterns according to expected weight for uncertain datasets (UWEFP for short). To consider both the probabilities and weights of items, UWEFP first calculates the maximum probability weight value of one set and to prune them. A novel UWEFP-tree structure with the advantages of Spark framework is designed to mine frequent patterns; it has the FP-tree characteristics and reduces the time of scanning the datasets. Finally, in the Spark environment, UCI datasets are used to verify the algorithm. The experimental results show that the proposed algorithm is effective and has excellent performance.