Citation: | WANG Kang, MA Zongfang, TIAN Hongpeng, SONG Lin. Belief Density Peak Clustering Algorithm for Uncertain Data[J]. INFORMATION AND CONTROL, 2022, 51(3): 349-360. DOI: 10.13976/j.cnki.xk.2022.1203 |
The density peak clustering algorithm is simple and efficient and does not require iterative calculations. It has the advantages of setting the number of clusters in advance, but it is easy to produce a "domino"effect when dividing non-centered samples. Moreover, it cannot accurately partition the samples and noise in the overlapping area. To solve the above problems, the belief density peak clustering algorithm for uncertain data is proposed. First, the algorithm uses the K-nearest neighbors of non-class center samples to determine the degree of belief of the samples belonging to different clusters based on the density peak clustering algorithm so as to obtain the cluster center samples and partition the samples into a meta-cluster with the largest degree of belief to obtain the preliminary clustering results of K-nearest neighbors. Then, the upper quantile of the density is calculated to obtain the density threshold and credal partition under the framework of evidence reasoning, and isolated samples whose density is less than the threshold are classified into the noise cluster. Afterward, the samples in the overlapping part are partitioned into the composite cluster composed of related single clusters. The degree of belief strongly supports the classification of samples belonging to a certain cluster into the corresponding single cluster. The algorithm introduces the composite cluster and noise cluster to accurately show the uncertainty of the sample under the existing attribute information. Experimental results show that this algorithm can achieve better clustering performance compared with other algorithms on artificial and UCI datasets.
[1] |
陈叶旺, 申莲莲, 钟才明, 等. 密度峰值聚类算法综述[J]. 计算机研究与发展, 2020, 57(2): 378-394. https://www.cnki.com.cn/Article/CJFDTOTAL-JFYZ202002013.htm
Chen Y W, Shen L L, Zhong C M, et al. A review of density peak clustering algorithms[J]. Journal of Computer Research and Development, 2020, 57(2): 378-394. https://www.cnki.com.cn/Article/CJFDTOTAL-JFYZ202002013.htm
|
[2] |
丁伟, 王宏伟, 崔龙, 等. 基于聚类分析和运动描述语言的扑翼飞行机器人行为规划[J]. 信息与控制, 2021, 50(1): 102-112. doi: 10.13976/j.cnki.xk.2021.0211
Ding W, Wang H W, Cui L, et al. Behavior planning of flapping-wing flying robot based on cluster analysis and motion description language[J]. Information and Control, 2021, 50(1): 102-112. doi: 10.13976/j.cnki.xk.2021.0211
|
[3] |
章永来, 周耀鉴. 聚类算法综述[J]. 计算机应用, 2019, 39(7): 1869-1882. https://www.cnki.com.cn/Article/CJFDTOTAL-JSJY201907002.htm
Zhang Y L, Zhou Y J. Review of clustering algorithms[J]. Computer Applications, 2019, 39(7): 1869-1882. https://www.cnki.com.cn/Article/CJFDTOTAL-JSJY201907002.htm
|
[4] |
Alex R, Alessandro L. Clustering by fast search and find of density peaks[J]. Science, 2014, 344(6191): 1492-1496. doi: 10.1126/science.1242072
|
[5] |
金辉, 钱雪忠. 自然最近邻优化的密度峰值聚类算法[J]. 计算机科学与探索, 2019, 13(4): 711-720. https://www.cnki.com.cn/Article/CJFDTOTAL-KXTS201904019.htm
Jin H, Qian X Z. Density peak clustering algorithm based on natural nearest neighbour optimization[J]. Computer Science and Exploration, 2019, 13(4): 711-720. https://www.cnki.com.cn/Article/CJFDTOTAL-KXTS201904019.htm
|
[6] |
Li C Z, Zhang Y N. Density peak clustering based on relative density optimization[J]. Mathematical Problems in Engineering, 2020. DOI: 10.1155/2020/2816102.
|
[7] |
Kamali T, Stashuk D W, Discovering density-based clustering structures using neighborhood distance entropy consistency[J]. IEEE Transactions on Computational Social Systems, 2020, 7(4): 1069-1080. doi: 10.1109/TCSS.2020.3003538
|
[8] |
丁志成, 葛洪伟. 优化分配策略的密度峰值聚类算法[J]. 计算机科学与探索, 2020, 14(5): 792-802. https://www.cnki.com.cn/Article/CJFDTOTAL-KXTS202005009.htm
Ding Z C, Ge H W. A density peak clustering algorithm for optimal allocation strategy[J]. Computer Science and Exploration, 2020, 14(5): 792-802. https://www.cnki.com.cn/Article/CJFDTOTAL-KXTS202005009.htm
|
[9] |
Jiang J H, Hao D H, Chen Y J, et al. GDPC: Gravitation-based density peaks clustering algorithm[J]. Physica A: Statistical Mechanics and its Applications, 2018, 502: 345-355. doi: 10.1016/j.physa.2018.02.084
|
[10] |
贾露, 张德生, 吕端端. 物理学优化的密度峰值聚类算法[J]. 计算机工程与应用, 2020, 56(13): 47-53. https://www.cnki.com.cn/Article/CJFDTOTAL-JSGG202013006.htm
Jia L, Zhang D S, Lü D D. Physical optimization of density peak clustering algorithm[J]. Computer Engineering and Applications, 2020, 56(13): 47-53. https://www.cnki.com.cn/Article/CJFDTOTAL-JSGG202013006.htm
|
[11] |
Yu D H, Liu G J, Guo M Z, et al. Density peaks clustering based on weighted local density sequence and nearest neighbor assignment[J]. IEEE Access, 2019, 7: 34301-34317. doi: 10.1109/ACCESS.2019.2904254
|
[12] |
Liu R, Wang H, Yu X M. Shared-nearest-neighbor-based clustering by fast search and find of density peaks[J]. Information Sciences, 2018, 450: 200-226. doi: 10.1016/j.ins.2018.03.031
|
[13] |
朱庆峰, 葛洪伟. K近邻相似度优化的密度峰聚类[J]. 计算机工程与应用, 2019, 55(2): 148-153. https://www.cnki.com.cn/Article/CJFDTOTAL-JSGG201902024.htm
Zhu Q F, Ge H W. Density peaks clustering optimized by K-nearest neighbor's similarity[J]. Computer Engineering and Applications, 2019, 55(2): 148-153. https://www.cnki.com.cn/Article/CJFDTOTAL-JSGG201902024.htm
|
[14] |
王鸿飞, 刘海斌, 邓鑫洋, 等. 基于幂均算子和证据理论的故障诊断方法[J]. 信息与控制, 2019, 48(5): 567-572. doi: 10.13976/j.cnki.xk.2019.8531
Wang H F, Liu H B, Deng X Y, et al. Fault diagnosis method based on power-average operator and evidence theory[J]. Information and Control, 2019, 48(5): 567-572. doi: 10.13976/j.cnki.xk.2019.8531
|
[15] |
Shafer G. A mathematical theory of evidence turns 40[J]. International Journal of Approximate Reasoning, 2016, 79: 7-25. doi: 10.1016/j.ijar.2016.07.009
|
[16] |
Ma Z F, Tian H P, Liu Z C, et al. A new incomplete pattern belief classification method with multiple estimations based on KNN[J]. Applied Soft Computing, 2020. DOI: 10.1016/j.asoc.2020.106175.
|
[17] |
Ma Z F, Liu Z, Zhang Y R, et al. Credal transfer learning with multi-estimation for missing data[J]. IEEE Access, 2020, 8: 70316-70328. doi: 10.1109/ACCESS.2020.2983319
|
[18] |
曹可劲, 赵宗贵, 江汉. 不确定性证据聚类问题讨论[J]. 信息与控制, 2006, 35(1): 55-58. http://xk.sia.cn/CN/
Cao K J, Zhao Z G, Jiang H. Discussion on the nonspecific evidence clustering problem[J]. Information and Control, 2006, 35(1): 55-58. http://xk.sia.cn/CN/
|
[19] |
段中兴, 毕瀚元, 张作伟. 基于D-S证据理论的不完整数据混合分类算法[J]. 信息与控制, 2020, 49(4): 455-463. doi: 10.13976/j.cnki.xk.2020.9359
Duan Z X, Bi H Y, Zhang Z W. A hybrid classification algorithm for incomplete data based on d-s evidence theory[J]. Information and Control, 2020, 49(4): 455-463. doi: 10.13976/j.cnki.xk.2020.9359
|
[20] |
柯小路, 马荔瑶, 李子懿, 等. 证据推理规则的性质研究及方法修正[J]. 信息与控制, 2016, 45(2): 165-170. doi: 10.13976/j.cnki.xk.2016.0165
Ke X L, Ma L Y, Li Z Y, et al. Property research and approach modification of evidential reasoning rule[J]. Information and Control, 2016, 45(2): 165-170. doi: 10.13976/j.cnki.xk.2016.0165
|
[21] |
Du M J, Ding S F, Jia H J. Study on density peaks clustering based on k-nearest neighbors and principal component analysis[J]. Knowledge-Based Systems, 2016. DOI: 10.1016/j.knosys.2016.02.001.
|
[22] |
Shafer G. A Mathematical theory of evidence[M]. Princeton, USA: Princeton University Press, 1976: 56-85.
|
[23] |
Dempster A P. Upper and lower probabilities induced by multi-valued mapping[J]. Annals of Mathematical Statistics, 1967, 38(4): 325-339.
|
[24] |
Smets P. Analyzing the combination of conflicting belief functions[J]. Information Fusion, 2007, 8(4): 387-412. doi: 10.1016/j.inffus.2006.04.003
|
[25] |
Su Z G, Denoeux T. BPEC: Belief-peaks evidential clustering[J]. IEEE Transactions on Fuzzy Systems, 2019, 27(1): 111-123. doi: 10.1109/TFUZZ.2018.2869125
|
[26] |
丁志成, 葛洪伟, 周竞. 基于KL散度的密度峰值聚类算法[J]. 重庆邮电大学学报(自然科学版), 2019, 31(3): 367-374. https://www.cnki.com.cn/Article/CJFDTOTAL-CASH201903011.htm
Ding Z C, Ge H W, Zhou J. Density peak clustering algorithm based on KL divergence[J]. Journal of Chongqing University of Posts and Telecommunications (Natural Science Edition), 2019, 31(3): 367-374. https://www.cnki.com.cn/Article/CJFDTOTAL-CASH201903011.htm
|
[27] |
Liu Z G, Pan Q, Dezert J, et al. Credal C-means clustering method based on belief functions[J]. Knowledge-Based Systems, 2015, 74(5): 119-132.
|
1. |
赵嘉,马清,陈蔚昌,肖人彬,崔志华,潘正祥. 面向流形数据的加权自然近邻密度峰值聚类算法. 兰州大学学报(自然科学版). 2024(05): 652-660+669 .
![]() | |
2. |
张利,路颜萍,侯晴,张皓博. K近邻空间密度分布的模糊聚类算法. 辽宁大学学报(自然科学版). 2023(04): 289-301 .
![]() |