CHEN Gang, WANG Lijuan. Symmetric Inverting Algorithm for Imbalanced Datasets Based on Gaussian Mixture Model[J]. INFORMATION AND CONTROL, 2020, 49(2): 203-209, 218. DOI: 10.13976/j.cnki.xk.2020.9292

Symmetric Inverting Algorithm for Imbalanced Datasets Based on Gaussian Mixture Model

  • Abstract: Traditional classifiers perform poorly on imbalanced datasets. To address this, we propose a symmetric inverting algorithm based on the Gaussian mixture model and expectation maximization (GMM-EM). The core idea is to balance the data using the "3σ rule" from probability theory. First, the density functions of the majority class and the minority class are obtained with the Gaussian mixture model and the EM algorithm. Second, taking the mean of the minority class as the center of symmetry, the "3σ rule" is used to determine the boundary of the inverting region, where the majority class intrudes into the minority class, and the symmetric inversion is performed; points that duplicate original minority-class data within the inverting region are then eliminated. If the two classes are still imbalanced at this point, additional minority-class samples are generated within the inverting region using a probability density enhancing method. Finally, the balanced data are classified with a decision tree classifier on 14 imbalanced datasets selected from the UCI and KEEL repositories. Experimental results show that the algorithm is effective and compares favorably with the other methods tested.
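The inverting step can be illustrated with a minimal one-dimensional sketch. It is not the authors' implementation and rests on several assumptions: the minority class is summarized by a single Gaussian (sample mean and standard deviation) instead of a density fitted by GMM-EM, "inverting" is taken to mean the point reflection x -> 2μ - x about the minority mean, and the helper names `three_sigma_interval` and `reflect_minority` are hypothetical.

```python
from statistics import mean, pstdev


def three_sigma_interval(samples):
    """Return (mu, lower, upper), with the bounds at mu -/+ 3*sigma."""
    mu = mean(samples)
    sigma = pstdev(samples)  # population standard deviation
    return mu, mu - 3 * sigma, mu + 3 * sigma


def reflect_minority(minority):
    """Hypothetical helper: mirror minority points about their mean.

    Assumption: inversion is the reflection x -> 2*mu - x.
    A reflected point is kept only if it lies inside the 3-sigma
    interval and does not coincide with an original minority point.
    """
    mu, lower, upper = three_sigma_interval(minority)
    originals = set(minority)
    flipped = []
    for x in minority:
        x_new = 2 * mu - x
        if lower <= x_new <= upper and x_new not in originals:
            flipped.append(x_new)
    return flipped


# Toy minority class with mean 4.0; the point 4.0 reflects onto itself
# and is therefore discarded as a duplicate of an original point.
minority = [1.0, 2.0, 4.0, 5.0, 8.0]
synthetic = reflect_minority(minority)
print(synthetic)
```

In this toy case the reflected points supplement the five original minority samples. The density-based generation of further samples described in the abstract is not covered by the sketch.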

     
