Abstract:
To address the poor classification performance typically observed on imbalanced datasets, we propose a symmetric inverting algorithm based on the Gaussian mixture model and expectation maximization (GMM-EM). The algorithm balances the datasets using the "3σ rule" from probability theory. First, we estimate the density functions of the minority class and the majority class using the GMM and EM algorithms. Second, we apply a symmetric transformation to the minority class after obtaining the centers and the radius of the inverting region according to the "3σ rule." After the inverting process, we remove the points that duplicate the original minority-class data. If the two classes are still imbalanced at this point, additional minority-class samples are generated using the probability density enhancing method. Finally, we evaluate our algorithm and other methods using a decision tree classifier on 14 imbalanced datasets from the UCI and KEEL repositories. Experimental results show that our algorithm is more effective than the compared methods.
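
As a rough illustration of the inverting step only, the following minimal sketch reflects minority-class points through a center and discards reflections that coincide with existing points. It rests on several assumptions that are not stated in the abstract: a single Gaussian (per-dimension mean and standard deviation) stands in for the GMM-EM density estimate, the "symmetric transformation" is taken to be a point reflection x' = 2c - x about the center c, and the radius of the inverting region is taken to be three times the largest per-dimension standard deviation. The function names `mean_and_std` and `invert_within_3sigma` are hypothetical, and the probability density enhancing step is not covered.

```python
import math


def mean_and_std(points):
    """Per-dimension mean and population standard deviation.

    Assumption: one Gaussian replaces the GMM-EM estimate used in the paper.
    """
    n = len(points)
    dims = len(points[0])
    center = [sum(p[d] for p in points) / n for d in range(dims)]
    sigma = [
        math.sqrt(sum((p[d] - center[d]) ** 2 for p in points) / n)
        for d in range(dims)
    ]
    return center, sigma


def invert_within_3sigma(points):
    """Reflect points lying inside the 3-sigma region through its center.

    Assumptions: the inversion is a point reflection x' = 2c - x, and the
    region is a ball of radius 3 * max(sigma) around the center.
    Reflections that duplicate an existing point are dropped.
    """
    center, sigma = mean_and_std(points)
    radius = 3.0 * max(sigma)
    existing = set(points)
    new_points = []
    for p in points:
        if math.dist(p, center) > radius:
            continue  # outside the inverting region, leave untouched
        mirrored = tuple(2.0 * c - x for c, x in zip(center, p))
        if mirrored not in existing:
            existing.add(mirrored)
            new_points.append(mirrored)
    return center, radius, new_points


# Toy minority class: four corners are mutually symmetric about (2, 2),
# so only the last three points yield new samples.
minority = [
    (0.0, 0.0), (4.0, 0.0), (4.0, 4.0), (0.0, 4.0),
    (1.0, 1.0), (3.0, 1.0), (2.0, 4.0),
]
center, radius, new_points = invert_within_3sigma(minority)
print(center, radius, new_points)
```

In this toy case the reflections of the four corner points land on other corner points and are discarded as duplicates, which mirrors the duplicate-removal step described above; a full implementation would apply the reflection per mixture component.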