Oversampling Algorithm for Imbalanced Datasets Based on Time Series Model

Abstract: This study proposes an oversampling algorithm based on a time series model to rebalance imbalanced datasets. First, a method for converting deterministic data into random data is proposed, through which the minority-class data are converted into a time series. Second, the resulting time series is tested for stationarity and then made stationary. Third, a suitable time series model is fitted to the stationary series and used for forecasting, so that the dataset becomes balanced. Finally, six datasets are selected from the UCI (University of California, Irvine) and KEEL (Knowledge Extraction based on Evolutionary Learning) repositories, and the proposed algorithm is compared with other commonly used oversampling algorithms in classification experiments with a decision tree classifier. Evaluation metrics computed on the classification results indicate that the proposed algorithm is effective.
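The pipeline outlined in the abstract (convert, stationarize, fit, forecast) can be illustrated with a rough sketch. The abstract does not specify the conversion method, the stationarity test, or the model family, so everything below is an assumption made for illustration only: the minority-class values of one feature are simply taken in their given order as the series, a single first difference stands in for the stationarization step (no formal test is run), and an AR(1) model fitted by least squares stands in for the "suitable time series model". The function names `difference`, `fit_ar1`, and `oversample_feature` are hypothetical and are not taken from the paper.

```python
from statistics import mean


def difference(series):
    """First-order differencing (assumed stand-in for the stationarization step)."""
    return [b - a for a, b in zip(series, series[1:])]


def fit_ar1(series):
    """Least-squares fit of x_t = c + phi * x_{t-1}; returns (c, phi)."""
    if len(series) < 2:
        raise ValueError("need at least two points to fit AR(1)")
    x_prev, x_next = series[:-1], series[1:]
    mean_prev, mean_next = mean(x_prev), mean(x_next)
    var_prev = sum((p - mean_prev) ** 2 for p in x_prev)
    if var_prev == 0:
        # Constant predictor: fall back to a pure-mean model.
        return mean_next, 0.0
    cov = sum((p - mean_prev) * (n - mean_next) for p, n in zip(x_prev, x_next))
    phi = cov / var_prev
    c = mean_next - phi * mean_prev
    return c, phi


def oversample_feature(values, n_new):
    """Forecast n_new synthetic values for one minority-class feature.

    Assumption: `values` is already ordered as a series; the paper's own
    deterministic-to-random conversion is not reproduced here.
    """
    if len(values) < 3:
        raise ValueError("need at least three minority values")
    diffs = difference(values)
    c, phi = fit_ar1(diffs)
    last_value, last_diff = values[-1], diffs[-1]
    synthetic = []
    for _ in range(n_new):
        last_diff = c + phi * last_diff   # forecast the next difference
        last_value = last_value + last_diff  # undo the differencing
        synthetic.append(last_value)
    return synthetic


if __name__ == "__main__":
    minority_feature = [1.0, 2.0, 3.0, 4.0, 5.0]
    print(oversample_feature(minority_feature, 3))
```

In this sketch the forecasts are deterministic point forecasts; a fuller implementation would presumably select the model order from the data, apply a formal stationarity test, and handle several features jointly, none of which is detailed in the abstract.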

     
