Classification Mining Method for Data Streams Based on Instance Transfer

    Abstract: To address the problems of sample labeling and concept drift in data stream classification, we propose a data stream classification mining model based on instance transfer. First, the model uses a support vector machine as the learner; the support vectors of the resulting classifier form the source domain, and the current data block to be classified forms the target domain. Then, following the mutual-nearest-neighbor idea, it selects from the source domain the true neighbors of the target-domain samples for instance transfer, which avoids negative transfer. Finally, it merges the target domain with the transferred samples to form the training set, which increases the number of labeled samples and improves the generalization ability of the model. Theoretical analysis and experimental results show that the proposed method is feasible and achieves higher classification accuracy than the other learning methods compared.
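The neighbor-selection step can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that the source domain (the support vectors of a previously trained classifier) is already available as a list of points, that a "true neighbor" means a source instance that is among the k nearest source points of some target sample while that target sample is also among the k nearest target points of the source instance, and that the target samples used for training carry labels. The function names, the Euclidean distance, the value of k, and the toy data are all hypothetical.

```python
import math


def k_nearest(query, points, k):
    """Return the indices of the k points closest to query (Euclidean distance)."""
    order = sorted(range(len(points)), key=lambda i: math.dist(query, points[i]))
    return set(order[:k])


def select_transfer_instances(source, target, k=2):
    """Return sorted indices of source instances that are mutual k-nearest
    neighbors of at least one target sample (assumed selection rule)."""
    # Neighbors of each target sample, searched among the source instances.
    target_to_source = [k_nearest(t, source, k) for t in target]
    # Neighbors of each source instance, searched among the target samples.
    source_to_target = [k_nearest(s, target, k) for s in source]

    selected = set()
    for ti, source_neighbors in enumerate(target_to_source):
        for si in source_neighbors:
            # Keep the source instance only if the relation holds both ways.
            if ti in source_to_target[si]:
                selected.add(si)
    return sorted(selected)


# Toy data: two source points lie near the target block, two lie far away.
source = [(0.0, 0.0), (0.2, 0.1), (5.0, 5.0), (9.0, 9.0)]
source_labels = [0, 0, 1, 1]
target = [(0.1, 0.0), (0.3, 0.2), (0.0, 0.2)]
target_labels = [0, 0, 0]

transferred = select_transfer_instances(source, target, k=2)

# Merge the target samples with the transferred source instances.
training_set = list(zip(target, target_labels)) + [
    (source[i], source_labels[i]) for i in transferred
]
```

In this toy run only the two source points close to the target block are transferred; the distant points are excluded, which is the intended guard against negative transfer. The merged training set would then be passed to the learner.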

     
