SVM-Based Cost-Sensitive Mining

Abstract: In some data mining applications, misclassifying a positive sample and misclassifying a negative sample incur different costs. For this setting, a cost-sensitive support vector machine algorithm, CS-SVM, is proposed. CS-SVM consists of three steps. First, a sigmoid function is introduced to estimate the posterior probability of each training sample from its distance to the separating hyperplane. Second, the class label of each training sample is reconstructed according to the principle of minimum misclassification cost. Finally, a standard SVM is trained on the relabeled training set, which yields an optimal separating hyperplane that embeds the misclassification costs. Building on the idea behind CS-SVM, a general cost-sensitive classification algorithm, G-CSC, is proposed that embeds misclassification costs in the same way. Experimental results show that, compared with SVM, CS-SVM greatly reduces the average misclassification cost on the test set.
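The first two steps can be illustrated with a minimal sketch. It is not the authors' implementation: the sigmoid form 1 / (1 + exp(a·f + b)), the parameters `a` and `b`, the cost values `cost_fn` and `cost_fp`, and the function names are all assumptions made for illustration. The sketch takes signed decision values as given, converts them to posterior estimates, and assigns each sample the label with the lower expected misclassification cost. The third step (retraining a standard SVM on the relabeled set) is omitted.

```python
import math


def sigmoid_posterior(f, a=-1.0, b=0.0):
    """Estimate P(y = +1 | x) from a signed decision value f.

    Assumed form: 1 / (1 + exp(a * f + b)). The parameters a and b
    are hypothetical; in practice they would be fitted to the data.
    """
    return 1.0 / (1.0 + math.exp(a * f + b))


def relabel_min_cost(decision_values, cost_fn, cost_fp, a=-1.0, b=0.0):
    """Assign each sample the label with the lower expected cost.

    cost_fn: assumed cost of labeling a true positive as negative.
    cost_fp: assumed cost of labeling a true negative as positive.
    """
    labels = []
    for f in decision_values:
        p_pos = sigmoid_posterior(f, a, b)
        risk_if_neg = p_pos * cost_fn          # expected cost of choosing -1
        risk_if_pos = (1.0 - p_pos) * cost_fp  # expected cost of choosing +1
        labels.append(1 if risk_if_pos <= risk_if_neg else -1)
    return labels


# Hypothetical decision values from an already trained SVM.
decision_values = [-2.0, -0.5, 0.0, 0.5, 2.0]

# Missing a positive is assumed to be five times as costly as a false alarm.
new_labels = relabel_min_cost(decision_values, cost_fn=5.0, cost_fp=1.0)
print(new_labels)  # [-1, 1, 1, 1, 1]
```

With equal costs the same samples are relabeled `[-1, -1, 1, 1, 1]`, so in this sketch raising the cost of a missed positive moves the sample at decision value -0.5 into the positive class before retraining.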

     
