Abstract:
A cost-sensitive support vector machine (CS-SVM) algorithm is proposed for data mining applications in which the misclassification cost of positive-class samples generally differs from that of negative-class samples. CS-SVM is constructed in three steps. First, the posterior probability of each training sample is estimated from a sigmoid function of the sample's distance to the optimal separating hyperplane. Second, the label of each training sample is reconstructed so as to minimize that sample's misclassification cost. Finally, a cost-sensitive hyperplane is learned by applying the SVM algorithm to the relabeled training set. Building on CS-SVM, a general cost-sensitive classification (G-CSC) algorithm that wraps the different misclassification costs of each class is also proposed. Experimental results show that CS-SVM greatly reduces the average misclassification cost.
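
The first two steps can be illustrated with a minimal sketch. It is not the authors' implementation, and it reads "minimize misclassification cost" as minimizing the expected cost under the estimated posterior. The function names, the fixed sigmoid parameters `a` and `b` (which would normally be fitted to the data), and the cost values are all assumptions made for illustration. The third step, retraining an SVM on the relabeled set, is not shown.

```python
import math


def sigmoid_posterior(decision_value, a=-1.0, b=0.0):
    """Estimate P(y = +1 | x) from a signed distance to the hyperplane.

    Uses the sigmoid 1 / (1 + exp(a * f + b)). The values of a and b here
    are illustrative placeholders, not fitted parameters.
    """
    return 1.0 / (1.0 + math.exp(a * decision_value + b))


def relabel(decision_values, cost_fn, cost_fp, a=-1.0, b=0.0):
    """Assign each sample the label with the lower expected cost.

    cost_fn: assumed cost of labeling a true positive as negative.
    cost_fp: assumed cost of labeling a true negative as positive.
    """
    labels = []
    for f in decision_values:
        p_pos = sigmoid_posterior(f, a, b)
        risk_if_pos = (1.0 - p_pos) * cost_fp  # wrong only if truly negative
        risk_if_neg = p_pos * cost_fn          # wrong only if truly positive
        labels.append(1 if risk_if_pos <= risk_if_neg else -1)
    return labels


if __name__ == "__main__":
    distances = [-2.0, -0.5, 0.5, 2.0]
    # Equal costs reproduce the sign of the decision value.
    print(relabel(distances, cost_fn=1.0, cost_fp=1.0))  # [-1, -1, 1, 1]
    # A costlier false negative flips the borderline negative sample.
    print(relabel(distances, cost_fn=5.0, cost_fp=1.0))  # [-1, 1, 1, 1]
```

In this sketch, raising the false-negative cost moves samples near the hyperplane into the positive class, so a classifier retrained on the new labels would shift its boundary toward the cheaper error.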