Abstract:
To address two shortcomings of traditional imputation-based clustering algorithms for incomplete data, namely that they ignore the influence of imputation on the class centers and the uncertainty introduced by modeling incomplete samples, a hybrid classification algorithm for incomplete data based on the D-S evidence theory (HCA) is proposed. First, a classical soft clustering algorithm is used to cluster the complete samples in the dataset and to select the training samples. Then, several training sets are constructed on the basis of the known attributes of the remaining samples, and basic classifiers are used to classify them. Under the D-S evidence theory, samples that belong to several classes with similar probability are assigned to the corresponding metaclasses to reduce the misclassification rate. Finally, the incomplete samples in the metaclasses are imputed by K-nearest neighbors with respect to their hard-to-distinguish classes and then classified, and the several classification results are adaptively fused to determine the final class of these samples. Validation on simulated datasets and UCI standard datasets shows that the algorithm can reasonably represent the uncertainty caused by missing values and reduce the error rate.
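
The abstract does not spell out how the classification results are fused. As a rough illustration of the D-S machinery it refers to, the sketch below implements the standard Dempster rule of combination for two mass functions over a two-class frame, where the set {A, B} plays the role of a metaclass. It is not the adaptive fusion used by HCA; the function name `dempster_combine`, the class labels, and the mass values are all hypothetical and chosen only for illustration.

```python
from itertools import product


def dempster_combine(m1, m2):
    """Combine two mass functions with Dempster's rule of combination.

    Each mass function maps a frozenset of class labels (a focal element)
    to its mass; the masses of each function are assumed to sum to 1.
    """
    combined = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            # Agreeing evidence: the product mass goes to the intersection.
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:
            # Disjoint focal elements contribute to the conflict mass.
            conflict += wa * wb
    if conflict >= 1.0:
        raise ValueError("total conflict: the sources cannot be combined")
    # Normalize by the non-conflicting mass.
    return {k: v / (1.0 - conflict) for k, v in combined.items()}


# Hypothetical outputs of two basic classifiers for one incomplete sample.
A, B = frozenset({"A"}), frozenset({"B"})
AB = A | B  # metaclass: the sample cannot be separated between A and B
m1 = {A: 0.6, B: 0.1, AB: 0.3}
m2 = {A: 0.5, B: 0.2, AB: 0.3}

fused = dempster_combine(m1, m2)
for focal, mass in sorted(fused.items(), key=lambda kv: -kv[1]):
    print(sorted(focal), round(mass, 4))
```

In this toy case the conflict mass is 0.17, and after normalization most of the belief is committed to class A while a small residual mass stays on the metaclass {A, B}. A sample whose fused masses on A and B were close would instead be a candidate for the metaclass, which is the situation the abstract describes.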