A D-S Evidence Reasoning Based Hybrid Classification Algorithm for Incomplete Data

Abstract: Traditional imputation-based clustering algorithms for incomplete data ignore both the influence of imputed values on the class centers and the uncertainty introduced by modeling incomplete samples. To address these problems, a hybrid classification algorithm for incomplete data based on the D-S evidence theory (HCA) is proposed. First, a classical soft clustering algorithm clusters the complete samples in the dataset, and training samples are selected from the result. Several training sets are then constructed from the known attributes of the remaining samples, and base classifiers classify those samples. Next, under the D-S evidence theory, samples whose probabilities of belonging to several classes are similar are assigned to the corresponding metaclasses, which reduces the misclassification rate. Finally, each incomplete sample in a metaclass is imputed by K-nearest neighbors within each of the singleton classes that make up its metaclass and classified after each imputation, and the resulting classification results are adaptively fused to determine its final class. Experiments on simulated datasets and UCI datasets show that the algorithm reasonably characterizes the uncertainty caused by missing values and reduces the misclassification rate.
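The first step clusters only the complete samples with a soft clustering algorithm and picks training samples from the result. The abstract does not name the algorithm or the selection rule, so the sketch below assumes fuzzy c-means (fuzzifier m = 2) and keeps a sample only when its largest membership reaches a hypothetical threshold of 0.8, using the cluster index as its label. The helper `project` (also hypothetical) shows how a training set can be restricted to the attributes observed in an incomplete sample. This is an illustration in plain Python, not the authors' implementation.

```python
import math
import random


def fuzzy_c_means(X, c, m=2.0, iters=100, tol=1e-6, seed=0):
    """Plain fuzzy c-means on complete samples X (a list of equal-length lists).

    Returns (centers, U), where U[j][i] is the membership of sample j in cluster i.
    """
    rng = random.Random(seed)
    n, dim = len(X), len(X[0])
    # Random initial memberships; each row is normalized to sum to 1.
    U = []
    for _ in range(n):
        row = [rng.random() for _ in range(c)]
        total = sum(row)
        U.append([u / total for u in row])
    centers = []
    for _ in range(iters):
        # Center update: mean of the samples weighted by u^m.
        centers = []
        for i in range(c):
            w = [U[j][i] ** m for j in range(n)]
            sw = sum(w)
            centers.append([sum(w[j] * X[j][d] for j in range(n)) / sw for d in range(dim)])
        # Membership update: u_ji = 1 / sum_k (d_ji / d_jk)^(2 / (m - 1)).
        new_U = []
        for j in range(n):
            dists = [max(math.dist(X[j], ctr), 1e-12) for ctr in centers]
            new_U.append([
                1.0 / sum((dists[i] / dists[k]) ** (2.0 / (m - 1.0)) for k in range(c))
                for i in range(c)
            ])
        shift = max(abs(new_U[j][i] - U[j][i]) for j in range(n) for i in range(c))
        U = new_U
        if shift < tol:
            break
    return centers, U


def select_training(X, U, threshold=0.8):
    """Hypothetical rule: keep samples whose largest membership reaches `threshold`."""
    selected = []
    for x, row in zip(X, U):
        label = max(range(len(row)), key=row.__getitem__)
        if row[label] >= threshold:
            selected.append((x, label))
    return selected


def project(training, observed):
    """Hypothetical helper: restrict labeled samples to the attribute indices in `observed`."""
    return [([x[i] for i in observed], label) for x, label in training]


X = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
centers, U = fuzzy_c_means(X, c=2)
training = select_training(X, U)
# An incomplete sample that observes only attribute 0 gets a training set on that attribute.
projected = project(training, observed=[0])
```

On these two well-separated groups every sample has a membership close to 1 in its own cluster, so all six are kept as training samples, and `projected` holds the same samples reduced to their first attribute.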

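The metaclass idea can be illustrated with Dempster's rule of combination. In the sketch below, which is an assumption rather than the paper's exact construction, each base classifier's class probabilities are turned into a basic belief assignment: when the two largest probabilities differ by less than a hypothetical threshold (0.1), their summed mass goes to the metaclass formed by those two classes. Assignments from two classifiers are then fused with Dempster's rule, and the focal element with the largest combined mass is the decision, so an ambiguous sample lands in a metaclass instead of being forced into one class. The rule only pairs the top two classes; metaclasses of three or more classes would need a more general construction.

```python
from itertools import product


def to_bba(probs, threshold=0.1):
    """Turn one base classifier's class probabilities into a basic belief assignment (BBA).

    Hypothetical rule: if the two largest probabilities differ by less than
    `threshold`, their summed mass is assigned to the metaclass {top1, top2};
    the remaining classes keep singleton masses. Assumes at least two classes.
    """
    ranked = sorted(probs, key=probs.get, reverse=True)
    top1, top2 = ranked[0], ranked[1]
    bba = {}
    if probs[top1] - probs[top2] < threshold:
        bba[frozenset({top1, top2})] = probs[top1] + probs[top2]
        rest = ranked[2:]
    else:
        rest = ranked
    for label in rest:
        bba[frozenset({label})] = probs[label]
    return bba


def dempster_combine(m1, m2):
    """Dempster's rule: m(A) = sum over B & C == A of m1(B) * m2(C), divided by 1 - K."""
    combined, conflict = {}, 0.0
    for (b, mb), (c, mc) in product(m1.items(), m2.items()):
        common = b & c
        if common:
            combined[common] = combined.get(common, 0.0) + mb * mc
        else:
            conflict += mb * mc  # K: mass that falls on the empty set
    if conflict >= 1.0:
        raise ValueError("total conflict: Dempster's rule is undefined")
    return {a: v / (1.0 - conflict) for a, v in combined.items()}


def decide(bba):
    """Return the focal element with the largest mass: a single class or a metaclass."""
    return max(bba, key=bba.get)


# Made-up outputs of two base classifiers trained on different known-attribute subsets.
p1 = {"w1": 0.45, "w2": 0.40, "w3": 0.15}
p2 = {"w1": 0.42, "w2": 0.48, "w3": 0.10}
m = dempster_combine(to_bba(p1), to_bba(p2))
result = decide(m)  # result == frozenset({"w1", "w2"})
```

Both classifiers hesitate between `w1` and `w2`, so the combined mass concentrates on the metaclass {w1, w2}; if both clearly favored `w1`, the decision would be the singleton {w1}.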
     

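For a sample assigned to a metaclass, the final step imputes its missing attributes separately within each singleton class of that metaclass, classifies each completed version, and fuses the results. The abstract does not specify the classifier or the adaptive fusion rule, so this sketch makes three assumptions: imputation uses the mean of the k nearest same-class neighbours measured on the observed attributes, each completed vector is classified by a soft nearest-centroid rule over the classes in the metaclass, and each version is weighted by the inverse of its mean neighbour distance. Missing values are encoded as `None`; all function names are hypothetical.

```python
import math


def centroid(samples):
    """Component-wise mean of a list of complete vectors."""
    return [sum(col) / len(samples) for col in zip(*samples)]


def knn_impute(x, class_samples, k=3):
    """Fill the None entries of x with the mean of its k nearest neighbours in one class.

    Distances use only the attributes observed in x. Returns the completed
    vector and the mean distance to the neighbours used.
    """
    observed = [i for i, v in enumerate(x) if v is not None]

    def dist(y):
        return math.sqrt(sum((x[i] - y[i]) ** 2 for i in observed))

    neighbours = sorted(class_samples, key=dist)[:k]
    filled = [
        v if v is not None else sum(n[i] for n in neighbours) / len(neighbours)
        for i, v in enumerate(x)
    ]
    return filled, sum(dist(n) for n in neighbours) / len(neighbours)


def soft_nearest_centroid(z, centroids, eps=1e-12):
    """Class memberships of a complete vector z, proportional to 1 / distance to each centroid."""
    inv = {c: 1.0 / (math.dist(z, mu) + eps) for c, mu in centroids.items()}
    total = sum(inv.values())
    return {c: v / total for c, v in inv.items()}


def classify_in_metaclass(x, training, metaclass, k=3, eps=1e-12):
    """Impute x once per singleton class in `metaclass`, classify each version, then fuse.

    Hypothetical fusion: each version's memberships are weighted by
    1 / (mean neighbour distance), normalized across the metaclass.
    """
    centroids = {c: centroid(training[c]) for c in metaclass}
    results, reliability = {}, {}
    for c in metaclass:
        z, d = knn_impute(x, training[c], k)
        results[c] = soft_nearest_centroid(z, centroids, eps)
        reliability[c] = 1.0 / (d + eps)
    total = sum(reliability.values())
    fused = {c: 0.0 for c in metaclass}
    for c, memberships in results.items():
        for label, p in memberships.items():
            fused[label] += (reliability[c] / total) * p
    return max(fused, key=fused.get), fused


training = {"w1": [[0, 0], [0, 1], [1, 0]], "w2": [[5, 5], [5, 6], [6, 5]]}
label, fused = classify_in_metaclass([0.5, None], training, frozenset({"w1", "w2"}))
```

The observed first attribute (0.5) lies much closer to class `w1`, so the version imputed from `w1` receives most of the weight and the sample is assigned to `w1`.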