Abstract:
In large-scale data applications, the feature set must be reduced to improve the generalization ability of the learned model. We first select candidate features using random-forest-based selection together with a similarity measure, and then apply a forward search strategy as a second filtering stage. The algorithm uses local traversal to improve execution efficiency, and it also addresses the problem of determining the optimal number of features. Experimental results show that the method obtains the feature subset more effectively and improves classification accuracy.
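
The two-stage idea can be illustrated with a minimal sketch, which is not the implementation evaluated in this paper. It assumes that per-feature importance scores have already been obtained from a random forest (the `importances` mapping), uses absolute Pearson correlation as a stand-in similarity measure, and interprets "local traversal" as trying only the next few ranked candidates at each forward step. The names `first_filter`, `forward_search`, `max_similarity`, `window`, and the `evaluate` callback are all hypothetical.

```python
from typing import Callable, Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; an assumed stand-in for the similarity measure."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a constant column is treated as uncorrelated
    return cov / (vx * vy) ** 0.5


def first_filter(
    columns: Dict[str, Sequence[float]],
    importances: Dict[str, float],
    max_similarity: float = 0.9,
) -> List[str]:
    """Rank features by importance, then drop any feature that is too
    similar to a higher-ranked feature that was already kept."""
    ranked = sorted(columns, key=lambda name: importances[name], reverse=True)
    kept: List[str] = []
    for name in ranked:
        if all(abs(pearson(columns[name], columns[k])) < max_similarity
               for k in kept):
            kept.append(name)
    return kept


def forward_search(
    candidates: Sequence[str],
    evaluate: Callable[[List[str]], float],
    window: int = 2,
) -> List[str]:
    """Greedy forward selection. Each step tries only the next `window`
    ranked candidates (an assumed reading of local traversal) and stops
    when no trial improves the score, so the subset size is not fixed
    in advance."""
    selected: List[str] = []
    remaining = list(candidates)
    best = float("-inf")
    while remaining:
        trials = [(evaluate(selected + [f]), f) for f in remaining[:window]]
        score, feature = max(trials, key=lambda t: t[0])
        if score <= best:
            break
        selected.append(feature)
        remaining.remove(feature)
        best = score
    return selected
```

In practice, `evaluate` would be something like the cross-validated accuracy of a classifier trained on the given subset. The stopping rule is what lets the number of selected features emerge from the search rather than being chosen beforehand.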