Abstract:
Convolutional neural networks achieve outstanding performance on computer vision tasks but often suffer from long inference times, large parameter counts, and high floating-point operations (FLOPs). We identify multi-scale feature redundancy in hierarchical convolutional neural networks and develop an efficient multi-scale feature fusion module, the mixed and difference enhancement module. The mix block merges redundant features and leverages this redundancy to enhance feature learning. The difference enhancement block focuses on the differences between features, improving the module's representation ability on small-sample tasks. We integrate the mixed and difference enhancement module into various network models for different tasks. Experimental results demonstrate that, as a plug-and-play component, the mixed and difference enhancement module reduces parameter counts, FLOPs, and inference times without complex adjustments to the existing model, while exhibiting superior feature representation ability and significantly improving performance.