Abstract:
A lightweight spatial location attention module (SLAM) is proposed to address a shortcoming of previous attention methods, which often overlook the critical role of spatial location information in cross-dimensional interactions. The module uses three branch structures to compute location attention weights for the input feature map along the horizontal, vertical, and channel directions, aggregating features along these three directions to adaptively adjust the spatial and positional attention weights of the feature map. Based on this module, the ResNet18, ResNet50, and MobileNetV2 networks are improved, and extensive image classification experiments are conducted. The results show that SLAM considerably improves model performance and outperforms other attention methods. In particular, on the ImageNet-1K and Stanford-Cars classification tasks, the SLAM-enhanced ResNet18, ResNet50, and MobileNetV2 networks achieve the highest Top-1 accuracy, with increases of 2.62% and 2.4%, respectively. In the scrap steel rating task, the YOLOv5s and YOLOv8s networks enhanced with SLAM show improvements across four metrics: recall, F1 score, mAP0.5:0.95, and mAP0.5, surpassing the same networks improved with the convolutional block attention module (CBAM) and coordinate attention (CA).
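
To make the three-branch mechanism described above concrete, the following is a minimal PyTorch sketch of a location attention block that pools along the horizontal, vertical, and channel directions and rescales the input by the resulting weights. The class name SLAMSketch, the reduction ratio, and all layer choices are illustrative assumptions, not the authors' exact implementation.

```python
# Minimal sketch of a three-branch spatial location attention block,
# assuming a coordinate-attention-style design; all names and layer
# sizes are hypothetical, not the paper's exact architecture.
import torch
import torch.nn as nn

class SLAMSketch(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        hidden = max(channels // reduction, 8)
        # Horizontal branch: pool over width -> per-row descriptors.
        self.pool_h = nn.AdaptiveAvgPool2d((None, 1))
        self.conv_h = nn.Sequential(
            nn.Conv2d(channels, hidden, 1), nn.ReLU(inplace=True),
            nn.Conv2d(hidden, channels, 1), nn.Sigmoid())
        # Vertical branch: pool over height -> per-column descriptors.
        self.pool_w = nn.AdaptiveAvgPool2d((1, None))
        self.conv_w = nn.Sequential(
            nn.Conv2d(channels, hidden, 1), nn.ReLU(inplace=True),
            nn.Conv2d(hidden, channels, 1), nn.Sigmoid())
        # Channel branch: global pooling -> per-channel weights.
        self.pool_c = nn.AdaptiveAvgPool2d(1)
        self.fc_c = nn.Sequential(
            nn.Conv2d(channels, hidden, 1), nn.ReLU(inplace=True),
            nn.Conv2d(hidden, channels, 1), nn.Sigmoid())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        a_h = self.conv_h(self.pool_h(x))  # (B, C, H, 1) row weights
        a_w = self.conv_w(self.pool_w(x))  # (B, C, 1, W) column weights
        a_c = self.fc_c(self.pool_c(x))    # (B, C, 1, 1) channel weights
        # Rescale the input by the three attention maps (broadcast).
        return x * a_h * a_w * a_c

if __name__ == "__main__":
    x = torch.randn(2, 64, 32, 32)
    print(SLAMSketch(64)(x).shape)  # torch.Size([2, 64, 32, 32])
```

Because each branch produces a map that broadcasts over the dimensions it pooled away, the block adds only a few pointwise convolutions of parameters, which is consistent with the lightweight design the abstract claims; such a block would typically be inserted after a convolutional stage in ResNet or MobileNetV2.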