Abstract:
In complex scenes, object detection faces significant challenges due to occlusion, scale variation, and fast motion. To address these issues, this paper proposes an efficient Transformer-based object detection algorithm for complex ball sports scenes. First, a novel lightweight feature selection (LFS) module is introduced to replace the original backbone blocks, balancing detection accuracy against computational cost. Second, a context-enhanced feature fusion structure, CEF-BiFPN, is designed, drawing on the advantages of the BiFPN architecture. It uses a global-local spatial attention mechanism in place of conventional convolutional transformations, enabling more efficient information aggregation and scale alignment. Finally, the GIoU loss is replaced with the WIoU v3 loss to improve detection accuracy and convergence speed. Experimental results show that the improved model achieves mAP gains of 2.5% and 2.1% over the baseline model RT-DETR on the Basketball Detect dataset and the expanded SportsMOT dataset (SportsMOT++), respectively, while reducing the number of parameters by 33% and substantially reducing model size and computational complexity.
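
For reference, the GIoU loss named above as the baseline regression loss can be sketched as follows. This is a minimal illustration of the standard GIoU definition for two axis-aligned boxes, not the paper's implementation. The function name `giou_loss` and the `(x1, y1, x2, y2)` box format are assumptions made here, and the WIoU v3 replacement is not shown.

```python
def giou_loss(box_a, box_b):
    """GIoU loss (1 - GIoU) for two axis-aligned boxes.

    Assumption: boxes are (x1, y1, x2, y2) with x2 > x1 and y2 > y1,
    so both areas are strictly positive.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)

    # Intersection rectangle (zero width/height if the boxes do not overlap).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    union = area_a + area_b - inter
    iou = inter / union

    # Smallest axis-aligned box enclosing both inputs.
    enclose_w = max(ax2, bx2) - min(ax1, bx1)
    enclose_h = max(ay2, by2) - min(ay1, by1)
    enclose = enclose_w * enclose_h

    # GIoU penalizes the part of the enclosing box not covered by the union.
    giou = iou - (enclose - union) / enclose
    return 1.0 - giou
```

The loss is 0 for identical boxes and, unlike a plain IoU loss, keeps growing as non-overlapping boxes move further apart, which is why it still gives a useful training signal when the prediction misses the target entirely.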