Abstract:
Automated segmentation of the lumbar spine anatomical region plays a crucial role in automated analysis pipelines for spinal images. Although classical convolutional neural networks can capture local image features, their inherent local priors and weight sharing limit their ability to model long-range dependencies. To address this issue, a Swin Transformer hybrid network is proposed for segmentation of the lumbar anatomical region. First, a Swin Transformer and multi-scale dilated convolutions are combined as the encoder to achieve a hierarchical representation of global and local features. Second, a feature coupling module is designed that couples the Transformer and CNN features in the channel and spatial dimensions, enhancing the model's local and long-range modeling capabilities. To address data scarcity, a dataset of 663 lumbar vertebrae CT images with voxel-level annotations is constructed. Experiments on this dataset show that the segmentation accuracy of the proposed model surpasses that of typical medical image segmentation methods. Specifically, the Dice coefficient, Hausdorff distance, and average surface distance of the proposed model are 88.24%, 14.48, and 0.997, respectively. Ablation experiments further verify the effectiveness of the proposed modules.
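For readers unfamiliar with the reported metrics, the following is a minimal sketch of how the Dice coefficient and the Hausdorff distance can be computed on a toy example. It is not the evaluation code used in this work: the function names `dice_coefficient` and `hausdorff_distance` are hypothetical, masks are assumed to be given as sets of foreground voxel coordinates, and the Hausdorff distance is taken over all foreground voxels rather than extracted surfaces.

```python
from math import dist


def dice_coefficient(pred, target):
    """Dice = 2|P ∩ T| / (|P| + |T|) for two sets of foreground coordinates."""
    pred, target = set(pred), set(target)
    if not pred and not target:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2.0 * len(pred & target) / (len(pred) + len(target))


def hausdorff_distance(a, b):
    """Symmetric Hausdorff distance between two non-empty point sets."""
    def directed(src, dst):
        # Largest distance from any point in src to its nearest point in dst.
        return max(min(dist(p, q) for q in dst) for p in src)

    a, b = list(a), list(b)
    return max(directed(a, b), directed(b, a))


# Toy 2D example: two 2x2 squares that overlap in one column.
pred = {(0, 0), (0, 1), (1, 0), (1, 1)}
target = {(0, 1), (1, 1), (0, 2), (1, 2)}

dice = dice_coefficient(pred, target)  # 2 * 2 / (4 + 4) = 0.5
hd = hausdorff_distance(pred, target)  # farthest nearest-neighbour gap = 1.0
print(dice, hd)
```

The brute-force nearest-neighbour search is quadratic in the number of points, so this form is only suitable for illustration; volumetric CT masks would typically need a distance-transform or spatial-index approach.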