Ferrography Image Segmentation Based on Semantic-Guided Gating and Hybrid State Space U-Net

李振林; 宋佳声; 王永坚

doi:10.13976/j.cnki.xk.2025.4093

Abstract

Low contrast, uneven illumination, and strong background noise in ferrography images make it difficult for existing methods to balance global semantic connectivity against local edge precision. To address this problem, this paper proposes HMA-UNet, a hybrid Mamba aggregation U-Net segmentation model based on semantic-guided gating and a hybrid state space architecture. The model builds a hybrid encoder from ConvNeXt and visual state space (VSS) modules to jointly capture global dependencies and local details. At the bottleneck layer, a hybrid Mamba aggregator (HMA) integrates multi-scale dilated perception, explicit boundary enhancement, and a pyramid pooling module (PPM) to reconstruct the topological features of complex targets. In the skip connections, a semantic-guided cross-scale fusion gate (CSFG) suppresses background noise and sharpens weak boundaries. Experimental results show that HMA-UNet achieves Dice coefficients of 0.9241 and 0.9022 on the self-built ferrography dataset and the public FSSD-12 dataset, respectively, while reducing the 95% Hausdorff distance (HD95) to 19.0302 pixels and 11.2268 pixels, respectively, demonstrating strong segmentation performance and generalization ability.
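The two reported metrics can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function names (`dice_coefficient`, `boundary_pixels`, `percentile`, `hd95`) are hypothetical, masks are assumed to be binary nested lists, and HD95 is assumed here to be the larger of the two directed 95th-percentile boundary distances. Other toolkits define the boundary and the percentile slightly differently, so values may not match the paper's exactly.

```python
import math


def dice_coefficient(pred, target):
    """Dice = 2|P ∩ T| / (|P| + |T|) for binary masks given as nested lists of 0/1."""
    inter = sum(p & t for pr, tr in zip(pred, target) for p, t in zip(pr, tr))
    total = sum(map(sum, pred)) + sum(map(sum, target))
    return 1.0 if total == 0 else 2.0 * inter / total


def boundary_pixels(mask):
    """Foreground pixels with a 4-neighbour that is background or off-image."""
    h, w = len(mask), len(mask[0])
    pts = []
    for y in range(h):
        for x in range(w):
            if not mask[y][x]:
                continue
            for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                ny, nx = y + dy, x + dx
                if not (0 <= ny < h and 0 <= nx < w) or not mask[ny][nx]:
                    pts.append((y, x))
                    break
    return pts


def percentile(values, q):
    """Linear-interpolation percentile (q in [0, 100]) of a non-empty list."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def hd95(pred, target):
    """Assumed definition: max of the two directed 95th-percentile distances (pixels)."""
    a, b = boundary_pixels(pred), boundary_pixels(target)
    if not a or not b:
        return math.inf  # assumption: treat an empty mask as infinitely far
    d_ab = [min(math.dist(p, q) for q in b) for p in a]
    d_ba = [min(math.dist(q, p) for p in a) for q in b]
    return max(percentile(d_ab, 95), percentile(d_ba, 95))


# Toy example: the prediction misses the rightmost column of the target.
pred = [[0, 0, 0, 0], [0, 1, 1, 0], [0, 1, 1, 0], [0, 0, 0, 0]]
target = [[0, 0, 0, 0], [0, 1, 1, 1], [0, 1, 1, 1], [0, 0, 0, 0]]
dice = dice_coefficient(pred, target)
distance = hd95(pred, target)
```

A higher Dice value indicates better region overlap, while a lower HD95 indicates that predicted boundaries lie closer to the reference boundaries, which is why the abstract reports the two together.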
