Abstract:
Existing methods struggle to balance global semantic connectivity and local edge precision in ferrography images, which suffer from low contrast, uneven illumination, and strong background noise. To address this problem, this paper proposes HMA-UNet, a hybrid Mamba aggregation U-Net segmentation model based on semantic-guided gating and a hybrid state space architecture. The model uses a hybrid encoder composed of ConvNeXt and visual state space (VSS) modules to jointly capture global dependencies and local details in images. At the bottleneck layer, a hybrid Mamba aggregator (HMA) reconstructs the topological features of complex targets by integrating multi-scale dilated perception, explicit boundary enhancement, and a pyramid pooling module (PPM). In the skip connections, a semantic-guided cross-scale fusion gate (CSFG) suppresses background noise and sharpens weak boundaries. Experimental results show that HMA-UNet achieves Dice coefficients of 0.9241 and 0.9022 on the self-built ferrography dataset and the public FSSD-12 dataset, respectively, and reduces the 95% Hausdorff distance (HD95) to 19.0302 pixels and 11.2268 pixels, respectively, demonstrating strong segmentation performance and generalization ability.
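
For readers unfamiliar with the two reported metrics, the following is a minimal NumPy sketch of how a Dice coefficient and an HD95 value can be computed for a pair of binary masks. It is not the authors' evaluation code. The function names, the 4-neighbour boundary extraction, and the choice to take the larger of the two directed 95th-percentile distances are all assumptions made for illustration; other HD95 conventions exist and give slightly different values.

```python
import numpy as np

def dice_coefficient(pred, target, eps=1e-8):
    """Dice = 2|P ∩ T| / (|P| + |T|) for binary masks (eps avoids 0/0)."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    intersection = np.logical_and(pred, target).sum()
    return (2.0 * intersection + eps) / (pred.sum() + target.sum() + eps)

def boundary_points(mask):
    """Coordinates of foreground pixels with at least one background 4-neighbour.

    Assumption: this simple boundary definition is chosen for illustration.
    """
    mask = np.asarray(mask, dtype=bool)
    padded = np.pad(mask, 1, constant_values=False)
    # A pixel is interior when its up, down, left, and right neighbours are all foreground.
    interior = (padded[:-2, 1:-1] & padded[2:, 1:-1]
                & padded[1:-1, :-2] & padded[1:-1, 2:])
    return np.argwhere(mask & ~interior)

def hd95(pred, target):
    """95th-percentile Hausdorff distance in pixels between mask boundaries.

    Assumption: symmetric variant, max of the two directed 95th percentiles.
    Uses brute-force pairwise distances, so it suits small masks only.
    """
    p = boundary_points(pred)
    t = boundary_points(target)
    if len(p) == 0 or len(t) == 0:
        return float("nan")  # undefined when either mask is empty
    d = np.linalg.norm(p[:, None, :] - t[None, :, :], axis=-1)
    pred_to_target = d.min(axis=1)
    target_to_pred = d.min(axis=0)
    return float(max(np.percentile(pred_to_target, 95),
                     np.percentile(target_to_pred, 95)))

# Toy example: two 4x4 squares offset by one column.
pred = np.zeros((10, 10), dtype=bool)
target = np.zeros((10, 10), dtype=bool)
pred[2:6, 2:6] = True
target[2:6, 3:7] = True

print(f"Dice: {dice_coefficient(pred, target):.4f}")  # Dice: 0.7500
print(f"HD95: {hd95(pred, target):.4f}")              # HD95: 1.0000
```

In the toy example the masks overlap in 12 of their 16 pixels each, giving a Dice of 24/32 = 0.75, and every boundary pixel lies within one pixel of the other boundary, giving an HD95 of 1 pixel. Higher Dice and lower HD95 both indicate better agreement, which is the direction of improvement reported above.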