Citation: FU Shuangjie, CHEN Wei, YIN Zhong. Semantic Segmentation Algorithm Combining Self-attention and Feature Adaptive Fusion[J]. INFORMATION AND CONTROL, 2022, 51(6): 680-687, 698. DOI: 10.13976/j.cnki.xk.2022.1584
In this study, we design a dual-path segmentation algorithm that embeds an improved self-attention mechanism and adaptively fuses multi-scale features, addressing two problems in scene-image semantic segmentation: the presence of multi-scale targets and the feature extraction network's limited ability to capture global context. In the spatial path, a simple two-branch downsampling module performs four downsampling operations to extract high-resolution edge detail, allowing the network to segment object boundaries accurately. In the semantic path, we embed a context capture module and an adaptive feature fusion module to provide rich multi-scale, highly semantic context information for the decoding stage, and we adopt a category balance strategy to further improve segmentation. Experiments show that the proposed model achieves mean intersection over union (MIoU) scores of 59.4% and 60.1% on the CamVid and Aeroscapes datasets, respectively, demonstrating good segmentation performance.
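To make the dual-path layout described above concrete, the following is a minimal PyTorch-style sketch. It is not the authors' implementation: the branch widths, the stand-ins for the context capture and adaptive feature fusion modules, and the class-balanced loss hint are assumptions chosen only to show how a spatial path, a semantic path, and an adaptive fusion step could fit together.

```python
# Minimal sketch of a two-path segmentation network (assumed structure, not the paper's code).
import torch
import torch.nn as nn
import torch.nn.functional as F


def conv_bn_relu(in_ch, out_ch, stride=1):
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, stride=stride, padding=1, bias=False),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )


class SpatialPath(nn.Module):
    """Shallow branch that keeps edge detail; four strided convolutions stand in
    for the paper's two-branch downsampling module."""
    def __init__(self):
        super().__init__()
        self.stages = nn.Sequential(
            conv_bn_relu(3, 32, stride=2),
            conv_bn_relu(32, 64, stride=2),
            conv_bn_relu(64, 128, stride=2),
            conv_bn_relu(128, 128, stride=2),
        )

    def forward(self, x):
        return self.stages(x)


class SemanticPath(nn.Module):
    """Deeper branch; global average pooling is a crude proxy for the improved
    self-attention / context capture module."""
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(
            conv_bn_relu(3, 64, stride=2),
            conv_bn_relu(64, 128, stride=2),
            conv_bn_relu(128, 256, stride=2),
            conv_bn_relu(256, 256, stride=2),
        )
        self.context = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(256, 256, 1),
            nn.Sigmoid(),
        )
        self.project = conv_bn_relu(256, 128)

    def forward(self, x):
        feat = self.backbone(x)
        feat = feat * self.context(feat)   # reweight features with global context
        return self.project(feat)


class AdaptiveFusion(nn.Module):
    """Learns per-pixel weights to blend the two paths (adaptive feature fusion)."""
    def __init__(self, channels=128):
        super().__init__()
        self.weight = nn.Conv2d(2 * channels, 2, 1)

    def forward(self, spatial_feat, semantic_feat):
        semantic_feat = F.interpolate(
            semantic_feat, size=spatial_feat.shape[2:], mode="bilinear", align_corners=False
        )
        w = torch.softmax(self.weight(torch.cat([spatial_feat, semantic_feat], dim=1)), dim=1)
        return w[:, :1] * spatial_feat + w[:, 1:] * semantic_feat


class DualPathSegNet(nn.Module):
    def __init__(self, num_classes):
        super().__init__()
        self.spatial = SpatialPath()
        self.semantic = SemanticPath()
        self.fuse = AdaptiveFusion(128)
        self.classifier = nn.Conv2d(128, num_classes, 1)

    def forward(self, x):
        fused = self.fuse(self.spatial(x), self.semantic(x))
        logits = self.classifier(fused)
        return F.interpolate(logits, size=x.shape[2:], mode="bilinear", align_corners=False)


if __name__ == "__main__":
    net = DualPathSegNet(num_classes=11)        # e.g. 11 classes for CamVid
    out = net(torch.randn(1, 3, 360, 480))
    print(out.shape)                            # torch.Size([1, 11, 360, 480])
    # The category balance strategy could be approximated with a weighted loss:
    # criterion = nn.CrossEntropyLoss(weight=class_weights)
```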