Weakly Supervised Semantic Segmentation Network Based on Extended Patch Pairs

SUN Mingchen; GE Hongwei; LI Ting

doi:10.13976/j.cnki.xk.2024.0971

Abstract

In weakly supervised semantic segmentation, class activation maps (CAMs) correlate poorly with object seeds, and the seed regions cover the target objects only partially. To address these problems, we propose a weakly supervised semantic segmentation network based on extended patch pairs. First, we introduce extended patch pairs and show, from an information-theoretic perspective, that the total self-information of the CAMs obtained from an extended patch pair exceeds the self-information of the standard CAM, so that the CAMs of extended patch pairs correlate more strongly with object seeds. Second, we propose a high- and low-level feature self-attention aggregation module, which enhances the low-level image features and the CAM separately through self-attention and then aggregates them to refine the CAM pixel by pixel. Finally, we design a triple network that takes the original image and its extended patch pair as inputs and is trained by narrowing the gap between the CAM of the original image and the CAMs of the extended patch pair, yielding higher segmentation accuracy. On the Pascal VOC 2012 validation and test sets, the network achieves mean intersection over union (mIoU) scores of 72.1% and 73.0%, respectively. The experimental results show that the network outperforms current mainstream weakly supervised semantic segmentation methods that use image-level labels.
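For readers unfamiliar with the reported metric, the following is a minimal sketch of how mIoU is commonly computed: the per-class ratio of intersection to union between predicted and ground-truth labels, averaged over classes. The function name `mean_iou` and the flattened-label input format are assumptions made for illustration, not the authors' evaluation code; benchmark evaluations such as Pascal VOC accumulate these counts over all pixels of the whole dataset rather than per image.

```python
def mean_iou(pred, target, num_classes):
    """Mean intersection over union for two equal-length label sequences.

    pred and target hold one integer class label per pixel (flattened).
    Classes that appear in neither sequence are skipped.
    """
    ious = []
    for c in range(num_classes):
        # Pixels labeled c in both prediction and ground truth.
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        # Pixels labeled c in at least one of the two.
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:
            ious.append(inter / union)
    # Return 0.0 if no class is present at all (degenerate input).
    return sum(ious) / len(ious) if ious else 0.0


# Toy example with three classes and six pixels.
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
score = mean_iou(pred, target, num_classes=3)
print(f"mIoU = {score:.3f}")
```

In the toy example the per-class IoUs are 1/3, 2/3, and 1/2, so the average is 0.5.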
