Lightweight Grasp Detection Algorithm via CNN-Transformer Fusion

Abstract: To address real-time grasping in unstructured and cluttered environments, we propose PAG-Net (Position-aware grasping network), a position-aware lightweight grasp detection network. The method embeds a PEG (Positional encoding generator) into MBConv (Mobile inverted bottleneck convolution) to explicitly inject spatial position information, and further improves computational efficiency by introducing an improved LowFormer (Low-resolution Transformer). The network simultaneously predicts grasp quality, angle, and gripper width in a pixel-wise generative manner, improving both the accuracy and speed of grasp detection. Experiments show that PAG-Net achieves 98.8% accuracy on the Cornell dataset and 96.1% on the Jacquard dataset. In PyBullet simulation tests, PAG-Net achieves a grasp success rate of approximately 94% in cluttered environments, demonstrating its robustness in complex scenes.
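In pixel-wise grasp generation networks of this kind, the per-pixel quality, angle, and width maps are typically decoded into a single grasp by taking the highest-quality pixel and reading the angle and width at that location, with the angle encoded as cos(2θ)/sin(2θ) to handle its π-periodicity. The abstract does not specify PAG-Net's decoding step, so the sketch below follows the common GG-CNN-style convention; the map names, the `width_scale` factor, and `decode_grasp` are illustrative assumptions, not the paper's API:

```python
import math

def decode_grasp(q, cos2, sin2, width, width_scale=150.0):
    """Pick the best grasp from pixel-wise maps (GG-CNN-style decoding; assumed here).

    Each map is an H x W list of lists:
      q     -- grasp-quality scores in [0, 1]
      cos2  -- cos(2*theta); the angle is encoded as a double angle so the
      sin2  -- sin(2*theta); pi-periodic gripper angle stays continuous
      width -- normalized gripper opening width
    """
    h, w = len(q), len(q[0])
    # highest-quality pixel is the grasp center
    y, x = max(((i, j) for i in range(h) for j in range(w)),
               key=lambda p: q[p[0]][p[1]])
    theta = 0.5 * math.atan2(sin2[y][x], cos2[y][x])  # recover theta in (-pi/2, pi/2]
    return (y, x), theta, width[y][x] * width_scale   # de-normalize width to pixels

# toy 5x5 maps: one confident grasp at pixel (2, 3), angle 30 deg, width 0.4
H = W = 5
q = [[0.0] * W for _ in range(H)]; q[2][3] = 0.9
cos2 = [[math.cos(math.pi / 3)] * W for _ in range(H)]
sin2 = [[math.sin(math.pi / 3)] * W for _ in range(H)]
width = [[0.4] * W for _ in range(H)]

center, theta, w_px = decode_grasp(q, cos2, sin2, width)
print(center, round(math.degrees(theta), 1), round(w_px, 1))  # (2, 3) 30.0 60.0
```

The double-angle encoding is what lets the network regress a smooth quantity even though a parallel-jaw grasp at θ and θ+π is physically identical.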
