JU Ling, ZHOU Xingqun, HU Zhiqiang, YANG Yi, LI Liming, BAI Shihong. Visual Localization Method of Autonomous Underwater Vehicle Based on Synthetic Data[J]. INFORMATION AND CONTROL, 2023, 52(2): 129-141. DOI: 10.13976/j.cnki.xk.2023.2257

Visual Localization Method of Autonomous Underwater Vehicle Based on Synthetic Data


    Abstract: Pose datasets for autonomous underwater vehicles (AUVs) are difficult to obtain in underwater scenarios, and existing deep-learning-based pose estimation methods therefore cannot be applied directly. This paper proposes an AUV visual localization method based on synthetic data. First, we build a virtual underwater scene in Unity3D and use a virtual camera to render images with known poses. Then, we apply unpaired image translation to transfer the style of the rendered images to that of the real underwater scene; combining the translated images with the known rendering poses yields a synthetic underwater pose dataset. Finally, we propose a convolutional neural network (CNN) pose estimation method based on local region keypoint projection. The CNN is trained on the synthetic data to predict the 2D projections of known reference corners, and the relative position and attitude are then recovered from the resulting 2D-3D point pairs with a Perspective-n-Point (PnP) algorithm based on random sample consensus. The effectiveness of the proposed method is examined through quantitative experiments on the rendered and synthetic datasets, as well as qualitative experiments in real underwater scenes. The results show that unpaired image translation effectively closes the gap between rendered and real underwater images, and that the proposed local region keypoint projection method enables more effective 6D pose estimation.
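As a concrete illustration of the final step the abstract describes, the sketch below recovers a relative pose from 2D-3D corner correspondences. It is a minimal Direct Linear Transform (DLT) fit, not the paper's RANSAC-based PnP solver, and every numeric value (corner layout, camera intrinsics, ground-truth pose) is a made-up assumption for demonstration only.

```python
# Minimal sketch: pose from 2D-3D point pairs via DLT (noise-free case).
# The paper uses RANSAC-based PnP; this simplified fit only shows how
# predicted 2D keypoints plus known 3D reference corners yield a pose.
import numpy as np

def rodrigues(r):
    """Axis-angle vector -> rotation matrix."""
    theta = np.linalg.norm(r)
    if theta < 1e-12:
        return np.eye(3)
    k = r / theta
    K = np.array([[0, -k[2], k[1]],
                  [k[2], 0, -k[0]],
                  [-k[1], k[0], 0]])
    return np.eye(3) + np.sin(theta) * K + (1 - np.cos(theta)) * (K @ K)

def dlt_pose(obj_pts, img_pts, cam_K):
    """Estimate [R|t] from >= 6 non-coplanar 2D-3D correspondences."""
    rows = []
    for (X, Y, Z), (u, v) in zip(obj_pts, img_pts):
        rows.append([X, Y, Z, 1, 0, 0, 0, 0, -u * X, -u * Y, -u * Z, -u])
        rows.append([0, 0, 0, 0, X, Y, Z, 1, -v * X, -v * Y, -v * Z, -v])
    _, _, Vt = np.linalg.svd(np.asarray(rows))
    P = Vt[-1].reshape(3, 4)            # projection matrix, up to scale
    M = np.linalg.inv(cam_K) @ P        # = s * [R | t]
    if np.linalg.det(M[:, :3]) < 0:     # resolve the SVD sign ambiguity
        M = -M
    s = np.linalg.det(M[:, :3]) ** (1.0 / 3.0)
    R, t = M[:, :3] / s, M[:, 3] / s
    U, _, Vt2 = np.linalg.svd(R)        # project back onto rotations
    return U @ Vt2, t

# Eight hypothetical 3D reference corners of the target (object frame, m).
corners = np.array([[x, y, z] for x in (-0.5, 0.5)
                              for y in (-0.2, 0.2)
                              for z in (-0.1, 0.1)], dtype=float)

# Assumed pinhole intrinsics and a ground-truth pose, used only to
# synthesize the 2D projections a keypoint CNN would otherwise predict.
K_cam = np.array([[800.0, 0.0, 320.0],
                  [0.0, 800.0, 240.0],
                  [0.0, 0.0, 1.0]])
R_gt = rodrigues(np.array([0.1, -0.2, 0.05]))
t_gt = np.array([0.1, -0.05, 3.0])

proj = (K_cam @ (corners @ R_gt.T + t_gt).T).T
img = proj[:, :2] / proj[:, 2:3]        # 2D keypoint projections (pixels)

R_est, t_est = dlt_pose(corners, img, K_cam)
```

With exact correspondences the DLT recovers the ground-truth pose; a RANSAC-based PnP solver, as used in the paper, additionally tolerates keypoint outliers, which a plain DLT fit does not.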

     
