Large Language Model Assisted Multi-modal Joint Relocalization Method

Abstract: To address the problem of mobile robot relocalization failure caused by seasonal change and structural scene variation, this study proposes a multimodal fusion localization method enhanced by a large language model. By constructing a LiDAR-visual collaborative perception framework, the method overcomes the dependence of traditional relocalization systems on environmental stability. A multimodal large language model is embedded into the localization decision loop, and a semantically guided, coarse-to-fine relocalization mechanism is designed. In the coarse localization stage, DINOv2-based visual global descriptors are combined with generalizable textual scene semantic fingerprints (e.g., building features) parsed by the large language model to perform cross-modal candidate pose retrieval. In the fine localization stage, a point cloud registration algorithm constrained by planar and linear features is used to suppress interference from dynamic objects. Complex change scenarios typical of industrial parks are simulated, and a benchmark dataset covering seasonal variation and dynamic spatial changes is constructed. Comparative experiments against traditional algorithms are conducted on both public and self-recorded datasets. The results show that the system's relocalization accuracy remains above 84.5% under normal illumination, demonstrating its stability and robustness in complex, dynamic scenes.
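For illustration only, the following Python sketch shows one way the coarse stage's cross-modal candidate pose retrieval could be organized: each map keyframe stores a global visual descriptor (e.g., pooled DINOv2 features) and a set of LLM-parsed scene text tags, and candidates are ranked by a weighted fusion of descriptor cosine similarity and tag overlap. The MapKeyframe structure, the Jaccard tag similarity, and the fusion weights w_vis / w_sem are assumptions made for this sketch, not details taken from the paper.

# Minimal sketch of the coarse relocalization stage described in the abstract.
# All names, weights, and data structures are illustrative assumptions,
# not the authors' implementation.
from dataclasses import dataclass
import numpy as np


@dataclass
class MapKeyframe:
    pose: np.ndarray          # 4x4 homogeneous pose stored in the prior map
    vis_desc: np.ndarray      # global visual descriptor (e.g., pooled DINOv2 features)
    text_fingerprint: set     # scene text tags parsed offline by the LLM


def cosine(a: np.ndarray, b: np.ndarray) -> float:
    # Cosine similarity between two global descriptors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))


def jaccard(a: set, b: set) -> float:
    # Overlap of the LLM-parsed semantic tag sets.
    return len(a & b) / (len(a | b) or 1)


def coarse_candidates(query_desc: np.ndarray,
                      query_tags: set,
                      keyframes: list[MapKeyframe],
                      w_vis: float = 0.7,   # assumed fusion weights
                      w_sem: float = 0.3,
                      top_k: int = 5) -> list[MapKeyframe]:
    """Return the top-k candidate poses ranked by fused visual/semantic score."""
    scored = []
    for kf in keyframes:
        score = (w_vis * cosine(query_desc, kf.vis_desc)
                 + w_sem * jaccard(query_tags, kf.text_fingerprint))
        scored.append((score, kf))
    scored.sort(key=lambda s: s[0], reverse=True)
    return [kf for _, kf in scored[:top_k]]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    db = [MapKeyframe(np.eye(4), rng.normal(size=384), {"blue warehouse", "gate 3"})
          for _ in range(10)]
    query_desc = rng.normal(size=384)
    print(len(coarse_candidates(query_desc, {"gate 3"}, db)))

In the full pipeline described by the abstract, the retrieved candidates would then be verified and refined by the plane- and line-constrained point cloud registration of the fine localization stage.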

     
