Large Language Model Assisted Multi-modal Joint Relocalization Method
Graphical Abstract
Abstract
A large language model-enhanced multimodal fusion localization method is proposed to address mobile robot relocalization failures caused by seasonal variation and structural scene change. A LiDAR-visual collaborative perception framework is constructed to remove the dependence of traditional relocalization systems on environmental stability. A semantically guided, phased relocalization mechanism is designed by embedding a multimodal large language model into the localization decision loop. In the coarse localization phase, DINOv2-based visual global descriptors are fused with semantic fingerprints of general scene text (e.g., building features) parsed by the large language model to retrieve cross-modal candidate poses. In the fine localization phase, a point cloud registration algorithm constrained by planar and linear features suppresses interference from dynamic objects. Complex change scenarios typical of industrial parks are simulated, and a benchmark dataset covering seasonal variation and dynamic spatial change is constructed. Comparative experiments against traditional algorithms are conducted on both public and self-collected datasets. The results show that relocalization accuracy under normal illumination remains above 84.5%, confirming the system's stability and robustness in complex, dynamic scenes.
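
As a rough illustration of the coarse retrieval step summarized above, the sketch below ranks map keyframes by fusing DINOv2 descriptor similarity with the overlap of LLM-parsed scene-text fingerprints. The data structure `CandidateFrame`, the fusion weight `alpha`, and the Jaccard-style text score are illustrative assumptions rather than the paper's exact formulation; the retrieved candidates would then be passed to the fine stage for feature-constrained point cloud registration.

```python
# Minimal sketch of coarse candidate retrieval, assuming descriptors and
# scene-text fingerprints are precomputed for each map keyframe.
# Names and the fusion weight `alpha` are hypothetical, not from the paper.
import numpy as np
from dataclasses import dataclass, field


@dataclass
class CandidateFrame:
    pose: np.ndarray                 # 4x4 keyframe pose in the map frame
    descriptor: np.ndarray           # L2-normalized DINOv2 global descriptor
    text_fingerprint: set = field(default_factory=set)  # LLM-parsed scene-text tokens


def coarse_retrieve(query_desc, query_texts, database, alpha=0.7, top_k=3):
    """Rank map keyframes by a fused visual/text similarity score."""
    query_desc = query_desc / np.linalg.norm(query_desc)
    scores = []
    for frame in database:
        visual_sim = float(query_desc @ frame.descriptor)        # cosine similarity
        union = query_texts | frame.text_fingerprint
        text_sim = len(query_texts & frame.text_fingerprint) / len(union) if union else 0.0
        scores.append(alpha * visual_sim + (1.0 - alpha) * text_sim)
    order = np.argsort(scores)[::-1][:top_k]
    return [database[i] for i in order]   # candidate poses for fine registration
```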