Large Language Model Assisted Multi-modal Joint Relocalization Method
Graphical Abstract
Abstract
A large language model-enhanced multimodal fusion localization method is proposed to address mobile robot relocalization failures caused by seasonal variation and structural scene change. A LiDAR-visual collaborative perception framework is constructed to remove the dependence of traditional relocalization systems on environmental stability. A semantically guided, phased relocalization mechanism is designed by embedding a multimodal large language model into the localization decision loop. In the coarse localization phase, DINOv2-based visual global descriptors are fused with semantic fingerprints of general scene text (e.g., building features) parsed by the large language model to retrieve cross-modal candidate poses. In the fine localization phase, a point cloud registration algorithm constrained by planar and linear features suppresses interference from dynamic objects. Complex change scenarios typical of industrial parks are simulated, and a benchmark dataset covering seasonal variation and dynamic spatial change is constructed. Comparative experiments against traditional algorithms are conducted on both public and self-collected datasets. The results show that relocalization accuracy under normal illumination remains above 84.5%, confirming the system's stability and robustness in complex, dynamic scenes.
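
As a rough illustration of the coarse retrieval step summarized above, the sketch below ranks map keyframes by fusing DINOv2 descriptor similarity with the overlap of LLM-parsed scene-text fingerprints. The data structure `CandidateFrame`, the fusion weight `alpha`, and the Jaccard-style text score are illustrative assumptions rather than the paper's exact formulation; the retrieved candidates would then be passed to the fine stage for feature-constrained point cloud registration.

```python
# Minimal sketch of coarse candidate retrieval, assuming descriptors and
# scene-text fingerprints are precomputed for each map keyframe.
# Names and the fusion weight `alpha` are hypothetical, not from the paper.
import numpy as np
from dataclasses import dataclass, field


@dataclass
class CandidateFrame:
    pose: np.ndarray                 # 4x4 keyframe pose in the map frame
    descriptor: np.ndarray           # L2-normalized DINOv2 global descriptor
    text_fingerprint: set = field(default_factory=set)  # LLM-parsed scene-text tokens


def coarse_retrieve(query_desc, query_texts, database, alpha=0.7, top_k=3):
    """Rank map keyframes by a fused visual/text similarity score."""
    query_desc = query_desc / np.linalg.norm(query_desc)
    scores = []
    for frame in database:
        visual_sim = float(query_desc @ frame.descriptor)        # cosine similarity
        union = query_texts | frame.text_fingerprint
        text_sim = len(query_texts & frame.text_fingerprint) / len(union) if union else 0.0
        scores.append(alpha * visual_sim + (1.0 - alpha) * text_sim)
    order = np.argsort(scores)[::-1][:top_k]
    return [database[i] for i in order]   # candidate poses for fine registration
```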