Target-Driven Navigation in Low-Light Environments Based on Enhanced Perception Transformer

Abstract: To address the perceptual degradation, low data efficiency, and real-time constraints that mobile robots face when performing Target-Driven Visual Navigation (TDN) in unknown, low-light environments, this paper proposes an integrated navigation framework based on an Enhanced Perception Transformer (EPT). First, an efficient low-light image enhancement technique, comprising an Enhancement Factor Extraction (EFE) network and a Recurrent Image Enhancement (RIE) process, is used to improve the quality of the raw visual input. The EPT encoder then deeply fuses the enhanced visual information with the target state, expressed in relative polar coordinates, and produces a goal-oriented scene representation through a goal token and a multi-head self-attention mechanism. On top of this representation, a Soft Actor-Critic (SAC) algorithm performs navigation decision-making. To guarantee real-time operation, the framework integrates performance optimizations including input downsampling and Just-In-Time (JIT) compilation. Simulation experiments in Gazebo show that, after these optimizations, the maximum image-processing frame rate and latency of the EPT-SAC framework satisfy the real-time requirements of mobile robot navigation in a variety of low-light indoor environments. The framework significantly outperforms conventional baselines in both navigation success rate and learning efficiency, reaching average success rates of 61.2% and 82.0% in the laboratory and warehouse environments, respectively, and effectively improves the recognition of obstacles and target positions in low-light visual target-driven navigation tasks.
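The abstract only outlines the EPT encoder at a high level, so the snippet below is a minimal sketch of the fusion idea it describes, not the paper's implementation: patch tokens from the enhanced frame and a goal token embedded from the relative polar-coordinate target are mixed by multi-head self-attention, and the goal-token output serves as the goal-oriented state for the SAC policy. All class names, dimensions, the patch layout, and the use of torch.jit.trace for the JIT step are assumptions.

```python
# Minimal sketch of an EPT-style goal-conditioned encoder (hypothetical names and
# dimensions; the paper's exact architecture is not given in the abstract).
import torch
import torch.nn as nn
import torch.nn.functional as F

class EPTEncoderSketch(nn.Module):
    def __init__(self, embed_dim=128, num_heads=4, num_layers=2, num_patches=300):
        super().__init__()
        # Project flattened 8x8 RGB patches of the enhanced image into token embeddings.
        self.patch_proj = nn.Linear(3 * 8 * 8, embed_dim)
        # Embed the goal given in relative polar coordinates (distance, angle) as one token.
        self.goal_proj = nn.Linear(2, embed_dim)
        self.pos_embed = nn.Parameter(torch.zeros(1, num_patches + 1, embed_dim))
        layer = nn.TransformerEncoderLayer(d_model=embed_dim, nhead=num_heads,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)

    def forward(self, patches, goal):
        # patches: (B, N, 192) flattened image patches; goal: (B, 2) = (rho, theta)
        tokens = self.patch_proj(patches)                 # (B, N, D)
        goal_token = self.goal_proj(goal).unsqueeze(1)    # (B, 1, D)
        x = torch.cat([goal_token, tokens], dim=1) + self.pos_embed
        x = self.encoder(x)                               # multi-head self-attention fusion
        return x[:, 0]                                    # goal-token output as scene representation

# Usage sketch, including the two real-time optimizations named in the abstract
# (input downsampling and JIT compilation); resolutions and patch size are assumptions.
enc = EPTEncoderSketch().eval()
frame = torch.rand(1, 3, 480, 640)                                     # raw camera frame
small = F.interpolate(frame, size=(120, 160), mode="bilinear",
                      align_corners=False)                             # input downsampling
patches = small.unfold(2, 8, 8).unfold(3, 8, 8)                        # (1, 3, 15, 20, 8, 8)
patches = patches.permute(0, 2, 3, 1, 4, 5).reshape(1, -1, 3 * 8 * 8)  # (1, 300, 192)
goal = torch.tensor([[1.5, 0.3]])                                      # (distance, angle) to goal
traced = torch.jit.trace(enc, (patches, goal))                         # JIT compilation
state = traced(patches, goal)                                          # (1, 128) feature for SAC
```

In a full system this feature would feed the SAC actor and critic heads; the enhancement stage (EFE network plus RIE process) would run on the frame before this encoder and is omitted here because the abstract gives no details of it.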

     
