Citation: SUN Lixiang, SUN Xiaoxian, LIU Chengju, JING Wen. Obstacle Avoidance Algorithm for Mobile Robot Based on Deep Reinforcement Learning in Crowd Environment[J]. INFORMATION AND CONTROL, 2022, 51(1): 107-118. DOI: 10.13976/j.cnki.xk.2022.0099

Obstacle Avoidance Algorithm for Mobile Robot Based on Deep Reinforcement Learning in Crowd Environment

Abstract: To enable a mobile robot to complete obstacle avoidance tasks efficiently and courteously in densely crowded, complex environments, this paper proposes an obstacle avoidance algorithm for mobile robots based on deep reinforcement learning in crowd environments. First, to address the limited learning capability of the value function network in deep reinforcement learning algorithms, the value function network is improved on the basis of crowd interaction: interaction information among pedestrians is extracted through an angle-based pedestrian grid, and the temporal features of each pedestrian are extracted through an attention mechanism, which learns the relative importance of the current state and the historical trajectory states as well as their joint influence on the robot's obstacle avoidance policy, providing prior knowledge for the subsequent learning of the multilayer perceptron. Second, the reinforcement learning reward function is designed according to human spatial behavior, and states in which the robot's heading angle changes excessively are penalized, meeting the requirement of comfortable obstacle avoidance. Finally, simulation experiments verify the feasibility and effectiveness of the proposed algorithm in densely crowded, complex environments.
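The reward shaping summarized in the abstract (progress toward the goal, respect for pedestrians' personal space, and a penalty on sharp heading changes) can be illustrated with a minimal sketch. The snippet below is not the paper's actual formulation; all weights, distance thresholds, and the turning-angle limit are assumed values chosen only for illustration.

```python
import numpy as np

def step_reward(robot_pos, goal_pos, prev_goal_dist, heading, prev_heading,
                pedestrians, collision_dist=0.3, comfort_dist=0.5,
                max_turn=np.pi / 6):
    """Illustrative per-step reward: progress toward the goal, penalties for
    collisions and for intruding on pedestrians' comfort zones, and a penalty
    for overly large heading changes. All constants are assumed values."""
    goal_dist = float(np.linalg.norm(goal_pos - robot_pos))
    reward = 1.0 * (prev_goal_dist - goal_dist)      # reward progress to goal

    for p in pedestrians:
        d = float(np.linalg.norm(p - robot_pos))
        if d < collision_dist:                       # collision: large penalty
            return -1.0, goal_dist
        if d < comfort_dist:                         # discomfort penalty
            reward -= 0.1 * (comfort_dist - d)

    # Penalize sharp heading changes to keep the motion "comfortable"
    turn = abs(np.arctan2(np.sin(heading - prev_heading),
                          np.cos(heading - prev_heading)))
    if turn > max_turn:
        reward -= 0.1 * (turn - max_turn)

    return reward, goal_dist


# Example: one simulated step with two nearby pedestrians
r, d = step_reward(robot_pos=np.array([0.0, 0.0]),
                   goal_pos=np.array([4.0, 0.0]),
                   prev_goal_dist=4.2,
                   heading=0.6, prev_heading=0.0,
                   pedestrians=[np.array([0.9, 0.4]), np.array([2.0, -1.5])])
print(round(r, 3), round(d, 3))
```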

     
