Abstract:
Traditional simultaneous localization and mapping (SLAM) algorithms are easily disrupted by dynamic objects and cannot extract semantic scene information. To address these problems, we propose an algorithm for building outdoor three-dimensional (3D) semantic maps in dynamic environments. First, for semantic segmentation, we propose a conditional random field (CRF) image semantic segmentation algorithm based on fully convolutional networks and superpixels; it combines semantic information with epipolar constraints to remove the feature points of dynamic objects. The camera trajectory is then estimated by visual odometry, depth data are obtained by a monocular depth estimation algorithm, and a 3D model is reconstructed from the depth data. At the same time, two-dimensional semantic labels are progressively mapped onto the 3D point cloud by a Bayesian progressive label transfer algorithm. On this basis, we propose a global 3D map optimization algorithm based on a higher-order CRF. This algorithm constructs the higher-order terms of the CRF from the spatio-temporal consistency of 3D superpixels and adds constraints between the point cloud and 3D regions, so that point-cloud category boundaries remain consistent during semantic segmentation. Experimental results show that the proposed algorithm achieves more accurate pose estimation than mainstream algorithms in dynamic scenes, effectively improves the accuracy of single-frame semantic segmentation, and yields a globally consistent semantic map.
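The combination of epipolar constraints with semantic labels for rejecting dynamic feature points can be illustrated with a minimal sketch. This is not the paper's implementation; it only shows the underlying geometry: a matched point pair violating the epipolar constraint (point-to-epipolar-line distance above a threshold), or falling on a semantically dynamic region, is discarded. The function names, the threshold value, and the `dynamic_mask` input are assumptions for illustration.

```python
import numpy as np

def epipolar_distance(F, pts1, pts2):
    """Distance of each matched point in image 2 to its epipolar line.

    F    : 3x3 fundamental matrix mapping image-1 points to image-2 lines.
    pts1 : (N, 2) pixel coordinates in image 1.
    pts2 : (N, 2) pixel coordinates in image 2.
    """
    ones = np.ones((pts1.shape[0], 1))
    p1 = np.hstack([pts1, ones])          # homogeneous coordinates
    p2 = np.hstack([pts2, ones])
    lines2 = p1 @ F.T                     # epipolar line l_i = F @ p1_i
    # distance from p2 to line (a, b, c): |a*x + b*y + c| / sqrt(a^2 + b^2)
    num = np.abs(np.sum(lines2 * p2, axis=1))
    den = np.sqrt(lines2[:, 0] ** 2 + lines2[:, 1] ** 2)
    return num / den

def filter_dynamic(F, pts1, pts2, dynamic_mask, thresh=1.0):
    """Keep matches that satisfy the epipolar constraint AND are not
    labeled dynamic by the semantic segmentation (hypothetical fusion rule)."""
    dist = epipolar_distance(F, pts1, pts2)
    keep = (dist < thresh) & ~dynamic_mask
    return pts1[keep], pts2[keep]
```

For a camera translating along the x-axis (with identity intrinsics), epipolar lines are horizontal, so static points keep their row coordinate while points on moving objects typically do not; the distance test above captures exactly this residual.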