To address the inaccurate pose estimation of most simultaneous localization and mapping (SLAM) systems in dynamic scenes, this paper proposes a motion consistency detection algorithm that combines weighted epipolar and depth constraints with semantic priors, enabling a visual SLAM system to operate robustly in dynamic indoor scenes. In this method, a semantic segmentation thread first processes the input image to obtain the set of potentially moving feature points. Feature points extracted from the non-latent-motion regions of the image then provide an initial estimate of the inter-frame transformation. Next, weighted epipolar and depth constraints are used to identify potential outliers (i.e., dynamic feature points), and the static feature point set is updated by removing them. Finally, the resulting set of robust static feature points is used to accurately estimate the camera pose, which is passed to the back end as the initial value for motion optimization. We integrated the proposed algorithm into a visual SLAM system and evaluated its performance on nine dynamic scene sequences from the TUM dataset and three image sequences from the BONN complex dynamic environment dataset. Compared with the state-of-the-art dynamic SLAM system DS-SLAM, the root mean squared error (RMSE) of the absolute trajectory error is reduced by 10.53% to 93.75%, and the RMSEs of the translational and rotational relative pose errors are reduced by up to 73.44% and 68.73%, respectively. The experimental results show that the proposed method significantly reduces the motion estimation error in dynamic environments.
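The outlier test described above can be illustrated with a minimal sketch. This is not the paper's actual formulation: the fundamental matrix `F`, the scalar `semantic_weight` (a larger value for points inside semantically segmented latent-motion regions), and the threshold `tau` are illustrative assumptions; the real system derives `F` from the initial inter-frame transformation and tunes the weighting.

```python
import numpy as np

def epipolar_distance(F, p1, p2):
    """Distance from matched point p2 to the epipolar line F @ p1.

    p1, p2 are homogeneous pixel coordinates (3-vectors); F is the
    fundamental matrix estimated from the initial inter-frame motion.
    """
    line = F @ p1                      # epipolar line (a, b, c) in image 2
    return abs(p2 @ line) / np.hypot(line[0], line[1])

def is_dynamic(F, p1, p2, depth_meas, depth_proj, semantic_weight, tau=1.0):
    """Flag a feature match as a dynamic outlier.

    Combines the epipolar residual with the depth residual (measured
    depth vs. depth projected from the initial transformation), scaled
    by a semantic-prior weight. The additive combination, the weight,
    and the threshold tau are illustrative assumptions.
    """
    d_ep = epipolar_distance(F, p1, p2)     # geometric (epipolar) residual
    d_z = abs(depth_meas - depth_proj)      # depth-consistency residual
    score = semantic_weight * (d_ep + d_z)  # semantic prior amplifies both
    return score > tau
```

Points failing this test would be dropped from the static feature set before the final pose estimation; points passing it are kept as robust static correspondences.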