Monocular depth estimation based on self-supervised learning has developed rapidly in recent years. However, existing self-supervised methods handle moving objects, occlusions, and large static areas poorly: the predicted depth in these regions is often uncertain or vanishes during inference. To address this problem, the model proposed in this thesis first exploits temporal consistency in video by building a multi-frame model for depth estimation. Second, it uses optical flow to generate a motion mask and applies an additional photometric loss to the masked regions. Experiments are carried out on the KITTI dataset. The proposed model outperforms the baseline model in quantitative results, and the predicted depth maps show that scale uncertainty and incomplete depth are visibly reduced for moving objects and occlusions.
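
The motion mask and the extra photometric loss on masked regions can be illustrated with a minimal NumPy sketch. This is not the thesis's implementation. The abstract does not say how the mask is derived from optical flow, so the sketch assumes one common approach: a pixel counts as moving when its observed flow differs from the flow explained by camera motion alone by more than a threshold. The names `motion_mask`, `masked_photometric_loss`, `flow_rigid`, and `threshold` are hypothetical, and a plain L1 difference stands in for whatever photometric term the thesis uses.

```python
import numpy as np


def motion_mask(flow_total, flow_rigid, threshold=1.0):
    """Return a boolean H x W mask of pixels assumed to belong to moving objects.

    flow_total: H x W x 2 observed optical flow (assumed input).
    flow_rigid: H x W x 2 flow induced by camera motion alone (assumed input).
    threshold:  residual magnitude in pixels above which a pixel is masked
                (hypothetical parameter, not taken from the thesis).
    """
    residual = np.linalg.norm(flow_total - flow_rigid, axis=-1)
    return residual > threshold


def masked_photometric_loss(target, warped, mask):
    """Mean absolute intensity difference, averaged over masked pixels only.

    target, warped: H x W x C images; mask: boolean H x W array.
    L1 is an assumption here; the thesis may use a different photometric term.
    """
    if not mask.any():
        return 0.0  # no masked pixels, so the extra term contributes nothing
    per_pixel = np.abs(target - warped).mean(axis=-1)
    return float(per_pixel[mask].mean())
```

In a training loop, a term like this would be added to the usual full-image photometric loss so that regions flagged as moving receive extra supervision. How the two terms are weighted is not stated in the abstract.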