Data for supervised learning of ego-motion and depth from video is scarce and expensive to produce. Consequently, recent work has focused on unsupervised learning methods, achieving remarkable results that in some instances surpass the accuracy of supervised methods. Many unsupervised approaches rely on predicted monocular depth and thus ignore motion information; those that do incorporate motion do so only indirectly, by designing the depth prediction network as an RNN. Hence, none of the existing methods model motion directly. In this work, we show that superior pose estimation results can be achieved by modeling motion explicitly. Our method uses a novel learning-based formulation for depth propagation and refinement that transforms the predicted depth map of the current frame onto the next frame, where it serves as a prior for predicting the next frame's depth map. Experimental results demonstrate that the proposed approach surpasses state-of-the-art techniques on the pose prediction task while being on par with or better than other methods on depth prediction.
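The geometric core of depth propagation can be illustrated with a standard forward-warping operation: back-project each pixel with its predicted depth, apply the relative camera motion, and re-project into the next frame. This is a minimal NumPy sketch, not the paper's learned formulation; the function name, the assumption of known intrinsics `K` and relative pose `(R, t)`, and the nearest-pixel scatter are all illustrative choices.

```python
import numpy as np

def propagate_depth(depth_t, K, R, t):
    """Forward-warp a depth map from frame t onto frame t+1.

    Illustrative sketch (not the paper's method): back-project every
    pixel of frame t with its depth, apply the relative motion (R, t),
    re-project into frame t+1, and scatter the transformed z values.
    The result is a (possibly sparse) depth prior; holes stay at 0.
    """
    h, w = depth_t.shape
    # Pixel grid in homogeneous coordinates, shape (3, N).
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    pix = np.stack([u.ravel(), v.ravel(), np.ones(h * w)])

    # Back-project to 3-D points in frame t's camera coordinates.
    pts = np.linalg.inv(K) @ pix * depth_t.ravel()

    # Rigid transform into frame t+1's camera coordinates.
    pts = R @ pts + t.reshape(3, 1)

    # Project into frame t+1 and scatter the new depths.
    proj = K @ pts
    z = proj[2]
    valid = z > 1e-6
    u2 = np.round(proj[0, valid] / z[valid]).astype(int)
    v2 = np.round(proj[1, valid] / z[valid]).astype(int)
    inside = (u2 >= 0) & (u2 < w) & (v2 >= 0) & (v2 < h)

    prior = np.zeros_like(depth_t)
    prior[v2[inside], u2[inside]] = z[valid][inside]
    return prior
```

With an identity pose the warp is a no-op, which gives a quick sanity check; a learned refinement network would then consume this prior together with the next frame's image.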