Search results
(1 - 2 of 2)
- Title
- Unsupervised Learning of Visual Odometry Using Direct Motion Modeling
- Creator
- Andrei, Silviu Stefan
- Date
- 2020
- Description
-
Data for supervised learning of ego-motion and depth from video is scarce and expensive to produce. Subsequently, recent work has focused on...
Show moreData for supervised learning of ego-motion and depth from video is scarce and expensive to produce. Subsequently, recent work has focused on unsupervised learning methods and achieved remarkable results which surpass in some instances the accuracy of supervised methods. Many unsupervised approaches rely on predicted monocular depth and so ignore motion information. Moreover, unsupervised methods which do incorporate motion information do so only indirectly by designing the depth prediction network as an RNN. Hence, none of the existing methods model motion directly. In this work, we show that it is possible to achieve superior pose estimation results by modeling motion explicitly. Our method uses a novel learning-based formulation for depth propagation and refinement which transforms predicted depth maps from the current frame onto the next frame where it serves as a prior for predicting the next frame's depth map. Experimental results demonstrate that the proposed approach surpasses state of the art techniques for the pose prediction task while being better or on par with other methods for the depth prediction task.
Show less
- Title
- Image Synthesis with Generative Adversarial Networks
- Creator
- Ouyang, Xu
- Date
- 2023
- Description
-
Image synthesis refers to the process of generating new images from an existing dataset, with the objective of creating images that closely...
Show moreImage synthesis refers to the process of generating new images from an existing dataset, with the objective of creating images that closely resemble the target images, learned from the source data distribution. This technique has a wide range of applications, including transforming captions into images, deblurring blurred images, and enhancing low-resolution images. In recent years, deep learning techniques, particularly Generative Adversarial Network (GAN), has achieved significant success in this field. GAN consists of a generator (G) and a discriminator (D) and employ adversarial learning to synthesize images. Researchers have developed various strategies to improve GAN performance, such as controlling learning rates for different models and modifying the loss functions. This thesis focuses on image synthesis from captions using GANs and aims to improve the quality of generated images. The study is divided into four main parts:In the first part, we investigate the LSTM conditional GAN which is to generate images from captions. We use the word2vec as the caption features and combine these features’ information by LSTM and generate images via conditional GAN. In the second part, to improve the quality of generated images, we address the issue of convergence speed and enhance GAN performance using an adaptive WGAN update strategy. We demonstrate that this update strategy is applicable to Wasserstein GAN(WGAN) and other GANs that utilize WGAN-related loss functions. The proposed update strategy is based on a loss change ratio comparison between G and D. In the third part, to further enhance the quality of synthesized images, we investigate a transformer-based Uformer GAN for image restoration and propose a two-step refinement strategy. Initially, we train a Uformer model until convergence, followed by training a Uformer GAN using the restoration results obtained from the first step.In the fourth part, to generate fine-grained image from captions, we delve into the Recurrent Affine Transformation (RAT) GAN for fine-grained text-to-image synthesis. By incorporating an auxiliary classifier in the discriminator and employing a contrastive learning method, we improve the accuracy and fine-grained details of the synthesized images.Throughout this thesis, we strive to enhance the capabilities of GANs in various image synthesis applications and contribute valuable insights to the field of deep learning and image processing.
Show less