Understanding visual semantics, including change detection and semantic segmentation, is an essential task in many computer vision and image processing applications. Examples of visual semantics understanding in images include land cover monitoring, urban expansion evaluation, autonomous driving, and scene understanding. The goal is to locate and recognize appropriate pixel-wise semantic labels in images. Classical computer vision algorithms involve sophisticated semi-heuristic pre-processing steps and potentially manual interaction. In this thesis, I propose and evaluate end-to-end deep neural approaches for processing images that achieve better performance than existing approaches. Supervised semantic segmentation has been widely studied and has achieved great success with deep learning. However, existing deep learning methods typically suffer from generalization issues: a well-trained model may not work well on unseen samples from a different dataset. This is due to a distribution change, or domain shift, between the training and test sets that degrades performance. Providing more labeled samples covering many possible variations can further improve the generalization of models, but acquiring labeled data is typically time-consuming, labor-intensive, and requires domain knowledge. To tackle this label scarcity bottleneck for supervised learning, we propose to apply unsupervised domain adaptation, semi-supervised learning, and semi-supervised domain adaptation for neural semantic segmentation.
The motivation behind unsupervised domain adaptation for semantic segmentation is to transfer knowledge learned from one or more source domains with sufficient labeled samples to a different but related target domain where labeled data is sparse or non-existent. The adaptation algorithm learns a common representation space in which the distributions of the source and target domains are matched. In this way, we expect a classifier that works well on the source domain to also generalize well to the target domain. More specifically, we try to learn class-aware source-target distribution differences, and transfer the knowledge learned from labeled synthetic data in the source domain to unlabeled real data in the target domain. Different from domain adaptation, semi-supervised semantic segmentation aims to utilize a large amount of unlabeled data to improve a semantic classifier trained on a small amount of labeled data from the same distribution. Specifically, a supervised semantic segmentation model is trained together with an unsupervised model by applying perturbations to the encoded states of the network instead of the input, or by using mask-based data augmentation techniques to encourage consistent predictions over mixed samples. In this way, the learned representations, which capture many kinds of unseen variation in the unlabeled data, benefit the supervised semantic classifier. We propose a mask-based data augmentation semi-supervised learning network that utilizes structural information from a variety of unlabeled examples to improve learning from a limited number of labeled examples. Both unsupervised domain adaptation (UDA), with full source supervision but no target supervision, and semi-supervised learning (SSL), with partial supervision, have been shown to address the generalization problem to some extent.
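The mask-based mixing described above can be sketched as follows. This is only an illustrative example in the spirit of CutMix/ClassMix-style augmentation: the function names are hypothetical, NumPy stands in for a deep learning framework, and the consistency loss itself (cross-entropy between predictions on the mixed image and the mixed pseudo-labels) is noted in a comment rather than implemented.

```python
import numpy as np

def mask_mix(img_a, img_b, mask):
    """Combine two images with a binary mask: take img_a where mask == 1,
    img_b elsewhere. The same mask is applied to the pseudo-label maps."""
    mask = mask[..., None]  # broadcast the (H, W) mask over channels
    return mask * img_a + (1 - mask) * img_b

def rectangular_mask(height, width, rng):
    """Sample a random rectangular binary mask inside the image."""
    mask = np.zeros((height, width), dtype=np.float32)
    h, w = height // 2, width // 2
    top = rng.integers(0, height - h + 1)
    left = rng.integers(0, width - w + 1)
    mask[top:top + h, left:left + w] = 1.0
    return mask

# Consistency training: mix two unlabeled images and their pseudo-label
# maps with the same mask, then penalize disagreement between the model's
# prediction on the mixed image and the mixed pseudo-labels
# (a cross-entropy loss in practice; omitted here).
rng = np.random.default_rng(0)
a = rng.random((8, 8, 3)).astype(np.float32)
b = rng.random((8, 8, 3)).astype(np.float32)
m = rectangular_mask(8, 8, rng)
mixed = mask_mix(a, b, m)
```

The key property this sketch demonstrates is that the mixed image is pixel-wise consistent with its mixed label target, so the unlabeled data supplies structure (object boundaries carried by the mask) without requiring annotations.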
While such methods are effective at aligning different feature distributions, their inability to efficiently exploit unlabeled data leads to an intra-domain discrepancy in the target domain, where the target domain separates into two unaligned sub-distributions of source-aligned and target-aligned data. That is, enforcing partial alignment between the fully labeled source data and a few labeled target samples does not guarantee that the remaining unlabeled target samples will be aligned with the source feature clusters, thus leaving them unaligned. Hence, I propose methods that combine the advantages of both UDA and SSL, termed semi-supervised domain adaptation (SSDA), with the goal of aligning cross-domain features while also addressing the intra-domain discrepancy within the target domain. I propose a simple yet effective semi-supervised domain adaptation approach that uses a two-step adaptation procedure to address both cross-domain and intra-domain shifts.
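A minimal sketch of the two-step idea follows. Mean-feature shifting stands in for a learned adaptation model, and prediction confidence stands in for the criterion that separates the target domain into its aligned and unaligned sub-distributions; all names, the placeholder probabilities, and the splitting rule are illustrative assumptions, not the thesis's actual procedure.

```python
import numpy as np

def align_means(features, reference):
    """Step 1 (cross-domain): shift target features so their mean matches
    the source mean, reducing the cross-domain distribution gap."""
    return features - features.mean(axis=0) + reference.mean(axis=0)

def confident_subset(probs, threshold):
    """Step 2 (intra-domain): flag target samples whose maximum predicted
    class probability is high; the remaining low-confidence samples form
    the unaligned sub-distribution that a second adaptation round targets."""
    return probs.max(axis=1) >= threshold

rng = np.random.default_rng(0)
source = rng.normal(0.0, 1.0, size=(100, 16))
target = rng.normal(2.0, 1.0, size=(100, 16))  # shifted distribution

# Step 1: reduce the cross-domain shift.
target_aligned = align_means(target, source)

# Step 2: split the target domain by confidence (placeholder probabilities).
probs = rng.dirichlet(np.ones(5), size=100)
easy = confident_subset(probs, threshold=0.5)
```

The point of the two steps is that after cross-domain alignment, a further intra-domain pass can concentrate on the low-confidence (unaligned) target samples rather than treating the target domain as a single homogeneous distribution.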