Search results
(21 - 40 of 69)
Pages
- Title
- DATA PRIVACY AND DEEP LEARNING IN THE MOBILE ERA: TRACEABILITY AND PROTECTION
- Creator
- Chen, Linlin
- Date
- 2020
- Description
- Privacy and deep learning have been two of the most exciting research trends in both academia and industry. On the one hand, big data has rapidly expedited many data-oriented applications, especially deep learning services. With the tremendous value exhibited by the data, the privacy of the data subjects who generate it has also attracted much attention. Meanwhile, more regulations and legislation have been enacted or enforced, intended to compel companies and organizations to strictly comply with personal privacy protections when collecting or utilizing data. All these moves will substantially change the way deep learning models are trained and AI services are provided, and may hinder the development of deep learning unless sophisticated mechanisms are devised. On the other hand, deep learning has shown incredibly promising performance in a variety of areas such as face recognition, voice recognition, recommendation and advertising, autonomous driving, and medical imaging. This raises the question of whether deep learning will in turn influence privacy and be leveraged to compromise it. Meanwhile, mobile devices have become so ubiquitous that a growing share of data is generated on them, and that data is typically both extremely sensitive for its subjects and extremely valuable for developing deep learning. We should not neglect the impact of mobile devices on both privacy and deep learning. In this thesis I explore the interactions between privacy and deep learning, especially where mobile devices are involved. Specifically, I work on: (1) how privacy changes the way we use data when building deep learning models, presenting a mechanism for privacy protection in deep learning; and (2) how deep learning in turn makes privacy more vulnerable to compromise, demonstrating privacy compromise by leveraging deep learning to trace source mobile devices and link personal identities.
- Title
- A FRAMEWORK FOR MANAGING UNSPECIFIED ASSUMPTIONS IN SAFETY-CRITICAL CYBER-PHYSICAL SYSTEMS
- Creator
- Fu, Zhicheng
- Date
- 2020
- Description
- For a cyber-physical system, execution behaviors are often impacted by the operating environment. However, the assumptions about a cyber-physical system's expected environment are often informally documented, or even left unspecified, during the system development process. Unfortunately, such unspecified assumptions in cyber-physical systems, such as medical cyber-physical systems, can result in patient injuries and loss of life. Based on U.S. Food and Drug Administration (FDA) data, from 2006 to 2011 there were 5,294 recalls and 1,154,451 adverse events, resulting in 92,600 patient injuries and 25,800 deaths. One of the most critical reasons for these medical device recalls is the violation of unspecified assumptions. These compelling data motivated us to research unspecified-assumption issues in safety-critical cyber-physical systems and to develop approaches that reduce the failures they cause. In particular, this thesis studies the issues of unspecified assumptions in the cyber-physical system design process and develops an unspecified assumption management framework to (1) identify unspecified assumptions in system design models; (2) help domain experts perform impact analysis on the failures caused by violating unspecified assumptions; and (3) explicitly model unspecified assumptions in system design models for system safety validation and verification. Before developing the framework, we first need to study how unspecified assumptions may be introduced into cyber-physical systems. We took cases from the FDA medical device recall database to analyze the root causes of medical device failures. By analyzing these cases, we found two important facts: (1) one of the major causes of medical device recalls is the violation of some unspecified assumption; and (2) unspecified assumptions are often introduced into system design models through syntactic carriers. Based on these two findings, we propose a framework for managing unspecified assumptions in the cyber-physical system development process. The framework has three components. The first component, the Unspecified Assumption Carrier Finder (UACFinder), identifies unspecified assumptions in system design models by automatically extracting the syntactic carriers associated with them. However, the number of unspecified assumptions identified from system design models can be large, and it may not always be feasible for domain experts to validate and address the most safety-critical assumptions at every system development phase. Therefore, the second component is a methodology that uses a Failure Mode and Effects Analysis (FMEA) based prioritization approach to help domain experts perform impact analysis on the assumptions identified by the UACFinder and assess their safety-criticality. The third component describes a model architecture and corresponding algorithms for modeling and integrating assumptions into system design models, so that system safety properties associated with these unspecified assumptions can be validated and formally verified by existing tools. We have also conducted case studies on representative system models to demonstrate how UACFinder identifies unspecified assumptions from system design models, and how the FMEA-based prioritization approach helps domain experts verify the appropriateness of identified assumptions. In addition, case studies demonstrate how system safety properties can be improved by modeling and integrating unspecified assumptions into system models. The results indicate that the unspecified assumption management framework can identify unspecified assumptions, facilitate their validation and verification by domain experts, and explicitly specify assumptions that would otherwise cause defects in these systems.
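The FMEA-based prioritization in the second framework component can be illustrated with a short sketch. This is a hypothetical example using the standard FMEA Risk Priority Number (RPN = severity x occurrence x detection); the thesis's actual scoring scheme, the `prioritize` helper, and the assumption records below are illustrative assumptions, not taken from the work.

```python
# Hypothetical sketch of FMEA-style prioritization of unspecified
# assumptions, using the standard Risk Priority Number:
# RPN = severity * occurrence * detection (higher = more critical).

def prioritize(assumptions):
    """Sort assumptions by descending RPN so that domain experts can
    review the most safety-critical ones first."""
    return sorted(
        assumptions,
        key=lambda a: a["severity"] * a["occurrence"] * a["detection"],
        reverse=True,
    )

# Illustrative assumption records (names and scores are made up).
assumptions = [
    {"name": "pump-occlusion-alarm-heard", "severity": 9, "occurrence": 3, "detection": 7},
    {"name": "sensor-sampling-rate-fixed", "severity": 5, "occurrence": 4, "detection": 3},
    {"name": "network-latency-bounded", "severity": 7, "occurrence": 2, "detection": 5},
]

ranked = [a["name"] for a in prioritize(assumptions)]
print(ranked)  # highest-RPN assumption first
```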
- Title
- Integrity based landmark generation: A method to generate landmark configurations that guarantee mobile robot localization safety
- Creator
- Chen, Yihe
- Date
- 2020
- Description
- From the bronze-age city of Nineveh to modern metropolises like Tokyo, traffic shapes cities and profoundly affects people's lives. Just as the widespread adoption of the automobile transformed cities in the early 20th century, we now stand on the eve of another traffic revolution. With the rapid spread of autonomous and semi-autonomous robotic applications, it is important for urban designers to design or retrofit urban environments that are safe and friendly to autonomous robots. As more robots are deployed in life-critical situations, such as autonomous passenger vehicles, it is imperative to consider their safety, and in particular their localization safety. While it would be ideal to guarantee safety in any environment without having to physically modify said environment, this is not always possible, and one may have to add landmarks or active beacons to reach an acceptable level of safety for landmark-based localization. Localization safety is assessed using integrity, the primary safety metric used in open-sky aviation applications, which has recently been applied to mobile robots and can account for the impact of rarely occurring, undetected faults. Conventional integrity monitoring methods depend heavily on GPS, and traditional Global Navigation Satellite System - Inertial Measurement Unit (GNSS-IMU) based localization does not apply well in metropolitan areas because of the signal blocking and multipath problems caused by high-rise structures. Thus, this dissertation concentrates on feature-based integrity monitoring. It formulates the environmental localization safety problem as a systematic optimization problem: given the robot's trajectory and the current landmark map, add the minimal number of new landmarks at locations such that the integrity risk along the trajectory stays below a given safety threshold. The dissertation proposes two algorithms to solve the problem: the Integrity-based Landmark Generator (I-LaG) and Fast I-LaG. I-LaG adds fewer landmarks but is relatively computationally expensive; Fast I-LaG is less computationally intensive at the expense of more landmarks. Both simulation and experimental results are presented.
- Title
- DATA SHARING WITH PRIVACY AND SECURITY
- Creator
- Qian, Jianwei
- Date
- 2019
- Description
- Data is a non-exclusive resource and has synergistic effects. Open data sharing will enhance the utilization of big data's value and tremendously boost economic growth and transparency. Data sharing platforms have emerged worldwide, but with very limited services; security is one of the main reasons most data are not commonly shared. This dissertation aims to tackle several security issues in building a trustworthy data sharing ecosystem. First, I reveal the privacy risks in data sharing by designing de-anonymization and privacy inference attacks. Second, I analyze the relationship between an attacker's knowledge and the privacy risk of data sharing, and attempt to quantify and estimate that risk. Then, I propose anonymization algorithms to protect the privacy of participants in data sharing. Finally, I survey the status quo, the privacy and security concerns, and the opportunities in data trading. This dissertation covers various data types, with a focus on graph data and speech data, and various forms of data sharing, including collection, publishing, query, and trading.
- Title
- Fast Automatic Bayesian Cubature Using Matching Kernels and Designs
- Creator
- Rathinavel, Jagadeeswaran
- Date
- 2019
- Description
- Automatic cubatures approximate multidimensional integrals to user-specified error tolerances. In many real-world integration problems, the analytical solution is either unavailable or difficult to compute; to overcome this, one can use numerical algorithms that approximate the value of the integral. For high-dimensional integrals, quasi-Monte Carlo (QMC) methods are very popular. QMC methods are equal-weight quadrature rules whose points are chosen deterministically, unlike Monte Carlo (MC) methods, where the points are chosen randomly. The families of integration lattice nodes and digital nets are the most popular quadrature points used. These methods treat the integrand as a deterministic function. An alternative approach, called Bayesian cubature, postulates the integrand to be an instance of a Gaussian stochastic process. For high-dimensional problems it is difficult to adaptively change the sampling pattern, but one can automatically determine the sample size, $n$, given a fixed and reasonable sampling pattern. We take this approach from a Bayesian perspective. We assume a Gaussian process parameterized by a constant mean and a covariance function defined by a scale parameter and a function specifying how the integrand values at two different points in the domain are related. These parameters are estimated from integrand values or are given non-informative priors, which leads to a credible interval for the integral. The sample size, $n$, is chosen to make the credible interval for the Bayesian posterior error no greater than the desired error tolerance. However, the process just outlined typically requires vector-matrix operations with a computational cost of $O(n^3)$. Our innovation is to pair low-discrepancy nodes with matching kernels, which lowers the computational cost to $O(n \log n)$. The thesis begins by introducing the Bayesian approach to calculating the posterior cubature error and defining our automatic Bayesian cubature; although much of this material is known, it develops the necessary foundations. The major contributions of this thesis include the following: (1) the fast Bayesian transform is introduced, generalizing the techniques that speed up Bayesian cubature when the kernel matches the low-discrepancy nodes; (2) the fast Bayesian transform approach is demonstrated with two methods, (a) rank-1 lattice sequences with shift-invariant kernels and (b) Sobol' sequences with Walsh kernels, both implemented as fast automatic Bayesian cubature algorithms in the Guaranteed Automatic Integration Library (GAIL); and (3) additional numerical implementation techniques are developed: (a) rewriting the covariance kernel to avoid cancellation error, (b) gradient descent for hyperparameter search, and (c) non-integer kernel order selection. The thesis concludes by applying our fast automatic Bayesian cubature algorithms to three sample integration problems, showing that they are faster than basic Bayesian cubature and provide answers within the error tolerance in most cases. The Bayesian cubatures we develop are guaranteed for integrands belonging to a cone of functions that reside in the middle of the sample space; the concept of a cone of functions is also explained briefly.
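The stop-when-the-error-bound-meets-tolerance loop that this abstract describes can be sketched in simplified form. The sketch below replaces the Gaussian-process credible interval over low-discrepancy nodes with plain Monte Carlo and a CLT-style bound, so it illustrates only the control flow of an automatic cubature, not the GAIL algorithms themselves; `auto_cubature` and its constants are illustrative assumptions.

```python
import random

# Toy analogue of automatic cubature: grow the sample size n until an
# estimated error bound falls below the user's tolerance. The thesis
# uses a Gaussian-process credible interval over low-discrepancy nodes;
# here we use plain Monte Carlo with a CLT-based bound instead.

def auto_cubature(f, tol, n0=1024, max_n=2**20, seed=0):
    rng = random.Random(seed)
    n = n0
    while True:
        xs = [rng.random() for _ in range(n)]
        ys = [f(x) for x in xs]
        mean = sum(ys) / n
        var = sum((y - mean) ** 2 for y in ys) / (n - 1)
        half_width = 2.58 * (var / n) ** 0.5   # ~99% CLT interval
        if half_width <= tol or n >= max_n:
            return mean, half_width
        n *= 2   # double the sample size, as in sample-size-doubling QMC rules

# Integrate x^2 over [0, 1]; the exact answer is 1/3.
estimate, err = auto_cubature(lambda x: x * x, tol=5e-3)
print(estimate, err)
```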
- Title
- MULTIVARIABLE SIMULATION PLATFORM FOR TYPE 1 DIABETES AND AUTOMATIC MEAL HANDLING IN ARTIFICIAL PANCREAS SYSTEMS
- Creator
- Samadi, Sediqeh
- Date
- 2019
- Description
- Artificial pancreas (AP) systems are designed to automate glucose control in type 1 diabetes mellitus (T1DM). Multivariable AP systems have evolved to incorporate additional physiological measurements beyond conventional continuous glucose monitoring in order to better capture the metabolic state of the patient as it affects glycemic dynamics. Changes in physiological measurements such as heart rate, energy expenditure, skin temperature, and skin conductance, measured by wearable devices, are indicative of changes in the metabolic state. The controller receives the physiological measurements in a feedforward manner, which accelerates remedial control decisions in response to disturbances. Although various AP systems have been proposed in the literature to accommodate these additional sources of information, the testing and evaluation of advanced multivariable AP systems are hindered by the requirement of conducting time-consuming and expensive clinical trials. The development of a simulation platform for rapid prototyping and iterative development of AP systems is one of the main contributions of this study. The T1DM simulation platform includes a compartmental model that generates glucose concentration in response to physical activity in addition to meals and infused insulin. The proposed exercise-glucose-insulin model extends a previously developed glucose-insulin model to derive the transient variations in glycemic dynamics caused by physical activity and to improve glucose prediction accuracy. Physiological variables affected by physical activity, such as heart rate, skin temperature, and blood volume pulse, are generated in addition to the glucose concentration. The platform includes several virtual patients, providing a reliable basis for in silico evaluation of different algorithms proposed for automating glucose control in T1DM. The multivariable simulator will accelerate the development of next-generation artificial pancreas systems. The development of a disturbance detection algorithm is the other contribution of this study. Meals are major disturbances to glucose homeostasis, and automated detection of meal consumption and estimation of the consumed carbohydrates are critical for fully automated AP control systems. In this study, a detection algorithm integrating fuzzy logic classification and qualitative analysis is proposed, and a fuzzy logic system estimates the carbohydrate content of the meal.
- Title
- A SCALABLE SIMULATION AND MODELING FRAMEWORK FOR EVALUATION OF SOFTWARE-DEFINED NETWORKING DESIGN AND SECURITY APPLICATIONS
- Creator
- Yan, Jiaqi
- Date
- 2019
- Description
- The world today is densely connected by many large-scale computer networks supporting military applications, social communications, power grid facilities, cloud services, and other critical infrastructures. However, a gap has grown between the complexity of these systems and the increasing need for security and resilience. We believe this gap is reaching a tipping point, forcing a dramatic change in the way networks and applications are architected, developed, monitored, and protected. This trend calls for a scalable, high-fidelity network testing and evaluation platform to facilitate the transformation of in-house research ideas into real-world working solutions. With this objective, we investigate means to build a scalable and high-fidelity network testbed using container-based emulation and parallel simulation; our study focuses on the emerging software-defined networking (SDN) technology. Existing evaluation platforms facilitate the adoption of the SDN architecture and applications in production systems, but their performance is highly dependent on the underlying physical hardware resources: insufficient resources lead to undesired results, such as low experimental fidelity or slow execution, especially at large network scale. To improve testbed fidelity, we first develop a lightweight virtual time system for Linux containers and integrate it into a widely used SDN emulator. A key issue with an ordinary container-based emulator is that it uses the system clock across all containers even when a container is not scheduled to run, which hurts both performance and temporal fidelity, especially under high workloads. We investigate virtual time approaches that precisely scale the time of interactions between containers and physical devices; our evaluation results indicate a definite improvement in fidelity and scalability. To improve testbed scalability, we investigate how the centralized paradigm of SDN can be leveraged to reduce the simulation workload. We explore a model abstraction technique that effectively transforms the SDN network devices into one virtualized switch model; while significantly reducing model execution time and enabling real-time simulation, the abstracted model preserves the end-to-end forwarding behavior of the original network. With enhanced fidelity and scalability, it becomes realistic to use our network testbed for security evaluation of various SDN applications. Communication networks generate and process huge amounts of data. The logically centralized SDN control plane must process both critical control traffic and potentially big data traffic, but it also enables many efficient security solutions, such as intrusion detection, mitigation, and prevention. Recently, deep neural networks have achieved state-of-the-art results across a range of hard problem spaces, and we study how to utilize big data and deep learning to secure communication networks and host entities. For classifying malicious network traffic, we perform a feasibility study of offline deep-learning-based intrusion detection by constructing the detection engine from multiple advanced deep learning models. For malware classification on individual hosts, another necessity for securing computer systems, existing machine-learning-based methods rely on handcrafted features extracted from raw binary files or disassembled code; the diversity of such features has made it hard to build generic malware classification systems that work effectively across different operational environments. To strike a balance between generality and performance, we explore new graph convolutional neural network techniques to effectively yet efficiently classify malware programs represented as their control flow graphs.
- Title
- ENHANCING PRIVACY AND SECURITY IN IOT-BASED SMART HOME
- Creator
- Du, Haohua
- Date
- 2019
- Description
- The IoT-based smart home is envisioned as a system that augments everyone's daily life. In the past few years, the smart home has attracted immense attention from industry and has been considered one of the principal pillars of the fourth industrial revolution. However, while the rapidly increasing number of Internet-connected smart devices expands the functionality of smart homes, it also raises substantial security and privacy concerns. A smart home system is commonly composed of three major components: smart devices, communication among devices, and smart applications connecting the devices. This dissertation therefore aims to enhance the security and privacy of the smart home system, without weakening its functionality, from the perspective of each of these three components. First, I improve the security of smart devices within the smart home by monitoring their behaviors based on the contextual environment. Then, I enhance the security of communications among devices through visible light communication, whose receivers must be physically visible to senders, avoiding possible eavesdropping. Finally, I study two popular smart applications, the augmented reality assistant and the cloud-based surveillance system, to discuss how to define privacy, how to reduce leakage, and how to balance privacy and security in the smart home. The dissertation proposes mechanisms for each component and implements the designs in the real world to evaluate their effectiveness and efficiency.
- Title
- Continuous Generalization of 2’s Complement Arithmetic
- Creator
- Patel, Shivam
- Date
- 2022-11-26
- Title
- Efficient management of uncertain data
- Creator
- Feng, Su
- Date
- 2023
- Description
- Uncertainty arises naturally in many application domains. It can be caused by an uncertain data source (sensor errors, noise, etc.), and data preprocessing techniques (data curation, data integration, etc.) can also introduce uncertainty into the data. Analyzing uncertain data without accounting for its uncertainty can create hard-to-trace errors with severe real-world implications. Certain answers are a principled method for coping with the uncertainty that arises in many practical data management tasks; unfortunately, this method is expensive and may exclude useful (if uncertain) answers. Other techniques from incomplete databases record and propagate more detailed uncertainty information, but most of these approaches are either too expensive to be practical or focus only on a narrow class of queries and a specific representation. In this thesis, we investigate models and query semantics for uncertain data management and present a framework that is general and practically efficient, backed by fundamental theoretical foundations and formally proven correctness guarantees. We first propose Uncertainty-Annotated Databases (UA-DBs), which combine an under- and an over-approximation of the certain answers, pairing the reliability of certain answers with the performance of a classical database system. We then introduce attribute-annotated uncertain databases (AU-DBs), which extend the UA-DB model with attribute-level annotations that record bounds on the values of an attribute across all possible worlds. AU-DBs encode a compact over-approximation of the possible answers, which is necessary to support non-monotone queries, including aggregation and set difference. With a further extension of AU-DBs that supports ranking and windowed aggregation queries via a native implementation on a modern DBMS, our approaches scale to complex queries and large datasets, produce accurate results, and significantly outperform alternative methods for uncertain data management.
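The under/over-approximation idea behind UA-DBs can be sketched with a toy selection query: a certain answer holds in every possible world, a possible answer in at least one. The set-of-possible-values representation and the `select` helper below are hypothetical simplifications for illustration, not the actual UA-DB encoding.

```python
# Minimal sketch of the UA-DB idea: keep both an under-approximation
# (certain answers) and an over-approximation (possible answers) of a
# query result. Each uncertain attribute is a set of possible values.

def select(rows, pred):
    certain, possible = [], []
    for name, values in rows:
        if all(pred(v) for v in values):
            certain.append(name)    # predicate holds in every possible world
        if any(pred(v) for v in values):
            possible.append(name)   # predicate holds in at least one world
    return certain, possible

readings = [
    ("sensor-a", {41, 42}),   # noisy reading: could be 41 or 42
    ("sensor-b", {17}),       # certain value
    ("sensor-c", {39, 45}),   # wide uncertainty
]

certain, possible = select(readings, lambda v: v > 40)
print(certain, possible)   # ['sensor-a'] ['sensor-a', 'sensor-c']
```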
- Title
- Towards Utility-Driven Data Analytics with Differential Privacy
- Creator
- Wang, Han
- Date
- 2023
- Description
- The widespread use of personal devices and dedicated recording facilities has led to the generation of massive amounts of personal information, much of it high-dimensional and unstructured, such as video and location data. Analyzing these data can provide significant benefits in real-world scenarios, such as video for monitoring and location data for traffic analysis. However, such complex data also raise serious privacy concerns, since they involve personal information. Existing privacy protection methods often fail to provide adequate utility in practical applications because of the complexity of high-dimensional and unstructured data. For example, most video sanitization techniques merely obscure the video by detecting and blurring sensitive regions such as faces, vehicle plates, locations, and timestamps; privacy breaches in blurred videos cannot be effectively contained, especially against unknown background knowledge. In this thesis, we propose three differentially private frameworks that preserve the utility of video and location data (both high-dimensional and unstructured) while meeting privacy requirements under different well-known privacy settings. Specifically, to the best of our knowledge, we propose the first differentially private video analytics platform (VideoDP), which flexibly supports different video queries and query-based analyses with a rigorous privacy guarantee. Given the input video, VideoDP randomly generates a utility-driven private video in which adding or removing any sensitive visual element (e.g., a human or an object) does not significantly affect the output video; different video analyses requested by untrusted video analysts can then be flexibly performed over the sanitized video with differential privacy. Second, we define a novel privacy notion, ε-Object Indistinguishability, for all predefined sensitive objects (e.g., humans, vehicles) in a video, and propose a video sanitization technique, VERRO, that randomly generates utility-driven synthetic videos with indistinguishable objects; all objects are thus well protected in the generated synthetic videos, which can be disclosed to any untrusted video recipient. Third, we propose the first strict local differential privacy (LDP) framework for location-based services ("L-SRR") to privately collect and analyze user locations and trajectories with ε-LDP guarantees. Specifically, we design a novel LDP mechanism, "staircase randomized response" (SRR), and extend empirical estimation to further boost utility for a diverse set of LBS applications (e.g., traffic density estimation, k-nearest-neighbor search, origin-destination analysis, and traffic-aware GPS navigation). Finally, we conduct experiments on real video and location datasets; the results demonstrate that all three frameworks achieve good performance.
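The staircase randomized response mechanism itself is specific to L-SRR, but the classic k-ary randomized response it builds on is standard and gives a feel for an ε-LDP location report: report the true cell with probability p = e^ε / (e^ε + k - 1), otherwise a uniformly random other cell. The grid size and parameters below are illustrative assumptions.

```python
import math
import random

# Classic k-ary randomized response, a standard building block for
# ε-LDP location reporting (a stand-in for the thesis's SRR mechanism).
# Reporting the true cell with probability p = e^eps / (e^eps + k - 1)
# bounds the probability ratio of any two outputs by e^eps.

def k_rr(true_cell, k, eps, rng):
    p = math.exp(eps) / (math.exp(eps) + k - 1)
    if rng.random() < p:
        return true_cell
    other = rng.randrange(k - 1)          # pick one of the k-1 other cells
    return other if other < true_cell else other + 1

rng = random.Random(7)
reports = [k_rr(3, k=8, eps=2.0, rng=rng) for _ in range(10000)]
frac_true = reports.count(3) / len(reports)
print(frac_true)   # should be near e^2 / (e^2 + 7) ≈ 0.51
```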
- Title
- Approximation Algorithms for Selected Network and Graph Problems
- Creator
- Wang, Xiaolang
- Date
- 2023
- Description
- This dissertation proposes new polynomial-time approximation algorithms for selected optimization problems, including network and classic graph problems, employing distinct strategies and techniques. In Chapter 1, we consider a problem we term FCSA, which seeks an optimal assignment of clients to servers such that the largest latency on an interactivity path between two clients (client 1 to server 1, server 1 to server 2, then server 2 to client 2) is minimized. We present a (3/2)-approximation algorithm for FCSA, and a (3/2)-approximation algorithm for the variant with server capacity constraints. In Chapter 2, we focus on two variants of the Steiner Tree Problem and obtain better approximation ratios using known algorithms: for the Steiner Tree problem with a minimum number of Steiner points and bounded edge length, we provide a polynomial-time algorithm with ratio 2.277, and for the Steiner Tree problem in quasi-bipartite graphs, we improve the best-known approximation ratio to 298/245. In Chapter 3, we address the problem of finding a maximum-weight series-parallel subgraph of a given graph and present a (1/2 + 1/60)-approximation. Although no real-life application of this problem is currently known, it remains an important and challenging open question in the field.
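For intuition about the FCSA objective described above, here is a hypothetical brute-force baseline for tiny instances. The latency matrices, the `fcsa_brute_force` helper, and the exhaustive search are all illustrative assumptions; this is not the thesis's (3/2)-approximation algorithm, which must avoid enumerating the exponentially many assignments.

```python
from itertools import product

# Brute-force FCSA baseline: assign each client to a server so that the
# largest latency of any interactivity path c1 -> s(c1) -> s(c2) -> c2
# is minimized. Exhaustive search is only feasible for tiny instances.

def fcsa_brute_force(client_server, server_server):
    n_clients = len(client_server)
    n_servers = len(client_server[0])
    best = (float("inf"), None)
    for assign in product(range(n_servers), repeat=n_clients):
        worst = 0
        for c1 in range(n_clients):
            for c2 in range(n_clients):
                if c1 == c2:
                    continue
                path = (client_server[c1][assign[c1]]
                        + server_server[assign[c1]][assign[c2]]
                        + client_server[c2][assign[c2]])
                worst = max(worst, path)
        best = min(best, (worst, assign))
    return best

# 3 clients, 2 servers: client->server latencies and server<->server latencies.
cs = [[1, 4], [2, 3], [5, 1]]
ss = [[0, 2], [2, 0]]
worst, assign = fcsa_brute_force(cs, ss)
print(worst, assign)   # 5 (0, 0, 1)
```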
- Title
- Understanding Location Bias in Fake News Datasets of Twitter
- Creator
- Patil, Kayenat Kailas
- Date
- 2023
- Description
-
Fake news tends to spread faster and wider than real news; it has greater impact and can lead to negative and dangerous outcomes. With the world spending an increasing amount of time on mobile devices, people get more of their news from their preferred social media platforms. Social media has become part of daily life, whether for keeping in touch with friends and family, following celebrity news, or shopping. In 2022, the average time a person spent per day on social media was estimated at about 147 minutes [1], indicating an increase in time spent scrolling through information online. Fake news has become a widespread phenomenon in recent years, thanks in part to the rapid spread of information through social media and other online channels. It is increasingly important to explore and understand fake news and its impact on society, and to develop effective tools and methods for detecting and combating it. Several factors can interfere with successful fake news detection: machine learning models often fall prey to biases that result in inaccurate predictions, including biases related to age, gender, and many other attributes. In this thesis, we explore location as a form of bias and ask whether it hinders prediction. We examine location from two perspectives. First, we take location as coordinates, in the form of latitude and longitude, and analyze the likelihood that a tweet originating from a given location is fake. Second, we treat location as an entity and use a natural language processing model to predict whether a given tweet is fake, then mask the location mentioned in the tweet and analyze how the model's performance changes.
Machine learning models can play an important role in fake news detection by analyzing large amounts of data and identifying patterns and indicators that suggest a piece of information may be false or misleading, but they are often susceptible to biases. By studying biases in machine learning models on fake news datasets, we can develop more effective tools for identifying fake news and take steps toward mitigating it, ultimately helping to protect the integrity of information and promote informed decision-making in society.
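The location-masking experiment described in this abstract can be sketched as follows. This is a minimal illustration only: the thesis uses an NLP model to recognize location entities and a trained classifier to compare performance, whereas here the gazetteer, the example tweets, and the `[LOC]` token are all hypothetical stand-ins.

```python
import re

# Hypothetical gazetteer standing in for an NER component; the thesis
# uses a natural language processing model to find location mentions.
KNOWN_LOCATIONS = {"chicago", "new york", "london", "mumbai"}

def mask_locations(tweet: str, mask_token: str = "[LOC]") -> str:
    """Replace any known location mention with a mask token, so a fake-news
    classifier can be evaluated with and without the location signal."""
    masked = tweet
    for loc in KNOWN_LOCATIONS:
        masked = re.sub(re.escape(loc), mask_token, masked, flags=re.IGNORECASE)
    return masked

print(mask_locations("Breaking: flood warning issued for Chicago suburbs"))
```

One would then run the same classifier on the original and the masked tweets and compare accuracy; a large gap suggests the model is leaning on location as a bias.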
- Title
- Image Synthesis with Generative Adversarial Networks
- Creator
- Ouyang, Xu
- Date
- 2023
- Description
-
Image synthesis refers to the process of generating new images from an existing dataset, with the objective of creating images that closely resemble the target images, learned from the source data distribution. This technique has a wide range of applications, including transforming captions into images, deblurring blurred images, and enhancing low-resolution images. In recent years, deep learning techniques, particularly the Generative Adversarial Network (GAN), have achieved significant success in this field. A GAN consists of a generator (G) and a discriminator (D) and employs adversarial learning to synthesize images. Researchers have developed various strategies to improve GAN performance, such as controlling the learning rates of the different models and modifying the loss functions. This thesis focuses on image synthesis from captions using GANs and aims to improve the quality of the generated images. The study is divided into four main parts. In the first part, we investigate an LSTM conditional GAN that generates images from captions: we use word2vec vectors as caption features, combine the features' information with an LSTM, and generate images via a conditional GAN. In the second part, to improve the quality of generated images, we address the issue of convergence speed and enhance GAN performance using an adaptive WGAN update strategy based on a comparison of the loss change ratios of G and D. We demonstrate that this update strategy is applicable to the Wasserstein GAN (WGAN) and other GANs that use WGAN-related loss functions. In the third part, to further enhance the quality of synthesized images, we investigate a transformer-based Uformer GAN for image restoration and propose a two-step refinement strategy: we first train a Uformer model until convergence, then train a Uformer GAN using the restoration results obtained in the first step. In the fourth part, to generate fine-grained images from captions, we delve into the Recurrent Affine Transformation (RAT) GAN for fine-grained text-to-image synthesis; by incorporating an auxiliary classifier in the discriminator and employing a contrastive learning method, we improve the accuracy and fine-grained detail of the synthesized images. Throughout this thesis, we strive to enhance the capabilities of GANs in various image synthesis applications and contribute valuable insights to the fields of deep learning and image processing.
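The adaptive update strategy described in the second part, which compares the loss change ratios of G and D, might be sketched like this. The decision rule below (update the network whose loss is changing more slowly) is an assumption for illustration; the exact criterion and thresholds in the thesis may differ, and the class name is hypothetical.

```python
class AdaptiveUpdateScheduler:
    """Decide whether to step G or D based on the relative change of their
    losses since the previous iteration (illustrative sketch only)."""

    def __init__(self):
        self.prev_g = None
        self.prev_d = None

    def choose(self, g_loss: float, d_loss: float) -> str:
        if self.prev_g is None:              # first step: no history yet
            self.prev_g, self.prev_d = g_loss, d_loss
            return "both"
        # Relative loss-change ratio for each network.
        r_g = abs(g_loss - self.prev_g) / (abs(self.prev_g) + 1e-8)
        r_d = abs(d_loss - self.prev_d) / (abs(self.prev_d) + 1e-8)
        self.prev_g, self.prev_d = g_loss, d_loss
        # Assumed rule: train the network whose loss is changing more
        # slowly, i.e. the one lagging behind in training progress.
        return "G" if r_g < r_d else "D"

sched = AdaptiveUpdateScheduler()
sched.choose(1.0, 1.0)         # first call returns "both" (no history)
print(sched.choose(0.9, 0.5))  # D's loss changed more, so step G
```

In a real training loop, the returned label would gate which optimizer's `step()` is called in that iteration.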
- Title
- A Novel Explainability Approach For Spectrum Measurement Insight
- Creator
- Nagpure, Vaishali
- Date
- 2023
- Description
-
Spectrum is an extremely valuable natural resource in high demand. Although the spectrum has been fully allocated, there is no comprehensive method for understanding how it is being used. Spectrum measurements are highly complex spatiotemporal data sets that play a key role in understanding spectrum use, and interpreting them requires very specialized domain knowledge. To leverage existing and future spectrum measurements to the fullest extent, it is necessary to have a systematic way to connect them to the contextual information that gives the data meaning. To analyze and interpret the measurements, a variety of contextual information is needed. This research develops a novel approach to spectrum measurement understanding that unifies five years of wideband spectrum measurement summary data with relevant contextual information from a variety of sources in a spectrum knowledge graph. Both quantitative and qualitative information is modeled and implemented on the Neo4j graph database platform. This modeling formalizes the relationships that help spectrum stakeholders "connect the dots" and provides a deeper understanding of RF spectrum utilization. The knowledge graph can be queried to extract a wide variety of insights, making spectrum knowledge more accessible to a variety of stakeholders.
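The "connect the dots" idea can be illustrated with a toy in-memory graph. The thesis implements the knowledge graph in Neo4j and queries it with that platform's query language; the node types, relationship labels, band, and site below are hypothetical examples, not the thesis's actual schema.

```python
# Toy stand-in for a spectrum knowledge graph: nodes are (type, name)
# pairs and edges carry a relationship label.
edges = [
    (("Band", "1850-1910 MHz"), "ALLOCATED_TO", ("Service", "PCS uplink")),
    (("Measurement", "2019-07-roof"), "OBSERVES", ("Band", "1850-1910 MHz")),
    (("Measurement", "2019-07-roof"), "TAKEN_AT", ("Site", "IIT campus")),
]

def related(node, rel):
    """Follow edges with a given relationship label out of a node."""
    return [dst for src, r, dst in edges if src == node and r == rel]

# Connecting the dots: which allocated service does a measurement observe?
band = related(("Measurement", "2019-07-roof"), "OBSERVES")[0]
print(related(band, "ALLOCATED_TO"))
```

In the actual system, the same two-hop traversal would be a single graph query, which is what makes such contextual insights accessible to non-specialist stakeholders.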
- Title
- A SCALABLE AND CUSTOMIZABLE SIMULATION PLATFORM FOR ACCURATE QUANTUM NETWORK DESIGN AND EVALUATION
- Creator
- Wu, Xiaoliang
- Date
- 2021
- Description
-
Recent advances in quantum information science have enabled the development of quantum communication network prototypes and created an opportunity to study full-stack quantum network architectures. The scale and complexity of quantum networks require cost-efficient means for testing and evaluation. Simulators allow hardware, protocols, and applications to be tested cost-effectively before experimental networks are constructed. This work develops SeQUeNCe, a comprehensive, customizable quantum network simulator. We have used SeQUeNCe to evaluate quantum communication networks, studying network performance under different hardware configurations and applications. Additionally, we extend SeQUeNCe into a parallel discrete-event simulator using the Message Passing Interface (MPI), and we comprehensively analyze the benefits and overhead of parallelization. The parallelization technique significantly increases the scalability of SeQUeNCe. In the future, we would like to improve SeQUeNCe in three respects. First, we plan to continue reducing parallelization overhead and increasing the scalability of SeQUeNCe. Second, we plan to investigate means of modeling quantum memories, entanglement protocols, and control protocols to enrich the simulation models in the SeQUeNCe library. Third, we plan to integrate hardware with SeQUeNCe to enable high-fidelity analysis.
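The discrete-event core that a simulator of this kind is built on can be sketched in a few lines. This is a generic illustration of discrete-event simulation, not SeQUeNCe's actual API; the class name and the photon events are hypothetical.

```python
import heapq

class EventLoop:
    """Minimal discrete-event engine: events execute in timestamp order,
    and simulated time jumps directly from one event to the next."""

    def __init__(self):
        self.now = 0.0
        self._queue = []   # heap of (time, seq, action) entries
        self._seq = 0      # insertion counter breaks timestamp ties

    def schedule(self, delay, action):
        heapq.heappush(self._queue, (self.now + delay, self._seq, action))
        self._seq += 1

    def run(self):
        while self._queue:
            self.now, _, action = heapq.heappop(self._queue)
            action()

log = []
loop = EventLoop()
loop.schedule(2.0, lambda: log.append("photon arrives"))
loop.schedule(1.0, lambda: log.append("photon emitted"))
loop.run()
print(log)  # events fire in timestamp order, not scheduling order
```

Parallelizing such a simulator with MPI amounts to partitioning the event queue across processes while keeping cross-partition events causally ordered, which is the source of the parallelization overhead analyzed in the work.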
- Title
- AI IN MEDICINE: ENABLING INTELLIGENT IMAGING, PROGNOSIS, AND MINIMALLY INVASIVE SURGERY
- Creator
- Getty, Neil
- Date
- 2022
- Description
-
While an extremely rich research field, AI in medicine has been much slower to reach real-world clinical settings than other applications of AI such as natural language processing (NLP) and image processing/generation. Often the stakes of failure are more dire, access to private and proprietary data more costly, and the burden of proof required by expert clinicians much higher. Beyond these barriers, the typical data-driven approach to validation is interrupted by the need for expertise to analyze results. Whereas the results of a trained ImageNet or machine translation model are easily verified by a computational researcher, analysis in medicine can demand far more multi-disciplinary expertise. AI in medicine is motivated by a great demand for progress in health care, but an even greater responsibility for high accuracy, model transparency, and expert validation. This thesis develops machine and deep learning techniques for medical image enhancement, patient outcome prognosis, and minimally invasive robotic surgery awareness and augmentation. Each of the works presented was undertaken in direct collaboration with medical domain experts, and the efforts could not have been completed without them. For medical image enhancement we worked with radiologists, neuroscientists, and a neurosurgeon. For patient outcome prognosis we worked with clinical neuropsychologists and a cardiovascular surgeon. For robotic surgery we worked with surgical residents and a surgeon expert in minimally invasive surgery. Each of these collaborations guided priorities for problem and model design, analysis, and long-term objectives that ground this thesis as a concerted effort toward clinically actionable medical AI. The contributions of this thesis focus on three specific medical domains.
(1) Deep learning for medical brain scans: we developed processing pipelines and deep learning models for image annotation, registration, segmentation, and diagnosis in both traumatic brain injury (TBI) and brain tumor cohorts. A major focus of these works is the efficacy of low-data methods and techniques for validating results without any ground-truth annotations. (2) Outcome prognosis for TBI and risk prediction for cardiovascular disease (CVD): we developed feature extraction pipelines and models for TBI and CVD patient clinical outcome prognosis and risk assessment. We design risk prediction models for CVD patients using traditional Cox modeling, machine learning, and deep learning techniques. In these works we conduct exhaustive data and model ablation studies, with a focus on feature saliency analysis, model transparency, and the use of multi-modal data. (3) AI for enhanced and automated robotic surgery: we developed computer vision and deep learning techniques for understanding and augmenting minimally invasive robotic surgery scenes. We developed models to recognize surgical actions from vision and kinematic data. Beyond models and techniques, we also curated novel datasets and prediction benchmarks from simulated and real endoscopic surgeries. We show the potential of self-supervised techniques in surgery, as well as of multi-input and multi-task models.
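Of the three domains above, the Cox modeling used for CVD risk prediction lends itself to a short worked example. In a Cox proportional-hazards model, a patient's hazard relative to baseline is exp(β·x); the coefficients and features below are hypothetical placeholders, since real values come from fitting the model to the patient cohort.

```python
import math

# Hypothetical Cox coefficients, for illustration only.
BETAS = {"age": 0.04, "systolic_bp": 0.015, "smoker": 0.6}

def relative_risk(patient: dict) -> float:
    """Cox proportional-hazards relative risk: exp(sum of beta_i * x_i)."""
    return math.exp(sum(BETAS[k] * patient[k] for k in BETAS))

low  = {"age": 50, "systolic_bp": 120, "smoker": 0}
high = {"age": 70, "systolic_bp": 160, "smoker": 1}
# Hazard ratio between the two patients; only the difference in the
# linear predictor matters, not the (unmodeled) baseline hazard.
print(relative_risk(high) / relative_risk(low))
```

The appeal of this form for clinical work is its transparency: each coefficient is directly interpretable as a per-unit log-hazard contribution, which is one reason the thesis compares it against less transparent machine and deep learning models.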
- Title
- Extreme Fine-grained Parallelism On Modern Many-Core Architectures
- Creator
- Nookala, Poornima
- Date
- 2022
- Description
-
Processors with hundreds of threads of execution and GPUs with thousands of cores are among the state of the art in high-end computing systems. This transition to many-core computing has required the community to develop new algorithms that overcome significant latency bottlenecks through massive concurrency. Implementing efficient parallel runtimes that can scale up to hundreds of threads with extremely fine-grained tasks (less than 100 microseconds) remains a challenge. We propose XQueue, a novel lockless concurrent queueing system that can scale up to hundreds of threads. We integrate XQueue into LLVM OpenMP and implement X-OpenMP, a library for lightweight tasking on modern many-core systems with hundreds of cores. We show that it is possible to implement a parallel execution model using lockless techniques, enabling applications to scale strongly on many-core architectures. While the fork-join model is suitable for on-node parallelism, the use of joins and synchronization induces artificial dependencies that can lead to underutilization of resources. Data-flow parallelism, which specifies dependencies at a finer granularity, is crucial for overcoming the limitations of fork-join parallelism. It is also crucial for parallel runtime systems to support heterogeneous platforms, to better utilize the hardware resources available in modern supercomputers. Existing parallel programming environments that support distributed memory either discover the DAG entirely on all processes, which limits scalability, or introduce explicit communication, which increases programming complexity. We implement Template Task Graph (TTG), a novel programming model, and its C++ implementation, marrying the ideas of control-flow and data-flow graph programming.
TTG addresses performance portability without sacrificing scalability or programmability by providing higher-level abstractions than those conventionally provided by task-centric programming systems, without impeding the ability of the underlying runtimes to manage task creation and execution, as well as data and resource management, efficiently. The TTG implementation currently supports distributed-memory execution over two different task runtimes, PaRSEC and MADNESS.
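The contrast between fork-join and data-flow execution drawn above can be made concrete with a tiny dependency-driven executor: each task runs as soon as its own inputs are ready, with no global join barrier. This is a generic illustration of the data-flow idea, not TTG's API; the task names and dependency graph are hypothetical.

```python
from collections import deque

def run_dag(tasks, deps):
    """Run tasks as soon as their dependencies finish (data-flow style),
    rather than joining at artificial barriers as in fork-join."""
    remaining = {t: set(deps.get(t, ())) for t in tasks}
    ready = deque(t for t, d in remaining.items() if not d)
    order = []
    while ready:
        t = ready.popleft()
        order.append(t)
        tasks[t]()                        # execute the task body
        for u, d in remaining.items():    # release tasks that waited on t
            if t in d:
                d.remove(t)
                if not d:
                    ready.append(u)
    return order

results = {}
tasks = {name: (lambda n=name: results.setdefault(n, True))
         for name in ("load", "fft", "filter", "store")}
deps = {"fft": {"load"}, "filter": {"load"}, "store": {"fft", "filter"}}
order = run_dag(tasks, deps)
print(order)  # "store" is released only after both of its inputs finish
```

With fork-join, "fft" and "filter" would both wait at a join before "store" even if one finished early; the data-flow formulation removes that artificial serialization, which is exactly the underutilization argument made in the abstract.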
- Title
- Towards a Secure and Resilient Smart Grid Cyberinfrastructure Using Software-Defined Networking
- Creator
- Qu, Yanfeng
- Date
- 2022
- Description
-
To enhance the cyber-resilience and security of the smart grid against malicious attacks and system errors, we present a software-defined networking (SDN)-based communication architecture design for smart grid operation. Our design leverages SDN technology, which improves network manageability and provides application-oriented visibility and direct programmability, to deploy multiple SDN-aware applications that enhance grid security and resilience: optimization-based network management to recover Phasor Measurement Unit (PMU) network connectivity and restore power system observability, and flow-based anomaly detection combined with optimization-based network management to mitigate the Manipulation of demand of IoT (MadIoT) attack. We also developed a prototype system in a cyber-physical testbed and conducted extensive evaluation experiments using the IEEE 30-bus system, the IEEE 118-bus system, and the IIT campus microgrid.
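The PMU connectivity-recovery application can be illustrated with a toy rerouting step: when a link fails, the controller searches the surviving topology for an alternate path and installs it. The thesis formulates this as an optimization problem; the plain BFS below, along with the node names and topology, is a simplified hypothetical stand-in.

```python
from collections import deque

def find_path(links, src, dst, failed=frozenset()):
    """Breadth-first search over the surviving topology; an SDN controller
    would install flow rules along the returned path (sketch only)."""
    graph = {}
    for a, b in links:
        if (a, b) in failed or (b, a) in failed:
            continue                      # skip failed links
        graph.setdefault(a, []).append(b)
        graph.setdefault(b, []).append(a)
    queue, seen = deque([[src]]), {src}
    while queue:
        path = queue.popleft()
        if path[-1] == dst:
            return path
        for nxt in graph.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None                           # observability cannot be restored

# Hypothetical topology: a PMU, two switches, and a phasor data concentrator.
links = [("pmu1", "sw1"), ("sw1", "pdc"), ("sw1", "sw2"), ("sw2", "pdc")]
print(find_path(links, "pmu1", "pdc", failed={("sw1", "pdc")}))
```

Restoring power system observability then reduces to finding such paths for enough PMUs that the state estimator's measurement set remains sufficient.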
- Title
- PROGRAM SURVIVABILITY THROUGH K-VARIANT ARCHITECTURE
- Creator
- BEKIROGLU, BERK
- Date
- 2021
- Description
-
Numerous software systems, particularly mission- and safety-critical systems, require a high level of security during their execution. Enhancing software security through architecture is a highly effective method of defending against cyberattacks. N-version is a software architecture that was developed to increase the security of software systems. In the N-version architecture, functionally equivalent versions of a program run concurrently to complete a mission or task. Each version is developed independently by a different team, with only the software specifications in common; as a result, each version is expected to contain unique vulnerabilities. Due to the high cost of developing and maintaining an N-version system, this architecture is typically used only in high-budget projects requiring a high security level. The K-variant architecture, an alternative for enhancing system security, is explained and analyzed in this thesis. In contrast to the N-version architecture, each variant is automatically generated using source-to-source program transformation techniques. Automation significantly reduces the cost of developing variants in the K-variant architecture, which can help protect systems from memory exploitation attacks. Various attack strategies can be used against K-variant systems to increase the likelihood of a successful attack; several such strategies are proposed and investigated in this thesis. Furthermore, experimental studies are conducted to investigate various defense mechanisms against the proposed attack strategies. The effectiveness of each defense mechanism against each attack strategy is evaluated using the probability of an unsuccessful attack as a metric. Additionally, various source-code transformation techniques for generating new variants in the K-variant architecture are proposed and investigated experimentally.
This thesis also describes a machine learning technique for estimating the survivability of K-variant systems under various attack types and defense strategies. To simplify the design of K-variant systems, a neural network model is proposed; with a tool built on this model, fast and accurate predictions of the survivability of K-variant systems can be obtained.
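The probability-of-unsuccessful-attack metric can be illustrated with a deliberately simplified model. This toy assumes an attack is detected unless it simultaneously compromises all k variants, each of which it defeats independently with probability p; the thesis's actual attack strategies, defense mechanisms, and metric are richer than this.

```python
def p_unsuccessful(k: int, p: float) -> float:
    """Toy model, not the thesis's metric: the attack fails unless it
    compromises all k variants, each independently with probability p."""
    return 1.0 - p ** k

# Each automatically generated variant makes a successful attack less
# likely, which is the core argument for the K-variant architecture.
for k in (1, 2, 4, 8):
    print(k, p_unsuccessful(k, 0.5))
```

Even this crude model shows why cheap, automated variant generation matters: the marginal cost of one more variant is small, while the probability of an unsuccessful attack climbs toward 1 as k grows.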