Search results
(21 - 40 of 61)
Pages
- Title
- Fast Automatic Bayesian Cubature Using Matching Kernels and Designs
- Creator
- Rathinavel, Jagadeeswaran
- Date
- 2019
- Description
- Automatic cubatures approximate multidimensional integrals to user-specified error tolerances. In many real-world integration problems, the analytical solution is either unavailable or difficult to compute. To overcome this, one can use numerical algorithms that approximately estimate the value of the integral. For high-dimensional integrals, quasi-Monte Carlo (QMC) methods are very popular. QMC methods are equal-weight quadrature rules where the quadrature points are chosen deterministically, unlike Monte Carlo (MC) methods where the points are chosen randomly. The families of integration lattice nodes and digital nets are the most popular quadrature points used. These methods consider the integrand to be a deterministic function. An alternative approach, called Bayesian cubature, postulates the integrand to be an instance of a Gaussian stochastic process. For high-dimensional problems, it is difficult to adaptively change the sampling pattern, but one can automatically determine the sample size, $n$, given a fixed and reasonable sampling pattern. We take this approach using a Bayesian perspective. We assume a Gaussian process parameterized by a constant mean and a covariance function defined by a scale parameter and a function specifying how the integrand values at two different points in the domain are related. These parameters are estimated from integrand values or are given non-informative priors. This leads to a credible interval for the integral. The sample size, $n$, is chosen to make the credible interval for the Bayesian posterior error no greater than the desired error tolerance. However, the process just outlined typically requires matrix-vector operations with a computational cost of $O(n^3)$. Our innovation is to pair low discrepancy nodes with matching kernels, which lowers the computational cost to $O(n \log n)$.
We begin the thesis by introducing the Bayesian approach to calculating the posterior cubature error and define our automatic Bayesian cubature. Although much of this material is known, it is used to develop the necessary foundations. Some of the major contributions of this thesis include the following: 1) The fast Bayesian transform is introduced. This generalizes the techniques that speed up Bayesian cubature when the kernel matches the low discrepancy nodes. 2) The fast Bayesian transform approach is demonstrated using two methods: a) rank-1 lattice sequences and shift-invariant kernels, and b) Sobol' sequences and Walsh kernels. These two methods are implemented as fast automatic Bayesian cubature algorithms in the Guaranteed Automatic Integration Library (GAIL). 3) We develop additional numerical implementation techniques: a) rewriting the covariance kernel to avoid cancellation error, b) gradient descent for hyperparameter search, and c) non-integer kernel order selection. The thesis concludes by applying our fast automatic Bayesian cubature algorithms to three sample integration problems. We show that our algorithms are faster than basic Bayesian cubature and that they provide answers within the error tolerance in most cases. The Bayesian cubatures that we develop are guaranteed for integrands belonging to a cone of functions that reside in the middle of the sample space. The concept of a cone of functions is also explained briefly.
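The stopping rule described above can be illustrated with a minimal, non-fast ($O(n^3)$) sketch of Bayesian cubature: a zero-mean Gaussian process with a shift-invariant product kernel on the unit cube whose single-argument integral is 1, so the posterior mean of the integral is a weighted sum of integrand values and the posterior variance yields a credible half-width. The specific kernel, the maximum-likelihood scale estimate, and the 99% credible factor below are illustrative assumptions, not GAIL's exact implementation.

```python
import numpy as np

def b2(x):
    # Bernoulli polynomial B_2 on [0,1): x^2 - x + 1/6 (integrates to 0)
    return x * x - x + 1.0 / 6.0

def shift_invariant_kernel(x, t, eta=1.0):
    # Product kernel; its integral over the unit cube in either argument is 1
    d = np.abs(x[:, None, :] - t[None, :, :]) % 1.0
    return np.prod(1.0 + eta * b2(d), axis=2)

def bayesian_cubature(y, nodes, eta=1.0):
    """Posterior mean and credible half-width for the integral of f on [0,1]^d."""
    n = len(y)
    K = shift_invariant_kernel(nodes, nodes, eta)
    w = np.linalg.solve(K, np.ones(n))        # kernel integrals form the all-ones vector
    mean = w @ y                              # posterior mean of the integral
    sigma2 = (y @ np.linalg.solve(K, y)) / n  # MLE of the GP scale parameter
    var = sigma2 * max(1.0 - w @ np.ones(n), 0.0)
    half_width = 2.58 * np.sqrt(var)          # ~99% credible half-width
    return mean, half_width
```

An automatic cubature would increase $n$ until `half_width` falls below the tolerance; the fast algorithms in the thesis replace the dense solves with FFT-based ones when the nodes and kernel match.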
- Title
- MULTIVARIABLE SIMULATION PLATFORM FOR TYPE 1 DIABETES AND AUTOMATIC MEAL HANDLING IN ARTIFICIAL PANCREAS SYSTEMS
- Creator
- Samadi, Sediqeh
- Date
- 2019
- Description
- Artificial pancreas (AP) systems are designed to automate glucose control in type 1 diabetes mellitus (T1DM). Multivariable AP systems have evolved to incorporate various additional physiological measurements beyond conventional continuous glucose monitoring in order to better integrate information on the metabolic state of the patient that affects glycemic dynamics. Changes in physiological measurements such as heart rate, energy expenditure, skin temperature, and skin conductance measured by wearable devices are indicative of changes in the metabolic state. The controller receives the physiological measurements in a feedforward manner, which accelerates corrective control decisions in response to disturbances. Although various AP systems have been proposed in the literature to accommodate these additional sources of information, the testing and evaluation of these advanced multivariable AP systems are hindered by the requirement of conducting time-consuming and expensive clinical trials. The development of a simulation platform for rapid prototyping and iterative development of AP systems is one of the main contributions of this study. The simulation platform for T1DM includes a compartmental model generating glucose concentration in response to physical activity in addition to meals and infused insulin. The proposed exercise-glucose-insulin model is an extension of a previously developed glucose-insulin model that derives transient variations in glycemic dynamics caused by physical activity and improves glucose prediction accuracy. Physiological variables affected by physical activity, such as heart rate, skin temperature, and blood volume pulse, are generated in addition to the glucose concentration in the simulator. The simulation platform includes several virtual patients, providing a reliable platform for in silico evaluation of different algorithms proposed for automation of glucose control in T1DM.
The multivariable simulator will accelerate the development of next-generation artificial pancreas systems. The development of a disturbance detection algorithm is the other contribution of this study. Meals are major disturbances to glucose homeostasis, and automated detection of meal consumption and estimation of the carbohydrate content of the consumed meal are critical for fully automated artificial pancreas control systems. In this study, a detection algorithm integrating fuzzy logic classification and qualitative analysis is proposed. A fuzzy logic system estimates the carbohydrate content of the meal.
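As a rough illustration of the fuzzy-logic component, a Mamdani-style rule base can map the glucose rate of change to an estimated carbohydrate content. The membership functions, thresholds, and carbohydrate values below are invented for the sketch, not the thesis's tuned parameters.

```python
def tri(x, a, b, c):
    """Triangular membership function rising from a, peaking at b, falling to c."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def estimate_carbs(roc):
    """Estimate meal carbohydrates (g) from glucose rate of change (mg/dL/min).

    Illustrative rules: slow rise -> small meal, moderate rise -> medium,
    fast rise -> large; defuzzified by a weighted average of rule outputs.
    """
    rules = [
        (tri(roc, 0.0, 1.0, 2.0), 20.0),   # slow rise     -> ~20 g
        (tri(roc, 1.0, 2.5, 4.0), 50.0),   # moderate rise -> ~50 g
        (tri(roc, 3.0, 5.0, 7.0), 90.0),   # fast rise     -> ~90 g
    ]
    total = sum(mu for mu, _ in rules)
    if total == 0.0:
        return 0.0                          # no meal detected
    return sum(mu * carbs for mu, carbs in rules) / total
```

A real meal-detection pipeline would first classify whether a meal occurred (the fuzzy classification step) before estimating its size.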
- Title
- A SCALABLE SIMULATION AND MODELING FRAMEWORK FOR EVALUATION OF SOFTWARE-DEFINED NETWORKING DESIGN AND SECURITY APPLICATIONS
- Creator
- Yan, Jiaqi
- Date
- 2019
- Description
- The world today is densely connected by many large-scale computer networks, supporting military applications, social communications, power grid facilities, cloud services, and other critical infrastructures. However, a gap has grown between the complexity of these systems and the increasing need for security and resilience. We believe this gap is now reaching a tipping point, resulting in a dramatic change in the way that networks and applications are architected, developed, monitored, and protected. This trend calls for a scalable and high-fidelity network testing and evaluation platform to facilitate the transformation of in-house research ideas into real-world working solutions. With this objective, we investigate means to build a scalable and high-fidelity network testbed using container-based emulation and parallel simulation; our study focuses on the emerging software-defined networking (SDN) technology. Existing evaluation platforms facilitate the adoption of the SDN architecture and applications in production systems. However, the performance of those platforms is highly dependent on the underlying physical hardware resources. Insufficient resources lead to undesired results, such as low experimental fidelity or slow execution speed, especially with large-scale network settings. To improve testbed fidelity, we first develop a lightweight virtual time system for Linux containers and integrate it into a widely used SDN emulator. A key issue with an ordinary container-based emulator is that it uses the system clock across all containers even when a container is not scheduled to run, which hurts both performance and temporal fidelity, especially under high workloads. We investigate virtual time approaches that precisely scale the time of interactions between containers and physical devices. Our evaluation results indicate a definite improvement in fidelity and scalability.
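The virtual time idea above can be sketched as a time-dilation clock: each emulated host sees wall-clock time scaled by a time dilation factor (TDF), so an emulation running with TDF = 4 appears to its containers as hardware four times faster. The class below is a hedged illustration of the general technique, not the thesis's Linux kernel implementation.

```python
import time

class VirtualClock:
    """Per-container virtual clock: virtual = start + (real - start) / tdf."""

    def __init__(self, tdf, now=None):
        self.tdf = float(tdf)                 # time dilation factor (> 1 slows virtual time)
        self.start = time.monotonic() if now is None else now

    def virtual_time(self, real_now=None):
        real_now = time.monotonic() if real_now is None else real_now
        return self.start + (real_now - self.start) / self.tdf
```

In the real system the kernel returns such dilated timestamps to processes inside each container, so emulated delays and bandwidths remain faithful even when the physical host is overloaded.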
To improve testbed scalability, we investigate how the centralized paradigm of SDN can be utilized to reduce the simulation workload. We explore a model abstraction technique that effectively transforms the SDN network devices into one virtualized switch model. While significantly reducing the model execution time and enabling real-time simulation capability, our abstracted model also preserves the end-to-end forwarding behavior of the original network. With enhanced fidelity and scalability, it is realistic to utilize our network testbed to perform security evaluations of various SDN applications. We observe that communication networks generate and process a huge amount of data. The logically centralized SDN control plane, on the one hand, has to process both critical control traffic and potentially big data traffic, and on the other hand, enables many efficient security solutions, such as intrusion detection, mitigation, and prevention. Recently, deep neural networks have achieved state-of-the-art results across a range of hard problem spaces. We study how to utilize big data and deep learning to secure communication networks and host entities. For classifying malicious network traffic, we performed a feasibility study of off-line deep-learning-based intrusion detection by constructing the detection engine from multiple advanced deep learning models. For malware classification on individual hosts, another necessity for securing computer systems, existing machine-learning-based methods rely on handcrafted features extracted from raw binary files or disassembled code. The diversity of such features has made it hard to build generic malware classification systems that work effectively across different operational environments.
To strike a balance between generality and performance, we explore new graph convolutional neural network techniques to effectively yet efficiently classify malware programs represented as their control flow graphs.
- Title
- ENHANCING PRIVACY AND SECURITY IN IOT-BASED SMART HOME
- Creator
- Du, Haohua
- Date
- 2019
- Description
- The IoT-based smart home is envisioned as a system that augments everyone’s daily life. In the past few years, the smart home has attracted immense attention from industry and has been considered one of the principal pillars of the fourth industrial revolution. However, while the rapidly increasing number of Internet-connected smart devices expands the functionality of smart homes, it also raises substantial security and privacy concerns. Commonly, a smart home system is composed of three major components: smart devices, communication among devices, and smart applications connecting the devices. Thus, this dissertation aims to enhance the security and privacy of the smart home system without weakening its functionality, from the perspective of each of these three components. First, I improve the security of smart devices within the smart home by monitoring their behaviors based on the contextual environment. Then, I enhance the security of the communications among the devices through visible light communication, whose receivers have to be physically visible to senders, which prevents eavesdropping. Finally, I study two popular smart applications, the augmented reality assistant and the cloud-based surveillance system, to discuss how to define privacy, how to reduce leakage, and how to balance privacy and security in the smart home. This dissertation proposes mechanisms for each component and implements the designs in the real world to evaluate their effectiveness and efficiency.
- Title
- Efficient management of uncertain data
- Creator
- Feng, Su
- Date
- 2023
- Description
- Uncertainty arises naturally in many application domains. It can be caused by an uncertain data source (sensor errors, noise, etc.), and data preprocessing techniques (data curation, data integration, etc.) can also introduce uncertainty into the data. Analyzing uncertain data without accounting for its uncertainty can create hard-to-trace errors, with severe real-world implications. Certain answers are a principled method for coping with the uncertainty that arises in many practical data management tasks. Unfortunately, this method is expensive and may exclude useful (if uncertain) answers. Other techniques from incomplete databases record and propagate more detailed uncertainty information. However, most of these approaches are either too expensive to be practical, or focus only on a narrow class of queries and work only for a specific representation. In this thesis, we investigate models and query semantics for uncertain data management and present a framework that is general and practically efficient, backed by fundamental theoretical foundations and formally proven correctness guarantees. We first propose Uncertainty-Annotated Databases (UA-DBs), which pair an under- and an over-approximation of certain answers, combining the reliability of certain answers with the performance of a classical database system. We then introduce attribute-annotated uncertain databases (AU-DBs), which extend the UA-DB model with attribute-level annotations that record bounds on the values of an attribute across all possible worlds. AU-DBs extend UA-DBs to encode a compact over-approximation of possible answers, which is necessary to support non-monotone queries including aggregation and set difference. With a further extension to AU-DBs that supports ranking and windowed aggregation queries using a native implementation on a modern DBMS, our approaches scale to complex queries and large datasets, produce accurate results, and significantly outperform alternative methods for uncertain data management.
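The idea of sandwiching certain answers between an under- and an over-approximation can be sketched with attribute-level bounds in the AU-DB style: each uncertain value carries [lo, hi] bounds plus a selected-guess value, and a selection predicate evaluates to certain, possible, or impossible. The representation and names below are illustrative, not the thesis's formal model.

```python
from dataclasses import dataclass

@dataclass
class RangeVal:
    lo: float      # lower bound across all possible worlds
    sg: float      # selected-guess value (one distinguished world)
    hi: float      # upper bound across all possible worlds

def compare_gt(v, threshold):
    """Evaluate v > threshold over a range-annotated value."""
    if v.lo > threshold:
        return "certain"      # holds in every possible world
    if v.hi > threshold:
        return "possible"     # holds in some possible world
    return "impossible"       # holds in no possible world

def select_gt(tuples, attr_bounds, threshold):
    """Under- and over-approximate the answers of  SELECT * WHERE a > threshold."""
    certain = [t for t, v in zip(tuples, attr_bounds)
               if compare_gt(v, threshold) == "certain"]
    possible = [t for t, v in zip(tuples, attr_bounds)
                if compare_gt(v, threshold) != "impossible"]
    return certain, possible
```

The certain set is the under-approximation (reliable answers); the possible set is the compact over-approximation needed for non-monotone operators.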
- Title
- Designs and Optimizations of Oblivious Data Access for Mitigating Access Pattern Leakage
- Creator
- Che, Yuezhi
- Date
- 2023
- Description
- In today’s data-driven world, data outsourcing has grown, increasing the importance of data security and privacy. Data encryption, while providing some protection, is insufficient against side-channel attacks such as access pattern leakage. This thesis focuses on designing and optimizing efficient oblivious access methods to enhance data security and privacy. Traditional solutions, like Oblivious RAM (ORAM), often impose significant overheads, limiting their adoption. Our research proposes novel oblivious data access schemes tailored to specific applications, systems, and contexts. This approach enables us to identify critical vulnerabilities and performance bottlenecks, and to balance performance, security, and other relevant parameters. In this thesis, I present four published works in Chapters 3 to 6, demonstrating the effectiveness of my proposed methods: (1) optimizing Ring ORAM for multi-channel memory systems, (2) introducing a multi-range-supporting ORAM for locality-aware applications, (3) proposing an oblivious data access solution for NVM hybrid memory systems, and (4) developing an oblivious access method for deep neural networks (DNNs), ensuring privacy without sacrificing performance. These contributions address unique challenges across application domains. The thesis thus provides a comprehensive investigation of targeted oblivious access methods, highlighting the benefits of the proposed designs and contributing more effective solutions for access pattern leakage mitigation, ultimately improving data security and privacy in contemporary computing systems.
- Title
- Towards Utility-Driven Data Analytics with Differential Privacy
- Creator
- Wang, Han
- Date
- 2023
- Description
- The widespread use of personal devices and dedicated recording facilities has led to the generation of massive amounts of personal information or data. Much of it is high-dimensional and unstructured, such as video and location data. Analyzing these data can provide significant benefits in real-world scenarios, such as video for monitoring and location data for traffic analysis. However, while providing benefits, these complex data also raise serious privacy concerns since they involve personal information. Existing privacy protection methods often fail to provide adequate utility in practical applications due to the complexity of high-dimensional and unstructured data. For example, most video sanitization techniques merely obscure the video by detecting and blurring sensitive regions, such as faces, vehicle plates, locations, and timestamps. Unfortunately, privacy breaches in blurred videos cannot be effectively contained, especially against unknown background knowledge. In this thesis, we propose three different differentially private frameworks to preserve the utility of video and location data (both high-dimensional and unstructured) while meeting privacy requirements under different well-known privacy settings. Specifically, to the best of our knowledge, we propose the first differentially private video analytics platform (VideoDP), which flexibly supports different video queries and query-based analyses with a rigorous privacy guarantee. Given the input video, VideoDP randomly generates a utility-driven private video in which adding or removing any sensitive visual element (e.g., human or object) does not significantly affect the output video. Then, different video analyses requested by untrusted video analysts can be flexibly performed over the sanitized video with differential privacy.
Secondly, we define a novel privacy notion, ε-Object Indistinguishability, for all the predefined sensitive objects (e.g., humans, vehicles) in the video, and then propose a video sanitization technique, VERRO, that randomly generates utility-driven synthetic videos with indistinguishable objects. Therefore, all the objects are well protected in the generated utility-driven synthetic videos, which can be disclosed to any untrusted video recipient. Third, we propose the first strict local differential privacy (LDP) framework for location-based services (LBS), L-SRR, to privately collect and analyze user locations and trajectories with ε-LDP guarantees. Specifically, we design a novel LDP mechanism, staircase randomized response (SRR), and extend empirical estimation to further boost utility for a diverse set of LBS applications (e.g., traffic density estimation, k-nearest-neighbor search, origin-destination analysis, and traffic-aware GPS navigation). Finally, we conduct experiments on real video and location datasets, and the experimental results demonstrate that all three frameworks perform well.
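While the staircase randomized response mechanism is specific to this thesis, the classical k-ary randomized response it generalizes is easy to sketch: report the true value with probability e^ε/(e^ε + k − 1), otherwise a uniformly random other value, then debias the collected counts. This is the textbook LDP baseline, not L-SRR's staircase scheme.

```python
import math
import random

def k_rr_perturb(value, domain, eps, rng=random):
    """Report the true value w.p. e^eps / (e^eps + k - 1), else a random other one."""
    k = len(domain)
    p_true = math.exp(eps) / (math.exp(eps) + k - 1)
    if rng.random() < p_true:
        return value
    others = [v for v in domain if v != value]
    return rng.choice(others)

def k_rr_estimate(reports, domain, eps):
    """Unbiased frequency estimates from perturbed reports."""
    k, n = len(domain), len(reports)
    p = math.exp(eps) / (math.exp(eps) + k - 1)
    q = 1.0 / (math.exp(eps) + k - 1)          # prob of reporting a specific wrong value
    est = {}
    for v in domain:
        c = sum(1 for r in reports if r == v)
        est[v] = (c / n - q) / (p - q)         # debias the observed frequency
    return est
```

L-SRR's staircase mechanism refines this by assigning higher report probabilities to values "closer" to the true location, which is what boosts utility for LBS applications.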
- Title
- Approximation Algorithms for Selected Network and Graph Problems
- Creator
- Wang, Xiaolang
- Date
- 2023
- Description
- This dissertation proposes new polynomial-time approximation algorithms for selected optimization problems, including network and classic graph problems. We employ distinct strategies and techniques to solve these problems. In Chapter 1, we consider a problem we term FCSA, which aims to find an optimal assignment of clients to servers such that the largest latency on an interactivity path between two clients (client 1 to server 1, server 1 to server 2, then server 2 to client 2) is minimized. We present a (3/2)-approximation algorithm for FCSA, as well as a (3/2)-approximation algorithm for the variant with server capacity constraints. In Chapter 2, we focus on two variants of the Steiner Tree Problem and present better approximation ratios using known algorithms. For the Steiner tree with a minimum number of Steiner points and bounded edge length, we provide a polynomial-time algorithm with ratio 2.277. For the Steiner tree in quasi-bipartite graphs, we improve the best-known approximation ratio to 298/245. In Chapter 3, we address the problem of searching for a maximum-weight series-parallel subgraph in a given graph, and present a (1/2 + 1/60)-approximation for this problem. Although there is currently no known real-life application of this problem, it remains an important and challenging open question in the field.
- Title
- Image Synthesis with Generative Adversarial Networks
- Creator
- Ouyang, Xu
- Date
- 2023
- Description
- Image synthesis refers to the process of generating new images from an existing dataset, with the objective of creating images that closely resemble the target images, learned from the source data distribution. This technique has a wide range of applications, including transforming captions into images, deblurring blurred images, and enhancing low-resolution images. In recent years, deep learning techniques, particularly Generative Adversarial Networks (GANs), have achieved significant success in this field. A GAN consists of a generator (G) and a discriminator (D) and employs adversarial learning to synthesize images. Researchers have developed various strategies to improve GAN performance, such as controlling learning rates for different models and modifying the loss functions. This thesis focuses on image synthesis from captions using GANs and aims to improve the quality of generated images. The study is divided into four main parts. In the first part, we investigate an LSTM conditional GAN that generates images from captions. We use word2vec embeddings as caption features, combine the features' information with an LSTM, and generate images via a conditional GAN. In the second part, to improve the quality of generated images, we address the issue of convergence speed and enhance GAN performance using an adaptive WGAN update strategy. We demonstrate that this update strategy is applicable to the Wasserstein GAN (WGAN) and other GANs that utilize WGAN-related loss functions. The proposed update strategy is based on a comparison of the loss change ratios of G and D. In the third part, to further enhance the quality of synthesized images, we investigate a transformer-based Uformer GAN for image restoration and propose a two-step refinement strategy.
Initially, we train a Uformer model until convergence, followed by training a Uformer GAN using the restoration results obtained from the first step. In the fourth part, to generate fine-grained images from captions, we delve into the Recurrent Affine Transformation (RAT) GAN for fine-grained text-to-image synthesis. By incorporating an auxiliary classifier in the discriminator and employing a contrastive learning method, we improve the accuracy and fine-grained details of the synthesized images. Throughout this thesis, we strive to enhance the capabilities of GANs in various image synthesis applications and contribute valuable insights to the field of deep learning and image processing.
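The adaptive update strategy in the second part can be sketched abstractly: instead of a fixed D-to-G update ratio, compare how fast each player's loss is changing and train the one that is lagging. The rule below is a schematic reading of a loss-change-ratio strategy with invented details, not the thesis's exact algorithm.

```python
def choose_update(prev_g, cur_g, prev_d, cur_d):
    """Pick which network to update next from relative loss changes.

    r_g and r_d measure how quickly each player's loss is still moving;
    in this sketch, the player whose loss changed less (relatively) is
    considered stuck and gets the next update.
    """
    r_g = abs(cur_g - prev_g) / max(abs(prev_g), 1e-12)
    r_d = abs(cur_d - prev_d) / max(abs(prev_d), 1e-12)
    return "G" if r_g < r_d else "D"
```

In a training loop, this decision would be recomputed every iteration from smoothed loss histories rather than raw single-step values.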
- Title
- A Novel Explainability Approach For Spectrum Measurement Insight
- Creator
- Nagpure, Vaishali
- Date
- 2023
- Description
- Spectrum is an extremely valuable natural resource in high demand. Although the spectrum has been fully allocated, there is no comprehensive method for understanding how it is being used. Spectrum measurements are highly complex spatiotemporal data sets that play a key role in understanding spectrum use, and they require very specialized domain knowledge to interpret. To leverage existing and future spectrum measurements to the fullest extent, it is necessary to have a systematic way to connect them to the contextual information that helps give meaning to the data. To analyze and interpret the measurements, a variety of contextual information is needed. This research develops a novel approach to spectrum measurement understanding that unifies five years of wideband spectrum measurement summary data with relevant contextual information from a variety of sources in a spectrum knowledge graph. Both quantitative and qualitative information are modeled and implemented on the Neo4j graph database platform. This modeling formalizes the relationships that help spectrum stakeholders “connect the dots” and provides a deeper understanding of RF spectrum utilization. The knowledge graph can be queried to extract a wide variety of insights, thus making spectrum knowledge more widely accessible to a variety of stakeholders.
- Title
- A SCALABLE AND CUSTOMIZABLE SIMULATION PLATFORM FOR ACCURATE QUANTUM NETWORK DESIGN AND EVALUATION
- Creator
- Wu, Xiaoliang
- Date
- 2021
- Description
- Recent advances in quantum information science have enabled the development of quantum communication network prototypes and created an opportunity to study full-stack quantum network architectures. The scale and complexity of quantum networks require cost-efficient means for testing and evaluation. Simulators allow for testing hardware, protocols, and applications cost-effectively before constructing experimental networks. This work develops SeQUeNCe, a comprehensive, customizable quantum network simulator. We have explored SeQUeNCe for quantum communication network evaluation, using it to study the performance of quantum networks with different hardware and applications. Additionally, we extend SeQUeNCe into a parallel discrete-event simulator by using the message passing interface (MPI), and we comprehensively analyze the benefits and overheads of parallelization. The parallelization technique significantly increases the scalability of SeQUeNCe. In the future, we would like to improve SeQUeNCe in three respects. First, we plan to continue reducing the overhead of parallelization and increasing the scalability of SeQUeNCe. Second, we plan to investigate means to model quantum memories, entanglement protocols, and control protocols to enrich the simulation models in the SeQUeNCe library. Third, we plan to integrate hardware with SeQUeNCe to enable high-fidelity analysis.
- Title
- AI IN MEDICINE: ENABLING INTELLIGENT IMAGING, PROGNOSIS, AND MINIMALLY INVASIVE SURGERY
- Creator
- Getty, Neil
- Date
- 2022
- Description
- While an extremely rich research field, AI in medicine has been much slower to be applied in real-world clinical settings than other applications of AI such as natural language processing (NLP) and image processing/generation. Often the stakes of failure are more dire, access to private and proprietary data is more costly, and the burden of proof required by expert clinicians is much higher. Beyond these barriers, the typical data-driven approach to validation is interrupted by a need for expertise to analyze results. Whereas the results of a trained ImageNet or machine translation model are easily verified by a computational researcher, analysis in medicine can demand much more multi-disciplinary expertise. AI in medicine is motivated by a great demand for progress in health care, but an even greater responsibility for high accuracy, model transparency, and expert validation. This thesis develops machine and deep learning techniques for medical image enhancement, patient outcome prognosis, and minimally invasive robotic surgery awareness and augmentation. Each of the works presented was undertaken in direct collaboration with medical domain experts, and the efforts could not have been completed without them. Pursuing medical image enhancement, we worked with radiologists, neuroscientists, and a neurosurgeon. In patient outcome prognosis, we worked with clinical neuropsychologists and a cardiovascular surgeon. For robotic surgery, we worked with surgical residents and a surgeon expert in minimally invasive surgery. Each of these collaborations guided priorities for problem and model design, analysis, and long-term objectives that ground this thesis as a concerted effort towards clinically actionable medical AI. The contributions of this thesis focus on three specific medical domains.
(1) Deep learning for medical brain scans: we developed processing pipelines and deep learning models for image annotation, registration, segmentation, and diagnosis in both traumatic brain injury (TBI) and brain tumor cohorts. A major focus of these works is the efficacy of low-data methods and techniques for validating results without any ground-truth annotations. (2) Outcome prognosis for TBI and risk prediction for cardiovascular disease (CVD): we developed feature extraction pipelines and models for TBI and CVD patient clinical outcome prognosis and risk assessment. We design risk prediction models for CVD patients using traditional Cox modeling, machine learning, and deep learning techniques. In these works we conduct exhaustive data and model ablation studies, with a focus on feature saliency analysis, model transparency, and the use of multi-modal data. (3) AI for enhanced and automated robotic surgery: we developed computer vision and deep learning techniques for understanding and augmenting minimally invasive robotic surgery scenes. We developed models to recognize surgical actions from vision and kinematic data. Beyond models and techniques, we also curated novel datasets and prediction benchmarks from simulated and real endoscopic surgeries. We show the potential of self-supervised techniques in surgery, as well as multi-input and multi-task models.
- Title
- Integrating Provenance Management and Query Optimization
- Creator
- Niu, Xing
- Date
- 2021
- Description
- Provenance, information about the origin of data and the queries and/or updates that produced it, is critical for debugging queries and transactions, auditing, establishing trust in data, and many other use cases. While how to model and capture the provenance of database queries has been studied extensively, optimization is recognized as an important open problem in provenance management, which includes storing, capturing, and querying provenance. However, previous work has almost exclusively focused on compressing provenance to reduce storage costs; there is a lack of work on optimizing the provenance capture process. Many approaches capture database provenance using the SQL query language and represent provenance information as a standard relation. However, even sophisticated query optimizers often fail to produce efficient execution plans for such queries because of their complexity and uncommon structure. To address this problem, we study algebraic equivalences and alternative ways of generating queries for provenance capture. Furthermore, we present an extensible heuristic and cost-based optimization framework utilizing these optimizations. While provenance has been well studied, no database optimizer uses provenance information to optimize query processing. Intuitively, provenance records exactly what data is relevant for a query. We can use this property of provenance to identify and filter out irrelevant input data of a query early on, so that query processing is sped up: instead of fully accessing the input dataset, we only run the query on the relevant input data. In this work, we develop provenance-based data skipping (PBDS), a novel approach that generates provenance sketches, which are concise encodings of what data is relevant for a query.
In addition, a provenance sketch captured for one query is used to speed up subsequent queries, possibly by utilizing physical design artifacts such as indexes and zone maps. The work presented in this thesis demonstrates that a tight integration of provenance management and query optimization can lead to significant performance improvements in query processing as well as in traditional database management tasks.
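The core idea can be illustrated with a toy sketch, assuming a simple range partitioning of the input; the partition scheme and all names here are hypothetical, not the thesis's actual implementation. A provenance sketch records which horizontal partitions held at least one relevant row, so a later run can skip the rest:

```python
# Toy illustration of provenance-based data skipping (PBDS).
# Hypothetical partitioning: fixed-size horizontal partitions of a table.

PARTITION_SIZE = 4

def partition_of(row_index):
    """Map a row to its horizontal partition (zone)."""
    return row_index // PARTITION_SIZE

def run_with_capture(table, predicate):
    """Run a selection query and capture a provenance sketch:
    the set of partitions holding at least one relevant row."""
    result, sketch = [], set()
    for i, row in enumerate(table):
        if predicate(row):
            result.append(row)
            sketch.add(partition_of(i))
    return result, sketch

def run_with_sketch(table, predicate, sketch):
    """Re-run the query touching only partitions named in the
    sketch: irrelevant data is skipped entirely."""
    result = []
    for i, row in enumerate(table):
        if partition_of(i) in sketch and predicate(row):
            result.append(row)
    return result

table = list(range(20))            # 5 partitions of 4 rows each
full, sk = run_with_capture(table, lambda r: r % 7 == 0)
fast = run_with_sketch(table, lambda r: r % 7 == 0, sk)
assert full == fast                # same answer, fewer rows scanned
```

A real system would instead encode sketches compactly and combine them with physical design artifacts such as zone maps, as the abstract notes.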
- Title
- Extreme Fine-grained Parallelism On Modern Many-Core Architectures
- Creator
- Nookala, Poornima
- Date
- 2022
- Description
-
Processors with hundreds of threads of execution and GPUs with thousands of cores are among the state of the art in high-end computing systems. This transition to many-core computing has required the community to develop new algorithms that overcome significant latency bottlenecks through massive concurrency. Implementing efficient parallel runtimes that scale to hundreds of threads with extremely fine-grained tasks (less than 100 microseconds) remains a challenge. We propose XQueue, a novel lockless concurrent queueing system that scales to hundreds of threads. We integrate XQueue into LLVM OpenMP and implement X-OpenMP, a library for lightweight tasking on modern many-core systems with hundreds of cores. We show that it is possible to implement a parallel execution model using lockless techniques, enabling applications to strongly scale on many-core architectures. While the fork-join model is suitable for on-node parallelism, its joins and synchronization induce artificial dependencies that can lead to underutilization of resources. Data-flow parallelism overcomes the limitations of fork-join parallelism by specifying dependencies at a finer granularity. It is also crucial for parallel runtime systems to support heterogeneous platforms to better utilize the hardware resources available in modern supercomputers. Existing parallel programming environments that support distributed memory either discover the DAG entirely on all processes, which limits scalability, or introduce explicit communication, which increases the complexity of programming. We implement Template Task Graph (TTG), a novel programming model and its C++ implementation, which marries the ideas of control-flow and data-flow graph programming.
TTG addresses performance portability without sacrificing scalability or programmability by providing higher-level abstractions than those conventionally offered by task-centric programming systems, without impeding the ability of the underlying runtimes to manage task creation and execution, as well as data and resources, efficiently. The TTG implementation currently supports distributed-memory execution over two different task runtimes, PaRSEC and MADNESS.
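The contrast between fork-join barriers and data-flow scheduling can be sketched in a few lines; this is a generic illustration using Python's standard library, not TTG's C++ API. Each task starts as soon as its own predecessors finish, rather than waiting at a global join:

```python
# Data-flow task execution sketch: tasks declare fine-grained
# dependencies, and a task runs as soon as its inputs are ready.
from concurrent.futures import ThreadPoolExecutor
from graphlib import TopologicalSorter

# Hypothetical task graph: node -> predecessors it consumes.
deps = {"a": [], "b": [], "c": ["a"], "d": ["a", "b"], "e": ["c", "d"]}
results = {}

def run_task(name):
    # Each task consumes its predecessors' results as inputs.
    results[name] = sum(results[d] for d in deps[name]) + 1
    return name

ts = TopologicalSorter(deps)
ts.prepare()
with ThreadPoolExecutor(max_workers=4) as pool:
    while ts.is_active():
        ready = list(ts.get_ready())        # tasks whose deps are satisfied
        for name in pool.map(run_task, ready):
            ts.done(name)                   # unblocks its dependents
```

Under a pure fork-join model, `c` could not start until the join covering both `a` and `b` completed, even though `c` depends only on `a`; the dependency-driven schedule above avoids that artificial wait.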
- Title
- Towards a Secure and Resilient Smart Grid Cyberinfrastructure Using Software-Defined Networking
- Creator
- Qu, Yanfeng
- Date
- 2022
- Description
-
To enhance the cyber-resilience and security of the smart grid against malicious attacks and system errors, we present a software-defined networking (SDN)-based communication architecture design for smart grid operation. Our design utilizes SDN technology, which improves network manageability and provides application-oriented visibility and direct programmability, to deploy multiple SDN-aware applications that enhance grid security and resilience. These include optimization-based network management to recover Phasor Measurement Unit (PMU) network connectivity and restore power system observability, and flow-based anomaly detection combined with optimization-based network management to mitigate Manipulation of demand via IoT (MadIoT) attacks. We also developed a prototype system in a cyber-physical testbed and conducted extensive evaluation experiments using the IEEE 30-bus system, the IEEE 118-bus system, and the IIT campus microgrid.
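A minimal sketch of the flow-based detection idea, under the simplifying assumption that each flow has a stable benign rate baseline; flow names, rates, and the threshold rule are illustrative, not the thesis's detector:

```python
# Toy flow-based anomaly detector: flag a flow whose observed rate
# deviates from its learned benign baseline by more than k standard
# deviations. All flow names and numbers are hypothetical.
from statistics import mean, stdev

def fit_baseline(history):
    """Per-flow baseline (mean, stdev) from benign samples, pkts/sec."""
    return {f: (mean(v), stdev(v)) for f, v in history.items()}

def anomalous(baseline, flow, rate, k=3.0):
    mu, sigma = baseline[flow]
    return abs(rate - mu) > k * max(sigma, 1e-9)

history = {"pmu1->pdc": [50, 52, 49, 51], "iot_load": [5, 6, 5, 4]}
base = fit_baseline(history)
assert not anomalous(base, "pmu1->pdc", 51)
assert anomalous(base, "iot_load", 400)   # MadIoT-style demand surge
```

In the architecture described above, such a detector would feed the optimization-based network management layer, which decides how to reroute or throttle the offending flows.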
- Title
- PROGRAM SURVIVABILITY THROUGH K-VARIANT ARCHITECTURE
- Creator
- BEKIROGLU, BERK
- Date
- 2021
- Description
-
Numerous software systems, particularly mission- and safety-critical systems, require a high level of security during their execution. Enhancing software security through architecture is a highly effective way of defending against cyberattacks. N-version is a software architecture developed to increase the security of software systems: functionally equivalent versions of a program run concurrently to complete a mission or task. Each version is developed independently by a different team, with only the software specification in common, so each version is expected to contain unique vulnerabilities. Due to the high cost of developing and maintaining an N-version system, this architecture is typically used only in high-budget projects requiring a high level of security. This thesis explains and analyzes the K-variant architecture, an alternative for enhancing system security. In contrast to the N-version architecture, each variant is generated automatically using source-to-source program transformation techniques; automation significantly reduces the cost of producing variants. The K-variant architecture can help protect systems from memory exploitation attacks. Attackers can employ various strategies against K-variant systems to increase the likelihood of a successful attack; several such strategies are proposed and investigated in this thesis. Furthermore, experimental studies investigate various defense mechanisms against the proposed attack strategies. The effectiveness of each defense mechanism against each attack strategy is evaluated using the probability of an unsuccessful attack as a metric. Additionally, various source-to-source program transformation techniques for generating new variants in the K-variant architecture are proposed and investigated experimentally.
This thesis also describes a machine learning technique for estimating the survivability of K-variant systems under various attack types and defense strategies. To simplify the design of K-variant systems, a neural network model is proposed; a tool built on this model yields fast and accurate predictions of the survivability of K-variant systems.
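One way to picture automatic variant generation is a transformation that perturbs memory layout, so an exploit tuned to one variant's layout fails on the others. The sketch below inserts randomized padding after buffer declarations in a C-like source string; the transformation, regex, and padding sizes are hypothetical illustrations, not the thesis's actual techniques:

```python
# Toy source-to-source transformation in the K-variant spirit:
# each generated variant pads stack buffers by a different,
# seed-determined amount, diversifying memory layout while
# preserving functionality. Purely illustrative.
import random
import re

DECL = re.compile(r"char\s+(\w+)\[(\d+)\];")

def make_variant(source, seed):
    rng = random.Random(seed)
    def pad(m):
        name, size = m.group(1), m.group(2)
        extra = rng.choice([8, 16, 32, 64])
        return f"char {name}[{size}]; char _pad_{name}[{extra}];"
    return DECL.sub(pad, source)

program = "void f() { char buf[64]; char msg[32]; }"
variants = [make_variant(program, seed) for seed in range(4)]
# Every variant keeps the original buffers but gains layout padding:
assert all("_pad_buf" in v and "_pad_msg" in v for v in variants)
```

Running K such variants concurrently means a fixed-offset memory exploit can, at best, succeed against the variants that happen to share the attacked layout, which is the intuition behind the survivability metric above.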
- Title
- Workload Interference Analysis and Mitigation on Dragonfly Class Networks
- Creator
- Kang, Yao
- Date
- 2022
- Description
-
The Dragonfly class of networks comprises promising interconnect topologies for current and next-generation high-performance computing (HPC) systems. Serving as the "central nervous system," a Dragonfly network tightly couples tens of thousands of compute nodes, providing the high-bandwidth, low-latency data exchange needed for exascale computing. Dragonfly supports unprecedented system scale at a reasonable cost thanks to its hierarchical architecture. In Dragonfly systems, network resources such as routers and links are arranged into identical groups. Groups are all-to-all connected through global links, and routers within a group are connected via local links. In contrast to the fully connected inter-group topology, the intra-group connections are designed according to system requirements: a one-dimensional all-to-all connection is favored for higher network bandwidth, a two-dimensional grid arrangement supports larger systems, and a tree-structured router connection is built for extreme scale. This hierarchical group design lets the topology support unprecedented system sizes while maintaining a low-diameter network. Packets can be delivered minimally by simply traversing the network hierarchy between groups through global links and reaching their destinations through local links. Under network congestion, packets can be forwarded non-minimally through an intermediate group to increase system throughput. As a result, all network resources are shared; links and routers are not dedicated to any node pair. While this increases link utilization, shared network resources lead to inevitable contention among different traffic flows, especially on systems running multiple workloads at the same time. This contention manifests as workload interference, degrading system performance and delaying workload execution.
In this thesis, we first model and analyze the workload interference effect on the Dragonfly+ topology through extensive system simulation. Based on this comprehensive interference study, we propose Q-adaptive routing, a multi-agent reinforcement learning based routing solution for Dragonfly systems. Compared with existing routing solutions, Q-adaptive routing learns to forward packets more efficiently, with lower packet latency and higher system throughput. Next, we demonstrate that intelligent routing algorithms such as Q-adaptive routing can greatly mitigate workload interference and optimize overall system performance. Finally, we propose a dynamic job placement strategy for workload interference prevention. Combined with Q-adaptive routing, dynamic job placement gives users the flexibility either to reduce workload interference from communication-intensive applications or to protect target applications for higher performance stability.
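The flavor of learned routing can be conveyed by a minimal tabular Q-routing sketch in the style of Boyan and Littman: each router keeps per-destination delay estimates for its next hops and updates them from observed delays. This is a generic illustration under simplified assumptions (single router, fixed downstream estimate), not the thesis's multi-agent Dragonfly design:

```python
# Minimal tabular Q-routing sketch: a router learns which next hop
# delivers packets to each destination with the lowest delay.
import random

class QRouter:
    def __init__(self, neighbors, alpha=0.5):
        self.q = {}                  # (dest, next_hop) -> estimated delay
        self.neighbors = neighbors
        self.alpha = alpha           # learning rate

    def next_hop(self, dest, eps=0.1):
        if random.random() < eps:    # occasional exploration
            return random.choice(self.neighbors)
        return min(self.neighbors,
                   key=lambda n: self.q.get((dest, n), 0.0))

    def update(self, dest, hop, delay, remaining_est):
        # remaining_est stands in for the next router's own best
        # estimate to the destination (simplified to a constant here).
        old = self.q.get((dest, hop), 0.0)
        target = delay + remaining_est
        self.q[(dest, hop)] = old + self.alpha * (target - old)

r = QRouter(neighbors=["g1", "g2"])
for _ in range(50):                  # g1 is consistently the faster hop
    r.update("d", "g1", delay=1.0, remaining_est=0.0)
    r.update("d", "g2", delay=5.0, remaining_est=0.0)
assert r.next_hop("d", eps=0.0) == "g1"
```

In the actual multi-agent setting, every router is such a learner and `remaining_est` comes from downstream routers, which is what lets the system adapt around congestion and, as the thesis shows, mitigate workload interference.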
- Title
- Technological Consciousness in Midwestern American Farming: From Party Lines to Autonomous Tractors
- Creator
- Sziron, Mónika
- Date
- 2022
- Description
-
This dissertation is primarily concerned with understanding current conceptions, perceptions, and ethical concerns surrounding artificial intelligence in Midwestern agriculture. Using the theory of technological consciousness as a backdrop for understanding the relationship between Midwestern agriculture and technology, chapter two provides a narrative review of major technological developments in the history of Midwestern farming and of how technology has shaped the human experience of farming over time. This history provides context for the current state of Midwestern agriculture, which is increasingly entangled with artificial intelligence. The theory behind artificial intelligence ethics and general trends in artificial intelligence are discussed in chapter three. To understand present conceptions, perceptions, and ethical concerns about artificial intelligence among Midwestern farmers, a pilot survey was distributed to farmers and a pilot media content analysis was conducted on Midwestern agriculture publications; the results of both are discussed in chapter four. Chapter five delves into theory: how the human experience with technology has evolved over time and its effects on the human experience today. This chapter also offers theoretical insights into the future of farming with artificial intelligence. The dissertation concludes by reviewing the ethical concerns relating to artificial intelligence in agriculture for Midwestern farmers, providing recommendations for developers of agricultural technology, and highlighting the new partnership between farmers and computer scientists, which will lead the way in the future of Midwestern farming.
- Title
- Quantum Computation for the Understanding of Mass: Simulating Quantum Field Theories
- Creator
- Rivero Ramírez, Pedro
- Date
- 2021
- Description
-
This thesis demonstrates the production of hadron mass on a quantum computer. Working in the Nambu–Jona-Lasinio model in 1+1 dimensions with 2 flavors, I show a separation of the contributions of quark masses and interactions to the mass. Along the way I develop a new tool called Quantum Sampling Regression (QSR) that allows optimal sampling of low-qubit quantum computers when using hybrid variational eigenvalue solving techniques. I demonstrate the regime where QSR outperforms the current standard variational eigensolver technique, and benchmark it by improving the calculation of the deuteron binding energy. Finally, I develop QRAND, a multiprotocol and multiplatform quantum random number generation framework, in support of the quantum computing community.
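A rough sketch of the sampling-regression idea, assuming (as is typical for simple variational circuits) that the expectation value is a low-order trigonometric polynomial in a single parameter; the functional form, sample points, and the stand-in "device" below are illustrative assumptions, not the thesis's formulation:

```python
# Sketch of sampling-regression for a variational eigensolver:
# sample the (expensive) device a few times, fit a cheap classical
# trigonometric surrogate, then optimize the surrogate offline.
import math

def device(theta):
    """Stand-in for a measured expectation value <H>(theta)."""
    return 0.7 - 0.5 * math.cos(theta) + 0.2 * math.sin(theta)

# Three samples pin down a + b*cos(theta) + c*sin(theta) exactly:
f0, f90, f180 = device(0.0), device(math.pi / 2), device(math.pi)
a = (f0 + f180) / 2
b = (f0 - f180) / 2
c = f90 - a

def surrogate(theta):
    return a + b * math.cos(theta) + c * math.sin(theta)

# Minimize the surrogate classically over a dense grid, with no
# further queries to the quantum device:
grid = [2 * math.pi * k / 1000 for k in range(1000)]
best = min(grid, key=surrogate)
assert abs(surrogate(best) - device(best)) < 1e-9
```

The payoff is that the quantum device is queried only a fixed, small number of times, rather than once per optimizer iteration as in a standard variational loop.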
- Title
- Distribution-aware Visual Semantic Understanding
- Creator
- Chen, Ying
- Date
- 2021
- Description
-
Understanding visual semantics, including change detection and semantic segmentation, is an essential task in many computer vision and image processing applications. Examples of visual semantic understanding in images include land cover monitoring, urban expansion evaluation, autonomous driving, and scene understanding. The goal is to locate and recognize appropriate pixel-wise semantic labels in images. Classical computer vision algorithms involve sophisticated semi-heuristic pre-processing steps and potentially manual interaction. In this thesis, I propose and evaluate end-to-end deep neural approaches for processing images that outperform existing approaches. Supervised semantic segmentation has been widely studied and has achieved great success with deep learning. However, existing deep learning methods typically suffer from generalization issues: a well-trained model may not work well on unseen samples from a different dataset, because a distribution change, or domain shift, between the training and test sets can degrade performance. Providing more labeled samples covering many possible variations can improve the generalization of models, but acquiring labeled data is typically time-consuming, labor-intensive, and requires domain knowledge. To tackle this label-scarcity bottleneck for supervised learning, we propose to apply unsupervised domain adaptation, semi-supervised learning, and semi-supervised domain adaptation to neural semantic segmentation. The motivation behind unsupervised domain adaptation for semantic segmentation is to transfer learned knowledge from one or more source domains with sufficient labeled samples to a different but related target domain where labeled data is sparse or non-existent. The adaptation algorithm learns a common representation space in which the distributions over the source and target domains are matched.
In this way, we expect a classifier that works well in the source domain to generalize well to the target domain. More specifically, we learn class-aware source-target distribution differences and transfer the knowledge learned from labeled synthetic data in the source domain to unlabeled real data in the target domain. In contrast to domain adaptation, semi-supervised semantic segmentation aims to utilize a large amount of unlabeled data to improve a semantic classifier trained on a small amount of labeled data from the same distribution. Specifically, the supervised segmentation model is trained together with an unsupervised model by applying perturbations to the encoded states of the network instead of the input, or by using mask-based data augmentation techniques to encourage consistent predictions over mixed samples. In this way, learned representations, which capture many kinds of unseen variations in the unlabeled data, benefit the supervised semantic classifier. We propose a mask-based data augmentation semi-supervised learning network that utilizes structural information from a variety of unlabeled examples to improve learning on a limited number of labeled examples. Both unsupervised domain adaptation (UDA), with full source supervision but no target supervision, and semi-supervised learning (SSL), with partial supervision, have been shown to address the generalization problem to some extent. While such methods are effective at aligning different feature distributions, their inability to efficiently exploit unlabeled data leads to intra-domain discrepancy in the target domain, which is separated into two unaligned sub-distributions of source-aligned and target-aligned data. That is, enforcing partial alignment between fully labeled source data and a few labeled target samples does not guarantee that the remaining unlabeled target samples will be aligned with the source feature clusters, thus leaving them unaligned.
Hence, I propose methods that incorporate the advantages of both UDA and SSL, termed semi-supervised domain adaptation (SSDA), with the goal of aligning cross-domain features while also addressing the intra-domain discrepancy within the target domain. I propose a simple yet effective semi-supervised domain adaptation approach utilizing a two-step domain adaptation that addresses both cross-domain and intra-domain shifts.
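The mask-based mixing mentioned above can be pictured with a tiny CutMix-style example; the shapes and the rectangular mask are illustrative assumptions, not the thesis's exact augmentation. A window from one image/label pair is pasted into another, with the same mask applied to images and labels so supervision stays pixel-aligned:

```python
# Toy mask-based mixing for consistency training: paste a rectangular
# region of sample B into sample A, mixing labels with the same mask.
def cut_mix(img_a, img_b, lab_a, lab_b, top, left, h, w):
    """Copy an (h x w) window of B into A for both image and label."""
    img = [row[:] for row in img_a]
    lab = [row[:] for row in lab_a]
    for i in range(top, top + h):
        for j in range(left, left + w):
            img[i][j] = img_b[i][j]
            lab[i][j] = lab_b[i][j]
    return img, lab

A = [[0] * 4 for _ in range(4)]    # 4x4 "image" of class 0
B = [[1] * 4 for _ in range(4)]    # 4x4 "image" of class 1
img, lab = cut_mix(A, B, A, B, top=1, left=1, h=2, w=2)
assert sum(map(sum, img)) == 4     # exactly the 2x2 window came from B
assert img == lab                  # mask applied consistently to labels
```

During training, the network's prediction on the mixed image is pushed toward the correspondingly mixed pseudo-labels, which is what encourages consistent predictions over mixed samples.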