Search results
(121 - 124 of 124)
- Title
- Resilience Enhancement of Critical Cyber-Physical Systems with Advanced Network Control
- Creator
- Liu, Xin
- Date
- 2020
- Description
-
Critical infrastructures are the systems whose failures would have a debilitating impact on national security, economics, public health or safety, or any combination of those matters. It is important to improve those systems' resilience, which is the ability to reduce the magnitude and/or duration of disruptive events. However, today’s critical infrastructures, such as electrical power and transportation systems, are deploying advanced control applications with increasing scale and complexity, which leads to the migration of their underlying communication infrastructures from simple and proprietary networks to off-the-shelf network technologies (e.g., IP-based protocols and standards) to handle the intensive and heterogeneous traffic flows. On one hand, this migration provides an opportunity for both academic and industry communities to develop novel ideas on top of existing schemes; on the other hand, it exposes more vulnerabilities to cyber-attacks. Moreover, since a large-scale power system may choose leased networks from Internet service providers (which is a critical infrastructure itself), there exists an interdependency between power and communication infrastructures, where power transmission control requires message delivery services while the network devices rely on the power supply. These problems raise research challenges for improving the resilience of critical cyber-physical systems. In this thesis, we focus on resilience enhancement of critical infrastructures from the communication network's perspective. The application domains include both power and transportation systems. For power systems, we first apply advanced network control techniques (i.e., software-defined networking (SDN) and the Fibbing control scheme) in the transmission grid communication network to improve the grid status restoration process under network failures and cyber-attacks.
We develop a unified system model that contains both the transmission grid monitoring system (i.e., the phasor measurement unit (PMU) network) and the communication network, and formulate a mixed-integer linear programming (MILP) problem to minimize the recovery time of system observability under power and communication domain constraints. We evaluate the system performance regarding recovery plan generation and installation using IEEE standard systems. However, the advanced network-based control scheme could also lead to problems, since it requires a power supply for the network devices. Thus, we investigate the interdependency between the power grid and the communication network and its impact on system resilience. We conduct a survey that summarizes existing research along two dimensions: objectives (i.e., failure analysis, vulnerability analysis, failure mitigation, and failure recovery) and methodologies (i.e., analytical solutions, co-simulation, and empirical studies). We also identify the limitations of existing works and propose potential research opportunities in this demanding area. Lastly, based on the review, we conduct research that focuses on fast power distribution system restoration under interdependency constraints. When a natural disaster happens, both power and communication components might be damaged. Furthermore, since they depend on each other's service to function correctly, the failures may propagate to hardware and software that were not affected initially. In this work, we focus on the recovery stage, where the failed components in the system are already fully detected and isolated. We construct a mathematical model of the co-existing power and communication system and use optimization techniques to produce a crew dispatch plan that restores power as fast as possible by coordinating damage repair, switch operation, and communication supply processes.
We evaluate the restoration efficiency on the IEEE standard system using both analytical analysis and discrete-event simulation. For the second application domain, the railway transportation system, we focus on evaluating the resilience of its communication system, which exchanges control and monitoring messages with both the on-board driver cabin and the remote control center. We use advanced discrete-event simulation techniques to achieve a high-fidelity model of the network, which makes the evaluation more concrete and realistic. For the Ethernet-based on-board train communication network (TCN), we develop a parallel simulation platform according to the IEC standard and use it to conduct a case study of a double-tagging VLAN attack on this control network. Another component of the railway communication system is the train-to-ground network, which enables communication between the driving system on the train and the control center that issues commands such as movement authority messages. We customize the ns-3 network simulator to model the LTE-based protocol with a real high-speed train trace dataset from public sources. We evaluate the resilience of the cellular network, specifically during the handover process, which happens when the train travels from one base station to another. Because of the train's high speed, the handover success rate is affected, and many protocol-based solutions have been proposed in this research area. We use the high-fidelity simulation model to evaluate some of them and compare their pros and cons.
- Title
- Efficient and Practical Cluster Scheduling for High Performance Computing
- Creator
- Li, Boyang
- Date
- 2023
- Description
-
Cluster scheduling plays a crucial role in the high-performance computing (HPC) area. It is responsible for allocating resources and determining the order in which jobs are executed. Existing HPC job schedulers typically leverage simple heuristics to schedule jobs, but such scheduling policies struggle to keep pace with modern changes and technology trends. This dissertation is motivated by two new trends in the HPC community: the rapid growth of heterogeneous system infrastructure and the emergence of artificial intelligence (AI) technologies. First, existing scheduling policies are solely CPU-centric. In contrast, systems are becoming more complex and heterogeneous, and emerging workloads have diverse resource requirements, such as CPU, burst buffer, power, and network bandwidth. Second, previous heuristic scheduling approaches are manually designed. Such a manual design process prevents adaptive and informative scheduling decisions. A recent trend in HPC is to intertwine AI to better leverage the investment in supercomputers. This embrace of AI provides opportunities to design more intelligent scheduling methods. In this dissertation, we propose an efficient and practical cluster scheduling framework for HPC systems. Our framework leverages AI technologies and considers system heterogeneity. The framework comprises four major components. First, shared network systems such as dragonfly-based systems are vulnerable to performance variability due to network sharing. To mitigate workload interference on these shared network systems, we explore a dedicated scheduling policy. Next, emerging workloads in HPC have diverse resource requirements instead of being CPU-centric. To cater to this, we design an intelligent scheduling agent for multi-resource scheduling in HPC, leveraging an advanced multi-objective reinforcement learning (MORL) algorithm.
Subsequently, we address the issues with existing state encoding approaches in RL-driven scheduling, which either lack critical scheduling information or suffer from poor scalability. To this end, we present an efficient and scalable encoding model. Lastly, the lack of interpretability of RL methods poses a significant challenge to deploying RL-driven scheduling in production systems. In response, we provide a simple, deterministic, and easily understandable model for interpreting RL-driven scheduling. The proposed models and algorithms are evaluated with real job traces from production supercomputers. Experimental results show our schemes can effectively improve job scheduling in terms of both user satisfaction and system utilization.
- Title
- Utilizing Concurrent Data Accesses for Data-Driven and AI Applications
- Creator
- Lu, Xiaoyang
- Date
- 2024
- Description
-
In the evolving landscape of data-driven and AI applications, the imperative for reducing data access delay has never been more critical, especially as these applications increasingly underpin modern daily life. Traditionally, architectural optimizations in computing systems have concentrated on data locality, utilizing temporal and spatial locality to enhance data access performance by maximizing data and data block reuse. However, as poor locality is a common characteristic of data-driven and AI applications, utilizing data access concurrency emerges as a promising avenue to optimize the performance of evolving data-driven and AI application workloads. This dissertation advocates utilizing concurrent data accesses to enhance performance in data-driven and AI applications, addressing a significant research gap in the integration of data concurrency for performance improvement. It introduces a suite of innovative case studies, including a prefetching framework that dynamically adjusts aggressiveness based on data concurrency, a cache partitioning framework that balances application demands with concurrency, a concurrency-aware cache management framework to reduce costly cache misses, a holistic cache management framework that considers both data locality and concurrency to fine-tune decisions, and an accelerator design for sparse matrix multiplication that optimizes adaptive execution flow and incorporates concurrency-aware cache optimizations. Our comprehensive evaluations demonstrate that the implemented concurrency-aware frameworks significantly enhance the performance of data-driven and AI applications by leveraging data access concurrency. Specifically, our prefetch framework boosts performance by 17.3%, our cache partitioning framework surpasses locality-based approaches by 15.5%, and our cache management framework achieves a 10.3% performance increase over prior works.
Furthermore, our holistic cache management framework enhances performance further, achieving a 13.7% speedup. Additionally, our sparse matrix multiplication accelerator outperforms existing accelerators by a factor of 2.1. As optimizing data locality in data-driven and AI applications becomes increasingly challenging, this dissertation demonstrates that utilizing concurrency can still yield significant performance enhancements, offering new insights and actionable examples for the field. This dissertation not only bridges the identified research gap but also establishes a foundation for further exploration of the full potential of concurrency in data-driven and AI applications and architectures, aiming at fulfilling the evolving performance demands of modern and future computing systems.
- Title
- Multimodal Learning and Generation Toward a Multisensory and Creative AI System
- Creator
- Zhu, Ye
- Date
- 2023
- Description
-
We perceive and communicate with the world in a multisensory manner, where different information sources are processed and interpreted in sophisticated ways by separate parts of the human brain to constitute a complex, yet harmonious and unified intelligent system. To endow machines with true intelligence, multimodal machine learning, which incorporates data from various modalities including vision, audio, and text, has become an increasingly popular research area with emerging technical advances in recent years. In the context of multimodal learning, the creativity to generate and synthesize novel and meaningful data is a critical criterion for assessing machine intelligence. As a step towards a multisensory and creative AI system, we study the problem of multimodal generation in this thesis by exploring the field from multiple perspectives. Firstly, we analyze different data modalities in a comprehensive manner by comparing the nature of the data, the semantics, and their corresponding mainstream technical designs. We then investigate three multimodal generation application scenarios, namely text generation from visual data, audio generation from visual data, and visual generation from textual data, with diverse approaches to give an overview of the field. For the direction of text generation from visual data, we study a novel multimodal task in which the model is expected to summarize a given video with textual descriptions, under a challenging condition where the video can only be partially seen. We propose to supplement the missing visual information via a dialogue interaction and introduce a QA-Cooperative network with a dynamic dialogue history update learning mechanism to tackle the challenge. For the direction of audio generation from visual data, we present a new multimodal task that aims to generate music for a given silent dance video clip.
Unlike most existing conditional music generation works, which generate specific types of mono-instrumental sounds using symbolic audio representations (e.g., MIDI) and rely heavily on pre-defined musical synthesizers, we generate dance music in complex styles (e.g., pop and breaking) by employing a Vector-Quantized (VQ) audio representation via our proposed Dance2Music-GAN (D2M-GAN) framework. For the direction of visual generation from textual data, we tackle a key desideratum in conditional synthesis, which is to achieve high correspondence between the conditioning input and the generated output, using a state-of-the-art generative model, the Diffusion Probabilistic Model. While most existing methods learn such relationships implicitly by incorporating the prior into the variational lower bound in model training, we take a different route by explicitly enhancing input-output connections through maximizing their mutual information, which is achieved by our proposed Conditional Discrete Contrastive Diffusion (CDCD) framework. For each direction, we conduct extensive experiments on multiple multimodal datasets and demonstrate that all of our proposed frameworks are able to effectively and substantially improve task performance in their corresponding contexts.