Resilience Enhancement of Critical Cyber-Physical Systems with Advanced Network Control
Description
Critical infrastructures are the systems whose failures would have a debilitating impact on national security, the economy, public health or safety, or any combination of those matters. It is important to improve those systems' resilience, which is the ability to reduce the magnitude and/or duration of disruptive events. However, today’s critical infrastructures, such as electrical power systems and transportation systems, are deploying advanced control applications of increasing scale and complexity. This drives the migration of their underlying communication infrastructures from simple, proprietary networks to off-the-shelf network technologies (e.g., IP-based protocols and standards) that can handle intensive and heterogeneous traffic flows. On one hand, this migration gives both the academic and industry communities an opportunity to develop novel ideas on top of existing schemes; on the other hand, it exposes more vulnerabilities to cyber-attacks. Moreover, since a large-scale power system may lease networks from Internet service providers (which are critical infrastructures themselves), power and communication infrastructures are interdependent: power transmission control requires message delivery services, while the network devices rely on the power supply. These problems raise research challenges in improving the resilience of critical cyber-physical systems.
In this thesis, we focus on enhancing the resilience of critical infrastructures from the perspective of the communication network. The application domains are power and transportation systems.
For power systems, we first apply advanced network control techniques (i.e., software-defined networking (SDN) and the fibbing control scheme) in the transmission grid communication network to improve the grid status restoration process under network failures and cyber-attacks. We develop a unified system model that contains both the transmission grid monitoring system (i.e., the phasor measurement unit (PMU) network) and the communication network, and formulate a mixed-integer linear programming (MILP) problem that minimizes the recovery time of system observability subject to power-domain and communication-domain constraints. We evaluate the system performance with respect to recovery plan generation and installation using IEEE standard systems. However, an advanced network-based control scheme can also introduce problems, since the network devices themselves require a power supply. Thus, we investigate the interdependency between the power grid and the communication network and its impact on system resilience. We conduct a survey that classifies existing research along two dimensions: objectives (i.e., failure analysis, vulnerability analysis, failure mitigation, and failure recovery) and methodologies (i.e., analytical solutions, co-simulation, and empirical studies). We also identify the limitations of existing work and propose potential research opportunities in this demanding area. Lastly, building on the survey, we study fast power distribution system restoration under interdependency constraints. When a natural disaster happens, both power and communication components might be damaged. Furthermore, since each depends on the other's service to function correctly, failures may propagate to hardware and software that were not affected initially. In this work, we focus on the recovery stage, in which the failed components in the system have already been fully detected and isolated. 
We construct a mathematical model of the co-existing power and communication system and use optimization techniques to produce a crew dispatch plan that restores power as fast as possible by coordinating the damage repair, switch operation, and communication supply processes. We evaluate the restoration efficiency on the IEEE standard system using both analytical evaluation and discrete-event simulation.
For the second application domain, the railway transportation system, we focus on evaluating the resilience of its communication system, which exchanges control and monitoring messages with both the on-board driver cabin and the remote control center. We use advanced discrete-event simulation techniques to build a high-fidelity model of the network, which makes the evaluation more concrete and realistic. For the Ethernet-based on-board train communication network (TCN), we develop a parallel simulation platform according to the IEC standard and use it to conduct a case study of a double-tagging VLAN attack on this control network. Another component of the railway communication system is the train-to-ground network, which enables communication between the driving system on the train and the control center that issues commands such as movement authority messages. We customize the ns-3 network simulator to model the LTE-based protocol with a real high-speed train trace dataset from public sources. We evaluate the resilience of the cellular network specifically during the handover process, which happens when the train travels from one base station to another. Because of the train's high speed, the handover success rate suffers, and many protocol-based solutions have been proposed in this research area. We use the high-fidelity simulation model to evaluate some of them and compare their strengths and weaknesses.