Search results
(1 - 2 of 2)
- Title
- SCALABLE RESOURCE MANAGEMENT IN CLOUD COMPUTING
- Creator
- Sadooghi, Iman
- Date
- 2017, 2017-05
- Description
-
The exponential growth of data and application complexity has brought new challenges in the distributed computing field. Scientific applications are growing more diverse, with workloads ranging from traditional MPI high performance computing (HPC) to fine-grained, loosely coupled many-task computing (MTC). Traditionally, these workloads have been shown to run well on supercomputers and highly tuned HPC clusters. The advent of cloud computing has drawn scientists' attention to exploiting these resources for scientific applications at a potentially lower cost. We investigate the nature of the cloud and its ability to run scientific applications efficiently. Delivering high throughput and low latency for the various types of workloads at large scales has driven us to design and implement new job scheduling and execution systems that are fully distributed and able to run in public clouds. We discuss the design and implementation of a job scheduling and execution system (CloudKon). CloudKon is optimized to exploit cloud resources efficiently through a variety of cloud services (Amazon SQS and DynamoDB) in order to get the best performance and utilization. It also supports various workloads, including MTC and HPC applications, concurrently. To further improve the performance and flexibility of CloudKon, we designed and implemented a fully distributed message queue (Fabriq) that delivers an order of magnitude better performance than the Amazon Simple Queue Service (SQS). Designing Fabriq helped us expand our scheduling system to many other distributed systems, including non-Amazon clouds. With Fabriq as a building block, we were able to design and implement a multipurpose task scheduling and execution framework (Albatross) that can efficiently run various types of workloads at larger scales. Albatross provides data locality and task execution dependency. These features enable Albatross to natively run MapReduce workloads.
We evaluated CloudKon with synthetic MTC workloads, synthetic HPC workloads, and synthetic MapReduce applications on the Amazon AWS cloud with up to 1K instances. Fabriq was also evaluated with synthetic workloads on the Amazon AWS cloud with up to 128 instances. Performance evaluations of Albatross show its ability to outperform Spark and Hadoop in different scenarios.
Ph.D. in Computer Science
- Title
- Towards Trustworthy Multiagent and Machine Learning Systems
- Creator
- Xie, Shangyu
- Date
- 2022
- Description
-
This dissertation aims to systematically research "trustworthy" multiagent and machine learning systems in the context of the Internet of Things (IoT), focusing on two aspects: data privacy and robustness. Specifically, data privacy concerns the protection of the data in a given system, i.e., data identified as sensitive or private cannot be disclosed directly to others; robustness refers to the ability of the system to defend against or mitigate potential attacks and threats, i.e., to maintain its stable and normal operation.
Starting from the smart grid, a representative multiagent system in the IoT, I demonstrate two works on improving data privacy and robustness in different applications, load balancing and energy trading, which integrate secure multiparty computation (SMC) protocols into normal computation to ensure data privacy. More significantly, the schemes can be readily extended to other applications in the IoT, e.g., connected vehicles and mobile sensing systems.
For machine learning, I have studied two main areas, computer vision and natural language processing, addressing robustness and privacy, respectively. I first present a comprehensive robustness evaluation study of DNN-based video recognition systems with two novel attacks in the test and training phases, i.e., adversarial and poisoning attacks. I also propose adaptive defenses to fully evaluate these two attacks, which can further advance the robustness of the system. I also propose a privacy evaluation for language systems and show how to reveal and address the privacy risks in language models. Finally, I demonstrate a private and efficient data computation framework built on cloud computing technology to provide more robust and private IoT systems.