Search results
(1 - 2 of 2)
- Title
- COOPERATIVE BATCH SCHEDULING FOR HPC SYSTEMS
- Creator
- Yang, Xu
- Date
- 2017, 2017-05
- Description
-
The batch scheduler is an important piece of system software serving as the interface between users and HPC systems. Users submit their jobs via a batch scheduling portal, and the batch scheduler makes a scheduling decision for each job based on its request for system resources and on system availability. Jobs submitted to HPC systems are usually parallel applications whose lifecycle consists of multiple running phases, such as computation, communication, and input/output. Running such parallel applications can therefore involve various system resources, such as power, network bandwidth, I/O bandwidth, and storage, and most of these resources are shared among concurrently running jobs. However, today's batch schedulers do not take contention and interference between jobs over these resources into consideration when making scheduling decisions, which has been identified as one of the major culprits behind both system and application performance variability. In this work, we propose a cooperative batch scheduling framework for HPC systems. The motivation of our work is to take important factors about jobs and the system, such as job power, job communication characteristics, and network topology, into account when making orchestrated scheduling decisions, in order to reduce contention between concurrently running jobs and to alleviate performance variability. Our contributions are the design and implementation of several coordinated scheduling models and algorithms that address chronic issues in HPC systems. The proposed models and algorithms have been evaluated by means of simulation using workload traces and application communication traces collected from production HPC systems. Preliminary experimental results show that our models and algorithms can effectively improve application and overall system performance, reduce HPC facilities' operating costs, and alleviate the performance variability caused by job interference.
Ph.D. in Computer Science, May 2017
- Title
- SCALABLE INDEXING AND SEARCHING ON DISTRIBUTED FILE SYSTEMS
- Creator
- Ijagbone, Itua
- Date
- 2016, 2016-05
- Description
-
Scientific applications and other High Performance applications generate large amounts of data. Unstructured data is said to comprise more than 90% of the world's information [IDC2011] and to be growing 60% annually [Grantz2008]. The large amounts of data generated by computation become dispersed across the file system, and problems arise when we need to locate these files for later use. For a small number of files this might not be an issue, but as files grow in both number and size, it becomes difficult to locate them using ordinary tools such as GNU Grep [8], which is commonly used in High Performance Computing and Many-Task Computing environments. This thesis tackles the problem of finding files in a distributed system environment. Our work leverages the FusionFS [1] distributed file system and the Apache Lucene [10] centralized indexing engine as fundamental building blocks. We designed and implemented a distributed search interface within the FusionFS file system that makes both indexing and searching across a distributed system simple. We have evaluated our system on up to 64 nodes, compared it with Grep, Hadoop, and Cloudera, and have shown that FusionFS's indexing capabilities have lower overheads and faster response times.
M.S. in Computer Science, May 2016