Search results
(1 - 20 of 64)
Pages
- Title
- MULTI-DIMENSIONAL BATCH SCHEDULING FRAMEWORK FOR HIGH-END SUPERCOMPUTERS
- Creator
- Zhou, Zhou
- Date
- 2016, 2016-05
- Description
- In the field of high performance computing (HPC), batch schedulers play a critical role: they determine when and how to process the various jobs waiting for service. Conventional batch schedulers allocate user jobs solely based on their CPU footprints. However, a given user job requires many different resources during its execution, such as power, network, and I/O bandwidth. Today's job schedulers rarely take these resource requirements into account, and they sometimes turn out to be the Achilles' heel of system-wide performance. In this research, we propose a multi-dimensional batch scheduling framework for high-end supercomputers. Our research aims to treat these common but often ignored resources (e.g., power, network, bandwidth) as schedulable resources and to transform each scheduling decision into a multi-objective optimization process. Our main contributions consist of a set of scheduling models and policies aimed at addressing these issues in batch scheduling for large-scale production supercomputers. We evaluate our design by means of trace-based simulations using real workload and performance traces from production systems. Experimental results show our methods can effectively improve batch scheduling in terms of user satisfaction, system performance, and operating cost.
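The multi-dimensional view can be illustrated with a toy scoring function. The sketch below is my own illustration, not the dissertation's model; the Job fields, the resource dimensions, and the weights are assumptions chosen only to show how candidate jobs might be compared across several schedulable resources at once.

```python
from dataclasses import dataclass

@dataclass
class Job:
    nodes: int        # requested compute nodes
    power_kw: float   # estimated power draw
    net_share: float  # fraction of interconnect bandwidth needed

# Illustrative weights; a real scheduler would derive these from site policy.
WEIGHTS = {"nodes": 0.5, "power": 0.3, "network": 0.2}

def multi_dimensional_score(job: Job, free_nodes: int,
                            power_budget_kw: float) -> float:
    """Lower is better: how heavily a job loads each schedulable dimension."""
    return (WEIGHTS["nodes"] * job.nodes / max(free_nodes, 1)
            + WEIGHTS["power"] * job.power_kw / power_budget_kw
            + WEIGHTS["network"] * job.net_share)

queue = [Job(512, 80.0, 0.10), Job(2048, 260.0, 0.35), Job(128, 15.0, 0.05)]
# Compare candidates by their combined load across all three dimensions.
best = min(queue, key=lambda j: multi_dimensional_score(j, 4096, 1000.0))
print(best)
```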
Ph.D. in Computer Science, May 2016
- Title
- BIG DATA SYSTEM INFRASTRUCTURE AT EXTREME SCALES
- Creator
- Zhao, Dongfang
- Date
- 2015, 2015-07
- Description
- Rapid advances in digital sensors, networks, storage, and computation, along with their availability at low cost, are leading to the creation of huge collections of data, dubbed Big Data. This data has the potential for enabling new insights that can change the way business, science, and governments deliver services to their consumers and can impact society as a whole. This has led to the emergence of the Big Data Computing paradigm, focusing on sensing, collection, storage, management, and analysis of data from a variety of sources to enable new value and insights. To realize the full potential of Big Data Computing, we need to address several challenges and develop suitable conceptual and technological solutions for dealing with them. Today's and tomorrow's extreme-scale computing systems, such as the world's fastest supercomputers, are generating orders of magnitude more data across a variety of scientific computing applications from all disciplines. This dissertation addresses several big data challenges at extreme scales. First, we quantitatively studied through simulations the predicted performance of existing systems at future scales (for example, exascale: 10^18 ops). Simulation results suggested that current systems would likely fail to deliver the needed performance at exascale. Then, we proposed a new system architecture and implemented a prototype that was evaluated on tens of thousands of nodes, on par with the scale of today's largest supercomputers. Micro-benchmarks and real-world applications demonstrated the effectiveness of the proposed architecture: the prototype achieved up to two orders of magnitude higher data movement rates than existing approaches. Moreover, the system prototype incorporated features that were not well supported in conventional systems, such as distributed metadata management, distributed caching, lightweight provenance, transparent compression, acceleration through GPU encoding, and parallel serialization. Toward exploring the proposed architecture at million-node scales, simulations were conducted and evaluated with a variety of workloads, showing near-linear scalability and orders of magnitude better performance than today's state-of-the-art storage systems.
Ph.D. in Computer Science, July 2015
- Title
- AUTOMATIC SUMMARIZATION OF CLINICAL ABSTRACTS FOR EVIDENCE-BASED MEDICINE
- Creator
- Summerscales, Rodney L.
- Date
- 2013, 2013-12
- Description
- The practice of evidence-based medicine (EBM) encourages health professionals to make informed treatment decisions based on a careful analysis of current research. However, after caring for their patients, medical practitioners have little time to spend reading even a small fraction of the rapidly growing body of medical research literature. As a result, physicians must often rely on potentially outdated knowledge acquired in medical school. Systematic reviews of the literature exist for specific clinical questions, but these must be manually created and updated as new research is published. Abstracts from well-written clinical research papers contain key information regarding the design and results of clinical trials. Unfortunately, the free-text nature of abstracts makes them difficult for computer systems to use and time-consuming for humans to read. I present a software system that reads abstracts from randomized controlled trials, extracts key clinical entities, computes the effectiveness of the proposed interventions, and compiles this information into machine-readable and human-readable summaries. This system uses machine learning and natural language processing techniques to extract the key clinical information describing the trial and its results. It extracts the names and sizes of treatment groups, population demographics, outcomes measured in the trial, and outcome results for each treatment group. Using the extracted outcome measurements, the system calculates key summary measures used by physicians when evaluating the effectiveness of treatments. It computes absolute risk reduction (ARR) and number needed to treat (NNT) values complete with confidence intervals. The extracted information and computed statistics are automatically compiled into XML and HTML summaries that describe the details and results of the trial. Extracting the information needed to calculate these measures is not trivial. While there have been various approaches to generating summaries of medical research, prior work has mostly focused on extracting trial characteristics (e.g., population demographics, intervention/outcome information). No one has attempted to extract all of the information needed, nor has anyone attempted to solve many of the tasks needed to reliably calculate the summary statistics.
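For reference, the two summary measures named above can be computed directly from extracted group sizes and event counts. The sketch below uses the standard epidemiological formulas (with a Wald-style 95% confidence interval); it is my own illustration, not the dissertation's code, and the example counts are invented.

```python
import math

def arr_nnt(control_events: int, control_n: int,
            treated_events: int, treated_n: int):
    """Absolute risk reduction, number needed to treat, and a 95% CI for ARR."""
    cer = control_events / control_n      # control event rate
    eer = treated_events / treated_n      # experimental event rate
    arr = cer - eer
    nnt = math.inf if arr == 0 else 1.0 / arr
    # Wald standard error of a risk difference.
    se = math.sqrt(cer * (1 - cer) / control_n + eer * (1 - eer) / treated_n)
    ci = (arr - 1.96 * se, arr + 1.96 * se)
    return arr, nnt, ci

# Hypothetical trial: 30/100 events in the control group, 20/100 with treatment.
print(arr_nnt(30, 100, 20, 100))   # ARR = 0.10, NNT = 10
```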
Ph.D. in Computer Science, December 2013
- Title
- COVERAGE AND CONNECTIVITY IN WIRELESS NETWORKS
- Creator
- Xu, Xiaohua
- Date
- 2012-04-25, 2012-05
- Description
- The limited energy resources, instability, and lack of central control in wireless networks motivate the study of the connected dominating set (CDS), which serves as a routing backbone to support service discovery, area monitoring, and broadcasting. The construction of a CDS involves both coverage and connectivity. We first study several problems related to coverage. Given a set of nodes and targets in the plane, the problem Minimum Wireless Cover (MWC) seeks the fewest nodes to cover the targets. If all nodes are associated with positive prices, the problem Cheapest Wireless Cover (CWC) seeks a cheapest set of nodes to cover the targets. If all nodes have bounded lifetimes, the problem Max-Life Wireless Cover (MLWC) seeks a wireless coverage schedule of maximum lifetime subject to the lifetime constraints of individual nodes. We present a polynomial-time approximation scheme (PTAS) for MWC, and two randomized approximation algorithms for CWC and MLWC, respectively. Given a node-weighted graph, the problem Minimum-Weighted Dominating Set (MWDS) is to find a minimum-weighted vertex subset such that every vertex is either contained in this subset or has a neighbor in it. We propose a (4+ε)-approximation algorithm for MWDS in unit disk graphs. Meanwhile, for the connectivity part, given a node-weighted connected graph and a subset of terminals, the problem Node-Weighted Steiner Tree (NWST) seeks a lightest tree connecting the given set of terminals. We present three approximation algorithms for NWST restricted to unit disk graphs (UDGs). This dissertation also explores applications of the CDS and develops efficient algorithms for applications such as real-time aggregation scheduling in wireless networks. Given a set of periodic aggregation queries, each with its own period and a subset Si of source nodes containing the data, we first propose a family of efficient and effective real-time scheduling protocols that can answer every job of each query task within a relative delay under resource constraints, by addressing the following tightly coupled tasks: routing, transmission plan construction, node activity scheduling, and packet scheduling. Based on our protocol design, we further propose schedulability test schemes to efficiently and effectively test whether, for a set of queries, each query job can be finished within a finite delay. We also conduct extensive simulations to validate the proposed protocol and evaluate its practical performance. The simulations corroborate our theoretical analysis.
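For intuition about the dominating-set flavor of these problems, here is a simple greedy baseline for weighted dominating sets. It is my own sketch of the classic coverage-per-weight greedy heuristic, shown only for orientation; it is not the (4+ε)-approximation for unit disk graphs proposed in the dissertation, and the toy graph is invented.

```python
def greedy_weighted_dominating_set(adj: dict, weight: dict) -> set:
    """Greedy baseline for Minimum-Weighted Dominating Set.

    adj[v] is the set of v's neighbors; weight[v] > 0. A chosen node
    dominates itself and all its neighbors.
    """
    uncovered = set(adj)
    chosen = set()
    while uncovered:
        # Pick the node with the best newly-covered-vertices-per-weight ratio.
        v = max(adj, key=lambda u: len(({u} | adj[u]) & uncovered) / weight[u])
        chosen.add(v)
        uncovered -= {v} | adj[v]
    return chosen

graph = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
print(greedy_weighted_dominating_set(graph, {v: 1.0 for v in graph}))  # {3}
```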
Ph.D. in Computer Science, May 2012
- Title
- AN INTEGRATED DATA ACCESS SYSTEM FOR BIG COMPUTING
- Creator
- Yang, Xi
- Date
- 2016, 2016-07
- Description
- Big data has entered every corner of science and engineering and has become a part of human society. Scientific research and commercial practice increasingly depend on the combined power of high-performance computing (HPC) and high-performance data analytics. Due to its importance, several commercial computing environments have been developed in recent years to support big data applications. MapReduce is a popular mainstream paradigm for large-scale data analytics. MapReduce-based data analytic tools commonly rely on underlying MapReduce file systems (MRFS), such as the Hadoop Distributed File System (HDFS), to manage massive amounts of data. At the same time, conventional scientific applications usually run in HPC environments, such as those based on the Message Passing Interface (MPI), and their data are kept in parallel file systems (PFS), such as Lustre and GPFS, for high-speed computing and data consistency. As scientific applications become data-intensive and big data applications become computing-hungry, there is a surging interest in and need to integrate HPC power and data processing power to support HPC on big data, the so-called big computing. A fundamental issue of big computing is the integration of data management and interoperability between the conventional HPC ecosystem and the newly emerged data processing/analytics ecosystem. However, data sharing between PFS and MRFS is currently limited, due to semantic mismatches, the lack of communication middleware, and divergent design philosophies and goals. Challenges also exist in cross-platform task scheduling and parallelism. At the application layer, the data model mismatch between the raw data kept on file systems and an application's data management software impedes cross-platform data processing as well. To support cross-platform integration, we propose and develop the Integrated Data Access System (IDAS) for big computing. IDAS extends the accessibility of programming models and integrates the HPC environment with the MapReduce/Hadoop data processing environment. Under IDAS, MPI applications and MapReduce applications can share and exchange data across PFS and MRFS transparently and efficiently. Through this sharing and exchange, MPI and MapReduce applications can collaboratively provide both high-performance computing and data processing power for a given application. IDAS achieves its goal in several steps. First, IDAS enhances MPI-IO so that MPI-based applications can access data stored in HDFS efficiently; for instance, we have enhanced HDFS to transparently support N-to-1 file writes for better write concurrency. Second, IDAS enhances the Hadoop framework to enable MapReduce-based applications to process data that resides on PFS transparently. We have carefully chosen the term "enhance" here: MPI-based applications can not only access data stored on HDFS but also continue to access data stored on PFS, and the same holds for MapReduce-based applications. Through these enhancements, we achieve seamless data sharing. In addition, we have integrated data access with several application tools; in particular, we have integrated image plotting, query, and data subsetting within one application for Earth Science data analysis. Many data centers prefer erasure coding rather than triplication to achieve data durability, which trades data availability for lower storage cost. To this end, we have also investigated performance optimization of the erasure-coded Hadoop system, to further enhance the Hadoop side of IDAS.
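The storage trade-off mentioned above is easy to quantify. The snippet below is my own illustration, using 3-way replication and the RS(6,3) Reed-Solomon layout commonly used for HDFS erasure coding; it is not part of IDAS.

```python
def storage_overhead(data_blocks: int, redundant_blocks: int) -> float:
    """Raw bytes stored per byte of user data."""
    return (data_blocks + redundant_blocks) / data_blocks

# 3-way replication: 1 data copy + 2 extra copies, tolerates losing 2 copies.
print(storage_overhead(1, 2))   # 3.0x
# Reed-Solomon RS(6,3): 6 data + 3 parity blocks, tolerates losing any 3 blocks.
print(storage_overhead(6, 3))   # 1.5x
```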
Ph.D. in Computer Science, July 2016
- Title
- QUALITY-OF-SERVICE AWARE SCHEDULING AND DEFECT TOLERANCE IN REAL-TIME EMBEDDED SYSTEMS
- Creator
- Li, Zheng
- Date
- 2015, 2015-05
- Description
- For real-time embedded systems, such as control systems used in the medical, automotive, and avionics industries, tasks deployed on such systems often have stringent real-time, reliability, and energy consumption constraints. How to schedule real-time tasks under various QoS constraints is a challenging issue that has drawn attention from the research community for decades. In this thesis, we study task execution strategies that not only minimize system energy consumption but also guarantee task deadlines and reliability satisfaction. We first consider the scenario in which all tasks are of the same criticality. For this case, two task execution strategies, a checkpointing-based and a task re-execution-based strategy, are developed. Second, considering the scenario in which tasks are of different criticalities, a heuristic-search-based energy minimization strategy is also proposed. When tasks are of different criticalities, a commonly used approach to guaranteeing high-criticality task deadlines is to remove low-criticality tasks whenever the system is overloaded. With such an approach, the QoS provided to low-criticality tasks is rather poor: they suffer a high deadline miss rate and reduced accumulated execution time. To overcome this shortcoming, we develop a time-reservation-based scheduling algorithm and a two-step optimization algorithm to meet high-criticality task deadlines while minimizing the low-criticality task deadline miss rate and maximizing their accumulated execution time, respectively. As many-core techniques mature, many real-time embedded systems are built upon many-core platforms. However, many-core platforms have a high wear-out failure rate. Hence, the last issue addressed in the thesis is how to replace defective cores on many-core platforms so that deployed applications' real-time properties can be maintained. We develop an offline and an online application-aware system reconfiguration strategy to minimize the impact of physical-layer changes on deployed real-time applications. All the developed approaches are evaluated through extensive simulations. The results indicate that the developed approaches are more effective in addressing the identified problems than existing ones in the literature.
Ph.D. in Computer Science, May 2015
- Title
- COOPERATIVE BATCH SCHEDULING FOR HPC SYSTEMS
- Creator
- Yang, Xu
- Date
- 2017, 2017-05
- Description
- The batch scheduler is an important piece of system software, serving as the interface between users and HPC systems. Users submit their jobs via a batch scheduling portal, and the batch scheduler makes a scheduling decision for each job based on its request for system resources and on system availability. Jobs submitted to HPC systems are usually parallel applications whose lifecycles consist of multiple running phases, such as computation, communication, and input/output. Running such parallel applications can therefore involve various system resources, such as power, network bandwidth, I/O bandwidth, and storage, and most of these resources are shared among concurrently running jobs. However, today's batch schedulers do not take the contention and interference between jobs over these resources into consideration when making scheduling decisions, which has been identified as one of the major culprits behind both system and application performance variability. In this work, we propose a cooperative batch scheduling framework for HPC systems. The motivation of our work is to take into account important factors about jobs and the system, such as job power, job communication characteristics, and network topology, in order to make orchestrated scheduling decisions that reduce contention between concurrently running jobs and alleviate performance variability. Our contributions are the design and implementation of several coordinated scheduling models and algorithms for addressing some chronic issues in HPC systems. The proposed models and algorithms have been evaluated by means of simulation using workload traces and application communication traces collected from production HPC systems. Preliminary experimental results show that our models and algorithms can effectively improve application and overall system performance, reduce HPC facilities' operating cost, and alleviate the performance variability caused by job interference.
Ph.D. in Computer Science, May 2017
- Title
- POINT CLOUD FUSION BETWEEN AERIAL AND VEHICLE LIDAR
- Creator
- Guangyao, Ma
- Date
- 2015, 2015-05
- Description
- Because of the increasing demand for precision in 3-D mapping, LiDAR is now used to build more accurate maps. Although great progress has been made in this area, some problems remain. One of them, which I address in this thesis, is that we have two point sources, aerial LiDAR data (points collected by airplane) and vehicle LiDAR data (points collected by vehicle), which have different densities and cannot be merged well. This process, fusion, is similar to registration; the difference is that the points to be merged are generated by different devices and share only a few point pairs in the same regions. For example, aerial LiDAR data has a higher point density on roofs and the ground but a lower density on walls, while vehicle LiDAR data has many points on walls and the ground. Minimizing the difference between these two point sets is beneficial because the process is necessary for modeling, registration, and other downstream tasks. My thesis therefore aims to minimize the difference between these two data sources, a procedure of fusion. The main idea is to read the LiDAR data into point cloud data structures, resample their densities to a similar level, and select several corresponding region pairs (which we call chunks, e.g., median strips and road boundaries) with sufficient interest points to perform the fusion. Interest points are points with one or more distinctive features. The algorithm used to implement the fusion is ICP (Iterative Closest Point). Unlike point cloud registration, research in the fusion area is rare, so existing algorithms are not well suited to this project. Because the original ICP algorithm does not work well here, I derive new algorithms during my research, modifying both the update equation and the objective function. In this thesis, PCL (Point Cloud Library) is mainly used to implement basic functions, such as finding the nearest points and sampling point clouds, and the Eigen library is used to write the core functions (e.g., the modified Iterative Closest Point algorithm). I also use the libLAS library to implement the I/O operations and MeshLab to visualize the point cloud after modification.
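For orientation, a bare-bones point-to-point ICP iteration looks like the numpy sketch below (my own rendering of the textbook algorithm, not the modified ICP developed in the thesis).

```python
import numpy as np
from scipy.spatial import cKDTree

def icp(source: np.ndarray, target: np.ndarray, iters: int = 20):
    """Align `source` (N x 3) to `target` (M x 3) with rigid transforms."""
    src = source.copy()
    tree = cKDTree(target)
    R_total, t_total = np.eye(3), np.zeros(3)
    for _ in range(iters):
        # 1. Correspondences: nearest target point for every source point.
        _, idx = tree.query(src)
        matched = target[idx]
        # 2. Best rigid transform via SVD (Kabsch).
        mu_s, mu_t = src.mean(axis=0), matched.mean(axis=0)
        H = (src - mu_s).T @ (matched - mu_t)
        U, _, Vt = np.linalg.svd(H)
        R = Vt.T @ U.T
        if np.linalg.det(R) < 0:            # avoid reflections
            Vt[-1] *= -1
            R = Vt.T @ U.T
        t = mu_t - R @ mu_s
        # 3. Apply the increment and accumulate the total transform.
        src = src @ R.T + t
        R_total, t_total = R @ R_total, R @ t_total + t
    return R_total, t_total, src
```

In the thesis setting the correspondences come from matched chunks rather than whole clouds, and both the update equation and the objective are modified; the sketch shows only the standard point-to-point variant.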
M.S. in Computer Science, May 2015
- Title
- SYSTEM SUPPORT FOR RESILIENCE IN LARGE-SCALE PARALLEL SYSTEMS: FROM CHECKPOINTING TO MAPREDUCE
- Creator
- Jin, Hui
- Date
- 2012-05-31, 2012-05
- Description
- High-Performance Computing (HPC) has passed the petascale mark and is moving forward to exascale. As the system ensemble size continues to grow, the occurrence of failures becomes the norm rather than the exception during the execution of parallel applications. Resilience is widely recognized as one of the key obstacles towards exascale computing. Checkpointing is currently the de facto fault-tolerance mechanism for parallel applications. However, parallel checkpointing at scale usually generates bursts of concurrent I/O requests, imposes considerable overhead on I/O subsystems, and limits the scalability of parallel applications. Although doubt about the feasibility of checkpointing continues to increase, there is still no promising alternative on the horizon to replace it. MapReduce is a new programming model for massive data processing. It has demonstrated compelling potential in reshaping the landscape of HPC from various perspectives. The resilience of MapReduce applications and its potential in benefiting HPC fault tolerance are active research topics that require extensive investigation. This thesis aims to build a systematic framework to support resilience in large-scale parallel systems. We address the identified checkpointing performance issue through a three-fold approach: reduce the I/O overhead, exploit storage alternatives, and determine the optimal checkpointing frequency. This three-fold approach is achieved with three different mechanisms, namely system coordination and scheduling, utilization of the MapReduce framework, and stochastic modeling. To deal with the increasing concerns about MapReduce resilience, we also strive to improve the reliability of MapReduce applications and investigate the tradeoffs in programming model selection (e.g., MPI vs. MapReduce) from the perspective of resilience. This thesis provides a thorough study and a practical solution for the outstanding resilience problem of large-scale MPI-based HPC applications and beyond. It makes a noticeable contribution to the state of the art and opens a new research direction for many to follow.
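For context on checkpointing frequency, a widely used first-order baseline is the Young/Daly approximation shown below. This is a generic sketch for orientation only, not the stochastic model developed in the thesis, and the example numbers are invented.

```python
import math

def young_daly_interval(checkpoint_cost_s: float, mtbf_s: float) -> float:
    """First-order optimal compute time between checkpoints (Young/Daly)."""
    return math.sqrt(2.0 * checkpoint_cost_s * mtbf_s)

# Example: a 5-minute checkpoint on a system with a 24-hour MTBF.
print(young_daly_interval(300.0, 24 * 3600) / 3600, "hours")   # 2.0 hours
```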
Ph.D. in Computer Science, May 2012
- Title
- WIRELESS SCHEDULING IN MULTI-CHANNEL MULTI-RADIO MULTIHOP WIRELESS NETWORKS
- Creator
- Wang, Zhu
- Date
- 2014, 2014-07
- Description
- Maximum multiflow (MMF) and maximum concurrent multiflow (MCMF) in multi-channel multi-radio (MC-MR) wireless networks have been well studied in the literature. They are NP-hard even in single-channel single-radio (SC-SR) wireless networks when all nodes have uniform (and fixed) interference radii and the positions of all nodes are available. This dissertation studies MMF and MCMF in multi-channel multi-radio multihop wireless networks under the protocol interference model in the bidirectional mode or the unidirectional mode. We introduce a fine-grained network representation of multi-channel multi-radio multihop wireless networks and present some essential topological properties of its associated conflict graph. It is proved that if the number of channels is bounded by a constant (which is typical in practical networks), both MMF and MCMF admit a polynomial-time approximation scheme under the protocol interference model in the bidirectional mode or the unidirectional mode with some additional mild conditions. However, the running time of these algorithms grows quickly with the number of radios per node (at least in the sixth order) and the number of channels (at least in the cubic order). Such poor scalability stems intrinsically from the exploding size of the fine-grained network representation upon which those algorithms are built. In Chapter 2 of this dissertation, we introduce a new structure, termed the concise conflict graph, on the node-level links directly. This structure succinctly captures the essential advantage of multiple radios and multiple channels. By exploring and exploiting the rich structural properties of concise conflict graphs, we are able to develop fast and scalable link scheduling algorithms for either minimizing the communication latency or maximizing the (concurrent) multiflow. These algorithms have running time growing linearly in both the number of radios per node and the number of channels, while not sacrificing the approximation bounds. While the algorithms we develop in Chapter 2 admit a polynomial-time approximation scheme (PTAS) when the number of channels is bounded by a constant, such a PTAS is quite infeasible in practice. Other than the PTAS, all other known approximation algorithms, in both SC-SR and MC-MR wireless networks, resort to solving a polynomial-sized linear program (LP) exactly; the scalability of their running time is fundamentally limited by general-purpose LP solvers. In Chapter 3 of this dissertation, we first introduce the concepts of interference costs and prices of a path and explore their relations with the maximum (concurrent) multiflow. We then develop purely combinatorial approximation algorithms that compute a sequence of least-interference-cost routing paths along which the flows are routed. These algorithms are faster and simpler, and achieve nearly the same approximation bounds known in the literature. This dissertation also explores the stability analysis of two link scheduling schemes in MC-MR wireless networks under the protocol interference model in the bidirectional mode or the unidirectional mode. Longest-queue-first (LQF) link scheduling is a greedy link scheduling scheme for multihop wireless networks. Its stability performance in single-channel single-radio (SC-SR) wireless networks has been well studied recently, but its stability performance in multi-channel multi-radio (MC-MR) wireless networks is largely under-explored. We present a closed-form stability subregion of LQF scheduling in MC-MR wireless networks, which is within a constant factor of the network stability region. We also obtain constant lower bounds on the efficiency ratio of LQF scheduling in MC-MR wireless networks under the protocol interference model in the bidirectional or unidirectional mode. Static greedy link schedulings have a much simpler implementation than dynamic greedy link schedulings such as LQF; however, their stability performance in MC-MR wireless networks is also largely under-explored. In this dissertation, we present a closed-form stability subregion of a static greedy link scheduling in MC-MR wireless networks under the protocol interference model in the bidirectional mode. By adopting some special static link orderings, the stability subregion is within a constant factor of the stable capacity region of the network. We also obtain constant lower bounds on the throughput efficiency ratios of static greedy link schedulings under these special static link orderings.
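To make the greedy flavor concrete, here is a minimal LQF-style scheduler over a generic conflict graph. It is my own single-channel simplification for illustration only; the MC-MR setting analyzed in the dissertation additionally assigns radios and channels, and the toy queues and conflicts are invented.

```python
def lqf_schedule(queues: dict, conflicts: dict) -> set:
    """Pick a conflict-free set of links, longest queue first.

    queues[link]    -> current backlog of the link
    conflicts[link] -> set of links that interfere with it
    """
    scheduled = set()
    for link in sorted(queues, key=queues.get, reverse=True):
        if queues[link] > 0 and not (conflicts[link] & scheduled):
            scheduled.add(link)
    return scheduled

q = {"a": 7, "b": 5, "c": 3, "d": 1}
c = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}, "d": set()}
print(lqf_schedule(q, c))   # {'a', 'c', 'd'}: 'b' conflicts with the longer 'a'
```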
Ph.D. in Computer Science, July 2014
- Title
- COMPRESSIVE SENSING AND RECONSTRUCTION : THEORY AND APPLICATIONS
- Creator
- Krishnamurthy, Ritvik Nadig
- Date
- 2014, 2014-07
- Description
- Conventional approaches to the acquisition and reconstruction of images from the frequency domain strictly follow the Nyquist sampling theorem. The principle states that the sampling frequency required for complete reconstruction of a signal is at least twice the maximum frequency of the original signal. This dissertation studies an emerging theory called Compressive Sensing, or Compressive Sampling, which goes against this conventional wisdom: theoretically, it is possible to reconstruct images or signals accurately from a number of samples far smaller than the Nyquist rate requires. Compressive Sensing has proven to have broader implications than merely reducing the sampling frequency of a signal, making possible, for example, new methods for acquiring analog data in digital form using fewer sensors and image acquisition using much smaller sensor arrays. This novel theory combines sampling and compression, thereby reducing the data acquisition resources required, such as the number of sensors, the storage memory for collected samples, and the maximum operating frequency. This dissertation presents some insights into the reconstruction of grayscale images and audio signals using the OMP and CoSaMP algorithms. It also delves into some of the key mathematical insights underlying this new theory and explains some of the interactions between Compressive Sensing and related fields such as statistics, coding theory, and theoretical computer science.
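As a concrete illustration of sparse recovery, here is a compact Orthogonal Matching Pursuit (OMP) routine that recovers a k-sparse vector from far fewer measurements than its length. This is a generic numpy sketch, not the thesis implementation; the measurement matrix and test signal are invented.

```python
import numpy as np

def omp(A: np.ndarray, y: np.ndarray, k: int) -> np.ndarray:
    """Recover a k-sparse x from y = A x by Orthogonal Matching Pursuit."""
    residual, support = y.copy(), []
    x = np.zeros(A.shape[1])
    for _ in range(k):
        # Greedily pick the column most correlated with the residual.
        j = int(np.argmax(np.abs(A.T @ residual)))
        if j not in support:
            support.append(j)
        # Re-fit the signal on the chosen support by least squares.
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
    x[support] = coef
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((40, 128))            # 40 measurements, length-128 signal
x_true = np.zeros(128)
x_true[[5, 40, 90]] = [1.0, -2.0, 0.5]
print(np.allclose(omp(A, A @ x_true, 3), x_true, atol=1e-6))  # typically True
```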
M.S. in Computer Engineering, July 2014
- Title
- APPLICATION-AWARE OPTIMIZATIONS FOR BIG DATA ACCESS
- Creator
- Yin, Yanlong
- Date
- 2014, 2014-07
- Description
- Many High-Performance Computing (HPC) applications spend a significant portion of their execution time accessing data from files, and they are becoming increasingly data-intensive. For them, I/O performance is a significant bottleneck, leading to wasted CPU cycles and correspondingly wasted energy. Various optimization techniques exist to improve data access performance. However, existing general-purpose optimization techniques are not able to satisfy diverse applications' demands, while application-specific optimization is usually a difficult task due to the complexity involved in understanding the parallel I/O system and the applications' I/O behaviors. To address these challenges, this thesis proposes an application-aware data access optimization framework and argues that it is feasible and useful to utilize applications' characteristics to improve the performance and efficiency of the parallel I/O system. Under this framework, an optimization may consist of several basic but challenging steps, including capturing the application's characteristics, identifying the causes of I/O performance degradation, and delivering optimization solutions. To make these steps easier, we design and implement the IOSIG toolkit as essential system support for the default parallel I/O system. The toolkit is able to profile applications' I/O behaviors and then generate comprehensive characteristics through trace analysis. With the help of IOSIG, we design several optimization techniques for data layout optimization, data reorganization, and I/O scheduling. The proposed framework has significant potential to boost application-aware I/O optimization, and the results show that the proposed optimization techniques can significantly improve data access performance.
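To give a flavor of the characteristics such trace analysis can produce, the sketch below classifies a stream of I/O requests as sequential, strided, or random. The (offset, size) tuple format is a hypothetical stand-in used only for illustration, not the IOSIG trace format.

```python
def classify_pattern(requests):
    """Classify a list of hypothetical (offset, size) I/O requests."""
    gaps = [requests[i + 1][0] - (requests[i][0] + requests[i][1])
            for i in range(len(requests) - 1)]
    if all(g == 0 for g in gaps):
        return "sequential"
    if len(set(gaps)) == 1:
        return "strided"          # constant non-zero stride between requests
    return "random"

print(classify_pattern([(0, 4096), (4096, 4096), (8192, 4096)]))      # sequential
print(classify_pattern([(0, 4096), (65536, 4096), (131072, 4096)]))   # strided
```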
Ph.D. in Computer Science, July 2014
- Title
- THE EUML-ARC PROGRAMMING MODEL
- Creator
- Marth, Kevin
- Date
- 2014, 2014-07
- Description
- The EUML-ARC programming model shows that the increasing parallelism available on multi-core processors requires evolutionary (not revolutionary) changes in software design. The EUML-ARC programming model combines and extends software technology available even before the introduction of multi-core processors to provide software engineers with the ability to specify software systems that expose abstract, platform-independent parallelism. The EUML-ARC programming model is a synthesis of Executable UML, the Actor model, role-based modeling, split objects, and aspect-based coordination. Computation in the EUML-ARC programming model is structured in terms of semantic entities composed of actor-based agents whose behaviors are expressed in hierarchical state machines. An entity is composed of a base intrinsic agent and multiple extrinsic role agents, all with dedicated conceptual threads of control. Entities interact through their role agents in the context of feature-oriented collaborations orchestrated by coordinator agents. The conceptual threads of control associated with the agents in a software system expose both intra-entity and inter-entity parallelism that is mapped by the EUML-ARC model compiler to the hardware threads available on the target multi-core processor. The hardware and software efficiency achieved with representative benchmark systems shows that the EUML-ARC programming model and its compiler can exploit multi-core parallelism while providing a productive model-driven approach to software development.
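A stripped-down illustration of the agent idea follows: one thread per agent, each owning a mailbox and a state-machine handler. This is my own generic Python sketch with a flat (not hierarchical) state machine and invented states; it is not Executable UML, the EUML-ARC model compiler, or the dissertation's agent structure.

```python
import queue
import threading

class Agent(threading.Thread):
    """Minimal actor: a mailbox plus a state machine driven by messages."""

    def __init__(self, name, transitions, start_state):
        super().__init__(daemon=True)
        self.name, self.state = name, start_state
        self.transitions = transitions          # {(state, event): next_state}
        self.mailbox = queue.Queue()

    def send(self, event):
        self.mailbox.put(event)

    def run(self):
        while True:
            event = self.mailbox.get()
            if event is None:                    # shutdown sentinel
                break
            self.state = self.transitions.get((self.state, event), self.state)
            print(f"{self.name}: {event} -> {self.state}")

valve = Agent("valve",
              {("closed", "open"): "open", ("open", "close"): "closed"},
              "closed")
valve.start()
for msg in ("open", "close", None):
    valve.send(msg)
valve.join()
```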
Ph.D. in Computer Science, July 2014
- Title
- EFFICIENT SCORING AND RANKING OF EXPLANATION FOR DATA EXCHANGE ERRORS IN VAGABOND
- Creator
- Wang, Zhen
- Date
- 2014, 2014-05
- Description
- Data exchange is widely used in the big data era. One challenge for data exchange is identifying the true cause of data errors introduced during schema translation. The huge amounts of data and schemas make it nearly impossible to find "the" correct solution. The Vagabond system was developed to address this problem; it uses best-effort methods to rank data exchange error explanations based on the likelihood that they are the correct solution. Ranking is done with scoring functions that model aspects of explanation sets. Examples of these properties include complexity (the size of an explanation) and side-effect size (the number of correct data values that would be affected by the changes). This thesis introduces three new scoring functions to increase the applicability of Vagabond under various data exchange scenarios. We prove that the monotonicity property required by Vagabond may not hold for some of the new scoring functions, so a new generic ranker is also introduced to efficiently rank error explanations for these new scoring functions, as well as for future scoring functions that have the boundary property, i.e., for which upper or lower bounds on the score of partial solutions can be computed efficiently. We also completed performance experiments on the new scoring functions and the new ranker; the results show that the new scoring functions have scalable performance.
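The boundary property lends itself to bound-based pruning when ranking. The sketch below is my own generic illustration of that idea; the candidate set and the lower_bound and exact_score callables are hypothetical, and this is not Vagabond's actual ranker.

```python
import heapq

def rank_top_k(candidates, lower_bound, exact_score, k):
    """Return the k best-scoring (lowest) candidates.

    Requires lower_bound(c) <= exact_score(c) (the "boundary property").
    Candidates whose cheap bound cannot beat the current k-th best score
    are pruned without paying for the exact score.
    """
    best = []                                   # max-heap of (-score, candidate)
    for c in sorted(candidates, key=lower_bound):
        if len(best) == k and lower_bound(c) >= -best[0][0]:
            break                               # nothing left can improve the top k
        score = exact_score(c)
        heapq.heappush(best, (-score, c))
        if len(best) > k:
            heapq.heappop(best)
    return sorted((-s, c) for s, c in best)

# Toy candidates: name -> (cheap lower bound, expensive exact score).
cands = {"e1": (1, 3), "e2": (2, 2), "e3": (4, 9), "e4": (5, 5)}
print(rank_top_k(cands, lambda c: cands[c][0], lambda c: cands[c][1], 2))
# [(2, 'e2'), (3, 'e1')] -- e3 and e4 are pruned by their bounds
```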
M.S. in Computer Science, May 2014
- Title
- CAPACITY BOUNDS FOR LARGE SCALE WIRELESS SENSOR NETWORKS
- Creator
- Tang, Shaojie
- Date
- 2012-11-20, 2012-12
- Description
- We study the network capacity of large-scale wireless sensor networks under both the Gaussian channel model and the protocol interference model. To study network capacity under the Gaussian channel model, we assume n wireless nodes {v1, v2, ..., vn} are randomly or arbitrarily distributed in a square region Ba with side length a. We randomly choose ns multicast sessions. For each source node vi, we randomly select k points pi,j (1 ≤ j ≤ k) in Ba, and the node closest to pi,j serves as a destination node of vi. The per-flow unicast (multicast) capacity is defined as the minimum data rate over all unicast (multicast) sessions in the network. We derive achievable upper bounds on the unicast capacity and an upper bound (partially achievable) on the multicast capacity of wireless networks under the Gaussian channel model. We find that the unicast (multicast) capacity under both models has three regimes. Under the protocol interference model, we assume that n wireless nodes are randomly deployed in a square region with side length a and that all nodes have a uniform transmission range r and a uniform interference range R > r. We further assume that each wireless node can transmit/receive at W bits/second over a common wireless channel. For each node vi, we randomly pick k − 1 nodes from the other n − 1 nodes as the receivers of the multicast session rooted at node vi. The aggregated multicast capacity is defined as the total data rate of all multicast sessions in the network. In this work we derive matching asymptotic upper and lower bounds on the multicast capacity of large-scale random wireless networks under the protocol interference model.
Ph.D. in Computer Science, December 2012
- Title
- REPRODUCIBLE NETWORK RESEARCH WITH A HIGH-FIDELITY SOFTWARE-DEFINED NETWORK TESTBED
- Creator
- Wu, Xiaoliang
- Date
- 2017, 2017-05
- Description
- Network research, as an experimental science, ought to be reproducible. However, it is not standard practice to share the models, methods, or software code needed to support experimental evaluation and reproducibility when publishing a network research paper. In this work, we advocate reproducible networking experiments by building a unique testbed consisting of container-based network emulation and physical devices. The testbed provides a realistic and scalable platform for reproducing network research. It supports large-scale network experiments using lightweight virtualization techniques and is capable of running across distributed physical machines. We utilize the testbed to reproduce network experiments and demonstrate its effectiveness by comparing the results with the originally published experiments, such as Hedera, a scalable and adaptive traffic flow scheduling system for data center networks.
M.S. in Computer Science, May 2017
- Title
- SCALABLE RESOURCE MANAGEMENT SYSTEM SOFTWARE FOR EXTREME-SCALE DISTRIBUTED SYSTEMS
- Creator
- Wang, Ke
- Date
- 2015, 2015-07
- Description
- Distributed systems are growing exponentially in computing capacity. On the high-performance computing (HPC) side, supercomputers are predicted to reach exascale, with billion-way parallelism, around the end of this decade. Scientific applications running on supercomputers are becoming more diverse, including traditional large-scale HPC jobs, small-scale HPC ensemble runs, and fine-grained many-task computing (MTC) workloads. Similar challenges are cropping up in cloud computing, as data centers host an ever-growing number of servers, exceeding many top HPC systems in production today. The applications commonly found in the cloud are ushering in the era of big data, resulting in billions of tasks that involve processing increasingly large amounts of data. However, the resource management system (RMS) software of distributed systems is still designed around the decades-old centralized paradigm, which is far from satisfying the ever-growing performance and scalability needs at extreme scales, due to the limited capacity of a centralized server. This huge gap between processing capacity and performance needs has driven us to develop next-generation RMSs that are orders of magnitude more scalable. In this dissertation, we first devise a general system software taxonomy to explore the design choices of system software, and propose that key-value stores can serve as a building block. We then design distributed RMSs on top of key-value stores. We propose a fully distributed architecture and a data-aware work stealing technique for MTC resource management, and develop the SimMatrix simulator to explore the distributed designs, which informs the real implementation of the MATRIX task execution framework. We also propose a partition-based architecture and resource sharing techniques for HPC resource management, and implement them by building the Slurm++ workload manager and the SimSlurm++ simulator. We study the distributed designs through real systems at up to thousands of nodes and through simulations at up to millions of nodes. Results show that the distributed paradigm has significant advantages over the centralized one. We envision that the contributions of this dissertation will be both evolutionary and revolutionary for the extreme-scale computing community, and will lead to a plethora of follow-on research and innovation towards tomorrow's extreme-scale systems.
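As a rough illustration of the work-stealing idea, here is a minimal per-node scheduler that steals tasks when its local queue runs dry. This is my own generic sketch with invented task names; MATRIX's data-aware policy additionally weighs data locality, which is not modeled here.

```python
import collections

class Scheduler:
    """One per-node scheduler with a local task queue (simplified)."""

    def __init__(self, name, tasks=()):
        self.name = name
        self.queue = collections.deque(tasks)

    def run_one(self, peers):
        """Execute one task, stealing from peers when the local queue is empty."""
        if not self.queue:
            self.steal(peers)
        return self.queue.popleft() if self.queue else None

    def steal(self, peers):
        # Real work stealing usually polls a few *random* neighbours; for a
        # deterministic example we just take half the backlog of the most
        # loaded peer.
        victim = max(peers, key=lambda p: len(p.queue), default=None)
        if victim and victim.queue:
            for _ in range(max(1, len(victim.queue) // 2)):
                self.queue.append(victim.queue.pop())

nodes = [Scheduler("n0", tasks=["t0", "t1", "t2", "t3"]),
         Scheduler("n1"), Scheduler("n2")]
print(nodes[1].run_one([nodes[0], nodes[2]]))   # steals from n0, runs "t3"
```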
Ph.D. in Computer Science, July 2015
- Title
- A HARDWARE-IN-THE-LOOP SOFTWARE-DEFINED NETWORKING TESTING AND EVALUATION FRAMEWORK
- Creator
- Yang, Qi
- Date
- 2017, 2017-05
- Description
- The transformation of innovative research ideas into production systems is highly dependent on the capability of performing realistic and reproducible network experiments. Simulation testbeds offer scalability and reproducibility but lack fidelity due to model abstraction and simplification, while physical testbeds offer high fidelity but lack reproducibility, and it is often technically challenging and economically infeasible to perform large-scale experiments on them. In this work, we present a hybrid testbed consisting of container-based network emulation and physical devices to support high-fidelity and reproducible networking experiments. In particular, the testbed integrates a network emulator (Mininet) [5], a distributed control environment (ONOS) [1], physical switches (Pica8), and end-hosts (Raspberry Pi and commodity servers). The testbed (1) offers functional fidelity through unmodified code execution on an emulated network, (2) supports large-scale network experiments using lightweight OS-level virtualization techniques and is capable of running across distributed physical machines, (3) provides topology flexibility, and (4) enhances the repeatability and reproducibility of network experiments. We validate the fidelity of the hybrid testbed through extensive experiments under different network conditions (e.g., varying topologies and traffic patterns), and compare the results with benchmark data collected on physical devices.
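For readers unfamiliar with the emulation side, a minimal Mininet script builds a small software-defined topology and checks connectivity, as sketched below. This is my own generic example, not the testbed's configuration; the hybrid testbed extends this kind of emulated network with physical switches and hosts.

```python
from mininet.net import Mininet
from mininet.topo import Topo

class TwoHostTopo(Topo):
    """Two hosts behind a single switch (toy topology for illustration)."""
    def build(self):
        s1 = self.addSwitch("s1")
        h1 = self.addHost("h1")
        h2 = self.addHost("h2")
        self.addLink(h1, s1)
        self.addLink(h2, s1)

if __name__ == "__main__":
    net = Mininet(topo=TwoHostTopo())
    net.start()
    net.pingAll()        # basic reachability check between emulated hosts
    net.stop()
```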
M.S. in Computer Science, May 2017
- Title
- A STEP TOWARD SUPPORTING LONG-RUNNING APPLICATIONS WITH REAL-TIME CONSTRAINTS ON HYBRID CLOUDS
- Creator
- Wu, Hao
- Date
- 2017, 2017-05
- Description
- The advancement of computer and network technology has brought the world into a new cloud computing era. The "pay-as-you-go" business model and service-oriented models allow users to obtain "unlimited" resources on demand and free them from infrastructure maintenance and software upgrades. Cloud services are currently among the top-ranked high-growth areas in computing and are seeing accelerating enterprise adoption, with the worldwide market predicted to reach more than $270b in 2020. According to Google, more than 95% of web services are currently deployed on the cloud. Many different types of applications are deployed on computer clouds. However, due to the inherent performance uncertainty of computer clouds, applications with real-time and high QoS constraints still operate on traditional computer systems as of today and are not able to benefit from elastic computer clouds. This thesis focuses on both theoretical analysis and real system implementation for the problem of guaranteeing a real-time application's deadline requirement while minimizing the application's execution cost on hybrid clouds. Four major problems are addressed towards moving applications with real-time constraints onto hybrid computer clouds. (1) A minimal slack time and minimal distance (MSMD) scheduling algorithm is developed to minimize the resources needed to guarantee an application's end-to-end deadline requirement using computer clouds. (2) A VM Instance Hour Minimization (IHM) algorithm is developed to reduce the application's execution cost for given schedules; the proposed IHM algorithm can be integrated with common scheduling algorithms used in the literature. In addition, we also evaluate the feasibility of utilizing spot instances to further reduce the application's execution cost while not sacrificing QoS guarantees. (3) A reference model for virtual machine launching overhead is developed to predict both system utilization and timing overhead during the VM launching process. (4) A hybrid cloud management tool that integrates the developed algorithms and reference model is developed to support long-running applications with real-time constraints on hybrid clouds.
Ph.D. in Computer Science, May 2017
- Title
- PERFORMANCE ANALYSIS AND OPTIMIZATION OF LARGE-SCALE SCIENTIFIC APPLICATIONS
- Creator
- Wu, Jingjin
- Date
- 2013, 2013-07
- Description
- Scientific applications are critical for solving complex problems in many areas of research and often require a large amount of computing resources in terms of both runtime and memory. Massively parallel supercomputers with ever-increasing computing power are being built to satisfy the needs of large-scale scientific applications. With the advent of the petascale era, there is a widening gap between the computing power of supercomputers and the parallel scalability of many applications. To take full advantage of the massive parallelism of supercomputers, it is indispensable to improve the scalability of large-scale scientific applications through performance analysis and optimization. This thesis work is motivated by cell-based AMR (Adaptive Mesh Refinement) cosmology simulations, in particular the Adaptive Refinement Tree (ART) application. Performance analysis is performed to identify its scaling bottleneck, a performance emulator is designed for efficient evaluation of different load balancing schemes, and topology mapping strategies are explored for performance improvements. More importantly, the exploration of topology mapping mechanisms leads to a generic methodology for network- and multicore-aware topology mapping, and a set of efficient mapping algorithms for popular topologies. These have been implemented in a topology mapping library, TOPOMap, which can be used to support MPI topology functions.
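The quality of a topology mapping is commonly summarized by the hop-bytes metric, i.e., total traffic weighted by the network distance it must travel. The sketch below is my own illustration of that standard metric on a 3-D torus, with invented traffic and placements; it is not TOPOMap's implementation.

```python
def hop_bytes(traffic, placement, dims):
    """Hop-bytes of a mapping onto a 3-D torus.

    traffic   -> {(rank_a, rank_b): bytes exchanged}
    placement -> {rank: (x, y, z) torus coordinate}
    dims      -> torus dimensions (X, Y, Z)
    """
    def torus_hops(a, b):
        return sum(min(abs(i - j), d - abs(i - j))     # wrap-around distance
                   for i, j, d in zip(a, b, dims))
    return sum(nbytes * torus_hops(placement[a], placement[b])
               for (a, b), nbytes in traffic.items())

traffic = {(0, 1): 10_000, (1, 2): 2_000}
near = {0: (0, 0, 0), 1: (1, 0, 0), 2: (2, 0, 0)}
far = {0: (0, 0, 0), 1: (3, 3, 3), 2: (2, 0, 0)}
print(hop_bytes(traffic, near, (8, 8, 8)))   # 12000: heavy traffic stays local
print(hop_bytes(traffic, far, (8, 8, 8)))    # much larger hop-bytes
```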
Ph.D. in Computer Science, July 2013