- Title
- IMPROVING FAULT TOLERANCE FOR EXTREME SCALE SYSTEMS
- Creator
- Berrocal, Eduardo
- Date
- 2017, 2017-05
- Description
-
Mean Time Between Failures (MTBF), now measured in days or hours, is expected to drop to minutes on exascale machines. In this thesis, a new approach to failure prediction based on the Void Search (VS) algorithm is presented. VS is used primarily in astrophysics to find regions of space with a very low density of galaxies. We explore its potential for failure prediction using environmental information and compare it with well-known prediction methods.
Another important issue for the HPC community is that next-generation supercomputers are expected to have more components and to consume several times less energy per operation. Hence, supercomputer designers are pushing the limits of miniaturization and energy-saving strategies. Consequently, the number of soft errors is expected to increase dramatically in the coming years. While mechanisms are in place to correct or at least detect soft errors, a percentage of those errors go unnoticed by the hardware. Techniques that leverage certain properties of iterative HPC applications (such as the smoothness of the evolution of a particular dataset) can be used to detect these silent errors at the application level. Results show that these techniques can detect a large fraction of corruptions (above 90% in some cases) with less than 100% overhead.
Nevertheless, these data-analytic solutions are still far from fully protecting applications to a level comparable with more expensive solutions such as full replication. In this thesis, partial replication is explored to overcome this limitation. More specifically, it has been observed that not all processes of an MPI application experience the same level of data variability at the same time. Thus, one can selectively replicate only those processes for which the lightweight data-analytic detectors would perform poorly.
Results indicate that this new approach can protect the analyzed MPI applications with 7 to 70% less overhead (depending on the application) than full duplication, with similar detection recall.
Ph.D. in Computer Science, May 2017
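The application-level detection idea in the first abstract can be illustrated with a minimal sketch, assuming a dataset that evolves smoothly from one iteration to the next. The class `SmoothnessDetector`, its linear-extrapolation predictor, and its `margin` parameter are hypothetical choices made for illustration, not the detectors developed in the thesis: each new value is compared with a prediction extrapolated from the two previous iterations, and it is flagged when its prediction error exceeds `margin` times the largest error accepted so far.

```python
import numpy as np


class SmoothnessDetector:
    """Flag values that deviate too far from a linear extrapolation in time.

    Hypothetical sketch: the predictor (linear extrapolation from the two
    previous iterations) and the bound (margin times the largest error seen
    so far) are illustrative choices, not the detectors used in the thesis.
    """

    def __init__(self, margin: float = 5.0):
        self.margin = margin
        self.prev2 = None   # snapshot from iteration t-2
        self.prev1 = None   # snapshot from iteration t-1
        self.max_err = 0.0  # largest prediction error accepted so far

    def check(self, current) -> np.ndarray:
        """Return a boolean mask of suspected silent corruptions in `current`."""
        current = np.asarray(current, dtype=float)
        flags = np.zeros(current.shape, dtype=bool)
        stored = current.copy()
        if self.prev2 is not None:
            predicted = 2.0 * self.prev1 - self.prev2
            err = np.abs(current - predicted)
            if self.max_err > 0.0:  # no bound exists on the first predicted step
                flags = err > self.margin * self.max_err
            accepted = err[~flags]
            if accepted.size:
                self.max_err = max(self.max_err, float(accepted.max()))
            # Keep the prediction instead of a flagged value so that one
            # corruption does not skew the next extrapolation.
            stored = np.where(flags, predicted, current)
        self.prev2, self.prev1 = self.prev1, stored
        return flags


# Usage: a smoothly evolving 8-point field with one injected corruption.
grid = np.linspace(0.0, 1.0, 8)
detector = SmoothnessDetector(margin=5.0)
history = []
for step in range(20):
    field = np.sin(0.1 * step + grid)
    if step == 15:
        field[3] += 0.5  # simulated silent data corruption
    history.append(detector.check(field))

print([step for step, f in enumerate(history) if f.any()])  # prints [15]
```

The bound is only learned from the third snapshot onward, so in this sketch a corruption in the first few iterations would go undetected and inflate the bound; a production detector would also roll back or recompute rather than simply substitute the prediction.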
- Title
- MODELING OF MAMMALIAN CELL CULTURE
- Creator
- Jackson, Robert David
- Date
- 2019
- Description
-
This work uses two different techniques for modeling mammalian cell culture: Differential Equation (DE)-based modeling and Agent-Based Modeling (ABM). Both models were developed in free, open-source software instead of the traditional software that requires purchasing licenses. The DE model was developed in Python and can predict total, viable, and dead cell densities, glucose, lactate, glutamine, ammonia, and product titer. To increase the level of detail of previous DE models, it adds dependence on temperature, pH, and dissolved oxygen. The ABM can predict viable cell density, glucose, lactate, and the distribution of the three experimentally detectable cell cycle phases G1G0, S, and G2M. The ABM was developed for high-performance computing to improve on a previous ABM, reducing run time a hundred-fold and allowing a much larger number of agents to be simulated.
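As a rough illustration of DE-based cell culture modeling, the sketch below integrates a textbook-style Monod growth model for viable cells, dead cells, glucose, and lactate with a classical fourth-order Runge-Kutta step. The `Params` values, units, and rate laws are hypothetical; the model described in the second abstract also tracks glutamine, ammonia, and product titer and adds temperature, pH, and dissolved-oxygen dependence, none of which is reproduced here.

```python
from dataclasses import dataclass


@dataclass
class Params:
    # Hypothetical values for illustration only (cells in 1e6 cells/mL, time in h).
    mu_max: float = 0.04  # maximum specific growth rate (1/h)
    k_glc: float = 1.0    # Monod half-saturation constant for glucose (mM)
    k_d: float = 0.005    # specific death rate (1/h)
    y_xg: float = 0.4     # new cells (1e6 cells/mL) per mM glucose consumed
    y_lg: float = 1.5     # mM lactate produced per mM glucose consumed


def rates(state, p):
    """Right-hand side of the ODE system for (viable, dead, glucose, lactate)."""
    xv, xd, glc, lac = state
    glc = max(glc, 0.0)
    mu = p.mu_max * glc / (p.k_glc + glc)  # Monod growth kinetics
    growth = mu * xv
    uptake = growth / p.y_xg               # glucose consumed to support growth
    return [growth - p.k_d * xv,  # viable cells: growth minus death
            p.k_d * xv,           # dead cells accumulate
            -uptake,              # glucose is consumed
            p.y_lg * uptake]      # lactate is produced


def simulate(state, p, t_end, dt=0.1):
    """Integrate with classical fourth-order Runge-Kutta; return [(t, state), ...]."""
    t, state = 0.0, list(state)
    out = [(t, tuple(state))]
    for _ in range(int(round(t_end / dt))):
        k1 = rates(state, p)
        k2 = rates([s + 0.5 * dt * k for s, k in zip(state, k1)], p)
        k3 = rates([s + 0.5 * dt * k for s, k in zip(state, k2)], p)
        k4 = rates([s + dt * k for s, k in zip(state, k3)], p)
        state = [s + dt / 6.0 * (a + 2 * b + 2 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4)]
        t += dt
        out.append((t, tuple(state)))
    return out


# Usage: 0.3e6 cells/mL seeded in 25 mM glucose, simulated for 10 days.
traj = simulate([0.3, 0.0, 25.0, 0.0], Params(), t_end=240.0)
t, (xv, xd, glc, lac) = traj[-1]
print(f"t={t:.0f} h viable={xv:.2f} dead={xd:.2f} glucose={glc:.2f} mM lactate={lac:.2f} mM")
```

Because every rate in this sketch ties cell growth and lactate production to glucose uptake, the quantities `lactate + y_lg * glucose` and `viable + dead + y_xg * glucose` stay constant over the run, which gives a convenient sanity check on the integrator.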