As supercomputers gain parallelism at exponential rates, storage infrastructure performance is increasing at a significantly lower rate because of its relatively centralized management. As a result, data management and data flow between storage and compute resources are becoming the new bottleneck for large-scale applications. Similarly, cloud-based distributed systems introduce other challenges stemming from the dynamic nature of cloud applications. This dissertation addresses several challenges of storage systems at extreme scales for supercomputers and clouds by designing and implementing a zero-hop distributed NoSQL storage system (ZHT), which has been tuned for the requirements of high-end computing systems. ZHT aims to be a building block for scalable distributed systems. Its goals are high availability, good fault tolerance, a lightweight design, persistence, dynamic joins and leaves, high throughput, and low latency at extreme scales (millions of nodes). We have evaluated ZHT’s performance on a variety of systems, ranging from a 64-node Linux cluster and an Amazon EC2 virtual cluster of up to 96 nodes to an IBM Blue Gene/P supercomputer with 8K nodes. This work also presents several real systems that have adopted ZHT as well as other NoSQL systems, namely ZHT/Q, FusionFS, IStore, MATRIX, Slurm++, Fabriq, FREIDAState, and WaggleDB. All of these systems have been significantly simplified by building on NoSQL storage systems, and have been shown to outperform other leading systems, in some cases by orders of magnitude. Through this work, we show how NoSQL storage systems can improve both performance and scalability at large scales across a variety of environments.
Ph.D. in Computer Science, December 2015