Search results

Title: BIG DATA SYSTEM INFRASTRUCTURE AT EXTREME SCALES
Creator: Zhao, Dongfang
Date: 2015, 2015-07
Description: Rapid advances in digital sensors, networks, storage, and computation, along with their availability at low cost, are leading to the creation of huge collections of data, dubbed Big Data. This data has the potential to enable new insights that can change the way business, science, and governments deliver services to their consumers and can impact society as a whole. This has led to the emergence of the Big Data Computing paradigm, which focuses on sensing, collection, storage, management, and analysis of data from a variety of sources to enable new value and insights. To realize the full potential of Big Data Computing, we need to address several challenges and develop suitable conceptual and technological solutions for dealing with them. Today's and tomorrow's extreme-scale computing systems, such as the world's fastest supercomputers, are generating orders of magnitude more data from a variety of scientific computing applications across all disciplines. This dissertation addresses several big data challenges at extreme scales. First, we quantitatively studied through simulations the predicted performance of existing systems at future scales (for example, exascale: 10^18 ops). Simulation results suggested that current systems would likely fail to deliver the needed performance at exascale. Then, we proposed a new system architecture and implemented a prototype that was evaluated on tens of thousands of nodes, on par with the scale of today's largest supercomputers. Micro benchmarks and real-world applications demonstrated the effectiveness of the proposed architecture: the prototype achieved up to two orders of magnitude higher data movement rate than existing approaches. Moreover, the system prototype incorporated features that were not well supported in conventional systems, such as distributed metadata management, distributed caching, lightweight provenance, transparent compression, acceleration through GPU encoding, and parallel serialization.
To explore the proposed architecture at the scale of millions of nodes, simulations were conducted and evaluated with a variety of workloads, showing near-linear scalability and orders of magnitude better performance than today's state-of-the-art storage systems.
Ph.D. in Computer Science, July 2015

Title: POWER GRID VERIFICATION ON CLOUD
Creator: Gupte, Naval
Date: 2016, 2016-05
Description: Reliability and performance of modern ICs are becoming increasingly susceptible to supply voltage variations. Increased demand for low-voltage integrated circuits has made power grid analysis critical and indispensable in modern design flows. Efficient validation of on-chip power distribution networks is computationally demanding because of increasing grid sizes. Power grid simulation is critical for analysis and verification of power supply noise in robust and reliable IC designs. Computational demands to simulate power grids for ICs of increasing complexity are never-ending. Cloud computing platforms can be leveraged to mitigate the costs associated with making these resources available. However, since simulation data usually contains sensitive design information, simulating on third-party platforms leads to major security concerns. In this study, we propose a framework for secure power grid simulation on the Cloud. A transformation algorithm to hide current excitations is presented, while still allowing a majority of computations to be completed on the Cloud. We employ multiple compression strategies to significantly reduce communication and storage overheads. Experiments show that our framework can achieve a turn-around time similar to that of an insecure simulator on the Cloud, while securing current excitations and output voltage vectors with reasonable communication and computational overheads. The vectorless approach to grid verification estimates worst-case voltage noise without detailed enumeration of load current excitations. We study voltage noise assessment in RLC models of VDD and GND networks in integrated power grids. An abstract grid model is used to reduce runtime, while transient constraints capture transitory circuit behaviour. Heuristics are employed to extract constraints that restrict power consumption profiles to realistic scenarios. Multiple linear programming problems are formulated to evaluate bounds on voltage overshoots and undershoots.
We propose ways to reduce storage and computational requirements on processing resources, enabling users to deploy computations on economical Cloud Computing platforms. The recommended solution is parallelizable, thereby reducing the overall verification time. Data compression is applied to fully exploit the compute capabilities of contemporary processors for higher throughput. Experimental results suggest that the proposed technique is practical and scalable for industrial grids.
Ph.D. in Electrical Engineering, May 2016

Title: DISTRIBUTED NOSQL STORAGE FOR EXTREME-SCALE SYSTEM SERVICES IN CLOUDS AND SUPERCOMPUTERS
Creator: Li, Tonglin
Date: 2015, 2015-12
Description: As supercomputers gain more parallelism at exponential rates, the storage infrastructure performance is increasing at a significantly lower rate due to relatively centralized management. This implies that data management and data flow between the storage and compute resources are becoming the new bottleneck for large-scale applications. Similarly, cloud-based distributed systems introduce other challenges stemming from the dynamic nature of cloud applications. This dissertation addresses several challenges for storage systems at extreme scales on supercomputers and clouds by designing and implementing a zero-hop distributed NoSQL storage system (ZHT), which has been tuned for the requirements of high-end computing systems. ZHT aims to be a building block for scalable distributed systems. The goals of ZHT are to deliver high availability, good fault tolerance, a light-weight design, persistence, dynamic joins and leaves, high throughput, and low latencies at extreme scales (millions of nodes). We have evaluated ZHT's performance on a variety of systems, ranging from a Linux cluster with 64 nodes and an Amazon EC2 virtual cluster of up to 96 nodes to an IBM Blue Gene/P supercomputer with 8K nodes. This work also presents several real systems that have adopted ZHT as well as other NoSQL systems, namely ZHT/Q, FusionFS, IStore, MATRIX, Slurm++, Fabriq, FREIDAState, and WaggleDB. All of these real systems have been significantly simplified by NoSQL storage systems, and have been shown to outperform other leading systems by orders of magnitude in some cases. Through our work, we have shown how NoSQL storage systems can improve both performance and scalability at large scales in a variety of environments.
Ph.D. in Computer Science, December 2015

Title: BIG DATA AS A SERVICE WITH PRIVACY AND SECURITY
Creator: Hou, Jiahui
Date: 2020
Description: With the increase of data production sources like IoT devices (e.g., smartwatches, smartphones) and data from smart homes (health sensors, energy sensors), truly mind-boggling amounts of data are generated daily. Building a big data as a service system that combines big data technologies and cloud computing will enhance the huge value of big data and tremendously boost economic growth in various areas. Big data as a service has evolved into a booming market, but with the emergence of larger privacy and security challenges. Privacy and security concerns limit the development of big data as a service and are increasingly one of the main reasons why most data are not shared and well utilized. This dissertation aims to build a new incrementally deployable middleware for the current and future big data as a service ecosystem in order to guarantee privacy and security. This middleware will retain privacy and security in data querying and ensure privacy preservation in data analysis. In addition, emerging cloud computing contributes to providing valuable services associated with machine learning (ML) techniques. We consider privacy issues in both traditional queries and ML queries (i.e., ML classification) in this dissertation. The final goal is to design and develop a demonstrable system that can be deployed in the big data as a service system in order to guarantee the privacy of data/service owners as well as users, enabling secure data analysis and services.
Firstly, we consider a private dataset composed of a set of individuals, where the data is outsourced to a remote cloud server. We revisit the classic query auditing problem in the outsourcing scenario. Secondly, we study privacy-preserving neural network classification where source data is randomly partitioned. Thirdly, we address the privacy of confidential training datasets and models, which are typically trained in a centralized cloud server but are publicly accessible, i.e., online ML-as-a-Service (MLaaS). Lastly, we consider offline MLaaS systems. We design, implement, and evaluate a secure ML framework to enable MLaaS on clients' edge devices, where "encrypted" ML models are stored locally.
