Search results
(1 - 3 of 3)
- Title
- BIG DATA SYSTEM INFRASTRUCTURE AT EXTREME SCALES
- Creator
- Zhao, Dongfang
- Date
- 2015, 2015-07
- Description
-
Rapid advances in digital sensors, networks, storage, and computation, along with their availability at low cost, are leading to the creation of huge collections of data, dubbed Big Data. This data has the potential to enable new insights that can change the way business, science, and governments deliver services to their consumers, and can impact society as a whole. This has led to the emergence of the Big Data Computing paradigm, which focuses on the sensing, collection, storage, management, and analysis of data from a variety of sources to enable new value and insights. To realize the full potential of Big Data Computing, we need to address several challenges and develop suitable conceptual and technological solutions for dealing with them. Today's and tomorrow's extreme-scale computing systems, such as the world's fastest supercomputers, are generating orders of magnitude more data through a variety of scientific computing applications from all disciplines. This dissertation addresses several big data challenges at extreme scales. First, we quantitatively studied through simulations the predicted performance of existing systems at future scales (for example, exascale: 10^18 ops). Simulation results suggested that current systems would likely fail to deliver the needed performance at exascale. Then, we proposed a new system architecture and implemented a prototype that was evaluated on tens of thousands of nodes, on par with the scale of today's largest supercomputers. Microbenchmarks and real-world applications demonstrated the effectiveness of the proposed architecture: the prototype achieved up to two orders of magnitude higher data movement rate than existing approaches. Moreover, the system prototype incorporated features that were not well supported in conventional systems, such as distributed metadata management, distributed caching, lightweight provenance, transparent compression, acceleration through GPU encoding, and parallel serialization.
To explore the proposed architecture at million-node scales, simulations were conducted and evaluated with a variety of workloads, showing near-linear scalability and orders of magnitude better performance than today's state-of-the-art storage systems.
Ph.D. in Computer Science, July 2015
- Title
- Informed Consent in Digital Data Management
- Creator
- Hildt, Elisabeth, Laas, Kelly
- Date
- 2022, 2022-01-03
- Publisher
- Springer, Cham
- Description
-
This article discusses the role of informed consent, a well-known concept and standard established in the field of medicine, in ethics codes relating to digital data management. It analyzes the significance allotted to informed consent and informed consent-related principles in ethics codes, policies, and guidelines by presenting the results of a study focused on 31 ethics codes, policies, and guidelines held as part of the Ethics Codes Collection. The analysis reveals that up to now, there is a limited number of codes of ethics, policies, and guidelines on digital data management. Informed consent often is a central component in these codes and guidelines. While there undoubtedly are significant similarities between informed consent in medicine and digital data management, in ethics codes and guidelines, informed consent-related standards in some fields such as marketing are weaker and less strict. The article concludes that informed consent is an essential standard in digital data management that can help effectively shape future practices in the field. However, a more detailed reflection on the specific content and role of informed consent and informed consent-related standards in the various areas of digital data management is needed to avoid the weakening and dilution of standards in contexts where there are no clear legal regulations.
- Collection
- Codes of Ethics and Ethical Guidelines: Emerging Technologies and Changing Fields
- Title
- BIG DATA AS A SERVICE WITH PRIVACY AND SECURITY
- Creator
- Hou, Jiahui
- Date
- 2020
- Description
-
With the increase of data production sources like IoT devices (e.g., smartwatches, smartphones) and data from smart homes (health sensors, energy sensors), truly mind-boggling amounts of data are generated daily. Building a big data as a service system that combines big data technologies and cloud computing will enhance the huge value of big data and tremendously boost economic growth in various areas. Big data as a service has evolved into a booming market, but with the emergence of larger privacy and security challenges. Privacy and security concerns limit the development of big data as a service and have increasingly become one of the main reasons why most data are not shared and well utilized. This dissertation aims to build a new incrementally deployable middleware for the current and future big data as a service ecosystem in order to guarantee privacy and security. This middleware will retain privacy and security in data querying and ensure privacy preservation in data analysis. In addition, emerging cloud computing contributes to providing valuable services associated with machine learning (ML) techniques. We consider privacy issues in both traditional queries and ML queries (i.e., ML classification) in this dissertation. The final goal is to design and develop a demonstrable system that can be deployed in the big data as a service system in order to guarantee the privacy of data/service owners as well as users, enabling secure data analysis and services.
Firstly, we consider a private dataset composed of a set of individuals, where the data is outsourced to a remote cloud server. We revisit the classic query auditing problem in the outsourcing scenario. Secondly, we study privacy-preserving neural network classification where source data is randomly partitioned. Thirdly, we address the privacy of confidential training datasets and models, which are typically trained on a centralized cloud server but publicly accessible, i.e., online ML-as-a-Service (MLaaS). Lastly, we consider offline MLaaS systems. We design, implement, and evaluate a secure ML framework to enable MLaaS on clients' edge devices, where "encrypted" ML models are stored locally.