Explaining why an answer is in the result of a query or why it is missing from the result is important for many applications including auditing, debugging data and queries, hypothetical reasoning about data, and data exploration. Both types of questions, i.e., why and why-not provenance, have been studied extensively, but mostly in isolation. A recent study shows that unification of why and why-not provenance can be achieved by developing a provenance model for queries with negation. In many complex queries, negation is natural and yields more expressive power. Thus, supporting both types of provenance and negation together can be useful for, e.g., debugging (missing) data over complex queries with negation. However, why-not provenance and, to a lesser degree, why provenance can be very large, resulting in severe scalability and usability challenges.

In this thesis, we introduce a framework that unifies why and why-not provenance. We develop a graph-based provenance model that is powerful enough to encode the evaluation of queries with negation (first-order queries). We demonstrate that our model generalizes a wide range of provenance models from the literature. Using our model, we present the first practical approach that efficiently generates explanations, i.e., parts of the provenance that are relevant to the query outputs of interest. Furthermore, we present a novel approximate summarization technique to address the scalability and usability challenges. Our technique efficiently computes pattern-based provenance summaries that balance informativeness, conciseness, and completeness. To achieve scalability, we integrate sampling techniques into provenance capture and summarization. We implement these techniques in our PUG (Provenance Unification through Graphs) system, which runs on top of a relational database.
We demonstrate through extensive experiments that our approach scales to large datasets and produces comprehensive and meaningful (summaries of) provenance.