Supporting Reliable Data Analysis by Evaluating All Reasonable Analytic Decisions

Authors

Liu, Yang

Abstract

Analysts make many, sometimes arbitrary, decisions throughout the data analysis pipeline, yet different choices can lead to divergent conclusions. The flexibility of making analytic decisions can inflate false positive rates and lead to non-replicable findings. In this dissertation, we first characterize how researchers make analytic decisions in their analysis pipeline. We confirm that researchers may experiment with choices in search of desirable results, but also identify other reasons why researchers explore alternatives yet omit findings. A promising approach to address decision flexibility is multiverse analysis – rather than fixating on a single analytic path, a multiverse analysis evaluates all “reasonable” analytic decisions in parallel and interprets results collectively. We introduce tools and techniques that lower the barriers for analysts to author, run, and interpret multiverse analyses. We present the Boba DSL, a domain-specific language that represents the structure of the decision space, providing critical context for subsequent system components. We introduce the Boba Monitor, a dashboard that leverages approximation algorithms under the hood to enable monitoring progress and diagnosing issues while the multiverse is still running. We contribute the Boba Visualizer, a visual analysis system that aids users in interpreting the outcomes of all analytic paths, with judicious design choices that push users towards reducing rather than suppressing uncertainty. Finally, we discuss case studies where model quality issues change what one can reasonably take away from the multiverse and justify an iterative workflow. We hope that our findings will help inspire the design of both improved analysis tools and community standards.
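The core idea of a multiverse analysis can be illustrated with a minimal sketch, assuming a toy setting. It enumerates every combination of options across two analytic decisions (the cartesian product of the decision space), runs each resulting analytic path, and reports the spread of estimates instead of a single number. The dataset, decision names (`EXCLUSION`, `SUMMARY`), options, and the `run_multiverse` helper are all hypothetical. This is not the Boba DSL, whose syntax the abstract does not describe.

```python
from itertools import product
from statistics import mean, median
from typing import Callable, Dict, List, Tuple

# Hypothetical toy measurements; illustrative only, not data from the dissertation.
DATA: List[float] = [2.1, 2.4, 2.2, 9.8, 2.6, 2.3, 2.5, 0.2]

# Hypothetical decision 1: which observations to exclude.
EXCLUSION: Dict[str, Callable[[List[float]], List[float]]] = {
    "keep_all": lambda xs: list(xs),
    "drop_high": lambda xs: [x for x in xs if x <= 5.0],
    "drop_both": lambda xs: [x for x in xs if 1.0 <= x <= 5.0],
}

# Hypothetical decision 2: which summary statistic to report.
SUMMARY: Dict[str, Callable[[List[float]], float]] = {
    "mean": mean,
    "median": median,
}


def run_multiverse(data: List[float]) -> Dict[Tuple[str, str], float]:
    """Evaluate every analytic path, i.e. every combination of decision options."""
    results: Dict[Tuple[str, str], float] = {}
    for excl, summ in product(EXCLUSION, SUMMARY):
        cleaned = EXCLUSION[excl](data)             # apply the exclusion choice
        results[(excl, summ)] = SUMMARY[summ](cleaned)  # apply the summary choice
    return results


if __name__ == "__main__":
    universes = run_multiverse(DATA)
    for path, estimate in sorted(universes.items()):
        print(path, round(estimate, 3))
    # Interpret the results collectively: look at the whole range of estimates.
    values = list(universes.values())
    print("universes:", len(values), "range:", round(min(values), 3), "to", round(max(values), 3))
```

Even this small decision space yields six analytic paths whose estimates differ noticeably depending mainly on whether the extreme value is excluded, which is the kind of decision sensitivity the abstract describes. Real decision spaces grow multiplicatively with each added decision, which motivates tooling for authoring, monitoring, and visualizing them.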

Description

Thesis (Ph.D.)--University of Washington, 2022

Keywords

analytic decisions, multiverse analysis, reproducibility, statistical analysis, Computer science