NSF Org: | DMS Division Of Mathematical Sciences |
Recipient: |
|
Initial Amendment Date: | June 9, 2020 |
Latest Amendment Date: | June 9, 2020 |
Award Number: | 2015552 |
Award Instrument: | Standard Grant |
Program Manager: | Yong Zeng, yzeng@nsf.gov, (703)292-7299, DMS Division Of Mathematical Sciences, MPS Direct For Mathematical & Physical Scien |
Start Date: | July 1, 2020 |
End Date: | June 30, 2024 (Estimated) |
Total Intended Award Amount: | $250,000.00 |
Total Awarded Amount to Date: | $250,000.00 |
Funds Obligated to Date: |
|
History of Investigator: |
|
Recipient Sponsored Research Office: |
1960 KENNY RD COLUMBUS OH US 43210-1016 (614)688-8735 |
Sponsor Congressional District: |
|
Primary Place of Performance: |
1958 Neil Ave. Columbus OH US 43210-1016 |
Primary Place of Performance Congressional District: |
|
Unique Entity Identifier (UEI): |
|
Parent UEI: |
|
NSF Program(s): | STATISTICS |
Primary Program Source: |
|
Program Reference Code(s): | |
Program Element Code(s): |
|
Award Agency Code: | 4900 |
Fund Agency Code: | 4900 |
Assistance Listing Number(s): | 47.049 |
ABSTRACT
A scientific mission of critical importance is to transform massive data into actionable knowledge, which largely centers on understanding causal relationships. Causal inference has become one of three main tasks in data science, in addition to descriptive and predictive analyses. This research project aims to close existing gaps in estimation of heterogeneous causal effects and will make more statistical tools available for analyzing massive observational data. It will blend the conventional statistical approaches to causal inference with the fast-growing machine learning techniques and provide researchers and policy makers with powerful methodological tools to better evaluate the impact of interventions and thus to optimize decision making. Doctoral students in Statistics and Biostatistics will be involved in the development and implementation of the methods.
This project concerns the development of a stream of innovative Bayesian semiparametric methods for efficient and robust causal inference in the presence of effect heterogeneity in large observational datasets. Conventional statistical approaches have a strong tie to randomized experiments, which enjoy easy causal interpretation but may suffer in terms of efficiency. Moreover, recently developed nonparametric regression and machine learning methods focus primarily on outcome modelling and prediction; they may be vulnerable to confounding and are often more difficult to interpret. Furthermore, hidden bias from unmeasured confounding is a major concern in observational studies, and the status quo sensitivity analysis for assessing hidden bias does not accommodate complex data structures. The PIs will develop a robust Bayesian semiparametric framework for incorporating the treatment assignment process into the outcome modelling. The framework can easily accommodate complex heterogeneous effects or hierarchical structures in massive observational data, take advantage of experts' knowledge and existing causal theory on how the intervention might work, and assess the impact of potential unmeasured confounders. Propensity scores will be incorporated in potential outcome models via Gaussian process priors, and connections with the conventional matching estimators will be established. Moreover, the impact of unmeasured confounding will be assessed through Bayesian sensitivity analysis.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
PROJECT OUTCOMES REPORT
Disclaimer
This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.
Intellectual Merit:
Causal inference lies at the core of many scientific endeavors. For example, how effective will a new medication be for a specific group of patients, or how much can a new policy reduce the crime rate? These questions require moving beyond mere associations to address causality—if we intervene in a specific way, what will the outcome be?
Traditionally, statistical developments have relied heavily on randomized experiments. However, in the era of big data, many large datasets are either entirely observational or have a significant observational component. Causal inference for observational datasets is more challenging than for randomized experiments for several reasons: key covariates may be distributed differently between intervention groups; unmeasured confounders may exist that are correlated with both the treatment and the outcome; and treated and control groups may exhibit different functional patterns with respect to the response. These differences contribute to two widely recognized challenges of causal inference, the adjustment for confounders and the modeling of outcomes, which must be addressed simultaneously. Moreover, the complexities of modern datasets necessitate flexible models to capture heterogeneity. However, both traditional design-based methods and many nonparametric regression-based approaches struggle to accommodate such heterogeneity.
We addressed causal inference for large observational datasets by developing innovative Bayesian semiparametric methods that account for effect heterogeneity. These methods remove confounding factors by incorporating propensity scores and model potential outcomes using flexible semiparametric Gaussian process regressions. They can accommodate complex subgroup or hierarchical structures within massive observational data and effectively integrate external information or expert knowledge about causal relationships. Our causal estimators align with conventional propensity score matching estimators under certain prior specifications, while offering substantially improved efficiency and better identification of heterogeneous effects in general scenarios. Moreover, they provide clearer interpretations and more robust estimates compared to many machine learning estimators.
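As a rough illustration of the idea in the preceding paragraph, the Python sketch below augments the covariates with an estimated propensity score and fits a Gaussian process regression to predict both potential outcomes. It is not the PIs' implementation (the report states their code is in R). The simulated data, the function names `fit_propensity`, `rbf_kernel`, and `gp_posterior_mean`, the squared-exponential kernel, and the fixed hyperparameters are all assumptions made for illustration, and only the posterior mean is computed rather than a full posterior.

```python
import numpy as np

def fit_propensity(X, t, n_iter=25):
    """Logistic regression by Newton-Raphson; returns fitted P(T=1 | X)."""
    Xd = np.column_stack([np.ones(len(X)), X])
    beta = np.zeros(Xd.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xd @ beta))
        w = p * (1.0 - p)
        grad = Xd.T @ (t - p)
        hess = (Xd * w[:, None]).T @ Xd + 1e-6 * np.eye(Xd.shape[1])
        beta = beta + np.linalg.solve(hess, grad)
    return 1.0 / (1.0 + np.exp(-Xd @ beta))

def rbf_kernel(A, B, length_scales, signal_var):
    """Squared-exponential kernel with one length scale per input column."""
    A = A / length_scales
    B = B / length_scales
    sq = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2.0 * A @ B.T
    return signal_var * np.exp(-0.5 * np.maximum(sq, 0.0))

def gp_posterior_mean(Z_train, y, Z_test, length_scales, noise_var, signal_var):
    """Posterior mean of a GP with a constant prior mean equal to mean(y)."""
    K = rbf_kernel(Z_train, Z_train, length_scales, signal_var)
    K += noise_var * np.eye(len(y))
    alpha = np.linalg.solve(K, y - y.mean())
    K_star = rbf_kernel(Z_test, Z_train, length_scales, signal_var)
    return y.mean() + K_star @ alpha

# Hypothetical simulated data: one confounder x drives treatment and outcome.
rng = np.random.default_rng(0)
n = 300
x = rng.uniform(-2.0, 2.0, size=n)
t = rng.binomial(1, 1.0 / (1.0 + np.exp(-x))).astype(float)
true_cate = 1.0 + 0.5 * x                      # heterogeneous effect, mean 1
y = np.sin(x) + true_cate * t + rng.normal(0.0, 0.3, size=n)

# Step 1: estimate the propensity score from the observed covariate.
e_hat = fit_propensity(x[:, None], t)

# Step 2: GP regression on (x, propensity, treatment); hyperparameters are
# fixed by hand here (assumption), not learned or given priors.
length_scales = np.array([1.0, 0.5, 0.5])
Z_obs = np.column_stack([x, e_hat, t])
Z_treated = np.column_stack([x, e_hat, np.ones(n)])
Z_control = np.column_stack([x, e_hat, np.zeros(n)])
mu1 = gp_posterior_mean(Z_obs, y, Z_treated, length_scales, 0.09, 4.0)
mu0 = gp_posterior_mean(Z_obs, y, Z_control, length_scales, 0.09, 4.0)

# Step 3: unit-level and average effect estimates.
cate_hat = mu1 - mu0
ate_hat = cate_hat.mean()
naive_diff = y[t == 1].mean() - y[t == 0].mean()
print(f"naive difference in means: {naive_diff:.2f}")
print(f"GP-based ATE estimate:     {ate_hat:.2f} (true value is 1 by design)")
```

In this toy setting the raw difference in means is inflated because treated units tend to have larger `x`, whereas the regression-based estimate adjusts for it. A fully Bayesian treatment would place priors on the kernel hyperparameters and report posterior uncertainty, which this sketch omits.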
We also evaluated the impact of unmeasured confounding through Bayesian sensitivity analyses. In the presence of heterogeneity, the treatment effect is determined by a function of covariates rather than a single parameter, requiring more sophisticated evaluation strategies. We simulated and incorporated unmeasured confounders in both the response and treatment models, with coefficients serving as sensitivity parameters. The values of these sensitivity parameters were then compared with the marginal effects of observed covariates, providing researchers with a benchmark for determining problematic levels of confounding.
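The role of the sensitivity parameters can be sketched with a small simulation. The Python example below is a simplified, non-Bayesian illustration under assumed linear and logistic models, not the procedure used in the project: a hypothetical unmeasured confounder `u` enters the treatment model with coefficient `zeta_t` and the response model with coefficient `zeta_y`, and the bias of an analysis that omits `u` is compared across values benchmarked against the coefficients of the observed covariate `x`. All names and numerical settings are assumptions.

```python
import numpy as np

TAU = 2.0                   # true treatment effect in this simulation
BETA_T, BETA_Y = 0.8, 1.0   # coefficients of the observed covariate x

def naive_effect(zeta_t, zeta_y, n=20000, seed=0):
    """Estimate the treatment effect while omitting the confounder u.

    zeta_t: coefficient of u in the (logistic) treatment model.
    zeta_y: coefficient of u in the (linear) response model.
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)                       # observed covariate
    u = rng.normal(size=n)                       # unmeasured confounder
    p = 1.0 / (1.0 + np.exp(-(BETA_T * x + zeta_t * u)))
    t = rng.binomial(1, p)
    y = TAU * t + BETA_Y * x + zeta_y * u + rng.normal(size=n)
    # Ordinary least squares of y on (1, t, x); u is deliberately left out.
    design = np.column_stack([np.ones(n), t, x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1]

# Sensitivity parameters expressed relative to the observed covariate:
# no hidden confounding, half as strong as x, and as strong as x.
bias_none = naive_effect(0.0, 0.0) - TAU
bias_half = naive_effect(0.5 * BETA_T, 0.5 * BETA_Y) - TAU
bias_benchmark = naive_effect(BETA_T, BETA_Y) - TAU

for label, bias in [("no hidden confounder", bias_none),
                    ("half as strong as x", bias_half),
                    ("as strong as x", bias_benchmark)]:
    print(f"{label:>22}: bias = {bias:+.3f}")
```

Reading the bias against the strength of an observed covariate gives a concrete scale: if only a hidden confounder much stronger than any measured one would change the conclusion, the finding is comparatively robust. With heterogeneous effects the same comparison would have to be made for a function of covariates rather than the single coefficient used here.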
This project effectively combined the potential outcome framework with semiparametric Bayesian models, making significant contributions in formalizing and solving complex causal inference problems in observational studies.
Broader Impacts:
The project has had a significant impact on human resource development, fostering the academic growth of eleven doctoral students in the Statistics and Biostatistics PhD programs at Ohio State University. Seven of these students are female, and four have since graduated and transitioned into industry or academic positions. Through their involvement in this project, they gained substantial experience in Bayesian causal inference, laying a strong foundation for their development into independent, analytically proficient statisticians with a comprehensive understanding of the field.
The project has also provided valuable opportunities for training and professional development. The knowledge gained from this research was integrated into a graduate-level course on Causal Inference, attended by both master’s and PhD students from diverse disciplines, including Statistics, Biostatistics, and Epidemiology. This course offered students firsthand exposure to cutting-edge research in causal inference.
The models developed through this project have a broad range of applications, spanning sectors such as medicine, education, and business. Through the PIs' proactive engagement with various communities, the project's results have been widely disseminated through presentations, scholarly publications, and collaborative research initiatives. Furthermore, our Bayesian nonparametric causal inference model has been implemented in R, with the computational code freely accessible to the public, promoting openness and collaboration.
Last Modified: 08/31/2024
Modified by: Xinyi Xu