
Award Abstract # 2015552
Robust Bayesian Semiparametric Inference of Heterogeneous Causal Effects in Observational Studies

NSF Org: DMS (Division Of Mathematical Sciences)
Recipient: OHIO STATE UNIVERSITY, THE
Initial Amendment Date: June 9, 2020
Latest Amendment Date: June 9, 2020
Award Number: 2015552
Award Instrument: Standard Grant
Program Manager: Yong Zeng
yzeng@nsf.gov
(703) 292-7299
DMS Division Of Mathematical Sciences
MPS Directorate for Mathematical and Physical Sciences
Start Date: July 1, 2020
End Date: June 30, 2024 (Estimated)
Total Intended Award Amount: $250,000.00
Total Awarded Amount to Date: $250,000.00
Funds Obligated to Date: FY 2020 = $250,000.00
History of Investigator:
  • Xinyi Xu (Principal Investigator)
    xinyi@stat.osu.edu
  • Steven MacEachern (Co-Principal Investigator)
  • Bo Lu (Co-Principal Investigator)
Recipient Sponsored Research Office: Ohio State University
1960 KENNY RD
COLUMBUS
OH  US  43210-1016
(614)688-8735
Sponsor Congressional District: 03
Primary Place of Performance: Ohio State University
1958 Neil Ave.
Columbus
OH  US  43210-1016
Primary Place of Performance Congressional District: 03
Unique Entity Identifier (UEI): DLWBSLWAJWR1
Parent UEI: MN4MDDMN8529
NSF Program(s): STATISTICS
Primary Program Source: 01002021DB NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s):
Program Element Code(s): 126900
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.049

ABSTRACT

A scientific mission of critical importance is transforming massive data into actionable knowledge, a task that largely centers on understanding causal relationships. Causal inference has become one of the three main tasks in data science, alongside descriptive and predictive analyses. This research project aims to close existing gaps in the estimation of heterogeneous causal effects and will make more statistical tools available for analyzing massive observational data. It will blend conventional statistical approaches to causal inference with fast-growing machine learning techniques, giving researchers and policy makers powerful methodological tools to better evaluate the impact of interventions and thus optimize decision making. Doctoral students in Statistics and Biostatistics will be involved in the development and implementation of the methods.

This project concerns the development of a stream of innovative Bayesian semiparametric methods for efficient and robust causal inference in the presence of effect heterogeneity in large observational datasets. Conventional statistical approaches are closely tied to randomized experiments: they offer easy causal interpretation but may be inefficient. Moreover, recently developed nonparametric regression and machine learning methods focus primarily on outcome modelling and prediction, which leaves them vulnerable to confounding and often makes them more difficult to interpret. Furthermore, hidden bias from unmeasured confounding is a major concern in observational studies, and standard sensitivity analyses for assessing hidden bias do not accommodate complex data structures. The PIs will develop a robust Bayesian semiparametric framework for incorporating the treatment assignment process into the outcome modelling. The framework can easily accommodate complex heterogeneous effects or hierarchical structures in massive observational data, take advantage of experts' knowledge and existing causal theory on how the intervention might work, and effectively assess the impact of potential unmeasured confounders. Propensity scores will be incorporated in potential outcome models via Gaussian process priors, and connections with conventional matching estimators will be established. The impact of unmeasured confounding will be assessed through Bayesian sensitivity analysis.
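
For readers less familiar with the conventional matching estimator mentioned above, the following sketch shows one textbook form of propensity score matching: propensity scores are estimated by logistic regression, and each unit's missing potential outcome is imputed from the nearest unit, on the propensity score, in the opposite treatment arm. It is a minimal Python/NumPy illustration on simulated data, not the PIs' estimator; the function names (`fit_propensity`, `matching_ate`), the data-generating model, and the choice of 1-nearest-neighbour matching with replacement are assumptions made for illustration.

```python
import numpy as np

def fit_propensity(X, z, n_iter=25):
    """Logistic-regression propensity scores e(x) = P(Z=1 | X=x), fitted by Newton-Raphson."""
    D = np.column_stack([np.ones(len(z)), X])          # design matrix with intercept
    beta = np.zeros(D.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-D @ beta))
        hess = D.T @ (D * (p * (1.0 - p))[:, None])    # observed information matrix
        beta += np.linalg.solve(hess, D.T @ (z - p))   # Newton step on the log-likelihood
    return 1.0 / (1.0 + np.exp(-D @ beta))

def matching_ate(y, z, e):
    """ATE by 1-nearest-neighbour matching on the propensity score, with replacement.

    Each unit's missing potential outcome is imputed by the observed outcome of the
    opposite-arm unit whose propensity score is closest."""
    treated, control = np.flatnonzero(z == 1), np.flatnonzero(z == 0)
    y1, y0 = y.astype(float), y.astype(float)          # astype returns independent copies
    for i in range(len(y)):
        pool = control if z[i] == 1 else treated
        j = pool[np.argmin(np.abs(e[pool] - e[i]))]    # closest unit in the other arm
        if z[i] == 1:
            y0[i] = y[j]                               # impute Y(0) for a treated unit
        else:
            y1[i] = y[j]                               # impute Y(1) for a control unit
    return float(np.mean(y1 - y0))

# Hypothetical simulated data: x1 raises both the chance of treatment and the outcome,
# and the unit-level effect is 2 + x1, so the true average effect is 2.
rng = np.random.default_rng(1)
n = 2000
X = rng.normal(size=(n, 2))
z = rng.binomial(1, 1.0 / (1.0 + np.exp(-(0.8 * X[:, 0] - 0.5 * X[:, 1]))))
y = 1.0 + X[:, 0] + 0.5 * X[:, 1] + (2.0 + X[:, 0]) * z + rng.normal(size=n)

e_hat = fit_propensity(X, z)
naive = y[z == 1].mean() - y[z == 0].mean()
ate = matching_ate(y, z, e_hat)
print(f"naive difference: {naive:.2f}, matching ATE: {ate:.2f} (true ATE = 2)")
```

On data like these, the naive difference in means is inflated because the covariate that raises the chance of treatment also raises the outcome, whereas matching on the estimated propensity score should land near the true average effect up to sampling error.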

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH


(Showing: 1 - 10 of 25)
Nattino, Giovanni and Song, Chi and Lu, Bo "Polymatching algorithm in observational studies with multiple treatment groups" Computational Statistics & Data Analysis, v.167, 2022. https://doi.org/10.1016/j.csda.2021.107364
Plascak, Jesse J. and Rundle, Andrew G. and Xu, Xinyi and Mooney, Stephen J. and Schootman, Mario and Lu, Bo and Roy, Jason and Stroup, Antoinette M. and Llanos, Adana A. "Associations between neighborhood disinvestment and breast cancer outcomes within a populous state registry" Cancer, v.128, 2022. https://doi.org/10.1002/cncr.33900
Ni, Ai and Lin, Zihan and Lu, Bo "Stratified Restricted Mean Survival Time Model for Marginal Causal Effect in Observational Survival Data" Annals of Epidemiology, v.64, 2021. https://doi.org/10.1016/j.annepidem.2021.09.016
Zhang, Yuyang and Schnell, Patrick and Song, Chi and Huang, Bin and Lu, Bo "Subgroup causal effect identification and estimation via matching tree" Computational Statistics & Data Analysis, v.159, 2021. https://doi.org/10.1016/j.csda.2021.107188
Wu, Peng and Xu, Xinyi and Tong, Xingwei and Jiang, Qing and Lu, Bo "Semiparametric estimation for average causal effects using propensity score-based spline" Journal of Statistical Planning and Inference, v.212, 2021. https://doi.org/10.1016/j.jspi.2020.10.004
Plascak, Jesse J. and Mooney, Stephen J. and Schootman, Mario and Rundle, Andrew G. and Llanos, Adana A.M. and Qin, Bo and Hong, Chi-Chen and Demissie, Kitaw and Bandera, Elisa V and Xu, Xinyi "Validating a spatio-temporal model of observed neighborhood physical disorder" Spatial and Spatio-temporal Epidemiology, v.41, 2022. https://doi.org/10.1016/j.sste.2022.100506
Cho, M.H. and Kurtek, S. and MacEachern, S.N. "Aggregated pairwise classification of elastic planar shapes" The Annals of Applied Statistics, v.15, 2021. https://doi.org/10.1214/21-AOAS1452
Kim, Eunseop and MacEachern, Steven N. and Peruggia, Mario "Empirical likelihood for the analysis of experimental designs" Journal of Nonparametric Statistics, 2023. https://doi.org/10.1080/10485252.2023.2206919
MacEachern, SN and Lee, J "Invited discussion of 'Evaluating sensitivity to the stick-breaking prior in Bayesian nonparametrics'" Bayesian Analysis, v.18, 2023.
Quintana, Fernando A. and Müller, Peter and Jara, Alejandro and MacEachern, Steven N. "The Dependent Dirichlet Process and Related Models" Statistical Science, v.37, 2022. https://doi.org/10.1214/20-STS819
Sinnott, JA and MacEachern, SN and Peruggia, M "Rediscovering a little known fact about the t-test and the F-test: algebraic, geometric, distributional and graphical considerations" Statistica, v.82, 2022.

PROJECT OUTCOMES REPORT

Disclaimer

This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.

Intellectual Merit:

Causal inference lies at the core of many scientific endeavors. For example, how effective will a new medication be for a specific group of patients, or how much can a new policy reduce the crime rate? These questions require moving beyond mere associations to address causality: if we intervene in a specific way, what will the outcome be?

Traditionally, statistical developments have relied heavily on randomized experiments. However, in the era of big data, many large datasets are either entirely observational or have a significant observational component. Causal inference for observational datasets is more challenging than for randomized experiments for several reasons: key covariates may be distributed differently between intervention groups; unmeasured confounders correlated with both the treatment and the outcome may exist; and treated and control groups may exhibit different functional relationships between covariates and the response. These differences give rise to two widely recognized challenges of causal inference, adjusting for confounders and modeling outcomes, which must be addressed simultaneously. Moreover, the complexity of modern datasets calls for flexible models that capture effect heterogeneity, yet both traditional design-based methods and many nonparametric regression-based approaches struggle to incorporate such heterogeneity.

We addressed causal inference for large observational datasets by developing innovative Bayesian semiparametric methods that account for effect heterogeneity. These methods adjust for confounding by incorporating propensity scores and model potential outcomes using flexible semiparametric Gaussian process regressions. They can accommodate complex subgroup or hierarchical structures within massive observational data and effectively integrate external information or expert knowledge about causal relationships. Under certain prior specifications, our causal estimators align with conventional propensity score matching estimators, while in general scenarios they offer substantially improved efficiency and better identification of heterogeneous effects. Moreover, they provide clearer interpretations and more robust estimates than many machine learning estimators.
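
The general idea of modelling each potential-outcome surface as a linear function of the covariates plus a Gaussian process in the propensity score, then contrasting the two arms to obtain unit-level (conditional) effects, can be sketched as below. This is a simplified stand-in, not the project's model: the squared-exponential kernel, its fixed hyperparameters (`length=0.2`, unit signal and noise variances), the flat prior on the linear coefficients, the use of the simulation's true propensity score in place of an estimated one, and the function names `sq_exp_kernel` and `fit_arm` are all assumptions. With a flat prior on the coefficients, the posterior mean reduces to a generalized least-squares fit plus a GP correction on the residuals.

```python
import numpy as np

def sq_exp_kernel(a, b, signal_var=1.0, length=0.2):
    """Squared-exponential covariance between two vectors of propensity scores."""
    d = a[:, None] - b[None, :]
    return signal_var * np.exp(-0.5 * (d / length) ** 2)

def fit_arm(e, X, y, noise_var=1.0):
    """Posterior-mean fit of y = X @ beta + f(e) + noise, f ~ GP(0, k), flat prior on beta."""
    Sigma = sq_exp_kernel(e, e) + noise_var * np.eye(len(e))   # covariance of y given beta
    Si_X = np.linalg.solve(Sigma, X)
    beta = np.linalg.solve(X.T @ Si_X, Si_X.T @ y)              # generalized least squares
    alpha = np.linalg.solve(Sigma, y - X @ beta)                # GP weights on the residuals
    def predict(e_new, X_new):
        return X_new @ beta + sq_exp_kernel(e_new, e) @ alpha
    return predict

# Hypothetical simulated data: the outcome depends nonlinearly on the propensity
# score, and the treatment effect varies with the first covariate.
rng = np.random.default_rng(0)
n = 400
x = rng.normal(size=(n, 2))
e = 1.0 / (1.0 + np.exp(-(0.8 * x[:, 0] - 0.5 * x[:, 1])))    # true propensity score
z = rng.binomial(1, e)
tau = 2.0 + x[:, 0]                                           # true unit-level effects
y = 1.0 + 0.5 * x[:, 1] + 3.0 * e ** 2 + tau * z + rng.normal(size=n)

X = np.column_stack([np.ones(n), x])
m1 = fit_arm(e[z == 1], X[z == 1], y[z == 1])   # treated-arm outcome surface
m0 = fit_arm(e[z == 0], X[z == 0], y[z == 0])   # control-arm outcome surface
cate = m1(e, X) - m0(e, X)                       # estimated conditional effects
ate = cate.mean()
print(f"estimated ATE {ate:.2f} vs mean of true effects {tau.mean():.2f}")
print(f"correlation of estimated and true effects: {np.corrcoef(cate, tau)[0, 1]:.2f}")
```

In a full Bayesian treatment the hyperparameters would be estimated or given priors, and posterior draws rather than a single posterior mean would provide credible intervals for the conditional effects.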

We also evaluated the impact of unmeasured confounding through Bayesian sensitivity analyses. In the presence of heterogeneity, the treatment effect is a function of covariates rather than a single parameter, which requires more sophisticated evaluation strategies. We simulated unmeasured confounders and incorporated them in both the response and treatment models, with their coefficients serving as sensitivity parameters. The values of these sensitivity parameters were then compared with the marginal effects of observed covariates, giving researchers a benchmark for judging when a level of confounding becomes problematic.
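
A simplified version of this kind of simulation-based sensitivity analysis can be sketched as follows. For a chosen pair of sensitivity parameters, `zeta_z` (effect of a binary unmeasured confounder U on the log-odds of treatment) and `zeta_y` (effect of U on the outcome), U is drawn from its conditional distribution given treatment and covariates, its assumed contribution is removed from the outcome, and the treatment coefficient is re-estimated. Everything here, including the linear outcome model with a constant effect, the Bernoulli(0.5) prior on U, approximating the treatment model by the fitted propensity linear predictor, and the function names, is an illustrative assumption rather than the project's procedure; the heterogeneous-effect setting described above would replace the single coefficient with a covariate-dependent effect.

```python
import numpy as np

def fit_logistic(D, z, n_iter=25):
    """Logistic regression coefficients by Newton-Raphson (D already contains an intercept)."""
    beta = np.zeros(D.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-D @ beta))
        beta += np.linalg.solve(D.T @ (D * (p * (1.0 - p))[:, None]), D.T @ (z - p))
    return beta

def treatment_coef(D, z, y):
    """OLS coefficient on the treatment indicator z in a regression of y on [D, z]."""
    return np.linalg.lstsq(np.column_stack([D, z]), y, rcond=None)[0][-1]

def adjusted_effect(y, z, X, zeta_z, zeta_y, n_draws=200, seed=0):
    """Average treatment coefficient after simulating a binary confounder U ~ Bernoulli(0.5).

    Assumed working model: logit P(Z=1 | X, U) = eta(X) + zeta_z * (U - 0.5) and
    E[Y | X, Z, U] = X'b + tau * Z + zeta_y * U. U is drawn from P(U | Z, X), with eta(X)
    approximated by the fitted propensity linear predictor, and zeta_y * U is
    subtracted from y before the outcome model is refitted."""
    rng = np.random.default_rng(seed)
    D = np.column_stack([np.ones(len(z)), X])
    eta = D @ fit_logistic(D, z)
    p1 = 1.0 / (1.0 + np.exp(-(eta + zeta_z / 2)))    # P(Z=1 | X, U=1)
    p0 = 1.0 / (1.0 + np.exp(-(eta - zeta_z / 2)))    # P(Z=1 | X, U=0)
    lik1 = np.where(z == 1, p1, 1 - p1)
    lik0 = np.where(z == 1, p0, 1 - p0)
    q = lik1 / (lik1 + lik0)                          # P(U=1 | Z, X) under prior 0.5
    draws = [treatment_coef(D, z, y - zeta_y * rng.binomial(1, q)) for _ in range(n_draws)]
    return float(np.mean(draws))

# Hypothetical simulated data in which U is a real confounder that the analyst cannot see.
rng = np.random.default_rng(2)
n = 2000
X = rng.normal(size=(n, 2))
U = rng.binomial(1, 0.5, size=n)
z = rng.binomial(1, 1.0 / (1.0 + np.exp(-(0.8 * X[:, 0] - 0.5 * X[:, 1] + 1.5 * (U - 0.5)))))
y = 1.0 + X[:, 0] + 0.5 * X[:, 1] + 2.0 * z + 1.0 * U + rng.normal(size=n)

D = np.column_stack([np.ones(n), X])
naive = treatment_coef(D, z, y)
# Benchmarks: coefficients of the observed covariates in the treatment and outcome models.
bench_z = fit_logistic(D, z)[1:]
bench_y = np.linalg.lstsq(np.column_stack([D, z]), y, rcond=None)[0][1:3]
print("observed-covariate benchmarks (treatment, outcome):", bench_z.round(2), bench_y.round(2))
for zeta_y in (0.0, 0.5, 1.0):
    print(f"zeta_z=1.5, zeta_y={zeta_y}: tau = {adjusted_effect(y, z, X, 1.5, zeta_y):.2f}")
```

The printed coefficients of the observed covariates act as benchmarks: if a confounder no stronger than an observed covariate would move the estimate enough to change the substantive conclusion, the result is sensitive to hidden bias. Because the outcome fit is linear in y and the draws of U do not depend on `zeta_y`, for a fixed seed the adjusted estimate decreases linearly in `zeta_y` when U is positively associated with treatment.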

This project effectively combined the potential outcome framework with semiparametric Bayesian models, making significant contributions to formalizing and solving complex causal inference problems in observational studies.


Broader Impacts:

The project has had a significant impact on human resource development, fostering the academic growth of eleven doctoral students in the Statistics and Biostatistics PhD programs at Ohio State University. Seven of these students are female, and four have since graduated and moved into industry or academic positions. Through their involvement in this project, they gained substantial experience in Bayesian causal inference, laying a strong foundation for their development into independent, analytically proficient statisticians with a comprehensive understanding of the field.

The project has also provided valuable opportunities for training and professional development. The knowledge gained from this research was integrated into a graduate-level course on Causal Inference, attended by both master’s and PhD students from diverse disciplines, including Statistics, Biostatistics, and Epidemiology. This course offered students firsthand exposure to cutting-edge research in causal inference.

The models developed through this project have a broad range of applications, spanning sectors such as medicine, education, and business. Through the PIs' proactive engagement with various communities, the project's results have been widely disseminated through presentations, scholarly publications, and collaborative research initiatives. Furthermore, our Bayesian nonparametric causal inference model has been implemented in R, with the computational code freely accessible to the public, promoting openness and collaboration.



Last Modified: 08/31/2024
Modified by: Xinyi Xu
