A Huber loss-based super learner with applications to healthcare expenditures

Z Wu, D Benkeser - arXiv preprint arXiv:2205.06870, 2022 - arxiv.org
The complex distributions of healthcare expenditures pose challenges for statistical modeling with a single model. Super learning, an ensemble method that combines a range of candidate models, is a promising alternative for cost estimation and has shown benefits over a single model. However, standard approaches to super learning may perform poorly in settings where extreme values are present, such as healthcare expenditure data. We propose a super learner based on the Huber loss, a "robust" loss function that combines squared error loss with absolute loss to down-weight the influence of outliers. We derive oracle inequalities that establish bounds on the finite-sample and asymptotic performance of the method. We show that the proposed method can be used both to optimize Huber risk directly and in finite-sample settings where optimizing mean squared error is the ultimate goal. For the latter scenario, we provide two methods for performing a grid search over values of the robustification parameter indexing the Huber loss. Simulations and real data analysis demonstrate appreciable finite-sample gains in cost prediction and causal effect estimation using our proposed method.
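
To make concrete how a loss can combine squared error with absolute error, here is a minimal sketch of the standard textbook form of the Huber loss. The abstract does not give the paper's exact parameterization, so the function names, the symbol `delta` for the robustification parameter, and the toy numbers are assumptions for illustration only, not the authors' implementation.

```python
def huber_loss(residual: float, delta: float) -> float:
    """Textbook Huber loss (assumed form, not taken from the paper).

    Quadratic (0.5 * r^2) when |r| <= delta, linear beyond that,
    so large residuals contribute less than under squared error.
    """
    if delta <= 0:
        raise ValueError("delta must be positive")
    abs_r = abs(residual)
    if abs_r <= delta:
        return 0.5 * residual ** 2
    return delta * (abs_r - 0.5 * delta)


def mean_huber_risk(y_true, y_pred, delta):
    """Empirical Huber risk: the average loss over paired observations."""
    losses = [huber_loss(y - p, delta) for y, p in zip(y_true, y_pred)]
    return sum(losses) / len(losses)


# Hypothetical toy costs with one extreme value, and a constant prediction.
costs = [100.0, 120.0, 90.0, 10000.0]
preds = [110.0, 110.0, 110.0, 110.0]

# With a small delta the extreme cost enters the risk linearly;
# with a very large delta the loss reduces to half the squared error.
robust_risk = mean_huber_risk(costs, preds, delta=50.0)
near_squared_risk = mean_huber_risk(costs, preds, delta=1e9)
```

In this sketch a smaller `delta` treats more residuals as outliers, and a very large `delta` reproduces squared error loss up to the factor of one half. How `delta` is chosen is the subject of the grid-search methods mentioned in the abstract and is not reproduced here.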