Healthcare Cost Regressions: Going Beyond the Mean to Estimate the Full Distribution
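A quasi-Monte Carlo experiment comparing 14 methods for modelling the distribution of healthcare costs finds that no single method dominates, that tail probability forecasts involve a trade-off between bias and precision, and that distributional methods show significant potential in large samples.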
Understanding the data-generating process behind healthcare costs remains a key empirical issue. Although much research to date has focused on predicting the conditional mean cost, a focus on the mean can miss important features of the full distribution, such as tail probabilities. We conduct a quasi-Monte Carlo experiment using English National Health Service (NHS) inpatient data to compare 14 approaches to modelling the distribution of healthcare costs: nine parametric distributions that have commonly been used to fit healthcare costs, and five methods designed specifically to construct a counterfactual distribution. Our results indicate that no single method clearly dominates and that there is a trade-off between the bias and the precision of tail probability forecasts. Distributional methods show significant potential, particularly with larger sample sizes, where the variability of predictions is reduced. Parametric distributions such as the log-normal, generalised gamma and generalised beta of the second kind (GB2) estimate tail probabilities with high precision, but with a bias that varies with the cost threshold being considered.
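To make "bias and precision of tail probability forecasts" concrete, the sketch below assumes a quasi-Monte Carlo design in which a large dataset is treated as the population and estimation samples are repeatedly drawn from it. Each estimator's forecast of P(cost > threshold) is then compared with the population value. This is an illustration under stated assumptions, not the study's procedure. The population is synthetic (a hypothetical gamma/log-normal mixture, not NHS data), and only two illustrative estimators are used rather than the 14 compared in the paper: a maximum-likelihood log-normal fit and the plain empirical share. The function names (`make_population`, `evaluate`, and so on) are hypothetical.

```python
import math
import random
import statistics


def lognormal_tail_prob(sample, threshold):
    """Fit a log-normal by maximum likelihood and return the fitted P(Y > threshold)."""
    logs = [math.log(y) for y in sample]
    mu = statistics.fmean(logs)
    sigma = statistics.pstdev(logs, mu)  # MLE uses the divide-by-n standard deviation
    return 1.0 - statistics.NormalDist(mu, sigma).cdf(math.log(threshold))


def empirical_tail_prob(sample, threshold):
    """Share of observations above the threshold (no distributional assumption)."""
    return sum(y > threshold for y in sample) / len(sample)


# Illustrative estimators only; not the set of methods compared in the paper.
ESTIMATORS = {"log-normal": lognormal_tail_prob, "empirical": empirical_tail_prob}


def make_population(n, seed=1):
    """Hypothetical right-skewed 'cost' population (synthetic, NOT real NHS data)."""
    rng = random.Random(seed)
    return [
        rng.gammavariate(2.0, 1500.0) if rng.random() < 0.9 else rng.lognormvariate(9.5, 0.8)
        for _ in range(n)
    ]


def evaluate(population, threshold, sample_size, n_reps=200, seed=0):
    """Draw repeated subsamples, estimate the tail probability in each,
    and summarise bias and spread against the full-population value."""
    rng = random.Random(seed)
    true_p = empirical_tail_prob(population, threshold)  # population "truth"
    estimates = {name: [] for name in ESTIMATORS}
    for _ in range(n_reps):
        sample = rng.sample(population, sample_size)  # draw without replacement
        for name, estimator in ESTIMATORS.items():
            estimates[name].append(estimator(sample, threshold))
    summary = {
        name: {"bias": statistics.fmean(vals) - true_p, "sd": statistics.stdev(vals)}
        for name, vals in estimates.items()
    }
    return true_p, summary


if __name__ == "__main__":
    population = make_population(50_000)
    for n in (200, 2_000):
        true_p, summary = evaluate(population, threshold=20_000, sample_size=n)
        for name, stats in summary.items():
            print(f"n={n:>5} {name:>10}: true={true_p:.4f} "
                  f"bias={stats['bias']:+.4f} sd={stats['sd']:.4f}")
```

Comparing the rows at each sample size shows the kind of trade-off the abstract describes. The empirical share is unbiased for the population value but can be noisy far into the tail. The parametric fit pools information across the whole sample and so may vary less, at the cost of bias when its assumed shape is wrong. Re-running with a different `threshold` shows how that bias can change with the cost threshold, and moving from `n=200` to `n=2000` shows the spread shrinking with sample size.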