Estimating Average Treatment Effects With Propensity Scores Estimated With Four Machine Learning Procedures: Simulation Results in High Dimensional Settings and With Time to Event Outcomes
24 Pages. Posted: 16 Nov 2018
Date Written: September 21, 2018
Background: The increased availability of claims data allows one to build high-dimensional datasets, rich in covariates, for accurately estimating treatment effects in medical and epidemiological cohort studies. This paper shows the potential of machine learning for estimating average treatment effects with propensity score methods in the context of rich, high-dimensional datasets.
Methods: Four machine learning methods are used to estimate the propensity scores from which average treatment effects are computed, in the context of time-to-event outcomes. The four methods explored in this study are LASSO, Random Forest, Gradient Boosting, and Artificial Neural Networks. Simulations based on an actual medical claims dataset are used to assess the efficiency of these methods. The simulations are performed with over 100,000 observations and 1,100 explanatory variables. Each method is tested on 500 datasets created from the original dataset, allowing us to report the mean and standard deviation of the estimated average treatment effects.
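The general shape of such a simulation study can be illustrated with a minimal sketch. It is not the procedure used in the paper: it assumes a single simulated confounder rather than claims data, a continuous outcome rather than a time-to-event outcome, plain logistic regression in place of the four machine learning methods, and a normalized inverse-probability-weighted estimator with clipped propensity scores. All function names, parameter values, and the true effect of 1.0 are hypothetical choices made for illustration.

```python
import math
import random
import statistics


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def simulate(n, rng, true_effect=1.0):
    """Toy dataset: the confounder x drives both treatment and outcome."""
    rows = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        t = 1 if rng.random() < sigmoid(0.8 * x) else 0
        y = true_effect * t + 2.0 * x + rng.gauss(0.0, 1.0)
        rows.append((x, t, y))
    return rows


def fit_propensity(rows, steps=100, lr=0.5):
    """Logistic regression of treatment on x, fitted by batch gradient ascent."""
    b0 = b1 = 0.0
    n = len(rows)
    for _ in range(steps):
        g0 = g1 = 0.0
        for x, t, _ in rows:
            resid = t - sigmoid(b0 + b1 * x)
            g0 += resid
            g1 += resid * x
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1


def ipw_ate(rows, b0, b1, clip=0.01):
    """Normalized inverse-probability-weighted estimate of the ATE."""
    num1 = den1 = num0 = den0 = 0.0
    for x, t, y in rows:
        # Clipping the propensity score is an assumption of this sketch.
        p = min(max(sigmoid(b0 + b1 * x), clip), 1.0 - clip)
        if t == 1:
            num1 += y / p
            den1 += 1.0 / p
        else:
            num0 += y / (1.0 - p)
            den0 += 1.0 / (1.0 - p)
    return num1 / den1 - num0 / den0


def naive_difference(rows):
    """Unadjusted difference in mean outcomes, biased by confounding."""
    treated = [y for _, t, y in rows if t == 1]
    control = [y for _, t, y in rows if t == 0]
    return statistics.mean(treated) - statistics.mean(control)


def run_simulation(replicates=20, n=500, seed=0):
    """Repeat the estimation over many datasets; report mean and SD."""
    rng = random.Random(seed)
    ipw, naive = [], []
    for _ in range(replicates):
        rows = simulate(n, rng)
        b0, b1 = fit_propensity(rows)
        ipw.append(ipw_ate(rows, b0, b1))
        naive.append(naive_difference(rows))
    return statistics.mean(ipw), statistics.stdev(ipw), statistics.mean(naive)


if __name__ == "__main__":
    mean_ipw, sd_ipw, mean_naive = run_simulation()
    print(f"IPW ATE: mean={mean_ipw:.3f}, sd={sd_ipw:.3f}")
    print(f"Naive difference: mean={mean_naive:.3f}")
```

In this toy setting the weighted estimate should sit close to the assumed true effect, while the unadjusted difference is inflated by confounding. Swapping `fit_propensity` for a penalized or tree-based learner would mirror the comparison described above, with the replicate mean and standard deviation serving as the summary statistics.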
Results: The results are very promising for all four methods; however, LASSO, Random Forest and Gradient Boosting seem to be performing better than Random Forest.
Conclusion: Machine learning methods can be helpful for observational studies that use the propensity score when a very large number of covariates is available, the total number of observations is large, and the outcome event is rare. This is an important result given the availability of big data related to Health Economics and Outcomes Research (HEOR) around the world.
Keywords: machine learning, propensity score, claims data, impact of treatment on treated
JEL Classification: C01, C13, C31, C34, C45, C53, I11