An Introduction to Machine Learning for Panel Data
93 Pages | Posted: 11 Dec 2020 | Last revised: 28 Jan 2021
Date Written: October 23, 2020
Machine learning has dramatically expanded the range of tools for evaluating economic panel data. This paper applies a variety of machine-learning methods to the Boston housing dataset, an iconic proving ground for machine learning. Though machine learning often lacks the overt interpretability of linear regression, methods based on decision trees score the relative importance of dataset features. In addition to addressing the theoretical tradeoff between bias and variance, this paper discusses practices rarely followed in traditional economics: the splitting of data into training, validation, and test sets; the scaling of data; and the preference for retaining all data. The choice between traditional and machine-learning methods hinges on practical rather than mathematical considerations. In settings emphasizing interpretative clarity through the scale and sign of regression coefficients, machine learning may best play an ancillary role. Wherever predictive accuracy is paramount, however, or where heteroskedasticity or high dimensionality might impair the clarity of linear methods, machine learning can deliver superior results.
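Two of the practices named in the abstract, splitting data into training, validation, and test sets and scaling the data, can be illustrated with a minimal sketch. It is not the paper's own code. It assumes a synthetic three-feature matrix rather than the Boston housing data, a 60/20/20 random split, and standardization to zero mean and unit variance. The helper names `train_val_test_split`, `fit_scaler`, and `transform` are hypothetical and introduced only for illustration.

```python
import random
import statistics


def train_val_test_split(rows, val_frac=0.2, test_frac=0.2, seed=0):
    """Shuffle rows and cut them into training, validation, and test sets."""
    rng = random.Random(seed)
    shuffled = rows[:]          # copy so the caller's list is left untouched
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_frac)
    n_val = int(len(shuffled) * val_frac)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test


def fit_scaler(train):
    """Column means and standard deviations, computed from training rows only."""
    cols = list(zip(*train))
    means = [statistics.fmean(c) for c in cols]
    # Fall back to 1.0 for a constant column to avoid division by zero.
    stds = [statistics.pstdev(c) or 1.0 for c in cols]
    return means, stds


def transform(rows, means, stds):
    """Standardize rows with previously fitted means and standard deviations."""
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


# Synthetic data (an assumption): 100 rows, 3 features on very different scales.
rng = random.Random(42)
data = [[rng.gauss(5, 1), rng.gauss(300, 50), rng.gauss(0.5, 0.1)]
        for _ in range(100)]

train, val, test = train_val_test_split(data)

# The scaler is fitted on the training set alone, then reused unchanged,
# so no information from the validation or test rows leaks into it.
means, stds = fit_scaler(train)
train_s = transform(train, means, stds)
val_s = transform(val, means, stds)
test_s = transform(test, means, stds)
```

The sketch shuffles rows at random, which suits a cross-section. For data with a genuine time or entity dimension, splitting by period or by entity may be the more appropriate choice.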
Keywords: Machine learning, bias-variance tradeoff, decision trees, random forests, extra trees, XGBoost, learning ensembles, boosting, support vector machines, neural networks
JEL Classification: C18, C23, C33, C45, R31