Split Decisions: Practical Machine Learning for Empirical Legal Scholarship
46 Pages. Posted: 11 Dec 2020. Last revised: 6 Feb 2021.
Date Written: November 16, 2020
Multivariable regression may be the most prevalent and useful task in social science. Empirical legal studies rely heavily on the ordinary least squares method. Conventional regression methods have attained credibility in court, but by no means do they dictate legal outcomes. Using the iconic Boston housing study as a source of price data, this Article introduces machine-learning regression methods. Although decision trees and forest ensembles lack the overt interpretability of linear regression, these methods reduce the opacity of black-box techniques by scoring the relative importance of dataset features. This Article will also address the theoretical tradeoff between bias and variance, as well as the importance of training, cross-validation, and reserving a holdout dataset for testing.
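The workflow named in the abstract (training, cross-validation, and a reserved holdout set) can be illustrated with a minimal sketch. It is not the Article's code and does not use the Boston housing data: the data are synthetic, the model is a hand-written depth-one regression tree (a "stump") rather than a full tree or forest, and every function name is hypothetical.

```python
import random
from statistics import mean

def sse(values):
    """Sum of squared errors around the mean (0 for an empty list)."""
    if not values:
        return 0.0
    m = mean(values)
    return sum((v - m) ** 2 for v in values)

def fit_stump(X, y):
    """Fit a depth-1 regression tree: choose the (feature, threshold) split
    that minimises the summed squared error of the two leaves."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            if not left or not right:
                continue  # a split must leave observations on both sides
            err = sse(left) + sse(right)
            if best is None or err < best["err"]:
                best = {"feature": j, "threshold": t, "err": err,
                        "left": mean(left), "right": mean(right)}
    return best

def predict(stump, row):
    """Predict with the mean of the leaf that the row falls into."""
    if row[stump["feature"]] <= stump["threshold"]:
        return stump["left"]
    return stump["right"]

def mse(stump, X, y):
    """Mean squared prediction error of a fitted stump on (X, y)."""
    return mean((predict(stump, r) - yi) ** 2 for r, yi in zip(X, y))

def k_fold_mse(X, y, k=5):
    """Average validation MSE over k interleaved folds of the training data."""
    scores = []
    for fold in range(k):
        val_idx = set(range(fold, len(X), k))
        X_tr = [r for i, r in enumerate(X) if i not in val_idx]
        y_tr = [v for i, v in enumerate(y) if i not in val_idx]
        X_va = [r for i, r in enumerate(X) if i in val_idx]
        y_va = [v for i, v in enumerate(y) if i in val_idx]
        scores.append(mse(fit_stump(X_tr, y_tr), X_va, y_va))
    return mean(scores)

# ASSUMPTION: synthetic "prices" that jump when feature 0 exceeds 5;
# feature 1 is pure noise. This is not the Boston housing data.
rng = random.Random(0)
X = [[rng.uniform(0, 10), rng.uniform(0, 10)] for _ in range(200)]
y = [(30.0 if row[0] > 5 else 20.0) + rng.gauss(0, 1) for row in X]

# Reserve the last 40 observations as a holdout set that is never used
# for fitting or for cross-validation.
X_train, y_train = X[:160], y[:160]
X_test, y_test = X[160:], y[160:]

cv_score = k_fold_mse(X_train, y_train, k=5)   # model assessment on training data
stump = fit_stump(X_train, y_train)            # final fit on all training data
test_score = mse(stump, X_test, y_test)        # one-time check on the holdout

print("split on feature", stump["feature"], "at", round(stump["threshold"], 2))
print("cross-validated MSE:", round(cv_score, 2))
print("holdout MSE:", round(test_score, 2))
```

In this sketch the cross-validated error is computed only from the training portion, and the holdout error is computed once at the end, so the holdout figure is not used to choose the model. A practical analysis would more likely rely on an established library than on hand-written trees.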