Identifying and Treating Outliers in Finance
Financial Management, Forthcoming
64 pages. Posted: 16 Jun 2017. Last revised: 24 Apr 2021.
Date Written: December 14, 2018
Outliers represent a fundamental challenge in empirical finance research. We investigate whether the routine techniques used in finance research to identify and treat outliers are appropriate for the data structures we observe in practice. Specifically, we propose a multivariate identification strategy that can effectively detect outliers. We also introduce an estimator that minimizes the bias outliers cause in both cross-sectional and panel regressions and provide outlier mitigation guidance. Using replications of four recently published studies in premier finance journals, we show how adjusting for multivariate outliers can lead to significantly different results.
NOTE: Updated (04/22/2021) outlier-robust estimator packages are available via Stata's SSC.
Note that moremata should be updated (to the latest version on SSC) for these packages to work. These packages are:
1) robstat, which estimates various classic and robust measures of location, scale, skewness, and kurtosis and, optionally, performs robust tests for normality.
Jann, B., V. Verardi, C. Vermandele (2018). robstat: Stata module to estimate robust univariate statistics. Available from http://ideas.repec.org/c/boc/bocode/s458524.html.
2) robmv, which computes several estimators of multivariate location and scatter (MCD, MVE, M, S, MM, and S-D). The post-estimation command predict can be used after robmv to generate variables that identify multivariate outliers, contain robust distances, etc.
Jann, B., V. Verardi, C. Vermandele (2021). robmv: Stata module for robust multivariate estimation of location and covariance. Available from http://ideas.repec.org/c/boc/bocode/s458895.html.
3) robreg, which provides a number of robust estimators for linear regression models (LS, QUANTILE, MVE, LTS, M, S, and MM) and has been modified relative to the previous version. It now allows time-series operators, factor variables, clustered standard errors, etc.
Jann, B. (2021). robreg: Stata module providing robust regression estimators. Available from http://ideas.repec.org/c/boc/bocode/s458931.html.
4) xtrobreg, which provides robust pairwise-differences estimators for panel data. The “convert” subcommand transforms the data permanently; applying robreg manually to the converted data (using weights) then yields the equivalent of xtrobreg. The advantage of the conversion is that the pairwise differences can then be used with any estimator available in Stata.
Jann, B., V. Verardi (2021). xtrobreg: Stata module providing pairwise-differences and first-differences robust regression estimators. Available from http://ideas.repec.org/c/boc/bocode/s458937.html.
Regarding point 4, xtrobreg supersedes the xtrobust command developed for this paper. It is conceptually similar but not identical, in particular for unbalanced data, in how it deals with missing values and gaps, and in how dummy variables are treated.
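To make the pairwise-differences idea behind point 4 concrete, the following is a minimal Python sketch, not the xtrobreg implementation. It assumes a hypothetical panel stored as a dict of (t, y, x) tuples per unit, simply skips observations with missing values, and omits the weights that xtrobreg applies. The function name and data layout are illustrative only.

```python
from itertools import combinations

def pairwise_differences(panel):
    """Build within-unit pairwise differences.

    panel: dict mapping a unit id to a list of (t, y, x) observations.
    Returns a list of (unit, t_later, t_earlier, dy, dx) rows.
    """
    rows = []
    for unit, obs in panel.items():
        # Assumption of this sketch: drop observations with a missing y or x.
        clean = [o for o in obs if o[1] is not None and o[2] is not None]
        clean.sort(key=lambda o: o[0])  # order by time period
        # Every pair of periods within the same unit yields one difference,
        # so a unit with T usable periods contributes T*(T-1)/2 rows.
        for (s, y_s, x_s), (t, y_t, x_t) in combinations(clean, 2):
            rows.append((unit, t, s, y_t - y_s, x_t - x_s))
    return rows

# Hypothetical unbalanced panel generated as y = a_i + 2*x,
# with unit effects a_A = 10 and a_B = -5 (unit B has a gap at t = 2).
panel = {
    "A": [(1, 12.0, 1.0), (2, 14.0, 2.0), (3, 18.0, 4.0)],
    "B": [(1, -3.0, 1.0), (3, 1.0, 3.0)],
}
rows = pairwise_differences(panel)
for row in rows:
    print(row)
```

In the differenced data the unit-specific intercepts cancel, so each row satisfies dy = 2 * dx here. Any regression estimator, robust or not, could then be applied to the dy and dx columns.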
Keywords: Replication, Research Design, Financial Data, Winsorize, Outliers, Robust Regression
JEL Classification: C31, C52, C87, G31, G32, G34, G38