Predicting Returns with Text Data
61 Pages. Posted: 20 May 2019. Last revised: 12 Oct 2020.
Date Written: September 30, 2020
We introduce a new text-mining methodology that extracts information from news articles to predict asset returns. Unlike more common sentiment scores used for stock return prediction (e.g., those sold by commercial vendors or built with dictionary-based methods), our supervised learning framework constructs a score that is specifically adapted to the problem of return prediction. Our method proceeds in three steps: 1) isolating a list of terms via predictive screening, 2) assigning prediction weights to these words via topic modeling, and 3) aggregating terms into an article-level predictive score via penalized likelihood. We derive theoretical guarantees on the accuracy of estimates from our model with minimal assumptions. In our empirical analysis, we study one of the most actively monitored streams of news articles in the financial system, the Dow Jones Newswires, and show that our supervised text model excels at extracting return-predictive signals in this context. Information in newswires is assimilated into prices with an inefficient delay that is broadly consistent with limits-to-arbitrage (i.e., more severe for smaller and more volatile firms) yet can be exploited in a real-time trading strategy with reasonable turnover and net of transaction costs.
Keywords: Text Mining, Machine Learning, Return Predictability, Sentiment Analysis, Screening, Topic Modeling, Penalized Likelihood
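The three steps named in the abstract (screening, topic modeling, penalized likelihood) can be illustrated with a small sketch. The abstract does not give the estimators, so everything below is an assumption made for illustration: the screening rule (how often a term co-occurs with a positive return), the two-topic mixture fitted by least squares on return ranks, the grid search over the score, and all function names, parameters, and toy data are hypothetical and should not be read as the authors' implementation.

```python
import numpy as np

def screen_terms(counts, returns, alpha=0.1, min_docs=2):
    """Step 1 (assumed rule): keep terms that co-occur with positive
    returns unusually often or unusually rarely."""
    present = counts > 0                          # (n_docs, n_terms)
    doc_freq = present.sum(axis=0)                # articles containing each term
    up = (np.asarray(returns) > 0).astype(float)
    frac_up = (present * up[:, None]).sum(axis=0) / np.maximum(doc_freq, 1)
    keep = (np.abs(frac_up - 0.5) > alpha) & (doc_freq >= min_docs)
    return np.flatnonzero(keep)

def fit_term_weights(counts, returns):
    """Step 2 (assumed rule): two-topic mixture, fitted by regressing
    term frequencies on the article's return rank."""
    returns = np.asarray(returns)
    n = len(returns)
    p = (np.argsort(np.argsort(returns)) + 1) / n        # rank scaled to (0, 1]
    lengths = np.maximum(counts.sum(axis=1, keepdims=True), 1)
    freqs = counts / lengths
    design = np.column_stack([p, 1 - p])
    coef, *_ = np.linalg.lstsq(design, freqs, rcond=None)  # (2, n_terms)
    coef = np.clip(coef, 1e-8, None)                       # keep weights positive
    return coef / coef.sum(axis=1, keepdims=True)          # rows: positive, negative

def score_article(term_counts, topics, penalty=1.0):
    """Step 3 (assumed rule): maximize a penalized multinomial
    log-likelihood over the mixing weight by grid search."""
    grid = np.linspace(0.01, 0.99, 99)
    mix = grid[:, None] * topics[0] + (1 - grid[:, None]) * topics[1]
    loglik = (term_counts * np.log(mix)).sum(axis=1)
    objective = loglik + penalty * np.log(grid * (1 - grid))
    return grid[np.argmax(objective)]

# Toy data (made up): columns are the terms "gain", "loss", "the", "said".
counts = np.array([
    [3, 0, 2, 1],
    [2, 0, 2, 0],
    [2, 1, 2, 1],
    [0, 3, 2, 1],
    [0, 2, 2, 0],
    [1, 2, 2, 1],
])
returns = np.array([0.03, 0.02, 0.01, -0.03, -0.02, -0.01])

selected = screen_terms(counts, returns)
topics = fit_term_weights(counts[:, selected], returns)
score_pos = score_article(np.array([3, 0]), topics)   # article using only "gain"
score_neg = score_article(np.array([0, 3]), topics)   # article using only "loss"
print(selected, score_pos, score_neg)
```

In this toy setup the penalty term pulls scores toward 0.5, so an article with few screened terms gets a near-neutral score. The grid search is used only to keep the sketch short; a one-dimensional numerical optimizer would serve the same purpose.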