The Linguistic Properties of Award-winning Annual Reports
76 Pages. Posted: 13 May 2020
Date Written: April 14, 2020
We develop and test a model of high-quality annual report discourse. The model is trained and evaluated on reports published between 2007 and 2018 by London Stock Exchange-listed firms shortlisted for an award by corporate reporting experts. We use methods from computational linguistics to identify an initial set of 19 features that distinguish quality according to what management say (i.e., content) and how they say it (i.e., language structure). We supplement these features with popular bag-of-words proxies drawn from extant research (document length, reading ease, net tone, forward-looking content, and uncertainty). Stepwise regression yields a parsimonious quality model comprising 10 features. These features suggest that higher-quality reports contain more strategy-related commentary, focus less on growth, and use more accessible language that promotes cognitive processing (evidenced by more relevancy markers, greater connectivity, more exclusive forms of language, and fewer grammatical words). The model predicts over 70% of shortlisting cases in out-of-sample tests and outperforms a baseline model comprising popular bag-of-words features.
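To make the baseline concrete, the following is a minimal sketch of how count-based bag-of-words proxies of the kind named above (document length, net tone, forward-looking content, uncertainty) might be computed for a single report. It is not the authors' implementation: the word lists, the function names `tokenize` and `bag_of_words_proxies`, and the net-tone formula (positive minus negative counts, scaled by their sum) are assumptions chosen for illustration, and reading ease is omitted.

```python
import re

# Hypothetical toy word lists, for illustration only. Published studies rely on
# much larger dictionaries, which are not reproduced here.
POSITIVE = {"growth", "strong", "improved", "success"}
NEGATIVE = {"loss", "decline", "weak", "impairment"}
FORWARD = {"will", "expect", "anticipate", "plan"}
UNCERTAIN = {"may", "might", "uncertain", "risk"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def bag_of_words_proxies(text):
    """Return simple count-based proxies for one document."""
    tokens = tokenize(text)
    n = len(tokens)
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    fwd = sum(t in FORWARD for t in tokens)
    unc = sum(t in UNCERTAIN for t in tokens)
    return {
        # Document length in word tokens.
        "length": n,
        # Assumed net-tone definition: (pos - neg) / (pos + neg), 0 if no hits.
        "net_tone": (pos - neg) / (pos + neg) if pos + neg else 0.0,
        # Share of tokens that match the forward-looking list.
        "forward_looking": fwd / n if n else 0.0,
        # Share of tokens that match the uncertainty list.
        "uncertainty": unc / n if n else 0.0,
    }
```

Calling `bag_of_words_proxies` on each report's text would give one feature row per report; such rows could then feed a baseline classifier of shortlisting, against which a richer feature set is compared.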
Keywords: annual report discourse, semantic annotation, corpus linguistics, bag-of-words, prediction model
JEL Classification: M40