A Missing Data Paradox for Nearest Neighbor Recommender Systems
7 Pages. Posted: 4 Jan 2009. Last revised: 19 Aug 2018
Date Written: October 1, 2007
Recommender systems typically operate on sparse matrices. Most methods assume that the missing entries are missing at random (MAR), yet in practice entries are often not missing at random (NMAR). How problematic is this? We present a puzzle. Methods that explicitly model the NMAR process have been shown to improve predictions. Many methods, however, assume MAR. Although that assumption may be wrong, we show that such methods may nonetheless benefit from its violation. Given that some data must go missing, an NMAR process can often pick the "right" values to preserve, i.e., the more informative ones. Thus, despite the perception that NMAR is harmful, it can often improve recommendations. This may explain some of the historical success of collaborative filtering even when the MAR assumption has been violated.
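The comparison described in the abstract can be sketched as a small simulation. This is an illustration only, not the paper's experiment: the synthetic one-factor rating model, the NMAR mechanism (a cell's chance of being observed grows with the square of its rating), the simple user-based nearest-neighbor predictor, and all function names are assumptions introduced here. Both masks keep the same number of observed cells, so only the missingness mechanism differs.

```python
import math
import random

def make_ratings(n_users, n_items, rng):
    """Synthetic 1-5 ratings from a hypothetical one-factor taste model."""
    user_taste = [rng.gauss(0, 1) for _ in range(n_users)]
    item_trait = [rng.gauss(0, 1) for _ in range(n_items)]
    ratings = []
    for u in range(n_users):
        row = []
        for i in range(n_items):
            score = 3 + user_taste[u] * item_trait[i] + rng.gauss(0, 0.5)
            row.append(min(5, max(1, round(score))))
        ratings.append(row)
    return ratings

def sample_observed(ratings, budget, rng, weight):
    """Keep `budget` cells, sampled without replacement with chance
    proportional to weight(rating), using exponential sort keys."""
    keyed = []
    for u, row in enumerate(ratings):
        for i, r in enumerate(row):
            key = -math.log(1.0 - rng.random()) / weight(r)
            keyed.append((key, u, i))
    keyed.sort()
    return {(u, i) for _, u, i in keyed[:budget]}

def knn_rmse(ratings, observed, k=5):
    """RMSE of a simple user-based k-NN predictor on the hidden cells."""
    n_users, n_items = len(ratings), len(ratings[0])
    items_of = [set() for _ in range(n_users)]
    for u, i in observed:
        items_of[u].add(i)
    global_mean = sum(ratings[u][i] for u, i in observed) / len(observed)
    sq_err, n = 0.0, 0
    for u in range(n_users):
        for i in range(n_items):
            if (u, i) in observed:
                continue
            neighbors = []
            for v in range(n_users):
                if v == u or i not in items_of[v]:
                    continue
                common = items_of[u] & items_of[v]
                if not common:
                    continue
                # Mean absolute rating difference over co-observed items.
                dist = sum(abs(ratings[u][j] - ratings[v][j]) for j in common) / len(common)
                neighbors.append((1.0 / (1.0 + dist), ratings[v][i]))
            neighbors.sort(reverse=True)
            top = neighbors[:k]
            if top:
                pred = sum(s * r for s, r in top) / sum(s for s, _ in top)
            else:
                pred = global_mean  # fallback when no neighbor rated item i
            sq_err += (pred - ratings[u][i]) ** 2
            n += 1
    return math.sqrt(sq_err / n)

rng = random.Random(0)
ratings = make_ratings(30, 20, rng)
budget = 200  # same number of observed cells under both mechanisms
mar = sample_observed(ratings, budget, rng, weight=lambda r: 1.0)
# Assumed NMAR mechanism: higher ratings are more likely to be observed.
nmar = sample_observed(ratings, budget, rng, weight=lambda r: float(r) ** 2)
rmse_mar = knn_rmse(ratings, mar)
rmse_nmar = knn_rmse(ratings, nmar)
print("MAR  RMSE on hidden cells:", rmse_mar)
print("NMAR RMSE on hidden cells:", rmse_nmar)
```

The two errors are measured on different sets of hidden cells, so a single run is only suggestive. A fairer comparison would average over many seeds and score both masks on a common held-out set, and the outcome may depend heavily on the assumed rating model and NMAR weights.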
Keywords: recommender systems, collaborative filtering, predictive modeling, missing data