Detecting Anomalous Online Reviewers: An Unsupervised Approach Using Mixture Models
Posted: 24 Oct 2018
Date Written: October 1, 2018
Online reviews and discussions play a significant role in the decisions users make in day-to-day life. However, reviewers who deliberately post fake or deceptive reviews for financial or other gain harm both users and businesses. Automatically detecting such reviewers is well known to be a challenging problem, particularly because fake reviews rarely look out of context when compared with genuine reviews. In this paper, we present a fully unsupervised approach to detecting anomalous behavior among online reviewers. We propose a novel hierarchical approach for this task, in which we (1) derive distributions for key features that define reviewer behavior and (2) combine these distributions into a finite mixture model. Our approach is highly generalizable, allows us to seamlessly combine both univariate and multivariate distributions into a unified anomaly detection system, and, most importantly, requires no explicit labeling (spam/not spam) of the data. We evaluate our approach on real-world customer reviews of restaurants taken from Yelp.com. Our newly developed approach, which uses Gaussian mixture models and one-class support vector machines, outperforms prior unsupervised anomaly detection approaches. Furthermore, we show that our approach outperforms recently developed state-of-the-art unsupervised methods based on probabilistic graphical models for identifying fake reviewers on Yelp.
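To make the mixture-model idea concrete, the sketch below fits a two-component one-dimensional Gaussian mixture with plain EM and ranks reviewers by log-likelihood under the fitted model. It is an illustration only: the feature (reviews posted per day), the synthetic data, the percentile-based initialization, and the rule of flagging the lowest-likelihood reviewers are all assumptions made here, not details taken from the paper, whose actual features and scoring procedure are not given in this abstract.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of a univariate Gaussian with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def fit_gmm_1d(xs, k=2, iters=100):
    """Fit a k-component 1-D Gaussian mixture with plain EM.

    Means are initialized at evenly spaced percentiles of the data
    (an assumption of this sketch, chosen so the run is deterministic).
    """
    n = len(xs)
    ordered = sorted(xs)
    means = [ordered[int(n * (j + 0.5) / k)] for j in range(k)]
    overall_mean = sum(xs) / n
    overall_var = sum((x - overall_mean) ** 2 for x in xs) / n
    variances = [overall_var] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in xs:
            p = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(p) or 1e-300
            resp.append([pj / total for pj in p])
        # M-step: re-estimate weights, means, and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp) + 1e-12
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            variances[j] = max(var_j, 1e-6)  # floor avoids a collapsed component
    return weights, means, variances


def log_likelihood(x, model):
    """Log-density of x under the fitted mixture."""
    weights, means, variances = model
    p = sum(w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances))
    return math.log(max(p, 1e-300))


# Synthetic, hypothetical feature: average reviews posted per day.
rng = random.Random(0)
casual = [rng.gauss(0.5, 0.1) for _ in range(150)]    # occasional reviewers
frequent = [rng.gauss(2.0, 0.3) for _ in range(150)]  # active but ordinary reviewers
anomalous = [7.5, 8.0, 9.0]                           # implausibly prolific reviewers
xs = casual + frequent + anomalous                    # no labels are used in fitting

model = fit_gmm_1d(xs, k=2)
scores = [log_likelihood(x, model) for x in xs]

# Flag the three reviewers the mixture explains worst.
flagged = sorted(range(len(xs)), key=lambda i: scores[i])[:3]
```

In this toy setting the flagged indices are the three injected reviewers, because they lie far from both fitted components. A realistic system would presumably combine several behavioral features and choose the flagging threshold with more care.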
Keywords: Opinion Spam, Unsupervised Learning, Anomaly Detection, Mixture Models