A Model for Clustering Data from Heterogeneous Dissimilarities

European Journal of Operational Research, Forthcoming

Georgetown McDonough School of Business Research Paper No. 2740618

33 Pages, Posted: 4 Mar 2016

Everton Santi

Universidade Federal do Rio Grande do Norte (UFRN)

Daniel Aloise

Universidade Federal do Rio Grande do Norte (UFRN)

Simon J. Blanchard

Georgetown University - Robert Emmett McDonough School of Business

Date Written: March 1, 2016

Abstract

Clustering algorithms partition a set of n objects into p groups (called clusters), such that objects assigned to the same group are homogeneous according to some criterion. To derive these clusters, the data input required is often a single n × n dissimilarity matrix. Yet for many applications, more than one instance of the dissimilarity matrix is available, and so, to conform to model requirements, it is common practice to aggregate (e.g., sum up, average) the matrices. This aggregation practice results in clustering solutions that mask the true nature of the original data. In this paper we introduce a clustering model which, to handle the heterogeneity, uses all available dissimilarity matrices and identifies groups of individuals who cluster the objects in a similar way. The model is a nonconvex problem that is difficult to solve exactly, and we thus introduce a Variable Neighborhood Search heuristic to provide solutions efficiently. Computational experiments and an empirical application to the perception of chocolate candy show that the heuristic algorithm is efficient and that the proposed model is suited for recovering heterogeneous data. Implications for clustering researchers are discussed.
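
The masking effect of aggregation described above can be illustrated with a small sketch. This is not the authors' model or their Variable Neighborhood Search heuristic: it assumes two hypothetical individuals who each sort four objects into two groups, builds a 0/1 dissimilarity matrix for each, and solves a tiny p-median problem by brute-force enumeration. All function names and data are invented for the illustration.

```python
from itertools import combinations

def block_dissimilarity(groups, n):
    """Hypothetical individual's matrix: 0 within a group, 1 across groups."""
    label = {obj: g for g, members in enumerate(groups) for obj in members}
    return [[0 if label[i] == label[j] else 1 for j in range(n)]
            for i in range(n)]

def average_matrices(matrices):
    """Entry-wise mean of several n x n dissimilarity matrices."""
    n = len(matrices[0])
    count = len(matrices)
    return [[sum(d[i][j] for d in matrices) / count for j in range(n)]
            for i in range(n)]

def best_p_median(d, p):
    """Brute-force p-median: try every set of p medians (tiny n only)."""
    n = len(d)
    best_cost, best_medians = None, None
    for medians in combinations(range(n), p):
        # Each object is assigned to its closest median.
        cost = sum(min(d[i][m] for m in medians) for i in range(n))
        if best_cost is None or cost < best_cost:
            best_cost, best_medians = cost, medians
    return best_cost, best_medians

# Assumed data: individual A groups {0,1},{2,3}; individual B groups {0,2},{1,3}.
d_a = block_dissimilarity([[0, 1], [2, 3]], 4)
d_b = block_dissimilarity([[0, 2], [1, 3]], 4)
d_avg = average_matrices([d_a, d_b])

print(best_p_median(d_a, 2)[0])    # 0: A's own matrix clusters perfectly
print(best_p_median(d_b, 2)[0])    # 0: so does B's
print(best_p_median(d_avg, 2)[0])  # 1.0: the average fits neither individual
```

In this toy case every choice of two medians has the same cost on the averaged matrix, so the aggregated data no longer distinguishes between the two individuals' partitions, whereas clustering each matrix separately recovers both exactly.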

Keywords: Data mining, clustering, heterogeneity, optimization, heuristics, p-median

Suggested Citation

Santi, Everton and Aloise, Daniel and Blanchard, Simon J., A Model for Clustering Data from Heterogeneous Dissimilarities (March 1, 2016). European Journal of Operational Research, Forthcoming, Georgetown McDonough School of Business Research Paper No. 2740618, Available at SSRN: https://ssrn.com/abstract=2740618

Everton Santi

Universidade Federal do Rio Grande do Norte (UFRN)

PO Box 1524
Natal-RN, 59078970
Brazil

Daniel Aloise (Contact Author)

Universidade Federal do Rio Grande do Norte (UFRN)

PO Box 1524
Natal-RN, 59078970
Brazil

Simon J. Blanchard

Georgetown University - Robert Emmett McDonough School of Business

3700 O Street, NW
Washington, DC 20057
United States
