Header logo is

A Sober Look at Clustering Stability


Conference Paper


Stability is a common tool to verify the validity of sample based algorithms. In clustering it is widely used to tune the parameters of the algorithm, such as the number k of clusters. In spite of the popularity of stability in practical applications, there has been very little theoretical analysis of this notion. In this paper we provide a formal definition of stability and analyze some of its basic properties. Quite surprisingly, the conclusion of our analysis is that for large sample size, stability is fully determined by the behavior of the objective function which the clustering algorithm is aiming to minimize. If the objective function has a unique global minimizer, the algorithm is stable, otherwise it is unstable. In particular we conclude that stability is not a well-suited tool to determine the number of clusters - it is determined by the symmetries of the data which may be unrelated to clustering parameters. We prove our results for center-based clusterings and for spectral clustering, and support our conclusions by many examples in which the behavior of stability is counter-intuitive.

Author(s): Ben-David, S. and von Luxburg, U. and Pal, D.
Book Title: COLT 2006
Journal: Learning Theory: 19th Annual Conference on Learning Theory (COLT 2006)
Pages: 5-19
Year: 2006
Month: September
Day: 0
Editors: Lugosi, G. , H.-U. Simon
Publisher: Springer

Department(s): Empirical Inference
Bibtex Type: Conference Paper (inproceedings)

DOI: 10.1007/11776420_4
Event Name: 19th Annual Conference on Learning Theory
Event Place: Pittsburgh, PA, USA

Address: Berlin, Germany
Digital: 0
Language: en
Organization: Max-Planck-Gesellschaft
School: Biologische Kybernetik

Links: PDF


  title = {A Sober Look at Clustering Stability},
  author = {Ben-David, S. and von Luxburg, U. and Pal, D.},
  journal = {Learning Theory: 19th Annual Conference on Learning Theory (COLT 2006)},
  booktitle = {COLT 2006},
  pages = {5-19},
  editors = {Lugosi, G. , H.-U. Simon},
  publisher = {Springer},
  organization = {Max-Planck-Gesellschaft},
  school = {Biologische Kybernetik},
  address = {Berlin, Germany},
  month = sep,
  year = {2006},
  month_numeric = {9}