Relating clustering stability to properties of cluster boundaries

2008

Conference Paper

ei


In this paper, we investigate stability-based methods for cluster model selection, in particular to select the number K of clusters. The scenario under consideration is that clustering is performed by minimizing a certain clustering quality function, and that a unique global minimizer exists. On the one hand we show that stability can be upper bounded by certain properties of the optimal clustering, namely by the mass in a small tube around the cluster boundaries. On the other hand, we provide counterexamples which show that a reverse statement is not true in general. Finally, we give some examples and arguments why, from a theoretic point of view, using clustering stability in a high sample setting can be problematic. It can be seen that distribution-free guarantees bounding the difference between the finite sample stability and the “true stability” cannot exist, unless one makes strong assumptions on the underlying distribution.

Author(s): Ben-David, S. and von Luxburg, U.
Book Title: COLT 2008
Journal: Proceedings of the 21st Annual Conference on Learning Theory (COLT 2008)
Pages: 379-390
Year: 2008
Month: July
Day: 0
Editors: Servedio, R. A., T. Zhang
Publisher: Omnipress

Department(s): Empirical Inference
Bibtex Type: Conference Paper (inproceedings)

Address: Madison, WI, USA
Digital: 0
Event Name: 21st Annual Conference on Learning Theory
Event Place: Helsinki, Finland
Language: en
Organization: Max-Planck-Gesellschaft
School: Biologische Kybernetik

Links: PDF
Web

BibTex

@inproceedings{5105,
  title = {Relating clustering stability to properties of cluster boundaries},
  author = {Ben-David, S. and von Luxburg, U.},
  journal = {Proceedings of the 21st Annual Conference on Learning Theory (COLT 2008)},
  booktitle = {COLT 2008},
  pages = {379-390},
  editors = {Servedio, R. A., T. Zhang},
  publisher = {Omnipress},
  organization = {Max-Planck-Gesellschaft},
  school = {Biologische Kybernetik},
  address = {Madison, WI, USA},
  month = jul,
  year = {2008},
  month_numeric = {7}
}