Clustering in Very High Dimensions

Fionn Murtagh

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

High-dimensional data typify pattern recognition problems in bioinformatics, information retrieval, and various other fields. We discuss metric space properties in the context of four scenarios: increased dimensionality leading to larger dissimilarities; uniformly distributed and Gaussian-distributed points as dimensionality increases; and how pivot-based search can be understood in high dimensions. Conclusions include: (i) preprocessing using an ultrametric data structure (i.e., one resulting from a hierarchical clustering) can lead to far faster proximity searching, among other operations; (ii) a locally ultrametric topology is targeted by pivot-based branch-and-bound searching; but (iii) high-dimensional, structureless data (e.g., uniformly or Gaussian distributed) also become ultrametric.
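The abstract's claims that dissimilarities grow with dimensionality and that structureless uniform or Gaussian data become ultrametric can be probed empirically. The Python sketch below is an illustration under stated assumptions, not the paper's method. It draws random points, samples random triangles, and counts a triangle as "near-ultrametric" when its two largest sides agree within a relative tolerance `tol`, since an ultrametric triangle is isosceles with a small base. The helper names, the 5% tolerance, and the sample sizes are hypothetical choices, and the paper's own ultrametricity measure may differ.

```python
import math
import random
import statistics


def euclidean(a, b):
    """Euclidean distance between two equal-length coordinate lists."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def sample_points(n, dim, dist, rng):
    """n points in `dim` dimensions: uniform on [0, 1]^dim or standard Gaussian."""
    if dist == "uniform":
        return [[rng.random() for _ in range(dim)] for _ in range(n)]
    if dist == "gaussian":
        return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n)]
    raise ValueError(f"unknown distribution: {dist!r}")


def near_ultrametric(d_ab, d_bc, d_ac, tol):
    """Ultrametric triangles have their two largest sides equal; here 'equal'
    means within relative tolerance `tol` (hypothetical, illustrative choice)."""
    _, middle, largest = sorted((d_ab, d_bc, d_ac))
    return largest - middle <= tol * largest


def summarize(dist, dims, n_points=40, n_triplets=200, tol=0.05, seed=0):
    """Per dimension: mean pairwise distance, its coefficient of variation,
    and the fraction of randomly sampled triangles that are near-ultrametric."""
    rng = random.Random(seed)
    results = {}
    for dim in dims:
        pts = sample_points(n_points, dim, dist, rng)
        sides, hits = [], 0
        for _ in range(n_triplets):
            # Sample three distinct points rather than enumerating all O(n^3) triangles.
            a, b, c = (pts[i] for i in rng.sample(range(n_points), 3))
            d_ab, d_bc, d_ac = euclidean(a, b), euclidean(b, c), euclidean(a, c)
            sides.extend((d_ab, d_bc, d_ac))
            hits += int(near_ultrametric(d_ab, d_bc, d_ac, tol))
        mean = statistics.mean(sides)
        results[dim] = {
            "mean_dist": mean,
            "cv": statistics.pstdev(sides) / mean,
            "ultrametric_frac": hits / n_triplets,
        }
    return results


if __name__ == "__main__":
    for dist in ("uniform", "gaussian"):
        for dim, r in summarize(dist, (2, 20, 200, 2000)).items():
            print(f"{dist:8s} dim={dim:5d} mean={r['mean_dist']:7.3f} "
                  f"cv={r['cv']:.3f} near-ultrametric={r['ultrametric_frac']:.2f}")
```

Running the script should show the mean distance rising with dimension while its relative spread (`cv`) shrinks, so the three sides of a random triangle become nearly equal and almost every triangle passes the isosceles-with-small-base test. Exact figures depend on the seed, tolerance, and sample sizes.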
Original language: English
Title of host publication: Proceedings of the 2005 UK Workshop on Computational Intelligence
Subtitle of host publication: UKCI 2005
Editors: Boris Mirkin, George Magoulas
Publisher: Birkbeck, University of London
Pages: 226-231
Number of pages: 6
Publication status: Published - 2005
Externally published: Yes
Event: UK Workshop on Computational Intelligence - Birkbeck, University of London, London, United Kingdom
Duration: 5 Sep 2005 to 7 Sep 2005
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.65.3521&rep=rep1&type=pdf

Conference

Conference: UK Workshop on Computational Intelligence
Abbreviated title: UKCI 2005
Country: United Kingdom
City: London
Period: 5/09/05 to 7/09/05


Cite this:

Murtagh, F. (2005). Clustering in Very High Dimensions. In B. Mirkin, & G. Magoulas (Eds.), Proceedings of the 2005 UK Workshop on Computational Intelligence: UKCI 2005 (pp. 226-231). Birkbeck, University of London.