Linear Storage and Potentially Constant Time Hierarchical Clustering Using the Baire Metric and Random Spanning Paths

Fionn Murtagh, Pedro Contreras

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-reviewed

4 Citations (Scopus)

Abstract

We study how random projections can be used with large data sets in order (1) to cluster the data using a fast binning approach, characterized in terms of directly inducing a hierarchy through use of the Baire metric; and (2) to select, based on the clusters found, subsets of the original data for further analysis. In this work, we focus on random projection as used for processing high-dimensional data. A random projection, outputting a random permutation of the observation set, provides a random spanning path. We show how a spanning path relates to contiguity- or adjacency-constrained clustering. We study performance properties of hierarchical clustering constructed from random spanning paths, and we introduce a novel visualization of the results.
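The pipeline outlined in the abstract (random projection, spanning path, Baire-metric binning) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the uniform random axis, the rescaling to [0, 1), the base-10 digit prefix, and the `DIGITS` precision are all assumptions made for illustration.

```python
import random

DIGITS = 6  # assumed number of decimal digits kept per projected value


def project(rows, seed=0):
    """Project each row onto one random axis and quantize to DIGITS decimal digits."""
    rng = random.Random(seed)
    axis = [rng.random() for _ in rows[0]]  # assumption: uniform random axis
    proj = [sum(x * a for x, a in zip(row, axis)) for row in rows]
    lo, span = min(proj), max(proj) - min(proj)
    if span == 0:
        return [0] * len(rows)
    top = 10**DIGITS - 1
    # Rescale to [0, 1), then keep the leading DIGITS digits as an integer code.
    return [min(int((p - lo) / span * 10**DIGITS), top) for p in proj]


def spanning_path(codes):
    """Order of the observations along the projection axis (a permutation)."""
    return sorted(range(len(codes)), key=codes.__getitem__)


def baire_clusters(codes, depth):
    """Bin observations by their first `depth` digits in one pass over the data."""
    clusters = {}
    for i, c in enumerate(codes):
        clusters.setdefault(c // 10**(DIGITS - depth), []).append(i)
    return clusters


def baire_distance(a, b):
    """Return 10**-k, where k is the longest common digit-prefix length of a and b."""
    for k in range(DIGITS):
        shift = 10**(DIGITS - k - 1)
        if a // shift != b // shift:
            return 10.0 ** -k
    return 0.0
```

Under these assumptions, calling `baire_clusters` with increasing `depth` yields nested partitions, that is, a hierarchy read directly off the digit prefixes without computing pairwise distances. Because the bins are intervals of the projected values, each cluster is also a contiguous run along the spanning path.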

Original language: English
Title of host publication: Analysis of Large and Complex Data
Publisher: Kluwer Academic Publishers
Pages: 43-52
Number of pages: 10
ISBN (Print): 9783319252247
Publication status: Published - 4 Aug 2016
Externally published: Yes
Event: 2nd European Conference on Data Analysis: Analysis of Large and Complex Data - Bremen, Germany
Duration: 2 Jul 2014 to 4 Jul 2014
Conference number: 2
https://dblp.org/db/conf/ecda/ecda2014

Conference

Conference: 2nd European Conference on Data Analysis
Abbreviated title: ECDA2014
Country/Territory: Germany
City: Bremen
Period: 2/07/14 to 4/07/14
