Linear Storage and Potentially Constant Time Hierarchical Clustering Using the Baire Metric and Random Spinning Paths

Fionn Murtagh, Pedro Contreras

Research output: Chapter in Book/Report/Conference proceedingConference contribution

1 Citation (Scopus)

Abstract

We study how random projections can be used with large data sets in order (1) to cluster the data using a fast, binning approach which is characterized in terms of direct inducing of a hierarchy through use of the Bairemetric; and (2) based on clusters found, selecting subsets of the original data for further analysis. In this work, we focus on random projection that is used for processing high dimensional data. A random projection, outputting a random permutation of the observation set, provides a random spanning path. We show how a spanning path relates to contiguity- or adjacency-constrained clustering.We study performance properties of hierarchical clustering constructed from random spanning paths, and we introduce a novel visualization of the results.

LanguageEnglish
Title of host publicationAnalysis of Large and Complex Data
PublisherKluwer Academic Publishers
Pages43-52
Number of pages10
ISBN (Print)9783319252247
DOIs
Publication statusPublished - 4 Aug 2016
Externally publishedYes
Event2nd European Conference on Data Analysis - Bremen, Germany
Duration: 2 Jul 20144 Jul 2014
Conference number: 2

Conference

Conference2nd European Conference on Data Analysis
Abbreviated titleECDA 2014
CountryGermany
CityBremen
Period2/07/144/07/14

Fingerprint

Random Projection
Hierarchical Clustering
Time Constant
Visualization
Metric
Path
Processing
Contiguity
Binning
Random Permutation
Adjacency
High-dimensional Data
Large Data Sets
Clustering
Subset
Hierarchical clustering

Cite this

Murtagh, Fionn ; Contreras, Pedro. / Linear Storage and Potentially Constant Time Hierarchical Clustering Using the Baire Metric and Random Spinning Paths. Analysis of Large and Complex Data. Kluwer Academic Publishers, 2016. pp. 43-52
@inproceedings{1035e6f91b05434194c64ec443bd5b40,
title = "Linear Storage and Potentially Constant Time Hierarchical Clustering Using the Baire Metric and Random Spinning Paths",
abstract = "We study how random projections can be used with large data sets in order (1) to cluster the data using a fast, binning approach which is characterized in terms of direct inducing of a hierarchy through use of the Bairemetric; and (2) based on clusters found, selecting subsets of the original data for further analysis. In this work, we focus on random projection that is used for processing high dimensional data. A random projection, outputting a random permutation of the observation set, provides a random spanning path. We show how a spanning path relates to contiguity- or adjacency-constrained clustering.We study performance properties of hierarchical clustering constructed from random spanning paths, and we introduce a novel visualization of the results.",
author = "Fionn Murtagh and Pedro Contreras",
year = "2016",
month = "8",
day = "4",
doi = "10.1007/978-3-319-25226-1_4",
language = "English",
isbn = "9783319252247",
pages = "43--52",
booktitle = "Analysis of Large and Complex Data",
publisher = "Kluwer Academic Publishers",
address = "Netherlands",

}

Murtagh, F & Contreras, P 2016, Linear Storage and Potentially Constant Time Hierarchical Clustering Using the Baire Metric and Random Spinning Paths. in Analysis of Large and Complex Data. Kluwer Academic Publishers, pp. 43-52, 2nd European Conference on Data Analysis, Bremen, Germany, 2/07/14. https://doi.org/10.1007/978-3-319-25226-1_4

Linear Storage and Potentially Constant Time Hierarchical Clustering Using the Baire Metric and Random Spinning Paths. / Murtagh, Fionn; Contreras, Pedro.

Analysis of Large and Complex Data. Kluwer Academic Publishers, 2016. p. 43-52.

Research output: Chapter in Book/Report/Conference proceedingConference contribution

TY - GEN

T1 - Linear Storage and Potentially Constant Time Hierarchical Clustering Using the Baire Metric and Random Spinning Paths

AU - Murtagh, Fionn

AU - Contreras, Pedro

PY - 2016/8/4

Y1 - 2016/8/4

N2 - We study how random projections can be used with large data sets in order (1) to cluster the data using a fast, binning approach which is characterized in terms of direct inducing of a hierarchy through use of the Bairemetric; and (2) based on clusters found, selecting subsets of the original data for further analysis. In this work, we focus on random projection that is used for processing high dimensional data. A random projection, outputting a random permutation of the observation set, provides a random spanning path. We show how a spanning path relates to contiguity- or adjacency-constrained clustering.We study performance properties of hierarchical clustering constructed from random spanning paths, and we introduce a novel visualization of the results.

AB - We study how random projections can be used with large data sets in order (1) to cluster the data using a fast, binning approach which is characterized in terms of direct inducing of a hierarchy through use of the Bairemetric; and (2) based on clusters found, selecting subsets of the original data for further analysis. In this work, we focus on random projection that is used for processing high dimensional data. A random projection, outputting a random permutation of the observation set, provides a random spanning path. We show how a spanning path relates to contiguity- or adjacency-constrained clustering.We study performance properties of hierarchical clustering constructed from random spanning paths, and we introduce a novel visualization of the results.

UR - http://www.scopus.com/inward/record.url?scp=84981521246&partnerID=8YFLogxK

U2 - 10.1007/978-3-319-25226-1_4

DO - 10.1007/978-3-319-25226-1_4

M3 - Conference contribution

SN - 9783319252247

SP - 43

EP - 52

BT - Analysis of Large and Complex Data

PB - Kluwer Academic Publishers

ER -