Abstract
We study how random projections can be used with large data sets in order (1) to cluster the data using a fast binning approach, characterized in terms of the direct inducing of a hierarchy through use of the Baire metric; and (2) to select, based on the clusters found, subsets of the original data for further analysis. In this work, we focus on random projection as used for processing high-dimensional data. A random projection, outputting a random permutation of the observation set, provides a random spanning path. We show how a spanning path relates to contiguity- or adjacency-constrained clustering. We study performance properties of hierarchical clustering constructed from random spanning paths, and we introduce a novel visualization of the results.
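The abstract does not spell out the procedure, so the following is only a minimal sketch of how a single random projection could yield a spanning path and a Baire-style prefix hierarchy. It assumes one Gaussian random direction, rescaling of the projected values to [0, 1), and bins defined by shared leading decimal digits. The function names `random_projection_path` and `baire_hierarchy` are hypothetical and are not taken from the paper.

```python
import numpy as np


def random_projection_path(X, rng):
    """Project the rows of X onto one random direction (assumed Gaussian).

    Returns the projected values rescaled to [0, 1] and the ordering of the
    observations along the projection, i.e. a random spanning path.
    """
    direction = rng.standard_normal(X.shape[1])
    proj = X @ direction
    span = proj.max() - proj.min()
    # Guard against a degenerate projection where all values coincide.
    scaled = (proj - proj.min()) / span if span > 0 else np.zeros_like(proj)
    order = np.argsort(scaled, kind="stable")
    return scaled, order


def baire_hierarchy(scaled, max_depth):
    """Bin values by their leading decimal digits, one level per prefix length.

    Level d (1-based) labels each value by its first d digits, so two values
    share a label at level d exactly when they share a digit prefix of length d.
    Coarser levels are derived from the deepest one by integer division, which
    makes the levels nested by construction.
    """
    deepest = np.floor(scaled * 10**max_depth).astype(np.int64)
    # Keep the maximum value (scaled == 1.0) inside the last bin.
    deepest = np.minimum(deepest, 10**max_depth - 1)
    return [deepest // 10 ** (max_depth - d) for d in range(1, max_depth + 1)]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((1000, 100))  # synthetic high-dimensional data
    scaled, order = random_projection_path(X, rng)
    levels = baire_hierarchy(scaled, max_depth=3)
    for depth, labels in enumerate(levels, start=1):
        print(f"prefix length {depth}: {np.unique(labels).size} non-empty bins")
```

Because the bin label is a monotone function of the projected value, every bin occupies a contiguous stretch of the spanning path, which is one way to read the link to contiguity-constrained clustering mentioned above.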
Original language | English |
---|---|
Title of host publication | Analysis of Large and Complex Data |
Publisher | Kluwer Academic Publishers |
Pages | 43-52 |
Number of pages | 10 |
ISBN (Print) | 9783319252247 |
Publication status | Published - 4 Aug 2016 |
Externally published | Yes |
Event | 2nd European Conference on Data Analysis: Analysis of Large and Complex Data, Bremen, Germany. Duration: 2 Jul 2014 → 4 Jul 2014. Conference number: 2. https://dblp.org/db/conf/ecda/ecda2014 |
Conference
Conference | 2nd European Conference on Data Analysis |
---|---|
Abbreviated title | ECDA2014 |
Country/Territory | Germany |
City | Bremen |
Period | 2/07/14 → 4/07/14 |