### Abstract

We study how random projections can be used with large data sets in order to (1) cluster the data using a fast binning approach, characterized as directly inducing a hierarchy through use of the Baire metric; and (2) select, based on the clusters found, subsets of the original data for further analysis. In this work, we focus on random projection as used for processing high-dimensional data. A random projection, which outputs a random permutation of the observation set, provides a random spanning path. We show how a spanning path relates to contiguity- or adjacency-constrained clustering. We study performance properties of hierarchical clustering constructed from random spanning paths, and we introduce a novel visualization of the results.
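To make the ideas in the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes a single Gaussian random direction, projected values rescaled to [0, 1), and a base-10 Baire distance defined as 10^-k, where k is the length of the longest common prefix of decimal digits. All function names (`random_projection`, `spanning_path`, `digits`, `baire_distance`, `baire_hierarchy`) are hypothetical and introduced only for illustration.

```python
import random
from collections import defaultdict
from decimal import Decimal


def random_projection(points, seed=0):
    """Project points onto one random direction; rescale results to [0, 1)."""
    rng = random.Random(seed)
    direction = [rng.gauss(0.0, 1.0) for _ in range(len(points[0]))]
    raw = [sum(x * w for x, w in zip(p, direction)) for p in points]
    lo, hi = min(raw), max(raw)
    span = (hi - lo) or 1.0
    # Assumption: scale just below 1 so every value reads as 0.d1d2d3...
    return [(v - lo) / span * 0.999999 for v in raw]


def spanning_path(projected):
    """Observation indices ordered by projected value (a permutation)."""
    return sorted(range(len(projected)), key=projected.__getitem__)


def digits(value, precision):
    """First `precision` decimal digits of a value in [0, 1), truncated."""
    text = format(Decimal(repr(value)), "f")
    frac = text.split(".")[1] if "." in text else ""
    return frac[:precision].ljust(precision, "0")


def baire_distance(x, y, precision=4):
    """Assumed base-10 Baire distance: 10**-k for a shared prefix of k digits."""
    dx, dy = digits(x, precision), digits(y, precision)
    k = 0
    while k < precision and dx[k] == dy[k]:
        k += 1
    return 10.0 ** -k


def baire_hierarchy(projected, precision=3):
    """One binning pass per level; bins at level k share the first k digits."""
    levels = []
    for k in range(1, precision + 1):
        bins = defaultdict(list)
        for i, v in enumerate(projected):
            bins[digits(v, precision)[:k]].append(i)
        levels.append(dict(bins))
    return levels
```

In this sketch, each level of the hierarchy is obtained by a single linear pass that bins observations on a digit prefix, so no pairwise distance matrix is stored. Sorting the projected values yields the permutation that the abstract describes as a random spanning path.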

| Field | Value |
|---|---|
| Original language | English |
| Title of host publication | Analysis of Large and Complex Data |
| Publisher | Kluwer Academic Publishers |
| Pages | 43-52 |
| Number of pages | 10 |
| ISBN (Print) | 9783319252247 |
| DOIs | https://doi.org/10.1007/978-3-319-25226-1_4 |
| Publication status | Published - 4 Aug 2016 |
| Externally published | Yes |
| Event | 2nd European Conference on Data Analysis, Bremen, Germany. Duration: 2 Jul 2014 → 4 Jul 2014. Conference number: 2 |

### Conference

| Field | Value |
|---|---|
| Conference | 2nd European Conference on Data Analysis |
| Abbreviated title | ECDA 2014 |
| Country | Germany |
| City | Bremen |
| Period | 2/07/14 → 4/07/14 |

### Cite this

Murtagh, F., & Contreras, P. (2016). Linear Storage and Potentially Constant Time Hierarchical Clustering Using the Baire Metric and Random Spanning Paths. In *Analysis of Large and Complex Data* (pp. 43-52). Kluwer Academic Publishers. 2nd European Conference on Data Analysis, Bremen, Germany, 2/07/14. https://doi.org/10.1007/978-3-319-25226-1_4

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

TY - GEN

T1 - Linear Storage and Potentially Constant Time Hierarchical Clustering Using the Baire Metric and Random Spanning Paths

AU - Murtagh, Fionn

AU - Contreras, Pedro

PY - 2016/8/4

Y1 - 2016/8/4

UR - http://www.scopus.com/inward/record.url?scp=84981521246&partnerID=8YFLogxK

U2 - 10.1007/978-3-319-25226-1_4

DO - 10.1007/978-3-319-25226-1_4

M3 - Conference contribution

SN - 9783319252247

SP - 43

EP - 52

BT - Analysis of Large and Complex Data

PB - Kluwer Academic Publishers

ER -