Hierarchical Clustering for Finding Symmetries and Other Patterns in Massive, High Dimensional Datasets

Fionn Murtagh, Pedro Contreras

Research output: Chapter in Book/Report/Conference proceeding (Chapter)

1 Citation (Scopus)

Abstract

Data analysis and data mining are concerned with unsupervised pattern finding and structure determination in data sets. "Structure" can be understood as symmetry, and a range of symmetries are expressed by hierarchy. Such symmetries point directly to invariants that pinpoint intrinsic properties of the data and of the background empirical domain of interest. We review many aspects of hierarchy here, including ultrametric topology, generalized ultrametrics, and linkages with lattices and other discrete algebraic structures and with p-adic number representations. By focusing on symmetries in data, we have a powerful means of structuring and analyzing massive, high dimensional data stores. We illustrate the power of hierarchical clustering in case studies in chemistry and finance, and we provide pointers to other published case studies.
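The link between hierarchy and ultrametric topology mentioned in the abstract can be illustrated with a small sketch. This is not the authors' code: it assumes single-linkage agglomerative clustering on a handful of made-up 1-D points, and the function names (`single_linkage_cophenetic`, `is_ultrametric`) are hypothetical. It shows that the merge heights of a dendrogram satisfy the strong triangle inequality d(x, z) <= max(d(x, y), d(y, z)), whereas the raw distances in general do not.

```python
from itertools import combinations, permutations


def single_linkage_cophenetic(points):
    """Cophenetic distance matrix from single-linkage clustering of 1-D points.

    Illustrative O(n^3) sketch; the input values are assumed toy data.
    """
    n = len(points)
    clusters = [{i} for i in range(n)]
    coph = [[0.0] * n for _ in range(n)]

    def linkage(a, b):
        # Single linkage: smallest pairwise distance between two clusters.
        return min(abs(points[i] - points[j]) for i in a for j in b)

    while len(clusters) > 1:
        # Pick the closest pair of clusters and record the merge height.
        a, b = min(combinations(clusters, 2), key=lambda pair: linkage(*pair))
        height = linkage(a, b)
        # Every pair split across a and b first joins at this height.
        for i in a:
            for j in b:
                coph[i][j] = coph[j][i] = height
        clusters = [c for c in clusters if c is not a and c is not b]
        clusters.append(a | b)
    return coph


def is_ultrametric(d, tol=1e-12):
    """Check d(i, k) <= max(d(i, j), d(j, k)) for all distinct triples."""
    n = len(d)
    return all(
        d[i][k] <= max(d[i][j], d[j][k]) + tol
        for i, j, k in permutations(range(n), 3)
    )


points = [0.0, 1.0, 3.0, 7.0, 7.5]  # assumed toy data
coph = single_linkage_cophenetic(points)
raw = [[abs(p - q) for q in points] for p in points]

print(is_ultrametric(coph))  # dendrogram distances
print(is_ultrametric(raw))   # original distances
```

Single linkage is used here only because it is short to write down; any agglomerative criterion whose merge heights never decrease yields an ultrametric in the same way.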

Language: English
Title of host publication: Data Mining
Subtitle of host publication: Foundations and Intelligent Paradigms
Editors: Dawn Holmes, Lakhmi Jain
Publisher: Springer Verlag
Chapter: 5
Pages: 95-130
Number of pages: 36
Volume: 1: Clustering, Association and Classification
ISBN (Electronic): 9783642231667
ISBN (Print): 9783642231650
DOI: https://doi.org/10.1007/978-3-642-23166-7_5
Publication status: Published - 1 Dec 2012
Externally published: Yes

Publication series

Name: Intelligent Systems Reference Library
Volume: 23
ISSN (Print): 1868-4394
ISSN (Electronic): 1868-4408

Fingerprint

Finance
Data mining
Topology
finance
data analysis
chemistry
Symmetry
Hierarchical clustering

Cite this

Murtagh, F., & Contreras, P. (2012). Hierarchical Clustering for Finding Symmetries and Other Patterns in Massive, High Dimensional Datasets. In D. Holmes, & L. Jain (Eds.), Data Mining: Foundations and Intelligent Paradigms (Vol. 1: Clustering, Association and Classification, pp. 95-130). (Intelligent Systems Reference Library; Vol. 23). Springer Verlag. https://doi.org/10.1007/978-3-642-23166-7_5
Murtagh, Fionn ; Contreras, Pedro. / Hierarchical Clustering for Finding Symmetries and Other Patterns in Massive, High Dimensional Datasets. Data Mining: Foundations and Intelligent Paradigms. editor / Dawn Holmes ; Lakhmi Jain. Vol. 1: Clustering, Association and Classification. Springer Verlag, 2012. pp. 95-130 (Intelligent Systems Reference Library).
@inbook{6d284f3b52ea48388a72f673db2f4e78,
title = "Hierarchical Clustering for Finding Symmetries and Other Patterns in Massive, High Dimensional Datasets",
abstract = "Data analysis and data mining are concerned with unsupervised pattern finding and structure determination in data sets. {"}Structure{"} can be understood as symmetry and a range of symmetries are expressed by hierarchy. Such symmetries directly point to invariants, that pinpoint intrinsic properties of the data and of the background empirical domain of interest. We review many aspects of hierarchy here, including ultrametric topology, generalized ultrametric, linkages with lattices and other discrete algebraic structures and with p-adic number representations. By focusing on symmetries in data we have a powerful means of structuring and analyzing massive, high dimensional data stores. We illustrate the powerfulness of hierarchical clustering in case studies in chemistry and finance, and we provide pointers to other published case studies.",
keywords = "clustering, complexity, Data analytics, hierarchy, information storage and retrieval, multivariate data analysis, p-adic, pattern recognition, ultrametric topology",
author = "Fionn Murtagh and Pedro Contreras",
year = "2012",
month = "12",
day = "1",
doi = "10.1007/978-3-642-23166-7_5",
language = "English",
isbn = "9783642231650",
volume = "1: Clustering, Association and Classification",
series = "Intelligent Systems Reference Library",
publisher = "Springer Verlag",
pages = "95--130",
editor = "Dawn Holmes and Lakhmi Jain",
booktitle = "Data Mining",

}

Murtagh, F & Contreras, P 2012, Hierarchical Clustering for Finding Symmetries and Other Patterns in Massive, High Dimensional Datasets. in D Holmes & L Jain (eds), Data Mining: Foundations and Intelligent Paradigms. vol. 1: Clustering, Association and Classification, Intelligent Systems Reference Library, vol. 23, Springer Verlag, pp. 95-130. https://doi.org/10.1007/978-3-642-23166-7_5

Hierarchical Clustering for Finding Symmetries and Other Patterns in Massive, High Dimensional Datasets. / Murtagh, Fionn; Contreras, Pedro.

Data Mining: Foundations and Intelligent Paradigms. ed. / Dawn Holmes; Lakhmi Jain. Vol. 1: Clustering, Association and Classification. Springer Verlag, 2012. p. 95-130 (Intelligent Systems Reference Library; Vol. 23).

TY - CHAP

T1 - Hierarchical Clustering for Finding Symmetries and Other Patterns in Massive, High Dimensional Datasets

AU - Murtagh, Fionn

AU - Contreras, Pedro

PY - 2012/12/1

Y1 - 2012/12/1

N2 - Data analysis and data mining are concerned with unsupervised pattern finding and structure determination in data sets. "Structure" can be understood as symmetry and a range of symmetries are expressed by hierarchy. Such symmetries directly point to invariants, that pinpoint intrinsic properties of the data and of the background empirical domain of interest. We review many aspects of hierarchy here, including ultrametric topology, generalized ultrametric, linkages with lattices and other discrete algebraic structures and with p-adic number representations. By focusing on symmetries in data we have a powerful means of structuring and analyzing massive, high dimensional data stores. We illustrate the powerfulness of hierarchical clustering in case studies in chemistry and finance, and we provide pointers to other published case studies.

KW - clustering

KW - complexity

KW - Data analytics

KW - hierarchy

KW - information storage and retrieval

KW - multivariate data analysis

KW - p-adic

KW - pattern recognition

KW - ultrametric topology

UR - http://www.scopus.com/inward/record.url?scp=84885621983&partnerID=8YFLogxK

U2 - 10.1007/978-3-642-23166-7_5

DO - 10.1007/978-3-642-23166-7_5

M3 - Chapter

SN - 9783642231650

VL - 1: Clustering, Association and Classification

T3 - Intelligent Systems Reference Library

SP - 95

EP - 130

BT - Data Mining

A2 - Holmes, Dawn

A2 - Jain, Lakhmi

PB - Springer Verlag

ER -

Murtagh F, Contreras P. Hierarchical Clustering for Finding Symmetries and Other Patterns in Massive, High Dimensional Datasets. In Holmes D, Jain L, editors, Data Mining: Foundations and Intelligent Paradigms. Vol. 1: Clustering, Association and Classification. Springer Verlag. 2012. p. 95-130. (Intelligent Systems Reference Library). https://doi.org/10.1007/978-3-642-23166-7_5