Fast, linear time, m-adic hierarchical clustering for search and retrieval using the Baire metric, with linkages to generalized ultrametrics, hashing, formal concept analysis, and precision of data measurement

F. Murtagh, P. Contreras

Research output: Contribution to journalArticle

7 Citations (Scopus)

Abstract

We describe many vantage points on the Baire metric and its use in clustering data, or its use in preprocessing and structuring data in order to support search and retrieval operations. In some cases, we proceed directly to clusters and do not directly determine the distances. We show how a hierarchical clustering can be read directly from one pass through the data. We offer insights also on practical implications of precision of datameasurement. As a mechanism for treating multidimensional data, including very high dimensional data, we use random projections.

Original languageEnglish
Pages (from-to)46-56
Number of pages11
JournalP-Adic Numbers, Ultrametric Analysis, and Applications
Volume4
Issue number1
DOIs
Publication statusPublished - 1 Jan 2012
Externally publishedYes

Fingerprint

Formal Concept Analysis
Hashing
Hierarchical Clustering
Linkage
Linear Time
Retrieval
Random Projection
Metric
Multidimensional Data
Data Clustering
High-dimensional Data
Preprocessing

Cite this

@article{5f088524b08e4159bb67395a27d2d8db,
title = "Fast, linear time, m-adic hierarchical clustering for search and retrieval using the Baire metric, with linkages to generalized ultrametrics, hashing, formal concept analysis, and precision of data measurement",
abstract = "We describe many vantage points on the Baire metric and its use in clustering data, or its use in preprocessing and structuring data in order to support search and retrieval operations. In some cases, we proceed directly to clusters and do not directly determine the distances. We show how a hierarchical clustering can be read directly from one pass through the data. We offer insights also on practical implications of precision of datameasurement. As a mechanism for treating multidimensional data, including very high dimensional data, we use random projections.",
keywords = "algorithm, clustering, computational complexity, data analysis, metric, retrieval, search, storage, ultrametric",
author = "F. Murtagh and P. Contreras",
year = "2012",
month = "1",
day = "1",
doi = "10.1134/S2070046612010062",
language = "English",
volume = "4",
pages = "46--56",
journal = "P-Adic Numbers, Ultrametric Analysis, and Applications",
issn = "2070-0466",
publisher = "Springer Science + Business Media",
number = "1",

}

TY - JOUR

T1 - Fast, linear time, m-adic hierarchical clustering for search and retrieval using the Baire metric, with linkages to generalized ultrametrics, hashing, formal concept analysis, and precision of data measurement

AU - Murtagh, F.

AU - Contreras, P.

PY - 2012/1/1

Y1 - 2012/1/1

N2 - We describe many vantage points on the Baire metric and its use in clustering data, or its use in preprocessing and structuring data in order to support search and retrieval operations. In some cases, we proceed directly to clusters and do not directly determine the distances. We show how a hierarchical clustering can be read directly from one pass through the data. We offer insights also on practical implications of precision of datameasurement. As a mechanism for treating multidimensional data, including very high dimensional data, we use random projections.

AB - We describe many vantage points on the Baire metric and its use in clustering data, or its use in preprocessing and structuring data in order to support search and retrieval operations. In some cases, we proceed directly to clusters and do not directly determine the distances. We show how a hierarchical clustering can be read directly from one pass through the data. We offer insights also on practical implications of precision of datameasurement. As a mechanism for treating multidimensional data, including very high dimensional data, we use random projections.

KW - algorithm

KW - clustering

KW - computational complexity

KW - data analysis

KW - metric

KW - retrieval

KW - search

KW - storage

KW - ultrametric

UR - http://www.scopus.com/inward/record.url?scp=84966728646&partnerID=8YFLogxK

U2 - 10.1134/S2070046612010062

DO - 10.1134/S2070046612010062

M3 - Article

VL - 4

SP - 46

EP - 56

JO - P-Adic Numbers, Ultrametric Analysis, and Applications

JF - P-Adic Numbers, Ultrametric Analysis, and Applications

SN - 2070-0466

IS - 1

ER -