Overcoming the curse of dimensionality in clustering by means of the wavelet transform

Fionn Murtagh, Jean Luc Starck, Michael W. Berry

Research output: Contribution to journal (article)

24 Citations (Scopus)

Abstract

We use a redundant wavelet transform analysis to detect clusters in high-dimensional data spaces. We overcome Bellman's 'curse of dimensionality' in such problems by (i) using some canonical ordering of observation and variable (document and term) dimensions in our data, (ii) applying a wavelet transform to such canonically ordered data, (iii) modelling the noise in wavelet space, (iv) defining significant component parts of the data as opposed to insignificant or noisy component parts, and (v) reading off the resultant clusters. The overall complexity of this innovative approach is linear in the data dimensionality. We describe a number of examples and test cases, including the clustering of high-dimensional hypertext data.
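The five steps listed in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes the input array is already canonically ordered (step i), uses the à trous algorithm with a B3-spline kernel as the redundant transform (step ii), assumes Gaussian noise with a per-scale standard deviation estimated from the median absolute coefficient (step iii), keeps positive coefficients above k sigma (step iv), and labels 4-connected regions of significant cells as clusters (step v). All function names and the default k = 3 are hypothetical choices made for illustration.

```python
import statistics

# B3-spline smoothing kernel, a common choice for the a trous transform.
B3 = (1 / 16, 4 / 16, 6 / 16, 4 / 16, 1 / 16)


def reflect(i, n):
    """Mirror an out-of-range index back into [0, n)."""
    if n == 1:
        return 0
    while i < 0 or i >= n:
        i = -i if i < 0 else 2 * (n - 1) - i
    return i


def smooth_1d(seq, step):
    """Convolve with B3, leaving `step - 1` holes between kernel taps."""
    n = len(seq)
    return [sum(c * seq[reflect(i + (k - 2) * step, n)] for k, c in enumerate(B3))
            for i in range(n)]


def smooth_2d(img, step):
    """Separable smoothing: rows first, then columns."""
    rows = [smooth_1d(r, step) for r in img]
    cols = [smooth_1d(list(c), step) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]


def a_trous(img, n_scales):
    """Return (wavelet planes, final smooth array); their sum rebuilds img."""
    planes, current = [], [list(map(float, r)) for r in img]
    for j in range(n_scales):
        smooth = smooth_2d(current, 2 ** j)
        planes.append([[a - b for a, b in zip(ra, rb)]
                       for ra, rb in zip(current, smooth)])
        current = smooth
    return planes, current


def significant_mask(planes, k=3.0):
    """Flag cells whose coefficient exceeds k * sigma at any scale.

    Assumption of this sketch: Gaussian noise, with sigma estimated per
    scale as 1.4826 * median(|w|). Only positive excesses are kept.
    """
    n_rows, n_cols = len(planes[0]), len(planes[0][0])
    mask = [[False] * n_cols for _ in range(n_rows)]
    for plane in planes:
        sigma = 1.4826 * statistics.median(abs(w) for row in plane for w in row)
        for r in range(n_rows):
            for c in range(n_cols):
                if plane[r][c] > k * sigma:
                    mask[r][c] = True
    return mask


def label_components(mask):
    """Label 4-connected regions of True cells; return (labels, count)."""
    n_rows, n_cols = len(mask), len(mask[0])
    labels = [[0] * n_cols for _ in range(n_rows)]
    count = 0
    for r in range(n_rows):
        for c in range(n_cols):
            if mask[r][c] and not labels[r][c]:
                count += 1
                labels[r][c] = count
                stack = [(r, c)]
                while stack:
                    y, x = stack.pop()
                    for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                        if (0 <= ny < n_rows and 0 <= nx < n_cols
                                and mask[ny][nx] and not labels[ny][nx]):
                            labels[ny][nx] = count
                            stack.append((ny, nx))
    return labels, count
```

A typical call would be `planes, smooth = a_trous(array, 3)` followed by `labels, n_clusters = label_components(significant_mask(planes))`. In this sketch each scale costs a fixed number of operations per array cell, which is one way to see how a cost linear in the size of the data can arise.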

Language: English
Pages: 107-120
Number of pages: 14
Journal: Computer Journal
Volume: 43
Issue number: 2
DOI: 10.1093/comjnl/43.2.107
Publication status: Published - 1 Jan 2000
Externally published: Yes

Fingerprint

Wavelet transforms
Data structures

Cite this

Murtagh, Fionn ; Starck, Jean Luc ; Berry, Michael W. / Overcoming the curse of dimensionality in clustering by means of the wavelet transform. In: Computer Journal. 2000 ; Vol. 43, No. 2. pp. 107-120.
@article{ea92785b20114d91b5936a45199a28fb,
title = "Overcoming the curse of dimensionality in clustering by means of the wavelet transform",
abstract = "We use a redundant wavelet transform analysis to detect clusters in high-dimensional data spaces. We overcome Bellman's `curse of dimensionality' in such problems by (i) using some canonical ordering of observation and variable (document and term) dimensions in our data, (ii) applying a wavelet transform to such canonically ordered data, (iii) modelling the noise in wavelet space, (iv) defining significant component parts of the data as opposed to insignificant or noisy component parts, and (v) reading off the resultant clusters. The overall complexity of this innovative approach is linear in the data dimensionality. We describe a number of examples and test cases, including the clustering of high-dimensional hypertext data.",
author = "Murtagh, Fionn and Starck, {Jean Luc} and Berry, {Michael W.}",
year = "2000",
month = "1",
day = "1",
doi = "10.1093/comjnl/43.2.107",
language = "English",
volume = "43",
pages = "107--120",
journal = "Computer Journal",
issn = "0010-4620",
publisher = "Oxford University Press",
number = "2",

}

TY - JOUR

T1 - Overcoming the curse of dimensionality in clustering by means of the wavelet transform

AU - Murtagh, Fionn

AU - Starck, Jean Luc

AU - Berry, Michael W.

PY - 2000/1/1

Y1 - 2000/1/1

AB - We use a redundant wavelet transform analysis to detect clusters in high-dimensional data spaces. We overcome Bellman's `curse of dimensionality' in such problems by (i) using some canonical ordering of observation and variable (document and term) dimensions in our data, (ii) applying a wavelet transform to such canonically ordered data, (iii) modelling the noise in wavelet space, (iv) defining significant component parts of the data as opposed to insignificant or noisy component parts, and (v) reading off the resultant clusters. The overall complexity of this innovative approach is linear in the data dimensionality. We describe a number of examples and test cases, including the clustering of high-dimensional hypertext data.

UR - http://www.scopus.com/inward/record.url?scp=0033750892&partnerID=8YFLogxK

U2 - 10.1093/comjnl/43.2.107

DO - 10.1093/comjnl/43.2.107

M3 - Article

VL - 43

SP - 107

EP - 120

JO - Computer Journal

T2 - Computer Journal

JF - Computer Journal

SN - 0010-4620

IS - 2

ER -