Hierarchical Matching and Regression with Application to Photometric Redshift Estimation

Research output: Contribution to journal, Article

Abstract

This work emphasizes that heterogeneity, diversity, discontinuity, and discreteness in data are to be exploited in classification and regression problems. A global a priori model may not be desirable. For data analytics in cosmology, this is motivated by the variety of cosmological objects such as elliptical, spiral, active, and merging galaxies at a wide range of redshifts. Our aim is matching and similarity-based analytics that takes account of discrete relationships in the data. The information structure of the data is represented by a hierarchy or tree where the branch structure, rather than just the proximity, is important. The representation is related to p-adic number theory. The clustering or binning of the data values, related to the precision of the measurements, has a central role in this methodology. If used for regression, our approach is a method of cluster-wise regression, generalizing nearest neighbour regression. Both to exemplify this analytics approach and to demonstrate computational benefits, we address the well-known photometric redshift or 'photo-z' problem, seeking to match Sloan Digital Sky Survey (SDSS) spectroscopic and photometric redshifts.
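The idea of binning values by measurement precision and regressing within the resulting clusters can be illustrated with a small sketch. This is not the paper's implementation, which this record does not describe. It assumes a single non-negative feature, a hierarchy formed by shared leading decimal digits (a base-10 reading of the m-adic representation), and a prediction equal to the mean target in the deepest bin that shares a prefix with the query. The names `PRECISION`, `prefix_key`, `fit`, and `predict`, and the toy numbers, are hypothetical.

```python
from collections import defaultdict
from statistics import mean

PRECISION = 3  # assumed number of meaningful decimal digits in the measurements


def prefix_key(value, level, precision=PRECISION):
    """Integer part plus the first `level` decimal digits of a non-negative value."""
    whole, frac = f"{value:.{precision}f}".split(".")
    return whole, frac[:level]


def fit(xs, zs, precision=PRECISION):
    """Store each target in the bin of its feature's digit prefix, at every level."""
    bins = defaultdict(list)
    for x, z in zip(xs, zs):
        for level in range(precision + 1):
            bins[(level, prefix_key(x, level, precision))].append(z)
    return bins


def predict(bins, x, precision=PRECISION):
    """Return (level, mean target) for the deepest non-empty bin sharing a prefix with x."""
    for level in range(precision, -1, -1):
        members = bins.get((level, prefix_key(x, level, precision)))
        if members:
            return level, mean(members)
    return None, None  # no training value shares even the integer part


if __name__ == "__main__":
    # Toy values for illustration only; they are not SDSS data.
    features = [0.121, 0.124, 0.187, 0.352]
    targets = [0.10, 0.12, 0.20, 0.40]
    model = fit(features, targets)
    for query in (0.124, 0.129, 0.15, 1.5):
        print(query, predict(model, query))
```

A query that matches a training value at full precision is answered from that single bin, which is the nearest-neighbour-like case. Coarser matches fall back to larger clusters higher in the tree, so the branch at which two values separate, rather than their numerical distance, decides which training points contribute.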

Original language: English
Pages (from-to): 145-155
Number of pages: 11
Journal: Proceedings of the International Astronomical Union
Volume: 12
Issue number: S325
DOI: 10.1017/S1743921317001569
Publication status: Published - 1 Oct 2016
Externally published: Yes

Fingerprint

regression analysis
number theory
active galaxies
elliptical galaxies
spiral galaxies
hierarchies
cosmology
proximity
discontinuity
methodology
galaxies

Cite this

@article{0663ecdbf624400d93c4fca1277d1227,
title = "Hierarchical Matching and Regression with Application to Photometric Redshift Estimation",
abstract = "This work emphasizes that heterogeneity, diversity, discontinuity, and discreteness in data is to be exploited in classification and regression problems. A global a priori model may not be desirable. For data analytics in cosmology, this is motivated by the variety of cosmological objects such as elliptical, spiral, active, and merging galaxies at a wide range of redshifts. Our aim is matching and similarity-based analytics that takes account of discrete relationships in the data. The information structure of the data is represented by a hierarchy or tree where the branch structure, rather than just the proximity, is important. The representation is related to p-adic number theory. The clustering or binning of the data values, related to the precision of the measurements, has a central role in this methodology. If used for regression, our approach is a method of cluster-wise regression, generalizing nearest neighbour regression. Both to exemplify this analytics approach, and to demonstrate computational benefits, we address the well-known photometric redshift or 'photo-z' problem, seeking to match Sloan Digital Sky Survey (SDSS) spectroscopic and photometric redshifts.",
keywords = "Cluster-wise regression, inherent hierarchical properties of data, p-adic and m-adic number representation",
author = "Fionn Murtagh",
year = "2016",
month = "10",
day = "1",
doi = "10.1017/S1743921317001569",
language = "English",
volume = "12",
pages = "145--155",
journal = "Proceedings of the International Astronomical Union",
issn = "1743-9213",
publisher = "Cambridge University Press",
number = "S325",

}

Hierarchical Matching and Regression with Application to Photometric Redshift Estimation. / Murtagh, Fionn.

In: Proceedings of the International Astronomical Union, Vol. 12, No. S325, 01.10.2016, p. 145-155.


TY  - JOUR
T1  - Hierarchical Matching and Regression with Application to Photometric Redshift Estimation
AU  - Murtagh, Fionn
PY  - 2016/10/1
Y1  - 2016/10/1
N2  - This work emphasizes that heterogeneity, diversity, discontinuity, and discreteness in data is to be exploited in classification and regression problems. A global a priori model may not be desirable. For data analytics in cosmology, this is motivated by the variety of cosmological objects such as elliptical, spiral, active, and merging galaxies at a wide range of redshifts. Our aim is matching and similarity-based analytics that takes account of discrete relationships in the data. The information structure of the data is represented by a hierarchy or tree where the branch structure, rather than just the proximity, is important. The representation is related to p-adic number theory. The clustering or binning of the data values, related to the precision of the measurements, has a central role in this methodology. If used for regression, our approach is a method of cluster-wise regression, generalizing nearest neighbour regression. Both to exemplify this analytics approach, and to demonstrate computational benefits, we address the well-known photometric redshift or 'photo-z' problem, seeking to match Sloan Digital Sky Survey (SDSS) spectroscopic and photometric redshifts.
KW  - Cluster-wise regression
KW  - inherent hierarchical properties of data
KW  - p-adic and m-adic number representation
UR  - http://www.scopus.com/inward/record.url?scp=85020011396&partnerID=8YFLogxK
U2  - 10.1017/S1743921317001569
DO  - 10.1017/S1743921317001569
M3  - Article
VL  - 12
SP  - 145
EP  - 155
JO  - Proceedings of the International Astronomical Union
JF  - Proceedings of the International Astronomical Union
SN  - 1743-9213
IS  - S325
ER  -