Sparse p-adic data coding for computationally efficient and effective big data analytics

Research output: Contribution to journal › Article

5 Citations (Scopus)

Abstract

We develop the theory and practical implementation of p-adic sparse coding of data. Rather than the standard, sparsifying criterion that uses the L0 pseudo-norm, we use the p-adic norm. We require that the hierarchy or tree be node-ranked, as is standard practice in agglomerative and other hierarchical clustering, but not necessarily with decision trees. In order to structure the data, all computational processing operations are direct reading of the data, or are bounded by a constant number of direct readings of the data, implying linear computational time. Through p-adic sparse data coding, efficient storage results, and for bounded p-adic norm stored data, search and retrieval are constant time operations. Examples show the effectiveness of this new approach to content-driven encoding and displaying of data.
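The contrast the abstract draws between the L0 pseudo-norm and the p-adic norm can be illustrated with the standard textbook definitions. The following is a minimal sketch only, not the coding scheme developed in the article; the function names `p_adic_valuation`, `p_adic_norm` and `l0_pseudo_norm` are hypothetical and chosen for illustration.

```python
from fractions import Fraction


def p_adic_valuation(x, p):
    """Return the exponent v such that x = p**v * (a/b), with p dividing neither a nor b.

    Standard definition for a nonzero rational x; the valuation of 0 is
    +infinity, which this sketch signals with an exception.
    """
    if p < 2:
        raise ValueError("p must be a prime (at least 2)")
    x = Fraction(x)
    if x == 0:
        raise ValueError("the p-adic valuation of 0 is +infinity")
    num, den = x.numerator, x.denominator
    v = 0
    while num % p == 0:  # powers of p in the numerator raise the valuation
        num //= p
        v += 1
    while den % p == 0:  # powers of p in the denominator lower it
        den //= p
        v -= 1
    return v


def p_adic_norm(x, p):
    """Return |x|_p = p**(-v_p(x)), with |0|_p = 0 by convention."""
    x = Fraction(x)
    if x == 0:
        return Fraction(0)
    return Fraction(1, p) ** p_adic_valuation(x, p)


def l0_pseudo_norm(values):
    """Return the number of nonzero entries (the usual L0 sparsity count)."""
    return sum(1 for v in values if v != 0)


if __name__ == "__main__":
    print(p_adic_norm(8, 2))               # 8 = 2**3, so the 2-adic norm is 1/8
    print(p_adic_norm(Fraction(3, 4), 2))  # 3/4 = 2**-2 * 3, so the norm is 4
    print(l0_pseudo_norm([0, 3, 0, 5]))    # two nonzero entries
```

Under these definitions a number divisible by a high power of p is p-adically small, whereas the L0 pseudo-norm only counts nonzero entries. Exact `Fraction` arithmetic is used here to avoid floating-point rounding in the norm values.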

Original language: English
Pages (from-to): 236-247
Number of pages: 12
Journal: P-Adic Numbers, Ultrametric Analysis, and Applications
Volume: 8
Issue number: 3
DOI: 10.1134/S2070046616030055
Publication status: Published - 1 Jul 2016
Externally published: Yes

Fingerprint

P-adic
Coding
Norm
Sparse Coding
Sparse Data
Hierarchical Clustering
Time Constant
Decision tree
Retrieval
Encoding
Vertex of a graph

Cite this

@article{91fae37cf9344d70a327c03508250ddb,
title = "Sparse p-adic data coding for computationally efficient and effective big data analytics",
abstract = "We develop the theory and practical implementation of p-adic sparse coding of data. Rather than the standard, sparsifying criterion that uses the L0 pseudo-norm, we use the p-adic norm. We require that the hierarchy or tree be node-ranked, as is standard practice in agglomerative and other hierarchical clustering, but not necessarily with decision trees. In order to structure the data, all computational processing operations are direct reading of the data, or are bounded by a constant number of direct readings of the data, implying linear computational time. Through p-adic sparse data coding, efficient storage results, and for bounded p-adic norm stored data, search and retrieval are constant time operations. Examples show the effectiveness of this new approach to content-driven encoding and displaying of data.",
keywords = "big data, binary rooted tree, computational and storage complexity, hierarchical clustering, p-adic numbers, ultrametric topology",
author = "F. Murtagh",
year = "2016",
month = "7",
day = "1",
doi = "10.1134/S2070046616030055",
language = "English",
volume = "8",
pages = "236--247",
journal = "P-Adic Numbers, Ultrametric Analysis, and Applications",
issn = "2070-0466",
publisher = "Springer Science + Business Media",
number = "3",

}

TY - JOUR

T1 - Sparse p-adic data coding for computationally efficient and effective big data analytics

AU - Murtagh, F.

PY - 2016/7/1

Y1 - 2016/7/1

N2 - We develop the theory and practical implementation of p-adic sparse coding of data. Rather than the standard, sparsifying criterion that uses the L0 pseudo-norm, we use the p-adic norm. We require that the hierarchy or tree be node-ranked, as is standard practice in agglomerative and other hierarchical clustering, but not necessarily with decision trees. In order to structure the data, all computational processing operations are direct reading of the data, or are bounded by a constant number of direct readings of the data, implying linear computational time. Through p-adic sparse data coding, efficient storage results, and for bounded p-adic norm stored data, search and retrieval are constant time operations. Examples show the effectiveness of this new approach to content-driven encoding and displaying of data.

AB - We develop the theory and practical implementation of p-adic sparse coding of data. Rather than the standard, sparsifying criterion that uses the L0 pseudo-norm, we use the p-adic norm. We require that the hierarchy or tree be node-ranked, as is standard practice in agglomerative and other hierarchical clustering, but not necessarily with decision trees. In order to structure the data, all computational processing operations are direct reading of the data, or are bounded by a constant number of direct readings of the data, implying linear computational time. Through p-adic sparse data coding, efficient storage results, and for bounded p-adic norm stored data, search and retrieval are constant time operations. Examples show the effectiveness of this new approach to content-driven encoding and displaying of data.

KW - big data

KW - binary rooted tree

KW - computational and storage complexity

KW - hierarchical clustering

KW - p-adic numbers

KW - ultrametric topology

UR - http://www.scopus.com/inward/record.url?scp=84981716488&partnerID=8YFLogxK

U2 - 10.1134/S2070046616030055

DO - 10.1134/S2070046616030055

M3 - Article

VL - 8

SP - 236

EP - 247

JO - P-Adic Numbers, Ultrametric Analysis, and Applications

JF - P-Adic Numbers, Ultrametric Analysis, and Applications

SN - 2070-0466

IS - 3

ER -