Using Hadoop to implement a semantic method of assessing the quality of research medical datasets

Stephen Bonner, Ibad Kureshi, Grigoris Antoniou, David Corsair, Laura Moss, Illias Tachmazidis

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

In this paper, a system for storing and querying medical RDF data using Hadoop is developed. This approach enables us to create an inherently parallel framework that will scale the workload across a cluster. Unlike existing solutions, our framework uses highly optimised joining strategies to enable the completion of eight separate SPARQL queries, comprising over eighty distinct joins, in only two Map/Reduce iterations. Results are presented comparing an optimised version of our solution against Jena TDB, demonstrating the superior performance of our system and its viability for assessing the quality of medical data.

Original language: English
Title of host publication: Proceedings of the 3rd ASE International Conference on Big Data Science and Computing, BIGDATASCIENCE 2014
Publisher: Association for Computing Machinery (ACM)
Volume: 04-07-August-2014
ISBN (Electronic): 9781450328913
DOI: 10.1145/2640087.2644163
Publication status: Published - 4 Aug 2014
Event: 3rd ASE International Conference on Big Data Science and Computing - Beijing, China
Duration: 4 Aug 2014 - 7 Aug 2014
Conference number: 3
http://www.wikicfp.com/cfp/servlet/event.showcfp?eventid=36881&copyownerid=62784 (Link to Conference Website)

Conference

Conference: 3rd ASE International Conference on Big Data Science and Computing
Abbreviated title: BigDataScience2014
Country: China
City: Beijing
Period: 4/08/14 - 7/08/14
Internet address: http://www.wikicfp.com/cfp/servlet/event.showcfp?eventid=36881&copyownerid=62784


Cite this

Bonner, S., Kureshi, I., Antoniou, G., Corsair, D., Moss, L., & Tachmazidis, I. (2014). Using Hadoop to implement a semantic method of assessing the quality of research medical datasets. In Proceedings of the 3rd ASE International Conference on Big Data Science and Computing, BIGDATASCIENCE 2014 (Vol. 04-07-August-2014). [2644163] Association for Computing Machinery (ACM). https://doi.org/10.1145/2640087.2644163
Bonner, Stephen ; Kureshi, Ibad ; Antoniou, Grigoris ; Corsair, David ; Moss, Laura ; Tachmazidis, Illias. / Using Hadoop to implement a semantic method of assessing the quality of research medical datasets. Proceedings of the 3rd ASE International Conference on Big Data Science and Computing, BIGDATASCIENCE 2014. Vol. 04-07-August-2014 Association for Computing Machinery (ACM), 2014.
@inproceedings{dcb46aa2c8ee40bf92b1bd43c53227eb,
title = "Using Hadoop to implement a semantic method of assessing the quality of research medical datasets",
abstract = "In this paper, a system for storing and querying medical RDF data using Hadoop is developed. This approach enables us to create an inherently parallel framework that will scale the workload across a cluster. Unlike existing solutions, our framework uses highly optimised joining strategies to enable the completion of eight separate SPARQL queries, comprising over eighty distinct joins, in only two Map/Reduce iterations. Results are presented comparing an optimised version of our solution against Jena TDB, demonstrating the superior performance of our system and its viability for assessing the quality of medical data.",
keywords = "Error checking, Hadoop, Map/reduce, Medical data, RDF, SPARQL",
author = "Stephen Bonner and Ibad Kureshi and Grigoris Antoniou and David Corsair and Laura Moss and Illias Tachmazidis",
year = "2014",
month = "8",
day = "4",
doi = "10.1145/2640087.2644163",
language = "English",
volume = "04-07-August-2014",
booktitle = "Proceedings of the 3rd ASE International Conference on Big Data Science and Computing, BIGDATASCIENCE 2014",
publisher = "Association for Computing Machinery (ACM)",
address = "United States",

}

Bonner, S, Kureshi, I, Antoniou, G, Corsair, D, Moss, L & Tachmazidis, I 2014, Using Hadoop to implement a semantic method of assessing the quality of research medical datasets. in Proceedings of the 3rd ASE International Conference on Big Data Science and Computing, BIGDATASCIENCE 2014. vol. 04-07-August-2014, 2644163, Association for Computing Machinery (ACM), 3rd ASE International Conference on Big Data Science and Computing, Beijing, China, 4/08/14. https://doi.org/10.1145/2640087.2644163

Using Hadoop to implement a semantic method of assessing the quality of research medical datasets. / Bonner, Stephen; Kureshi, Ibad; Antoniou, Grigoris; Corsair, David; Moss, Laura; Tachmazidis, Illias.

Proceedings of the 3rd ASE International Conference on Big Data Science and Computing, BIGDATASCIENCE 2014. Vol. 04-07-August-2014 Association for Computing Machinery (ACM), 2014. 2644163.

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

TY  - GEN
T1  - Using Hadoop to implement a semantic method of assessing the quality of research medical datasets
AU  - Bonner, Stephen
AU  - Kureshi, Ibad
AU  - Antoniou, Grigoris
AU  - Corsair, David
AU  - Moss, Laura
AU  - Tachmazidis, Illias
PY  - 2014/8/4
Y1  - 2014/8/4
N2  - In this paper, a system for storing and querying medical RDF data using Hadoop is developed. This approach enables us to create an inherently parallel framework that will scale the workload across a cluster. Unlike existing solutions, our framework uses highly optimised joining strategies to enable the completion of eight separate SPARQL queries, comprising over eighty distinct joins, in only two Map/Reduce iterations. Results are presented comparing an optimised version of our solution against Jena TDB, demonstrating the superior performance of our system and its viability for assessing the quality of medical data.
AB  - In this paper, a system for storing and querying medical RDF data using Hadoop is developed. This approach enables us to create an inherently parallel framework that will scale the workload across a cluster. Unlike existing solutions, our framework uses highly optimised joining strategies to enable the completion of eight separate SPARQL queries, comprising over eighty distinct joins, in only two Map/Reduce iterations. Results are presented comparing an optimised version of our solution against Jena TDB, demonstrating the superior performance of our system and its viability for assessing the quality of medical data.
KW  - Error checking
KW  - Hadoop
KW  - Map/reduce
KW  - Medical data
KW  - RDF
KW  - SPARQL
UR  - http://www.scopus.com/inward/record.url?scp=84985911894&partnerID=8YFLogxK
U2  - 10.1145/2640087.2644163
DO  - 10.1145/2640087.2644163
M3  - Conference contribution
VL  - 04-07-August-2014
BT  - Proceedings of the 3rd ASE International Conference on Big Data Science and Computing, BIGDATASCIENCE 2014
PB  - Association for Computing Machinery (ACM)
ER  - 

Bonner S, Kureshi I, Antoniou G, Corsair D, Moss L, Tachmazidis I. Using Hadoop to implement a semantic method of assessing the quality of research medical datasets. In Proceedings of the 3rd ASE International Conference on Big Data Science and Computing, BIGDATASCIENCE 2014. Vol. 04-07-August-2014. Association for Computing Machinery (ACM). 2014. 2644163 https://doi.org/10.1145/2640087.2644163