Using Hadoop to implement a semantic method of assessing the quality of research medical datasets

Stephen Bonner, Ibad Kureshi, Grigoris Antoniou, David Corsair, Laura Moss, Illias Tachmazidis

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

This paper develops a system for storing and querying medical RDF data using Hadoop. The approach yields an inherently parallel framework that distributes the workload across a cluster. Unlike existing solutions, the framework uses highly optimised join strategies to complete eight separate SPARQL queries, comprising over eighty distinct joins, in only two MapReduce iterations. Results comparing an optimised version of the system against Jena TDB demonstrate its superior performance and its viability for assessing the quality of medical data.
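The abstract does not spell out the join strategy itself. Purely as an illustration, the sketch below shows one common way several SPARQL joins can be answered in a single MapReduce pass: a reduce-side "star" join, where every triple pattern sharing the same subject variable is keyed on that subject by the mapper and joined by the reducer. It is not presented as the authors' method. It simulates the map, shuffle and reduce phases in plain Python rather than calling the Hadoop API, and the predicates `ex:hasAge` and `ex:hasDiagnosis`, along with every function name, are hypothetical.

```python
from collections import defaultdict
from itertools import product

# Hypothetical triple patterns: (predicate, variable bound to the object).
# All patterns share the subject variable ?s, so together they form a
# "star" join that one map/reduce pass keyed on the subject can answer.
PATTERNS = [
    ("ex:hasAge", "?age"),
    ("ex:hasDiagnosis", "?dx"),
]


def map_phase(triples, patterns):
    """Emit (subject, (pattern_index, object)) for each triple matching a pattern."""
    for s, p, o in triples:
        for i, (pred, _var) in enumerate(patterns):
            if p == pred:
                yield s, (i, o)


def shuffle(pairs):
    """Group mapper output by key, standing in for Hadoop's shuffle/sort."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(subject, values, patterns):
    """Join all patterns for one subject; emit nothing if any pattern is unmatched."""
    per_pattern = [[] for _ in patterns]
    for i, o in values:
        per_pattern[i].append(o)
    if any(not objs for objs in per_pattern):
        return
    # The cross product handles multi-valued predicates (e.g. several diagnoses).
    for combo in product(*per_pattern):
        binding = {"?s": subject}
        for (_pred, var), obj in zip(patterns, combo):
            binding[var] = obj
        yield binding


def star_join(triples, patterns):
    """Run the simulated map, shuffle and reduce phases and collect all bindings."""
    groups = shuffle(map_phase(triples, patterns))
    results = []
    for subject in sorted(groups):
        results.extend(reduce_phase(subject, groups[subject], patterns))
    return results


if __name__ == "__main__":
    triples = [
        ("ex:p1", "ex:hasAge", "42"),
        ("ex:p1", "ex:hasDiagnosis", "ex:D1"),
        ("ex:p2", "ex:hasAge", "37"),  # no diagnosis recorded for p2
    ]
    print(star_join(triples, PATTERNS))
    # prints [{'?s': 'ex:p1', '?age': '42', '?dx': 'ex:D1'}]
```

Here both joins in the star are resolved in one pass, and `ex:p2` drops out because it has no diagnosis; a data-quality check could look for exactly such incomplete records. Chaining a second pass keyed on a variable shared between stars is one way many joins might be collapsed into very few iterations, although the abstract does not say whether the authors' two-iteration plan works this way.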

Original language: English
Title of host publication: Proceedings of the 3rd ASE International Conference on Big Data Science and Computing, BIGDATASCIENCE 2014
Publisher: Association for Computing Machinery (ACM)
Volume: 04-07-August-2014
ISBN (Electronic): 9781450328913
Publication status: Published - 4 Aug 2014
Event: 3rd ASE International Conference on Big Data Science and Computing - Beijing, China
Duration: 4 Aug 2014 to 7 Aug 2014
Conference number: 3
http://www.wikicfp.com/cfp/servlet/event.showcfp?eventid=36881&copyownerid=62784 (Link to Conference Website)

Conference

Conference: 3rd ASE International Conference on Big Data Science and Computing
Abbreviated title: BigDataScience2014
Country/Territory: China
City: Beijing
Period: 4/08/14 to 7/08/14

