Abstract
This paper develops a system for storing and querying medical RDF data using Hadoop. This approach enables us to create an inherently parallel framework that scales the workload across a cluster. Unlike existing solutions, our framework uses highly optimised joining strategies to complete eight separate SPARQL queries, comprising over eighty distinct joins, in only two Map/Reduce iterations. Results are presented comparing an optimised version of our solution against Jena TDB, demonstrating the superior performance of our system and its viability for assessing the quality of medical data.
Original language | English |
---|---|
Title of host publication | Proceedings of the 3rd ASE International Conference on Big Data Science and Computing, BIGDATASCIENCE 2014 |
Publisher | Association for Computing Machinery (ACM) |
Volume | 04-07-August-2014 |
ISBN (Electronic) | 9781450328913 |
Publication status | Published - 4 Aug 2014 |
Event | 3rd ASE International Conference on Big Data Science and Computing - Beijing, China. Duration: 4 Aug 2014 → 7 Aug 2014. Conference number: 3. Conference website: http://www.wikicfp.com/cfp/servlet/event.showcfp?eventid=36881&copyownerid=62784 |
Conference
Conference | 3rd ASE International Conference on Big Data Science and Computing |
---|---|
Abbreviated title | BigDataScience2014 |
Country/Territory | China |
City | Beijing |
Period | 4/08/14 → 7/08/14 |