Identifying and caching hot triples for efficient RDF query processing

Wei Emma Zhang, Quan Z. Sheng, Kerry Taylor, Yongrui Qin

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

6 Citations (Scopus)

Abstract

Resource Description Framework (RDF) has been used as a general model for conceptual description and information modelling. As the number and volume of RDF datasets have grown in recent years, many techniques have been developed to accelerate query answering on triple stores, which handle large-scale RDF data. Caching is one of the popular solutions. Non-RDBMS-based triple stores, which leverage the intrinsic nature of RDF graphs, have been emerging and attracting more research attention. However, because their fundamental structure differs from that of RDBMS-based triple stores, they cannot leverage the RDBMS caching mechanism. In this paper, we develop a time-aware, frequency-based caching algorithm to address this issue. Our approach retrieves the accessed triples by analyzing and expanding previous queries, and collects the most frequently accessed triples by evaluating their access frequencies using Exponential Smoothing, a forecasting method. We evaluate our approach using real-world queries from a publicly available SPARQL endpoint. Our theoretical analysis and empirical results show that the proposed approach outperforms the state-of-the-art approaches with higher hit rates.
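
To make the idea of time-aware frequency estimation concrete, the following is a minimal sketch of how exponential smoothing could rank triples by recent access frequency. It is not the algorithm from the paper: the class name `SmoothedFrequencyTracker`, the discrete integer time slices, the smoothing constant `alpha`, and the update rule (observation 1 on an access, 0 otherwise) are all assumptions chosen for illustration, and query expansion and cache replacement are left out.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


class SmoothedFrequencyTracker:
    """Hypothetical helper: one exponentially smoothed estimate per triple."""

    def __init__(self, alpha: float = 0.05) -> None:
        if not 0.0 < alpha < 1.0:
            raise ValueError("alpha must be in (0, 1)")
        self.alpha = alpha
        # triple -> (estimate, time slice of the last update)
        self._state: Dict[Triple, Tuple[float, int]] = {}

    def record_access(self, triple: Triple, now: int) -> float:
        """Update and return the estimate for a triple accessed at slice `now`."""
        prev_est, prev_time = self._state.get(triple, (0.0, now))
        if now < prev_time:
            raise ValueError("time slices must be non-decreasing")
        # Slices without an access count as observation 0, so the old
        # estimate decays by (1 - alpha) per elapsed slice. The current
        # access counts as observation 1, weighted by alpha.
        decay = (1.0 - self.alpha) ** (now - prev_time)
        est = self.alpha + decay * prev_est
        self._state[triple] = (est, now)
        return est

    def estimate_at(self, triple: Triple, now: int) -> float:
        """Return the decayed estimate at slice `now` without recording an access."""
        if triple not in self._state:
            return 0.0
        est, last = self._state[triple]
        if now < last:
            raise ValueError("cannot estimate before the last recorded access")
        return est * (1.0 - self.alpha) ** (now - last)

    def hottest(self, k: int, now: int) -> List[Triple]:
        """Return up to k triples with the highest estimates at slice `now`."""
        ranked = sorted(
            self._state,
            key=lambda t: self.estimate_at(t, now),
            reverse=True,
        )
        return ranked[:k]


if __name__ == "__main__":
    tracker = SmoothedFrequencyTracker(alpha=0.5)
    old = ("ex:Alice", "ex:knows", "ex:Bob")
    new = ("ex:Bob", "ex:worksAt", "ex:Acme")
    tracker.record_access(old, now=0)
    tracker.record_access(old, now=1)
    tracker.record_access(new, now=10)
    print(tracker.hottest(k=1, now=10))
```

In this sketch a triple accessed twice long ago ranks below a triple accessed once recently, which is the time-aware behaviour the abstract contrasts with plain frequency counting. A larger `alpha` forgets old accesses faster.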

Language: English
Title of host publication: Database Systems for Advanced Applications
Subtitle of host publication: 20th International Conference, DASFAA 2015, Hanoi, Vietnam, April 20-23, 2015, Proceedings, Part II
Editors: Matthias Renz, Cyrus Shahabi, Xiaofang Zhou, Muhammad Aamir Cheema
Publisher: Springer Verlag
Pages: 259-274
Number of pages: 16
ISBN (Electronic): 9783319181233
ISBN (Print): 9783319181226
DOI: https://doi.org/10.1007/978-3-319-18123-3_16
Publication status: Published - 9 Apr 2015
Externally published: Yes
Event: 20th International Conference on Database Systems for Advanced Applications - Hanoi, Viet Nam
Duration: 20 Apr 2015 to 23 Apr 2015

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 9050
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 20th International Conference on Database Systems for Advanced Applications
Abbreviated title: DASFAA 2015
Country: Viet Nam
City: Hanoi
Period: 20/04/15 to 23/04/15

Fingerprint

Data description
Query processing
Caching
Resources
Query
Leverage
Exponential Smoothing
SPARQL
Hits
Forecasting
Theoretical Analysis
Framework
Evaluate
Graph in graph theory
Modeling

Cite this

Zhang, W. E., Sheng, Q. Z., Taylor, K., & Qin, Y. (2015). Identifying and caching hot triples for efficient RDF query processing. In M. Renz, C. Shahabi, X. Zhou, & M. A. Cheema (Eds.), Database Systems for Advanced Applications: 20th International Conference, DASFAA 2015, Hanoi, Vietnam, April 20-23, 2015, Proceedings, Part II (pp. 259-274). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 9050). Springer Verlag. https://doi.org/10.1007/978-3-319-18123-3_16
@inproceedings{70a8215c7f4c41dba7cfbf89fbf32582,
title = "Identifying and caching hot triples for efficient RDF query processing",
keywords = "Caching, Exponential smoothing, Query expansion, RDF",
author = "Zhang, {Wei Emma} and Sheng, {Quan Z.} and Kerry Taylor and Yongrui Qin",
year = "2015",
month = "4",
day = "9",
doi = "10.1007/978-3-319-18123-3_16",
language = "English",
isbn = "9783319181226",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
publisher = "Springer Verlag",
pages = "259--274",
editor = "Matthias Renz and Cyrus Shahabi and Xiaofang Zhou and Cheema, {Muhammad Aamir}",
booktitle = "Database Systems for Advanced Applications",

}
