Learning-based SPARQL query performance modeling and prediction

Wei Emma Zhang, Quan Z. Sheng, Yongrui Qin, Kerry Taylor, Lina Yao

Research output: Contribution to journal › Article

1 Citation (Scopus)

Abstract

One of the challenges of managing an RDF database is predicting the performance of SPARQL queries before they are executed. Performance characteristics, such as execution time and memory usage, can help data consumers identify unexpectedly long-running queries before they start and estimate the system workload for query scheduling. Extensive work addresses this performance prediction problem for traditional SQL queries, but it is not directly applicable to SPARQL queries. In this paper, we adopt machine learning techniques to predict the performance of SPARQL queries. Our work focuses on modeling the features of a SPARQL query as a vector representation. Our feature modeling method does not depend on knowledge of the underlying systems or the structure of the underlying data, but only on the nature of SPARQL queries. We then use these features to train prediction models. We propose a two-step prediction process and consider performance in both cold and warm stages. Evaluations are performed on real-world SPARQL queries whose execution times range from milliseconds to hours. The results demonstrate that the proposed approach can effectively predict SPARQL query performance and outperforms state-of-the-art approaches.
Language: English
Pages: 1015-1035
Number of pages: 21
Journal: World Wide Web
Volume: 21
Issue number: 4
Early online date: 24 Oct 2017
DOIs: https://doi.org/10.1007/s11280-017-0498-1
Publication status: Published - Jul 2018

Fingerprint

Learning systems
Scheduling
Data storage equipment

Cite this

Zhang, Wei Emma ; Sheng, Quan Z. ; Qin, Yongrui ; Taylor, Kerry ; Yao, Lina. / Learning-based SPARQL query performance modeling and prediction. In: World Wide Web. 2018 ; Vol. 21, No. 4. pp. 1015-1035.
@article{1fa8640eb29c46cbb4990f3d677a80ab,
title = "Learning-based SPARQL query performance modeling and prediction",
abstract = "One of the challenges of managing an RDF database is predicting the performance of SPARQL queries before they are executed. Performance characteristics, such as execution time and memory usage, can help data consumers identify unexpectedly long-running queries before they start and estimate the system workload for query scheduling. Extensive work addresses this performance prediction problem for traditional SQL queries, but it is not directly applicable to SPARQL queries. In this paper, we adopt machine learning techniques to predict the performance of SPARQL queries. Our work focuses on modeling the features of a SPARQL query as a vector representation. Our feature modeling method does not depend on knowledge of the underlying systems or the structure of the underlying data, but only on the nature of SPARQL queries. We then use these features to train prediction models. We propose a two-step prediction process and consider performance in both cold and warm stages. Evaluations are performed on real-world SPARQL queries whose execution times range from milliseconds to hours. The results demonstrate that the proposed approach can effectively predict SPARQL query performance and outperforms state-of-the-art approaches.",
keywords = "Feature modeling, Prediction, Query performance, SPARQL",
author = "Zhang, {Wei Emma} and Sheng, {Quan Z.} and Yongrui Qin and Kerry Taylor and Lina Yao",
note = "This is a post-peer-review, pre-copyedit version of an article published in World Wide Web. The final authenticated version is available online at: http://dx.doi.org/10.1007/s11280-017-0498-1",
year = "2018",
month = "7",
doi = "10.1007/s11280-017-0498-1",
language = "English",
volume = "21",
pages = "1015--1035",
journal = "World Wide Web",
issn = "1386-145X",
publisher = "Springer New York",
number = "4",

}

Zhang, WE, Sheng, QZ, Qin, Y, Taylor, K & Yao, L 2018, 'Learning-based SPARQL query performance modeling and prediction', World Wide Web, vol. 21, no. 4, pp. 1015-1035. https://doi.org/10.1007/s11280-017-0498-1

Learning-based SPARQL query performance modeling and prediction. / Zhang, Wei Emma; Sheng, Quan Z.; Qin, Yongrui; Taylor, Kerry; Yao, Lina.

In: World Wide Web, Vol. 21, No. 4, 07.2018, p. 1015-1035.

TY - JOUR

T1 - Learning-based SPARQL query performance modeling and prediction

AU - Zhang, Wei Emma

AU - Sheng, Quan Z.

AU - Qin, Yongrui

AU - Taylor, Kerry

AU - Yao, Lina

N1 - This is a post-peer-review, pre-copyedit version of an article published in World Wide Web. The final authenticated version is available online at: http://dx.doi.org/10.1007/s11280-017-0498-1

PY - 2018/7

Y1 - 2018/7

AB - One of the challenges of managing an RDF database is predicting the performance of SPARQL queries before they are executed. Performance characteristics, such as execution time and memory usage, can help data consumers identify unexpectedly long-running queries before they start and estimate the system workload for query scheduling. Extensive work addresses this performance prediction problem for traditional SQL queries, but it is not directly applicable to SPARQL queries. In this paper, we adopt machine learning techniques to predict the performance of SPARQL queries. Our work focuses on modeling the features of a SPARQL query as a vector representation. Our feature modeling method does not depend on knowledge of the underlying systems or the structure of the underlying data, but only on the nature of SPARQL queries. We then use these features to train prediction models. We propose a two-step prediction process and consider performance in both cold and warm stages. Evaluations are performed on real-world SPARQL queries whose execution times range from milliseconds to hours. The results demonstrate that the proposed approach can effectively predict SPARQL query performance and outperforms state-of-the-art approaches.

KW - Feature modeling

KW - Prediction

KW - Query performance

KW - SPARQL

U2 - 10.1007/s11280-017-0498-1

DO - 10.1007/s11280-017-0498-1

M3 - Article

VL - 21

SP - 1015

EP - 1035

JO - World Wide Web

T2 - World Wide Web

JF - World Wide Web

SN - 1386-145X

IS - 4

ER -