This document presents a machine learning approach to predicting the performance of SPARQL queries over Linked Data without relying on statistics about the underlying RDF data. Each query is represented by features extracted from the query itself: algebra features derived from the SPARQL query expressions, and graph pattern features that model the query pattern. Machine learning models are trained on previously executed queries, and experiments on DBpedia data show that the approach can predict execution times for common Linked Data queries with high accuracy. Future work may incorporate additional features such as bandwidth and explore query optimization for Linked Data applications and query processing.
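To make the idea concrete, the following is a minimal sketch of how a query might be turned into a feature vector and matched against previously executed queries. It is an illustration under stated assumptions, not the method of the work summarized here: the keyword list, the helper names `extract_features` and `predict_time`, the use of a k-nearest-neighbour average as the learner, and the execution times in the example are all hypothetical. The features are a crude stand-in for the algebra features only; graph pattern features are not modelled.

```python
import math
import re

# Hypothetical feature set: occurrence counts of a few SPARQL keywords,
# used here as a rough proxy for algebra operators.
KEYWORDS = ("OPTIONAL", "UNION", "FILTER", "DISTINCT", "ORDER BY", "LIMIT")


def extract_features(query):
    """Map a SPARQL query string to a numeric feature vector.

    This is a crude tokenizer, not a SPARQL parser: IRIs written as <...>
    are removed so that keywords inside them are not counted, but string
    literals and comments are not handled.
    """
    text = re.sub(r"<[^<>\s]*>", " ", query)
    upper = text.upper()
    features = []
    for keyword in KEYWORDS:
        pattern = r"\b" + keyword.replace(" ", r"\s+") + r"\b"
        features.append(float(len(re.findall(pattern, upper))))
    # Number of distinct variables as a simple measure of query size.
    features.append(float(len(set(re.findall(r"[?$]\w+", text)))))
    return features


def predict_time(training, query, k=3):
    """Average the execution times of the k most similar past queries.

    `training` is a list of (feature_vector, execution_time) pairs built
    from previously executed queries.
    """
    target = extract_features(query)
    nearest = sorted(training, key=lambda pair: math.dist(pair[0], target))[:k]
    return sum(time for _, time in nearest) / len(nearest)


if __name__ == "__main__":
    # Toy history; the times are invented for illustration, not measurements.
    history = [
        ("SELECT ?s WHERE { ?s a <http://dbpedia.org/ontology/City> } LIMIT 5", 0.1),
        ("SELECT ?s ?o WHERE { { ?s ?p ?o } UNION { ?o ?p ?s } }", 2.0),
        ("SELECT DISTINCT ?s WHERE { ?s ?p ?o . FILTER(?o > 10) }", 0.5),
    ]
    training = [(extract_features(q), t) for q, t in history]
    new_query = "SELECT ?x WHERE { ?x a <http://dbpedia.org/ontology/Person> } LIMIT 20"
    print(predict_time(training, new_query, k=1))
```

The point of the sketch is that nothing in it touches the RDF data: the prediction depends only on the text of the new query and on a log of earlier queries with their observed execution times.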