Hi all,
I am using Jena/ARQ to perform SPARQL queries on a “virtual RDF dataset”, in the manner of projects such as D2RQ. In other words, I am keeping data in their native form, but exposing the data as triples, which I generate on the fly in response to pattern match requests. I’ve achieved this by extending GraphBase and implementing the graphBaseFind() method.
I’d like to understand how I can optimise SPARQL queries on this dataset using ARQ. For example, let’s say I have a SPARQL query on my dataset:
SELECT ?s WHERE {
  ?s a ex:DataPoint .
  ?s ex:hasValue 25 .
}
Let’s further say that I know in advance that the second pattern ("?s ex:hasValue 25") is likely to lead to far fewer matches than the first ("?s a ex:DataPoint"). Therefore it might be optimal to evaluate the second pattern first, to minimise the total number of pattern matches that are attempted.
I read the ARQ documentation on the web (e.g. https://jena.apache.org/documentation/query/arq-query-eval.html) and I could see that this might be relevant but unfortunately it was a little complex for me (I’m a newbie in this area!)
I also found a page on "ARQ-Optimizer" (http://docs.huihoo.com/jena/ARQ/bgp-optimization.html) which mentions optimisation by reordering triple patterns based on a cost function, to minimise the size of intermediate result sets. This seems relevant and the page states that this optimisation is enabled by default in ARQ, but I don’t understand how the cost function is constructed.
In the Linked Data Fragments system for querying distributed data (http://linkeddatafragments.org/), servers return metadata about how many results are expected for each graph pattern. The query engine then decides in which order to attempt the matches before fully evaluating the query. In my synthetic system, I can provide good estimates for how many triples are likely to match a given pattern, which may be helpful in optimisation in an analogous fashion.
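To make the idea concrete, here is a minimal sketch (plain Java, no Jena dependency) of the kind of ordering I have in mind. The Pattern class, the orderByEstimate method and the numbers returned by estimate are all hypothetical and made up for illustration; this is not ARQ's API, nor its cost function.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.function.ToLongFunction;

public class PatternOrdering {

    /** Hypothetical stand-in for a triple pattern; a term starting with '?' is a variable. */
    public static final class Pattern {
        public final String s, p, o;

        public Pattern(String s, String p, String o) {
            this.s = s;
            this.p = p;
            this.o = o;
        }

        @Override
        public String toString() {
            return s + " " + p + " " + o;
        }
    }

    /** Returns a copy of the patterns sorted by ascending estimated match count. */
    public static List<Pattern> orderByEstimate(List<Pattern> patterns,
                                                ToLongFunction<Pattern> estimate) {
        List<Pattern> ordered = new ArrayList<>(patterns);
        ordered.sort(Comparator.comparingLong(estimate));
        return ordered;
    }

    /** Made-up estimator: in my system these counts would come from the native data store. */
    public static long estimate(Pattern pattern) {
        if (pattern.p.equals("ex:hasValue") && !pattern.o.startsWith("?")) {
            return 10L;          // assumed: few data points share one exact value
        }
        if (pattern.p.equals("a")) {
            return 1_000_000L;   // assumed: very many instances of a class
        }
        return Long.MAX_VALUE;   // unknown pattern: evaluate last
    }

    public static void main(String[] args) {
        List<Pattern> bgp = Arrays.asList(
                new Pattern("?s", "a", "ex:DataPoint"),
                new Pattern("?s", "ex:hasValue", "25"));

        // Print the patterns in the order I would like them to be evaluated.
        for (Pattern pattern : orderByEstimate(bgp, PatternOrdering::estimate)) {
            System.out.println(pattern);
        }
    }
}
```

This sorts each pattern in isolation, so it ignores the fact that a pattern becomes more selective once ?s has been bound by an earlier one. What I am really asking is where (if anywhere) ARQ lets me plug in estimates like these.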
I’d be very grateful for any guidance on how I can perform such optimisation in ARQ.
Thanks in advance,
Jon Blower
University of Reading, UK