Achieving reasonably performing federated queries

Hi all,

This is partly a summary of my recent experiences with federated queries
and partly a request for your feedback on making /reasonably/ performing
federated queries.

The query in question is here [1]. Essentially there are two endpoints
(which may or may not be the same), and they return the same pattern.
There are millions of triples to get through, so throwing out false
negatives (early on) is quite important. We assume that graph names are
not known and that everything is accessible from the default graph. The
endpoint which dispatches the two queries needs to filter out what's
remaining. There are no common variables. This means that both endpoints
need to do their own thing and then the patterns are joined.

Needless to say, the OPTIONALs in there are expensive, but they help a
great deal in making sure that only what's necessary is used, i.e.,
either a refArea doesn't have an exactMatch, or if there is an
exactMatch, it contains the domain of the refArea that's at the other
endpoint. Without OPTIONALs, the outer endpoint will end up with more
possibilities to join. Using MINUS is more or less the same.

By default, ARQ uses an optimizer to do a whole bunch of good stuff
that's mostly foreign to me. What I'm aware of, however, is how it
behaves when it comes to SERVICE calls. When the first SERVICE call
comes back with n results, the second SERVICE is called n times.
Undoubtedly, this doesn't scale at all.
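
To make the request counts concrete, here is a toy model in plain Java.
It does not use ARQ or any real endpoint: the Endpoint class, its
request counter, and the two strategy methods are made up for
illustration, and they only mimic the behaviour described above (one
call to the second service per row from the first, versus one call to
each service).

```java
import java.util.ArrayList;
import java.util.List;

final class ServiceCallCount {

    /** Hypothetical stand-in for a remote SPARQL endpoint; it only counts requests. */
    static final class Endpoint {
        private final int rows;
        private int requests = 0;

        Endpoint(int rows) { this.rows = rows; }

        /** Pretends to run a query and returns `rows` dummy result rows. */
        List<Integer> query() {
            requests++;
            List<Integer> out = new ArrayList<>();
            for (int i = 0; i < rows; i++) {
                out.add(i);
            }
            return out;
        }

        int requests() { return requests; }
    }

    /** Per-row strategy: the second endpoint is queried once for every row of the first. */
    static int perRowRequests(int rowsA, int rowsB) {
        Endpoint a = new Endpoint(rowsA);
        Endpoint b = new Endpoint(rowsB);
        for (Integer ignored : a.query()) {
            b.query();
        }
        return a.requests() + b.requests();
    }

    /** Independent strategy: each endpoint is queried exactly once; joining happens locally. */
    static int independentRequests(int rowsA, int rowsB) {
        Endpoint a = new Endpoint(rowsA);
        Endpoint b = new Endpoint(rowsB);
        a.query();
        b.query();
        return a.requests() + b.requests();
    }

    public static void main(String[] args) {
        System.out.println("per-row:     " + perRowRequests(150, 150));
        System.out.println("independent: " + independentRequests(150, 150));
    }
}
```

In this model the per-row strategy costs 1 + n requests, so the number
of round trips grows with the size of the first result set, while the
independent strategy stays at two requests regardless of n.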

To work around this, I've turned off the optimizer using
Optimize.noOptimizer() [2], invoked from a simple class which is called
from the parent endpoint's TDB assembler file. As expected, that allows
the parent to make only two SERVICE calls.

This is the current state of things. I'd like to take it further to get
more out of this, but at this point, I need a different set of eyes.

[I will prepare a chart for this, but this rough explanation might do
for now] As there are different endpoints with different amounts of
data, what I've experienced is that some of the fastest queries take
around 3 seconds. Those are typically queries with a low number of
joins: ~150x150=22500 possibilities before the last filter kicks in. It
gets heavy quite fast, as I've seen some queries take 30 seconds or
more.
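
As a rough illustration of why this gets heavy, here is a small Java
sketch of joining two result sets that share no variable and filtering
only at the end. It is not the actual query or ARQ's join code: the
class and method names are hypothetical, and plain string equality
stands in for whatever the real final FILTER tests.

```java
import java.util.List;

final class JoinCandidates {

    /** With no shared variable, every left row is paired with every right row. */
    static long candidatePairs(int leftRows, int rightRows) {
        return (long) leftRows * rightRows;
    }

    /**
     * Builds every pair, then applies the final filter.
     * Returns {pairs inspected, pairs kept}.
     * Equality is only a placeholder for the query's real filter condition.
     */
    static long[] joinThenFilter(List<String> left, List<String> right) {
        long inspected = 0;
        long kept = 0;
        for (String l : left) {
            for (String r : right) {
                inspected++;
                if (l.equals(r)) {
                    kept++;
                }
            }
        }
        return new long[] { inspected, kept };
    }

    public static void main(String[] args) {
        // The ~150 x 150 case mentioned above.
        System.out.println(candidatePairs(150, 150));
    }
}
```

Because the filter runs after the pairing, the work is driven by the
product of the two result sizes rather than by how many pairs survive,
which is why trimming each side before the join matters so much.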

The TDB optimizer stats file is up to date on all endpoints.

I am completely open to how this query can be restructured, and would
also like to hear about your own experiences with federated queries.

[1]
http://csarven.ca/linked-statistical-data-analysis#federated-sparql-query
[2]
http://jena.apache.org/documentation/javadoc/arq/com/hp/hpl/jena/sparql/algebra/optimize/Optimize.html#noOptimizer()

-Sarven
http://csarven.ca/#i
