Channel: Apache Timeline

Join on custom LoadFunc not working correctly

Hey guys,

I have a custom storage function (a LoadFunc) that loads from the Accumulo
database (similar to HBase).
I'm trying to execute the following script:

A = load 'accumulo://table_a'
    using org.apache.accumulo.pig.AccumuloStorage('cf:cq1 cf:cq2', '-loadKey')
    as (id: chararray, a: chararray, b: chararray);
B = load 'accumulo://table_b'
    using org.apache.accumulo.pig.AccumuloStorage('cf:cq1 cf:cq2', '-loadKey')
    as (id: chararray, a: chararray, b: chararray);
C = join A by a, B by b;
dump C;

When I execute this, dataset A is not loaded.
If I reverse the join order instead:
C = join B by b, A by a;
then A is loaded, but B is not.

My current workaround is to store A and B into temporary storage using
PigStorage() and then load them again to do the join. However, that adds
extra read/write phases that I'd like to avoid. In my implementation of the
AccumuloStorage() function, I set pig.noSplitCombination to true.
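
A minimal sketch of that workaround, for reference. The temporary paths under
/tmp/pig_join and the alias names A_tmp and B_tmp are hypothetical
placeholders, not taken from the original script:

```pig
-- Load each table through the custom LoadFunc, as in the failing script.
A = load 'accumulo://table_a'
    using org.apache.accumulo.pig.AccumuloStorage('cf:cq1 cf:cq2', '-loadKey')
    as (id: chararray, a: chararray, b: chararray);
B = load 'accumulo://table_b'
    using org.apache.accumulo.pig.AccumuloStorage('cf:cq1 cf:cq2', '-loadKey')
    as (id: chararray, a: chararray, b: chararray);

-- Write both relations to temporary files (paths are assumptions).
store A into '/tmp/pig_join/table_a' using PigStorage();
store B into '/tmp/pig_join/table_b' using PigStorage();

-- Read them back with PigStorage, so the join no longer involves AccumuloStorage.
A_tmp = load '/tmp/pig_join/table_a' using PigStorage()
    as (id: chararray, a: chararray, b: chararray);
B_tmp = load '/tmp/pig_join/table_b' using PigStorage()
    as (id: chararray, a: chararray, b: chararray);

C = join A_tmp by a, B_tmp by b;
dump C;
```

The join then runs only against the PigStorage copies, at the cost of one
extra write and one extra read per table.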

I'm not sure what is wrong with my LoadFunc or why it does not load both
datasets correctly.

Any help would be appreciated.

Thanks
Pradeep
