Hey guys,
I have a custom Storage function that loads from the Accumulo database
(similar to HBase).
I have the following script that I'm trying to execute:
A = load 'accumulo://table_a'
    using org.apache.accumulo.pig.AccumuloStorage('cf:cq1 cf:cq2', '-loadKey')
    as (id: chararray, a: chararray, b: chararray);
B = load 'accumulo://table_b'
    using org.apache.accumulo.pig.AccumuloStorage('cf:cq1 cf:cq2', '-loadKey')
    as (id: chararray, a: chararray, b: chararray);
C = join A by a, B by b;
dump C;
When I execute this, dataset A is not loaded.
If I reverse the join order instead:
C = join B by b, A by a;
A is loaded, but B is not.
My current workaround is to store A and B into temporary storage using
PigStorage() and load them again before doing the join. However, that adds
extra read/write phases that I'd like to avoid. In my implementation of
the AccumuloStorage() function, I set pig.noSplitCombination to true.
I'm not sure what the problem with my LoadFunc is and why it's not loading
both datasets correctly.
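For reference, the workaround looks roughly like this. It is only a sketch: the /tmp paths and the A_tmp/B_tmp aliases are placeholders I made up for illustration, not my actual locations or names.

```pig
-- Load both tables through the custom loader, as in the original script.
A = load 'accumulo://table_a'
    using org.apache.accumulo.pig.AccumuloStorage('cf:cq1 cf:cq2', '-loadKey')
    as (id: chararray, a: chararray, b: chararray);
B = load 'accumulo://table_b'
    using org.apache.accumulo.pig.AccumuloStorage('cf:cq1 cf:cq2', '-loadKey')
    as (id: chararray, a: chararray, b: chararray);

-- Write each relation out to temporary storage (placeholder paths).
store A into '/tmp/table_a' using PigStorage();
store B into '/tmp/table_b' using PigStorage();

-- Read the temporary copies back and join those instead.
A_tmp = load '/tmp/table_a' using PigStorage()
    as (id: chararray, a: chararray, b: chararray);
B_tmp = load '/tmp/table_b' using PigStorage()
    as (id: chararray, a: chararray, b: chararray);
C = join A_tmp by a, B_tmp by b;
dump C;
```

With this version both datasets reach the join, at the cost of the extra store and load for each table.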
Any help would be appreciated.
Thanks
Pradeep