Is there a way in Pig to change the configuration property
pig.maxCombinedSplitSize at different steps inside the *same* Pig script?
For example, when I LOAD the data I want this value to be low, so that
the block size is used effectively and many mappers get triggered
(otherwise, the job takes too long).
But later, when I SPLIT my output, I want the split size to be large so that
we don't create 4000 small output files. (SPLIT is a map-only task.)
Is there a way to accomplish this?
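For illustration, the intended setup might look roughly like the sketch below. The input/output paths, the schema, the SPLIT conditions and the byte values are all hypothetical. Whether Pig applies the second SET only to the later step, rather than to the whole script, is exactly the open question, so this shows the desired behavior and is not a confirmed working solution.

```pig
-- Hypothetical paths, schema and byte values, for illustration only.

-- Desired: a small combined split size while loading,
-- so that more mappers run in parallel (128 MB, in bytes).
SET pig.maxCombinedSplitSize 134217728;

raw = LOAD 'input/events' USING PigStorage('\t')
      AS (user_id:chararray, event_type:chararray, ts:long);

-- Desired: a much larger combined split size for the SPLIT step,
-- so that the map-only job writes fewer, larger files (1 GB, in bytes).
-- Open question: does this second SET apply only from here on?
SET pig.maxCombinedSplitSize 1073741824;

SPLIT raw INTO clicks IF event_type == 'click',
               views  IF event_type == 'view';

STORE clicks INTO 'output/clicks' USING PigStorage('\t');
STORE views  INTO 'output/views'  USING PigStorage('\t');
```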