
CustomPartitioner sometimes placing data in incorrect partition file

*I am loading 3 data sources with data like:*
Source 1 (main data source):
id, id1, userId, type

Source 2 (supporting source used for filtering):
partition_number, id, id1

Source 3 (static set source with all allowed types):
type

*I am using the following Pig script to count the unique userIds by type:*
grains = CROSS source2, source3;

users = JOIN
grains BY (source2::id, source2::id1, source3::type) LEFT OUTER,
source1 BY (id, id1, type);

usersGrouped = GROUP users
BY (metricsGrains::grain1::partitionNumber,
grains::source2::organizationId,
grains::source2::networkId,
grains::source3::browserFormFactor)
PARTITION BY MyCustomPartitioner PARALLEL 32;

counts = FOREACH usersGrouped {
    userCount = DISTINCT users.(source1::userId);
    GENERATE FLATTEN(group), COUNT(userCount);
};

STORE counts INTO 'output';
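
For reference, the nested DISTINCT and COUNT above are meant to produce, per group key, the number of distinct userIds. A minimal plain-Java sketch of that aggregation follows. The class name, method name, and sample rows are hypothetical, and it assumes that rows with a null userId (unmatched rows from the LEFT OUTER join) are not counted, which mirrors how Pig's COUNT ignores nulls:

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical illustration only; not part of Pig or of the script above.
final class DistinctUserCountSketch {

    // Each row is {groupKey, userId}. Returns the number of distinct
    // non-null userIds seen for every group key.
    static Map<String, Integer> countDistinctUsers(String[][] rows) {
        Map<String, Set<String>> usersByGroup = new TreeMap<>();
        for (String[] row : rows) {
            Set<String> users = usersByGroup.computeIfAbsent(row[0], k -> new HashSet<>());
            if (row[1] != null) { // assumption: null userIds are skipped
                users.add(row[1]);
            }
        }
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, Set<String>> entry : usersByGroup.entrySet()) {
            counts.put(entry.getKey(), entry.getValue().size());
        }
        return counts;
    }

    public static void main(String[] args) {
        String[][] rows = {
            {"(1,10,20,desktop)", "u1"},
            {"(1,10,20,desktop)", "u1"},
            {"(1,10,20,desktop)", "u2"},
            {"(2,10,20,mobile)", null},
        };
        System.out.println(countDistinctUsers(rows));
    }
}
```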

*My partitioner is quite simple (it just maps the input partition number,
1 to 32, to an HDFS partition index, 0 to 31):*

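import static com.google.common.base.Preconditions.checkState; // assuming Guava's checkState

import org.apache.hadoop.mapreduce.Partitioner;
import org.apache.pig.impl.io.NullableTuple;
import org.apache.pig.impl.io.PigNullableWritable;

public class MyCustomPartitioner extends Partitioner<PigNullableWritable, NullableTuple> {

    @Override
    public int getPartition(PigNullableWritable partitionWritable,
            NullableTuple valueWritable, int numPartitions) {
        // The group key is a tuple; its string form is assumed to look like "(1,...)".
        String partition = partitionWritable.getValueAsPigType().toString();
        // The first field of the key is the input partition number (1 to 32).
        int inputPartitionNum = Integer.parseInt(
                partition.substring(1, partition.indexOf(",")));
        int hdfsPartitionNum = inputPartitionNum - 1;
        checkState(hdfsPartitionNum >= 0 && hdfsPartitionNum < numPartitions,
                "Invalid partition chosen: " + hdfsPartitionNum);
        return hdfsPartitionNum;
    }
}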
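
To make the intended mapping easy to check without a cluster, here is a minimal sketch of the same string parsing in plain Java. The class and method names are hypothetical, and the sample keys assume the group key prints as a parenthesized, comma-separated tuple whose first field is the partition number:

```java
// Hypothetical helper, not part of Pig or Hadoop: it isolates the parsing
// done in getPartition so it can be exercised on its own.
final class PartitionMappingSketch {

    // Maps the printed form of a group key, assumed to look like
    // "(1,10,20,desktop)", to a zero-based partition index.
    static int toHdfsPartition(String groupKey, int numPartitions) {
        int comma = groupKey.indexOf(',');
        if (!groupKey.startsWith("(") || comma < 2) {
            throw new IllegalArgumentException("Unexpected key format: " + groupKey);
        }
        int inputPartitionNum = Integer.parseInt(groupKey.substring(1, comma).trim());
        int hdfsPartitionNum = inputPartitionNum - 1;
        if (hdfsPartitionNum < 0 || hdfsPartitionNum >= numPartitions) {
            throw new IllegalStateException("Invalid partition chosen: " + hdfsPartitionNum);
        }
        return hdfsPartitionNum;
    }

    public static void main(String[] args) {
        String[] keys = {"(1,10,20,desktop)", "(2,10,20,mobile)", "(32,11,21,tablet)"};
        for (String key : keys) {
            System.out.println(key + " -> " + toHdfsPartition(key, 32));
        }
    }
}
```

With 32 partitions, a key starting with 1 maps to index 0 and a key starting with 32 maps to index 31; anything outside 1 to 32 is rejected.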

So data from input partition 1 should always end up in the part0000 file,
partition 2 data in the part0001 file, and so on. But sometimes partition 1
data ends up in part0005 (or some other arbitrary partition file). This does
not happen for every data set, only for some, and seemingly at random.

I am using Hadoop 2.3 with Pig 0.13. Could you advise what the issue might
be here?

Thanks,

Shakti
