Hi,
Is there a way to get the HDFS sink to signal that a file was just closed,
and then use that signal to add a partition to Hive if one does not exist
already?
Right now, what I do is:
- move files to s3
- run recover partitions <--- step takes forever.
But given how much historical data I have, it's not feasible to run
recover partitions every single day.
I would much rather add a partition whenever I see a file land in that
partition for the first time.
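To make the idea concrete, here is a minimal sketch of the per-partition alternative I have in mind. The table name, partition column, and S3 path layout below are placeholders, not our actual schema:

```python
def add_partition_statement(table, partition_col, partition_value, location):
    """Build a HiveQL statement that registers a single partition.

    Unlike a full "recover partitions" pass, this touches only the one
    partition that just received its first file. IF NOT EXISTS makes it
    safe to issue repeatedly for the same partition.
    """
    return (
        f"ALTER TABLE {table} ADD IF NOT EXISTS "
        f"PARTITION ({partition_col}='{partition_value}') "
        f"LOCATION '{location}'"
    )

# Example: a file landed under the dt=2012-06-01 partition for the
# first time, so register just that partition.
stmt = add_partition_statement(
    "events", "dt", "2012-06-01", "s3://bucket/events/dt=2012-06-01"
)
print(stmt)
```

The statement could then be run through the Hive CLI or a JDBC connection whenever the first file for a partition shows up.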
I looked around the code base and it seems the Flume-OG had something like
this but I don't see the capability in Flume-NG.
I can see a way to add this by adding another callback parameter to the
HdfsEventSink and creating a custom wrapper around it.
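Roughly, the wrapper I am picturing has this shape. This is only a Python sketch of the idea: the real sink is Java, and neither this wrapper class nor a file-closed hook exists in Flume-NG today:

```python
class PartitionRegisteringSink:
    """Hypothetical wrapper around an HDFS-style sink: fires a callback
    the first time a file is closed in a given partition."""

    def __init__(self, sink, on_new_partition):
        self.sink = sink                      # underlying sink (unused here)
        self.on_new_partition = on_new_partition
        self.seen = set()                     # partitions already registered

    def file_closed(self, partition):
        # Only fire the callback for a partition we have not seen before,
        # so repeated file closes in the same partition are cheap no-ops.
        if partition not in self.seen:
            self.seen.add(partition)
            self.on_new_partition(partition)

registered = []
sink = PartitionRegisteringSink(sink=None, on_new_partition=registered.append)
sink.file_closed("dt=2012-06-01")
sink.file_closed("dt=2012-06-01")  # duplicate close: callback not re-fired
print(registered)
```

The callback passed in would be the place to issue the ALTER TABLE ... ADD PARTITION statement against Hive.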
Any other suggestions?
Thanks,
Viral