Hello friends,
I am new to Flume and have written a Python script to fetch some data from
social media. The response is JSON. I am seeking help on the following issues:
1. I am finding it hard to make Python and Flume talk. Is it just my
ignorance, or is it indeed a long route? AFAIK, I need to understand the Thrift
API, Avro, etc. to achieve this. I also read about pipes. Would that be a
simpler implementation?
2. I am equally comfortable (or uncomfortable) in Java. Hence I am wondering if
it's better to rewrite my application in Java so that I can easily integrate it
with Flume. Are there any advantages to having a Java application, given that
all of Hadoop is Java?
3. I need to schedule the agent to run on a daily basis. Which of the above
approaches (Python or Java) would help me achieve this most easily?
4. Going by this thread:
http://mail-archives.apache.org/mod_mbox/flume-user/201306.mbox/%3CA7B08BAB-C8B8-4B55-B3EC-A80AB4EBB438 [ at ] gmail.com%3E
it looks like we need to clean up disk space manually even with Flume. I am
not clear on what advantages Flume would give me over a simple cron job
for this task. I could instead put a statement like "hadoop fs
-put <location of output file on local> <location on hdfs>" in the cron job.
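
To make the cron alternative in point 4 concrete, it could be sketched
roughly as below. This is only an illustration under assumptions: the
function names, the file paths and the crontab schedule are hypothetical,
and only the "hadoop fs -put" command itself comes from the question.

```python
import subprocess
from datetime import date


def build_put_command(local_path, hdfs_dir):
    """Return the argument list for `hadoop fs -put <local> <hdfs>`."""
    return ["hadoop", "fs", "-put", local_path, hdfs_dir]


def upload(local_path, hdfs_dir, dry_run=True):
    """Copy one local file to HDFS, or just print the command if dry_run."""
    cmd = build_put_command(local_path, hdfs_dir)
    if dry_run:
        # Only show what would run; no Hadoop client is needed for this.
        print(" ".join(cmd))
        return 0
    # Returns the exit code of the hadoop client (0 on success).
    return subprocess.call(cmd)


if __name__ == "__main__":
    # Hypothetical paths: one JSON output file per day.
    today = date.today().isoformat()
    upload("/tmp/social_%s.json" % today, "/data/social/", dry_run=True)

# A hypothetical crontab entry to run this daily at 02:00:
#   0 2 * * * python /path/to/upload.py
```

With dry_run=True the script only prints the command, so it can be tried
on a machine without a Hadoop client installed.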
I appreciate your help and guidance.
Regards,
Sunita