Hi all, and merry Christmas!
I generate a file using a Pig script embedded in a Java process and
store it using BinStorage.
Then, I would like to read this file directly from another Java
client without starting a Pig script (i.e., using only the Hadoop
API and Pig's BinStorage class).
The goal is to do real-time computation by scanning the file, so I
cannot afford to start a Pig script for each computation: the overhead
of starting the script and getting the result back is too high for my
latency target (I need a result within a few seconds).
Of course, I could store the data with JsonStorage and read it with a
JSON deserializer, but I suspect that would be much slower, and it
would also be painful to handle the multiple part files generated for
the output (part-r-XXXXX).
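For reference, the part-file bookkeeping on its own is small. Below is a minimal sketch, assuming the job output directory is reachable on a local filesystem (on HDFS, Hadoop's `FileSystem.globStatus(new Path(outputDir, "part-*"))` would give an equivalent listing). The class name `PartFiles` is hypothetical. It keeps only files named like `part-r-00000` or `part-m-00000`, skips `_SUCCESS`, `_logs` and `.crc` checksum files, and returns the parts in part-number order:

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper; assumes the job output directory is on a local filesystem.
final class PartFiles {
    // Default Hadoop output names: part-r-NNNNN (reduce) or part-m-NNNNN (map-only job).
    private static final Pattern PART = Pattern.compile("part-[mr]-\\d{5}");

    private PartFiles() {}

    /** Returns the regular part files in outputDir, sorted by file name. */
    static List<Path> listParts(Path outputDir) throws IOException {
        List<Path> parts = new ArrayList<>();
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(outputDir)) {
            for (Path p : entries) {
                // Skips _SUCCESS, the _logs directory and .part-r-NNNNN.crc checksum files.
                if (Files.isRegularFile(p) && PART.matcher(p.getFileName().toString()).matches()) {
                    parts.add(p);
                }
            }
        }
        // Part numbers are zero-padded, so name order is part-number order.
        Collections.sort(parts);
        return parts;
    }
}
```

Each returned path could then be handed to whatever reads a single BinStorage file; deserializing the records themselves still relies on Pig's classes and is not shown here.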
Best regards,