Hi all,
Some context (I'm not an expert Java programmer, and I'm just starting with
Avro/Flume):
I need to transfer Avro files from different servers to HDFS, and I am
trying to use Flume to do it.
I have a Flume spooldir source (reading the Avro files) with an Avro sink,
and an Avro source with an HDFS sink, like this:
servers | hadoop
spooldir src -> avro sink --------> avro src -> hdfs sink
When the Flume spooldir source deserializes the Avro files, it creates a
Flume event with two fields: 1) the header contains the schema; 2) the body
contains the binary Avro record data, not including the schema or the rest
of the container-file elements. See the Flume docs:
http://flume.apache.org/FlumeUserGuide.html#avro
So the avro sink creates an avro file like this:
{"headers": {"flume.avro.schema.literal":
"{\"type\":\"record\",\"name\":\"User\",\"namespace\":\"example.avro\",\"fields\":[{\"name\":\"name\",\"type\":\"string\"},{\"name\":\"favorite_number\",\"type\":[\"int\",\"null\"]},{\"name\":\"favorite_color\",\"type\":[\"string\",\"null\"]}]}"},
"body": {"bytes": "{BYTES}"}}
So now I am trying to write a serializer, since Flume only includes a
FlumeEvent serializer that creates Avro files like the one above, not the
original Avro files from the servers.
I am almost there: I have the schema from the header field and the bytes
from the body field.
But now I need to write the Avro file from those bytes, not from field
values. I cannot do r.put("field", "value") since I don't have the values,
just the bytes.
This is the code:

File file = TESTFILE;
DatumReader<GenericRecord> datumReader =
    new GenericDatumReader<GenericRecord>();
DataFileReader<GenericRecord> dataFileReader =
    new DataFileReader<GenericRecord>(file, datumReader);
GenericRecord user = null;
while (dataFileReader.hasNext()) {
    user = dataFileReader.next(user);
    // Pull the schema literal out of the event headers.
    Map headers = (Map) user.get("headers");
    Utf8 schemaHeaderKey = new Utf8("flume.avro.schema.literal");
    String schema = headers.get(schemaHeaderKey).toString();
    // The body holds the binary-encoded record data.
    ByteBuffer body = (ByteBuffer) user.get("body");
    // Writing...
    Schema.Parser parser = new Schema.Parser();
    Schema schemaSimpleWrapper = parser.parse(schema);
    GenericRecord r = new GenericData.Record(schemaSimpleWrapper);
    // NOT SURE WHAT COMES NEXT
}
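For what it's worth, I was wondering whether something like the following
might work (an untested sketch; I haven't verified it end to end). Since the
body is the already-encoded record, DataFileWriter.appendEncoded() might let
me write the bytes straight into a new container file without ever decoding
the field values (the file name "restored.avro" is just a placeholder):

```java
import java.io.File;
import java.nio.ByteBuffer;
import org.apache.avro.Schema;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;

public class RestoreAvro {
    // Sketch: given the schema literal from the event header and the raw
    // record bytes from the event body, rebuild a proper Avro container file.
    static void writeRecord(String schemaLiteral, ByteBuffer body)
            throws Exception {
        Schema schema = new Schema.Parser().parse(schemaLiteral);
        DataFileWriter<GenericRecord> writer =
            new DataFileWriter<GenericRecord>(
                new GenericDatumWriter<GenericRecord>(schema));
        // create() writes the header (schema + sync markers) of the new file.
        writer.create(schema, new File("restored.avro"));
        // appendEncoded() takes an already binary-encoded datum, so the
        // event body bytes can go in directly -- no r.put(...) needed.
        writer.appendEncoded(body);
        writer.close();
    }
}
```

That is, instead of building a GenericRecord field by field, the writer
would just append the encoded bytes under the schema recovered from the
header. Does that sound like the right direction?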
Is it possible to actually create the Avro files from the body bytes?
I appreciate any help.
Thanks,
Daniel