Channel: Apache Timeline

Create Avro from bytes, not by fields

Hi all,

Some context (I am not an expert Java programmer, and I am just starting
with Avro/Flume):

I need to transfer Avro files from different servers to HDFS, and I am
trying to use Flume to do it.
On the servers I have a Flume spooldir source (reading the Avro files) with
an Avro sink, and on the Hadoop side an Avro source with an HDFS sink. Like
this:

servers | hadoop
spooldir src -> avro sink --------> avro src -> hdfs

When the Flume spooldir source deserializes the Avro files, it creates a
Flume event with two parts: 1) the header contains the schema; 2) the body
contains the binary Avro record data, not including the schema or the rest
of the container file elements. See the Flume docs:
http://flume.apache.org/FlumeUserGuide.html#avro
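
To make concrete what such a body holds, here is a minimal sketch that decodes the raw bytes of one record by hand, following the binary encoding rules of the Avro specification (zigzag varints for int/long, length-prefixed UTF-8 strings, a branch index before each union value). It is hard-wired to the example.avro.User schema from this post. The class name RawAvroBodySketch, its method names, and the sample bytes (a record "Alyssa", 256, null) are assumptions for illustration only. It uses only the JDK and is not a substitute for Avro's own readers; the point is that the bytes carry no field names or types, so they can only be interpreted with the schema from the header.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class RawAvroBodySketch {

    // Reads one zigzag-encoded variable-length integer (Avro int and long).
    static long readLong(ByteBuffer in) {
        long raw = 0;
        int shift = 0;
        byte b;
        do {
            b = in.get();
            raw |= (long) (b & 0x7F) << shift; // low 7 bits carry data
            shift += 7;
        } while ((b & 0x80) != 0);             // high bit set: more bytes follow
        return (raw >>> 1) ^ -(raw & 1);       // undo the zigzag mapping
    }

    // Reads an Avro string: a length followed by that many UTF-8 bytes.
    static String readString(ByteBuffer in) {
        byte[] utf8 = new byte[(int) readLong(in)];
        in.get(utf8);
        return new String(utf8, StandardCharsets.UTF_8);
    }

    // Decodes one User record. The field order and the union branch order
    // ([int, null] and [string, null]) come from the schema, not the bytes.
    static String describeUser(byte[] body) {
        ByteBuffer in = ByteBuffer.wrap(body);
        String name = readString(in);
        Integer favoriteNumber =
            readLong(in) == 0 ? Integer.valueOf((int) readLong(in)) : null;
        String favoriteColor = readLong(in) == 0 ? readString(in) : null;
        return "name=" + name
            + ", favorite_number=" + favoriteNumber
            + ", favorite_color=" + favoriteColor;
    }

    public static void main(String[] args) {
        // Hypothetical body bytes for {"Alyssa", 256, null}.
        byte[] body = {
            0x0C, 0x41, 0x6C, 0x79, 0x73, 0x73, 0x61, // length 6, "Alyssa"
            0x00, (byte) 0x80, 0x04,                  // union branch 0 (int), 256
            0x02                                      // union branch 1 (null)
        };
        System.out.println(describeUser(body));
    }
}
```

The same eleven bytes would decode to something else, or fail, under a different schema, which is why the event header has to travel with the body.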

So the resulting Avro file (one Flume event per record) looks like this:

{"headers": {"flume.avro.schema.literal":
"{\"type\":\"record\",\"name\":\"User\",\"namespace\":\"example.avro\",\"fields\":[{\"name\":\"name\",\"type\":\"string\"},{\"name\":\"favorite_number\",\"type\":[\"int\",\"null\"]},{\"name\":\"favorite_color\",\"type\":[\"string\",\"null\"]}]}"},
"body": {"bytes": "{BYTES}"}}

So now I am trying to write a serializer, since Flume only includes a
FlumeEvent serializer that creates Avro files like the one above, not the
original Avro files from the servers.

I am almost there: I have the schema from the header field and the bytes
from the body field.
But now I need to write the Avro file based on the bytes, not on the values
of the fields. I cannot do r.put("field", "value") since I don't have the
values, just the bytes.

This is the code:

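import java.io.File;
import java.nio.ByteBuffer;
import java.util.Map;

import org.apache.avro.Schema;
import org.apache.avro.file.DataFileReader;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.DatumReader;
import org.apache.avro.util.Utf8;

// TESTFILE is a FlumeEvent-wrapped Avro file like the one shown above.
File file = TESTFILE;

DatumReader<GenericRecord> datumReader = new GenericDatumReader<GenericRecord>();
DataFileReader<GenericRecord> dataFileReader =
    new DataFileReader<GenericRecord>(file, datumReader);
GenericRecord user = null;
while (dataFileReader.hasNext()) {
    // Each record in this file is one Flume event (headers + body).
    user = dataFileReader.next(user);

    // The header keys come back as Utf8 here, so look up with a Utf8 key.
    Map<?, ?> headers = (Map<?, ?>) user.get("headers");
    Utf8 schemaHeaderKey = new Utf8("flume.avro.schema.literal");
    String schema = headers.get(schemaHeaderKey).toString();

    // Raw binary encoding of one record of the original schema.
    ByteBuffer body = (ByteBuffer) user.get("body");

    // Writing...
    Schema.Parser parser = new Schema.Parser();
    Schema schemaSimpleWrapper = parser.parse(schema);
    GenericRecord r = new GenericData.Record(schemaSimpleWrapper);

    // NOT SURE WHAT COMES NEXT
}
dataFileReader.close();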

Is it possible to actually create the Avro files from the value bytes?

I appreciate any help.

Thanks,
Daniel
