I have a directory of files containing somewhat malformed, newline-delimited
logs.
I would like to take a fixed character position from each row and use it as a
directory/file name, then store the original content as-is in those files.
Basically, I want to re-partition the files based on their content.
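To make the intended result concrete, here is a minimal Python sketch (not Pig) of the grouping I am after. The function name `repartition` and the sample lines are hypothetical and only for illustration. It assumes the key is the characters at positions 5 up to (but not including) 7, and it keeps every line exactly as it was read, without the key added to it.

```python
from collections import defaultdict


def repartition(lines, start=5, stop=7):
    """Group lines by the substring at [start, stop), leaving each line unchanged."""
    groups = defaultdict(list)
    for line in lines:
        key = line[start:stop]    # would become the directory/file name
        groups[key].append(line)  # original line only, no key prepended
    return dict(groups)


# Hypothetical sample rows; the real logs look different.
sample = ["2024-01-15 alpha", "2024-02-03 beta", "2024-01-20 gamma"]
result = repartition(sample)
```

Here `result` maps each key to the untouched lines that belong in that file, which is the output I would like from the Pig job.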
The code below works fine and does almost what I expect. The problem is
that the substring "myfile" now also appears inside the new files, because
each row of B is a tuple of (myfile, mytext). Is there a way to store the
original relation (A, in my case) in the files and use "myfile" only as the
file name, so that the original file content is preserved as-is?
Thank you.
REGISTER /lib/pig/piggybank.jar;
A = LOAD '/raw/*' USING PigStorage('\n') AS (mytext:chararray);
B = FOREACH A GENERATE SUBSTRING(mytext,5,7) as myfile, mytext;
STORE B INTO '/output' USING
org.apache.pig.piggybank.storage.MultiStorage('/outpu