Channel: Apache Timeline

Using MultiStorage

I have a directory of files containing somewhat malformatted logs (newline-delimited).

I would like to take the characters at a specific position in each row, use them as the directory/file name, and store the original content as-is in those files. Basically, I want to re-partition the files based on their content.

The code below works and does almost what I expect. The problem is that the substring "myfile" also ends up inside each new file, because B is a tuple that contains it. Is there a way to store the original relation (in my case A) in the files and use "myfile" only as the file name, i.e. preserve the original file content as-is?

Thank you

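-- Piggybank provides the MultiStorage store function.
REGISTER /lib/pig/piggybank.jar;

-- Using '\n' as the field delimiter loads each whole line as a single chararray field.
A = LOAD '/raw/*' USING PigStorage('\n') AS (mytext:chararray);
-- myfile = characters at 0-based indexes 5 and 6 (the stop index 7 is exclusive).
B = FOREACH A GENERATE SUBSTRING(mytext,5,7) as myfile, mytext;
-- MultiStorage splits the output by the myfile value, but every field of B (including myfile) is written into the files.
STORE B INTO '/output' USING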
org.apache.pig.piggybank.storage.MultiStorage('/outpu
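To make the intended result concrete outside Pig, here is a minimal local Python sketch of the requested re-partitioning. It is not a Pig or MultiStorage solution. Each line is copied unchanged into a directory named after its characters at indexes 5 and 6, which mirrors `SUBSTRING(mytext,5,7)`. The function name `partition_lines`, the per-key file name `part`, and the choice to skip lines shorter than 7 characters are hypothetical, chosen only for illustration.

```python
from pathlib import Path


def partition_lines(lines, out_dir, start=5, stop=7):
    """Append each line, unchanged, to out_dir/<key>/part where key = line[start:stop].

    Mirrors Pig's SUBSTRING(mytext, 5, 7): 0-based start index, exclusive stop index.
    The key only selects the output path; it is never added to the written line.
    Returns the sorted list of keys that were written.
    """
    out = Path(out_dir)
    handles = {}  # key -> open file handle, so each partition file is opened once
    try:
        for line in lines:
            line = line.rstrip("\n")
            # Assumption: skip lines too short to contain a key
            # (Pig's own behavior for such lines may differ).
            if len(line) < stop:
                continue
            key = line[start:stop]
            if key not in handles:
                (out / key).mkdir(parents=True, exist_ok=True)
                handles[key] = open(out / key / "part", "a", encoding="utf-8")
            handles[key].write(line + "\n")  # original content, as-is
    finally:
        for handle in handles.values():
            handle.close()
    return sorted(handles)
```

Unlike `STORE B` in the post, the key here only determines where a line goes. The file contents are exactly the input lines.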
