I am processing a data source containing log files. Each record contains a number of fields, but one field has a string value that is a dump of the log records for a given day. The log entries within this string "blob" are not fixed length, but they do follow a set pattern: [<date>][<component>][<msgtype>] <message text>. A given blob may contain up to 100 such entries. I can break the blob down into individual log entries fairly easily using Java. Can the same be done using Pig? I want to output only the entries that have specific message types.
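For reference, here is a minimal sketch of the kind of Java splitting I have in mind. The class and method names (LogBlobSplitter, split, filterByType) are made up for illustration, and it assumes the message text never itself contains three consecutive bracketed fields, since that would be mistaken for the start of a new entry.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Pig: splits one blob into log entries.
public class LogBlobSplitter {
    // Entry header: three consecutive bracketed fields, [date][component][msgtype].
    private static final Pattern HEADER =
        Pattern.compile("\\[([^\\]]*)\\]\\[([^\\]]*)\\]\\[([^\\]]*)\\]");

    // Returns one String[]{date, component, msgtype, message} per entry.
    public static List<String[]> split(String blob) {
        List<String[]> entries = new ArrayList<>();
        Matcher m = HEADER.matcher(blob);
        String[] current = null;
        int messageStart = 0;
        while (m.find()) {
            if (current != null) {
                // The previous message runs up to the start of this header.
                current[3] = blob.substring(messageStart, m.start()).trim();
                entries.add(current);
            }
            current = new String[] {m.group(1), m.group(2), m.group(3), null};
            messageStart = m.end();
        }
        if (current != null) {
            // The last message runs to the end of the blob.
            current[3] = blob.substring(messageStart).trim();
            entries.add(current);
        }
        return entries;
    }

    // Keeps only the entries whose msgtype equals the requested one.
    public static List<String[]> filterByType(String blob, String msgType) {
        List<String[]> matches = new ArrayList<>();
        for (String[] entry : split(blob)) {
            if (entry[2].equals(msgType)) {
                matches.add(entry);
            }
        }
        return matches;
    }
}
```

The List<String[]> return type is only a stand-in here; what a UDF should hand back to Pig instead is part of what I am asking below.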
Can someone point me to any examples where Pig is used to break down a string into substrings based on a pattern?
If I create a UDF using Java to break down the string into substrings, can I return the substrings as a list to Pig?
If so, how do I iterate through the list in Pig?
Ron W.