Hi everyone,
We have a data set in the following format:
user1 item1 value
user2 item1 value
user3 item1 value
…
user1 item2 value
user20 item2 value
user35 item2 value
…
user2 item3 value
user25 item3 value
…
We have around 20 items and millions of users, and not every user has an entry for every item. We would like to transform this into one line per user:
user1 item1 value, item2 value, item3 value, …
user2 item4 value, item18 value, item19 value, …
I can think of a couple of ways of doing this in Pig Latin. For example, one way would be to create a map (where the key is the item name and the value is the associated value), fill in that map as the data is read, and then write it out to a file. I am not sure how efficient that would be. I would love to get suggestions for doing this in Pig Latin.
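To make the reshaping concrete, here is a minimal sketch of the per-user map approach in plain Python rather than Pig Latin. It assumes each input line is whitespace-separated `user item value`; the function names `group_by_user` and `format_row` and the sample values are hypothetical, for illustration only. A single process like this keeps every user in memory, so it only shows the logic. In Pig, the equivalent grouping step would be a `GROUP ... BY user`, which lets the framework bring together all records for one user without holding everything in one place.

```python
from collections import defaultdict


def group_by_user(lines):
    """Collect item -> value pairs per user (hypothetical helper).

    Assumes each line is 'user item value', whitespace-separated.
    Users keep first-seen order; if a user has the same item twice,
    the last value read wins.
    """
    per_user = defaultdict(dict)  # user -> {item: value}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        user, item, value = line.split()
        per_user[user][item] = value
    return per_user


def format_row(user, items):
    """Render one output line, e.g. 'user1 item1 10, item2 3'.

    Items a user has no entry for are simply omitted.
    """
    return user + " " + ", ".join(f"{item} {value}" for item, value in items.items())


if __name__ == "__main__":
    # Hypothetical sample input in the long 'user item value' format.
    sample = [
        "user1 item1 10",
        "user2 item1 7",
        "user1 item2 3",
        "user2 item3 5",
    ]
    for user, items in group_by_user(sample).items():
        print(format_row(user, items))
```

Because missing items are just absent from each user's dictionary, sparse users cost nothing extra, which matches the situation where not every user has every item.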