Hi, We are trying calculate ItemSimilarity.
Right now we have 2*10^7 input lines. I do provide input data as raw text
each day to recalculate item similarities. We do get +100..1000 new items
each day.
1. It takes too much time to prepare input data.
2. It takes too much time to convert user_id, item_id to mahout ids
Is there any poissibility to provide data to mahout mapreduce
ItemSimilarity using some binary format with compression?
Right now we have 2*10^7 input lines. I do provide input data as raw text
each day to recalculate item similarities. We do get +100..1000 new items
each day.
1. It takes too much time to prepare input data.
2. It takes too much time to convert user_id, item_id to mahout ids
Is there any poissibility to provide data to mahout mapreduce
ItemSimilarity using some binary format with compression?