
mapreduce ItemSimilarity input optimization

Hi, we are trying to calculate item similarities with ItemSimilarity.
Right now we have 2*10^7 (20 million) input lines. Each day I provide the
input data as raw text so that item similarities can be recalculated, and
we get 100 to 1000 new items per day. We have two problems:
1. Preparing the input data takes too much time.
2. Converting user_id and item_id values to Mahout IDs takes too much time.
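One way to reduce the ID-conversion cost is to keep the user and item mappings between runs, so a daily run only assigns new IDs to the 100 to 1000 new items instead of rebuilding everything. Below is a minimal sketch, assuming the raw input is comma-separated `user_id,item_id` lines with string IDs and that the job expects numeric `userID,itemID` text lines. `IdMapper` and `convert` are hypothetical helpers, not part of Mahout, and saving the mapping to disk between runs is left out.

```python
import csv


class IdMapper:
    """Hypothetical helper: assigns stable numeric IDs to external string IDs.

    Passing in the mapping from a previous run keeps old IDs unchanged,
    so only previously unseen users or items get new IDs.
    """

    def __init__(self, existing=None):
        self.ids = dict(existing or {})
        # Continue numbering after the largest ID already in use.
        self.next_id = max(self.ids.values(), default=-1) + 1

    def get(self, key):
        # Reuse the known ID, or allocate the next free one.
        if key not in self.ids:
            self.ids[key] = self.next_id
            self.next_id += 1
        return self.ids[key]


def convert(lines, users, items):
    """Hypothetical helper: turn 'user_id,item_id' lines into 'userID,itemID'."""
    out = []
    for row in csv.reader(lines):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        out.append(f"{users.get(row[0])},{items.get(row[1])}")
    return out
```

For example, `convert(["u1,apple", "u2,kiwi", "u1,pear"], IdMapper(), IdMapper({"apple": 0, "pear": 1}))` keeps the existing item IDs and only allocates a new one for `kiwi`.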

Is there any way to provide data to the Mahout MapReduce
ItemSimilarity job in a binary format with compression?
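If the job reads its input through Hadoop's standard text input path (an assumption here, not confirmed for ItemSimilarity), gzip-compressed text files are usually decompressed automatically based on the `.gz` extension. Gzip files cannot be split, so each file goes to a single map task; writing several smaller parts keeps some parallelism. A minimal sketch of that idea follows. `write_gzip_parts`, the `part-NNNNN.gz` naming and the default part size are illustrative choices, not anything Mahout requires.

```python
import gzip
import os


def write_gzip_parts(lines, out_dir, lines_per_part=1_000_000):
    """Hypothetical helper: write text lines into several gzip files.

    Because gzip is not splittable, several smaller parts let more
    map tasks read the input in parallel than one large file would.
    """
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    handle = None
    for i, line in enumerate(lines):
        if i % lines_per_part == 0:
            # Start a new part file every lines_per_part lines.
            if handle is not None:
                handle.close()
            path = os.path.join(out_dir, f"part-{len(paths):05d}.gz")
            handle = gzip.open(path, "wt", encoding="utf-8")
            paths.append(path)
        handle.write(line + "\n")
    if handle is not None:
        handle.close()
    return paths
```

The part size is a trade-off: fewer, larger parts compress slightly better, while more parts allow more parallel map tasks.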
