Channel: Apache Timeline

parallelALS - error when parsing IDs (problem solved)

10 minutes after writing this, I found the answer, but thought I'd share
anyway...

Hi,

I'm attempting to follow the notes here:

http://svn.apache.org/repos/asf/mahout/trunk/examples/bin/factorize-movielens-1M.sh

I can successfully run the splitDataset job, but I get a failure when
running parallelALS on my dataset:

java.lang.NumberFormatException: For input string: "2937047778"
    at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
    at java.lang.Integer.parseInt(Integer.java:495)
    at java.lang.Integer.parseInt(Integer.java:527)
    at org.apache.mahout.cf.taste.hadoop.TasteHadoopUtils.readID(TasteHadoopUtils.java:61)

I can see that my ID is too large for Integer.parseInt - is that a bug? I'd
think that if the splitDataset, recommenditembased and itemsimilarity jobs
all work fine with long IDs, then the parallelALS job would as well?
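
For anyone wanting to reproduce the failure outside Mahout, here is a minimal sketch in plain Java. The class and method names (IdParsingDemo, fitsInInt) are made up for illustration and are not part of Mahout. It only shows that the ID from the stack trace is larger than Integer.MAX_VALUE (2147483647), so Integer.parseInt rejects it while Long.parseLong accepts it.

```java
public class IdParsingDemo {

    /** Hypothetical helper: true if the token parses as a 32-bit signed int. */
    static boolean fitsInInt(String token) {
        try {
            Integer.parseInt(token);
            return true;
        } catch (NumberFormatException e) {
            // Thrown for non-numeric input and for values outside the int range.
            return false;
        }
    }

    public static void main(String[] args) {
        String id = "2937047778"; // the ID from the stack trace above

        System.out.println("Integer.MAX_VALUE = " + Integer.MAX_VALUE);
        System.out.println("fits in int:  " + fitsInInt(id));
        System.out.println("parsed as long: " + Long.parseLong(id));
    }
}
```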

Wait!

--- 10 minutes later, after Googling and finding the source code here:

http://svn.apache.org/repos/asf/mahout/trunk/core/src/main/java/org/apache/mahout/cf/taste/hadoop/TasteHadoopUtils.java

I see that readID takes an additional argument called usesLongIDs. Then I
looked at the help for parallelALS and saw the "--usesLongIDs" option.
I set that to true and bam! Things are working nicely!
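
To illustrate the idea, here is a rough sketch of how a flag like usesLongIDs could switch between the two parsing paths. This is not the Mahout source: the class name LongIdSketch and the hashing scheme in idToIndex are assumptions for the example, and only the readID name and the usesLongIDs argument come from the file linked above. It assumes Java 8 or later for Long.hashCode(long).

```java
public class LongIdSketch {

    /**
     * Assumed mapping from a 64-bit ID to a non-negative int index.
     * Different long IDs can collide on the same index.
     */
    static int idToIndex(long id) {
        int hash = Long.hashCode(id);   // folds the 64 bits into 32
        return hash & 0x7FFFFFFF;       // clear the sign bit
    }

    /** Parses a token as an int ID, or as a long ID mapped to an int index. */
    static int readID(String token, boolean usesLongIDs) {
        return usesLongIDs
                ? idToIndex(Long.parseLong(token))
                : Integer.parseInt(token);
    }

    public static void main(String[] args) {
        // With the flag set, the large ID no longer throws.
        System.out.println(readID("2937047778", true));
        // Small IDs parse directly when the flag is off.
        System.out.println(readID("42", false));
    }
}
```

Without the flag, readID("2937047778", false) in this sketch throws the same NumberFormatException as in the trace.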

Maybe this will help someone else :)

- Matt
