10 minutes after writing this, I found the answer, but thought I'd share
anyway...
Hi,
I'm attempting to follow the notes here:
http://svn.apache.org/repos/asf/mahout/trunk/examples/bin/factorize-movielens-1M.sh
I can successfully run the splitDataset job, but I get a failure when
running parallelALS on my dataset:
java.lang.NumberFormatException: For input string: "2937047778"
    at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
    at java.lang.Integer.parseInt(Integer.java:495)
    at java.lang.Integer.parseInt(Integer.java:527)
    at org.apache.mahout.cf.taste.hadoop.TasteHadoopUtils.readID(TasteHadoopUtils.java:61)
I can see that my ID is too large for Integer.parseInt - is that a bug? I'd
have thought that if the splitDataset, recommenditembased, and itemsimilarity
jobs all work fine with long IDs, then the parallelALS job would as well?
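For reference, the ID in the stack trace really is above Integer.MAX_VALUE
(2,147,483,647), which is why Integer.parseInt rejects it while parsing it as
a long succeeds. A minimal sketch of the failure (not Mahout code, just the
JDK behaviour):

```java
public class IdOverflowDemo {
    public static void main(String[] args) {
        String id = "2937047778"; // the failing ID from the stack trace

        // The value fits comfortably in a long but not in a 32-bit int.
        System.out.println(Long.parseLong(id) > Integer.MAX_VALUE); // true

        try {
            Integer.parseInt(id); // overflows int, so parseInt throws
        } catch (NumberFormatException e) {
            System.out.println("NumberFormatException: " + e.getMessage());
        }
    }
}
```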
Wait!
--- 10 minutes later, after Googling and finding the source code here:
http://svn.apache.org/repos/asf/mahout/trunk/core/src/main/java/org/apache/mahout/cf/taste/hadoop/TasteHadoopUtils.java
I see that readID takes an additional argument called usesLongIDs, and
looking at the help for parallelALS there's a matching "--usesLongIDs" option.
I set it to true and bam! Things are working nicely!
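For anyone following the same script, the working invocation looks roughly
like this. The paths and tuning parameters below are placeholders from my
setup, not prescriptions; the only part that matters for this error is the
--usesLongIDs flag:

```shell
# Hypothetical input/output paths and tuning values; --usesLongIDs is the fix.
mahout parallelALS \
  --input /user/me/movielens/trainingSet \
  --output /user/me/movielens/als-out \
  --numFeatures 20 \
  --numIterations 10 \
  --lambda 0.065 \
  --usesLongIDs true
```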
Maybe this will help someone else :)
- Matt