
Difficulties adding a custom job (analyzer) to Hadoop

All,

I'm having a tough time adding a custom analyzer to Hadoop and making use
of it through Mahout.

I've pruned the Mahout in Action examples down to a single one, a customized
version of the Mahout 0.9 MailArchivesClusteringAnalyzer:
https://github.com/momer/MiA/blob/mahout-0.9/src/main/java/mia/clustering/ch09/MoAnalyzer.java

After updating the pom.xml to use Mahout 0.9, running `mvn package` and
moving the `mia-0.7-job.jar` to $HADOOP_HOME/lib, I run into a few issues:

First, I'm unsure how to remove the duplicate SLF4J dependencies from the
job.jar.

Second, Hadoop is unable to find the Mahout classes when I'm using my
custom job jar.

Relevant stack traces are available at
https://gist.github.com/momer/52e1e7d2dd7612b26909

I'm admittedly pretty new to Hadoop/Mahout, and would really appreciate any
pointers in the right direction. I pretty much just need to get that Porter
stemming step out of the analyzer!

Thank you all very much for maintaining the mailing list and keeping it alive,

Mo
