Several people have reported problems using Spark 1.0.2 with Mahout, so I tried it myself. Spark 1.0.1 is no longer a recommended version, so it is a little harder to get hold of, and people seem to be moving to newer releases. I found that Mahout compiles with 1.0.2 in the pom and passes its tests, but fails a simple test on a cluster: an anonymous function name error leads to a class-not-found. This looks like a Scala thing, but I'm not sure. At first blush this means we can't upgrade to Spark 1.0.2 without some relatively deep diving, so I'm giving up on it for now and trying Spark 1.1.0, the current stable version, which actually went through an RC cycle. It uses the same version of Scala as 1.0.1.
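For anyone who wants to reproduce this, the version bump itself is a one-line pom change. This is just a sketch; I'm assuming the parent pom exposes a spark.version property, so check the actual property name in your checkout:

    <!-- in the parent pom.xml: bump the Spark dependency version.
         The property name spark.version is an assumption; adjust it to
         whatever the pom actually uses. -->
    <properties>
      <spark.version>1.1.0</spark.version>
    </properties>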
On Spark 1.1.0 Mahout builds and runs its tests fine, but on a cluster I get a class-not-found for a random number generator used in mahout common. I think it's because that library is never packaged as a dependency in the “job” jar assembly, so I tried adding it to the spark module's pom. I'm not sure this is the right way to solve it, so if anyone has a better idea please speak up.
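Concretely, the workaround I tried looks something like the snippet below. Treat it as a sketch: I'm assuming the missing class lives in commons-math3 (which backs the RNG wrappers in mahout common) and that declaring it directly in spark/pom.xml is enough for the assembly plugin to pick it up:

    <!-- in spark/pom.xml: declare the RNG library directly so the
         assembly plugin bundles it into the job jar. commons-math3 and
         the 3.2 version are assumptions about where the missing class
         comes from. -->
    <dependency>
      <groupId>org.apache.commons</groupId>
      <artifactId>commons-math3</artifactId>
      <version>3.2</version>
    </dependency>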
Getting off the dubious Spark 1.0.1 release is turning out to be a bit of work. Does anyone object to upgrading our Spark dependency? I'm not sure whether Mahout built for Spark 1.1.0 will run against 1.0.1, so the change may mean upgrading your Spark cluster.