Hello,
I'm having some difficulties trying to run a basic K-means clustering
job from a Java project when debugging via NetBeans. I was hunting down
the cause of a mkdirs failure when writing the cluster output and
realized the problem is that the configuration I injected when running
the job via ToolRunner isn't the same configuration being used when
ClusterClassifier.java tries to write the output.
Here's how I'm instantiating and running the job from my code:
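    import org.apache.hadoop.util.ToolRunner;
    import org.apache.mahout.clustering.kmeans.KMeansDriver;

    public static void main(String[] args) throws Exception {
        KMeansDriver kmeans = new KMeansDriver();
        // "configuration" is the Configuration object I set up earlier
        ToolRunner.run(configuration, kmeans, args);
    }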
At this point, and all the way through to the call within
buildClusters() in KMeansDriver.java at line 219, the configuration
properties are as I expect them to be:
    public static Path buildClusters(Configuration conf, Path input,
            Path clustersIn, Path output,
            int maxIterations, String delta, boolean runSequential)
            throws IOException,
            InterruptedException, ClassNotFoundException {
        ......
        prior.writeToSeqFiles(priorClustersPath);
        ......
Then in writeToSeqFiles in ClusterClassifier.java, line 186, there's
a call that instantiates a new Configuration(). That object is
populated with default values rather than the ones I injected, so the
write operations fail:
    public void writeToSeqFiles(Path path) throws IOException {
        writePolicy(policy, path);
        Configuration config = new Configuration();
        FileSystem fs = FileSystem.get(path.toUri(), config);
        ......
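To make the suspected mechanism concrete, here is a minimal sketch that
does not use Hadoop or Mahout at all. SketchConf is a hypothetical
stand-in for Configuration, the two helper methods are invented for
illustration, and the property key and the hdfs://example-host:8020
value are only examples. It shows why a freshly constructed config
object cannot see values set on the one handed to ToolRunner, while a
config that is passed through can:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a Hadoop-style Configuration (NOT the real class).
// Every new instance starts out with the same built-in default.
class SketchConf {
    private final Map<String, String> props = new HashMap<>();

    SketchConf() {
        props.put("fs.default.name", "file:///"); // illustrative default only
    }

    void set(String key, String value) {
        props.put(key, value);
    }

    String get(String key) {
        return props.get(key);
    }
}

class ConfigPropagationSketch {
    // Same shape as the snippet above: the method builds its own config,
    // so whatever the caller injected is ignored.
    static String fsSeenWithFreshConfig(SketchConf callerConf) {
        SketchConf config = new SketchConf();
        return config.get("fs.default.name");
    }

    // Alternative shape: the caller's config is used directly,
    // so injected values are visible.
    static String fsSeenWithPassedConfig(SketchConf callerConf) {
        return callerConf.get("fs.default.name");
    }

    public static void main(String[] args) {
        SketchConf injected = new SketchConf();
        injected.set("fs.default.name", "hdfs://example-host:8020");

        System.out.println("fresh config sees:  " + fsSeenWithFreshConfig(injected));
        System.out.println("passed config sees: " + fsSeenWithPassedConfig(injected));
    }
}
```

If this model matches what is happening, the injected values would only
reach writeToSeqFiles if the caller's Configuration were handed down to
it, or if the values were available as defaults to every new instance.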
I've also noticed that KMeansDriver.java makes various calls to
getConf() in AbstractJob.java, which in turn calls super.getConf(),
and there the values I passed during instantiation are picked up.
How has Mahout been designed to get these configuration values passed
into core Java classes when run from a ToolRunner?
Has anyone else encountered this issue? I feel I must be missing
something fundamental here, but I can't figure out how to get my config
values to stick.
Thanks for any tips...
Terry