Hi all,
I'm using mahout 0.7
I'm trying to use KMeansDriver (org.apache.mahout.clustering.kmeans.KMeansDriver) with HDFS and I'm having some issues.
When I use it with my local file system everything seems to be working fine.
However, as soon as I change the Configuration object to use HDFS:
Configuration conf = new Configuration();
conf.addResource(new Path("C:\\hdp-win\\hadoop\\hadoop-1.1.0-SNAPSHOT\\conf\\core-site.xml"));
conf.addResource(new Path("C:\\hdp-win\\hadoop\\hadoop-1.1.0-SNAPSHOT\\conf\\hdfs-site.xml"));
I run into problems. This is the exception I get:
java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
at java.util.ArrayList.RangeCheck(ArrayList.java:547)
at java.util.ArrayList.get(ArrayList.java:322)
at org.apache.mahout.clustering.classify.ClusterClassifier.readFromSeqFiles(ClusterClassifier.java:215)
I pulled up that code (org.apache.mahout.clustering.classify.ClusterClassifier.readFromSeqFiles, ClusterClassifier.java:215) and I think it is trying to read files from the path I passed to the method, but with a new instance of the Configuration object (not the Configuration I passed in, but a fresh one that doesn't have my HDFS settings):
205  public void readFromSeqFiles(Configuration conf, Path path) throws IOException {
206    Configuration config = new Configuration();
207    List<Cluster> clusters = Lists.newArrayList();
208    for (ClusterWritable cw : new SequenceFileDirValueIterable<ClusterWritable>(path, PathType.LIST,
209        PathFilters.logsCRCFilter(), config)) {
210      Cluster cluster = cw.getValue();
211      cluster.configure(conf);
212      clusters.add(cluster);
213    }
214    this.models = clusters;
215    modelClass = models.get(0).getClass().getName();
216    this.policy = readPolicy(path);
217  }
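One workaround I'm thinking of trying (just a sketch, not verified against 0.7): since the method builds its own Configuration from the defaults, I could register my HDFS config files as default resources before calling KMeansDriver, so that the fresh `new Configuration()` inside readFromSeqFiles picks them up too. Configuration.addDefaultResource looks names up on the classpath, so this assumes I add the conf directory (C:\hdp-win\hadoop\hadoop-1.1.0-SNAPSHOT\conf) to the classpath first.

```java
import org.apache.hadoop.conf.Configuration;

public class HdfsConfWorkaround {
  public static void main(String[] args) {
    // Register the site files as default resources. This affects every
    // Configuration created afterwards, including the internal
    // `new Configuration()` inside ClusterClassifier.readFromSeqFiles.
    // Assumes the conf directory containing these files is on the classpath.
    Configuration.addDefaultResource("core-site.xml");
    Configuration.addDefaultResource("hdfs-site.xml");

    // Sanity check: a plain Configuration should now see the HDFS settings.
    Configuration conf = new Configuration();
    System.out.println(conf.get("fs.default.name"));

    // ... then run KMeansDriver as before with this conf.
  }
}
```

Does that sound like a reasonable way around it, or is the new Configuration() in readFromSeqFiles actually a bug I should file?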
any help would be really appreciated :)
Thanks!
Alan