I'm doing Canopy clustering with CanopyDriver on a sequence file of
NamedVectors and seem to get the expected set of map and reduce
directories. But when I try to read the part-r-00000 file with a
SequenceFile.Reader, the first attempt to iterate over the reader
immediately throws a NullPointerException, apparently from either the key
or the value; I don't know which. Here's a fairly minimal exhibit of the
code, which does little but attempt to count the clusters. (In practice I
need to do more than that, and ultimately I care only about generating the
names of the vectors in each cluster.) Any suggestions are welcome.
I'm using Mahout 0.9 and hadoop-core 1.2.1.
Thanks
Bob
package org.filteredpush.duplicates;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.mahout.math.VectorWritable;

public class DupDumper2 {
    private String datasetDir = null;

    public DupDumper2(String datasetDir) {
        this.datasetDir = datasetDir;
    }

    public int dumpCandidates() throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        int outCount = 0;
        String dumperInputFile = "/tmp/bbg/clusters/clusters-0-final/part-r-00000"; // test with constant
        SequenceFile.Reader clusterReader = new SequenceFile.Reader(fs,
                new Path(dumperInputFile), conf);
        IntWritable key = new IntWritable();
        VectorWritable value = new VectorWritable();
        while (clusterReader.next(key, value)) { // key = clusterid, value = list of vectorid ?
            /* System.out.println(key.toString() + " " +
                    value.get().asFormatString()); */
            outCount++;
        }
        clusterReader.close();
        return outCount;
    }

    public static void main(String[] args) throws Exception {
        DupDumper2 dumper = new DupDumper2(args[0]);
        int howMany = dumper.dumpCandidates();
        System.out.println(howMany);
    }
}
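One way to find out what part-r-00000 actually contains, and to iterate over it without hard-coding IntWritable/VectorWritable, is to let the file report its own key and value classes from the SequenceFile header and instantiate holders of exactly those types. This is a minimal sketch assuming only the Hadoop 1.x SequenceFile.Reader API; SeqFileInspector and countRecords are hypothetical names. If the file holds Mahout cluster records (my unconfirmed expectation for clusters-*-final in 0.9 is Text keys with ClusterWritable values, so check the printed class names), the Mahout jars must be on the classpath for the reflection step to work.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.util.ReflectionUtils;

// Hypothetical helper (not part of Hadoop or Mahout): reads a SequenceFile
// using the key/value classes recorded in the file's own header.
public class SeqFileInspector {

    public static int countRecords(Configuration conf, Path file) throws IOException {
        FileSystem fs = file.getFileSystem(conf);
        SequenceFile.Reader reader = new SequenceFile.Reader(fs, file, conf);
        try {
            // Report what the file actually contains, e.g. to compare with
            // the IntWritable/VectorWritable pair assumed in DupDumper2.
            System.out.println("key class:   " + reader.getKeyClassName());
            System.out.println("value class: " + reader.getValueClassName());

            // Instantiate holders of exactly those classes, so next() never
            // sees a mismatched type. The classes must be on the classpath.
            Writable key = (Writable) ReflectionUtils.newInstance(reader.getKeyClass(), conf);
            Writable value = (Writable) ReflectionUtils.newInstance(reader.getValueClass(), conf);

            int count = 0;
            while (reader.next(key, value)) {
                count++;
            }
            return count;
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        System.out.println(countRecords(conf, new Path(args[0])));
    }
}
```

For the names of the vectors in each cluster, the clusters-*-final files are probably the wrong place to look: as far as I know, a canopy record describes the canopy (center, radius, observation count) rather than listing its members. Point-to-cluster assignments would instead be in the clusteredPoints directory, which I believe CanopyDriver writes only when run with clustering enabled; running the same inspector on a file there shows which value class to unwrap to reach each NamedVector.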
The exception trace is below. DupDumper2.java:29 is the "while" line.
Exception in thread "main" java.lang.NullPointerException
    at org.apache.mahout.math.VectorWritable.toString(VectorWritable.java:232)
    at java.lang.String.valueOf(String.java:2854)
    at java.lang.StringBuilder.append(StringBuilder.java:128)
    at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1936)
    at org.filteredpush.duplicates.DupDumper2.dumpCandidates(DupDumper2.java:29)
    at org.filteredpush.duplicates.DupDumper2.main(DupDumper2.java:41)
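Reading this trace from the bottom up, the NPE comes from VectorWritable.toString(), reached through StringBuilder.append from inside SequenceFile$Reader.next(). That pattern fits next() comparing the supplied value object's class with the value class recorded in the file header and concatenating the value into a "wrong value class" message when they differ. A freshly constructed VectorWritable has no vector yet, so its toString() fails, and the NPE hides the intended IOException. If that reading is right, the culprit is the value, not the key, and part-r-00000 simply does not hold VectorWritable values. The stdlib-only sketch below reproduces just that masking effect; EmptyWrapper, OtherType, checkedNext and whatEscapes are hypothetical stand-ins, not Hadoop or Mahout code.

```java
import java.io.IOException;

public class MaskedErrorDemo {

    /** Stand-in for an empty VectorWritable: toString() delegates to a field that is still null. */
    static class EmptyWrapper {
        private Object payload; // never assigned, like a wrapper before its first read

        @Override
        public String toString() {
            return payload.toString(); // throws NullPointerException while payload is null
        }
    }

    /** Stand-in for the value class actually recorded in the file header. */
    static class OtherType {}

    /** Mimics a type check that builds its error message by concatenating the value object. */
    static void checkedNext(Object val, Class<?> headerValueClass) throws IOException {
        if (val.getClass() != headerValueClass) {
            throw new IOException("wrong value class: " + val + " is not " + headerValueClass);
        }
    }

    /** Returns the simple name of the exception that escapes checkedNext, or "none". */
    static String whatEscapes(Object val, Class<?> headerValueClass) {
        try {
            checkedNext(val, headerValueClass);
            return "none";
        } catch (IOException e) {
            return "IOException";
        } catch (NullPointerException e) {
            return "NullPointerException";
        }
    }

    public static void main(String[] args) {
        // Mismatched class + empty wrapper: the NPE from toString() hides the IOException.
        System.out.println(whatEscapes(new EmptyWrapper(), OtherType.class));
        // Mismatched class + a value whose toString() is safe: the real error surfaces.
        System.out.println(whatEscapes("some text", OtherType.class));
        // Matching class: no error at all.
        System.out.println(whatEscapes(new OtherType(), OtherType.class));
    }
}
```

Running main prints NullPointerException, IOException and none, one per line: only the mismatched type with an unprintable value produces the misleading NPE.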
Emeritus Professor of Computer Science
UMASS-Boston
100 Morrissey Blvd
Boston, MA 02125-3390
Filtered Push Project
Harvard University Herbaria
Harvard University
email: morris.bob [ at ] gmail.com
web: http://efg.cs.umb.edu/
web: http://wiki.filteredpush.org
http://www.cs.umb.edu/~ram
===
The content of this communication is made entirely on my
own behalf and in no way should be deemed to express
official positions of The University of Massachusetts at Boston or Harvard
University.