Hi All,
I was able to run the clustering, but I need help viewing the result. When I run clusterdump, I get the following error:
./mahout clusterdump -i /scratch/dummyvectoroutput/clusters-*-final -d /scratch/dummyvectorfinalclusters
MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
Warning: $HADOOP_HOME is deprecated.
Running on hadoop, using /users/p529444/software/hadoop-1.0.3/bin/hadoop and HADOOP_CONF_DIR=/apps/hadoop/hadoop-conf
MAHOUT-JOB: /apps/mahout/trunk/examples/target/mahout-examples-0.9-SNAPSHOT-job.jar
Warning: $HADOOP_HOME is deprecated.
13/12/20 14:21:56 WARN driver.MahoutDriver: Unable to add class: org.apache.mahout.utils.clustering.ClusterDumper
13/12/20 14:21:56 WARN driver.MahoutDriver: Unable to add class: org.apache.mahout.classifier.sgd.TrainLogistic
13/12/20 14:21:56 WARN driver.MahoutDriver: Unable to add class: org.apache.mahout.utils.vectors.lucene.Driver
13/12/20 14:21:56 WARN driver.MahoutDriver: Unable to add class: org.apache.mahout.classifier.sgd.RunAdaptiveLogistic
13/12/20 14:21:56 WARN driver.MahoutDriver: Unable to add class: org.apache.mahout.utils.SequenceFileDumper
13/12/20 14:21:56 WARN driver.MahoutDriver: Unable to add class: org.apache.mahout.classifier.sgd.PrintResourceOrFile
13/12/20 14:21:56 WARN driver.MahoutDriver: Unable to add class: org.apache.mahout.classifier.sgd.ValidateAdaptiveLogistic
13/12/20 14:21:56 WARN driver.MahoutDriver: Unable to add class: org.apache.mahout.text.WikipediaToSequenceFile
13/12/20 14:21:56 WARN driver.MahoutDriver: Unable to add class: org.apache.mahout.classifier.ConfusionMatrixDumper
13/12/20 14:21:56 WARN driver.MahoutDriver: Unable to add class: org.apache.mahout.utils.regex.RegexConverterDriver
13/12/20 14:21:56 WARN driver.MahoutDriver: Unable to add class: org.apache.mahout.text.SequenceFilesFromMailArchives
13/12/20 14:21:56 WARN driver.MahoutDriver: Unable to add class: org.apache.mahout.classifier.sgd.TrainAdaptiveLogistic
13/12/20 14:21:56 WARN driver.MahoutDriver: Unable to add class: org.apache.mahout.utils.vectors.VectorDumper
13/12/20 14:21:56 WARN driver.MahoutDriver: Unable to add class: org.apache.mahout.utils.vectors.RowIdJob
13/12/20 14:21:56 WARN driver.MahoutDriver: Unable to add class: org.apache.mahout.clustering.streaming.tools.ClusterQualitySummarizer
13/12/20 14:21:56 WARN driver.MahoutDriver: Unable to add class: org.apache.mahout.utils.SplitInput
13/12/20 14:21:56 WARN driver.MahoutDriver: Unable to add class: org.apache.mahout.clustering.streaming.tools.ResplitSequenceFiles
13/12/20 14:21:56 WARN driver.MahoutDriver: Unable to add class: org.apache.mahout.text.SequenceFilesFromLuceneStorageDriver
13/12/20 14:21:56 WARN driver.MahoutDriver: Unable to add class: org.apache.mahout.utils.MatrixDumper
13/12/20 14:21:56 WARN driver.MahoutDriver: Unable to add class: org.apache.mahout.text.SequenceFilesFromDirectory
13/12/20 14:21:56 WARN driver.MahoutDriver: Unable to add class: org.apache.mahout.classifier.sgd.RunLogistic
13/12/20 14:21:56 WARN driver.MahoutDriver: Unable to add class: org.apache.mahout.utils.ConcatenateVectorsJob
13/12/20 14:21:56 WARN driver.MahoutDriver: Unable to add class: org.apache.mahout.utils.vectors.arff.Driver
Unknown program 'clusterdump' chosen.
Valid program names are:
baumwelch: : Baum-Welch algorithm for unsupervised HMM training
canopy: : Canopy clustering
cleansvd: : Cleanup and verification of SVD output
clusterpp: : Groups Clustering Output In Clusters
cvb: : LDA via Collapsed Variation Bayes (0th deriv. approx)
cvb0_local: : LDA via Collapsed Variation Bayes, in memory locally.
evaluateFactorization: : compute RMSE and MAE of a rating matrix factorization against probes
fkmeans: : Fuzzy K-means clustering
hmmpredict: : Generate random sequence of observations by given HMM
itemsimilarity: : Compute the item-item-similarities for item-based collaborative filtering
kmeans: : K-means clustering
matrixmult: : Take the product of two matrices
parallelALS: : ALS-WR factorization of a rating matrix
recommendfactorized: : Compute recommendations using the factorization of a rating matrix
recommenditembased: : Compute recommendations using item-based collaborative filtering
rowsimilarity: : Compute the pairwise similarities of the rows of a matrix
seq2encoded: : Encoded Sparse Vector generation from Text sequence files
seq2sparse: : Sparse Vector generation from Text sequence files
spectralkmeans: : Spectral k-means clustering
splitDataset: : split a rating dataset into training and probe parts
ssvd: : Stochastic SVD
streamingkmeans: : Streaming k-means clustering
svd: : Lanczos Singular Value Decomposition
testnb: : Test the Vector-based Bayes classifier
trainnb: : Train the Vector-based Bayes classifier
transpose: : Take the transpose of a matrix
vecdist: : Compute the distances between a set of Vectors (or Cluster or Canopy, they must fit in memory) and a list of Vectors
viterbi: : Viterbi decoding of hidden states from given output states sequence