Channel: Apache Timeline

How to analyze K-means clustering result with clusterDump

Hi all,

I am experimenting with Mahout, and in particular I am trying to get results out of clustering algorithms such as K-means.
I am using the Hadoop 1.2 implementation on an HDInsight cluster, along with Mahout 0.9.
What I am trying to do is take a set of synthetic data and cluster it.
From the Hadoop command line I am running the following command:

hadoop jar %mahoutdir%\mahout-examples-0.9-job.jar org.apache.mahout.clustering.syntheticcontrol.kmeans.Job --input /user/myuser/simulation --output /user/myuser/simulation-output -k 5 -t1 20 -t2 50 -x 20 -ow

The Mapper and Reducer apparently execute correctly. I then look at the results by running this command:

hadoop jar %mahoutdir%\mahout-examples-0.9-job.jar org.apache.mahout.driver.MahoutDriver clusterdump -i /user/myuser/simulation-output/clusters-5-final/ -of TEXT -o /user/myuser/output/simulation.txt

The result I get is a list of centroids, but this is not what I expect: I expect a set of clusters, each listing the data points assigned to it.
I am obviously making a mistake somewhere, but I do not know how or where.
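
To make the difference between the two outputs concrete, here is a minimal Python sketch. It is not Mahout code; the toy data, the centroids, and the function names are all made up for illustration. It only shows the kind of output expected: each centroid followed by the points that are closest to it.

```python
import math

def nearest_centroid(point, centroids):
    """Return the index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))

def group_points(points, centroids):
    """Map each centroid index to the list of points assigned to it."""
    clusters = {i: [] for i in range(len(centroids))}
    for p in points:
        clusters[nearest_centroid(p, centroids)].append(p)
    return clusters

# Hypothetical toy data, not taken from the synthetic control data set.
centroids = [(0.0, 0.0), (10.0, 10.0)]
points = [(1.0, 0.5), (9.0, 11.0), (-0.5, 0.2), (10.5, 9.5)]

clusters = group_points(points, centroids)
for idx, members in clusters.items():
    # One line per cluster: the centroid plus every point assigned to it.
    print(f"cluster {idx} centroid={centroids[idx]} points={members}")
```

A list of centroids alone corresponds to the `centroids` variable above. What I am after is the equivalent of `clusters`, that is, the mapping from each centroid to its member points.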

What am I doing wrong?
Why am I not able to specify the -cl option when executing org.apache.mahout.clustering.syntheticcontrol.kmeans.Job? If I do, I get an error.
Is there any other way to execute the k-means algorithm?

Thank you in advance for the help.
Regards,
Ernesto
