Hi,
I am trying to cluster text with Canopy and K-Means. This is what I have, and it works. But I'm curious whether I should instead run K-Means with Tanimoto and Canopy with Euclidean. Which distance measure is K-Means using in my setup? And why has the distance measure parameter been removed from the run method of KMeansDriver?
// Generate input clusters for K-Means (instead of using random K)
CanopyDriver.run(conf,
    TFIDF_VECTORS_PATH,
    OUTPUT_PATH,
    new TanimotoDistanceMeasure(),
    t1,
    t2,
    runClusteringFalse,
    clusterClassificationThreshold,
    runSequential);

// Generate K-Means clusters
KMeansDriver.run(conf,
    TFIDF_VECTORS_PATH,
    new Path(OUTPUT_PATH, "clusters-0-final"),
    KMEANS_OUTPUT_PATH,
    convergenceDelta,
    maxIterations,
    runClustering,
    clusterClassificationThreshold,
    runSequential);
I'm asking because I read that Canopy works well with a fast distance measure, so I was thinking of using Euclidean for Canopy and Tanimoto for K-Means. That may be totally wrong, but if someone could explain this it would be great.
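To make clear which two measures I am comparing, here is a minimal plain-Java sketch of the formulas as I understand them. It is not Mahout's implementation: the class name DistanceSketch and its methods are made up for illustration, it works on dense double arrays rather than Mahout vectors, and returning 0 for two all-zero vectors is my own assumption.

```java
/** Illustrative sketch only; not Mahout's DistanceMeasure classes. */
final class DistanceSketch {
    private DistanceSketch() {}

    /** Euclidean distance: square root of the summed squared differences. */
    static double euclidean(double[] a, double[] b) {
        checkSameLength(a, b);
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    /** Tanimoto distance: 1 - (a.b) / (|a|^2 + |b|^2 - a.b). */
    static double tanimoto(double[] a, double[] b) {
        checkSameLength(a, b);
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        double denominator = normA + normB - dot;
        // The denominator is zero only when both vectors are all zeros.
        // Assumption: treat two empty documents as identical (distance 0).
        if (denominator == 0.0) {
            return 0.0;
        }
        return 1.0 - dot / denominator;
    }

    private static void checkSameLength(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Vectors must have the same length");
        }
    }
}
```

For example, DistanceSketch.tanimoto(new double[] {1, 1}, new double[] {1, 0}) gives 0.5, while DistanceSketch.euclidean on the same pair gives 1.0, so the two measures can rank the same pair of documents quite differently.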
Thank you!
Best regards,
Nicklas