Channel: Apache Timeline

Distance Measure in K-Means Java

Hi,

I am trying to cluster text with Canopy and K-Means. This is what I have, and it works. But I'm curious whether I should instead run K-Means with Tanimoto and Canopy with Euclidean. Which distance measure is K-Means using in my setup? And why has the distance measure parameter been removed from KMeansDriver's run method?

// Generate input clusters for K-Means (instead of using random K)
CanopyDriver.run(conf,
    TFIDF_VECTORS_PATH,
    OUTPUT_PATH,
    new TanimotoDistanceMeasure(),
    t1,
    t2,
    runClusteringFalse,
    clusterClassificationThreshold,
    runSequential);

// Generate K-Means clusters
KMeansDriver.run(conf,
    TFIDF_VECTORS_PATH,
    new Path(OUTPUT_PATH, "clusters-0-final"),
    KMEANS_OUTPUT_PATH,
    convergenceDelta,
    maxIterations,
    runClustering,
    clusterClassificationThreshold,
    runSequential);

I'm asking because I read that Canopy works well with a fast distance measure, so I was thinking of using Euclidean for Canopy and Tanimoto for K-Means. That is probably totally wrong, but it would be great if someone could explain this.
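
For reference, the two measures being compared can be sketched in plain Java as follows. This is only an illustration of the usual textbook definitions, not Mahout's implementation: the class DistanceSketch and its methods are hypothetical names, and dense double[] vectors of equal length are assumed in place of Mahout's Vector type.

```java
// Illustrative sketch only: plain Java, no Mahout dependency.
// DistanceSketch and its method names are hypothetical.
final class DistanceSketch {

    // Euclidean distance: square root of the summed squared differences.
    // Assumes a and b have the same length.
    static double euclidean(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    // Tanimoto distance: 1 - (a.b) / (|a|^2 + |b|^2 - a.b).
    // Assumes a and b have the same length.
    static double tanimoto(double[] a, double[] b) {
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        double denominator = normA + normB - dot;
        if (denominator == 0.0) {
            return 0.0; // both vectors are all zeros: treat them as identical
        }
        return 1.0 - dot / denominator;
    }

    public static void main(String[] args) {
        // Two toy term-weight vectors that share one term.
        double[] doc1 = {1.0, 1.0, 0.0};
        double[] doc2 = {1.0, 0.0, 1.0};
        System.out.println("Euclidean: " + euclidean(doc1, doc2));
        System.out.println("Tanimoto:  " + tanimoto(doc1, doc2));
    }
}
```

In this sketch the Tanimoto variant needs the dot product and both squared norms, while the Euclidean variant only needs the per-component differences.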

Thank you!

Best regards,
Nicklas
