Channel: Apache Timeline

Problems with K-Means Spectral Clustering on EMR

Hi,

I tried to run the spectral clustering example from the Mahout website on EMR.

I uploaded the following files to the bucket:
affinity.txt (affinity matrix)
mahout-core-0.9-job.jar
mahout-core-0.9.jar
update-lucene.sh
lucene-4.3.0.tgz

The update-lucene.sh script contains the following:

#!/bin/bash
cd /home/hadoop
wget https://s3.amazonaws.com/hellomahout/lucene-4.3.0.tgz
tar -xzf lucene-4.3.0.tgz
cd lib
rm lucene-*.jar
cd ..
cd lucene-4.3.0
find . | grep lucene- | grep jar$ | xargs -I {} cp {} ../lib

The cluster configuration is the following:

Hadoop Distribution: Amazon, AMI version: 3.2.1

EC2 instance types:
Master: m1.large, 1
Core: m1.large, 1
Task: None (m1.medium,1)

Bootstrap Actions:
Custom action, S3 location: s3://hellomahout/update-lucene.sh

Steps:

Custom JAR, JAR location: s3://hellomahout/mahout-core-0.9-job.jar,
Arguments: org.apache.mahout.clustering.spectral.kmeans.SpectralKMeansDriver
--input s3://hellomahout/testdata/affinity.txt --output
s3://hellomahout/testdata/results -d 3 -k 2 -x 10
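
For reference, the step above should be roughly equivalent to the following hadoop jar call on the master node. This is only a sketch: it assumes mahout-core-0.9-job.jar has been copied to the working directory, and the JOB_JAR, DRIVER, cmd and RUN names are just for illustration.

```shell
#!/bin/bash
# Sketch: the EMR "Custom JAR" step written as a plain hadoop jar call.
# Assumption: mahout-core-0.9-job.jar is in the current working directory.
JOB_JAR=mahout-core-0.9-job.jar
DRIVER=org.apache.mahout.clustering.spectral.kmeans.SpectralKMeansDriver

# Same arguments as in the step definition.
cmd=(hadoop jar "$JOB_JAR" "$DRIVER"
     --input s3://hellomahout/testdata/affinity.txt
     --output s3://hellomahout/testdata/results
     -d 3 -k 2 -x 10)

# Print the command; set RUN=1 (illustrative switch) to execute it on the cluster.
echo "${cmd[@]}"
if [ "${RUN:-0}" = "1" ]; then
  "${cmd[@]}"
fi
```

By default the script only prints the command, so it can be checked before it is run on the cluster.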

When I try to run it, I get the following exception:

Exception in thread "main" java.io.FileNotFoundException: No such file or directory 'hdfs://172.31.1.27:9000/user/hadoop/temp/calculations/unitvectors'
at com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem.getFileStatus(S3NativeFileSystem.java:759)
at com.amazon.ws.emr.hadoop.fs.EmrFileSystem.getFileStatus(EmrFileSystem.java:507)
at org.apache.mahout.clustering.kmeans.EigenSeedGenerator.buildFromEigens(EigenSeedGenerator.java:67)
at org.apache.mahout.clustering.spectral.kmeans.SpectralKMeansDriver.run(SpectralKMeansDriver.java:243)
at org.apache.mahout.clustering.spectral.kmeans.SpectralKMeansDriver.run(SpectralKMeansDriver.java:127)
at org.apache.mahout.clustering.spectral.kmeans.SpectralKMeansDriver.run(SpectralKMeansDriver.java:118)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at org.apache.mahout.clustering.spectral.kmeans.SpectralKMeansDriver.main(SpectralKMeansDriver.java:70)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)

Does anyone know what causes this exception?
Could anyone suggest how to run spectral clustering on EMR?

Thank you.

Niko
