
processing compressed files

Hi,

I'd like to use Mahout for clustering and classification on tens of
terabytes of data stored on Amazon's S3 storage service. Each file yields
exactly one data point, but before any machine learning can be applied I need
to decompress the file and process it. Do all the files have to be
pre-processed before I run Mahout, or is there a straightforward way to
combine the pre-processing with Mahout? For example, I have a script that
does the pre-processing; could I somehow have Mahout run that script?

Pre-processing the files prior to running Mahout is simple, but Amazon
charges for the extra storage space the pre-processed files would use.
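As an illustration of the per-file step only (this is not a Mahout API), the decompression and pre-processing can happen in memory, so that only the final data point is ever written out and no decompressed copy has to be stored on S3. A minimal Python sketch, assuming gzip-compressed text files; `extract_features` and `file_to_point` are hypothetical names standing in for the existing pre-processing script, and the two features and the tab-separated output line are placeholder choices.

```python
import gzip


def extract_features(text: str) -> list[float]:
    # Hypothetical pre-processing: word count and average word length.
    # The real script's logic would go here instead.
    words = text.split()
    n_words = len(words)
    avg_len = sum(len(w) for w in words) / n_words if n_words else 0.0
    return [float(n_words), avg_len]


def file_to_point(name: str, compressed: bytes) -> str:
    # Decompress in memory; nothing decompressed is written to storage.
    text = gzip.decompress(compressed).decode("utf-8")
    vector = extract_features(text)
    # One output line per input file: file name, tab, comma-separated features.
    return name + "\t" + ",".join(str(x) for x in vector)


sample = gzip.compress(b"hello big data world")
print(file_to_point("part-0001.gz", sample))  # prints "part-0001.gz\t4.0,4.25"
```

A function of this shape could, for instance, be wrapped in a script run as a mapper under Hadoop Streaming, so the compressed files are read once and only the small per-file vectors are kept; how those vectors would then be converted into Mahout's input format is left open here.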

Thanks.

Eric
