
OutOfMemoryError: Java Heap Space in DocumentProcessor.tokenizeDocuments

Hi everyone,

I've been stuck on an OutOfMemoryError when running a
SparseVectorsFromSequenceFiles job from Java. I'm using Mahout 0.9 and
Hadoop 2.2 in a Maven project. I've tried setting the heap options in Java
through a Hadoop Configuration that is passed to the job:

CONF.set("mapreduce.map.memory.mb", "1536");
CONF.set("mapreduce.map.java.opts", "-Xmx1024m");
CONF.set("mapreduce.reduce.memory.mb", "1536");
CONF.set("mapreduce.reduce.java.opts", "-Xmx1024m");
CONF.set("task.io.sort.mb", "512");
CONF.set("task.io.sort.factor", "100");

etc., but nothing has worked. The JVM heap settings for the project itself
are similar: "-Xms512m -Xmx1536m". The data set is 100,000 sequence files
totalling ~250 MB. The job does not fail on a smaller data set of 63
sequence files (~2 MB). Here is an example stack trace:

Exception in thread "Thread-18" java.lang.OutOfMemoryError: Java heap space
    at sun.util.resources.TimeZoneNames.getContents(TimeZoneNames.java:205)
    at sun.util.resources.OpenListResourceBundle.loadLookup(OpenListResourceBundle.java:125)
    at sun.util.resources.OpenListResourceBundle.loadLookupTablesIfNecessary(OpenListResourceBundle.java:113)
(this seems to get thrown on different bits of code every time)
......
java.lang.IllegalStateException: Job failed!
    at org.apache.mahout.vectorizer.DocumentProcessor.tokenizeDocuments(DocumentProcessor.java:95)
    at org.apache.mahout.vectorizer.SparseVectorsFromSequenceFiles.run(SparseVectorsFromSequenceFiles.java:257)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
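
Since the OutOfMemoryError is raised in a thread of a running JVM, one quick
sanity check is to print the heap limits of the JVM that launches the job,
to confirm that "-Xms512m -Xmx1536m" is actually reaching it. This is only a
minimal sketch using the standard java.lang.Runtime API; the class name
HeapCheck and its helper methods are hypothetical and not part of Mahout or
Hadoop.

```java
public class HeapCheck {

    /** Converts a byte count to whole mebibytes. */
    static long toMb(long bytes) {
        return bytes / (1024L * 1024L);
    }

    /** Maximum heap the current JVM will attempt to use, in MB. */
    static long maxHeapMb() {
        return toMb(Runtime.getRuntime().maxMemory());
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // maxMemory() reflects -Xmx; it can report slightly less than the
        // configured value depending on the garbage collector in use.
        System.out.println("max heap (MB):   " + maxHeapMb());
        // totalMemory() is the heap currently reserved by the JVM.
        System.out.println("total heap (MB): " + toMb(rt.totalMemory()));
        // freeMemory() is the unused part of the reserved heap.
        System.out.println("free heap (MB):  " + toMb(rt.freeMemory()));
    }
}
```

If the reported maximum is far below 1536 MB when this runs the same way the
project runs (same Maven invocation or IDE launcher), the heap options are
probably not being applied to that JVM.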

This is the code I'm running it with in order to pass in my own
Configuration:

SparseVectorsFromSequenceFiles VectorizeJob = new SparseVectorsFromSequenceFiles();
VectorizeJob.setConf(CONF);
ToolRunner.run(VectorizeJob, args);

where args is a String[] of command line options.
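
For reference, the memory values set on CONF can be checked against each
other with a few lines of plain Java. The sketch below is only an
illustration: the class TaskMemoryCheck and its methods are hypothetical
helpers, and it assumes the sort buffer setting is actually picked up by the
job. As far as I can tell, the Hadoop 2 property names are
mapreduce.task.io.sort.mb and mapreduce.task.io.sort.factor, so the
task.io.sort.* keys used earlier may not be applied at all. The map-side sort
buffer is allocated inside the task heap, so a 512 MB buffer would leave
about half of a 1024 MB heap for everything else.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TaskMemoryCheck {

    // Matches options such as -Xmx1024m, -Xmx2g or -Xmx524288k.
    private static final Pattern XMX = Pattern.compile("-Xmx(\\d+)([kKmMgG])");

    /** Extracts the -Xmx value from a java.opts string in MB, or -1 if absent. */
    static long xmxMb(String javaOpts) {
        Matcher m = XMX.matcher(javaOpts);
        if (!m.find()) {
            return -1;
        }
        long value = Long.parseLong(m.group(1));
        switch (Character.toLowerCase(m.group(2).charAt(0))) {
            case 'k': return value / 1024;
            case 'g': return value * 1024;
            default:  return value; // 'm'
        }
    }

    /** Heap left in a task JVM once the sort buffer is allocated, in MB. */
    static long heapAfterSortBufferMb(String javaOpts, long sortMb) {
        return xmxMb(javaOpts) - sortMb;
    }

    public static void main(String[] args) {
        String mapOpts = "-Xmx1024m"; // value of mapreduce.map.java.opts
        long containerMb = 1536;      // value of mapreduce.map.memory.mb
        long sortMb = 512;            // intended sort buffer size

        long heapMb = xmxMb(mapOpts);
        System.out.println("task heap (MB):              " + heapMb);
        System.out.println("container headroom (MB):     " + (containerMb - heapMb));
        System.out.println("heap after sort buffer (MB): "
                + heapAfterSortBufferMb(mapOpts, sortMb));
    }
}
```

With the values from this post, the task heap is 1024 MB, the container
leaves 512 MB of headroom above the heap, and 512 MB of heap would remain
after the sort buffer.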

Any suggestions would be greatly appreciated.

Justin Kay
