Hi,
I'm building a ResultSet from a QueryIterator:
List<String> varNames = ...
QueryIterator queryIterator = ...
ResultSet sparqlResultSet = ResultSetFactory.create(queryIterator, varNames);
String xmlResultString = ResultSetFormatter.asXMLString(sparqlResultSet);
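For reference, here is a minimal self-contained version of the same pattern (the model and query are stand-ins, since in my real code the QueryIterator comes from elsewhere):

import com.hp.hpl.jena.query.Query;
import com.hp.hpl.jena.query.QueryExecution;
import com.hp.hpl.jena.query.QueryExecutionFactory;
import com.hp.hpl.jena.query.QueryFactory;
import com.hp.hpl.jena.query.ResultSet;
import com.hp.hpl.jena.query.ResultSetFormatter;
import com.hp.hpl.jena.rdf.model.Model;
import com.hp.hpl.jena.rdf.model.ModelFactory;

public class XmlStringRepro {
    public static void main(String[] args) {
        Model model = ModelFactory.createDefaultModel(); // stand-in: the real data yields 70k+ rows
        Query query = QueryFactory.create("SELECT * WHERE { ?s ?p ?o }");
        QueryExecution qe = QueryExecutionFactory.create(query, model);
        try {
            ResultSet rs = qe.execSelect();
            // Buffers the entire XML document in memory before returning:
            String xml = ResultSetFormatter.asXMLString(rs);
            System.out.println(xml.length());
        } finally {
            qe.close();
        }
    }
}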
When the query returns more than 70,000 rows, I get the following OOM (I haven't changed the default Java heap size):
java.lang.OutOfMemoryError: Java heap space
at java.lang.StringCoding$StringDecoder.decode(StringCoding.java:133)
at java.lang.StringCoding.decode(StringCoding.java:173)
at java.lang.String.<init>(String.java:443)
at java.lang.String.<init>(String.java:515)
at com.hp.hpl.jena.sparql.resultset.OutputBase.asString(OutputBase.java:35)
at com.hp.hpl.jena.query.ResultSetFormatter.asXMLString(ResultSetFormatter.java:548)
at com.hp.hpl.jena.query.ResultSetFormatter.asXMLString(ResultSetFormatter.java:535)
Line 35 of OutputBase is the following:
try { return new String(arr.toByteArray(), "UTF-8") ; }
So it seems the ResultSet has already been fully iterated (apply() in ResultSetApply) and the problem occurs after that. Is it fair to assume that, because arr.toByteArray() makes another copy of the buffer, the memory is duplicated, and that is why I'm getting the OOM?
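To double-check my understanding of where the duplication happens, here is an illustrative snippet (not Jena code) of the same toByteArray()-then-decode pattern:

import java.io.ByteArrayOutputStream;

public class CopyDemo {
    public static void main(String[] args) throws Exception {
        ByteArrayOutputStream arr = new ByteArrayOutputStream();
        arr.write("row data".getBytes("UTF-8")); // stand-in for the serialized rows
        byte[] copy = arr.toByteArray();      // copy #2: fresh array, internal buffer still live
        String s = new String(copy, "UTF-8"); // copy #3: decoded to UTF-16 chars, ~2x the byte size
        System.out.println(s.length());
    }
}

If that's right, the peak footprint at that point is several times the size of the serialized XML document.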
However, if the query returns 200,000 rows, the OOM error changes to the following:
java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:2786)
at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:94)
at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:202)
at sun.nio.cs.StreamEncoder.implWrite(StreamEncoder.java:263)
at sun.nio.cs.StreamEncoder.write(StreamEncoder.java:106)
at java.io.OutputStreamWriter.write(OutputStreamWriter.java:190)
at java.io.BufferedWriter.flushBuffer(BufferedWriter.java:111)
at java.io.BufferedWriter.write(BufferedWriter.java:125)
at org.openjena.atlas.io.IndentedWriter.write(IndentedWriter.java:140)
at org.openjena.atlas.io.IndentedWriter.printOneChar(IndentedWriter.java:135)
at org.openjena.atlas.io.IndentedWriter.print(IndentedWriter.java:99)
at com.hp.hpl.jena.sparql.resultset.XMLOutputResultSet.printLiteral(XMLOutputResultSet.java:232)
at com.hp.hpl.jena.sparql.resultset.XMLOutputResultSet.printBindingValue(XMLOutputResultSet.java:189)
at com.hp.hpl.jena.sparql.resultset.XMLOutputResultSet.binding(XMLOutputResultSet.java:169)
at com.hp.hpl.jena.sparql.resultset.ResultSetApply.apply(ResultSetApply.java:49)
at com.hp.hpl.jena.sparql.resultset.XMLOutput.format(XMLOutput.java:52)
at com.hp.hpl.jena.sparql.resultset.OutputBase.asString(OutputBase.java:34)
at com.hp.hpl.jena.query.ResultSetFormatter.asXMLString(ResultSetFormatter.java:548)
at com.hp.hpl.jena.query.ResultSetFormatter.asXMLString(ResultSetFormatter.java:535)
In this case, we ran out of memory while still iterating through the ResultSet: the trace shows the internal ByteArrayOutputStream growing via Arrays.copyOf, which briefly holds both the old and the new buffer.
An obvious thing to do is to increase the Java heap size. But the issue is that the queries I'm running will return over a million rows. For such queries, I've increased the heap size to 4 GB, and I'm still getting an OOM error.
Is there something I'm doing wrong? Or is the solution just to increase the heap space even more?
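In case it matters, the workaround I was planning to try is streaming the XML straight to an OutputStream instead of materializing the String, something like the sketch below (the file path is just an example). Would that keep memory bounded, assuming the QueryIterator itself streams?

import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import com.hp.hpl.jena.query.ResultSet;
import com.hp.hpl.jena.query.ResultSetFormatter;

public class StreamedXmlOutput {
    // Writes the results as SPARQL XML directly to a file, row by row,
    // so no full-document String or byte[] is ever held in memory.
    static void write(ResultSet results, String path) throws IOException {
        OutputStream out = new FileOutputStream(path); // example destination
        try {
            ResultSetFormatter.outputAsXML(out, results);
        } finally {
            out.close();
        }
    }
}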
Thanks for your pointers!
Juan Sequeda
+1-575-SEQ-UEDA
www.juansequeda.com