Hello, all
I'm trying to run BuildForest (as described here:
mahout.apache.org/users/classification/partial-implementation.html)
with my data on Mahout 0.9 and I get an error. However, it worked
successfully on 0.7 with exactly the same data 3 days ago.
Error:
INFO mapred.JobClient: Task Id : attempt_201409080941_0365_m_000000_0,
Status : FAILED
java.lang.RuntimeException: org.codehaus.jackson.JsonParseException:
Illegal character ((CTRL-CHAR, code 0)): only regular white space (\r,
\n, \t) is allowed between tokens
at [Source: java.io.StringReader@4a661c48; line: 1, column: 2]
at org.apache.mahout.classifier.df.data.Dataset.fromJSON(Dataset.java:375)
at org.apache.mahout.classifier.df.data.Dataset.load(Dataset.java:330)
at org.apache.mahout.classifier.df.mapreduce.Builder.loadDataset(Builder.java:224)
at org.apache.mahout.classifier.df.mapreduce.MapredMapper.setup(MapredMapper.java:61)
at org.apache.mahout.classifier.df.mapreduce.partial.Step1Mapper.setup(Step1Mapper.java:72)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:138)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:672)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:330)
at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apac
I compared my data with KDDTest+.arff using a hex viewer and found no
structural differences: all delimiters are the same.
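The parser complains about a character with code 0 at line 1, column 2 of the string that reached Dataset.fromJSON. For anyone who wants to repeat the byte-level check without a hex viewer, here is a minimal sketch in Python that lists the offsets of NUL bytes in a local copy of a file. The function name `find_nul_offsets` and the file names in the usage note are hypothetical, and the stack trace does not show which file the failing string was read from, so this only helps to rule files in or out.

```python
def find_nul_offsets(path, limit=10):
    """Return the byte offsets of the first `limit` NUL (0x00) bytes in a file."""
    offsets = []
    position = 0  # offset of the current chunk within the file
    with open(path, "rb") as handle:
        while len(offsets) < limit:
            chunk = handle.read(65536)
            if not chunk:
                break  # end of file
            start = 0
            while len(offsets) < limit:
                index = chunk.find(b"\x00", start)
                if index == -1:
                    break  # no more NUL bytes in this chunk
                offsets.append(position + index)
                start = index + 1
            position += len(chunk)
    return offsets
```

Calling `find_nul_offsets("mydata.csv")` and `find_nul_offsets("mydata.info")` on local copies (both names are placeholders) returns an empty list for a file with no NUL bytes; any offsets near the start of a file would be consistent with the "line: 1, column: 2" position in the error.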
My data does contain some "strange" values: "\N" in a C (categorical)
attribute and "-100.0" in N (numerical) attributes. But the data was the
same on 0.7, and it worked!
Sample of data:
GLQBX7h70R575555n3EM,0,\N,-100.0,-100.0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
GLQBX7h70R575555n3GB,0,W,29.5,9,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
What's my problem?
WBR
Oleg