Building TFIDF Vectors from Solr Index

Hi,
I've been working with Mahout trunk and attempting to build TF-IDF vectors from
a Solr 4.3.1 index as follows. I am using Hadoop 1.1.1 to do the processing.

h@CEE279Law3-Linux:~/Downloads/asf/mahout$ ./bin/mahout lucene.vector --dir
"../solr-4.3.1/example/multicore/e001/data/index/" --idField id --output
"../solr-4.3.1/example/multicore/e001/vector" --field content --dictOut
"file:/home/law/Downloads/asf/solr-4.3.1/example/multicore/e001/dictionary"
--weight TFIDF
Warning: $HADOOP_HOME is deprecated.

Running on hadoop, using /home/law/Downloads/asf/hadoop-1.1.1/bin/hadoop
and HADOOP_CONF_DIR=
MAHOUT-JOB:
/home/law/Downloads/asf/mahout/examples/target/mahout-examples-0.9-SNAPSHOT-job.jar
Warning: $HADOOP_HOME is deprecated.

Exception in thread "main" java.lang.NullPointerException
at org.apache.mahout.utils.vectors.lucene.CachedTermInfo.<init>(CachedTermInfo.java:45)
at org.apache.mahout.utils.vectors.lucene.Driver.dumpVectors(Driver.java:102)
at org.apache.mahout.utils.vectors.lucene.Driver.main(Driver.java:290)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:156)

The content field in the schema.xml looks like this

<field name="content" type="text_general" stored="true" indexed="true"/>

which I think/hope must be the root of the problem. Can someone advise if I
need to add more configuration to this field for vectors to be built?
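
For anyone checking the same thing on their own schema: below is a minimal Python sketch that reads a schema.xml fragment and reports whether a field explicitly declares termVectors="true". It rests on an assumption, not something the stack trace confirms: that lucene.vector needs term vectors on the field it reads. The sample schema wrapper (name, version) and the helper names field_attributes and has_term_vectors are made up for illustration; only the <field> line comes from this post.

```python
import xml.etree.ElementTree as ET

# Hypothetical schema wrapper; the <field> line is the one quoted in the post.
SCHEMA_XML = """<schema name="example" version="1.5">
  <fields>
    <field name="content" type="text_general" stored="true" indexed="true"/>
  </fields>
</schema>"""


def field_attributes(schema_text, field_name):
    """Return the attribute dict of the named <field>, or None if absent."""
    root = ET.fromstring(schema_text)
    for field in root.iter("field"):
        if field.get("name") == field_name:
            return dict(field.attrib)
    return None


def has_term_vectors(schema_text, field_name):
    """True only if the field itself sets termVectors="true".

    This does not look at defaults inherited from the <fieldType>,
    so a False result means "not set on the field", nothing more.
    """
    attrs = field_attributes(schema_text, field_name)
    return attrs is not None and attrs.get("termVectors", "false").lower() == "true"


if __name__ == "__main__":
    print(field_attributes(SCHEMA_XML, "content"))
    print(has_term_vectors(SCHEMA_XML, "content"))
```

If the assumption holds and the attribute is then added to the field, the documents would need to be reindexed, since term vectors are written at index time.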

Thank you v much in advance.
Best
Lewis
