Hi all, I'm very new to lucene2seq, so I'm not sure if this is just
user error, but I'm seeing some unexpected behavior when running
lucene2seq against my Solr (4.7.1) index. I've tried both Mahout 0.9
and the trunk build. (And BTW, I have been able to run the Reuters
example successfully as a test baseline.)
Here's the command I'm running:
$MAHOUT_HOME/bin/mahout lucene2seq \
    -i /home/ec2-user/solr/solr-data/solrindex/index \
    -o solr/sequence -id key_sha1hex -f body \
    -xm sequential -q topics:diabetes -n 500
Excerpts from my Solr schema:
<field name="content" type="text" stored="false" indexed="true" multiValued="true"/>
<field name="body" type="string" stored="true" indexed="false"/>
<!-- Use the indexed/un-stored "content" field for searching -->
<copyField source="body" dest="content" />
<!-- field for the QueryParser to use when an explicit fieldname is absent -->
<defaultSearchField>content</defaultSearchField>
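To make the stored/indexed flags in the schema excerpt above easy to compare, here is a minimal sketch in Python that tabulates them. It assumes the excerpt is wrapped in a `<schema>` root element so it parses as standalone XML; the `field_flags` and `copy_fields` helpers are hypothetical and are not part of Solr or Mahout. It only reads the flags written explicitly on each `<field>` (real Solr can inherit omitted flags from the fieldType).

```python
import xml.etree.ElementTree as ET

# The schema excerpt, wrapped in an assumed <schema> root element
# so that it parses as a standalone XML document.
SCHEMA_EXCERPT = """
<schema>
  <field name="content" type="text" stored="false" indexed="true" multiValued="true"/>
  <field name="body" type="string" stored="true" indexed="false"/>
  <copyField source="body" dest="content"/>
  <defaultSearchField>content</defaultSearchField>
</schema>
"""


def field_flags(schema_xml):
    """Map each <field> name to its explicit (stored, indexed) flags.

    An omitted attribute is reported as False here; real Solr would
    inherit it from the fieldType instead.
    """
    root = ET.fromstring(schema_xml.strip())
    flags = {}
    for field in root.iter("field"):
        stored = field.get("stored") == "true"
        indexed = field.get("indexed") == "true"
        flags[field.get("name")] = (stored, indexed)
    return flags


def copy_fields(schema_xml):
    """List (source, dest) pairs declared with <copyField>."""
    root = ET.fromstring(schema_xml.strip())
    return [(c.get("source"), c.get("dest")) for c in root.iter("copyField")]


if __name__ == "__main__":
    for name, (stored, indexed) in field_flags(SCHEMA_EXCERPT).items():
        print(f"{name}: stored={stored} indexed={indexed}")
    for src, dest in copy_fields(SCHEMA_EXCERPT):
        print(f"copyField {src} -> {dest}")
```

For this excerpt it reports 'body' as stored but not indexed, and 'content' as indexed but not stored, with 'body' copied into 'content'.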
When I use the Solr Admin UI and specify fl=body, the search handler
returns the 'body' field with data, as expected. Yet when I run
lucene2seq with '-f body', I get the following error:
IllegalArgumentException: Field 'body' does not exist in the index
And if I specify '-f content', lucene2seq runs without errors or
warnings, but the seqdumper output shows no values for any key:
Key class: class org.apache.hadoop.io.Text Value Class: class org.apache.hadoop.io.Text
Key: 96C4C76CF9D7449C724CA77CB8F650EAFD33E31C: Value:
Key: D6842B81B8D09733B50BEDB4767C2A5C49E43B20: Value:
Key: 61CB95FEE2C6BF0AC6E8A1F7738338CA36F42264: Value:
Key: 0F9903B72A7C9F0373A5171403B3AAEB291B16E1: Value:
Can anyone suggest how I might track down what is happening here?
Many thanks,
Terry