On the wiki page 'Quick tour of text analysis using the Mahout command
line'
https://cwiki.apache.org/confluence/display/MAHOUT/Quick+tour+of+text+analysis+using+the+Mahout+command+line
the very bottom says:
1. This will generate the 10 most similar docs to each doc in the
collection.
1. Examine the similarity list:
mahout seqdumper -i reuters-matrix/matrix | more
Instead of reuters-matrix/matrix, shouldn't it be reuters-similarity/part-r-00000,
since that is the output file of rowsimilarity? Or, on the contrary, does the
rowsimilarity tool also write to reuters-matrix/?
I would expect that file to contain the 10 most similar documents for every
document in the Reuters collection. Is that correct?
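To make my expectation concrete, here is a minimal in-memory sketch of what I understand "the 10 most similar docs to each doc" to mean. It is not how Mahout computes it. The function name top_k_similar and the dict-of-dicts layout (document id -> {other id: similarity score}) are my own hypothetical stand-ins for one row of the rowsimilarity output.

```python
import heapq


def top_k_similar(similarities, k=10):
    """Return, for each document id, its k most similar other documents.

    `similarities` is a hypothetical in-memory layout, not Mahout's format:
    it maps a document id to a dict {other_id: similarity_score}.
    """
    result = {}
    for doc, row in similarities.items():
        # Skip the document's similarity to itself, then keep the k
        # highest-scoring neighbours (ties are broken by the larger id).
        candidates = ((score, other) for other, score in row.items() if other != doc)
        best = heapq.nlargest(k, candidates)
        result[doc] = [(other, score) for score, other in best]
    return result


if __name__ == "__main__":
    # Toy similarity rows for four documents; k=2 instead of 10 for brevity.
    sims = {
        0: {1: 0.9, 2: 0.1, 3: 0.5},
        1: {0: 0.9, 2: 0.3},
        2: {0: 0.1, 1: 0.3, 3: 0.7},
        3: {0: 0.5, 2: 0.7},
    }
    print(top_k_similar(sims, k=2))
```

So for every document I would expect one row listing at most 10 neighbours, ordered by decreasing similarity.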
Many thanks.
Juanjo.