Wiki - 'Quick tour of text analysis using the Mahout command line' clarification

My question concerns the wiki page 'Quick tour of text analysis using the
Mahout command line':

https://cwiki.apache.org/confluence/display/MAHOUT/Quick+tour+of+text+analysis+using+the+Mahout+command+line

At the very bottom, the page says:

1. This will generate the 10 most similar docs to each doc in the
collection.

1. Examine the similarity list:
mahout seqdumper -i reuters-matrix/matrix | more

Instead of reuters-matrix/matrix, shouldn't it be
reuters-similarity/part-r-00000, since that is the output file of
rowsimilarity? Or does the rowsimilarity tool also write to reuters-matrix/?

I would expect that file to contain the 10 most similar documents for every
document in the Reuters collection. Is that correct?

Many thanks.
Juanjo.
