Channel: Apache Timeline

unexpected results in seqdump of reuters-matrix in quick tour of text analysis

All,

I am a newbie Mahout user and am trying to follow the "Quick tour of text
analysis using the Mahout command line". Thank you to whoever contributed
to that page.

I went through the page from beginning to end with "seemingly" no
hiccups.
At the very end of the "tour", I became confused. The command:

let me see this output (snippet):

Reading through that snippet made me think that there is a document
with rowid 41154 and a cosine value of ~0.0658 (the last element in
the snippet).

The problem is that the folder

only has 21578 files in it. Indeed, my dictionary file (the output command
I used is shown below)

has a max key of

So I cannot find the document with key value 41154. What does 41154
relate to?
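The reasoning above (an index of 41154 cannot be a document row id when the collection has only 21578 documents) can be checked mechanically. The sketch below is a minimal illustration under stated assumptions: it assumes seqdump prints each vector value as `{index:value,...}` and that row ids run from 0 to (number of documents - 1). `parse_vector` and `out_of_range` are hypothetical helpers, not part of Mahout.

```python
def parse_vector(text: str) -> dict[int, float]:
    """Parse a vector printed as "{index:value,...}" (assumed seqdump format)."""
    body = text.strip().strip("{}")
    entries: dict[int, float] = {}
    for pair in body.split(","):
        if not pair.strip():
            continue  # skip empty vectors such as "{}"
        idx, val = pair.split(":")
        entries[int(idx)] = float(val)
    return entries


def out_of_range(entries: dict[int, float], num_docs: int) -> list[int]:
    """Return indices that cannot be row ids if rows are numbered 0..num_docs-1."""
    return sorted(i for i in entries if i >= num_docs)


if __name__ == "__main__":
    # Uses only the values quoted in the post: index 41154, cosine ~0.0658,
    # and 21578 files in the input folder.
    vec = parse_vector("{41154:0.0658}")
    print(out_of_range(vec, 21578))
```

If any index is reported, it points to something other than a document row (for example, a different id space than the one the dictionary or the input folder uses).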

Obviously I have misunderstood something that I did (or need to do) in the
tour. Can someone please shine a light on where I strayed? I have scripted
every step that I took and can share them here if desired (I noticed that
some of the output file names changed since the page was written, so I made
adjustments).

Regards,

Scott

PS: Thanks, TD, for helping me earlier.
