Annotation based vectorizer

Hi all,

I put together a utility which vectorizes plain old Java objects annotated
with @Feature and @Target via Mahout's vector encoders.

See my Github branch:
https://github.com/frankscholten/mahout/tree/annotation-based-vectorizer

and the unit test:
https://github.com/frankscholten/mahout/blob/annotation-based-vectorizer/core/src/test/java/org/apache/mahout/classifier/sgd/AnnotationBasedVectorizerTest.java

Use it like this:

class NewsgroupPost {

  @Target
  private String newsgroup;

  @Feature(encoder = TextValueEncoder.class)
  private String bodyText; // field name is illustrative

  // Getters and setters
}

AnnotationBasedVectorizer<NewsgroupPost> vectorizer =
    new AnnotationBasedVectorizer<NewsgroupPost>(new TypeReference<NewsgroupPost>(){});

Here the vectorizer scans the NewsgroupPost's annotations. Then you can do
this:

NewsgroupPost post = ...

Vector vector = vectorizer.vectorize(post);
int target = vectorizer.getTarget(post);
int numFeatures = vectorizer.getNumberOfFeatures();

Note that the vectorize() and getTarget() methods are generically typed:
thanks to the type token passed to the constructor, only NewsgroupPost
instances are accepted.
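
To make the scanning step concrete, here is a minimal, self-contained sketch of
the general technique using plain reflection. It is not the code in the branch:
SketchFeature, SketchTarget, SketchPost and SketchVectorizer are hypothetical
stand-ins, a Class<T> token replaces the TypeReference, and a simple hashed
bag-of-words over a double[] replaces Mahout's encoders and Vector.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for @Feature and @Target; must be retained at runtime.
@Retention(RetentionPolicy.RUNTIME)
@java.lang.annotation.Target(ElementType.FIELD)
@interface SketchFeature {}

@Retention(RetentionPolicy.RUNTIME)
@java.lang.annotation.Target(ElementType.FIELD)
@interface SketchTarget {}

// Example annotated POJO (assumed shape, for illustration only).
final class SketchPost {
    @SketchTarget private final String newsgroup;
    @SketchFeature private final String bodyText;

    SketchPost(String newsgroup, String bodyText) {
        this.newsgroup = newsgroup;
        this.bodyText = bodyText;
    }
}

final class SketchVectorizer<T> {
    private final List<Field> featureFields = new ArrayList<>();
    private final Field targetField;
    private final int numFeatures;

    // Scans the annotated fields once, at construction time.
    SketchVectorizer(Class<T> type, int numFeatures) {
        this.numFeatures = numFeatures;
        Field target = null;
        for (Field f : type.getDeclaredFields()) {
            if (f.isAnnotationPresent(SketchFeature.class)) {
                f.setAccessible(true);
                featureFields.add(f);
            }
            if (f.isAnnotationPresent(SketchTarget.class)) {
                f.setAccessible(true);
                target = f;
            }
        }
        if (target == null) {
            throw new IllegalArgumentException("No target field on " + type.getName());
        }
        this.targetField = target;
    }

    // Hashes each whitespace-separated token of every feature field into a fixed-size array.
    double[] vectorize(T obj) {
        double[] v = new double[numFeatures];
        try {
            for (Field f : featureFields) {
                Object value = f.get(obj);
                if (value == null) continue;
                for (String token : value.toString().toLowerCase().split("\\s+")) {
                    if (token.isEmpty()) continue;
                    int index = Math.floorMod((f.getName() + ":" + token).hashCode(), numFeatures);
                    v[index] += 1.0;
                }
            }
        } catch (IllegalAccessException e) {
            throw new IllegalStateException(e);
        }
        return v;
    }

    // Returns the raw target label; mapping it to an int is a separate step.
    String targetLabel(T obj) {
        try {
            return String.valueOf(targetField.get(obj));
        } catch (IllegalAccessException e) {
            throw new IllegalStateException(e);
        }
    }

    int getNumberOfFeatures() {
        return numFeatures;
    }
}
```

Because the class is parameterized on T, a call such as
new SketchVectorizer<>(SketchPost.class, 16).vectorize("some string") does not
compile, which is the type-safety property described above.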

The vectorizer uses a Dictionary for encoding the target.
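
For readers unfamiliar with the idea, a dictionary here is a mapping from each
distinct target label to a stable integer code. The sketch below shows one way
that can work, assuming codes are assigned in first-seen order; the class
TargetDictionarySketch is hypothetical and is not the Dictionary used in the
branch.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical label-to-int dictionary; codes are handed out in first-seen order.
final class TargetDictionarySketch {
    private final Map<String, Integer> codes = new LinkedHashMap<>();

    // Returns the existing code for a label, or assigns the next free one.
    int intern(String label) {
        Integer code = codes.get(label);
        if (code == null) {
            code = codes.size();
            codes.put(label, code);
        }
        return code;
    }

    // Number of distinct labels seen so far, i.e. the number of target categories.
    int size() {
        return codes.size();
    }
}
```

With this scheme the same newsgroup name always maps to the same int, so the
result can be passed to a classifier as the target category.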

Thoughts?

Cheers,

Frank
