Channel: Apache Timeline

Naive Bayes Classifier Sentiment Analysis

Hi, I am trying to develop sentiment analysis for Italian tweets from Twitter using the naive Bayes classifier, but I'm having some trouble.

My idea was to classify a large number of tweets as positive, negative, or neutral, and use them as the training set for the classifier. To do that I wrote a sequence file in the format <Text,Text>, where the key is /label/tweetID and the value is the tweet text. The text of the whole dataset is then converted to TF-IDF vectors using the Mahout utilities.
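
To make the record layout concrete, here is a minimal sketch of the key/value pairs described above. It is plain Python and does not call Mahout or Hadoop; the helper names (make_key, parse_key, make_records) are hypothetical and exist only for illustration.

```python
# Hypothetical helpers illustrating the <Text,Text> layout from the post:
# key = /label/tweetID, value = tweet text. Nothing here touches Mahout.

def make_key(label, tweet_id):
    """Build a key of the form /label/tweetID."""
    if "/" in label or "/" in tweet_id:
        raise ValueError("label and tweet ID must not contain '/'")
    return "/{}/{}".format(label, tweet_id)


def parse_key(key):
    """Split a /label/tweetID key back into (label, tweet_id)."""
    parts = key.split("/")
    # A well-formed key splits into ['', label, tweet_id].
    if len(parts) != 3 or parts[0] != "" or not parts[1] or not parts[2]:
        raise ValueError("key is not of the form /label/tweetID: " + repr(key))
    return parts[1], parts[2]


def make_records(tweets):
    """Turn (label, tweet_id, text) triples into (key, value) pairs."""
    return [(make_key(label, tweet_id), text) for label, tweet_id, text in tweets]
```

Each resulting pair corresponds to one record of the sequence file; writing the file itself would still be done with the Hadoop or Mahout tooling.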

Then I run the commands:

./mahout trainnb and ./mahout testnb to check the classifier, and the score looks right (I get nearly 100% because the test set is the same as the training set).

My question is: if I want to use an unlabeled test set, how should it be created? If the key isn't in the format:

key = /label/ the classifier can't find the label and I get an exception.

But a new dataset will obviously be unlabeled, because classifying it is the whole point, so I don't know what to put in the key of the sequence file.
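
One possible way to keep the /label/tweetID key shape for unlabeled tweets is to put a placeholder where the label would go. This is only a sketch under assumptions: the placeholder name "unknown" and the helper make_unlabeled_records are made up for illustration, and whether testnb accepts a label that was never seen during training is not established here, so treat it as untested.

```python
# Assumption: "unknown" is an arbitrary placeholder chosen for this sketch,
# not a label that Mahout defines or is known to accept.
PLACEHOLDER_LABEL = "unknown"


def make_unlabeled_records(tweets):
    """Turn (tweet_id, text) pairs into (key, value) pairs with a placeholder label."""
    return [
        ("/{}/{}".format(PLACEHOLDER_LABEL, tweet_id), text)
        for tweet_id, text in tweets
    ]
```

With this layout, any accuracy figure computed against the placeholder would be meaningless; only the per-tweet predicted label would be of interest.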

I've searched online for examples, but the only ones I've found use the split command on the original dataset and then test on part of it, which isn't my case.

Any ideas for building a better sentiment analysis are welcome. Thanks in advance for the help.
