"Sentiment Analysis - Choosing the right process"
Kostas
New Altair Community Member
Dear All,
I am very new to this wonderful world of data mining, and I have to say I am more than impressed. I will try to sum up my problem in a few words:
I have an Excel file with two columns: column A contains phrases (text) expressing opinions on a certain matter, while column B contains the character n or p, depending on whether the sentiment of the phrase is negative or positive. Obviously, the p and n labels were entered manually by me.
(e.g. "The site is very helpful" -> p / "All this is awful" -> n)
What I want to do is use this file as a training set to learn a model that can then be applied to other data (i.e. similar phrases expressing opinions on a specific matter). What I need to know is which operators to use to build the required process.
Really counting on your support and thanking you in advance,
Kind Regards
Answers
-
Look here: http://rapid-i.com/rapidforum/index.php/topic,3488.0.html or watch the video tutorial at http://vancouverdata.blogspot.com/
-
Andk, thank you very much for your response. The thing is that I had already checked out your post, and although I can see the similarity, my problem is still quite different. I will try to explain better what I am aiming for. I have an Excel sheet containing two columns, as follows:
Column A (EXPRESSION)              Column B (POLARITY)
I am sick with this situation      n
They are idiots and incapable      n
this is extremely useful           p
it's getting worse everyday        n
I believe it is a good step        p
...                                ...
Once again, p stands for a positive and n for a negative attitude expressed in the short phrases of column A.
My question is which operator should be used to create a model that learns from an Excel sheet such as the one above.
The model will then be applied to an Excel sheet containing only phrases, without the sentiment (POLARITY) column.
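The two inputs described above (a labeled sheet to learn from and an unlabeled sheet to apply the model to) can be pictured as follows. This is a minimal sketch, assuming both sheets have been saved as CSV files with a header row (`EXPRESSION`, plus `POLARITY` only in the labeled one); the CSV export step and the helper `load_phrases` are hypothetical, not RapidMiner operators. Reading `.xlsx` files directly would need a third-party library, so the CSV route keeps the example to the standard library.

```python
import csv
import io
from typing import List, Optional, Tuple

# Assumption: each sheet was exported to CSV with a header row;
# load_phrases is a hypothetical helper, not a RapidMiner operator.
def load_phrases(csv_text: str) -> List[Tuple[str, Optional[str]]]:
    """Return (EXPRESSION, POLARITY) pairs; POLARITY is None when the column is absent."""
    rows: List[Tuple[str, Optional[str]]] = []
    for record in csv.DictReader(io.StringIO(csv_text)):
        label = record.get("POLARITY")  # missing entirely in the unlabeled sheet
        rows.append((record["EXPRESSION"].strip(), label.strip() if label else None))
    return rows

# Labeled sheet: the example rows from this post.
training_csv = """EXPRESSION,POLARITY
I am sick with this situation,n
They are idiots and incapable,n
this is extremely useful,p
it's getting worse everyday,n
I believe it is a good step,p
"""

# Unlabeled sheet: phrases only (made-up example row).
unlabeled_csv = """EXPRESSION
this is really useful
"""

train = load_phrases(training_csv)
new = load_phrases(unlabeled_csv)
print(train[0])  # ('I am sick with this situation', 'n')
print(new[0])    # ('this is really useful', None)
```

Keeping the labeled and unlabeled rows in the same (phrase, label) shape means the same text preprocessing can later be applied to both.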
Anyone with a piece of advice is more than welcome.
Thank you
-
Hey Kostas,
This is essentially the same as video 5 in the text analytics series on my blog.
You are trying to classify those phrases as negative or positive. This is classification with two classes.
You'll want to create a word vector, with a column for each unique word or n-gram, and then use a classifier such as an SVM to learn the model.
-
Hi there Neil,
As you may have seen, I was so keen to find a solution that I sent you an email as well.
I have to say that your hints were very helpful, and I am now closer to what I am aiming for. I am really thankful.
I will try a few things and come back if any further questions arise.
Of course, anyone else's approach to the problem is welcome, and I look forward to reading it.
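The approach suggested in this thread (a word vector with one column per unique word or n-gram, fed to an SVM) can be sketched in plain Python. This is only an illustration under stated assumptions, not the RapidMiner process and not the blog's exact method: the tokenizer, the binary term weighting, the helper names (`features`, `build_vocab`, `vectorize`, `train_linear_svm`, `predict`), and the settings `lam=0.01` and `epochs=200` are all hypothetical choices. The SVM is a linear one without a bias term, trained with the Pegasos stochastic subgradient method as a simple swapped-in solver; RapidMiner's own SVM operators use different solvers. The five training phrases are the ones posted above, and the unlabeled phrase is made up.

```python
import random
import re
from typing import Dict, List

# Sketch only: helper names, tokenizer and hyperparameters are hypothetical.

def features(phrase: str) -> List[str]:
    """Lowercased word unigrams plus adjacent-word bigrams (n-grams up to 2)."""
    words = re.findall(r"[a-z']+", phrase.lower())
    return words + [a + "_" + b for a, b in zip(words, words[1:])]

def build_vocab(phrases: List[str]) -> Dict[str, int]:
    """Give each unique training feature its own column index."""
    vocab: Dict[str, int] = {}
    for phrase in phrases:
        for f in features(phrase):
            vocab.setdefault(f, len(vocab))
    return vocab

def vectorize(phrase: str, vocab: Dict[str, int]) -> List[float]:
    """Binary word vector: 1.0 where a known feature occurs; unseen features are ignored."""
    x = [0.0] * len(vocab)
    for f in features(phrase):
        if f in vocab:
            x[vocab[f]] = 1.0
    return x

def dot(a: List[float], b: List[float]) -> float:
    return sum(ai * bi for ai, bi in zip(a, b))

def train_linear_svm(X: List[List[float]], y: List[int],
                     lam: float = 0.01, epochs: int = 200, seed: int = 0) -> List[float]:
    """Pegasos: stochastic subgradient descent on the L2-regularized hinge loss (no bias)."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    order = list(range(len(X)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * dot(w, X[i])
            w = [(1.0 - eta * lam) * wj for wj in w]  # shrink step from the L2 penalty
            if margin < 1.0:  # hinge loss active: move w toward y[i] * X[i]
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w

def predict(phrase: str, vocab: Dict[str, int], w: List[float]) -> str:
    """Map the sign of the SVM score back to the thread's p/n labels."""
    return "p" if dot(w, vectorize(phrase, vocab)) > 0 else "n"

# Training rows from the thread: (EXPRESSION, POLARITY).
training = [
    ("I am sick with this situation", "n"),
    ("They are idiots and incapable", "n"),
    ("this is extremely useful", "p"),
    ("it's getting worse everyday", "n"),
    ("I believe it is a good step", "p"),
]
vocab = build_vocab([text for text, _ in training])
X = [vectorize(text, vocab) for text, _ in training]
y = [1 if label == "p" else -1 for _, label in training]
w = train_linear_svm(X, y)

# Apply the model to a made-up unlabeled phrase.
print(predict("this is really useful", vocab, w))  # p
```

With only five training phrases the model can only react to the handful of words it has seen, so predictions on unfamiliar wording are close to arbitrary; a realistic training sheet needs many more labeled phrases. Adding bigrams as extra columns lets the classifier pick up short word combinations (for example a negation followed by an adjective) that single-word columns miss.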