entropy
rafeena
If I would like to calculate the entropy for each word, what should I set my word vector to during preprocessing? It would not be advisable to set it to TFIDF, right?
Telcontar120
Can you clarify what you mean by calculating the entropy of each word? Vectorization is simple, unsupervised preprocessing of text, whereas entropy is usually calculated with respect to a label. So there is no built-in vector metric that would supply anything like a conventional entropy measure. If you are asking which vector to use so that you can calculate entropy later, then I would suggest simple term occurrences, since that is merely a count of all instances of a given token in a given document.
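To make "entropy with respect to a label" concrete, here is a minimal Python sketch (not RapidMiner code). It assumes a hypothetical toy corpus with spam/ham labels, whitespace tokenization, and a split on whether a token occurs at all in a document. The function names are illustrative only, and this is just one common way to score a word by entropy.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def term_occurrences(doc):
    """Count every instance of each whitespace-separated token in one document."""
    return Counter(doc.lower().split())

def information_gain(docs, labels, token):
    """Reduction in label entropy when documents are split on whether `token` occurs."""
    present = [lab for doc, lab in zip(docs, labels) if term_occurrences(doc)[token] > 0]
    absent = [lab for doc, lab in zip(docs, labels) if term_occurrences(doc)[token] == 0]
    n = len(labels)
    conditional = (len(present) / n) * entropy(present) + (len(absent) / n) * entropy(absent)
    return entropy(labels) - conditional

# Hypothetical toy corpus and labels, for illustration only.
docs = ["cheap pills cheap", "meeting at noon", "cheap meeting room", "pills for sale"]
labels = ["spam", "ham", "ham", "spam"]

for token in ["pills", "cheap", "meeting"]:
    print(token, information_gain(docs, labels, token))
```

In this toy data, "pills" and "meeting" each separate the two classes perfectly, while "cheap" appears in one document of each class and so tells you nothing about the label.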
rafeena
I would like to use entropy and TFIDF as my feature selection methods. Will it affect the entropy result if I set the word vector to TFIDF?
Telcontar120
In that case, yes, it will affect entropy because the calculation of TFIDF is not simply a linear transformation of frequency. It is impossible to say in advance which would give you better results. As I mentioned before, I would probably start with term occurrences first since that is more representative of the data in its raw form. RapidMiner will allow you to easily do it both ways and compare the results!
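A rough Python sketch of why the choice of vector can matter (not RapidMiner code). It rests on several assumptions: a hypothetical three-document corpus, a TF-IDF variant that uses relative term frequency, `log(N / df)`, and per-document L2 normalization (which may not match what RapidMiner computes), and "entropy of a word" taken to mean the Shannon entropy of that word's weights spread across documents. Under those assumptions, the same word gets a different entropy from the two representations.

```python
import math
from collections import Counter

def term_occurrences(docs):
    """One Counter of raw token counts per document (whitespace tokenization)."""
    return [Counter(doc.lower().split()) for doc in docs]

def tfidf(docs):
    """Assumed TF-IDF variant: relative TF * log(N / df), then L2-normalized per document."""
    counts = term_occurrences(docs)
    n_docs = len(docs)
    doc_freq = Counter(token for c in counts for token in c)
    vectors = []
    for c in counts:
        length = sum(c.values())
        weights = {t: (k / length) * math.log(n_docs / doc_freq[t]) for t, k in c.items()}
        norm = math.sqrt(sum(w * w for w in weights.values()))
        vectors.append({t: (w / norm if norm > 0 else 0.0) for t, w in weights.items()})
    return vectors

def column_entropy(vectors, token):
    """Shannon entropy (bits) of one token's weights, normalized to sum to 1 across documents."""
    values = [v.get(token, 0.0) for v in vectors]
    total = sum(values)
    if total == 0:
        return 0.0
    return -sum((x / total) * math.log2(x / total) for x in values if x > 0)

# Hypothetical toy corpus, for illustration only.
docs = ["cheap cheap pills", "cheap meeting room today now", "meeting at noon"]

h_counts = column_entropy(term_occurrences(docs), "cheap")
h_tfidf = column_entropy(tfidf(docs), "cheap")
print("term occurrences:", h_counts)
print("TF-IDF:", h_tfidf)
```

The per-document normalization rescales each document by a different factor, so the word's weights are no longer proportional to its raw counts. That is one way the transformation ends up being non-linear in frequency.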