tfidf and entropy
rafeena
Hi, I would like to do feature selection using TF-IDF and entropy. My questions are below:
1. Do I need to use the Generate TFIDF operator, or is the TF-IDF in the word vector enough?
2. For entropy, do I use Weight by Information Gain or Discretize by Entropy?
Accepted answers
Jeff_Mergler
Hi @rafeena,
I'm not entirely certain I understand the question, but I think I can help.
1. There are many valid ways of getting the TFIDF scores. You do not need to use any particular operator like Generate TFIDF. I think what you are looking for is a data structure with words as attributes and TFIDF scores as values. If you have that, then it does not matter how you got it. If you do not have that, then please share a sample of what you do have so we can better help.
2. Your goal is feature selection, so the Weight by Information Gain operator would work. This will score the attributes. After getting the weights you may consider using the Select by Weights operator.
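To make the two steps above concrete, here is a minimal sketch in plain Python. It does not reproduce what the Generate TFIDF, Weight by Information Gain, or Select by Weights operators do internally. The toy documents, the labels, the `top_k` value, and all function names are hypothetical and invented for illustration. It assumes a simple TF-IDF definition (relative term frequency times the natural log of inverse document frequency) and, as a simplification, computes information gain by splitting on whether a word's TF-IDF score is above zero.

```python
import math
from collections import Counter

# Hypothetical toy data (text, label), invented purely for illustration.
docs = [
    ("you are stupid and ugly", "bully"),
    ("nobody likes you stupid loser", "bully"),
    ("have a nice day", "normal"),
    ("you did a nice job", "normal"),
]

tokens = [text.split() for text, _ in docs]
labels = [label for _, label in docs]
vocab = sorted({word for doc in tokens for word in doc})
n_docs = len(docs)

# Document frequency: in how many documents each word occurs.
df = Counter(word for doc in tokens for word in set(doc))


def tfidf_row(doc):
    """One row of the table: words as attributes, TF-IDF scores as values."""
    counts = Counter(doc)
    return {
        word: (counts[word] / len(doc)) * math.log(n_docs / df[word])
        for word in vocab
    }


table = [tfidf_row(doc) for doc in tokens]


def entropy(items):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(items)
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(items).values()
    )


def information_gain(word):
    """Label entropy minus the weighted entropy after splitting on the word.

    Simplification: the split is "TF-IDF score > 0" versus "score == 0".
    """
    present = [lab for row, lab in zip(table, labels) if row[word] > 0]
    absent = [lab for row, lab in zip(table, labels) if row[word] == 0]
    remainder = sum(
        len(part) / n_docs * entropy(part)
        for part in (present, absent)
        if part
    )
    return entropy(labels) - remainder


# Score every attribute, then keep the top_k highest-weighted ones.
weights = {word: information_gain(word) for word in vocab}
top_k = 3  # hypothetical cutoff
selected = sorted(vocab, key=lambda w: weights[w], reverse=True)[:top_k]

for word in selected:
    print(word, round(weights[word], 3))
```

The point of the sketch is only the shape of the data: `table` is the "words as attributes, TF-IDF scores as values" structure, `weights` plays the role of the attribute weights, and `selected` is the result of keeping the highest-weighted attributes. On data this small, ties are common and common words can score highly, so the selection itself is not meaningful.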
Please consider sharing your process with sample data, so we can provide more precise help if you need it.
Jeff
All comments
sgenzer
Hi @rafeena,
I'm sorry no one has chimed in here. Is this still an issue? If so I can try to find someone internal who may know the answer.
Scott
rafeena
Hi. Unfortunately it is still an issue. I really want to understand the differences between the available methods. I hope you can help. Thank you.
rafeena
Hi jmergler, thanks for your reply. My project is about cyberbullying detection using a term weighting scheme, and I plan to use TF-IDF and entropy. I have attached my sample data, and I am sharing my TF-IDF process first. I have also put in a screenshot of what I did.
formspring-project training-2.xlsx
tfidf process.docx