"How to filter out a text so as to keep only words given in a list of words"
barthos
Hello,
I would like to filter a text so that the operator keeps only the words that are present in a given list (or, equivalently, removes all words that are not in the list). Ideally, the Stopword by dictionary operator with an "invert selection" option would be perfect.
As a side question, I would like to know the purpose of the "wor" entry (I guess it means word) in the Process_Document_from_Data operator.
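What is being asked for is an inverted stopword filter: a token is kept only if it appears in the list. A minimal Python sketch of that idea outside RapidMiner, assuming a simple regex tokenizer and case-insensitive matching by default (both are assumptions; `keep_only_listed_words` is a hypothetical helper, not a RapidMiner operator):

```python
import re

def keep_only_listed_words(text, word_list, case_sensitive=False):
    """Return the text reduced to the tokens that appear in word_list.

    This is the inverse of a stopword filter: listed words are kept,
    everything else is dropped.
    """
    # Assumed tokenizer: runs of word characters and apostrophes (Unicode-aware).
    tokens = re.findall(r"[\w']+", text)
    if case_sensitive:
        allowed = set(word_list)
        kept = [t for t in tokens if t in allowed]
    else:
        allowed = {w.lower() for w in word_list}
        kept = [t for t in tokens if t.lower() in allowed]
    # Original spelling of each kept token is preserved.
    return " ".join(kept)
```

For example, `keep_only_listed_words("The quick brown fox jumps over the lazy dog", ["quick", "fox", "dog"])` returns `"quick fox dog"`.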
Thanks,
Barthélémy
Tags: AI Studio, Text Mining + NLP, Filtering
colo
Hi Barthélémy,
when I read your post I remembered a similar question posted some time ago. You can find it here:
http://rapid-i.com/rapidforum/index.php/topic,3493.0.html
(Did you search for it first?) But don't expect a fully satisfying solution there. I don't know whether the developers have something new at hand today...
What "wor" entry do you mean? The input port of the operator?
Regards
Matthias
land
Hi,
if you have a word list and want to count only the words that are in it, you can simply forward the word list to the "wor" input port of the Process Documents operator. Only then is it guaranteed that new texts get the same representation as during training! If you don't do this, the set of words can differ and the TF-IDF calculation will be different.
If you need the filtered text itself rather than a filtered TF-IDF representation, there is unfortunately no way to do that yet. You could raise a feature request in our bug tracker for that.
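To illustrate why a fixed word list keeps the representation stable, here is a minimal Python sketch of the general idea (not RapidMiner's implementation). The function names are hypothetical, the whitespace tokenizer is an assumption, and reusing the training IDF weights for new documents is one common choice, not necessarily what Process Documents does internally:

```python
import math
from collections import Counter

def tokenize(text):
    # Assumed tokenizer: lower-case and split on whitespace.
    return text.lower().split()

def fit_word_list(train_docs):
    """Build the word list (vocabulary) and IDF weights from training documents."""
    n = len(train_docs)
    # Document frequency: in how many training documents each word occurs.
    df = Counter(word for doc in train_docs for word in set(tokenize(doc)))
    vocabulary = sorted(df)
    idf = {w: math.log(n / df[w]) for w in vocabulary}
    return vocabulary, idf

def tfidf(docs, vocabulary, idf):
    """One TF-IDF vector per document, one column per word-list entry.

    Words outside the word list are ignored, so every vector has exactly
    the same columns as during training.
    """
    vectors = []
    for doc in docs:
        tokens = [t for t in tokenize(doc) if t in idf]
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid division by zero if no known words
        vectors.append([counts[w] / total * idf[w] for w in vocabulary])
    return vectors

train = ["the cat sat", "the dog sat", "a cat ran"]
vocab, idf = fit_word_list(train)
new_vectors = tfidf(["the cat chased a mouse"], vocab, idf)
```

Here `vocab` is `['a', 'cat', 'dog', 'ran', 'sat', 'the']` and the new document gets a vector with those same six columns ("chased" and "mouse" are dropped). Building the word list from the new document alone would instead give the columns `['a', 'cat', 'chased', 'mouse', 'the']`, which is the mismatch described above.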
With kind regards,
Sebastian
barthos
Thanks a lot!
However, I've tried to make a list of words to pass to the "wor" entry, but it looks like I haven't found the way to do it. Is there a special operator to transform documents or an example set into a list of words?
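Outside RapidMiner, turning a table of words into a word list amounts to collecting the distinct values of one column. A minimal Python sketch, assuming a CSV export with a hypothetical `word` column and case-insensitive deduplication; it does not describe any RapidMiner operator:

```python
import csv
import io

def word_list_from_column(csv_text, column="word"):
    """Collect the distinct, lower-cased values of one CSV column in first-seen order."""
    seen = {}  # dict keys keep insertion order and act as a set
    for row in csv.DictReader(io.StringIO(csv_text)):
        word = row[column].strip().lower()
        if word:  # skip empty cells
            seen.setdefault(word, None)
    return list(seen)
```

For example, `word_list_from_column("word\nFox\ndog\nfox\n")` returns `["fox", "dog"]`.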
Thanks again,
Barthélémy