"Is it possible to see the stopword list used by the Text Processing extension?"
Prentice
New Altair Community Member
I want to see the English word list used by the stopword operator, because I need to filter one stopword out: "no". In my case this is not a stopword; it carries crucial information. For example, "no fault found" would be converted to "fault found". I've looked online for stopword lists, but they differ a lot, so I want to see what the extension uses.
Best Answer
-
Hello @Prentice,
For your case, I think it is better to get a list of words, see which ones are stopwords for your case, and build a custom dictionary. I haven't been able to find the list of stopwords either, so that's what I do. I suspect that the list basically includes all the common words that aren't verbs, adjectives or nouns, and that the intended use case is extracting context rather than interpreting meaning.
By context I mean this: the sentences "food is good" and "food is not good", once stripped of "is" and "not", are both about food quality. Meaning is what separates "food is good" from "food is bad". I explain this because I'm not a native English speaker and this may not be how these words are normally used.
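To make the custom-dictionary idea concrete, here is a minimal Python sketch. It is not how the extension works internally, and `BASE_STOPWORDS` is a made-up starting list, not the list the extension ships with. The point is only that you start from some list, remove the words that matter in your domain, and filter with what is left.

```python
# Hypothetical starting list; NOT the list used by the extension.
BASE_STOPWORDS = {"a", "an", "the", "is", "was", "no", "not", "of", "to"}

# Words that carry meaning in this domain and must survive filtering.
KEEP = {"no"}

# The custom dictionary: the base list minus the words we want to keep.
CUSTOM_STOPWORDS = BASE_STOPWORDS - KEEP


def filter_stopwords(text, stopwords=CUSTOM_STOPWORDS):
    """Lowercase, split on whitespace, and drop tokens found in stopwords."""
    return [tok for tok in text.lower().split() if tok not in stopwords]


print(filter_stopwords("No fault found"))       # ['no', 'fault', 'found']
print(filter_stopwords("The fault was found"))  # ['fault', 'found']
```

With the unmodified base list, "No fault found" would come out as `['fault', 'found']`, which is exactly the problem described in the question.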
Hope this helps,
Rodrigo.
Answers
-
Thanks a lot. I will do this. I was just wondering whether that list might include some words I wouldn't otherwise find. But I guess this'll have to do.
-
BTW, I personally do not really use stopword filtering any more. I use frequency-based pruning instead, which delivers similar results but which I find easier to tune to keep things in (or out) as desired. That might be another idea to try.
Cheers,
Ingo
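For readers unfamiliar with frequency-based pruning, a rough Python sketch of the idea follows. It assumes pruning by document frequency with two thresholds; the function name and the `min_docs` and `max_fraction` parameters are hypothetical and do not correspond to the extension's own settings.

```python
from collections import Counter


def prune_by_document_frequency(docs, min_docs=2, max_fraction=0.75):
    """Drop tokens that occur in too few or too many documents."""
    tokenized = [doc.lower().split() for doc in docs]
    # Count in how many documents each token appears (not total occurrences).
    doc_freq = Counter(tok for toks in tokenized for tok in set(toks))
    n_docs = len(tokenized)
    allowed = {
        tok
        for tok, df in doc_freq.items()
        if df >= min_docs and df / n_docs <= max_fraction
    }
    return [[tok for tok in toks if tok in allowed] for toks in tokenized]


docs = [
    "the pump is ok no fault found",
    "the pump is leaking",
    "the valve is ok no fault found",
    "the valve is leaking",
    "the sensor is broken",
]
pruned = prune_by_document_frequency(docs)
print(pruned[0])  # ['pump', 'ok', 'no', 'fault', 'found']
print(pruned[4])  # []
```

Here "the" and "is" disappear because they occur in every document, while "no" survives because it only occurs in some of them. Rare words such as "sensor" are dropped by the lower threshold. Moving the two thresholds is what decides which words stay in or out.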