Filter Stopwords with Regular Expression
Anna_May1
Hi guys,
I'm currently doing a sentiment analysis in RapidMiner with k-NN. I want to count the number of words left in a document after removing stopwords. The "Filter Stopwords" operator inside the "Process Documents from Data" operator only works if I tokenize the data and apply the "Nominal to Text" operator first. The issue is that the output then looks like the image below. I want to be able to count the words that remain after the stopwords are removed, so I wonder whether there is a regular expression that could be used inside a "Replace" operator or similar to remove only the stopwords, without tokenizing the text.
Cheers!
jacobcybulski
@Anna_May1
I am unable to see the image, as you have not attached it. However, it would be much easier to deal with stop words, or to count words, after you tokenise the text. For example, you can have two streams of text processing, one with and one without stop words, then count the tokens in both and take the difference. In fact, when your text is represented by frequency, the counting is very simple: just add up those frequencies within columns.
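To make the two-stream idea concrete outside of RapidMiner, here is a minimal Python sketch. It is only an illustration, not a RapidMiner process: the `STOPWORDS` set is a tiny made-up list (the built-in English list is much longer), and the function names `tokenize`, `count_tokens` and `strip_stopwords` are hypothetical. It counts tokens with and without stop words, and also shows the kind of whole-word regular expression the question asks about.

```python
import re

# Hypothetical stop word list, for illustration only.
STOPWORDS = {"i", "the", "a", "is", "and", "this", "it", "to", "of"}

# One alternation of all stop words, matched as whole words, ignoring case.
STOPWORD_RE = re.compile(
    r"\b(?:" + "|".join(map(re.escape, sorted(STOPWORDS))) + r")\b",
    flags=re.IGNORECASE,
)


def tokenize(text):
    """Lower-case the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def count_tokens(text):
    """Return (tokens with stop words, tokens without stop words)."""
    tokens = tokenize(text)
    kept = [t for t in tokens if t not in STOPWORDS]
    return len(tokens), len(kept)


def strip_stopwords(text):
    """Remove stop words with the regex only, then tidy the whitespace."""
    stripped = STOPWORD_RE.sub(" ", text)
    return " ".join(stripped.split())


sentence = "This movie is great and I love it"
total, remaining = count_tokens(sentence)
print(total, remaining, total - remaining)
print(strip_stopwords(sentence))
```

The difference `total - remaining` is the number of stop words removed, and `remaining` is the count the question is after. A pattern of the same shape could perhaps be tried in a regex-based replace step, but I have not tested that in RapidMiner, and tokenising first is still the simpler route.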