Filter Stopwords operator's result
Mohamad1367
Hi. I want to see the result of filtering stopwords in my dataset after applying this operator, but in the Results view I only get a collection of documents... I put this operator (Filter Stopwords) inside a Loop Collection operator. What should I do to solve this problem?
lionelderkrikor
Hi @Mohamad1367,
This is normal behavior. You have to select an element of the collection to see, for that selected document, the result of applying the Filter Stopwords operator.
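The behavior can be pictured with a small plain-Python sketch (not RapidMiner's API; the stopword set is a hypothetical four-word sample). Applying a filter to each element of a collection, as Loop Collection does, gives back a collection of the same size, so a single filtered document only becomes visible once one element is selected.

```python
# Hypothetical stopword set; in RapidMiner this would come from the
# dictionary file configured in the stopword filter operator.
STOPWORDS = {"و", "در", "به", "از"}

def filter_stopwords(tokens):
    """Keep only the tokens that are not in the stopword set."""
    return [t for t in tokens if t not in STOPWORDS]

def loop_collection(documents, operator):
    """Apply `operator` to every document, like Loop Collection,
    and return a new collection (a list) of the same length."""
    return [operator(doc) for doc in documents]

collection = [
    ["این", "فیلم", "خیلی", "خوب", "بود"],
    ["من", "از", "این", "کتاب", "لذت", "بردم"],
]

filtered = loop_collection(collection, filter_stopwords)
# The result is still a collection; select one element to inspect it.
print(filtered[1])
```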
But I guess your final goal is not just to see your document after applying the Filter Stopwords operator, right?
So it would be more useful to share your data (presumably the example set called "test") and to describe explicitly what you ultimately want to do.
This way we could help you more efficiently...
Regards,
Lionel
Mohamad1367
Thanks for your response, @lionelderkrikor.
Here is what I want to achieve: I have a dataset in Persian for sentiment analysis. Each row has a sentiment label; for example, label=5 means the sentence is very positive.
I want to apply some text preprocessing steps such as tokenization, stopword filtering, stemming, etc.
For tokenization I installed the Rosette extension, which supports Persian.
I am sharing my dataset here. Which operators should I use to achieve this goal, and in what order?
test.xlsx
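The sequence asked about here (tokenize, then filter stopwords, then stem, while keeping each row's label) can be sketched in plain Python. This is a toy illustration, not RapidMiner or Rosette: the whitespace/punctuation tokenizer stands in for Rosette, and both the stopword set and the two-entry suffix list (`SUFFIXES`) are hypothetical placeholders, far from a real Persian stemmer.

```python
import re

# Hypothetical stopword set for illustration only.
STOPWORDS = {"و", "در", "به", "از", "این", "که"}
# Hypothetical, deliberately tiny suffix list; longer suffixes come first
# so "های" is tried before "ها".
SUFFIXES = ("های", "ها")
ZWNJ = "\u200c"  # zero-width non-joiner, common before Persian suffixes

def tokenize(text):
    """Split on whitespace and common Latin/Persian punctuation."""
    return [t for t in re.split(r"[\s.,!?،؟]+", text) if t]

def stem(token, min_len=2):
    """Strip the first matching suffix if a stem of min_len chars remains."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= min_len:
            return token[: -len(suffix)].rstrip(ZWNJ)
    return token

def preprocess(text):
    tokens = tokenize(text)                               # 1. tokenization
    tokens = [t for t in tokens if t not in STOPWORDS]    # 2. stopword filtering
    return [stem(t) for t in tokens]                      # 3. stemming

# (text, sentiment label) pairs; the label is carried along unchanged.
rows = [("این کتاب\u200cها خیلی خوب بودند", 5), ("فیلم در کل بد بود", 1)]
for text, label in rows:
    print(label, preprocess(text))
```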
lionelderkrikor
@Mohamad1367,
Unfortunately, I'm not aware of any stopword filter, stemming operator, etc. for Persian in the Rosette extension or in RapidMiner.
You could take a look at this Python text processing library:
https://github.com/sobhe/hazm
Regards,
Lionel
sara20
@Mohamad1367
Hello,
There are some good posts about Persian text mining, and there is also a Persian stopword list for RM. I recommend searching the community; you can find a lot of useful posts on this topic.
Best regards
sara
lionelderkrikor
@Mohamad1367, @sara20 is right: there are resources for Persian text processing, including a stopwords dictionary. (Sorry for my previous post, I had not checked the community site...)
In particular, look at this thread, including a @sgenzer post which explains where to find a dictionary for Persian stopwords:
https://community.rapidminer.com/discussion/55161/persian-dictionary
Hope this helps,
Regards,
Lionel
sara20
There is also another stopword list here:
stopword-per.txt
Mohamad1367
@sara20 @lionelderkrikor
Thanks for your responses. I do have a Persian stopword dictionary; I forgot to upload it in my previous comment. My problem is that when I apply the stopword filter operator to my dataset, I want to see the filtered result in the Results view, but I can't.
I use the Rosette extension only for tokenization. For the other tasks, such as stemming and stopword filtering, I use the Text Processing extension, which is language-independent and only needs a dictionary.
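A dictionary-based stopword filter is language-independent because it only compares tokens against a word list. Below is a minimal sketch of that idea in plain Python. It assumes a one-word-per-line, UTF-8 dictionary file; a file like stopword-per.txt may differ, and the helper names are hypothetical. Reading with `utf-8-sig` is a deliberate choice: a byte-order mark left by some editors would otherwise stick to the first word and stop it from ever matching.

```python
import os
import tempfile

def load_stopwords(path):
    """Read a stopword dictionary with one word per line.

    'utf-8-sig' drops a leading byte-order mark, which would otherwise
    stick to the first word and stop it from ever matching.
    """
    with open(path, encoding="utf-8-sig") as f:
        return {line.strip() for line in f if line.strip()}

def filter_stopwords(tokens, stopwords):
    """Drop every token that appears in the dictionary (exact match)."""
    return [t for t in tokens if t not in stopwords]

# Demo: write a tiny dictionary file (with a BOM) to a temporary location.
with tempfile.NamedTemporaryFile("w", encoding="utf-8-sig", suffix=".txt",
                                 delete=False) as tmp:
    tmp.write("از\nبه\nدر\n")
    path = tmp.name

stopwords = load_stopwords(path)
os.remove(path)

result = filter_stopwords(["من", "از", "این", "کتاب", "لذت", "بردم"], stopwords)
print(result)
```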
lionelderkrikor
@Mohamad1367,
Based on your dataset, I think I understand what you want to achieve: ultimately, you want to create a model for sentiment classification, right?
In this case, you will need the Process Documents from Data operator, and you should put all your text processing steps (Tokenize (again), Filter Stopwords) INSIDE this operator.
Please check the process in the attached file. At the output of this process you will see a word vector with the (Persian) stopwords filtered out. (Don't forget to set the path where your stopwords dictionary file is stored...)
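Process Documents from Data turns each row's text into a word vector: an example set with one attribute per remaining word. Here is a rough plain-Python picture of that idea, assuming the documents are already tokenized and stopword-filtered. The operator offers several vector-creation schemes, and its exact TF-IDF weighting and normalization may differ from the textbook variant used here (`tf = count / length`, `idf = log(N / df)`); `word_vectors` is a hypothetical helper name.

```python
import math
from collections import Counter

def word_vectors(docs):
    """Build a TF-IDF word vector (one column per distinct word) from
    tokenized, stopword-filtered documents.

    Textbook variant: tf = count / doc length, idf = log(N / df).
    """
    vocab = sorted({t for doc in docs for t in doc})
    df = Counter(t for doc in docs for t in set(doc))  # document frequency
    n = len(docs)
    rows = []
    for doc in docs:
        counts = Counter(doc)
        rows.append([
            (counts[w] / len(doc)) * math.log(n / df[w]) if doc else 0.0
            for w in vocab
        ])
    return vocab, rows

docs = [["فیلم", "خوب"], ["فیلم", "بد"]]
vocab, rows = word_vectors(docs)
print(vocab)
for row in rows:
    print([round(v, 3) for v in row])
```

A word that occurs in every document (here "فیلم") gets an IDF of zero, so it carries no weight in any row.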
From this starting point, you can create a model to perform sentiment classification by adding a Set Role operator and a model (a classifier) of your choice after the Process Documents from Data operator.
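After Set Role marks the sentiment column as the label, any classifier can learn from the word vectors. As one illustrative choice (not necessarily what the attached process uses), here is a minimal multinomial Naive Bayes with Laplace smoothing in plain Python. It is trained on a tiny made-up set of tokenized reviews with labels 5 and 1; `train_nb` and `predict_nb` are hypothetical names.

```python
import math
from collections import Counter, defaultdict

def train_nb(docs, labels):
    """Train a multinomial Naive Bayes model with Laplace smoothing.

    `labels` plays the role of the attribute marked as 'label' by Set Role.
    """
    vocab = {t for doc in docs for t in doc}
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    for doc, label in zip(docs, labels):
        word_counts[label].update(doc)
    return vocab, class_counts, word_counts

def predict_nb(model, doc):
    """Return the label with the highest log-posterior for `doc`."""
    vocab, class_counts, word_counts = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        total_words = sum(word_counts[label].values())
        score = math.log(n_docs / total_docs)  # log prior
        for t in doc:
            if t in vocab:  # ignore words never seen in training
                score += math.log((word_counts[label][t] + 1) /
                                  (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

train_docs = [["فیلم", "خوب"], ["کتاب", "عالی"], ["فیلم", "بد"], ["کتاب", "ضعیف"]]
train_labels = [5, 5, 1, 1]
model = train_nb(train_docs, train_labels)
print(predict_nb(model, ["فیلم", "عالی"]))
```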
Hope this helps,
Regards,
Lionel
Text_processing_Persian.rmp
Mohamad1367
Thanks for your answer, @lionelderkrikor.... I know this should be clear, but could you please explain a bit more about the attached file: how can I run it? By dragging and dropping the attached file into the Design view and just connecting it to the result port?
sara20
@Mohamad1367
Hello,
Please look at the screenshot: first download the .rmp file, then import it into your RM.
I hope this helps,
sara
Mohamad1367
@sara20
Thank you very much!
Mohamad1367
@lionelderkrikor
I ran the process you attached in the previous post, but I only get the tokenized result; the stopwords were not filtered. I have attached a screenshot of my result. Can you help me, please?
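One possible cause worth ruling out, offered as an assumption rather than a diagnosis of this particular process: Persian text often mixes Arabic code points (ARABIC LETTER YEH U+064A, ARABIC LETTER KAF U+0643) with the Persian forms (ARABIC LETTER FARSI YEH U+06CC, ARABIC LETTER KEHEH U+06A9). The two look alike but are different characters, so an exact-match stopword filter silently skips them when the text and the dictionary disagree. The sketch below, with hypothetical helper names, lists tokens that miss the dictionary only because of such variants.

```python
# Map Arabic code points to the Persian forms usually found in Persian text.
ARABIC_TO_PERSIAN = str.maketrans({
    "\u064a": "\u06cc",  # ARABIC LETTER YEH -> ARABIC LETTER FARSI YEH
    "\u0643": "\u06a9",  # ARABIC LETTER KAF -> ARABIC LETTER KEHEH
})

def normalize(word):
    """Unify yeh/kaf variants and strip surrounding whitespace and ZWNJ."""
    return word.translate(ARABIC_TO_PERSIAN).strip().strip("\u200c")

def missed_due_to_variants(tokens, stopwords):
    """Return tokens that miss the dictionary as-is but match it after
    normalization, i.e. tokens an exact-match filter would wrongly keep."""
    normalized_stopwords = {normalize(w) for w in stopwords}
    return [t for t in tokens
            if t not in stopwords and normalize(t) in normalized_stopwords]

# Dictionary written with Persian letters: "که", "این".
stopwords = {"\u06a9\u0647", "\u0627\u06cc\u0646"}
# Tokens written with Arabic kaf/yeh, plus one ordinary word ("فیلم").
tokens = ["\u0643\u0647", "\u0627\u064a\u0646", "\u0641\u06cc\u0644\u0645"]

result = missed_due_to_variants(tokens, stopwords)
print(result)
```

If it reports anything, normalizing both the dataset and the dictionary file the same way before filtering would be one way to fix it; an unset or wrong dictionary path is another thing to check.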