Text filtering problem! Please help!
karhunen
New Altair Community Member
Hey community,
I'm new to RapidMiner and I'm trying to filter multiple words from different PDF files.
First I tried to filter just one word after tokenizing the files, using the "Filter Tokens (by Content)" operator.
I used the condition "contains" and specified my "string". This works fine.
Now I want to filter multiple words, but I don't know how to do this.
Can you please help me? I would really appreciate it!
Background:
I'm trying to classify some documents using a word list of positive and negative words.
RapidMiner should analyse the given PDF files with regard to the number of positive and negative words they contain.
Any ideas?
Answers
Hello
You could use a word list to filter the document for those words only.
Here is an example that does more than you need.
http://rapidminernotes.blogspot.co.uk/2013/04/finding-needles-in-text-haystacks.html
You will need to make some changes for what you want.
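To illustrate the word-list idea outside RapidMiner, here is a minimal Python sketch. It is not the process from the linked post, only the same logic written out by hand: tokenize the text, keep only tokens that appear in a word list, then count positive and negative hits. The word lists, the function names (`tokenize`, `filter_tokens`, `score`) and the "more hits wins" rule are all assumptions made for the example. It also assumes the PDF text has already been extracted to a plain string.

```python
import re

# Hypothetical word lists; replace these with your own positive/negative lists.
POSITIVE = {"good", "great", "excellent"}
NEGATIVE = {"bad", "poor", "terrible"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def filter_tokens(tokens, wordlist):
    """Keep only the tokens that appear in the given word list (exact match)."""
    return [t for t in tokens if t in wordlist]


def score(text):
    """Count positive and negative words and pick a label by simple majority."""
    tokens = tokenize(text)
    pos = len(filter_tokens(tokens, POSITIVE))
    neg = len(filter_tokens(tokens, NEGATIVE))
    if pos > neg:
        label = "positive"
    elif neg > pos:
        label = "negative"
    else:
        label = "neutral"
    return {"positive": pos, "negative": neg, "label": label}


if __name__ == "__main__":
    sample = "The service was good, but the food was bad and the room was terrible."
    print(score(sample))
```

Note that this matches whole tokens exactly, which is stricter than the "contains" condition: with "contains", a word such as "badge" would also match "bad".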
regards
Andrew