Text Processing
LucasCu
New Altair Community Member
Hi all,
I'm new to RapidMiner, and have downloaded it for the sole purpose of text processing.
I have followed the introductory tutorial to text processing, and have successfully worked on a text document to look at word occurrences etc.
What I would like to do is use a codebook I have created (approximately 30 words/phrases) to see whether these particular groups of words occur within a text document (e.g. forced labor, compliance, human rights). I am hoping to do this on a number of documents in order to measure the frequency of phrases.
Would someone be able to point me in the direction of a tutorial or video that explains how I apply a specific list of words/phrases in text processing?
Any feedback would be greatly appreciated.
Thank you,
Lucas
Answers
Hi, Lucas,
I'm new myself, but I think what you want to do is the following.
#1: Put your word list into a CSV or Excel file and load it.
#2: Run that through the "Process Documents from Data" operator, leaving all of the options at their defaults.
#3: Inside that operator's subprocess, simply connect the input port directly to the output port.
#4: Go back up to your main process.
If you look closely, you will see that the Process Documents operator has two output ports. The second one is "word list". Now,
#1: Set up your text mining process as you usually would.
#2: Connect the word list output port (from the steps above) to the incoming word list port of the Process Documents operator in your original process.
This should solve your problem. As your experience grows, you will find this type of setup increasingly essential to your work: step 1, generate a word list and/or model; step 2, apply it to another data set.
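If it helps to see the same two-step idea outside RapidMiner, here is a minimal Python sketch. It is not how RapidMiner works internally; it only illustrates "build a fixed word list first, then apply it to each document". The function names (`build_codebook`, `count_phrases`) and the two sample documents are made up for illustration, and the three phrases are the examples from the question.

```python
import re
from collections import Counter


def build_codebook(phrases):
    """Step 1: turn the raw codebook into a fixed list of normalised phrases."""
    codebook = []
    for phrase in phrases:
        # Lowercase and collapse whitespace so "Human  Rights" == "human rights".
        normalised = " ".join(phrase.lower().split())
        if normalised and normalised not in codebook:
            codebook.append(normalised)
    return codebook


def count_phrases(text, codebook):
    """Step 2: count whole-word occurrences of each codebook phrase in a text."""
    normalised_text = " ".join(text.lower().split())
    counts = Counter()
    for phrase in codebook:
        # \b keeps "compliance" from matching inside "noncompliance".
        pattern = r"\b" + re.escape(phrase) + r"\b"
        counts[phrase] = len(re.findall(pattern, normalised_text))
    return counts


# Example phrases taken from the question; the documents are hypothetical.
codebook = build_codebook(["forced labor", "compliance", "human rights"])
documents = {
    "doc1": "Compliance audits found no forced labor. Human rights policy covers compliance.",
    "doc2": "The report discusses human rights and human rights training.",
}

# Apply the same fixed codebook to every document.
frequencies = {name: count_phrases(text, codebook) for name, text in documents.items()}
for name, counts in frequencies.items():
    print(name, dict(counts))
```

The point of keeping the codebook separate from the counting step is the same as in the RapidMiner setup: the word list is built once and then reused unchanged on every document, so the counts are comparable across documents.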
I hope this helps!