Hello!
I'm new to RapidMiner, and my main goal is to use it for text analysis of social media posts. I have a CSV file with several columns, where each row is a post/document and one of the columns holds the text/body of the document. How can I run the text analysis on only that column while keeping all the other columns for further analysis, since they are still relevant?
Right now I have a process like:
Read CSV -> Select Attributes (to select only the body column) -> Data to Documents -> Process Documents (Tokenize, Transform Cases, n-grams, etc.) -> WordList to Data
This works for seeing the list of the most common words/n-grams, but I have lost all the related data for each document. I would like to, for example, filter the documents that contain a specific word or n-gram while still having their other columns available. Any tips would be helpful.
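To make the goal concrete, here is a minimal Python sketch of the result I am after, written outside RapidMiner. It is only an illustration, not a RapidMiner process: the sample data, the column names (id, author, body), the helper function names, and the underscore used to join n-grams are all assumptions made for the example.

```python
import csv
import io
import re

# Hypothetical sample data; the column names are assumptions for illustration.
SAMPLE_CSV = """id,author,body
1,ana,I love text mining with RapidMiner
2,ben,Social media posts are noisy
3,cat,Text mining of social media is fun
"""


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def ngrams(tokens, max_n=2):
    """Return the set of all n-grams (joined with '_') up to length max_n."""
    grams = set()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            grams.add("_".join(tokens[i:i + n]))
    return grams


def filter_rows(rows, text_column, term, max_n=2):
    """Keep the full rows whose text column contains the given word or n-gram."""
    return [
        row for row in rows
        if term in ngrams(tokenize(row[text_column]), max_n)
    ]


# Each row keeps every original column; only the body column is analysed.
rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
matches = filter_rows(rows, "body", "text_mining")
for row in matches:
    print(row["id"], row["author"], row["body"])
```

The point is that the text column is only used to decide which rows match, and every matching row still carries its other columns. That is the behaviour I would like to reproduce in RapidMiner.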
Thanks!
Gustavo Velho