Hello everyone,
I started with RapidMiner a few weeks ago (a fine tool, btw) and I have the following situation: I have documents which I want to split into sentences, and then I want to count the number of words in each sentence so that I can check whether the sentence meets a criterion (15 to 20 words), resulting in a positive or negative mark.
Later I will use a model and separate datasets (all documents vs. a new one) to check for differences, i.e. compare the sentence lengths of the new document against the training data. For this I thought of k-NN and automatic classification, but I'm not that far yet...
So far I have managed to split my documents into sentences and I have them in a repository - so far, so good. However, I can't find a way to tell RapidMiner to count the words in each row (each row contains one sentence) - I mostly end up getting the word frequency over all documents, which is not what I want.
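To make clear what I mean by a per-row count, here is a rough sketch of the logic in plain Python. This is not RapidMiner, just an illustration; the function names are made up, and I am assuming that a "word" is simply anything separated by whitespace.

```python
def count_words(sentence):
    # Assumption: words are whitespace-separated tokens.
    return len(sentence.split())


def mark_sentences(sentences, low=15, high=20):
    # One result row per sentence: the word count and whether it
    # falls inside the 15-to-20-word criterion (bounds inclusive).
    rows = []
    for sentence in sentences:
        n = count_words(sentence)
        rows.append({
            "sentence": sentence,
            "word_count": n,
            "in_range": low <= n <= high,
        })
    return rows


if __name__ == "__main__":
    # Hypothetical example rows, one sentence per row.
    example = [
        "This sentence is short.",
        "This one is a bit longer but it still does not reach the lower limit.",
    ]
    for row in mark_sentences(example):
        print(row["word_count"], row["in_range"], row["sentence"])
```

So the result I am looking for is one count and one mark per row, not a frequency table over the whole collection.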
Does anybody know how this can be achieved?
Thanks in advance.
Oliver