Hello,
Perhaps this is a simple question with a simple answer.
I am building a predictive model. As input I have several attributes, two of which are actually lists of words. For example, one attribute is called "keywords", and it contains a variable number of key terms.
I'm wondering if this attribute, which is really a list of terms, is being treated as a single text string/blob, rather than being parsed into individual words/tokens. RapidMiner's Auto Model suggests that this attribute is NOT helpful to the predictive modeling process, but I think that is because it is treating this attribute - which is actually a list of terms - as a single text string.
Thus, my questions are:
1) Am I right to assume that most/all models will treat a field like this quite differently depending on whether it is handled as a single text string or as a list of individual keywords?
2) How do I parse/tokenize this attribute so that the model sees a list of individual keywords rather than a single text string/blob?
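To make the distinction concrete, here is a minimal sketch in plain Python (not RapidMiner) of the two treatments I have in mind. The sample rows, the comma delimiter, and the names `tokenize`, `as_blob`, and `indicators` are all hypothetical; my real data may be delimited differently.

```python
# Hypothetical example rows; the values and the comma delimiter are assumptions.
rows = [
    "solar, battery, grid",
    "battery, inverter",
    "solar, grid",
]


def tokenize(cell, sep=","):
    """Split one keywords cell into a list of trimmed, lower-cased terms."""
    return [term.strip().lower() for term in cell.split(sep) if term.strip()]


# Treated as a single string: each distinct cell is its own category,
# so rows that share keywords still look unrelated.
as_blob = sorted(set(rows))

# Treated as a list: build a shared vocabulary and one 0/1 column per term.
tokens = [tokenize(cell) for cell in rows]
vocabulary = sorted({term for row in tokens for term in row})
indicators = [[1 if term in row else 0 for term in vocabulary] for row in tokens]

print(len(as_blob), "distinct blob values")
print(vocabulary)
for row in indicators:
    print(row)
```

In the first treatment the three rows are just three unrelated values, while in the second the shared terms become columns a model could actually learn from. That second form is what I would like the model to see.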
Thanks in advance for any assistance or clarification.
- Adam