Perform text classification with separate test/train splits?
kashif_khan
New Altair Community Member
Hi,
I am a newbie working on text classification in RapidMiner. I have separate test/train splits and I want to select the top k features by information gain (i.e., the features with the highest information gain). In general (without feature selection), we feed the wordlist output of the "Process Documents From Files" operator used for the train set into the wordlist input of the "Process Documents From Files" operator that loads the test set. But how can we do the same when feature selection is applied to the train set and the reduced feature set has to be provided as the vocabulary for the test split?
Kindly help. I searched a lot on the internet, but every example I found uses n-fold cross-validation, and I couldn't figure out how to do this with dedicated test/train splits.
Answers
I figured it out myself, with some help from Stack Overflow.
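For anyone landing here with the same question: the thread does not record the actual RapidMiner process, so the following is only a minimal sketch of the general idea in plain Python, not RapidMiner's own operators. It assumes whitespace tokenization and binary term presence for the information gain score; the function names (`select_top_k`, `vectorize`) and the toy documents are hypothetical. The key point is that the vocabulary is scored and reduced on the train split only, and the test split is then vectorized against that fixed, reduced vocabulary.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: simple lowercase whitespace tokenization.
    return text.lower().split()


def entropy(counts):
    # Shannon entropy (in bits) of a collection of class counts.
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def information_gain(docs, labels, term):
    # docs is a list of token sets; the feature is "term present or absent".
    with_term, without_term = Counter(), Counter()
    for tokens, label in zip(docs, labels):
        (with_term if term in tokens else without_term)[label] += 1
    n = len(docs)
    n_with = sum(with_term.values())
    h_class = entropy(Counter(labels).values())
    h_cond = (n_with / n) * entropy(with_term.values()) \
        + ((n - n_with) / n) * entropy(without_term.values())
    return h_class - h_cond


def select_top_k(train_texts, train_labels, k):
    # Score every train-set term; ties are broken alphabetically.
    docs = [set(tokenize(t)) for t in train_texts]
    vocab = sorted(set().union(*docs))
    ranked = sorted(
        vocab, key=lambda t: (-information_gain(docs, train_labels, t), t)
    )
    return ranked[:k]


def vectorize(texts, vocabulary):
    # Term counts restricted to the given vocabulary; unseen words are dropped.
    rows = []
    for text in texts:
        counts = Counter(tokenize(text))
        rows.append([counts[term] for term in vocabulary])
    return rows


# Hypothetical toy data, only to show the flow.
train_texts = [
    "cheap pills buy now",
    "buy cheap watches",
    "meeting agenda for monday",
    "monday project meeting notes",
]
train_labels = ["spam", "spam", "ham", "ham"]
test_texts = ["buy pills on monday", "agenda unknown words"]

selected = select_top_k(train_texts, train_labels, k=4)  # train split only
train_matrix = vectorize(train_texts, selected)
test_matrix = vectorize(test_texts, selected)  # same reduced vocabulary
print(selected)
print(test_matrix)
```

Because the test split never influences which terms are kept, nothing leaks from test to train, which is the same reason the train wordlist is passed to the test-side operator in the setup without feature selection.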