Perform text classification with separate test/train splits?

kashif_khan
kashif_khan New Altair Community Member
edited November 5 in Community Q&A
Hi,

I am a newbie working on text classification in RapidMiner. I have separate test/train splits, and I want to select the top k features by information gain (i.e., the features with the highest information gain). In general (without feature selection), we need to pass the word list output of the "Process Documents From Files" operator used for the train set to the "Process Documents From Files" operator that loads the test set. But how can we do the same when we need to apply feature selection to the train set and provide the reduced feature set as the vocabulary for the test split?

Kindly help. I have searched a lot on the internet, but every example I found uses n-fold cross-validation, and I couldn't figure out how to do this with dedicated test/train splits.


Answers

  • kashif_khan
    kashif_khan New Altair Community Member
    I figured it out myself, with some help from Stack Overflow.
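
The thread does not say which solution was used. As a language-neutral illustration of the idea in the question, here is a minimal Python sketch (not a RapidMiner process): score terms by information gain on the train split only, keep the top k as the vocabulary, and build both the train and test vectors from that same vocabulary. The function names, the toy documents, and the binary "term occurs in document" definition of information gain are all assumptions made for this example.

```python
import math
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase and split on whitespace.
    return text.lower().split()


def entropy(labels):
    # Shannon entropy (in bits) of a list of class labels.
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(term, docs, labels):
    # Information gain of the binary feature "term occurs in the document".
    with_term = [y for d, y in zip(docs, labels) if term in d]
    without_term = [y for d, y in zip(docs, labels) if term not in d]
    conditional = 0.0
    for subset in (with_term, without_term):
        if subset:
            conditional += len(subset) / len(labels) * entropy(subset)
    return entropy(labels) - conditional


def select_vocabulary(train_texts, train_labels, k):
    # Rank terms using the TRAIN split only, then keep the top k.
    docs = [set(tokenize(t)) for t in train_texts]
    terms = sorted(set().union(*docs))  # alphabetical order breaks ties
    ranked = sorted(
        terms,
        key=lambda t: information_gain(t, docs, train_labels),
        reverse=True,
    )
    return ranked[:k]


def vectorize(texts, vocabulary):
    # Term counts restricted to the given vocabulary; other terms are dropped.
    rows = []
    for text in texts:
        counts = Counter(tokenize(text))
        rows.append([counts[term] for term in vocabulary])
    return rows


# Toy data, invented for this example.
train_texts = [
    "cheap pills buy now",
    "buy cheap watches",
    "meeting agenda attached",
    "lunch meeting tomorrow",
]
train_labels = ["spam", "spam", "ham", "ham"]
test_texts = ["buy pills tomorrow", "agenda for the meeting"]

vocabulary = select_vocabulary(train_texts, train_labels, k=3)
train_vectors = vectorize(train_texts, vocabulary)
test_vectors = vectorize(test_texts, vocabulary)  # same columns as train

print(vocabulary)
print(test_vectors)
```

The point of the design is that the test split never influences which terms are kept: the reduced vocabulary plays the same role as the word list that the question describes passing from the train-side operator to the test-side one.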