Apply Model: Testing & Training Sets Differ

New Altair Community Member

Jul 2, 2020

Updated Nov 5, 2024 by Jocelyn

Hi
I am using Sentiment140 as my training and testing data. The data already comes split into two sets. I perform training, cross-validation, and testing separately: training and cross-validation on the training set, and testing on the test set. The problem is that after text preprocessing, the features in the test set don't align with those of the training set, so I can't apply the trained model. The end product of my text preprocessing is a matrix in which the texts are the examples and the features are term frequencies, and the set of terms differs between the training and test sets.
Should I somehow merge both sets so that the features are aligned, with TF = 0 for terms that do not occur in a given set?
Thanks

Tags: AI Studio

Telcontar120

New Altair Community Member

Accepted Answer

Jul 10, 2020

The word list elements will be constrained to the supplied word list, but the TF-IDF values will be recalculated on the new sample in Process Documents.
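
To illustrate the point outside the tool, here is a minimal plain-Python sketch. The names `tokenize`, `build_word_list`, and `tfidf_matrix` are hypothetical, the tokenizer is deliberately naive, and the smoothed TF-IDF formula is just one common variant, not necessarily the one Process Documents uses. The columns come from the training word list only, while the TF-IDF values are computed on whichever sample is passed in.

```python
import math
from collections import Counter

def tokenize(text):
    # Deliberately simple tokenizer: lowercase and split on whitespace.
    return text.lower().split()

def build_word_list(train_docs):
    # The word list is fixed from the training documents only.
    return sorted({tok for doc in train_docs for tok in tokenize(doc)})

def tfidf_matrix(docs, word_list):
    # Columns are always `word_list`; tokens outside it are ignored.
    allowed = set(word_list)
    counts = [Counter(t for t in tokenize(d) if t in allowed) for d in docs]
    n_docs = len(docs)
    # Document frequencies are computed on *this* sample, so the values
    # differ between training and test even though the columns match.
    df = {w: sum(1 for c in counts if c[w] > 0) for w in word_list}
    rows = []
    for c in counts:
        total = sum(c.values())
        row = []
        for w in word_list:
            tf = c[w] / total if total else 0.0
            idf = math.log((1 + n_docs) / (1 + df[w])) + 1.0  # assumed variant
            row.append(tf * idf)
        rows.append(row)
    return rows

train_docs = ["good movie", "bad movie", "good good day"]
test_docs = ["good film", "terrible day"]

word_list = build_word_list(train_docs)
train_X = tfidf_matrix(train_docs, word_list)
test_X = tfidf_matrix(test_docs, word_list)
```

Terms that appear only in the test set ("film", "terrible") are dropped, and training terms missing from a test document get a value of 0, so both matrices share the same columns without merging the two sets.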

jacobcybulski

New Altair Community Member

Accepted Answer

Jul 11, 2020

Be careful here: if your text processing in training uses pruning, then in testing you must not only use your saved word list to constrain the terms in the TF-IDF vector, as suggested by @Telcontar120, but also switch off pruning. Otherwise your word list may be shrunk in the pruning process, rendering the two sets incompatible when you apply the model to the test data.
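
A small plain-Python sketch of why pruning must be off in the testing branch. The function names and the `min_docs` threshold are hypothetical stand-ins for a document-frequency pruning setting, not the tool's actual parameters.

```python
from collections import Counter

def tokenize(text):
    # Deliberately simple tokenizer: lowercase and split on whitespace.
    return text.lower().split()

def word_list_with_pruning(docs, min_docs):
    # Keep only terms that occur in at least `min_docs` documents.
    df = Counter(tok for doc in docs for tok in set(tokenize(doc)))
    return sorted(w for w, n in df.items() if n >= min_docs)

def constrain(docs, word_list):
    # Term counts restricted to a given word list, with no further pruning.
    return [[tokenize(d).count(w) for w in word_list] for d in docs]

train_docs = ["good movie", "bad movie", "good day", "bad day"]
test_docs = ["good film", "good movie"]

# Training branch: pruning is applied once, here.
train_words = word_list_with_pruning(train_docs, min_docs=2)

# Testing branch, as advised: reuse the word list and do not prune again.
test_X = constrain(test_docs, train_words)

# What goes wrong if pruning stays on: the same threshold applied to the
# test sample removes columns the model was trained on.
repruned = [w for w in train_words
            if sum(w in tokenize(d) for d in test_docs) >= 2]
```

Here `train_words` has four terms, but `repruned` keeps only one of them, so a matrix built from it would no longer match the columns the model expects.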

jacobcybulski

New Altair Community Member

Accepted Answer

Jul 12, 2020

I have now noticed that you reduce dimensionality with a weight-and-select method. In that case, pass the attribute weights from training to your testing branch. There you do not need the weighting operator; you only select attributes using the weights from training.
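
The same idea as a plain-Python sketch. The weighting scheme `class_mean_difference` is an assumed stand-in (the thread does not say which weighting operator is used), and all function names are hypothetical. Weights are computed once on the training data, and the resulting selection is reused unchanged on the test data.

```python
def class_mean_difference(X, y):
    # Stand-in weighting scheme (an assumption, not the poster's operator):
    # absolute difference between the per-class means of each column.
    pos = [row for row, label in zip(X, y) if label == 1]
    neg = [row for row, label in zip(X, y) if label == 0]
    n_cols = len(X[0])
    return [
        abs(sum(r[j] for r in pos) / len(pos) - sum(r[j] for r in neg) / len(neg))
        for j in range(n_cols)
    ]

def top_k_columns(weights, k):
    # Indices of the k highest-weighted columns, in original column order.
    ranked = sorted(range(len(weights)), key=lambda j: weights[j], reverse=True)
    return sorted(ranked[:k])

def select_columns(X, keep):
    # Keep only the chosen column indices in every row.
    return [[row[j] for j in keep] for row in X]

# Columns: bad, day, good, movie (same word list for both sets).
train_X = [[0, 0, 1, 1], [1, 0, 0, 1], [0, 1, 1, 0], [1, 1, 0, 0]]
train_y = [1, 0, 1, 0]
test_X = [[0, 0, 1, 0], [1, 0, 0, 1]]

# Training branch: compute the weights and derive the selection.
weights = class_mean_difference(train_X, train_y)
keep = top_k_columns(weights, k=2)
train_sel = select_columns(train_X, keep)

# Testing branch: no weighting step, only the selection from training.
test_sel = select_columns(test_X, keep)
```

Recomputing weights on the test set could pick different columns, which would again leave the test matrix out of step with the trained model.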
