Why does RapidMiner delete data rows when automatic feature selection is applied?

New Altair Community Member

Nov 29, 2019

Updated Nov 5, 2024 by Jocelyn

Maybe a very stupid question, but my input consists of 15577 data rows and my output consists of only about 4500 data rows when I apply automatic feature selection in data preparation.
In addition, can I reliably compare the confusion matrices of the baseline model (with 15577 rows) and the RapidMiner model (with roughly 4500 rows) when the sizes differ but the data is the same?

lionelderkrikor

New Altair Community Member

Accepted Answer

Nov 29, 2019

Updated Nov 29, 2019 by lionelderkrikor

Hi @SanderMEs,

No, it's not a stupid question:
AutoModel splits your dataset into 2 parts:
- 60% of the data is used to train the model
- 40% of the data is used to test the model (it is a hold-out set).

AutoModel then removes a further 2/7 of the rows from that test set.
Your output data are the predictions and the associated confusion matrix, and both are based on this final test set. That's why your output files contain about 4500 rows (15577 x 40% x 5/7 ≈ 4450 rows).
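To make the arithmetic easy to check, here is a minimal Python sketch. It assumes only the fractions quoted above (a 40% hold-out set, of which 5/7 is kept); the function name `final_test_rows` and its parameters are made up for illustration and are not part of RapidMiner.

```python
from fractions import Fraction


def final_test_rows(total_rows, holdout=Fraction(2, 5), kept=Fraction(5, 7)):
    """Estimate how many rows remain in the final test set.

    holdout: share of the data held out for testing (40%, as stated above)
    kept:    share of the hold-out set that is kept (5/7, as stated above)
    Both defaults are taken from this thread, not read from RapidMiner.
    """
    return float(total_rows * holdout * kept)


# Row count from the question
rows = final_test_rows(15577)
print(f"Estimated rows in the final test set: {rows:.1f}")
```

The result is an estimate, so the real output can differ by a few rows depending on how the splits are rounded.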
Regards,

Lionel
