Why does RapidMiner delete data rows when automatic feature selection is applied?
SanderMEs
Maybe a very stupid question, but my input consists of 15,577 data rows, while my output consists of only about 4,500 data rows when I apply automatic feature selection during data preparation.
In addition, can I reliably compare the confusion matrices of the baseline model (15,577 rows) and the RapidMiner model (about 4,500 rows) when the sizes differ but the data is the same?
Tags: AI Studio, Feature Selection
Accepted answers
lionelderkrikor
Hi @SanderMEs,
No, it's not a stupid question.
AutoModel splits your dataset into 2 parts:
- 60% of the data is used to train the model.
- 40% of the data is used to test the model (it is a hold-out set).
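To make the split concrete, here is a minimal Python sketch of a shuffled 60/40 hold-out split for 15,577 rows. The helper `holdout_split` and its `seed` parameter are hypothetical illustrations, not AutoModel's actual implementation, whose sampling details may differ.

```python
import random

def holdout_split(n_rows, train_fraction=0.6, seed=42):
    """Shuffle row indices and split them into train and hold-out (test) sets.

    Hypothetical helper: a generic hold-out split, not AutoModel's internal code.
    """
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)  # reproducible shuffle
    n_train = round(n_rows * train_fraction)  # 60% of the rows go to training
    return indices[:n_train], indices[n_train:]  # the remaining 40% form the test set

train_idx, test_idx = holdout_split(15577)
print(len(train_idx), len(test_idx))  # 9346 6231
```

Every row lands in exactly one of the two sets, so the training and test rows together still add up to the original 15,577.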
AutoModel then removes 2/7 of the data from this test set.
Your output data (the predictions and the associated confusion matrix) are based on this final test set, which is why your output files contain roughly 4,500 rows (15,577 × 40% × 5/7 ≈ 4,450 rows).
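The row count can be reproduced with a small Python sketch, assuming the 40% hold-out set is divided into 7 disjoint, nearly equal subsets of which 2 are discarded. The function `split_into_subsets` is hypothetical, and which two subsets AutoModel actually drops is not stated here; the sketch simply drops the first two.

```python
def split_into_subsets(indices, k=7):
    """Split a list of indices into k disjoint, nearly equal-sized subsets."""
    size, extra = divmod(len(indices), k)
    subsets, start = [], 0
    for i in range(k):
        # The first `extra` subsets get one additional row each
        end = start + size + (1 if i < extra else 0)
        subsets.append(indices[start:end])
        start = end
    return subsets

n_rows = 15577
n_test = n_rows - round(n_rows * 0.6)  # 6231 hold-out rows (40%)
subsets = split_into_subsets(list(range(n_test)), k=7)
kept = subsets[2:]  # assumption: discard 2 of the 7 subsets, keep 5
n_kept = sum(len(s) for s in kept)
print(n_test, n_kept)  # 6231 4450
```

With 6,231 hold-out rows, the seven subsets hold 891 or 890 rows each, so the five kept subsets contain 4,450 rows, in line with the roughly 4,500 rows in the output.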
Regards,
Lionel