Why does RapidMiner delete datarows when automatic feature selection is applied?

User: "SanderMEs"
New Altair Community Member
Updated by Jocelyn
Maybe a very stupid question, but my input consists of 15577 data rows, while my output consists of only about 4500 data rows when I apply automatic feature selection in data preparation.
In addition, can I reliably compare the confusion matrices of the baseline model (with 15577 rows) and the RapidMiner model (with roughly 4500 rows) when the sizes differ but the data is the same?

    User: "lionelderkrikor"
    New Altair Community Member
    Accepted Answer
    Updated by lionelderkrikor
    Hi @SanderMEs,

    No, it's not a stupid question.
    AutoModel splits your dataset into 2 parts:
     - 60% of the data is used to train the model
     - 40% of the data is used to test the model (it is a hold-out set).

    AutoModel then removes 2/7 of the rows from this test set.
    Your output data are the predictions and the associated confusion matrix, both based on this final test set. That's why your output file contains about 4500 rows (15577 x 40% x 5/7 ≈ 4450 rows).
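
To check the arithmetic, here is a minimal Python sketch of the row count implied by the split described above. The function name `expected_output_rows` is made up for illustration and is not part of RapidMiner; the 40% and 5/7 fractions are simply the ones quoted in this answer, and the exact count AutoModel produces may differ by a row or so depending on how it rounds.

```python
from fractions import Fraction


def expected_output_rows(total_rows: int) -> int:
    """Approximate number of rows left in the final test set.

    Assumes the fractions quoted in this thread: a 40% hold-out set,
    of which 5/7 is kept. These are not read from RapidMiner itself.
    """
    test_fraction = Fraction(40, 100)  # hold-out share of the full dataset
    kept_fraction = Fraction(5, 7)     # share of the hold-out set that remains
    return round(total_rows * test_fraction * kept_fraction)


rows = expected_output_rows(15577)
print(rows)  # 4451, i.e. roughly the 4500 rows seen in the output
```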
    Regards,

    Lionel