Predicting ints on the Titanic dataset

VenomSwitch New Altair Community Member
edited November 5 in Community Q&A
Excuse my noob level of understanding please, I'm brand new.
I am trying to predict mortality on the Titanic dataset using cross-validation and linear regression. As linear regression only works with numbers, I have converted selected attributes (such as Survived) using the 'Nominal to Numerical' operator. Looking at the data, I can see it is working most of the time if I round the prediction to 1 or 0. However, the predicted value comes back as a double, so it never exactly matches the label and shows 0 correct predictions.

I suppose my question is: how do I make RapidMiner return an int instead of a double? I have tried using the 'Real to Integer' operator but it doesn't like me putting it anywhere!
Open to any suggestions.

Best Answer

  • MartinLiebig
    Altair Employee
    Answer ✓
    Hi,

    First, if you use a GLM operator, it can handle binominal data. It uses the same trick you are doing here, but without any hassle for you.

    Then, why exactly do you want an int over a double?

    Anyway, one way to do it is to use Generate Attributes with

    round([prediction(Survived=Yes)])

    Best,
    Martin
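
To illustrate why rounding helps here, below is a minimal Python sketch (outside RapidMiner) of the same idea. The labels and prediction values are made up for illustration, and `accuracy` and `to_class` are hypothetical helper names. It uses a 0.5 threshold rather than a plain round, on the assumption that linear regression can return values below 0 or above 1, which rounding alone would not map back to 0 or 1.

```python
# Hypothetical 0/1 labels and real-valued regression outputs (not from the thread).
labels = [1, 0, 0, 1, 0]
predictions = [0.83, 0.12, -0.05, 1.07, 0.61]


def accuracy(predicted, actual):
    """Fraction of rows where the predicted value equals the label exactly."""
    matches = sum(1 for p, a in zip(predicted, actual) if p == a)
    return matches / len(actual)


def to_class(value, threshold=0.5):
    """Map a real-valued prediction to 0 or 1 by thresholding."""
    return 1 if value >= threshold else 0


# Comparing raw doubles to 0/1 labels never matches exactly.
raw_accuracy = accuracy(predictions, labels)

# After thresholding, the comparison is class against class.
rounded = [to_class(p) for p in predictions]
rounded_accuracy = accuracy(rounded, labels)

print(raw_accuracy)      # 0.0
print(rounded_accuracy)  # 0.8
```

The last row (0.61) is deliberately on the wrong side of the threshold, so the rounded accuracy is 4 out of 5 rather than a perfect score.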

Answers

  • VenomSwitch
    VenomSwitch New Altair Community Member
    edited March 2020
    When I say 'double' I actually mean 'real'.
  • VenomSwitch
    VenomSwitch New Altair Community Member
    Hi Martin,

    This is brilliant, GLM is exactly what I needed!
    I wanted the integer because it was giving me values with a decimal point, and because they didn't exactly match the '1'/'0' in the Survived column, it just told me every one was wrong with 0% accuracy (as it wasn't rounded to the '1'/'0' format in the dataset).
    I couldn't figure out where to place the Generate Attributes operator, but it doesn't really matter as GLM has sorted out my problem.

    A very handy operator.

    Cheers!
    Joel
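
For readers wondering why a GLM suits a yes/no label, here is a rough from-scratch sketch of a binomial GLM with a logit link (logistic regression) in plain Python. It is not how the RapidMiner operator is implemented; the toy rows, feature meanings, learning rate and epoch count are all assumptions chosen for illustration. The point is that the model outputs a probability between 0 and 1, which is then turned into a class, so no manual rounding step is needed.

```python
import math


def sigmoid(z):
    """Logistic function: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_probability(weights, row):
    """Probability of class 1; weights[0] is the bias term."""
    z = weights[0] + sum(w * x for w, x in zip(weights[1:], row))
    return sigmoid(z)


def fit_logistic(rows, labels, learning_rate=0.1, epochs=2000):
    """Fit weights by batch gradient descent on the log-loss."""
    weights = [0.0] * (len(rows[0]) + 1)
    for _ in range(epochs):
        gradient = [0.0] * len(weights)
        for row, label in zip(rows, labels):
            error = predict_probability(weights, row) - label
            gradient[0] += error
            for j, value in enumerate(row):
                gradient[j + 1] += error * value
        weights = [w - learning_rate * g / len(rows)
                   for w, g in zip(weights, gradient)]
    return weights


# Hypothetical toy rows: [is_female, is_first_class]; label 1 = survived.
rows = [[1, 1], [1, 0], [0, 1], [0, 0], [1, 1], [0, 0]]
labels = [1, 1, 0, 0, 1, 0]

weights = fit_logistic(rows, labels)
probabilities = [predict_probability(weights, r) for r in rows]
classes = [1 if p >= 0.5 else 0 for p in probabilities]

print(probabilities)  # values strictly between 0 and 1
print(classes)        # class labels derived from the probabilities
```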
  • MartinLiebig
    Altair Employee
    You would basically put it after each and every Apply Model operator you are using. Great that the GLM worked.

    Best,
    Martin
  • VenomSwitch
    VenomSwitch New Altair Community Member
    I have got it working but now it seems to have a 100% accuracy rate which seems suspicious. I'm just going to stick with the GLM process I had before I think! If it isn't broke, don't fix it haha.
    You've helped me out today though! :smiley:

  • MartinLiebig
    Altair Employee
    Are you sure you applied the round on the prediction and not on the label attribute? That would explain it :)
  • VenomSwitch
    VenomSwitch New Altair Community Member
    Here is my current process using Generate Attributes with linear regression instead of GLM.
    My label is 'Survived = Yes'.
    I tried using the same operator inside the cross-validation as well, but got the same result: 100% correct predictions.
  • MartinLiebig
    Altair Employee
    Then the other idea: are you sure that Survived = No is not part of the training data? That would also explain the good results.
    Cheers,
    Martin
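
A small Python sketch of why a leftover 'Survived = No' column would give perfect results. It assumes that Nominal to Numerical dummy-coded Survived into two columns that always sum to 1; the values and the `fit_simple_linear` helper are hypothetical. With that assumption, linear regression can reconstruct the label exactly from the leaked column, whatever the other attributes say.

```python
def fit_simple_linear(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    variance = sum((a - mean_x) ** 2 for a in x)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical dummy-coded columns: each row has exactly one of them set to 1.
survived_yes = [1, 0, 0, 1, 0, 1]            # the label
survived_no = [1 - v for v in survived_yes]  # leaked into the training attributes

# The fit recovers survived_yes = 1 - survived_no.
intercept, slope = fit_simple_linear(survived_no, survived_yes)

predictions = [intercept + slope * v for v in survived_no]
classes = [1 if p >= 0.5 else 0 for p in predictions]
accuracy = sum(1 for c, a in zip(classes, survived_yes) if c == a) / len(survived_yes)

print(intercept, slope)  # close to 1 and -1
print(accuracy)          # 1.0
```

If this is what is happening, excluding 'Survived = No' from the attributes passed to the learner should bring the accuracy back to a believable level.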