Predicting ints on titanic dataset.

VenomSwitch · March 2020

Excuse my noob level of understanding please, I'm brand new.

I am trying to predict mortality on the Titanic dataset using cross-validation and linear regression. As linear regression only works with numbers, I have converted selected attributes (such as survived) using the 'nominal to numerical' operator. Looking at the data, I can see the model is working most of the time if I round the prediction to 1 or 0. However, the predicted value comes back as a double, so it shows 0 correct predictions.

I suppose my question is: how do I make RapidMiner return an int instead of a double? I have tried using the 'real to integer' operator, but it doesn't like me putting it anywhere!

Open to any suggestions.

Image: https://us.v-cdn.net/6030995/uploads/editor/m7/ruy87i7a1pi2.png

MartinLiebig · March 2020

Hi,

First, if you use a GLM operator, it can handle binominal data. It uses the same trick you are doing here, but without any hassle for you.

Then, why exactly do you want an int over a double?

Anyway, one way to do it is to use Generate Attributes with

round([prediction(Survived=Yes)])

Best,

Martin

VenomSwitch · March 2020

When I say 'double' I actually mean 'real'.

VenomSwitch · March 2020

Hi Martin,

This is brilliant, GLM is exactly what I needed!

I wanted the integer because it was giving me values with a decimal point, and because they didn't exactly match the '1'/'0' in the survived column, it told me every one was wrong with 0% accuracy (as the predictions weren't rounded to the '1'/'0' format in the dataset).
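To illustrate the effect described above outside RapidMiner, here is a minimal Python sketch. The label and prediction values are made up for illustration, and the 0.5 cut-off is an assumption standing in for the rounding step. It compares real-valued predictions with 0/1 labels, first by exact match and then after thresholding.

```python
# Hypothetical values: real-valued regression outputs and the 0/1 labels.
labels = [1, 0, 1, 0, 1]
predictions = [0.83, 0.12, 0.61, 0.47, 0.38]

def accuracy(predicted, actual):
    """Fraction of positions where predicted equals actual exactly."""
    matches = sum(1 for p, a in zip(predicted, actual) if p == a)
    return matches / len(actual)

# Exact comparison: a real value such as 0.83 never equals the label 1.
exact = accuracy(predictions, labels)

# Assumed cut-off of 0.5 to turn each real value into a 0/1 class.
rounded_predictions = [1 if p >= 0.5 else 0 for p in predictions]
rounded = accuracy(rounded_predictions, labels)

print(exact, rounded)
```

With these made-up numbers, the exact comparison finds no matches at all, while the thresholded predictions agree with four of the five labels.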

I couldn't figure out where to place the Generate Attributes operator, but it doesn't really matter as GLM has sorted out my problem.

A very handy operator.

Cheers!

Joel

MartinLiebig · March 2020

Hi @VenomSwitch ,

You would basically put it after each and every Apply Model operator you are using. Great that the GLM worked.

Best,

Martin

VenomSwitch · March 2020

@mschmitz

I have got it working but now it seems to have a 100% accuracy rate which seems suspicious. I'm just going to stick with the GLM process I had before I think! If it isn't broke, don't fix it haha.

You've helped me out today though!

MartinLiebig · March 2020

Are you sure you applied the round to the prediction and not to the label attribute? That would explain it.

VenomSwitch · March 2020

Here is my current process using Generate Attributes with linear regression instead of GLM.

My label is 'Survived = Yes'.

I tried using the same operator inside the cross-validation as well, but got the same result: 100% correct prediction.

Image: https://us.v-cdn.net/6030995/uploads/editor/b2/y88zwrksk0wd.png

MartinLiebig · March 2020

Hi @VenomSwitch ,

Then the other idea: are you sure that Survived = No is not part of the training data? That would also explain the good results.

cheers,

Martin
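Why a leftover 'Survived = No' column would produce a perfect score can be sketched in plain Python. The data below is made up, and the single-regressor least-squares fit is only a stand-in for RapidMiner's linear regression. Because 'Survived = No' is always one minus 'Survived = Yes', the fitted line reproduces the label exactly.

```python
# Hypothetical one-hot columns produced from a nominal Survived attribute.
survived_yes = [1, 0, 1, 0, 0]               # the label
survived_no = [1 - y for y in survived_yes]  # leaked twin of the label

def fit_line(x, y):
    """Ordinary least squares for one regressor: returns (slope, intercept)."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var = sum((a - mean_x) ** 2 for a in x)
    slope = cov / var
    return slope, mean_y - slope * mean_x

slope, intercept = fit_line(survived_no, survived_yes)

# Assumed 0.5 cut-off, standing in for the rounding step.
predicted = [1 if intercept + slope * x >= 0.5 else 0 for x in survived_no]
leak_accuracy = sum(p == y for p, y in zip(predicted, survived_yes)) / len(survived_yes)

print(slope, intercept, leak_accuracy)
```

The fit comes out at a slope of about -1 and an intercept of about 1, so every prediction matches its label. The sketch fits and scores on the same rows for brevity, but since the relationship is deterministic, held-out rows would be predicted perfectly as well.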
