RapidMiner Go: Different Correlations displayed based on ordering of Data?

cramsden New Altair Community Member
edited November 2024 in Community Q&A
When I upload a data set to RapidMiner Go, it ranks my variables by correlation and suggests which ones to include. I was trying to redo some of my previous projects and was getting very different results. After a bunch of experimenting, I am seeing that if I upload the EXACT same data set but ordered differently, it displays different correlations and suggestions.

Is this supposed to happen?  If so why?

Is it just estimating the correlations from the first 'x' rows instead of the entire data set?


Thank you

Best Answer

  • alebo
    alebo New Altair Community Member
    Answer ✓
    Hi Chris, 
    For input evaluation, we take a sample of the uploaded data so that we can return potential problems quickly. Therefore, the order actually does matter. It also matters for model training, because of the way we split the data into training and test sets.
    Regards,
    Andras
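
To illustrate why a sample can make row order matter, here is a minimal Python sketch. It is not RapidMiner Go's actual code: the thread does not say how the sample is drawn, so the sketch assumes the simplest case, where the sample is the first N rows. The data set, the `pearson` and `head_sample_correlation` helpers, and the sample size of 200 are all made up for the example.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def head_sample_correlation(rows, sample_size):
    """Correlation computed only on the first `sample_size` rows.

    ASSUMPTION: the sample is the head of the file. The real tool may
    sample differently.
    """
    head = rows[:sample_size]
    return pearson([r[0] for r in head], [r[1] for r in head])


rng = random.Random(0)

# Hypothetical data: y tracks x closely in the first 500 rows,
# and is unrelated noise in the last 500 rows.
rows = []
for i in range(1000):
    x = rng.random()
    y = x + rng.gauss(0, 0.05) if i < 500 else rng.random()
    rows.append((x, y))

reversed_rows = rows[::-1]

# Same rows, different order: the head sample sees different data.
corr_original_order = head_sample_correlation(rows, 200)
corr_reversed_order = head_sample_correlation(reversed_rows, 200)

# Using every row, the order makes no difference (up to rounding).
corr_full = pearson([r[0] for r in rows], [r[1] for r in rows])
corr_full_reversed = pearson(
    [r[0] for r in reversed_rows], [r[1] for r in reversed_rows]
)

print("first 200 rows, original order:", round(corr_original_order, 3))
print("first 200 rows, reversed order:", round(corr_reversed_order, 3))
print("all rows:", round(corr_full, 3))
```

If your data has any trend along the rows (sorted by date, by customer, by value), shuffling the rows before upload is one way to make a head-style sample more representative. Whether that is appropriate for the train/test split depends on how the split is done, which this thread does not specify.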

Answers

  • cramsden
    cramsden New Altair Community Member
    That's great to know, thanks so much!  

    How does it split the data? I'm trying to figure out whether my most recent data should be in the first rows of the spreadsheet or vice versa.
