Advice kindly sought from any "seasoned" data predictors / miners out there.
I have created an experiment within RapidMiner to iterate through different inputs and modelling configurations, attempting to find the "best prediction fit" for my data.
The data consists of 3100 rows of learning data and 300 rows of unseen testing data.
Each dot on the graph below represents an individual model, plotted at its learning performance vs. its testing performance. (The scale is not relevant.)
My question is: which model should I choose to produce the most reliable and robust prediction of new "unseen" data?
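For anyone who does not use RapidMiner, each dot corresponds to one (learning performance, testing performance) pair produced by a loop roughly like the Python sketch below. This is only an illustration under stated assumptions, not my actual process: the data is synthetic, the "model" is a toy binned-mean regressor swapped in for the real learners, the bin count stands in for a modelling configuration, and every function name is hypothetical. Performance is measured here as RMSE, so lower is better.

```python
import math
import random


def make_rows(n, rng):
    """Synthetic stand-in data (assumption): y = sin(2*pi*x) + noise, x in [0, 1)."""
    rows = []
    for _ in range(n):
        x = rng.random()
        rows.append((x, math.sin(2 * math.pi * x) + rng.gauss(0, 0.3)))
    return rows


def fit_binned_mean(rows, n_bins):
    """Toy model (assumption): predict the mean training target within each x-bin."""
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for x, y in rows:
        b = min(int(x * n_bins), n_bins - 1)
        sums[b] += y
        counts[b] += 1
    overall = sum(y for _, y in rows) / len(rows)
    # Bins with no training rows fall back to the overall training mean.
    return [s / c if c else overall for s, c in zip(sums, counts)]


def rmse(model, rows):
    """Root mean squared error of the binned-mean model on the given rows."""
    n_bins = len(model)
    squared_error = 0.0
    for x, y in rows:
        b = min(int(x * n_bins), n_bins - 1)
        squared_error += (y - model[b]) ** 2
    return math.sqrt(squared_error / len(rows))


def run_experiment(seed=0):
    """Fit one model per configuration; return (config, train RMSE, test RMSE)."""
    rng = random.Random(seed)
    train = make_rows(3100, rng)  # learning data
    test = make_rows(300, rng)    # unseen testing data
    results = []
    for n_bins in (1, 4, 16, 64, 256, 1024):  # hypothetical configurations
        model = fit_binned_mean(train, n_bins)
        results.append((n_bins, rmse(model, train), rmse(model, test)))
    return results


if __name__ == "__main__":
    for n_bins, train_err, test_err in run_experiment():
        print(f"bins={n_bins:5d}  train RMSE={train_err:.3f}  test RMSE={test_err:.3f}")
```

Each printed row is one dot: the training error gives its position on one axis and the testing error its position on the other.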
- Choose a model from the ORANGE area where the training performance was very good, but the testing performance was poor.
- Choose a model from the BLUE area where the training performance was good, and the testing performance was also good.
- Choose a model from the GREEN area where the training performance was poor, but the testing performance was very good.
Ask any questions, and thank you in advance for your help.