Hi,
I have a data set with both nominal and numerical attributes and a numerical label.
I'm trying to fit a regression tree to this data set.
I would like to use the regression tree to aggregate / compress the rows of the data set, not to forecast. Concretely, the tree will never be applied to unseen data, so overfitting is not a problem in this case. Of course, I should avoid ending up with as many leaves as there are rows in the data set (that wouldn't be an aggregation anymore).
The goal is, however, that the trained model (the regression tree) "predicts / reflects" as much as possible the training data.
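To make the trade-off concrete, here is a minimal sketch in plain Python. It is not Weka's M5P: it is a toy one-attribute regression tree with constant leaves, and the data, the function names and the `min_leaf` parameter are all made up for illustration. It only shows what I mean by trading the number of leaves against the error on the training data.

```python
from statistics import mean

def sse(ys):
    """Sum of squared errors of ys around their mean."""
    m = mean(ys)
    return sum((y - m) ** 2 for y in ys)

def build_leaves(rows, min_leaf):
    """Greedily split rows (sorted by x) and return the leaves as lists of y values."""
    ys = [y for _, y in rows]
    best = None
    # Try every split position that leaves at least min_leaf rows on each side.
    for i in range(min_leaf, len(rows) - min_leaf + 1):
        if rows[i - 1][0] == rows[i][0]:
            continue  # cannot split between two equal x values
        cost = sse(ys[:i]) + sse(ys[i:])
        if best is None or cost < best[0]:
            best = (cost, i)
    # Stop when no split is allowed or no split lowers the training error.
    if best is None or best[0] >= sse(ys):
        return [ys]
    i = best[1]
    return build_leaves(rows[:i], min_leaf) + build_leaves(rows[i:], min_leaf)

def summary(rows, min_leaf):
    """Return (number of leaves, total training SSE) for a given minimum leaf size."""
    leaves = build_leaves(sorted(rows), min_leaf)
    return len(leaves), sum(sse(leaf) for leaf in leaves)

# Made-up data: a step function of x plus a small deterministic wiggle.
rows = [(x, (10 if x < 5 else 20 if x < 12 else 35) + x % 3) for x in range(20)]

for min_leaf in (1, 3, 5):
    n_leaves, err = summary(rows, min_leaf)
    print(f"min_leaf={min_leaf}: leaves={n_leaves}, training SSE={err:.2f}")
```

A smaller minimum leaf size gives more leaves and a lower training error, so what I am really looking for is the setting that keeps the tree small while still reflecting the training data well.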
Would a regression tree (Weka W-M5P) be the best solution for this problem? If yes, how should I choose the algorithm's parameters?
I think it would be better to select the "no pruning" option ...
Any ideas?
Thanks!