Aggregation / compression instead of forecast / prediction

Question

Hi,

I have a data set with both nominal and numerical attributes and a numerical label.

I'm trying to fit a regression tree to this data set.

I would like to use the regression tree as an aggregation / compression of the data set's rows, not as a forecasting model. Concretely, the tree will never be applied to unseen data, so overfitting is not a problem in this case. Of course, I should avoid ending up with as many leaves as there are rows in the data set (that wouldn't be an aggregation anymore ;) ).

The goal, however, is that the trained model (the regression tree) "predicts / reflects" the training data as closely as possible.
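
To make the trade-off concrete, here is a minimal sketch in plain Python. It is not Weka and not the M5P algorithm, just a toy regression tree on a single numeric attribute with the leaf mean as the "prediction". The data, the function names, and the `min_leaf` parameter (minimum number of rows per leaf) are all made up for illustration. It reports the two numbers I care about: how many leaves the tree has, and how well it reproduces the training labels.

```python
from statistics import mean

def sse(ys):
    """Sum of squared errors of ys around their mean."""
    m = mean(ys)
    return sum((y - m) ** 2 for y in ys)

def grow(rows, min_leaf):
    """Grow a toy tree on (x, y) rows and return its leaves (lists of rows).
    A split is allowed only if both sides keep at least min_leaf rows."""
    rows = sorted(rows)
    best = None
    for i in range(min_leaf, len(rows) - min_leaf + 1):
        if rows[i - 1][0] == rows[i][0]:
            continue  # cannot split between equal attribute values
        left, right = rows[:i], rows[i:]
        cost = sse([y for _, y in left]) + sse([y for _, y in right])
        if best is None or cost < best[0]:
            best = (cost, left, right)
    # Stop when no split is allowed or no split reduces the error.
    if best is None or best[0] >= sse([y for _, y in rows]):
        return [rows]
    return grow(best[1], min_leaf) + grow(best[2], min_leaf)

def summarize(rows, min_leaf):
    """Return (number of leaves, RMSE on the training rows)."""
    leaves = grow(rows, min_leaf)
    total = sum(sse([y for _, y in leaf]) for leaf in leaves)
    return len(leaves), (total / len(rows)) ** 0.5

# Made-up data: a step function with a small deterministic wiggle.
rows = [(x, (x // 5) * 10 + (x % 3)) for x in range(20)]

for min_leaf in (1, 2, 5):
    n_leaves, rmse = summarize(rows, min_leaf)
    print(f"min_leaf={min_leaf}: {n_leaves} leaves for {len(rows)} rows, "
          f"training RMSE={rmse:.3f}")
```

With `min_leaf = 1` this sketch degenerates to one leaf per row and zero training error, which is exactly the "no aggregation" case I want to avoid. Larger values give fewer leaves at the cost of some error, so what I am really looking for is a sensible way to pick that trade-off.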

Would a regression tree (Weka W-M5P) be the best solution for this problem? If yes, how should I choose the algorithm's parameters?

I think it would be better to select the "no-pruning" option ...

Any ideas?

Thanks!