Aggregation / compression instead of forecast / prediction

nicugeorgian New Altair Community Member
edited November 5 in Community Q&A
Hi,

I have a data set with both nominal and numerical attributes and a numerical label.

I'm trying to fit a regression tree to this set.

I would like to use the regression tree as an aggregation / compression of the data set rows, not as a forecast. Concretely, my regression tree is not going to be applied/shown to unseen data! So overfitting would not be a problem in this case! Of course, I should avoid ending up with as many tree leaves as there are rows in the data set (that wouldn't be an aggregation anymore ;) )

The goal, however, is that the trained model (the regression tree) "predicts / reflects" the training data as closely as possible.
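To make "reflects the training data as closely as possible" measurable, one option is to treat each leaf as a group of rows, replace every row's label by the mean label of its leaf, and report R² on the training rows together with the average number of rows per leaf (the compression ratio). This is a minimal sketch that assumes the leaf each row falls into is already known (the `leaf_ids` list); the function names and data are hypothetical, and using the leaf mean is a simplification, since M5P fits a linear model in each leaf rather than a constant.

```python
from collections import defaultdict


def aggregate_by_leaf(y, leaf_ids):
    """Replace each label with the mean label of its leaf (hypothetical helper)."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for value, leaf in zip(y, leaf_ids):
        sums[leaf] += value
        counts[leaf] += 1
    means = {leaf: sums[leaf] / counts[leaf] for leaf in sums}
    return [means[leaf] for leaf in leaf_ids]


def training_fidelity(y, leaf_ids):
    """Return (R^2 on the training rows, average number of rows per leaf)."""
    y_hat = aggregate_by_leaf(y, leaf_ids)
    mean_y = sum(y) / len(y)
    ss_tot = sum((v - mean_y) ** 2 for v in y)
    ss_res = sum((v - p) ** 2 for v, p in zip(y, y_hat))
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else 1.0
    rows_per_leaf = len(y) / len(set(leaf_ids))
    return r2, rows_per_leaf


# Toy example: 8 rows aggregated into 3 leaves.
y = [1.0, 1.2, 0.9, 5.0, 5.3, 4.8, 9.9, 10.1]
leaves = ["a", "a", "a", "b", "b", "b", "c", "c"]
r2, ratio = training_fidelity(y, leaves)
print(f"R^2 = {r2:.3f}, rows per leaf = {ratio:.2f}")
```

A high R² with a large rows-per-leaf value is what "aggregation" means here; one leaf per row gives R² = 1 but a ratio of 1, i.e. no compression at all.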

Would the regression tree (Weka W-M5P) be the best solution for this problem? If so, how should I choose the algorithm's parameters?

I think it would be better if I selected the option "no-pruning" ...

Any ideas?

Thanks!
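The trade-off between fewer leaves and a closer fit to the training data can be explored with a small sweep. This is a minimal sketch, assuming numerical attributes only and a plain CART-style regression tree with constant leaves, grown best-first. It is not M5P, which fits linear models in its leaves and exposes different controls (for example pruning and a minimum number of instances per leaf). The `max_leaves` cap, the function names, and the toy data are all hypothetical. Without a cap, growth continues until every leaf is pure or cannot be split; on the toy data that ends with one leaf per row, which is exactly the degenerate case to avoid.

```python
def sse(values):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(X, y, idx):
    """Return (gain, left_rows, right_rows) for the split of rows `idx` that
    reduces the SSE the most, or None if no split is possible."""
    parent = sse([y[i] for i in idx])
    best = None
    for f in range(len(X[0])):
        values = sorted({X[i][f] for i in idx})
        # Candidate thresholds lie halfway between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [i for i in idx if X[i][f] <= t]
            right = [i for i in idx if X[i][f] > t]
            gain = parent - sse([y[i] for i in left]) - sse([y[i] for i in right])
            if best is None or gain > best[0]:
                best = (gain, left, right)
    return best


def grow_leaves(X, y, max_leaves):
    """Best-first growth: keep splitting the leaf whose best split reduces
    the SSE the most, until `max_leaves` leaves exist or no split helps."""
    leaves = [list(range(len(y)))]
    while len(leaves) < max_leaves:
        candidates = []
        for k, leaf in enumerate(leaves):
            split = best_split(X, y, leaf)
            if split is not None and split[0] > 0:
                candidates.append((split[0], k, split[1], split[2]))
        if not candidates:
            break  # every leaf is pure or cannot be split further
        _, k, left, right = max(candidates, key=lambda c: c[0])
        leaves[k:k + 1] = [left, right]  # replace leaf k by its two children
    return leaves


def training_sse(y, leaves):
    """Training error when every row is predicted by its leaf mean."""
    return sum(sse([y[i] for i in leaf]) for leaf in leaves)


X = [[x] for x in range(8)]  # one numerical attribute (toy data)
y = [1.0, 1.2, 0.9, 5.0, 5.3, 4.8, 9.9, 10.1]
for cap in (1, 2, 3, 4, 100):
    leaves = grow_leaves(X, y, cap)
    print(f"max_leaves={cap}: {len(leaves)} leaves, "
          f"training SSE={training_sse(y, leaves):.4f}")
```

Reading the printed table from top to bottom, one would typically stop at the point where adding leaves no longer reduces the training error noticeably.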

Answers

  • land
    New Altair Community Member
    Hi,
    Whether the regression tree is the best algorithm depends on your needs. If you want an understandable model, choose it. Otherwise, other algorithms are possible and may well be better. But you might have to transform your data then, because LinearRegression or SVMs don't support nominal values.

    The best parameters for a learner depend on your data, so you will have to try them out.

    Greetings,
      Sebastian
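Since learners such as LinearRegression or SVMs only accept numerical inputs, a common transformation is dummy (one-hot) coding: each nominal attribute becomes one 0/1 column per distinct value, while numerical attributes pass through unchanged. The following is a minimal sketch in plain Python with a hypothetical `one_hot_encode` helper and made-up rows; inside RapidMiner this would normally be handled by a built-in conversion operator (such as Nominal to Numerical) rather than custom code.

```python
def one_hot_encode(rows, nominal_columns):
    """Turn nominal columns into 0/1 dummy columns; numeric columns pass through."""
    # Collect the distinct values of each nominal column, in sorted order.
    categories = {
        col: sorted({row[col] for row in rows}) for col in nominal_columns
    }
    encoded = []
    for row in rows:
        new_row = {}
        for col, value in row.items():
            if col in categories:
                # One indicator column per category, e.g. "color=red".
                for category in categories[col]:
                    new_row[f"{col}={category}"] = 1.0 if value == category else 0.0
            else:
                new_row[col] = float(value)
        encoded.append(new_row)
    return encoded


rows = [
    {"color": "red", "size": 3},
    {"color": "blue", "size": 5},
    {"color": "red", "size": 2},
]
for r in one_hot_encode(rows, ["color"]):
    print(r)
```

The category list is built from the training rows only; values never seen during training would need extra handling, which matters little here because the model is only ever applied to the training data.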