Build a regression model FOR EACH example

Hi,
Following my previous question (https://community.rapidminer.com/discussion/55089), I'm posting a different question regarding my next step.

I have a data set in which each row is a series of [value, date] points.
My goal is to build a linear regression model for each row.

Is it possible?

Thanks,
Avihay


Telcontar120

Of course it is possible. You can use the Loop Examples operator and simply put your ML algorithm inside it (there is no need for a split or cross-validation if you are processing one example at a time). But the related question is why this is necessary. The variance of models built on a single example is extraordinarily high, so they would probably not be robust. You would also have a large number of models to manage.
If you want something similar, you can also check out the "leave one out" cross-validation approach. This builds a model on n-1 examples (where n is your total example count), validates it on the one held-out example, and repeats this for each example in turn.
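
To make the leave-one-out idea concrete, here is a minimal Python sketch (outside RapidMiner, standard library only). It assumes a single numeric predictor and a plain least-squares line; the helper names fit_line and leave_one_out_errors and the data are made up for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct x values")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def leave_one_out_errors(xs, ys):
    """For each i, fit on every point except i, then score the model on point i."""
    errors = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        slope, intercept = fit_line(train_x, train_y)
        errors.append(abs(ys[i] - (slope * xs[i] + intercept)))
    return errors


# Made-up data for illustration only.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
loo_errors = leave_one_out_errors(xs, ys)
mean_abs_error = sum(loo_errors) / len(loo_errors)
```

Each example is held out exactly once, so n models are trained in total and the averaged error is an estimate of how one model trained on all the data would perform.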

leviavihay

Hi @Telcontar120
I will go over the "Loop Examples" operator info, thanks.

Regarding your comment about whether it's even necessary: in this case each row is a different device. For each device I have different readings on different dates. I wish to build a linear regression model (for now) for each device to predict when it will go over a certain threshold (a different one for each device).
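
For reference, the per-device idea described above could be sketched in plain Python roughly as follows. This is only an illustration under assumptions: the device names, readings, and thresholds are invented, the helpers fit_line and predict_crossing_date are hypothetical, and time is measured in days since each device's first reading.

```python
from datetime import date, timedelta


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need readings on at least two distinct dates")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict_crossing_date(readings, threshold):
    """readings: list of (value, date) pairs for ONE device.

    Returns the date on which the fitted line reaches the threshold,
    or None if the fitted trend is flat and never reaches it.
    """
    first_day = min(d for _, d in readings)
    days = [float((d - first_day).days) for _, d in readings]
    values = [float(v) for v, _ in readings]
    slope, intercept = fit_line(days, values)
    if slope == 0:
        return None
    offset = (threshold - intercept) / slope
    return first_day + timedelta(days=round(offset))


# Invented readings and thresholds, one entry per device (row).
devices = {
    "A": ([(10.0, date(2019, 1, 1)), (12.0, date(2019, 1, 2)), (14.0, date(2019, 1, 3))], 20.0),
    "B": ([(5.0, date(2019, 1, 1)), (6.0, date(2019, 1, 3)), (7.0, date(2019, 1, 5))], 10.0),
}

# One model per device, mirroring a loop over the examples.
crossing = {name: predict_crossing_date(r, t) for name, (r, t) in devices.items()}
```

Note that the returned date can lie in the past, for example when the readings are already above the threshold or the trend points away from it, so the result would need a sanity check before being used.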

Telcontar120

So an alternative approach to building separate models would be to include device type as a potential predictor, and then use LOO cross-validation as noted above. Basically, if you believe that similar predictive patterns should hold across devices, then you could use a combined model to make your prediction.
I would certainly at least check the performance of such a combined model before I went down the road of building and managing many separate models.
Another significant problem with your approach is that it will be very difficult to measure or assess its accuracy over time, since you will presumably have only one record against which to validate each model in the future. If you have multiple time periods from the same device, though, you might be able to increase your sample in that way.
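
As a rough illustration of the combined-model alternative, the Python sketch below fits one trend shared by all devices while giving each device its own intercept. That is what least squares produces when the device is included as a categorical predictor next to time. It is an assumption-laden sketch: it uses a device identifier rather than a device type, the function name fit_shared_slope is hypothetical, and the data are invented.

```python
def fit_shared_slope(readings_by_device):
    """readings_by_device: dict mapping device -> list of (day, value) pairs.

    Fits value = intercept[device] + slope * day by least squares,
    with one slope shared across devices.
    Returns (slope, {device: intercept}).
    """
    sxx = 0.0
    sxy = 0.0
    means = {}
    for device, points in readings_by_device.items():
        n = len(points)
        mean_x = sum(x for x, _ in points) / n
        mean_y = sum(y for _, y in points) / n
        means[device] = (mean_x, mean_y)
        # Accumulate within-device variation only; device-level offsets
        # are absorbed by the per-device intercepts.
        sxx += sum((x - mean_x) ** 2 for x, _ in points)
        sxy += sum((x - mean_x) * (y - mean_y) for x, y in points)
    if sxx == 0:
        raise ValueError("no variation in day within any device")
    slope = sxy / sxx
    intercepts = {d: my - slope * mx for d, (mx, my) in means.items()}
    return slope, intercepts


# Invented data: day offsets and readings for two devices.
data = {
    "A": [(0.0, 10.0), (1.0, 12.0), (2.0, 14.0)],
    "B": [(0.0, 5.0), (1.0, 6.0), (2.0, 7.0)],
}
shared_slope, device_intercepts = fit_shared_slope(data)
```

Comparing the validation error of such a pooled fit with that of the separate per-device fits is one way to check whether similar predictive patterns really hold across devices.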