prediction with svm
mines
Does anyone know how to make a prediction for the next ten days with the SVM algorithm in RapidMiner?
BalazsBaranyRM
Hi!
Do you want to make a prediction for each of the next ten days, or just for the tenth day?
In the first case you would build a loop with ten iterations, filtering your data accordingly. Essentially, you build a data structure where the value of the selected day is the target variable (label), and you make sure to only use data from 10 days before that or earlier. For example, you could compute different averages (7 day, 30 day, year ago, ...) to capture different aspects of the data.
The "tenth day prediction" is just a special case of this without the loop.
Note: this is what you have to do if you insist on using SVM. There are multiple more or less automatic time series prediction algorithms that do exactly what you want with a lot less effort.
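To make the loop idea concrete outside RapidMiner, here is a minimal sketch in plain Python. It is not a RapidMiner process; the function name `make_training_pairs`, the `window` parameter and the synthetic `cases` list are all assumptions made for illustration. For each horizon from 1 to 10 days it pairs every label day with the values that were already known that many days earlier.

```python
def make_training_pairs(series, horizon, window=3):
    """Pair each label day with the `window` values known `horizon` days earlier.

    Hypothetical helper for illustration; `window` is an arbitrary choice.
    """
    pairs = []
    for t in range(horizon + window - 1, len(series)):
        last_known = t - horizon  # newest day we are allowed to look at
        features = series[last_known - window + 1 : last_known + 1]
        pairs.append((features, series[t]))
    return pairs


# Synthetic daily counts, made up for illustration only.
cases = list(range(20))

# One training set per horizon: this is the "loop with ten iterations".
training_sets = {h: make_training_pairs(cases, h) for h in range(1, 11)}

for h, pairs in training_sets.items():
    print(f"horizon {h:2d}: {len(pairs)} training rows")
```

Each horizon gets its own training set, so you would train ten separate models. Longer horizons leave fewer usable rows, because more of the recent history has to be held back.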
Regards,
Balázs
mines
Hello @BalazsBarany!
I want to make a prediction for each of the next ten days. Can you explain how to create a loop in RapidMiner, or point me to any information about that?
Regards
BalazsBaranyRM
Hi,
if you look at the operators under Utility/Process Control/Loops, you'll see a lot of different ones.
For this use case I would use Loop Values. It takes an example set with the nominal values (these would be your dates in a textual representation). The current value is available as a macro inside the loop, so you can easily select the data according to it.
Regards,
Balázs
mines
@BalazsBarany But should I do that after applying a model, or inside the cross validation?
Thank you.
BalazsBaranyRM
Hi,
filtering the data for building the models happens before you build the model. You then apply the model to today's data.
E.g., if you want a prediction for the 7th day from now, you would filter out data from the last 6 or 7 days (depending on when you get the value for the current day) and build the model from that, with "today" being the target (label). This model can be applied to the unfiltered data up until today and it gives you the prediction for today + 7 days.
The point is to throw away data that you can't know yet for your prediction. You know the history and possibly today's value (maybe only in the afternoon, depending on the use case). You don't know tomorrow or the day after tomorrow, but you'd like to predict a future value. So you build the model from what you *can* know at the time of the model application, and you do that by filtering the past data accordingly.
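A minimal sketch of that train-then-apply pattern in plain Python, for a 7-day horizon. The data is synthetic, and a simple least-squares line stands in for the SVM purely to keep the example dependency-free; `fit_line` and `HORIZON` are hypothetical names, not anything from RapidMiner.

```python
HORIZON = 7

# Synthetic series, made up for illustration: 20 days of counts.
cases = [10 + 2 * day for day in range(20)]

# Training rows: the feature is the value known HORIZON days before the label day,
# so the most recent HORIZON days never appear as features during training.
xs = cases[:-HORIZON]
ys = cases[HORIZON:]


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x (stand-in for the SVM)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


slope, intercept = fit_line(xs, ys)

# Application: use the unfiltered data up to today to predict today + HORIZON.
today = cases[-1]
prediction = intercept + slope * today
print(f"predicted value in {HORIZON} days: {prediction:.1f}")
```

The key point is the asymmetry: training only sees feature values that are at least `HORIZON` days older than their label, while the finished model is applied to the newest value available.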
Regards,
Balázs
mines
@BalazsBarany
Thank you for your help. I am using Loop Values, but should I give it the date column (which has all my dates) or the column that I want to predict? My goal is to make a prediction with the SVM algorithm: I want to predict the number of cases of a disease for the next 10 days.
Best regards
BalazsBaranyRM
Hi,
usually you would use the time series operators to build columns from the data history.
You probably have something like this:
Date | Cases
2021-05-13 | 13
2021-05-14 | 12
...
With the time series operators you can build moving averages over 3, 7, 14, 30 etc. days, or take the value from 10 days earlier etc. You might have seasonality in the data; in that case you would also care about the values from 1 or 2 years before. But probably not with a new disease. Combinations of the values are also interesting for capturing a trend.
So the modeling dataset would be something like this:
Date | Cases date-1 | Cases date-2 | Avg 7 days | Avg 14 days | Avg14 - Avg7 | etc.
You would then use the loop to filter data in the way I described: for the 10-day prediction you would use the most recent data as the label, but all the data that go into the model are filtered 10 days back in time.
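As a rough illustration of that table, here is a plain Python sketch that derives the same kinds of columns from a list of daily counts. This is not the RapidMiner time series operators; `build_features`, the column names and the synthetic `cases` values are assumptions, and the averaging windows are assumed to end on the day before the current row.

```python
from statistics import mean


def build_features(cases):
    """Build lag and moving-average columns for every day with 14 days of history.

    Hypothetical helper; windows cover the days strictly before row i.
    """
    rows = []
    for i in range(14, len(cases)):
        avg7 = mean(cases[i - 7 : i])
        avg14 = mean(cases[i - 14 : i])
        rows.append({
            "cases": cases[i],
            "lag1": cases[i - 1],   # "Cases date-1"
            "lag2": cases[i - 2],   # "Cases date-2"
            "avg7": avg7,
            "avg14": avg14,
            "avg14_minus_avg7": avg14 - avg7,  # simple trend indicator
        })
    return rows


# Synthetic daily counts, made up for illustration only.
cases = list(range(20))
table = build_features(cases)

for row in table:
    print(row)
```

Note that the first 14 days produce no rows because their windows are incomplete. For an h-day prediction, these feature columns would additionally be shifted h days back relative to the label.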
Cheers,
Balázs
mines
But I need to use the SVM algorithm. Can I use both for the prediction?
Best regards,
BalazsBaranyRM
Yes, SVM works well with a large number of attributes.
I described the preprocessing necessary for creating the data structure that you use for modeling and validation. The modeling algorithm is your choice.
Regards,
Balázs
mines
I built a model, used the grid optimization, and then applied the model again. My dataset has 136 rows, but the final output has lost a number of them. I don't understand why. Can you help me, @BalazsBarany?
Best regards,
BalazsBaranyRM
Hi,
you can set breakpoints (after or before execution) on operators to see what goes into them and what comes out of them. That way you can easily see where you lose data.
Regards,
Balázs
timothy_rij
@mines, did you end up getting this to work? I am trying to do something similar but there are no tutorial videos on using loops or setting up a similar process.