Hi,
as part of my diploma thesis I am trying to predict the visit rate at various sport centres. I have a data set in the form:
date month day visit
My goal is to predict the visit rate at least 30 days into the future. Because visit rate is a highly seasonal variable, I added month (1-12) and day of the week (1-7) to help the prediction.
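For reference, deriving those two seasonal features could look like this (a minimal pandas sketch; the data values and column names here are illustrative stand-ins for the real CSV):

```python
import pandas as pd

# Illustrative data; the real values come from the CSV file
df = pd.DataFrame({
    "date": pd.date_range("2011-01-01", periods=5, freq="D"),
    "visit": [120, 95, 80, 130, 150],
})

# Month as 1-12 and day of the week as 1-7 (Monday = 1)
df["month"] = df["date"].dt.month
df["day"] = df["date"].dt.dayofweek + 1
```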
For about a week I have been trying to set up the process in RapidMiner and find out which operators to use (I have read almost every thread on this forum about time series prediction), but the prediction trend accuracy is still too low (about 60-70%) and the actual predictions don't look good.

I am a beginner in the field of data mining (and RM), so I don't know whether the problem is in the process or in the quality of the data.
Process:
<operator name="Root" class="Process" expanded="yes">
  <operator name="CSVExampleSource" class="CSVExampleSource">
    <parameter key="filename" value="C:\Users\Excalibur\Desktop\hall1.csv"/>
    <parameter key="id_column" value="1"/>
  </operator>
  <operator name="MultivariateSeries2WindowExamples" class="MultivariateSeries2WindowExamples">
    <parameter key="horizon" value="30"/>
    <parameter key="window_size" value="20"/>
    <parameter key="label_attribute" value="visit"/>
  </operator>
  <operator name="SlidingWindowValidation" class="SlidingWindowValidation" expanded="yes">
    <parameter key="create_complete_model" value="true"/>
    <parameter key="training_window_width" value="120"/>
    <parameter key="training_window_step_size" value="1"/>
    <parameter key="test_window_width" value="60"/>
    <parameter key="horizon" value="30"/>
    <operator name="LibSVMLearner" class="LibSVMLearner">
      <parameter key="svm_type" value="epsilon-SVR"/>
      <parameter key="gamma" value="0.0010"/>
      <parameter key="C" value="10.0"/>
      <list key="class_weights">
      </list>
    </operator>
    <operator name="OperatorChain" class="OperatorChain" expanded="yes">
      <operator name="ModelApplier" class="ModelApplier">
        <list key="application_parameters">
        </list>
      </operator>
      <operator name="ForecastingPerformance" class="ForecastingPerformance">
        <parameter key="horizon" value="30"/>
        <parameter key="prediction_trend_accuracy" value="true"/>
      </operator>
      <operator name="ProcessLog" class="ProcessLog">
        <list key="log">
          <parameter key="trend_accuracy" value="operator.ForecastingPerformance.value.prediction_trend_accuracy"/>
          <parameter key="performance1" value="operator.SlidingWindowValidation.value.performance"/>
          <parameter key="performance2" value="operator.SlidingWindowValidation.value.performance2"/>
        </list>
      </operator>
    </operator>
  </operator>
  <operator name="ModelWriter" class="ModelWriter">
    <parameter key="model_file" value="C:\Users\Excalibur\Desktop\hall1.mod"/>
    <parameter key="overwrite_existing_file" value="false"/>
  </operator>
</operator>
1) Should the parameter horizon match up in all operators?
2) I am not sure whether I fully understand sliding window validation: in the first iteration the model is trained on 120 examples (1-120) and then applied to a test set (examples 151-210, because we skip 30 (horizon) examples?). A prediction is calculated for each example in the test set and compared with the value 30 days ahead (e.g. the prediction for example 151 is compared with example 181?)
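To make my reading of question 2 concrete, this is the index arithmetic I am assuming (1-based example positions, purely illustrative, not RapidMiner's actual implementation):

```python
# Parameters from the SlidingWindowValidation operator above
training_window_width = 120
horizon = 30
test_window_width = 60

# First iteration, assuming the test window starts `horizon` examples
# after the end of the training window
train_start, train_end = 1, training_window_width    # examples 1-120
test_start = train_end + horizon + 1                 # example 151
test_end = test_start + test_window_width - 1        # example 210

# Each prediction is compared with the value `horizon` steps ahead,
# e.g. the prediction at example 151 against the actual value at 181
compare_at = test_start + horizon
```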
3) As suggested in one thread, I tried to use WindowExamples2ModelingData (right after MultivariateSeries2WindowExamples) to increase prediction accuracy. I set label_name_stem to 'visit', but I was confused by the parameter horizon. When I set horizon to 30, I got an error: The value '30' for the parameter 'horizon' cannot be used: the horizon has to be larger than the window width.
But horizon (30) is larger than the window width (20), or is window width meant to be something else in this context? I only got it to work when window_size in MultivariateSeries2WindowExamples was larger than the horizon in WindowExamples2ModelingData.
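To check my own understanding of the windowing step, here is a small sketch of what I assume MultivariateSeries2WindowExamples does conceptually (assuming the common convention that the label is the series value `horizon` steps after the window's last element; this is my interpretation, not the operator's source):

```python
def window_examples(series, window_size, horizon):
    """Turn a series into (window, label) pairs: each example holds
    window_size consecutive values, and the label is the value
    horizon steps after the last value in the window."""
    examples = []
    for start in range(len(series) - window_size - horizon + 1):
        window = series[start:start + window_size]
        label = series[start + window_size + horizon - 1]
        examples.append((window, label))
    return examples

# Tiny example: window_size=3, horizon=2 on the values 0..9.
# First example: window [0, 1, 2], label 4 (two steps after 2).
pairs = window_examples(list(range(10)), window_size=3, horizon=2)
```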
4) I found only one way to get those 30 predictions (and I think it is not the right one): I added 30 new examples (date, month, day, blank visit) to the example set, applied the same preprocessing steps as when creating the model, applied the model and got a new attribute prediction(label). But it seemed that the values of this attribute were shifted by 30 rows, e.g. the prediction for 1.1. was in fact the prediction for 31.1. Is there any operator which can shift the values by horizon steps, so that I can easily compare the real and predicted values?
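Outside RapidMiner, assuming the predictions really are offset by exactly horizon rows as described, realigning them is a one-line shift (a pandas sketch; the column names and stand-in values are illustrative):

```python
import pandas as pd

horizon = 30

# Hypothetical frame: actual values plus the model's (offset) output.
# Using the row index itself as a stand-in for both columns.
df = pd.DataFrame({"visit": range(100)})
df["prediction"] = df["visit"]

# Move each prediction forward by `horizon` rows, so the prediction
# stored at row t (which is really for day t + horizon) lines up with
# the actual value on the day it refers to. The first `horizon` rows
# of the aligned column become NaN.
df["prediction_aligned"] = df["prediction"].shift(horizon)
```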
Thanks in advance,
Dusan