"Optimizing Parameters for SVM"
Cleo
New Altair Community Member
Using various optimization operators in combination with cross-validation and performance operators, I want to improve the performance of my SVM. I have tried all the available kernels and tried different values for C.
Are there any “rules of thumb” of what ranges the parameter “C” can be? Are there any other parameters you would recommend varying?
Thanks,
Cleo
Answers
According to "A Practical Guide to Support Vector Classification"
http://www.csie.ntu.edu.tw/~cjlin/papers/guide/guide.pdf
it is best to use a loose grid search over C = 2^-5, 2^-3, ..., 2^15 and gamma = 2^-15, 2^-13, ..., 2^3, then, once a region is determined, use a tighter grid. Is this correct?
They also recommend libSVM, using the RBF kernel.
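For illustration, the loose-then-fine grid search described in the guide might be sketched as follows in Python with scikit-learn (not part of the original thread; the data set here is synthetic, so swap in your own):

```python
# Loose-then-fine grid search over (C, gamma) for an RBF SVM, in the style
# suggested by the libSVM practical guide. Synthetic data stands in for a
# real data set.
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

rng = np.random.RandomState(0)
X = rng.randn(200, 5)
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Loose grid: C = 2^-5, 2^-3, ..., 2^15 and gamma = 2^-15, 2^-13, ..., 2^3
loose = {"C": [2.0 ** e for e in range(-5, 16, 2)],
         "gamma": [2.0 ** e for e in range(-15, 4, 2)]}
search = GridSearchCV(SVC(kernel="rbf"), loose, cv=5)
search.fit(X, y)
best_C = search.best_params_["C"]
best_gamma = search.best_params_["gamma"]

# Fine grid: one octave around the loose optimum, quarter-octave steps
fine = {"C": [best_C * 2.0 ** e for e in np.arange(-1.0, 1.25, 0.25)],
        "gamma": [best_gamma * 2.0 ** e for e in np.arange(-1.0, 1.25, 0.25)]}
search = GridSearchCV(SVC(kernel="rbf"), fine, cv=5)
search.fit(X, y)
```

The same two-stage scheme can be reproduced in RapidMiner by nesting two grid-based Optimize Parameters operators, or by running the loose grid first and narrowing the ranges by hand.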
Another paper suggests using a hybrid system.
"The nonlinear SVM model is applied right after the linear SVM to forecast the nonlinear data pattern of residuals from the linear SVM model."
Could this be accomplished with the "Stacking" operator?
Thanks,
Cleo
Hi,
In principle, yes. There is, however, no generally correct range for C (something I elaborate on for about 50 pages in my PhD thesis).
then, once a region is determined, use a tighter grid. Is this correct?
About predicting the residuals with a non-linear model after fitting the global one first: this can help, sometimes. Stefan Rüping discussed this "global" vs. "local" model approach in his PhD thesis, but in general I did not get the impression that it necessarily helps in terms of accuracy; the benefit is more in understandability. The non-linear model is likely to capture the basic linear structure as well anyway. It is more about the risk of overfitting (which should not happen with correct parameters) and the fact that people understand linear models better.
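To make the "linear first, non-linear on the residuals" idea concrete, here is a minimal sketch, assuming synthetic data with a linear trend plus a non-linear component (this is an illustration, not the setup from either paper):

```python
# Hybrid scheme: fit a linear SVR globally, then fit an RBF SVR to its
# residuals, and add the two predictions. Synthetic data: linear trend
# plus a sinusoidal (non-linear) component.
import numpy as np
from sklearn.svm import SVR

rng = np.random.RandomState(1)
X = np.sort(rng.uniform(-3, 3, size=(300, 1)), axis=0)
y = 2.0 * X[:, 0] + np.sin(3 * X[:, 0])      # linear part + non-linear part

linear = SVR(kernel="linear", C=1.0).fit(X, y)
residuals = y - linear.predict(X)            # the pattern the linear model missed

nonlinear = SVR(kernel="rbf", C=1.0, gamma=1.0).fit(X, residuals)
combined = linear.predict(X) + nonlinear.predict(X)

mse_linear = np.mean((y - linear.predict(X)) ** 2)
mse_hybrid = np.mean((y - combined) ** 2)
```

On data with this structure the hybrid's training error drops below the linear model's, since the RBF stage picks up the sinusoidal residual; whether it generalizes better must of course be checked on an independent test set.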
Cheers,
Ingo
Hello Ingo,
Thanks for the response, and congratulations on the nomination for the dissertation award. On February 9, 2010 I took the “Financial Data Mining with RapidMiner” course with Ralf Klinkenberg and have been trying, so far unsuccessfully, to duplicate the results he presented.
The first process that Ralf Klinkenberg demonstrated used the closing price of the S&P 500 as the only input and I have made a very simple 5.0 version based on his 4.6 version.
The problem, I think, is that every data point seems to become a support vector, which leads me to believe the model memorizes the data instead of learning any patterns. I have tried adding an optimize-parameters operator, both grid and evolutionary, to adjust the window size as well as the kernel type, C, and epsilon values.
I then plan to add inputs: several papers suggest moving averages of different lengths and wavelet transformations, and Ralf Klinkenberg suggested using Fourier transformations.
How would you recommend improving this model?
Cheers,
Cleo
Data
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.0">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" expanded="true" name="Process">
<process expanded="true" height="418" width="433">
<operator activated="true" class="read_csv" expanded="true" height="60" name="Read CSV" width="90" x="49" y="28">
<parameter key="file_name" value="C:\Projects\RM5\timeseries\data\daily_sap.csv"/>
</operator>
<operator activated="true" class="set_role" expanded="true" height="76" name="Set Role" width="90" x="182" y="16">
<parameter key="name" value="label"/>
<parameter key="target_role" value="label"/>
</operator>
<operator activated="true" class="series:windowing" expanded="true" height="76" name="Windowing" width="90" x="112" y="165">
<parameter key="horizon" value="1"/>
<parameter key="window_size" value="20"/>
<parameter key="create_label" value="true"/>
<parameter key="label_attribute" value="label"/>
</operator>
<operator activated="true" class="set_role" expanded="true" height="76" name="Set Role (2)" width="90" x="45" y="120"/>
<operator activated="true" class="series:sliding_window_validation" expanded="true" height="112" name="Validation" width="90" x="246" y="165">
<parameter key="training_window_width" value="400"/>
<parameter key="training_window_step_size" value="200"/>
<parameter key="test_window_width" value="50"/>
<process expanded="true" height="400" width="172">
<operator activated="true" class="support_vector_machine" expanded="true" height="112" name="JMySVMLearner" width="90" x="45" y="75">
<parameter key="C" value="1.0"/>
</operator>
<connect from_port="training" to_op="JMySVMLearner" to_port="training set"/>
<connect from_op="JMySVMLearner" from_port="model" to_port="model"/>
<portSpacing port="source_training" spacing="0"/>
<portSpacing port="sink_model" spacing="0"/>
<portSpacing port="sink_through 1" spacing="0"/>
</process>
<process expanded="true" height="400" width="172">
<operator activated="true" class="apply_model" expanded="true" height="76" name="Apply Model" width="90" x="47" y="147">
<list key="application_parameters"/>
</operator>
<operator activated="true" class="performance" expanded="true" height="76" name="Performance" width="90" x="76" y="337"/>
<operator activated="false" class="performance_regression" expanded="true" height="76" name="RegressionPerformance" width="90" x="26" y="260">
<parameter key="absolute_error" value="true"/>
<parameter key="relative_error" value="true"/>
<parameter key="root_relative_squared_error" value="true"/>
</operator>
<connect from_port="model" to_op="Apply Model" to_port="model"/>
<connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
<connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
<connect from_op="Performance" from_port="performance" to_port="averagable 1"/>
<portSpacing port="source_model" spacing="0"/>
<portSpacing port="source_test set" spacing="0"/>
<portSpacing port="source_through 1" spacing="0"/>
<portSpacing port="sink_averagable 1" spacing="0"/>
<portSpacing port="sink_averagable 2" spacing="0"/>
</process>
</operator>
<connect from_op="Read CSV" from_port="output" to_op="Set Role" to_port="example set input"/>
<connect from_op="Set Role" from_port="example set output" to_op="Windowing" to_port="example set input"/>
<connect from_op="Windowing" from_port="example set output" to_op="Validation" to_port="training"/>
<connect from_op="Validation" from_port="model" to_port="result 2"/>
<connect from_op="Validation" from_port="training" to_port="result 3"/>
<connect from_op="Validation" from_port="averagable 1" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
<portSpacing port="sink_result 3" spacing="0"/>
<portSpacing port="sink_result 4" spacing="0"/>
</process>
</operator>
</process>
http://dl.dropbox.com/u/3978768/daily_sap.csv
Hi Cleo,
Thanks! :-*
Thanks for the response and congratulations on the nomination for the dissertation award.
For regression problems this is quite normal; often all training points end up as support vectors, so in principle this is nothing to worry about. This holds especially for non-linear kernel functions. The important question is whether overfitting actually occurred, which can only be tested by evaluating the model on an independent test set.
The problem, I think, is that every data point seems to become a support vector, which leads me to believe the model memorizes the data instead of learning any patterns. I have tried adding an optimize-parameters operator, both grid and evolutionary, to adjust the window size as well as the kernel type, C, and epsilon values.
In general, I would suggest optimizing the appropriate kernel parameters as well, for example gamma (or sigma) for a radial basis function kernel. Those parameters, in combination with C, are often much more important than all other SVM parameters. Taking the window size into account is also recommended.
And that is the important point: I would also recommend shifting your focus to extracting additional features, and I consider this much more important than the actual learning scheme. Appropriate features plus a simple linear regression often perform much better than a highly optimized SVM or neural net. Conversely, those more complex non-linear learning schemes often add little accuracy on top of a well-optimized feature space.
I then plan to add inputs, several papers suggest moving averages of different lengths and wavelet transformation, and Ralf Klinkenberg suggested using Fourier transformation.
All of the mentioned features can help; I would also consider additional single features taken from the frequency space and even from the phase space.
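As a rough illustration of the kind of feature extraction being discussed, the following sketch builds moving averages of several lengths plus leading Fourier magnitudes from each window of a price series (the series here is a synthetic random walk standing in for the S&P 500 closing prices; window size and horizon mirror the process above):

```python
# Windowed feature extraction for a price series: moving averages of three
# lengths plus the leading frequency magnitudes of each (de-meaned) window.
import numpy as np

rng = np.random.RandomState(2)
prices = np.cumsum(rng.randn(500)) + 100.0   # random-walk stand-in for closing prices

window, horizon = 20, 1
rows, targets = [], []
for t in range(window, len(prices) - horizon):
    w = prices[t - window:t]
    ma5, ma10, ma20 = w[-5:].mean(), w[-10:].mean(), w.mean()
    spectrum = np.abs(np.fft.rfft(w - w.mean()))[:4]   # leading frequency magnitudes
    rows.append(np.concatenate(([ma5, ma10, ma20], spectrum)))
    targets.append(prices[t + horizon - 1])            # value to predict

X = np.array(rows)   # one row of features per window
y = np.array(targets)
```

Feeding a feature table like this to a simple linear regression is a reasonable baseline before reaching for a heavily tuned SVM; in RapidMiner the moving averages correspond to series operators applied before Windowing.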
Cheers,
Ingo