"[Solved] SVM parameter optimization of C and epsilon"

qwertz (New Altair Community Member)
edited November 5 in Community Q&A
Dear all,

I have never been in the fortunate situation of attending lectures on data mining, so I lack the mathematical details of how these methods actually work. Nevertheless, I'd like to have at least an understanding of what the parameters influence.

To start into this topic I read some papers and searched the forum. Promising results were:
- "A User's Guide to Support Vector Machines" by Ben-Hur and Weston
- http://www.svms.org/parameters/
- http://rapid-i.com/rapidforum/index.php/topic,5863.0.html


In my case I am using the "SVM (Linear)" operator, i.e. an SVM with a linear kernel. The post mentioned above says that for this type of SVM only the parameter C has to be optimized. However, there are still other parameters that can be set.

So I was wondering
- What effect do the parameters "C", "convergence epsilon" and "epsilon" have on the model at all?
- What are reasonable bounds for these parameters in a (logarithmic grid) optimization?


I'd appreciate it if someone could help with these questions or point me to an "easy to understand" introduction to SVMs.



Best regards
Sachs

Answers

  • MariusHelf (New Altair Community Member)
    Hi,

    depending on what you consider "easy to understand", I would recommend the book "The Elements of Statistical Learning" by Hastie, Tibshirani, and Friedman. It is a very comprehensive introduction to data mining, and given the complexity of the matter it is well written and easy to understand. You can download it for free from the authors' website: http://www-stat.stanford.edu/~tibs/ElemStatLearn/
    For serious work with it, though, I would recommend buying a paper copy.

    To answer your questions about the SVM at a very high level:
    Consider a classification problem with a fuzzy, ragged separation line between the two classes. When creating a classification model, you can go to two extremes:
    1. You create a model which classifies each and every data point in the training data correctly. That results in a very complex description of the data: you have to describe every bend and every outlier. To create such a model, the learning algorithm must have a "high capacity".
    2. You model the data with a single straight line. This results in a very simple model, but you will probably make a lot of mistakes on the training data. We say the algorithm has a "low capacity".

    The choice of the capacity depends on the data: if the training data is very noisy, for example, you do not want an algorithm with a high capacity, because it would model each noise point as part of the model and thus generalize badly to new data. Conversely, a method with a low capacity has disadvantages on problems which are inherently complex.

    Different learning algorithms have different capacities: a neural net, for example, has a high capacity, whereas a decision tree with only one or two levels has a rather low capacity.

    The cool thing about the SVM is that you can control its capacity with a single parameter, without changing the algorithm itself. That is the C that you can configure.
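
    If you want to see that effect in practice outside of RapidMiner, here is a minimal sketch in Python; scikit-learn's LinearSVC is only a stand-in for the "SVM (Linear)" operator, and the noisy toy data is made up purely for illustration:

        # Sketch: how C trades model simplicity against training accuracy.
        from sklearn.datasets import make_classification
        from sklearn.model_selection import train_test_split
        from sklearn.svm import LinearSVC

        # Noisy two-class toy data: 20% of the labels are flipped on purpose.
        X, y = make_classification(n_samples=500, n_features=10, flip_y=0.2,
                                   random_state=0)
        X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

        for C in (0.001, 1.0, 1000.0):
            model = LinearSVC(C=C, max_iter=10000).fit(X_train, y_train)
            # Small C = low capacity: a simple model that tolerates training
            # errors. Large C = high capacity: it tries to fit every point.
            print(f"C={C:>7}: train={model.score(X_train, y_train):.3f}, "
                  f"test={model.score(X_test, y_test):.3f}")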



    Internally, the SVM does not calculate the model with a simple closed formula, but optimizes it stepwise. When the model has converged according to a certain measure, the algorithm stops and considers the current model "good enough". The strictness of this optimization is controlled by the epsilon parameters. Usually you can use the default settings.

    For C, as written elsewhere, good choices are values in the range from 10^-6 to 10^3. If you see performance constantly increasing towards one of the borders, you should of course extend the search space. The same applies to the kernel_gamma parameter, which has to be optimized when using an SVM with an RBF (radial) kernel.
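
    A logarithmic grid over that range could look like the following sketch, with scikit-learn's GridSearchCV standing in for RapidMiner's grid optimization operator (the data set is again made up for illustration):

        # Sketch: log-scale grid search for C from 10^-6 to 10^3.
        import numpy as np
        from sklearn.datasets import make_classification
        from sklearn.model_selection import GridSearchCV
        from sklearn.svm import LinearSVC

        X, y = make_classification(n_samples=500, n_features=10, random_state=0)

        param_grid = {"C": np.logspace(-6, 3, num=10)}  # 10^-6, 10^-5, ..., 10^3
        search = GridSearchCV(LinearSVC(max_iter=10000), param_grid, cv=5)
        search.fit(X, y)
        print("best C:", search.best_params_["C"])
        # If the best C lands on a border of the grid, extend the grid in
        # that direction and search again.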

    Best regards,
    Marius
  • qwertz (New Altair Community Member)

    Hi Marius,

    Thank you very much for taking the time to give such a helpful and detailed explanation!
    Your support in this forum is as outstanding as the RapidMiner software itself!

    May I ask one last question on SVMs? The "SVM (Linear)" operator has two parameters, "convergence epsilon" and just "epsilon". What is the difference between them?


    Best regards
    Sachs
  • MariusHelf (New Altair Community Member)
    As described above, the SVM internally optimizes a hyperplane. In each iteration, the cost (the so-called loss) is calculated, i.e. wrongly classified examples are "counted" in a certain way. In the next iteration the hyperplane is adjusted according to the errors discovered in the previous one. The algorithm stops when the hyperplane is "good enough" according to certain constraints, the so-called KKT constraints. "Good enough" here is influenced by the convergence_epsilon.
    The other epsilon parameter specifies a delta around the true value of each example within which a prediction is still considered "correct"; this is mainly relevant for regression.
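
    In code, that delta corresponds to the standard epsilon-insensitive loss of SVM regression; here is a tiny sketch (the epsilon value of 0.1 is just an example):

        # Sketch: epsilon-insensitive loss. Predictions within epsilon of the
        # true value incur no loss; outside the tube, the loss grows linearly.
        def epsilon_insensitive_loss(y_true, y_pred, epsilon=0.1):
            return max(0.0, abs(y_true - y_pred) - epsilon)

        print(epsilon_insensitive_loss(2.0, 2.05))  # 0.0 -> still "correct"
        print(epsilon_insensitive_loss(2.0, 2.50))  # 0.4 -> penalized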

    But as I said, usually you do not need to touch any of these parameters.

    Best regards,
    Marius
  • qwertz (New Altair Community Member)

    You are marvelous! Thanks again!

    Best regards
    Sachs