Support vector machine

Pallab
Pallab New Altair Community Member
edited November 5 in Community Q&A
Hello all machine learning experts, I am new to machine learning. My data has six features (6 regular attributes) and one label (1 special attribute) with two values, true and false (I hope I used the right terms). I want to combine these features using weights learned by an SVM. The data looks like this:

     ZDis      ZAnch      ZSurf       Zval       ZDom   ZEntropy  Top5
  0.48659   -0.20412    1.19243    0.15374    0.59667    1.34151  False
 -0.10067    4.89898   -0.73677    0.22506    0.59667    1.34151  True
  2.24837   -0.20412   -2.02291    0.22455    0.59667    1.34151  False
  0.48659   -0.20412    1.19243   -0.06352    0.59667    1.34151  False
 -0.68793   -0.20412    1.19243    0.12405    0.59667    1.34151  False
 -2.02698   -0.40825    1.86371    0.07348     1.3272    -0.1242  False
  -0.1807    2.44949    0.17865    0.07345     0.9401     0.1505  False
  1.66557    2.44949   -1.50641    0.07381     0.9401    1.30135  False
  1.11169   -0.40825    0.34716    0.07381     0.9401   -0.20225  True
   1.5337   -0.40825   -0.01393    0.07381    -0.9954    0.53144  False
 -0.01945   -0.48348   -1.16128    0.11035    2.02339    0.90237  False
 -1.52944    3.23556    0.23428    0.11093    1.22613   -0.12973  False
  0.43354   -0.48348   -2.20795    0.11093    1.22613    2.25734  False
  2.84953   -0.48348   -2.20795    0.11093    1.49189    3.07609  True
What I want to compute is total = X1*ZDis + X2*ZAnch + X3*ZSurf + X4*Zval + X5*ZDom + X6*ZEntropy, where X1..X6 are weights that should come from the SVM. I used RapidMiner to get these weights for my training set of 40 examples, and the result is below:

Total number of Support Vectors: 40
Bias (offset): -1.055
w[ZDis] = 0.076
w[ZAnch] = -0.058
w[ZSurf] = 0.057
w[Zval] = 0.010
w[ZDom] = 0.073
w[ZEntropy] = 0.077
I am not sure whether this is the correct approach, so I need your kind help. Thanks in advance.
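
For illustration, the weighted sum described above can be reproduced outside RapidMiner. The Python sketch below plugs the reported weights and bias into the formula for the first example row. It assumes the model is a linear SVM whose decision value is total + bias. The names `WEIGHTS`, `BIAS`, `weighted_total` and `decision_value` are made up for this example, and which sign corresponds to True or False depends on RapidMiner's internal label mapping, which is not shown in the output.

```python
# Weights and bias copied from the RapidMiner output above.
WEIGHTS = {
    "ZDis": 0.076,
    "ZAnch": -0.058,
    "ZSurf": 0.057,
    "Zval": 0.010,
    "ZDom": 0.073,
    "ZEntropy": 0.077,
}
BIAS = -1.055

# First example row from the data table above.
first_row = {
    "ZDis": 0.48659,
    "ZAnch": -0.20412,
    "ZSurf": 1.19243,
    "Zval": 0.15374,
    "ZDom": 0.59667,
    "ZEntropy": 1.34151,
}


def weighted_total(row):
    """total = X1*ZDis + X2*ZAnch + ... + X6*ZEntropy."""
    return sum(WEIGHTS[name] * row[name] for name in WEIGHTS)


def decision_value(row):
    """Assumed linear SVM decision value: weighted total plus bias."""
    return weighted_total(row) + BIAS


total = weighted_total(first_row)
decision = decision_value(first_row)
print(f"total = {total:.5f}, decision value = {decision:.5f}")
```

The sign of the decision value, not the raw total, is what a linear SVM would use to pick a class, so the bias should not be dropped when applying the weights by hand.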

Answers

  • MariusHelf
    MariusHelf New Altair Community Member
    Hi,

    First of all, I see only one label in your data. I suppose that is because you removed the other one to focus on the current label. This is good, and it is the way you should do it in RapidMiner.

    In general your approach is fine. However, to get the best out of an SVM, it must be validated and optimized. I suggest watching the tutorial videos on our website to learn how to validate models. Once you have understood that, you should try different values for the parameter C of the SVM. Values worth trying are 0.000001, 0.00001, ..., 0.01, 0.1, 1, 10. You can do this either manually by editing the parameters, or automatically with the Optimize Parameters operator.

    To get better results you should also increase the size of your data set. 40 examples is a relatively small number.

    In any case start with the tutorial videos at http://rapid-i.com/content/view/189/212/lang,en/

    Best regards,
    Marius
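
The validate-and-optimize advice above can be sketched in plain Python as a grid search over C with k-fold cross-validation. This is only an illustration of the idea, not what RapidMiner's SVM or Optimize Parameters operator does internally. It assumes the elided grid values are the intermediate powers of ten, uses a toy subgradient-descent trainer for a linear SVM, and runs on a small made-up data set; all function names and the data are hypothetical.

```python
import random


def c_grid(low_exp=-6, high_exp=1):
    """Powers of ten from 1e-6 up to 10 (intermediate values are assumed)."""
    return [10.0 ** e for e in range(low_exp, high_exp + 1)]


def k_fold_indices(n, k, seed=0):
    """Shuffle 0..n-1 and split the indices into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def train_linear_svm(X, y, C, epochs=100, lr=0.01):
    """Toy trainer: subgradient descent on 0.5*||w||^2 + C*sum(hinge loss).
    Labels must be +1 or -1."""
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            score = sum(wj * xj for wj, xj in zip(w, xi)) + b
            grad_w = [wj / n for wj in w]  # regularizer, spread over examples
            grad_b = 0.0
            if yi * score < 1:  # example violates the margin
                grad_w = [g - C * yi * xj for g, xj in zip(grad_w, xi)]
                grad_b = -C * yi
            w = [wj - lr * g for wj, g in zip(w, grad_w)]
            b -= lr * grad_b
    return w, b


def predict(w, b, x):
    """Return +1 or -1 from the sign of the decision value."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


def cross_val_accuracy(X, y, C, k=4):
    """Mean accuracy on held-out folds for one value of C."""
    correct = 0
    for test_idx in k_fold_indices(len(X), k):
        held_out = set(test_idx)
        train_X = [X[i] for i in range(len(X)) if i not in held_out]
        train_y = [y[i] for i in range(len(X)) if i not in held_out]
        w, b = train_linear_svm(train_X, train_y, C)
        correct += sum(predict(w, b, X[i]) == y[i] for i in test_idx)
    return correct / len(X)


# Made-up, linearly separable toy data (not the data from the question).
X = [[-2.0, -1.0], [-1.5, -2.0], [-1.0, -1.5], [-2.5, -0.5],
     [1.0, 1.5], [2.0, 1.0], [1.5, 2.5], [0.5, 2.0]]
y = [-1, -1, -1, -1, 1, 1, 1, 1]

scores = {C: cross_val_accuracy(X, y, C) for C in c_grid()}
best_c = max(scores, key=scores.get)
for C, acc in scores.items():
    print(f"C = {C:g}: cross-validated accuracy = {acc:.2f}")
print("best C:", best_c)
```

The point of the held-out folds is that each C is judged on examples the model did not see during training, which is what the validation step recommended above is meant to ensure.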