different validation performance parameters in LOG?
Hi,
I can choose between several performance parameters for validation in the Log operator, see screenshot:
Can someone explain the difference between these performance entries? I only have one Performance operator in my process.
Also, can someone please tell me whether I should use the normal Performance operator for k-NN, or one of the cluster performance operators? Which is better?
I would also like to spot possible outliers and color my data points by their label class... but my dataset has 20+ attributes. Is it still possible to visualize the k-NN results somehow?
Answers
-
Dear Fred,
I think there are several things mixed into one question here.
The various performance entries in X-Val
These are, I think, just placeholders for the case where you use more than one Performance operator.
Placement of the Log operator
Please make sure the Log operator comes AFTER the operator it should log, in your case the X-Validation. If you put it inside, it cannot access the latest result of the X-Validation.
Performance
You use k-NN to classify, so you should use one of the Performance operators for classification. Which measure to pick is of course driven by your problem. Using a clustering measure makes no sense if you are doing classification.
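For illustration only, here is a minimal Python/scikit-learn sketch of the same idea outside RapidMiner (the dataset and the k value are made up): k-NN is scored with classification measures, and clustering measures simply have no role here.

from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical stand-in for a labeled dataset with 20+ attributes
X, y = make_classification(n_samples=500, n_features=20, random_state=42)

knn = KNeighborsClassifier(n_neighbors=5)

# Classification measures such as accuracy apply here; clustering measures
# (silhouette etc.) do not, because k-NN predicts known labels rather than
# discovering unlabeled groups
scores = cross_val_score(knn, X, y, cv=10, scoring="accuracy")
print(f"accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")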
Visualizing 20 Dimensions
It is simply not possible to look at 20 dimensions at once. You would need to reduce the dimensionality with a technique like PCA, SOM, or t-SNE.
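If it helps, here is a rough Python sketch of that approach (scikit-learn's PCA; t-SNE from sklearn.manifold would be a drop-in alternative), assuming a generic 20-attribute labeled dataset. It projects to 2D and colors the points by label class, which also makes outliers visible:

import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA

# Stand-in for a dataset with 20+ attributes and a class label
X, y = make_classification(n_samples=500, n_features=20, n_informative=5,
                           random_state=42)

# Project the 20 dimensions down to 2 principal components
# (sklearn.manifold.TSNE would work the same way here)
X_2d = PCA(n_components=2).fit_transform(X)

# Color each point by its label class; outliers show up as points
# far away from the bulk of their class
plt.scatter(X_2d[:, 0], X_2d[:, 1], c=y, cmap="coolwarm", s=15)
plt.xlabel("PC 1")
plt.ylabel("PC 2")
plt.title("20-dimensional data projected to 2D via PCA")
plt.show()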
~Martin
-
Regarding the different performance values:
performance is the value of the main criterion, which you select in the performance operator inside the X-Validation.
deviation is the standard deviation of this main criterion.
performance1 to performance3 refer to the first three performance criteria selected in the Performance operator. So if you check accuracy and error in Performance (Classification), performance1 refers to accuracy and performance2 to the error, because accuracy is the first checked criterion in the list and error the second.
This is a major pain point in any training course I have given so far, so it can't hurt to be precise here.
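To make that mapping concrete, here is a hedged Python analogue (scikit-learn; the dataset and the checked criteria are assumptions) of the four log columns:

from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=42)

# Suppose accuracy is the main criterion and also the first checked box,
# with classification error checked second
acc = cross_val_score(KNeighborsClassifier(), X, y, cv=10, scoring="accuracy")

print(f"performance:  {acc.mean():.3f}")      # mean of the main criterion
print(f"deviation:    {acc.std():.3f}")       # its standard deviation over the folds
print(f"performance1: {acc.mean():.3f}")      # first checked criterion (accuracy)
print(f"performance2: {1 - acc.mean():.3f}")  # second checked criterion (error = 1 - accuracy)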
Greetings,
Sebastian
-
OK thanks, that helped a bit, but I am still confused.
I am using an Optimize Parameters (Grid), inside it a Backward Elimination, and inside that an X-Validation with W-REPTree for a numeric dataset:
Where should I put the Log operator now? I used one after the X-Validation, another one after the Backward Elimination, and another one after the Optimize Grid operator...
Secondly, I still don't really understand the output of the Log operator regarding things like performance1, performance2, etc., because those are not the same as accuracy, classification error, and so on:
My Log (3) operator, the one after the Backward Elimination, outputs different results than the one after the X-Validation, of course:
But how does that work? After which loop is an entry made in the Log (3) operator?
-
Hi Fred,
it always depends on what you want to do. In your case, you would like to log the performance the Optimize operator is working on, i.e. the performance returned by the Backward Elimination. That is the one to log.
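As a loose Python analogue of that nesting (scikit-learn; DecisionTreeClassifier stands in for W-REPTree, and the parameter grid is invented), the log entry you care about is the one written once per outer parameter combination:

from sklearn.datasets import make_classification
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=42)

log_rows = []  # one entry per parameter combination, like a Log after Backward Elimination

# Outer loop: Optimize Parameters (Grid) over a hypothetical tree parameter
for max_depth in (3, 5, 7):
    tree = DecisionTreeClassifier(max_depth=max_depth, random_state=42)

    # Middle loop: backward feature elimination with cross-validation inside
    selector = SequentialFeatureSelector(tree, n_features_to_select=10,
                                         direction="backward", cv=5)
    X_sel = selector.fit(X, y).transform(X)

    # Inner loop: X-Validation on the selected feature subset
    acc = cross_val_score(tree, X_sel, y, cv=10, scoring="accuracy")

    # Log the value the optimizer actually compares
    log_rows.append({"max_depth": max_depth,
                     "performance": acc.mean(),
                     "deviation": acc.std()})

for row in log_rows:
    print(row)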
Be careful with overtraining!
~Martin
-
OK, but I want to test the three parameters M, V, N of REPTree against each other, because I want to achieve a high accuracy in the X-Validation...
I am now a bit confused: which of the logged values should I use to see the best performance, the performance values, the accuracy, or the kappa value?
The first line has an accuracy of 82.8%, but performance is only 77.6%. What is performance then? I thought that was the main criterion, which is accuracy?
And performance1 is 77.6%; shouldn't that be the same as accuracy, since accuracy is the first criterion to choose in the Performance (Classification) operator?
-
Which accuracy did you log there? The one from the Backward Elimination?
~Martin
-
Yes, Log (3) is the Backward Elimination, Log (2) the X-Validation.
-
What if someone wants to log more than three performance values, i.e. has checked more than three metrics and wants to log all of them, not only the first three?
-
In that (very rare) case you can still use Performance to Data to transform the performance into a data set and handle it yourself. You could attach the current parameter settings using the param function of Generate Attributes and collect all the data sets in one of the usual ways.
We usually use the Indexed Collections of our Jackhammer extension, which not only collect the objects but also index them with an arbitrary number of attribute/value pairs, so that you can access a specific object later by providing its index values. It is also good to keep the match between parameters -> performance.
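Outside RapidMiner, the same idea might look like this rough Python sketch (scikit-learn and pandas; the metrics and the parameter grid are assumptions): every checked metric is collected per parameter setting into one table, preserving the parameters -> performance match:

import pandas as pd
from sklearn.datasets import make_classification
from sklearn.metrics import cohen_kappa_score, make_scorer
from sklearn.model_selection import cross_validate
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=42)

# More than three metrics, all logged per parameter setting
scoring = {"accuracy": "accuracy",
           "kappa": make_scorer(cohen_kappa_score),
           "precision": "precision",
           "recall": "recall",
           "f1": "f1"}

rows = []
for k in (1, 3, 5, 7):
    res = cross_validate(KNeighborsClassifier(n_neighbors=k), X, y,
                         cv=10, scoring=scoring)
    # Attach the current parameter setting to its performance row,
    # like Generate Attributes with the param function
    row = {"k": k}
    row.update({m: res[f"test_{m}"].mean() for m in scoring})
    rows.append(row)

# One data set matching parameters to all logged performances
print(pd.DataFrame(rows))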
Greetings,
Sebastian