How is Jaccard / Dice similarity defined for numerical variables?
hi,
as stated here: http://www.stata.com/manuals13/mvmeasure_option.pdf
Jaccard is TP/(TP+FP+FN), which seems to be defined for binary variables.
But how is it defined for numerical values? It can be chosen, e.g., as a numerical distance measure in the k-NN operator.
And similarly, how is Dice similarity defined for numerical values?
edit: I found the implementation here: https://github.com/rapidminer/rapidminer-studio/tree/master/src/main/java/com/rapidminer/tools/math/similarity/numerical
edit2: OK, it seems Dice is simply 2 * (x·y) / (sum(x) + sum(y)), or as written in the code:
2 * wxy / (wx + wy);
where x and y are two vectors with attributes x_i and y_i,
wxy is the sum of the products of the corresponding attributes of the two vectors (the sum of x_i * y_i),
and wx and wy are just the sums of the attribute values of x and y respectively.
That looks like a weird distance measure to me; I don't know if it makes a lot of sense...
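To make the formula from edit2 concrete, here is a minimal Python sketch of it. This is not the linked implementation, only a transcription of the description above; the function name `dice_as_described` and the choice to raise an error for an all-zero denominator are my own assumptions.

```python
def dice_as_described(x, y):
    """Dice similarity as described in the post: 2 * wxy / (wx + wy).

    Hypothetical helper, not taken from the linked source code.
    """
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    wxy = sum(a * b for a, b in zip(x, y))  # sum of x_i * y_i
    wx = sum(x)                             # sum of the attribute values of x
    wy = sum(y)                             # sum of the attribute values of y
    if wx + wy == 0:
        # Assumption: treat a zero denominator as undefined.
        raise ValueError("similarity undefined when wx + wy == 0")
    return 2 * wxy / (wx + wy)


# Binary vectors: TP = 1, FP = 1, FN = 1
print(dice_as_described([1, 0, 1], [1, 1, 0]))
```

For 0/1 vectors, wxy equals TP and wx + wy equals 2*TP + FP + FN, so the formula reduces to the usual binary Dice coefficient 2*TP / (2*TP + FP + FN). For general numerical values that correspondence no longer holds.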
Comments
Hi, with this definition, both Jaccard and Dice can give a lower similarity for identical vectors than for different vectors: [1,0] is more similar to [2,0] than to [1,0]. It looks like a bug: the computation for the nominal similarity is used for numerics. But the correct definition for numerical Dice similarity would be 2 * |x·y| / (|x|^2 + |y|^2). You can apply the numerical definition to binary vectors but not vice versa.
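The [1,0] vs [2,0] example can be checked with a short Python sketch. Both function names are hypothetical, and I am reading the numerator of the norm-based formula as the plain dot product, which coincides with its absolute value for non-negative vectors like these.

```python
def dot(x, y):
    """Sum of element-wise products of two equal-length vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    return sum(a * b for a, b in zip(x, y))


def dice_sum_based(x, y):
    """Formula from the question: 2 * wxy / (wx + wy)."""
    denom = sum(x) + sum(y)
    if denom == 0:
        raise ValueError("undefined for a zero denominator")
    return 2 * dot(x, y) / denom


def dice_norm_based(x, y):
    """Formula from this comment: 2 * (x.y) / (|x|^2 + |y|^2)."""
    denom = dot(x, x) + dot(y, y)
    if denom == 0:
        raise ValueError("undefined for two zero vectors")
    return 2 * dot(x, y) / denom


a, b = [1, 0], [2, 0]
print(dice_sum_based(a, a), dice_sum_based(a, b))    # identical pair vs. different pair
print(dice_norm_based(a, a), dice_norm_based(a, b))  # same comparison, norm-based
```

With the sum-based formula the identical pair scores 1.0 while the different pair scores about 1.33, so the different vectors look more similar. With the norm-based formula the identical pair scores 1.0 and the different pair 0.8. On 0/1 vectors the two formulas agree, because x_i^2 = x_i there.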