How is Jaccard / Dice similarity defined for numerical variables?

Fred12
Fred12 New Altair Community Member
edited November 2024 in Altair RapidMiner

hi,

as stated here: http://www.stata.com/manuals13/mvmeasure_option.pdf

Jaccard is TP/(TP+FP+FN), which seems to apply only to binary variables.

But how is it defined for numerical values? It can be chosen, e.g., as a numerical distance measure in the k-NN operator.

And similarly, how is Dice similarity defined for numerical values?
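
For reference, the binary case can be sketched in Python as below. The Jaccard formula is the one quoted above; the Dice formula 2*TP / (2*TP + FP + FN) is the usual binary form and is not spelled out in this thread. The function names are made up for illustration, and returning 0.0 for two all-zero vectors is an arbitrary convention.

```python
def confusion_counts(x, y):
    """Count TP, FP, FN for two equal-length 0/1 vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    tp = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)
    fp = sum(1 for a, b in zip(x, y) if a == 0 and b == 1)
    fn = sum(1 for a, b in zip(x, y) if a == 1 and b == 0)
    return tp, fp, fn


def binary_jaccard(x, y):
    """Jaccard similarity TP / (TP + FP + FN) for 0/1 vectors."""
    tp, fp, fn = confusion_counts(x, y)
    denom = tp + fp + fn
    # Assumed convention: two all-zero vectors get similarity 0.0.
    return tp / denom if denom else 0.0


def binary_dice(x, y):
    """Dice similarity 2*TP / (2*TP + FP + FN) for 0/1 vectors."""
    tp, fp, fn = confusion_counts(x, y)
    denom = 2 * tp + fp + fn
    # Same assumed convention for the all-zero case.
    return 2 * tp / denom if denom else 0.0
```

For x = [1, 1, 0, 0] and y = [1, 0, 1, 0] there is one TP, one FP and one FN, so Jaccard is 1/3 and Dice is 1/2.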

edit: I found the implementation here: https://github.com/rapidminer/rapidminer-studio/tree/master/src/main/java/com/rapidminer/tools/math/similarity/numerical

 

edit2: OK, it seems it's simply 2 * (X . Y) / (sum(X) + sum(Y)),

where X and Y are two vectors with attributes x_i and y_i. In the code this is

2 * wxy / (wx + wy);

where wxy is the sum of the products of the corresponding attributes of the two vectors,

and wx and wy are just the sums of the attribute values of X and Y respectively.
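
To make the formula concrete, here is a minimal Python sketch of it exactly as described in this post. It is not a port of the RapidMiner source; the function name `dice_as_posted` is hypothetical, and the variable names wxy, wx and wy just mirror the ones quoted above.

```python
def dice_as_posted(x, y):
    """Compute 2 * wxy / (wx + wy) as described in the post.

    wxy: sum of the products of corresponding attribute values
    wx, wy: plain sums of the attribute values of x and y
    """
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    wxy = sum(a * b for a, b in zip(x, y))
    wx = sum(x)
    wy = sum(y)
    # Raises ZeroDivisionError if wx + wy == 0; no special handling is assumed.
    return 2.0 * wxy / (wx + wy)
```

For example, `dice_as_posted([1, 0], [1, 0])` gives 1.0, while `dice_as_posted([1, 0], [2, 0])` gives 4/3, so the value is not bounded by 1.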

 

looks like some weird distance measure to me, don't know if that makes a lot of sense...

Comments

  • amei
    amei New Altair Community Member
    Hi,
    With this definition, both Jaccard and Dice can give a lower similarity for identical vectors than for different ones: [1,0] comes out as more similar to [2,0] than to [1,0] itself (with the Dice formula above, 4/3 versus 1).
    It looks like a bug: the computation for the nominal similarity is being used for numerical attributes. The correct definition of numerical Dice similarity would be 2 * (x . y) / (|x|^2 + |y|^2), where x . y is the dot product.
    You can apply the numerical definition to binary vectors, since x_i^2 = x_i for 0/1 values, but not vice versa.
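
A small Python sketch of the numerical definition suggested in this comment, 2 * (x . y) / (|x|^2 + |y|^2), may help to check the claims. The function name `numerical_dice` is hypothetical, and nothing here is taken from the RapidMiner code.

```python
def numerical_dice(x, y):
    """Dice similarity for real-valued vectors: 2 * (x . y) / (|x|^2 + |y|^2)."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(x, y))   # x . y
    norm_x_sq = sum(a * a for a in x)        # |x|^2
    norm_y_sq = sum(b * b for b in y)        # |y|^2
    # Raises ZeroDivisionError if both vectors are all zeros.
    return 2.0 * dot / (norm_x_sq + norm_y_sq)
```

With this version, [1,0] compared with itself gives 1.0 and compared with [2,0] gives 0.8, so identical vectors are no longer beaten by different ones. On the 0/1 vectors [1,1,0,0] and [1,0,1,0] it gives 0.5, the same as the binary Dice value 2*TP / (2*TP + FP + FN) with TP = FP = FN = 1.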
