"The difference between using weighted vote and not using weighted vote learner"

amy New Altair Community Member
I found that there is a k Nearest Neighbor learner in Group: Learner.Supervised.Lazy.
It has a parameter named weighted vote, and I am not sure what the difference is between k-NN with weighted vote and k-NN without it.
Could you let me know what the difference is? Where can I find some information on it?
It also seems that there is a class named WeightedObject which holds the weight. How is that weight calculated?
I'd be very grateful for a hint.
Thanks a million.

Regards

Amy

Answers

  • TobiasMalbrecht New Altair Community Member
    Hi Amy,

    well, the answer is pretty easy. The parameter specifies whether the distance of the nearest neighbors should be considered in the voting decision during prediction. If it is not considered, every nearest neighbor has the same influence on the prediction. If the parameter is enabled, neighbors with a lower distance to the example being predicted get a higher influence than those with a higher distance, as in the sketch below.
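
    Just to make that concrete, here is a minimal standalone sketch of the two voting schemes (the class name, the made-up distances and the simple 1/distance weighting are only for illustration; the exact formula RapidMiner uses is shown further down in this thread):

    import java.util.HashMap;
    import java.util.Map;

    public class VotingSketch {

        // picks the label with the highest accumulated vote
        static String argMax(Map<String, Double> votes) {
            String best = null;
            double bestVote = Double.NEGATIVE_INFINITY;
            for (Map.Entry<String, Double> e : votes.entrySet()) {
                if (e.getValue() > bestVote) {
                    bestVote = e.getValue();
                    best = e.getKey();
                }
            }
            return best;
        }

        public static void main(String[] args) {
            // three nearest neighbours of some query point: one very close "B",
            // two rather distant "A"s (the numbers are made up)
            double[] distances = {0.1, 2.0, 2.5};
            String[] labels = {"B", "A", "A"};

            Map<String, Double> plainVotes = new HashMap<>();
            Map<String, Double> weightedVotes = new HashMap<>();
            for (int i = 0; i < distances.length; i++) {
                // unweighted vote: every neighbour counts as 1
                plainVotes.merge(labels[i], 1.0, Double::sum);
                // weighted vote: closer neighbours count more (here simply 1 / distance)
                weightedVotes.merge(labels[i], 1.0 / distances[i], Double::sum);
            }

            System.out.println(argMax(plainVotes));    // A - two plain votes beat one
            System.out.println(argMax(weightedVotes)); // B - weight 10 beats 0.5 + 0.4
        }
    }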

    Regards,
    Tobias
  • amy New Altair Community Member
    Hi Tobias,
    Thank you so much for your kind reply. I have a better idea of it now.
    May I ask some further questions here?
    I found this topic: http://rapid-i.com/rapidforum/index.php/topic,249.0.html. It says something about how the weight is implemented.
    You talked about weighting by the distance; what about similarity measures that are not distances, like cosine similarity? How is the weight calculated then? What formula is used if the measure is not a distance but cosine similarity?

    Thanks a million.

    Amy
  • TobiasMalbrecht New Altair Community Member
    Hi Amy,

    of course you can ask questions. That is the intention of this forum ... ;)

    The weight is calculated in the following lines in the class [tt]com.rapidminer.operator.learner.lazy.KNNClassificationModel[/tt]:

    // finding the next k neighbours and their distances
    Collection<Tupel<Double, Integer>> neighbours = samples.getNearestValueDistances(k, values);

    // sum up all neighbour distances; used for normalisation below
    for (Tupel<Double, Integer> tupel : neighbours) {
        totalDistance += tupel.getFirst();
    }

    // normalisation constant: if all distances are zero, every neighbour
    // simply gets the weight 1/k; otherwise the weights are scaled by k - 1,
    // so that (for k > 1) they sum to 1
    double totalSimilarity = 0.0d;
    if (totalDistance == 0) {
        totalDistance = 1;
        totalSimilarity = k;
    } else {
        totalSimilarity = Math.max(k - 1, 1);
    }

    // counting the (weighted) frequency of the labels:
    // weight_i = (1 - distance_i / totalDistance) / totalSimilarity
    for (Tupel<Double, Integer> tupel : neighbours) {
        counter[tupel.getSecond()] += (1d - tupel.getFirst() / totalDistance) / totalSimilarity;
    }
    The weight calculation is pretty straightforward and should be easy to follow from the source code. In principle, the weighting scheme should also be the same for every distance/divergence measure.
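
    Just to make the formula concrete, here is a tiny standalone example (the class name and the distances 1, 2 and 3 are made up purely for illustration; this is not RapidMiner code):

    public class WeightSketch {
        public static void main(String[] args) {
            // made-up distances of the k = 3 nearest neighbours
            double[] distances = {1.0, 2.0, 3.0};
            int k = distances.length;

            double totalDistance = 0;
            for (double d : distances) {
                totalDistance += d;                       // 6.0
            }

            double totalSimilarity;
            if (totalDistance == 0) {
                totalDistance = 1;
                totalSimilarity = k;
            } else {
                totalSimilarity = Math.max(k - 1, 1);     // 2.0
            }

            for (double d : distances) {
                double weight = (1d - d / totalDistance) / totalSimilarity;
                System.out.println(weight);               // ~0.417, ~0.333, 0.25
            }
        }
    }

    The three weights sum to 1, and the closest neighbour gets the largest share of the vote.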

    Kind regards,
    Tobias