How to add weights to k-NN [SOLVED]

User: "michaelhecht"
New Altair Community Member
Updated by Jocelyn
Hello,

I am currently sitting in a Rapid-I course, where I asked how to add weights to the attributes for a k-NN operator. No one could answer this satisfactorily.

So how can I weight attributes (e.g. numeric ones) to get a weighted distance for the k-NN operator?

    User: "MariusHelf"
    New Altair Community Member
    Hey,

    just let me know when your next break starts and I can explain it to you personally :)

    See you later!

    Marius
    User: "michaelhecht"
    New Altair Community Member
    OP
    OK, in the meantime I have understood what to do: normalize the attributes and then scale them by the weights.
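
A minimal sketch of the normalize-then-scale idea outside RapidMiner, in plain Python. The z-score normalization, the sample values, and the weights are assumptions chosen for illustration; the helper names are hypothetical and not part of any RapidMiner API.

```python
import math

def z_normalize(columns):
    """Z-score each column: subtract the mean, divide by the standard deviation."""
    out = []
    for col in columns:
        mean = sum(col) / len(col)
        var = sum((v - mean) ** 2 for v in col) / len(col)
        std = math.sqrt(var) or 1.0  # constant column: avoid division by zero
        out.append([(v - mean) / std for v in col])
    return out

def scale_by_weights(columns, weights):
    """Multiply every value of a column by that column's weight."""
    return [[w * v for v in col] for col, w in zip(columns, weights)]

def euclidean(a, b):
    """Plain Euclidean distance between two rows."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Illustrative numeric attributes (made-up values) and hypothetical weights.
temperature = [85.0, 80.0, 83.0, 70.0]
humidity = [85.0, 90.0, 78.0, 96.0]
weights = [1.0, 0.5]

scaled = scale_by_weights(z_normalize([temperature, humidity]), weights)
rows = list(zip(*scaled))  # back to one tuple per example
print(euclidean(rows[0], rows[1]))
```

After scaling, an unmodified k-NN that uses Euclidean distance sees the second attribute with half the spread of the first, so differences in it count less.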

    But what can I do with nominal attributes?
    User: "MariusHelf"
    New Altair Community Member
    Well, for nominal attributes it is not possible to apply weights directly. You have to convert them to a numerical representation beforehand and then use the same technique as for numerical attributes, i.e. scaling the values. For the conversion you can e.g. use Nominal to Numerical with dummy_coding.
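
A rough Python sketch of the conversion described above: each nominal value becomes its own 0/1 column, and the columns are then scaled like numeric attributes. The `dummy_code` helper, the sample values, and the weight are hypothetical; this is not the implementation of Nominal to Numerical.

```python
def dummy_code(values):
    """Return one 0/1 column per distinct nominal value, keyed by that value."""
    categories = sorted(set(values))
    return {cat: [1.0 if v == cat else 0.0 for v in values] for cat in categories}

outlook = ["sunny", "overcast", "rain", "sunny"]  # illustrative values
weight = 0.5                                      # hypothetical attribute weight

columns = dummy_code(outlook)
# Apply the same weight to every dummy column derived from the attribute.
scaled = {cat: [weight * v for v in col] for cat, col in columns.items()}
print(scaled)
```

In this sketch two examples with different nominal values differ in exactly two dummy columns, so their Euclidean distance on this attribute is `weight * sqrt(2)` rather than `weight`.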

    Best regards,
    Marius
    User: "michaelhecht"
    New Altair Community Member
    OP
    I just discussed this with Ralf Klinkenberg. Here is what I implemented last night in the hotel :)

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.3.007">
     <context>
       <input/>
       <output/>
       <macros/>
     </context>
     <operator activated="true" class="process" compatibility="5.3.007" expanded="true" name="Process">
       <process expanded="true">
         <operator activated="true" class="retrieve" compatibility="5.3.007" expanded="true" height="60" name="Retrieve Golf" width="90" x="45" y="30">
           <parameter key="repository_entry" value="//Samples/data/Golf"/>
         </operator>
         <operator activated="true" class="normalize" compatibility="5.3.007" expanded="true" height="94" name="Normalize" width="90" x="179" y="30">
           <parameter key="attribute_filter_type" value="subset"/>
           <parameter key="attributes" value="|Humidity|Temperature"/>
         </operator>
         <operator activated="true" class="nominal_to_binominal" compatibility="5.3.007" expanded="true" height="94" name="Nominal to Binominal" width="90" x="313" y="30">
           <parameter key="attribute_filter_type" value="subset"/>
           <parameter key="attributes" value="|Wind|Play|Outlook"/>
         </operator>
         <operator activated="true" class="nominal_to_numerical" compatibility="5.3.007" expanded="true" height="94" name="Nominal to Numerical" width="90" x="447" y="30">
           <parameter key="attribute_filter_type" value="subset"/>
           <parameter key="attributes" value="|Wind|Play|Outlook = sunny|Outlook = rain|Outlook = overcast"/>
           <list key="comparison_groups"/>
         </operator>
         <operator activated="true" class="normalize" compatibility="5.3.007" expanded="true" height="94" name="Normalize (2)" width="90" x="581" y="30">
           <parameter key="attribute_filter_type" value="subset"/>
           <parameter key="attributes" value="|Wind = true|Wind = false|Outlook = sunny = true|Outlook = sunny = false|Outlook = rain = true|Outlook = rain = false|Outlook = overcast = true|Outlook = overcast = false"/>
         </operator>
         <operator activated="true" class="weight_by_information_gain_ratio" compatibility="5.3.007" expanded="true" height="76" name="Weight by Information Gain Ratio" width="90" x="45" y="210"/>
         <operator activated="true" class="scale_by_weights" compatibility="5.3.007" expanded="true" height="76" name="Scale by Weights" width="90" x="179" y="210"/>
         <operator activated="true" class="k_nn" compatibility="5.3.007" expanded="true" height="76" name="k-NN" width="90" x="313" y="210">
           <parameter key="k" value="2"/>
         </operator>
         <operator activated="true" class="apply_model" compatibility="5.3.007" expanded="true" height="76" name="Apply Model" width="90" x="447" y="210">
           <list key="application_parameters"/>
         </operator>
         <operator activated="true" class="performance" compatibility="5.3.007" expanded="true" height="76" name="Performance" width="90" x="581" y="210"/>
         <connect from_op="Retrieve Golf" from_port="output" to_op="Normalize" to_port="example set input"/>
         <connect from_op="Normalize" from_port="example set output" to_op="Nominal to Binominal" to_port="example set input"/>
         <connect from_op="Nominal to Binominal" from_port="example set output" to_op="Nominal to Numerical" to_port="example set input"/>
         <connect from_op="Nominal to Numerical" from_port="example set output" to_op="Normalize (2)" to_port="example set input"/>
         <connect from_op="Normalize (2)" from_port="example set output" to_op="Weight by Information Gain Ratio" to_port="example set"/>
         <connect from_op="Weight by Information Gain Ratio" from_port="weights" to_op="Scale by Weights" to_port="weights"/>
         <connect from_op="Weight by Information Gain Ratio" from_port="example set" to_op="Scale by Weights" to_port="example set"/>
         <connect from_op="Scale by Weights" from_port="example set" to_op="k-NN" to_port="training set"/>
         <connect from_op="k-NN" from_port="model" to_op="Apply Model" to_port="model"/>
         <connect from_op="k-NN" from_port="exampleSet" to_op="Apply Model" to_port="unlabelled data"/>
         <connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
         <connect from_op="Performance" from_port="performance" to_port="result 1"/>
         <portSpacing port="source_input 1" spacing="0"/>
         <portSpacing port="sink_result 1" spacing="0"/>
         <portSpacing port="sink_result 2" spacing="0"/>
       </process>
     </operator>
    </process>

    This performs better than direct application of k-NN with two neighbours, i.e. seems to work as expected.

    Nevertheless, what is missing (in my opinion) is a weight input for at least the k-NN and Bayes operators, so that attribute weighting can be applied in a "natural" way. Weighting is, e.g., part of the Weka WAODE method, so it makes sense there and possibly for other operators as well.
    User: "wessel"
    New Altair Community Member
    Marius wrote:

    Hey,

    just let me know when your next break starts and I can explain it to you personally :)

    See you later!

    Marius
    Where do you guys hang out?
    Dortmund University?
    User: "MariusHelf"
    New Altair Community Member
    michaelhecht wrote:


    This performs better than direct application of k-NN with two neighbours, i.e. seems to work as expected.
    Just for the record: to know whether it really performs better, you have to validate the model properly, e.g. with cross-validation. Try setting k to 1 and you'll always get an accuracy of 100% on the training data :)
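
The point about validation can be illustrated with a small Python sketch, assuming made-up 2D points with no duplicates. It compares 1-NN accuracy on the training data with leave-one-out accuracy; all function names are hypothetical, and `math.dist` requires Python 3.8 or later.

```python
import math

def predict_knn(train, query, k):
    """Majority vote among the k nearest training points (Euclidean distance)."""
    nearest = sorted(train, key=lambda p: math.dist(p[0], query))[:k]
    labels = [label for _, label in nearest]
    return max(set(labels), key=labels.count)

def training_accuracy(data, k):
    """Accuracy when the model is applied to its own training data."""
    hits = sum(predict_knn(data, x, k) == y for x, y in data)
    return hits / len(data)

def leave_one_out_accuracy(data, k):
    """Accuracy when each point is predicted from all the other points."""
    hits = 0
    for i, (x, y) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        hits += predict_knn(rest, x, k) == y
    return hits / len(data)

# Made-up labelled points, chosen so that close neighbours often disagree.
data = [((1.0, 1.0), "yes"), ((1.2, 0.9), "no"),
        ((3.0, 3.1), "yes"), ((3.2, 2.9), "yes"),
        ((5.0, 5.0), "no"), ((5.1, 4.8), "yes")]

print(training_accuracy(data, 1))       # each point is its own nearest neighbour
print(leave_one_out_accuracy(data, 1))  # lower on this data
```

With k = 1 every training point finds itself at distance zero, so the training accuracy says nothing about how the model generalizes.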

    wessel wrote:

    Where do you guys hang out?
    Dortmund University?

    Rapid-I Headquarters, Dortmund :)
    User: "michaelhecht"
    New Altair Community Member
    OP
    Well, I added SOLVED to the topic, but in the end the difference from a true weighted k-NN is that the weights enter squared when, e.g., Euclidean distance is used. So it isn't a real solution but a workaround for the missing weight input. ;)
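
A small Python check of this effect, assuming the weighted Euclidean distance is defined as sqrt(sum_i w_i * (a_i - b_i)^2), which is one common convention. The points and weights are made up. Scaling the attributes by w_i and then using the plain distance behaves like weights w_i^2; scaling by sqrt(w_i) instead reproduces the weighted distance.

```python
import math

def weighted_euclidean(a, b, weights):
    """Weighted Euclidean distance: sqrt(sum_i w_i * (a_i - b_i)^2)."""
    return math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))

def scale(point, factors):
    """Multiply each attribute value by its factor."""
    return [f * x for x, f in zip(point, factors)]

a, b = [1.0, 2.0], [4.0, 6.0]  # illustrative points
w = [0.25, 4.0]                # hypothetical attribute weights
root_w = [math.sqrt(x) for x in w]

# Scaling by w, then plain distance: the weights effectively act as w_i^2.
plain = math.dist(scale(a, w), scale(b, w))
# Scaling by sqrt(w), then plain distance: the weights act as w_i.
fixed = math.dist(scale(a, root_w), scale(b, root_w))

print(plain, weighted_euclidean(a, b, [x * x for x in w]))
print(fixed, weighted_euclidean(a, b, w))
```

So under this definition, taking the square root of the weights before scaling would make the workaround match the weighted distance, at least for Euclidean distance.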