"Retrieve KNN Distance Results"

michaelgloven
michaelgloven New Altair Community Member
edited November 2024 in Community Q&A

Hi, is there an operator that extracts the distance results produced when the k-NN lazy learner is applied to labeled and scored data? I would like to see the underlying data driving the predictions.

Answers

  • kypexin
    kypexin New Altair Community Member

    Hi @michaelgloven

    I'm not 100% sure, but the 'Cross Distances' operator might help you in this case.

    I have never used it myself on real data, but it seems to offer the same distance measures as k-NN does.

  • JEdward
    JEdward New Altair Community Member

    Which distance are you looking for?  The distances to the k nearest neighbors themselves would be fine for a k of 1 to 3, but will look pretty messy when you reach k=50+.

     

    Here's a sample process, though personally I'm not too keen on it.

     

    <?xml version="1.0" encoding="UTF-8"?>
    <process version="8.2.000">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="8.2.000" expanded="true" name="Process">
        <process expanded="true">
          <operator activated="true" class="retrieve" compatibility="8.2.000" expanded="true" height="68" name="Retrieve Iris" width="90" x="45" y="136">
            <parameter key="repository_entry" value="//Samples/data/Iris"/>
          </operator>
          <!-- 70/30 split: partition 1 trains k-NN, partition 2 is the request set -->
          <operator activated="true" class="split_data" compatibility="8.2.000" expanded="true" height="103" name="Split Data" width="90" x="179" y="136">
            <enumeration key="partitions">
              <parameter key="ratio" value="0.7"/>
              <parameter key="ratio" value="0.3"/>
            </enumeration>
          </operator>
          <operator activated="true" class="k_nn" compatibility="8.2.000" expanded="true" height="82" name="k-NN" width="90" x="313" y="238">
            <parameter key="k" value="3"/>
          </operator>
          <!-- Keep only the 3 closest reference examples per request example (same k as k-NN) -->
          <operator activated="true" class="cross_distances" compatibility="8.2.000" expanded="true" height="103" name="Cross Distances" width="90" x="380" y="85">
            <parameter key="only_top_k" value="true"/>
            <parameter key="k" value="3"/>
          </operator>
          <!-- Average the top-k distances for each request example -->
          <operator activated="true" class="aggregate" compatibility="8.2.000" expanded="true" height="82" name="Aggregate" width="90" x="514" y="34">
            <list key="aggregation_attributes">
              <parameter key="distance" value="average"/>
            </list>
            <parameter key="group_by_attributes" value="request"/>
          </operator>
          <operator activated="true" class="apply_model" compatibility="8.2.000" expanded="true" height="82" name="Apply Model" width="90" x="581" y="289">
            <list key="application_parameters"/>
          </operator>
          <!-- Attach the averaged distance to the scored examples (id = request) -->
          <operator activated="true" class="concurrency:join" compatibility="8.2.000" expanded="true" height="82" name="Join" width="90" x="648" y="34">
            <parameter key="use_id_attribute_as_key" value="false"/>
            <list key="key_attributes">
              <parameter key="id" value="request"/>
            </list>
          </operator>
          <operator activated="true" class="set_role" compatibility="8.2.000" expanded="true" height="82" name="Set Role" width="90" x="715" y="136">
            <parameter key="attribute_name" value="average(distance)"/>
            <parameter key="target_role" value="distance_measure"/>
            <list key="set_additional_roles"/>
          </operator>
          <connect from_op="Retrieve Iris" from_port="output" to_op="Split Data" to_port="example set"/>
          <connect from_op="Split Data" from_port="partition 1" to_op="k-NN" to_port="training set"/>
          <connect from_op="Split Data" from_port="partition 2" to_op="Cross Distances" to_port="request set"/>
          <connect from_op="k-NN" from_port="model" to_op="Apply Model" to_port="model"/>
          <connect from_op="k-NN" from_port="exampleSet" to_op="Cross Distances" to_port="reference set"/>
          <connect from_op="Cross Distances" from_port="result set" to_op="Aggregate" to_port="example set input"/>
          <connect from_op="Cross Distances" from_port="request set" to_op="Apply Model" to_port="unlabelled data"/>
          <connect from_op="Aggregate" from_port="example set output" to_op="Join" to_port="right"/>
          <connect from_op="Apply Model" from_port="labelled data" to_op="Join" to_port="left"/>
          <connect from_op="Join" from_port="join" to_op="Set Role" to_port="example set input"/>
          <connect from_op="Set Role" from_port="example set output" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
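
For anyone who wants to check the numbers outside RapidMiner, here is a minimal Python sketch of what the process above appears to compute: for each request example, the distances to its k nearest reference examples, followed by their average. It assumes plain Euclidean distance on numeric attributes; the function names and the toy data are hypothetical and are not part of any RapidMiner API.

```python
import math


def top_k_distances(request, reference, k):
    """For each request row, return the k smallest Euclidean distances
    to the reference rows, sorted ascending."""
    results = []
    for row in request:
        dists = sorted(math.dist(row, ref) for ref in reference)
        results.append(dists[:k])
    return results


def average(values):
    """Arithmetic mean, mirroring the 'average' aggregation in the process."""
    return sum(values) / len(values)


# Hypothetical toy data: the reference set plays the role of the training
# partition, the request set plays the role of the scored partition.
reference = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0), (1.0, 0.0)]
request = [(0.0, 0.0), (6.0, 8.0)]

nearest = top_k_distances(request, reference, k=3)
avg_distance = [average(d) for d in nearest]

for row, dists, avg in zip(request, nearest, avg_distance):
    print(row, dists, avg)
```

With k of 1 to 3 the per-neighbor list stays readable; for a large k, the single averaged value per request example is usually the more practical thing to inspect.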

     

     
  • michaelgloven
    michaelgloven New Altair Community Member
    Answer ✓

    Good ideas. It looks like I can get what I'm looking for through the Data to Similarity operator. Also, the graph outputs (spring) are especially helpful in visualizing the KNN method.
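
As a rough illustration of the pairwise idea behind a similarity object, the Python sketch below computes the distance between every pair of examples in one set and then lists the closest neighbors of a chosen example. It assumes Euclidean distance and examples keyed by an id; the function names and data are hypothetical, and it does not reproduce RapidMiner's actual output format.

```python
import math
from itertools import combinations


def pairwise_distances(examples):
    """Return {(id_a, id_b): distance} for every unordered pair of examples."""
    return {
        (a, b): math.dist(examples[a], examples[b])
        for a, b in combinations(sorted(examples), 2)
    }


def neighbours_of(example_id, distances, k):
    """Return the k closest (distance, other_id) pairs for one example."""
    rows = []
    for (a, b), d in distances.items():
        if example_id in (a, b):
            other = b if a == example_id else a
            rows.append((d, other))
    return sorted(rows)[:k]


# Hypothetical toy examples keyed by id.
examples = {"id_1": (0.0, 0.0), "id_2": (3.0, 4.0), "id_3": (0.0, 1.0)}

distances = pairwise_distances(examples)
closest = neighbours_of("id_1", distances, k=2)
print(closest)
```

The pairwise table grows quadratically with the number of examples, so restricting the lookup to the k closest neighbors of the examples of interest keeps the result manageable.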
