Hi, is there an operator that extracts the distances computed when the k-NN lazy learner is applied to labeled and scored data? I would like to see the underlying data driving the predictions.
Good ideas! It looks like I can get what I'm looking for through the Data to Similarity operator. Also, the graph outputs (spring layout) are especially helpful for visualizing the k-NN method.
Hi @michaelgloven
I'm not 100% sure, but maybe the 'Cross Distances' operator would help in this case?
I have never used it on real data myself, but it seems to offer the same distance measures as k-NN.
Which distance are you looking for? Listing the distances to each of the k nearest neighbors works fine for a k of 1 to 3, but it gets pretty messy once you reach k=50+.
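To illustrate why a large k gets messy, here is a minimal plain-Python sketch (not RapidMiner code) of a top-k distance table, roughly the kind of long table Cross Distances returns when only the top k are kept. It assumes plain Euclidean distance on numeric attributes; the helper names `euclidean` and `top_k_distances` and the toy data are hypothetical. Every test (request) example contributes k rows, so the table grows as (number of test examples) × k.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def top_k_distances(requests, references, k):
    """Return (request_id, reference_id, distance) rows, keeping only the
    k nearest reference examples for each request example.

    Hypothetical helper: requests/references are dicts mapping an example
    id to its numeric attribute vector.
    """
    rows = []
    for req_id, req_vec in requests.items():
        # Distance from this request example to every reference example.
        dists = [(euclidean(req_vec, ref_vec), ref_id)
                 for ref_id, ref_vec in references.items()]
        # Sort by distance (ties broken by reference id) and keep the k closest.
        for dist, ref_id in sorted(dists)[:k]:
            rows.append((req_id, ref_id, dist))
    return rows


# Hypothetical toy data: two numeric attributes per example.
references = {"r1": (0.0, 0.0), "r2": (3.0, 4.0), "r3": (1.0, 0.0), "r4": (6.0, 8.0)}
requests = {"q1": (0.0, 0.0), "q2": (6.0, 8.0)}

table = top_k_distances(requests, references, k=3)
for row in table:
    print(row)
print(len(table), "rows")  # len(requests) * k
```

With k=50 and, say, 100 test examples, the same layout would already have 5,000 rows, which is why a per-example summary such as the average distance is easier to read.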
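Here's a sample process. It averages the distances from each test example to its 3 nearest neighbors and joins that average onto the k-NN predictions, though personally I'm not too keen on it.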
<?xml version="1.0" encoding="UTF-8"?>
<process version="8.2.000">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="8.2.000" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" class="retrieve" compatibility="8.2.000" expanded="true" height="68" name="Retrieve Iris" width="90" x="45" y="136">
        <parameter key="repository_entry" value="//Samples/data/Iris"/>
      </operator>
      <operator activated="true" class="split_data" compatibility="8.2.000" expanded="true" height="103" name="Split Data" width="90" x="179" y="136">
        <enumeration key="partitions">
          <parameter key="ratio" value="0.7"/>
          <parameter key="ratio" value="0.3"/>
        </enumeration>
      </operator>
      <operator activated="true" class="k_nn" compatibility="8.2.000" expanded="true" height="82" name="k-NN" width="90" x="313" y="238">
        <parameter key="k" value="3"/>
      </operator>
      <operator activated="true" class="cross_distances" compatibility="8.2.000" expanded="true" height="103" name="Cross Distances" width="90" x="380" y="85">
        <parameter key="only_top_k" value="true"/>
        <parameter key="k" value="3"/>
      </operator>
      <operator activated="true" class="aggregate" compatibility="8.2.000" expanded="true" height="82" name="Aggregate" width="90" x="514" y="34">
        <list key="aggregation_attributes">
          <parameter key="distance" value="average"/>
        </list>
        <parameter key="group_by_attributes" value="request"/>
      </operator>
      <operator activated="true" class="apply_model" compatibility="8.2.000" expanded="true" height="82" name="Apply Model" width="90" x="581" y="289">
        <list key="application_parameters"/>
      </operator>
      <operator activated="true" class="concurrency:join" compatibility="8.2.000" expanded="true" height="82" name="Join" width="90" x="648" y="34">
        <parameter key="use_id_attribute_as_key" value="false"/>
        <list key="key_attributes">
          <parameter key="id" value="request"/>
        </list>
      </operator>
      <operator activated="true" class="set_role" compatibility="8.2.000" expanded="true" height="82" name="Set Role" width="90" x="715" y="136">
        <parameter key="attribute_name" value="average(distance)"/>
        <parameter key="target_role" value="distance_measure"/>
        <list key="set_additional_roles"/>
      </operator>
      <connect from_op="Retrieve Iris" from_port="output" to_op="Split Data" to_port="example set"/>
      <connect from_op="Split Data" from_port="partition 1" to_op="k-NN" to_port="training set"/>
      <connect from_op="Split Data" from_port="partition 2" to_op="Cross Distances" to_port="request set"/>
      <connect from_op="k-NN" from_port="model" to_op="Apply Model" to_port="model"/>
      <connect from_op="k-NN" from_port="exampleSet" to_op="Cross Distances" to_port="reference set"/>
      <connect from_op="Cross Distances" from_port="result set" to_op="Aggregate" to_port="example set input"/>
      <connect from_op="Cross Distances" from_port="request set" to_op="Apply Model" to_port="unlabelled data"/>
      <connect from_op="Aggregate" from_port="example set output" to_op="Join" to_port="right"/>
      <connect from_op="Apply Model" from_port="labelled data" to_op="Join" to_port="left"/>
      <connect from_op="Join" from_port="join" to_op="Set Role" to_port="example set input"/>
      <connect from_op="Set Role" from_port="example set output" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>
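The Aggregate, Join and Set Role steps in the process above collapse the k distance rows per test example into a single average(distance) value and attach it to the k-NN predictions. Here is a rough plain-Python equivalent of that idea, assuming the distance rows and the scored examples are already in memory and assuming an inner join on id = request. The function names, the `prediction(label)` key and all values are illustrative, not taken from RapidMiner.

```python
from collections import defaultdict


def average_distance_by_request(rows):
    """Mimic Aggregate: group (request, reference, distance) rows by request
    and compute average(distance) per group. Hypothetical helper."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for request, _reference, distance in rows:
        totals[request] += distance
        counts[request] += 1
    return {req: totals[req] / counts[req] for req in totals}


def join_predictions(scored, avg_dist):
    """Mimic Join (assumed inner join, id == request): attach
    average(distance) to each scored example with a matching request id.
    Hypothetical helper."""
    joined = []
    for example in scored:
        if example["id"] in avg_dist:
            merged = dict(example)
            merged["average(distance)"] = avg_dist[example["id"]]
            joined.append(merged)
    return joined


# Illustrative top-3 distance rows and k-NN predictions for two test examples.
rows = [("q1", "r1", 0.5), ("q1", "r3", 1.0), ("q1", "r2", 1.5),
        ("q2", "r4", 2.0), ("q2", "r2", 4.0), ("q2", "r3", 6.0)]
scored = [{"id": "q1", "prediction(label)": "Iris-setosa"},
          {"id": "q2", "prediction(label)": "Iris-virginica"}]

for example in join_predictions(scored, average_distance_by_request(rows)):
    print(example)
```

Set Role has no counterpart in this sketch; in the process it only gives average(distance) a special role so downstream operators don't treat it as a regular attribute.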